How to change single copy VIA xpmem execution to the sender process #10019

Open
arun-chandran-edarath opened this issue Jul 17, 2024 · 3 comments

Comments

@arun-chandran-edarath
Contributor

Hi Everyone,

@yosefe @tvegas1

I am currently examining how MPI_Send (blocking send) executes with UCX in an intra-node scenario. At present, the memory copy (ucs_memcpy_relaxed()) is performed in the receiver process (rank or processor), as depicted below.

[Image: reciver_process_ntbt]

By performing the same copy in the sender process, as shown below, we could significantly reduce cache-to-cache data transfers and conserve memory bandwidth.

[Image: sender_process_ntbt]

However, I am struggling to find a runtime configuration that would allow me to execute this transfer in the sender process with the hint UCS_ARCH_MEMCPY_NT_DEST and benchmark it. Could anyone provide some guidance or suggestions on this matter?

Thank you in advance for your assistance.

--Arun

@yosefe
Contributor

yosefe commented Jul 17, 2024

Currently, the rkey_ptr protocol always does the memcpy on the receiver. Doing the memcpy on the sender would require implementing a new variant of this protocol (with an extra control message).
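
To make the idea of an extra control message concrete, here is a minimal single-process sketch in C. It is not UCX code: `demo_rtr_msg_t`, `demo_done_msg_t`, `demo_receiver_post()` and `demo_sender_copy()` are hypothetical names invented for illustration, and the sketch assumes the receiver's buffer is already mapped into the sender's address space (which in a real setup would need an xpmem-style attach). The receiver first advertises its destination buffer, the sender then performs the copy itself and reports completion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical control message, receiver -> sender: "copy into this buffer".
 * Assumption: dst is directly addressable by the sender. */
typedef struct {
    void   *dst;      /* receiver's destination buffer */
    size_t  capacity; /* number of bytes the receiver can accept */
} demo_rtr_msg_t;

/* Hypothetical completion message, sender -> receiver: "data is in place". */
typedef struct {
    size_t copied;    /* number of bytes the sender wrote */
} demo_done_msg_t;

/* Receiver side: advertise the receive buffer instead of copying. */
static demo_rtr_msg_t demo_receiver_post(void *buf, size_t capacity)
{
    demo_rtr_msg_t rtr;

    rtr.dst      = buf;
    rtr.capacity = capacity;
    return rtr;
}

/* Sender side: the memcpy now runs in the sender, truncated to the
 * capacity advertised by the receiver. */
static demo_done_msg_t demo_sender_copy(const demo_rtr_msg_t *rtr,
                                        const void *src, size_t length)
{
    demo_done_msg_t done;

    done.copied = (length < rtr->capacity) ? length : rtr->capacity;
    memcpy(rtr->dst, src, done.copied);
    return done;
}
```

In this sketch the plain `memcpy()` marks the spot where a copy routine with a non-temporal destination hint could be substituted for benchmarking; the cost is one more message round than in the receiver-side flow.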

@tvegas1
Contributor

tvegas1 commented Jul 17, 2024

@arun-chandran-edarath, in case you want more details: without much thought, and unsure about the performance result, it might be possible to implement this as a PoC at either:

  • UCT: src/uct/sm/mm/base/mm_*.c: maybe adding a return FIFO that would receive aggregated src/dst/len entries to perform the memcpy at the original source
  • UCP: an rndv rtr flow using sm/mm put primitives
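
As a rough illustration of the return-FIFO idea in the UCT bullet, the following C sketch shows a ring of src/dst/len descriptors that the receiver fills and the sender drains, doing the memcpy on the sender side. Everything here is hypothetical (`demo_copy_desc_t`, `demo_return_fifo_t`, `demo_fifo_push()`, `demo_fifo_progress()`, the FIFO size), none of it is taken from the mm transport sources, and it runs in a single process without the memory barriers or address translation that a real shared-memory FIFO would need.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_FIFO_SIZE 8u /* hypothetical capacity, power of two */

/* One copy request: where to read, where to write, how many bytes. */
typedef struct {
    const void *src;
    void       *dst;
    size_t      len;
} demo_copy_desc_t;

/* Hypothetical return FIFO: written by the receiver, drained by the sender. */
typedef struct {
    demo_copy_desc_t elems[DEMO_FIFO_SIZE];
    unsigned         head; /* next slot the receiver writes */
    unsigned         tail; /* next slot the sender reads */
} demo_return_fifo_t;

/* Receiver side: queue a copy request instead of copying.
 * Returns 0 on success, -1 if the FIFO is full. */
static int demo_fifo_push(demo_return_fifo_t *fifo, const void *src,
                          void *dst, size_t len)
{
    demo_copy_desc_t *desc;

    if ((fifo->head - fifo->tail) == DEMO_FIFO_SIZE) {
        return -1;
    }

    desc      = &fifo->elems[fifo->head % DEMO_FIFO_SIZE];
    desc->src = src;
    desc->dst = dst;
    desc->len = len;
    fifo->head++;
    return 0;
}

/* Sender side: drain all queued requests and perform the copies.
 * Returns the number of descriptors processed. */
static unsigned demo_fifo_progress(demo_return_fifo_t *fifo)
{
    unsigned count = 0;

    while (fifo->tail != fifo->head) {
        const demo_copy_desc_t *desc = &fifo->elems[fifo->tail % DEMO_FIFO_SIZE];

        memcpy(desc->dst, desc->src, desc->len);
        fifo->tail++;
        count++;
    }
    return count;
}
```

Draining several descriptors per progress call is what "aggregated" could look like in practice: the sender pays for one poll of the FIFO and then performs a batch of copies.
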
@arun-chandran-edarath
Contributor Author

@yosefe and @tvegas1,

Thank you for your responses. I would like to clarify if the two suggestions provided are identical:

a) Implementing a new variant of the rkey_ptr protocol (with an extra control message) to perform memcpy on the sender.
b) Using an rndv rtr flow with sm/mm put primitives in UCP.

Could you please provide more specific details or elaborate on these suggestions? Additionally, it would be helpful if you could point me towards the relevant source code files or any examples that I could refer to.

--Arun
