How to change single copy VIA xpmem execution to the sender process #10019

arun-chandran-edarath · 2024-07-17T07:09:05Z

Hi Everyone,

I am currently examining the execution of MPI_Send (blocking send) with UCX in an intra-node scenario. At present, the memory copy (ucs_memcpy_relaxed()) is executed in the receiver process (rank), as depicted below.

Executing the same copy in the sender process instead, as shown below, could significantly reduce cache-to-cache data transfers and conserve memory bandwidth.

However, I am struggling to find a runtime configuration that would allow me to execute this transfer in the sender process with the hint UCS_ARCH_MEMCPY_NT_DEST and benchmark it. Could anyone provide some guidance or suggestions on this matter?
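For readers unfamiliar with the idea behind a "non-temporal destination" hint, here is a minimal sketch of a copy that writes its destination with streaming stores, so the written lines do not have to be pulled into the copying core's cache. It is not UCX code: `copy_nt_dest` is a hypothetical helper, the SSE2 path assumes an x86 build, and other targets fall back to a plain `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper (not a UCX function): copy len bytes from src to dst,
 * using non-temporal (streaming) stores for the destination where available. */
static void copy_nt_dest(void *dst, const void *src, size_t len)
{
    uint8_t *d       = dst;
    const uint8_t *s = src;

    if (len == 0) {
        return;
    }
    assert((dst != NULL) && (src != NULL));

#if defined(__SSE2__)
    /* Copy the unaligned head with regular stores until dst is 16-byte aligned,
     * because _mm_stream_si128 requires an aligned destination. */
    size_t head = (16 - ((uintptr_t)d & 15)) & 15;
    if (head > len) {
        head = len;
    }
    memcpy(d, s, head);
    d   += head;
    s   += head;
    len -= head;

    /* Stream the bulk of the data in 16-byte blocks, bypassing the cache
     * on the store side. */
    while (len >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_stream_si128((__m128i *)d, v);
        d   += 16;
        s   += 16;
        len -= 16;
    }

    /* Make the streaming stores globally visible before returning. */
    _mm_sfence();
#endif

    /* Remaining tail (or the whole buffer on non-SSE2 builds). */
    memcpy(d, s, len);
}
```

Whether this helps in practice depends on message size and on which process touches the data next, so it would need to be benchmarked rather than assumed.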

Thank you in advance for your assistance.

--Arun

yosefe · 2024-07-17T07:12:05Z

Currently, the rkey_ptr protocol always does the memcpy on the receiver. To do the memcpy on the sender, one would need to implement a new variant of this protocol (with an extra control message).

tvegas1 · 2024-07-17T08:22:08Z

@arun-chandran-edarath, in case you want more details: without much thinking, and unsure about the perf result, it might be possible to implement a PoC at either:

UCT: src/uct/sm/mm/base/mm_*.c: maybe adding a return FIFO that would receive aggregated src/dst/len entries, so that the memcpy is performed at the original source
UCP: an rndv RTR flow using sm/mm put primitives
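To make the "return FIFO" idea more concrete, here is a rough single-process sketch of a queue of src/dst/len descriptors that the receiver fills and the sender drains, doing the copies itself. All names (`copy_desc_t`, `copy_fifo_t`, `copy_fifo_push`, `copy_fifo_progress`) are hypothetical and do not exist in UCX. A real implementation would also need the FIFO in shared memory with proper atomics and memory barriers, a destination buffer that is mapped into the sender's address space (for example through xpmem), and a completion notification back to the receiver, all of which are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COPY_FIFO_SIZE 8u /* number of slots; must be a power of two */

/* One copy request: the receiver describes where the data should land. */
typedef struct {
    const void *src; /* source buffer, owned by the sender */
    void       *dst; /* destination buffer, assumed reachable by the sender */
    size_t      len; /* number of bytes to copy */
} copy_desc_t;

/* Hypothetical return FIFO: the receiver produces, the sender consumes. */
typedef struct {
    copy_desc_t slots[COPY_FIFO_SIZE];
    unsigned    head; /* next slot to fill (receiver side) */
    unsigned    tail; /* next slot to process (sender side) */
} copy_fifo_t;

/* Receiver side: post a copy request. Returns 0 on success, -1 if full. */
static int copy_fifo_push(copy_fifo_t *fifo, const void *src, void *dst,
                          size_t len)
{
    assert(fifo != NULL);
    if ((fifo->head - fifo->tail) == COPY_FIFO_SIZE) {
        return -1; /* no free slot; caller should retry later */
    }

    copy_desc_t *desc = &fifo->slots[fifo->head % COPY_FIFO_SIZE];
    desc->src = src;
    desc->dst = dst;
    desc->len = len;
    fifo->head++;
    return 0;
}

/* Sender side: drain all pending requests, performing each memcpy locally.
 * Returns the number of requests processed. */
static unsigned copy_fifo_progress(copy_fifo_t *fifo)
{
    unsigned count = 0;

    assert(fifo != NULL);
    while (fifo->tail != fifo->head) {
        const copy_desc_t *desc = &fifo->slots[fifo->tail % COPY_FIFO_SIZE];
        memcpy(desc->dst, desc->src, desc->len);
        fifo->tail++;
        count++;
    }
    return count;
}
```

In this sketch the sender would call `copy_fifo_progress()` from its progress loop, which is also where a non-temporal copy could be swapped in for the plain `memcpy`.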

arun-chandran-edarath · 2024-07-17T10:57:26Z

@yosefe and @tvegas1,

Thank you for your responses. I would like to clarify if the two suggestions provided are identical:

a) Implementing a new variant of the rkey_ptr protocol (with an extra control message) to perform memcpy on the sender.
b) Using an rndv rtr flow with sm/mm put primitives in UCP.

Could you please provide more specific details or elaborate on these suggestions? Additionally, it would be helpful if you could point me towards the relevant source code files or any examples that I could refer to.

--Arun
