BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20230831T095746Z
LOCATION:Davos
DTSTART;TZID=Europe/Stockholm:20230627T101200
DTEND;TZID=Europe/Stockholm:20230627T101300
UID:submissions.pasc-conference.org_PASC23_sess110_pos139@linklings.com
SUMMARY:P41 - MPI for Multi-Core, Multi-Socket, and GPU Architectures: Op
 timised Shared Memory Allreduce
DESCRIPTION:Poster\n\nAndreas Jocksch and Jean-Guillaume Piccinali (ETH Zu
 rich / CSCS)\n\nThe benefits of shared-memory collectives, especially all
 reduce, have been shown in the literature. This intra-node communication i
 s not only necessary for single-node communication but is also a key comp
 onent of more complex inter-node communication algorithms [1]. In contras
 t to [2], our implementation's use of shared memory is invisible to the u
 ser of the library: the data of the send and receive buffers is not requi
 red to already reside in shared memory; instead, the data from the send b
 uffer is copied into the shared-memory segment in parallel chunks where c
 ommutative reduction operations are necessary. Subsequently, the data is
  further reduced within the shared-memory segment using a tree-based alg
 orithm. The final result is then copied to the receive buffer. The reduct
 ion operations and synchronization barriers are combined during this proc
 ess, and the algorithm is adapted depending on performance measurements.
 \n\n[1] Jocksch, A., Ohana, N., Lanti, E., Koutsaniti, E., Karakasis, V.,
  Villard, L.: An optimisation of allreduce communication in message-passi
 ng systems. Parallel Computing 107, 102812 (2021)\n[2] Li, S., Hoefler, T
 ., Hu, C., Snir, M.: Improved MPI collectives for MPI processes in shared
  address spaces. Cluster Computing 17(4), 1139–1155 (2014)\n\nSession Cha
 ir: Jibonananda Sanyal (National Renewable Energy Laboratory)
END:VEVENT
END:VCALENDAR
