BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:Europe/Stockholm
X-LIC-LOCATION:Europe/Stockholm
BEGIN:DAYLIGHT
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
DTSTART:19700329T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
TZNAME:CET
DTSTART:19701025T030000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20230831T095745Z
LOCATION:Sanada I
DTSTART;TZID=Europe/Stockholm:20230626T150000
DTEND;TZID=Europe/Stockholm:20230626T153000
UID:submissions.pasc-conference.org_PASC23_sess171_msa268@linklings.com
SUMMARY:Application Optimization and Scalability of NICAM-LETKF on Fugaku
DESCRIPTION:Minisymposium\n\nHisashi Yashiro (National Institute for Envir
 onmental Studies), Koji Terasaki (Meteorological Research Institute), Yuta
  Kawai (RIKEN), Shuhei Kudo (The University of Electro-Communications), Ta
 kemasa Miyoshi and Toshiyuki Imamura (RIKEN), Masuo Nakano and Chihiro Kod
 ama (JAMSTEC), Masaki Satoh (The University of Tokyo), and Hirofumi Tomita
  (RIKEN)\n\nTechnological trends in supercomputers change year by year, an
 d the bottleneck factors for weather forecasting and climate prediction si
 mulations are also changing accordingly. Weather/climate models are collec
 tions of interdisciplinary algorithms with no obvious computational hotspo
 ts, and the entire application code must be tuned. Developing and maintain
 ing those models are costly and require decision-making among many trad
 e-offs. In the development project of the supercomputer Fugaku, we selecte
 d the weather data assimilation experiment using NICAM-LETKF as one of the
  target problems. This practical benchmark test required performance optim
 ization in all data transfer paths, from processing units to file I/O. Thr
 ough our system-application co-design activities, we conducted the followi
 ng optimizations: 1) Distributed file I/O using local SSDs, 2) Efficient i
 nter-node data transposition between parallel multi-node simulations and e
 nsemble data assimilation, 3) Elimination of time loss that is difficult t
 o capture by performance profilers, 4) Aggressive use of mixed-precision f
 loating-point arithmetic. These optimizations contributed to the realizat
 ion of a 3.5 km, 1024-member ensemble data assimilation experiment with th
 e help of Fugaku's key features, such as large memory bandwidth, high thre
 ad scalability, eco-mode ALU, and the aid of a special-purpose eigenvalue 
 solver.\n\nDomain: Climate, Weather and Earth Sciences\n\nSession Chair: P
 eter Messmer (NVIDIA Inc.)
END:VEVENT
END:VCALENDAR
