
WeeklyTelcon_20210914

Geoffrey Paulsen edited this page Sep 14, 2021 · 1 revision

Open MPI Weekly Telecon

Attendees (on Webex)

  • Geoffrey Paulsen (IBM)
  • Raghu Raja
  • Austen Lauria (IBM)
  • Jeff Squyres (Cisco)
  • Hessam Mirsadeghi (NVIDIA)
  • Josh Hursey (IBM)
  • Siripaul (Intel)
  • Todd Kordenbrock (Sandia)
  • Howard Pritchard (LANL)
  • Thomas Naughton (ORNL)
  • William Zhang (AWS)
  • Michael Heinz (Cornelis Networks)
  • Brendan Cunningham (Cornelis Networks)
  • Joseph Schuchart (HLRS)
  • Tomislav Janjusic (NVIDIA)
  • Matthew Dosanjh (Sandia)
  • George Bosilca (UTK)

not there today (I keep this for easy cut-n-paste for future notes)

  • Brian Barrett (AWS) - Welcome Back!
  • David Bernholdt (ORNL)
  • Harumi Kuno (HPE)
  • Marisa Roman (Cornelis Networks)
  • Ralph Castain (Intel)
  • Sam Gutierrez (LANL)
  • Akshay Venkatesh (NVIDIA)
  • Artem Polyakov (NVIDIA)
  • Aurelien Bouteiller (UTK)
  • Brandon Yates (Intel)
  • Charles Shereda (LLNL)
  • Christoph Niethammer (HLRS)
  • Edgar Gabriel (UH)
  • Erik Zeiske (HPE)
  • Geoffroy Vallee (ARM)
  • Joshua Ladd (NVIDIA)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Nathan Hjelm (Google)
  • Noah Evans (Sandia)
  • Scott Breyer (Sandia?)
  • Shintaro Iwasaki
  • Xin Zhao (NVIDIA)

New Topics For Today

v4.0.x

v4.1.x

  • Schedule: Behind schedule, approximate?
    • Possibly make an RC this week.
    • One more item is pending on v4.1.x; Jenkins had some issues that Brian is looking at.
  • ROMIO 3.2.1-based PR 8371: do we want to take this?
    • v4.1.x: does this also need to go back to v4.0.x?

v5.0.x

  • Schedule: aiming for rc1 on Sept 23rd.
  • PMIx and/or PRRTE are releasing a new minor rev that we'll pick up for v5.0.x.
  • GitHub Project of [critical v5.0.x issues|https://github.com/open-mpi/ompi/projects/3]
    • Issue 8983 - Nathan volunteered to put out a fix.
    • If we partially disable OSC/TCP BTL, we are not breaking MPI compliance, just badly hurting one-sided performance.
    • Described the approach of rc1 on Sept 23, disabling any functionality that is a blocker to allow for the rc.
      • Worried that blockers might not be fixed in time, so we will put in code to issue an error at runtime to prevent getting into those paths, and document it heavily.
  • MPI_Alltoallw needs to go in. There is a PR from Giles George.
  • Janjust has a long-outstanding one incoming.
    • Still working on it.
  • Portals bugfixes incoming.
    • Todd's working on this. Hasn't posted yet. Will post this week.
  • https://github.com/open-mpi/ompi/pull/9326 should get into 5.0 too
    • This fixes a correctness issue, and George is concerned about performance.
    • Is Argobots now unsupported?
      • No. Our integration allows users to call MPI within a blocking Argobots function, and this still works.
      • What we think happens is that a thread will block in libevent: libevent isn't aware of Argobots, so libevent will block the entire thread.
    • George joined about this time. I think he said this was ready or that he'd re-read.

Master

  • No discussion
  • MTT results look pretty good

Documentation

  • No update
  • Don't use the old system; use the new system for v5.0.0.

MPI 4.0 API

  • No discussion. [Open MPI 4.0 API Compliance GitHub Project|https://github.com/open-mpi/ompi/projects/2]

MTT

  • Looking okay.
  • Cisco's results are still hidden by default.

Longer Term discussions

  • No discussion.