WeeklyTelcon_20191001

Geoffrey Paulsen edited this page Oct 1, 2019 · 1 revision

Open MPI Weekly Telecon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees (on Web-ex)

  • Akshay Venkatesh (NVIDIA)
  • Austen Lauria (IBM)
  • Brendan Cunningham (Intel)
  • Geoffrey Paulsen (IBM)
  • Harumi Kuno (HPE)
  • Howard Pritchard (LANL)
  • Jeff Squyres (Cisco)
  • Joshua Ladd (Mellanox)
  • Matthew Dosanjh (Sandia)
  • Michael Heinz (Intel)
  • Noah Evans (Sandia)
  • Ralph Castain (Intel)
  • Thomas Naughton (ORNL)
  • Todd Kordenbrock (Sandia)
  • William Zhang (AWS)

Not there today (I keep this for easy cut-n-paste for future notes)

  • Artem Polyakov (Mellanox)
  • Brandon Yates (Intel)
  • Brian Barrett (AWS)
  • Charles Shereda (LLNL)
  • David Bernhold (ORNL)
  • Edgar Gabriel (UH)
  • Erik Zeiske
  • George Bosilca (UTK)
  • Josh Hursey (IBM)
  • Mark Allen (IBM)
  • Matias Cabral (Intel)
  • Nathan Hjelm (Google)
  • Tom Naughton
  • Xin Zhao (Mellanox)
  • mohan (AWS)

Agenda/New Business

lists.open-mpi.org isn't working

  • Jeff changed a setting, and they seem to be working now.

    • Unclear
  • Introduced Austen Lauria (IBM) who will be working more directly with Open MPI


Infrastructure

Submodule prototype

  • OMPI has been waiting for some git submodule work in Jenkins on AWS.

    • It's been a few months, with no progress.
    • Three pieces: Jenkins, CI, bot.
      • AWS has a libfabric setup like this for testing.
      • Issue is that they're reworking the design, and will roll it out for both libfabric and open-mpi.
    • William Zhang talked to Brian
      • Not something AWS team will work on, but Brian will work on it.
    • Jeff will talk to Brian as well.
  • Howard and Jeff have access to Jenkins on AWS. Part of the problem is that we don't have much expertise on Jenkins/AWS.

    • William will probably be administering the Jenkins/AWS setup or communicating with those who will.
  • Merged --recurse-submodules update into the ompi-scripts Jenkins script as a first step. Let's see if that works.

  • Modular thread re-write (Noah)

    • UGNI and Vader BTLs were getting better performance, not sure why.
    • For the modular threading library, it might be interesting to decide whether the library is selected at compile time or at runtime.
    • Previously, similar effects seemed to be related to the instruction cache (ICACHE).
    • Status of this?
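The compile-time versus runtime question raised above can be illustrated with a small sketch. Every name here (`sketch_thread_module_t`, `sketch_select_runtime`, `sketch_yield`, `SKETCH_USE_ULT`, and the two toy backends) is hypothetical and does not correspond to Open MPI code. It only contrasts the two binding styles: a table lookup with an indirect call, versus a preprocessor switch that binds the call directly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical interface for a pluggable threading backend. */
typedef struct {
    const char *name;
    int (*yield)(void);   /* returns a backend-specific code for the demo */
} sketch_thread_module_t;

/* Two toy backends; the return values only identify which one ran. */
static int pthreads_yield(void) { return 1; }
static int ult_yield(void)      { return 2; }

static const sketch_thread_module_t sketch_modules[] = {
    { "pthreads", pthreads_yield },
    { "ult",      ult_yield },
};

/* Runtime selection: find the backend by name; callers then go
 * through the function pointer stored in the returned module. */
static const sketch_thread_module_t *sketch_select_runtime(const char *name)
{
    size_t n = sizeof(sketch_modules) / sizeof(sketch_modules[0]);
    for (size_t i = 0; i < n; ++i) {
        if (0 == strcmp(sketch_modules[i].name, name)) {
            return &sketch_modules[i];
        }
    }
    return NULL;   /* unknown backend name */
}

/* Compile-time selection: the preprocessor binds the call directly,
 * so the compiler is free to inline it. Build with -DSKETCH_USE_ULT
 * to switch backends. */
#ifdef SKETCH_USE_ULT
#define sketch_yield() ult_yield()
#else
#define sketch_yield() pthreads_yield()
#endif
```

In general, the runtime form keeps one binary usable with several backends at the cost of an indirect call on each use, while the compile-time form fixes the backend per build. Which trade-off suits the modular threading work is the open question recorded in the notes.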

Release Branches

Review v3.0.x Milestones v3.0.4

Review v3.1.x Milestones v3.1.4

  • Release goal of Oct 31st.
  • Need to put an RC out soon (will discuss date with Brian)
  • Start drawing up a list of fixes that won't be backported to v3.0.x
    • Datatype bug won't be backported, because it snowballed too big.
    • With the new v3.0.x and v3.1.x releases, will publish a list (in either NEWS or README) of issues fixed in v4.0.x that are NOT being backported, with a recommendation to upgrade.

Review v4.0.x Milestones v4.0.2

  • Put out v4.0.2rc3 Monday

  • Release v4.0.2 this week.

  • XPMEM is failing, Howard will create issue.

  • Geoffroy Vallee has a system set up to run cross-compatibility tests, and can report which versions are failing. Ralph will forward info to devel-core.

  • See older weekly notes for prior items.

Review Master Pull Requests

  • Compile failure on Master - OFI / Libfabric
    • Fixed on v4.0.x, needs to be cherry-picked to master and other branches.
    • OMPI_UNLIKELY versus OPAL_UNLIKELY
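For readers unfamiliar with the macro names above: macros named this way conventionally wrap a branch-prediction hint. The following is a minimal sketch of that pattern, not Open MPI's actual definition. `SKETCH_UNLIKELY` and `checked_div` are hypothetical names, and the `__builtin_expect` path assumes GCC or Clang. Because such macros are defined under one specific prefix, a call site that uses a prefix the tree does not define would fail to compile, which is one plausible way a mismatch like this could surface.

```c
#include <stddef.h>

/* Hypothetical stand-in for a project-prefixed branch hint macro. */
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_UNLIKELY(expr) __builtin_expect(!!(expr), 0)
#else
#define SKETCH_UNLIKELY(expr) (!!(expr))
#endif

/* Divide a by b and store the quotient in *out.
 * Returns 0 on success; returns -1 and leaves *out untouched
 * when out is NULL or b is zero (the rarely taken error path). */
static int checked_div(int a, int b, int *out)
{
    if (SKETCH_UNLIKELY(NULL == out || 0 == b)) {
        return -1;
    }
    *out = a / b;
    return 0;
}
```

The hint does not change what the code computes; it only tells the compiler which branch to lay out as the fall-through path.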

CI status

  • IBM's PGI test has NEVER worked. Is it a real issue or local to IBM?
  • Absoft 32-bit Fortran failures.

v5.0.0

  • Schedule: April 2020?
    • Wiki page
    • Some items:
      • MPI-1 removed items.
  • Need a face-to-face meeting.
    • Jeff will send out a face-to-face Doodle poll for weeks in Jan/Feb.

Dependencies

PMIx Update

ORTE/PRRTE


Next face to face

MTT


