
WeeklyTelcon_20160216

Jeff Squyres edited this page Nov 18, 2016 · 1 revision

Open MPI Weekly Telcon


  • Dialup Info: (Do not post to public mailing list or public wiki)

Attendees

  • Geoff Paulsen
  • Jeff Squyres
  • Brad Benton
  • Edgar Gabriel
  • Howard
  • Josh Hursey
  • Ryan Grant
  • Todd Kordenbrock
  • Joshua Ladd
  • Ralph
  • Sylvain Jeaugey

Agenda

Review 1.10

  • Milestones: https://github.com/open-mpi/ompi-release/milestones/v1.10.3
  • Targeting the beginning of April for 1.10.3 - no new drivers
  • Nathan - 0 byte send issue?
  • Howard - verbs usNIC build default issue? - PR 938 waiting for Howard to review.
  • Jeff - Fortran08? - Ralph just committed.
  • Issue 1136 - SLES12 - Long-running jobs: mpirun SIGCHLD at end of job?
  • NVIDIA is now showing MTT failures; these were silently failing before.
    • hello_alloc_memusempi - one-sided. Sylvain should open an issue against 1.10.x.
    • Some race condition, so possibly not fixed on master and 2.x; they might just not hit it.

Review 2.0.x

Review Master?

MTT status:

  • From Last week:
    • A lot of issues are usNIC related. Jeff will STILL look at them.
      • Non-one-sided failures with the usNIC cluster. Perhaps cluster network setup.
    • NVIDIA failures look dynamics related. Sylvain is fixing something about the way it launches.
      • Turned off NVIDIA MTT tests for now. Just started getting different errors.
        • BOTH master and 2.x - some CUDA related things are broken. IS collective related.
        • Some new errors for 1.10, because Jeff committed some fixes to the test that now SHOW the error.
      • Hope to get testing back online today or tomorrow.
    • Nathan will look at all one-sided failures.
    • TCP BTL might have an issue: getting a "tried to lock resource but already locked" warning.

Status Updates:

  • LANL - Release stuff, some investigations for the meeting next week.
    • Now that we have KNL boxes, have been working some with Open MPI and MPICH on KNL; vast improvement over KNC.
    • Binaries will work on KNL or Haswells.
    • Want to get back to the OMPI_PLACES setting. Not sure where to put it. Discuss at the face-to-face.
      • Will need to use NESTED OMP parallelism. Want to make that easy.
    • Want to make sure everything is clean for one-sided for 2.0.
    • Trying to find last error with MPOOL re-write. Asking for feedback, and asking how people like the new organization.
      • Really want George's comment here.
      • Will give us the ability to use MEMKIND, and will take some work of getting everything to use the same allocators.
      • Can expose performance variables to tweak settings.
  • Houston - Mostly using release branch, done a little more code development for glass
  • IBM -
    • Getting MTT and builds set up internally.
    • Defining support matrix for new Open MPI product.
    • Will be using the RFC process for some bigger features.
    • Problem with MTT reporter. Josh put up a patch for it. Still running off the svn repo, but we'll need to do a swap.
    • During the swap MTT will be down.

Status Update Rotation

  1. LANL, Houston, IBM
  2. Cisco, ORNL, UTK, NVIDIA
  3. Mellanox, Sandia, Intel

