
UCC Virtual F2F Meeting Information

Manjunath Gorentla Venkata edited this page May 11, 2020 · 35 revisions

UCC Virtual F2F Meeting (May 11-13th and May 18-19th)

Registration

Please fill in the form here

Agenda

Day1

Meeting Notes

Monday, May 11th, 2020

| Time | Topic | Telecon |
|---|---|---|
| 7:00 - 7:30 AM PT | Kickoff and Opening Remarks (Gilad Shainer) | |
| 7:30 - 8:15 AM PT | Highlights of UCC API (Review) (Manju) | |
| 8:15 - 8:30 AM PT | Break | |
| 8:30 - 9:30 AM PT | Teams API (Manju; All/Discussion) | |
| 9:30 - 9:45 AM PT | Break | |
| 9:45 - 11:00 AM PT | Endpoints / Collective Operations (Manju; All/Discussion) | |

Day_1_Notes

Participants

  • Manjunath Gorentla Venkata
  • Alex Margolin
  • Sergey Lebedev
  • Valentin Petrov
  • Rami Nudelman
  • Baker, Matthew
  • Tony
  • Gilad Shainer
  • James S Dinan
  • Chambreau, Chris
  • Gil Bloch
  • Dmitry Gladkov
  • Arturo
  • Pavel Shamis
  • Ravi, Naveen
  • Raffenetti, Kenneth J.
  • Akshay Venkatesh

Discussion

  • Initialization

    • Have a flexible infrastructure for initialization and selection of library functionality
    • Discuss final options during the component architecture discussion
    • UCC config interface to follow UCS config
    • Rename ucc_config to ucc_params to reflect UCX style
  • Context

    • Do we need a sync model config at context creation?
      • Yes, to enable RDMA-based implementations
      • The drawback: might have to create more contexts (sync and non-sync)
        • Yes, this might require multiple objects, but not necessarily multiple resources
        • Explore an explicit device abstraction with the ability to express affinity, and propose it to the WG
  • Team Creation

    • Need to revisit endpoints (as they seem to be implementation-specific) after the presentation from Alex
    • Can we hide endpoints from the interface and enable an implementation-agnostic way of creating teams?
  • Collective Operations

    • Need to define the mapping of the programming model's (src, dst) to UCC's (src, dst) for cases like MPI broadcast, which has only one set of buffers.
    • Is there a need for multiple outstanding persistent collective operations of the same type? No use case yet.

Day2

Tuesday, May 12th, 2020

  • Join the Meeting
  • +1 425-659-5232 United States, Seattle (Toll)
  • (844) 612-0969 United States (Toll-free)
  • Conference ID: 997 771 404#
| Time | Topic | Telecon |
|---|---|---|
| 7:00 - 7:45 AM PT | Topology Aware Collectives (Sameh) | |
| 7:45 - 8:00 AM PT | Break | |
| 8:00 - 8:45 AM PT | Collectives API - the Reactive alternative (Alex) | |
| 8:45 - 9:00 AM PT | Break | |
| 9:00 - 11:00 AM PT | Task and Plan API Discussion | |

Day3

Wednesday, May 13th, 2020

  • Join the Meeting
  • +1 425-659-5232 United States, Seattle (Toll)
  • (844) 612-0969 United States (Toll-free)
  • Conference ID: 874 275 202#
| Time | Topic | Telecon |
|---|---|---|
| 7:00 - 7:45 AM PT | GPUs/DL (TBD) | |
| 7:45 - 9:00 AM PT | API Discussion | |
| 9:15 - 11:00 AM PT | API Discussion | |

Day4

Monday, May 18th, 2020

| Time | Topic | Telecon |
|---|---|---|
| 7:00 - 7:45 AM PT | OMPI-X / ADAPT (George Bosilca/Talk) | |
| 7:45 - 9:00 AM PT | Component Architecture (Review for non-WG participants) (Alex/Val/Discussion) / Algorithm Selection / Memory Registration | |

Day5

Tuesday, May 19th, 2020

| Time | Topic | Telecon |
|---|---|---|
| 7:00 - 11:00 AM PT | | |

Topics

(Laundry List)

  • Kickoff (Gilad)
  • Highlights of UCC API (Review for non-WG participants) (Manju)
  • OMPI-X / ADAPT (George Bosilca/Talk)
  • Requirements from the AI Users/Deep Learning/GPUs (NVIDIA; All)
  • API Discussion (in case not completed in WG)
    • Library Initialization
    • Resource Abstraction (Contexts)
    • Teams API (Manju; All/Discussion)
    • Endpoints (Manju; All/Discussion)
    • Collective Operations (Manju; All/Discussion)
    • Task API (Manju; All/Discussion)
    • Alternative Control-path API (Initialization and communicator creation) (Alex; All/Discussion)
    • Alternative Data-path API (Starting and progressing collectives) (Alex; All/Discussion)
  • Component Architecture (Review for non-WG participants) (Alex/Val/Discussion)
  • Flesh out ucc.h header (All)
  • Unit tests and CI infrastructure (?)
  • Documentation (Doxygen?) (?)
  • Multirail Support (Sergey)
  • Topology-aware collectives (Sameh/Talk)
  • Memory registration (Discussion)
  • Algorithm selection (Discussion)