Pavel Shamis / Pasha edited this page Jul 27, 2016 · 61 revisions

UCX Hackathon 2016

Registration

Registration for the event

Dates

August 9-12

Location

  • ARM, Austin
  • Address: 1, 5707 Southwest Pkwy #100, Austin, TX 78735
  • Google maps: click here

Agenda

We are planning to have the annual UCX meeting in spring of 2016. This page will track topics on the agenda and details of the meeting (TBD).

List of proposed topics:

  • Invited talks

    • 2 confirmed talks, ~45 minutes each. Each should be the first talk of its day.
  • Intro to UCX

    • Project overview (4h, must be on the first day)
  • UCP API aspects:

    • Non-contiguous data types - 2h (moderate priority)
    • UCP active messages - 2h (moderate priority)
    • Client/server connection establishment - 1h (moderate priority)
    • Error handling and fault tolerance - 1h (moderate priority)
    • Finalize UCP API - 2h (high priority, one of the last topics, to close out the meeting)
  • UCP internals:

    • Ordering with tag matching over loosely ordered transport(s) - 1h (high priority)
    • Multi-transport, multi-rail - 1h (high priority)
    • Interrupt driven progress - 1h (moderate priority)
    • ucp_progress (calling in other functions, multiple entrance, etc.) - 1h (high priority)
  • GPU integration - 3h

  • Results and conclusions

    • A comparison of UCX and libfabric in terms of functionality and abstraction level - 1h

Schedule

Work in progress

Date Time Topic Speaker
Aug-9 9:00 Registration ARM Visitor Office
9:30 Opening Talk Pavel Shamis
9:45 "Future Technologies" Steve Poole
10:30 UCX Introduction - Architecture Overview Pavel Shamis
11:15 Break
11:25 UCX Introduction - UCP Yossi Itigin
12:15 Lunch
1:00 UCX Introduction - UCT TBD
1:45 UCX Introduction - UCS TBD
2:30 Break
3:00 "UCX support in MPICH"
This talk will give a short introduction to the new CH4 layer in MPICH and its UCX implementation.
Kenneth Raffenetti and Lena Oden
3:30 UCP Internals - Ordering with tag matching over loosely ordered transport(s) TBD
4:30 UCP Internals - ucp_progress (calling in other functions, multiple entrance, etc...) TBD
5:30 Break for the day
Aug-10 9:00 Coffee
9:20 Highlights from prior day - intro theme of the day Pavel Shamis
9:30 "Supporting GPU Acceleration in Network"
NVIDIA GPUs have been evolving and improving as accelerated computing units over the last 10 years, leading to their adoption in areas as diverse as robotics, autonomous driving, medical imaging, seismic analysis, machine learning and supercomputers. The explosion of innovation in deep learning on these GPUs in the last 3 years is now driving their adoption even faster. It is clear now that Exascale will happen through accelerated computing, and is much less than another 10 years away. It is essential and urgent that together we deliver a well-designed distributed system that enables well-orchestrated interaction between asynchronous compute tasks on the GPU, and data movement operations across the memories of multi-node systems. In this way we permit the strong scaling across these machines necessary to reach the goal of Exascale.
NVIDIA
10:15 Break
10:30 UCP Internals - Interrupt driven progress
11:30 UCP Internals - Multi-transport, multi-rail
12:30 Lunch
1:30 "UCX over InfiniBand: Performance analysis and sources of overhead"
This talk gives a deep analysis of the performance differences between UCP and UCT and examines their sources. We identify some overheads that can be avoided to achieve better performance.
Lena Oden, Nikela Papodopulus
2:00 GPU Integration TBD
4:00 Break
4:30 GPU Integration - continued TBD
5:30 Break
6:30 Social gathering - location TBD
Aug-11 9:00 Coffee
9:20 Highlights from prior day - intro theme of day Pavel Shamis
9:30 "A System Software Approach for Enabling Integrated HPC and Big Data Applications"
The notion that one operating system or a single unified software stack will support the emerging and future needs of the HPC and Big Data application communities is unrealistic. There are many technical and non-technical reasons why functional partitioning through specialized software stacks will continue to persist. Rather than pursuing a single software stack that satisfies a diverse and competing set of requirements, approaches that enable the use and integration of multiple software stacks should be pursued. This talk describes the challenges that motivate the need to support multiple concurrent software stacks for enabling application composition, more complex application workflows, and a potentially richer set of usage models for extreme-scale HPC systems. We describe the operating system infrastructure for supporting multiple concurrent software stacks that is being developed in the Hobbes OS project and discuss issues, challenges, and potential approaches for enabling integrated HPC and Big Data applications on extreme-scale computing systems.
Barney Maccabe, Oak Ridge National Laboratory (in collaboration with Ron Brightwell & Kevin Pedretti, Sandia National Laboratories and David Bernholdt, Oak Ridge National Laboratory)
10:15 Break
10:30 UCP API - Non-contiguous data types TBD
12:30 Lunch
1:30 UCP API - UCP Active messages
3:30 Break
4:00 UCP API - Client / Server connection establishment TBD
5:00 UCP API - Error handling and fault tolerance TBD
6:00 Break for the day
Aug-12 9:00 Coffee
9:20 Highlights from prior day - intro theme of day Pavel Shamis
9:30 "Next Generation of Co-Processors Emerges – In-Network Computing"
The latest revolution in HPC is the move to a co-design architecture, a collaborative effort among industry, academia, and manufacturers to reach Exascale performance by taking a holistic system-level approach to fundamental performance improvements. Co-design architecture exploits system efficiency and optimizes performance by creating synergies between the hardware and the software. Co-design recognizes that the CPU has reached the limits of its scalability, and offers an intelligent network as the new “co-processor” to share the responsibility for handling and accelerating application workloads. By placing data-related algorithms on an intelligent network, we can dramatically improve data center and application performance.
Gilad Shainer (Mellanox)
10:15 Break
10:30 Finalize UCP API TBD
11:30 A comparison of UCX and libfabric in terms of functionality and abstraction level TBD
10:30 Parallel session - UCX board meeting
12:30 Lunch
1:30 Results and conclusions TBD