Merge pull request #833 from rapidsai/branch-0.24
[RELEASE] ucx-py v0.24
raydouglass authored Feb 2, 2022
2 parents 0bffcc5 + 22e9e82 commit ba0aae7
Showing 36 changed files with 364 additions and 332 deletions.
2 changes: 1 addition & 1 deletion ci/release/update-version.sh
@@ -30,4 +30,4 @@ function sed_runner() {
}

# cpp update
sed_runner "s/export RAPIDS_VERSION=.*/export RAPIDS_VERSION=\"${NEXT_MAJOR}.${NEXT_MINOR}\"/g" ci/gpu/build.sh
# sed_runner "s/export RAPIDS_VERSION=.*/export RAPIDS_VERSION=\"${NEXT_MAJOR}.${NEXT_MINOR}\"/g" ci/gpu/build.sh
55 changes: 41 additions & 14 deletions docs/source/configuration.rst
@@ -15,16 +15,41 @@ UCX/UCX-Py either with environment variables or programmatically during initiali
.. note::
    When programmatically configuring UCX-Py, the ``UCX`` prefix is not used.

For novice users we recommend the following settings:
For novice users we recommend using the UCX-Py defaults; see the next section for details.

UCX-Py vs UCX Defaults
----------------------

UCX-Py redefines some of the UCX defaults for a variety of reasons, including better performance for the more common Python use cases and working around known UCX limitations or bugs. To check the default configuration of the currently installed UCX version, run the command-line tool ``ucx_info -f``.

Below is a list of the defaults that UCX-Py redefines and the conditions under which each applies.

Apply to all UCX versions:

::

    UCX_RNDV_THRESH=8192
    UCX_RNDV_SCHEME=get_zcopy

Apply to UCX < 1.11.0; newer UCX versions rely on UCX defaults:

::

    UCX_MEMTYPE_CACHE=n UCX_TLS=all
    UCX_SOCKADDR_TLS_PRIORITY=sockcm

Apply to UCX >= 1.12.0; older UCX versions rely on UCX defaults:

::

``UCX_TLS=all`` configures UCX to try all available transport methods. However, users who want to define specific transport methods to use and/or other optional settings may do so. Below we define the more common options and provide some example combinations and usage.
    UCX_CUDA_COPY_MAX_REG_RATIO=1.0
    UCX_MAX_RNDV_RAILS=1

Env Vars
--------
Please note that ``UCX_CUDA_COPY_MAX_REG_RATIO=1.0`` is only set if at least one GPU has a BAR1 size smaller than its total memory (e.g., NVIDIA T4).
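
The version-gated rules listed above can be modeled as a pure function. This is a minimal sketch, not UCX-Py's actual code path: the helper name ``apply_ucxpy_defaults`` is hypothetical, the UCX version is passed in as a tuple (the form that ``get_ucx_version()`` is compared against in ``ucp/__init__.py``), and the GPU-dependent ``UCX_CUDA_COPY_MAX_REG_RATIO`` rule is left out. The one behavior it deliberately mirrors from ``ucp/__init__.py`` is that a value the user already set is never overridden.

```python
from typing import Dict, Mapping, Tuple


def apply_ucxpy_defaults(
    env: Mapping[str, str], ucx_version: Tuple[int, int, int]
) -> Dict[str, str]:
    """Return a copy of ``env`` with the documented UCX-Py defaults filled in.

    Hypothetical helper: values already present in ``env`` are kept as-is.
    The GPU-dependent UCX_CUDA_COPY_MAX_REG_RATIO rule is omitted.
    """
    # Apply to all UCX versions
    defaults = {"UCX_RNDV_THRESH": "8192", "UCX_RNDV_SCHEME": "get_zcopy"}

    # Apply to UCX < 1.11.0 only
    if ucx_version < (1, 11, 0):
        defaults.update(
            UCX_MEMTYPE_CACHE="n",
            UCX_TLS="all",
            UCX_SOCKADDR_TLS_PRIORITY="sockcm",
        )

    # Apply to UCX >= 1.12.0 only
    if ucx_version >= (1, 12, 0):
        defaults["UCX_MAX_RNDV_RAILS"] = "1"

    result = dict(env)
    for key, value in defaults.items():
        result.setdefault(key, value)  # never override a user-provided value
    return result


# Example: a user-provided threshold survives; unset variables get defaults.
env = apply_ucxpy_defaults({"UCX_RNDV_THRESH": "1024"}, (1, 12, 0))
print(env["UCX_RNDV_THRESH"], env["UCX_RNDV_SCHEME"], env["UCX_MAX_RNDV_RAILS"])
# 1024 get_zcopy 1
```

Keeping the function free of ``os.environ`` side effects makes each version branch easy to check in isolation; a caller would then copy the result back into the real environment before UCX is initialized.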

UCX Environment Variables in UCX-Py
-----------------------------------

This section gives a brief overview of some of the variables most relevant to current UCX-Py usage, along with comments on their uses and limitations. To see a complete list of UCX environment variables, their descriptions and default values, run the command-line tool ``ucx_info -f``.

DEBUG
~~~~~
@@ -55,28 +80,30 @@ This is a UCX CUDA Memory optimization which enables/disables a remote endpoint

Values: ``n``/``y``

UCX_MAX_RNDV_RAILS
``````````````````

Limitting the number of rails (network devices) to ``1`` allows UCX to use only the closest device according to NUMA locality and system topology. Particularly useful with InfiniBand and CUDA GPUs, ensuring all transfers from/to the GPU will use the closest InfiniBand device and thus implicitly enable GPUDirectRDMA.

Values: Int (UCX default: ``2``)

UCX_MEMTYPE_REG_WHOLE_ALLOC_TYPES
`````````````````````````````````

By defining ``UCX_MEMTYPE_REG_WHOLE_ALLOC_TYPES=cuda``, UCX enables registration cache based on a buffer's base address, thus preventing multiple time-consuming registrations for the same buffer. This is particularly useful when using a CUDA memory pool, thus requiring a single registration between two ends for the entire pool, providing considerable performance gains, especially when using InfiniBand.
By defining ``UCX_MEMTYPE_REG_WHOLE_ALLOC_TYPES=cuda`` (default in UCX >= 1.12.0), UCX enables a registration cache based on a buffer's base address, preventing multiple time-consuming registrations of the same buffer. This is particularly useful with a CUDA memory pool, since a single registration between the two ends then covers the entire pool, providing considerable performance gains, especially over InfiniBand.

Requires UCX 1.11 and above.

TRANSPORTS
~~~~~~~~~~

UCX_MAX_RNDV_RAILS
``````````````````

Limiting the number of rails (network devices) to ``1`` allows UCX to use only the closest device according to NUMA locality and system topology. This is particularly useful with InfiniBand and CUDA GPUs: it ensures that all transfers to and from the GPU use the closest InfiniBand device, which implicitly enables GPUDirectRDMA.

Values: Int (UCX default: ``2``)

UCX_RNDV_THRESH
```````````````

This is a configurable parameter used by UCX to help determine which transport method should be used. For example, on machines with multiple GPUs and NVLink enabled, UCX can deliver messages either through TCP or NVLink. Sending GPU buffers over TCP is costly, as it triggers a device-to-host transfer on the sender side and then a host-to-device transfer on the receiver side; we want to avoid these kinds of transfers when NVLink is available. Messages below the threshold are sent with the eager protocol rather than the `Rendezvous-Protocol <https://github.com/openucx/ucx/wiki/Rendezvous-Protocol>`_, and for UCX-Py users this will typically mean they are delivered through TCP. Depending on the application, messages can be quite small; therefore, we recommend setting a small value if the application uses NVLink or InfiniBand: ``UCX_RNDV_THRESH=8192``

Values: Int (UCX-Py default: ``8192``)
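
UCX describes ``UCX_RNDV_THRESH`` as the threshold for switching from the eager protocol to the rendezvous protocol. The toy function below is a hypothetical simplification that shows only the direction of that switch for the UCX-Py default of 8192 bytes. Real UCX also weighs the transport and memory type, and treating a message of exactly the threshold size as rendezvous is an assumption of this sketch.

```python
def expected_protocol(nbytes: int, rndv_thresh: int) -> str:
    """Toy model of the eager/rendezvous switch controlled by UCX_RNDV_THRESH.

    Real UCX also considers the transport and memory type; treating a message
    of exactly ``rndv_thresh`` bytes as rendezvous is an assumption here.
    """
    return "eager" if nbytes < rndv_thresh else "rendezvous"


UCXPY_DEFAULT_RNDV_THRESH = 8192  # bytes, the UCX-Py default documented above

for size in (1024, 1 << 20):
    print(size, expected_protocol(size, UCXPY_DEFAULT_RNDV_THRESH))
# 1024 eager
# 1048576 rendezvous
```

Lowering the threshold moves more messages onto the rendezvous path, which is what allows GPU buffers to take NVLink or InfiniBand instead of being staged through host memory.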


UCX_RNDV_SCHEME
```````````````

135 changes: 0 additions & 135 deletions docs/source/dask.rst

This file was deleted.

1 change: 0 additions & 1 deletion docs/source/index.rst
@@ -16,7 +16,6 @@ UCX-Py is the Python interface for `UCX <https://github.com/openucx/ucx>`_, a lo
   quickstart
   install
   configuration
   dask
   deployment
   ucx-debug

1 change: 1 addition & 0 deletions docs/source/transport-monitoring.rst
@@ -20,6 +20,7 @@ CUDA IPC/NVLink
Monitor traffic over all GPUs

::

    nvidia-smi nvlink -gt d


4 changes: 2 additions & 2 deletions tests/test_from_worker_address_error.py
@@ -150,11 +150,11 @@ def test_from_worker_address_error(error_type):
    assert not server.exitcode

    if ucp.get_ucx_version() < (1, 12, 0) and client.exitcode == 1:
        if error_type == "timeout_send":
        if all(t in error_type for t in ["timeout", "send"]):
            pytest.xfail(
                "Requires https://github.com/openucx/ucx/pull/7527 with rc/ud."
            )
        elif error_type == "timeout_recv":
        elif all(t in error_type for t in ["timeout", "recv"]):
            pytest.xfail(
                "Requires https://github.com/openucx/ucx/pull/7531 with rc/ud."
            )
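
The updated conditions mark the test as expected-to-fail whenever ``error_type`` contains both substrings, rather than only on an exact match. The parameter values in this sketch are hypothetical (the test's real parametrization is not shown in this diff); the functions ``old_match`` and ``new_match`` are illustrative names that only contrast the two checks.

```python
def old_match(error_type: str) -> bool:
    # Previous check: exact string comparison
    return error_type == "timeout_send"


def new_match(error_type: str) -> bool:
    # Updated check: every listed substring must appear somewhere in error_type
    return all(t in error_type for t in ["timeout", "send"])


# Hypothetical parametrization values, for illustration only
for error_type in ["timeout_send", "timeout_am_send", "timeout_recv"]:
    print(error_type, old_match(error_type), new_match(error_type))
# timeout_send True True
# timeout_am_send False True
# timeout_recv False False
```
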
46 changes: 40 additions & 6 deletions ucp/__init__.py
@@ -20,24 +20,58 @@
from ._version import get_versions as _get_versions # noqa
from .core import * # noqa
from .core import get_ucx_version # noqa
from .utils import get_address, get_ucxpy_logger # noqa
from .utils import get_ucxpy_logger # noqa
from ._libs.ucx_api import get_address # noqa

# Setup UCX-Py logger
logger = get_ucxpy_logger()


if "UCX_SOCKADDR_TLS_PRIORITY" not in os.environ and get_ucx_version() < (1, 11, 0):
    logger.debug(
    logger.info(
        "Setting env UCX_SOCKADDR_TLS_PRIORITY=sockcm, "
        "which is required to connect multiple nodes"
    )
    os.environ["UCX_SOCKADDR_TLS_PRIORITY"] = "sockcm"

if not os.environ.get("UCX_RNDV_THRESH", False):
if "UCX_RNDV_THRESH" not in os.environ:
    logger.info("Setting UCX_RNDV_THRESH=8192")
    os.environ["UCX_RNDV_THRESH"] = "8192"

if not os.environ.get("UCX_RNDV_SCHEME", False):
if "UCX_RNDV_SCHEME" not in os.environ:
    logger.info("Setting UCX_RNDV_SCHEME=get_zcopy")
    os.environ["UCX_RNDV_SCHEME"] = "get_zcopy"

if "UCX_CUDA_COPY_MAX_REG_RATIO" not in os.environ and get_ucx_version() >= (1, 12, 0):
    try:
        import pynvml

# After handling of environment variable logging, add formatting to the logger
logger = get_ucxpy_logger()
        pynvml.nvmlInit()
        device_count = pynvml.nvmlDeviceGetCount()
        large_bar1 = [False] * device_count

        for dev_idx in range(device_count):
            handle = pynvml.nvmlDeviceGetHandleByIndex(dev_idx)
            total_memory = pynvml.nvmlDeviceGetMemoryInfo(handle).total
            bar1_total = pynvml.nvmlDeviceGetBAR1MemoryInfo(handle).bar1Total

            if total_memory <= bar1_total:
                large_bar1[dev_idx] = True

        if all(large_bar1):
            logger.info("Setting UCX_CUDA_COPY_MAX_REG_RATIO=1.0")
            os.environ["UCX_CUDA_COPY_MAX_REG_RATIO"] = "1.0"
    except (
        ImportError,
        pynvml.NVMLError_LibraryNotFound,
        pynvml.NVMLError_DriverNotLoaded,
        pynvml.NVMLError_Unknown,
    ):
        pass

if "UCX_MAX_RNDV_RAILS" not in os.environ and get_ucx_version() >= (1, 12, 0):
    logger.info("Setting UCX_MAX_RNDV_RAILS=1")
    os.environ["UCX_MAX_RNDV_RAILS"] = "1"


__version__ = _get_versions()["version"]
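
The BAR1 check above can also be sketched with the NVML queries separated from the decision, which makes the decision testable without a GPU. This variant also handles a missing ``pynvml`` in its own ``except ImportError`` clause: in the block above, a failed ``import pynvml`` leaves ``pynvml`` unbound, so evaluating the ``except`` tuple that names ``pynvml.NVMLError_*`` attributes would itself raise ``NameError``. The helper names ``all_large_bar1`` and ``query_devices`` are hypothetical, catching the base class ``pynvml.NVMLError`` is a deliberate broadening, and the UCX >= 1.12.0 version check is omitted.

```python
import os
from typing import List, Optional, Sequence, Tuple


def all_large_bar1(devices: Sequence[Tuple[int, int]]) -> bool:
    """Mirror the diff's ``all(large_bar1)`` condition.

    ``devices`` holds one (total_memory, bar1_total) pair, in bytes, per GPU.
    As in the diff, an empty device list yields True.
    """
    return all(total <= bar1 for total, bar1 in devices)


def query_devices() -> Optional[List[Tuple[int, int]]]:
    """Return (total_memory, bar1_total) per GPU, or None if NVML is unusable."""
    try:
        import pynvml
    except ImportError:
        return None  # pynvml is not installed
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return None  # e.g. NVML library not found or driver not loaded
    try:
        devices = []
        for idx in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(idx)
            total = pynvml.nvmlDeviceGetMemoryInfo(handle).total
            bar1 = pynvml.nvmlDeviceGetBAR1MemoryInfo(handle).bar1Total
            devices.append((total, bar1))
        return devices
    except pynvml.NVMLError:
        return None  # e.g. a query not supported on this device
    finally:
        pynvml.nvmlShutdown()


# The UCX >= 1.12.0 version check from the diff is omitted here.
devices = query_devices()
if devices is not None and all_large_bar1(devices):
    # setdefault leaves any value the user already exported untouched
    os.environ.setdefault("UCX_CUDA_COPY_MAX_REG_RATIO", "1.0")
```

Passing plain ``(total_memory, bar1_total)`` tuples into ``all_large_bar1`` lets the decision be exercised with synthetic values, for example a device whose BAR1 is much smaller than its memory.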
4 changes: 4 additions & 0 deletions ucp/_libs/__init__.py
@@ -0,0 +1,4 @@
# Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
# See file LICENSE for terms.

from .utils import nvtx_annotate # noqa
57 changes: 57 additions & 0 deletions ucp/_libs/exceptions.py
@@ -0,0 +1,57 @@
# Copyright (c) 2019-2021, NVIDIA CORPORATION. All rights reserved.
# See file LICENSE for terms.

import contextlib
import logging

logger = logging.getLogger("ucx")


@contextlib.contextmanager
def log_errors(reraise_exception=False):
    try:
        yield
    except BaseException as e:
        logger.exception(e)
        if reraise_exception:
            raise


class UCXBaseException(Exception):
    pass


class UCXError(UCXBaseException):
    pass


class UCXConfigError(UCXError):
    pass


class UCXWarning(UserWarning):
    pass


class UCXCloseError(UCXBaseException):
    pass


class UCXCanceled(UCXBaseException):
    pass


class UCXConnectionReset(UCXBaseException):
    pass


class UCXMsgTruncated(UCXBaseException):
    pass


class UCXNotConnected(UCXBaseException):
    pass


class UCXUnreachable(UCXBaseException):
    pass
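
A short usage sketch of ``log_errors``: by default the context manager logs any exception raised in its body (with traceback, via ``logger.exception``) and swallows it, while ``reraise_exception=True`` logs it and then lets it propagate. To keep the block self-contained, it repeats the definitions of ``log_errors``, ``UCXBaseException`` and ``UCXError`` from the file above instead of importing them from ``ucp._libs.exceptions``.

```python
import contextlib
import logging

logger = logging.getLogger("ucx")


@contextlib.contextmanager
def log_errors(reraise_exception=False):
    # Same body as in ucp/_libs/exceptions.py
    try:
        yield
    except BaseException as e:
        logger.exception(e)
        if reraise_exception:
            raise


class UCXBaseException(Exception):
    pass


class UCXError(UCXBaseException):
    pass


# Default: the error is logged with its traceback and execution continues.
with log_errors():
    raise UCXError("logged, then swallowed")

# With reraise_exception=True the same error is logged and still propagates.
try:
    with log_errors(reraise_exception=True):
        raise UCXError("logged, then re-raised")
except UCXBaseException as exc:
    print(f"caught: {exc}")
# caught: logged, then re-raised
```

Because ``log_errors`` catches ``BaseException``, the default mode also swallows ``KeyboardInterrupt`` and ``SystemExit``; ``reraise_exception=True`` is the safer choice where those must still stop the program.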