
Instrumentation and monitoring tool

Raul Akhmetshin edited this page Jul 29, 2021 · 7 revisions

Introduction

The UCX library provides a tool for analyzing UCX-based applications at runtime. The tool exposes each process that uses the UCX library as a directory tree in a virtual filesystem (VFS). The directory hierarchy reflects the relationships between UCX library objects. The files in a directory describe the properties of one UCX library object, and the content of each file is the value of one property.

How to

The tool is built on the Filesystem in Userspace (FUSE) interface, and the FUSE v3 development package is required to build it. After a successful build, the tool's binary file is located in the UCX install directory. To enable analysis of UCX-based applications, launch the daemon process with the following command:

$ <path_to_ucx_install_dir>/bin/ucx_vfs

Each running process that uses the UCX library has a corresponding directory at /tmp/ucx/<PID>. When you no longer need to analyze your applications, stop the daemon with the following command:

$ <path_to_ucx_install_dir>/bin/ucx_vfs stop
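While the daemon is running, the per-process directories described above can be enumerated with a small shell function. This is a minimal sketch, assuming the VFS root is /tmp/ucx as stated above; the function name `ucx_vfs_list_pids` and its optional root argument are hypothetical and exist only for illustration.

```shell
# List the PIDs that currently have a directory under the UCX VFS root.
# The root defaults to /tmp/ucx; the optional argument is an
# illustration-only override (not part of the UCX tool).
# The body runs in a subshell so its variables do not leak to the caller.
ucx_vfs_list_pids() (
    root="${1:-/tmp/ucx}"
    for dir in "$root"/*/; do
        # Skip the unexpanded glob when the root is empty or missing.
        [ -d "$dir" ] || continue
        name="$(basename "$dir")"
        # Keep only purely numeric names, which is how PIDs appear.
        case "$name" in
            ''|*[!0-9]*) continue ;;
            *) printf '%s\n' "$name" ;;
        esac
    done
)
```

Calling `ucx_vfs_list_pids` without arguments prints one PID per line, or nothing if no UCX-based process is currently visible.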

VFS hierarchy

The directory /tmp/ucx/<PID> represents the use of the UCX library by the corresponding process. It contains three grouping sub-directories: UCP, UCT, and UCS. A directory represents a UCX library object, groups objects of the same type, or groups the properties of an object. A file describes a single property of a UCX library object.
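Because every property is an ordinary file, the whole hierarchy of one process can be inspected with standard tools. The following is a rough sketch that walks a process directory and prints each property as `relative/path: content`. The function name `ucx_vfs_dump` is hypothetical, and the sketch assumes each file holds a short, single-line value.

```shell
# Print every property file under a process directory as
# "relative/path: content". The function name is illustrative only.
# The body runs in a subshell so the "cd" does not affect the caller.
ucx_vfs_dump() (
    proc_dir="$1"
    cd "$proc_dir" || exit 1
    # Sort for a stable listing; strip the leading "./" that find adds.
    find . -type f | sort | while IFS= read -r file; do
        printf '%s: %s\n' "${file#./}" "$(cat "$file")"
    done
)
```

For example, `ucx_vfs_dump /tmp/ucx/<PID>`, with `<PID>` replaced by the ID of a running UCX-based process, lists all properties of that process.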

Files in VFS

UCP

context

File name Description
mem_address Memory address of the pointer

endpoint

File name Description
error_mode Error handling mode
mem_address Memory address of the pointer
peer_name Remote worker address name

listener

File name Description
ip Listening sockaddr: IP address
port Listening sockaddr: Port number
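The two listener files can be combined into a single address string. This is a small sketch under the assumption that the caller already knows the path of a listener directory (its exact location in the hierarchy is not specified here); the function name `listener_addr` is hypothetical.

```shell
# Print a listener's address as "ip:port" from its VFS directory.
# $1 is the path of a listener directory, supplied by the caller.
# The body runs in a subshell so its variables do not leak to the caller.
listener_addr() (
    ip="$(cat "$1/ip")" || exit 1
    port="$(cat "$1/port")" || exit 1
    printf '%s:%s\n' "$ip" "$port"
)
```

The output format is a convenience for IPv4 addresses; an IPv6 address would usually be wrapped in brackets before appending the port.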

worker

File name Description
address_name Worker address name composed of host name and process id
keepalive/ep_count Keepalive: Number of endpoints processed in current time slot
keepalive/round_count Keepalive: Number of rounds done
mem_address Memory address of the pointer
num_all_eps Number of all endpoints (except internal endpoints)
thread_mode Thread safety mode with which the worker and its associated resources are created

UCS

global_opts

File name Description
log_level Log level above which log messages will be printed

rcache

File name Description
gc_list/length Number of regions waiting to be destroyed because they could not be destroyed from the memory hook
inv_q/length Number of regions which were invalidated during memory events
max_regions Maximum number of regions
max_size Maximum total size of regions
num_regions Total number of managed regions
total_size Total size of registered memory
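The rcache counters lend themselves to simple derived metrics. As an illustration, the sketch below reports how full a registration cache is, as an integer percentage of `num_regions` over `max_regions`. It assumes both files contain plain decimal integers and fails (prints nothing, returns non-zero) if either value is non-numeric, for instance when the limit is expressed as an unlimited value. The function name `rcache_region_usage` is hypothetical.

```shell
# Print the percentage of the region limit currently in use.
# $1 is the path of an rcache directory, supplied by the caller.
# Assumption: num_regions and max_regions hold plain decimal integers.
# The body runs in a subshell so its variables do not leak to the caller.
rcache_region_usage() (
    num="$(cat "$1/num_regions")" || exit 1
    max="$(cat "$1/max_regions")" || exit 1
    # Reject empty or non-numeric values rather than guessing.
    for value in "$num" "$max"; do
        case "$value" in
            ''|*[!0-9]*) exit 1 ;;
        esac
    done
    # Avoid division by zero.
    [ "$max" -gt 0 ] || exit 1
    echo $(( num * 100 / max ))
)
```

The same pattern applies to `total_size` and `max_size` if a size-based figure is more useful than a region count.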

UCT

dct

File name Description
qp_num Number of queue pairs

dci

File name Description
available Number of available queue pairs
unsignaled Number of unsignaled completions
qp_num Number of queue pairs
sw_pi Producer index for next work queue entry
prev_sw_pi Producer index where last WQE started
qstart Pointer to the beginning of the queue
qend Pointer to the end of the queue
bb_max Maximum building block number
sig_pi Producer index for last signaled WQE
hw_ci Consumer index

iface

attribute/capability

The presence of the file means that the interface supports the feature.

File name Description
am_bcopy Buffered active message
am_dup Active messages may be received with duplicates
am_short Short active message
am_zcopy Zero-copy active message
atomic_cpu Atomic communications are consistent with respect to CPU operations
atomic_device Atomic communications are consistent only with respect to other atomics on the same device
cb_async Supports setting a callback which will be invoked within a reasonable amount of time if uct_worker_progress() is not being called
cb_sync Supports setting a callback which is invoked only from the calling context of uct_worker_progress()
connect_to_ep Supports connecting to specific endpoint
connect_to_iface Supports connecting to interface
connect_to_sockaddr Supports connecting to sockaddr
ep_check Endpoint check
ep_keepalive Transport endpoint has built-in keepalive feature
errhandle_am_id Invalid AM id on remote
errhandle_bcopy_buf Invalid buffer for buffered operation
errhandle_bcopy_len Invalid length for buffered operation
errhandle_peer_failure Remote peer failures/outage
errhandle_remote_mem Remote memory access
errhandle_short_buf Invalid buffer for short operation
errhandle_zcopy_buf Invalid buffer for zero copy operation
get_bcopy Buffered get
get_short Short get
get_zcopy Zero-copy get
pending Pending operations
put_bcopy Buffered put
put_short Short put
put_zcopy Zero-copy put
tag_eager_bcopy Hardware tag matching buffered eager support
tag_eager_short Hardware tag matching short eager support
tag_eager_zcopy Hardware tag matching zero-copy eager support
tag_rndv_zcopy Hardware tag matching rendezvous zero-copy support
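Since a capability is signaled by the presence of a file rather than by its content, a check reduces to an existence test. The following is a minimal sketch; the function name `iface_has_cap` is hypothetical, and the caller is assumed to pass the path of an interface directory that contains the attribute/capability sub-directory described above.

```shell
# Succeed (exit status 0) when the interface directory contains the named
# capability file under attribute/capability, and fail otherwise.
# $1 is the interface directory, $2 the capability file name.
iface_has_cap() {
    [ -e "$1/attribute/capability/$2" ]
}
```

A typical use is a conditional such as `if iface_has_cap "$iface_dir" am_zcopy; then echo "zero-copy AM supported"; fi`, where `$iface_dir` is a placeholder for a real interface directory.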

attribute/am

File name Description
align_mtu MTU used for alignment
max_bcopy Total maximum size (including header) for buffered active message
max_hdr Maximum header size for zero-copy active message
max_iov Maximum number of elements in iov for zero-copy active message
max_short Total maximum size (including header) for short active message
max_zcopy Total maximum size (including header) for zero-copy active message
min_zcopy Minimum size for zero-copy active message
opt_zcopy_align Optimal alignment for zero-copy buffer address

attribute/get

File name Description
align_mtu MTU used for alignment
max_bcopy Total maximum size (including header) for buffered get
max_iov Maximum number of elements in iov for zero-copy get
max_short Total maximum size (including header) for short get
max_zcopy Total maximum size (including header) for zero-copy get
min_zcopy Minimum size for zero-copy get
opt_zcopy_align Optimal alignment for zero-copy buffer address

attribute/put

File name Description
align_mtu MTU used for alignment
max_bcopy Total maximum size (including header) for buffered put
max_iov Maximum number of elements in iov for zero-copy put
max_short Total maximum size (including header) for short put
max_zcopy Total maximum size (including header) for zero-copy put
min_zcopy Minimum size for zero-copy put
opt_zcopy_align Optimal alignment for zero-copy buffer address