v1.13.0 update NEWS and AUTHORS #8241

Merged (2 commits) on May 26, 2022


evgeny-leksikov
Contributor

hoopoepg previously approved these changes May 18, 2022
NEWS Outdated
## 1.13.0 (May 17, 2022)
#### Features
##### Core
* Added new objects to VFS: ep local and remote address (e5dff69d5), number of total/failed EP creations (b970b7302), ep destroys and failed eps (6bf7f7d2f)
Contributor

Is it a new approach to specify the SHA?

Contributor Author

No, it's just for structure. I can delete them before the merge if they're not needed.

Contributor

I think it is better to delete hash values before release

Contributor

ep->endpoint

NEWS Outdated
* Added support of user memh in proto_v1 (58275b3d9)
* Added support selecting local address, UCP_EP_PARAM_FIELD_LOCAL_SOCK_ADDR (bac14cafb)
* Added option to limit GPUDirectRDMA size in rndv, UCX_RNDV_MEMTYPE_DIRECT_SIZE (0442bbef6)
* Added support pre-registered memory in ucx_perftest (8125b882c)
Contributor

Is it really UCP? It probably belongs in the Tools section.

NEWS Outdated
* Increased port range + make it dependent from agent_id (c6443fc1b)
* Checked compiler minimum version (b2cfe5025)
* Set GOCACHE a local directory that is cleared for each job in CI (3a3a55aa7)
* Disableed module for goperftest (0a559075f)
Contributor

disabled

NEWS Outdated
#### JAVA
* Throw exception if ucp_mem_query failed
#### GO
* Disabled go bindings in rpmbuild. (53ce67838)
Contributor

Please remove the "." for consistency.

AUTHORS Outdated
@@ -17,6 +17,8 @@ Devendar Bureddy <devendar@mellanox.com>
Devesh Sharma <devesh.sharma@broadcom.com>
Dmitry Gladkov <dmitrygla@mellanox.com>
Doug Jacobsen <dmjacobsen@lbl.gov>
Edgar <edgar.gabriel@amd.com>
Contributor

Edgar -> Edgar Gabriel

AUTHORS Outdated
Jakir Kham <jakirkham@gmail.com>
Jason Gunthorpe <jgg@mellanox.com>
Jeff Daily <jeff.daily@amd.com>
JKLiang9714 <1023587725@qq.com>
Contributor

We should have the full name somewhere. All contributors should have signed the CLA.

NEWS Outdated
* Added support of UCX static libraries (43f6c3944, 15ce31e89, feed230b3, 942e551aa, e672c9560, e4949b8f4, b6825c64e, ea4d096be, dc46ec23b, 2fb94a67e, 574836e74, 2ea0bf33a, 59f9005b7, cc0c2f328, 47374d098)
* Capitalized lines in CodeStyle documentation (1d6bc74ee)
* Added profiling for rkey management routines (dddfbcfe6)
* Enabled relaxed order for AMD CPUs (59c66aa4c)
Contributor

Enabled by default PCIe relaxed order...

NEWS Outdated
* Fixed query local/remote saddr (844cf1eb3)
#### GPU (CUDA, ROCM)
* Fixed a bug in invalidating address range in CUDA_IPC (c6b13e738)
* Fixed cuda ctx caching and cleanup
Contributor

cuda->CUDA

NEWS Outdated
* Fixed a bug in invalidating address range in CUDA_IPC (c6b13e738)
* Fixed cuda ctx caching and cleanup
* Fixed error handling on cuda memtype cache initialization (119a81b0f)
* Fixed rocmem initialization (f8fa567b4)
Contributor

ROCM

NEWS Outdated
* Fixed cuda ctx caching and cleanup
* Fixed error handling on cuda memtype cache initialization (119a81b0f)
* Fixed rocmem initialization (f8fa567b4)
* Fixed rocm components compilation (d0f4b1a48)
Contributor

ROCM

NEWS Outdated
* Fixed potential segfault in rocm iface_ops
* Fixed IPC tls reachability check (3dccc3043)
#### KNEM
* Fixed mem registration cost (ca891ef7d)
Contributor

memory

NEWS Outdated
#### UCM
* Fixed potential hang on init (bf66412b4)
#### UCS
* Fixed solve name shadow problem in CentOS6.x
Contributor

Fixed solve ?

@edgargabriel
Contributor

edgargabriel commented May 18, 2022

Is the news for the 1.12.1 release missing from the file?

Beyond this, it looks mostly good to me. On the ROCm side, three changes were merged into the 1.13.0 branch today; I'm not sure whether or how to add them:

* increase max. number of hsa agents (7baac93)
* fix rocm memory type detection (4a876b5)
* use rocm remote_agent if available (79ea940)

@evgeny-leksikov
Contributor Author

> Is the news for the 1.12.1 release missing from the file?

Good point. The 1.12.x branch has its own NEWS history, and it seems it should be backported here, but I'm not sure everything from 1.12.1 was backported to master/1.13.x. On the other hand, the diff 1.12.0...1.13.0 has all the changes, including the 1.12.1 backports. @yosefe, is there a good way to resolve this?

@edgargabriel, are "increase max. number of hsa agents (7baac93)" and "use rocm remote_agent if available (79ea940)" features or bug fixes?

@edgargabriel
Contributor

I think the first item (increase the number of HSA agents) is a feature; the other two items are bug fixes.

@shamisp
Contributor

shamisp commented May 19, 2022

>> Is the news for the 1.12.1 release missing from the file?

> Good point. The 1.12.x branch has its own NEWS history, and it seems it should be backported here, but I'm not sure everything from 1.12.1 was backported to master/1.13.x. On the other hand, the diff 1.12.0...1.13.0 has all the changes, including the 1.12.1 backports. @yosefe, is there a good way to resolve this?

> @edgargabriel, are "increase max. number of hsa agents (7baac93)" and "use rocm remote_agent if available (79ea940)" features or bug fixes?

I typically include changes from all previous releases.

yosefe previously approved these changes May 19, 2022
@shamisp
Contributor

shamisp commented May 19, 2022

Can you please bring over the 1.12.x changes to the NEWS file as well?

@evgeny-leksikov
Contributor Author

evgeny-leksikov commented May 19, 2022

> Can you please bring over the 1.12.x changes to the NEWS file as well?

@shamisp, 1.12.1 is here; see line 150.

@shamisp
Contributor

shamisp commented May 20, 2022

>> Can you please bring over the 1.12.x changes to the NEWS file as well?

> @shamisp, 1.12.1 is here; see line 150.

Oh, I somehow overlooked it. Thanks!

shamisp self-requested a review May 20, 2022 02:11
shamisp previously approved these changes May 20, 2022
@evgeny-leksikov
Contributor Author

@yosefe, OK to merge?

@yosefe
Contributor

yosefe commented May 20, 2022

@edgargabriel @Akshay-Venkatesh @tonycurtis, can you please review?

@edgargabriel
Contributor

> @edgargabriel @Akshay-Venkatesh @tonycurtis, can you please review?

@yosefe, I do not have a button to submit a review, but it looks good to me.

NEWS Outdated
## 1.13.0 (May 19, 2022)
#### Features
##### Core
* Added new objects to VFS: local and remote address of enpoint, statistics of ucp_ep_create success/failure, failed/destroyed endpoints
Contributor

enpoint -> endpoint

NEWS Outdated
#### Features
##### Core
* Added new objects to VFS: local and remote address of enpoint, statistics of ucp_ep_create success/failure, failed/destroyed endpoints
* Added support of UCX static libraries
Contributor

of -> for

NEWS Outdated
* Added new objects to VFS: local and remote address of enpoint, statistics of ucp_ep_create success/failure, failed/destroyed endpoints
* Added support of UCX static libraries
* Added profiling for rkey management routines
* Enabled by default PCIe relaxed order for AMD CPUs
Contributor

I would reword: PCIe ... enabled by default

NEWS Outdated
* Added API to pass pre-registered memory handle to UCP operations
* Added implementation of AM rendezvous protocol
* Added 2-stage pipeline rendezvous protocol for GPU
* Added support of fragment mem_type for v1 pipeline proto, disabled by default
Contributor

of -> for

NEWS Outdated
* Added support of fragment mem_type for v1 pipeline proto, disabled by default
* Added active message support for proto v2
* Added UCP memory registration cache
* Improved adaptive progress - deactivate iface when discard/cleanup p2p lanes
Contributor

discarding / cleaning up

NEWS Outdated
* Fixed configure if can't find go compiler
* Standalone performance benchmark
* Increased port range + make it dependent from agent_id
* Checked compiler minimum version
Contributor

"Check"?

NEWS Outdated
* Standalone performance benchmark
* Increased port range + make it dependent from agent_id
* Checked compiler minimum version
* Set GOCACHE a local directory that is cleared for each job in CI
Contributor

GOCACHE to a ...

NEWS Outdated
* Set GOCACHE a local directory that is cleared for each job in CI
* Disabled module for goperftest
* Fixed OOS build
* Wait for thread to write to the channel
Contributor

Needs some context

Contributor Author

This is a minor test change, so I'm removing it.

NEWS Outdated
* Fixed memtype cache fallback when memory hooks are not installed
* Fixed parsing header flags of worker address
* Fixed pipeline protocol when sending from host memory to GPU memory
* Fixed transport progress not deactivated when all its connections are closed
Contributor

its == transport's ?

NEWS Outdated
lead to runtime errors or data corruption when Cuda memory is used and linked
statically with cuda runtime.
In order to revert to previous behavior (when the application is linked
dynamically with cuda runtime), can set UCX_MEM_CUDA_HOOK_MODE=reloc.
Contributor

you / the user can set
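For illustration, the workaround described in this NEWS entry amounts to setting a single environment variable before launching the application. The following is a minimal sketch, assuming a POSIX shell; `./my_app` is a hypothetical placeholder for a UCX application dynamically linked with the CUDA runtime, so it is left commented out.

```shell
#!/bin/sh
# Sketch: revert to the previous CUDA memory hook behavior, as the NEWS
# entry suggests for applications linked dynamically with the CUDA runtime.
export UCX_MEM_CUDA_HOOK_MODE=reloc

# Every process started from this shell inherits the setting.
# ./my_app is a hypothetical placeholder for your own UCX application:
# ./my_app

# Show the value that child processes will see.
echo "UCX_MEM_CUDA_HOOK_MODE=$UCX_MEM_CUDA_HOOK_MODE"
```

Prefixing a single command instead (`UCX_MEM_CUDA_HOOK_MODE=reloc ./my_app`) limits the setting to that one process rather than the whole shell session.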

edgargabriel previously approved these changes May 20, 2022
@evgeny-leksikov
Contributor Author

@tonycurtis, your comments have been addressed. Could you please review again?

tonycurtis previously approved these changes May 25, 2022
NEWS Outdated
* Added support for fragment mem_type for v1 pipeline proto, disabled by default
* Added active message support for proto v2
* Added UCP memory registration cache
* Improved adaptive progress - deactivate iface when discarding / cleanup up p2p lanes
Contributor

cleanup up looks weird

NEWS Outdated
* Fixed EP refcount overflow
* Fixed EP error handling flow
* Fixed wire compatibility in address unpacking
* Fixed ucp_ep_close_nb for failed endpoint when related requests have registered memory which should be invalidated
Contributor

which->that

yosefe previously approved these changes May 26, 2022
@evgeny-leksikov
Contributor Author

squashed

yosefe merged commit 75f0c25 into openucx:v1.13.x May 26, 2022
9 participants