Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

One Cython Extension per class/file #712

Open
pentschev opened this issue Apr 23, 2021 · 3 comments
Open

One Cython Extension per class/file #712

pentschev opened this issue Apr 23, 2021 · 3 comments

Comments

@pentschev
Copy link
Member

To make the code more readable and later make it easier to upstream, we have split ucx_api.pyx into multiple files in #711 . Each file is included into ucx_api.pyx to keep the model of ucx_api being the only extension/module/.so file and thus be a non-breaking API change. Another alternative is to add a new .pxd file for each of the newly created .pyx files, but this requires creating a new Cython Extension, which leads to a new .so file for each extension. I did such a change in this branch, but this also leads to an increase of almost 3x in binary size:

$ ls -l ucx_api_v2/
total 9521
-rw-rw-r-- 1 pentschev pentschev     392 Apr 23 03:01 __init__.py
-rwxrwxr-x 1 pentschev pentschev  935264 Apr 23 03:01 packed_remote_key.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 pentschev pentschev 1718696 Apr 23 03:01 transfer.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 pentschev pentschev  105944 Apr 23 03:01 typedefs.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 pentschev pentschev  934448 Apr 23 03:01 ucx_address.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 pentschev pentschev  475120 Apr 23 03:01 ucx_context.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 pentschev pentschev  357232 Apr 23 03:01 ucx_endpoint.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 pentschev pentschev  421488 Apr 23 03:01 ucx_listener.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 pentschev pentschev 1005648 Apr 23 03:01 ucx_memory_handle.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 pentschev pentschev  281640 Apr 23 03:01 ucx_object.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 pentschev pentschev  371056 Apr 23 03:01 ucx_request.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 pentschev pentschev  980816 Apr 23 03:01 ucx_rkey.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 pentschev pentschev 1831968 Apr 23 03:01 ucx_worker.cpython-38-x86_64-linux-gnu.so
-rwxrwxr-x 1 pentschev pentschev  326352 Apr 23 03:01 utils.cpython-38-x86_64-linux-gnu.so
$ du -sh ucx_api_v2/
9.3M    ucx_api_v2/
$ du -sh ucx_api.cpython-38-x86_64-linux-gnu.so
3.6M    ucx_api.cpython-38-x86_64-linux-gnu.so

Are there other reasons why we should or should not do as above?

cc @jakirkham @madsbk

@jakirkham
Copy link
Member

Some reorganizing seems reasonable and should help with upstreaming.

Was this all done in PR ( #711 ) or is there still pending work?

@jakirkham
Copy link
Member

Also as to binary size, this in part a consequence of Cython including some repeated symbols in different libraries. Please see issue ( cython/cython#2356 ) for context. There are some things we can do in terms of compiler flags, but there are some things that are out-of-our-hands

@pentschev
Copy link
Member Author

Some reorganizing seems reasonable and should help with upstreaming.

Was this all done in PR ( #711 ) or is there still pending work?

I think #711 was already a big chunk of that work, followed by #716 , we don't anymore have any files in the "core" library that are over 300 lines, with the only exception being https://github.com/rapidsai/ucx-py/blob/3111d97421f8df8f2ebb0042a84800e2b1b5ced4/ucp/_libs/ucx_api_dep.pxd which includes all the C prototypes only, currently just under 500 lines. In any case, IMO this is already enough for the upstreaming effort, as it matches UCX's limit of 500 lines of code per PR from the General Guidelines for Contributors.

The reason I opened this issue was to discuss whether it's worth creating a new Extension per-class, or just including them all in a single Extension as done in https://github.com/rapidsai/ucx-py/blob/3111d97421f8df8f2ebb0042a84800e2b1b5ced4/ucp/_libs/ucx_api.pyx now. Particularly, I think that would increase both code and binary complexities without any clear advantage, so I'm not sure it's worth doing so.

Also as to binary size, this in part a consequence of Cython including some repeated symbols in different libraries. Please see issue ( cython/cython#2356 ) for context. There are some things we can do in terms of compiler flags, but there are some things that are out-of-our-hands

Thanks for referencing that issue. To me it doesn't seem like we have a much better alternative to how we could improve UCX-Py's core library to what it currently is.

Based on what I wrote above, do you have any opinions on whether we should do something different (e.g., one Extension per class) or if it looks reasonable in its current state?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

No branches or pull requests

2 participants