Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Triton model versioning #8935

Closed

Conversation

kpedro88
Copy link
Contributor

This PR is kept as a draft until the cms-data updates are integrated (see cms-sw/cmssw#43695).

This PR provides a tool to automatically check and update model checksums in config.pbtxt files. This tool is installed in CMSSW for general use. A small patch is applied to protobuf to make the tool work more nicely. The tool also uses the Triton python client, which is added as a Python package external.

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @kpedro88 (Kevin Pedro) for branch IB/CMSSW_14_0_X/master.

@iarspider, @cmsbuild, @aandvalenzuela, @smuzaffar can you please review it and eventually sign? Thanks.
@sextonkennedy, @rappoccio, @antoniovilela you are the release manager for this.
cms-bot commands are listed here

@cmsbuild
Copy link
Contributor

cmsbuild commented Jan 10, 2024

cms-bot internal usage

@@ -312,6 +312,9 @@ find external/%cmsplatf -type l | xargs ls -l
%{?PatchReleaseSymlinkRelocate:%PatchReleaseSymlinkRelocate}
echo "%{cmsroot}" > %{i}/config/scram_basedir

# install checksum tool for local use (via scram)
cp data/cmsTritonChecksumTool %i/bin/
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smuzaffar is this the right way to install the tool into CMSSW? Running a full build to test this is a bit beyond my current resources...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kpedro88 , scram-project-build is used to build scram based project e.g. coral/cmssw etc. If this tool need to go in to cmssw/bin directory then why not add it under some package in cms-sw/cmssw repo?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok @kpedro88 , after reading cms-sw/cmssw#43695 I remember where this tool is used. It should be copied here in scram-project-build.file.

This tool is needed by cms-data externals. Can you please remind me how to run this for a cms-data repo? All cms-data external use https://github.com/cms-sw/cmsdist/blob/IB/CMSSW_14_0_X/master/cmsswdata-github.file . So for selected cms-data externals (where we want to run it), we need to create a file e.g. cmsdist/data/data-HeterogeneousCore-SonicTriton.file with contents like

%define PostBuild %{cmsdist_directory}/data/cmsTritonChecksumTool <args>

so that %{cmsdist_directory}/data/cmsTritonChecksumTool can be run here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it needs to be here so it can run on cms-data packages during build time, and will then be copied into CMSSW. If there's a better place to execute that copy operation, I'm happy to move it.

My untested attempt to execute the tool during build time is here: (not in this PR, because I want to have everything in place before we start enforcing it)
kpedro88@44f13f2

This will be a no-op for most cms-data packages that don't have any config.pbtxt files, but using find avoids having to maintain a manual list (which is just one more thing to synchronize, i.e. one more potential source of failures when people forget to synchronize it).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @kpedro88 , about kpedro88@44f13f2, we should not modify the content of cmsswdata-github.file file otherwise it will rebuild all the data externals. We can add

%define PostBuild for CONFIG in $(find . -name config.pbtxt); do %{cmsdist_directory}/data/cmsTritonChecksumTool -c $CONFIG; done

in the cmsdist/data/data-<Package-Name>.file for only those data externals which contain config.pbtxt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern, though, is that this is an additional manual step that has to be taken any time a Triton model is added to a (new or existing) cms-data package. This build-time check should be automatic. Rebuilding all the data externals would only happen once, right? Is that such a big problem?

Should I move the line from the "Prep" step to the "Build" step either way?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes rebuilding all data means re-distributing 20GB of data again. Although this is not a concern for CVMFS distribution but for online where cmssw is install on local disk this can cause issues.

Adding new data external is automated, bot know how to add a new data external in cmsdist. We can teach bot to create this cmsdist/data/data-.file for new data external if it contains config.pbtxt

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smuzaffar thanks! This looks like a good approach that satisfies both of our concerns.

A few questions:

  1. the call to cmsTritonChecksumTool -c $CONFIG will go in the file data/cmsTritonPostBuild.file, once we're ready to enable it?
  2. will this automatically propagate to the existing cms-data repos that contain config.pbtxt files, or do we have to add the data-X-Y.file files for those manually in cmsdist?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. the call to cmsTritonChecksumTool -c $CONFIG will go in the file data/cmsTritonPostBuild.file, once we're ready to enable it?

Yes

will this automatically propagate to the existing cms-data repos that contain config.pbtxt files, or do we have to add the data-X-Y.file files for those manually in cmsdist?

No , not by default. data-X-Y.file will be created only when those data repos opens a new PR.

@kpedro88 kpedro88 mentioned this pull request Jan 10, 2024
17 tasks
@@ -0,0 +1,105 @@
#!/usr/bin/env python3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kpedro88 , I think it is better to add this in to cmssw repo

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah OK, this is need for cms-data externals. So yes it sould not go in to cmssw

@@ -359,6 +359,8 @@ toolz==0.7.1
tornado==6.3.3
tqdm==4.65.0
traitlets==5.9.0
# always sync version number with triton-inference-client.spec (C++ version)
tritonclient==2.25.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kpedro88 , this alone is not going to include python tritonclient in cmssw. Please add py3-tritonclient dependency in https://github.com/cms-sw/cmsdist/blob/IB/CMSSW_14_0_X/master/python_tools.spec .

@cmsbuild
Copy link
Contributor

Pull request #8935 was updated.

@smuzaffar
Copy link
Contributor

@kpedro88 , I have tried to build locally and I think this is not the right approch.

  1. Which this approch the data packages start depending on many externals (e.g py3-tritonclient py3-protobuf which brings in other externals) and will get rebuild any time one of these externals gets updated.
  2. py3-tritonclient from upstream is only available in wheel form ( no sources) and has cuda dependency. We should enable the PYTHON_GRPC flag in our triton client spec to get the python triton client instead of using prebuild wheel

Why not we just add a unit test in cmssw which runs cmsTritonChecksumTool by searching all config.pbtxt files under CMSSW_SEARCHPATH . We can make sure that unit tests is run during PR tests whenever any cmsswdata is changed. This way we do not rebuild data externals due to change in tools needed by cmsTritonChecksumTool .

@kpedro88
Copy link
Contributor Author

@smuzaffar I agree that a unit test would be conceptually simpler. Originally I wanted to do it this way, but I didn't know how to make the unit test (which usually only runs when the CMSSW package itself is modified) run whenever someone makes a cms-data PR. But since we have to update cms-bot anyway, just incorporating a CMSSW unit test into the bot's PR tests (which is already done for a few other cases, I think) is probably simpler. I will work on rearranging things next week. That will also simplify this PR, since the checksum tool can be reabsorbed into the general config tool in CMSSW.

I will try to set up the Triton python client through our existing C++ external. I think there was some reason I hadn't done that originally, but hopefully it works. (pip installing in CMSSW via scram-venv worked fine for me when developing these scripts, but I suppose that's because I was using a CUDA-enabled CMSSW build.)

@smuzaffar
Copy link
Contributor

@kpedro88 , #8943 should build the triton-client python interface

@kpedro88 kpedro88 closed this Jan 16, 2024
@kpedro88
Copy link
Contributor Author

superseded by #8943, #8947, cms-sw/cmssw#43696 in the new plan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants