Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix testAOTTools for non-amd64 arch #46315

Merged
merged 1 commit into from
Oct 10, 2024
Merged

Conversation

iarspider
Copy link
Contributor

PR description:

This unit test fails with the following message:

Traceback (most recent call last):
  File "/cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02858/el8_aarch64_gcc12/cms/cmssw/CMSSW_14_2_TF_X_2024-10-08-1100/external/el8_aarch64_gcc12/bin/saved_model_cli3", line 10, in <module>
    sys.exit(main())
  File "/cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02858/el8_aarch64_gcc12/external/py3-tensorflow/2.17.0-19f19889e80ed84709258505f8c10b73/lib/python3.9/site-packages/tensorflow/python/tools/saved_model_cli.py", line 1340, in main
    app.run(smcli_main)
  File "/cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02858/el8_aarch64_gcc12/external/py3-absl-py/2.1.0-62ec116f971a239f4f311c7908bb3abf/lib/python3.9/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02858/el8_aarch64_gcc12/external/py3-absl-py/2.1.0-62ec116f971a239f4f311c7908bb3abf/lib/python3.9/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02858/el8_aarch64_gcc12/external/py3-tensorflow/2.17.0-19f19889e80ed84709258505f8c10b73/lib/python3.9/site-packages/tensorflow/python/tools/saved_model_cli.py", line 1338, in smcli_main
    args.func()
  File "/cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02858/el8_aarch64_gcc12/external/py3-tensorflow/2.17.0-19f19889e80ed84709258505f8c10b73/lib/python3.9/site-packages/tensorflow/python/tools/saved_model_cli.py", line 1140, in aot_compile_cpu
    saved_model_aot_compile.aot_compile_cpu_meta_graph_def(
  File "/cvmfs/cms-ib.cern.ch/sw/aarch64/nweek-02858/el8_aarch64_gcc12/external/py3-tensorflow/2.17.0-19f19889e80ed84709258505f8c10b73/lib/python3.9/site-packages/tensorflow/python/tools/saved_model_aot_compile.py", line 395, in aot_compile_cpu_meta_graph_def
    _pywrap_tfcompile.Compile(
RuntimeError: XLA compilation failed: TargetRegistry::lookupTarget failed: No available targets are compatible with triple "x86_64-pc-linux"

I propose to disable this test for non-amd64 arches.

PR validation:

Bot tests

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 9, 2024

cms-bot internal usage

@iarspider
Copy link
Contributor Author

please test for CMSSW_14_2_TF_X/el8_aarch64_gcc12

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 9, 2024

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 9, 2024

A new Pull Request was created by @iarspider for master.

It involves the following packages:

  • PhysicsTools/TensorFlowAOT (ml)

@valsdav, @y19y19 can you please review it and eventually sign? Thanks.
@makortel, @riga this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@makortel
Copy link
Contributor

makortel commented Oct 9, 2024

@cms-sw/ml-l2 Does this mean the TF AOT works only on x86-64?

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 9, 2024

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-bfebb6/42068/summary.html
COMMIT: de98204
CMSSW: CMSSW_14_2_TF_X_2024-10-08-1100/el8_aarch64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/46315/42068/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-bfebb6/42068/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-bfebb6/42068/git-merge-result

@smuzaffar
Copy link
Contributor

@iarspider , this works for TF 2.12 aarch64 ... right? Do you already know why it does not work for TF 2.17 .... may be we have missed something for TF 2.17?

@smuzaffar
Copy link
Contributor

smuzaffar commented Oct 9, 2024

@iarspider , looks like one need to pass --additional-flags to cms_tfaot_compile (https://github.com/cms-sw/cmsdist/blob/IB/CMSSW_14_2_X/master/tfaot-compile.file). Can you please update https://github.com/cms-sw/cmssw/blob/master/PhysicsTools/TensorFlowAOT/test/testAOTTools.py and add --additional-flags=--target_triple=<arch>-pc-linux (where <arch> is string returned by platform.processor()

see https://github.com/cms-externals/cms-tfaot/blob/master/cms_tfaot/scripts/tfaot_compile.py#L190-L194

@iarspider iarspider changed the title Disable testAOTTools for non-amd64 arch Fix testAOTTools for non-amd64 arch Oct 9, 2024
@iarspider
Copy link
Contributor Author

please test for CMSSW_14_2_TF_X/el8_aarch64_gcc12

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 9, 2024

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 9, 2024

Pull request #46315 was updated. @valsdav, @y19y19 can you please check and sign again.

@cmsbuild
Copy link
Contributor

cmsbuild commented Oct 9, 2024

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-bfebb6/42080/summary.html
COMMIT: d4b344b
CMSSW: CMSSW_14_2_TF_X_2024-10-08-1100/el8_aarch64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/46315/42080/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-bfebb6/42080/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-bfebb6/42080/git-merge-result

Unit Tests

I found 2 errors in the following unit tests:

---> test test-das-selected-lumis had ERRORS
---> test test_edmPickEvents had ERRORS

@smuzaffar
Copy link
Contributor

ignore tests-rejected with external-failure

unit tests failure are not due to this PR. Looks like some DAS glitch

@smuzaffar
Copy link
Contributor

@cms-sw/ml-l2 , can you please review this? This should fix the unittest for non-x86_64 archs.

@valsdav
Copy link
Contributor

valsdav commented Oct 10, 2024

+ml

Thanks for fixing this!

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (test failures were overridden). This pull request will now be reviewed by the release team before it's merged. @mandrenguyen, @rappoccio, @antoniovilela, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2)

@mandrenguyen
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit 1444863 into cms-sw:master Oct 10, 2024
9 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants