Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Heterogeneous Memory Pool #37952

Open
wants to merge 39 commits into
base: master
Choose a base branch
from
Open

Conversation

VinInn
Copy link
Contributor

@VinInn VinInn commented May 15, 2022

This PR replaces the old "notcub" cache allocator with a memory pool featuring

lockfree operations
backend agnostic implementation
The data interface is based on a simple Buffer that is completely backend agnostic
The allocation interface (makeBuffer) currently depends on cudaStream_t that can be easily hidden behind void * or a light opaque struct
A new feature is a "Bundle deleter": buffers can be bundle together and then freed in just one operation: this reduces the number of cuda calls.
All previous users of the cache allocator (at least for Pixel wf) have been migrated.

Tests passes: it is not slower than previous implementation. Need a free machine to make definitive tests.

Some cleanup is still required to remove debug statements.

Purely technical no regression expected.

Draft Slides for a possible presentation available @ https://cernbox.cern.ch/index.php/s/Ax4NHYGLHbG8N1C

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37952/30020

@cmsbuild
Copy link
Contributor

cmsbuild commented May 15, 2022

A new Pull Request was created by @VinInn (Vincenzo Innocente) for master.

It involves the following packages:

  • CUDADataFormats/BeamSpot (heterogeneous, reconstruction)
  • CUDADataFormats/Common (heterogeneous)
  • CUDADataFormats/SiPixelDigi (heterogeneous, reconstruction)
  • CUDADataFormats/Track (heterogeneous, reconstruction)
  • CUDADataFormats/TrackingRecHit (heterogeneous, reconstruction)
  • CUDADataFormats/Vertex (heterogeneous, reconstruction)
  • EventFilter/SiPixelRawToDigi (reconstruction)
  • HeterogeneousCore/CUDACore (heterogeneous)
  • HeterogeneousCore/CUDAServices (heterogeneous)
  • HeterogeneousCore/CUDAUtilities (heterogeneous)
  • RecoLocalTracker/SiPixelRecHits (reconstruction)
  • RecoPixelVertexing/PixelTrackFitting (reconstruction)
  • RecoPixelVertexing/PixelTriplets (reconstruction)
  • RecoPixelVertexing/PixelVertexFinding (reconstruction)
  • RecoVertex/BeamSpotProducer (reconstruction, alca)

@malbouis, @yuanchao, @makortel, @slava77, @clacaputo, @cmsbuild, @fwyzard, @jpata, @tvami, @francescobrivio can you please review it and eventually sign? Thanks.
@tvami, @makortel, @felicepantaleo, @GiacomoSguazzoni, @JanFSchulte, @rovere, @VinInn, @Martin-Grunewald, @missirol, @OzAmram, @tocheng, @ferencek, @mtosi, @gpetruc, @mmusich, @dkotlins, @threus, @dgulhan, @francescobrivio this is something you requested to watch as well.
@perrotta, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@VinInn
Copy link
Contributor Author

VinInn commented May 15, 2022

@cmsbuild , please test

@VinInn
Copy link
Contributor Author

VinInn commented May 15, 2022

enable gpu

@cmsbuild
Copy link
Contributor

-1

Failed Tests: UnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-651b42/24728/summary.html
COMMIT: b8d0837
CMSSW: CMSSW_12_4_X_2022-05-15-0000/slc7_amd64_gcc10
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/37952/24728/install.sh to create a dev area with all the needed externals and cmssw changes.

Unit Tests

I found errors in the following unit tests:

---> test cpuVertexFinderByDensity_t had ERRORS
---> test cpuVertexFinderIterative_t had ERRORS

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 24 differences found in the comparisons
  • DQMHistoTests: Total files compared: 4
  • DQMHistoTests: Total histograms compared: 19874
  • DQMHistoTests: Total failures: 1171
  • DQMHistoTests: Total nulls: 1
  • DQMHistoTests: Total successes: 18702
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 3 files compared)
  • Checked 12 log files, 9 edm output root files, 4 DQM output files
  • TriggerResults: found differences in 3 / 3 workflows

Comparison Summary

@slava77 comparisons for the following workflows were not done due to missing matrix map:

  • /data/cmsbld/jenkins/workspace/compare-root-files-short-matrix/data/PR-651b42/11634.301_TTbar_14TeV+2021_Run3FS+TTbar_14TeV_TuneCP5_GenSim+HARVESTNano

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3741432
  • DQMHistoTests: Total failures: 92
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3741318
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 208 log files, 45 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@VinInn
Copy link
Contributor Author

VinInn commented May 16, 2022

@cmsbuild , please test

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-37952/30028

@mandrenguyen
Copy link
Contributor

-reconstruction
Cleaning reco queue, feel free to keep it open of course.

@cmsbuild
Copy link
Contributor

cmsbuild commented Feb 6, 2024

Milestone for this pull request has been moved to CMSSW_14_1_X. Please open a backport if it should also go in to CMSSW_14_0_X.

@cmsbuild cmsbuild modified the milestones: CMSSW_14_0_X, CMSSW_14_1_X Feb 6, 2024
@smuzaffar
Copy link
Contributor

ping

@cmsbuild
Copy link
Contributor

Milestone for this pull request has been moved to CMSSW_14_2_X. Please open a backport if it should also go in to CMSSW_14_1_X.

@antoniovilela
Copy link
Contributor

ping (to make bot change milestone)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants