
Fix issue with CodeCarbon lock #265

Merged: 3 commits from fix_codecarbon_lock_issue into main on Sep 20, 2024

Conversation

@regisss (Collaborator) commented Sep 18, 2024

Since CodeCarbon v2.7, a lock file is created (see here) to check whether another instance of codecarbon is already running. If so, an error is raised. One has to call my_energy_tracker.stop() to release the lock file.

This PR adds a stop method to the EnergyTracker class, which should be called once the energy tracker is not needed anymore.
It also updates an import from huggingface_hub, since v0.25 introduced a change (the new import is backward-compatible).

Fixes #260.
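
For context, here is a minimal sketch of the kind of wrapper involved (the class below is illustrative, not the actual optimum-benchmark code): the stop method just stops the underlying CodeCarbon tracker, which is what releases the lock file from CodeCarbon v2.7 on.

```python
# Illustrative sketch only -- not the real EnergyTracker from optimum-benchmark.
from codecarbon import EmissionsTracker


class EnergyTracker:
    def __init__(self, output_dir: str = "."):
        # The real class takes more configuration; this only shows the lock lifecycle.
        self.emission_tracker = EmissionsTracker(output_dir=output_dir, log_level="warning")

    def start(self):
        # Acquires CodeCarbon's lock file (v2.7+) and starts measuring.
        self.emission_tracker.start()

    def stop(self):
        # Stops measuring and releases the lock file so another instance can start.
        return self.emission_tracker.stop()
```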

@regisss (Collaborator, Author) commented Sep 18, 2024

@IlyasMoutawwakil We need to merge this fix in Optimum to make the CLI CUDA Torch-ORT Multi- and Single-GPU tests pass: huggingface/optimum#2028

@regisss (Collaborator, Author) commented Sep 18, 2024

For the CLI ROCm PyTorch Multi- and Single-GPU tests, the error is:

ERROR! Intel® Extension for PyTorch* needs to work with PyTorch 2.4.*, but PyTorch 2.2.0.dev20231010+rocm5.7 is found. Please switch to the matching version and run again.

Not sure why the Intel Extension for PyTorch is installed for this test, as I don't think we need it. Should we explicitly uninstall it in the workflow, @IlyasMoutawwakil?
Not sure if this is the actual reason for the failing tests though 🤔

@IlyasMoutawwakil (Member) commented

the failing tests are from the ongoing changes in #263

@IlyasMoutawwakil (Member) commented Sep 19, 2024

Acquiring the lock and releasing it between processes will probably not work (or will cause issues) in a multi-GPU/multi-process setting (torchrun). I can see one process waiting for the other to release the lock before it starts measuring, causing async issues. We actually don't test that (codecarbon + multi-GPU). I think a better thing to do here is to set allow_multiple_runs to True:
https://github.com/mlco2/codecarbon/blob/v2.7.0/codecarbon/emissions_tracker.py#L239-L255
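
For illustration, a minimal sketch of that suggestion, assuming the flag keeps the allow_multiple_runs name used in the linked emissions_tracker.py lines:

```python
# Sketch of the suggested alternative: let several CodeCarbon instances run
# concurrently instead of serializing them on the lock file.
from codecarbon import EmissionsTracker

# Skip the "another instance of codecarbon is already running" check.
tracker = EmissionsTracker(allow_multiple_runs=True)
tracker.start()
# ... workload ...
tracker.stop()
```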

@regisss (Collaborator, Author) commented Sep 19, 2024

> Acquiring the lock and releasing it between processes will probably not work (or will cause issues) in a multi-GPU/multi-process setting (torchrun). I can see one process waiting for the other to release the lock before it starts measuring, causing async issues. We actually don't test that (codecarbon + multi-GPU). I think a better thing to do here is to set allow_multiple_runs to True: https://github.com/mlco2/codecarbon/blob/v2.7.0/codecarbon/emissions_tracker.py#L239-L255

True, for multi-process runs this is needed. I wouldn't set allow_multiple_runs to True for single-process benchmarks though, since we don't really know how several trackers can interfere with each other.
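
A hypothetical way to implement that compromise (the WORLD_SIZE check below is just one possible heuristic for detecting a torchrun launch, not what this PR does):

```python
# Only allow multiple concurrent CodeCarbon runs when we are actually in a
# multi-process launch; keep the stricter single-run check otherwise.
import os

from codecarbon import EmissionsTracker

# torchrun exports WORLD_SIZE for every spawned process.
is_distributed = int(os.environ.get("WORLD_SIZE", "1")) > 1
tracker = EmissionsTracker(allow_multiple_runs=is_distributed)
```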

@regisss (Collaborator, Author) commented Sep 19, 2024

> the failing tests are from the ongoing changes in #263

Let's wait for #263 to be merged then

@IlyasMoutawwakil (Member) commented Sep 19, 2024

> we don't really know how several trackers can interfere with each other

Tbh, it would be much better if we had a single tracker in distributed settings
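
One possible sketch of that idea, measuring only on global rank 0 (illustrative only, not something this PR implements):

```python
# Single-tracker pattern for distributed runs: only rank 0 measures energy.
import os

from codecarbon import EmissionsTracker

rank = int(os.environ.get("RANK", "0"))  # set by torchrun for each process
tracker = EmissionsTracker() if rank == 0 else None

if tracker is not None:
    tracker.start()
# ... training / benchmarking workload ...
if tracker is not None:
    tracker.stop()
```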

@IlyasMoutawwakil marked this pull request as ready for review on September 20, 2024, 09:39
@IlyasMoutawwakil merged commit 1992de3 into main on Sep 20, 2024
49 of 56 checks passed
@regisss deleted the fix_codecarbon_lock_issue branch on September 20, 2024, 12:43

Successfully merging this pull request may close the following issue: "Error: Another instance of codecarbon is already running"