Bug description
If two instances of a CometLogger are created (for example, in order to obtain artifacts from an existing Comet experiment before starting a new one), the first will throw an exception on any Comet API action attempted after the second one is created.
What version are you seeing the problem on?
v2.1, v2.2
How to reproduce the bug
"""
First make sure you have credentials on the comet instance you're going to use,
most likely by setting the COMET_API_KEY env variable.
"""

from lightning.pytorch.loggers.comet import CometLogger
from comet_ml import API, Artifact, ExistingExperiment
from os import environ


def get_existing_experiment(
    workspace,
    project_name,
    experiment
):
    api_experiment = API().get_experiment(
        workspace=workspace,  # "etayl",
        project_name=project_name,  # "angie-tutorial",
        experiment=experiment  # "spatial_sturgeon_1690",
    )
    exp_obj = ExistingExperiment(experiment_key=api_experiment.key)
    return exp_obj, api_experiment.key


def get_new_experiment(
    workspace,
    project_name,
    experiment_name
):
    api_experiment = \
        API()._create_experiment(
            workspace=workspace,
            project_name=project_name,
            experiment_name=experiment_name,
        )
    exp_obj = ExistingExperiment(experiment_key=api_experiment.key)
    return exp_obj, api_experiment.key


def add_artifact_to_experiment(existing_experiment: ExistingExperiment, artifact_name: str):
    new_artifact = Artifact(artifact_name)  # "artifact-file.txt"
    with open("artifact-file.txt", "w") as handler:
        handler.write("file content")
    new_artifact.add("artifact-file.txt")
    existing_experiment.log_artifact(new_artifact)


def init_comet_logger(
    workspace,  # "etayl",
    project_name,  # "angie-tutorial",
    experiment_name,  # "spatial_sturgeon_1690",
    experiment_key
):
    comet_logger = CometLogger(
        workspace=workspace,  # "etayl",
        project_name=project_name,  # "angie-tutorial",
        experiment_name=experiment_name,  # "spatial_sturgeon_1690",
        experiment_key=experiment_key,  # experiment_key,
        offline=False,
        save_dir="/homes/etayl/code/menta3/temp_save_dir",
        auto_output_logging="native",
        auto_metric_logging=False,
        log_env_details=True,
        log_env_gpu=True,
        log_env_cpu=True,
        log_env_host=True
    )
    return comet_logger


def fail_when_two_loggers_live_at_the_same_time():
    workspace = "etayl"
    project_name = "angie-tutorial"
    experiment_name = "spatial_sturgeon_1690"
    artifact_name = "etayrtifact"

    experiment_obj, experiment_key = get_existing_experiment(
        workspace,
        project_name,
        experiment_name
    )
    first_comet_logger = init_comet_logger(workspace, project_name, experiment_name, experiment_key)
    add_artifact_to_experiment(experiment_obj, artifact_name)

    new_experiment, new_experiment_key = get_new_experiment(
        workspace,
        project_name,
        "new_" + experiment_name
    )

    first_comet_logger.experiment.get_artifact(artifact_name)
    print("\n ### that succeeds ### \n")

    second_comet_logger = init_comet_logger(workspace, project_name, new_experiment.name, new_experiment_key)
    first_comet_logger.experiment.get_artifact(artifact_name)
    print("\n ### this too ### \n")

    add_artifact_to_experiment(second_comet_logger.experiment, artifact_name)
    try:
        first_comet_logger.experiment.get_artifact(artifact_name)
    except:
        print("\n ### but this fails! ### \n")


fail_when_two_loggers_live_at_the_same_time()
Error messages and logs
Traceback (most recent call last):
  File "/code_path/get_artifact_from_dead_experiment.py", line 146, in <module>
    fail_when_two_loggers_live_at_the_same_time()
  File "/code_path/get_artifact_from_dead_experiment.py", line 134, in fail_when_two_loggers_live_at_the_same_time
    first_comet_logger.experiment.get_artifact(artifact_name)
  File "/made_up_env_path/lib/python3.11/site-packages/comet_ml/_online.py", line 1097, in get_artifact
    raise ExperimentNotAlive(
comet_ml.exceptions.ExperimentNotAlive: Experiment <comet_ml._online.ExistingExperiment object at 0x7fffd23da910> is not alive, cannot get artifact
Environment
Current environment
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow): Logger (specifically, CometLogger)
#- PyTorch Lightning Version (e.g., 1.5.0): 2.2 (and 2.1)
#- Lightning App Version (e.g., 0.5.2): 2.2 (and 2.1)
#- PyTorch Version (e.g., 2.0): 2.2.1
#- Python version (e.g., 3.9): 3.11
#- OS (e.g., Linux): Linux
#- How you installed Lightning(`conda`, `pip`, source): Conda (Mamba)
#- Running environment of LightningApp (e.g. local, cloud): local
More info
Theory for source of the bug
The PyTorch Lightning CometLogger class has an ._experiment attribute that holds either an Experiment object or an ExistingExperiment object. To reset the active experiment the logger object is talking to, you can set ._experiment to None (or call the .finalize() utility, which does that too).
The CometLogger has a "public" .experiment property, which first checks whether ._experiment is None and, if so, creates a new Experiment/ExistingExperiment object behind the scenes.
However, each Experiment/ExistingExperiment also has a boolean "alive" flag and will only work while it is set to True. The flag can be set to False manually by calling the .end() method, but it also appears that Comet only supports one alive experiment at a time: when one experiment is interacted with (for example, by logging an artifact to it), the .alive flag of the other experiments is set to False.
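To make the theory above concrete, here is a minimal toy model in plain Python. It does not use comet_ml or Lightning: ToyExperiment and ToyLogger are hypothetical stand-ins, and the rule that a newly created experiment marks the previous one as not alive is an assumption taken from the observation above, not confirmed Comet behavior. The sketch only shows how a None check on ._experiment can hand back an experiment whose alive flag has already been flipped.

```python
class ToyExperiment:
    """Hypothetical stand-in for a Comet experiment (not the comet_ml class)."""

    _current = None  # assumption: only one experiment is alive at a time

    def __init__(self, key):
        self.key = key
        self.alive = True
        # Assumed behavior: a new experiment marks the previous one as not alive.
        previous = ToyExperiment._current
        if previous is not None:
            previous.alive = False
        ToyExperiment._current = self

    def get_artifact(self, name):
        if not self.alive:
            raise RuntimeError(f"Experiment {self.key} is not alive, cannot get artifact")
        return f"{name}@{self.key}"


class ToyLogger:
    """Hypothetical stand-in for the logger; it tracks liveness via None only."""

    def __init__(self, key):
        self._key = key
        self._experiment = None  # created lazily on first access

    @property
    def experiment(self):
        # Only the None check: a dead experiment that is not None is returned as-is.
        if self._experiment is None:
            self._experiment = ToyExperiment(self._key)
        return self._experiment


def reproduce():
    """Replay the failing sequence on the toy classes and collect the outcomes."""
    outcomes = []
    first = ToyLogger("first")
    outcomes.append(first.experiment.get_artifact("a"))
    second = ToyLogger("second")  # no experiment exists for it yet
    outcomes.append(first.experiment.get_artifact("a"))
    second.experiment  # first access creates the second experiment
    try:
        outcomes.append(first.experiment.get_artifact("a"))
    except RuntimeError as err:
        outcomes.append(f"error: {err}")
    return outcomes


if __name__ == "__main__":
    for outcome in reproduce():
        print(outcome)
```

In this model the first two calls succeed and the third fails, which mirrors the order of the "that succeeds", "this too", and "but this fails!" messages in the reproduction script.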
Suggested fix
The source of the problem seems to be that the Lightning logger tries to maintain a live experiment by setting ._experiment to None once it is done with it. This is a different mechanism from the one built into Comet, which relies on an .alive attribute on each Experiment/ExistingExperiment instance.
Both the current issue and potential future ones could be resolved by using Comet's built-in mechanism instead of the current separate one. As long as there are two different mechanisms, they can drift out of sync with each other.
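A rough sketch of what relying on the built-in flag could look like, written against hypothetical toy classes rather than the real CometLogger. ToyExperiment, SyncedToyLogger, and the one-alive-at-a-time rule are all assumptions made for illustration. The only change of interest is that the property treats an experiment that is no longer alive the same way as a missing one.

```python
class ToyExperiment:
    """Hypothetical stand-in for a Comet experiment (not the comet_ml class)."""

    _current = None  # assumption: only one experiment is alive at a time

    def __init__(self, key):
        self.key = key
        self.alive = True
        # Assumed behavior: a new experiment marks the previous one as not alive.
        previous = ToyExperiment._current
        if previous is not None:
            previous.alive = False
        ToyExperiment._current = self

    def get_artifact(self, name):
        if not self.alive:
            raise RuntimeError(f"Experiment {self.key} is not alive, cannot get artifact")
        return f"{name}@{self.key}"


class SyncedToyLogger:
    """Hypothetical logger whose property consults the experiment's alive flag."""

    def __init__(self, key):
        self._key = key
        self._experiment = None

    @property
    def experiment(self):
        # Re-create the experiment when it is missing OR no longer alive,
        # so the logger's state cannot drift away from the alive flag.
        if self._experiment is None or not self._experiment.alive:
            self._experiment = ToyExperiment(self._key)
        return self._experiment


if __name__ == "__main__":
    first = SyncedToyLogger("first")
    second = SyncedToyLogger("second")
    first.experiment           # creates the first experiment
    second.experiment          # creates the second, marking the first as not alive
    print(first.experiment.get_artifact("a"))  # property re-attaches before the call
```

Under the one-alive-at-a-time assumption, the two loggers would take turns invalidating each other's experiment, so whether silently re-attaching is the right behavior is a separate design decision.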