Saving learning rate schedules with Fabric #18493
Labels: bug, checkpointing, fabric, lr scheduler, ver: 2.0.x
Bug description
It is unclear to me how learning rate schedules should be used alongside `fabric.save`. I've defaulted to manually saving with `fabric.save(..., {..., 'schedule': schedule.state_dict()})` and manually loading with `ckpt = fabric.load(...); sched.load_state_dict(ckpt['schedule'])`. As stated in #18482, the recommended way of loading is using a state object. This assumes you pass `sched` to `fabric.save`, not `sched.state_dict()`. This leads to some weird behavior, see the reproduction below.

It is also unclear to me whether a learning rate schedule object should be given the bare optimizer, or the wrapped Fabric optimizer object.
What version are you seeing the problem on?
v2.0
How to reproduce the bug
Error messages and logs
If run with `INSTANTIATE_SCHEDULE_ON_WRAPPER = False`, the methods `no_schedule()` and `exp_decay()` succeed, but `cyclic` fails with:

If run with `INSTANTIATE_SCHEDULE_ON_WRAPPER = True`, both `exp_decay` and `cyclic` fail with:

If we change `state={..., "lr_schedule": lr_schedule}` to `state={..., "lr_schedule": lr_schedule.state_dict()}`, saving succeeds for both options of `INSTANTIATE_SCHEDULE_ON_WRAPPER`. Using `state_dict()` directly means that calls to `schedule.step()` will not be registered in a central state object.

Environment
Current environment
More info
No response
cc @awaelchli @carmocca @justusschock