
Convenience method for learning rate factor #2888

Merged: 3 commits merged into master on Aug 6, 2022
Conversation

alanakbik (Collaborator)

This PR adds a parameter to set a factor on the learning rate of the decoder when fine-tuning a model.

Usage:

trainer.fine_tune(f"path/to/output/folder",
                  mini_batch_size=4,
                  learning_rate=5e-5,
                  decoder_lr_factor=10,
                  )
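
Conceptually, this corresponds to putting the decoder parameters into a separate optimizer parameter group whose learning rate is scaled by the factor. A rough sketch of the idea only, not the actual trainer code (the classifier variable and the assumption that decoder parameters carry "decoder" in their names are illustrative):

from torch.optim import AdamW

learning_rate = 5e-5
decoder_lr_factor = 10

# Illustrative assumption: the randomly initialized decoder's parameters
# are named with "decoder", as in a DefaultClassifier-style model.
decoder_parameters = [p for n, p in classifier.named_parameters() if "decoder" in n]
remaining_parameters = [p for n, p in classifier.named_parameters() if "decoder" not in n]

# Two parameter groups: pretrained parts at the base LR,
# the decoder at the base LR scaled by the factor.
optimizer = AdamW(
    [
        {"params": remaining_parameters, "lr": learning_rate},
        {"params": decoder_parameters, "lr": learning_rate * decoder_lr_factor},
    ]
)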

@helpmefindaname (Collaborator)

Hi @alanakbik,
if I understand this PR correctly, this is to set a different LR for the pretrained weights and the randomly initialized weights on all DefaultClassifiers?

I suppose this could be extended even further, for example to the SequenceTagger, by filtering parameters by embedding, e.g.:

# parameters of the (typically pretrained) embeddings
embedding_parameters = [param for name, param in self.model.named_parameters() if "embedding" in name]
# all remaining (typically randomly initialized) parameters
model_parameters = [param for name, param in self.model.named_parameters() if "embedding" not in name]

Such that we could train a Transformer/BERT model with a higher LR for the CRF part.
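
For illustration, those two parameter lists could then be passed to the optimizer as separate groups with different learning rates. A minimal sketch, assuming a plain torch AdamW optimizer and a hypothetical crf_lr_factor for the randomly initialized parts:

from torch.optim import AdamW

learning_rate = 5e-5
crf_lr_factor = 10  # hypothetical factor for the non-embedding (e.g. CRF) parameters

# self.model is assumed to be a SequenceTagger, as in the snippet above
embedding_parameters = [param for name, param in self.model.named_parameters() if "embedding" in name]
model_parameters = [param for name, param in self.model.named_parameters() if "embedding" not in name]

optimizer = AdamW(
    [
        {"params": embedding_parameters, "lr": learning_rate},
        {"params": model_parameters, "lr": learning_rate * crf_lr_factor},
    ]
)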

@alanakbik (Collaborator, Author)

alanakbik commented Aug 6, 2022

Yes, it is for training the non-pretrained (i.e. randomly initialized) parts with a higher LR. Since the decoder is always randomly initialized, it is handled here. Extending this to the LSTM-CRF of the SequenceTagger would be great, but some embeddings (like CharacterEmbeddings) are randomly initialized, while others are not. So I think it's not easy to come up with a good heuristic to identify those parts.

Edit: I'll merge this now for experimentation, but any ideas for improvement are welcome!

alanakbik merged commit a927b30 into master on Aug 6, 2022
alanakbik added a commit that referenced this pull request on Aug 6, 2022
alanakbik deleted the learning_rate_factor branch on August 6, 2022 at 20:22
alanakbik added a commit that referenced this pull request on Aug 10, 2022
GH-2888: Experiment with alternative heuristic