
Convenience method for learning rate factor #2888

Merged: 3 commits merged into master on Aug 6, 2022
Conversation

alanakbik (Collaborator)

This PR adds a parameter to set a factor on the learning rate of the decoder when fine-tuning a model.

Usage:

trainer.fine_tune(f"path/to/output/folder",
                  mini_batch_size=4,
                  learning_rate=5e-5,
                  decoder_lr_factor=10,
                  )
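
Conceptually, this corresponds to putting the decoder parameters into a separate optimizer parameter group whose learning rate is scaled by the factor. A rough sketch of the idea only, not the actual trainer code (the classifier variable and the assumption that decoder parameters carry "decoder" in their names are illustrative):

from torch.optim import AdamW

learning_rate = 5e-5
decoder_lr_factor = 10

# Illustrative assumption: the randomly initialized decoder's parameters
# are named with "decoder", as in a DefaultClassifier-style model.
decoder_parameters = [p for n, p in classifier.named_parameters() if "decoder" in n]
remaining_parameters = [p for n, p in classifier.named_parameters() if "decoder" not in n]

# Two parameter groups: pretrained parts at the base LR,
# the decoder at the base LR scaled by the factor.
optimizer = AdamW(
    [
        {"params": remaining_parameters, "lr": learning_rate},
        {"params": decoder_parameters, "lr": learning_rate * decoder_lr_factor},
    ]
)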

@helpmefindaname (Collaborator)

Hi @alanakbik,
if I understand this PR correctly, this is to set a different LR for the pretrained weights and the randomly initialized weights on all DefaultClassifiers?

I suppose this could be extended even further, for example to the SequenceTagger, by filtering parameters by embedding, e.g.:

# parameters of the (typically pretrained) embeddings
embedding_parameters = [param for name, param in self.model.named_parameters() if "embedding" in name]
# all remaining (typically randomly initialized) parameters
model_parameters = [param for name, param in self.model.named_parameters() if "embedding" not in name]

Such that we could train a Transformer/BERT model with a higher LR for the CRF part.
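
For illustration, those two parameter lists could then be passed to the optimizer as separate groups with different learning rates. A minimal sketch, assuming a plain torch AdamW optimizer and a hypothetical crf_lr_factor for the randomly initialized parts:

from torch.optim import AdamW

learning_rate = 5e-5
crf_lr_factor = 10  # hypothetical factor for the non-embedding (e.g. CRF) parameters

# self.model is assumed to be a SequenceTagger, as in the snippet above
embedding_parameters = [param for name, param in self.model.named_parameters() if "embedding" in name]
model_parameters = [param for name, param in self.model.named_parameters() if "embedding" not in name]

optimizer = AdamW(
    [
        {"params": embedding_parameters, "lr": learning_rate},
        {"params": model_parameters, "lr": learning_rate * crf_lr_factor},
    ]
)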

@alanakbik (Collaborator, Author)

alanakbik commented Aug 6, 2022

Yes, it is for training the non-pretrained (i.e. randomly initialized) parts with a higher LR. Since the decoder is always randomly initialized, it is handled here. Extending this to the LSTM-CRF of the SequenceTagger would be great, but some embeddings (like CharacterEmbeddings) are randomly initialized, while others are not. So I think it's not easy to come up with a good heuristic to identify those parts.

Edit: I'll merge this now for experimentation, but any ideas for improvement are welcome!

alanakbik merged commit a927b30 into master on Aug 6, 2022
alanakbik added a commit that referenced this pull request on Aug 6, 2022
alanakbik deleted the learning_rate_factor branch on August 6, 2022 at 20:22
alanakbik added a commit that referenced this pull request on Aug 10, 2022
GH-2888: Experiment with alternative heuristic