TokenClassifier model #3203

alanakbik · 2023-04-19T19:21:41Z

This PR introduces the TokenClassifier class, a renamed and extended version of WordTagger. It inherits directly from DefaultClassifier and should be used for all token-level prediction tasks that do not require an LSTM-CRF decoder (tasks that do require one should use the SequenceTagger).

The main idea is to offer a model that inherits from DefaultClassifier for each label type we predict, i.e.:

TokenClassifier for predicting Token labels
TextPairClassifier for predicting TextPair labels
RelationClassifier for predicting Relation labels
SpanClassifier for predicting Span labels (this class is currently called EntityLinker and should be renamed)
TextClassifier for predicting Sentence labels (might need to be renamed to SentenceClassifier)

An advantage of this structure is that most functionality (such as new decoders) needs to be implemented only once in DefaultClassifier and is then immediately usable by all model classes.
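To illustrate the structure described above, here is a minimal, self-contained sketch. It does not use flair's actual API: `BaseClassifierSketch`, `TokenClassifierSketch`, `TextClassifierSketch`, `get_data_points` and `toy_decoder` are all hypothetical names chosen for illustration. The point is only that the decoder is plugged into the base class once, and each subclass just decides which data points get labeled.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class BaseClassifierSketch(ABC):
    """Hypothetical stand-in for DefaultClassifier (not flair's real class)."""

    def __init__(self, decoder: Callable[[str], str]) -> None:
        # The decoder lives in the base class, so every subclass can use it.
        self.decoder = decoder

    @abstractmethod
    def get_data_points(self, sentence: str) -> List[str]:
        """Return the units (tokens, whole sentence, ...) that receive a label."""

    def predict(self, sentence: str) -> Dict[str, str]:
        # Shared prediction logic: decode every data point the subclass selects.
        return {dp: self.decoder(dp) for dp in self.get_data_points(sentence)}


class TokenClassifierSketch(BaseClassifierSketch):
    def get_data_points(self, sentence: str) -> List[str]:
        # Token-level task: one label per whitespace-separated token.
        return sentence.split()


class TextClassifierSketch(BaseClassifierSketch):
    def get_data_points(self, sentence: str) -> List[str]:
        # Sentence-level task: one label for the whole text.
        return [sentence]


def toy_decoder(text: str) -> str:
    # Toy decoder for illustration only: labels by string length.
    return "LONG" if len(text) > 4 else "SHORT"


token_predictions = TokenClassifierSketch(toy_decoder).predict("Berlin is big")
text_predictions = TextClassifierSketch(toy_decoder).predict("Berlin is big")
```

Swapping `toy_decoder` for another callable changes the behavior of both sketch classes without touching either subclass.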

Edit: This PR also changes the default behavior of the make_label_dictionary method. The UNK token is no longer automatically added to the dictionary. Instead, labels that are not in the dictionary are now skipped during loss computation.
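A rough sketch of what this change means in practice, in plain Python. This is an assumption about the general idea, not the code in this PR: `make_label_dictionary_sketch` and `select_for_loss` are hypothetical helpers, and the `"<unk>"` string is used only as an illustrative placeholder for the UNK token.

```python
from typing import Dict, List, Tuple


def make_label_dictionary_sketch(labels: List[str], add_unk: bool = False) -> Dict[str, int]:
    """Hypothetical helper: map each label to an index, optionally reserving an UNK entry."""
    dictionary: Dict[str, int] = {}
    if add_unk:
        dictionary["<unk>"] = 0
    for label in labels:
        if label not in dictionary:
            dictionary[label] = len(dictionary)
    return dictionary


def select_for_loss(gold_labels: List[str], dictionary: Dict[str, int]) -> List[Tuple[int, int]]:
    """Hypothetical helper: return (position, label index) pairs that contribute to the loss."""
    selected: List[Tuple[int, int]] = []
    for position, label in enumerate(gold_labels):
        if label in dictionary:
            selected.append((position, dictionary[label]))
        elif "<unk>" in dictionary:
            # With an UNK entry, unknown labels are mapped to it.
            selected.append((position, dictionary["<unk>"]))
        # Without an UNK entry, unknown labels are skipped entirely.
    return selected


train_labels = ["NOUN", "VERB", "NOUN"]
gold = ["NOUN", "ADJ", "VERB"]  # "ADJ" never appeared in the training labels

without_unk = make_label_dictionary_sketch(train_labels)
with_unk = make_label_dictionary_sketch(train_labels, add_unk=True)

skipped = select_for_loss(gold, without_unk)
mapped = select_for_loss(gold, with_unk)
```

In the sketch, the dictionary without UNK simply drops the position with the unseen label from the loss, while the dictionary with UNK keeps it and maps it to the reserved index.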

helpmefindaname · 2023-04-19T19:33:17Z

flair/models/word_tagger_model.py

 from flair.embeddings import TokenEmbeddings

 log = logging.getLogger("flair")


-class WordTagger(flair.nn.DefaultClassifier[Sentence, Token]):
+class TokenClassifier(flair.nn.DefaultClassifier[Sentence, Token]):
    """This is a simple class of models that tags individual words in text."""


I think we shouldn't just remove the WordTagger, as this breaks models that use it (especially if they are part of a MultiTaskModel).

I would suggest that we use a DeprecationHelper as specified here, but also state the version in which we will delete it (let's say 0.14.0, i.e. incrementing by 2).

That way we could give users a chance to upgrade between multiple versions.

We can also discuss this further in private

Thanks, good idea!
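For reference, the deprecation approach suggested in this thread could look roughly like the sketch below. It uses only the standard library and is not necessarily what was merged: `DeprecationHelperSketch`, `TokenClassifierSketch` and `WordTaggerSketch` are hypothetical names, and the removal version `0.14.0` is just the value floated in the comment above.

```python
import warnings


class TokenClassifierSketch:
    """Hypothetical stand-in for the renamed class."""

    def __init__(self, label_type: str) -> None:
        self.label_type = label_type


class DeprecationHelperSketch:
    """Wraps a class so the old name keeps working but emits a warning."""

    def __init__(self, new_target, old_name: str, removal_version: str) -> None:
        self.new_target = new_target
        self.old_name = old_name
        self.removal_version = removal_version

    def _warn(self) -> None:
        warnings.warn(
            f"{self.old_name} is deprecated and will be removed in version "
            f"{self.removal_version}; use {self.new_target.__name__} instead.",
            DeprecationWarning,
            stacklevel=3,
        )

    def __call__(self, *args, **kwargs):
        # Instantiating via the old name warns, then builds the new class.
        self._warn()
        return self.new_target(*args, **kwargs)

    def __getattr__(self, attr):
        # Class-level attribute access via the old name also warns.
        self._warn()
        return getattr(self.new_target, attr)


# The old name stays importable but points at the new class.
WordTaggerSketch = DeprecationHelperSketch(TokenClassifierSketch, "WordTagger", "0.14.0")
```

One trade-off of this wrapper style is that the old name is no longer a class, so `isinstance(obj, WordTaggerSketch)` would not work; keeping the old name as a thin subclass that warns on construction would avoid that.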

alanakbik added 5 commits April 14, 2023 22:33

Refactor WordTagger to TokenClassifier class

37ecf66

Adjust loss computation for unknown labels

95ca322

Re-use span logic from data.py

b60053b

Fix end of sentence prediction

6d54ed3

Merge branch 'master' into token_classifier

76b6b1a

helpmefindaname reviewed Apr 19, 2023


alanakbik and others added 7 commits April 19, 2023 21:54

Fix unit tests and flake check

a9e66ac

Fix unit tests

aa0c92b

flake8 removal f-string

11778e0

fix test, add unk label since default is False

6a19541

Fix formatting

c554417

Fix mypy

ad457a8

Add deprecation warning

af2d3d0

alanakbik merged commit e050238 into master Apr 20, 2023

alanakbik deleted the token_classifier branch April 20, 2023 15:26
