TokenClassifier model #3203

Merged
merged 12 commits into from
Apr 20, 2023

@alanakbik (Collaborator) commented Apr 19, 2023

This PR introduces the TokenClassifier class, a renamed and extended version of WordTagger. It directly inherits from DefaultClassifier and should be used for all token-level prediction tasks that do not require an LSTM-CRF decoder (for such tasks, the SequenceTagger should be used).

The main idea is to offer a model that inherits from DefaultClassifier for each label type we predict, i.e.:

  • TokenClassifier for predicting Token labels
  • TextPairClassifier for predicting TextPair labels
  • RelationClassifier for predicting Relation labels
  • SpanClassifier for predicting Span labels (this class is currently called EntityLinker and should be renamed)
  • TextClassifier for predicting Sentence labels (might need to be renamed to SentenceClassifier)

An advantage of such a structure is that most functionality (such as new decoders) needs to be implemented only once in DefaultClassifier and is then immediately usable for all model classes.
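To illustrate the "implement once, reuse everywhere" idea, here is a minimal toy sketch. All class names (`ToyDefaultClassifier`, `ToyTokenClassifier`, `ToySentenceClassifier`) and the trivial `decode` rule are hypothetical stand-ins for illustration only; they are not flair's actual classes or API.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class ToyDefaultClassifier(ABC):
    """Hypothetical stand-in for a shared base class (not flair's DefaultClassifier)."""

    @abstractmethod
    def get_data_points(self, sentence: List[str]) -> List[str]:
        """Return the units (tokens, whole sentence, ...) that receive a label."""

    def decode(self, data_point: str) -> str:
        # Shared logic lives here once; every subclass inherits it unchanged.
        return "UPPER" if data_point.isupper() else "other"

    def predict(self, sentence: List[str]) -> List[Tuple[str, str]]:
        return [(dp, self.decode(dp)) for dp in self.get_data_points(sentence)]


class ToyTokenClassifier(ToyDefaultClassifier):
    def get_data_points(self, sentence: List[str]) -> List[str]:
        # One prediction per token.
        return list(sentence)


class ToySentenceClassifier(ToyDefaultClassifier):
    def get_data_points(self, sentence: List[str]) -> List[str]:
        # One prediction for the whole sentence.
        return [" ".join(sentence)]
```

In this sketch, each subclass only decides which data points get labeled; changing `decode` in the base class would change the behavior of both subclasses at once.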

Edit: This PR also changes the default behavior of the make_label_dictionary method. The UNK token is no longer automatically added to a dictionary. Instead, labels that are not in the dictionary are now skipped during loss computation.
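One way to read the change described above, as a rough sketch: the dictionary contains only labels seen in the data (no UNK entry), and data points whose gold label is missing from the dictionary are left out of the loss. The helpers `toy_make_label_dictionary` and `nll_skipping_unknown` are hypothetical and written in plain Python; flair's actual implementation may differ.

```python
import math
from typing import Dict, List, Sequence


def toy_make_label_dictionary(labels: Sequence[str]) -> Dict[str, int]:
    """Hypothetical helper: map each observed label to an index, without an UNK entry."""
    return {label: idx for idx, label in enumerate(sorted(set(labels)))}


def nll_skipping_unknown(
    probs: List[List[float]], gold: List[str], label_dict: Dict[str, int]
) -> float:
    """Mean negative log-likelihood over data points whose gold label is known."""
    # Keep only the data points whose gold label exists in the dictionary.
    known = [(p, label_dict[g]) for p, g in zip(probs, gold) if g in label_dict]
    if not known:
        return 0.0  # nothing to compute the loss on
    return -sum(math.log(p[i]) for p, i in known) / len(known)


label_dict = toy_make_label_dictionary(["NOUN", "VERB", "NOUN"])
# "ADJ" was never seen, so the second data point is skipped.
loss = nll_skipping_unknown([[0.5, 0.5], [0.9, 0.1]], ["NOUN", "ADJ"], label_dict)
```

Here only the first data point contributes to `loss`; the one with the unseen label "ADJ" is ignored rather than mapped to an UNK index.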

 from flair.embeddings import TokenEmbeddings

 log = logging.getLogger("flair")


-class WordTagger(flair.nn.DefaultClassifier[Sentence, Token]):
+class TokenClassifier(flair.nn.DefaultClassifier[Sentence, Token]):
     """This is a simple class of models that tags individual words in text."""
Collaborator

I think we shouldn't just remove the WordTagger, as this breaks models that use it (especially if they are part of a MultiTaskModel).

I would suggest that we use a DeprecationHelper as specified here, but also state the version in which we delete it (let's say 0.14.0, i.e. incrementing by 2).

That way we could give users a chance to upgrade between multiple versions.
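A deprecation alias along these lines could look roughly like the sketch below. This is an assumption about the general pattern, not necessarily what the linked reference specifies or what was eventually merged: the `DeprecationHelper` constructor arguments, the warning text, and the toy `TokenClassifier` stand-in are all hypothetical.

```python
import warnings


class TokenClassifier:
    """Toy stand-in for the renamed class (not flair's real implementation)."""

    def __init__(self, label_type: str):
        self.label_type = label_type


class DeprecationHelper:
    """Hypothetical wrapper that forwards to the new class and emits a warning."""

    def __init__(self, new_target, old_name: str, removal_version: str):
        self._new_target = new_target
        self._message = (
            f"{old_name} is deprecated and will be removed in version "
            f"{removal_version}; use {new_target.__name__} instead."
        )

    def _warn(self):
        # stacklevel=3 points the warning at the caller of the deprecated name.
        warnings.warn(self._message, DeprecationWarning, stacklevel=3)

    def __call__(self, *args, **kwargs):
        self._warn()
        return self._new_target(*args, **kwargs)

    def __getattr__(self, name):
        # Only reached when normal lookup fails; never forward private names.
        if name.startswith("_"):
            raise AttributeError(name)
        self._warn()
        return getattr(self._new_target, name)


# Old name keeps working for a few releases, but warns on use.
WordTagger = DeprecationHelper(TokenClassifier, "WordTagger", "0.14.0")
```

With this sketch, `WordTagger("pos")` returns a `TokenClassifier` instance and raises a `DeprecationWarning`. One limitation of the pattern is that the alias is an object rather than a class, so `isinstance(x, WordTagger)` checks would not work.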

We can also discuss this further in private.

Collaborator Author

Thanks, good idea!

@alanakbik alanakbik merged commit e050238 into master Apr 20, 2023
@alanakbik alanakbik deleted the token_classifier branch April 20, 2023 15:26
3 participants