
TextClassifier label predictions always have score 1.0 #605

Closed
algomaks opened this issue Mar 9, 2019 · 11 comments · Fixed by #664
Labels
bug Something isn't working

Comments

@algomaks
Contributor

algomaks commented Mar 9, 2019

Hi guys,

I have trained my own TextClassifier following this tutorial: https://github.com/zalandoresearch/flair/blob/master/resources/docs/TUTORIAL_7_TRAINING_A_MODEL.md

I am using the option multi_label=False, as each sentence should be assigned only one label. In terms of embeddings, I use FlairEmbeddings mix-forward and mix-backward.

The issue is that every time I predict the label of a new unseen sentence, I get a label score of 1.0. The score never seems to take any other value between 0.0 and 1.0 - it is always 1.0.
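Roughly, my setup looks like the sketch below (the label dictionary and class names are illustrative placeholders rather than my real training code, and the exact class and argument names may differ slightly between flair versions):

# Rough sketch of the setup described above; the label dictionary contents and
# class names are placeholders, not the real training data.
from flair.data import Dictionary, Sentence
from flair.embeddings import FlairEmbeddings, DocumentRNNEmbeddings
from flair.models import TextClassifier

# hypothetical label dictionary standing in for the one built from my corpus
label_dict = Dictionary(add_unk=False)
label_dict.add_item('greeting')
label_dict.add_item('cancel_task')

document_embeddings = DocumentRNNEmbeddings(
    [FlairEmbeddings('mix-forward'), FlairEmbeddings('mix-backward')]
)
classifier = TextClassifier(document_embeddings,
                            label_dictionary=label_dict,
                            multi_label=False)

# ... training with ModelTrainer as in TUTORIAL_7 ...

sentence = Sentence('cancel this task please')
classifier.predict(sentence)
print(sentence.labels)   # the predicted label always comes back with score 1.0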

Is this the expected behavior? What am I doing wrong?

Thanks!

@algomaks added the question label Mar 9, 2019
@alanakbik
Collaborator

Hello @AlgoMax, that should not happen. Which dataset are you using? Does the model train OK, i.e. does it reach good F-scores?

@algomaks
Contributor Author

Hi @alanakbik,

I have a relatively small dataset, around 3000 examples in total. The examples fall into around 30 different classes, so about 100 examples per class. The examples are short sentences - things that you would typically say in a chat conversation (e.g. "how are you today" or "cancel this task please"). Something specific about the dataset is that many of the examples within one category are quite similar - usually differing by only a few words.

When I train the classification model, I get pretty good micro and macro F1 scores - above 90%. However, I face two main problems when I apply the model to new unseen examples:

  1. Everything is classified with score 1.0. Always.
  2. When I try to predict an example which is clearly not in any of the 30 classes, the model predicts one of the 30 classes with confidence 1.0.

It seems that the model does not learn a good representation of the data. Could it be due to overfitting? How could I train the model so that examples which do not fall in any class are not assigned a label?

Thanks!

@alanakbik
Collaborator

Hi @AlgoMax, a score of 1.0 definitely sounds like a bug, so this is something we need to check out.

Is your training data always labeled, i.e. are there any examples that do not belong to any of the 30 classes? If you do not have such examples, the model will believe that each item must belong to one of the 30 classes - simply because in the training data it has never seen otherwise. So this explains at least why a class is always predicted, but not why confidence is always 1.0.
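One way to give the model that signal (purely illustrative - the labels and sentences below are made up) is to add out-of-scope sentences to the FastText-formatted training files under an explicit catch-all label, for example:

__label__cancel_task cancel this task please
__label__greeting how are you today
__label__other what is the capital of france
__label__other turn the volume down

With such examples in the training data, the classifier can at least learn to route unknown inputs to the catch-all class instead of forcing them into one of the 30 real classes.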

I'll add a bug label to this issue - we'll take a closer look.

@alanakbik added the bug label and removed the question label Mar 18, 2019
@ashahzada

Hi @alanakbik,

Do we have an update on this? I am facing a similar issue.

Thanks!

@algomaks
Contributor Author

algomaks commented Apr 7, 2019

Hi @ashahzada,

As far as I know, there is no update so far on this issue. It seems to be a bug or we are somehow training the model incorrectly. Until this issue is resolved, I decided to use a different text classification tool. I hope this helps. Regards!

@ashahzada

Thank you @AlgoMax!

@alanakbik
Collaborator

Hello @ashahzada @AlgoMax - we haven't looked into the error yet, but I'll be back in office next week (currently travelling) and will take a look.

@algomaks
Contributor Author

I found a couple of clues in the code that point to what might be causing the issue:

  1. The first clue is that this line in the predict method of the TextClassifier class:
scores = self.forward(batch)

in my case generates a tensor like this:

tensor([[ 2.2129,  0.2944,  5.3489, -0.7817,  1.1176, -0.6716, -0.5714, -2.0608,
          3.6055,  3.1275,  1.4636,  4.2829,  1.8713, -2.6787, -0.4784,  0.6990,
          0.5859, -1.5066, -3.6451, -1.7429, -4.3391,  2.3591, -1.4598, -0.2570,
         -1.7566,  0.0564, -5.3699, -5.1569,  2.4890, -2.1881,  4.4994,  0.4503,
          3.9420, -4.4185,  1.2770,  3.5883,  5.4932, -0.7858, -4.2145,  2.3449,
         -2.8780]])

Please notice that the scores in the tensor range between -6.0 and +6.0 rather than [0, 1] (see the quick check at the end of this comment).

  2. The second clue is that the score setter of the Label class checks whether the score is within the range [0, 1] and, if not, sets it to 1.0. This is exactly why we always get predictions with score 1.0: for example, when the max score of 5.3489 is selected, it is clamped to 1.0 by the setter.
    @score.setter
    def score(self, score):
        if 0.0 <= score <= 1.0:
            self._score = score
        else:
            self._score = 1.0

The next step is probably to figure out why the scores are estimated outside the [0, 1] range in the first place. Any thoughts?
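As a quick sanity check (hypothetical snippet, reusing a few of the raw values from the tensor above), running the scores through a softmax does bring them into [0, 1]:

# Hypothetical check: a softmax maps the raw class scores to probabilities in [0, 1].
import torch
import torch.nn.functional as F

raw_scores = torch.tensor([2.2129, 0.2944, 5.3489, -0.7817])  # excerpt from the tensor above
print(F.softmax(raw_scores, dim=0))
# ~ tensor([0.0413, 0.0061, 0.9506, 0.0021]) -- valid probabilities, largest for 5.3489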

@alanakbik
Collaborator

Thanks! Looks like there's a softmax missing - I'll put in a PR!

@alanakbik
Collaborator

Added the fix to master - should work now but let me know if it doesn't!
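For anyone who wants to verify, something along these lines should do it (the model path below is a placeholder for your own trained model) - after updating to master, the predicted score should be a real probability rather than a clamped 1.0:

# Quick verification sketch; the model path is a placeholder.
from flair.data import Sentence
from flair.models import TextClassifier

classifier = TextClassifier.load('resources/taggers/chat-intents/final-model.pt')
sentence = Sentence('how are you today')
classifier.predict(sentence)
for label in sentence.labels:
    print(label.value, label.score)   # score should now lie between 0 and 1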

@algomaks
Contributor Author

algomaks commented Apr 21, 2019

I did some tests and it seems to work very well. Thanks once again! :)
