
TextClassifier label predictions always have score 1.0 #605

Closed
algomaks opened this issue Mar 9, 2019 · 11 comments · Fixed by #664
Labels
bug Something isn't working

Comments

@algomaks
Contributor

algomaks commented Mar 9, 2019

Hi guys,

I have trained my own TextClassifier following this tutorial: https://github.com/zalandoresearch/flair/blob/master/resources/docs/TUTORIAL_7_TRAINING_A_MODEL.md

I am using the option multi_label=False, as each sentence should be assigned only one label. In terms of embeddings, I use FlairEmbeddings mix-forward and mix-backward.

The issue is that every time I predict the label of a new unseen sentence, I get a label score of 1.0. The score never seems to take any other value between 0.0 and 1.0 - it is always 1.0.
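Roughly, my setup looks like the sketch below (the label dictionary and class names are illustrative placeholders rather than my real training code, and the exact class and argument names may differ slightly between flair versions):

# Rough sketch of the setup described above; the label dictionary contents and
# class names are placeholders, not the real training data.
from flair.data import Dictionary, Sentence
from flair.embeddings import FlairEmbeddings, DocumentRNNEmbeddings
from flair.models import TextClassifier

# hypothetical label dictionary standing in for the one built from my corpus
label_dict = Dictionary(add_unk=False)
label_dict.add_item('greeting')
label_dict.add_item('cancel_task')

document_embeddings = DocumentRNNEmbeddings(
    [FlairEmbeddings('mix-forward'), FlairEmbeddings('mix-backward')]
)
classifier = TextClassifier(document_embeddings,
                            label_dictionary=label_dict,
                            multi_label=False)

# ... training with ModelTrainer as in TUTORIAL_7 ...

sentence = Sentence('cancel this task please')
classifier.predict(sentence)
print(sentence.labels)   # the predicted label always comes back with score 1.0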

Is this the expected behavior? What am I doing wrong?

Thanks!

@algomaks added the question label Mar 9, 2019
@alanakbik
Collaborator

Hello @AlgoMax, that should not happen. Which dataset are you using? Does the model train OK, i.e. does it reach good F-scores?

@algomaks
Contributor Author

Hi @alanakbik,

I have a relatively small dataset, around 3000 examples in total. The examples fall into around 30 different classes, so about 100 examples per class. The examples are short sentences - things that you would typically say in a chat conversation (e.g. "how are you today" or "cancel this task please"). Something specific about the dataset is that many of the examples within one category are quite similar - usually differing by only a few words.

When I train the classification model, I get pretty good micro and macro F1 scores - above 90%. However, I face two main problems when I apply the model to new unseen examples:

  1. Everything is classified with score 1.0. Always.
  2. When I try to predict an example which is clearly not in any of the 30 classes, the model predicts one of the 30 classes with confidence 1.0.

It seems that the model does not learn a good representation of the data. Could it be due to overfitting? How could I train the model so that examples which do not fall in any class are not assigned a label?

Thanks!

@alanakbik
Collaborator

Hi @AlgoMax, a score of 1.0 definitely sounds like a bug, so this is something we need to check out.

Is your training data always labeled, i.e. are there any examples that do not belong to any of the 30 classes? If you do not have such examples, the model will believe that each item must belong to one of the 30 classes - simply because in the training data it has never seen otherwise. So this explains at least why a class is always predicted, but not why confidence is always 1.0.
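One way to give the model that signal (purely illustrative - the labels and sentences below are made up) is to add out-of-scope sentences to the FastText-formatted training files under an explicit catch-all label, for example:

__label__cancel_task cancel this task please
__label__greeting how are you today
__label__other what is the capital of france
__label__other turn the volume down

With such examples in the training data, the classifier can at least learn to route unknown inputs to the catch-all class instead of forcing them into one of the 30 real classes.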

I'll add a bug label to this issue - we'll take a closer look.

@alanakbik added the bug label and removed the question label Mar 18, 2019
@ashahzada

Hi @alanakbik,

Do we have an update on this? I am facing a similar issue.

Thanks!

@algomaks
Contributor Author

algomaks commented Apr 7, 2019

Hi @ashahzada,

As far as I know, there is no update so far on this issue. It seems to be a bug or we are somehow training the model incorrectly. Until this issue is resolved, I decided to use a different text classification tool. I hope this helps. Regards!

@ashahzada

Thank you @AlgoMax!

@alanakbik
Collaborator

Hello @ashahzada @AlgoMax - we haven't looked into the error yet, but I'll be back in office next week (currently travelling) and will take a look.

@algomaks
Contributor Author

I found a couple of clues in the code that point to what might be causing the issue:

  1. The first clue is that this line in the predict method of the TextClassifier class:
scores = self.forward(batch)

in my case generates a tensor like this:

tensor([[ 2.2129,  0.2944,  5.3489, -0.7817,  1.1176, -0.6716, -0.5714, -2.0608,
          3.6055,  3.1275,  1.4636,  4.2829,  1.8713, -2.6787, -0.4784,  0.6990,
          0.5859, -1.5066, -3.6451, -1.7429, -4.3391,  2.3591, -1.4598, -0.2570,
         -1.7566,  0.0564, -5.3699, -5.1569,  2.4890, -2.1881,  4.4994,  0.4503,
          3.9420, -4.4185,  1.2770,  3.5883,  5.4932, -0.7858, -4.2145,  2.3449,
         -2.8780]])

Please notice that the scores in the tensor range between -6.0 and +6.0 rather than [0, 1] (see the quick check at the end of this comment).

  2. The second clue is that the score setter of the Label class checks whether the score is within the range [0, 1] and, if not, sets it to 1.0. This is exactly why we always get predictions with score 1.0: for example, when the max score of 5.3489 is selected, it is clamped to 1.0 by the setter.
    @score.setter
    def score(self, score):
        if 0.0 <= score <= 1.0:
            self._score = score
        else:
            self._score = 1.0

The next step is probably to figure out why the scores are estimated outside the [0, 1] range in the first place. Any thoughts?
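As a quick sanity check (hypothetical snippet, reusing a few of the raw values from the tensor above), running the scores through a softmax does bring them into [0, 1]:

# Hypothetical check: a softmax maps the raw class scores to probabilities in [0, 1].
import torch
import torch.nn.functional as F

raw_scores = torch.tensor([2.2129, 0.2944, 5.3489, -0.7817])  # excerpt from the tensor above
print(F.softmax(raw_scores, dim=0))
# ~ tensor([0.0413, 0.0061, 0.9506, 0.0021]) -- valid probabilities, largest for 5.3489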

@alanakbik
Collaborator

Thanks! Looks like there's a softmax missing - I'll put in a PR!

@alanakbik
Collaborator

Added the fix to master - should work now but let me know if it doesn't!
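For anyone who wants to verify, something along these lines should do it (the model path below is a placeholder for your own trained model) - after updating to master, the predicted score should be a real probability rather than a clamped 1.0:

# Quick verification sketch; the model path is a placeholder.
from flair.data import Sentence
from flair.models import TextClassifier

classifier = TextClassifier.load('resources/taggers/chat-intents/final-model.pt')
sentence = Sentence('how are you today')
classifier.predict(sentence)
for label in sentence.labels:
    print(label.value, label.score)   # score should now lie between 0 and 1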

@algomaks
Contributor Author

algomaks commented Apr 21, 2019

I did some tests and it seems to work very well. Thanks once again! :)
