en-sentiment model contains labels that don't match IMDB dataset #1165

pommedeterresautee · 2019-09-29T07:01:28Z

Describe the bug

According to TUTORIAL 2, the classification model en-sentiment was trained on the IMDB dataset.

However, calling evaluate() with the en-sentiment model on the IMDB test set produces a score of 0.

The cause seems to be a change in label names:

Model: ['POSITIVE'], ['NEGATIVE']
dataset: ['pos'], ???

I do not have a single prediction with a negative label (which is another issue).

IMDB is not marked as deprecated in the source code.

To Reproduce

from flair.datasets import IMDB, DataLoader
from flair.models import TextClassifier

# Pre-trained sentiment classifier (trained on IMDB according to TUTORIAL 2)
classifier = TextClassifier.load('en-sentiment')
corpus = IMDB()

# Evaluate on the first 100 sentences of the IMDB test split
sentences = list(corpus.test)
test_results, eval_loss = classifier.evaluate(data_loader=DataLoader(sentences[:100], batch_size=16))
print(test_results.detailed_results)

Output:

# print(test_results.detailed_results)
MICRO_AVG: acc 0.0 - f1-score 0.0
MACRO_AVG: acc 0.0 - f1-score 0.0
NEGATIVE   tp: 0 - fp: 13 - fn: 0 - tn: 87 - precision: 0.0000 - recall: 0.0000 - accuracy: 0.0000 - f1-score: 0.0000
POSITIVE   tp: 0 - fp: 87 - fn: 0 - tn: 13 - precision: 0.0000 - recall: 0.0000 - accuracy: 0.0000 - f1-score: 0.0000
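To see why every tp is 0, here is a simplified pure-Python sketch of per-class counting by exact string comparison. It is an illustration only, not Flair's actual evaluate() code, and the data is hypothetical: all 100 gold labels are assumed to be `pos`, with 87 POSITIVE and 13 NEGATIVE predictions chosen to resemble the counts in the output.

```python
from collections import Counter
from typing import List, Tuple


def per_class_counts(gold: List[str], predicted: List[str]) -> Tuple[Counter, Counter, Counter]:
    """Single-label tp/fp/fn counts using exact string equality (simplified)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if p == g:
            tp[p] += 1
        else:
            fp[p] += 1  # the predicted class gets a false positive
            fn[g] += 1  # the gold class gets a false negative
    return tp, fp, fn


# Hypothetical data: all gold labels assumed to be "pos".
gold = ["pos"] * 100
predicted = ["POSITIVE"] * 87 + ["NEGATIVE"] * 13

tp, fp, fn = per_class_counts(gold, predicted)
print(sum(tp.values()))                # 0
print(fp["POSITIVE"], fp["NEGATIVE"])  # 87 13
print(fn["POSITIVE"], fn["pos"])       # 0 100
```

Because `pos` never equals `POSITIVE`, every prediction becomes a false positive and every miss is charged to the class `pos`, which would explain why the model's own classes show fn: 0.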

Expected behavior
A non-zero score.

Solution

Retrain and share a new en-sentiment model, or regenerate the IMDB dataset with matching label names?
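One way to regenerate the dataset, sketched under assumptions: rewrite the leading `__label__` tokens in the FastText-style text files (the format visible in the first line of test.txt quoted later in this thread) so they match the model's label names. `relabel_line` and `relabel_file` are hypothetical helpers, not Flair functions, and the `neg` to `NEGATIVE` entry is an assumption, since the dataset's negative label name is not confirmed here.

```python
from typing import Dict

# Hypothetical mapping. "pos" is confirmed by test.txt; "neg" is an assumption.
LABEL_MAP: Dict[str, str] = {"pos": "POSITIVE", "neg": "NEGATIVE"}
PREFIX = "__label__"


def relabel_line(line: str, label_map: Dict[str, str]) -> str:
    """Rename the leading __label__X tokens of one FastText-style line."""
    body = line.rstrip("\n")
    newline = line[len(body):]  # preserve the original line ending
    tokens = body.split(" ")
    for i, token in enumerate(tokens):
        if not token.startswith(PREFIX):
            break  # labels only occur at the start; leave the review text untouched
        label = token[len(PREFIX):]
        tokens[i] = PREFIX + label_map.get(label, label)
    return " ".join(tokens) + newline


def relabel_file(src: str, dst: str, label_map: Dict[str, str]) -> None:
    """Write a relabelled copy of src to dst, line by line."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(relabel_line(line, label_map))
```

For example, `relabel_line("__label__pos It's about time!\n", LABEL_MAP)` gives `"__label__POSITIVE It's about time!\n"`. Stopping at the first non-label token avoids touching a `__label__` string that might appear inside review text.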

Context
PyPI version of Flair; the same bug occurs on the master branch.

Note

By the way, the model does not seem to work very well:

"I never saw something that bad." -> positive
"I do not like this film" -> positive
"I hate this film" -> negative (finally)


tombburnell · 2019-10-02T14:20:06Z

FYI, I ran the IMDB model against some opinion-based news and didn't find the results all that meaningful.

alanakbik · 2019-10-02T14:55:44Z

Yes, this model was trained on IMDB data, i.e. film reviews, so it is only a sentiment model for movie reviews. I have to check why the label names changed, very strange!

pommedeterresautee · 2019-10-05T21:15:55Z

@alanakbik This is very strange.
I cleared the $HOME/.flair folder and then ran this code:

from flair.datasets import IMDB, DataLoader
from flair.models import TextClassifier

classifier = TextClassifier.load('en-sentiment')
corpus = IMDB()
sentences = list(corpus.test)

It downloads the required files:

2019-10-05 23:01:17,042 https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/models-v0.4/classy-imdb-en-rnn-cuda%3A0/imdb-v0.4.pt not found in cache, downloading to /tmp/tmpgxn7mqor
100%|██████████| 1501979561/1501979561 [00:32<00:00, 45600891.88B/s]
2019-10-05 23:01:50,139 copying /tmp/tmpgxn7mqor to cache at /home/.../.flair/models/imdb-v0.4.pt
2019-10-05 23:01:51,112 removing temp file /tmp/tmpgxn7mqor
2019-10-05 23:01:51,227 loading file /home/.../.flair/models/imdb-v0.4.pt
2019-10-05 23:02:00,766 http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz not found in cache, downloading to /tmp/tmpm1n0lup3
100%|██████████| 84125825/84125825 [00:10<00:00, 8308111.70B/s] 
2019-10-05 23:02:11,202 copying /tmp/tmpm1n0lup3 to cache at /home/.../.flair/datasets/imdb/aclImdb_v1.tar.gz
2019-10-05 23:02:11,267 removing temp file /tmp/tmpm1n0lup3
2019-10-05 23:02:22,099 Reading data from /home/.../.flair/datasets/imdb
2019-10-05 23:02:22,099 Train: /home/.../.flair/datasets/imdb/train.txt
2019-10-05 23:02:22,099 Dev: None
2019-10-05 23:02:22,099 Test: /home/.../.flair/datasets/imdb/test.txt

Then I run:

sentences[0].labels
# Out[12]: [pos (1.0)]

And

a = classifier.predict(sentences=sentences[:100],
                       mini_batch_size=16,
                       embedding_storage_mode="none")
a[0].labels
# Out[13]: [POSITIVE (0.9999998807907104)]

So the labels ARE different.

Of course, now:

sentences[0].labels
# Out[14]: [POSITIVE (0.9999998807907104)]

because the predict() call overwrites the sentence labels.
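Because prediction overwrites labels in place, any comparison has to capture the gold labels before predicting. A minimal pure-Python sketch of that ordering, using a hypothetical `FakeSentence` class and a hypothetical `fake_predict` function (which always answers POSITIVE) as stand-ins for Flair's objects:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeSentence:
    """Hypothetical stand-in for a labelled sentence (not Flair's Sentence)."""
    text: str
    labels: List[str] = field(default_factory=list)


def fake_predict(sentences: List[FakeSentence]) -> List[FakeSentence]:
    """Hypothetical predictor that overwrites labels in place, like predict() in this thread."""
    for sentence in sentences:
        sentence.labels = ["POSITIVE"]
    return sentences


sentences = [FakeSentence("A wonderful film.", ["pos"]), FakeSentence("A dreadful film.", ["neg"])]

# Snapshot gold labels BEFORE predicting; copying the lists keeps them independent.
gold = [list(s.labels) for s in sentences]
predicted = [list(s.labels) for s in fake_predict(sentences)]

print(gold)       # [['pos'], ['neg']]
print(predicted)  # [['POSITIVE'], ['POSITIVE']]
```

Reading the labels only after prediction would make gold and predicted identical, which hides the mismatch instead of measuring it.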

The first line of /home/.../.flair/datasets/imdb/test.txt:

__label__pos Back in 1982 a little film called MAKING LOVE shocked audiences with its frank and open depiction of a romantic love story that just happened to be about two men.<br /><br />I have been waiting for years for a good, old-fashioned romance between two men; LATTER DAYS is all that and more.<br /><br />Yes, it is soapy, melodramatic, cliché-ridden, and quite corny. That is what makes it so wonderful. There is nothing like a good romantic movie, and this movie is romantic in the best sense of the word.<br /><br />As to the issue of religion, sorry folks, but these things do happen and are happening to gay people even now. It is not just the Mormon church that rejects its gay members. Gay people in every religion have faced harsh judgment and rejection.<br /><br />I loved this movie. It has a perfect blend of a fantasy-romance grounded in the reality of the day-to-day lives of the characters. If I could give it more than ten stars I would. Good love stories never go out of style; great love stories like LATTER DAYS are unforgettable.<br /><br />It's about time!

In short, the model's label is POSITIVE while the dataset's label is pos.

stale · 2020-04-29T20:11:05Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

alanakbik · 2020-04-29T21:34:44Z

PR #1545 adds new sentiment datasets and homogenizes labels across datasets. Also, there is now an option to define "name maps" to map label names to other names.
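The idea behind a name map can be sketched in a few lines of plain Python: translate gold label names before comparing them with predictions. This only illustrates the concept and is not the implementation from PR #1545; `accuracy_with_name_map` is a hypothetical helper, and the `neg` entry is assumed.

```python
from typing import Dict, List


def accuracy_with_name_map(gold: List[str], predicted: List[str], name_map: Dict[str, str]) -> float:
    """Accuracy after renaming gold labels through name_map (unmapped names pass through)."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and of equal length")
    mapped = [name_map.get(label, label) for label in gold]
    correct = sum(m == p for m, p in zip(mapped, predicted))
    return correct / len(gold)


# Hypothetical example; the "neg" entry is an assumption.
name_map = {"pos": "POSITIVE", "neg": "NEGATIVE"}
gold = ["pos", "pos", "neg", "neg"]
predicted = ["POSITIVE", "NEGATIVE", "NEGATIVE", "POSITIVE"]

print(accuracy_with_name_map(gold, predicted, {}))        # 0.0
print(accuracy_with_name_map(gold, predicted, name_map))  # 0.5
```

With an empty map every comparison fails, reproducing the zero score from the report; with the map applied, the same predictions are scored on their merits.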

pommedeterresautee added the "bug" (Something isn't working) label on Sep 29, 2019.

pommedeterresautee mentioned this issue on Oct 2, 2019, from "RTX2080 GPU is only being used at 30% - whilst a single CPU core is maxed out" #1080 (Closed).

stale bot added the "wontfix" (This will not be worked on) label on Apr 29, 2020.

alanakbik closed this as completed on Apr 29, 2020.
