Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

en-sentiment model contains labels that don't match IMDB dataset #1165

Closed
pommedeterresautee opened this issue Sep 29, 2019 · 5 comments
Closed
Labels
bug Something isn't working wontfix This will not be worked on

Comments

@pommedeterresautee
Copy link
Contributor

pommedeterresautee commented Sep 29, 2019

Describe the bug

According to TUTORIAL 2, classification model en-sentiment has been trained on IMDB dataset.

evaluate() function on en-sentiment model trained on IMDB produces 0 score when tested on test set of IMDB.

Reasons seem to be a change in label names.

Model: ['POSITIVE'], ['NEGATIVE']
dataset: ['pos'], ???

I have not a single prediction with negative label. (which is another issue)

IMDB is not marked as deprecated in source code.

To Reproduce

from flair.datasets import IMDB, DataLoader

from flair.models import TextClassifier

classifier = TextClassifier.load('en-sentiment')
corpus = IMDB()

sentences = list(corpus.test)

test_results, a = classifier.evaluate(data_loader=DataLoader(sentences[:100], batch_size=16))
print(test_results.detailed_results)

Print:

# print(test_results.detailed_results)
MICRO_AVG: acc 0.0 - f1-score 0.0
MACRO_AVG: acc 0.0 - f1-score 0.0
NEGATIVE   tp: 0 - fp: 13 - fn: 0 - tn: 87 - precision: 0.0000 - recall: 0.0000 - accuracy: 0.0000 - f1-score: 0.0000
POSITIVE   tp: 0 - fp: 87 - fn: 0 - tn: 13 - precision: 0.0000 - recall: 0.0000 - accuracy: 0.0000 - f1-score: 0.0000

Expected behavior
A score which is not zero

Solution

Retrain / reshare a new en-sentiment model? // redo IMDB dataset

Context
PiPy version of Flair / same bug on master branch

Note

Btw, the model seems to not work very well.

"I never saw something that bad." -> positive
"I do not like this film" -> positive
"I hate this film" -> negative (finally)

@tombburnell
Copy link

FYI I ran the Imdb model against some opinion based news and I didn't find the results to be all that meaningful.

@alanakbik
Copy link
Collaborator

Yes this model was trained using IMDB data, i.e. film related, so its only a sentiment model for movie reviews. I have to check why the label names changed, very strange!

@pommedeterresautee
Copy link
Contributor Author

@alanakbik This is very strange.
I have cleaned $HOME/.flair folder.
Then I executed that code

from flair.datasets import IMDB, DataLoader

from flair.models import TextClassifier

classifier = TextClassifier.load('en-sentiment')
corpus = IMDB()

sentences = list(corpus.test)

It downloads what it has to.

2019-10-05 23:01:17,042 https://s3.eu-central-1.amazonaws.com/alan-nlp/resources/models-v0.4/classy-imdb-en-rnn-cuda%3A0/imdb-v0.4.pt not found in cache, downloading to /tmp/tmpgxn7mqor
100%|██████████| 1501979561/1501979561 [00:32<00:00, 45600891.88B/s]
2019-10-05 23:01:50,139 copying /tmp/tmpgxn7mqor to cache at /home/.../.flair/models/imdb-v0.4.pt
2019-10-05 23:01:51,112 removing temp file /tmp/tmpgxn7mqor
2019-10-05 23:01:51,227 loading file /home/.../.flair/models/imdb-v0.4.pt
2019-10-05 23:02:00,766 http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz not found in cache, downloading to /tmp/tmpm1n0lup3
100%|██████████| 84125825/84125825 [00:10<00:00, 8308111.70B/s] 
2019-10-05 23:02:11,202 copying /tmp/tmpm1n0lup3 to cache at /home/.../.flair/datasets/imdb/aclImdb_v1.tar.gz
2019-10-05 23:02:11,267 removing temp file /tmp/tmpm1n0lup3
2019-10-05 23:02:22,099 Reading data from /home/.../.flair/datasets/imdb
2019-10-05 23:02:22,099 Train: /home/.../.flair/datasets/imdb/train.txt
2019-10-05 23:02:22,099 Dev: None
2019-10-05 23:02:22,099 Test: /home/.../.flair/datasets/imdb/test.txt

Then I do

sentences[0].labels
# Out[12]: [pos (1.0)]

And

a = classifier.predict(sentences=sentences[:100],
                   mini_batch_size=16,
                   embedding_storage_mode="none")
a[0].labels
# Out[13]: [POSITIVE (0.9999998807907104)]

So labels ARE different.

Of course, now:

sentences[0].labels
# Out[14]: [POSITIVE (0.9999998807907104)]

as sentence labels are overriden by predict() call.

first line of /home/.../.flair/datasets/imdb/test.txt

__label__pos Back in 1982 a little film called MAKING LOVE shocked audiences with its frank and open depiction of a romantic love story that just happened to be about two men.<br /><br />I have been waiting for years for a good, old-fashioned romance between two men; LATTER DAYS is all that and more.<br /><br />Yes, it is soapy, melodramatic, cliché-ridden, and quite corny. That is what makes it so wonderful. There is nothing like a good romantic movie, and this movie is romantic in the best sense of the word.<br /><br />As to the issue of religion, sorry folks, but these things do happen and are happening to gay people even now. It is not just the Mormon church that rejects its gay members. Gay people in every religion have faced harsh judgment and rejection.<br /><br />I loved this movie. It has a perfect blend of a fantasy-romance grounded in the reality of the day-to-day lives of the characters. If I could give it more than ten stars I would. Good love stories never go out of style; great love stories like LATTER DAYS are unforgettable.<br /><br />It's about time!

To make it short, label from model is POSITIVE and label from dataset is pos.

@stale
Copy link

stale bot commented Apr 29, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Apr 29, 2020
@alanakbik
Copy link
Collaborator

PR #1545 adds new sentiment datasets and homogenizes labels across datasets. Also, there is now an option to define "name maps" to map label names to other names.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

3 participants