
How does flair handle sentences longer than 512 tokens when using BERT? #2281

Closed

user06039 opened this issue May 20, 2021 · 3 comments

@user06039

I am currently using flair with BERT embeddings for an NER model, but my input sentences are around 1000-2000 words long.

I am using this setup:

import torch

from flair.embeddings import TransformerWordEmbeddings, FlairEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# stack transformer embeddings with forward/backward Flair embeddings
embedding_types = [
    TransformerWordEmbeddings('bert-base-cased', fine_tune=True, layers='-1', allow_long_sentences=True),
    FlairEmbeddings('news-forward-fast'),
    FlairEmbeddings('news-backward-fast'),
]

embeddings: StackedEmbeddings = StackedEmbeddings(embeddings=embedding_types)

# corpus and tag_dictionary are built elsewhere from my own NER data
tagger: SequenceTagger = SequenceTagger(hidden_size=256,
                                        embeddings=embeddings,
                                        tag_dictionary=tag_dictionary,
                                        tag_type='ner',
                                        use_crf=True)

trainer: ModelTrainer = ModelTrainer(tagger, corpus, torch.optim.Adam)

trainer.train('jd_models/jd-ner',
              learning_rate=0.0001,
              mini_batch_size=32,
              mini_batch_chunk_size=8,
              max_epochs=30,
              min_learning_rate=0.00001)
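For reference, here is a minimal sanity check that a long input embeds at all (the synthetic sentence below is just an illustration, reusing the embeddings stack defined above):

from flair.data import Sentence

# ~1500 whitespace-separated tokens, well past BERT's 512 subtoken limit
long_sentence = Sentence("token " * 1500)
embeddings.embed(long_sentence)
print(len(long_sentence), long_sentence[0].embedding.shape)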
  1. I am guessing this works because of the allow_long_sentences parameter, but how does it actually work behind the scenes? Does it split one sequence into multiple 512-token chunks, process them separately, and combine them later, or does it just ignore everything after the first 512 tokens?
  2. Can we use models like Reformer, which can handle long sequences?

Please help out.

@schelv (Contributor) commented May 25, 2021

Here is an explanation: #1680 (comment)

@user06039 (Author)

@schelv Thanks for explaining. If I am understanding correctly:

Let's say I use a transformer model with a limit of 512 tokens, and I have a sentence with 1024 tokens.

If I use allow_long_sentences=True in TransformerWordEmbeddings, it uses strides to split my sentence into 512-token pieces and then combines them at the end into one embedding.

Do I have to set allow_long_sentences=True to make this work, or is it enabled by default?

@schelv (Contributor) commented Jun 10, 2021

Yes, I think in newer versions it is enabled by default.
The default stride length is 256 for a 512-token model.

For 1024 tokens this gives three 512-token pieces that overlap a bit.
The purpose of the overlap is to get more context into the embeddings of the words.
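To make the arithmetic concrete, here is a rough sketch of that kind of overlapping windowing (the window/stride values and the averaging of overlaps are illustrative assumptions, not flair's exact implementation; see the linked comment in #1680 for the real details):

import numpy as np

def split_into_windows(num_tokens, window=512, stride=256):
    """Return (start, end) pairs of overlapping windows covering the sequence."""
    starts = range(0, max(num_tokens - window, 0) + 1, stride)
    windows = [(s, min(s + window, num_tokens)) for s in starts]
    if windows[-1][1] < num_tokens:  # cover any leftover tail
        windows.append((max(num_tokens - window, 0), num_tokens))
    return windows

def merge_windows(window_embeddings, windows, num_tokens, dim):
    """Average embeddings where windows overlap, so each token ends up with one vector."""
    summed = np.zeros((num_tokens, dim))
    counts = np.zeros((num_tokens, 1))
    for (start, end), emb in zip(windows, window_embeddings):
        summed[start:end] += emb[: end - start]
        counts[start:end] += 1
    return summed / counts

print(split_into_windows(1024))  # [(0, 512), (256, 768), (512, 1024)] -> three pieces

Whether the overlap is averaged or resolved some other way is an implementation detail of flair; the point is that tokens near a chunk boundary still get context from the neighbouring window.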
