
How does flair handle sentences longer than 512 tokens when using BERT? #2281

Closed

user06039 opened this issue May 20, 2021 · 3 comments

@user06039

I am currently using flair with BERT embeddings for an NER model, but my input sentences are around 1000-2000 words long.

I am using this setup:

import torch

from flair.embeddings import TransformerWordEmbeddings, FlairEmbeddings, StackedEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

# stack transformer embeddings with forward/backward Flair embeddings
embedding_types = [
    TransformerWordEmbeddings('bert-base-cased', fine_tune=True, layers='-1', allow_long_sentences=True),
    FlairEmbeddings('news-forward-fast'),
    FlairEmbeddings('news-backward-fast'),
]

embeddings: StackedEmbeddings = StackedEmbeddings(embeddings=embedding_types)

# corpus and tag_dictionary are built elsewhere from my own NER data
tagger: SequenceTagger = SequenceTagger(hidden_size=256,
                                        embeddings=embeddings,
                                        tag_dictionary=tag_dictionary,
                                        tag_type='ner',
                                        use_crf=True)

trainer: ModelTrainer = ModelTrainer(tagger, corpus, torch.optim.Adam)

trainer.train('jd_models/jd-ner',
              learning_rate=0.0001,
              mini_batch_size=32,
              mini_batch_chunk_size=8,
              max_epochs=30,
              min_learning_rate=0.00001)
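For reference, here is a minimal sanity check that a long input embeds at all (the synthetic sentence below is just an illustration, reusing the embeddings stack defined above):

from flair.data import Sentence

# ~1500 whitespace-separated tokens, well past BERT's 512 subtoken limit
long_sentence = Sentence("token " * 1500)
embeddings.embed(long_sentence)
print(len(long_sentence), long_sentence[0].embedding.shape)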
  1. I am guessing this works because of the allow_long_sentences parameter, but how does it actually work behind the scenes? Does it split one sequence into multiple 512-token chunks, process them separately, and combine them later, or does it just ignore everything after the first 512 tokens?
  2. Can we use models like Reformer, which can handle long sequences?

Please help out.

@schelv (Contributor) commented May 25, 2021

Here is an explanation: #1680 (comment)

@user06039 (Author)

@schelv Thanks for explaining. If I am understanding correctly:

Let's say I use a transformer model with a limit of 512 tokens, and I have a sentence with 1024 tokens.

If I use allow_long_sentences=True in TransformerWordEmbeddings, it uses strides to split my sentence into 512-token pieces and then combines them at the end into one embedding.

Do I have to set allow_long_sentences=True to make this work, or is it enabled by default?

@schelv (Contributor) commented Jun 10, 2021

Yes, I think in newer versions it is enabled by default.
The default stride length is 256 for a 512-token model.

For 1024 tokens this gives three 512-token pieces that overlap a bit.
The purpose of the overlap is to get more context into the embeddings of the words.
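To make the arithmetic concrete, here is a rough sketch of that kind of overlapping windowing (the window/stride values and the averaging of overlaps are illustrative assumptions, not flair's exact implementation; see the linked comment in #1680 for the real details):

import numpy as np

def split_into_windows(num_tokens, window=512, stride=256):
    """Return (start, end) pairs of overlapping windows covering the sequence."""
    starts = range(0, max(num_tokens - window, 0) + 1, stride)
    windows = [(s, min(s + window, num_tokens)) for s in starts]
    if windows[-1][1] < num_tokens:  # cover any leftover tail
        windows.append((max(num_tokens - window, 0), num_tokens))
    return windows

def merge_windows(window_embeddings, windows, num_tokens, dim):
    """Average embeddings where windows overlap, so each token ends up with one vector."""
    summed = np.zeros((num_tokens, dim))
    counts = np.zeros((num_tokens, 1))
    for (start, end), emb in zip(windows, window_embeddings):
        summed[start:end] += emb[: end - start]
        counts[start:end] += 1
    return summed / counts

print(split_into_windows(1024))  # [(0, 512), (256, 768), (512, 1024)] -> three pieces

Whether the overlap is averaged or resolved some other way is an implementation detail of flair; the point is that tokens near a chunk boundary still get context from the neighbouring window.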
