
pytorch-pretrained-bert to pytorch-transformers upgrade #873

Closed · 7 tasks done
stefan-it opened this issue Jul 11, 2019 · 39 comments
Labels: feature (A new feature)

@stefan-it
Member

stefan-it commented Jul 11, 2019

Hi,

the upcoming 1.0 version of pytorch-pretrained-bert will introduce several API changes, new models and even a name change to pytorch-transformers.

After the final 1.0 release, flair could support 7 different Transformer-based architectures:

  • BERT -> BertEmbeddings
  • OpenAI GPT -> OpenAIGPTEmbeddings
  • OpenAI GPT-2 -> OpenAIGPT2Embeddings 🛡️
  • Transformer-XL -> TransformerXLEmbeddings
  • XLNet -> XLNetEmbeddings 🛡️
  • XLM -> XLMEmbeddings 🛡️
  • RoBERTa -> RoBERTaEmbeddings 🛡️ (currently not covered by pytorch-transformers)

🛡️ indicates a new embedding class for flair.

It also introduces a universal API for all models, so quite a few changes in flair are necessary to support both old and new embedding classes.

This issue tracks the implementation status for all 7 embedding classes 😊

@stefan-it stefan-it self-assigned this Jul 11, 2019
@stefan-it stefan-it added the feature A new feature label Jul 11, 2019
@alanakbik
Collaborator

Awesome - really look forward to supporting this in Flair!

@aychang95
Contributor

pytorch-transformers 1.0 was released today: https://github.com/huggingface/pytorch-transformers

A migration summary can be found in the README here


Main takeaways from the migration process are:

  • Models now output tuples: the quickest fix seems to be retrieving the first element of our model outputs with a safe 0 index
  • Serialization: serialization methods have been standardized with save_pretrained(save_directory); models are in evaluation mode by default and must be set to training mode during training
  • Optimizers: BertAdam and OpenAIAdam are replaced with AdamW, and schedulers are no longer part of the optimizer. The scheduler step must be called per batch, and schedulers are now standard PyTorch learning rate schedulers (a short sketch of these changes follows below)
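
To make the list above concrete, here is a rough migration sketch, assuming the pytorch-transformers 1.0 API (AdamW and WarmupLinearSchedule are the optimizer/scheduler names as I understand the new library; this is an illustration, not code from this issue):

import os
import torch
from pytorch_transformers import BertModel, BertTokenizer, AdamW, WarmupLinearSchedule

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
model = BertModel.from_pretrained("bert-base-cased")  # returned in evaluation mode by default

input_ids = torch.tensor([tokenizer.encode("Berlin and Munich are nice cities .")])
outputs = model(input_ids)      # models now return tuples
last_hidden_state = outputs[0]  # "safe 0 index" for the hidden states

# Training: switch to train mode explicitly, use AdamW plus a standard scheduler,
# and call scheduler.step() once per batch.
model.train()
optimizer = AdamW(model.parameters(), lr=3e-5)
scheduler = WarmupLinearSchedule(optimizer, warmup_steps=100, t_total=1000)
# ... loss.backward(); optimizer.step(); scheduler.step()

# Standardized serialization (the target directory must exist):
os.makedirs("./my-bert-checkpoint", exist_ok=True)
model.save_pretrained("./my-bert-checkpoint")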

Otherwise, pytorch-transformers 1.0 looks great and I'm looking forward to using it in flair as well as standalone.

@alanakbik
Collaborator

Looks great!

@stefan-it
Member Author

Using the last_hidden_states works e.g. for Transformer-XL, but fails badly for XLNet and OpenAI GPT-2. As we use a feature-based approach, I'm currently doing some extensive per-layer analysis for all of the architectures. I'll post the results here (I'm mainly using a 0.1 downsampled CoNLL-2003 English corpus for NER).
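
For context, a minimal sketch of how all layer outputs can be pulled from a pytorch-transformers model for this kind of feature-based, per-layer analysis (illustrative only; output_hidden_states is the library's config flag):

import torch
from pytorch_transformers import XLNetModel, XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained("xlnet-large-cased")
# output_hidden_states=True makes the model also return the hidden states of every layer
model = XLNetModel.from_pretrained("xlnet-large-cased", output_hidden_states=True)
model.eval()

input_ids = torch.tensor([tokenizer.encode("Berlin and Munich are nice cities .")])
with torch.no_grad():
    outputs = model(input_ids)

# Here the per-layer hidden states are the last element of the returned tuple
# (the exact tuple layout differs per model, so check the model's docstring).
all_hidden_states = outputs[-1]
print(len(all_hidden_states), all_hidden_states[0].shape)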

@stefan-it
Member Author

stefan-it commented Jul 17, 2019

XLNet

I ran some per-layer analysis with the large XLNet model:

Layer F-Score (0.1 downsampled CoNLL-2003 NER corpus)
1 82.91
2 81.94
3 80.10
4 82.62
5 84.16
6 79.19
7 80.76
8 81.85
9 82.64
10 74.29
11 78.99
12 79.34
13 76.22
14 79.67
15 77.07
16 73.49
17 73.20
18 74.36
19 72.32
20 71.30
21 74.97
22 75.04
23 66.84
24 03.37

XLNetEmbeddings

To use the new XLNet embeddings in flair just do:

from flair.data import Sentence
from flair.embeddings import XLNetEmbeddings

embeddings = XLNetEmbeddings()

s = Sentence("Berlin and Munich are nice cities .")
embeddings.embed(s)

# Get embeddings
for token in s.tokens:
  print(token.embedding)

XLNetEmbeddings has three parameters (an example instantiation follows the list):

  • model: specifies the XLNet model. pytorch-transformers currently comes with xlnet-large-cased and xlnet-base-cased
  • layers: comma-separated string of layers. Default is 1; to use more layers (their outputs are concatenated), just pass: 1,2,3,4
  • pooling_operation: defines the pooling operation over subwords. By default, the first and last subword embeddings are concatenated and used. Other pooling operations are also available: first, last and mean
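
For example, a non-default setup could look like this (the values are just an illustration of the parameters listed above):

from flair.embeddings import XLNetEmbeddings

# concatenate layers 1-4 and mean-pool the subword embeddings of each token
embeddings = XLNetEmbeddings(model="xlnet-base-cased", layers="1,2,3,4", pooling_operation="mean")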

@stefan-it
Member Author

Transformer-XL

I also ran some per-layer analysis for the Transformer-XL embeddings:

Layer F-Score (0.1 downsampled CoNLL-2003 NER corpus)
1 80.88
2 81.68
3 82.88
4 80.89
5 84.74
6 80.68
7 82.65
8 79.53
9 79.25
10 79.64
11 80.07
12 84.26
13 81.22
14 80.59
15 81.31
16 78.95
17 79.85
18 80.69

Experiments with combination of layers:

Layers F-Score (0.1 downsampled CoNLL-2003 NER corpus)
1,2 80.84
1,2,3 81.99
1,2,3,4 78.44
1,2,3,4,5 80.89

That's the reason why I chose layers 1,2,3 as the default for TransformerXLEmbeddings for now.

TransformerXLEmbeddings

TransformerXLEmbeddings has two parameters (an example instantiation follows the list):

  • model: specifies the Transformer-XL model. pytorch-transformers currently comes with transfo-xl-wt103
  • layers: comma-separated string of layers. Default is 1,2,3; to use more layers (their outputs are concatenated), just pass: 1,2,3,4
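
An example instantiation with non-default layers (illustrative values only):

from flair.embeddings import TransformerXLEmbeddings

# use layers 1-4 instead of the default 1,2,3 (their outputs are concatenated)
embeddings = TransformerXLEmbeddings(model="transfo-xl-wt103", layers="1,2,3,4")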

stefan-it added a commit that referenced this issue Jul 18, 2019
@stefan-it
Member Author

stefan-it commented Jul 18, 2019

@alanakbik I'm planning to run per-layer analysis for all of the Transformer-based models.

However, it is really hard to give a recommendation for default layer(s).

Recently, I found this NAACL paper: Linguistic Knowledge and Transferability of Contextual Representations, which uses a "scalar mix" of all layers. An implementation can be found in the allennlp repo, see it here. Do you have any idea how we could use this technique here? 🤔

Would be awesome if we can adopt that 🤗
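
For reference, the core of the scalar mix idea is a learned, softmax-normalized weighted sum of all layer outputs, scaled by a scalar gamma. A minimal sketch of the mechanism from Liu et al. (2019), not the exact allennlp code:

import torch
import torch.nn as nn


class ScalarMix(nn.Module):
    """Computes gamma * sum_k softmax(s)_k * h_k over a list of layer tensors."""

    def __init__(self, num_layers: int):
        super().__init__()
        self.scalar_parameters = nn.Parameter(torch.zeros(num_layers))
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, layer_tensors):
        # layer_tensors: list of tensors with identical shape, e.g. (seq_len, hidden_size)
        weights = torch.softmax(self.scalar_parameters, dim=0)
        mixed = sum(weight * tensor for weight, tensor in zip(weights, layer_tensors))
        return self.gamma * mixed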

@alanakbik
Collaborator

@stefan-it from the paper it also seems that the best approach / layers would vary by task, so giving overall recommendations might be really difficult. From a quick look it seems like their implementation could be integrated as part of an embeddings class, i.e. after retrieving the layers, put them through this code. Might be interesting to try out!

@stefan-it
Member Author

stefan-it commented Jul 18, 2019

OpenAI GPT-1

Here are some per-layer analysis results for the first GPT model:

Layer F-Score (0.1 downsampled CoNLL-2003 NER corpus)
1 74.90
2 66.93
3 64.62
4 67.62
5 62.01
6 58.19
7 53.13
8 55.19
9 52.61
10 52.02
11 64.59
12 70.08

I implemented a first prototype of the scalar mix approach. I was able to get an F-Score of 71.01 (over all layers, incl. the word embedding layer)!

OpenAIGPTEmbeddings

The OpenAIGPTEmbeddings comes with three parameters: model, layers and pooling_operation.
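
A short usage sketch, analogous to the XLNet example above (parameter values are illustrative):

from flair.data import Sentence
from flair.embeddings import OpenAIGPTEmbeddings

embeddings = OpenAIGPTEmbeddings(layers="1", pooling_operation="first_last")

s = Sentence("Berlin and Munich are nice cities .")
embeddings.embed(s)

for token in s.tokens:
  print(token.embedding)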

@ilham-bintang

To use the new XLNet embeddings in flair just do:

from flair.data import Sentence
from flair.embeddings import XLNetEmbeddings

embeddings = XLNetEmbeddings()

s = Sentence("Berlin and Munich are nice cities .")
embeddings.embed(s)

# Get embeddings
for token in s.tokens:
  print(token.embeddings)

Hi, I used the GH-873-pytorch-transformers branch and tried it, but it raised an error:
AttributeError: 'Token' object has no attribute 'embeddings'

@DecentMakeover

Not able to import

ImportError: cannot import name 'XLNetEmbeddings'

Any suggestions?

@ilham-bintang

Not able to import

ImportError: cannot import name 'XLNetEmbeddings'

Any suggestions?

You need to switch to the GH-873 branch

@DecentMakeover

Thanks for the quick reply, I'll check

@stefan-it
Member Author

@NullPhantom just use:

for token in s.tokens:
  print(token.embedding)

:)

@alanakbik
Collaborator

@stefan-it interesting results with the scalar mix! How is the effect on runtime, i.e. for instance comparing scalar mix with only one layer?

@stefan-it
Member Author

OpenAI GPT-2

I ran some per-layer experiments on the GPT-2 and the GPT-2 medium model:

Layer   GPT-2   GPT-2 medium   (F-Score, 0.1 downsampled CoNLL-2003 NER corpus)
1       42.41   45.58
2       10.26   48.52
3       15.20    2.17
4       22.51   18.50
5        0.00   16.22
6       21.71    8.03
7       12.70   15.85
8       14.10   17.74
9        0.00    6.70
10      18.75    0.00
11       0.00    3.22
12       5.62   11.18
13          -   17.09
14          -   14.25
15          -    0.00
16          -    7.02
17          -    8.03
18          -    0.00
19          -    0.00
20          -    9.49
21          -   10.65
22          -    8.38
23          -   18.74
24          -    5.15

It does not look very promising, so scalar mix could help here!

OpenAIGPT2Embeddings

To play around with the embeddings from the GPT-2 models, just use:

from flair.data import Sentence
from flair.embeddings import OpenAIGPT2Embeddings

embeddings = OpenAIGPT2Embeddings()

s = Sentence("Berlin and Munich")
embeddings.embed(s)

for token in s.tokens:
  print(token.embedding)

@stefan-it
Member Author

@alanakbik In my preliminary experiments with the scalar mix implementation, I couldn't see any big performance issues, but I'll measure it once the implementation is ready.

I'm currently focussing on per-layer analysis for the XLM model :)

stefan-it added a commit that referenced this issue Jul 22, 2019
@alanakbik
Collaborator

@stefan-it cool! Really looking forward to XLM! Strange that GPT-2 is not doing so well.

@stefan-it
Member Author

XLM

Here are the results from a per-layer analysis for the English XLM model:

Layer F-Score (0.1 downsampled CoNLL-2003 NER corpus)
1 76.92
2 75.91
3 75.61
4 73.52
5 73.66
6 70.75
7 70.90
8 63.58
9 64.04
10 57.38
11 54.70
12 56.96

XLMEmbeddings

The following snippet demonstrates the usage of the new XLMEmbeddings class:

from flair.data import Sentence
from flair.embeddings import XLMEmbeddings

embeddings = XLMEmbeddings()

s = Sentence("It is very hot in Munich now .")
embeddings.embed(s)

for token in s.tokens:
  print(token.embedding)

@stefan-it
Member Author

I'm currently updating the BertEmbeddings class to the new pytorch-transformers API.

Btw: using scalar mix does not help when using the OpenAIGPT2Embeddings 😞

@stefan-it
Member Author

I adjusted the BertEmbeddings class to make it compatible with the new pytorch-transformers API.

In order to avoid any regression bugs, I compared the performance with the old pytorch-pretrained-BERT library. Here's the per-layer analysis:

Layer   pytorch-pretrained-BERT   pytorch-transformers   (F-Score, 0.1 downsampled CoNLL-2003 NER corpus)
1       80.15   81.35
2       79.49   82.65
3       84.20   83.44
4       83.71   84.58
5       87.71   88.81
6       86.56   87.34
7       87.61   87.13
8       86.67   85.20
9       88.17   87.73
10      89.20   85.66
11      86.65   87.44
12      85.82   87.06

@stefan-it
Member Author

stefan-it commented Jul 25, 2019

@alanakbik Can I file a PR for these new embeddings? Maybe we can define some kind of roadmap for the newly introduced Transformer-based embeddings:

  • PR for the current implementations
  • Next step: add support for scalar mix
  • Next step: performance tuning (maybe we can use a batch of sentences? But then we need to implement a mapping between each original token and the subword embeddings that belong to it)
  • Future steps: fine-tuning of these Transformer-based embeddings (instead of using a feature-based approach)

@alanakbik
Collaborator

@stefan-it absolutely! This is a major upgrade that lots of people will want to use. With all the new features, it's probably time to do another Flair release (v0.4.3), the question is whether we wait for the features you outline or release in the very near future?

@stefan-it
Member Author

I'm going to work a bit on it until next week. I just found a great suggestion/improvement in the pytorch-transformers repo (see issue here), so I would make the following code changes:

  • Currently, a lot of duplicate code is used for implementing at least 5 different models.
  • Duplicated parts: the pooling operation for subword-based architectures, tokenization and layer concatenation.

So it would be better to have a kind of generic base class :)
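
A rough sketch of what such a generic base class could look like; all names here are hypothetical and only illustrate factoring out subword pooling and layer concatenation (the tokenizer/model specifics would live in the subclasses):

from abc import ABC, abstractmethod
from typing import List, Tuple

import torch


class TransformerEmbeddingsBase(ABC):
    """Hypothetical base class holding the logic shared by all Transformer embeddings."""

    def __init__(self, layers: str = "1", pooling_operation: str = "first"):
        self.layer_indexes = [int(layer) for layer in layers.split(",")]
        self.pooling_operation = pooling_operation

    @abstractmethod
    def _hidden_states_and_spans(self, tokens: List[str]) -> Tuple[List[torch.Tensor], List[Tuple[int, int]]]:
        """Model-specific part: run tokenizer/model over the whole sentence and return
        (per-layer hidden states, subword span of each original token)."""

    def embed_tokens(self, tokens: List[str]) -> List[torch.Tensor]:
        hidden_states, spans = self._hidden_states_and_spans(tokens)
        embeddings = []
        for start, end in spans:
            per_layer = []
            for layer in self.layer_indexes:
                subwords = hidden_states[layer][start:end]  # (num_subwords, hidden_size)
                if self.pooling_operation == "first":
                    pooled = subwords[0]
                elif self.pooling_operation == "last":
                    pooled = subwords[-1]
                elif self.pooling_operation == "first_last":
                    pooled = torch.cat([subwords[0], subwords[-1]])
                else:  # "mean"
                    pooled = subwords.mean(dim=0)
                per_layer.append(pooled)
            embeddings.append(torch.cat(per_layer))  # concatenate the selected layers
        return embeddings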

@stefan-it
Member Author

PR for the "first phase" is coming soon.

I also added support for RoBERTa, see the RoBERTa: A Robustly Optimized BERT Pretraining Approach paper for more information.

RoBERTa is currently not integrated into pytorch-transformers, so I wrote an embedding class around the torch.hub module. I tested the model; here are some results for the base model (a short loading sketch follows the table):

Layer RoBERTa F-Score (0.1 downsampled CoNLL-2003 NER corpus)
1 75.16
2 80.29
3 81.01
4 80.41
5 80.52
6 80.23
7 81.31
8 84.25
9 80.12
10 78.09
11 76.50
12 81.12
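
For reference, a minimal sketch of the torch.hub loading that such a wrapper builds on (this follows the fairseq RoBERTa hub interface; the actual RoBERTaEmbeddings class adds the flair-specific logic on top):

import torch

# downloads roberta.base via torch.hub on first use
roberta = torch.hub.load("pytorch/fairseq", "roberta.base")
roberta.eval()

tokens = roberta.encode("Berlin and Munich are nice cities .")
# return_all_hiddens=True yields one tensor per layer, so individual layers can be selected
all_layers = roberta.extract_features(tokens, return_all_hiddens=True)
print(len(all_layers), all_layers[0].shape)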

@stefan-it
Member Author

stefan-it commented Aug 2, 2019

The variance for a 0.1 downsampled CoNLL corpus is very high. I did some experiments for RoBERTa in order to compare different pooling operations for subwords using scalar mix:

Pooling operation Run 1 Run 2 Run 3 Run 4 Avg.
first_last 76.12 76.56 79.12 79.17 77.74
first 78.91 81.41 76.79 80.00 79.28

I also used the complete CoNLL corpus (one run) with scalar mix:

Pooling operation F-Score
first_last 86.97
first 87.40

BERT (base) achieves 92.2 (reported in their paper). Now I'm going to run some experiments with BERT (base) and scalar mix to have a better comparison :)

Update: BERT (base) achieves an F-Score of 91.38 on the full CoNLL corpus with scalar mix.

stefan-it added a commit that referenced this issue Aug 4, 2019
The following Transformer-based architectures are now supported
via pytorch-transformers:

- BertEmbeddings (Updated API)
- OpenAIGPTEmbeddings (Updated API, various fixes)
- OpenAIGPT2Embeddings (New)
- TransformerXLEmbeddings (Updated API, tokenization fixes)
- XLNetEmbeddings (New)
- XLMEmbeddings (New)
- RoBERTaEmbeddings (New, via torch.hub module)

It is also possible to use a scalar mix of specified layers from the
Transformer-based models. Scalar mix is proposed by Liu et al. (2019).
The scalar mix implementation is copied and slightly modified from
the allennlp repo (Apache 2.0 license).
@stefan-it
Member Author

stefan-it commented Aug 4, 2019

An update:

I've re-written the complete tokenization logic for all Transformer-based embeddings (except BERT). In the old version, I passed each token of a sentence into the model separately (which is not very efficient and causes major problems with the GPT-2 tokenizer).

The latest version passes the complete sentence into the model. The embeddings for subwords are then aligned back to each "Flair" token in a sentence (I wrote some unit tests for that...).
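
A small sketch of the kind of alignment involved: the whole sentence is run through the subword tokenizer once, and consecutive subwords are then assigned back to each original token (hypothetical helper; real tokenizers have more edge cases than this greedy matching handles):

from typing import List, Tuple


def align_subwords_to_tokens(tokens: List[str], subwords: List[str]) -> List[Tuple[int, int]]:
    """Greedy alignment: assign consecutive subwords to each original token until
    the token's surface string is covered. Continuation markers are stripped."""
    spans, idx = [], 0
    for token in tokens:
        start, covered = idx, ""
        while len(covered) < len(token) and idx < len(subwords):
            covered += subwords[idx].lstrip("▁Ġ#")  # strip SentencePiece/BPE/WordPiece markers
            idx += 1
        spans.append((start, idx))
    return spans


# e.g. XLNet-style SentencePiece pieces for "Berlin and Munich":
print(align_subwords_to_tokens(["Berlin", "and", "Munich"], ["▁Berlin", "▁and", "▁Mun", "ich"]))
# -> [(0, 1), (1, 2), (2, 4)]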

I also added the code for scalar mix from the allennlp repo.

Here are some experiments with the new implementation on a downsampled (0.1) CoNLL corpus for NER. The F-Score is measured and averaged over 4 runs; scalar mix is used:

Model Pooling # 1 # 2 # 3 # 4 Avg.
RoBERTa (base) first 86.34 86.30 90.21 87.28 87.53
GPT-1 first_last 75.21 77.53 74.90 76.33 75.99
GPT-1 first 74.31 75.42 74.01 76.56 75.08
GPT-2 (medium) first_last 85.18 76.86 79.93 81.02 80.75
GPT-2 (medium) first 78.88 79.23 80.31 76.80 78.81
XLM (en) first_last 84.65 86.50 84.63 84.97 85.19
XLM (en) first 86.66 88.28 87.55 85.82 87.08
Transformer-XL - 81.03 80.17 78.67 81.34 80.53
XLNet (base) first_last 85.66 88.59 85.74 87.36 86.84
XLNet (base) first 88.81 86.65 86.01 85.72 86.80

I'm currently running experiments on the whole CoNLL corpus. Here are some results (only one run):

Model Pooling Dev Test
BERT (base, cased) first 94.74 91.38
BERT (base, uncased) first 94.61 91.03
BERT (large, cased) first 95.23 91.69
BERT (large, uncased) first 94.78 91.49
BERT (large, cased, whole-word-masking) first 94.88 91.16
BERT (large, uncased, whole-word-masking) first 94.94 91.20
RoBERTa (base) first 95.35 91.51
RoBERTa (large) first 95.83 92.11
RoBERTa (large) mean 96.31 92.31
XLNet (base) first_last 94.56 90.73
XLNet (large) first_last 95.47 91.49
XLNet (large) first 95.14 91.71
XLM (en) first_last 94.31 90.68
XLM (en) first 94.00 90.73
GPT-2 first_last 91.35 87.47
GPT-2 (large) first_last 94.09 90.63

Notice: The feature-based result from the BERT paper is 96.1 (dev) and 92.4 - 92.8 for base and large model (test). But they "include the maximal document context provided by the data". I found an issue in the allennlp repo (here) and a dev score of 95.3 seems to be possible (without using the document context).

But from these preliminary experiments, RoBERTa (/cc @myleott) seems to perform slightly better at the moment :)

@alanakbik
Collaborator

@stefan-it were you using the scalar mix in these experiments on the full CoNLL? Were you always using the default layers as set in the constructor of each class?

@stefan-it
Member Author

I used scalar mix for all layers (incl. word embedding layer, which is located at index 0) on the full CoNLL. E.g. for RoBERTa the init. would be:

emb = RoBERTaEmbeddings(model="roberta.base", layers="0,1,2,3,4,5,6,7,8,9,10,11,12", pooling_operation="first", use_scalar_mix=True)

I'm not sure about the default parameters when using no scalar mix, because there's no literature about that, except BERT (base), where a concat of the last four layers was proposed.

stefan-it added a commit that referenced this issue Aug 7, 2019
@alanakbik
Collaborator

Ah great - ok, I'll run a similar experiment and report numbers. But aside from this, I think we are ready to merge the PR.

@alanakbik
Collaborator

alanakbik commented Aug 7, 2019

BTW here are some results using RoBERTa with default parameters and scalar mix, i.e. instantiated like this:

RoBERTaEmbeddings(use_scalar_mix=True)

Results of three runs:

# 1 # 2 # 3
92.03 92.05 91.96

Using otherwise the exact same parameters as here.

yosipk added a commit that referenced this issue Aug 7, 2019
@stefan-it
Member Author

Documentation for the new PyTorch-Transformers embeddings is coming very soon :)

I'll close this issue now (the PR was merged).

@DecentMakeover

@stefan-it
if I do git pull, will I be able to access these new additions, or do I have to check out GH-873? Thanks

@stefan-it
Member Author

You can just use the latest master branch :) Or install it via:

pip install --upgrade git+https://github.com/zalandoresearch/flair.git

:)

@DecentMakeover

okay thanks !

@DecentMakeover

@stefan-it Even after running pip install --upgrade git+https://github.com/zalandoresearch/flair.git

When I try to import
from flair.embeddings import XLNetEmbeddings

I get ImportError: cannot import name 'XLNetEmbeddings'

@dshaprin

@DecentMakeover You can try again; I installed the latest version of flair and the problem disappeared. The last commit was 5 days ago.

@DecentMakeover

@dshaprin okay, I'll check.
