ONNX compatible models #2640

Closed
helpmefindaname opened this issue Feb 20, 2022 · 16 comments

Labels
wontfix This will not be worked on

Comments

@helpmefindaname
Collaborator

helpmefindaname commented Feb 20, 2022


Is your feature/enhancement request related to a problem? Please describe.
ONNX support is a frequently requested feature; several issues mention it (#2625, #2451, #2317, #1936, #1423, #999),
so I think there is a strong desire in the community for it.
I suppose the usual ONNX compatibility would also make the models compatible with torch.jit (#2528) or AWS Neuron (#2443).

ONNX provides large gains in production readiness: it creates a static computational graph which can be quantized and optimized for specific hardware, see https://onnxruntime.ai/docs/performance/tune-performance.html (it claims up to 17x speedups).
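For illustration, here is a minimal sketch of what running such an exported model with ONNX Runtime could look like. The file name `tagger.onnx` and the input/output names are assumptions, not an existing Flair API:

```python
import numpy as np
import onnxruntime as ort

# Load a (hypothetical) exported tagger and run it on CPU.
session = ort.InferenceSession("tagger.onnx", providers=["CPUExecutionProvider"])

# Dummy inputs matching the assumed export signature:
# a padded embedding batch plus the true sentence lengths.
sentence_tensor = np.random.randn(2, 2, 64).astype(np.float32)
lengths = np.array([2, 1], dtype=np.int64)

(scores,) = session.run(None, {"sentence_tensor": sentence_tensor, "lengths": lengths})
print(scores.shape)
```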

Describe the solution you'd like
I'd suggest an iterative progression, as multiple architecture changes are required:

  1. Split the forward/forward_pass methods, such that all models have a method _prepare_tensors which converts all DataPoints to tensors, and a forward which takes in tensors and outputs tensors (e.g. for the SequenceTagger, forward has the signature def forward(self, sentence_tensor: torch.Tensor, lengths: torch.LongTensor) and returns a single tensor scores; see the sketch after this list).
     This change allows conversion to ONNX models; however, the logic (like decoding CRF scores, filling in sentence results, extracting tensors) won't be covered. Also, embeddings won't be part of the ONNX model.
  2. Create the same forward/_prepare_tensors architecture for embeddings, so that those can be converted too.
     This would allow converting embeddings to ONNX models, but again without the logic.
  3. Change the architecture so that, for both embeddings and models, the logic part (creating inputs, adding outputs to data points) and the PyTorch part are split, such that the PyTorch part can be replaced by a converted ONNX model.
  4. Create an end-to-end model wrapper, so that both the embeddings and the model can be converted to a single ONNX model and used as such.
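To make step 1 concrete, here is a minimal sketch of the proposed split on a toy tagger; the class and its internals are invented stand-ins, not Flair's actual implementation:

```python
from typing import List, Tuple

import torch


class ToySequenceTagger(torch.nn.Module):
    """Toy stand-in for a Flair model, split as proposed above."""

    def __init__(self, embedding_dim: int = 64, hidden_size: int = 128, num_tags: int = 17):
        super().__init__()
        self.embedding_dim = embedding_dim
        self.rnn = torch.nn.LSTM(embedding_dim, hidden_size, batch_first=True)
        self.linear = torch.nn.Linear(hidden_size, num_tags)

    def _prepare_tensors(self, sentences: List[List[str]]) -> Tuple[torch.Tensor, torch.LongTensor]:
        # All data-point handling lives here: embedding and padding.
        lengths = torch.LongTensor([len(s) for s in sentences])
        sentence_tensor = torch.zeros(len(sentences), int(lengths.max()), self.embedding_dim)
        # (a real implementation would fill this with token embeddings)
        return sentence_tensor, lengths

    def forward(self, sentence_tensor: torch.Tensor, lengths: torch.LongTensor) -> torch.Tensor:
        # Pure tensor-in / tensor-out, so the graph can be traced for ONNX.
        features, _ = self.rnn(sentence_tensor)
        scores = self.linear(features)
        # Mask out padding positions using the lengths tensor.
        mask = torch.arange(sentence_tensor.size(1)).unsqueeze(0) < lengths.unsqueeze(1)
        return scores * mask.unsqueeze(-1)
```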

Note that this would be four different PRs, all of them probably very large, and each should be tested thoroughly before moving on to the next one.
I would offer to do the first one and then see how much effort this is / how much time I have for it.

@kieron-guinamard-privitar

This would be very useful. Do you have any idea how large a piece of work this is (my gut feeling is: very)?

I can see if we can help with some of the work - I'll be honest, this wouldn't be my own speciality.

@helpmefindaname
Collaborator Author

The first part is almost finished: #2643 is ready for review.

That one was surprisingly straightforward: first think about how to refactor one model, then apply the same to all the other models (as it is mainly the same).
Only the lemmatization model (encoder-decoder architecture) has increased complexity.

The hardest part is deciding what kind of refactoring to apply; there it might already be helpful just to discuss/brainstorm how to do it.

I have some thoughts on the open tasks:

  1. The TransformersEmbeddings will likely be a bigger piece, maybe the flair (pooled) embeddings too; one would convert the lengths and indices to LongTensors to ensure everything is convertible.
     Also, I think it would make sense to change the architecture so that the Sentence stores the full embedding tensor for the whole sequence, instead of the tokens storing their individual embeddings. That way, the forward method of the embeddings could return the already padded sequences, and embeddings.embed could return the raw tensors.
     We could make _prepare_tensors return a dictionary {embedding_name: tensor}, so stacked embeddings have an easy way to handle them separately per embedding (see the sketch after this list).
  2. This one troubles me a lot: the new architecture should be such that you don't need to load the PyTorch weights if you use the ONNX model, and vice versa. This could be done by splitting the class into two classes (logic vs. model); however, it should also remain easy to implement new models, and splitting them up might make that too complicated.
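A hedged sketch of the dictionary idea from point 1, using invented toy classes rather than Flair's actual embedding API:

```python
from typing import Dict, List

import torch


class ToyEmbedding:
    """Illustrative sub-embedding; the names and dims below are made up."""

    def __init__(self, name: str, dim: int):
        self.name = name
        self.dim = dim

    def _prepare_tensors(self, sentences: List[List[str]]) -> Dict[str, torch.Tensor]:
        max_len = max(len(s) for s in sentences)
        # Dummy padded tensor; a real embedding would compute values here.
        return {self.name: torch.zeros(len(sentences), max_len, self.dim)}


class ToyStackedEmbeddings:
    def __init__(self, embeddings: List[ToyEmbedding]):
        self.embeddings = embeddings

    def _prepare_tensors(self, sentences: List[List[str]]) -> Dict[str, torch.Tensor]:
        # Merge the per-embedding dictionaries, keyed by embedding name,
        # so each tensor can later be routed to the right sub-module.
        tensors: Dict[str, torch.Tensor] = {}
        for embedding in self.embeddings:
            tensors.update(embedding._prepare_tensors(sentences))
        return tensors


stacked = ToyStackedEmbeddings([ToyEmbedding("glove", 100), ToyEmbedding("flair-forward", 2048)])
tensors = stacked._prepare_tensors([["Pla", "Gon"], ["Xu"]])
# -> {"glove": (2, 2, 100), "flair-forward": (2, 2, 2048)}
```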

@aytugkaya

Is this code refactoring only for making Flair models compatible with ONNX?
Or is it possible to quantize Flair models without using ONNX, before the code is refactored?

@helpmefindaname
Collaborator Author

As long as you are not using the flair embeddings with flair version < 0.11, you can apply dynamic quantization to all flair models that run on CPU. However, you cannot store them, due to the way embeddings are stored.
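For reference, a minimal sketch of such dynamic quantization, assuming a CPU-only model; the model name here is just an example:

```python
import torch
from flair.models import SequenceTagger

# Load a tagger and quantize its linear and LSTM layers to int8 for CPU inference.
tagger = SequenceTagger.load("flair/upos-multi")
tagger.eval()

quantized = torch.quantization.quantize_dynamic(
    tagger, {torch.nn.Linear, torch.nn.LSTM}, dtype=torch.qint8
)

# `quantized` can be used for prediction, but (as noted above) it cannot
# be saved, due to the way embeddings are stored.
```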

@stale

stale bot commented Jul 30, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Jul 30, 2022
@helpmefindaname
Collaborator Author

Ping to revive this issue, as it isn't dead.

@alanakbik alanakbik removed the wontfix This will not be worked on label Aug 10, 2022
alanakbik added a commit that referenced this issue Aug 18, 2022
@edoust

edoust commented Sep 11, 2022

@helpmefindaname Were you able to finish this? I exported the German flair model to a single ONNX model ~2 years ago, but I need the English version too. Did you make any further progress on this?

@edoust

edoust commented Sep 11, 2022

I created a script for ONNX export of the de-pos model; it runs just fine with the ONNX Runtime on .NET. I will test whether I can also get an export for Core ML to work. In case anyone needs it, you can find it here: https://github.com/edoust/flair/commits/master

@edoust

edoust commented Sep 13, 2022

I created single-file ONNX models from de-pos, flair/upos-multi, and flair/upos-multi-fast that work with variable batch and sentence sizes.

Basically, it first computes the forward and backward embeddings and selects the right embedding tensors from the total embeddings using the forwardIndices and backwardIndices. It then concatenates the selected tensors and "stripes" them into the final sentence_tensor using the striping indices:

| Input | Shape | Example Shape | Example | Description |
|---|---|---|---|---|
| forward | characters × sentences | (9,2) | [mapped with char table] | The mapped character input sequence for the forward language model |
| forwardIndices | total_tokens | (4) | [6,14,5,17] | The indices of the embeddings to take from the full embedding tensor |
| backward | characters × sentences | (9,2) | [mapped with char table] | The mapped character input sequence for the backward language model |
| backwardIndices | total_tokens | (4) | [14,6,5,17] | The indices of the embeddings to take from the full embedding tensor |
| striping | total_embeddings | (8) | [0,4,1,5,2,6,3,7] | Used to generate the sentence tensor from the concatenated forward and backward embedding tensors |
| characterLengths | sentences | (2) | [9,4] | Required for keeping dynamic shapes right |
| lengths | sentences | (2) | [2,1] | Required for keeping dynamic shapes right |

The above example values are given for the two short sentences Pla Gon and Xu.
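For clarity, a small PyTorch sketch of how these index inputs could be applied; the tensor names mirror the table above, and the embedding width is made up:

```python
import torch

embed_dim = 4
forward_total = torch.randn(18, embed_dim)   # flattened forward LM embeddings
backward_total = torch.randn(18, embed_dim)  # flattened backward LM embeddings

# Select the token embeddings using the index inputs from the table.
fwd = forward_total.index_select(0, torch.tensor([6, 14, 5, 17]))
bwd = backward_total.index_select(0, torch.tensor([14, 6, 5, 17]))

# Striping interleaves the two sets row-wise, then reshapes so each token
# row holds [forward ; backward].
stacked = torch.cat([fwd, bwd], dim=0)                    # (8, embed_dim)
striping = torch.tensor([0, 4, 1, 5, 2, 6, 3, 7])
striped = stacked.index_select(0, striping).reshape(4, 2 * embed_dim)

# As noted later in the thread, concatenating on the embedding dimension
# gives the same result without the striping indices.
assert torch.equal(striped, torch.cat([fwd, bwd], dim=1))
```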

@alanakbik @helpmefindaname
Does this make sense to you, or is there an easier/better way to achieve a single ONNX model export that includes the embeddings? Did I miss anything? Any feedback would be appreciated.

This is the visual model representation: [image: de-pos-onnx]

@helpmefindaname
Collaborator Author

Hi @edoust,
sorry for the late reply.

I think it will take a long time to finish this. So far, the models can be exported without embeddings, and the transformer embeddings themselves can be exported. The way I want to integrate the ONNX export is that you can use torch.onnx.export and then use the exported model within the flair library. For this, quite some architectural changes are required, and I am currently not sure how best to handle them.
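As a sketch of that intended export path, assuming the toy split model from the first comment (file name, input names, and shapes are illustrative):

```python
import torch

model = ToySequenceTagger()  # the toy split model sketched above
model.eval()

# Prepare tensors outside the graph, then export the tensor-only forward.
sentence_tensor, lengths = model._prepare_tensors([["Pla", "Gon"], ["Xu"]])

torch.onnx.export(
    model,
    (sentence_tensor, lengths),
    "tagger.onnx",
    input_names=["sentence_tensor", "lengths"],
    output_names=["scores"],
    dynamic_axes={
        "sentence_tensor": {0: "batch", 1: "tokens"},
        "lengths": {0: "batch"},
        "scores": {0: "batch", 1: "tokens"},
    },
)
```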

For the use case where you want to export it to another language (and therefore have to recreate the input/output handling code anyway), I would say your script looks quite solid.
The only thing I wonder about is whether the striping is really necessary. Shouldn't it be possible to concatenate the embeddings on the embedding dimension at line master...edoust:flair:master#diff-2cdd6b2846dd6d89526228ebe147fc75f9b0aa7c999593a4ee32db2ae142adfdR74 ?

@edoust

edoust commented Sep 16, 2022

Hi @helpmefindaname

thanks for the reply. You are right, the striping is not necessary, thanks for that :)

Regarding the ONNX export, I think it would be great to have the option to create single-file ONNX model exports from various Flair models (combining embeddings and the tagging model); otherwise it takes a lot of effort to include such a Flair model in any app. Having such an option would make the integration into (native) non-Python apps/services much easier.

@jonashaag

Hi all, I'm interested in this as well, to speed up Flair inference. Do you have any performance measurements for some models? I'd be interested in GPU vs. vanilla CPU vs. CPU with ONNX/TorchScript.

@helpmefindaname
Collaborator Author

Hi @jonashaag,
I did some evaluation for the TransformerEmbeddings in this PR.
Note that the timings depend heavily on your hardware: a cheap CPU will be much slower than a strong one, and the same goes for GPUs. In the end, you have to evaluate it yourself on your own hardware.

@jonashaag

I can’t find any numbers there. Can you please point me to them?

@helpmefindaname
Collaborator Author

Sorry, wrong PR; I meant this one: #2739

@stale

stale bot commented Mar 18, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Mar 18, 2023
@stale stale bot closed this as completed Apr 2, 2023