
Request For Adding ParSeq - text recognition model #1003

Closed
nikokks opened this issue Jul 30, 2022 · 20 comments
Labels: framework: pytorch · framework: tensorflow · module: models · topic: text recognition · type: new feature

nikokks (Contributor) commented Jul 30, 2022

🚀 The feature

Hello,

I mainly use the text detection and text recognition models with your framework.

From what I have seen, the most recent models you propose for text recognition, namely MASTER and SAR, are not yet operational.

However, there is a recent text recognition model with very impressive performance: ParSeq.

Here are the references:
https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/master/README.md
https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/abinet/README.md
https://paperswithcode.com/paper/scene-text-recognition-with-permuted#code
https://github.com/baudm/parseq

Would it be possible to add this text recognition model to the ones you propose?

Thanks a lot for your work!

Motivation, pitch

I'm working with text recognition models, and a recent state-of-the-art model outperforms the others on all test datasets.
I would like to use this model with your framework's pipelines.

Alternatives

No response

Additional context

No response

nikokks added the type: enhancement label on Jul 30, 2022
felixdittrich92 (Contributor) commented Jul 30, 2022

Hi @nikokks 👋 ,
FYI, MASTER and SAR are fixed in both frameworks on the main branch and will be released soon with 0.5.2.

I agree with the ParSeq addition. At first I had ViTSTR in mind, but yes, I think we can switch directly to ParSeq instead 👍

Would you maybe be interested in opening a PR for this model? We would be happy to help!
Otherwise we could take it on in the near future; we definitely plan to add more models, and I totally agree that ParSeq should be one of them :)

Do you have experience with / have you tested the model's latency on CPU? That would be interesting to see (a rough way to measure it is sketched below).
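
For reference, a quick check could look like the sketch below. The torch.hub entry point follows the parseq README; the 32x128 input size and the run counts are only assumptions for illustration, not agreed benchmark settings:

```python
# Rough CPU latency check for the parseq checkpoint, loaded via the torch.hub
# entry point shown in the baudm/parseq README. Input size and iteration
# counts are illustrative assumptions.
import time
import torch

model = torch.hub.load("baudm/parseq", "parseq", pretrained=True).eval()
dummy = torch.rand(1, 3, 32, 128)  # one 32x128 RGB crop

with torch.no_grad():
    for _ in range(5):               # warm-up runs
        model(dummy)
    start = time.perf_counter()
    for _ in range(20):
        model(dummy)
    avg_ms = (time.perf_counter() - start) / 20 * 1000
print(f"average latency per crop: {avg_ms:.1f} ms")
```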

felixdittrich92 added this to the 0.6.0 milestone on Jul 30, 2022
felixdittrich92 added the module: models, framework: pytorch, framework: tensorflow, and topic: text recognition labels on Jul 30, 2022
frgfm added the type: new feature label and removed the type: enhancement label on Aug 1, 2022
frgfm (Collaborator) commented Aug 1, 2022

I agree that this could be a good candidate for the new text recognition models in 0.6.0 :)
Perhaps we should open a tracker issue for all new model requests that we would stage for 0.6.0, or directly link them in the release tracker?

felixdittrich92 (Contributor) commented:

@frgfm I would say a separate issue where we can track all requested model additions (split into detection / recognition / TF / PT, with paper/repo links) and link this issue in the release tracker. WDYT? Would you like to open it? :)

frgfm (Collaborator) commented Aug 1, 2022

@felixdittrich92 Done :)

felixdittrich92 (Contributor) commented:

@nikokks @frgfm Do you agree that we can close this ticket? It should be fine if we track it in the mentioned issue :)

frgfm (Collaborator) commented Aug 1, 2022

I think we should keep this open:

  • the PR will close this issue, and the release tracker will get the checkbox ticked

So no need to close it, and that will notify @nikokks when this gets resolved, which I guess is of interest to him :)

felixdittrich92 (Contributor) commented:

👍

nikokks (Contributor, Author) commented Aug 1, 2022

I am inspecting baudm's ParSeq code (https://github.com/baudm/parseq).
I think the code might not be too complicated to integrate.

On my side, I managed to connect your ocr_predictor and successfully plug in baudm's reco_predictor. The performance is not that good for French documents like yours at the moment => it needs to be fine-tuned on your secret data 😉
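
Roughly, the glue looked like the sketch below. The torch.hub entry point and the tokenizer.decode call come from the parseq README; the preprocessing values (32x128 crops normalized to [-1, 1]) and the wrapper function are my own assumptions, not doctr API:

```python
# Sketch of running baudm's parseq checkpoint on the word crops produced by
# the detection stage. The hub entry point and tokenizer API follow the parseq
# README; the preprocessing and the wrapper function are illustrative only.
import torch
from torchvision import transforms

parseq = torch.hub.load("baudm/parseq", "parseq", pretrained=True).eval()

# parseq expects 32x128 RGB crops normalized to [-1, 1] (assumption based on its defaults)
preprocess = transforms.Compose([
    transforms.Resize((32, 128)),
    transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),
])

def recognize_crops(crops):
    """Run parseq on a list of PIL word crops coming from the detector."""
    batch = torch.stack([preprocess(crop.convert("RGB")) for crop in crops])
    with torch.no_grad():
        logits = parseq(batch)
    probs = logits.softmax(-1)
    labels, confidences = parseq.tokenizer.decode(probs)
    return list(zip(labels, confidences))
```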

I will have several questions about implementation choices and integration in doctr:

  • can we integrate libraries like timm?
  • do I have to do it in TF too (Torch only, if possible)?
  • can I add an additional utils.py in the future ParSeq folder, if that is not against your conventions?
  • etc.

I will clarify my questions next weekend 😀

You can close this issue.

felixdittrich92 (Contributor) commented:

Hi @nikokks 👋 ,

let's keep it open for further conversation about ParSeq 👍

About your points:

  • timm is a great library, but the libraries that use it mostly pin it to a fixed version specifier for maintenance reasons, and it would be another extra dependency for doctr. So I would suggest implementing ViT from scratch inside the classification models and using it as the feature extractor (@frgfm we should move the transformer components from models/recognition to models so that we can reuse the modules, wdyt?). See the rough sketch at the end of this comment.
  • it would be great to implement it in both frameworks, but it's also fine if you only want to focus on PT ... someone else could do the translation 👍
  • if it is stuff that is only ParSeq-related and does not depend on PT or TF, I would say base.py is the right place (but let's take a look when we see it in a PR 😅)

We are definitely happy to help. I would say, when you are ready, open a PR (starting with the classification ViT integration) and we iterate on it, wdyt?
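
To illustrate the from-scratch ViT idea from the first point, here is a very rough sketch in plain PyTorch. Shapes, names, and hyperparameters are placeholders, not doctr code:

```python
# Minimal ViT-style feature extractor in plain PyTorch. This only illustrates
# the idea of a from-scratch backbone; names and sizes are placeholders.
import torch
from torch import nn

class PatchEmbedding(nn.Module):
    """Split the image into patches and project them to the embedding dimension."""
    def __init__(self, img_size=(32, 128), patch_size=4, in_ch=3, d_model=384):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, d_model, kernel_size=patch_size, stride=patch_size)
        num_patches = (img_size[0] // patch_size) * (img_size[1] // patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, d_model))

    def forward(self, x):
        x = self.proj(x)                  # (B, d_model, H/ps, W/ps)
        x = x.flatten(2).transpose(1, 2)  # (B, num_patches, d_model)
        return x + self.pos_embed

class ViTBackbone(nn.Module):
    """Patch embedding followed by a stack of standard transformer encoder layers."""
    def __init__(self, d_model=384, num_layers=12, num_heads=6):
        super().__init__()
        self.patch_embed = PatchEmbedding(d_model=d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, num_heads, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, x):
        return self.encoder(self.patch_embed(x))  # (B, num_patches, d_model)

# e.g. ViTBackbone()(torch.rand(2, 3, 32, 128)).shape -> torch.Size([2, 256, 384])
```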

nikokks (Contributor, Author) commented Aug 6, 2022

Hi,

OK for timm.

Another question: can we add 'pytorch-lightning~=1.6.5' to requirements-pt.txt?

felixdittrich92 (Contributor) commented:

Hi @nikokks 👋 ,
Why do you think we need this? :)
We should implement all models in plain PyTorch / TensorFlow, without any wrappers.
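
To be concrete about what "plain" means here, a training step should not need anything beyond torch itself, roughly like the sketch below. It assumes the model returns a dict with a "loss" entry when targets are given; model, loader, and optimizer names are placeholders:

```python
# Sketch of a plain-PyTorch training step, i.e. no Lightning wrapper.
# Assumes the model returns a dict with a "loss" entry when targets are given;
# model, loader, and optimizer are placeholders.
import torch

def train_one_epoch(model, loader, optimizer, device="cpu"):
    model.train()
    for images, targets in loader:
        images = images.to(device)
        optimizer.zero_grad()
        loss = model(images, targets)["loss"]
        loss.backward()
        optimizer.step()
```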

nikokks (Contributor, Author) commented Aug 6, 2022

OK, that sounds good to me :)

I have added a ParSeq class on my fork.
Now I need to match all the args of each method between your wrapper and the ParSeq model class =) (the most difficult part).
I'll do it in the next few days or next weekend.

felixdittrich92 (Contributor) commented:

@nikokks I would suggest the following steps (each should be its own PR):

  • move models/recognition/transformer to models/modules/transformer (to reuse the implementations; we need this for many more models, so it should be more global, @frgfm wdyt?)
  • implement ViT as a classification model
  • implement the ParSeq decoder side, which uses ViT as the feature extractor (see the sketch after this list)
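
For the last point, the wiring could look roughly like this. The sketch only shows a standard transformer decoder cross-attending over the ViT features; ParSeq's permuted autoregressive training, attention masks, and iterative refinement are deliberately left out, and all names are placeholders:

```python
# Rough wiring of a recognition head on top of ViT features. This is a plain
# transformer decoder for illustration; ParSeq's permutation-based training,
# masking, and iterative refinement are not shown here.
import torch
from torch import nn

class RecognitionHead(nn.Module):
    def __init__(self, vocab_size, d_model=384, num_heads=6, max_len=32):
        super().__init__()
        self.char_embed = nn.Embedding(vocab_size, d_model)
        self.pos_embed = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerDecoderLayer(
            d_model, num_heads, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True,
        )
        self.decoder = nn.TransformerDecoder(layer, num_layers=1)
        self.classifier = nn.Linear(d_model, vocab_size)

    def forward(self, features, tokens):
        # features: (B, num_patches, d_model) from the ViT backbone
        # tokens:   (B, seq_len) previously decoded character ids
        tgt = self.char_embed(tokens) + self.pos_embed[:, : tokens.size(1)]
        out = self.decoder(tgt, memory=features)
        return self.classifier(out)  # (B, seq_len, vocab_size)
```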

frgfm (Collaborator) commented Aug 8, 2022

  • move models/recognition/transformer to models/modules/transformer (to reuse the implementations; we need this for many more models, so it should be more global, @frgfm wdyt?)

I agree 👍

  • implement ViT as a classification model

Yup, but let's give credit to the rightful contributors / sources of inspiration where relevant!

felixdittrich92 (Contributor) commented:

@nikokks Now you can reuse the already implemented transformer parts for ViT 👍

felixdittrich92 (Contributor) commented:

Hi @nikokks, a short update: I have not forgotten this. I will (hopefully) start with ViTSTR next week; then it should also be easy to implement the decoder from ParSeq 👍

felixdittrich92 (Contributor) commented:

Hi @nikokks,
are you still interested in adding ParSeq? :)
With #1063, ViT has finally found its way into doctr as a backbone.
And #1055 will be updated soon (it could also be used as a template for ParSeq).

felixdittrich92 modified the milestones: 0.6.0 → 0.7.0 on Sep 26, 2022
nikokks (Contributor, Author) commented May 29, 2023

Hello, I am currently implementing ParSeq.
It is now working, with quite good predictions :)
Do you have any advice for making a good pull request?
Best

felixdittrich92 (Contributor) commented May 29, 2023

Hey @nikokks 👋 ,

You can take a look at https://github.com/mindee/doctr/pull/1055/files,
which I think is a good template (implementation, tests, docs) for implementing ParSeq in doctr :)
Otherwise, open a PR and we will guide you step by step 👍

PS: If you only have the PT implementation, that's fine; we can port it to TF later :)

felixdittrich92 (Contributor) commented:

Finished once #1227 and #1228 are merged :)
