
Request For Adding ParSeq - text recognition model #1003

Closed
nikokks opened this issue Jul 30, 2022 · 20 comments
Labels: framework: pytorch · framework: tensorflow · module: models · topic: text recognition · type: new feature

nikokks (Contributor) commented Jul 30, 2022

🚀 The feature

Hello,

I mainly use the text detection and text recognition models with your framework.

From what I have seen, the most recent models you propose for text recognition, namely MASTER and SAR, are not yet operational.

However, there is a recent text recognition model with very impressive performance: ParSeq.

Here are the references:
https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/master/README.md
https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/abinet/README.md
https://paperswithcode.com/paper/scene-text-recognition-with-permuted#code
https://github.com/baudm/parseq

Would it be possible to add this text recognition model to the ones you propose?

Thanks a lot for your work!

Motivation, pitch

I'm working with text recognition models, and a recent state-of-the-art model outperforms the others on all test datasets.
I would like to use this model with your framework's pipelines.

Alternatives

No response

Additional context

No response

nikokks added the type: enhancement label on Jul 30, 2022
felixdittrich92 (Contributor) commented Jul 30, 2022

Hi @nikokks 👋 ,
FYI, MASTER and SAR are fixed in both frameworks on the main branch and will be released soon with 0.5.2.

I agree with the ParSeq addition. At first I had ViTSTR in mind, but yes, I think we can switch directly to ParSeq instead 👍

Would you maybe be interested in opening a PR for this model? We would be happy to help!
Otherwise we could take it on in the near future; we definitely plan to add more models, and I totally agree that ParSeq should be one of them :)

Do you have experience with / have you tested the model's latency on CPU? That would be interesting to see (a rough way to measure it is sketched below).
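
For reference, a quick check could look like the sketch below. The torch.hub entry point follows the parseq README; the 32x128 input size and the run counts are only assumptions for illustration, not agreed benchmark settings:

```python
# Rough CPU latency check for the parseq checkpoint, loaded via the torch.hub
# entry point shown in the baudm/parseq README. Input size and iteration
# counts are illustrative assumptions.
import time
import torch

model = torch.hub.load("baudm/parseq", "parseq", pretrained=True).eval()
dummy = torch.rand(1, 3, 32, 128)  # one 32x128 RGB crop

with torch.no_grad():
    for _ in range(5):               # warm-up runs
        model(dummy)
    start = time.perf_counter()
    for _ in range(20):
        model(dummy)
    avg_ms = (time.perf_counter() - start) / 20 * 1000
print(f"average latency per crop: {avg_ms:.1f} ms")
```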

felixdittrich92 added this to the 0.6.0 milestone on Jul 30, 2022
felixdittrich92 added the module: models, framework: pytorch, framework: tensorflow, and topic: text recognition labels on Jul 30, 2022
frgfm added the type: new feature label and removed the type: enhancement label on Aug 1, 2022
frgfm (Collaborator) commented Aug 1, 2022

I agree that this could be a good candidate for the new text recognition models in 0.6.0 :)
Perhaps we should open a tracker issue for all new model requests that we would stage for 0.6.0, or directly link them in the release tracker?

felixdittrich92 (Contributor) commented:

@frgfm I would say a separate issue where we can track all requested model additions (split into detection / recognition / TF / PT, with paper/repo links) and link this issue in the release tracker. WDYT? Would you like to open it? :)

frgfm (Collaborator) commented Aug 1, 2022

@felixdittrich92 Done :)

felixdittrich92 (Contributor) commented:

@nikokks @frgfm Do you agree that we can close this ticket? It should be fine if we track it in the mentioned issue :)

frgfm (Collaborator) commented Aug 1, 2022

I think we should keep this open:

  • the PR will close this issue, and the release tracker will get the checkbox ticked

So no need to close it, and that will notify @nikokks when this gets resolved, which I guess is of interest to him :)

felixdittrich92 (Contributor) commented:

👍

nikokks (Contributor, Author) commented Aug 1, 2022

I am inspecting baudm's ParSeq code (https://github.com/baudm/parseq).
I think the code might not be too complicated to integrate.

On my side, I managed to connect your ocr_predictor and successfully plug in baudm's reco_predictor. The performance is not that good for French documents like yours at the moment => it needs to be fine-tuned on your secret data 😉
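
Roughly, the glue looked like the sketch below. The torch.hub entry point and the tokenizer.decode call come from the parseq README; the preprocessing values (32x128 crops normalized to [-1, 1]) and the wrapper function are my own assumptions, not doctr API:

```python
# Sketch of running baudm's parseq checkpoint on the word crops produced by
# the detection stage. The hub entry point and tokenizer API follow the parseq
# README; the preprocessing and the wrapper function are illustrative only.
import torch
from torchvision import transforms

parseq = torch.hub.load("baudm/parseq", "parseq", pretrained=True).eval()

# parseq expects 32x128 RGB crops normalized to [-1, 1] (assumption based on its defaults)
preprocess = transforms.Compose([
    transforms.Resize((32, 128)),
    transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),
])

def recognize_crops(crops):
    """Run parseq on a list of PIL word crops coming from the detector."""
    batch = torch.stack([preprocess(crop.convert("RGB")) for crop in crops])
    with torch.no_grad():
        logits = parseq(batch)
    probs = logits.softmax(-1)
    labels, confidences = parseq.tokenizer.decode(probs)
    return list(zip(labels, confidences))
```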

I will have several questions about implementation choices and integration in doctr:

  • can we integrate libraries like timm?
  • do I have to do it in TF too (Torch only, if possible)?
  • can I add an additional utils.py in the future ParSeq folder, if that is not against your conventions?
  • etc.

I will clarify my questions next weekend 😀

You can close this issue.

felixdittrich92 (Contributor) commented:

Hi @nikokks 👋 ,

let's keep it open for further conversation about ParSeq 👍

About your points:

  • timm is a great library, but the libraries that use it mostly pin it to a fixed version specifier for maintenance reasons, and it would be another extra dependency for doctr. So I would suggest implementing ViT from scratch inside the classification models and using it as the feature extractor (@frgfm we should move the transformer components from models/recognition to models so that we can reuse the modules, wdyt?). See the rough sketch at the end of this comment.
  • it would be great to implement it in both frameworks, but it's also fine if you only want to focus on PT ... someone else could do the translation 👍
  • if it is stuff that is only ParSeq-related and does not depend on PT or TF, I would say base.py is the right place (but let's take a look when we see it in a PR 😅)

We are definitely happy to help. I would say, when you are ready, open a PR (starting with the classification ViT integration) and we iterate on it, wdyt?
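
To illustrate the from-scratch ViT idea from the first point, here is a very rough sketch in plain PyTorch. Shapes, names, and hyperparameters are placeholders, not doctr code:

```python
# Minimal ViT-style feature extractor in plain PyTorch. This only illustrates
# the idea of a from-scratch backbone; names and sizes are placeholders.
import torch
from torch import nn

class PatchEmbedding(nn.Module):
    """Split the image into patches and project them to the embedding dimension."""
    def __init__(self, img_size=(32, 128), patch_size=4, in_ch=3, d_model=384):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, d_model, kernel_size=patch_size, stride=patch_size)
        num_patches = (img_size[0] // patch_size) * (img_size[1] // patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, d_model))

    def forward(self, x):
        x = self.proj(x)                  # (B, d_model, H/ps, W/ps)
        x = x.flatten(2).transpose(1, 2)  # (B, num_patches, d_model)
        return x + self.pos_embed

class ViTBackbone(nn.Module):
    """Patch embedding followed by a stack of standard transformer encoder layers."""
    def __init__(self, d_model=384, num_layers=12, num_heads=6):
        super().__init__()
        self.patch_embed = PatchEmbedding(d_model=d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, num_heads, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers)

    def forward(self, x):
        return self.encoder(self.patch_embed(x))  # (B, num_patches, d_model)

# e.g. ViTBackbone()(torch.rand(2, 3, 32, 128)).shape -> torch.Size([2, 256, 384])
```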

nikokks (Contributor, Author) commented Aug 6, 2022

Hi,

OK for timm.

Another question: can we add 'pytorch-lightning~=1.6.5' to requirements-pt.txt?

felixdittrich92 (Contributor) commented:

Hi @nikokks 👋 ,
Why do you think we need this? :)
We should implement all models in plain PyTorch / TensorFlow, without any wrappers.
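
To be concrete about what "plain" means here, a training step should not need anything beyond torch itself, roughly like the sketch below. It assumes the model returns a dict with a "loss" entry when targets are given; model, loader, and optimizer names are placeholders:

```python
# Sketch of a plain-PyTorch training step, i.e. no Lightning wrapper.
# Assumes the model returns a dict with a "loss" entry when targets are given;
# model, loader, and optimizer are placeholders.
import torch

def train_one_epoch(model, loader, optimizer, device="cpu"):
    model.train()
    for images, targets in loader:
        images = images.to(device)
        optimizer.zero_grad()
        loss = model(images, targets)["loss"]
        loss.backward()
        optimizer.step()
```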

nikokks (Contributor, Author) commented Aug 6, 2022

OK, that sounds good to me :)

I have added a ParSeq class on my fork.
Now I need to match all the args of each method between your wrapper and the ParSeq model class =) (the most difficult part).
I'll do it in the next few days or next weekend.

felixdittrich92 (Contributor) commented:

@nikokks I would suggest the following steps (each should be its own PR):

  • move models/recognition/transformer to models/modules/transformer (to reuse the implementations; we need this for many more models, so it should be more global, @frgfm wdyt?)
  • implement ViT as a classification model
  • implement the ParSeq decoder side, which uses ViT as the feature extractor (see the sketch after this list)
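
For the last point, the wiring could look roughly like this. The sketch only shows a standard transformer decoder cross-attending over the ViT features; ParSeq's permuted autoregressive training, attention masks, and iterative refinement are deliberately left out, and all names are placeholders:

```python
# Rough wiring of a recognition head on top of ViT features. This is a plain
# transformer decoder for illustration; ParSeq's permutation-based training,
# masking, and iterative refinement are not shown here.
import torch
from torch import nn

class RecognitionHead(nn.Module):
    def __init__(self, vocab_size, d_model=384, num_heads=6, max_len=32):
        super().__init__()
        self.char_embed = nn.Embedding(vocab_size, d_model)
        self.pos_embed = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerDecoderLayer(
            d_model, num_heads, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True,
        )
        self.decoder = nn.TransformerDecoder(layer, num_layers=1)
        self.classifier = nn.Linear(d_model, vocab_size)

    def forward(self, features, tokens):
        # features: (B, num_patches, d_model) from the ViT backbone
        # tokens:   (B, seq_len) previously decoded character ids
        tgt = self.char_embed(tokens) + self.pos_embed[:, : tokens.size(1)]
        out = self.decoder(tgt, memory=features)
        return self.classifier(out)  # (B, seq_len, vocab_size)
```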

frgfm (Collaborator) commented Aug 8, 2022

  • move models/recognition/transformer to models/modules/transformer (to reuse the implementations; we need this for many more models, so it should be more global, @frgfm wdyt?)

I agree 👍

  • implement ViT as a classification model

Yup, but let's give credit to the rightful contributors / sources of inspiration where relevant!

felixdittrich92 (Contributor) commented:

@nikokks Now you can reuse the already implemented transformer parts for ViT 👍

felixdittrich92 (Contributor) commented:

Hi @nikokks, a short update: I have not forgotten this. I will (hopefully) start with ViTSTR next week; then it should also be easy to implement the decoder from ParSeq 👍

felixdittrich92 (Contributor) commented:

Hi @nikokks,
are you still interested in adding ParSeq? :)
With #1063, ViT has finally found its way into doctr as a backbone.
And #1055 will be updated soon (it could also be used as a template for ParSeq).

felixdittrich92 modified the milestones: 0.6.0 → 0.7.0 on Sep 26, 2022
nikokks (Contributor, Author) commented May 29, 2023

Hello, I am currently implementing ParSeq.
It is now working, with quite good predictions :)
Do you have any advice for making a good pull request?
Best

felixdittrich92 (Contributor) commented May 29, 2023

Hey @nikokks 👋 ,

You can take a look at https://github.com/mindee/doctr/pull/1055/files,
which I think is a good template (implementation, tests, docs) for implementing ParSeq in doctr :)
Otherwise, open a PR and we will guide you step by step 👍

PS: If you only have the PT implementation, that's fine; we can port it to TF later :)

felixdittrich92 (Contributor) commented:

Finished once #1227 and #1228 are merged :)
