Request For Adding ParSeq - text recognition model #1003
Comments
Hi @nikokks 👋, I agree with adding ParSeq. I originally had ViTSTR in mind, but I think we can switch directly to ParSeq instead 👍 Would you be interested in opening a PR for this model? We would be happy to help! Do you have any experience with the model's latency on CPU, or have you tested it? That would be interesting to see.
I agree that this could be a good candidate for a new text recognition model in 0.6.0 :)
@frgfm I would suggest a separate issue where we can track all requested model additions (split into detection / recognition / TF / PT, with paper/repo links) and link this issue in the release tracker. WDYT? Would you like to open it? :)
@felixdittrich92 Done :)
I think we should keep this:
So there's no need to close it, and that way @nikokks will be notified when this gets resolved, which I guess is of interest to him :)
👍
I am inspecting baudm's ParSeq code (https://github.com/baudm/parseq). On my side, I managed to connect your ocr_predictor and successfully integrated baudm's model as the reco_predictor. Performance is currently not very good on French documents like yours, so it would need to be fine-tuned on your secret data 😉 I will have several questions about the implementation and integration choices in doctr:
I will clarify my questions next weekend 😀 You can close this issue.
Hi @nikokks 👋, let's keep it open for further discussion about ParSeq 👍 About your points:
We are definitely happy to help. I would say: when you are ready, open a PR (starting with the ViT classification integration) and we can iterate on it. WDYT?
Hi, OK for timm. Another question: can we add 'pytorch-lightning~=1.6.5' to requirements-pt.txt?
Hi @nikokks 👋,
OK, that sounds good to me :) I have added a ParSeq class on my fork.
@nikokks I would suggest the following steps (each should be its own PR):
I agree 👍
Yup, but giving credit to the rightful contributors / sources of inspiration when relevant!
@nikokks You can now reuse the already-implemented transformer parts for ViT 👍
Hi @nikokks, a short update: I haven't forgotten. I will (hopefully) start with ViTSTR next week; after that, it should be easy to implement the ParSeq decoder as well 👍
Hello, I am currently implementing ParSeq.
Hey @nikokks 👋, you can take a look at https://github.com/mindee/doctr/pull/1055/files PS: If you only have the PT implementation, that's fine; we can port it to TF later :)
🚀 The feature
Hello,
I mainly use the text detection and text recognition models with your framework.
As far as I can see, the most recent text recognition models you offer, namely MASTER and SAR, are not yet operational.
However, for text recognition, there is a recent model with very impressive performance: ParSeq.
Here are the references:
https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/master/README.md
https://github.com/open-mmlab/mmocr/blob/main/configs/textrecog/abinet/README.md
https://paperswithcode.com/paper/scene-text-recognition-with-permuted#code
https://github.com/baudm/parseq
Would it be possible to add this text recognition model to the models you offer?
Thanks a lot for your work!
Motivation, pitch
I'm working with text recognition models, and this recent state-of-the-art model outperforms previous models on all test datasets.
I would like to use this model with your framework's pipelines.
Alternatives
No response
Additional context
No response