feat: add a basic example of text detection #999
Codecov Report
@@            Coverage Diff             @@
##             main     #999      +/-   ##
==========================================
- Coverage   94.85%   94.83%   -0.02%
==========================================
  Files         134      134
  Lines        5558     5558
==========================================
- Hits         5272     5271        -1
- Misses        286      287        +1
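The -0.02% drop comes from a single line going from hit to missed while the line count stays at 5558. A quick check of that arithmetic, assuming coverage = hits / lines and that the percentage is truncated (rounded down) to two decimals, which I believe is Codecov's default display setting but treat as an assumption. `coverage_hundredths` is a hypothetical helper, not part of any tool:

```python
def coverage_hundredths(hits: int, lines: int) -> int:
    """Coverage in hundredths of a percent, rounded down (assumed Codecov default)."""
    # Integer arithmetic avoids float rounding surprises: 94.85% -> 9485.
    return hits * 10_000 // lines


before = coverage_hundredths(5272, 5558)  # main
after = coverage_hundredths(5271, 5558)   # this PR
print(f"{before / 100:.2f}% -> {after / 100:.2f}% ({(after - before) / 100:+.2f}%)")
```

With one fewer hit out of 5558 lines, the truncated value moves from 9485 to 9483 hundredths, which matches the -0.02% in the report.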
Hi @ianardee 👋,
Sorry for the lazy review, I tried to do it from my mobile phone 😅
LGTM, thanks
@charlesmindee @frgfm I think a few things are still missing (mentioned above), what do you think? 🤔
@felixdittrich92 I'll make another PR to answer some of your concerns.
@ianardee top 👍
Thanks a lot for the script!
I added some comments in case you do a follow-up PR :)
@@ -0,0 +1,118 @@
# Copyright (C) 2021-2022, Mindee.
# Copyright (C) 2022, Mindee.
This script didn't exist in 2021 :)
for word in line["words"]:
    out_txt += word["value"] + " "
Perhaps
out_txt += " ".join(word["value"] for word in line["words"])
or wrapping more of the nested loops inside a list comprehension 🤷‍♂️
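Taken further, the same idea flattens the whole export into text with comprehensions instead of nested `+=` loops. A minimal sketch, assuming the export dict nests `pages` → `blocks` → `lines` → `words`: only `line["words"]` and `word["value"]` appear in the excerpt above, so the outer keys are an assumption about docTR's export format, and `export_to_text` is a hypothetical name:

```python
from typing import Any, Dict


def export_to_text(export: Dict[str, Any]) -> str:
    """Flatten an export-style dict into plain text.

    Words are joined with spaces, lines with newlines, pages with a blank line.
    """
    pages = []
    for page in export["pages"]:
        # One string per line, regardless of which block the line belongs to.
        lines = [
            " ".join(word["value"] for word in line["words"])
            for block in page["blocks"]
            for line in block["lines"]
        ]
        pages.append("\n".join(lines))
    return "\n\n".join(pages)
```

Unlike `word["value"] + " "` in the loop, `" ".join(...)` leaves no trailing space at the end of each line.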
What about `out.render()`?
Oh yeah actually, I had forgotten about this haha
model = ocr_predictor(args.detection, args.recognition, pretrained=True)
path = Path(args.path)
if path.is_dir():
    allowed = (".pdf", ".jpeg", ".jpg", ".png", ".tif", ".tiff", ".bmp")
Perhaps we should move this to the top of the file with the other constants?
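A sketch of what the suggested refactor could look like: the tuple becomes a module-level constant and the directory scan reuses it. `ALLOWED_EXTENSIONS`, `is_supported` and `collect_files` are hypothetical names, and the case-insensitive suffix check is an addition the original line does not make:

```python
from pathlib import Path
from typing import List

# Module-level constant (hypothetical name), defined once next to the other constants.
ALLOWED_EXTENSIONS = (".pdf", ".jpeg", ".jpg", ".png", ".tif", ".tiff", ".bmp")


def is_supported(path: Path) -> bool:
    """Return True if the file suffix is allowed; comparison ignores case."""
    # Path.suffix keeps the leading dot, matching the entries above.
    return path.suffix.lower() in ALLOWED_EXTENSIONS


def collect_files(directory: Path) -> List[Path]:
    """List supported files directly inside `directory`, sorted for stable output."""
    return sorted(p for p in directory.iterdir() if p.is_file() and is_supported(p))
```

Keeping the constant at module level also lets other code (or tests) import it instead of duplicating the list.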
fh.write(out_str)
else:
    out_str = _process_file(model, path, args.format)
    print(out_str)
In one case we dump the string into a file, and in the other we print it?
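One way to address this is to route both branches through a single output helper, so a directory and a single file are handled the same way. A minimal sketch; `write_output` and its `output`/`stream` parameters are hypothetical and not part of the script in this PR:

```python
import sys
from pathlib import Path
from typing import Optional, TextIO


def write_output(text: str, output: Optional[Path] = None, stream: Optional[TextIO] = None) -> None:
    """Send `text` to one sink: a file if `output` is given, else a stream (stdout by default)."""
    if output is not None:
        output.write_text(text, encoding="utf-8")
    else:
        target = stream if stream is not None else sys.stdout
        target.write(text + "\n")
```

Both the directory branch and the single-file branch would then end with the same call, for example `write_output(out_str, output_path)` where `output_path` comes from a hypothetical `--output` CLI option.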
Hi @ianardee 👋, any updates on the refactor PR? :)
A script to extract text from files.