feat: add a basic example of text detection #999

ianardee · 2022-07-27T13:41:51Z

A script to extract text from files.

codecov · 2022-07-27T14:06:31Z

Codecov Report

Merging #999 (99bbc86) into main (23d1a1e) will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##             main     #999      +/-   ##
==========================================
- Coverage   94.85%   94.83%   -0.02%     
==========================================
  Files         134      134              
  Lines        5558     5558              
==========================================
- Hits         5272     5271       -1     
- Misses        286      287       +1

| Flag | Coverage Δ | |
| --- | --- | --- |
| unittests | `94.83% <ø> (-0.02%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| doctr/transforms/functional/base.py | `94.20% <0.00%> (-1.45%)` | ⬇️ |

felixdittrich92 · 2022-07-27T15:26:47Z

Hi @ianardee 👋,
thanks for the PR :)
Some minor things:

- Could we also add the XML output?
- A (minor/lightweight) CI test is missing.
- Maybe rename it to extract_text? Otherwise, as a user, I would expect to get only the box coords. wdyt?
- .cuda() is missing if the backend is PyTorch and a GPU is available (silent move).
- The pass can be removed.
- For string export, you should use the .render() method.

Sorry for the lazy review, I tried to do it from my mobile phone 😅
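
The silent GPU move suggested in this review could look roughly like the sketch below. It assumes the predictor returned by ocr_predictor behaves like a torch.nn.Module under the PyTorch backend, so that it exposes a .cuda() method. The helper name move_to_gpu_if_available is hypothetical and not part of docTR.

```python
def move_to_gpu_if_available(model):
    """Return the model, moved to the GPU when PyTorch and CUDA are both usable.

    Hypothetical helper: it assumes `model` exposes a `.cuda()` method,
    as torch.nn.Module instances do.
    """
    try:
        import torch  # only present when the PyTorch backend is installed
    except ImportError:
        # No PyTorch (e.g. TensorFlow backend): leave the model untouched.
        return model

    if torch.cuda.is_available() and hasattr(model, "cuda"):
        return model.cuda()
    return model
```

In the script, this would wrap the predictor right after it is created, so users without a GPU see no change in behaviour.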

charlesmindee

LGTM, thanks

felixdittrich92 · 2022-07-27T16:18:16Z

@charlesmindee @frgfm I think a few things are still missing (mentioned above), wdyt? 🤔

ianardee · 2022-07-27T16:20:01Z

@felixdittrich92 I'll make another PR to answer some of your concerns.

felixdittrich92 · 2022-07-27T16:20:45Z

@ianardee top 👍

frgfm

Thanks a lot for the script!

I added some comments if you do a follow-up PR :)

frgfm · 2022-07-27T20:47:19Z

scripts/detect_text.py

@@ -0,0 +1,118 @@
+# Copyright (C) 2021-2022, Mindee.


# Copyright (C) 2022, Mindee.

This script didn't exist in 2021 :)

frgfm · 2022-07-27T20:49:15Z

scripts/detect_text.py

+                    for word in line["words"]:
+                        out_txt += word["value"] + " "


Perhaps

out_txt += " ".join(word["value"] for word in line["words"])

or wrapping more of the nested loops inside a list comprehension 🤷‍♂️

felixdittrich92 · 2022-07-28

out.render() ?

frgfm · 2022-07-28

Oh yeah actually, I had forgotten about this haha
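
For reference, the join-based variant from this thread can be sketched on a plain dictionary. Only line["words"] and word["value"] come from the diff; the surrounding "blocks" and "lines" keys, the function name page_to_text, and the sample data are assumptions made for illustration. Unlike the original loop, joining does not leave a trailing space at the end of each line.

```python
def page_to_text(page: dict) -> str:
    """Flatten one exported page into plain text, one output line per text line.

    Assumes the layout page -> "blocks" -> "lines" -> "words" -> "value".
    """
    text_lines = []
    for block in page["blocks"]:
        for line in block["lines"]:
            # Join the words with single spaces instead of appending "word + ' '".
            text_lines.append(" ".join(word["value"] for word in line["words"]))
    return "\n".join(text_lines)


# Made-up sample data, not real model output.
sample_page = {
    "blocks": [
        {
            "lines": [
                {"words": [{"value": "Hello"}, {"value": "world"}]},
                {"words": [{"value": "foo"}]},
            ]
        }
    ]
}

print(page_to_text(sample_page))
```

As noted in the replies, calling .render() on the predictor output is the shorter route when only the string export is needed.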

frgfm · 2022-07-27T20:50:00Z

scripts/detect_text.py

+    model = ocr_predictor(args.detection, args.recognition, pretrained=True)
+    path = Path(args.path)
+    if path.is_dir():
+        allowed = (".pdf", ".jpeg", ".jpg", ".png", ".tif", ".tiff", ".bmp")


Perhaps we should move this to the top of the file, with the other constants?
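
A minimal sketch of that suggestion, assuming a module-level constant and a small helper. The names ALLOWED_EXTENSIONS and list_supported_files are hypothetical; the extension tuple itself is copied from the diff.

```python
from pathlib import Path
from typing import List

# Module-level constant, defined once next to the script's other constants.
ALLOWED_EXTENSIONS = (".pdf", ".jpeg", ".jpg", ".png", ".tif", ".tiff", ".bmp")


def list_supported_files(folder: Path) -> List[Path]:
    """Return the files in `folder` whose extension is supported, sorted by path."""
    return sorted(
        p
        for p in folder.iterdir()
        # Compare the lower-cased suffix so "SCAN.PNG" is accepted as well.
        if p.is_file() and p.suffix.lower() in ALLOWED_EXTENSIONS
    )
```

Keeping the tuple at module level also makes it easy to reuse in the argument parser's help text or in a test.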

frgfm · 2022-07-27T20:52:30Z

scripts/detect_text.py

+                fh.write(out_str)
+    else:
+        out_str = _process_file(model, path, args.format)
+        print(out_str)


In one case we dump the string into a file, and in the other we print it?
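
One way to make the two branches consistent is to route both through a single helper that either prints or writes to a file, depending on whether an output path was given. This is only a sketch: the function name emit and its out_path parameter are hypothetical and do not appear in the script.

```python
from pathlib import Path
from typing import Optional


def emit(text: str, out_path: Optional[Path] = None) -> None:
    """Print `text` to stdout, or write it to `out_path` when one is given."""
    if out_path is None:
        print(text)
    else:
        with open(out_path, "w", encoding="utf-8") as fh:
            # Add the same trailing newline that print() would add.
            fh.write(text + "\n")
```

With a helper like this, the directory branch and the single-file branch can share the same output path handling, and the choice between printing and writing becomes an explicit option for the user.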

felixdittrich92 · 2022-08-31T17:42:26Z

Hi @ianardee 👋 , any updates about the refactor PR ? :)

feat: add a basic example of text detection

99bbc86

ianardee requested review from charlesmindee and SiddhantBahuguna July 27, 2022 13:41

felixdittrich92 added this to the 0.6.0 milestone Jul 27, 2022

felixdittrich92 added the type: enhancement (Improvement), ext: scripts (Related to scripts folder), and topic: text detection (Related to the task of text detection) labels Jul 27, 2022

charlesmindee approved these changes Jul 27, 2022

charlesmindee merged commit ada1bf2 into main Jul 27, 2022

charlesmindee deleted the add-text-detect-script branch July 27, 2022 16:09

frgfm reviewed Jul 27, 2022

felixdittrich92 mentioned this pull request Sep 26, 2022

Release tracker - v0.6.0 #791

Closed

85 tasks
