
v0.4.0: Full support of PyTorch and a growing pretrained model zoo

@fg-mindee released this on 01 Oct 18:58 (commit 51663dd)

This release takes PyTorch support out of beta, makes text recognition more robust, and provides lightweight architectures for complex tasks.

Note: doctr 0.4.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.
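If you are unsure which backend your environment provides, a quick check like the one below can help. It is a minimal sketch that is not part of doctr. It assumes the versions above are minimums and that the backends are installed under their usual distribution names (`tensorflow` and `torch`). The helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Tuple

# Minimum backend versions from the note above (assumed to be lower bounds)
MIN_VERSIONS: Dict[str, Tuple[int, int, int]] = {
    "tensorflow": (2, 4, 0),
    "torch": (1, 8, 0),
}


def parse_version(raw: str) -> Tuple[int, int, int]:
    """Extract the leading numeric release, e.g. '1.8.1+cu111' -> (1, 8, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", raw)
    if match is None:
        raise ValueError(f"Cannot parse version string: {raw!r}")
    parts = [int(p) for p in match.group(0).split(".")][:3]
    parts += [0] * (3 - len(parts))  # pad '2.4' to (2, 4, 0)
    return (parts[0], parts[1], parts[2])


def meets_minimum(raw: str, minimum: Tuple[int, int, int]) -> bool:
    # Tuple comparison is numeric, so '2.10.0' correctly ranks above '2.4.0'
    return parse_version(raw) >= minimum


def installed_backends() -> Dict[str, bool]:
    """Map each installed backend to whether it satisfies the minimum version."""
    status: Dict[str, bool] = {}
    for package, minimum in MIN_VERSIONS.items():
        try:
            raw = version(package)
        except PackageNotFoundError:
            continue  # backend not installed under this distribution name
        status[package] = meets_minimum(raw, minimum)
    return status


if __name__ == "__main__":
    print(installed_backends())
```

Variants distributed under other names (for instance `tensorflow-cpu`) would not be detected by this sketch.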

Highlights

No more width limitation for text recognition

Some documents, such as French ID cards, include very long strings that can be challenging to transcribe:

[Image: French ID card sample]

This release introduces a split/merge strategy for wide crops to avoid performance drops. Previously, each crop was analyzed as a whole; now, wide crops are split into reasonably sized sub-crops, inference is run on them in batch, and the predictions are merged back together.

The following snippet:

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

doc = DocumentFile.from_images('path/to/img.png')
predictor = ocr_predictor(pretrained=True)
print(predictor(doc).pages[0])

used to yield:

Page(
  dimensions=(447, 640)
  (blocks): [Block(
    (lines): [Line(
      (words): [
        Word(value='1XXXXXX', confidence=0.0023),
        Word(value='1XXXX', confidence=0.0018),
      ]
    )]
    (artefacts): []
  )]
)

and now yields:

Page(
  dimensions=(447, 640)
  (blocks): [Block(
    (lines): [Line(
      (words): [
        Word(value='IDFRABERTHIER<<<<<<<<<<<<<<<<<<<<<<', confidence=0.49),
        Word(value='8806923102858CORINNE<<<<<<<6512068F6', confidence=0.22),
      ]
    )]
    (artefacts): []
  )]
)
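The split/merge idea described above can be illustrated with a small, framework-free sketch. It is not doctr's implementation. The function names, the maximum aspect ratio and the overlap value are assumptions chosen for illustration. Merging uses a simple longest suffix/prefix overlap between neighbouring predictions.

```python
from typing import List, Tuple


def split_wide_crop(width: int, height: int, max_ratio: float = 8.0,
                    overlap: float = 0.5) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges of sub-crops covering a crop.

    Crops whose width/height ratio exceeds `max_ratio` are cut into windows of
    at most `height * max_ratio` pixels, consecutive windows sharing `overlap`
    (a fraction of the window width).
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    max_width = max(1, int(height * max_ratio))
    if width <= max_width:
        return [(0, width)]  # narrow enough: keep the crop whole
    step = max(1, int(max_width * (1 - overlap)))
    ranges = []
    start = 0
    while True:
        end = min(start + max_width, width)
        ranges.append((start, end))
        if end == width:
            break
        start += step
    return ranges


def merge_pair(left: str, right: str) -> str:
    """Join two predictions, dropping the longest suffix of `left` that prefixes `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return left + right  # no overlap found: plain concatenation


def merge_predictions(predictions: List[str]) -> str:
    """Merge the predictions of consecutive sub-crops from left to right."""
    if not predictions:
        return ""
    merged = predictions[0]
    for text in predictions[1:]:
        merged = merge_pair(merged, text)
    return merged
```

For example, `split_wide_crop(100, 10, max_ratio=4.0)` yields four overlapping 40-pixel windows, and `merge_predictions(["IDFRABER", "BERTHIER"])` recovers `"IDFRABERTHIER"`. A longest-overlap merge is easy to fool with repetitive characters such as the `<` filler on ID cards, so treat it as a starting point rather than a robust solution.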

Framework specific predictors

PyTorch support is no longer in beta, so we have unified the interface across deep learning backends, making it seamless to switch from one to another 🙌 Predictors are designed to be the recommended interface for inference with your models!

0.3.1 (TensorFlow):
>>> from doctr.models import detection_predictor
>>> predictor = detection_predictor(pretrained=True)
>>> out = predictor(doc, training=False)

0.3.1 (PyTorch):
>>> from doctr.models import detection_predictor
>>> import torch
>>> predictor = detection_predictor(pretrained=True)
>>> predictor.model.eval()
>>> with torch.no_grad(): out = predictor(doc)

0.4.0 (TensorFlow & PyTorch):
>>> from doctr.models import detection_predictor
>>> predictor = detection_predictor(pretrained=True)
>>> out = predictor(doc)
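Conceptually, a predictor bundles pre-processing, the model and post-processing behind a single call and takes care of putting the model in inference mode. This is how the `training=False` / `eval()` / `no_grad()` boilerplate of the 0.3.1 snippets can disappear. The framework-free sketch below illustrates that design. `ToyModel` and `Predictor` are hypothetical stand-ins, not doctr classes.

```python
from typing import Any, Callable, List


class ToyModel:
    """Stand-in for a framework model, with a `training` flag similar to torch.nn.Module."""

    def __init__(self) -> None:
        self.training = True

    def eval(self) -> "ToyModel":
        self.training = False
        return self

    def __call__(self, batch: List[float]) -> List[float]:
        # Mimic a layer that behaves differently at train time (as dropout or batch norm do)
        factor = 0.5 if self.training else 1.0
        return [x * factor for x in batch]


class Predictor:
    """Hypothetical wrapper chaining pre-processing, model and post-processing."""

    def __init__(self, pre: Callable[[Any], float], model: ToyModel,
                 post: Callable[[List[float]], Any]) -> None:
        self.pre = pre
        self.model = model
        self.post = post

    def __call__(self, inputs: List[Any]) -> Any:
        # The predictor, not the caller, switches the model to inference mode.
        # A real PyTorch predictor would also disable autograd (e.g. torch.no_grad()).
        self.model.eval()
        batch = [self.pre(x) for x in inputs]
        return self.post(self.model(batch))
```

With `predictor = Predictor(float, ToyModel(), list)`, calling `predictor(["1", "2.5"])` returns `[1.0, 2.5]` without the caller ever touching the model's mode.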

An evergrowing model zoo 🦓

As PyTorch support leaves beta, we have closed the gap in pretrained model availability between PyTorch & TensorFlow. Additionally, by leveraging our integration of light backbones, this release comes with lighter architectures for text detection and text recognition:

  • db_mobilenet_v3_large
  • crnn_mobilenet_v3_small
  • crnn_mobilenet_v3_large

The full list of supported architectures is available 👉 here

Demo live on HuggingFace Spaces

If you have enjoyed the Streamlit demo but prefer not to run it on your own hardware, feel free to check out the online version on HuggingFace Spaces.

Courtesy of @osanseviero for deploying it, and HuggingFace for hosting & serving 🙏

Breaking changes

Deprecated crnn_resnet31 & sar_vgg16_bn

After reviewing backbone compatibility and re-assessing whether every combination should be trained, DocTR now focuses on reproducing the architectures intended by the papers' authors, or improving upon them. As a result, we have deprecated the following recognition models (which had no pretrained parameters): crnn_resnet31, sar_vgg16_bn.

Deprecated models.export

Since doctr.models.export was specific to TensorFlow and did not add much value beyond the official TensorFlow tutorials, we have deprecated the submodule and added export instructions to the documentation instead.

New features

Datasets

Resources to access data in efficient ways

IO

Features to manipulate input & outputs

Models

Deep learning model building and inference

Utils

Utility features relevant to the library's use cases

Transforms

Data transformation operations

Test

Verification of the package's well-being before release

Documentation

Online resources for potential users

References

Reference training scripts

  • Added option to select vocab in the training of character classification and text recognition #502 (@fg-mindee)
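Selecting a vocab matters because a recognition model typically predicts one class per character of the chosen set, plus any special tokens its architecture adds. The sketch below shows how such an option could be wired into a training script with `argparse`. The `--vocab` flag name, the vocab table and the `encode` helper are hypothetical and may not match the reference scripts.

```python
import argparse
import string
from typing import Dict, List

# Illustrative vocab table (assumption): doctr ships its own character sets,
# whose exact names and contents may differ from these.
VOCABS: Dict[str, str] = {
    "digits": string.digits,
    "latin": string.digits + string.ascii_letters + string.punctuation,
}
VOCABS["french"] = VOCABS["latin"] + "°àâéèêëîïôùûçÀÂÉÈÊËÎÏÔÙÛÇ"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Text recognition training (sketch)")
    # Restricting choices to known vocabs makes typos fail at argument parsing
    parser.add_argument("--vocab", type=str, default="french", choices=sorted(VOCABS),
                        help="Character set the model learns to predict")
    return parser


def encode(word: str, vocab: str) -> List[int]:
    """Map each character of `word` to its index in `vocab` (KeyError if absent)."""
    index = {char: i for i, char in enumerate(vocab)}
    return [index[char] for char in word]


if __name__ == "__main__":
    args = build_parser().parse_args()
    vocab = VOCABS[args.vocab]
    print(f"{args.vocab}: {len(vocab)} characters")
```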

Others

Other tools and implementations

Bug fixes

Datasets

Models

Transforms

Utils

  • Fixed page synthesis for characters outside of Latin-1 #496 (@fg-mindee)
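For reference, Latin-1 (ISO-8859-1) only covers code points U+0000 to U+00FF, so characters such as "€" or "Œ" fall outside it. Below is a minimal sketch for spotting such characters before rendering text. The helper names are hypothetical, and this does not reproduce the actual fix in #496.

```python
from typing import List

LATIN1_MAX_CODEPOINT = 0xFF  # Latin-1 covers U+0000..U+00FF


def non_latin1_chars(text: str) -> List[str]:
    """Return the distinct characters of `text` that Latin-1 cannot encode, sorted by code point."""
    return sorted({char for char in text if ord(char) > LATIN1_MAX_CODEPOINT})


def fits_latin1(text: str) -> bool:
    # Equivalent to checking that text.encode("latin-1") succeeds
    return not non_latin1_chars(text)
```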

Documentation

References

Others

Improvements

Datasets

Models

  • Deprecated doctr.models.export #463 (@fg-mindee)
  • Deprecated crnn_resnet31 & sar_vgg16_bn recognition models #468 (@fg-mindee)
  • Relocated DocumentBuilder to doctr.models.builder, split predictor into framework-specific objects #481 (@fg-mindee)
  • Added more robust argument checks in DocumentBuilder & refactored crop preparation and result processing in OCR predictors #497 (@fg-mindee)
  • Reflected changes of detection target formats on detection models #491 (@fg-mindee)

Utils

Documentation

Tests

References

  • Reflected changes of detection dataset target format #491 (@fg-mindee)

Others

Many thanks to our contributors, we are delighted to see that there are more every week!