Update docs in using_doctr #993

Merged · 1 commit · Jul 26, 2022
4 changes: 4 additions & 0 deletions docs/source/modules/models.rst
@@ -37,12 +37,16 @@ doctr.models.detection

.. autofunction:: doctr.models.detection.linknet_resnet18

+.. autofunction:: doctr.models.detection.linknet_resnet18_rotation

.. autofunction:: doctr.models.detection.linknet_resnet34

.. autofunction:: doctr.models.detection.linknet_resnet50

.. autofunction:: doctr.models.detection.db_resnet50

+.. autofunction:: doctr.models.detection.differentiable_binarization.pytorch.db_resnet50_rotation
Collaborator Author:

I needed to set the full path here, otherwise Sphinx can't find this function, due to this code:

if is_tf_available():
    from .tensorflow import *
elif is_torch_available():
    from .pytorch import *  # type: ignore[misc]

Collaborator:

Perhaps we should remove this for now, then?

So far, we've been using tabs to show usage examples in PyTorch & TensorFlow, but we haven't yet planned to have separate builds for PyTorch and TensorFlow (mainly because we try to document high-level features that have the same input & output signatures).

What do you think? :)

Collaborator Author:

@frgfm If possible, db_resnet50_rotation should be made available in TensorFlow as well, to stick with our high-level feature vision, but I don't know whether that requires a lot of work. :-/

Maybe we can move this line to a dedicated section for models that are only available in one framework (I'm not aware whether that's the case for other models). Then we can reintegrate it once the model is available in both frameworks 🤔


.. autofunction:: doctr.models.detection.db_mobilenet_v3_large

.. autofunction:: doctr.models.detection.detection_predictor
38 changes: 18 additions & 20 deletions docs/source/using_doctr/using_models.rst
@@ -4,8 +4,6 @@ Choosing the right model
The full Optical Character Recognition task can be seen as two consecutive tasks: text detection and text recognition.
Whether performed at once or separately, each task corresponds to a type of deep learning architecture.

-.. currentmodule:: doctr.models
-
For a given task, docTR provides a Predictor, which is composed of 2 components:

* PreProcessor: a module in charge of making inputs directly usable by the deep learning model.
@@ -24,14 +22,14 @@ Available architectures

The following architectures are currently supported:

-* `linknet_resnet18 <models.html#doctr.models.detection.linknet_resnet18>`_
-* `db_resnet50 <models.html#doctr.models.detection.db_resnet50>`_
-* `db_mobilenet_v3_large <models.html#doctr.models.detection.db_mobilenet_v3_large>`_
+* :py:meth:`linknet_resnet18 <doctr.models.detection.linknet_resnet18>`
+* :py:meth:`db_resnet50 <doctr.models.detection.db_resnet50>`
+* :py:meth:`db_mobilenet_v3_large <doctr.models.detection.db_mobilenet_v3_large>`

We also provide 2 models working with rotated documents of any kind:

-* `linknet_resnet18_rotation <models.html#doctr.models.detection.linknet_resnet18_rotation>`_
-* `db_resnet50_rotation <models.html#doctr.models.detection.db_resnet50_rotation>`_
+* :py:meth:`linknet_resnet18_rotation <doctr.models.detection.linknet_resnet18_rotation>`
+* :py:meth:`db_resnet50_rotation <doctr.models.detection.differentiable_binarization.pytorch.db_resnet50_rotation>`

For a comprehensive comparison, we have compiled a detailed benchmark on publicly available datasets:

@@ -52,13 +50,13 @@ Explanations about the metrics being used are available in :ref:`metrics`.

*Disclaimer: both FUNSD subsets combined have 199 pages which might not be representative enough of the model capabilities*

-FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `c5.x12large <https://aws.amazon.com/ec2/instance-types/c5/>`_ AWS instance (CPU Xeon Platinum 8275L).
+FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `c5.x12large <https://aws.amazon.com/ec2/instance-types/c5/>` AWS instance (CPU Xeon Platinum 8275L).
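As a rough sketch of that measurement protocol (illustrative only, not the benchmark code actually used; `model` and `sample` are hypothetical placeholders):

>>> import time
>>> for _ in range(100):   # warmup phase, batch size 1
...     _ = model([sample])
>>> start = time.time()
>>> for _ in range(1000):  # timed runs
...     _ = model([sample])
>>> fps = 1000 / (time.time() - start)  # average processed tensors per second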


Detection predictors
^^^^^^^^^^^^^^^^^^^^

-`detection_predictor <models.html#doctr.models.detection.detection_predictor>`_ wraps your detection model to make it easily usable with your favorite deep learning framework.
+:py:meth:`detection_predictor <doctr.models.detection.detection_predictor>` wraps your detection model to make it easily usable with your favorite deep learning framework.

>>> import numpy as np
>>> from doctr.models import detection_predictor
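The example is truncated by the diff view; for reference, a minimal sketch of how the call continues, assuming docTR's predictor API (the image shape is an arbitrary placeholder):

>>> model = detection_predictor('db_resnet50', pretrained=True)
>>> dummy_img = (255 * np.random.rand(600, 800, 3)).astype(np.uint8)  # fake page
>>> out = model([dummy_img])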
@@ -91,11 +89,11 @@ Available architectures

The following architectures are currently supported:

-* `crnn_vgg16_bn <models.html#doctr.models.recognition.crnn_vgg16_bn>`_
-* `crnn_mobilenet_v3_small <models.html#doctr.models.recognition.crnn_mobilenet_v3_small>`_
-* `crnn_mobilenet_v3_large <models.html#doctr.models.recognition.crnn_mobilenet_v3_large>`_
-* `sar_resnet31 <models.html#doctr.models.recognition.sar_resnet31>`_
-* `master <models.html#doctr.models.recognition.master>`_
+* :py:meth:`crnn_vgg16_bn <doctr.models.recognition.crnn_vgg16_bn>`
+* :py:meth:`crnn_mobilenet_v3_small <doctr.models.recognition.crnn_mobilenet_v3_small>`
+* :py:meth:`crnn_mobilenet_v3_large <doctr.models.recognition.crnn_mobilenet_v3_large>`
+* :py:meth:`sar_resnet31 <doctr.models.recognition.sar_resnet31>`
+* :py:meth:`master <doctr.models.recognition.master>`


For a comprehensive comparison, we have compiled a detailed benchmark on publicly available datasets:
@@ -153,12 +151,12 @@ While most of our recognition models were trained on our french vocab (cf. :ref:

*Disclaimer: both FUNSD subsets combined have 30595 word-level crops which might not be representative enough of the model capabilities*

-FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `c5.x12large <https://aws.amazon.com/ec2/instance-types/c5/>`_ AWS instance (CPU Xeon Platinum 8275L).
+FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed tensors per second over 1000 samples. Those results were obtained on a `c5.x12large <https://aws.amazon.com/ec2/instance-types/c5/>` AWS instance (CPU Xeon Platinum 8275L).


Recognition predictors
^^^^^^^^^^^^^^^^^^^^^^
-`recognition_predictor <models.html#doctr.models.recognition.recognition_predictor>`_ wraps your recognition model to make it easily usable with your favorite deep learning framework.
+:py:meth:`recognition_predictor <doctr.models.recognition.recognition_predictor>` wraps your recognition model to make it easily usable with your favorite deep learning framework.

>>> import numpy as np
>>> from doctr.models import recognition_predictor
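Again the diff truncates the example; a minimal sketch of the continuation, assuming docTR's predictor API (the crop shape is an arbitrary placeholder):

>>> model = recognition_predictor('crnn_vgg16_bn', pretrained=True)
>>> dummy_crop = (255 * np.random.rand(32, 128, 3)).astype(np.uint8)  # fake word crop
>>> out = model([dummy_crop])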
@@ -208,7 +206,7 @@ Explanations about the metrics being used are available in :ref:`metrics`.

*Disclaimer: both FUNSD subsets combined have 199 pages which might not be representative enough of the model capabilities*

-FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed frames per second over 1000 samples. Those results were obtained on a `c5.x12large <https://aws.amazon.com/ec2/instance-types/c5/>`_ AWS instance (CPU Xeon Platinum 8275L).
+FPS (Frames per second) is computed after a warmup phase of 100 tensors (where the batch size is 1), by measuring the average number of processed frames per second over 1000 samples. Those results were obtained on a `c5.x12large <https://aws.amazon.com/ec2/instance-types/c5/>` AWS instance (CPU Xeon Platinum 8275L).

Since you may be looking for specific use cases, we also performed this benchmark on private datasets with various document types below. Unfortunately, we are not able to share those at the moment since they contain sensitive information.

@@ -238,7 +236,7 @@ Since you may be looking for specific use cases, we also performed this benchmark

Two-stage approaches
^^^^^^^^^^^^^^^^^^^^
-Those architectures involve one stage of text detection and one stage of text recognition. The text detection stage produces cropped images that are passed to the text recognition block. Everything is wrapped up with `ocr_predictor <models.html#doctr.models.ocr_predictor>`_.
+Those architectures involve one stage of text detection and one stage of text recognition. The text detection stage produces cropped images that are passed to the text recognition block. Everything is wrapped up with :py:meth:`ocr_predictor <doctr.models.ocr_predictor>`.

>>> import numpy as np
>>> from doctr.models import ocr_predictor
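Once more the example is cut off; a minimal sketch of the two-stage call, assuming docTR's `ocr_predictor` and `DocumentFile` APIs (the PDF path is a placeholder):

>>> from doctr.io import DocumentFile
>>> model = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
>>> doc = DocumentFile.from_pdf("path/to/your/doc.pdf")  # list of page images
>>> result = model(doc)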
@@ -336,8 +334,8 @@ To export the output as XML (hOCR format) you can use the `export_as_xml` method

xml_output = result.export_as_xml()
for output in xml_output:
-xml_bytes_string = output[0]
-xml_element = output[1]
+    xml_bytes_string = output[0]
+    xml_element = output[1]
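For instance, to persist each page's hOCR output (a sketch; the filename is arbitrary and `xml_bytes_string` comes from the loop above):

>>> with open("page_0.xml", "wb") as f:  # one file per page
...     f.write(xml_bytes_string)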

For reference, here is a sample XML byte string output::
