Commit

docs: Add export example in README (#348)
* feat: add export example

* fix: typing

* refacto: moved to the docs

* refacto: moved to the docs

* fix: readme link

* fix: typos
charlesmindee committed Jul 6, 2021
1 parent b2ded17 commit 6a558df
Showing 3 changed files with 74 additions and 2 deletions.
7 changes: 5 additions & 2 deletions README.md
@@ -63,12 +63,15 @@ result.show(doc)

![DocTR example](https://github.com/mindee/doctr/releases/download/v0.1.1/doctr_example_script.gif)

or export them to JSON format (to get a better understanding of our document model, check our [documentation](https://mindee.github.io/doctr/documents.html#document-structure)):
The ocr_predictor returns a `Document` object with a nested structure (with `Page`, `Block`, `Line`, `Word`, `Artefact`).
To get a better understanding of our document model, check our [documentation](https://mindee.github.io/doctr/documents.html#document-structure).

You can also export them as a nested dict, more appropriate for JSON format:

```python
json_output = result.export()
```

For examples & further details about the export format, please refer to [this section](https://mindee.github.io/doctr/models.html#export-model-output) of the documentation.
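
As a minimal sketch of why a nested dict suits JSON, the snippet below serializes such a dict with the standard `json` module. The `export` dict here is a hand-written stand-in for the return value of `result.export()`, trimmed to a few keys; it is an assumption for illustration, not actual model output.

```python
import json

# Hypothetical stand-in for `result.export()`; the values are illustrative only.
export = {
    "pages": [
        {
            "page_idx": 0,
            "dimensions": (340, 600),
            "blocks": [],
        }
    ]
}

# json.dumps writes Python tuples as JSON arrays
serialized = json.dumps(export, indent=2)

# Reading the string back yields lists where the original had tuples
restored = json.loads(serialized)
```

Note that tuples such as `dimensions` come back as lists after a round trip, so code that compares against tuples needs to convert them.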

## Installation

1 change: 1 addition & 0 deletions docs/source/documents.rst
@@ -7,6 +7,7 @@ doctr.documents
The documents module enables users to easily access content from documents and export analysis
results to structured formats.

.. _document_structure:

Document structure
------------------
68 changes: 68 additions & 0 deletions docs/source/models.rst
@@ -178,6 +178,74 @@ Those architectures involve one stage of text detection, and one stage of text r

.. autofunction:: doctr.models.zoo.ocr_predictor

Export model output
^^^^^^^^^^^^^^^^^^^^

The ocr_predictor returns a `Document` object with a nested structure (with `Page`, `Block`, `Line`, `Word`, `Artefact`).
To get a better understanding of our document model, check our :ref:`document_structure` section.

Here is a typical `Document` layout::

  Document(
    (pages): [Page(
      dimensions=(340, 600)
      (blocks): [Block(
        (lines): [Line(
          (words): [
            Word(value='No.', confidence=0.91),
            Word(value='RECEIPT', confidence=0.99),
            Word(value='DATE', confidence=0.96),
          ]
        )]
        (artefacts): []
      )]
    )]
  )

You can also export them as a nested dict, more appropriate for JSON format::

  json_output = result.export()

For reference, here is the JSON export for the same `Document` as above::

  {
    'pages': [
        {
            'page_idx': 0,
            'dimensions': (340, 600),
            'orientation': {'value': None, 'confidence': None},
            'language': {'value': None, 'confidence': None},
            'blocks': [
                {
                    'geometry': ((0.1357421875, 0.0361328125), (0.8564453125, 0.8603515625)),
                    'lines': [
                        {
                            'geometry': ((0.1357421875, 0.0361328125), (0.8564453125, 0.8603515625)),
                            'words': [
                                {
                                    'value': 'No.',
                                    'confidence': 0.914085328578949,
                                    'geometry': ((0.5478515625, 0.06640625), (0.5810546875, 0.0966796875))
                                },
                                {
                                    'value': 'RECEIPT',
                                    'confidence': 0.9949972033500671,
                                    'geometry': ((0.1357421875, 0.0361328125), (0.51171875, 0.1630859375))
                                },
                                {
                                    'value': 'DATE',
                                    'confidence': 0.9578408598899841,
                                    'geometry': ((0.1396484375, 0.3232421875), (0.185546875, 0.3515625))
                                }
                            ]
                        }
                    ],
                    'artefacts': []
                }
            ]
        }
    ]
  }
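
The nesting of the export above (pages, then blocks, then lines, then words) can be walked with plain loops. The following is a rough sketch under stated assumptions: `iter_words` is a hypothetical helper, not part of doctr, and `sample` is a trimmed copy of the export shown above with the geometry and metadata keys left out.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_words(export: Dict[str, Any]) -> Iterator[Tuple[int, str, float]]:
    """Yield (page_idx, value, confidence) for every word in an exported dict.

    Hypothetical helper for illustration; it only relies on the keys
    visible in the export example ('pages', 'blocks', 'lines', 'words').
    """
    for page in export["pages"]:
        for block in page["blocks"]:
            for line in block["lines"]:
                for word in line["words"]:
                    yield page["page_idx"], word["value"], word["confidence"]


# Trimmed copy of the export shown above (geometry and metadata omitted)
sample = {
    "pages": [
        {
            "page_idx": 0,
            "dimensions": (340, 600),
            "blocks": [
                {
                    "lines": [
                        {
                            "words": [
                                {"value": "No.", "confidence": 0.914085328578949},
                                {"value": "RECEIPT", "confidence": 0.9949972033500671},
                                {"value": "DATE", "confidence": 0.9578408598899841},
                            ]
                        }
                    ],
                    "artefacts": [],
                }
            ],
        }
    ]
}

words = list(iter_words(sample))

# Join every recognized word, in export order
text = " ".join(value for _, value, _ in words)

# Keep only words at or above an arbitrary confidence threshold
confident = [value for _, value, conf in words if conf >= 0.95]
```

The 0.95 threshold is arbitrary; a suitable cutoff depends on the documents being processed.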

Model export
------------
