
[DRAFT] [models] add ViTSTR in TF and PT #1048

Closed · wants to merge 17 commits into main

Conversation

felixdittrich92 (Contributor) commented on Sep 6, 2022

This PR:

  • adds VisionTransformer as a module in models/modules/vision_transformer (TF/PT)
  • adds a ViTSTR head (TF/PT)

TODOs:

  • toy run / check that everything works fine
  • check code quality

Any feedback is welcome :)

@frgfm wdyt? I think this is a more flexible approach than adding ViT as a fixed classification model (where we run into trouble with different sizes for patch_embeds): the module can easily be extended and only needs a head on top 😅
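To make the idea concrete, here is a minimal sketch of the composition in PyTorch (the names `ViTSTRSketch` and `feat_extractor` and the shapes are illustrative assumptions, not the PR's actual code): a generic ViT module produces a sequence of embeddings, and a small head classifies one embedding per output character.

```python
import torch
from torch import nn


class ViTSTRSketch(nn.Module):
    """Hypothetical composition: reusable ViT feature extractor + recognition head."""

    def __init__(self, feat_extractor: nn.Module, embed_dim: int, vocab_size: int, max_length: int) -> None:
        super().__init__()
        self.feat_extractor = feat_extractor  # e.g. the VisionTransformer module
        self.head = nn.Linear(embed_dim, vocab_size)  # the only task-specific part
        self.max_length = max_length

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = self.feat_extractor(x)  # (B, num_patches + 1, embed_dim)
        # keep one embedding per output position and classify each independently
        return self.head(features[:, : self.max_length])  # (B, max_length, vocab_size)
```

Any other task would only need to swap the head, which is the flexibility argument above.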

related to: #513 #1003

felixdittrich92 added labels module: models, ext: tests, framework: pytorch, framework: tensorflow, topic: text recognition, type: new feature on Sep 6, 2022
felixdittrich92 added this to the 0.6.0 milestone on Sep 6, 2022
felixdittrich92 self-assigned this on Sep 6, 2022
codecov bot commented on Sep 6, 2022

Codecov Report

Merging #1048 (006446b) into main (1cc073d) will increase coverage by 0.12%.
The diff coverage is 97.73%.

@@            Coverage Diff             @@
##             main    #1048      +/-   ##
==========================================
+ Coverage   94.94%   95.06%   +0.12%     
==========================================
  Files         135      141       +6     
  Lines        5634     5893     +259     
==========================================
+ Hits         5349     5602     +253     
- Misses        285      291       +6     
Flag Coverage Δ
unittests 95.06% <97.73%> (+0.12%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
doctr/models/recognition/vitstr/pytorch.py 94.59% <94.59%> (ø)
doctr/models/recognition/vitstr/tensorflow.py 97.43% <97.43%> (ø)
doctr/models/modules/__init__.py 100.00% <100.00%> (ø)
doctr/models/modules/transformer/pytorch.py 100.00% <100.00%> (ø)
doctr/models/modules/transformer/tensorflow.py 98.79% <100.00%> (+0.04%) ⬆️
...octr/models/modules/vision_transformer/__init__.py 100.00% <100.00%> (ø)
doctr/models/modules/vision_transformer/pytorch.py 100.00% <100.00%> (ø)
...tr/models/modules/vision_transformer/tensorflow.py 100.00% <100.00%> (ø)
doctr/models/recognition/__init__.py 100.00% <100.00%> (ø)
doctr/models/recognition/vitstr/__init__.py 100.00% <100.00%> (ø)
... and 1 more


felixdittrich92 (Contributor, Author) commented on Sep 7, 2022

NOTE:

  • ViT works fine (also tested against timm's ViT implementation)
  • slow tests (incl. ONNX export) pass on TF and PT

To debug:

  • loss does not decrease (needs debugging); also tested with timm's ViT implementation, same result: stuck at ~2.9 loss
  • tested with the original image / patch sizes from the paper: makes no difference
  • test the data again (toy runs with a 500K MJSynth split)

felixdittrich92 (Contributor, Author) commented:

@frgfm before I continue to debug this, let me know what ideas you have in mind 🤗
Should we go this way and keep ViT as a module, or start implementing it as a classification model and then think about how to make the patch_embedding flexible? wdyt?

frgfm (Collaborator) left a review


Thanks Felix! I added some comments :)

@@ -57,10 +57,10 @@ def scaled_dot_product_attention(
 class PositionwiseFeedForward(nn.Sequential):
     """Position-wise Feed-Forward Network"""

-    def __init__(self, d_model: int, ffd: int, dropout: float = 0.1) -> None:
+    def __init__(self, d_model: int, ffd: int, dropout: float = 0.1, use_gelu: bool = False) -> None:

boolean value for activation selection is rather limited: are we positive that only relu & gelu can be used for such architecture types?

        super(PositionwiseFeedForward, self).__init__()
        self.use_gelu = use_gelu

instantiate a self.activation_fn in the constructor to avoid conditional execution in the call 👍
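A sketch of what that suggestion could look like (an assumed refactor, not code from this PR): pass the activation in as a module and let `nn.Sequential` apply it, so the call contains no `use_gelu` branch.

```python
from typing import Optional

from torch import nn


class PositionwiseFeedForward(nn.Sequential):
    """Position-wise feed-forward block (sketch of the suggested refactor)."""

    def __init__(
        self,
        d_model: int,
        ffd: int,
        dropout: float = 0.1,
        activation_fct: Optional[nn.Module] = None,
    ) -> None:
        # the activation is instantiated once here; the forward pass is
        # inherited from nn.Sequential and contains no conditionals
        super().__init__(
            nn.Linear(d_model, ffd),
            activation_fct if activation_fct is not None else nn.ReLU(),
            nn.Dropout(p=dropout),
            nn.Linear(ffd, d_model),
            nn.Dropout(p=dropout),
        )
```

Usage would then be e.g. `PositionwiseFeedForward(384, 1536, activation_fct=nn.GELU())` for a ViT-style block, and any other activation module drops in without touching the class.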

class PatchEmbedding(nn.Module):
    """Compute 2D patch embedding"""

    # Inspired by: https://github.com/rwightman/pytorch-image-models/blob/master/timm/models/layers/patch_embed.py

Just FYI: can you confirm that you made a lot of modifications?
"inspired by" is rather light; "borrowed from" is more significant.
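For context, a timm-style 2D patch embedding boils down to a single strided convolution (an illustrative sketch with assumed default sizes, not this PR's exact code): `kernel_size == stride` cuts the image into non-overlapping patches and projects each one to the embedding dimension in one op.

```python
import torch
from torch import nn


class PatchEmbeddingSketch(nn.Module):
    """Compute 2D patch embedding via a strided convolution (illustrative)."""

    def __init__(
        self,
        img_size: tuple = (32, 128),
        patch_size: tuple = (4, 8),
        in_channels: int = 3,
        embed_dim: int = 384,
    ) -> None:
        super().__init__()
        self.num_patches = (img_size[0] // patch_size[0]) * (img_size[1] // patch_size[1])
        # kernel_size == stride: non-overlapping patches, projected in one shot
        self.proj = nn.Conv2d(in_channels, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.proj(x)                     # (B, embed_dim, H / ph, W / pw)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, embed_dim)
```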

Comment on lines +45 to +47
"""VisionTransformer architecture as described in
`"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale",
<https://arxiv.org/pdf/2010.11929.pdf>`_."""

let's specify the constructor args
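One way to do that (a sketch; the argument names are assumptions based on a typical ViT constructor, not necessarily this PR's signature):

```python
from torch import nn


class VisionTransformer(nn.Module):
    """VisionTransformer architecture as described in
    `"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale",
    <https://arxiv.org/pdf/2010.11929.pdf>`_.

    Args:
        img_size: size of the input image as (height, width)
        patch_size: size of each patch as (height, width)
        d_model: dimension of the patch embeddings
        num_layers: number of encoder blocks
        num_heads: number of attention heads
        ffd_ratio: expansion ratio of the feed-forward hidden layer
        dropout: dropout rate
    """

    ...  # constructor and forward stay as in the PR; only the docstring grows
```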

Comment on lines +52 to +54
"""VisionTransformer architecture as described in
`"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale",
<https://arxiv.org/pdf/2010.11929.pdf>`_."""

same here

felixdittrich92 (Contributor, Author) commented:

@frgfm haha, I'm currently focusing on implementing it as a classification model 😅 we have to make a decision!

felixdittrich92 (Contributor, Author) commented:

Superseded by: #1050

felixdittrich92 removed this from the 0.6.0 milestone on Sep 26, 2022