feat: Optimized data loading for PyTorch #362

fg-mindee · 2021-07-06T19:00:59Z

Following up on #190, this PR introduces the following modifications:

added automatic selection of the number of workers for PyTorch training
switched the image-reading backend from torchvision to PIL followed by tensor conversion
updated the training scripts to reflect these changes
updated the Pillow requirement because of an issue involving Pillow 8.3 and NumPy (python-pillow/Pillow#5571)
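
For readers who want a feel for what automatic worker selection can look like, here is a minimal sketch. The helper name `select_num_workers` and the cap of 16 are assumptions made for illustration; the heuristic actually used in this PR may differ.

```python
import os
from typing import Optional

MAX_WORKERS = 16  # assumed upper bound, not taken from the PR


def select_num_workers(requested: Optional[int] = None, cap: int = MAX_WORKERS) -> int:
    """Return the requested worker count, or derive one from the CPU count."""
    if isinstance(requested, int) and requested >= 0:
        # An explicit value wins, including 0 (load in the main process).
        return requested
    # os.cpu_count() can return None when the count cannot be determined.
    available = os.cpu_count() or 1
    return min(cap, available)
```

The result would typically be passed as `num_workers` to `torch.utils.data.DataLoader`, so that a script run without an explicit worker count still uses the available cores.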

Iterating over FUNSD with the dataloader is at least 25% faster with the same number of workers (and about 3x faster than with the original default number of workers).
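
A comparison like this can be reproduced with a small timing harness. The sketch below is not the benchmark used for the numbers above; `time_iteration` and `speedup` are hypothetical helpers that work with any re-iterable object, such as a `DataLoader`.

```python
import time
from typing import Iterable


def time_iteration(loader: Iterable, repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds for one full pass over `loader`.

    `loader` must be re-iterable (a DataLoader or a list is, a generator is not).
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for _batch in loader:
            pass  # only the loading cost is measured, no model work
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_s: float, candidate_s: float) -> float:
    """Ratio of baseline time to candidate time; 3.0 means three times faster."""
    return baseline_s / candidate_s
```

Timing the same dataset once with the old loader settings and once with the new ones, then calling `speedup(old_time, new_time)`, gives a figure comparable to the ratios quoted above. Taking the best of several passes reduces noise from disk caching and worker start-up.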

Closes #190

Any feedback is welcome!

codecov · 2021-07-06T19:26:17Z

Codecov Report

Merging #362 (cde22e5) into main (db1e034) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #362      +/-   ##
==========================================
+ Coverage   96.11%   96.15%   +0.03%     
==========================================
  Files          83       83              
  Lines        3426     3460      +34     
==========================================
+ Hits         3293     3327      +34     
  Misses        133      133

| Flag | Coverage Δ | |
|---|---|---|
| unittests | `96.15% <100.00%> (+0.03%)` | ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | |
|---|---|---|
| doctr/datasets/datasets/pytorch.py | `100.00% <100.00%> (ø)` | |
| doctr/models/_utils.py | `93.10% <0.00%> (-0.84%)` | ⬇️ |
| doctr/utils/geometry.py | `100.00% <0.00%> (ø)` | |
| doctr/transforms/functional/pytorch.py | `100.00% <0.00%> (ø)` | |
| doctr/transforms/functional/tensorflow.py | `100.00% <0.00%> (ø)` | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update db1e034...cde22e5.

charlesmindee · 2021-07-07T07:37:02Z

references/detection/train_pytorch.py

    torch.backends.cudnn.benchmark = True

    st = time.time()
    val_set = DetectionDataset(
        img_folder=os.path.join(args.data_path, 'val'),
        label_folder=os.path.join(args.data_path, 'val_labels'),
-        sample_transforms=Compose([
-            Lambda(lambda x: x / 255),


Why do you remove the normalization here?

The to_tensor in the dataset __getitem__ already does that :)
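
To illustrate why the extra division had to go: torchvision's `to_tensor` is documented to convert a uint8 image in [0, 255] to a float tensor in [0, 1]. The dataset's `__getitem__` is not shown in this thread, so the sketch below uses a NumPy stand-in (`to_float_chw`, a hypothetical helper) rather than the actual doctr code.

```python
import numpy as np


def to_float_chw(img_uint8: np.ndarray) -> np.ndarray:
    """NumPy stand-in for a to_tensor-style conversion: HWC uint8 -> CHW float32 in [0, 1]."""
    if img_uint8.dtype != np.uint8:
        raise TypeError("expected a uint8 image")
    return img_uint8.transpose(2, 0, 1).astype(np.float32) / 255


# One RGB pixel, shape (H=1, W=1, C=3)
img = np.array([[[0, 128, 255]]], dtype=np.uint8)

converted = to_float_chw(img)       # values already lie in [0, 1]
rescaled_twice = converted / 255    # what the removed Lambda would now do
```

Keeping `Lambda(lambda x: x / 255)` after such a conversion would squash every pixel into roughly [0, 0.004], which is why the transform is dropped from the training scripts.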

charlesmindee · 2021-07-07T07:37:10Z

references/detection/train_pytorch.py

@@ -137,7 +138,6 @@ def main(args):
        img_folder=os.path.join(args.data_path, 'train'),
        label_folder=os.path.join(args.data_path, 'train_labels'),
        sample_transforms=Compose([
-            Lambda(lambda x: x / 255),


cf. previous comment

fg-mindee added 3 commits July 6, 2021 20:47

feat: Improved image loading speed for PyTorch

0c7650c

feat: Added dynamic worker selection for PyTorch

79c8e46

refactor: Reflected changes from data loading

e32170b

fg-mindee added the type: enhancement, critical, module: datasets, ext: references, and framework: pytorch labels Jul 6, 2021

fg-mindee added this to the 0.3.1 milestone Jul 6, 2021

fg-mindee self-assigned this Jul 6, 2021

fg-mindee mentioned this pull request Jul 6, 2021

[datasets] Optimize dataloaders for faster iterations #190

Closed

chore: Updated PIL requirements

cde22e5

fg-mindee requested a review from charlesmindee July 6, 2021 19:18

charlesmindee reviewed Jul 7, 2021

fg-mindee requested a review from charlesmindee July 7, 2021 07:38

charlesmindee approved these changes Jul 7, 2021

fg-mindee merged commit e429ec0 into main Jul 7, 2021

fg-mindee deleted the loader-optim branch July 7, 2021 07:44
