PyTorch Multi30k

Patch for PyTorch's Multi30k machine translation dataset (WMT16).

As of August 16, 2023, `torchtext.datasets.Multi30k` tries to download its data from http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt, which remains unreachable.

To fix this, add these lines before using the Multi30k dataset:

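```python
from torchtext.datasets import multi30k

# Redirect each split's download URL to the mirror hosted in this repository.
multi30k.URL['train'] = 'https://raw.githubusercontent.com/tanjeffreyz/pytorch-multi30k/main/training.tar.gz'
multi30k.URL['valid'] = 'https://raw.githubusercontent.com/tanjeffreyz/pytorch-multi30k/main/validation.tar.gz'
multi30k.URL['test'] = 'https://raw.githubusercontent.com/tanjeffreyz/pytorch-multi30k/main/mmt16_task1_test.tar.gz'

# The mirrored test archive needs a different checksum. Despite the dict's name,
# this value is a 64-character (SHA-256-length) hex digest, not an MD5 hash.
multi30k.MD5['test'] = 'd914ec964e2c5f0534e5cdd3926cd2fe628d591dad9423c3ae953d93efdb27a6'
```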
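If you need the patch in several scripts, it can be wrapped in a small helper. The sketch below is illustrative only. The names `patch_multi30k`, `sha256_of_file`, `MIRROR`, `ARCHIVES`, and `TEST_SHA256` are hypothetical and not part of torchtext. The helper accepts any object exposing `URL` and `MD5` dicts, such as `torchtext.datasets.multi30k`. The checksum function, which uses only the standard library, lets you manually verify a downloaded test archive against the digest above.

```python
import hashlib
from typing import Any

# Hypothetical constants mirroring the values used in the patch above.
MIRROR = "https://raw.githubusercontent.com/tanjeffreyz/pytorch-multi30k/main"

# Archive file names on the mirror, keyed by torchtext split name.
ARCHIVES = {
    "train": "training.tar.gz",
    "valid": "validation.tar.gz",
    "test": "mmt16_task1_test.tar.gz",
}

# Expected digest of the mirrored test archive (same value as in the patch).
TEST_SHA256 = "d914ec964e2c5f0534e5cdd3926cd2fe628d591dad9423c3ae953d93efdb27a6"


def patch_multi30k(module: Any) -> None:
    """Point a multi30k-like module's URL dict at the mirror (hypothetical helper).

    `module` is assumed to expose dict attributes `URL` and `MD5`. As in the
    patch above, only the test checksum is replaced; train/valid are untouched.
    """
    for split, name in ARCHIVES.items():
        module.URL[split] = f"{MIRROR}/{name}"
    module.MD5["test"] = TEST_SHA256


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With torchtext installed, you would call `patch_multi30k(multi30k)` right after `from torchtext.datasets import multi30k` and before constructing `Multi30k(...)`. Comparing `sha256_of_file(...)` against `TEST_SHA256` is an optional manual check if a download fails its integrity check.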

Disclaimer: All rights belong to the authors of the original dataset under the original license.