Implementation of Show, Attend and Tell: Neural Image Caption Generation with Visual Attention.
The model was trained on the double-digit MNIST dataset obtained from here. To train the model from scratch, download the dataset (see below) and check this notebook.
# Download, unzip and move
cd show-attend-and-tell
gdown "https://drive.google.com/uc?id=1NMLh34zDjrI-bOIK6jgLJAqRrUY3uETC"
unzip double_mnist.zip
mv labels.csv data/labels.csv
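
After the steps above, it can help to confirm that the labels file ended up where the commands put it. The following is a minimal sketch, not part of the repository: the helper name `count_label_rows` is hypothetical, and it assumes only that `data/labels.csv` is a regular CSV file. It makes no assumption about the column layout or about whether a header row is present.

```python
import csv
from pathlib import Path


def count_label_rows(path="data/labels.csv"):
    """Return the number of non-empty rows in the labels CSV.

    Hypothetical helper for a quick sanity check. Any header row,
    if the file has one, is counted like every other row.
    """
    labels_path = Path(path)
    if not labels_path.is_file():
        raise FileNotFoundError(
            f"{labels_path} not found; re-run the download and move steps"
        )
    # newline="" is the csv module's recommended way to open CSV files.
    with labels_path.open(newline="") as handle:
        # csv.reader yields an empty list for blank lines; skip those.
        return sum(1 for row in csv.reader(handle) if row)
```

Calling `count_label_rows()` from inside `show-attend-and-tell` should return a positive count if the move succeeded, and raise `FileNotFoundError` otherwise.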