[feat] Add dataset index and scripts to support mim download dataset
update setup.py

update manifest

add shebang for script

rename variable

add document about mim download dataset

add more dataset

add doc for kinetics
cir7 committed Jun 16, 2023
1 parent 1dc3a9a commit 32d2e61
Showing 14 changed files with 209 additions and 41 deletions.
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -1,3 +1,4 @@
include mmaction/.mim/model-index.yml
include mmaction/.mim/dataset-index.yml
recursive-include mmaction/.mim/configs *.py *.yml
recursive-include mmaction/.mim/tools *.sh *.py
39 changes: 39 additions & 0 deletions dataset-index.yml
@@ -0,0 +1,39 @@
kinetics400:
  dataset: Kinetics-400
  download_root: data
  data_root: data/kinetics400
  script: tools/data/kinetics/preprocess_k400.sh

kinetics600:
  dataset: Kinetics600
  download_root: data
  data_root: data/kinetics600
  script: tools/data/kinetics/preprocess_k600.sh

kinetics700:
  dataset: Kinetics_700
  download_root: data
  data_root: data/kinetics700
  script: tools/data/kinetics/preprocess_k700.sh

sthv2:
  dataset: sthv2
  download_root: data
  data_root: data/sthv2
  script: tools/data/sthv2/preprocess.sh

ucf-101:
  dataset: UCF101
  download_root: data
  data_root: data/ucf101

finegym:
  dataset: FineGym
  download_root: data
  data_root: data/gym

diving48:
  dataset: diving48
  download_root: data
  data_root: data/diving48
  script: tools/data/diving48/preprocess.sh
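
As a rough illustration of how an entry in this index might be consumed, the sketch below looks up a dataset by key and assembles a preprocessing command. The dict copies two entries from the file so that no YAML parser is needed. `build_preprocess_command` is a hypothetical helper, and passing `download_root` and `data_root` as the two positional arguments is an assumption based on the `$1`/`$2` usage in the preprocess scripts of this commit, not a description of MIM's actual implementation.

```python
from typing import List, Optional

# Two entries copied from dataset-index.yml, written as a plain dict so the
# sketch does not depend on a YAML library.
DATASET_INDEX = {
    "sthv2": {
        "dataset": "sthv2",
        "download_root": "data",
        "data_root": "data/sthv2",
        "script": "tools/data/sthv2/preprocess.sh",
    },
    "diving48": {
        "dataset": "diving48",
        "download_root": "data",
        "data_root": "data/diving48",
        "script": "tools/data/diving48/preprocess.sh",
    },
}


def build_preprocess_command(index: dict, name: str) -> Optional[List[str]]:
    """Hypothetical helper: return the preprocess command, or None if unneeded."""
    entry = index[name]  # raises KeyError for an unknown dataset key
    script = entry.get("script")
    if script is None:
        # Entries without a `script` key have no preprocessing step.
        return None
    # Assumption: the script receives download_root as $1 and data_root as $2.
    return ["bash", script, entry["download_root"], entry["data_root"]]


print(build_preprocess_command(DATASET_INDEX, "diving48"))
```

Returning `None` for entries without a `script` key mirrors the `ucf-101` and `finegym` entries, which only declare where the data lives.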
9 changes: 7 additions & 2 deletions docs/en/conf.py
@@ -42,8 +42,13 @@ def get_version():
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    'sphinx.ext.autodoc', 'sphinx.ext.napoleon', 'sphinx.ext.viewcode',
    'sphinx_markdown_tables', 'sphinx_copybutton', 'myst_parser'
    'sphinx.ext.autodoc',
    'sphinx.ext.napoleon',
    'sphinx.ext.viewcode',
    'sphinx_markdown_tables',
    'sphinx_copybutton',
    'myst_parser',
    'sphinx_tabs.tabs',
]

# numpy and torch are required
1 change: 1 addition & 0 deletions requirements/docs.txt
@@ -5,6 +5,7 @@ opencv-python
-e git+https://github.com/open-mmlab/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme
scipy
sphinx==4.0.2
sphinx-tabs
sphinx_copybutton
sphinx_markdown_tables
sphinx_rtd_theme==0.5.2
2 changes: 1 addition & 1 deletion setup.py
@@ -119,7 +119,7 @@ def add_mim_extension():
    else:
        return

    filenames = ['tools', 'configs', 'model-index.yml']
    filenames = ['tools', 'configs', 'model-index.yml', 'dataset-index.yml']
    repo_path = osp.dirname(__file__)
    mim_path = osp.join(repo_path, 'mmaction', '.mim')
    os.makedirs(mim_path, exist_ok=True)
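
The hunk only shows the list of names and the creation of the `.mim` directory. A minimal sketch of what staging those names could look like follows. `stage_mim_files` is a hypothetical function, and copying is an assumption: the real `add_mim_extension` is not shown in full here and may link files instead.

```python
import os
import os.path as osp
import shutil
import tempfile

FILENAMES = ["tools", "configs", "model-index.yml", "dataset-index.yml"]


def stage_mim_files(repo_path: str, filenames=FILENAMES) -> list:
    """Hypothetical helper: copy each existing entry into <repo>/mmaction/.mim."""
    mim_path = osp.join(repo_path, "mmaction", ".mim")
    os.makedirs(mim_path, exist_ok=True)
    staged = []
    for name in filenames:
        src = osp.join(repo_path, name)
        dst = osp.join(mim_path, name)
        if not osp.exists(src):
            continue  # skip entries that are missing from this checkout
        if osp.isdir(src):
            # dirs_exist_ok needs Python 3.8+; it lets the copy be re-run.
            shutil.copytree(src, dst, dirs_exist_ok=True)
        else:
            shutil.copyfile(src, dst)
        staged.append(name)
    return staged


# Demo on a throwaway directory that only contains two of the four entries.
with tempfile.TemporaryDirectory() as repo:
    os.makedirs(osp.join(repo, "tools"))
    with open(osp.join(repo, "dataset-index.yml"), "w") as f:
        f.write("diving48:\n  dataset: diving48\n")
    print(stage_mim_files(repo))
```

The point of the one-line change in this hunk is that `dataset-index.yml` must be in the list, otherwise it never reaches the packaged `.mim` folder that `MANIFEST.in` includes.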
28 changes: 24 additions & 4 deletions tools/data/diving48/README.md
@@ -15,11 +15,28 @@
```

For basic dataset information, you can refer to the official dataset [website](http://www.svcl.ucsd.edu/projects/resound/dataset.html).
Before you start, make sure the current directory is `$MMACTION2/tools/data/diving48/`.

`````{tabs}
````{group-tab} Download by MIM
MIM supports downloading the Diving48 dataset from OpenDataLab and preprocessing it with a single command.
```Bash
# install OpenDataLab CLI tools
pip install -U opendatalab
# log in OpenDataLab
odl login
# download and preprocess by MIM
mim download mmaction2 --dataset diving48
```
````
````{group-tab} Download from Official Source
## Step 1. Prepare Annotations
You can run the following script to download annotations (to ensure the annotation files are correct, only the V2 version is downloaded here).
Before you start, make sure the current directory is `$MMACTION2/tools/data/diving48/`.
```shell
bash download_annotations.sh
@@ -81,7 +98,10 @@ bash generate_videos_filelist.sh
bash generate_rawframes_filelist.sh
```
## Step 5. Check Directory Structure
````
`````

### Check Directory Structure

After completing the whole Diving48 preparation process,
you will have the raw frames (RGB + Flow), videos and annotation files for Diving48.
@@ -97,15 +117,15 @@ mmaction2
│ ├── diving48
│ │ ├── diving48_{train,val}_list_rawframes.txt
│ │ ├── diving48_{train,val}_list_videos.txt
│ │ ├── annotations
│ │ ├── annotations (optional)
│ | | ├── Diving48_V2_train.json
│ | | ├── Diving48_V2_test.json
│ | | ├── Diving48_vocab.json
│ | ├── videos
│ | | ├── _8Vy3dlHg2w_00000.mp4
│ | | ├── _8Vy3dlHg2w_00001.mp4
│ | | ├── ...
│ | ├── rawframes
│ | ├── rawframes (optional)
│ | | ├── 2x00lRzlTVQ_00000
│ | | | ├── img_00001.jpg
│ | | | ├── img_00002.jpg
28 changes: 23 additions & 5 deletions tools/data/diving48/README_zh-CN.md
@@ -15,11 +15,26 @@
```

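For basic dataset information, you can refer to the official dataset [website](http://www.svcl.ucsd.edu/projects/resound/dataset.html).
Before preparing the dataset, make sure the current working directory is `$MMACTION2/tools/data/diving48/`.

`````{tabs}
````{group-tab} Download by MIM
MIM supports downloading the Diving48 dataset. With a single command, users can download it from OpenDataLab and preprocess it.
```Bash
# install OpenDataLab CLI tools
pip install -U opendatalab
# log in to OpenDataLab
odl login
# download and preprocess the dataset by MIM. Note that this might take a long time.
mim download mmaction2 --dataset diving48
```
````
````{group-tab} Download from Official Source
## Step 1. Download Annotations
Users can run the following command to download the annotation files (for the sake of annotation accuracy, only the V2 version is downloaded here).
Users can run the following command to download the annotation files (for the sake of annotation accuracy, only the V2 version is downloaded here). Before preparing the dataset, make sure the current working directory is `$MMACTION2/tools/data/diving48/`.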
```shell
bash download_annotations.sh
Expand Down Expand Up @@ -81,7 +96,10 @@ bash generate_videos_filelist.sh
bash generate_rawframes_filelist.sh
```
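## Step 5. Check Directory Structure
````
`````

### Check Directory Structure

After completing the whole Diving48 preparation process,
users will have the RGB and optical flow frames, the video files and the annotation files.
@@ -97,15 +115,15 @@ mmaction2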
│ ├── diving48
│ │ ├── diving48_{train,val}_list_rawframes.txt
│ │ ├── diving48_{train,val}_list_videos.txt
│ │ ├── annotations
│ │ ├── annotations (optional)
│ | | ├── Diving48_V2_train.json
│ | | ├── Diving48_V2_test.json
│ | | ├── Diving48_vocab.json
│ | ├── videos
│ | | ├── _8Vy3dlHg2w_00000.mp4
│ | | ├── _8Vy3dlHg2w_00001.mp4
│ | | ├── ...
│ | ├── rawframes
│ | ├── rawframes (optional)
│ | | ├── 2x00lRzlTVQ_00000
│ | | | ├── img_00001.jpg
│ | | | ├── img_00002.jpg
8 changes: 8 additions & 0 deletions tools/data/diving48/preprocess.sh
@@ -0,0 +1,8 @@
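#!/usr/bin/env bash

# Usage: preprocess.sh DOWNLOAD_DIR DATA_ROOT
DOWNLOAD_DIR=$1
DATA_ROOT=$2

# Join the split archive parts and unpack them into the parent directory of DATA_ROOT.
cat "$DOWNLOAD_DIR"/diving48/raw/*.tar.gz.* | tar -xvz -C "$(dirname "$DATA_ROOT")"
# Unpack the inner tarball, then delete it to free disk space.
tar -xvf "$DATA_ROOT/diving48.tar" -C "$(dirname "$DATA_ROOT")"
rm "$DATA_ROOT/diving48.tar"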
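
For readers less familiar with the `cat ... | tar -xz` idiom, here is a small Python sketch of the same idea: split `.tar.gz` parts are joined in sorted order and the result is extracted as one gzip-compressed tar. `extract_split_targz` is a hypothetical name, and spooling the joined stream to a temporary file is a choice made for this sketch, not something the script does.

```python
import os
import shutil
import tarfile
import tempfile
from glob import glob


def extract_split_targz(parts_pattern: str, dest_dir: str) -> None:
    """Join split .tar.gz parts (sorted by name) and extract them into dest_dir."""
    parts = sorted(glob(parts_pattern))
    if not parts:
        raise FileNotFoundError(f"no archive parts match {parts_pattern!r}")
    # Spool the joined stream to a temporary file on disk instead of RAM.
    with tempfile.TemporaryFile() as joined:
        for part in parts:
            with open(part, "rb") as chunk:
                shutil.copyfileobj(chunk, joined)
        joined.seek(0)
        with tarfile.open(fileobj=joined, mode="r:gz") as archive:
            archive.extractall(dest_dir)


# Demo: build a tiny archive, split it in two, then reassemble and extract it.
with tempfile.TemporaryDirectory() as tmp:
    payload = os.path.join(tmp, "hello.txt")
    with open(payload, "w") as f:
        f.write("hi")
    full = os.path.join(tmp, "demo.tar.gz")
    with tarfile.open(full, "w:gz") as archive:
        archive.add(payload, arcname="demo/hello.txt")
    with open(full, "rb") as f:
        data = f.read()
    half = len(data) // 2
    for i, piece in enumerate((data[:half], data[half:])):
        with open(f"{full}.{i:02d}", "wb") as f:
            f.write(piece)
    out_dir = os.path.join(tmp, "out")
    os.makedirs(out_dir)
    extract_split_targz(full + ".*", out_dir)
    print(os.listdir(os.path.join(out_dir, "demo")))
```

The parts only form a valid archive when joined in the right order, which is why both the shell glob and the sketch rely on lexicographically sortable suffixes.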
49 changes: 35 additions & 14 deletions tools/data/kinetics/README.md
@@ -15,8 +15,7 @@
}
```

For basic dataset information, please refer to the official [website](https://deepmind.com/research/open-source/open-source-datasets/kinetics/). The scripts can be used to prepare kinetics400, kinetics600 and kinetics700. To prepare a different version of Kinetics, replace `${DATASET}` in the following examples with the specific dataset name. The available dataset names are `kinetics400`, `kinetics600` and `kinetics700`.
Before you start, make sure the current directory is `$MMACTION2/tools/data/${DATASET}/`.
For basic dataset information, please refer to the official [website](https://deepmind.com/research/open-source/open-source-datasets/kinetics/).

:::{note}
Because some YouTube links have expired, the sizes of Kinetics dataset copies may differ. Here are the sizes of the Kinetics dataset copies that we used to train all checkpoints.
@@ -29,8 +28,36 @@ Because of the expirations of some YouTube links, the sizes of kinetics dataset

:::

`````{tabs}
````{group-tab} Download by MIM
:::{note}
All Kinetics experiments in MMAction2 are based on this version, so we recommend that users use it.
:::
MIM supports downloading the Kinetics-400/600/700 datasets from OpenDataLab and preprocessing them with a single command.
```Bash
# install OpenDataLab CLI tools
pip install -U opendatalab
# log in OpenDataLab
odl login
# download and preprocess Kinetics-400 by MIM. Note that this might take a long time.
mim download mmaction2 --dataset kinetics400
# download and preprocess Kinetics-600 by MIM. Note that this might take a long time.
mim download mmaction2 --dataset kinetics600
# download and preprocess Kinetics-700 by MIM. Note that this might take a long time.
mim download mmaction2 --dataset kinetics700
```
````
````{group-tab} Download from Official Source
## Step 1. Prepare Annotations
The scripts can be used to prepare kinetics400, kinetics600 and kinetics700. To prepare a different version of Kinetics, replace `${DATASET}` in the following examples with the specific dataset name. The available dataset names are `kinetics400`, `kinetics600` and `kinetics700`.
Before you start, make sure the current directory is `$MMACTION2/tools/data/${DATASET}/`.
First, run the following script to prepare annotations by downloading them from the official [website](https://deepmind.com/research/open-source/open-source-datasets/kinetics/).
```shell
@@ -48,15 +75,6 @@ bash download_backup_annotations.sh ${DATASET}
## Step 2. Prepare Videos
### Option 1: Download from OpenDataLab

**Recommended**: [OpenDataLab](https://opendatalab.com/) provides the Kinetics dataset ([Kinetics400](https://opendatalab.com/Kinetics-400), [Kinetics600](https://opendatalab.com/Kinetics600), [Kinetics700](https://opendatalab.com/Kinetics_700)). Users can download the Kinetics dataset with a short edge of 320 pixels from there.

:::{note}
All experiments on Kinetics in MMAction2 are based on this version, we recommend users to try this version.

### Option 2: Download from Other Source

You can run the following script to prepare videos.
The code is adapted from the [official crawler](https://github.com/activitynet/ActivityNet/tree/master/Crawler/Kinetics). Note that this might take a long time.
@@ -126,7 +144,10 @@ bash generate_videos_filelist.sh ${DATASET}
bash generate_rawframes_filelist.sh ${DATASET}
```
## Step 5. Folder Structure
````
`````

### Folder Structure

After completing the whole Kinetics data preparation pipeline,
you will have the raw frames (RGB + Flow), videos and annotation files for Kinetics.
@@ -153,8 +174,8 @@ mmaction2
│ │ │ ├── wrapping_present
│ │ │ ├── ...
│ │ │ ├── zumba
│ │ ├── rawframes_train
│ │ ├── rawframes_val
│ │ ├── rawframes_train (optional)
│ │ ├── rawframes_val (optional)
```

50 changes: 35 additions & 15 deletions tools/data/kinetics/README_zh-CN.md
@@ -15,8 +15,7 @@
}
```

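For basic dataset information, please refer to the [official website](https://deepmind.com/research/open-source/open-source-datasets/kinetics/). These scripts can be used to prepare kinetics400, kinetics600 and kinetics700. To prepare a different version of Kinetics, set `${DATASET}` in the scripts to the corresponding dataset name; the options are `kinetics400`, `kinetics600` and `kinetics700`.
Before starting, make sure the current directory is `$MMACTION2/tools/data/${DATASET}/`.
For basic dataset information, please refer to the [official website](https://deepmind.com/research/open-source/open-source-datasets/kinetics/).

:::{note}
Because some YouTube links have expired, the size of a crawled Kinetics dataset may differ from the original. The sizes of the Kinetics datasets we used are listed below:
@@ -26,9 +25,36 @@
| Kinetics400 | 240436 | 19796 |
| Kinetics600 | 383393 | 27910 |
| Kinetics700 | 542357 | 34824 |
:::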

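`````{tabs}
````{group-tab} Download by MIM
:::{note}
All Kinetics results reported in the MMAction2 repository were obtained with this version of the data. We recommend that users run their experiments with this version of the Kinetics dataset.
:::
MIM supports downloading the Kinetics-400/600/700 datasets. With a single command, users can download them from OpenDataLab and preprocess them.
```Bash
# install OpenDataLab CLI tools
pip install -U opendatalab
# log in to OpenDataLab
odl login
# download and preprocess Kinetics-400 by MIM. Note that this might take a long time.
mim download mmaction2 --dataset kinetics400
# download and preprocess Kinetics-600 by MIM. Note that this might take a long time.
mim download mmaction2 --dataset kinetics600
# download and preprocess Kinetics-700 by MIM. Note that this might take a long time.
mim download mmaction2 --dataset kinetics700
```
````
````{group-tab} Download from Official Source
## 1. Prepare Annotations
These scripts can be used to prepare kinetics400, kinetics600 and kinetics700. To prepare a different version of Kinetics, set `${DATASET}` in the scripts to the corresponding dataset name; the options are `kinetics400`, `kinetics600` and `kinetics700`.
Before starting, make sure the current directory is `$MMACTION2/tools/data/${DATASET}/`.
First, users can run the following script to download the annotation files from the [official Kinetics website](https://deepmind.com/research/open-source/open-source-datasets/kinetics/) and preprocess them: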
```shell
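@@ -45,15 +71,6 @@ bash download_backup_annotations.sh ${DATASET}
## 2. Prepare Videos
### Option 1: Download from OpenDataLab

**Recommended**: [OpenDataLab](https://opendatalab.com/) provides the Kinetics datasets ([Kinetics400](https://opendatalab.com/Kinetics-400), [Kinetics600](https://opendatalab.com/Kinetics600), [Kinetics700](https://opendatalab.com/Kinetics_700)). Users can download the Kinetics datasets with a short edge of 320 pixels from there.

:::{note}
All Kinetics results reported in the MMAction2 repository were obtained with this version of the data. We recommend that users run their experiments with this version of the Kinetics dataset.

### Option 2: Download from Other Sources

Users can run the following script to prepare the videos. The code is adapted from the [official crawler](https://github.com/activitynet/ActivityNet/tree/master/Crawler/Kinetics). Note that this step might take a long time.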
```shell
@@ -121,7 +138,10 @@ bash generate_videos_filelist.sh ${DATASET}
bash generate_rawframes_filelist.sh ${DATASET}
```
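## 5. Folder Structure
````
`````

### Folder Structure

After the whole Kinetics data processing pipeline is finished, you will have the frame folders (RGB and optical flow frames), the videos and the annotation files.

@@ -136,7 +156,7 @@ mmaction2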
│ ├── ${DATASET}
│ │ ├── ${DATASET}_train_list_videos.txt
│ │ ├── ${DATASET}_val_list_videos.txt
│ │ ├── annotations
│ │ ├── annotations (optional)
│ │ ├── videos_train
│ │ ├── videos_val
│ │ │ ├── abseiling
@@ -146,8 +166,8 @@ mmaction2
│ │ │ ├── wrapping_present
│ │ │ ├── ...
│ │ │ ├── zumba
│ │ ├── rawframes_train
│ │ ├── rawframes_val
│ │ ├── rawframes_train (optional)
│ │ ├── rawframes_val (optional)
```

9 changes: 9 additions & 0 deletions tools/data/kinetics/preprocess_k400.sh
@@ -0,0 +1,9 @@
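#!/usr/bin/env bash

# Print each command as it runs.
set -x

# Usage: preprocess_k400.sh DOWNLOAD_DIR DATA_ROOT
DOWNLOAD_DIR=$1
DATA_ROOT=$2

# Join the split archive parts and unpack them into the parent directory of DATA_ROOT.
cat "$DOWNLOAD_DIR"/Kinetics-400/raw/*.tar.gz* | tar -xvz -C "$(dirname "$DATA_ROOT")"
# Rename the extracted folder to the expected data root.
mv "$(dirname "$DATA_ROOT")/Kinetics-400" "$DATA_ROOT"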
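
The final `mv` renames the extracted `Kinetics-400` folder to the configured data root. A minimal Python sketch of that step is shown below. `move_into_data_root` is a hypothetical helper, and refusing to overwrite an existing target is a choice made for this sketch: plain `mv` would instead place the folder inside an existing directory.

```python
import os
import shutil
import tempfile


def move_into_data_root(extracted_name: str, data_root: str) -> str:
    """Rename <parent of data_root>/<extracted_name> to data_root."""
    parent = os.path.dirname(data_root)
    extracted = os.path.join(parent, extracted_name)
    if os.path.exists(data_root):
        # Fail loudly rather than nesting the folder inside an existing one.
        raise FileExistsError(f"{data_root} already exists")
    shutil.move(extracted, data_root)
    return data_root


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "data", "Kinetics-400", "videos_train"))
    target = move_into_data_root(
        "Kinetics-400", os.path.join(tmp, "data", "kinetics400"))
    print(sorted(os.listdir(target)))
```

Extracting into the parent of the data root and renaming afterwards lets the index use a lowercase `data/kinetics400` path while the archive keeps its own top-level folder name.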