From 603b5532f76fc439d9731ad2d498db47d8f1b7e3 Mon Sep 17 00:00:00 2001
From: stu1130
Date: Wed, 19 Sep 2018 15:47:53 -0700
Subject: [PATCH 1/3] update the example data format and link to each others

---
 docs/faq/recordio.md         | 29 +++++++++++++++++------------
 docs/tutorials/basic/data.md |  2 ++
 2 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/docs/faq/recordio.md b/docs/faq/recordio.md
index 10ab6c71d209..bcde7fc0fe49 100644
--- a/docs/faq/recordio.md
+++ b/docs/faq/recordio.md
@@ -6,35 +6,40 @@ RecordIO implements a file format for a sequence of records. We recommend storin
 * Packing data together allows continuous reading on the disk.
 * RecordIO has a simple way to partition, simplifying distributed setting. We provide an example later.

-We provide the [im2rec tool](https://github.com/dmlc/mxnet/blob/master/tools/im2rec.cc) so you can create an Image RecordIO dataset by yourself. The following walkthrough shows you how.
+We provide the [im2rec tool](https://github.com/dmlc/mxnet/blob/master/tools/im2rec.cc) so you can create an Image RecordIO dataset by yourself. The following walkthrough shows you how. Note that there is python version of [im2rec tool](https://github.com/apache/incubator-mxnet/blob/master/tools/im2rec.py) and [example](https://mxnet.incubator.apache.org/tutorials/basic/data.html) using real-world data.

 ### Prerequisites
+
 Download the data. You don't need to resize the images manually. You can use ```im2rec``` to resize them automatically. For details, see the "Extension: Using Multiple Labels for a Single Image," later in this topic.

 ### Step 1. Make an Image List File
+
+* Note that the im2rec.py provide a param `--list` to generate the list for you but im2rec.cc don't support it.
+
 After you download the data, you need to make an image list file. The format is:

 ```
 integer_image_index \t label_index \t path_to_image
 ```
 Typically, the program takes the list of names of all of the images, shuffles them, then separates them into two lists: a training filename list and a testing filename list. Write the list in the right format.
-
+You can
 This is an example file:

 ```bash
-95099 464 n04467665_17283.JPEG
-10025081 412 ILSVRC2010_val_00025082.JPEG
-74181 789 n01915811_2739.JPEG
-10035553 859 ILSVRC2010_val_00035554.JPEG
-10048727 929 ILSVRC2010_val_00048728.JPEG
-94028 924 n01980166_4956.JPEG
-1080682 650 n11807979_571.JPEG
-972457 633 n07723039_1627.JPEG
-7534 11 n01630670_4486.JPEG
-1191261 249 n12407079_5106.JPEG
+95099 464.000000 n04467665_17283.JPEG
+10025081 412.000000 ILSVRC2010_val_00025082.JPEG
+74181 789.000000 n01915811_2739.JPEG
+10035553 859.000000 ILSVRC2010_val_00035554.JPEG
+10048727 929.000000 ILSVRC2010_val_00048728.JPEG
+94028 924.000000 n01980166_4956.JPEG
+1080682 650.000000 n11807979_571.JPEG
+972457 633.000000 n07723039_1627.JPEG
+7534 11.000000 n01630670_4486.JPEG
+1191261 249.000000 n12407079_5106.JPEG
 ```

 ### Step 2. Create the Binary File
+
 To generate a binary image, use `im2rec` in the tool folder. `im2rec` takes the path of the `_image list file_` you generated, the `_root path_` of the images, and the `_output file path_` as input. This process usually takes several hours, so be patient.

 Sample command:
diff --git a/docs/tutorials/basic/data.md b/docs/tutorials/basic/data.md
index 0a5dd59c1ce1..b5d0884f7490 100644
--- a/docs/tutorials/basic/data.md
+++ b/docs/tutorials/basic/data.md
@@ -315,6 +315,8 @@ print(mx.recordio.unpack_img(s))
 You can also convert raw images into *RecordIO* format using the ``im2rec.py`` utility script that is provided in the MXNet [src/tools](https://github.com/dmlc/mxnet/tree/master/tools) folder.
 An example of how to use the script for converting to *RecordIO* format is shown in the `Image IO` section below.

+* Note that there is a C++ version of [im2rec](https://github.com/dmlc/mxnet/blob/master/tools/im2rec.cc), please refer to [here](https://mxnet.incubator.apache.org/faq/recordio.html) for more information.
+
 ## Image IO

 In this section, we will learn how to preprocess and load image data in MXNet.

From faf499447522554e6f59a0992f5592c0acb63367 Mon Sep 17 00:00:00 2001
From: stu1130
Date: Wed, 19 Sep 2018 16:04:29 -0700
Subject: [PATCH 2/3] fix wording

---
 docs/faq/recordio.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/faq/recordio.md b/docs/faq/recordio.md
index bcde7fc0fe49..ae6ca14e61da 100644
--- a/docs/faq/recordio.md
+++ b/docs/faq/recordio.md
@@ -14,7 +14,7 @@ Download the data. You don't need to resize the images manually. You can use ```

 ### Step 1. Make an Image List File

-* Note that the im2rec.py provide a param `--list` to generate the list for you but im2rec.cc don't support it.
+* Note that the im2rec.py provide a param `--list` to generate the list for you but im2rec.cc doesn't support it.

 After you download the data, you need to make an image list file. The format is:


From 540917d741dffcb0e107af114295a01bead73a02 Mon Sep 17 00:00:00 2001
From: stu1130
Date: Wed, 19 Sep 2018 16:08:39 -0700
Subject: [PATCH 3/3] delete the typo

---
 docs/faq/recordio.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/faq/recordio.md b/docs/faq/recordio.md
index ae6ca14e61da..f61571882bd7 100644
--- a/docs/faq/recordio.md
+++ b/docs/faq/recordio.md
@@ -22,7 +22,6 @@ After you download the data, you need to make an image list file. The format is
 integer_image_index \t label_index \t path_to_image
 ```
 Typically, the program takes the list of names of all of the images, shuffles them, then separates them into two lists: a training filename list and a testing filename list. Write the list in the right format.
-You can
 This is an example file:

 ```bash
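
Note for readers of this series: the list-file layout documented in the patch (`integer_image_index \t label_index \t path_to_image`, with labels written like `464.000000`) is simple enough to produce by hand when `im2rec.py --list` is not an option. The following is only a minimal sketch of the "shuffle, then split into training and testing lists" step described in the doc text. The helper names `make_list_lines` and `split_train_test`, the sequential post-shuffle index, the sample paths, and the 0.8 split ratio are illustrative assumptions, not part of MXNet or of this change.

```python
import random


def make_list_lines(items, seed=0):
    """Build list-file lines from (relative_path, label) pairs.

    Assumption: the index is simply the position after shuffling;
    the real im2rec tools may assign indices differently.
    """
    rng = random.Random(seed)  # fixed seed so the shuffle is reproducible
    items = list(items)
    rng.shuffle(items)
    # One record per line: index <TAB> label (six decimals) <TAB> path
    return [
        f"{idx}\t{float(label):.6f}\t{path}"
        for idx, (path, label) in enumerate(items)
    ]


def split_train_test(lines, train_ratio=0.8):
    """Split the shuffled lines into a training list and a testing list."""
    cut = int(len(lines) * train_ratio)
    return lines[:cut], lines[cut:]


if __name__ == "__main__":
    # Hypothetical sample data; replace with your own paths and labels.
    samples = [
        ("n04467665_17283.JPEG", 464),
        ("n01915811_2739.JPEG", 789),
        ("n01980166_4956.JPEG", 924),
        ("n11807979_571.JPEG", 650),
        ("n07723039_1627.JPEG", 633),
    ]
    train, test = split_train_test(make_list_lines(samples))
    print("train:", *train, sep="\n")
    print("test:", *test, sep="\n")
```

Writing each returned list to its own file, one line per record, gives the two list files the walkthrough expects. Whether `im2rec.cc` accepts exactly this index scheme has not been verified here.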