Gh 798 add csv reader #826

alanakbik · 2019-06-21T10:20:42Z

This PR adds another way to load classification datasets: from files in CSV format. You need to pass a column name map (similar to the column format in ColumnCorpus) that indicates which column(s) in the CSV hold the text and which hold the label(s).

from flair.datasets import CSVClassificationCorpus

corpus = CSVClassificationCorpus(
    # path to the data folder containing train / test / dev files
    data_folder='path/to/data',
    # maps column index to its role: text column(s) and label column(s)
    column_name_map={4: "text", 1: "label_topic", 2: "label_subtopic"},
    # if the CSV has a header row, skip it
    skip_header=True)
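
To make the role of the column name map concrete, here is a minimal standalone sketch that uses only the Python standard library. It is not the code from this PR: the helper `parse_rows`, the sample CSV, and the rule that names starting with "text" mark text columns while names starting with "label" mark label columns are all assumptions made for illustration.

```python
import csv
import io


def parse_rows(csv_text, column_name_map, skip_header=False):
    """Return a list of (text, labels) pairs read from CSV text.

    Hypothetical helper for illustration only, not flair's implementation.
    """
    # Assumed convention: names starting with "text" are text columns,
    # names starting with "label" are label columns.
    text_cols = sorted(i for i, name in column_name_map.items()
                       if name.startswith("text"))
    label_cols = sorted(i for i, name in column_name_map.items()
                        if name.startswith("label"))

    reader = csv.reader(io.StringIO(csv_text))
    if skip_header:
        next(reader, None)  # drop the header row if there is one

    rows = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        # Several text columns are joined into one string.
        text = " ".join(row[i] for i in text_cols)
        # Empty label cells are skipped.
        labels = [row[i] for i in label_cols if row[i]]
        rows.append((text, labels))
    return rows


# Made-up sample data with the same column layout as the example above.
sample = (
    "id,topic,subtopic,date,body\n"
    "1,sports,tennis,2019-06-21,Federer wins again\n"
)
column_name_map = {4: "text", 1: "label_topic", 2: "label_subtopic"}
print(parse_rows(sample, column_name_map, skip_header=True))
```

With this layout, column 4 supplies the text and columns 1 and 2 each contribute one label per row, so a single row can carry several labels.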

closes #798

alanakbik · 2019-06-21T10:57:59Z

👍

yosipk · 2019-06-21T11:49:02Z

👍

aakbik added 7 commits June 13, 2019 19:36

GH-798: add CSV classification dataset support

7c76f40

GH-798: add CSV classification dataset support

f804c15

Merge branch 'GH-798-add-csv-reader' of github.com:zalandoresearch/flair into GH-798-add-csv-reader

e167a00

focus

b040903

GH-798: CSV text classification dataset

53fc7ba

remove focal loss from this PR

750ad75

GH-798: add skip_header and multi-text field support

c3d4dff

alanakbik merged commit 3e781d6 into master Jun 21, 2019

alanakbik deleted the GH-798-add-csv-reader branch June 21, 2019 11:51
