Gh 798 add csv reader #826

Merged
alanakbik merged 7 commits into master from GH-798-add-csv-reader
Jun 21, 2019

Conversation

alanakbik
Collaborator

@alanakbik alanakbik commented Jun 21, 2019

This PR adds another way to load classification datasets: from files in CSV format. You pass a column name map (similar to the column format in ColumnCorpus) that indicates which column(s) of the CSV hold the text and which hold the label(s).

from flair.datasets import CSVClassificationCorpus

corpus = CSVClassificationCorpus(
    # path to the data folder containing the train / test / dev files
    data_folder='path/to/data',
    # maps zero-based column indices to their role: text or label
    column_name_map={4: "text", 1: "label_topic", 2: "label_subtopic"},
    # if the CSV has a header row, skip it
    skip_header=True)
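
To make the column mapping concrete, here is a minimal standalone sketch of how such a map can be applied to CSV rows. It uses only the Python standard library and is not the code added in this PR. The function name `read_classification_rows`, the sample data, and the rule that any column whose name starts with "label" counts as a label are assumptions made for illustration.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple


def read_classification_rows(
    lines: Iterable[str],
    column_name_map: Dict[int, str],
    skip_header: bool = False,
) -> List[Tuple[str, List[str]]]:
    """Hypothetical helper: turn CSV rows into (text, labels) pairs."""
    # Assumption: names starting with "text" are text columns,
    # names starting with "label" are label columns.
    text_columns = sorted(
        i for i, name in column_name_map.items() if name.startswith("text")
    )
    label_columns = sorted(
        i for i, name in column_name_map.items() if name.startswith("label")
    )

    reader = csv.reader(lines)
    if skip_header:
        next(reader, None)  # drop the header row if there is one

    examples = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        text = " ".join(row[i] for i in text_columns)
        labels = [row[i] for i in label_columns if row[i]]
        examples.append((text, labels))
    return examples


# Made-up sample data with the same layout as the column map in this PR.
sample = io.StringIO(
    "id,topic,subtopic,source,text\n"
    "1,sports,football,web,The match ended in a draw.\n"
    "2,politics,elections,web,Turnout was high this year.\n"
)
examples = read_classification_rows(
    sample,
    column_name_map={4: "text", 1: "label_topic", 2: "label_subtopic"},
    skip_header=True,
)
for text, labels in examples:
    print(labels, text)
```

Column indices are zero-based here, so column 4 is the fifth field of each row. Each example carries one label per mapped label column.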

closes #798

@alanakbik
Collaborator Author

👍

@yosipk
Collaborator

yosipk commented Jun 21, 2019

👍

@alanakbik alanakbik merged commit 3e781d6 into master Jun 21, 2019
@alanakbik alanakbik deleted the GH-798-add-csv-reader branch June 21, 2019 11:51
Development

Successfully merging this pull request may close these issues.

Corpus reader for CSV text classification files
2 participants