Add support for biomedical NER datasets #510

Closed
stefan-it opened this issue Feb 15, 2019 · 7 comments

@stefan-it
Member

A great enhancement would be to have built-in dataset loaders for biomedical NER datasets.

A good resource is the set of datasets mentioned in the BioBERT paper. The datasets can be downloaded from the BioBERT repository.

Thus, the following datasets can be supported:

  • NCBI disease
  • BC5CDR (Disease and Drug/Chemical)
  • BC4CHEMD
  • BC2GM
  • JNLPBA
  • LINNAEUS
  • Species-800

stefan-it self-assigned this Feb 15, 2019
stefan-it added the "enhancement" label (Improving of an existing feature) Feb 15, 2019

@alanakbik
Collaborator

Yes that's a great idea! Together with the ELMo pubmed model (#502) and the Flair pubmed embeddings (#518) this could enable more research into biomedical data.

@stefan-it
Member Author

For now I'm going to write importers for the following datasets:

  • JNLPBA
  • NCBI-disease
  • bc5cdr

Thanks to the SciBERT authors, these datasets are already preprocessed and can simply be fetched from their GitHub repository :)
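
As a rough illustration of what such an importer has to do, here is a minimal sketch of a reader for column-formatted NER files. It assumes one token per line with the token in the first column and the tag in the last column, blank lines between sentences, and optional `-DOCSTART-` marker lines. That layout is an assumption, not something confirmed in this thread, and `read_column_ner` is a hypothetical helper, not part of Flair. The sample text and its labels are invented for the example.

```python
from typing import Iterable, List, Tuple

# A sentence is a list of (token, tag) pairs.
Sentence = List[Tuple[str, str]]


def read_column_ner(lines: Iterable[str]) -> List[Sentence]:
    """Group whitespace-separated column lines into sentences.

    Assumption: the token is the first column and the NER tag is the
    last column; any columns in between are ignored.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        # Blank lines and document markers end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        current.append((fields[0], fields[-1]))
    # Flush the last sentence if the input does not end with a blank line.
    if current:
        sentences.append(current)
    return sentences


# Invented sample data, for illustration only.
SAMPLE = """\
-DOCSTART- O

Cystic B-Disease
fibrosis I-Disease
is O
inherited O
. O

Aspirin B-Chemical
helps O
"""

sentences = read_column_ner(SAMPLE.splitlines())
for sentence in sentences:
    print(" ".join(f"{token}/{tag}" for token, tag in sentence))
```

Taking the last column as the tag lets the same function handle both two-column files and four-column CoNLL-2003-style files.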

@shreyashub

@stefan-it Were you able to write importers for some of the datasets? If so, could you reference the PR here?
If not, I could help with that, as well as with the other medical NER datasets that are not in SciBERT.

@stale

stale bot commented Apr 30, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the "wontfix" label (This will not be worked on) Apr 30, 2020

@alanakbik
Collaborator

There is currently a highly active branch in which my colleagues are adding a large number of biomedical datasets. See #1513. I think it will be merged soon!

stale bot removed the "wontfix" label (This will not be worked on) Apr 30, 2020

@stale

stale bot commented Aug 28, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the "wontfix" label (This will not be worked on) Aug 28, 2020
alanakbik removed the "wontfix" label (This will not be worked on) Aug 28, 2020

@alanakbik
Collaborator

Support for biomedical datasets was added in Flair 0.6. If any datasets are missing, let us know!
