
The story of Underthesea

Vu Anh edited this page Jun 9, 2024

What is Underthesea?

Underthesea is a toolkit that supports the research and development of Vietnamese natural language processing. It was created in March 2017. At that time, Vietnamese already had some good toolkits such as vn.vitk and pyvi, but it still lacked a complete, open-source, easy-to-install, and easy-to-use toolkit comparable to those available for English, such as NLTK, polyglot, and spaCy.

The Development Team's Goals

  • Underthesea is a tool that is easy to use, easy to install, and comes with comprehensive guides and documentation.
  • Underthesea is open-source software, allowing everyone to use it freely for educational, research, and commercial purposes.
  • Underthesea is always ready to help: every reported issue receives a prompt response.
  • Underthesea supports language processing tasks on text, speech, and images of handwriting.

In the longer term:

  • Underthesea will keep up to date with the latest research in natural language processing.
  • Underthesea will serve as a bridge between Vietnamese NLP researchers. Through Underthesea, researchers can implement ideas, integrate datasets, and share the latest research results with the community.

[Update 2022]: Some well-known natural language processing toolkits and their GitHub star counts: HanLP 27.3k, stanza 6.4k, transformers 73.3k, flair 12.2k.