
Releases: mideind/Tokenizer

Version 3.4.5

23 Aug 15:58
Commit 340ecb7
  • Compatibility with Python 3.13
  • Now requires Python 3.9 or later

Full Changelog: 3.4.4...3.4.5

Version 3.4.4

07 Aug 14:05
  • Better handling of abbreviations

Full Changelog: 3.4.3...3.4.4

Version 3.4.3

11 Aug 16:21
  • Various minor fixes.
  • Now requires Python 3.8 or later.

Full Changelog: 3.4.2...3.4.3

Version 3.4.2

23 Sep 13:59
  • Some abbreviations and phrases added
  • META_BEGIN token added to help users distinguish between metatokens and regular tokens
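To illustrate why a sentinel value helps here, the sketch below assumes that metatoken kinds (sentence and paragraph boundaries) are numbered at or above a META_BEGIN value, so that a single comparison separates them from regular text tokens. The Kind values, the Token class and the is_meta and text_tokens helpers are hypothetical stand-ins, not Tokenizer's actual TOK constants or API; check the library's own definitions before relying on this kind of comparison.

```python
from dataclasses import dataclass
from enum import IntEnum


class Kind(IntEnum):
    """Hypothetical token kinds; the real library defines its own TOK constants."""
    PUNCTUATION = 1
    NUMBER = 3
    WORD = 6
    # Assumed sentinel: every kind at or above this value is a metatoken.
    META_BEGIN = 9999
    S_BEGIN = 10001
    S_END = 10002
    P_BEGIN = 10003
    P_END = 10004


@dataclass
class Token:
    kind: int
    txt: str


def is_meta(token: Token) -> bool:
    """Return True if the token marks structure rather than carrying text."""
    return token.kind >= Kind.META_BEGIN


def text_tokens(tokens):
    """Keep only the regular (non-meta) tokens."""
    return [t for t in tokens if not is_meta(t)]


tokens = [
    Token(Kind.P_BEGIN, ""),
    Token(Kind.S_BEGIN, ""),
    Token(Kind.WORD, "Halló"),
    Token(Kind.PUNCTUATION, "!"),
    Token(Kind.S_END, ""),
    Token(Kind.P_END, ""),
]
print([t.txt for t in text_tokens(tokens)])  # → ['Halló', '!']
```

Keeping all metatoken kinds in one numeric range means new structural tokens can be added later without changing the filtering code.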

Version 3.4.1

03 May 13:45
  • Improved performance on large input chunks

Version 3.4.0

10 Mar 14:47
  • Improved handling and normalization of punctuation

Version 3.3.3

21 Jan 10:49
  • Better support for token-level errors

Version 3.3.2

27 Sep 14:58
  • Internal refactoring
  • Fixes in paragraph handling

Version 3.3.0

08 Sep 16:01
  • Fixed a bug where opening quotes directly following beginning-of-paragraph markers were incorrectly recognized and normalized.
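The kind of decision this fix concerns can be sketched at the character level: a straight double quote is normalized to an Icelandic opening („) or closing (“) quote depending on what precedes it, and a paragraph-begin marker must count as a boundary just like the start of the text. This is a simplified illustration, not the library's implementation (which works on tokens); the [[ marker string and the normalize_quotes function are assumptions made for the example.

```python
OPEN_QUOTE = "\u201e"   # „ (Icelandic opening double quote)
CLOSE_QUOTE = "\u201c"  # “ (Icelandic closing double quote)
PARA_BEGIN = "[["       # assumed paragraph-begin marker in the input text


def normalize_quotes(text: str) -> str:
    """Replace straight double quotes with Icelandic opening/closing quotes."""
    out = []
    for i, ch in enumerate(text):
        if ch != '"':
            out.append(ch)
            continue
        opening = (
            i == 0                               # start of text
            or text[i - 1].isspace()             # after whitespace
            or text[i - 1] == "("                # after an opening parenthesis
            or text.endswith(PARA_BEGIN, 0, i)   # right after a paragraph marker
        )
        out.append(OPEN_QUOTE if opening else CLOSE_QUOTE)
    return "".join(out)


print(normalize_quotes('[["Halló" sagði hún.]]'))  # → [[„Halló“ sagði hún.]]
```

Without the paragraph-marker check, the first quote above would follow a non-space character and be misread as a closing quote.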

Version 3.2.0

16 Aug 16:46
  • Numbers and amounts that consist exclusively of alphabetic words (e.g. sjö hundruð, "seven hundred") are now returned as the original TOK.WORD tokens (sjö and hundruð) instead of being coalesced into TOK.NUMBER, TOK.AMOUNT or similar tokens as before.
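A minimal sketch of the new rule, assuming a coalescing pass that merges consecutive number parts: a run made up only of alphabetic number words is passed through as separate WORD tokens. The (kind, text) tuples, the tiny NUMBER_WORDS set and the coalesce_numbers function are hypothetical, and so is the assumption that runs containing a digit token (such as "7 hundruð") are still merged; the release note only describes the all-alphabetic case.

```python
from typing import List, Tuple

# Hypothetical token representation: (kind, text).
Token = Tuple[str, str]

# A tiny, illustrative subset of Icelandic number words (assumption).
NUMBER_WORDS = {"tvö", "sjö", "hundruð", "þúsund"}


def is_number_part(tok: Token) -> bool:
    kind, txt = tok
    return kind == "NUMBER" or (kind == "WORD" and txt.lower() in NUMBER_WORDS)


def coalesce_numbers(tokens: List[Token]) -> List[Token]:
    """Merge runs of number parts into one NUMBER token, unless the run
    consists only of alphabetic words, in which case keep the WORD tokens."""
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        if not is_number_part(tokens[i]):
            out.append(tokens[i])
            i += 1
            continue
        j = i
        while j < len(tokens) and is_number_part(tokens[j]):
            j += 1
        run = tokens[i:j]
        if all(kind == "WORD" for kind, _ in run):
            out.extend(run)  # e.g. "sjö hundruð" stays as two WORD tokens
        else:
            out.append(("NUMBER", " ".join(txt for _, txt in run)))
        i = j
    return out


print(coalesce_numbers([("WORD", "sjö"), ("WORD", "hundruð"), ("WORD", "manns")]))
# → [('WORD', 'sjö'), ('WORD', 'hundruð'), ('WORD', 'manns')]
```

Leaving all-alphabetic runs untouched keeps the original word tokens available to later processing stages instead of hiding them inside a single number token.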