Wrong offset with nonword-prefix #6

Closed
Lingepumpe opened this issue Nov 11, 2019 · 2 comments

Labels
bug Something isn't working

Comments

@Lingepumpe

Hi,

when I run:

>>> list(syntok.tokenize('..A'))
[<Token '' : '.' @ 0>, <Token '' : '.' @ 0>, <Token '' : 'A' @ 2>]

Here the first two tokens have the same offset (0), although the second '.' is at index 1 of the input. As I understand offsets, this is not the intended behavior.

The problem can be fixed by adding "+i" in tokenizer.py:197, making the line:

yield Token("", c, mo.start()+i)
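To illustrate why the "+i" matters, here is a minimal, self-contained sketch. It does not use syntok itself: the `Token` tuple, the `split_prefix` helper, and the `\W+` pattern are hypothetical stand-ins, assuming the affected code loops over the characters of a matched non-word prefix with `enumerate`.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    # Hypothetical stand-in for syntok's Token: (spacing, value, offset).
    spacing: str
    value: str
    offset: int


def split_prefix(mo: re.Match, add_index: bool) -> Iterator[Token]:
    """Yield one token per character of the matched non-word prefix."""
    for i, c in enumerate(mo.group(0)):
        # Without "+ i", every character gets the offset of the match start.
        offset = mo.start() + i if add_index else mo.start()
        yield Token("", c, offset)


text = "..A"
mo = re.match(r"\W+", text)  # assumed pattern; matches the ".." prefix

buggy = [t.offset for t in split_prefix(mo, add_index=False)]
fixed = [t.offset for t in split_prefix(mo, add_index=True)]

print(buggy)  # [0, 0]
print(fixed)  # [0, 1]
```

A useful sanity check for any tokenizer offsets is that `text[t.offset:t.offset + len(t.value)] == t.value` holds for every token; in this sketch that is only true for the "+ i" variant.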
@fnl
Owner

fnl commented Nov 11, 2019

Indeed; thanks for catching that naughty little bug! I will push a fix shortly.

@fnl fnl added the bug Something isn't working label Nov 11, 2019
fnl added a commit that referenced this issue Nov 11, 2019
Including a version bump to 1.2.1
@fnl
Owner

fnl commented Nov 11, 2019

Fixed with 6feb04c and in release v1.2.2

Thank you for reporting, and even more for tracking down the core issue!
That helped massively in closing this ticket ASAP.

@fnl fnl closed this as completed Nov 11, 2019