Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize wcwidth() #35

Merged
merged 1 commit into from
Mar 23, 2020
Merged

Optimize wcwidth() #35

merged 1 commit into from
Mar 23, 2020

Conversation

avylove
Copy link
Contributor

@avylove avylove commented Mar 22, 2020

Some minor optimizations for wcwidth(). Should result in ~30% performance gain.

>>> from timeit import timeit
>>> timeit('wcwidth("a")', setup='from wcwidth import wcwidth', number=10000000)

Before optimizations:
7.065002638002625

After optimizations:
4.69557189499028

The main speedups come from using a set instead of a chain of boolean comparisons and passing the upper bound of the table into the binary search instead of calling length on the table for each run.

@coveralls
Copy link

Coverage Status

Coverage remained the same at ?% when pulling 69dc30c on avylove:optimize_wcwidth into 3b1a268 on jquast:master.

@jquast
Copy link
Owner

jquast commented Mar 22, 2020

thank you!

@jquast jquast merged commit 4dac3f0 into jquast:master Mar 23, 2020
@jquast
Copy link
Owner

jquast commented Jun 1, 2020

I've had to unroll the ubound optimization in #23

@avylove
Copy link
Contributor Author

avylove commented Jun 1, 2020

Makes sense.
Do you think it would be worth adding lru_cache to _bisearch() and/or wcwidth() itself? My guess is, in the average use case, a small set of characters will be looked up repeatedly. Even in the case where the language itself is all unicode and the characters used exceed the size of the cache, there are going to be more common characters that will benefit from cached lookups.

@jquast
Copy link
Owner

jquast commented Jun 1, 2020

You're probably right, I will do that, maybe a size of 1,000.

I've been mostly "view all characters, top-bottom" in my own tests, but considering real-world use cases, folks will probably stay within their own language or range of emoticons :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants