Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of ParseUfloat #1865

Merged
merged 5 commits into from
Sep 22, 2024

Conversation

ksw2000
Copy link
Contributor

@ksw2000 ksw2000 commented Sep 14, 2024

Replaced offset handling logic with more efficient math.Pow10 based calculation.

The math.Pow10 function performs calculations using a lookup table, which is more efficient compared to the original method of continuous division. By switching to simple addition and subtraction operations, the execution time is reduced by approximately 30%.

goos: linux
goarch: amd64
pkg: github.com/valyala/fasthttp
cpu: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
              │   old.txt   │           new.txt           │
              │   sec/op    │   sec/op     vs base        │
ParseUfloat-8   44.22n ± 0%   31.06n ± 0%  -29.76% (n=50)

Replaced `offset` handling logic with more efficient math.Pow10 based calculation.

goos: linux
goarch: amd64
pkg: github.com/valyala/fasthttp
cpu: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
              │   old.txt   │           new.txt           │
              │   sec/op    │   sec/op     vs base        │
ParseUfloat-8   44.22n ± 0%   31.06n ± 0%  -29.76% (n=50)
@ksw2000 ksw2000 changed the title Improve performance of ParseUfloat function Improve performance of ParseUfloat Sep 14, 2024
@ksw2000 ksw2000 marked this pull request as ready for review September 14, 2024 02:36
@erikdubbelboer
Copy link
Collaborator

This is the first time I'm noticing that our ParseUfloat actually returns the wrong values for very high numbers. Even though it's faster than strconv.ParseFloat. I still think we should switch to strconv.ParseFloat as being incorrect like this is very bad. What do you think?

1234e23
ParseUfloat:        123399999999999981930938368.000000
strconv.ParseFloat: 123399999999999999110807552.000000

1.234e+32
ParseUfloat:        123399999999999984608249181634560.000000
strconv.ParseFloat: 123400000000000002622647691116544.000000

123456789123456789.987654321
ParseUfloat:        17502027876.294655
strconv.ParseFloat: 123456789123456784.000000

@ksw2000
Copy link
Contributor Author

ksw2000 commented Sep 22, 2024

Actually, I encountered this problem as well. I believe the reason is that we use a uint64 variable to represent a floating-point number. The largest value of uint64 is 1.8×10¹⁹. Therefore, the third example you provided, 123456789123456789.987654321, which has 27 digits, causes the uint64 value to overflow.

@erikdubbelboer
Copy link
Collaborator

This is currently not documented at all. I'm thinking I would rather have the function return the correct values even for big numbers than have it be 20ns faster. Just using strconv.ParseFloat would also result in less code to maintain. What do you think?

@ksw2000
Copy link
Contributor Author

ksw2000 commented Sep 22, 2024

Parse the float string is a complex topic. In strconv.ParseFloat, they use IEEE754 unbiased rounding.
In our implementation, we directly truncate the digits after the 19th position.
Although our implementation doesn't achieve the same high precision as strconv.ParseFloat, the error is within an acceptable range for network applications.
I think we can add the comments that our ParseUfloat is much faster than strconv.ParseFloat, but the results are less accurate.

goos: windows
goarch: amd64
pkg: github.com/valyala/fasthttp
cpu: Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz
BenchmarkParseUfloat-8                  13448752                78.83 ns/op
BenchmarkParseUfloatStrconv-8            4897358               248.1 ns/op
PASS
ok      github.com/valyala/fasthttp     2.738s
1234e23
ParseUfloat:        1.2339999999999998e+26
strconv.ParseFloat: 1.234e+26

1.234e+32
ParseUfloat:        1.2339999999999998e+32
strconv.ParseFloat: 1.234e+32

123456789123456789.987654321
ParseUfloat:        1.2345678912345678e+17
strconv.ParseFloat: 1.2345678912345678e+17

@erikdubbelboer
Copy link
Collaborator

There are other inconsistencies as well. For example this test fails because it returns 0 instead of 1 like strconv.ParseFloat does.

testParseUfloatSuccess(t, "00000000000000000001", 1)

Or parsing 87000000000.2 results in 87000000000.200012 while strconv.ParseFloat returns 87000000000.199997 which is a bigger difference for such a low number than I would like.

@ksw2000
Copy link
Contributor Author

ksw2000 commented Sep 22, 2024

In the first case 00000000000000000001, we can add some code to remove the leading zeros. As for the second case 87000000000.2, the relative error between ParseUfloat and strconv.ParseFloat is only 1.75e-16.

@ksw2000
Copy link
Contributor Author

ksw2000 commented Sep 22, 2024

The second case is very strange. Our algorithm calculates using float64(870000000002) * math.Pow10(-1), but if we change to using division, the result will be the same as strconv.ParseFloat.

fmt.Printf("%f %f\n", float64(870000000002)*math.Pow10(-1), float64(870000000002)/math.Pow10(1))
87000000000.200012 87000000000.199997

@erikdubbelboer
Copy link
Collaborator

71000000000000.1 results in a 1.562500e-02 relative difference. I'm trying to fuzz test the implementations against each other and I keep finding differences. There is also how they handle numbers with E or X. Or strconv.ParseFloat sometimes returning NaN instead of an error. There are so many differences that I really don't think it's worth the 20ns (nanoseconds) overhead. Unless someone is calling this function billions of times it's not going to make a difference. And if someone does that it's much better for them to use a more specific version in their own codebase.

@ksw2000
Copy link
Contributor Author

ksw2000 commented Sep 22, 2024

The example you provided, using Relative Error = (expected value - actual value) / actual value, the error is indeed very small, all below 10^-15. However, if our function cannot perfectly handle special cases like NaN, I agree with your suggestion. Perhaps we should consider directly calling strconv.ParseFloat, or rewriting our function based on strconv.ParseFloat. What do you think?

@erikdubbelboer
Copy link
Collaborator

Lets just call strconv.ParseFloat and include some error handling for negative numbers to keep that backwards compatible.

@erikdubbelboer erikdubbelboer merged commit e2bb2e0 into valyala:master Sep 22, 2024
15 checks passed
@erikdubbelboer
Copy link
Collaborator

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants