Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LTO/IPO compile support #764

Merged
merged 8 commits into from
Nov 20, 2022
Merged

Add LTO/IPO compile support #764

merged 8 commits into from
Nov 20, 2022

Conversation

wanghenshui
Copy link
Contributor

No description provided.

CMakeLists.txt Outdated Show resolved Hide resolved
@PragmaTwice
Copy link
Member

PragmaTwice commented Aug 2, 2022

It seems GCC builds in linux work well but Clang builds fail in linking phase (https://github.com/apache/incubator-kvrocks/runs/7631095274?check_suite_focus=true).

I have not tried it manually but I guess it maybe due to ld, which cannot recognize *.o generated by Clang with -flto enabled (it is actually LLVM bytecode in LTO mode, not native object file).

Hence you can try to replace ld with lld while clang is used. I can help you to do that if you do not familiar with those stuff.

@wanghenshui
Copy link
Contributor Author

It seems GCC builds in linux work well but Clang builds fail in linking phase (https://github.com/apache/incubator-kvrocks/runs/7631095274?check_suite_focus=true).

I have not tried it manually but I guess it maybe due to ld, which cannot recognize *.o generated by clang with -flto enabled (it is actually llvm bytecode in LTO mode).

Hence you can try to replace ld with lld while clang is used. I can help you to do that if you do not familiar with those stuff.

I'll fix it. wait for

@wanghenshui wanghenshui closed this Aug 2, 2022
@wanghenshui wanghenshui reopened this Aug 2, 2022
CMakeLists.txt Outdated Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
@PragmaTwice
Copy link
Member

clang not working. i cannot fix it.

It's fine. I will help you.

I'm going to try your branch on my side and commit to it. So please do not force-push from now 🤣

PragmaTwice
PragmaTwice previously approved these changes Aug 3, 2022
git-hulk
git-hulk previously approved these changes Aug 3, 2022
Copy link
Member

@git-hulk git-hulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGMT, it looks didn't increase much compile time while comparing to before, but I didn't have a look at the memory usage.

@ShooterIT
Copy link
Member

I think we should add some descriptions for these options, why do we use, their advantage or disadvantages

@PragmaTwice
Copy link
Member

PragmaTwice commented Aug 3, 2022

I think we should add some descriptions for these options, why do we use, their advantage or disadvantages

Link-time optimization (LTO) is an important optimization technology in modern compilers.

As we already known, C/C++ compilers generate native object files for every translation unit (TU), which indicates compiler can only get information within a certain TU, i.e. it know nothing about another TU (except declarations). But obviously most compiler optimization approach requires the information in function definitions, like constant propagation, reachable analysis, interprocedual dataflow analysis, inlining, loop invariant analysis, pointer analysis, etc.

So if compiler cannot retrieve such information, the optimization will be just discarded. Obviously, this is a huge loss. LTO postpone the optimization procedures to link-time so (almost) every definition in the program is available to the optimizer. I do not think LTO has any disadvantages other than possibly slowing down compilation.

I do not know why CMake call it interprocedure optimization, since interprocedure optimization can be done without LTO (but limited since it loss information in definitions).

@PragmaTwice
Copy link
Member

Hi everyone, is there any other thought on this PR? I will merge it if no further discussion 🚀

@ShooterIT
Copy link
Member

oh, thanks, is there a performance comparison?

@PragmaTwice
Copy link
Member

PragmaTwice commented Aug 5, 2022

oh, thanks, is there a performance comparison?

You can check this review and this paper.

@ShooterIT
Copy link
Member

I mean how it affects kvrocks?
When we add new features or improvements, we must know their earnings, for example, if this build option make kvrocks performance better, we should know that and how much it improves. Otherwise, other contributors may add new optimization options, we also don't know which option bring earnings or why, we will loss the control of this project, we also can't answer the users's question how to optimize the performance with build options.

I don't say we will merge it only if this option must bring much obvious performance improvement, but we must know it clearly. In addition, not only for this PR changes, even if they are valid on another system, we also don't need to merge it if they don't bring earnings on kvrocks till now, since, for a long time, it will be more and more complex, and make it hard to maintain.

BTW, if you merge this commit, also please describe it clearly in commit log.

@wanghenshui
Copy link
Contributor Author

I mean how it affects kvrocks? When we add new features or improvements, we must know their earnings, for example, if this build option make kvrocks performance better, we should know that and how much it improves. Otherwise, other contributors may add new optimization options, we also don't know which option bring earnings or why, we will loss the control of this project, we also can't answer the users's question how to optimize the performance with build options.

I don't say we will merge it only if this option must bring much obvious performance improvement, but we must know it clearly. In addition, not only for this PR changes, even if they are valid on another system, we also don't need to merge it if they don't bring earnings on kvrocks till now, since, for a long time, it will be more and more complex, and make it hard to maintain.

BTW, if you merge this commit, also please describe it clearly in commit log.

Do we have benchmark test? Or some scripts to test performence easy? We could use those stuff test LTO/before LTO difference.

LTO is free lunch, just like change compiler from c++98 to c++11, c++17, more higher, more gains.

@PragmaTwice
Copy link
Member

Hi @ShooterIT, I agree with your argument that benchmark matters.

I think we can wait for a benchmark before merge it or set ENABLE_IPO to OFF on default since LTO is practical in lots of database projects like mysql, clickhouse and mongodb.

@ShooterIT
Copy link
Member

ShooterIT commented Aug 5, 2022

from studying of these options, i think they would bring performance earnings, but till now, we still don't know how much on kvrocks

@ShooterIT
Copy link
Member

for test, currently we can use redis-benchmark
https://github.com/apache/incubator-kvrocks#2--qps-on-different-payloads

Copy link
Member

@tisonkun tisonkun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wanghenshui @PragmaTwice perhaps you can submit a benchmark report before and after this patch via redis-benchmark -p 6666 -q.

@tisonkun
Copy link
Member

tisonkun commented Nov 8, 2022

Closed as no consensus. @wanghenshui I suggest you create an issue first and share the performance benchmark so that our maintainers can be sure whether this change brings good.

@tisonkun tisonkun closed this Nov 8, 2022
@wanghenshui
Copy link
Contributor Author

before IPO

redis-benchmark -q -p 6666
ERROR: failed to fetch CONFIG from 127.0.0.1:6666
WARNING: Could not fetch server CONFIG
PING_INLINE: 185528.77 requests per second, p50=0.143 msec                    
PING_MBULK: 191204.59 requests per second, p50=0.135 msec                    
SET: 132802.12 requests per second, p50=0.343 msec                    
GET: 205338.81 requests per second, p50=0.127 msec                    
INCR: 124688.28 requests per second, p50=0.359 msec                    
LPUSH: 95693.78 requests per second, p50=0.455 msec                   
RPUSH: 101936.80 requests per second, p50=0.399 msec                   
LPOP: 90579.71 requests per second, p50=0.463 msec                   
RPOP: 84033.61 requests per second, p50=0.495 msec                   
SADD: 132100.39 requests per second, p50=0.303 msec                    
HSET: 133511.34 requests per second, p50=0.311 msec                    
SPOP: 224719.11 requests per second, p50=0.119 msec                    
ZADD: 180180.17 requests per second, p50=0.231 msec                    
ZPOPMIN: 289855.06 requests per second, p50=0.095 msec                    
LPUSH (needed to benchmark LRANGE): 100200.40 requests per second, p50=0.439 msec                    
LRANGE_100 (first 100 elements): 118063.76 requests per second, p50=0.231 msec                    
LRANGE_300 (first 300 elements): 65445.03 requests per second, p50=0.415 msec                   
LRANGE_500 (first 500 elements): 41876.05 requests per second, p50=0.647 msec                   
LRANGE_600 (first 600 elements): 37735.85 requests per second, p50=0.719 msec                   
MSET (10 keys): 21537.80 requests per second, p50=2.223 msec

after IPO

 redis-benchmark -q -p 6666
ERROR: failed to fetch CONFIG from 127.0.0.1:6666
WARNING: Could not fetch server CONFIG
PING_INLINE: 233100.23 requests per second, p50=0.111 msec                    
PING_MBULK: 241545.89 requests per second, p50=0.111 msec                    
SET: 137741.05 requests per second, p50=0.319 msec                    
GET: 204498.98 requests per second, p50=0.127 msec                    
INCR: 119760.48 requests per second, p50=0.375 msec                    
LPUSH: 99108.03 requests per second, p50=0.439 msec                   
RPUSH: 101626.02 requests per second, p50=0.415 msec                    
LPOP: 86655.11 requests per second, p50=0.463 msec                   
RPOP: 97656.24 requests per second, p50=0.415 msec                    
SADD: 149476.83 requests per second, p50=0.311 msec                    
HSET: 163398.70 requests per second, p50=0.255 msec                    
SPOP: 158478.61 requests per second, p50=0.167 msec                    
ZADD: 136425.66 requests per second, p50=0.327 msec                    
ZPOPMIN: 242718.45 requests per second, p50=0.111 msec                    
LPUSH (needed to benchmark LRANGE): 93720.71 requests per second, p50=0.447 msec                   
LRANGE_100 (first 100 elements): 141442.72 requests per second, p50=0.183 msec                    
LRANGE_300 (first 300 elements): 69832.40 requests per second, p50=0.391 msec                   
LRANGE_500 (first 500 elements): 44247.79 requests per second, p50=0.607 msec                   
LRANGE_600 (first 600 elements): 37721.61 requests per second, p50=0.719 msec                   
MSET (10 keys): 23557.13 requests per second, p50=1.911 msec

after IPO optimize, redis quick bench result was worse, so don't merge it.

@git-hulk
Copy link
Member

git-hulk commented Nov 19, 2022

Thanks for @wanghenshui input, I'd like to have a try as well.

Update: I tested the ping/ping_bulk which won't be affected by the data volume, and this PR improve the performance from 54929.96 => 57378.93. It's worth reopening since we have the data to prove this.

@git-hulk git-hulk reopened this Nov 19, 2022
@PragmaTwice
Copy link
Member

The merge conflict is resolved now.

@PragmaTwice
Copy link
Member

I think we can merge it now since the benchmark looks good.

@git-hulk
Copy link
Member

I think we can merge it now since the benchmark looks good.

Yes

@git-hulk
Copy link
Member

Thanks all, merging...

@git-hulk git-hulk merged commit bf13335 into apache:unstable Nov 20, 2022
@git-hulk
Copy link
Member

Thanks to @wanghenshui contribution again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants