Use section/symbol ordering files for compiling rustc (e.g. BOLT) #50655

Open
4 tasks
michaelwoerister opened this issue May 11, 2018 · 21 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-enhancement Category: An issue proposing an enhancement or a PR with one. I-compiletime Issue: Problems and improvements with respect to compile times. S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. WG-compiler-performance Working group: Compiler Performance

Comments

@michaelwoerister
Member

michaelwoerister commented May 11, 2018

The order in which code is laid out in a binary influences how fast the binary executes because (as I understand it) it affects instruction cache locality and how efficiently the code is paged in from disk. Many linkers support specifying this order (e.g. LLD via --symbol-ordering-file and MSVC via -ORDER). The hard part, though, is to find an order that will actually improve things. The chromium project has a tool for this, and somewhere else I've read that valgrind could be used for this too. The expected speedups are a few percent.
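To make the idea concrete, here is a minimal sketch of how an ordering file could be produced once per-symbol sample counts are available. It assumes the file format is one symbol name per line, which is what LLD's --symbol-ordering-file accepts. The function name `ordering_file` and the symbol names in the usage note are hypothetical; a real profile would contain mangled Rust symbol names, and sorting purely by hotness is only one possible heuristic.

```rust
use std::collections::HashMap;

/// Builds the text of a symbol ordering file: one symbol name per line,
/// hottest symbol first. Ties are broken by name so the output is
/// deterministic regardless of HashMap iteration order.
fn ordering_file(samples: &HashMap<String, u64>) -> String {
    let mut entries: Vec<(&String, &u64)> = samples.iter().collect();
    // Sort by sample count descending, then by symbol name ascending.
    entries.sort_by(|a, b| b.1.cmp(a.1).then_with(|| a.0.cmp(b.0)));

    let mut out = String::new();
    for (name, _count) in entries {
        out.push_str(name);
        out.push('\n');
    }
    out
}
```

Given counts such as `hot_fn: 90`, `warm_fn: 9`, `cold_fn: 1`, the returned text lists `hot_fn`, `warm_fn`, `cold_fn` in that order. The text would be written to disk and its path handed to the linker through the option mentioned above.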

Prerequisites:

  • Support function instrumentation in rustc (if using the chromium tool) similar to what GCC's -finstrument-functions does.
  • Compile an instrumented version of the compiler
  • Run the instrumented version of the compiler for a realistic test program (this should be less sensitive than full PGO)
  • Use the generated ordering file for building release artifacts
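As a rough illustration of the first two steps: GCC's -finstrument-functions inserts calls to `__cyg_profile_func_enter` and `__cyg_profile_func_exit` around every function. The sketch below shows one plausible thing an entry hook could do with that information, namely remember the order in which functions are first entered. The `FirstCallRecorder` type and its methods are hypothetical, and using first-call order as the layout is an assumption for illustration, not a description of what the chromium tool actually does.

```rust
use std::collections::HashSet;

/// Records each function address the first time it is entered.
#[derive(Default)]
struct FirstCallRecorder {
    seen: HashSet<usize>,
    order: Vec<usize>,
}

impl FirstCallRecorder {
    /// Stand-in for the entry hook that an instrumented build would call
    /// with the address of the function being entered.
    fn on_enter(&mut self, fn_addr: usize) {
        // `insert` returns true only if the address was not seen before.
        if self.seen.insert(fn_addr) {
            self.order.push(fn_addr);
        }
    }

    /// Function addresses in the order they were first entered.
    fn first_call_order(&self) -> &[usize] {
        &self.order
    }
}
```

After a run, the recorded addresses would still have to be mapped back to symbol names (for example via the binary's symbol table) before they could be written to an ordering file.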

The first point shouldn't be too hard. The rest, however, would be a big infrastructure investment. I hope that we'll get PGO support for our CI at some point. This symbol ordering business could then be part of that.

cc @glandium @rust-lang/wg-compiler-performance @rust-lang/infra

@michaelwoerister michaelwoerister added C-enhancement Category: An issue proposing an enhancement or a PR with one. I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-bootstrap Relevant to the bootstrap subteam: Rust's build system (x.py and src/bootstrap) T-infra Relevant to the infrastructure team, which will review and decide on the PR/issue. WG-compiler-performance Working group: Compiler Performance labels May 11, 2018
@ishitatsuyuki
Contributor

For your reference, Git uses their integration tests as a source of PGO.

@bjorn3
Member

bjorn3 commented May 12, 2018

The link to cygprofile is missing a trailing slash (it should be https://cs.chromium.org/chromium/src/tools/cygprofile/); without it I get an error.

@michaelwoerister
Member Author

@ishitatsuyuki Interesting!

@est31
Member

est31 commented Jun 28, 2018

As an alternative to the google tool, there is BOLT by facebook (github link).

@michaelwoerister
Member Author

Great find, @est31!

@Mark-Simulacrum
Member

(This was originally typed in response to #55137 which has been closed as a duplicate of this issue)

I think the blocker historically for BOLT/PGO/LTO has been finding CI time, especially in the case of BOLT and PGO for gathering profile data. I think if the answer to "Can BOLT be run on a different binary from the one we've gathered data for (e.g., the stage1/bin compiler is profiled while building the stage2/bin compiler, and then the stage2/bin compiler is optimized)?" is yes, and there's still benefit from this, then my next question is "how long does BOLT take?"

If someone would be willing to do the research to answer these questions, then I think integrating this into CI would become more feasible. One good thing is that we likely don't need to worry about implementing this for all platforms at once, since AFAICT BOLT is "just" an optimization.

@bstrie
Contributor

bstrie commented Oct 17, 2018

@Mark-Simulacrum I don't think this necessarily needs to involve CI at all. I envision these tools as useful for the artifacts that we distribute to users, rather than as an aid to rustc developers. Seems like it could just be the final step on the build servers while we're doing releases.

@Mark-Simulacrum
Member

Well, our CI is Rust's build server, which is why time in particular is important.

@ishitatsuyuki
Contributor

ishitatsuyuki commented Oct 12, 2019

I tried BOLT with my own build, and it performed 3% better on average. This was a rough benchmark since I'm using my laptop though, so it might be just noise. (I'm probably not going to run this again until I get a workstation.)

BOLT has some caveats:

  • You need a linker flag to keep relocation information for BOLT usage.
  • BOLT uses an enormous amount of memory (about 6GB in my case). The run time itself is not bad; it was below 30s, I think.
  • Perf needs to be run with LBR support; this is almost always unsupported in VMs (which means you don't want to run the measurement inside CI).
  • Perf records tend to be big. Watch out for disk space.
  • You can use data from previous runs without issues, but for releases (presumably stable/beta) fresh data is recommended.
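For readers who want to try this, the sketch below builds the command lines for the steps these caveats refer to, following the invocations shown in the BOLT README. The helper function names, the file names in the usage note, and the choice of optimization flags are assumptions for illustration and may differ between BOLT versions; the linker flag in question is `--emit-relocs` (or `-q`).

```rust
/// Flag to pass to rustc so the linker keeps relocations for BOLT.
/// Assumes the linker is driven through a C compiler front end.
const EMIT_RELOCS_RUSTFLAGS: &str = "-C link-arg=-Wl,--emit-relocs";

/// `perf record` invocation; `-j any,u` requests user-space branch
/// records, which is the part that needs LBR support.
fn perf_record_args(perf_data: &str, workload: &[&str]) -> Vec<String> {
    let mut args: Vec<String> =
        ["perf", "record", "-e", "cycles:u", "-j", "any,u", "-o", perf_data, "--"]
            .iter()
            .map(|s| s.to_string())
            .collect();
    args.extend(workload.iter().map(|s| s.to_string()));
    args
}

/// Converts the (large) perf recording into BOLT's profile format.
fn perf2bolt_args(binary: &str, perf_data: &str, fdata: &str) -> Vec<String> {
    ["perf2bolt", "-p", perf_data, "-o", fdata, binary]
        .iter()
        .map(|s| s.to_string())
        .collect()
}

/// Rewrites the binary using the collected profile.
fn llvm_bolt_args(binary: &str, fdata: &str, output: &str) -> Vec<String> {
    vec![
        "llvm-bolt".to_string(),
        binary.to_string(),
        "-o".to_string(),
        output.to_string(),
        format!("-data={}", fdata),
        "-reorder-blocks=ext-tsp".to_string(),
        "-reorder-functions=hfsort".to_string(),
    ]
}
```

Each returned vector can be joined with spaces and pasted into a shell, or split into program and arguments for `std::process::Command`. For example, `llvm_bolt_args("rustc", "perf.fdata", "rustc.bolt")` yields `llvm-bolt rustc -o rustc.bolt -data=perf.fdata -reorder-blocks=ext-tsp -reorder-functions=hfsort`, where `rustc` is only a placeholder for whichever binary or shared library is being optimized.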

As for gathering data, maybe running them on rustc-perf is another option? We can make use of its perf support.

@zamazan4ik
Contributor

Sorry for necroposting, but there is another alternative to BOLT: https://github.com/google/llvm-propeller. Maybe it will be better for rustc than BOLT. I haven't tried it (yet).

@zamazan4ik
Contributor

But from my point of view, BOLT is the much more interesting option, since it is going to become part of LLVM; the BOLT team is working on that now.

@bstrie bstrie changed the title Use section/symbol ordering files for compiling rustc Use section/symbol ordering files for compiling rustc (e.g. BOLT) Nov 5, 2021
@bstrie
Contributor

bstrie commented Nov 5, 2021

BOLT is on the verge of being upstreamed into LLVM: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153551.html

@bstrie
Contributor

bstrie commented Jan 12, 2022

BOLT has landed in LLVM: llvm/llvm-project@4c106cf

@ink-splatters

In light of the previous comment: is there currently any activity regarding BOLT support for rustc?
Is it already available in nightly builds? (I anticipate "no" here, so I'd appreciate it if someone could mention another relatively straightforward way to test it.)

Thanks!

@Kobzol
Contributor

Kobzol commented May 29, 2022

I have been trying to make it work for the past several months. Currently it doesn't seem to work, however, since LLVM instrumented with BOLT segfaults.

@zamazan4ik
Contributor

@Kobzol do you have an issue related to this crash? E.g. any of these: https://github.com/llvm/llvm-project/issues?q=is%3Aissue+is%3Aopen+bolt

I am also interested in BOLTing rustc :)

@Kobzol
Contributor

Kobzol commented May 29, 2022

You can check #94381 for more details, there are some related LLVM issues linked.

@jyn514 jyn514 added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. labels Jun 27, 2022
@chadbrewbaker

I saw this made it into the Rust CI via a shell script? Could it be made a little more friendly as a Cargo enhancement, or at least documented with a simple Rust crate?

@Kobzol
Contributor

Kobzol commented Nov 4, 2022

I created https://github.com/Kobzol/cargo-pgo for this.

@jyn514
Member

jyn514 commented Feb 3, 2023

It looks like #94381 has been merged - @Kobzol can we close this issue? :)

@Kobzol
Contributor

Kobzol commented Feb 3, 2023

Well, BOLT is currently used only for optimizing LLVM on x64 Linux. It's not used to optimize rustc yet (there's an open PR, but the perf. gains have been a bit lackluster).
