Run translation and LLVM in parallel when compiling with multiple CGUs #43506

michaelwoerister · 2017-07-27T15:52:24Z

This is still a work in progress but the bulk of the implementation is done, so I thought it would be good to get it in front of more eyes.

This PR makes the compiler start running LLVM while translation is still in progress, effectively allowing for more parallelism towards the end of the compilation pipeline. It also allows the main thread to switch between translation and running LLVM, which reduces peak memory usage since not all LLVM modules have to be kept in memory until linking. This is especially good for incr. comp. but it works just as well when running with -Ccodegen-units=N.
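A minimal sketch of the overlap described above, not the PR's actual code: the calling thread stands in for translation and sends each finished unit through a channel, while a worker thread stands in for LLVM and processes units as they arrive. `Module`, `translate`, `run_llvm`, and `compile_pipelined` are hypothetical names introduced only for illustration.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for a translated codegen unit.
struct Module {
    name: String,
}

// Hypothetical stand-in for translation: produce an in-memory module.
fn translate(cgu: usize) -> Module {
    Module { name: format!("cgu-{}", cgu) }
}

// Hypothetical stand-in for LLVM: consume the module, return an object file name.
fn run_llvm(module: Module) -> String {
    format!("{}.o", module.name)
}

// Translate `n` units on the calling thread while a worker thread runs
// "LLVM" on each unit as soon as it arrives.
fn compile_pipelined(n: usize) -> Vec<String> {
    let (tx, rx) = mpsc::channel::<Module>();

    let llvm_worker = thread::spawn(move || {
        // Iteration ends once the sender is dropped and the queue is drained.
        rx.into_iter().map(run_llvm).collect::<Vec<_>>()
    });

    for cgu in 0..n {
        // Ownership moves to the worker right away; the translating thread
        // keeps no reference to finished modules.
        tx.send(translate(cgu)).expect("LLVM worker hung up");
    }
    drop(tx); // signal that translation is done

    llvm_worker.join().expect("LLVM worker panicked")
}
```

Calling `compile_pipelined(3)` yields one object name per unit, in the order the units were translated, because a single consumer drains the channel in FIFO order.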

In order to help with tuning and debugging the work scheduler, the PR adds the -Ztrans-time-graph flag, which spits out HTML files that show how work packages were scheduled:

(red is translation, green is llvm)

One side effect here is that -Ztime-passes might show something not quite correct because trans and LLVM are not strictly separated anymore. I plan to have some special handling there that will try to produce useful output.

One open question is how to determine whether the trans-thread should switch to intermediate LLVM processing.

TODO:

Restore -Z time-passes output for LLVM.
Update documentation, esp. for work package scheduling.
Tune the scheduling algorithm.

cc @alexcrichton @rust-lang/compiler

rust-highfive · 2017-07-27T15:52:28Z

r? @arielb1

(rust_highfive has picked a reviewer for you, use r? to override)

michaelwoerister · 2017-07-27T16:01:52Z

r? @alexcrichton

Pre-assigning @alexcrichton for review, so he can already start reading :P

alexcrichton · 2017-07-27T16:50:21Z

🎊

retep998 · 2017-07-27T19:21:55Z

As a future thought for how -Ztime-passes could be implemented in a world where everything is threaded, we could use GetThreadTimes on Windows (and whatever the Linux equivalent is) to measure the CPU time used up by a given thread to more precisely track where time is being spent.

alexcrichton

Looking great!

The general framework of who's running what when seemed a little confusing to follow, but I think a comment would go a long way towards helping that.

alexcrichton · 2017-07-27T20:42:48Z

src/librustc_trans/back/write.rs

+                    if let Ok(token) = token {
+                        tokens.push(token);
+                    } else {
+                        shared_emitter.fatal("failed to acquire jobserver token");


Could the error be emitted here as well?

What do you mean? Just panic?

Oh just something like:

    match token {
        Ok(token) => tokens.push(token),
        Err(e) => shared_emitter.fatal(&format!("failed to acquire jobserver token: {}", e)),
    }

ah ok, yes.

alexcrichton · 2017-07-27T20:45:11Z

src/librustc_trans/back/write.rs

+                }
+
+                Message::TranslationDone { llvm_work_item, is_last } => {
+                    work_items.insert(0, llvm_work_item);


How come this is inserted on the front?

No reason. I changed that during a debugging session. I'll switch it back to a push.

alexcrichton · 2017-07-27T20:47:38Z

src/librustc_trans/back/write.rs

+                        assert_eq!(trans_worker_state, TransWorkerState::LLVMing);
+                        trans_worker_state = TransWorkerState::Idle;
+                    } else {
+                        drop(tokens.pop());


I think we discussed this a while ago, but should we perhaps not drop the token here? If we greedily hold on to our tokens then that means we can more quickly finish this compilation, which in theory may be desirable to reduce overall memory usage?

Yes, good idea. The truncate above should take care of dropping the Token if we actually don't need it.
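To illustrate why a `truncate` is enough to give surplus tokens back, here is a small sketch. It assumes a hypothetical `Token` type whose release happens in `Drop` (a counter stands in for the real release); `release_surplus` is likewise an illustrative name, not something from the PR.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Hypothetical token type: handing it back to the pool happens in `Drop`.
struct Token {
    released: Arc<AtomicUsize>,
}

impl Drop for Token {
    fn drop(&mut self) {
        // Stand-in for releasing the token; here we only count releases.
        self.released.fetch_add(1, Ordering::SeqCst);
    }
}

// Keep at most `needed` tokens. `Vec::truncate` drops the rest, which
// releases them. Returns how many tokens the truncate released.
fn release_surplus(held: usize, needed: usize) -> usize {
    let released = Arc::new(AtomicUsize::new(0));
    let mut tokens: Vec<Token> = (0..held)
        .map(|_| Token { released: Arc::clone(&released) })
        .collect();

    tokens.truncate(needed);
    let released_by_truncate = released.load(Ordering::SeqCst);

    // The remaining tokens are dropped at end of scope, after the count was read.
    released_by_truncate
}
```

If fewer tokens are held than needed, `truncate` is a no-op and nothing is released, so holding on to tokens greedily costs nothing extra in this model.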

alexcrichton · 2017-07-27T20:49:47Z

src/librustc_trans/back/write.rs

+                    }
+                }
+            } else {
+                match trans_worker_state {


I'm currently finding the logic here sort of hard to follow in terms of what this "trans worker" is doing. Could you be sure to add a comment with a high-level architecture of what the relationship is between this coordinator thread, the main translation thread, and the worker codegen threads?

Ok so to see if I understand this:

The documentation above refers to a "translation worker"

This trans worker seems to represent the "ephemeral token" that we inherently have to run work from a jobserver

The literal thread representing the trans worker can change I think? Sometimes it's literally the thread doing translation, sometimes it's a spawned worker here to translate an existing module.

If we happen to reach 4 translated codegen units but not codegen'd codegen units then we request the translating main thread to stop, and continue its work with a freshly spawned thread to codegen a module.

Does that sound roughly right?

Ah I see now there's also a crucial piece where when any codegen thread finishes we consider it "trans worker" thread now available again. This means after any codegen thread finishes we may be candidate to start translation of another unit of work I think, right?

> Ah I see now there's also a crucial piece where when any codegen thread finishes we consider it "trans worker" thread now available again. This means after any codegen thread finishes we may be candidate to start translation of another unit of work I think, right?

No, but it's an excellent idea :D
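The decision discussed in this thread can be sketched as a small pure function. This is an illustration under assumptions, not the code in write.rs: only `TransWorkerState::Idle` and `TransWorkerState::LLVMing` appear in the diff excerpts above, while the `Translating` variant, the `Action` enum, `next_action`, and the `MAX_QUEUED` threshold (taken from the "4 translated codegen units" remark) are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TransWorkerState {
    Idle,
    Translating, // assumed variant, not shown in the excerpts
    LLVMing,
}

#[derive(Debug, PartialEq, Eq)]
enum Action {
    Translate,
    RunLlvm,
    Wait,
}

// Hypothetical threshold of translated-but-not-yet-codegen'd units.
const MAX_QUEUED: usize = 4;

// Decide what the implicit "trans worker" slot should do next.
// `queued` counts translated units waiting for LLVM,
// `untranslated` counts units that still need translation.
fn next_action(state: TransWorkerState, queued: usize, untranslated: usize) -> Action {
    if state != TransWorkerState::Idle {
        // The slot is busy; nothing new can be started on it.
        return Action::Wait;
    }
    if queued >= MAX_QUEUED || (untranslated == 0 && queued > 0) {
        // Enough modules are piling up (or translation is finished):
        // spend the slot on LLVM to free memory.
        Action::RunLlvm
    } else if untranslated > 0 {
        Action::Translate
    } else {
        Action::Wait
    }
}
```

Under the suggestion accepted above, a finished codegen job would set the state back to `Idle`, after which `next_action` is consulted again.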

alexcrichton · 2017-07-28T15:29:37Z

Docs look great! So one overall meta-comment as well now. I'm having some difficulty articulating this but I'm slightly worried about a situation like:

We've got N codegen units left to translate, but the main thread is stopped
Our heuristic blocks the main thread and spins up a worker for an existing codegen unit.
All of a sudden we get an influx of jobserver tokens, but the main thread is still stopped.

I haven't convinced myself this is possible and I think that the various handling makes it ok? I've found the interaction between all these threads and the implicit token a little difficult to follow, but does this make sense to you? Can you think of a case where we accidentally starve the translation thread due to our heuristic?

michaelwoerister · 2017-07-28T16:23:58Z

> I haven't convinced myself this is possible and I think that the various handling makes it ok? I've found the interaction between all these threads and the implicit token a little difficult to follow, but does this make sense to you? Can you think of a case where we accidentally starve the translation thread due to our heuristic?

So with the latest change, the main thread will be free again as soon as the first LLVM worker is done (thanks to your suggestion). It's still possible to run into a LLVM work shortage the way you describe though.

I just adapted the strategy to estimate the cost of a LLVM WorkItem. Maybe the main thread should always start the cheapest one available, so it can get back to translating sooner, if needed? I don't think this is much of a problem though (with no data to back this claim up in any way :))
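The "start the cheapest one available" idea could look roughly like the sketch below. How the cost of a work item is estimated is not shown in this thread, so `cost` is just a number here; `WorkItem`'s fields and `take_cheapest` are hypothetical.

```rust
// Hypothetical work item carrying an estimated cost.
#[derive(Debug, Clone, PartialEq, Eq)]
struct WorkItem {
    name: String,
    cost: u64,
}

// Remove and return the cheapest queued item, if any.
fn take_cheapest(queue: &mut Vec<WorkItem>) -> Option<WorkItem> {
    let idx = queue
        .iter()
        .enumerate()
        .min_by_key(|(_, item)| item.cost)
        .map(|(i, _)| i)?;
    // `swap_remove` is O(1); the order of the remaining items is not preserved,
    // which is fine because selection is by cost, not by position.
    Some(queue.swap_remove(idx))
}
```

Picking the cheapest item keeps the main thread away from translation for the shortest time, at the price of leaving the expensive items for the other workers.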

alexcrichton · 2017-07-28T19:32:11Z

Thinking about this a bit more, I think this is my mental model for what's happening: on each turn of the loop we'll have N slots of work to fill up depending on what's currently running and what amount of tokens we have. Given the choice of whether to translate a new unit or codegen an existing unit it seems fine to have a heuristic. Whenever something happens though it'll turn the loop and cause everything to start over.

I think that's roughly what's implemented right now, but I think that it means that we should consider the translation thread idle as soon as we've acquired a new token? That way if the translation thread was blocked and we get a token it should get unblocked? (which I don't think happens today?) I may not be following the code quite right though...

> I just adapted the strategy to estimate the cost of a LLVM WorkItem

Neat!

michaelwoerister · 2017-07-31T11:26:04Z

> but I think that it means that we should consider the translation thread idle as soon as we've acquired a new token?

Yes, that makes sense. It's a variation of considering the translation thread idle when a package is finished (another way of getting an additional token).
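The mental model from the last few comments (slots to fill on every turn of the loop, with a new token or a finished work item both freeing the translation thread) might be sketched as follows. `Coordinator`, `Event`, and their fields are hypothetical names; the only things taken from the discussion are the implicit token and the two kinds of events.

```rust
#[derive(Debug, Clone, Copy)]
enum Event {
    TokenAcquired,
    WorkItemDone,
}

#[derive(Debug, Default)]
struct Coordinator {
    tokens: usize,      // jobserver tokens currently held
    running: usize,     // work items currently executing
    main_blocked: bool, // true while the main thread is told not to translate
}

impl Coordinator {
    // Number of additional work items that may start right now.
    fn free_slots(&self) -> usize {
        // One implicit token belongs to the process itself.
        (self.tokens + 1).saturating_sub(self.running)
    }

    fn handle(&mut self, event: Event) {
        match event {
            Event::TokenAcquired => self.tokens += 1,
            Event::WorkItemDone => self.running = self.running.saturating_sub(1),
        }
        // Either event opens up a slot, so the main thread becomes a
        // candidate for translation again.
        self.main_blocked = false;
    }
}
```

Treating both events the same way keeps the rule simple: whenever a slot opens, the scheduling heuristic gets to choose again between translating a new unit and running LLVM on a queued one.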


michaelwoerister · 2017-08-01T12:06:22Z

Thanks, @kennytm. Compiling with LLVM 3.7 right now...

michaelwoerister · 2017-08-01T12:35:28Z

Can't reproduce with LLVM 3.7 either :(

michaelwoerister · 2017-08-01T12:50:50Z

I pushed another change that should give a sensible error message at a likely point of failure. Also adapted the scheduler heuristic slightly. Let's see.

michaelwoerister · 2017-08-01T13:40:33Z

New error, excellent :D


michaelwoerister · 2017-08-01T15:12:30Z

@bors r=alexcrichton

OK, passes travis now.

bors · 2017-08-01T15:12:31Z

📌 Commit 6468cad has been approved by alexcrichton


bors · 2017-08-01T17:21:31Z

⌛ Testing commit 6468cad with merge e772c28...

bors · 2017-08-01T19:59:48Z

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing e772c28 to master...

@philipc

…hton

Don't unwrap work item results as the panic trace is useless

Fixes #43402 now there's no multithreaded panic printouts

Also update a comment

--------

Likely regressed in #43506, where the code was changed to panic in worker threads on error. Unwrapping gives zero extra information since the stack trace is so short, so we may as well just surface that there was an error and exit the thread properly. Because there are then no multithreaded printouts, I think it should mean the output of the test for #26199 is deterministic and not interleaved (thanks to @philipc #43402 (comment) for a hint).

Sadly the output is now:

```
thread '<unnamed>' panicked at 'aborting due to worker thread panic', src/librustc_trans/back/write.rs:1643:20
note: Run with `RUST_BACKTRACE=1` for a backtrace.
error: could not write output to : No such file or directory
error: aborting due to previous error
```

but it's an improvement over the multi-panic situation before.

r? @alexcrichton

Changelog: Version 1.21.0 (2017-10-12) ========================== Language -------- - [You can now use static references for literals.][43838] Example: ```rust fn main() { let x: &'static u32 = &0; } ``` - [Relaxed path syntax. Optional `::` before `<` is now allowed in all contexts.][43540] Example: ```rust my_macro!(Vec<i32>::new); // Always worked my_macro!(Vec::<i32>::new); // Now works ``` Compiler -------- - [Upgraded jemalloc to 4.5.0][43911] - [Enabled unwinding panics on Redox][43917] - [Now runs LLVM in parallel during translation phase.][43506] This should reduce peak memory usage. Libraries --------- - [Generate builtin impls for `Clone` for all arrays and tuples that are `T: Clone`][43690] - [`Stdin`, `Stdout`, and `Stderr` now implement `AsRawFd`.][43459] - [`Rc` and `Arc` now implement `From<&[T]> where T: Clone`, `From<str>`, `From<String>`, `From<Box<T>> where T: ?Sized`, and `From<Vec<T>>`.][42565] Stabilized APIs --------------- [`std::mem::discriminant`] Cargo ----- - [You can now call `cargo install` with multiple package names][cargo/4216] - [Cargo commands inside a virtual workspace will now implicitly pass `--all`][cargo/4335] - [Added a `[patch]` section to `Cargo.toml` to handle prepublication dependencies][cargo/4123] [RFC 1969] - [`include` & `exclude` fields in `Cargo.toml` now accept gitignore like patterns][cargo/4270] - [Added the `--all-targets` option][cargo/4400] - [Using required dependencies as a feature is now deprecated and emits a warning][cargo/4364] Misc ---- - [Cargo docs are moving][43916] to [doc.rust-lang.org/cargo](https://doc.rust-lang.org/cargo) - [The rustdoc book is now available][43863] at [doc.rust-lang.org/rustdoc](https://doc.rust-lang.org/rustdoc) - [Added a preview of RLS has been made available through rustup][44204] Install with `rustup component add rls-preview` - [`std::os` documentation for Unix, Linux, and Windows now appears on doc.rust-lang.org][43348] Previously only showed `std::os::unix`. 
Compatibility Notes ------------------- - [Changes in method matching against higher-ranked types][43880] This may cause breakage in subtyping corner cases. [A more in-depth explanation is available.][info/43880] - [rustc's JSON error output's byte position start at top of file.][42973] Was previously relative to the rustc's internal `CodeMap` struct which required the unstable library `libsyntax` to correctly use. - [`unused_results` lint no longer ignores booleans][43728] [42565]: rust-lang/rust#42565 [42973]: rust-lang/rust#42973 [43348]: rust-lang/rust#43348 [43459]: rust-lang/rust#43459 [43506]: rust-lang/rust#43506 [43540]: rust-lang/rust#43540 [43690]: rust-lang/rust#43690 [43728]: rust-lang/rust#43728 [43838]: rust-lang/rust#43838 [43863]: rust-lang/rust#43863 [43880]: rust-lang/rust#43880 [43911]: rust-lang/rust#43911 [43916]: rust-lang/rust#43916 [43917]: rust-lang/rust#43917 [44204]: rust-lang/rust#44204 [cargo/4123]: rust-lang/cargo#4123 [cargo/4216]: rust-lang/cargo#4216 [cargo/4270]: rust-lang/cargo#4270 [cargo/4335]: rust-lang/cargo#4335 [cargo/4364]: rust-lang/cargo#4364 [cargo/4400]: rust-lang/cargo#4400 [RFC 1969]: rust-lang/rfcs#1969 [info/43880]: rust-lang/rust#44224 (comment) [`std::mem::discriminant`]: https://doc.rust-lang.org/std/mem/fn.discriminant.html


rust-highfive assigned arielb1 Jul 27, 2017

rust-highfive assigned alexcrichton and unassigned arielb1 Jul 27, 2017

alexcrichton reviewed Jul 27, 2017


alexcrichton added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Jul 27, 2017

michaelwoerister force-pushed the async-llvm branch from d2cdb54 to b6c9e69 Compare July 28, 2017 12:46

michaelwoerister force-pushed the async-llvm branch from b6c9e69 to dbaee99 Compare July 28, 2017 16:15

michaelwoerister force-pushed the async-llvm branch from dbaee99 to 29989ef Compare July 31, 2017 11:41

michaelwoerister and others added 14 commits July 31, 2017 14:55

async-llvm(1): Run LLVM already in trans_crate().

c4adece

async-llvm(2): Decouple diagnostics emission from LLVM worker coordin…

29d4725

…ation.

async-llvm(3): Make write::CodegenContext Clone and Send.

bac57cf

async-llvm(4): Move work coordination to separate thread in order to …

df6be33

…free up the main thread for translation.

async-llvm(5): Do continuous error handling on main thread.

b18a61a

async-llvm(6): Make the LLVM work coordinator get its work package th…

8f6894e

…rough a channel instead of upfront.

async-llvm(7): Clean up error handling a bit.

4282dd8

async-llvm(8): Clean up resource management and drop LLVM modules ASAP.

645841e

async-llvm(9): Move OngoingCrateTranslation into back::write.

ccb970b

async-llvm(10): Factor compile output files cleanup into separate fun…

28589ec

…ctions.

async-llvm(11): Delay joining ongoing translation until right before …

f3ce505

…linking.

async-llvm(12): Hide no_integrated_as logic in write::run_passes.

397b2a8

async-llvm(13): Submit LLVM work packages from base::trans_crate().

b924ec1

async-llvm(14): Move LTO/codegen-unit conflict check to beginning of …

a1be658

…compilation process.

async-llvm(28): Make some error messages more informative.

b8d4413

michaelwoerister force-pushed the async-llvm branch from 3b2af87 to b8d4413 Compare August 1, 2017 12:44

async-llvm(29): Adapt run-make/llvm-phase test case to LLVM module no…

6468cad

…t being available in memory.

bors merged commit 6468cad into rust-lang:master Aug 1, 2017

bors mentioned this pull request Aug 1, 2017

Profile queries #43345

Merged

alexcrichton mentioned this pull request Aug 2, 2017

Fix parsing nightly's -Z time-passes output rust-lang-deprecated/rustc-perf-collector#14

Merged

kennytm mentioned this pull request Aug 2, 2017

RFC: Implicit caller location (third try to the unwrap/expect line info problem) rust-lang/rfcs#2091

Merged

alexcrichton mentioned this pull request Aug 10, 2017

trans/LLVM: Don't keep all LLVM modules in memory at the same time #39280

Closed

lqd mentioned this pull request Aug 11, 2017

Possibly use a known profiling format for timings ? #43804

Closed

michaelwoerister mentioned this pull request Aug 14, 2017

20% compiler memory usage regression in syntex_syntax incremental compile #43835

Closed

This was referenced Sep 22, 2017

Compilation of a crate using a large static map fails on latest i686-pc-windows-gnu Beta #36799

Open

Updated RELEASES.md for 1.21.0 #44481

Merged

aidanhs mentioned this pull request Oct 4, 2017

Don't unwrap work item results as the panic trace is useless #45019

Merged

andjo403 mentioned this pull request Jan 9, 2018

fix faulty comment #47302

Merged

GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Jan 16, 2018

Rollup merge of rust-lang#47302 - andjo403:commentfix, r=michaelwoeri…

bca76c1

…ster fix faulty comment after rust-lang#43506 there is no fixed number of request sent.

GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Jan 16, 2018

Rollup merge of rust-lang#47302 - andjo403:commentfix, r=michaelwoeri…

9928bf2

…ster fix faulty comment after rust-lang#43506 there is no fixed number of request sent.

kennytm added a commit to kennytm/rust that referenced this pull request Jan 17, 2018

Rollup merge of rust-lang#47302 - andjo403:commentfix, r=michaelwoeri…

3dc6f3c

…ster fix faulty comment after rust-lang#43506 there is no fixed number of request sent.

kennytm added a commit to kennytm/rust that referenced this pull request Jan 17, 2018

Rollup merge of rust-lang#47302 - andjo403:commentfix, r=michaelwoeri…

283ee54

…ster fix faulty comment after rust-lang#43506 there is no fixed number of request sent.

michaelwoerister mentioned this pull request May 18, 2018

convenient self-profile option? #50780

Closed
