Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run translation and LLVM in parallel when compiling with multiple CGUs #43506

Merged
merged 29 commits into from
Aug 1, 2017

Conversation

michaelwoerister
Copy link
Member

@michaelwoerister michaelwoerister commented Jul 27, 2017

This is still a work in progress but the bulk of the implementation is done, so I thought it would be good to get it in front of more eyes.

This PR makes the compiler start running LLVM while translation is still in progress, effectively allowing for more parallelism towards the end of the compilation pipeline. It also allows the main thread to switch between either translation or running LLVM, which allows to reduce peak memory usage since not all LLVM module have to be kept in memory until linking. This is especially good for incr. comp. but it works just as well when running with -Ccodegen-units=N.

In order to help tuning and debugging the work scheduler, the PR adds the -Ztrans-time-graph flag which spits out html files that show how work packages where scheduled:
Building regex
(red is translation, green is llvm)

One side effect here is that -Ztime-passes might show something not quite correct because trans and LLVM are not strictly separated anymore. I plan to have some special handling there that will try to produce useful output.

One open question is how to determine whether the trans-thread should switch to intermediate LLVM processing.

TODO:

  • Restore -Z time-passes output for LLVM.
  • Update documentation, esp. for work package scheduling.
  • Tune the scheduling algorithm.

cc @alexcrichton @rust-lang/compiler

@rust-highfive
Copy link
Collaborator

r? @arielb1

(rust_highfive has picked a reviewer for you, use r? to override)

@michaelwoerister
Copy link
Member Author

r? @alexcrichton

Pre-assigning @alexcrichton for review, so he can already start reading :P

@alexcrichton
Copy link
Member

🎊

@retep998
Copy link
Member

As a future thought for how -Ztime-passes could be implemented in a world where everything is threaded, we could use GetThreadTimes on Windows (and whatever linux equivalent) to measure the CPU time used up by a given thread to more precisely track where time is being spent.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great!

The general framework of who's running what when seemed a little confusing to follow, but I think a comment would go a long way towards helping that.

if let Ok(token) = token {
tokens.push(token);
} else {
shared_emitter.fatal("failed to acquire jobserver token");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the error be emitted here as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? Just panic?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh just something like:

match token {
    Ok(token) => tokens.push(token),
    Err(e) => shared_emitter.fatal(&format!("failed to acquire jobserver token: {}", e)),
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah ok, yes.

}

Message::TranslationDone { llvm_work_item, is_last } => {
work_items.insert(0, llvm_work_item);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come this is inserted on the front?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason. I changed that during a debugging session. I'll switch it back to a push.

assert_eq!(trans_worker_state, TransWorkerState::LLVMing);
trans_worker_state = TransWorkerState::Idle;
} else {
drop(tokens.pop());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we discussed this awhile ago, but should we perhaps not drop the token here? If we greedily hold on to our tokens then that means we can more quickly finish this compilation, which in theory may be desirable to reduce overall memory usage?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good idea. The truncate above should take care of dropping the Token if we actually don't need it.

}
}
} else {
match trans_worker_state {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm currently finding the logic here sort of hard to follow in terms of what this "trans worker" is doing. Could you be sure to add a comment with a high-level architecture of what the relationship is between this coordinator thread, the main translation thread, and the worker codegen threads?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so to see if I understand this:

  • The documentation above refers to a "translation worker"
  • This trans worker seems to represent the "ephemeral token" that we inherently have to run work from a jobserver
  • The literal thread representing the trans worker can change I think? Sometimes it's the literally the thread doing translation, sometimes it's a spawned worker here to translate an existing module.
  • If we happen to reach 4 translated codegen units but not codegen'd codegen units then we request the translating main thread to stop, and continue its work with a freshly spawned thread to codegen a module.

Does that sound roughly right?

Ah I see now there's also a crucial piece where when any codegen thread finishes we consider it "trans worker" thread now available again. This means after any codegen thread finishes we may be candidate to start translation of another unit of work I think, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see now there's also a crucial piece where when any codegen thread finishes we consider it "trans worker" thread now available again. This means after any codegen thread finishes we may be candidate to start translation of another unit of work I think, right?

No, but it's an excellent idea :D

@alexcrichton alexcrichton added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Jul 27, 2017
@alexcrichton
Copy link
Member

Docs look great! So one overall meta-comment as well now. I'm having some difficulty articulating this but I'm slightly worried about a situation like:

  • We've got N codegen units left to translate, but the main thread is stopped
  • Our heuristic blocks the main thread an spins up a worker for an existing codegen unit.
  • All of a sudden we get an influx of jobserver tokens, but the main thread is still stopped.

I haven't convinced myself this is possible and I think that the various handling makes it ok? I've found the interaction between all these threads and the implicit token a little difficult to follow, but does this make sense to you? Can you think of a case where we accidentally starve the translation thread due to our heuristic?

@michaelwoerister
Copy link
Member Author

I haven't convinced myself this is possible and I think that the various handling makes it ok? I've found the interaction between all these threads and the implicit token a little difficult to follow, but does this make sense to you? Can you think of a case where we accidentally starve the translation thread due to our heuristic?

So with the latest change, the main thread will be free again as soon as the first LLVM worker is done (thanks to your suggestion). It's still possible to run into a LLVM work shortage the way you describe though.

I just adapted the strategy to estimate the cost of a LLVM WorkItem. Maybe the main thread should always start the cheapest one available, so it can get back to translating sooner, if needed? I don't think this is much of a problem though (with no data to back this claim up in any way :))

@alexcrichton
Copy link
Member

Thinking about this a bit more, I think this is my mental model for what's happening: on each turn of the loop we'll have N slots of work to fill up depending on what's currently running and what amount of tokens we have. Given the choice of whether to translate a new unit or codegen an existing unit it seems fine to have a heuristic. Whenever something happens though it'll turn the loop and cause everything to start over.

I think that's roughly what's implemented right now, but I think that it means that we should consider the translation thread idle as soon as we've acquired a new token? That way if the translation thread was blocked and we get a token it should get unblocked? (which I don't think happens today?) I may not be following the code quite right though...

I just adapted the strategy to estimate the cost of a LLVM WorkItem

Neat!

@michaelwoerister
Copy link
Member Author

but I think that it means that we should consider the translation thread idle as soon as we've acquired a new token?

Yes, that makes sense. It's a variation of considering the translation thread idle when a package is finished (another way of getting an additional token).

@michaelwoerister
Copy link
Member Author

Thanks, @kennytm. Compiling with LLVM 3.7 right now...

@michaelwoerister
Copy link
Member Author

Can't reproduce with LLVM 3.7 either :(

@michaelwoerister
Copy link
Member Author

I pushed another change that should give a sensible error message at a likely point of failure. Also adapted the scheduler heuristic slightly. Let's see.

@michaelwoerister
Copy link
Member Author

New error, excellent :D

@michaelwoerister
Copy link
Member Author

@bors r=alexcrichton

OK, passes travis now.

@bors
Copy link
Contributor

bors commented Aug 1, 2017

📌 Commit 6468cad has been approved by alexcrichton

bors added a commit that referenced this pull request Aug 1, 2017
Run translation and LLVM in parallel when compiling with multiple CGUs

This is still a work in progress but the bulk of the implementation is done, so I thought it would be good to get it in front of more eyes.

This PR makes the compiler start running LLVM while translation is still in progress, effectively allowing for more parallelism towards the end of the compilation pipeline. It also allows the main thread to switch between either translation or running LLVM, which allows to reduce peak memory usage since not all LLVM module have to be kept in memory until linking. This is especially good for incr. comp. but it works just as well when running with `-Ccodegen-units=N`.

In order to help tuning and debugging the work scheduler, the PR adds the `-Ztrans-time-graph` flag which spits out html files that show how work packages where scheduled:
![Building regex](https://user-images.githubusercontent.com/1825894/28679272-f6752bd8-72f2-11e7-8a6c-56207855ce95.png)
(red is translation, green is llvm)

One side effect here is that `-Ztime-passes` might show something not quite correct because trans and LLVM are not strictly separated anymore. I plan to have some special handling there that will try to produce useful output.

One open question is how to determine whether the trans-thread should switch to intermediate LLVM processing.

TODO:
- [x] Restore `-Z time-passes` output for LLVM.
- [x] Update documentation, esp. for work package scheduling.
- [x] Tune the scheduling algorithm.

cc @alexcrichton @rust-lang/compiler
@bors
Copy link
Contributor

bors commented Aug 1, 2017

⌛ Testing commit 6468cad with merge e772c28...

@bors
Copy link
Contributor

bors commented Aug 1, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing e772c28 to master...

@bors bors merged commit 6468cad into rust-lang:master Aug 1, 2017
@bors bors mentioned this pull request Aug 1, 2017
bors added a commit that referenced this pull request Oct 5, 2017
…hton

Don't unwrap work item results as the panic trace is useless

Fixes #43402 now there's no multithreaded panic printouts

Also update a comment

--------

Likely regressed in #43506, where the code was changed to panic in worker threads on error.

Unwrapping gives zero extra information since the stack trace is so short, so we may as well just surface that there was an error and exit the thread properly. Because there are then no multithreaded printouts, I think it should mean the output of the test for #26199 is deterministic and not interleaved (thanks to @philipc #43402 (comment) for a hint).

Sadly the output is now:
```
thread '<unnamed>' panicked at 'aborting due to worker thread panic', src/librustc_trans/back/write.rs:1643:20
note: Run with `RUST_BACKTRACE=1` for a backtrace.
error: could not write output to : No such file or directory

error: aborting due to previous error
```
but it's an improvement over the multi-panic situation before.

r? @alexcrichton
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Nov 3, 2017
Changelog:
Version 1.21.0 (2017-10-12)
==========================

Language
--------
- [You can now use static references for literals.][43838]
  Example:
  ```rust
  fn main() {
      let x: &'static u32 = &0;
  }
  ```
- [Relaxed path syntax. Optional `::` before `<` is now allowed in all contexts.][43540]
  Example:
  ```rust
  my_macro!(Vec<i32>::new); // Always worked
  my_macro!(Vec::<i32>::new); // Now works
  ```

Compiler
--------
- [Upgraded jemalloc to 4.5.0][43911]
- [Enabled unwinding panics on Redox][43917]
- [Now runs LLVM in parallel during translation phase.][43506]
  This should reduce peak memory usage.

Libraries
---------
- [Generate builtin impls for `Clone` for all arrays and tuples that
  are `T: Clone`][43690]
- [`Stdin`, `Stdout`, and `Stderr` now implement `AsRawFd`.][43459]
- [`Rc` and `Arc` now implement `From<&[T]> where T: Clone`, `From<str>`,
  `From<String>`, `From<Box<T>> where T: ?Sized`, and `From<Vec<T>>`.][42565]

Stabilized APIs
---------------

[`std::mem::discriminant`]

Cargo
-----
- [You can now call `cargo install` with multiple package names][cargo/4216]
- [Cargo commands inside a virtual workspace will now implicitly
  pass `--all`][cargo/4335]
- [Added a `[patch]` section to `Cargo.toml` to handle
  prepublication dependencies][cargo/4123] [RFC 1969]
- [`include` & `exclude` fields in `Cargo.toml` now accept gitignore
  like patterns][cargo/4270]
- [Added the `--all-targets` option][cargo/4400]
- [Using required dependencies as a feature is now deprecated and emits
  a warning][cargo/4364]


Misc
----
- [Cargo docs are moving][43916]
  to [doc.rust-lang.org/cargo](https://doc.rust-lang.org/cargo)
- [The rustdoc book is now available][43863]
  at [doc.rust-lang.org/rustdoc](https://doc.rust-lang.org/rustdoc)
- [Added a preview of RLS has been made available through rustup][44204]
  Install with `rustup component add rls-preview`
- [`std::os` documentation for Unix, Linux, and Windows now appears on doc.rust-lang.org][43348]
  Previously only showed `std::os::unix`.

Compatibility Notes
-------------------
- [Changes in method matching against higher-ranked types][43880] This may cause
  breakage in subtyping corner cases. [A more in-depth explanation is available.][info/43880]
- [rustc's JSON error output's byte position start at top of file.][42973]
  Was previously relative to the rustc's internal `CodeMap` struct which
  required the unstable library `libsyntax` to correctly use.
- [`unused_results` lint no longer ignores booleans][43728]

[42565]: rust-lang/rust#42565
[42973]: rust-lang/rust#42973
[43348]: rust-lang/rust#43348
[43459]: rust-lang/rust#43459
[43506]: rust-lang/rust#43506
[43540]: rust-lang/rust#43540
[43690]: rust-lang/rust#43690
[43728]: rust-lang/rust#43728
[43838]: rust-lang/rust#43838
[43863]: rust-lang/rust#43863
[43880]: rust-lang/rust#43880
[43911]: rust-lang/rust#43911
[43916]: rust-lang/rust#43916
[43917]: rust-lang/rust#43917
[44204]: rust-lang/rust#44204
[cargo/4123]: rust-lang/cargo#4123
[cargo/4216]: rust-lang/cargo#4216
[cargo/4270]: rust-lang/cargo#4270
[cargo/4335]: rust-lang/cargo#4335
[cargo/4364]: rust-lang/cargo#4364
[cargo/4400]: rust-lang/cargo#4400
[RFC 1969]: rust-lang/rfcs#1969
[info/43880]: rust-lang/rust#44224 (comment)
[`std::mem::discriminant`]: https://doc.rust-lang.org/std/mem/fn.discriminant.html
@andjo403 andjo403 mentioned this pull request Jan 9, 2018
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Jan 16, 2018
…ster

fix faulty comment

after rust-lang#43506 there is no fixed number of request sent.
GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Jan 16, 2018
…ster

fix faulty comment

after rust-lang#43506 there is no fixed number of request sent.
kennytm added a commit to kennytm/rust that referenced this pull request Jan 17, 2018
…ster

fix faulty comment

after rust-lang#43506 there is no fixed number of request sent.
kennytm added a commit to kennytm/rust that referenced this pull request Jan 17, 2018
…ster

fix faulty comment

after rust-lang#43506 there is no fixed number of request sent.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants