Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the backend and improve fingerprint diagnostics #2022

Merged
merged 6 commits into from
Oct 6, 2015

Conversation

alexcrichton
Copy link
Member

This series of commits performs two primary tasks:

  • The backend is refactored (again!) to have a much better understanding of what's used as input to everything else in terms of parallelization and dependency tracking. The major goal here was to completely unify all parts of the backend which sort of ad-hoc track dependencies and deal with cross compilation. More details can be found in the first commit message.
  • Greatly improving diagnostics for why a crate is being rebuilt. This still requires setting RUST_LOG, but it was spurred on by Commands or tools for viewing build dependencies #2011 and should help at least my own personal internal debugging of "why is this crate rebuilding?" which seems to come up quite often!

@alexcrichton
Copy link
Member Author

r? @brson

@rust-highfive rust-highfive assigned brson and unassigned huonw Oct 4, 2015
@rust-highfive
Copy link

r? @huonw

(rust_highfive has picked a reviewer for you, use r? to override)

This performs a pass over the docs, touching up sections in a few places and
also moving everything to using inline table syntax by default (the favorted way
to add a dependency)
This commit started out identifying a relatively simple bug in Cargo. A recent
change made it such that the resolution graph included all target-specific
dependencies, relying on the structure of the backend to filter out those which
don't need to get built. This was unfortunately not accounted for in the portion
of the backend that schedules work, mistakenly causing spurious rebuilds if
different runs of the graph pulled in new crates. For example if `cargo build`
didn't build any target-specific dependencies but then later `cargo test` did
(e.g. a dev-dep pulled in a target-specific dep unconditionally) then it would
cause a rebuild of the entire graph.

This class of bug is certainly not the first in a long and storied history of
the backend having multiple points where dependencies are calculated and those
often don't quite agree with one another. The purpose of this rewrite is
twofold:

1. The `Stage` enum in the backend for scheduling work and ensuring that maximum
   parallelism is achieved is removed entirely. There is already a function on
   `Context` which expresses the dependency between targets (`dep_targets`)
   which takes a much finer grain of dependencies into account as well as
   already having all the logic for what-depends-on-what. This duplication has
   caused numerous problems in the past, an unifying these two will truly grant
   maximum parallelism while ensuring that everyone agrees on what their
   dependencies are.

2. A large number of locations in the backend have grown to take a (Package,
   Target, Profile, Kind) tuple, or some subset of this tuple. In general this
   represents a "unit of work" and is much easier to pass around as one
   variable, so a `Unit` was introduced which references all of these variables.
   Almost the entire backend was altered to take a `Unit` instead of these
   variables specifically, typically providing all of the contextual information
   necessary for an operation.

A crucial part of this change is the inclusion of `Kind` in a `Unit` to ensure
that everyone *also* agrees on what architecture they're compiling everything
for. There have been many bugs in the past where one part of the backend
determined that a package was built for one architecture and then another part
thought it was built for another. With the inclusion of `Kind` in dependency
management this is handled in a much cleaner fashion as it's only calculated in
one location.

Some other miscellaneous changes made were:

* The `Platform` enumeration has finally been removed. This has been entirely
  subsumed by `Kind`.
* The hokey logic for "build this crate once" even though it may be depended on
  by both the host/target kinds has been removed. This is now handled in a much
  nicer fashion where if there's no target then Kind::Target is just never used,
  and multiple requests for a package are just naturally deduplicated.
* There's no longer any need to build up the "requirements" for a package in
  terms of what platforms it's compiled for, this now just naturally falls out
  of the dependency graph.
* If a build script is overridden then its entire tree of dependencies are not
  compiled, not just the build script itself.
* The `threadpool` dependency has been replaced with one on `crossbeam`. The
  method of calculating dependencies has quite a few non-static lifetimes and
  the scoped threads of `crossbeam` are now used instead of a thread pool.
* Once any thread fails to execute a command work is no longer scheduled unlike
  before where some extra pending work may continue to start.
* Many functions used early on, such as `compile` and `build_map` have been
  massively simplified by farming out dependency management to
  `Context::dep_targets`.
* There is now a new profile to represent running a build script. This is used
  to inject dependencies as well as represent that a library depends on running
  a build script, not just building it.

This change has currently been tested against cross-compiling Servo to Android
and passes the test suite (which has quite a few corner cases for build scripts
and such), so I'm pretty confident that this refactoring won't have at least too
many regressions!
Adds some helpful error text if an error happens as well
This commit overhauls how a `Fingerprint` is stored on the filesystem and
in-memory to help provide much better diagnostics as to why crates are being
rebuilt. This involves storing more structured data on the filesystem in order
to have a finer-grained comparison with the previous state. This is not
currently surfaced in the output of cargo and still requires
`RUST_LOG=cargo::ops::cargo_rustc::fingerprint=info` but if it turns out to be
useful we can perhaps surface the output.

There are performance considerations here to ensure that a noop build is still
quite speedy for a few reasons:

1. JSON decoding is slow (these are just big structures to decode)
2. Each fingerprint stores all recursive fingerprints, so we can't just "vanilla
   decode" as it would decode O(n^2) items
3. Hashing is actually somewhat nontrivial for this many items here and there,
   so we still need as much memoization as possible.

To ensure that builds are just as speedy tomorrow as they are today, a few
strategies are taken:

* The same fingerprint strategy is used today as a "first line of defense" where
  a small text file with a string contains the "total fingerprint" hash. A
  separately stored file then contains the more detailed JSON structure of the
  old fingerprint, and that's only decoded if there's a mismatch of the short
  hashes. The rationale here is that most crates don't need to be rebuilt so we
  shouldn't decode JSON, but if it does need to be rebuilt then the work of
  compiling far dwarfs the work of decoding the JSON.
* When encoding a full fingerprint as JSON we don't actually include any
  dependencies, just the resolved u64 of them. This helps the O(n^2) problem in
  terms of decoding time and storage space on the filesystem.
* Short hashes continue to be memoized to ensure we don't recompute a hash if
  we've already done so (e.g. shared dependencies).

Overall, when profiling with Servo, this commit does not regress noop build
times, but should help diagnose why crates are being rebuilt hopefully!

Closes rust-lang#2011
Turns out the tests don't actually need this "more deterministic" output and it
was also showing up relatively high in profiles (e.g. ~15% of a noop runtime) so
if this is necessary let's find a better way!
@alexcrichton
Copy link
Member Author

Due to the use of crossbeam this also means we drop 1.1.0 support as crossbeam needs at least const fn to parse (which was presumably added in 1.2.0). I don't think this is necessarily the end of the world though.

@brson
Copy link
Contributor

brson commented Oct 6, 2015

@bors r+

@bors
Copy link
Collaborator

bors commented Oct 6, 2015

📌 Commit 47ea6d6 has been approved by brson

@bors
Copy link
Collaborator

bors commented Oct 6, 2015

⌛ Testing commit 47ea6d6 with merge 68e6799...

bors added a commit that referenced this pull request Oct 6, 2015
This series of commits performs two primary tasks:

* The backend is refactored (again!) to have a much better understanding of what's used as input to everything else in terms of parallelization and dependency tracking. The major goal here was to completely unify all parts of the backend which sort of ad-hoc track dependencies and deal with cross compilation. More details can be found in the first commit message.
* Greatly improving diagnostics for why a crate is being rebuilt. This still requires setting `RUST_LOG`, but it was spurred on by #2011 and should help at least my own personal internal debugging of "why is this crate rebuilding?" which seems to come up quite often!
@bors
Copy link
Collaborator

bors commented Oct 6, 2015

@bors bors merged commit 47ea6d6 into rust-lang:master Oct 6, 2015
@alexcrichton alexcrichton deleted the refactor branch October 6, 2015 23:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants