initial version of checksum based freshness #14137

Open: wants to merge 10 commits into master

Xaeroxe commented Jun 25, 2024

Implementation for #14136 and resolves #6529

This PR implements the use of checksums in cargo fingerprints as an alternative to using mtimes. This is most useful on systems with poor mtime implementations.

This depends on rust-lang/rust#126930. Checksum-based checks are expected to increase the time it takes to declare a build fresh. Still, this loss in performance may be preferable to the issues the ecosystem has had with using mtimes to determine freshness.

rustbot (Collaborator) commented Jun 25, 2024

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @weihanglo (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, to keep review lag to a minimum, PR authors and assigned reviewers should keep the review labels (S-waiting-on-review and S-waiting-on-author) up to date, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added A-build-execution Area: anything dealing with executing the compiler A-cli Area: Command-line interface, option parsing, etc. A-configuration Area: cargo config files and env vars A-rebuild-detection Area: rebuild detection and fingerprinting A-unstable Area: nightly unstable support S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 25, 2024
Cargo.toml (outdated, resolved)
@rustbot rustbot added the A-infrastructure Area: infrastructure around the cargo repo, ci, releases, etc. label Jul 13, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 17, 2024
Add unstable support for outputting file checksums for use in cargo

Adds an unstable option that appends file checksums and expected lengths to the end of the dep-info file such that `cargo` can read and use these values as an alternative to file mtimes.

This PR powers the changes made in this cargo PR rust-lang/cargo#14137

Here's the tracking issue for the cargo feature rust-lang/cargo#14136.
bors (Collaborator) commented Jul 26, 2024

☔ The latest upstream changes (presumably #13947) made this pull request unmergeable. Please resolve the merge conflicts.

Xaeroxe (Author) commented Jul 26, 2024

Merge conflicts resolved.

src/cargo/util/context/mod.rs (outdated, resolved)
use cargo_test_support::{basic_lib_manifest, basic_manifest, project, rustc_host, rustc_host_env};

#[cargo_test]
fn checksum_actually_uses_checksum() {
Contributor commented:

At minimum, could you structure your PR so that it has:

  • A commit with these tests without -Zchecksum-freshness
  • A commit with the checksum work that also updates the tests to pass -Zchecksum-freshness

A big benefit of this is that it shows reviewers and the community how this feature compares to what was done before.

(sometimes, I also break out "adding an unstable feature" into its own commit which is the flag + docs)

Contributor commented:

Note: I've not dug deep into the tests, waiting on this change

Author commented:

So it's worth noting that freshness_checksum.rs derives very heavily from freshness.rs. So a version of freshness_checksum.rs without the freshness flag would just be a subset of freshness.rs. I'm not sure how this provides new information. There are two tests which are truly unique to freshness_checksum.rs, which are same_size_different_content() and checksum_actually_uses_checksum().

One might debate the merit of duplicating the tests like that. If you really wanted to deduplicate the tests then this would likely require a special case be added to the test runner code.

@rustbot rustbot added the A-documenting-cargo-itself Area: Cargo's documentation label Aug 18, 2024
bors (Collaborator) commented Sep 10, 2024

☔ The latest upstream changes (presumably #14493) made this pull request unmergeable. Please resolve the merge conflicts.

weihanglo (Member) left a comment

Looks like we are almost there! Thanks for your efforts, epage and Xaeroxe. I thought it would be a tough review, but honestly it was a happy time :)

## checksum-freshness
* Tracking issue: [#14136](https://github.com/rust-lang/cargo/issues/14136)

The `-Z checksum-freshness` flag will replace the use of file mtimes in cargo's
Member commented:

It may be worth noting that build script execution is not included in the current implementation.

@@ -23,6 +23,7 @@ anstream = "0.6.15"
anstyle = "1.0.8"
anyhow = "1.0.86"
base64 = "0.22.1"
blake3 = "1.5.2"
Member commented:

Note: need to check the compatibility of blake3 for Tier 1 with Host Tools and Tier 2 with Host Tools, as it contains some assembly code.

});
}
let Ok(checksum) = Checksum::compute(prior_checksum.algo, file) else {
return Some(StaleItem::MissingFile(path.to_path_buf()));
Member commented:

Why is checksum computation failure also a StaleItem::MissingFile?

Comment on lines +2042 to +2047
let Ok(file) = File::open(path) else {
return Some(StaleItem::MissingFile(path.to_path_buf()));
};
let Ok(current_file_len) = file.metadata().map(|m| m.len()) else {
return Some(StaleItem::FailedToReadMetadata(path.to_path_buf()));
};
Member commented:

Since the stat syscall is generally faster than open, should we reorder this part to compare the file size first and only then open the file?
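A minimal sketch of the ordering raised in this comment: check the size via metadata first, and open and hash the file only when the size still matches. Everything here is illustrative and not taken from the PR: `Freshness`, `fnv1a`, and `check_file` are hypothetical names, and 64-bit FNV-1a is only a std-only stand-in for the real checksum algorithm. The separate `ReadFailed` outcome also shows one way a hashing failure could be reported apart from a missing file.

```rust
use std::fs::{self, File};
use std::io::{self, Read};
use std::path::Path;

/// Illustrative outcome type; cargo's real `StaleItem` is different.
#[derive(Debug, PartialEq)]
enum Freshness {
    Fresh,
    Missing,
    SizeChanged,
    ContentChanged,
    ReadFailed,
}

/// Stand-in checksum (64-bit FNV-1a), used only to keep the sketch std-only.
fn fnv1a(mut reader: impl Read) -> io::Result<u64> {
    let mut hash: u64 = 0xcbf29ce484222325;
    let mut buf = [0u8; 8192];
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            return Ok(hash);
        }
        for &byte in &buf[..n] {
            hash ^= u64::from(byte);
            hash = hash.wrapping_mul(0x100000001b3);
        }
    }
}

/// Compare a file against a previously recorded length and checksum.
fn check_file(path: &Path, prior_len: u64, prior_sum: u64) -> Freshness {
    // Cheap step first: a metadata lookup, no file handle needed.
    let Ok(meta) = fs::metadata(path) else {
        return Freshness::Missing;
    };
    if meta.len() != prior_len {
        // A different length already proves the content changed.
        return Freshness::SizeChanged;
    }
    // Expensive step last: open and hash only when the sizes agree.
    let Ok(file) = File::open(path) else {
        return Freshness::Missing;
    };
    match fnv1a(file) {
        Ok(sum) if sum == prior_sum => Freshness::Fresh,
        Ok(_) => Freshness::ContentChanged,
        Err(_) => Freshness::ReadFailed,
    }
}
```

With this ordering, a file whose length differs is reported stale without being opened or read at all; the hash is only computed for the same-size case, which is exactly the case a size comparison cannot decide.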

let dep_info = target_root.join(dep_info);
let cargo_exe = cargo_exe;
Member commented:

Suggested change (delete this line):
let cargo_exe = cargo_exe;

@@ -2250,3 +2479,201 @@ pub fn parse_rustc_dep_info(rustc_dep_info: &Path) -> CargoResult<RustcDepInfo>
Ok(ret)
}
}

/// Some algorithms are here to ensure compatibility with possible rustc outputs.
Member commented:

nit: checksum part could potentially be split into a module. The LoC here in this file is already frightening.

(though I believe it will not help much 😓)

@@ -2102,13 +2271,14 @@ pub struct RustcDepInfo {
struct EncodedDepInfo {
files: Vec<(DepInfoPathType, PathBuf)>,
env: Vec<(String, Option<String>)>,
checksum: Vec<(DepInfoPathType, PathBuf, u64, String)>,
Member commented:

(not a blocker)
I feel like files and checksum should eventually merge into one field.
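
One possible shape for such a merged field, as a rough sketch only: each tracked file carries an optional length and checksum, so the two parallel vectors collapse into one. `PathKind`, `FileEntry`, and `merge_fields` are hypothetical names (`PathKind` stands in for `DepInfoPathType`), and the assumption that `files` lists every tracked path while `checksum` covers a subset is mine, not something the PR states.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical stand-in for cargo's `DepInfoPathType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum PathKind {
    PackageRootRelative,
    TargetRootRelative,
}

/// Hypothetical merged entry: one record per tracked file.
#[derive(Debug, Clone, PartialEq, Eq)]
struct FileEntry {
    kind: PathKind,
    path: PathBuf,
    /// `(file_len, checksum)`; `None` when no checksum was recorded.
    checksum: Option<(u64, String)>,
}

/// Fold the two parallel vectors into a single list of entries.
fn merge_fields(
    files: Vec<(PathKind, PathBuf)>,
    checksums: Vec<(PathKind, PathBuf, u64, String)>,
) -> Vec<FileEntry> {
    // Index checksum data by (kind, path) so each file can look itself up.
    let mut by_path: HashMap<(PathKind, PathBuf), (u64, String)> = checksums
        .into_iter()
        .map(|(kind, path, len, sum)| ((kind, path), (len, sum)))
        .collect();
    files
        .into_iter()
        .map(|(kind, path)| {
            let checksum = by_path.remove(&(kind, path.clone()));
            FileEntry { kind, path, checksum }
        })
        .collect()
}
```

The `Option` keeps mtime-only entries representable, which matters as long as checksum freshness remains an opt-in unstable feature.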

Comment on lines +268 to +271
"the file `{}` has changed (checksum didn't match, {} != {})",
file.display(),
stored_checksum,
new_checksum,
Member commented:

nit: use inline format-argument capture whenever possible and sensible

Suggested change
- "the file `{}` has changed (checksum didn't match, {} != {})",
- file.display(),
- stored_checksum,
- new_checksum,
+ "the file `{}` has changed (checksum didn't match, {stored_checksum} != {new_checksum})",
+ file.display(),
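
For context on why the suggestion keeps one positional argument: Rust format strings can capture plain identifiers inline, but not arbitrary expressions such as a method call, so `file.display()` has to stay as a `{}` argument. A small standalone sketch, where the function name `checksum_mismatch_message` is hypothetical and only the message text comes from the diff:

```rust
use std::path::Path;

/// Build the mismatch message using inline capture where the language allows it.
fn checksum_mismatch_message(file: &Path, stored_checksum: &str, new_checksum: &str) -> String {
    format!(
        // `{stored_checksum}` and `{new_checksum}` are captured from scope;
        // `file.display()` is an expression, so it remains positional.
        "the file `{}` has changed (checksum didn't match, {stored_checksum} != {new_checksum})",
        file.display(),
    )
}
```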

@@ -183,6 +187,16 @@ impl DirtyReason {
DirtyReason::PrecalculatedComponentsChanged { .. } => {
s.dirty_because(unit, "the precalculated components changed")
}
DirtyReason::ChecksumUseChanged { old, new: _ } => {
Member commented:

If we are not using new anywhere, should we just remove this field?

Member commented:

Oops. The test module might need to be rewritten. See #14039

We can sort this out later as follow-ups. Don't worry

Member commented:

This file looks like a copy of freshness.rs with -Zchecksum-freshness and masquerade_as_nightly_cargo everywhere.

Could you point out which tests are new and worth a review? I guess they are:

  • checksum_actually_uses_checksum
  • same_size_different_content
  • modifying_and_moving

Also, could you add a test verifying that -Zchecksum-freshness is gated behind nightly? (A test without masquerade_as_nightly_cargo is sufficient.)

Successfully merging this pull request may close these issues.

(Option to) Fingerprint by file contents instead of mtime
9 participants