initial version of checksum based freshness #14137

Xaeroxe · 2024-06-25T05:18:22Z

Implementation for #14136 and resolves #6529

This PR implements the use of checksums in cargo fingerprints as an alternative to using mtimes. This is most useful on systems with poor mtime implementations.

This has a dependency on rust-lang/rust#126930. Checksum comparison is expected to increase the time it takes to declare a build fresh. Still, this loss in performance may be preferable to the issues the ecosystem has had with using mtimes to determine freshness.
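To make the trade-off concrete, here is a minimal sketch of a content-based freshness check. It is not the code in this PR: `RecordedFile`, `record`, and `is_fresh` are hypothetical names, and std's `DefaultHasher` stands in for the real digest only to keep the example dependency-free (the commit list for this PR settles on blake3).

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::Hasher;
use std::io;
use std::path::Path;

/// What a previous build recorded about one source file (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RecordedFile {
    pub len: u64,
    pub digest: u64,
}

/// Stand-in digest. `DefaultHasher` is an assumption made for this sketch only.
fn digest(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    hasher.finish()
}

/// Record the length and digest of a file as it exists right now.
pub fn record(path: &Path) -> io::Result<RecordedFile> {
    let bytes = fs::read(path)?;
    Ok(RecordedFile {
        len: bytes.len() as u64,
        digest: digest(&bytes),
    })
}

/// A file is fresh if its contents still match the record.
/// The file's mtime is never consulted.
pub fn is_fresh(path: &Path, recorded: &RecordedFile) -> io::Result<bool> {
    // Cheap check first: a different length means different content.
    if fs::metadata(path)?.len() != recorded.len {
        return Ok(false);
    }
    // Same length, so the contents have to be read and hashed to decide.
    Ok(digest(&fs::read(path)?) == recorded.digest)
}
```

Rewriting a file with identical bytes changes its mtime but leaves it fresh under this check, while an edit that keeps the size the same is still detected. The cost is the second branch: every same-length file has to be read and hashed, which is where the expected slowdown comes from.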

rustbot · 2024-06-25T05:18:26Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @weihanglo (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

@rustbot author: the review is finished, PR author should check the comments and take action accordingly
@rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

Cargo.toml

Add unstable support for outputting file checksums for use in cargo

Adds an unstable option that appends file checksums and expected lengths to the end of the dep-info file such that `cargo` can read and use these values as an alternative to file mtimes. This PR powers the changes made in this cargo PR: rust-lang/cargo#14137. Here's the tracking issue for the cargo feature: rust-lang/cargo#14136.

bors · 2024-07-26T22:03:01Z

☔ The latest upstream changes (presumably #13947) made this pull request unmergeable. Please resolve the merge conflicts.

Xaeroxe · 2024-07-26T22:23:39Z

Merge conflicts resolved.

epage · 2024-08-01T20:25:09Z

src/cargo/core/features.rs

@@ -757,6 +757,7 @@ unstable_cli_options!(
    build_std: Option<Vec<String>>  = ("Enable Cargo to compile the standard library itself as part of a crate graph compilation"),
    build_std_features: Option<Vec<String>>  = ("Configure features enabled for the standard library itself when building the standard library"),
    cargo_lints: bool = ("Enable the `[lints.cargo]` table"),
+    checksum_freshness: bool = ("Use a checksum to determine if output is fresh rather than filesystem mtime"),


Unstable features should also be documented at https://doc.rust-lang.org/nightly/cargo/reference/unstable.html

See src/doc/src/reference/unstable.md

Resolved in 6cf92e1

epage · 2024-08-01T20:28:36Z

src/cargo/util/command_prelude.rs

@@ -708,6 +708,7 @@ Run `{cmd}` to see possible targets."
        build_config.build_plan = self.flag("build-plan");
        build_config.unit_graph = self.flag("unit-graph");
        build_config.future_incompat_report = self.flag("future-incompat-report");
+        build_config.checksum_freshness = self.flag("checksum-freshness");


I think there might be a misunderstanding, which makes me wonder how the tests are working. `self.flag` is for reading CLI flags from clap, but no checksum-freshness CLI flag was created. Instead, there is a checksum-freshness unstable feature flag that should be accessible as `gctx.cli_unstable().checksum_freshness`.

Resolved in 0240bf0, this was just vestigial.
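The distinction between a regular CLI flag and a `-Z` feature flag can be sketched as follows. This is an illustration only, not Cargo's parser: `UnstableFlags` and `parse_unstable` are hypothetical names, and the real table of unstable options lives in the `unstable_cli_options!` macro shown earlier in this review.

```rust
/// Hypothetical stand-in for Cargo's table of unstable `-Z` features.
#[derive(Debug, Default, PartialEq)]
pub struct UnstableFlags {
    pub checksum_freshness: bool,
}

/// Collect `-Z` values from an argument list.
/// Accepts both the `-Zname` and the `-Z name` spellings.
pub fn parse_unstable(args: &[&str]) -> Result<UnstableFlags, String> {
    let mut flags = UnstableFlags::default();
    let mut iter = args.iter();
    while let Some(arg) = iter.next() {
        let name = if *arg == "-Z" {
            // `-Z name`: the feature name is the next argument.
            match iter.next() {
                Some(next) => *next,
                None => return Err("`-Z` requires a feature name".to_string()),
            }
        } else if let Some(rest) = arg.strip_prefix("-Z") {
            // `-Zname`: the feature name is glued to the flag.
            rest
        } else {
            // Anything else, including `--checksum-freshness`, is not a feature flag.
            continue;
        };
        match name {
            "checksum-freshness" => flags.checksum_freshness = true,
            other => return Err(format!("unknown unstable feature `{other}`")),
        }
    }
    Ok(flags)
}
```

In this sketch a `--checksum-freshness` argument never sets the field, which mirrors the point of the review comment: the feature is switched on through `-Z`, so reading a same-named CLI flag would always yield `false`.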

epage · 2024-08-01T20:29:08Z

src/cargo/util/command_prelude.rs

+        if build_config.checksum_freshness {
+            gctx.cli_unstable()
+                .fail_if_stable_opt("--checksum-freshness", 14136)?;
+        }


If we're defining a feature flag for this, we don't need to require -Zunstable-options (which means we should also stop having the tests set it).

Resolved in 0240bf0

epage · 2024-08-01T20:29:19Z

src/cargo/util/context/mod.rs

@@ -91,6 +91,7 @@ use serde::Deserialize;
 use serde_untagged::UntaggedEnumVisitor;
 use time::OffsetDateTime;
 use toml_edit::Item;
+use tracing::warn;


This looks like dead code?

Resolved in 5afdc11

epage · 2024-08-01T20:31:14Z

tests/testsuite/freshness_checksum.rs

+use cargo_test_support::{basic_lib_manifest, basic_manifest, project, rustc_host, rustc_host_env};
+
+#[cargo_test]
+fn checksum_actually_uses_checksum() {


At minimum, could you structure your PR so it's:

A commit with these tests without -Zchecksum-freshness

A commit with the checksum work that also updates the tests to pass -Zchecksum-freshness

A big benefit of this is that it shows reviewers and the community how this feature compares to what was being done before.

(sometimes, I also break out "adding an unstable feature" into its own commit which is the flag + docs)

Note: I've not dug deep into the tests, waiting on this change

It's worth noting that freshness_checksum.rs derives very heavily from freshness.rs, so a version of freshness_checksum.rs without the freshness flag would just be a subset of freshness.rs. I'm not sure how this provides new information. Two tests are truly unique to freshness_checksum.rs: same_size_different_content() and checksum_actually_uses_checksum().

One might debate the merit of duplicating the tests like that. Deduplicating them would likely require adding a special case to the test runner code.

epage · 2024-08-01T20:34:08Z

tests/testsuite/freshness_checksum.rs

+    p.cargo("build")
+        .masquerade_as_nightly_cargo(&["checksum-freshness"])
+        .args(&["-Z", "unstable-options", "-Z", "checksum-freshness"])


nit: could you do

p.cargo("build -Zchecksum-freshness")

Identifying quickly in the test case what command is being run makes it a lot easier to read the test. This buries it compared to most tests.

Resolved in 3cc6814

epage · 2024-08-01T20:34:28Z

tests/testsuite/freshness_checksum.rs

+        .file("src/a.rs", "")
+        .build();
+
+    p.cargo("build")


Is there a reason we need cargo build instead of cargo check? The latter helps keep test times down

No. Where it was possible to do so without breaking tests, I downgraded these to cargo check in 3cc6814.

epage · 2024-08-01T20:38:29Z

src/cargo/core/compiler/fingerprint/mod.rs

+        if build_runner.bcx.gctx.cli_unstable().checksum_freshness {
+            vec![LocalFingerprint::CheckDepInfoChecksums { dep_info }]
+        } else {
+            vec![LocalFingerprint::CheckDepInfo { dep_info }]
+        }


Why did you go with separate enum variants? Are we not able to read the checksum feature flag? Would it be better to have a checksum: bool in it?

Ah, I assume this is tied to the serialization. I wonder if we should have a slight decoupling, as I feel like this makes things more complicated. I hope having two types of fingerprints is short-term, so we shouldn't try to over-generalize.

Can you help me better understand what such a decoupling might look like?

Never mind, I think I've got it.

Resolved in d422f64

The original thinking was that I should maintain backwards compatibility with prior build caches, but then I remembered that cargo releases alongside rustc, and rustc doesn't provide backwards compatibility of build caches.
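For readers following along, a merged variant carrying a `checksum: bool` might look roughly like the sketch below. This is an assumption about the shape suggested in the review, not a copy of what d422f64 does; the enum here is a trimmed-down, hypothetical stand-in for the real `LocalFingerprint`, and `dep_info_fingerprint` and `describe` are illustrative helpers.

```rust
use std::path::PathBuf;

/// Hypothetical, trimmed-down stand-in for `LocalFingerprint`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LocalFingerprint {
    /// One variant for both strategies; `checksum` selects how staleness is judged.
    CheckDepInfo { dep_info: PathBuf, checksum: bool },
}

/// Build the fingerprint list without branching on which variant to construct.
pub fn dep_info_fingerprint(dep_info: PathBuf, checksum_freshness: bool) -> Vec<LocalFingerprint> {
    vec![LocalFingerprint::CheckDepInfo {
        dep_info,
        checksum: checksum_freshness,
    }]
}

/// The strategy is chosen where the fingerprint is checked, not where it is built.
pub fn describe(fp: &LocalFingerprint) -> &'static str {
    match fp {
        LocalFingerprint::CheckDepInfo { checksum: true, .. } => "compare recorded checksums",
        LocalFingerprint::CheckDepInfo { checksum: false, .. } => "compare mtimes",
    }
}
```

With a single variant, the `if`/`else` in the original hunk collapses into one constructor call, and any code shared by the two strategies no longer needs a helper that returns either of two result types.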

epage · 2024-08-01T20:40:50Z

src/cargo/core/compiler/fingerprint/mod.rs

+fn make_absolute_path(
+    ty: DepInfoPathType,
+    pkg_root: &Path,
+    path: PathBuf,
+    target_root: &Path,


nit: I'd make path the last parameter. The other three (ty, pkg_root, and target_root) are all related in determining what the root should be.

Alternatively, you could just make this a path_root function that only takes those three, with the join done in the caller.

Resolved in 5afdc11
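The `path_root` alternative could be sketched like this. The variant names on `DepInfoPathType` are assumptions, since the hunk above does not show the enum's definition, and `absolute_dep_path` is a hypothetical caller added only to show where the join would move.

```rust
use std::path::{Path, PathBuf};

/// Variant names are assumed for illustration; the excerpt does not show them.
#[derive(Debug, Clone, Copy)]
pub enum DepInfoPathType {
    PackageRootRelative,
    TargetRootRelative,
}

/// Pick the root a dep-info path is relative to. No joining happens here.
pub fn path_root<'a>(
    ty: DepInfoPathType,
    pkg_root: &'a Path,
    target_root: &'a Path,
) -> &'a Path {
    match ty {
        DepInfoPathType::PackageRootRelative => pkg_root,
        DepInfoPathType::TargetRootRelative => target_root,
    }
}

/// Hypothetical caller: the three root-related arguments come first, `path` last.
pub fn absolute_dep_path(
    ty: DepInfoPathType,
    pkg_root: &Path,
    target_root: &Path,
    path: &Path,
) -> PathBuf {
    path_root(ty, pkg_root, target_root).join(path)
}
```

Splitting root selection from the join keeps the function that chooses a root free of allocation and makes the parameter grouping explicit at the call site.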

epage · 2024-08-01T20:45:20Z

src/cargo/core/compiler/fingerprint/mod.rs

+fn dep_info_shared(
+    pkg_root: &Path,
+    target_root: &Path,
+    dep_info: &PathBuf,
+    cargo_exe: &Path,
+    gctx: &GlobalContext,
+) -> Result<Either<StaleItem, RustcDepInfo>, anyhow::Error> {


This name doesn't tell the reader what this function is doing.

Function was removed as part of d422f64

epage · 2024-08-01T20:46:24Z

src/cargo/core/compiler/fingerprint/mod.rs

+fn dep_info_shared(
+    pkg_root: &Path,
+    target_root: &Path,
+    dep_info: &PathBuf,
+    cargo_exe: &Path,
+    gctx: &GlobalContext,
+) -> Result<Either<StaleItem, RustcDepInfo>, anyhow::Error> {


Not thrilled with using Either; I feel like it's obscuring what this is doing.

For both of these, I think the problems go away if we merge the LocalFingerprint variants.

Function was removed as part of d422f64

rustbot assigned weihanglo Jun 25, 2024

This was referenced Jun 25, 2024

Tracking Issue for checksum freshness #14136

Open

Add unstable support for outputting file checksums for use in cargo rust-lang/rust#126930

Open

Xaeroxe force-pushed the checksum-freshness branch from 320f73c to 310cd79 Compare June 25, 2024 05:37

tgross35 reviewed Jun 25, 2024


Xaeroxe force-pushed the checksum-freshness branch from cce62ba to 59441b6 Compare June 26, 2024 05:55

Xaeroxe mentioned this pull request Jul 1, 2024

MCP: Alternate cargo freshness algorithm, unstable flag to annotate depinfo file with checksums and file sizes rust-lang/compiler-team#765

Closed

3 tasks

rustbot added the A-infrastructure Area: infrastructure around the cargo repo, ci, releases, etc. label Jul 13, 2024

Xaeroxe force-pushed the checksum-freshness branch from 27e2a18 to c83be55 Compare July 13, 2024 14:58

Xaeroxe added 5 commits July 26, 2024 16:23

initial version of checksum based freshness

6f92c69

remove support for md5 and sha1, add support for xxhash

0f12264

switch to using twox-hash instead due to licensing issues

a264c26

update to use blake3 instead of xxhash per rustc MCP resolution

06c2024

add bsd 2 clause to allow list for blake3 dep arrayref

b8c21fa

Xaeroxe force-pushed the checksum-freshness branch from c83be55 to b8c21fa Compare July 26, 2024 22:23

epage reviewed Aug 1, 2024


Xaeroxe added 5 commits August 17, 2024 20:25

remove vestigial BuildConfig flags

0240bf0

condense fingerprint variants into one

d422f64

rearrange function signature, remove unused imports

5afdc11

Add appropriate documentation

6cf92e1

Simplify how tests invoke cargo, use check where possible

3cc6814

rustbot added the A-documenting-cargo-itself Area: Cargo's documentation label Aug 18, 2024

gustavovalverde mentioned this pull request Sep 2, 2024

cargo build --dependencies-only #2644

Open
