Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

initial version of checksum based freshness #14137

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

Xaeroxe
Copy link

@Xaeroxe Xaeroxe commented Jun 25, 2024

Implementation for #14136 and resolves #6529

This PR implements the use of checksums in cargo fingerprints as an alternative to using mtimes. This is most useful on systems with poor mtime implementations.

This has a dependency on rust-lang/rust#126930. It's expected this will increase the time it takes to declare a build to be fresh. Still this loss in performance may be preferable to the issues the ecosystem has had with the use of mtimes for determining freshness.

@rustbot
Copy link
Collaborator

rustbot commented Jun 25, 2024

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @weihanglo (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added A-build-execution Area: anything dealing with executing the compiler A-cli Area: Command-line interface, option parsing, etc. A-configuration Area: cargo config files and env vars A-rebuild-detection Area: rebuild detection and fingerprinting A-unstable Area: nightly unstable support S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 25, 2024
Cargo.toml Outdated Show resolved Hide resolved
@rustbot rustbot added the A-infrastructure Area: infrastructure around the cargo repo, ci, releases, etc. label Jul 13, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 17, 2024
Add unstable support for outputting file checksums for use in cargo

Adds an unstable option that appends file checksums and expected lengths to the end of the dep-info file such that `cargo` can read and use these values as an alternative to file mtimes.

This PR powers the changes made in this cargo PR rust-lang/cargo#14137

Here's the tracking issue for the cargo feature rust-lang/cargo#14136.
@bors
Copy link
Collaborator

bors commented Jul 26, 2024

☔ The latest upstream changes (presumably #13947) made this pull request unmergeable. Please resolve the merge conflicts.

@Xaeroxe
Copy link
Author

Xaeroxe commented Jul 26, 2024

Merge conflicts resolved.

@@ -757,6 +757,7 @@ unstable_cli_options!(
build_std: Option<Vec<String>> = ("Enable Cargo to compile the standard library itself as part of a crate graph compilation"),
build_std_features: Option<Vec<String>> = ("Configure features enabled for the standard library itself when building the standard library"),
cargo_lints: bool = ("Enable the `[lints.cargo]` table"),
checksum_freshness: bool = ("Use a checksum to determine if output is fresh rather than filesystem mtime"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unstable features should also be documented at https://doc.rust-lang.org/nightly/cargo/reference/unstable.html

See src/doc/src/reference/unstable.md

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 6cf92e1

@@ -708,6 +708,7 @@ Run `{cmd}` to see possible targets."
build_config.build_plan = self.flag("build-plan");
build_config.unit_graph = self.flag("unit-graph");
build_config.future_incompat_report = self.flag("future-incompat-report");
build_config.checksum_freshness = self.flag("checksum-freshness");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there might be a misunderstanding which makes me wonder how the tests are working. self.flag is for reading CLI flags from clap but no checksum-freshness CLI flag was created. Instead there is checksum-freshness unstable feature flag that should be accessible as gctx.cli_unstable().checksum_freshness

Copy link
Author

@Xaeroxe Xaeroxe Aug 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 0240bf0, this was just vestigial.

Comment on lines 746 to 749
if build_config.checksum_freshness {
gctx.cli_unstable()
.fail_if_stable_opt("--checksum-freshness", 14136)?;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're defining a feature flag for this, we don't need to require -Zunstable-options (which means we should also stop having the tests set it)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 0240bf0

@@ -91,6 +91,7 @@ use serde::Deserialize;
use serde_untagged::UntaggedEnumVisitor;
use time::OffsetDateTime;
use toml_edit::Item;
use tracing::warn;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like dead code?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 5afdc11

use cargo_test_support::{basic_lib_manifest, basic_manifest, project, rustc_host, rustc_host_env};

#[cargo_test]
fn checksum_actually_uses_checksum() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At minimum, could you structure your PR so its

  • A commit with these tests without -Zchecksum-freshness
  • A commit with the checksum work that also updates the tests to pass -Zchecksum-freshness

A big benefit to this is it shows to reviewers / the community how this feature is comparing to what was being done before

(sometimes, I also break out "adding an unstable feature" into its own commit which is the flag + docs)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I've not dug deep into the tests, waiting on this change

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it's worth noting that freshness_checksum.rs derives very heavily from freshness.rs. So a version of freshness_checksum.rs without the freshness flag would just be a subset of freshness.rs. I'm not sure how this provides new information. There are two tests which are truly unique to freshness_checksum.rs, which are same_size_different_content() and checksum_actually_uses_checksum().

One might debate the merit of duplicating the tests like that. If you really wanted to deduplicate the tests then this would likely require a special case be added to the test runner code.

Comment on lines 24 to 26
p.cargo("build")
.masquerade_as_nightly_cargo(&["checksum-freshness"])
.args(&["-Z", "unstable-options", "-Z", "checksum-freshness"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could you do

p.cargo("build -Zchecksum-freshness")

Identifying quickly in the test case what command is being run makes it a lot easier to read the test. This buries it compared to most tests.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 3cc6814

.file("src/a.rs", "")
.build();

p.cargo("build")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we need cargo build instead of cargo check? The latter helps keep test times down

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, where it was possible to do so without breaking tests I downgraded these to cargo check in 3cc6814

Comment on lines 1521 to 1525
if build_runner.bcx.gctx.cli_unstable().checksum_freshness {
vec![LocalFingerprint::CheckDepInfoChecksums { dep_info }]
} else {
vec![LocalFingerprint::CheckDepInfo { dep_info }]
}
Copy link
Contributor

@epage epage Aug 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you go with separate enums variants? Are we not able to read the checksum feature flag? Would it be better to have a checksum: bool in it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I assume this is tied to the serialization. I wonder if we should have a slight decoupling as I feel like this makes things more complicated. I hope having two types of fingerprints is short term and so we shouldn't try to over-generalize

Copy link
Author

@Xaeroxe Xaeroxe Aug 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me better understand what such a decoupling might look like?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, I think I've got it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in d422f64

Copy link
Author

@Xaeroxe Xaeroxe Aug 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original thinking was that I should maintain backwards compatibility with prior build caches, but then I remembered that cargo releases alongside rustc, and rustc doesn't provide backwards compatibility of build caches.

Comment on lines 2004 to 2008
fn make_absolute_path(
ty: DepInfoPathType,
pkg_root: &Path,
path: PathBuf,
target_root: &Path,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I'd make path the last parameter. The first three are all related in determining what the root should be

Alternatively, you could just make this a path_root function that only takes the first three and the join is done in the caller

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved in 5afdc11

Comment on lines 909 to 915
fn dep_info_shared(
pkg_root: &Path,
target_root: &Path,
dep_info: &PathBuf,
cargo_exe: &Path,
gctx: &GlobalContext,
) -> Result<Either<StaleItem, RustcDepInfo>, anyhow::Error> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name doesn't tell the reader what this function is doing

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function was removed as part of d422f64

Comment on lines 909 to 915
fn dep_info_shared(
pkg_root: &Path,
target_root: &Path,
dep_info: &PathBuf,
cargo_exe: &Path,
gctx: &GlobalContext,
) -> Result<Either<StaleItem, RustcDepInfo>, anyhow::Error> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not thrilled with using Either, I feel like its obscuring what this is doing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For both of these, the problems go away I think if we merge the LocalFingerprint variants

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Function was removed as part of d422f64

@rustbot rustbot added the A-documenting-cargo-itself Area: Cargo's documentation label Aug 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-build-execution Area: anything dealing with executing the compiler A-cli Area: Command-line interface, option parsing, etc. A-configuration Area: cargo config files and env vars A-documenting-cargo-itself Area: Cargo's documentation A-infrastructure Area: infrastructure around the cargo repo, ci, releases, etc. A-rebuild-detection Area: rebuild detection and fingerprinting A-unstable Area: nightly unstable support S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(Option to) Fingerprint by file contents instead of mtime
9 participants