determinator/
lib.rs

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
// Copyright (c) The cargo-guppy Contributors
// SPDX-License-Identifier: MIT OR Apache-2.0

#![warn(missing_docs)]

//! Figure out what packages in a Rust workspace changed between two commits.
//!
//! A typical continuous integration system runs every build and every test on every pull request or
//! proposed change. In large monorepos, most proposed changes have no effect on most packages. A
//! *target determinator* decides, given a proposed change, which packages may have had changes
//! to them.
//!
//! The determinator is desiged to be used in the
//! [Diem Core workspace](https://github.com/diem/diem), which is one such monorepo.
//!
//! # Examples
//!
//! ```rust
//! use determinator::{Determinator, rules::DeterminatorRules};
//! use guppy::{CargoMetadata, graph::DependencyDirection};
//! use std::path::Path;
//!
//! // guppy accepts `cargo metadata` JSON output. Use a pre-existing fixture for these examples.
//! let old_metadata = CargoMetadata::parse_json(include_str!("../../../fixtures/guppy/metadata_guppy_78cb7e8.json")).unwrap();
//! let old = old_metadata.build_graph().unwrap();
//! let new_metadata = CargoMetadata::parse_json(include_str!("../../../fixtures/guppy/metadata_guppy_869476c.json")).unwrap();
//! let new = new_metadata.build_graph().unwrap();
//!
//! let mut determinator = Determinator::new(&old, &new);
//!
//! // The determinator supports custom rules read from a TOML file.
//! let rules = DeterminatorRules::parse(include_str!("../../../fixtures/guppy/path-rules.toml")).unwrap();
//! determinator.set_rules(&rules).unwrap();
//!
//! // The determinator expects a list of changed files to be passed in.
//! determinator.add_changed_paths(vec!["guppy/src/lib.rs", "tools/determinator/README.md"]);
//!
//! let determinator_set = determinator.compute();
//! // determinator_set.affected_set contains the workspace packages directly or indirectly affected
//! // by the change.
//! for package in determinator_set.affected_set.packages(DependencyDirection::Forward) {
//!     println!("affected: {}", package.name());
//! }
//! ```
//!
//! # Platform support
//!
//! * **Unix platforms**: The determinator works and is supported.
//! * **Windows**: experimental support. There may still be bugs around path normalization: please
//!   [report them](https://github.com/guppy-rs/guppy/issues/new)!
//!
//! # How it works
//!
//! A Rust package can behave differently if one or more of the following change:
//! * The source code or `Cargo.toml` of the package.
//! * A dependency.
//! * The build or test environment.
//!
//! The determinator gathers data from several sources, and processes it through
//! [guppy](https://docs.rs/guppy), to figure out which packages need to be re-tested.
//!
//! ## File changes
//!
//! The determinator takes as input a list of file changes between two revisions. For each
//! file provided:
//! * The determinator looks for the package nearest to the file and marks it as changed.
//! * If the file is outside a package, the determinator assumes that everything needs to be
//!   rebuilt.
//!
//! The list of file changes can be obtained from a source control system such as Git. This crate
//! provides a helper which simplifies the process of enumerating file lists while handling some
//! gnarly edge cases. For more information, see the documentation for
//! [`Utf8Paths0`].
//!
//! These simple rules may need to be customized for particular scenarios (e.g. to ignore certain
//! files, or mark a package changed if a file outside of it changes). For those situations, the
//! determinator lets you specify *custom rules*. See the
//! [Customizing behavior](#customizing-behavior) section below for more.
//!
//! ## Dependency changes
//!
//! A dependency is assumed to have changed if one or more of the following change:
//!
//! * For a workspace dependency, its source code.
//! * For a third-party dependency, its version or feature set.
//! * Something in the environment that it depends on.
//!
//! The determinator runs Cargo build simulations on every package in the workspace. For each
//! package, the determinator figures out whether any of its dependencies (including feature sets)
//! have changed. These simulations are done with:
//! * dev-dependencies enabled (by default; this can be customized)
//! * both the host and target platforms set to the current platform (by default; this can be
//!   customized)
//! * three sets of features for each package:
//!   * no features enabled
//!   * default features
//!   * all features enabled
//!
//! If any of these simulated builds indicates that a workspace package has had any dependency
//! changes, then it is marked changed.
//!
//! ## Environment changes
//!
//! The *environment* of a build or test run is anything not part of the source code that may
//! influence it. This includes but is not limited to:
//!
//! * the version of the Rust compiler used
//! * system libraries that a crate depends on
//! * environment variables that a crate depends on
//! * external services that a test depends on
//!
//! **By default, the determinator assumes that the environment stays the same between runs.**
//!
//! To represent changes to the environment, you may need to find ways to represent those changes
//! as files checked into the repository, and add [custom rules](#customizing-behavior) for them.
//! For example:
//!
//! * Use a [`rust-toolchain` file](https://doc.rust-lang.org/edition-guide/rust-2018/rustup-for-managing-rust-versions.html#managing-versions)
//!   to represent the version of the Rust compiler. There is a default rule which causes a full
//!   run if `rust-toolchain` changes.
//! * Record all environment variables in CI configuration files, such as [GitHub Actions workflow
//!   files](https://docs.github.com/en/free-pro-team@latest/actions/reference/workflow-syntax-for-github-actions),
//!   and add a custom rule to do a full run if any of those files change.
//! * As far as possible, make tests hermetic and not reach out to the network. If you only have a
//!   few tests that make network calls, run them unconditionally.
//!
//! # Customizing behavior
//!
//! The standard rules followed by the determinator may need to be tweaked in some situations:
//! * Some files should be ignored.
//! * If some files or packages change, a full test run may be necessary.
//! * *Virtual dependencies* that Cargo isn't aware of may need to be inserted.
//!
//! For these situations, the determinator allows for custom *rules* to be specified. The
//! determinator also ships with
//! [a default set of rules](crate::rules::DeterminatorRules::DEFAULT_RULES_TOML) for common files
//! like `.gitignore` and `rust-toolchain`.
//!
//! For more about custom rules, see the documentation for the [`rules` module](crate::rules).
//!
//! # Limitations
//!
//! While the determinator can bring significant benefits to CI and local workflows, its model is
//! quite different from Cargo's. **Please understand these limitations before using the
//! determinator for your project.**
//!
//! For best results, consider doing occasional full runs in addition to determinator-based runs.
//! You may wish to configure your CI system to use the determinator for pull-requests, and also
//! schedule full runs every few hours on the main branch in case the determinator misses something.
//!
//! ## Build scripts and include/exclude instructions
//!
//! **The determinator cannot run [build scripts](https://doc.rust-lang.org/cargo/reference/build-scripts.html).**
//! The standard Cargo method for declaring a dependency on a file or environment variable is to
//! output `rerun-if-changed` or `rerun-if-env-changed` instructions in build scripts. These
//! instructions must be duplicated through custom rules.
//!
//! **The determinator doesn't track the [`include` and `exclude` fields in
//! `Cargo.toml`](https://doc.rust-lang.org/cargo/reference/manifest.html#the-exclude-and-include-fields).**
//! This is because the determinator's view of what's changed doesn't always align with these
//! fields. For example, packages typically include `README` files, but the determinator has a
//! default rule to ignore them.
//!
//! If a package includes a file outside of it, either move it into the package (recommended) or
//! add a custom rule for it. Exclusions may be duplicated as custom rules that cause those files
//! to be ignored.
//!
//! ## Path dependencies outside the workspace
//!
//! **The determinator may not be able to figure out changes to path dependencies outside the
//! workspace.** The determinator relies on metadata to figure out whether a non-workspace
//! dependency has changed. The metadata includes:
//! * the version number
//! * the source, such as `crates.io` or a revision in a Git repository
//!
//! This approach works for dependencies on `crates.io` or other package repositories, because a
//! change to their source code necessarily requires a version change.
//!
//! This approach also works for [Git
//! dependencies](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html#specifying-dependencies-from-git-repositories).
//! It even works for Git dependencies that aren't pinned to an exact revision in `Cargo.toml`,
//! because `Cargo.lock` records exact revisions. For example:
//!
//! ```toml
//! # Specifying this in Cargo.toml...
//! [dependencies]
//! rand = { git = "https://github.com/rust-random/rand", branch = "master" }
//!
//! # ...results in Cargo.lock with:
//! [[package]]
//! name = "rand"
//! version = "0.7.4"
//! source = "git+https://github.com/rust-random/rand?branch=master#50c34064c80762ddae11447adc6240f42a6bd266"
//! ```
//!
//! The hash at the end is the exact Git revision used, and a change to it is recognized by the
//! determinator.
//!
//! Where this scheme may not work is with path dependencies, because the files on disk can change
//! without a version bump. `cargo build` can recognize those changes because it compares mtimes of
//! files on disk, but the determinator cannot do that.
//!
//! This is not expected to be a problem for most projects that use workspaces. If
//! there's future demand, it would be possible to add support for changes to non-workspace path
//! dependencies if they're in the same repository.
//!
//! # Alternatives and tradeoffs
//!
//! One way to look at the determinator is as a kind of
//! [*cache invalidation*](https://martinfowler.com/bliki/TwoHardThings.html). Viewed through this
//! lens, the main purpose of a build or test system is to cache results, and invalidate those
//! caches based on certain parameters. When the determinator marks a package as changed, it
//! invalidates any cached results for that package.
//!
//! There are several other ways to design caching systems:
//! * The caching built into Cargo and other systems like GNU Make, which is based on file
//!   modification times.
//! * [Mozilla's `sccache`](https://github.com/mozilla/sccache) and other "bottom-up" hash-based
//!   caching build systems.
//! * [Bazel](https://bazel.build/), [Buck](https://buck.build/) and other "top-down" hash-based
//!   caching build systems.
//!
//! These other systems end up making different tradeoffs:
//! * Cargo can use build scripts to track file and environment changes over time. However, it
//!   relies on a previous build being done on the same machine. Also, as of Rust 1.48, there is no
//!   way to use Cargo caching for test results, only for builds.
//! * `sccache` [requires paths to be exact across
//!   machines](https://github.com/mozilla/sccache#known-caveats), and is unable to cache [some
//!   kinds of Rust artifacts](https://github.com/mozilla/sccache/blob/master/docs/Rust.md). Also,
//!   just like Cargo's caching, there is no way to use it for test results, only for builds.
//! * Bazel and Buck have stringent requirements around the environment not affecting build results.
//!   They're also not seamlessly integrated with Cargo.
//! * The determinator works for both builds and tests, but cannot track file and environment
//!   changes over time and must rely on custom rules. This scheme may produce both false negatives
//!   and false positives.
//!
//! While the determinator is geared towards test runs, it also works for builds. If you wish to
//! use the determinator for build runs, consider stacking it with another layer of caching:
//! * Use the determinator as a first pass to filter out packages that haven't changed.
//! * Then use a system like `sccache` to get hash-based caching for builds.
//!
//! # Inspirations
//!
//! This determinator is inspired by, and shares its name with, the target determinator used in
//! Facebook's main source repository.

mod determinator;
pub mod errors;
mod paths0;
pub mod rules;

pub use crate::{determinator::*, paths0::*};