Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow build scripts to specify dependencies #2134

Merged
merged 1 commit into from
Dec 4, 2015

Conversation

alexcrichton
Copy link
Member

Currently Cargo is quite conservative in how it determines whether a build
script should be run. The heuristic used is "did any file in the project
directory change", but this is almost always guaranteed to be too coarse
grained in situations like:

  • If the build script takes a long time to run it's advantageous to run it as
    few times as possible. Being able to inform Cargo about precisely when a build
    script should be run should provide more robust support here.
  • Build scripts may not always have all of their dependencies in-tree or in the
    crate root. Sometimes a dependency could be elsewhere in a repository and
    scripts need a method of informing Cargo about this (as currently these
    compiles don't happen then they should).

This commit adds this support in build scripts via a new rerun-if-changed
directive which can be printed to standard output (using the standard Cargo
metadata format). The value for this key is a path relative to the crate root,
and Cargo will only look at these paths when determining whether to rerun the
build script. Any other file changes will not trigger the build script to be
rerun.

Currently the printed paths may either be a file or a directory, and a directory
is deeply traversed. The heuristic for trigger a rerun is detecting whether any
input file has been modified since the build script was last run (determined by
looking at the modification time of the output file of the build script). This
current implementation means that if you depend on a directory and then delete a
file within it the build script won't be rerun, but this is already the case and
can perhaps be patched up later.

Future extensions could possibly include the usage of glob patterns in build
script paths like the include and exclude features of Cargo.toml, but
these should be backwards compatible to add in the future.

Closes #1162

@rust-highfive
Copy link

r? @huonw

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton
Copy link
Member Author

r? @brson, but cc @wycats on design as well

@rust-highfive rust-highfive assigned brson and unassigned huonw Nov 11, 2015
@wycats
Copy link
Contributor

wycats commented Nov 11, 2015

This looks legit to me. 👍

@brson
Copy link
Contributor

brson commented Nov 11, 2015

What's the motivation for the directory case (I don't see it in the linked issue)? Are there other build systems that do this? I don't see tests for it, and it's admittedly buggy. Can it be excluded?

@briansmith
Copy link

Note that the build.rs script can do multiple things that each have independent dependencies. For example, one part of the build script might build a C library to be linked in, and another part might take a data file and generate Rust code from it. I don't use the gcc crate because it doesn't do dependency analysis (instead I shell out to make or msbuild) and the addition of this feature wouldn't change that.

Also, this feature seems dangerous as currently specified, as far as optimizing the build. In most cases, it would be better to be able to specify patterns of files that, if they are changed, do NOT cause a rebuild to be needed, with cargo by default assuming that any change does require the build script to get re-run. In particular, I expect it would be common for people to forget to manually update the list of dependencies as their build.rs dependencies change; the feature should be designed to do the safest thing when that happens.

@alexcrichton
Copy link
Member Author

@brson

The directory case was motivated by my own intentions which is that I have a bunch of crates with vendored C source code and I was hoping to not reimplement walking the entire directory tree and printing all paths in those cases. That being said I would be fine leaving it out for now and just not traversing and looking at the mtime of the directory itself (which just won't be meaningful information). You're probably right about just not including something that's already broken.


@briansmith

Yeah I would expect the case where multiple things are being done that Cargo would have to be informed about the union of the two and then the build script itself would have to worry about each individually (or just run make and hope it finishes quickly unconditionally).

I would also expect any build dependencies (e.g. gcc, cmake, etc) would print out all these paths by default to automatically get taken care of. It's a good point that it's easy to get something like this wrong, but it's also only easy to get wrong if it's opted into as Cargo will still by default use the "if anything changed" strategy. In that sense I'm ok that if you're opting in you're already acknowledging to opting-in. It's also the case that today Cargo doesn't always rerun build scripts when necessary if files are outside the build root, and from what I've seen it's less so dangerous than it is just annoying.

Currently Cargo is quite conservative in how it determines whether a build
script should be run. The heuristic used is "did any file in the project
directory change", but this is almost always guaranteed to be too coarse
grained in situations like:

* If the build script takes a long time to run it's advantageous to run it as
  few times as possible. Being able to inform Cargo about precisely when a build
  script should be run should provide more robust support here.
* Build scripts may not always have all of their dependencies in-tree or in the
  crate root. Sometimes a dependency could be elsewhere in a repository and
  scripts need a method of informing Cargo about this (as currently these
  compiles don't happen then they should).

This commit adds this support in build scripts via a new `rerun-if-changed`
directive which can be printed to standard output (using the standard Cargo
metadata format). The value for this key is a path relative to the crate root,
and Cargo will only look at these paths when determining whether to rerun the
build script. Any other file changes will not trigger the build script to be
rerun.

Future extensions could possibly include the usage of glob patterns in build
script paths like the `include` and `exclude` features of `Cargo.toml`, but
these should be backwards compatible to add in the future.

Closes rust-lang#1162
@alexcrichton
Copy link
Member Author

Updated with clarification of docs and removal of directory traversal

@tomaka
Copy link
Contributor

tomaka commented Nov 11, 2015

@brson

My pocket-resources crate lists all the files in a directory and includes all them with include_bin!. Therefore the build script needs to be rerun every time a file is added.

@brson
Copy link
Contributor

brson commented Nov 11, 2015

@tomata Is that a suitable replacement for the directory feature, or are you suggesting the directory feature is needed because of what you are doing?

@brson
Copy link
Contributor

brson commented Nov 11, 2015

Uh, @tomaka, sorry tomata.

@tomaka
Copy link
Contributor

tomaka commented Nov 11, 2015

It's an illustration of how the directory feature could be useful. All my game projects contain a resources directory whose content is included inside the game thanks to the pocket-resources crate.

However you could reasonably argue that it's my crate that has a bad design and that should be changed, just like using $(wildcard) in a makefile is considered bad. I don't know enough about this topic to have a strong opinion.

@alexcrichton
Copy link
Member Author

@brson, @tomaka

Yeah I can see where directories can be useful from time to time, but @brson's point about depending on a directory being a little "inherently broken" (e.g. doesn't track file deletions) is pretty convincing to me that it shouldn't happen just yet. It's reasonably easy enough to traverse a directory in Rust and just print out all the files as well, so I'm fine with a "v1" of this just being that you have to manually print out files in a directory.

I don't really expect every crate with a build script to start using this feature. Many build-dependency libraries will likely start working with this but there it's easy enough to implement traversal once and call it a day.

@tomaka
Copy link
Contributor

tomaka commented Nov 12, 2015

If a build script doesn't print rerun-if-changed, is the current behaviour used instead?

And if so, how do you write a build script that has no dependency at all? Do you have to use a hack like rerun-if-changed=build.rs, or is there a clean way?

@alexcrichton
Copy link
Member Author

Yeah the current behavior is used if nothing is printed, and I guess some dummy file like that could be used yeah (or perhaps some directory in the source folder). A new directive could perhaps be added for "no input files" at some point in the future.

@tomaka
Copy link
Contributor

tomaka commented Nov 24, 2015

👍 :bump: I'm eagerly waiting for this!

@huonw
Copy link
Member

huonw commented Nov 25, 2015

By "dependencies" this means file dependencies, not [dependencies] foo = "..." etc., right?

@alexcrichton
Copy link
Member Author

Indeed! Should perhaps consider a better PR title...

@alexcrichton
Copy link
Member Author

ping r? @brson

@brson
Copy link
Contributor

brson commented Dec 4, 2015

@bors r+

@bors
Copy link
Collaborator

bors commented Dec 4, 2015

📌 Commit 7c97c5b has been approved by brson

@bors
Copy link
Collaborator

bors commented Dec 4, 2015

⌛ Testing commit 7c97c5b with merge b2e4ff7...

bors added a commit that referenced this pull request Dec 4, 2015
Currently Cargo is quite conservative in how it determines whether a build
script should be run. The heuristic used is "did any file in the project
directory change", but this is almost always guaranteed to be too coarse
grained in situations like:

* If the build script takes a long time to run it's advantageous to run it as
  few times as possible. Being able to inform Cargo about precisely when a build
  script should be run should provide more robust support here.
* Build scripts may not always have all of their dependencies in-tree or in the
  crate root. Sometimes a dependency could be elsewhere in a repository and
  scripts need a method of informing Cargo about this (as currently these
  compiles don't happen then they should).

This commit adds this support in build scripts via a new `rerun-if-changed`
directive which can be printed to standard output (using the standard Cargo
metadata format). The value for this key is a path relative to the crate root,
and Cargo will only look at these paths when determining whether to rerun the
build script. Any other file changes will not trigger the build script to be
rerun.

Currently the printed paths may either be a file or a directory, and a directory
is deeply traversed. The heuristic for trigger a rerun is detecting whether any
input file has been modified since the build script was last run (determined by
looking at the modification time of the output file of the build script). This
current implementation means that if you depend on a directory and then delete a
file within it the build script won't be rerun, but this is already the case and
can perhaps be patched up later.

Future extensions could possibly include the usage of glob patterns in build
script paths like the `include` and `exclude` features of `Cargo.toml`, but
these should be backwards compatible to add in the future.

Closes #1162
@bors
Copy link
Collaborator

bors commented Dec 4, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants