Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Add workspaces to Cargo #1525

Merged
merged 10 commits into from
May 4, 2016

Conversation

alexcrichton
Copy link
Member

@alexcrichton alexcrichton commented Mar 3, 2016

Improve Cargo's story around multi-crate single-repo project management by
introducing the concept of workspaces. All packages in a workspace will share
Cargo.lock and an output directory for artifacts.

Cargo will infer workspaces where possible, but it will also have knobs for
explicitly controlling what crates belong to which workspace.

Rendered

Improve Cargo's story around multi-crate single-repo project management by
introducing the concept of workspaces. All packages in a workspace will share
`Cargo.lock` and an output directory for artifacts.

Cargo will infer workspaces where possible, but it will also have knobs for
explicitly controlling what crates belong to which workspace.
@alexcrichton alexcrichton added the T-dev-tools Relevant to the development tools team, which will review and decide on the RFC. label Mar 3, 2016
@alexcrichton alexcrichton self-assigned this Mar 3, 2016
@sfackler
Copy link
Member

sfackler commented Mar 3, 2016

If I have a manually configured workspace, how will cargo figure out that a crate that I'm compiling belongs to a workspace? Will I have had to have built the workspace root first?

@alexcrichton
Copy link
Member Author

When Cargo constructs a workspace it's built upon the condition that all crates in a workspace can transitively reach all others. This means that for any workspace (manually configured or not) any crate can find its way to the root quickly. In that sense it won't matter if you've built the workspace root first or not, Cargo will be able to build any crate in the workspace at any point in time into the workspace root's output folder.

Does that make sense? Not sure if that answers your question...

@gkoz
Copy link

gkoz commented Mar 3, 2016

Are multi-repo workspaces too niche to be considered?

I've been abusing local path overrides for a while now. It would be nice if "local" workspaces could incorporate that. Imagine two repos foo and bar side by side, bar/Cargo.toml containing

[dependencies.foo]
git = "some://where/foo.git"

I'd be happy if the "local" workspace could transparently change that to path = "../foo" and also use a common target directory.

@sfackler
Copy link
Member

sfackler commented Mar 3, 2016

Let's say that I have a project structure like this:

.
├── .git
│   └── ...
├── bar-server
│   ├── Cargo.toml
│   └── src
│       └── main.rs
└── foo
    ├── Cargo.toml
    └── src
        └── lib.rs

bar-server has a path dependency on foo, and wants to be the workspace root. If I'm reading the RFC correctly, bar-server's Cargo.toml will need to explicitly specify workspace-root = true. If I take a fresh clone of this repo, go into the foo dir, and build it, how will it know that it needs to look in bar-server for the lockfile and target directory? Isn't all of the workspace structure metadata in bar-server?

@alexcrichton
Copy link
Member Author

@gkoz

That's an interesting idea! This doesn't currently consider multiple disjoint repositories beyond submodules. Using CARGO_TARGET_DIR as an environment variable should suffice for that use case, and I'm not sure how idiomatic it is in terms of baking support into Cargo. @wycats may have other opinions though.


@sfackler

Ah yes for the workspace to be constructed there you'd need two bits of configuration, one as workspace-root = true in bar-server and one as workspace = ["../bar-server"] in foo. Without the configuration in the foo crate the bar-server crate will have an implicit workspace edge pointing to foo, but because there is no edge pointing back it will not be considered part of the workspace. Once foo has been configured, however, it'll be able to figure that out.

@sfackler
Copy link
Member

sfackler commented Mar 4, 2016

Ok cool, that makes sense.

@nikomatsakis
Copy link
Contributor

@alexcrichton

Ah yes for the workspace to be constructed there you'd need two bits of configuration, one as workspace-root = true in bar-server and one as workspace = ["../bar-server"] in foo. Without the configuration in the foo crate the bar-server crate will have an implicit workspace edge pointing to foo, but because there is no edge pointing back it will not be considered part of the workspace. Once foo has been configured, however, it'll be able to figure that out.

What happens if people don't set this up correctly?

It seems a bit disappointing for such a common setup to require explicit configuration; I'm not sure what the alternative would be, though, except for having cargo trawl the git repo to look for things that depend on the current directory. (which...might be ok)

Still, I think i might prefer to have a Workspace.toml file that just lists both foo and bar-server. You might then have multiple such files and an explicit workspace = ../workspace1.toml declaration (but otherwise cargo just looks to find at most one, and reports an error if there are multiple).

@nikomatsakis
Copy link
Contributor

I asked:

What happens if people don't set this up correctly?

But I think I get it now. If you don't include all the links, things just aren't considered to be in 1 workspace.

@tomaka
Copy link

tomaka commented Mar 4, 2016

I don't like the convention that this RFC suggests:

  1. I prefer the hierarchy in @sfackler's example. Putting a crate in a subdirectory of a parent crate can be confusing, as you can't know which directories are part of a crate without looking inside each of them. For example here how can you guess that s3 is a crate while user is a module?
  2. Whether the user uses source control should be irrelevant for Cargo. The fact that the project contains a .git or a .svn directory should be out of scope. For example I don't know want my project to not compile on someone's machine because they downloaded an archive of the project instead of cloning it.

@netvl
Copy link

netvl commented Mar 4, 2016

How does the proposed design work in a scenario when there are multiple equally important binaries and some common library? For example, client-server programs like my wcd may be structured like this. In wcd I have the following structure:

wcd/
  common/
    Cargo.toml
  wcd/
    Cargo.toml, contains [dependencies] common = { path = "../common" }
  wcdctl/
    Cargo.toml, contains [dependencies] common = { path = "../common" }

I think I would prefer both binaries to be built together (now it is not very convenient to test how client and server interact when something inside common changes, because I have to build and run them separately), but I can't see how exactly they map to workspaces. Should there be two workspaces? Or should I choose one binary crate to be a workspace? Or should common be a workspace? All these options seems not very useful/convenient/straightforward to me. I think that the alternative with a virtual root crate probably would work well in this scenario, but I may be mistaken.

@nikomatsakis
Copy link
Contributor

I don't like the convention that this RFC suggests

Man, I must have been tired when reading the RFC last night. I didn't really grok that it was suggesting that we should structure our repos in a nested fashion. So, e.g., for LALRPOP, where the main lalrpop crate (lalrpop) has a number of subsidiary crates that it uses, I would presumably do something like:

lalrpop/
    src/...
    lalrop-intern/
        src/...

and in this way, lalrpop would implicitly be the "workspace root"? This makes the name 'root' make more sense to me. :)

I also never thought of structuring things in this way, but I can see that if I did so, then this RFC would allow for a lightweight (no annotation) setup.

@alexcrichton
Copy link
Member Author

@nikomatsakis

What happens if people don't set this up correctly?

Ah yes, as you surmised the end result is that workspaces just continue to be disjoint (as they are today). If manual configuration is incorrect (e.g. a workspace key points to a crate not actually in a workspace) then an error is generated. Any implicit configuration (by structuring the project) is just ignored, however, if it's "incorrect".


@tomaka

I don't like the convention that this RFC suggests:

One of the major motivational factors for this RFC is to have zero configuration in the conventional use case. A scheme such as you and @sfackler are proposing is certainly covered by this RFC (wish some configuration), but it's unclear to me how it could be the default. The requirement is to be able to go from any crate to all of its workspace crates quickly (eg can't walk the whole filesystem).

Do you have something in mind, however, to alleviate this? To reiterate, this RFC doesn't preclude any source layout, it's just blessing one as the conventional way to organize crates where features work without extra configuration.

Whether the user uses source control should be irrelevant for Cargo.

That's correct, but it's also why it's just a heuristic as part of this RFC. Any and all project layouts are supported, it's just a question of which require less configuration. Most projects usually have some sort of source repo at the top which is a good heuristic for probing (but that's it).


@netvl

How does the proposed design work in a scenario when there are multiple equally important binaries and some common library?

You'd basically choose one of your crates to be the root, and then you'd have to configure a few workspace keys. After that all output would be placed in your chosen root. It's true that a "virtual package" would perhaps make this somewhat easier to configure because you could place a virtual Cargo.toml at the root of the crate and then everything would automatically be connected to that workspace.

Do you think the configuration necessary to select one of the binaries as the root is too much, though? Is that a case in favor of some form of "virtual package"?


@nikomatsakis

I also never thought of structuring things in this way, but I can see that if I did so, then this RFC would allow for a lightweight (no annotation) setup.

Yeah that sort of layout would require no extra configuration to benefit, but as @tomaka mentions it may not always make sense for projects. There may be another heuristic we could use to find a "root crate" without any extra configuration, but I'm drawing blanks...

@tomaka
Copy link

tomaka commented Mar 4, 2016

An alternative could be to separate workspaces from packages, and create another configuration file named for example Cargo-workspace.toml. This is what the Visual Studio IDE does, and I think most IDEs do something similar.

If you want to have one crate per directory, you can just put a Cargo-workspace.toml at the root.
It would also cover @netvl's use case.

EDIT: Oops, that was already suggested :-/

@netvl
Copy link

netvl commented Mar 4, 2016

@alexcrichton

Do you think the configuration necessary to select one of the binaries as the root is too much, though? Is that a case in favor of some form of "virtual package"?

Well, I'm not sure whether it is too much or not, but the whole concept, when applied to "equally important" binaries, seems wrong to me. I just dislike the need to choose one of the crates as a root arbitrarily. Maybe if a client-server application is not a convincing example, then something like coreutils is? It is a collection of lots of different binaries using a single common library, and I doubt that it makes any sense to select one of them as a workspace root. Something like cargo-edit also qualifies, I believe. So yes, I certainly think that virtual packages or some equivalent solution is needed here.

@withoutboats
Copy link
Contributor

I'm a little unclear on the implications of the 'workspace inference' rules, which are described abstractly but don't have any examples of what organizations they can infer. I think the standard way to organize a multi-crate project is this:

parent_crate/
  src/
  child_crate_1/
    src/
    grandchild_crate/
      src/
  child_crate_2/
    src/

My understanding is that this structure will be inferred under the proposal and require no annotation. That sounds great.

But will other structures be inferred? Like @tomaka mentioned, I think the practice of putting multiple crates inside of a src directory is confusing and difficult to approach, and I'd rather that if people wanted to do something like that, cargo requires them to provide the explicit annotations. It'd be good in my opinion to have configuration, but also have a clear convention.

@Ericson2314
Copy link
Contributor

@nikomatsakis Still, I think i might prefer to have a Workspace.toml file that just lists both foo and bar-server.

Yes! Has anyone looked at Haskell's Stack http://docs.haskellstack.org? IMO whenever I develop multiple crates at once, the idea is almost always that while the libraries are easiest to develop together, they are still full fledged independent libraries:

  • The libraries should be released/uploaded to crates.io separately.
  • Semver is sufficient to force a tight coupling if need be.
  • A patch release in one library shouldn't and won't necessitate a release of the other.

As such a view Workspace.toml as something more akin to the overrides in a cargo config file than a normal package. crates.io package should always refer to other libraries on crates.io, but for lockstep development one uses the local version instead.

If the crates are released/uploaded to crates.io separately, they should NOT be nested in the filesystem, otherwise we need to prune sub trees when uploading so one library doesn't contain others' source.

@Ericson2314
Copy link
Contributor

The main problem I have stack's stack.yaml is that it somewhat takes on the role of a lockfile in that the version control solver is often used to generate it. But this RFC does get that right. Workspace.toml has "input constraints", further constraining or overriding the constraints in each cargo file, and Cargo.lock (maybe Workspace.lock to avoid compatibility issues?), as has always been the case, has "output constraints"---the concrete resolved dependencies.

@Ericson2314
Copy link
Contributor

I think @netvl concern's basically boil down to "dependencies form a DAG, but nested crates form a tree". IMO that right there is the simplest reason that nesting crates is going to suck---simpler and better than the problem I described 2 comments previous :).

@nikomatsakis
Copy link
Contributor

@withoutboats

Like @tomaka mentioned, I think the practice of putting multiple crates inside of a src directory is confusing and difficult to approach

Are you saying that the directory structure itself is confusing, or that the annotations one would need (under this RFC) to support it are confusing?


@alexcrichton

I definitely think the RFC would be improved by specifying the expected directory layout. It makes the rules make much more sense.

Do you know if this "nested" structure is in common use today?

@tomaka
Copy link

tomaka commented Mar 7, 2016

Do you know if this "nested" structure is in common use today?

I don't know for the nested structure, but the non-nested structure is not uncommon:

@withoutboats
Copy link
Contributor

Are you saying that the directory structure itself is confusing, or that the annotations one would need (under this RFC) to support it are confusing?

The former. I expect everything inside src to be a single crate. There may be good reasons to include other crates inside that directory, but I'd rather for cargo's rules to encourage a narrow set of possible directory structure conventions.

@alexcrichton
Copy link
Member Author

@tomaka

An alternative could be to separate workspaces from packages, and create another configuration file named for example Cargo-workspace.toml.

Yeah this is certainly always possible (and is one of the alternatives in the RFC), but one of the key points here is that an idiomatic project structure should require zero extra configuration (including this extra manifest).


@netvl

Thanks for the examples! I can certainly see the desire to have a manually selected root location for the workspace, but I'm pretty hesitant to so easily add the concept of virtual crates or packages. It seems like a pretty significant feature that may not quite pull its weight if it's just used for workspaces. For now this could always be worked around with a "dummy package" that ends up being the root, which although not quite as elegant should at least serve as a vector in the interim.

@wycats may have more thoughts on the "virtual package" idea, though. I believe he's thought about it more than I have.


@withoutboats

Could you clarify what you're finding hard to understand about how workspaces are constructed? It's described concretely because that's what'll end up being implemented (I tend to prefer that over a somewhat hand-wavy description). I can certainly add some examples though!


@nikomatsakis

I've seen lots of examples with a nested layout and also lots with a nothing-at-the-root layout. I wouldn't necessarily say one is more common than the other. As I mentioned in my reply to @tomaka, however, there's a fundamental problem of child crates still somehow need to quickly discover their parent. That tips the scales in favor (to me) much more in towards a something-should-be-at-the-root structure.

I've tried to add some examples though which may help.

@gkoz
Copy link

gkoz commented Mar 7, 2016

Perhaps alternatives should include using .cargo/config for this? There already is target-dir (which seems to treat relative paths differently from paths for some reason). It could also specify a common lockfile.

@Ericson2314
Copy link
Contributor

An empty Workspace.toml can tell Cargo to just including any Cargo.toml in the work-space in the current subtree. Cargo, when walking up the filesystem will look for Cargo.tomls and Workspace.tomls, preferring the latter. That's nearly zero configuration, and IMO vastly preferable to trying to be clever with .git.

@withoutboats
Copy link
Contributor

Could you clarify what you're finding hard to understand about how workspaces are constructed? It's described concretely because that's what'll end up being implemented (I tend to prefer that over a somewhat hand-wavy description). I can certainly add some examples though!

It's necessary that a concrete implementation be provided in the RFC for implementation, but some examples of the kinds of directory trees these rules infer and the kinds these rules cannot infer would help to understand how it impacts me as a user.

@alexcrichton
Copy link
Member Author

Ok, the tools team discussed this RFC during its triage meeting the other day, and the conclusion was somewhat mixed. We did not decide to merge just yet, but were somewhat hesitant to do so. There was some pushback about this being an "instantly stable" feature in Cargo because Cargo does not have unstable features like the compiler, and there was also discussion about perhaps blocking this until that existed or somehow ensuring it doesn't reach the stable channel just yet. The question of unstable features in Cargo, however, is pretty weighty, so I'd prefer to not tackle it at this time if possible.

I fear that this RFC was perhaps not well written which led to confusion in how it was interpreted. I've rewritten and reorganized the RFC in a format which I hope is much more clear and much simpler than before. Note, however, that there is no semantic change from the previous draft, just hopefully more understandable!

cc @rust-lang/tools


@brson

Why have two different ways to put a package into workspace?

We discussed this in the tools meeting, but I think that the current draft is more clear on this as well. There's only one way to put a package in a workspace, workspace.members, and that's it. A valid workspace must have members point back to the root, but pointing to a root doesn't make you a member.

Does this mean that in practice you don't have to mention [workspace] at all in your Cargo.toml?

To clarify, no workspace will exist unless [workspace] is mentioned somewhere.

If there are two local workspaces connected to a third root crate how do you keep the workspaces seperate, and not have them implicitly become one big workspace?

As pointed out by @Ericson2314 it's not possible for a crate to be a member of more than one workspace (it can only point to one root)


@nrc

As discussed in the tools triage meeting, the "we don't walk past VCS roots" part has been removed. There is no other VCS related portion of this RFC (to clarify), which I believe addresses your concerns!


@jnicklas

but I would also like to argue against using all path dependencies in a workspace automatically

The RFC has been changed slightly to account for this (which I think is simpler as well) to say that path dependencies are only used if workspace.members isn't specified.

Instead I'd like to propose the following:

Hm I think that the current draft may be a bit more understandable, does it help clear things up? I think it covers many of your points, but I'm not 100% certain.

@jnicklas
Copy link

@alexcrichton the new revision of this RFC is awesome! Really clarifies things and the added explicitness will make things much easier to understand, imo. Definitely addresses my concerns at least!

of crates pointing back to the root may not. If, however, this restriction were
not in place then the set of crates in a workspace may differ depending on
which crate it was viewed from. For example if workspace root A includes B then
it will think B is in A's workspace. If, however, B does ont point back to A,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: ont should be not.

(Please tell me if I'm not allowed to point out typos here)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

```toml
# ws1/Cargo.toml
[workspace]
members = ["crate1", "crate2"]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "Note that members aren't necessary ..." ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is required here as the newly created manifest does not otherwise depend on the crates.

@alexcrichton
Copy link
Member Author

The tools team discussed this RFC during triage yesterday and the decision was to merge. Once this is implemented we're likely to reevaluate right before the next release to see if we want to let it ride the trains to stable, and we may, for one release, temporarily prevent that particular released version of Cargo from having workspaces. This'll all be based on our experience with workspaces up to that point, however!

Thanks again for the discussion everyone!

@alexcrichton alexcrichton merged commit 5ce14f3 into rust-lang:master May 4, 2016
lambda-fairy added a commit to lambda-fairy/maud that referenced this pull request Dec 30, 2016
@Centril Centril added the A-workspaces Workspace related proposals & ideas label Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-workspaces Workspace related proposals & ideas final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-dev-tools Relevant to the development tools team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.