Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split Context into two types #5432

Merged
merged 5 commits into from
May 2, 2018
Merged

Split Context into two types #5432

merged 5 commits into from
May 2, 2018

Conversation

djc
Copy link
Contributor

@djc djc commented Apr 28, 2018

@matklad here's the crazy attempt to make the Context situation slightly more sane. It basically splits all of the stuff that is immutable over the lifetime of the compilation from the stuff that's initialized empty. The latter is now stored in a type which I have provisionally called BuildProgress -- feel free to bikeshed about the name (I came up with BuildData as another alternative). It stores a reference to the Context to make things easier for now, we might want to change that down the line.

IMO this is a pretty good improvement, which clarifies the design quite a bit.

@alexcrichton you'll probably want to review this, too.

@rust-highfive
Copy link

r? @matklad

(rust_highfive has picked a reviewer for you, use r? to override)

@bors
Copy link
Collaborator

bors commented Apr 28, 2018

☔ The latest upstream changes (presumably #5425) made this pull request unmergeable. Please resolve the merge conflicts.

Copy link
Member

@matklad matklad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks wonderful! I love how we separate immutable Context from mutable BuildProgress, and it's really nice that these files are all of a manageable size now!

I'm ready to merge this as is! Here are some other things we might do here as well:

  • Looks like links.rs file could be moved inside the build_progress dir?

  • It would be interesting to explore moving build_unit_dependencies out of BuildProgress and store a unit_deps: &'a HashMap<Unit<'a>, Vec<Unit<'a>>> inside a BuildProgress. I think that should make it possible to fix TODO on the dep_targets.

profiles,
extra_compiler_args,
)?;
cx.compile(&units, export_dir.clone(), &exec)?
let mut progress = BuildProgress::new(config, &cx)?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we can remove mut from cx now! Not sure why rustc does not warn about it though...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch. I'll fix it, and I've written up a rustc bug: rust-lang/rust#50321.

@alexcrichton
Copy link
Member

Thanks @djc! I type out cx quite a lot when working in these areas and it's pretty nice having it be short, so perhaps we could take a leaf out of the compiler's book and name this something like BuildContext and bcx?

these files are all of a manageable size now

FWIW I've found optimizing for this sort of goal to really hamper readability/productivity in the past. I've found myself often burning far too many cycles on reorganizing things that otherwise wouldn't need reorganization if they were in one file. I think I'm in general much more comfortable with larger files than most, but I would not personally want a goal of Cargo to have every file be under 500 lines, for example.

An example of where I think this is hurts is that I'd pesonally find it far easier to understand four 2000 line files rather than eighty one-hundred line files. Now clearly it's also bad to have one 8000 line file, but just want to register my thinking that I don't want to slide either way on these spectrums.

I've personally got a lot of catching up to do with all the refactorings, and I'm also one of the primary people modifying all this code which takes a second to catch up to. Certainly not a problem, but I'd prefer if we kept some of the original structure sometimes in the sense that it naturally grew into that state (which can of course sometimes be bad as well as good).

@djc
Copy link
Contributor Author

djc commented Apr 29, 2018

I was thinking it makes more sense to me that this is not called something Context, because I feel conceptually the name Context makes more sense for something that is less mutable. I was thinking something like BuildData (abbreviated bdata), or BuildState (but that's already in use for something else, could be bst?).

As for the goals for this refactoring, I just want to say that first, getting to manageable file sizes was not a goal for me with this one; I'd be okay keeping Context and whatever we call BuildProgress in the same file. For me, this is much more about loose coupling: making it clear what things are tightly coupled and which things loosely coupled really helps a lot when you have to make some modifications.

About the refactoring churn: I totally understand if you are a bit uncomfortable (if that's what you're describing) with all these structural changes. However, I'd also like to argue the case from the other side: as someone who's only recently started trying to understand the Cargo code, I've felt a design was very much lacking in these areas of the code -- it's pretty obvious that it naturally grew into this state. I think it makes sense to optimize more for the pool of future contributors than the (supposedly smaller) pool of people who go a long way back with this code, and I also deeply believe that making the design more explicit and clear in the type system will make future changes easier.

I do think there's a bunch more work to do before we reach a more sensible design. If you prefer, we can also discuss these ideas upfront some time, so we can reach consensus on the goals before I put in a bunch that you think moves us in the wrong direction.

@djc djc force-pushed the split-context branch 2 times, most recently from 0a1e62d to 66ae384 Compare April 29, 2018 20:02
@djc
Copy link
Contributor Author

djc commented Apr 29, 2018

So I've rebased this. I've just moved the Links type straight into the build_progress module, since it's just 72 lines of code and the usage is completely internal to BuildProgress. I've also moved to store the unit dependencies into BuildProgress instead of accumulating them first (although it's likely the compiler would have optimized that out anyway). Changing dep_targets() to return a slice is a different can of worms though, since that gets you into borrow checker trouble with overlapping borrows -- I'd prefer to leave that to a follow-up change.

In general, as said in my previous comment, I have a bunch of follow-up in mind, but I prefer landing stuff in smaller chunks, because rebasing cargo code has already eaten up a bunch of time.

@alexcrichton
Copy link
Member

To be clear I'm totally fine with refactorings and I think it definitely makes sense to do so over time especially as cruft accrues. What I'm trying to get at is that we shouldn't 100% swing the pendulum in the "refactor everything" direction. Quite a lot of time and thought has gone into writing Cargo as-is today and I would argue it's already "sensible design," although of course those who aren't as familiar may not entirely agree and it's totally ok to change things as a result.

making the design more explicit and clear in the type system will make future changes easier

This is also what I really want to caution against as well. I don't want to go "full typestate" on everything in Cargo, for example. There's a lot of ways to interpret this sort of refactoring goal and some of them I would strongly disagree with (unsure which direction you're thinking of going). I think that having the general ideas written down somewhere would definitely be helpful for me!

I think I'm personally not a huge fan of piecemeal refactoring without a clear end goal and end state, so that may also be what I'm reacting to here. Some more information would certainly help quite a lot!

I was thinking it makes more sense to me that this is not called something Context, because I feel conceptually the name Context makes more sense for something that is less mutable.

Isn't this all either mutable or contextual? I personally am very used to "context" being all over the place in Rust and it has a very natural abbreviation as "cx" which works well too.

@djc
Copy link
Contributor Author

djc commented Apr 30, 2018

Sorry -- implying that the current design is not sensible is a bit too blunt. I'd also be very interested in your description of the architecture/design of this part of the code. I looked at ARCHITECTURE.md, but it does not appear to go into much depth about the whole compilation pipeline.

I think what I was trying to say is that I think the design in many places does not become clear easily from reading the code. What I'm trying to do with my refactoring is make the design emerge more clearly from the types of functions and structs: I'm not talking about very precise typestate or anything like that, just that data structures which are used together are part of the same struct, that function and method signatures are conceptually obvious. For example, with the large Context, it was passed around a lot, but passing it had no clear meaning: it was not clear which of the 22 fields might be accessed or even mutated. As this PR makes clear, it seems that those 22 fields are actually divided into two classes: those which are immutable over the lifetime of the compilation (and not dependent on the Units passed in), and those which are not.

So in terms of further work, one thing is that it seems to me that code in core should probably not be calling into ops (this is already largely the case), so as to provide a clear layering with bin -> ops (which can double as an API layer that should be sufficient for most consumers) -> core. Another goal would be to hide types and functions that are used internally from the higher API layers. So for example, I think cargo::ops::compile() should probably not know about what is now called BuildProgress, but instead should call something like compiler::compile(Context, Executor, Units) (it looks like export_dir could live in Context). Then, I'm wondering if it could make sense to split up BuildProgress a bit more; the code in BuildProgress::compile() suggests there might be three stages (gather input, execute, gather output) that would be more obvious if they had different data structured, coordinated by this new compiler::compile() function. Since Compilation is ostensibly the output from the compile process, it might make more sense for BuildProgress (or its successors) to hold a mutable reference to it rather than owning it.

A lot of this is also about getting the data flow more clear. For example, if this PR is merged, we can build on it a little bit (I have this mostly done in #5358 already) to put all the information gathering from rustc in one place, which should make it possible to cache more data in the JSON file @matklad recently inserted. Also, I noticed you somewhere saying that we'd like to use futures in the possible, which I think makes a lot of sense; I hope these changes will also help make that easier.

Isn't this all either mutable or contextual? I personally am very used to "context" being all over the place in Rust and it has a very natural abbreviation as "cx" which works well too.

To me, the word means something similar like "environment" or "setting", which to me implies it at least mostly immutable (as a non-native speaker, I checked the dictionary to confirm, but it seems to agree with this view). I think the word "state" makes more sense for something that starts out empty and is filled with stuff over the course of some particular process.

@bors
Copy link
Collaborator

bors commented May 1, 2018

☔ The latest upstream changes (presumably #5448) made this pull request unmergeable. Please resolve the merge conflicts.

@alexcrichton
Copy link
Member

For sure yeah, and again I definitely don't consider what's currently there ideal by any means. I'm mostly just getting worried in these that I don't 100% agree with these changes as-is and hearing that this is just the beginning makes me a bit worried. In any case it's best to hash it out beforehand I think!

One of the reasons I personally like a large "context" type is that it's far easier to maintain over time. Whether or not a field is mutable actually changes over time as bugs are fixed, features are added, etc. The Context barely looks like it was first added and the field makeup has changed quite a bit over time. For example PRs like #5301 are adding basically a whole new mode to executing the context and dealing with the various fields.

I agree that it's definitely easier to understand and grok smaller structs and such, but it's often hostile (I think) to do so to productivity. For example if I'm adding a new field now I've got to think hard about whether it's mutable or not at this point in time, allocate it appropriately to one structure, and write all the code. Down the road it may turn out to be better as a mutable structure which means it's gotta get moved around and updated everywhere again.

Now what I'm thinking is definitely a bit of an extreme example, for something as simple as mutability I think it's good to have a split as it's easy to see what's precalculated and what's not, and overall changes aren't too hard. If this is the beginning of a bunch-more follow-up, though, I fear that this'll skew too far in the direction of sharding things beyond the point of productivity.

So to also reiterate again, I'm totally fine with this PR as-is and the goals here. I'm mostly angling at the set of possible future PRs and I want to make sure we approach them with a wary eye of "is this going to benefit us long-term?" For this instance it seems like the answer is "yes" but I just want to make sure we don't automatically make every-and-all immediate local optimization without considering the long-term consequences of reorganization.

@djc
Copy link
Contributor Author

djc commented May 1, 2018

@alexcrichton that makes a lot of sense, thanks for clarifying. Of course I also want to optimize for long-term productivity, and I recognize that factoring everything out may make that harder. Let's move forward with this PR and I'll keep your thoughts in mind while I reevaluate some further changes.

@alexcrichton
Copy link
Member

I'd still personally prefer if progress were renamed to something shorter like cx or bcx if possible. Perhaps the current Context could become something like BuildContext and then BuildProgress could become Context? Most of the methods of BuildProgress don't actually progress much through the build in the sense that they're just planning it out so it may be good to reflect that in the name as well

@djc
Copy link
Contributor Author

djc commented May 2, 2018

TBH, I don't think these are good names; bcx and cx won't be that easy for new people to keep apart, and it sounds to me like you're optimizing for writing over reading. But anyway, to keep things progressing I've done as you asked.

@alexcrichton
Copy link
Member

@bors: r+

Thanks!

@bors
Copy link
Collaborator

bors commented May 2, 2018

📌 Commit d02c726 has been approved by alexcrichton

@bors
Copy link
Collaborator

bors commented May 2, 2018

⌛ Testing commit d02c726 with merge 9d57564...

bors added a commit that referenced this pull request May 2, 2018
Split Context into two types

@matklad here's the crazy attempt to make the `Context` situation slightly more sane. It basically splits all of the stuff that is immutable over the lifetime of the compilation from the stuff that's initialized empty. The latter is now stored in a type which I have provisionally called `BuildProgress` -- feel free to bikeshed about the name (I came up with `BuildData` as another alternative). It stores a reference to the `Context` to make things easier for now, we might want to change that down the line.

IMO this is a pretty good improvement, which clarifies the design quite a bit.

@alexcrichton you'll probably want to review this, too.
@matklad
Copy link
Member

matklad commented May 2, 2018

I agree that it's definitely easier to understand and grok smaller structs and such, but it's often hostile (I think) to do so to productivity.

This I think is a tradeoff between experienced contributors, who have all the necessary code "in the cache", and new contributors, who are reading the code for the first time. I think it make sense to slightly tilt the field in favor of new contributors :)

I totally agree that we shouldn't "refactor up front" (before organic growth happens), and that we shouldn't pursue any artificial metrics like "keeping all functions less than a hundred lines".

@bors
Copy link
Collaborator

bors commented May 2, 2018

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 9d57564 to master...

@bors bors merged commit d02c726 into rust-lang:master May 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants