Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor away RBML from rustc_metadata. #36551

Merged
merged 39 commits into from
Sep 22, 2016
Merged

Conversation

eddyb
Copy link
Member

@eddyb eddyb commented Sep 17, 2016

RBML and ty{en,de}code have had their long-overdue purge. Summary of changes:

  • Metadata is now a tree encoded in post-order and with relative backward references pointing to children nodes. With auto-deriving and type safety, this makes maintenance and adding new information to metadata painless and bug-free by default. It's also more compact and cache-friendly (cache misses should be proportional to the depth of the node being accessed, not the number of siblings as in EBML/RBML).
  • Metadata sizes have been reduced, for libcore it went down 16% (8.38MB -> 7.05MB) and for libstd 14% (3.53MB -> 3.03MB), while encoding more or less the same information
  • Specialization is used in the bundled libserialize (crates.io rustc_serialize remains unaffected) to customize the encoding (and more importantly, decoding) of various types, most notably those interned in the TyCtxt. Some of this abuses a soundness hole pending a fix (cc @aturon), but when that fix arrives, we'll move to macros 1.1 #[derive] and custom TyCtxt-aware serialization traits.
  • Enumerating children of modules from other crates is now orthogonal to describing those items via Def - this is a step towards bridging crate-local HIR and cross-crate metadata
  • CrateNum has been moved to rustc and both it and NodeId are now newtypes instead of u32 aliases, for specializing their decoding. This is [syntax-breaking] (cc @Manishearth ).

cc @rust-lang/compiler

@rust-highfive
Copy link
Collaborator

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@eddyb eddyb added the S-waiting-on-crater Status: Waiting on a crater run to be completed. label Sep 17, 2016
if self.session.cstore.variant_kind(variant_id) == Some(VariantKind::Struct) {
// Not adding fields for variants as they are not accessed with a self receiver
self.structs.insert(variant_id, Vec::new());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was necessary, IIRC. Should be caught by compile-fail tests.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can trigger a change in intended behavior, I'll find a way to put it back.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as compile-fail/issue-19452.rs but cross-crate.

enum E { V {} }
let v = E::V;

pub fn each_child_of_item<F>(&self, id: DefIndex, mut callback: F)
where F: FnMut(def::Export)
{
// Find the item.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #31337 (comment) regarding the order in which proper items and reexports need to be in the children list for resolution.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe I kept direct children before reexports, during the whole refactoring.

@eddyb
Copy link
Member Author

eddyb commented Sep 17, 2016

Crater report shows that valico and human_name have a duplicate drop symbol problem (same hash).
That and @petrochenkov's struct-variant-as-value example, which I've managed to reproduce, are the only items left to fix before merging this (if other regressions are found later, we have a cycle to fix them).

EDIT: Also add to that the fact that I broke incremental recompilation, Travis just found out.

@petrochenkov
Copy link
Contributor

Could you explain the removal of HIR folders a bit more?
Of course, they are not used as much as AST folders, but parts of HIR are still folded occasionally.

@eddyb
Copy link
Member Author

eddyb commented Sep 18, 2016

@petrochenkov No, they're never supposed to be used, HIR is immutable once constructed.
The current transformation during match checking needs to be rewritten to work on HAIR.
Afterwards we can finally move HIR to be arena-allocated which would eventually be used for abomonation-like persistence wrt incremental recompilation.

Of course, that's all hypothetical, but the folder was only used by match checking so removing it seems logical as it allows more flexibility in the future. If it's needed again, it can always be recovered.

@bors
Copy link
Contributor

bors commented Sep 19, 2016

☔ The latest upstream changes (presumably #36487) made this pull request unmergeable. Please resolve the merge conflicts.

@eddyb eddyb force-pushed the meta-games branch 2 times, most recently from 531faec to 9be8525 Compare September 19, 2016 09:59
@eddyb
Copy link
Member Author

eddyb commented Sep 19, 2016

@nikomatsakis I've fixed all issues I could find and everything should be back in working order now.

@eddyb eddyb force-pushed the meta-games branch 2 times, most recently from cd1d8bb to a22f9be Compare September 19, 2016 16:29
&mut OpaqueDecoder) -> R,
'tcx: 'decoder
/// Execute f with access to the thread-local decoding context.
/// FIXME(eddyb) This is horribly unsound as it allows the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's open a bug to represent the lack of lifetime dispatch testing and tag this FIXME with it.

}

/// FIXME(eddyb) This is horribly unsound as it allows the
/// caler to pick any lifetime for 'tcx, including 'static.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/caler/caller/

}
}

impl<'a, 'tcx> SpecializedDecoder<CrateNum> for DecodeContext<'a, 'tcx> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really neat.

let id_range_doc = ast_doc.get(c::tag_id_range);
let from_id_range = IdRange::decode(&mut id_range_doc.opaque()).unwrap();
let from_id_range = {
let decoder = &mut ast_doc.get(c::tag_id_range).opaque();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use a different encoder here, right? Like, a newtype? (In order to signal we don't want translation)

But I think you revisit this later...

NBD, anyhow.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or else use a (u32, u32) type.

.collect();
let mut associated_types = FnvHashSet::default();
for tr in traits::supertraits(tcx, principal) {
if let Some(trait_id) = tcx.map.as_local_node_id(tr.def_id()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mega-Nit: I'd prefer to see this as a helper, even if in the same file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, collect::trait_associated_type_names is the helper - unless you want that to handle the non-local case too? IIRC there was minor problem with temporaries but I solved it elsewhere by using indexes instead of slice iterators.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Come to thing of it, poking at this would be unproductive because I plan to make the list of associated items not depend on type information, which would mean they'd always be available at this point, for both local or remote traits.

dcx.decode()
predicates: (0..dcx.decode::<usize>()).map(|_| {
// Handle shorthands first, if we have an usize > 0x80.
if dcx.opaque.data[dcx.opaque.position()] & 0x80 != 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: it surprises me to see this open-coded here; I expect some sort of helper to handle this generically

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(But if that's not practical, it's...fine.)


// ______________________________________________________________________
// Top-level methods.
#[derive(RustcEncodable, RustcDecodable)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so wonderful

@bors
Copy link
Contributor

bors commented Sep 20, 2016

☔ The latest upstream changes (presumably #36102) made this pull request unmergeable. Please resolve the merge conflicts.

// Cache the last used filemap for translating spans as an optimization.
last_filemap_index: usize,

lazy_state: LazyState
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this lazy scheme needs a comment explaining how it works somewhere. Am I missing such a comment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lazy's doc comment has the most information.

@@ -27,6 +27,9 @@ pub struct NodeCollector<'ast> {
pub map: Vec<MapEntry<'ast>>,
/// The parent of this node
pub parent_node: NodeId,
/// Whether to completely ignore nested items,
/// such as in decoded HIR fragments from metadata
pub ignore_nested_items: bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate just a touch on this comment? I always find these kind of booleans that switch between "two similar modes" kind of creepy, so having a good explanation of why you might pick one or the other is great. In particular, I'd change this from "such as in", which suggests that there are other times one might want it to be true. In fact, that's the only time -- right? So maybe something like:

"If true, completely ignore nested items. We do this when loading HIR from metadata since in that case we only want the HIR for one specific item (and not the ones nested inside of it)."

That said, I think that HIR encoded into metadata doesn't have nested items? Oh, I guess we stopped folding it now, huh?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's easier to leave in a bunch of ItemId's that don't point to anything than actually fold them away. As for it being creepy, yes, I just don't want to spend a lot of time on infrastructure that I want removed ASAP. The comment change sounds good.

TyFnPtr(f) => {
self.hash(f.unsafety);
self.hash(f.abi);
self.hash(f.sig.variadic());
self.hash(f.sig.inputs().skip_binder().len());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not f.sig.output?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh never mind

Copy link
Member Author

@eddyb eddyb Sep 20, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regions and types are visited by recursion, only miscellaneous information & lengths of sequences must be hashed manually.


/// Given a trait `trait_ref`, iterates the vtable entries
/// that come from `trait_ref`, including its supertraits.
#[inline] // HACK(eddyb) Avoid closures being unexported due to impl Trait.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah -- is there a bug regarding exported closures and so forth?

/me tries to remember where we cut corners there. I've been trying not to do it knowing that this day would come =)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this should be FIXME(XXX) for the bug number XXX :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#35870, if you want I can link to it from the FIXME.

@@ -103,6 +105,38 @@ pub fn translate_substs<'a, 'gcx, 'tcx>(infcx: &InferCtxt<'a, 'gcx, 'tcx>,
source_substs.rebase_onto(infcx.tcx, source_impl, target_substs)
}

/// Locates the applicable definition of a method, given its name.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment doesn't seem sufficient. What is impl_def_id and impl_substs, for example? What is substs and how does it relate to impl_substs?

Probably nice to give some context about where/why it is used, too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just moved it from trans, but I suppose I can document it. The impl_ arguments in particular from VtableImpl, I might be able to make it take that type directly.

@bors
Copy link
Contributor

bors commented Sep 22, 2016

⌛ Testing commit 4ac3001 with merge 1cf592f...

bors added a commit that referenced this pull request Sep 22, 2016
Refactor away RBML from rustc_metadata.

RBML and `ty{en,de}code` have had their long-overdue purge. Summary of changes:
* Metadata is now a tree encoded in post-order and with relative backward references pointing to children nodes. With auto-deriving and type safety, this makes maintenance and adding new information to metadata painless and bug-free by default. It's also more compact and cache-friendly (cache misses should be proportional to the depth of the node being accessed, not the number of siblings as in EBML/RBML).
* Metadata sizes have been reduced, for `libcore` it went down 16% (`8.38MB` -> `7.05MB`) and for `libstd` 14% (`3.53MB` -> `3.03MB`), while encoding more or less the same information
* Specialization is used in the bundled `libserialize` (crates.io `rustc_serialize` remains unaffected) to customize the encoding (and more importantly, decoding) of various types, most notably those interned in the `TyCtxt`. Some of this abuses a soundness hole pending a fix (cc @aturon), but when that fix arrives, we'll move to macros 1.1 `#[derive]` and custom `TyCtxt`-aware serialization traits.
* Enumerating children of modules from other crates is now orthogonal to describing those items via `Def` - this is a step towards bridging crate-local HIR and cross-crate metadata
* `CrateNum` has been moved to `rustc` and both it and `NodeId` are now newtypes instead of `u32` aliases, for specializing their decoding. This is `[syntax-breaking]` (cc @Manishearth ).

cc @rust-lang/compiler
@bors bors merged commit 4ac3001 into rust-lang:master Sep 22, 2016
@eddyb eddyb deleted the meta-games branch September 22, 2016 05:43
@bluss bluss added the relnotes Marks issues that should be documented in the release notes of the next release. label Sep 22, 2016
bors added a commit that referenced this pull request Oct 30, 2016
[2/n] rustc_metadata: move is_extern_item to trans.

*This is part of a series ([prev](#37400) | [next](#37402)) of patches designed to rework rustc into an out-of-order on-demand pipeline model for both better feature support (e.g. [MIR-based](https://github.com/solson/miri) early constant evaluation) and incremental execution of compiler passes (e.g. type-checking), with beneficial consequences to IDE support as well.
If any motivation is unclear, please ask for additional PR description clarifications or code comments.*
<hr>

Minor cleanup missed by #36551: `is_extern_item` is one of, if not the only `CrateStore` method who takes a `TyCtxt` but doesn't produce something cached in it, and such methods are going away.
bors added a commit that referenced this pull request Nov 23, 2016
rustc_metadata: don't break the version check when CrateRoot changes.

In #36551 I made `rustc_version` a field of `CrateRoot`, but despite it being the first field, one could still break the version check by changing `CrateRoot` so older compilers couldn't fully decode it (e.g. #37463).

This PR fixes #37803 by moving the version string back at the beginning of metadata, right after the 32-bit big-endian absolute position of `CrateRoot`, and by incrementing `METADATA_VERSION`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes Marks issues that should be documented in the release notes of the next release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants