Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split the type context into a global and a local (inference-only) one. #33425

Merged
merged 23 commits into from
May 11, 2016

Conversation

eddyb
Copy link
Member

@eddyb eddyb commented May 5, 2016

After this change, each InferCtxt creates its own local type interner for types with inference by-products.
Most of the code which handles both a global and a local interner uses 'gcx and 'tcx for them.
A reference to the type context in that situation (e.g. infcx.tcx) is TyCtxt<'a, 'gcx, 'tcx>.
The global type context which used to be &'a TyCtxt<'tcx> is now TyCtxt<'a, 'tcx, 'tcx>.

In order to minimize the number of extra lifetime parameters, many functions became methods.
Where possible (some inherent impls), lifetime parameters were added on the impl, not each method.

As inference by-products no longer escape their inference contexts, memory usage is lower.
Example of -Z time-passes excerpt for librustc, stage1 (~100MB gains):
Before "rustc: Split local type contexts interners from the global one.":

time: 0.395; rss: 335MB item-types checking
time: 15.392; rss: 472MB        item-bodies checking
time: 0.000; rss: 472MB drop-impl checking
time: 1.140; rss: 478MB const checking
time: 0.139; rss: 478MB privacy checking
time: 0.024; rss: 478MB stability index
time: 0.072; rss: 478MB intrinsic checking
time: 0.038; rss: 478MB effect checking
time: 0.255; rss: 478MB match checking
time: 0.128; rss: 484MB liveness checking
time: 1.372; rss: 484MB rvalue checking
time: 1.404; rss: 597MB MIR dump
time: 0.809; rss: 599MB MIR passes

After:

time: 0.467; rss: 337MB item-types checking
time: 17.443; rss: 395MB        item-bodies checking
time: 0.000; rss: 395MB drop-impl checking
time: 1.423; rss: 401MB const checking
time: 0.141; rss: 401MB privacy checking
time: 0.024; rss: 401MB stability index
time: 0.116; rss: 401MB intrinsic checking
time: 0.038; rss: 401MB effect checking
time: 0.382; rss: 401MB match checking
time: 0.132; rss: 407MB liveness checking
time: 1.678; rss: 407MB rvalue checking
time: 1.614; rss: 503MB MIR dump
time: 0.957; rss: 512MB MIR passes

NOTE: Functions changed to methods weren't re-indented to keep this PR easier to review.
Once approved, the changes will be mechanically performed.
However, indentation changes of function arguments are there - and I believe there's a way to hide whitespace-only changes in diffs on GitHub.

@rust-highfive
Copy link
Collaborator

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@durka
Copy link
Contributor

durka commented May 5, 2016

Nice memory savings, but 3.5x time regression in item-bodies checking?

@eddyb
Copy link
Member Author

eddyb commented May 5, 2016

@durka Just noticed that actually, will need to investigate. We might have to make lifting cheaper.
EDIT: I also could have accidentally broken one of the trait selection caches, for example.

@eddyb
Copy link
Member Author

eddyb commented May 5, 2016

About 25% of the time appears to be spend in malloc and free.
I've tried reusing arenas across function-local type contexts during type-checking (internally each TypedArena can reuse only the largest chunk it holds) but it doesn't appear to help.

@arielb1
Copy link
Contributor

arielb1 commented May 5, 2016

What are the callers?

@eddyb
Copy link
Member Author

eddyb commented May 5, 2016

@arielb1 I believe I'd need perf record -g for that and DWARF debug info and last time I tried something like that it was writing GBs of traces. I'm going to have to do it anyway - I'll try to look into it in the next few days.

@eddyb
Copy link
Member Author

eddyb commented May 6, 2016

I've found a simple way of turning the split off while keeping all of the infrastructural changes, if we want to merge this without any significant impact on performance.

But it turns out that making tcx.lift(x) return None if x.needs_infer() && tcx.is_global() is enough to trigger the performance problem, without actually using the local interner at all.

So either the cost of checking needs_infer is really that high, or I broke one of the caches which depend on it.

@eddyb
Copy link
Member Author

eddyb commented May 6, 2016

@nikomatsakis I'm pretty sure the problem here is the global selection/evaluation caches which used to allow TyInfer(FreshTy(_)) and friends but those have HAS_TY_INFER just like any other inference variables so they get stuck in the local type contexts.

Should I just add a separate flag for freshened types, and keep them in the global context, just for the caches?

@nikomatsakis
Copy link
Contributor

@eddyb

Should I just add a separate flag for freshened types, and keep them in the global context, just for the caches?

Hmm. Seems worth a try! I'm a bit wary of removing TY_INFER from freshened types -- that is, I'm frightened that there will be existing logic that was assuming that they count. We could I guess add a new flag for "local context only" that is set for all ty-infer except freshened types.

@eddyb
Copy link
Member Author

eddyb commented May 6, 2016

@nikomatsakis I'd keep the current behavior of needs_infer and friends and only ignore the TY_FRESH flag when checking whether something must be interned in the local type context.

@eddyb
Copy link
Member Author

eddyb commented May 6, 2016

@durka @arielb1 Now that I solved the cache problem, the times look good again, although they all seem slightly increased.

@bors
Copy link
Contributor

bors commented May 6, 2016

☔ The latest upstream changes (presumably #33138) made this pull request unmergeable. Please resolve the merge conflicts.

@bors
Copy link
Contributor

bors commented May 7, 2016

☔ The latest upstream changes (presumably #33267) made this pull request unmergeable. Please resolve the merge conflicts.

@eddyb eddyb force-pushed the rift branch 2 times, most recently from e2a30af to 3316c7a Compare May 7, 2016 16:15
@bors
Copy link
Contributor

bors commented May 8, 2016

☔ The latest upstream changes (presumably #33091) made this pull request unmergeable. Please resolve the merge conflicts.

@bors
Copy link
Contributor

bors commented May 10, 2016

☔ The latest upstream changes (presumably #33523) made this pull request unmergeable. Please resolve the merge conflicts.

@nikomatsakis
Copy link
Contributor

@eddyb r=me

you are my hero 👑

@bors
Copy link
Contributor

bors commented May 11, 2016

📌 Commit 42eb703 has been approved by nikomatsakis

@bors
Copy link
Contributor

bors commented May 11, 2016

⌛ Testing commit 42eb703 with merge c7ab884...

bors added a commit that referenced this pull request May 11, 2016
Split the type context into a global and a local (inference-only) one.

After this change, each `InferCtxt` creates its own local type interner for types with inference by-products.
Most of the code which handles both a global and a local interner uses `'gcx` and `'tcx` for them.
A reference to the type context in that situation (e.g. `infcx.tcx`) is `TyCtxt<'a, 'gcx, 'tcx>`.
The global type context which used to be `&'a TyCtxt<'tcx>` is now `TyCtxt<'a, 'tcx, 'tcx>`.

In order to minimize the number of extra lifetime parameters, many functions became methods.
Where possible (some inherent impls), lifetime parameters were added on the impl, not each method.

As inference by-products no longer escape their inference contexts, memory usage is lower.
Example of `-Z time-passes` excerpt for `librustc`, stage1 (~100MB gains):
Before "rustc: Split local type contexts interners from the global one.":
```
time: 0.395; rss: 335MB item-types checking
time: 15.392; rss: 472MB        item-bodies checking
time: 0.000; rss: 472MB drop-impl checking
time: 1.140; rss: 478MB const checking
time: 0.139; rss: 478MB privacy checking
time: 0.024; rss: 478MB stability index
time: 0.072; rss: 478MB intrinsic checking
time: 0.038; rss: 478MB effect checking
time: 0.255; rss: 478MB match checking
time: 0.128; rss: 484MB liveness checking
time: 1.372; rss: 484MB rvalue checking
time: 1.404; rss: 597MB MIR dump
time: 0.809; rss: 599MB MIR passes
```
After:
```
time: 0.467; rss: 337MB item-types checking
time: 17.443; rss: 395MB        item-bodies checking
time: 0.000; rss: 395MB drop-impl checking
time: 1.423; rss: 401MB const checking
time: 0.141; rss: 401MB privacy checking
time: 0.024; rss: 401MB stability index
time: 0.116; rss: 401MB intrinsic checking
time: 0.038; rss: 401MB effect checking
time: 0.382; rss: 401MB match checking
time: 0.132; rss: 407MB liveness checking
time: 1.678; rss: 407MB rvalue checking
time: 1.614; rss: 503MB MIR dump
time: 0.957; rss: 512MB MIR passes
```

**NOTE**: Functions changed to methods weren't re-indented to keep this PR easier to review.
Once approved, the changes will be mechanically performed.
However, indentation changes of function arguments are there - and I believe there's a way to hide whitespace-only changes in diffs on GitHub.
@bors bors merged commit 42eb703 into rust-lang:master May 11, 2016
@eddyb eddyb deleted the rift branch May 11, 2016 15:11
nikomatsakis added a commit to nikomatsakis/rust that referenced this pull request May 12, 2016
It turns out that `ast_ty_to_ty` is supposed to be updating the `def`
after it finishes, but at some point in the past it stopped doing
so. This was never noticed because of the `ast_ty_to_ty_cache`, but that
cache was recently removed. This PR fixes the code to update the def
properly, but apparently that is not quite enough to make the operation
idempotent, so for now we reintroduce the cache too.

Fixes rust-lang#33425.
eddyb added a commit to eddyb/rust that referenced this pull request May 12, 2016
…-type-path, r=eddyb

re-introduce a cache for ast-ty-to-ty

It turns out that `ast_ty_to_ty` is supposed to be updating the `def`
after it finishes, but at some point in the past it stopped doing
so. This was never noticed because of the `ast_ty_to_ty_cache`, but that
cache was recently removed. This PR fixes the code to update the def
properly, but apparently that is not quite enough to make the operation
idempotent, so for now we reintroduce the cache too.

Fixes rust-lang#33425.

r? @eddyb
@bluss bluss added the relnotes Marks issues that should be documented in the release notes of the next release. label May 13, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes Marks issues that should be documented in the release notes of the next release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants