add a fastpath to match_impl for the non-HRTB case #36931

Closed
arielb1 wants to merge 3 commits

Conversation

arielb1
Contributor

@arielb1 arielb1 commented Oct 3, 2016

This leads to a 2.5% all-around typeck time improvement.

e.g. librustc times before:

time: 0.233; rss: 89MB  parsing
time: 0.000; rss: 89MB  recursion limit
time: 0.000; rss: 89MB  crate injection
time: 0.000; rss: 89MB  plugin loading
time: 0.000; rss: 89MB  plugin registration
time: 0.715; rss: 240MB expansion
time: 0.000; rss: 240MB maybe building test harness
time: 0.016; rss: 240MB maybe creating a macro crate
time: 0.000; rss: 240MB checking for inline asm in case the target doesn't support it
time: 0.033; rss: 240MB complete gated feature checking
time: 0.055; rss: 240MB early lint checks
time: 0.019; rss: 240MB AST validation
time: 0.210; rss: 261MB name resolution
time: 0.133; rss: 349MB lowering ast -> hir
time: 0.025; rss: 357MB indexing hir
time: 0.016; rss: 357MB attribute checking
time: 0.016; rss: 357MB language item collection
time: 0.034; rss: 357MB lifetime resolution
time: 0.000; rss: 357MB looking for entry point
time: 0.000; rss: 357MB looking for plugin registrar
time: 0.115; rss: 377MB region resolution
time: 0.015; rss: 377MB loop checking
time: 0.015; rss: 377MB static item recursion checking
time: 0.221; rss: 378MB compute_incremental_hashes_map
time: 0.000; rss: 378MB load_dep_graph
time: 0.192; rss: 384MB type collecting
time: 0.004; rss: 384MB variance inference
time: 0.078; rss: 385MB coherence checking
time: 0.359; rss: 391MB wf checking
time: 0.248; rss: 394MB item-types checking
time: 8.445; rss: 476MB item-bodies checking
time: 0.000; rss: 476MB drop-impl checking
time: 0.858; rss: 484MB const checking
time: 0.131; rss: 484MB privacy checking
time: 0.023; rss: 484MB stability index
time: 0.055; rss: 484MB intrinsic checking
time: 0.042; rss: 484MB effect checking
time: 0.154; rss: 484MB match checking
time: 0.118; rss: 484MB liveness checking
time: 0.763; rss: 484MB rvalue checking
time: 1.144; rss: 696MB MIR dump
  time: 0.122; rss: 696MB       SimplifyCfg
  time: 0.245; rss: 697MB       QualifyAndPromoteConstants
  time: 0.299; rss: 697MB       TypeckMir
  time: 0.011; rss: 697MB       SimplifyBranches
  time: 0.080; rss: 697MB       SimplifyCfg
time: 0.756; rss: 697MB MIR passes
time: 1.831; rss: 699MB borrow checking
time: 0.080; rss: 699MB reachability checking
time: 0.108; rss: 699MB death checking
time: 0.097; rss: 701MB stability checking
time: 0.000; rss: 701MB unused lib feature checking
time: 0.597; rss: 701MB lint checking
time: 0.003; rss: 701MB resolving dependency formats
  time: 0.008; rss: 701MB       NoLandingPads
  time: 0.055; rss: 701MB       SimplifyCfg
  time: 0.192; rss: 714MB       EraseRegions
  time: 0.030; rss: 714MB       AddCallGuards
  time: 2.457; rss: 719MB       ElaborateDrops
  time: 0.008; rss: 719MB       NoLandingPads
  time: 0.089; rss: 720MB       SimplifyCfg
  time: 0.070; rss: 720MB       InstCombine
  time: 0.037; rss: 720MB       Deaggregator
  time: 0.009; rss: 720MB       CopyPropagation
  time: 0.027; rss: 720MB       AddCallGuards
  time: 0.008; rss: 720MB       PreTrans
time: 2.989; rss: 720MB Prepare MIR codegen passes
  time: 0.578; rss: 734MB       write metadata
  time: 2.018; rss: 756MB       translation item collection
  time: 0.587; rss: 765MB       codegen unit partitioning
  time: 0.255; rss: 1151MB      internalize symbols
time: 14.005; rss: 1151MB       translation

at the first commit:

time: 0.233; rss: 89MB  parsing
time: 0.000; rss: 89MB  recursion limit
time: 0.000; rss: 89MB  crate injection
time: 0.000; rss: 89MB  plugin loading
time: 0.000; rss: 89MB  plugin registration
time: 0.710; rss: 239MB expansion
time: 0.000; rss: 239MB maybe building test harness
time: 0.016; rss: 239MB maybe creating a macro crate
time: 0.000; rss: 239MB checking for inline asm in case the target doesn't support it
time: 0.033; rss: 239MB complete gated feature checking
time: 0.055; rss: 239MB early lint checks
time: 0.019; rss: 239MB AST validation
time: 0.213; rss: 261MB name resolution
time: 0.134; rss: 349MB lowering ast -> hir
time: 0.026; rss: 357MB indexing hir
time: 0.016; rss: 357MB attribute checking
time: 0.016; rss: 357MB language item collection
time: 0.035; rss: 357MB lifetime resolution
time: 0.000; rss: 357MB looking for entry point
time: 0.000; rss: 357MB looking for plugin registrar
time: 0.114; rss: 377MB region resolution
time: 0.015; rss: 377MB loop checking
time: 0.015; rss: 377MB static item recursion checking
time: 0.224; rss: 377MB compute_incremental_hashes_map
time: 0.000; rss: 377MB load_dep_graph
time: 0.199; rss: 383MB type collecting
time: 0.004; rss: 383MB variance inference
time: 0.078; rss: 385MB coherence checking
time: 0.354; rss: 390MB wf checking
time: 0.245; rss: 394MB item-types checking
time: 8.271; rss: 476MB item-bodies checking
time: 0.000; rss: 476MB drop-impl checking
time: 0.848; rss: 483MB const checking
time: 0.132; rss: 483MB privacy checking
time: 0.024; rss: 483MB stability index
time: 0.056; rss: 483MB intrinsic checking
time: 0.043; rss: 483MB effect checking
time: 0.156; rss: 483MB match checking
time: 0.119; rss: 484MB liveness checking
time: 0.749; rss: 484MB rvalue checking
time: 1.149; rss: 696MB MIR dump
  time: 0.122; rss: 696MB       SimplifyCfg
  time: 0.247; rss: 697MB       QualifyAndPromoteConstants
  time: 0.310; rss: 697MB       TypeckMir
  time: 0.011; rss: 697MB       SimplifyBranches
  time: 0.080; rss: 697MB       SimplifyCfg
time: 0.770; rss: 697MB MIR passes
time: 1.782; rss: 698MB borrow checking
time: 0.080; rss: 699MB reachability checking
time: 0.109; rss: 699MB death checking
time: 0.097; rss: 700MB stability checking
time: 0.000; rss: 700MB unused lib feature checking
time: 0.591; rss: 700MB lint checking
time: 0.003; rss: 700MB resolving dependency formats
  time: 0.008; rss: 700MB       NoLandingPads
  time: 0.055; rss: 700MB       SimplifyCfg
  time: 0.193; rss: 714MB       EraseRegions
  time: 0.030; rss: 714MB       AddCallGuards
  time: 2.369; rss: 719MB       ElaborateDrops
  time: 0.008; rss: 719MB       NoLandingPads
  time: 0.090; rss: 719MB       SimplifyCfg
  time: 0.070; rss: 719MB       InstCombine
  time: 0.034; rss: 719MB       Deaggregator
  time: 0.009; rss: 719MB       CopyPropagation
  time: 0.027; rss: 719MB       AddCallGuards
  time: 0.008; rss: 719MB       PreTrans
time: 2.900; rss: 719MB Prepare MIR codegen passes
  time: 0.578; rss: 733MB       write metadata
  time: 1.953; rss: 756MB       translation item collection
  time: 0.597; rss: 764MB       codegen unit partitioning
  time: 0.250; rss: 1151MB      internalize symbols
time: 14.108; rss: 1151MB       translation

at the third commit:

x86_64-unknown-linux-gnu/stage1/lib/rustlib/x86_64-unknown-linux-gnu/lib/librustc
time: 0.233; rss: 89MB  parsing
time: 0.000; rss: 89MB  recursion limit
time: 0.000; rss: 89MB  crate injection
time: 0.000; rss: 89MB  plugin loading
time: 0.000; rss: 89MB  plugin registration
time: 0.711; rss: 240MB expansion
time: 0.000; rss: 240MB maybe building test harness
time: 0.016; rss: 240MB maybe creating a macro crate
time: 0.000; rss: 240MB checking for inline asm in case the target doesn't support it
time: 0.033; rss: 240MB complete gated feature checking
time: 0.055; rss: 240MB early lint checks
time: 0.019; rss: 240MB AST validation
time: 0.211; rss: 262MB name resolution
time: 0.133; rss: 350MB lowering ast -> hir
time: 0.026; rss: 358MB indexing hir
time: 0.016; rss: 358MB attribute checking
time: 0.016; rss: 358MB language item collection
time: 0.034; rss: 358MB lifetime resolution
time: 0.000; rss: 358MB looking for entry point
time: 0.000; rss: 358MB looking for plugin registrar
time: 0.115; rss: 378MB region resolution
time: 0.015; rss: 378MB loop checking
time: 0.015; rss: 378MB static item recursion checking
time: 0.222; rss: 379MB compute_incremental_hashes_map
time: 0.000; rss: 379MB load_dep_graph
time: 0.194; rss: 384MB type collecting
time: 0.004; rss: 384MB variance inference
time: 0.079; rss: 386MB coherence checking
time: 0.353; rss: 391MB wf checking
time: 0.242; rss: 395MB item-types checking
time: 8.239; rss: 477MB item-bodies checking
time: 0.000; rss: 477MB drop-impl checking
time: 0.869; rss: 484MB const checking
time: 0.132; rss: 484MB privacy checking
time: 0.023; rss: 484MB stability index
time: 0.055; rss: 484MB intrinsic checking
time: 0.043; rss: 484MB effect checking
time: 0.154; rss: 484MB match checking
time: 0.118; rss: 485MB liveness checking
time: 0.770; rss: 485MB rvalue checking
time: 1.149; rss: 697MB MIR dump
  time: 0.122; rss: 697MB       SimplifyCfg
  time: 0.246; rss: 698MB       QualifyAndPromoteConstants
  time: 0.311; rss: 698MB       TypeckMir
  time: 0.011; rss: 698MB       SimplifyBranches
  time: 0.082; rss: 698MB       SimplifyCfg
time: 0.771; rss: 698MB MIR passes
time: 1.829; rss: 700MB borrow checking
time: 0.076; rss: 700MB reachability checking
time: 0.109; rss: 700MB death checking
time: 0.098; rss: 702MB stability checking
time: 0.000; rss: 702MB unused lib feature checking
time: 0.599; rss: 702MB lint checking
time: 0.003; rss: 702MB resolving dependency formats
  time: 0.009; rss: 702MB       NoLandingPads
  time: 0.055; rss: 702MB       SimplifyCfg
  time: 0.193; rss: 716MB       EraseRegions
  time: 0.030; rss: 716MB       AddCallGuards
  time: 2.421; rss: 720MB       ElaborateDrops
  time: 0.008; rss: 720MB       NoLandingPads
  time: 0.090; rss: 721MB       SimplifyCfg
  time: 0.070; rss: 721MB       InstCombine
  time: 0.036; rss: 721MB       Deaggregator
  time: 0.008; rss: 721MB       CopyPropagation
  time: 0.027; rss: 721MB       AddCallGuards
  time: 0.008; rss: 721MB       PreTrans
time: 2.956; rss: 721MB Prepare MIR codegen passes
  time: 0.580; rss: 735MB       write metadata
  time: 1.971; rss: 757MB       translation item collection
  time: 0.607; rss: 766MB       codegen unit partitioning
  time: 0.247; rss: 1153MB      internalize symbols
time: 13.953; rss: 1153MB       translation
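
To put the three logs side by side, here is a small sketch that computes the relative change in `item-bodies checking`, the pass that dominates type checking in these logs. The three times are copied from the logs above; the names `percent_faster` and `summary` are made up for this example, and each figure comes from a single run, so treat the result as indicative only.

```rust
// `item-bodies checking` times in seconds, copied from the three logs above.
const BEFORE: f64 = 8.445;
const FIRST_COMMIT: f64 = 8.271;
const THIRD_COMMIT: f64 = 8.239;

/// Percentage by which `after` is lower than `before`.
fn percent_faster(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

/// One-line comparison of both commits against the baseline.
fn summary() -> String {
    format!(
        "first commit: {:.2}% faster, third commit: {:.2}% faster",
        percent_faster(BEFORE, FIRST_COMMIT),
        percent_faster(BEFORE, THIRD_COMMIT)
    )
}
```

With these inputs, `item-bodies checking` is about 2.06% faster at the first commit and about 2.44% faster at the third, which is in the neighbourhood of the 2.5% quoted at the top of this description.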

r? @nikomatsakis

@brson brson added the relnotes label Oct 4, 2016
@arielb1 arielb1 mentioned this pull request Oct 4, 2016
@bors
Contributor

bors commented Oct 4, 2016

☔ The latest upstream changes (presumably #36953) made this pull request unmergeable. Please resolve the merge conflicts.

Ariel Ben-Yehuda added 2 commits October 4, 2016 19:22
this leads to a 1.5% all-around typeck time improvement
this does not have a measurable performance effect and mostly makes the logs clearer
@@ -2668,6 +2668,49 @@ impl<'cx, 'gcx, 'tcx> SelectionContext<'cx, 'gcx, 'tcx> {
}
}

fn match_impl_fastpath(&mut self,
Contributor

Hmm. I'm not thrilled about the duplication, though the perf win is nice.

Contributor Author

That's about the smallest amount of code you can duplicate ;-).
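
For readers unfamiliar with the trade-off being discussed, here is a toy sketch of what a fast path for the non-HRTB case can look like. It is not the code from this PR and not rustc's API: `Region`, `TraitRef`, `Binder`, `Matcher` and every method on them are hypothetical names invented for illustration, and the "leak check" is a drastically simplified stand-in. The idea it shows is only this: when the obligation has no late-bound (`for<'a>`) regions, there is no binder to open, so the placeholder substitution and the leak check can be skipped.

```rust
// Toy model only: none of these types or methods are rustc's real API.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Region {
    Static,           // 'static
    Param,            // a region parameter of the impl; matches anything
    LateBound(u32),   // bound by an enclosing `for<'a>` (the HRTB case)
    Placeholder(u32), // fresh stand-in created when a binder is opened
}

#[derive(Debug, Clone, PartialEq)]
struct TraitRef {
    name: &'static str,
    regions: Vec<Region>,
}

/// A trait reference that may sit under a `for<...>` binder.
struct Binder(TraitRef);

impl Binder {
    /// Returns the inner value only if nothing in it is late-bound.
    fn no_late_bound_regions(&self) -> Option<&TraitRef> {
        let bound = self.0.regions.iter().any(|r| matches!(r, Region::LateBound(_)));
        if bound { None } else { Some(&self.0) }
    }
}

#[derive(Default)]
struct Matcher {
    leak_checks: u32, // how often the slow path's extra check ran
}

impl Matcher {
    fn match_impl(&mut self, obligation: &Binder, impl_ref: &TraitRef) -> bool {
        // Fast path: no binder to open, so no placeholders and no leak check.
        if let Some(plain) = obligation.no_late_bound_regions() {
            return Self::unify(plain, impl_ref);
        }
        // Slow path: replace late-bound regions with placeholders, unify,
        // then verify that no placeholder got tied to a concrete region.
        let opened = Self::open_binder(obligation);
        Self::unify(&opened, impl_ref) && self.leak_check(&opened, impl_ref)
    }

    fn open_binder(b: &Binder) -> TraitRef {
        let regions: Vec<Region> = b.0.regions.iter().map(|r| match *r {
            Region::LateBound(i) => Region::Placeholder(i),
            other => other,
        }).collect();
        TraitRef { name: b.0.name, regions }
    }

    /// Structural match only; region constraints are ignored in this toy.
    fn unify(a: &TraitRef, b: &TraitRef) -> bool {
        a.name == b.name && a.regions.len() == b.regions.len()
    }

    fn leak_check(&mut self, opened: &TraitRef, impl_ref: &TraitRef) -> bool {
        self.leak_checks += 1;
        opened.regions.iter().zip(&impl_ref.regions).all(|(o, i)| {
            !matches!(o, Region::Placeholder(_)) || *i == Region::Param
        })
    }
}
```

In this toy, matching an obligation without late-bound regions leaves `leak_checks` at zero, while a `for<'a>`-style obligation goes through `open_binder` and `leak_check`. The duplication concern is visible even here: both paths call `unify`, and the fast path is a second copy of the matching logic that has to be kept in sync with the slow one.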

this improves typeck performance by another 1%
@arielb1
Contributor Author

arielb1 commented Oct 5, 2016

Perf differences over just shortcutting leak_check:

futures-rs-test  4.548s vs  4.513s --> 1.008x faster (variance: 1.006x, 1.009x)
helloworld       0.180s vs  0.178s --> 1.015x faster (variance: 1.040x, 1.006x)
html5ever-2016-  7.925s vs  7.902s --> 1.003x faster (variance: 1.014x, 1.021x)
hyper.0.5.0      5.507s vs  5.464s --> 1.008x faster (variance: 1.006x, 1.032x)
inflate-0.1.0    4.929s vs  4.942s --> 0.997x faster (variance: 1.007x, 1.009x)
issue-32062-equ  0.298s vs  0.293s --> 1.017x faster (variance: 1.011x, 1.027x)
issue-32278-big  1.654s vs  1.638s --> 1.010x faster (variance: 1.013x, 1.005x)
jld-day15-parse  1.563s vs  1.534s --> 1.019x faster (variance: 1.022x, 1.019x)
piston-image-0. 12.757s vs 12.670s --> 1.007x faster (variance: 1.005x, 1.013x)
regex.0.1.30     2.663s vs  2.649s --> 1.006x faster (variance: 1.004x, 1.006x)
rust-encoding-0  3.238s vs  3.225s --> 1.004x faster (variance: 1.072x, 1.017x)
syntex-0.42.2   34.370s vs 34.378s --> 1.000x faster (variance: 1.011x, 1.001x)
syntex-0.42.2-i 18.641s vs 18.509s --> 1.007x faster (variance: 1.003x, 1.006x)

Small but solid. OTOH, maybe small enough we should just shortcut leak_check.
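
A note on reading the table: the "Nx faster" column appears to be the first time divided by the second. That is an inference from the printed numbers, not something the tool output states, and a couple of rows differ in the last digit, presumably because the displayed times are rounded. Under that assumption, a value below 1.000 (as for inflate-0.1.0) means the second build was slightly slower. The helper names below are hypothetical.

```rust
/// Speedup ratio as the table seems to define it: first time over second time.
/// This definition is inferred from the figures, not taken from the tool.
fn speedup(first_secs: f64, second_secs: f64) -> f64 {
    first_secs / second_secs
}

/// True when the second build took longer than the first.
fn is_slowdown(first_secs: f64, second_secs: f64) -> bool {
    speedup(first_secs, second_secs) < 1.0
}
```

For example, `speedup(4.548, 4.513)` for futures-rs-test comes out at roughly 1.008, matching the table, and `is_slowdown(4.929, 4.942)` is true for inflate-0.1.0.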

@nikomatsakis
Contributor

r=me

@nikomatsakis
Contributor

@arielb1

> Small but solid. OTOH, maybe small enough we should just shortcut leak_check.

Hmm. So I think the changes are ok, but I agree these gains are not very large.

@nikomatsakis
Contributor

@arielb1 after having let this sit for a bit, my take is that given the minimal gains and the code duplication that was introduced, this is maybe not worth it. What do you think?

@arielb1
Contributor Author

arielb1 commented Oct 17, 2016

Sure. I have an idea for a better improvement too - just need to get to it.

But at least the shortcut-leak-check improvement sounds good.

@arielb1 arielb1 closed this Oct 17, 2016
Labels
relnotes Marks issues that should be documented in the release notes of the next release.