Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Substs::super_fold_with. #37108

Merged
merged 2 commits into from
Oct 19, 2016

Conversation

nnethercote
Copy link
Contributor

This speeds up some of the rustc-benchmarks by up to ~4%.

futures-rs-test  4.467s vs  4.387s --> 1.018x faster (variance: 1.001x, 1.006x)
helloworld       0.242s vs  0.246s --> 0.980x faster (variance: 1.007x, 1.013x)
html5ever-2016-  7.664s vs  7.630s --> 1.004x faster (variance: 1.008x, 1.006x)
hyper.0.5.0      5.218s vs  5.133s --> 1.016x faster (variance: 1.013x, 1.008x)
inflate-0.1.0    5.040s vs  5.103s --> 0.988x faster (variance: 1.005x, 1.008x)
issue-32062-equ  0.361s vs  0.345s --> 1.047x faster (variance: 1.008x, 1.019x)
issue-32278-big  1.874s vs  1.850s --> 1.013x faster (variance: 1.020x, 1.018x)
jld-day15-parse  1.569s vs  1.508s --> 1.040x faster (variance: 1.009x, 1.003x)
piston-image-0. 12.210s vs 11.903s --> 1.026x faster (variance: 1.045x, 1.010x)
regex.0.1.30     2.568s vs  2.555s --> 1.005x faster (variance: 1.018x, 1.044x)
rust-encoding-0  2.139s vs  2.135s --> 1.001x faster (variance: 1.012x, 1.005x)
syntex-0.42.2   33.099s vs 32.353s --> 1.023x faster (variance: 1.003x, 1.028x)
syntex-0.42.2-i 17.989s vs 17.431s --> 1.032x faster (variance: 1.009x, 1.018x)

r? @eddyb. I don't know how this interacts with the changes that dikaiosune has been working on.

// and instead reuse the existing substs.
// - In that case, when the substs length is one, we also avoid
// creating a new Vec, which avoids a heap allocation.
let params;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

params is used only in 338 line block. Can you move into the block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. It's used on lines 336, 338, 349.

// - In that case, when the substs length is one, we also avoid
// creating a new Vec, which avoids a heap allocation.
let params;
if self.params.len() == 1 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rewrite it to match self.params.len()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer having it this way. The 1 case is most frequent, then >1, and 0 is a distant last. With a match the ordering is less clear (i.e. not guaranteed).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add a comment to that effect, if it helps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nnethercote Okay. I got it.

return &self;
}
} else {
return &self;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super minor, but the & in &self is redundant. The Self type is &'tcx Substs<'tcx> and the self identifier is of type &Self. Deref coercion means that using &self, self or *self here is going to be equivalent.

@eddyb
Copy link
Member

eddyb commented Oct 12, 2016

So this is an interesting take. I'd like to do it after moving to on-stack arrays and slice arena allocations instead of always allocating a Vec. Then it could either be an extension of that algorithm or maybe it wouldn't even have a measurable impact etc

@eddyb
Copy link
Member

eddyb commented Oct 12, 2016

Oh and this doesn't handle ty::relate which may have a larger impact.

@nnethercote
Copy link
Contributor Author

I'd like to do it after moving to on-stack arrays and slice arena allocations instead of always allocating a Vec.

My original version of the patch just checked if params == self.params and skipped the mk_substs call in that case, and that alone was a decent improvement because it avoids the interning, which is dominated by hash operations.

Then I added the skip-Vec-creation-in-the-length-1 case and that improved things a bit more.

@nnethercote
Copy link
Contributor Author

Oh and this doesn't handle ty::relate which may have a larger impact.

I haven't seen it in any of the profiles I've looked at. Can you describe in more detail how you think it might have an impact?

@bors
Copy link
Contributor

bors commented Oct 15, 2016

☔ The latest upstream changes (presumably #37100) made this pull request unmergeable. Please resolve the merge conflicts.

This speeds up several rustc-benchmarks by 1--4%.
@nnethercote
Copy link
Contributor Author

I have changed the code to just do the "skip mk_substs if the fold was a no-op" optimisation. That should avoid any conflicts with @Mark-Simulacrum's changes to Substs. It still gives some nice speed-ups.

futures-rs-test  4.189s vs  4.157s --> 1.008x faster (variance: 1.004x, 1.007x)
helloworld       0.223s vs  0.223s --> 1.000x faster (variance: 1.013x, 1.012x)
html5ever-2016-  5.192s vs  5.147s --> 1.009x faster (variance: 1.007x, 1.004x)
hyper.0.5.0      4.788s vs  4.763s --> 1.005x faster (variance: 1.008x, 1.008x)
inflate-0.1.0    4.553s vs  4.494s --> 1.013x faster (variance: 1.010x, 1.044x)
issue-32062-equ  0.324s vs  0.314s --> 1.032x faster (variance: 1.008x, 1.020x)
issue-32278-big  1.661s vs  1.654s --> 1.005x faster (variance: 1.008x, 1.002x)
jld-day15-parse  1.475s vs  1.422s --> 1.038x faster (variance: 1.007x, 1.008x)
piston-image-0. 11.386s vs 11.227s --> 1.014x faster (variance: 1.005x, 1.063x)
reddit-stress    2.359s vs  2.308s --> 1.022x faster (variance: 1.017x, 1.024x)
regex.0.1.30     2.421s vs  2.388s --> 1.014x faster (variance: 1.036x, 1.008x)
rust-encoding-0  1.983s vs  1.973s --> 1.005x faster (variance: 1.008x, 1.016x)
syntex-0.42.2   29.582s vs 29.610s --> 0.999x faster (variance: 1.014x, 1.045x)
syntex-0.42.2-i 15.203s vs 15.175s --> 1.002x faster (variance: 1.011x, 1.006x)

r? @eddyb

@Mark-Simulacrum
Copy link
Member

Note that it will be at least a few days, more likely a week until I can resume work on the Substs improvements, so I'm not sure removing things now is a good idea. I haven't followed this PR closely though, not sure how large/significant the changes here are.

let params: Vec<_> = self.iter().map(|k| k.fold_with(folder)).collect();

// If folding doesn't change the substs, it's faster to avoid calling
// `mk_substs` and instead reuse the existing substs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, maybe open an issue about doing this in general, or at least further investigation? That is, there's a bunch of places where we can short-circuit the interner, maybe some are worth doing.

@eddyb
Copy link
Member

eddyb commented Oct 18, 2016

@bors r+

@bors
Copy link
Contributor

bors commented Oct 18, 2016

📌 Commit 1e4241a has been approved by eddyb

@bors
Copy link
Contributor

bors commented Oct 18, 2016

⌛ Testing commit 1e4241a with merge eeb6379...

@alexcrichton
Copy link
Member

@bors: retry force clean

  • restarted buildbot

Just kidding I'm doing this only to unstuck @bors/homu/buildbot.
@eddyb
Copy link
Member

eddyb commented Oct 19, 2016

@bors r+

@bors
Copy link
Contributor

bors commented Oct 19, 2016

📌 Commit ab5dcff has been approved by eddyb

eddyb added a commit to eddyb/rust that referenced this pull request Oct 19, 2016
… r=eddyb

Optimize `Substs::super_fold_with`.

This speeds up some of the rustc-benchmarks by up to ~4%.
```
futures-rs-test  4.467s vs  4.387s --> 1.018x faster (variance: 1.001x, 1.006x)
helloworld       0.242s vs  0.246s --> 0.980x faster (variance: 1.007x, 1.013x)
html5ever-2016-  7.664s vs  7.630s --> 1.004x faster (variance: 1.008x, 1.006x)
hyper.0.5.0      5.218s vs  5.133s --> 1.016x faster (variance: 1.013x, 1.008x)
inflate-0.1.0    5.040s vs  5.103s --> 0.988x faster (variance: 1.005x, 1.008x)
issue-32062-equ  0.361s vs  0.345s --> 1.047x faster (variance: 1.008x, 1.019x)
issue-32278-big  1.874s vs  1.850s --> 1.013x faster (variance: 1.020x, 1.018x)
jld-day15-parse  1.569s vs  1.508s --> 1.040x faster (variance: 1.009x, 1.003x)
piston-image-0. 12.210s vs 11.903s --> 1.026x faster (variance: 1.045x, 1.010x)
regex.0.1.30     2.568s vs  2.555s --> 1.005x faster (variance: 1.018x, 1.044x)
rust-encoding-0  2.139s vs  2.135s --> 1.001x faster (variance: 1.012x, 1.005x)
syntex-0.42.2   33.099s vs 32.353s --> 1.023x faster (variance: 1.003x, 1.028x)
syntex-0.42.2-i 17.989s vs 17.431s --> 1.032x faster (variance: 1.009x, 1.018x)
```
r? @eddyb. I don't know how this interacts with the changes that dikaiosune has been working on.
bors added a commit that referenced this pull request Oct 19, 2016
@bors bors merged commit ab5dcff into rust-lang:master Oct 19, 2016
@nnethercote nnethercote deleted the substs-experimentation branch October 19, 2016 23:23
@brson brson added the relnotes Marks issues that should be documented in the release notes of the next release. label Oct 24, 2016
bors added a commit that referenced this pull request Nov 25, 2016
Avoid more unnecessary mk_ty calls in Ty::super_fold_with.

This speeds up several rustc-benchmarks by 1--5%.

This PR is the lovechild of #37108 and #37705.
```
futures-rs-test  4.059s vs  4.011s --> 1.012x faster (variance: 1.016x, 1.026x)
helloworld       0.236s vs  0.239s --> 0.986x faster (variance: 1.051x, 1.014x)
html5ever-2016-  3.831s vs  3.824s --> 1.002x faster (variance: 1.020x, 1.019x)
hyper.0.5.0      4.928s vs  4.936s --> 0.998x faster (variance: 1.003x, 1.012x)
inflate-0.1.0    4.135s vs  4.104s --> 1.007x faster (variance: 1.026x, 1.028x)
issue-32062-equ  0.309s vs  0.303s --> 1.017x faster (variance: 1.019x, 1.084x)
issue-32278-big  1.818s vs  1.797s --> 1.011x faster (variance: 1.011x, 1.008x)
jld-day15-parse  1.304s vs  1.271s --> 1.026x faster (variance: 1.018x, 1.012x)
piston-image-0. 10.938s vs 10.921s --> 1.002x faster (variance: 1.025x, 1.016x)
reddit-stress    2.327s vs  2.208s --> 1.054x faster (variance: 1.016x, 1.006x)
regex-0.1.80     8.796s vs  8.727s --> 1.008x faster (variance: 1.012x, 1.019x)
regex.0.1.30     2.294s vs  2.249s --> 1.020x faster (variance: 1.013x, 1.026x)
rust-encoding-0  1.914s vs  1.886s --> 1.015x faster (variance: 1.027x, 1.026x)
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes Marks issues that should be documented in the release notes of the next release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants