
Profile queries #43345

Merged: 5 commits, Aug 24, 2017

Conversation

matthewhammer

@matthewhammer matthewhammer commented Jul 19, 2017

This PR implements the "profile queries" debugging feature described here:
https://github.com/rust-lang-nursery/rust-forge/blob/master/profile-queries.md

In particular, it implements the debugging flag -Z profile-queries

FYI: This PR is my second attempt at pushing these changes. My original PR required a rebase; I have now done that rebase manually, after messing up with git's "interactive" rebase support. The original (now closed/cancelled) PR is this one: #43156

r? @nikomatsakis

@rust-highfive
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @alexcrichton (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

depth,
t.extent.len(),
/* Heuristic for 'important' CSS class: */
if t.extent.len() > 5 || percent >= 1.0 as f64 {
Member

Why are these `as f64` casts needed? 1.0 should be f64 by default, or you could force the literal's type with 1.0_f64 (similarly, 100.0_f64 several lines above).
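A quick check of the point made in this review comment: an unsuffixed float literal already infers to f64 in the absence of other constraints, so the cast is redundant (a minimal sketch, not code from the PR):

```rust
fn main() {
    // An unsuffixed float literal defaults to f64, so `1.0 as f64` is a
    // redundant cast; a suffix states the type explicitly instead.
    let default_inferred = 1.0; // inferred as f64
    let suffixed = 1.0_f64;     // explicit suffix
    assert_eq!(default_inferred, suffixed);
    // f64 is 8 bytes, confirming the default inference
    assert_eq!(std::mem::size_of_val(&default_inferred), 8);
}
```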

data.push((cons.clone(), qm.count.clone(), qm.duration.clone()));
};
data.sort_by(|&(_,_,d1),&(_,_,d2)|
if d1 > d2 { Ordering::Less } else { Ordering::Greater } );
Member

data.sort_by_key(|entry| std::cmp::Reverse(entry.2))
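The suggested `sort_by_key` with `std::cmp::Reverse` replaces the hand-written comparator; it is shorter and, unlike a comparator that returns `Ordering::Greater` for equal durations, it upholds the total-order contract `sort_by` expects. A runnable sketch with hypothetical sample data:

```rust
use std::cmp::Reverse;
use std::time::Duration;

// Sort (constructor, count, duration) triples by descending duration.
fn sort_desc(data: &mut Vec<(String, u64, Duration)>) {
    data.sort_by_key(|entry| Reverse(entry.2));
}

fn main() {
    let mut data = vec![
        ("a".to_string(), 1, Duration::from_millis(5)),
        ("b".to_string(), 2, Duration::from_millis(20)),
        ("c".to_string(), 3, Duration::from_millis(10)),
    ];
    sort_desc(&mut data);
    let names: Vec<_> = data.iter().map(|e| e.0.as_str()).collect();
    // Longest duration first
    assert_eq!(names, ["b", "c", "a"]);
}
```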

@@ -0,0 +1,167 @@
// Copyright 2012-2015 The Rust Project Developers. See the COPYRIGHT
Member

2015? 😄

Author

fixed in 07daf55

if let None = chan.borrow().as_ref() {
true
} else { false }
;
Member

let is_none = chan.borrow().is_none();
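The suggested one-liner is equivalent to the `if let None ... { true } else { false }` construction in the diff; a minimal sketch (with a placeholder payload type, not the PR's real channel type):

```rust
use std::cell::RefCell;

fn main() {
    // Stand-in for the PR's channel slot; the payload type is hypothetical.
    let chan: RefCell<Option<u32>> = RefCell::new(None);

    // Equivalent to: if let None = chan.borrow().as_ref() { true } else { false }
    let is_none = chan.borrow().is_none();
    assert!(is_none);

    *chan.borrow_mut() = Some(7);
    assert!(!chan.borrow().is_none());
}
```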

assert!(frame.parse_st == trace::ParseState::NoQuery);
{
// write log
if false {
Member

Is the if false here expected?

@alexcrichton
Member

r? @nikomatsakis

@aidanhs
Member

aidanhs commented Jul 20, 2017

Thanks for the PR @matthewhammer! We'll check in now and again to make sure @nikomatsakis or another reviewer gets to this soon.

@aidanhs aidanhs added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 20, 2017
@michaelwoerister
Member

cc @eddyb

// If enabled, send a message to the profile-queries thread
macro_rules! profq_msg {
($tcx:expr, $msg:expr) => {
if $tcx.sess.opts.debugging_opts.profile_queries {
Member

Could you wrap this if in if cfg!(debug_assertions) { ... }? Then it will be filtered out at compile time if debug assertions are disabled.
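A hedged sketch of the suggested shape: because `cfg!(debug_assertions)` expands to a compile-time constant, the whole branch is trivially dead code when debug assertions are off and gets optimized away. Here `Opts` is a stand-in for the real session options, and the "message" is just printed:

```rust
// `Opts` is a hypothetical stand-in for the compiler's debugging options.
struct Opts {
    profile_queries: bool,
}

macro_rules! profq_msg {
    ($opts:expr, $msg:expr) => {
        // cfg! is a compile-time constant; with debug assertions disabled
        // this whole branch is statically dead and compiled out.
        if cfg!(debug_assertions) && $opts.profile_queries {
            println!("profq: {}", $msg);
        }
    };
}

fn main() {
    let opts = Opts { profile_queries: true };
    profq_msg!(opts, "QueryBegin(typeck)");
}
```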

@matthewhammer (Author) commented Jul 20, 2017

OK, will do; thanks!

},
(ParseState::HaveQuery(q,_),
ProfileQueriesMsg::ProviderEnd) => {
panic!("parse error: unexpected ProviderEnd;
Member

You can use a \ to indicate that the string continues at the first non-whitespace character on the next line, like in:

    panic!("parse error: unexpected QueryBegin; \
            earlier query is unfinished: {:?} and now {:?}",
           q1, Query{span:span2, msg:querymsg2})
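The `\` escape described above skips the embedded newline and any leading whitespace on the next line, so the string stays on one logical line. A self-contained check of that behavior:

```rust
fn main() {
    // A trailing `\` inside a string literal skips the newline and the
    // leading whitespace of the following line.
    let msg = "parse error: unexpected ProviderEnd; \
               earlier query is unfinished";
    assert_eq!(
        msg,
        "parse error: unexpected ProviderEnd; earlier query is unfinished"
    );
}
```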

Author

Thanks!

return sum
}

fn duration_div(nom: Duration, den: Duration) -> f64 {
Member

How about something like:

fn duration_div(nom: Duration, den: Duration) -> f64 {
    fn to_nanos(d: Duration) -> u64 {
        d.as_secs() * 1_000_000_000 + d.subsec_nanos() as u64
    }    

    to_nanos(nom) as f64 / to_nanos(den) as f64
}
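The suggested helper compiles as written; a quick runnable check of the ratio it produces (250 ms / 1 s = 0.25, which is exactly representable in f64):

```rust
use std::time::Duration;

fn duration_div(nom: Duration, den: Duration) -> f64 {
    // Flatten a Duration into total nanoseconds before dividing.
    fn to_nanos(d: Duration) -> u64 {
        d.as_secs() * 1_000_000_000 + d.subsec_nanos() as u64
    }
    to_nanos(nom) as f64 / to_nanos(den) as f64
}

fn main() {
    let ratio = duration_div(Duration::from_millis(250), Duration::from_secs(1));
    assert!((ratio - 0.25).abs() < 1e-12);
}
```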

Author

Awesome! Thank you!

Author

done 689d4ab

@@ -29,6 +33,59 @@ pub struct ErrorReported;

thread_local!(static TIME_DEPTH: Cell<usize> = Cell::new(0));

/// Initialized for -Z profile-queries
thread_local!(static PROFQ_CHAN: RefCell<Option<Sender<ProfileQueriesMsg>>> = RefCell::new(None));
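The pattern in this hunk — a thread-local `RefCell<Option<Sender<...>>>` that is installed once at startup and consulted on every event — can be sketched in miniature as follows (the `Msg` enum here is a simplified stand-in for `ProfileQueriesMsg`, not the PR's real type):

```rust
use std::cell::RefCell;
use std::sync::mpsc::{channel, Sender};

// Simplified stand-in for ProfileQueriesMsg.
#[derive(Debug, PartialEq)]
enum Msg {
    QueryBegin(String),
}

thread_local!(static PROFQ_CHAN: RefCell<Option<Sender<Msg>>> = RefCell::new(None));

fn main() {
    let (tx, rx) = channel();
    // Install the sender once at startup (what -Z profile-queries arranges).
    PROFQ_CHAN.with(|c| *c.borrow_mut() = Some(tx));

    // Later, any code on this thread can emit a profiling event if the
    // channel was installed; otherwise the event is silently dropped.
    PROFQ_CHAN.with(|c| {
        if let Some(ref tx) = *c.borrow() {
            tx.send(Msg::QueryBegin("typeck".to_string())).unwrap();
        }
    });

    assert_eq!(rx.recv().unwrap(), Msg::QueryBegin("typeck".to_string()));
}
```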
@eddyb (Member) commented Jul 20, 2017

Do we need a separate thread? Isn't the overhead of synchronization more than that of pushing to a Vec or something?

Author

Good question. The separate thread design was @nikomatsakis's suggestion, so I'll let him respond first.

Member

Ah, that explains a bunch. If I get a satisfactory explanation out of him this may be good to go.

Contributor

Well, we can measure. Just to be sure we all share the same concerns: I presume you are worried that the overhead of making and sending messages will distort the measurements, right? I admit my inclination is that separate threads are basically useful and good overall, if we can fit them in easily enough. =)

I think the reason I suggested a thread was that, with incremental compilation, we found that when we moved processing of the dependency graph into one thread, there was a significant hit, and that copying data to a separate thread was cheaper. But it's not obviously comparable -- in the incremental case, we did some effort to keep overhead low, such as pushing a series of messages into a vec and then sending the entire vec over to the thread to be processed (and using a double-buffering scheme).

In this case, in contrast, using a separate thread causes some difficulty since our keys contain region pointers and hence have to be converted to strings. And of course it's just using a channel rather than a vector, so there may be some overhead there (I'm not sure).

One question also: how many events would we wind up accumulating in this vector? I could imagine it being quite a few! But I guess we're just accumulating in the current code too, so that doesn't make much difference. I think I was imagining before that we would be doing some "consolidation" (e.g., summing up data or whatever) in the separate thread.

Regardless, seems like we can measure the "distortion" of a separate thread (versus a vector) just by measuring how much time the "main thread" of compilation takes, right?

TL;DR: I am happy either way really, just want to do the thing that gives us the best results.
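The batching scheme described above for incremental compilation — pushing a series of messages into a vec and sending the entire vec over to the processing thread — can be sketched as follows. This is a hypothetical illustration of the design trade-off, not code from either PR:

```rust
use std::sync::mpsc::channel;
use std::thread;

const BATCH: usize = 1024;

fn main() {
    // Batches of events travel over the channel, not individual events,
    // amortizing synchronization cost across BATCH messages.
    let (tx, rx) = channel::<Vec<u64>>();

    let worker = thread::spawn(move || {
        // Consolidate on the worker thread (here: just sum the events).
        let mut total = 0u64;
        for batch in rx {
            total += batch.iter().sum::<u64>();
        }
        total
    });

    let mut buf = Vec::with_capacity(BATCH);
    for event in 0..10_000u64 {
        buf.push(event);
        if buf.len() == BATCH {
            // Hand off the full buffer and start a fresh one.
            tx.send(std::mem::replace(&mut buf, Vec::with_capacity(BATCH))).unwrap();
        }
    }
    tx.send(buf).unwrap();
    drop(tx); // closing the channel lets the worker's loop terminate

    assert_eq!(worker.join().unwrap(), (0..10_000u64).sum::<u64>());
}
```

A real implementation might add the double-buffering mentioned above so the main thread reuses buffers instead of allocating fresh ones.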

Contributor

Ah, another point --

There are things we'd eventually like to profile that occur before the tcx is created, or after it is destroyed. I suspect that we will eventually refactor away completely all the things that occur before the tcx is created, but I'm not so sure about the stuff that comes after it is destroyed -- in particular, I think it'll always be an important win to drop the arenas before we do LLVM translation, to reduce peak memory usage. So if we have the vector containing references into those arenas, that will make it harder to include those LLVM optimizations in our measurements, presumably (but not impossible, I suppose).

@@ -580,10 +598,17 @@ macro_rules! define_maps {
key,
span);

profq_msg!(tcx,
ProfileQueriesMsg::QueryBegin(span.clone(),
QueryMsg::$name(format!("{:?}", key))));
Member

I'm not entirely happy with formatting the key like this. I would prefer to start with just the query type, not the key. Also, I would definitely not record individual keys except with a threshold on the minimal duration someone might be interested in. And there is a way to measure time on the spot here, which could be placed in tcx.maps: basically, adjust the "last query start time" after finishing an inner query so the inner time is not included in the outer one.
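The self-time idea sketched in this comment — excluding an inner query's duration from its parent's measurement — can be illustrated with a stack of frames that each track accumulated child time. This is a hypothetical model using simulated clock ticks, not the PR's implementation:

```rust
// Hypothetical sketch of on-the-spot self-time accounting: each frame
// records its start and the time spent in child queries; on pop,
// self time = total elapsed - child time.
struct Frame {
    name: &'static str,
    start: u64, // simulated clock tick at query start
    child: u64, // ticks accumulated by nested queries
}

fn main() {
    // Simulated timeline: "outer" runs ticks 0..100, "inner" runs 10..40.
    let mut stack: Vec<Frame> = Vec::new();
    let mut self_times: Vec<(&str, u64)> = Vec::new();

    stack.push(Frame { name: "outer", start: 0, child: 0 });
    stack.push(Frame { name: "inner", start: 10, child: 0 });

    // inner ends at t = 40
    let f = stack.pop().unwrap();
    let total = 40 - f.start;
    self_times.push((f.name, total - f.child));
    if let Some(parent) = stack.last_mut() {
        parent.child += total; // charge inner's time to the parent's child bucket
    }

    // outer ends at t = 100
    let f = stack.pop().unwrap();
    let total = 100 - f.start;
    self_times.push((f.name, total - f.child));

    // inner: 30 ticks of self time; outer: 100 - 30 = 70
    assert_eq!(self_times, vec![("inner", 30), ("outer", 70)]);
}
```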

Author

Out of curiosity, is your concern here with the time required to format the key, the space required to store it (or time/space to write it out to output files), or all of the above?

Member

The time, formatting in the compiler is not optimized for speed but rather code size (to avoid bloat from the presence of every debug! etc.).

Contributor

I wonder if we want multiple options. There are various things we might want data for:

  • Which passes take the most time?
    • For this, we don't care about cache hits, and we probably don't care about cache keys either.
  • Which bits of the input took the most time?
    • For this, we don't care about cache hits, but we do care about cache keys, but maybe only those that took significant time.
  • An overview of what happened, e.g. because something went wrong with incremental (re-use failed to occur, or re-use occurred improperly) and we'd like to know why
    • For this, knowing the precise cache keys is very important, and we do care I think about cache hits.

I think this PR started off targeting the last one, and kind of grew to target the others. I suspect we care about all three. We may want to allow for options choosing which one you want, I'm not sure.

Contributor

Maybe we want to separate the mechanisms for debugging and profiling more -- it would mean losing the ability to answer "which bits of the input took the most time", but it's not 100% clear to me that this is a useful query. It seems useful in the abstract, but I can't remember asking that question very frequently. But that might be because there wasn't an easy way to do so. Certainly it could be useful when you get random perf regressions. I do often want "show me all debug logs pertaining to foo", but that's more a debugging thing.

@alexcrichton
Member

ping @eddyb, @michaelwoerister, @nikomatsakis, I think this may be ready for another round of review?

@nikomatsakis
Contributor

@matthewhammer and I talked about this today. It seems like the "best approach" is not obvious. The plan then is to land something and open an issue with the various bits of feedback and iterate. That said, I believe @matthewhammer had planned to do the following before landing:

  • Add a way to disable tracking individual keys and instead just track queries by query name
    • this should lower potential overhead from running format!
  • for now we figured we'd keep the option to track individual keys, but you opt into it
  • Since this PR now includes the full time-passes information, and hence subsumes that switch, add a way to get the top queries, sorted by total time (or self time?)
  • Summarize other feedback and open an issue for discussion.

One thing I don't really know is what final set of command-line switches we want for this. I think in my ideal world we would remove -Ztime-passes and instead just have something like:

  • -Zprofile-queries -- tracks time spent by query kind
  • -Zprofile-queries-and-keys -- as above, but also includes data about individual keys

Using the second switch would subsume the first. Using either switch would dump information to stderr at the end showing where we spent most of our time.

Before we just remove -Ztime-passes, though -- or modify its output! -- we'll have to update the perf server.

@Mark-Simulacrum
Member

I'm happy to help with the perf server update, though we'll need to keep both forms of perf data scraping for backwards-compatibility reasons.

@bors
Contributor

bors commented Aug 1, 2017

☔ The latest upstream changes (presumably #43506) made this pull request unmergeable. Please resolve the merge conflicts.

@matthewhammer
Author

matthewhammer commented Aug 2, 2017

@nikomatsakis I've done what you describe above, that is:

  • -Zprofile-queries -- tracks time spent by query kind
  • -Zprofile-queries-and-keys -- as above, but also includes data about individual keys

Further, the output file profile_queries.counts.txt is a CSV file that consists of "provider name", "invocation count", "total time", "self time".

I'll now work on summarizing the discussion above in some detail. In brief, I think the main (unanswered) questions concern the time/space overhead of this tracing thread, and whether we should have a thread or some other place for this global state (a question raised by @eddyb).

Also, I believe that I have resolved the merge issues mentioned just above by bors.

@eddyb
Member

eddyb commented Aug 2, 2017

@matthewhammer You should rebase on master to remove the merge commits.

@nikomatsakis
Contributor

@matthewhammer great! However, the commit history looks really messy. I think it'd be fine to rebase and/or just squash everything together into one commit, this PR is not so big.

@eddyb
Member

eddyb commented Aug 4, 2017

Looks like the rebase failed and duplicated master commits. This is bad.
@matthewhammer Please contact me on IRC. If you want I can also attempt fixing the PR myself.

@matthewhammer
Author

Hey @nikomatsakis and @eddyb, I talked to eddyb briefly on IRC about rebasing, but I totally goofed it up, and haven't gotten back to fixing it. Obviously, I need to do some background reading about git before I try this again.

@eddyb Thanks for your offer to help, I think I need it. :)

@aidanhs
Member

aidanhs commented Aug 4, 2017

@matthewhammer if you just want to rebase and squash all this into a single commit, you can just

  1. make a clean branch of rust off master
  2. curl -sSL https://github.com/rust-lang/rust/pull/43345.diff | git apply

You'll see your changes in git status.

@eddyb
Member

eddyb commented Aug 4, 2017

@aidanhs I believe we already dealt with it. git reflog is magic ;)

@aidanhs
Member

aidanhs commented Aug 4, 2017

@eddyb the merge is clean but I still see a ton of commits on the github UI?

@eddyb
Member

eddyb commented Aug 4, 2017

@aidanhs I don't think @matthewhammer had a chance to push the result since we talked on IRC.

@matthewhammer
Author

@nikomatsakis I got some more help from @eddyb today, and fixed up the rebase in this PR
cc @aidanhs

@arielb1
Contributor

arielb1 commented Aug 8, 2017

@matthewhammer : @nikomatsakis is on vacation until Aug 15, and will probably be busy afterwards. Do you want someone else to review this PR, or are you ok with waiting?

@matthewhammer
Author

@arielb1 I'm OK with waiting for @nikomatsakis. Thanks!

@bors
Contributor

bors commented Aug 10, 2017

☔ The latest upstream changes (presumably #43522) made this pull request unmergeable. Please resolve the merge conflicts.

@aidanhs
Member

aidanhs commented Aug 16, 2017

@matthewhammer just a ping on the merge conflicts so this can be reviewed!

@aidanhs aidanhs added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 16, 2017
@matthewhammer
Author

@aidanhs Thanks! I'll get to this rebase later today.

@matthewhammer
Author

@nikomatsakis rebased.
cc @aidanhs

@@ -510,6 +511,28 @@ impl<'tcx> QueryDescription for queries::extern_crate<'tcx> {
impl<'tcx> QueryDescription for queries::lint_levels<'tcx> {
fn describe(_tcx: TyCtxt, _: CrateNum) -> String {
format!("computing the lint levels for items in this crate")
}
Member

You forgot a }.

[00:07:43] error: this file contains an un-closed delimiter
[00:07:43]     --> /checkout/src/librustc/ty/maps.rs:1116:3
[00:07:43]      |
[00:07:43] 1116 | }
[00:07:43]      |   ^
[00:07:43]      |
[00:07:43] help: did you mean to close this delimiter?
[00:07:43]     --> /checkout/src/librustc/ty/maps.rs:511:60
[00:07:43]      |
[00:07:43] 511  | impl<'tcx> QueryDescription for queries::lint_levels<'tcx> {
[00:07:43]      |                                                            ^

@nikomatsakis
Contributor

@bors r+

@bors
Contributor

bors commented Aug 24, 2017

📌 Commit 43335ae has been approved by nikomatsakis

@nikomatsakis
Contributor

(The travis failure looks spurious.)

@bors
Contributor

bors commented Aug 24, 2017

⌛ Testing commit 43335ae with merge 2c0558f...

bors added a commit that referenced this pull request Aug 24, 2017
Profile queries

This PR implements the "profile queries" debugging feature described here:
https://github.com/rust-lang-nursery/rust-forge/blob/master/profile-queries.md

In particular, it implements the debugging flag `-Z profile-queries`

FYI: This PR is my second attempt at pushing these changes.  My original PR required a rebase; I have now done that rebase manually, after messing up with git's "interactive" rebase support.  The original (now closed/cancelled) PR is this one: #43156

r? @nikomatsakis
@bors
Contributor

bors commented Aug 24, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: nikomatsakis
Pushing 2c0558f to master...
