Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plan: convert in subquery to agg and inner join #7531

Merged
merged 28 commits into from
Oct 30, 2018

Conversation

winoros
Copy link
Member

@winoros winoros commented Aug 29, 2018

What problem does this PR solve?

Implement #7205.

What is changed and how it works?

I decided to unfold the subquery first. And implement this one in future new planner. But in testing, unfolding subquery's performance the far more worse than rewriting.(Listed below).
So decides to implement it in current planner.

This rewriting will cause new aggregation or projection. So add a aggregate elimination rule and one more project elimination.

Note that we can only do one project elimination, just at the place where we add the new project elimination. Only do once will change the column name in some explain result, though won't change the plan structure. I've not decided which one is better.

Check List

Tests

  • Existing Unit test
  • Existing Integration test
  • Manual test (add detailed scripts or steps below)

Related changes

  • Need to update the documentation. Yes, the doc about session variable should be updated
  • Need to be included in the release note. Yes, Rewrite in subquery to inner join with aggregation.

@winoros winoros added type/enhancement The issue or PR belongs to an enhancement. sig/planner SIG: Planner release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Aug 29, 2018
@winoros
Copy link
Member Author

winoros commented Aug 29, 2018

mysql> set @@session.tidb_opt_insubquery_unfold = 1;
Query OK, 0 rows affected (0.00 sec)
mysql> select count(*) from lineitem where l_orderkey in (select l_orderkey from lineitem limit 800000);
+----------+
| count(*) |
+----------+
|   800017 |
+----------+
1 row in set (18.68 sec)
mysql> set @@session.tidb_opt_insubquery_unfold = 0;
Query OK, 0 rows affected (0.00 sec)
mysql> select count(*) from lineitem where l_orderkey in (select l_orderkey from lineitem limit 800000);
+----------+
| count(*) |
+----------+
|   800020 |
+----------+
1 row in set (5.92 sec)
mysql> select count(*) from lineitem where l_orderkey in (select l_orderkey from lineitem limit 800000);
+----------+
| count(*) |
+----------+
|   800020 |
+----------+
1 row in set (37.81 sec)

First one is unfolding, second is rewriting, third is do nothing.

Another test in TPCH Query 18. Though due to the join order and not so good row count estimation. This rule currently cannot take effect on it.
But if we rewrite the join order, it can get results at 120s~. Before this pr, it's 240s~.

return nil
}

func (a *aggregationEliminater) convertAggToProj(agg *LogicalAggregation) *LogicalProjection {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most methods are copied from aggregate push down. I think it's not good to leave them as function that anyone can call.
Extract a struct that can be used both by aggregate elimination and aggregate push down, make them as method of this struct ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, can we set aggregationOptimizer as an anonymous field of aggregationEliminater? then we only need to override the optimize method for aggregationEliminater.

aggregationOptimizer also handles the aggregate elimination, why not use it here? because AllowAggPushDown is false by default?

Extract a struct that can be used both by aggregate elimination and aggregate push down, make them as method of this struct ?

IMHO, aggregationOptimizer should be this struct, and we should add an aggregationPushDownSolver as subclass of it to hold the aggregate pushdown logic.

Copy link
Member Author

@winoros winoros Sep 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aggregationOptimizer also handles the aggregate elimination, why not use it here? because AllowAggPushDown is false by default?

Yes, that one is mainly to push down aggregation. The elimination is done there for convenience and due to some limitation the current planner has.

IMHO, aggregationOptimizer should be this struct, and we should add an aggregationPushDownSolver as subclass of it to hold the aggregate pushdown logic.

But there's no overriding in golang. So we cannot make one containing another one. They both have a recursive method optimize to implement logicalOptRule interface.

@zz-jason
Copy link
Member

@eurekaka PTAL

flagDecorrelate
flagEliminateProjection2
flagMaxMinEliminate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why flagEliminateProjection2 is applied after flagDecorrelate?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems i can just place it after the ElimiateAgg before the Decorrelate. This will be done in the newly coming commit.

Copy link
Member Author

@winoros winoros Sep 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok tpch Q20's result will change if we change the position.
This should be caused that we don't maintain the KeyInfo very carefully.
So move it after the decorrelation.

return nil
}

func (a *aggregationEliminater) convertAggToProj(agg *LogicalAggregation) *LogicalProjection {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, can we set aggregationOptimizer as an anonymous field of aggregationEliminater? then we only need to override the optimize method for aggregationEliminater.

aggregationOptimizer also handles the aggregate elimination, why not use it here? because AllowAggPushDown is false by default?

Extract a struct that can be used both by aggregate elimination and aggregate push down, make them as method of this struct ?

IMHO, aggregationOptimizer should be this struct, and we should add an aggregationPushDownSolver as subclass of it to hold the aggregate pushdown logic.

return v, true
// If it's not the form of `not in (SUBQUERY)`, has no correlated column and don't need to append a scalar value. We can rewrite it to inner join.
if er.ctx.GetSessionVars().AllowInSubqueryRewriting && !v.Not && !asScalar && len(np.extractCorrelatedCols()) == 0 {
// We need to try to eliminate the agg and the projection produced by this operation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused about the expressionRewriter::asScalar field, could you please shed some light on the purpose of this field? thx

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can take a look at buildSemiJoin. The asScalar indicates that whether this semi join need to be considered as scalar and output a special row to the plan above it.

@@ -5,7 +5,6 @@ create table t2 (c1 int unique, c2 int);
insert into t2 values(1, 0), (2, 1);
create table t3 (a bigint, b bigint, c bigint, d bigint);
create table t4 (a int, b int, c int, index idx(a, b), primary key(a));
set @@session.tidb_opt_insubquery_unfold = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to keep this test and add another test which disables tidb_opt_insubquery_unfold.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but this variable is removed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I just noticed that after reviewing all the code changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just changed the name. We still test both open it and close it.

// tidb_opt_insubquery_unfold is used to enable/disable the optimizer rule of in subquery unfold.
TiDBOptInSubqUnFolding = "tidb_opt_insubquery_unfold"
// tidb_opt_insubquery_rewriting is used to enable/disable the optimizer rule of rewriting IN subquery.
TiDBOptInSubqRewriting = "tidb_opt_insubquery_rewriting"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about:

s/tidb_opt_insubquery_rewriting/tidb_opt_insubquery_to_innerjoin/
s/TiDBOptInSubqRewriting/TiDBOptInSubqueryToInnerJoin/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual form is inner join together with an aggregation. So ToInnerJoin is not complete but InnerJoinAndAgg it too long.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "tidb_opt_insubquery_to_innerjoin" is clearer than "tidb_opt_insubquery_rewriting", because the key information about this transformation is displayed in the variable name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe "tidb_opt_insubquery_to_innerjoin" can be shorten to "tidb_insubq_to_innerjoin"

// AllowInSubqueryUnFolding can be set to true to fold in subquery
AllowInSubqueryUnFolding bool
// AllowInSubqueryRewriting can be set to false to forbid rewriting the semi join to inner join with agg.
AllowInSubqueryRewriting bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about s/AllowInSubqueryRewriting/InSubqueryToInnerJoin/

@@ -0,0 +1,129 @@
// Copyright 2018 PingCAP, Inc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#7676 already did the same thing, this patch can be removed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be done when merge that into this. Or the test result will be strange.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's get #7676 merged as soon as possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we should merge 7680 first...

@winoros
Copy link
Member Author

winoros commented Oct 8, 2018

In explain test, there's one projection left though it should be okay to eliminate it. I'm looking into it.
This should not block the review of the main code change in this pr.

Okay, the binary that test used is wrong. It's actually correct.
@zz-jason @eurekaka @lamxTyler PTAL

@winoros
Copy link
Member Author

winoros commented Oct 9, 2018

/run-common-test

@winoros
Copy link
Member Author

winoros commented Oct 9, 2018

/run-all-tests

planner/core/optimizer.go Outdated Show resolved Hide resolved
@zz-jason
Copy link
Member

@winoros please merge master && resolve conflicts.

@winoros
Copy link
Member Author

winoros commented Oct 22, 2018

@zz-jason updated

@winoros
Copy link
Member Author

winoros commented Oct 25, 2018

/run-all-tests

@zz-jason
Copy link
Member

LGTM

@zz-jason
Copy link
Member

@eurekaka @lamxTyler PTAL

@zz-jason zz-jason added the status/LGT1 Indicates that a PR has LGTM 1. label Oct 25, 2018
Copy link
Contributor

@alivxxx alivxxx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alivxxx
Copy link
Contributor

alivxxx commented Oct 26, 2018

@winoros Please resolve the conflict.

alivxxx
alivxxx previously approved these changes Oct 30, 2018
@winoros
Copy link
Member Author

winoros commented Oct 30, 2018

./run-all-tests

@alivxxx alivxxx added status/LGT2 Indicates that a PR has LGTM 2. status/all tests passed and removed status/LGT1 Indicates that a PR has LGTM 1. labels Oct 30, 2018
@winoros winoros merged commit 696ef7b into pingcap:master Oct 30, 2018
@winoros winoros deleted the insubq-to-agg-and-join branch October 30, 2018 11:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/planner SIG: Planner status/LGT2 Indicates that a PR has LGTM 2. type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants