Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ranger: merge multiple EQ or In expressions if possible #7577

Merged
merged 22 commits into from
Sep 10, 2018

Conversation

eurekaka
Copy link
Contributor

@eurekaka eurekaka commented Sep 1, 2018

What problem does this PR solve?

handle more than one 'equal' or 'in' function for a column in ranger #7279

What is changed and how it works?

before this pr:

drop table if exists t;
create table t(a bigint, b bigint, index idx(a, b));

MySQL [test]> explain select * from t where a in (1, 2) and a in (1, 3);
+---------------------+-------+------+-------------------------------------------------------------------------+
| id                  | count | task | operator info                                                           |
+---------------------+-------+------+-------------------------------------------------------------------------+
| IndexReader_10      | 0.04  | root | index:Selection_9                                                       |
| └─Selection_9       | 0.04  | cop  | in(test.t.a, 1, 2)                                                      |
|   └─IndexScan_8     | 20.00 | cop  | table:t, index:a, b, range:[1,1], [3,3], keep order:false, stats:pseudo |
+---------------------+-------+------+-------------------------------------------------------------------------+

MySQL [test]> explain select * from t where a = 2 and a = 3;
+---------------------+-------+------+------------------------------------------------------------------+
| id                  | count | task | operator info                                                    |
+---------------------+-------+------+------------------------------------------------------------------+
| IndexReader_10      | 0.01  | root | index:Selection_9                                                |
| └─Selection_9       | 0.01  | cop  | eq(test.t.a, 2)                                                  |
|   └─IndexScan_8     | 10.00 | cop  | table:t, index:a, b, range:[3,3], keep order:false, stats:pseudo |
+---------------------+-------+------+------------------------------------------------------------------+

the filters indeed can be simplified to eliminate unnecessary 'Selections', or even can be removed at all and build an empty range.

after this pr:

mysql> explain select * from t where a in (1, 2) and a in (1, 3);
+-------------------+-------+------+------------------------------------------------------------------+
| id                | count | task | operator info                                                    |
+-------------------+-------+------+------------------------------------------------------------------+
| IndexReader_9     | 0.00  | root | index:IndexScan_8                                                |
| └─IndexScan_8     | 0.00  | cop  | table:t, index:a, b, range:[1,1], keep order:false, stats:pseudo |
+-------------------+-------+------+------------------------------------------------------------------+

mysql> explain select * from t where a = 2 and a = 3;
+-------------------+-------+------+-----------------------------------------------------+
| id                | count | task | operator info                                       |
+-------------------+-------+------+-----------------------------------------------------+
| IndexReader_9     | 0.00  | root | index:IndexScan_8                                   |
| └─IndexScan_8     | 0.00  | cop  | table:t, index:a, b, keep order:false, stats:pseudo |
+-------------------+-------+------+-----------------------------------------------------+

Check List

Tests

  • Unit test
  • Integration test

Related changes

  • Need to be included in the release note
    handle more than one 'equal' or 'in' function for a column in ranger

Copy link
Contributor Author

@eurekaka eurekaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR should not be merged before #7553 is merged, because for queries like:

select /*+ TIDB_INLJ(t1) */ * from t t1 join t t2 where t1.a=t2.a and t2.a = 2

a fake EqCond like t2.a = 1 would be introduced in building IndexLoopUpJoin, and this fake EqCond would be merged with t2.a = 2 after this PR to build an empty range, which is wrong. #7553 fixes this problem, so this PR has dependency on it.

@shenli
Copy link
Member

shenli commented Sep 1, 2018

Should the datasouce in the following query be a TableDual? Because a = 2 and a = 3 should always be false.

mysql> explain select * from t where a = 2 and a = 3;
+-------------------+-------+------+-----------------------------------------------------+
| id                | count | task | operator info                                       |
+-------------------+-------+------+-----------------------------------------------------+
| IndexReader_9     | 0.00  | root | index:IndexScan_8                                   |
| └─IndexScan_8     | 0.00  | cop  | table:t, index:a, b, keep order:false, stats:pseudo |
+-------------------+-------+------+-----------------------------------------------------+

@eurekaka
Copy link
Contributor Author

eurekaka commented Sep 2, 2018

@shenli make sense, patch updated.

two problems found during converting empty range DataSource into TableDual:


First, plan cache would be broken if empty range DataSource is converted to TableDual, check out this example with prepared-plan-cache-enabled=true on master branch:

MySQL [test]> create table t(a int, b int, index a_idx(a));
Query OK, 0 rows affected (0.01 sec)

MySQL [test]> insert into t values(1,1),(2,2),(null,3);
Query OK, 3 rows affected (0.00 sec)

MySQL [test]> select * from t;
+------+------+
| a    | b    |
+------+------+
|    1 |    1 |
|    2 |    2 |
| NULL |    3 |
+------+------+
3 rows in set (0.00 sec)

MySQL [test]> prepare stmt from 'select * from t where ?';
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> execute stmt using @param;
Empty set (0.00 sec)

MySQL [test]> set @param = true;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> execute stmt using @param;
Empty set (0.00 sec)

last execute statement returns wrong result, because previously cached Dual plan is reused, and rebuildRange of plan cache cannot handle this case.

I would add an issue to track this problem separately, and in this patch, we do not apply this converting if we are building plan for prepared statement when plan cache is enabled.


second, ranger has a bug regarding null filters, check out this example in current master branch using the table created in above example:

MySQL [test]> select * from t;
+------+------+
| a    | b    |
+------+------+
|    1 |    1 |
|    2 |    2 |
| NULL |    3 |
+------+------+
3 rows in set (0.00 sec)

MySQL [test]> explain select * from t where a = null;
+-------------------+-------+------+--------------------------------------------------+
| id                | count | task | operator info                                    |
+-------------------+-------+------+--------------------------------------------------+
| IndexLookUp_10    | 0.00  | root |                                                  |
| ├─IndexScan_8     | 0.00  | cop  | table:t, index:a, keep order:false, stats:pseudo |
| └─TableScan_9     | 0.00  | cop  | table:t, keep order:false, stats:pseudo          |
+-------------------+-------+------+--------------------------------------------------+
3 rows in set (0.00 sec)

MySQL [test]> select * from t where a = null;
Empty set (0.00 sec)

| null | 3 | should be returned instead. I would report this problem in another separate issue as well.

currently unit test testAnalyzeSuite::TestPreparedNullParam fails for this patch, because the second bug would result in a wrong empty range scan, and this scan would be converted to TableDual further by this patch. Would revisit this unit test later when second bug is resolved, so we have another dependency for this PR.

@eurekaka
Copy link
Contributor Author

eurekaka commented Sep 2, 2018

OK, for the second problem mentioned above, I just figured out NULL = NULL is false, so it works in the right way, not a problem. Issue for the first problem is here #7579

create table t(a bigint primary key);
explain select * from t where a = 1 and a = 2;
id count task operator info
TableDual_5 10000.00 root rows:0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the count field of the explain seems wrong, it should be 0.00 instead. Is is computed by the following code snippet in function GetRowCountByIntColumnRanges called by Selectivity:

404     if coll.ColumnIsInvalid(sc, colID) {
405         if len(intRanges) == 0 {
406             return float64(coll.Count), nil
407         }
408         if intRanges[0].LowVal[0].Kind() == types.KindInt64 {
409             return getPseudoRowCountBySignedIntRanges(intRanges, float64(coll.Count)), nil
410         }
411         return getPseudoRowCountByUnsignedIntRanges(intRanges, float64(coll.Count)), nil
412     }

line 406 does not make sense, it should return 0, nil when range is empty. Anyway, this should be resolved in another issue, and we can update this result later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue for this problem: #7582
PR to fix it: #7583

sf, _ := expr.(*expression.ScalarFunction)
//Constant and Column args should have same RetType, simply get from first arg
retType := sf.GetArgs()[0].GetType()
values := make([]expression.Expression, 0, len(points)/2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about s/values/args/?

@@ -466,3 +467,33 @@ func newFieldType(tp *types.FieldType) *types.FieldType {
return tp
}
}

func points2EqOrInCond(ctx sessionctx.Context, points []point, expr expression.Expression) expression.Expression {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need a comment for this function to explain:

  1. the basic functionality.
  2. the constraints of the input parameters.

how about:

// points2EqOrInCond constructs a 'EQUAL' or 'IN' scalar function based on the
// 'points'. The target column is extracted from the 'expr'.
// NOTE:
// 1. 'expr' must be either 'EQUAL' or 'IN' function.
// 2. 'points' should not be empty.

@@ -252,6 +252,14 @@ func (ds *DataSource) findBestTask(prop *requiredProp) (t task, err error) {
t = invalidTask

for _, path := range ds.possibleAccessPaths {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's better to remove other access paths if there is a path which has an empty range. This can be done in DataSource.deriveStats()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@zz-jason zz-jason added type/enhancement The issue or PR belongs to an enhancement. and removed status/DNM labels Sep 5, 2018
@eurekaka
Copy link
Contributor Author

eurekaka commented Sep 5, 2018

/run-all-tests

Copy link
Member

@zz-jason zz-jason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
@winoros PTAL

@zz-jason zz-jason added the status/LGT1 Indicates that a PR has LGTM 1. label Sep 5, 2018
accesses[offset] = cond
continue
}
//multiple Eq/In conditions for one column in CNF, apply intersection on them
Copy link
Member

@winoros winoros Sep 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space between // and the sentence. And capitalize first character.

continue
}
//multiple Eq/In conditions for one column in CNF, apply intersection on them
//lazily compute the points for the previously visited Eq/In
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add space.

// 1. 'expr' must be either 'EQUAL' or 'IN' function.
// 2. 'points' should not be empty.
func points2EqOrInCond(ctx sessionctx.Context, points []point, expr expression.Expression) expression.Expression {
//len(points) cannot be 0 here, since we impose early termination in extractEqAndInCondition
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add space.

func points2EqOrInCond(ctx sessionctx.Context, points []point, expr expression.Expression) expression.Expression {
//len(points) cannot be 0 here, since we impose early termination in extractEqAndInCondition
sf, _ := expr.(*expression.ScalarFunction)
//Constant and Column args should have same RetType, simply get from first arg
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add space.

Copy link
Member

@winoros winoros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need some tests in ranger/ranger_test.go.

rest lgtm.

@eurekaka
Copy link
Contributor Author

eurekaka commented Sep 6, 2018

/run-all-tests

@eurekaka eurekaka added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Sep 6, 2018
@eurekaka
Copy link
Contributor Author

eurekaka commented Sep 7, 2018

@winoros comment addressed, PTAL

@zz-jason
Copy link
Member

/run-all-tests

@zz-jason
Copy link
Member

/run-common-test tidb-test=pr/620

5 similar comments
@zz-jason
Copy link
Member

/run-common-test tidb-test=pr/620

@zz-jason
Copy link
Member

/run-common-test tidb-test=pr/620

@zz-jason
Copy link
Member

/run-common-test tidb-test=pr/620

@eurekaka
Copy link
Contributor Author

/run-common-test tidb-test=pr/620

@eurekaka
Copy link
Contributor Author

/run-common-test tidb-test=pr/620

@eurekaka
Copy link
Contributor Author

/run-common-test

@zz-jason
Copy link
Member

/run-integration-common-test
/run-common-test

@eurekaka
Copy link
Contributor Author

/run-all-tests

@eurekaka eurekaka merged commit d7d1309 into pingcap:master Sep 10, 2018
@eurekaka eurekaka deleted the ranger_eq_in_and branch September 10, 2018 09:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/planner SIG: Planner status/LGT2 Indicates that a PR has LGTM 2. type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants