ranger: merge multiple EQ or In expressions if possible #7577

eurekaka · 2018-09-01T09:06:44Z

What problem does this PR solve?

handle more than one 'equal' or 'in' function for a column in ranger #7279

What is changed and how it works?

before this pr:

drop table if exists t;
create table t(a bigint, b bigint, index idx(a, b));

MySQL [test]> explain select * from t where a in (1, 2) and a in (1, 3);
+---------------------+-------+------+-------------------------------------------------------------------------+
| id                  | count | task | operator info                                                           |
+---------------------+-------+------+-------------------------------------------------------------------------+
| IndexReader_10      | 0.04  | root | index:Selection_9                                                       |
| └─Selection_9       | 0.04  | cop  | in(test.t.a, 1, 2)                                                      |
|   └─IndexScan_8     | 20.00 | cop  | table:t, index:a, b, range:[1,1], [3,3], keep order:false, stats:pseudo |
+---------------------+-------+------+-------------------------------------------------------------------------+

MySQL [test]> explain select * from t where a = 2 and a = 3;
+---------------------+-------+------+------------------------------------------------------------------+
| id                  | count | task | operator info                                                    |
+---------------------+-------+------+------------------------------------------------------------------+
| IndexReader_10      | 0.01  | root | index:Selection_9                                                |
| └─Selection_9       | 0.01  | cop  | eq(test.t.a, 2)                                                  |
|   └─IndexScan_8     | 10.00 | cop  | table:t, index:a, b, range:[3,3], keep order:false, stats:pseudo |
+---------------------+-------+------+------------------------------------------------------------------+

the filters indeed can be simplified to eliminate unnecessary 'Selections', or even can be removed at all and build an empty range.

after this pr:

mysql> explain select * from t where a in (1, 2) and a in (1, 3);
+-------------------+-------+------+------------------------------------------------------------------+
| id                | count | task | operator info                                                    |
+-------------------+-------+------+------------------------------------------------------------------+
| IndexReader_9     | 0.00  | root | index:IndexScan_8                                                |
| └─IndexScan_8     | 0.00  | cop  | table:t, index:a, b, range:[1,1], keep order:false, stats:pseudo |
+-------------------+-------+------+------------------------------------------------------------------+

mysql> explain select * from t where a = 2 and a = 3;
+-------------------+-------+------+-----------------------------------------------------+
| id                | count | task | operator info                                       |
+-------------------+-------+------+-----------------------------------------------------+
| IndexReader_9     | 0.00  | root | index:IndexScan_8                                   |
| └─IndexScan_8     | 0.00  | cop  | table:t, index:a, b, keep order:false, stats:pseudo |
+-------------------+-------+------+-----------------------------------------------------+

Check List

Tests

Unit test
Integration test

Related changes

Need to be included in the release note
handle more than one 'equal' or 'in' function for a column in ranger

eurekaka

This PR should not be merged before #7553 is merged, because for queries like:

select /*+ TIDB_INLJ(t1) */ * from t t1 join t t2 where t1.a=t2.a and t2.a = 2

a fake EqCond like t2.a = 1 would be introduced in building IndexLoopUpJoin, and this fake EqCond would be merged with t2.a = 2 after this PR to build an empty range, which is wrong. #7553 fixes this problem, so this PR has dependency on it.

shenli · 2018-09-01T12:27:06Z

Should the datasouce in the following query be a TableDual? Because a = 2 and a = 3 should always be false.

mysql> explain select * from t where a = 2 and a = 3;
+-------------------+-------+------+-----------------------------------------------------+
| id                | count | task | operator info                                       |
+-------------------+-------+------+-----------------------------------------------------+
| IndexReader_9     | 0.00  | root | index:IndexScan_8                                   |
| └─IndexScan_8     | 0.00  | cop  | table:t, index:a, b, keep order:false, stats:pseudo |
+-------------------+-------+------+-----------------------------------------------------+

eurekaka · 2018-09-02T04:35:35Z

@shenli make sense, patch updated.

two problems found during converting empty range DataSource into TableDual:

First, plan cache would be broken if empty range DataSource is converted to TableDual, check out this example with prepared-plan-cache-enabled=true on master branch:

MySQL [test]> create table t(a int, b int, index a_idx(a));
Query OK, 0 rows affected (0.01 sec)

MySQL [test]> insert into t values(1,1),(2,2),(null,3);
Query OK, 3 rows affected (0.00 sec)

MySQL [test]> select * from t;
+------+------+
| a    | b    |
+------+------+
|    1 |    1 |
|    2 |    2 |
| NULL |    3 |
+------+------+
3 rows in set (0.00 sec)

MySQL [test]> prepare stmt from 'select * from t where ?';
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> execute stmt using @param;
Empty set (0.00 sec)

MySQL [test]> set @param = true;
Query OK, 0 rows affected (0.00 sec)

MySQL [test]> execute stmt using @param;
Empty set (0.00 sec)

last execute statement returns wrong result, because previously cached Dual plan is reused, and rebuildRange of plan cache cannot handle this case.

I would add an issue to track this problem separately, and in this patch, we do not apply this converting if we are building plan for prepared statement when plan cache is enabled.

second, ranger has a bug regarding null filters, check out this example in current master branch using the table created in above example:

MySQL [test]> select * from t;
+------+------+
| a    | b    |
+------+------+
|    1 |    1 |
|    2 |    2 |
| NULL |    3 |
+------+------+
3 rows in set (0.00 sec)

MySQL [test]> explain select * from t where a = null;
+-------------------+-------+------+--------------------------------------------------+
| id                | count | task | operator info                                    |
+-------------------+-------+------+--------------------------------------------------+
| IndexLookUp_10    | 0.00  | root |                                                  |
| ├─IndexScan_8     | 0.00  | cop  | table:t, index:a, keep order:false, stats:pseudo |
| └─TableScan_9     | 0.00  | cop  | table:t, keep order:false, stats:pseudo          |
+-------------------+-------+------+--------------------------------------------------+
3 rows in set (0.00 sec)

MySQL [test]> select * from t where a = null;
Empty set (0.00 sec)

| null | 3 | should be returned instead. I would report this problem in another separate issue as well.

currently unit test testAnalyzeSuite::TestPreparedNullParam fails for this patch, because the second bug would result in a wrong empty range scan, and this scan would be converted to TableDual further by this patch. Would revisit this unit test later when second bug is resolved, so we have another dependency for this PR.

eurekaka · 2018-09-02T04:56:25Z

OK, for the second problem mentioned above, I just figured out NULL = NULL is false, so it works in the right way, not a problem. Issue for the first problem is here #7579

eurekaka · 2018-09-02T12:14:26Z

cmd/explaintest/r/explain_easy.result

+create table t(a bigint primary key);
+explain select * from t where a = 1 and a = 2;
+id	count	task	operator info
+TableDual_5	10000.00	root	rows:0


the count field of the explain seems wrong, it should be 0.00 instead. Is is computed by the following code snippet in function GetRowCountByIntColumnRanges called by Selectivity:

404 if coll.ColumnIsInvalid(sc, colID) { 405 if len(intRanges) == 0 { 406 return float64(coll.Count), nil 407 } 408 if intRanges[0].LowVal[0].Kind() == types.KindInt64 { 409 return getPseudoRowCountBySignedIntRanges(intRanges, float64(coll.Count)), nil 410 } 411 return getPseudoRowCountByUnsignedIntRanges(intRanges, float64(coll.Count)), nil 412 }

line 406 does not make sense, it should return 0, nil when range is empty. Anyway, this should be resolved in another issue, and we can update this result later.

issue for this problem: #7582
PR to fix it: #7583

…sion

check the len(values) to decide whether EQ or In

be converted to TableDual now

…converted to TableDual

zz-jason · 2018-09-05T08:56:09Z

util/ranger/ranger.go

+	sf, _ := expr.(*expression.ScalarFunction)
+	//Constant and Column args should have same RetType, simply get from first arg
+	retType := sf.GetArgs()[0].GetType()
+	values := make([]expression.Expression, 0, len(points)/2)


how about s/values/args/?

zz-jason · 2018-09-05T09:15:56Z

util/ranger/ranger.go

@@ -466,3 +467,33 @@ func newFieldType(tp *types.FieldType) *types.FieldType {
 		return tp
 	}
 }
+
+func points2EqOrInCond(ctx sessionctx.Context, points []point, expr expression.Expression) expression.Expression {


we need a comment for this function to explain:

the basic functionality.

the constraints of the input parameters.

how about:

// points2EqOrInCond constructs a 'EQUAL' or 'IN' scalar function based on the // 'points'. The target column is extracted from the 'expr'. // NOTE: // 1. 'expr' must be either 'EQUAL' or 'IN' function. // 2. 'points' should not be empty.

zz-jason · 2018-09-05T09:36:54Z

plan/find_best_task.go

@@ -252,6 +252,14 @@ func (ds *DataSource) findBestTask(prop *requiredProp) (t task, err error) {
 	t = invalidTask

 	for _, path := range ds.possibleAccessPaths {


Maybe it's better to remove other access paths if there is a path which has an empty range. This can be done in DataSource.deriveStats()

eurekaka · 2018-09-05T13:04:18Z

/run-all-tests

zz-jason

LGTM
@winoros PTAL

winoros · 2018-09-06T06:28:21Z

util/ranger/detacher.go

 			accesses[offset] = cond
+			continue
+		}
+		//multiple Eq/In conditions for one column in CNF, apply intersection on them


Space between // and the sentence. And capitalize first character.

winoros · 2018-09-06T06:29:00Z

util/ranger/detacher.go

+			continue
+		}
+		//multiple Eq/In conditions for one column in CNF, apply intersection on them
+		//lazily compute the points for the previously visited Eq/In


winoros · 2018-09-06T06:29:12Z

util/ranger/ranger.go

+// 1. 'expr' must be either 'EQUAL' or 'IN' function.
+// 2. 'points' should not be empty.
+func points2EqOrInCond(ctx sessionctx.Context, points []point, expr expression.Expression) expression.Expression {
+	//len(points) cannot be 0 here, since we impose early termination in extractEqAndInCondition


winoros · 2018-09-06T06:29:16Z

util/ranger/ranger.go

+func points2EqOrInCond(ctx sessionctx.Context, points []point, expr expression.Expression) expression.Expression {
+	//len(points) cannot be 0 here, since we impose early termination in extractEqAndInCondition
+	sf, _ := expr.(*expression.ScalarFunction)
+	//Constant and Column args should have same RetType, simply get from first arg


winoros

We also need some tests in ranger/ranger_test.go.

rest lgtm.

eurekaka · 2018-09-06T12:15:08Z

/run-all-tests

eurekaka · 2018-09-07T07:12:06Z

@winoros comment addressed, PTAL

zz-jason · 2018-09-10T03:00:15Z

/run-all-tests

zz-jason · 2018-09-10T04:49:51Z