plan: improve row count estimation using column order correlation #9839

Merged: 2 commits, merged on Apr 24, 2019

eurekaka (Contributor)

What problem does this PR solve?

Fix #9067

What is changed and how it works?

When estimating the row count of a TableScan with a Limit pushed down:

  • check whether another column's histogram can be used to estimate the row count of the table scan, which is possible when that column's order is highly correlated with the order of the handle column;
  • if no histogram can be used, fall back to a heuristic in which users can control the preference for index scans over the table scan through two new system variables, tidb_opt_correlation_threshold and tidb_opt_correlation_exp_factor.
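The first bullet can be illustrated with a minimal Go sketch, assuming a simplified histogram with uniformly filled buckets, an ascending scan in handle order, and a filter of the form `lo <= col < hi`. `Bucket`, `Histogram` and `crossEstimateScanRows` are hypothetical names, not the code merged in this PR. The idea: when the column's order correlation with the handle reaches the threshold, the scan is estimated to read the rows that precede the matching range in handle order, plus the Limit.

```go
package main

import (
	"fmt"
	"math"
)

// Bucket is a simplified histogram bucket: Count rows whose values lie in
// [Lower, Upper). Hypothetical type, not TiDB's statistics.Histogram.
type Bucket struct {
	Lower, Upper float64
	Count        float64
}

// Histogram is a simplified per-column histogram (assumption for illustration).
type Histogram struct {
	Buckets []Bucket
}

// Total returns the number of rows covered by the histogram.
func (h Histogram) Total() float64 {
	total := 0.0
	for _, b := range h.Buckets {
		total += b.Count
	}
	return total
}

// LessThan estimates how many rows have a value below v, assuming values
// are spread uniformly inside each bucket.
func (h Histogram) LessThan(v float64) float64 {
	rows := 0.0
	for _, b := range h.Buckets {
		switch {
		case v >= b.Upper:
			rows += b.Count
		case v > b.Lower:
			rows += b.Count * (v - b.Lower) / (b.Upper - b.Lower)
		}
	}
	return rows
}

// crossEstimateScanRows sketches how many rows an ascending table scan must
// read before a pushed-down Limit of `limit` rows is satisfied, for the
// filter lo <= col < hi. corr is the order correlation between col and the
// handle, in [-1, 1]. ok is false when |corr| is below threshold and the
// caller should fall back to another (e.g. heuristic) estimate.
func crossEstimateScanRows(h Histogram, lo, hi, corr, threshold, limit float64) (rows float64, ok bool) {
	if math.Abs(corr) < threshold {
		return 0, false
	}
	total := h.Total()
	matched := h.LessThan(hi) - h.LessThan(lo)
	if matched < limit {
		// The Limit can never be satisfied early: the whole table is read.
		return total, true
	}
	var skipped float64
	if corr > 0 {
		// Handle order follows ascending col order: rows with col < lo come first.
		skipped = h.LessThan(lo)
	} else {
		// Handle order follows descending col order: rows with col >= hi come first.
		skipped = total - h.LessThan(hi)
	}
	// Treat the matching rows as contiguous once the scan reaches the range.
	return math.Min(skipped+limit, total), true
}

func main() {
	h := Histogram{Buckets: []Bucket{
		{0, 100, 250}, {100, 200, 250}, {200, 300, 250}, {300, 400, 250},
	}}
	// WHERE col >= 300 AND col < 400 LIMIT 10, scanned in handle order.
	fmt.Println(crossEstimateScanRows(h, 300, 400, 0.95, 0.9, 10))  // 760 true
	fmt.Println(crossEstimateScanRows(h, 300, 400, -0.95, 0.9, 10)) // 10 true
	fmt.Println(crossEstimateScanRows(h, 300, 400, 0.5, 0.9, 10))   // 0 false
}
```

With 1,000 rows in four buckets, a positively correlated column places the 750 rows with `col < 300` ahead of the matches, so about 760 rows are read; a negatively correlated column places the matches first, and a weakly correlated one falls back.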

Check List

Tests

  • Unit test

Code changes

  • Has exported variable/fields change

Side effects

  • Increased code complexity

Related changes

  • Need to cherry-pick to the release branch: to be determined
  • Need to update the documentation: add documentation for the two new system variables
  • Need to be included in the release note
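For documenting the two new variables, the heuristic fallback from the second bullet of the description can be sketched in Go as below. This is an assumption about its shape, not the merged code: `heuristicScanRows` is a hypothetical function and the penalty `(1-|corr|)^expFactor` is one plausible form. In this reading, tidb_opt_correlation_threshold decides whether the histogram-based estimate is trusted at all, while the exponent plays the role of tidb_opt_correlation_exp_factor: 0 disables the adjustment, and a larger value makes the table scan look more expensive, which favors index scans.

```go
package main

import (
	"fmt"
	"math"
)

// heuristicScanRows is a hypothetical sketch of a fallback estimate for a
// table scan with a pushed-down Limit when no histogram-based estimate is
// available. It starts from the independence assumption (matching rows are
// spread evenly, so about limit/selectivity rows are read) and inflates that
// estimate as |corr| grows. expFactor stands in for
// tidb_opt_correlation_exp_factor (assumed semantics, see lead-in).
func heuristicScanRows(total, matched, limit, corr float64, expFactor int) float64 {
	if matched <= 0 || matched < limit {
		// The Limit cannot be satisfied early; assume a full scan.
		return total
	}
	// limit / selectivity, where selectivity = matched / total.
	independent := limit * total / matched
	if expFactor <= 0 {
		// Adjustment disabled: keep the independence estimate.
		return math.Min(independent, total)
	}
	factor := math.Pow(1-math.Abs(corr), float64(expFactor))
	if factor == 0 {
		// |corr| == 1 with no usable histogram: assume the worst case.
		return total
	}
	return math.Min(independent/factor, total)
}

func main() {
	// 1000 rows, 100 of which match the filter, LIMIT 10, |corr| = 0.5.
	for _, exp := range []int{0, 1, 2} {
		fmt.Println(exp, heuristicScanRows(1000, 100, 10, 0.5, exp))
	}
	// Prints: 0 100, 1 200, 2 400 (one pair per line).
}
```

Raising the exponent from 0 to 2 grows the estimated table-scan row count from 100 to 400 in this example, which is how a larger factor would push the planner toward an index scan.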

eurekaka added the type/enhancement and sig/planner labels on Mar 21, 2019
eurekaka (Contributor Author) commented:

/run-all-tests

Two review threads on planner/core/find_best_task.go (outdated, resolved)
winoros (Member) left a comment:

Could you please add some comments to help us understand the code? 😂

codecov bot commented on Mar 27, 2019:

Codecov Report

❗ No coverage uploaded for pull request base (master@fa2d6f0).
The diff coverage is 72.5352%.

@@             Coverage Diff             @@
##             master      #9839   +/-   ##
===========================================
  Coverage          ?   77.9438%           
===========================================
  Files             ?        407           
  Lines             ?      83555           
  Branches          ?          0           
===========================================
  Hits              ?      65126           
  Misses            ?      13590           
  Partials          ?       4839

eurekaka requested reviews from qw4990, winoros, alivxxx and zz-jason, and removed the review request for qw4990, on Apr 9, 2019 03:02
Four more review threads on planner/core/find_best_task.go (resolved; two of them outdated)
eurekaka force-pushed the corr_estimation branch 2 times, most recently from 5acf74f to e51ac82, on Apr 23, 2019 04:39
eurekaka (Contributor Author) commented:

/run-all-tests

@@ -44,18 +44,3 @@ IndexLookUp_11 0.00 root
├─IndexScan_8 1.00 cop table:outdated_statistics, index:a, b, range:[1 1,1 1], keep order:false
└─Selection_10 0.00 cop eq(test.outdated_statistics.c, 1)
  └─TableScan_9 1.00 cop table:outdated_statistics, keep order:false
CREATE TABLE `unknown_correlation` (
Member:

It's better to keep this test, or to move it to another file, so that plan regressions can still be detected.

Contributor Author:

This test has been moved to TestLimitCrossEstimation.

for _, col := range cols {
	hist, ok := histColl.Columns[col.ID]
	if !ok {
		return nil, 0
Member:

Why return early here? Shouldn't we continue comparing the other columns?

Contributor Author:

If we don't know the correlation of a column, we assume it is independent of the handle, so shouldn't the whole condition be treated as independent?
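The quoted snippet and the author's reply can be read as the following Go sketch, which picks the filter column most correlated with the handle and gives up as soon as one column has no statistics. `Column`, `ColumnStats` and `mostCorrelatedColumn` are hypothetical stand-ins for the planner's types; only the early `return nil, 0` mirrors the code under review.

```go
package main

import (
	"fmt"
	"math"
)

// Column is a hypothetical stand-in for a planner column reference.
type Column struct {
	ID   int64
	Name string
}

// ColumnStats is a hypothetical stand-in for per-column statistics.
type ColumnStats struct {
	// Correlation is the order correlation with the handle, in [-1, 1].
	Correlation float64
}

// mostCorrelatedColumn returns the filter column whose order correlation with
// the handle has the largest magnitude, together with that correlation.
// If any column has no statistics it returns (nil, 0): the unknown column is
// assumed independent of the handle, so the whole condition is treated as
// independent instead of being estimated from the remaining columns.
// If the best |corr| is below threshold, the column is nil but the
// correlation is still returned for a heuristic fallback.
func mostCorrelatedColumn(cols []Column, stats map[int64]ColumnStats, threshold float64) (*Column, float64) {
	var best *Column
	bestCorr := 0.0
	for i := range cols {
		st, ok := stats[cols[i].ID]
		if !ok {
			return nil, 0
		}
		if best == nil || math.Abs(st.Correlation) > math.Abs(bestCorr) {
			best, bestCorr = &cols[i], st.Correlation
		}
	}
	if best == nil || math.Abs(bestCorr) < threshold {
		return nil, bestCorr
	}
	return best, bestCorr
}

func main() {
	stats := map[int64]ColumnStats{1: {Correlation: 0.3}, 2: {Correlation: -0.95}}
	a, b, c := Column{1, "a"}, Column{2, "b"}, Column{3, "c"}
	if col, corr := mostCorrelatedColumn([]Column{a, b}, stats, 0.9); col != nil {
		fmt.Println(col.Name, corr) // b -0.95
	}
	// Column c has no statistics, so the estimation gives up.
	col, corr := mostCorrelatedColumn([]Column{a, b, c}, stats, 0.9)
	fmt.Println(col == nil, corr) // true 0
}
```

Returning early keeps the estimate conservative: a single column of unknown correlation could dominate the filter, so the sketch does not trust the other columns' histograms in that case.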

eurekaka requested a review from zz-jason on Apr 24, 2019 06:52
zz-jason (Member) left a comment:

LGTM

qw4990 (Contributor) left a comment:

LGTM

zz-jason merged commit f9c82b5 into pingcap:master on Apr 24, 2019
eurekaka deleted the corr_estimation branch on Sep 24, 2019 03:27
Labels: sig/planner, type/enhancement
Development

Successfully merging this pull request may close these issues.

YCSB workloade is too slow for scan operation
5 participants