*: refactor cost model formulas and constants #10581

eurekaka · 2019-05-23T07:56:41Z

What problem does this PR solve?

Our current cost model is too naive to pick the physical plans we prefer in some scenarios. For example:

- Some costs, such as sorting in the index lookup operator or the inner-side cost of the index join operator, are not reflected in cost computation at all.
- Some cost computations are wrong or inaccurate because they use the wrong input row count estimate (e.g., the cost of the TopN operator).

In addition, cost computation is not uniform across operators: some operators consider memory cost and others do not; some consider operator parallelism and others do not.

What is changed and how it works?

This PR tries to

1. refine the cost model to catch up with the current executor implementations;
1. unify the dimensions considered in cost computation across all operators, i.e., CPU cost, memory cost, network cost, scan cost, and operator parallelism.
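
To make the five dimensions concrete, here is a minimal sketch of a cost function that accounts for all of them in one place. The type names, field names, and factor values are hypothetical and are not taken from this PR; dividing CPU work by the worker count and charging a per-worker overhead is an assumption of the sketch, not a description of the actual formulas.

```go
package main

import "fmt"

// costFactors holds per-unit weights for each dimension.
// The names and values are hypothetical, not the constants used in this PR.
type costFactors struct {
	cpu, memory, network, scan, concurrency float64
}

// operatorWork describes what one operator does in each dimension.
type operatorWork struct {
	cpuRows   float64 // rows processed by the CPU
	memRows   float64 // rows held in memory
	netBytes  float64 // bytes sent over the network
	scanBytes float64 // bytes read by scans
	workers   float64 // degree of parallelism; values below 1 are treated as 1
}

// cost folds all five dimensions into one number. Dividing CPU work by the
// worker count and charging a per-worker overhead is an assumption of this sketch.
func (w operatorWork) cost(f costFactors) float64 {
	workers := w.workers
	if workers < 1 {
		workers = 1
	}
	cpu := w.cpuRows * f.cpu / workers
	if workers > 1 {
		cpu += workers * f.concurrency
	}
	return cpu + w.memRows*f.memory + w.netBytes*f.network + w.scanBytes*f.scan
}

func main() {
	f := costFactors{cpu: 1, memory: 0.5, network: 2, scan: 4, concurrency: 3}
	serial := operatorWork{cpuRows: 100, memRows: 10, netBytes: 5, scanBytes: 2, workers: 1}
	parallel := serial
	parallel.workers = 4
	fmt.Println(serial.cost(f), parallel.cost(f))
}
```

Keeping every dimension in a single function makes it harder for one operator to silently skip memory cost or parallelism, which is the inconsistency described above.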

Check List

Tests

Unit test: some UT results are updated
Integration test: some integration tests are updated

Code changes

Has exported function/method change
Has exported variable/fields change
Has interface methods change

Side effects

Possible performance regression

Related changes

Need to cherry-pick to the release branch: we may need this in release-3.0
Need to be included in the release note

zhouqiang-cl · 2019-05-23T08:15:23Z

/rebuild

codecov · 2019-05-29T13:54:06Z

Codecov Report

Merging #10581 into master will decrease coverage by 0.1794%.
The diff coverage is 96.2643%.

@@               Coverage Diff                @@
##             master     #10581        +/-   ##
================================================
- Coverage   81.4101%   81.2307%   -0.1795%     
================================================
  Files           426        426                
  Lines         92513      92028       -485     
================================================
- Hits          75315      74755       -560     
- Misses        11826      11904        +78     
+ Partials       5372       5369         -3

zhouqiang-cl · 2019-06-03T13:19:23Z

/bench

cmd/explaintest/r/tpch.result

executor/builder.go

eurekaka · 2019-06-21T12:13:25Z

/rebuild

eurekaka · 2019-06-21T13:09:29Z

/run-all-tests

eurekaka · 2019-06-21T13:29:26Z

/run-common-test tidb-test=pr/840

eurekaka · 2019-06-21T13:36:43Z

/run-common-test tidb-test=pr/840

eurekaka · 2019-06-21T14:23:02Z

/run-all-tests tidb-test=pr/840

eurekaka · 2019-06-24T05:59:43Z

/run-all-tests tidb-test=pr/840

eurekaka · 2019-06-24T06:23:49Z

/run-all-tests tidb-test=pr/840

planner/core/task.go

statistics/table.go

lzmhhh123

LGTM.

planner/core/task.go

executor/index_lookup_join_test.go

planner/core/cbo_test.go

planner/core/exhaust_physical_plans.go

zz-jason · 2019-08-05T06:31:29Z

statistics/table.go

+			colHist, ok := coll.Columns[col.UniqueID]
+			// Normally this would not happen, it is for compatibility with old version stats which
+			// does not include TotColSize.
+			if !ok || (colHist.TotColSize == 0 && (colHist.NullCount != coll.Count)) {


We can calculate (colHist.TotColSize == 0 && (colHist.NullCount != coll.Count)) once, outside the for loop.

We need a valid colHist to perform this check; if we move the check outside the for loop, the code becomes pretty ugly.
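
The reply can be illustrated with a small sketch of a per-column average-size loop. The types, the fallback constant `pseudoColSize`, and the function name below are hypothetical stand-ins, not the code in statistics/table.go; only the condition itself is copied from the hunk above. Because `colHist` is looked up per column, the check depends on the loop variable and sits naturally inside the loop.

```go
package main

import "fmt"

// columnStats and histColl are hypothetical stand-ins for the real statistics types.
type columnStats struct {
	TotColSize int64 // total size of all non-null values in the column
	NullCount  int64 // number of NULL values in the column
}

type histColl struct {
	Count   int64 // row count of the table
	Columns map[int64]*columnStats
}

// pseudoColSize is an assumed fallback size used when stats are unusable.
const pseudoColSize = 8.0

// avgRowSize sums the average size of each requested column.
func avgRowSize(coll *histColl, colIDs []int64) float64 {
	if coll.Count == 0 {
		return pseudoColSize * float64(len(colIDs))
	}
	size := 0.0
	for _, id := range colIDs {
		colHist, ok := coll.Columns[id]
		// Old stats may lack TotColSize: a zero total is only believable
		// when every row in the column is NULL.
		if !ok || (colHist.TotColSize == 0 && colHist.NullCount != coll.Count) {
			size += pseudoColSize
			continue
		}
		size += float64(colHist.TotColSize) / float64(coll.Count)
	}
	return size
}

func main() {
	coll := &histColl{Count: 10, Columns: map[int64]*columnStats{
		1: {TotColSize: 40},
		2: {TotColSize: 0, NullCount: 3},
		3: {TotColSize: 0, NullCount: 10},
	}}
	fmt.Println(avgRowSize(coll, []int64{1, 2, 3, 4}))
}
```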

planner/core/exhaust_physical_plans.go

zz-jason · 2019-08-05T12:32:09Z

planner/core/exhaust_physical_plans.go

 	copTask := &copTask{
 		tablePlan:         ts,
 		indexPlanFinished: true,
+		cst:               scanFactor * rowSize * 1.0,


How about replacing 1.0 with ts.stats.RowCount? That would be much clearer.
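
The literal 1.0 in the hunk sits where a row count would normally appear in a scan-cost formula. A minimal sketch of a helper that takes the row count as an explicit argument follows; the helper name and the value of `scanFactor` are assumptions for illustration only.

```go
package main

import "fmt"

// scanFactor is a hypothetical per-byte weight, not the value used in the PR.
const scanFactor = 2.0

// scanCost charges scanFactor per byte for rowCount rows of rowSize bytes each.
func scanCost(rowCount, rowSize float64) float64 {
	return scanFactor * rowSize * rowCount
}

func main() {
	rowSize := 16.0
	// The literal row count from the hunk.
	fmt.Println(scanCost(1.0, rowSize))
	// An estimated row count plugged into the same position.
	fmt.Println(scanCost(500.0, rowSize))
}
```

Naming the argument makes it obvious at the call site which quantity the constant stands for.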

planner/core/task.go

zz-jason · 2019-08-05T13:38:43Z

planner/core/task.go

+	rCount := rTask.count()
+	if len(p.RightConditions) > 0 {
+		cpuCost += lCount * rCount * cpuFactor
+		rCount *= selectionFactor


Maybe rCount is incorrect when we can use an index scan on the inner-side table, in which case the scan range is decided by the correlated outer-side join key.

But we cannot know the selectivity of the outer key until execution.
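
For reference, the arithmetic in the quoted hunk can be sketched in isolation as follows. The function name and the constant values are hypothetical; the point is only that the right-side row count is scaled by one fixed selectivity guess, because the real selectivity of the outer key is unknown at planning time.

```go
package main

import "fmt"

// Hypothetical values; the PR's real constants are not shown in this thread.
const (
	cpuFactor       = 2.0
	selectionFactor = 0.5
)

// filterRight mirrors the quoted hunk: when conditions exist on the right
// side, every (left, right) pair is charged CPU and the right row count
// is scaled by a fixed selectivity guess.
func filterRight(lCount, rCount float64, numRightConds int) (cpuCost, filteredRCount float64) {
	if numRightConds > 0 {
		cpuCost += lCount * rCount * cpuFactor
		rCount *= selectionFactor
	}
	return cpuCost, rCount
}

func main() {
	cpu, rows := filterRight(10, 100, 1)
	fmt.Println(cpu, rows)
}
```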

zz-jason · 2019-08-05T13:51:09Z

planner/core/task.go

+	cpuCost += probeCost + (innerConcurrency+1.0)*concurrencyFactor
+	// Memory cost of hash tables for inner rows. The computed result is the upper bound,
+	// since the executor is pipelined and not all workers are always in full load.
+	memoryCost := innerConcurrency * (batchSize * distinctFactor) * innerCnt * memoryFactor


Should we consider the average row size of each inner row?

A row in memory has a different size than its representation on disk or on the network. Currently, we use a very small default memoryFactor so that we choose the fastest plan, which makes full use of resources. To make the cost model friendly to memory management, we do need to consider row size here. Can we leave this to a separate PR later?
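
A rough sketch of what the suggested follow-up could look like is below. `hashTableMemCost` reproduces the formula from the hunk above; `hashTableMemCostWithRowSize` is a hypothetical extension that scales it by an average in-memory row size. The constant values and both function names are assumptions for illustration only, and nothing here is part of this PR.

```go
package main

import "fmt"

// Hypothetical values chosen for easy arithmetic, not the PR's constants.
const (
	batchSize      = 25.0
	distinctFactor = 0.5
	memoryFactor   = 0.5
)

// hashTableMemCost follows the formula in the hunk: an upper bound on the
// memory cost of the hash tables built for inner rows.
func hashTableMemCost(innerConcurrency, innerCnt float64) float64 {
	return innerConcurrency * (batchSize * distinctFactor) * innerCnt * memoryFactor
}

// hashTableMemCostWithRowSize is a hypothetical variant that also weighs
// each inner row by its average in-memory size.
func hashTableMemCostWithRowSize(innerConcurrency, innerCnt, avgRowSize float64) float64 {
	return hashTableMemCost(innerConcurrency, innerCnt) * avgRowSize
}

func main() {
	fmt.Println(hashTableMemCost(4, 2))
	fmt.Println(hashTableMemCostWithRowSize(4, 2, 8))
}
```

If row size were introduced this way, memoryFactor would likely need retuning, since the memory term would grow by roughly the average row size.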

planner/core/task.go

eurekaka · 2019-08-07T06:26:21Z

/rebuild

zz-jason

LGTM

sre-bot · 2019-08-07T08:44:42Z

/run-all-tests

sre-bot · 2019-08-07T08:47:40Z

@eurekaka merge failed.

eurekaka · 2019-08-07T09:34:54Z

/run-all-tests tidb-test=pr/840

eurekaka added the type/enhancement and sig/planner labels May 23, 2019

zz-jason reviewed Jun 3, 2019

eurekaka added the status/WIP label Jun 3, 2019

eurekaka force-pushed the cost_model branch 2 times, most recently from 5c4dfa9 to b97ea8e Compare June 17, 2019 09:56

eurekaka force-pushed the cost_model branch from 8994c27 to e1ea81a Compare June 21, 2019 06:34

eurekaka added the status/all tests passed label Jun 21, 2019

eurekaka marked this pull request as ready for review June 24, 2019 05:52

eurekaka removed the status/WIP label Jun 24, 2019

eurekaka requested review from zz-jason, winoros and alivxxx June 24, 2019 08:20

winoros reviewed Jun 26, 2019

alivxxx reviewed Jun 27, 2019

eurekaka force-pushed the cost_model branch from 1d60670 to a4592c6 Compare July 2, 2019 10:38

eurekaka requested review from alivxxx and winoros July 2, 2019 10:41

eurekaka force-pushed the cost_model branch from a4592c6 to bc465b6 Compare July 18, 2019 08:17

eurekaka changed the title ~~planner: refactor cost model formulas and constants~~ *: refactor cost model formulas and constants Jul 18, 2019

eurekaka force-pushed the cost_model branch from 47b7990 to 346dea3 Compare July 25, 2019 07:47

lzmhhh123 reviewed Jul 26, 2019

lzmhhh123 added the status/LGT1 label Jul 26, 2019

alivxxx reviewed Jul 31, 2019

zz-jason reviewed Aug 1, 2019

eurekaka force-pushed the cost_model branch 2 times, most recently from f949384 to 0617428 Compare August 1, 2019 12:59

eurekaka requested review from zz-jason and alivxxx August 1, 2019 13:00

alivxxx removed their request for review August 2, 2019 06:25

qw4990 removed their request for review August 5, 2019 07:38

zz-jason reviewed Aug 5, 2019

eurekaka requested a review from zz-jason August 7, 2019 05:59

eurekaka force-pushed the cost_model branch from 0ad067e to 1c7f97e Compare August 7, 2019 06:29

zz-jason approved these changes Aug 7, 2019

zz-jason removed request for winoros and foreyes August 7, 2019 08:36

zz-jason added the status/can-merge and status/LGT2 labels and removed the status/LGT1 label Aug 7, 2019

Merge branch 'master' into cost_model

621b8e1

eurekaka merged commit fe03864 into pingcap:master Aug 7, 2019

eurekaka deleted the cost_model branch August 7, 2019 09:57

eurekaka mentioned this pull request Aug 15, 2019

planner: increase default concurrency factor of cost computing #11752

Merged

lzmhhh123 pushed a commit to lzmhhh123/tidb that referenced this pull request Jan 19, 2020

*: refactor cost model formulas and constants (pingcap#10581)

f8657bd
