lightning: refine progress for compress files import #39219

Merged

12 commits merged into pingcap:master on Dec 2, 2022


lichunzhu
Contributor

What problem does this PR solve?

Issue Number: ref #38514

Problem Summary:

What is changed and how it works?

Currently, Lightning's progress is inaccurate when importing compressed files. This PR uses the reader's position in the compressed file to calculate progress for compressed files.

Check List

Tests

  • Manual test (add detailed scripts or steps below)
    Check whether Lightning's progress is correct when importing compressed files.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • None

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot
Member

ti-chi-bot commented Nov 18, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • buchuitoudegou
  • dsdashun

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 18, 2022
@ti-chi-bot ti-chi-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 29, 2022
@lichunzhu lichunzhu marked this pull request as ready for review November 29, 2022 09:31
@ti-chi-bot ti-chi-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 29, 2022
@lichunzhu lichunzhu added the component/lightning This issue is related to Lightning of TiDB. label Nov 29, 2022
@@ -302,7 +302,7 @@ func MakeSourceFileRegion(
 // set fileSize to INF to make sure compressed files can be read until EOF. Because we can't get the exact size of the compressed files.
 // TODO: update progress bar calculation for compressed files.
 if fi.FileMeta.Compression != CompressionNone {
-rowIDMax = fileSize * 100 / divisor // FIXME: this is not accurate. Need more tests and fix solution.
+rowIDMax = fileSize * 500 / divisor // FIXME: this is not accurate. Need more tests and fix solution.
Contributor


Does 500 represent the compression ratio? Maybe leave a comment here to explain the number.
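The question about 500 can be framed with a small sketch of the `rowIDMax` estimate. It assumes `divisor` approximates the minimum number of bytes a single row can occupy, so `fileSize / divisor` bounds the row count of an uncompressed file. For a compressed file the on-disk size understates the data, so the estimate is scaled up. The constant name `compressSizeFactor` and the reading of 500 as an assumed upper bound on the decompressed-to-compressed size ratio are illustrative assumptions, not definitions taken from the PR.

```go
package main

import "fmt"

// compressSizeFactor is a hypothetical name for the literal 500 in the diff,
// treated here as an assumed maximum expansion ratio of compressed data.
const compressSizeFactor = 500

// estimateRowIDMax sketches the heuristic: divisor is assumed to be the
// minimum bytes per row, so fileSize/divisor is an upper bound on rows.
func estimateRowIDMax(fileSize, divisor int64, compressed bool) int64 {
	if divisor <= 0 {
		divisor = 1 // guard against division by zero in this sketch
	}
	if compressed {
		// The compressed size understates the row data, so scale it up.
		return fileSize * compressSizeFactor / divisor
	}
	return fileSize / divisor
}

func main() {
	fmt.Println(estimateRowIDMax(1000, 10, false)) // 100
	fmt.Println(estimateRowIDMax(1000, 10, true))  // 50000
}
```

A generous factor errs toward overestimating the row count, which is consistent with the diff's comment that compressed files must be readable until EOF.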

@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Dec 1, 2022
Contributor

@dsdashun dsdashun left a comment


rest LGTM

outLoop:
for !canDeliver {
readDurStart := time.Now()
err = cr.parser.ReadRow()
columnNames := cr.parser.Columns()
newOffset, rowID = cr.parser.Pos()
if cr.chunk.FileMeta.Compression != mydump.CompressionNone {
Contributor


If the file is not compressed, realOffset is never assigned and stays 0. Is this the expected behavior?

Contributor Author


Yes. Uncompressed files don't use realOffset.

@@ -2299,6 +2299,8 @@ type deliveredKVs struct {
 columns []string
 offset int64
 rowID int64
+
+realOffset int64
Contributor


Maybe add a comment indicating that realOffset is only used for compressed files.

Contributor Author


addressed in 32071ed
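Following the review discussion, the sketch below documents `realOffset` on a trimmed, hypothetical copy of `deliveredKVs` and shows one way a consumer could turn the absolute positions carried by successive batches into progress increments. For compressed files it reads `realOffset`; otherwise it reads `offset`, so the zero value of `realOffset` in the uncompressed case is never used. `progressTracker` and `advance` are hypothetical helpers for illustration, not Lightning's code.

```go
package main

import "fmt"

// deliveredKVs is a trimmed, hypothetical copy of the struct in the diff.
type deliveredKVs struct {
	columns []string
	offset  int64 // parser position; decompressed bytes for compressed files
	rowID   int64

	// realOffset is the read position in the compressed source file.
	// It is only set for compressed files; for uncompressed files it stays 0
	// and progress is reported with offset instead.
	realOffset int64
}

// progressTracker (hypothetical) converts absolute positions into
// incremental byte counts suitable for a progress bar.
type progressTracker struct {
	compressed bool
	last       int64
}

// advance returns the number of on-disk bytes covered since the last batch.
func (t *progressTracker) advance(kv deliveredKVs) int64 {
	pos := kv.offset
	if t.compressed {
		pos = kv.realOffset
	}
	delta := pos - t.last
	t.last = pos
	return delta
}

func main() {
	t := &progressTracker{compressed: true}
	batches := []deliveredKVs{
		{offset: 4096, rowID: 100, realOffset: 900},
		{offset: 8192, rowID: 200, realOffset: 1700},
	}
	var total int64
	for _, b := range batches {
		total += t.advance(b)
	}
	fmt.Println(total) // 1700
}
```

Summing the increments yields the last reported position in the file's own units, so the total can be compared directly against the compressed file size.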

Contributor

@dsdashun dsdashun left a comment


LGTM

@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Dec 2, 2022
@lichunzhu
Contributor Author

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: 32071ed

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Dec 2, 2022
@ti-chi-bot ti-chi-bot merged commit 10b3bc7 into pingcap:master Dec 2, 2022
@sre-bot
Contributor

sre-bot commented Dec 2, 2022

TiDB MergeCI notify

✅ Well done! 1 newly fixed CI job after this PR was merged.

| CI Name | Result | Duration | Compare with Parent commit |
| --- | --- | --- | --- |
| idc-jenkins-ci/integration-cdc-test | 🔴 failed 2, success 38, total 40 | 24 min | Existing failure |
| idc-jenkins-ci-tidb/common-test | 🔴 failed 1, success 10, total 11 | 9 min 35 sec | Existing failure |
| idc-jenkins-ci-tidb/integration-common-test | ✅ all 17 tests passed | 15 min | Fixed |
| idc-jenkins-ci-tidb/tics-test | 🟢 all 1 tests passed | 5 min 43 sec | Existing passed |
| idc-jenkins-ci-tidb/integration-ddl-test | 🟢 all 6 tests passed | 5 min 43 sec | Existing passed |
| idc-jenkins-ci-tidb/sqllogic-test-2 | 🟢 all 28 tests passed | 5 min 24 sec | Existing passed |
| idc-jenkins-ci-tidb/sqllogic-test-1 | 🟢 all 26 tests passed | 5 min 5 sec | Existing passed |
| idc-jenkins-ci-tidb/mybatis-test | 🟢 all 1 tests passed | 3 min 37 sec | Existing passed |
| idc-jenkins-ci-tidb/integration-compatibility-test | 🟢 all 1 tests passed | 2 min 47 sec | Existing passed |
| idc-jenkins-ci-tidb/plugin-test | 🟢 build success, plugin test success | 4 min | Existing passed |

Labels
component/lightning This issue is related to Lightning of TiDB. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.

5 participants