DM can't sync uppercase tables if forget to set case-sensitive: true in task config #5255

Closed
lance6716 opened this issue Apr 24, 2022 · 9 comments · Fixed by #5307
Labels: affects-5.3, affects-5.4, affects-6.0, area/dm, severity/major, type/bug


@lance6716
Contributor

lance6716 commented Apr 24, 2022

What did you do?

The task config uses an uppercase table name in block-allow-list:

$ cat bin/task.yaml
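name: test             # task name; must be globally unique
task-mode: all         # task mode; can be "full", "incremental", or "all"

target-database:       # downstream database instance configuration
  host: "127.0.0.1"
  port: 4000
  user: "root"
  password: ""         # if the password is not empty, ciphertext encrypted with dmctl is recommended

## ******** Feature configuration set **********
block-allow-list:        # block-allow-list filter rule set for tables matched on the upstream database instance; use black-white-list if the DM version is <= v2.0.0-beta.2
  bw-rule-1:             # name of the block/allow list rule
    do-dbs: ["test"] # which databases to migrate
    do-tables:
    - db-name: "test"
      tbl-name: "TM_DATA"

# ----------- Instance configuration -----------
mysql-instances:
  - source-id: "mysql-replica-01"  # upstream instance or replication group ID; see the  configuration of
    block-allow-list:  "bw-rule-1" # name of the block/allow list rule; use black-white-list if the DM version is <= v2.0.0-beta.2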
    meta:
      binlog-name: 'mysql-bin.000001'
      binlog-pos: 4

start-task

What did you expect to see?

not sure what's the expected behaviour for test.TM_DATA 🤔

but v2.0.7 can replicate test.TM_DATA with above config.

What did you see instead?

test.TM_DATA is not replicated

lance6716 added the type/bug, area/dm, affects-5.3, affects-5.4, and affects-6.0 labels on Apr 24, 2022
@lance6716
Contributor Author

The root cause is that on start-task, DM-master sees a case-sensitive: false task, which is the default value. When converting the task config to a subtask config, DM-master calls SubTaskConfig.Adjust:

if err := cfg.Adjust(true); err != nil {

Inside Adjust, filter.New modifies the value that the BAList pointer refers to, so the block-allow-list is converted to lowercase:
if _, err := filter.New(c.CaseSensitive, c.BAList); err != nil {

But on operate-source create, DM assigns the source's case-sensitive from the value of lower_case_table_names, which results in true when MySQL runs on Linux (where lower_case_table_names defaults to 0).

When DM-worker decides the final case-sensitive, the source is case sensitive, so it treats the subtask as case sensitive too. The lowercased block-allow-list therefore does not match the uppercase tables.
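
The sequence described above can be reproduced in a few lines. This is a minimal sketch rather than DM's actual code: the Table and Rules types and the newFilter and matches functions are hypothetical stand-ins for the block-allow-list and filter.New, and they assume only what this comment states (the rule set is passed by pointer and lowercased in place when the filter is case insensitive).

```go
package main

import (
	"fmt"
	"strings"
)

// Table and Rules are hypothetical stand-ins for DM's block-allow-list types.
type Table struct {
	Schema, Name string
}

type Rules struct {
	DoTables []*Table
}

// newFilter mimics the side effect described above: when caseSensitive is
// false it lowercases the rules it was given, in place, through the pointer.
func newFilter(caseSensitive bool, rules *Rules) {
	if caseSensitive {
		return
	}
	for _, t := range rules.DoTables {
		t.Schema = strings.ToLower(t.Schema)
		t.Name = strings.ToLower(t.Name)
	}
}

// matches checks a table against the rules using the given case sensitivity.
func matches(caseSensitive bool, rules *Rules, schema, name string) bool {
	for _, t := range rules.DoTables {
		if caseSensitive {
			if t.Schema == schema && t.Name == name {
				return true
			}
		} else if strings.EqualFold(t.Schema, schema) && strings.EqualFold(t.Name, name) {
			return true
		}
	}
	return false
}

// reproduce runs the two steps: the master side adjusts the config with the
// task default (case insensitive), then the worker side matches with the
// source's setting (case sensitive).
func reproduce() (storedName string, replicated bool) {
	rules := &Rules{DoTables: []*Table{{Schema: "test", Name: "TM_DATA"}}}
	// Master side: the task default is case-sensitive: false.
	newFilter(false, rules)
	// Worker side: the source is case sensitive.
	replicated = matches(true, rules, "test", "TM_DATA")
	return rules.DoTables[0].Name, replicated
}

func main() {
	name, ok := reproduce()
	fmt.Printf("stored rule: %s, test.TM_DATA replicated: %v\n", name, ok)
}
```

The point of the sketch is that the mutation and the match use different case-sensitivity settings, so the rule is normalized for one mode and then evaluated in the other.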

@lance6716
Contributor Author

lance6716 commented Apr 25, 2022

my expected behaviour for case sensitive:

the case sensitivity of source should be decided by source's lower_case_table_names

For lower_case_table_names = 0 (case sensitive) source, if it has a table DB.TBL:

| | task case sensitive | task case insensitive |
|---|---|---|
| allow list: DB.TBL | A: replicated | B: replicated |
| allow list: db.tbl | C: not replicated | D: ??? |

This issue doesn't satisfy B

For lower_case_table_names = 1 (case insensitive) source, if it has a table DB.TBL or db.tbl:

| | task case sensitive | task case insensitive |
|---|---|---|
| allow list: DB.TBL | E: ??? | F: replicated |
| allow list: db.tbl | G: ??? | H: replicated |

Sources with lower_case_table_names = 2 are very rare; we can treat them as case insensitive.

https://dev.mysql.com/doc/refman/8.0/en/identifier-case-sensitivity.html

Or we deprecate case sensitive in task configuration?

@D3Hunter
Contributor

> Or we deprecate case sensitive in task configuration?

That is more intuitive, and it can handle the case where different sources have different lower_case_table_names settings, though that is quite rare.

@sunzhaoyang

Deprecate case-sensitive in both the source and task configs. Automatically detect lower_case_table_names and decide case sensitivity when creating the source.

@lance6716
Contributor Author

lance6716 commented Apr 25, 2022

So the final expected behaviour is

The case sensitivity of a source should be decided by the source's lower_case_table_names; the user can't set it.

For lower_case_table_names = 0 (case sensitive), if it has a table DB.TBL:

allow list: DB.TBL -> replicated
allow list: db.tbl -> not replicated

For lower_case_table_names = 1 (case insensitive) source and lower_case_table_names = 2 (stored sensitive, used insensitive) source, if it has a table DB.TBL or db.tbl:

allow list: DB.TBL -> replicated
allow list: db.tbl -> replicated
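
The rules above can be written down as a small decision function. This is only a sketch of the proposed behaviour, not code from DM: sourceCaseSensitive and allowed are hypothetical names, and tables are written as plain "schema.table" strings for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// sourceCaseSensitive derives case sensitivity from the upstream
// lower_case_table_names value: 0 is case sensitive, while 1 and 2 are
// treated as case insensitive, following the proposal above.
func sourceCaseSensitive(lowerCaseTableNames int) bool {
	return lowerCaseTableNames == 0
}

// allowed reports whether an allow-list entry such as "DB.TBL" should match
// the upstream table under the given lower_case_table_names value.
func allowed(lowerCaseTableNames int, allowEntry, upstreamTable string) bool {
	if sourceCaseSensitive(lowerCaseTableNames) {
		return allowEntry == upstreamTable
	}
	return strings.EqualFold(allowEntry, upstreamTable)
}

func main() {
	for _, lctn := range []int{0, 1, 2} {
		for _, entry := range []string{"DB.TBL", "db.tbl"} {
			fmt.Printf("lower_case_table_names=%d allow list %s -> replicated=%v\n",
				lctn, entry, allowed(lctn, entry, "DB.TBL"))
		}
	}
}
```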

@lance6716
Contributor Author

lance6716 commented Apr 25, 2022

behaviours of typical versions:

  • in v2.0.7, case-sensitive exists in both the task config and the source config. When converting a task to a subtask at DM-master, only the one from the task config is used, and during the filter.New in Adjust the allow-list may be converted to lowercase 😂 . When DM-worker handles it, the final case-sensitive is cfg.CaseSensitive || sourceCfg.CaseSensitive.
  • in v5.3.1, the differences from v2.0.7 are that
    • case-sensitive in the source config is overwritten by the upstream lower_case_table_names.
    • The sync unit fetches the upstream lower_case_table_names on its own to decide whether to change the table names saved in the checkpoint during upgrading, and whether to convert binlog table names to lowercase while running.

@lance6716
Contributor Author

lance6716 commented Apr 25, 2022

The upgrade case is very complex, so I'll list the available config combinations in different versions.

To replicate DB.TBL

in v2.0.7:

| Case | case sensitive in task | case sensitive in source | allow list |
|---|---|---|---|
| A | false | false | DB.TBL/db.tbl or any mixed cases |
| B | false | true | cannot replicate, since it's converted to lower case during the filter.New in Adjust and saved into etcd |
| C | true | false | DB.TBL |
| D | true | true | DB.TBL |
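
The four v2.0.7 cases follow from two facts stated earlier in this thread: DM-master lowercases the stored allow-list when the task is case insensitive, and DM-worker uses cfg.CaseSensitive || sourceCfg.CaseSensitive as the final setting. The sketch below models just those two facts; replicatesV207 is a hypothetical function written for illustration and does not exist in DM.

```go
package main

import (
	"fmt"
	"strings"
)

// replicatesV207 models the v2.0.7 behaviour for one allow-list entry and one
// upstream table, both written as "schema.table". Illustration only.
func replicatesV207(taskSensitive, sourceSensitive bool, allowEntry, upstream string) bool {
	// DM-master: Adjust builds a filter with the task-level setting only,
	// which lowercases the stored allow-list when the task is insensitive.
	stored := allowEntry
	if !taskSensitive {
		stored = strings.ToLower(stored)
	}
	// DM-worker: the final setting is the OR of the task and source settings.
	effective := taskSensitive || sourceSensitive
	if effective {
		return stored == upstream
	}
	return strings.EqualFold(stored, upstream)
}

func main() {
	cases := []struct {
		name         string
		task, source bool
	}{
		{"A", false, false},
		{"B", false, true},
		{"C", true, false},
		{"D", true, true},
	}
	for _, c := range cases {
		fmt.Printf("%s: allow list DB.TBL replicates DB.TBL: %v\n",
			c.name, replicatesV207(c.task, c.source, "DB.TBL", "DB.TBL"))
	}
}
```

Under this model only case B fails for an allow-list entry of DB.TBL, because the entry is stored as db.tbl and then compared case sensitively.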

in v5.3.1

| Case | case sensitive in task | lower_case_table_names in source | allow list |
|---|---|---|---|
| E | false | 1 | DB.TBL |
| F | false | 1 | db.tbl |
| G | true | 1 | DB.TBL |
| H | true | 0 | DB.TBL |

If we're going to use lower_case_table_names in source and deprecate case sensitive in task, we check what we can do for those cases:

A: if we can upgrade the value of "case sensitive in source" stored in etcd, and upgrade the lowercase allow-list in etcd by checking the upstream table names, compatibility is not broken (provided the user follows a consistent naming convention, i.e. no mixed cases).
B: BREAKING COMPATIBILITY, because the user can no longer use a case-insensitive allow-list for a case-sensitive source. Even if we correct the metadata in etcd when upgrading, they may still start a new task, or stop and then start a task, using local files. Maybe we can add precheck logic for a wrong case-sensitive setting?
C, D, E, F, G, H: no revision needed

@niubell
Contributor

niubell commented Apr 27, 2022

/assign lance6716

@lance6716
Contributor Author

introduced by pingcap/dm#2055
