Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: fix utf8 charset upgrade compatibility #9820

Merged
merged 24 commits into from
Mar 25, 2019

Conversation

crazycs520
Copy link
Contributor

@crazycs520 crazycs520 commented Mar 20, 2019

What problem does this PR solve?

In TiDB v2.0.8

tidb > create table t (a varchar(10)) charset=utf8mb4;
Query OK, 0 rows affected
tidb > insert into t values(x'f09f8c80');
Query OK, 1 row affected
$ curl "http://$IP:10080/schema/test/t"
...
"cols": [
      {
          ...
           "name": {
               "L": "a",
               "O": "a"
          },
        ...
           "type": {
               "Charset": "utf8",
               "Collate": "utf8_bin",

In TiDB v2.0.8, create table with specified table charset, the column charset won't use the table charset, but use UTF8 charset. It is a bug, and already fixed.

Then, after upgrade to TiDB Master,

tidb > show create table t
+-------+--------------------------------------------------------------------+
| Table | Create Table                                                       |
+-------+--------------------------------------------------------------------+
| t     | CREATE TABLE `t` (                                                 |
|       |   `a` varchar(10) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL |
|       | ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin        |
+-------+--------------------------------------------------------------------+

tidb > insert into t values(x'f09f8c80');
(1366, u'incorrect utf8 value f09f8c80(\U0001f300) for column a')

TiDB master have tidb_check_mb4_value_in_utf8 check, so insert x'f09f8c80' to utf8 column will return error.

For the client that upgrade from v2.0.8 to master, It will look like not compatibility. Because insert `x'f09f8c80' will successful in TiDB v2.0.8 but failed in TiDB master.

What is changed and how it works?

Add treat-old-version-utf8-as-utf8mb4 variable to toml config file and session variable (actual is system scope variable) and set default is true.

treat-old-version-utf8-as-utf8mb4 use for upgrade compatibility. Set to true will tread old version table/column UTF8 charset as UTF8MB4.

How to judge the table/column is new or old version?

  • Use tableInfo.Version to judge table is new or old version.
  • Use ColumnInfo.Version to judge column is new or old version.

This PR also increase the table and column version, Please merge parser PR: pingcap/parser#254 first.

drawback
The treat-old-version-utf8-as-utf8mb4 variable is not friendly for some user that specially specified the column charset to utf8.

another
Maybe a better way to fix this compatibility problem is to rebuild table info schema charset by get all ddl histofy and parse it and rebuild for charset. This may spend a little long time and will change the stored table schema data.

Check List

Tests

  • Unit test

Code changes

  • Has exported function/method change

Side effects

Related changes

  • Need to cherry-pick to the release branch 2.1

@codecov
Copy link

codecov bot commented Mar 20, 2019

Codecov Report

Merging #9820 into master will decrease coverage by 0.0353%.
The diff coverage is 47.2222%.

@@               Coverage Diff                @@
##             master      #9820        +/-   ##
================================================
- Coverage   67.3615%   67.3261%   -0.0354%     
================================================
  Files           383        383                
  Lines         80353      80370        +17     
================================================
- Hits          54127      54110        -17     
- Misses        21397      21416        +19     
- Partials       4829       4844        +15

table/column.go Outdated Show resolved Hide resolved
ddl/column.go Show resolved Hide resolved
@@ -743,6 +743,17 @@ func (h settingsHandler) ServeHTTP(w http.ResponseWriter, req *http.Request) {
return
}
}
if treadOldVersionUTF8AsUTF8MB4 := req.Form.Get("treat-old-version-utf8-as-utf8mb4"); treadOldVersionUTF8AsUTF8MB4 != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a session variable to set this config.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the "session variable" actually is system scope, not session.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete the session variable for need reload schema after change this variable.

config/config.go Outdated Show resolved Hide resolved
config/config.go Outdated Show resolved Hide resolved
@@ -131,6 +131,9 @@ const (

// TiDBCheckMb4ValueInUtf8 is used to control whether to enable the check wrong utf8 value.
TiDBCheckMb4ValueInUtf8 = "tidb_check_mb4_value_in_utf8"

// TiDBCheckMb4ValueInUtf8 is used to control whether to enable the check wrong utf8 value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix this comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

config/config.go Outdated Show resolved Hide resolved
table/column.go Outdated
@@ -183,6 +183,12 @@ func CastValue(ctx sessionctx.Context, val types.Datum, col *model.ColumnInfo) (
}
str := casted.GetString()
utf8Charset := col.Charset == mysql.UTF8Charset
doUTF8Check := utf8Charset && config.GetGlobalConfig().CheckMb4ValueInUtf8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be doMB4CharCheck

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link
Contributor

@winkyao winkyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reset LGTM

@winkyao winkyao added the priority/release-blocker This issue blocks a release. Please solve it ASAP. label Mar 25, 2019
@bb7133
Copy link
Member

bb7133 commented Mar 25, 2019

LGTM

xiekeyi98
xiekeyi98 previously approved these changes Mar 25, 2019
Copy link
Contributor

@xiekeyi98 xiekeyi98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@xiekeyi98 xiekeyi98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@crazycs520
Copy link
Contributor Author

/run-all-tests

@xiekeyi98 xiekeyi98 added the status/LGT3 The PR has already had 3 LGTM. label Mar 25, 2019
@crazycs520
Copy link
Contributor Author

/run-all-tests

Copy link
Contributor

@zimulala zimulala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to update the column's or table's version when we executing AlterTableCharsetAndCollate or Change/Modify column? Because we need to add a tool to update old TiDB table versions to the new version.

@winkyao
Copy link
Contributor

winkyao commented Mar 25, 2019

@zimulala Not necessary, we can just change the charset to utf8mb4, and keep the table version unchanged, that is ok.

Copy link
Contributor

@zimulala zimulala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use TableInfoVersion2 to check, but we don't set table versions to TableInfoVersion2 or CurrLatestTableInfoVersion

@winkyao winkyao merged commit c451f00 into pingcap:master Mar 25, 2019
crazycs520 added a commit to crazycs520/tidb that referenced this pull request Mar 25, 2019
@you06 you06 added the sig/sql-infra SIG: SQL Infra label Mar 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority/release-blocker This issue blocks a release. Please solve it ASAP. sig/sql-infra SIG: SQL Infra status/LGT3 The PR has already had 3 LGTM. type/compatibility
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants