Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: change default charset and collation from 'utf8 utf8_bin' to 'utf8mb4 utf8mb4_bin' #7965

Merged
merged 17 commits into from
Nov 9, 2018
Merged

Conversation

winkyao
Copy link
Contributor

@winkyao winkyao commented Oct 19, 2018

What problem does this PR solve?

fix #7920.

Change TiDB default charset and collation to "utf8mb4 utf8mb4_bin", TiDB treat all the data as utf8mb4 actually, but the previous default charset is "utf8", insert the 4 bytes unicode string into TiDB will be ok, but if we use mysqldump to restore the data back into mysql, the charset will be utf8, and it will report an error ERROR 1366 (HY000): Incorrect string value: '\xF0\xA4\x8B\xAE' for column 'v' at row 1.

how it works?

  1. change mysql.DefaultCharset from UTF8Charset to UTF8MB4Charset.
  2. change mysql.DefaultCollationName from UTF8DefaultCollation to UTF8MB4DefaultCollation.
  3. Find all places that used charset.CharsetUTF8 and CollationUTF8, modify them to charset.CharsetUTF8MB4 or mysql.DefaultCharset.

Then fix corresponding test cases.

Check List

Tests

  • exits test cases

Code changes

  • Has exported variable/fields change

Related changes

  • Need to update the documentation
  • Need to be included in the release note

@winkyao winkyao added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Oct 19, 2018
@winkyao winkyao mentioned this pull request Oct 19, 2018
3 tasks
@@ -365,8 +365,8 @@ func DefaultTypeForValue(value interface{}, tp *FieldType) {
// TODO: tp.Flen should be len(x) * 3 (max bytes length of CharsetUTF8)
tp.Flen = len(x)
tp.Decimal = UnspecifiedLength
tp.Charset = mysql.DefaultCharset
tp.Collate = mysql.DefaultCollationName
tp.Charset = charset.CharsetUTF8MB4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use GetDefaultCharsetAndCollate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes sense to me for this PR. My PR will change it back though :)

Copy link
Contributor

@gregwebs gregwebs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. However, I think we are treading on very dangerous ground with change until we just make utf8mb4 the default: #7757

@morgo morgo mentioned this pull request Oct 19, 2018
69 tasks
@winkyao
Copy link
Contributor Author

winkyao commented Oct 22, 2018

@shenli PTAL

@winkyao
Copy link
Contributor Author

winkyao commented Nov 5, 2018

The ci will be fixed after pingcap/parser#13 merged.

@winkyao winkyao removed the status/DNM label Nov 5, 2018
@winkyao
Copy link
Contributor Author

winkyao commented Nov 8, 2018

/run-all-tests

@winkyao
Copy link
Contributor Author

winkyao commented Nov 8, 2018

/run-all-tests -tidb-test=pr/646

@winkyao
Copy link
Contributor Author

winkyao commented Nov 9, 2018

/run-integration-ddl-test -tidb-test=pr/646

@winkyao
Copy link
Contributor Author

winkyao commented Nov 9, 2018

/run-sqllogic-test -tidb-test=pr/646

@winkyao
Copy link
Contributor Author

winkyao commented Nov 9, 2018

/run-integration-ddl-test -tidb-test=pr/646

Copy link
Contributor

@XuHuaiyu XuHuaiyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

a.RetTp.Charset = charset.CharsetUTF8
a.RetTp.Collate = charset.CollationUTF8
a.RetTp.Charset = charset.CharsetUTF8MB4
a.RetTp.Collate = charset.CollationUTF8MB4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/charset.CharsetUTF8MB4/mysql.DefaultCharset/
s/charset.CollationUTF8MB4/mysql.DefaultCollationName/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or: charset.GetDefaultCharsetAndCollate()

Flag: mysql.BinaryFlag,
}
}
if mysql.HasBinaryFlag(fieldType.Flag) && fieldType.Tp != mysql.TypeJSON {
fieldType.Charset, fieldType.Collate = charset.CharsetBin, charset.CollationBin
} else {
fieldType.Charset, fieldType.Collate = charset.CharsetUTF8, charset.CharsetUTF8
fieldType.Charset, fieldType.Collate = mysql.DefaultCharset, mysql.DefaultCollationName
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about: charset.GetDefaultCharsetAndCollate()

@@ -199,7 +199,7 @@ func (b *baseBuiltinFunc) getRetTp() *types.FieldType {
b.tp.Tp = mysql.TypeMediumBlob
}
if len(b.tp.Charset) <= 0 {
b.tp.Charset, b.tp.Collate = charset.CharsetUTF8, charset.CollationUTF8
b.tp.Charset, b.tp.Collate = mysql.DefaultCharset, mysql.DefaultCollationName
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Charset: charset.CharsetUTF8,
Collate: charset.CollationUTF8,
Charset: charset.CharsetUTF8MB4,
Collate: charset.CollationUTF8MB4,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -1741,7 +1741,7 @@ func WrapWithCastAsString(ctx sessionctx.Context, expr Expression) Expression {
argLen = mysql.MaxIntWidth
}
tp := types.NewFieldType(mysql.TypeVarString)
tp.Charset, tp.Collate = charset.CharsetUTF8, charset.CollationUTF8
tp.Charset, tp.Collate = charset.CharsetUTF8MB4, charset.CollationUTF8MB4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -141,15 +141,15 @@ func newBaseBuiltinFuncWithTp(ctx sessionctx.Context, args []Expression, retType
Tp: mysql.TypeJSON,
Flen: mysql.MaxBlobWidth,
Decimal: 0,
Charset: charset.CharsetUTF8,
Collate: charset.CollationUTF8,
Charset: mysql.DefaultCharset,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not charset.CharsetUTF8MB4?

Charset: charset.CharsetUTF8,
Collate: charset.CollationUTF8,
Charset: mysql.DefaultCharset,
Collate: mysql.DefaultCollationName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not charset.CollationUTF8MB4

Flag: mysql.BinaryFlag,
}
}
if mysql.HasBinaryFlag(fieldType.Flag) && fieldType.Tp != mysql.TypeJSON {
fieldType.Charset, fieldType.Collate = charset.CharsetBin, charset.CollationBin
} else {
fieldType.Charset, fieldType.Collate = charset.CharsetUTF8, charset.CharsetUTF8
fieldType.Charset, fieldType.Collate = mysql.DefaultCharset, mysql.DefaultCollationName
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -199,7 +199,7 @@ func (b *baseBuiltinFunc) getRetTp() *types.FieldType {
b.tp.Tp = mysql.TypeMediumBlob
}
if len(b.tp.Charset) <= 0 {
b.tp.Charset, b.tp.Collate = charset.CharsetUTF8, charset.CollationUTF8
b.tp.Charset, b.tp.Collate = mysql.DefaultCharset, mysql.DefaultCollationName
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

if mysql.HasBinaryFlag(rhs.Flag) || !types.IsNonBinaryStr(lhs) {
resultFieldType.Flag |= mysql.BinaryFlag
}
} else if types.IsBinaryStr(lhs) || types.IsBinaryStr(rhs) || !evalType.IsStringKind() {
types.SetBinChsClnFlag(resultFieldType)
} else {
resultFieldType.Charset, resultFieldType.Collate, resultFieldType.Flag = charset.CharsetUTF8, charset.CollationUTF8, 0
resultFieldType.Charset, resultFieldType.Collate, resultFieldType.Flag = mysql.DefaultCharset, mysql.DefaultCollationName, 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -164,7 +164,7 @@ func (c *caseWhenFunctionClass) getFunction(ctx sessionctx.Context, args []Expre
}
fieldTp.Decimal, fieldTp.Flen = decimal, flen
if fieldTp.EvalType().IsStringKind() && !isBinaryStr {
fieldTp.Charset, fieldTp.Collate = mysql.DefaultCharset, mysql.DefaultCollationName
fieldTp.Charset, fieldTp.Collate = charset.CharsetUTF8MB4, charset.CollationUTF8MB4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/cc @XuHuaiyu , use mysql.DefaultCharset or charset.CharsetUTF8MB4?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will mysql.DefaultCharset change? If so, we should we utf8mb4.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@XuHuaiyu, DefaultCharset is better. It hides the implementation details, once we change the default charset again, the code modification can be minimized.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, this place should be utf8mb4, if we did not set the charset, it can be default charset, but in here, it should definitely be utf8mb4.

@winkyao
Copy link
Contributor Author

winkyao commented Nov 9, 2018

@zz-jason @crazycs520 PTAL

executor/show.go Outdated
@@ -618,7 +618,7 @@ func (e *ShowExec) fetchShowCreateTable() error {
buf.WriteString(") ENGINE=InnoDB")
charsetName := tb.Meta().Charset
if len(charsetName) == 0 {
charsetName = charset.CharsetUTF8
charsetName = charset.CharsetUTF8MB4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/charset.CharsetUTF8MB4/mysql.DefaultCharset/

@@ -81,19 +81,19 @@ func inferType4ControlFuncs(lhs, rhs *types.FieldType) *types.FieldType {
}
}
if types.IsNonBinaryStr(lhs) && !types.IsBinaryStr(rhs) {
resultFieldType.Charset, resultFieldType.Collate, resultFieldType.Flag = charset.CharsetUTF8, charset.CollationUTF8, 0
resultFieldType.Charset, resultFieldType.Collate, resultFieldType.Flag = charset.CharsetUTF8MB4, charset.CollationUTF8MB4, 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use mysql.DefaultCharset, mysql.DefaultCollationName instead.

cs = mysql.DefaultCharset
cl = mysql.DefaultCollationName
cs = charset.CharsetUTF8MB4
cl = charset.CollationUTF8MB4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cs, cl = charset.GetDefaultCharsetAndCollate()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

mCharset = mysql.DefaultCharset
mCollation = mysql.DefaultCollationName
mCharset = charset.CharsetUTF8MB4
mCollation = charset.CollationUTF8MB4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

charset.GetDefaultCharsetAndCollate()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here should be definitely utf8mb4

server/server.go Outdated
@@ -153,6 +153,7 @@ func NewServer(cfg *config.Config, driver IDriver) (*Server, error) {
var err error
if cfg.Socket != "" {
if s.listener, err = net.Listen("unix", cfg.Socket); err == nil {
// job.SnapshotVer == 0 means
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor

@crazycs520 crazycs520 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reset LGTM

Copy link
Member

@zz-jason zz-jason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zz-jason zz-jason added the status/LGT2 Indicates that a PR has LGTM 2. label Nov 9, 2018
@zz-jason zz-jason merged commit 29f14d4 into pingcap:master Nov 9, 2018
@winkyao winkyao deleted the charset_mb4 branch November 9, 2018 11:03
winkyao added a commit that referenced this pull request Dec 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note Denotes a PR that will be considered when it comes time to generate release notes. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Change default charset and collation from utf8 to utf8mb4
6 participants