MySQL -- utf8 - utf8mb4

通過一個emoji表情引發(fā)的報錯來認(rèn)識下MySQL中獨一無二的utf8mb4

1.發(fā)現(xiàn)問題

java.sql.SQLException: Incorrect string value: '\xF0\x9F\x98\xAB' for column 'name' at row 1
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:964)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2487)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
    at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1197)
    at com.example.demo.tst.Three.main(Three.java:51)

碰到這種四個字節(jié)(\xF0\x9F\x98\xAB)的報錯,說明字符集有問題。

2. 分析問題

第一篇文章中,我們就說過mysql中的字符集轉(zhuǎn)換是怎么玩的,那就一個一個來看下哪里的問題。

表結(jié)構(gòu):

mysql> show create table account;
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table   | Create Table
                                                                                                                                  |
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| account | CREATE TABLE `account` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT '自增id',
  `name` varchar(100) CHARACTER SET utf8mb4 DEFAULT NULL,
  `balance` int(11) DEFAULT NULL COMMENT '余額',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8     |
+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
  • 先看庫中字段的字符集
mysql> show full columns from account;
+---------+--------------+-----------------+------+-----+---------+----------------+---------------------------------+----------+
| Field   | Type         | Collation       | Null | Key | Default | Extra          | Privileges                      | Comment  |
+---------+--------------+-----------------+------+-----+---------+----------------+---------------------------------+----------+
| id      | int(11)      | NULL            | NO   | PRI | NULL    | auto_increment | select,insert,update,references | 自增id   |
| name    | varchar(100) | utf8_general_ci | YES  |     | NULL    |                | select,insert,update,references |          |
| balance | int(11)      | NULL            | YES  |     | NULL    |                | select,insert,update,references | 余額     |
+---------+--------------+-----------------+------+-----+---------+----------------+---------------------------------+----------+
3 rows in set (0.01 sec)

上面報錯的字段是name,這里其比較規(guī)則為utf8_general_ci,我們知道這是utf8 的比較規(guī)則,MySQL中的utf8字符集實際是utf8mb3,最大只能放三個字節(jié)。
那就改唄,四個字節(jié)的utf8 -- utf8mb4

MySQL版本5.1.13以后開始支持utf8mb4

3.解決問題

  • 檢查列字符集:
mysql> alter table account modify name varchar(100) character set utf8mb4 collate utf8mb4_general_ci;
Query OK, 1 row affected (0.76 sec)
Records: 1  Duplicates: 0  Warnings: 0
  • 檢查MySQL服務(wù)端那三個字符集
mysql> show variables like 'character_set_%';
+--------------------------+---------------------------------------+
| Variable_name            | Value                                 |
+--------------------------+---------------------------------------+
| character_set_client     | utf8                               |
| character_set_connection | utf8                               |
| character_set_database   | utf8                                  |
| character_set_filesystem | binary                                |
| character_set_results    | utf8                               |
| character_set_server     | utf8                                  |
| character_set_system     | utf8                                  |
| character_sets_dir       | E:\MySQL\mysql-5.7.26\share\charsets\ |
+--------------------------+---------------------------------------+
8 rows in set, 1 warning (0.00 sec)

啥?居然都是utf8,沒的說,改:

mysql> set names utf8mb4;
Query OK, 0 rows affected (0.00 sec)

再查看下:

mysql> show variables like 'character_set_%';
+--------------------------+---------------------------------------+
| Variable_name            | Value                                 |
+--------------------------+---------------------------------------+
| character_set_client     | utf8mb4                               |
| character_set_connection | utf8mb4                               |
| character_set_database   | utf8                                  |
| character_set_filesystem | binary                                |
| character_set_results    | utf8mb4                               |
| character_set_server     | utf8                                  |
| character_set_system     | utf8                                  |
| character_sets_dir       | E:\MySQL\mysql-5.7.26\share\charsets\ |
+--------------------------+---------------------------------------+
8 rows in set, 1 warning (0.00 sec)

嗯,改回來了~

  • 檢查客戶端發(fā)送字符集(這步有點坑~)

上面兩步都能理解,這一步可能有點不明白為啥還要再來一個字符集?
其實,一個字符從出生(發(fā)送)到安全落庫(落盤),經(jīng)歷的字符集可能有點多,但其第一個字符集就是這個所謂的客戶端發(fā)送字符集。

雖然要經(jīng)過這么多的字符集轉(zhuǎn)換,但并不會有那么多的轉(zhuǎn)換,因為實際情況下會將其中幾個字符集都設(shè)為一致~

我們是通過Java代碼連接MySQL,這里可以在Connector/J中指定(省略其他配置):

 jdbc:mysql://localhost:3306/asia?characterEncoding=utf8mb4&useUnicode=true

不過,你會發(fā)現(xiàn)這樣啟動會報錯告訴你不支持這么寫。。
這里先貼下官方說明:

Setting the Character Encoding

The character encoding between client and server is automatically detected upon connection (provided that the Connector/J connection properties characterEncoding and connectionCollation are not set). You specify the encoding on the server using the system variable character_set_server (for more information, see Server Character Set and Collation). The driver automatically uses the encoding specified by the server. For example, to use the 4-byte UTF-8 character set with Connector/J, configure the MySQL server with character_set_server=utf8mb4, and leave characterEncoding and connectionCollation out of the Connector/J connection string. Connector/J will then autodetect the UTF-8 setting.

MySQL告訴我們?nèi)缛粢褂?個字節(jié)的utf8,那你需要做兩件事:

  • MySQL服務(wù)端配置character_set_server=utf8mb4
  • 不要設(shè)置characterEncoding and connectionCollation屬性

網(wǎng)上還有一種說法:

Connector/J did not support utf8mb4 for servers 5.5.2 and newer.

Connector/J now auto-detects servers configured with character_set_server=utf8mb4 or treats the Java encoding utf-8 passed usingcharacterEncoding=... as utf8mb4 in the SET NAMES= calls it makes when establishing the connection. (Bug #54175)

這個意思是Connector/J并不支持utf8mb4,不過有兩種方式等價:

  • MySQL服務(wù)器中配置加上character_set_server=utf8mb4
  • Connector/J中加上characterEncoding=utf8

本文選擇相信官方文檔,那個Bug #54175貼出來也是求同存異, 也可能是我理解錯了[狗頭]


我這里就直接改庫中字符集了:

mysql> set character_set_server=utf8mb4;
Query OK, 0 rows affected (0.00 sec)

嗯,上面吹了這么多牛逼,結(jié)果發(fā)現(xiàn)還是不行:

java.sql.SQLException: Incorrect string value: '\xF0\x9F\x98\xAB' for column 'name' at row 1
    at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:964)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3973)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3909)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2527)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2680)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2487)
    at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
    at com.mysql.jdbc.PreparedStatement.execute(PreparedStatement.java:1197)
    at com.example.demo.tst.Three.main(Three.java:51)

原因很簡單,需要重啟,不過重啟讀的是配置文件中的配置,本次會話手動設(shè)置的字符集將全部失效。
所以建議寫在MySQL配置文件中 my.iniwindows系統(tǒng))
綜上的配置如下(省略其他配置):

[client]
# 服務(wù)端的三個字符集,效果等同 set NAMES utf8mb4
default-character-set=utf8mb4

[mysqld]
# 指定服務(wù)器系統(tǒng)字符集
character-set-server=utf8mb4
collation-server=utf8mb4_general_ci

這里的比較規(guī)則是 ci(默認(rèn)是這種比較規(guī)則),是不區(qū)分大小寫的,如果需要可以改成utf8mb4_bin

再把服務(wù)重啟一哈:

C:\Windows\system32>net stop mysql
MySQL 服務(wù)正在停止..
MySQL 服務(wù)已成功停止。

C:\Windows\system32>
C:\Windows\system32>net start mysql
MySQL 服務(wù)正在啟動 ..
MySQL 服務(wù)已經(jīng)啟動成功。

代碼再執(zhí)行就執(zhí)行成功了,查詢看看:

mysql> select * from account;
+----+------+---------+
| id | name | balance |
+----+------+---------+
|  1 | ??     |      10 |
+----+------+---------+
1 row in set (0.00 sec)

這個表情真丑。。

這里還可以玩一個東西,我們把服務(wù)端的三個字符集改成utf8,再查詢看看:

mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)

mysql>
mysql> select * from account;
+----+------+---------+
| id | name | balance |
+----+------+---------+
|  1 | ?    |      10 |
+----+------+---------+
1 row in set (0.00 sec)

還可以在這個時候來把列name的字符集改回utf8試試:

mysql> alter table account modify name varchar(100) character set utf8 collate utf8_general_ci;
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x98\xAB' for column 'name' at row 2

挺好玩吧~~~


番外

  • 本地開發(fā)環(huán)境測試的時候發(fā)現(xiàn),當(dāng)配置文件中加了這個配置character-set-server=utf8mb4,那么無論其他的字符集你怎么玩,連接怎么設(shè)置,你的emoji表情都能正常入庫,而生產(chǎn)環(huán)境不行。

推論: character-set-server這個字符集的作用可能不止于指定發(fā)送字符集為utf8mb4, 甚至猜測不能指定發(fā)送的字符集

如果有大佬知道的,還望留言告知,萬謝!

  • 鑒于以上情況,測試環(huán)境characterEncoding=utf8 此配置基本無用.
  • 對于生產(chǎn)環(huán)境不太好修改庫設(shè)置的,可以在數(shù)據(jù)源中增加 initSql,以druid配置為例:
dataSource = new DruidDataSource();
// 省略其他配置
List<String> sqlList = new ArrayList<>();
sqlList.add("SET NAMES utf8mb4");
dataSource.setConnectionInitSqls(sqlList);

每次連接都會去設(shè)置字符集為uft8mb4,生產(chǎn)環(huán)境可行。

  • 最后捋一捋思路
    • 庫中字段字符集為utf8mb4
    • 連接的字符集要為utf8mb4, 就是連接過程中的那三個, 通過SET NAMES UTF8MB4指定,這一步可以走庫配置也可以走數(shù)據(jù)源設(shè)置,具體見上文
    • 發(fā)送字符集要為utf8mb4,也就是這個配置character-set-server=utf8mb4

按照這個思路,BMP中的字符保管你全部都能存。

BMP是啥? 我下篇文章就寫寫Unicode吧,等我~

最后編輯于
?著作權(quán)歸作者所有,轉(zhuǎn)載或內(nèi)容合作請聯(lián)系作者
【社區(qū)內(nèi)容提示】社區(qū)部分內(nèi)容疑似由AI輔助生成,瀏覽時請結(jié)合常識與多方信息審慎甄別。
平臺聲明:文章內(nèi)容(如有圖片或視頻亦包括在內(nèi))由作者上傳并發(fā)布,文章內(nèi)容僅代表作者本人觀點,簡書系信息發(fā)布平臺,僅提供信息存儲服務(wù)。

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容