MySQL Knowledge Sharing, Part 1: utf8 Character Sets and Collations

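We often see names like these in a database: utf8mb4 and utf8, utf8mb4_unicode_ci, utf8mb4_general_ci, utf8mb4_bin. What does each of them mean?

They denote character sets and collations.

Character set: the set of encodings that defines how characters are represented in the database.

Collation: the set of rules that defines how strings are compared.

The relationship is one-to-many: one character set can have several collations.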

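1. MySQL supports several Unicode character sets

utf8mb4: a UTF-8 encoding of the Unicode character set that uses 1 to 4 bytes per character. The mb4 suffix indicates that a character takes at most 4 bytes.

utf8mb3: a UTF-8 encoding of the Unicode character set that uses 1 to 3 bytes per character. This character set is deprecated in MySQL 8.0; use utf8mb4 instead.

ucs2: the UCS-2 encoding of the Unicode character set, using 2 bytes per character. It is deprecated as of MySQL 8.0.28, and you should expect support for it to be removed in a future release.

utf16: the UTF-16 encoding of the Unicode character set, using 2 or 4 bytes per character. It is like ucs2, with an extension for supplementary characters.

utf16le: the UTF-16LE encoding of the Unicode character set. It is like utf16, but little-endian rather than big-endian.

utf32: the UTF-32 encoding of the Unicode character set, using 4 bytes per character.

utf8: an alias for utf8mb3. It saves some space but cannot represent every character that UTF-8 can encode (emoji, for example), so MySQL's utf8 is not full UTF-8. In MySQL 8.0 this alias is deprecated in favor of utf8mb4, and utf8 is expected to become an alias for utf8mb4 in a future release. As of MySQL 8.0.28, utf8mb3 is shown in place of utf8 in the columns of Information Schema tables and in the output of SQL SHOW statements.

utf8mb4, utf16, utf16le, and utf32 support BMP (Basic Multilingual Plane) characters as well as supplementary characters that lie outside the BMP. utf8mb3 and ucs2 support only BMP characters.

utf8mb4 is a superset of utf8mb3.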
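
The byte counts above can be checked outside MySQL. The following is a minimal sketch that uses Python's built-in codecs as a stand-in for the MySQL character sets; the helper names byte_lengths and is_bmp and the sample characters are made up for this illustration. It shows why a 3-byte limit, as in utf8mb3, is enough for BMP characters but not for an emoji.

```python
# Illustration only: Python codecs stand in for MySQL's character sets.
SAMPLES = {
    "A": "Latin letter (BMP)",
    "\u00e9": "accented Latin letter (BMP)",
    "\u4e2d": "CJK ideograph (BMP)",
    "\U0001F600": "emoji (supplementary, outside the BMP)",
}


def byte_lengths(ch):
    """Return the encoded size of one character in UTF-8, UTF-16 and UTF-32."""
    return {
        "utf8": len(ch.encode("utf-8")),
        "utf16": len(ch.encode("utf-16-be")),  # big-endian, no byte order mark
        "utf32": len(ch.encode("utf-32-be")),
    }


def is_bmp(ch):
    """BMP characters have code points in the range U+0000..U+FFFF."""
    return ord(ch) <= 0xFFFF


for ch, label in SAMPLES.items():
    # A character needs 4 bytes in UTF-8 exactly when it lies outside the BMP.
    print(f"U+{ord(ch):04X} {label}: {byte_lengths(ch)}, BMP={is_bmp(ch)}")
```

In this sketch only the emoji needs 4 bytes in UTF-8, and it is also the only sample that needs 4 bytes (a surrogate pair) in UTF-16.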

Recommendation

To avoid ambiguity about what utf8 means, consider specifying utf8mb4 explicitly wherever you reference a character set.

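2. Collations

Most Unicode character sets have a general collation (indicated by _general in the name, or by the absence of a language specifier), a binary collation (indicated by _bin in the name), and several language-specific collations (indicated by a language specifier).

For example, for utf8mb4, utf8mb4_general_ci and utf8mb4_bin are its general and binary collations, and utf8mb4_danish_ci is one of its language-specific collations. utf8mb4_unicode_ci carries no language specifier: it follows the Unicode Collation Algorithm and is not tied to one language.

Most character sets have a single binary collation. utf8mb4 is an exception with two: utf8mb4_bin and utf8mb4_0900_bin (available as of MySQL 8.0.17). These two binary collations have the same sort order but differ in their pad attribute and collating weight characteristics.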
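
The pad attribute decides whether trailing spaces matter in a comparison. In the show collation output further down, utf8mb4_bin is listed as PAD SPACE and utf8mb4_0900_bin as NO PAD. The sketch below only mimics that one difference with plain Python string comparison; it does not model collating weights, and the function names equal_pad_space and equal_no_pad are hypothetical.

```python
def equal_pad_space(a, b):
    """Mimic a PAD SPACE comparison: trailing spaces are ignored."""
    return a.rstrip(" ") == b.rstrip(" ")


def equal_no_pad(a, b):
    """Mimic a NO PAD comparison: trailing spaces count like any other character."""
    return a == b


# The same pair of values compares differently under the two pad attributes.
print(equal_pad_space("abc", "abc  "))
print(equal_no_pad("abc", "abc  "))
```

Under the PAD SPACE rule the two values are treated as equal, while under the NO PAD rule they are not. That difference can matter for unique indexes and equality lookups on values that end in spaces.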

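utf8mb4_general_ci versus utf8mb4_unicode_ci

Accuracy:

utf8mb4_unicode_ci sorts and compares according to the Unicode Collation Algorithm, so it orders text accurately across a wide range of languages.

utf8mb4_general_ci does not implement the Unicode Collation Algorithm, so for some languages or characters its results may differ from the expected order. In the vast majority of cases, however, the order of those special characters does not need to be that precise.

Performance:

utf8mb4_general_ci is faster at comparing and sorting.

utf8mb4_unicode_ci uses a somewhat more complex algorithm so that it can handle special cases (for example, expansions such as ß comparing equal to ss). In the vast majority of comparisons, though, those complex cases do not arise.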

utf8mb4_0900_ai_ci and utf8mb4_bin

utf8mb4_0900_ai_ci: 0900 refers to version 9.0.0 of the Unicode Collation Algorithm, ai means accent-insensitive, and ci means case-insensitive. Conversely, as means accent-sensitive and cs means case-sensitive.
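
To make the ai/as and ci/cs suffixes concrete, here is a rough approximation in Python. It is not MySQL's algorithm, which compares per-character weights defined by the Unicode Collation Algorithm. This sketch simply strips combining accent marks and case-folds the text before comparing, and the names strip_accents, comparison_key, and equal are assumptions made for the example.

```python
import unicodedata


def strip_accents(s):
    """Decompose characters (NFD) and drop the combining accent marks."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def comparison_key(s, accent_sensitive, case_sensitive):
    """Build a rough comparison key for the chosen sensitivity flags."""
    if not accent_sensitive:
        s = strip_accents(s)
    if not case_sensitive:
        s = s.casefold()
    return unicodedata.normalize("NFC", s)


def equal(a, b, accent_sensitive=False, case_sensitive=False):
    """Compare two strings; the defaults approximate an _ai_ci collation."""
    return (comparison_key(a, accent_sensitive, case_sensitive)
            == comparison_key(b, accent_sensitive, case_sensitive))


word_a, word_b = "resume", "R\u00e9sum\u00e9"
print(equal(word_a, word_b))                         # approximates _ai_ci
print(equal(word_a, word_b, accent_sensitive=True))  # approximates _as_ci
print(equal("a", "A", True, True))                   # approximates _as_cs
```

With the default flags the two spellings compare as equal. Turning on accent sensitivity separates them, and turning on case sensitivity separates "a" from "A".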

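utf8mb4_bin: compares and sorts strings by the binary (code point) value of each character, so it is both case-sensitive and accent-sensitive. Columns that use it still hold utf8mb4 text; raw byte strings belong in the binary character set instead.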
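
The effect of binary ordering is easiest to see on a small word list. The sketch below sorts by code point values, which is the idea behind a _bin collation, and contrasts it with a simple case-folded sort. The case-folded sort is only a stand-in for a case-insensitive collation, not MySQL's actual weights, and the function names and sample words are invented for the example.

```python
WORDS = ["banana", "Apple", "apple", "\u00c4pfel", "Zebra"]


def binary_order(items):
    """Sort by code point values, the idea behind a _bin collation."""
    return sorted(items, key=lambda s: [ord(c) for c in s])


def caseless_order(items):
    """Rough stand-in for a case-insensitive collation, for contrast."""
    return sorted(items, key=str.casefold)


# Binary order puts every uppercase ASCII letter before every lowercase one.
print(binary_order(WORDS))
print(caseless_order(WORDS))
```

In binary order "Zebra" comes before "apple" because the code point of Z is lower than that of a. Readers of a sorted list rarely expect this, but it is exactly what strict, byte-exact matching calls for.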

As of MySQL 8.0, the default collation for utf8mb4 is utf8mb4_0900_ai_ci, not utf8mb4_general_ci as in earlier versions.

Recommendation

utf8mb4_0900_ai_ci (the default) or utf8mb4_bin

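3. How to inspect character sets and collations

First, list the supported character sets and collations

The set of supported character sets varies between MySQL versions. Use show charset and show collation to list them.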

mysql> show charset;
+----------+---------------------------------+---------------------+--------+
| Charset  | Description                     | Default collation   | Maxlen |
+----------+---------------------------------+---------------------+--------+
| armscii8 | ARMSCII-8 Armenian              | armscii8_general_ci |      1 |
| ascii    | US ASCII                        | ascii_general_ci    |      1 |
| big5     | Big5 Traditional Chinese        | big5_chinese_ci     |      2 |
| binary   | Binary pseudo charset           | binary              |      1 |
| cp1250   | Windows Central European        | cp1250_general_ci   |      1 |
| cp1251   | Windows Cyrillic                | cp1251_general_ci   |      1 |
| cp1256   | Windows Arabic                  | cp1256_general_ci   |      1 |
| cp1257   | Windows Baltic                  | cp1257_general_ci   |      1 |
| cp850    | DOS West European               | cp850_general_ci    |      1 |
| cp852    | DOS Central European            | cp852_general_ci    |      1 |
| cp866    | DOS Russian                     | cp866_general_ci    |      1 |
| cp932    | SJIS for Windows Japanese       | cp932_japanese_ci   |      2 |
| dec8     | DEC West European               | dec8_swedish_ci     |      1 |
| eucjpms  | UJIS for Windows Japanese       | eucjpms_japanese_ci |      3 |
| euckr    | EUC-KR Korean                   | euckr_korean_ci     |      2 |
| gb18030  | China National Standard GB18030 | gb18030_chinese_ci  |      4 |
| gb2312   | GB2312 Simplified Chinese       | gb2312_chinese_ci   |      2 |
| gbk      | GBK Simplified Chinese          | gbk_chinese_ci      |      2 |
| geostd8  | GEOSTD8 Georgian                | geostd8_general_ci  |      1 |
| greek    | ISO 8859-7 Greek                | greek_general_ci    |      1 |
| hebrew   | ISO 8859-8 Hebrew               | hebrew_general_ci   |      1 |
| hp8      | HP West European                | hp8_english_ci      |      1 |
| keybcs2  | DOS Kamenicky Czech-Slovak      | keybcs2_general_ci  |      1 |
| koi8r    | KOI8-R Relcom Russian           | koi8r_general_ci    |      1 |
| koi8u    | KOI8-U Ukrainian                | koi8u_general_ci    |      1 |
| latin1   | cp1252 West European            | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European     | latin2_general_ci   |      1 |
| latin5   | ISO 8859-9 Turkish              | latin5_turkish_ci   |      1 |
| latin7   | ISO 8859-13 Baltic              | latin7_general_ci   |      1 |
| macce    | Mac Central European            | macce_general_ci    |      1 |
| macroman | Mac West European               | macroman_general_ci |      1 |
| sjis     | Shift-JIS Japanese              | sjis_japanese_ci    |      2 |
| swe7     | 7bit Swedish                    | swe7_swedish_ci     |      1 |
| tis620   | TIS620 Thai                     | tis620_thai_ci      |      1 |
| ucs2     | UCS-2 Unicode                   | ucs2_general_ci     |      2 |
| ujis     | EUC-JP Japanese                 | ujis_japanese_ci    |      3 |
| utf16    | UTF-16 Unicode                  | utf16_general_ci    |      4 |
| utf16le  | UTF-16LE Unicode                | utf16le_general_ci  |      4 |
| utf32    | UTF-32 Unicode                  | utf32_general_ci    |      4 |
| utf8     | UTF-8 Unicode                   | utf8_general_ci     |      3 |
| utf8mb4  | UTF-8 Unicode                   | utf8mb4_0900_ai_ci  |      4 |
+----------+---------------------------------+---------------------+--------+
41 rows in set (0.01 sec)
 
mysql> show collation;
+----------------------------+----------+-----+---------+----------+---------+---------------+
| Collation                  | Charset  | Id  | Default | Compiled | Sortlen | Pad_attribute |
+----------------------------+----------+-----+---------+----------+---------+---------------+
| armscii8_bin               | armscii8 |  64 |         | Yes      |       1 | PAD SPACE     |
| armscii8_general_ci        | armscii8 |  32 | Yes     | Yes      |       1 | PAD SPACE     |
| ascii_bin                  | ascii    |  65 |         | Yes      |       1 | PAD SPACE     |
| ascii_general_ci           | ascii    |  11 | Yes     | Yes      |       1 | PAD SPACE     |
| big5_bin                   | big5     |  84 |         | Yes      |       1 | PAD SPACE     |
| big5_chinese_ci            | big5     |   1 | Yes     | Yes      |       1 | PAD SPACE     |
| binary                     | binary   |  63 | Yes     | Yes      |       1 | NO PAD        |
| cp1250_bin                 | cp1250   |  66 |         | Yes      |       1 | PAD SPACE     |
| cp1250_croatian_ci         | cp1250   |  44 |         | Yes      |       1 | PAD SPACE     |
| cp1250_czech_cs            | cp1250   |  34 |         | Yes      |       2 | PAD SPACE     |
| cp1250_general_ci          | cp1250   |  26 | Yes     | Yes      |       1 | PAD SPACE     |
| cp1250_polish_ci           | cp1250   |  99 |         | Yes      |       1 | PAD SPACE     |
| cp1251_bin                 | cp1251   |  50 |         | Yes      |       1 | PAD SPACE     |
| cp1251_bulgarian_ci        | cp1251   |  14 |         | Yes      |       1 | PAD SPACE     |
| cp1251_general_ci          | cp1251   |  51 | Yes     | Yes      |       1 | PAD SPACE     |
| cp1251_general_cs          | cp1251   |  52 |         | Yes      |       1 | PAD SPACE     |
| cp1251_ukrainian_ci        | cp1251   |  23 |         | Yes      |       1 | PAD SPACE     |
| cp1256_bin                 | cp1256   |  67 |         | Yes      |       1 | PAD SPACE     |
| cp1256_general_ci          | cp1256   |  57 | Yes     | Yes      |       1 | PAD SPACE     |
| cp1257_bin                 | cp1257   |  58 |         | Yes      |       1 | PAD SPACE     |
| cp1257_general_ci          | cp1257   |  59 | Yes     | Yes      |       1 | PAD SPACE     |
| cp1257_lithuanian_ci       | cp1257   |  29 |         | Yes      |       1 | PAD SPACE     |
| cp850_bin                  | cp850    |  80 |         | Yes      |       1 | PAD SPACE     |
| cp850_general_ci           | cp850    |   4 | Yes     | Yes      |       1 | PAD SPACE     |
| cp852_bin                  | cp852    |  81 |         | Yes      |       1 | PAD SPACE     |
| cp852_general_ci           | cp852    |  40 | Yes     | Yes      |       1 | PAD SPACE     |
| cp866_bin                  | cp866    |  68 |         | Yes      |       1 | PAD SPACE     |
| cp866_general_ci           | cp866    |  36 | Yes     | Yes      |       1 | PAD SPACE     |
| cp932_bin                  | cp932    |  96 |         | Yes      |       1 | PAD SPACE     |
| cp932_japanese_ci          | cp932    |  95 | Yes     | Yes      |       1 | PAD SPACE     |
| dec8_bin                   | dec8     |  69 |         | Yes      |       1 | PAD SPACE     |
| dec8_swedish_ci            | dec8     |   3 | Yes     | Yes      |       1 | PAD SPACE     |
| eucjpms_bin                | eucjpms  |  98 |         | Yes      |       1 | PAD SPACE     |
| eucjpms_japanese_ci        | eucjpms  |  97 | Yes     | Yes      |       1 | PAD SPACE     |
| euckr_bin                  | euckr    |  85 |         | Yes      |       1 | PAD SPACE     |
| euckr_korean_ci            | euckr    |  19 | Yes     | Yes      |       1 | PAD SPACE     |
| gb18030_bin                | gb18030  | 249 |         | Yes      |       1 | PAD SPACE     |
| gb18030_chinese_ci         | gb18030  | 248 | Yes     | Yes      |       2 | PAD SPACE     |
| gb18030_unicode_520_ci     | gb18030  | 250 |         | Yes      |       8 | PAD SPACE     |
| gb2312_bin                 | gb2312   |  86 |         | Yes      |       1 | PAD SPACE     |
| gb2312_chinese_ci          | gb2312   |  24 | Yes     | Yes      |       1 | PAD SPACE     |
| gbk_bin                    | gbk      |  87 |         | Yes      |       1 | PAD SPACE     |
| gbk_chinese_ci             | gbk      |  28 | Yes     | Yes      |       1 | PAD SPACE     |
| geostd8_bin                | geostd8  |  93 |         | Yes      |       1 | PAD SPACE     |
| geostd8_general_ci         | geostd8  |  92 | Yes     | Yes      |       1 | PAD SPACE     |
| greek_bin                  | greek    |  70 |         | Yes      |       1 | PAD SPACE     |
| greek_general_ci           | greek    |  25 | Yes     | Yes      |       1 | PAD SPACE     |
| hebrew_bin                 | hebrew   |  71 |         | Yes      |       1 | PAD SPACE     |
| hebrew_general_ci          | hebrew   |  16 | Yes     | Yes      |       1 | PAD SPACE     |
| hp8_bin                    | hp8      |  72 |         | Yes      |       1 | PAD SPACE     |
| hp8_english_ci             | hp8      |   6 | Yes     | Yes      |       1 | PAD SPACE     |
| keybcs2_bin                | keybcs2  |  73 |         | Yes      |       1 | PAD SPACE     |
| keybcs2_general_ci         | keybcs2  |  37 | Yes     | Yes      |       1 | PAD SPACE     |
| koi8r_bin                  | koi8r    |  74 |         | Yes      |       1 | PAD SPACE     |
| koi8r_general_ci           | koi8r    |   7 | Yes     | Yes      |       1 | PAD SPACE     |
| koi8u_bin                  | koi8u    |  75 |         | Yes      |       1 | PAD SPACE     |
| koi8u_general_ci           | koi8u    |  22 | Yes     | Yes      |       1 | PAD SPACE     |
| latin1_bin                 | latin1   |  47 |         | Yes      |       1 | PAD SPACE     |
| latin1_danish_ci           | latin1   |  15 |         | Yes      |       1 | PAD SPACE     |
| latin1_general_ci          | latin1   |  48 |         | Yes      |       1 | PAD SPACE     |
| latin1_general_cs          | latin1   |  49 |         | Yes      |       1 | PAD SPACE     |
| latin1_german1_ci          | latin1   |   5 |         | Yes      |       1 | PAD SPACE     |
| latin1_german2_ci          | latin1   |  31 |         | Yes      |       2 | PAD SPACE     |
| latin1_spanish_ci          | latin1   |  94 |         | Yes      |       1 | PAD SPACE     |
| latin1_swedish_ci          | latin1   |   8 | Yes     | Yes      |       1 | PAD SPACE     |
| latin2_bin                 | latin2   |  77 |         | Yes      |       1 | PAD SPACE     |
| latin2_croatian_ci         | latin2   |  27 |         | Yes      |       1 | PAD SPACE     |
| latin2_czech_cs            | latin2   |   2 |         | Yes      |       4 | PAD SPACE     |
| latin2_general_ci          | latin2   |   9 | Yes     | Yes      |       1 | PAD SPACE     |
| latin2_hungarian_ci        | latin2   |  21 |         | Yes      |       1 | PAD SPACE     |
| latin5_bin                 | latin5   |  78 |         | Yes      |       1 | PAD SPACE     |
| latin5_turkish_ci          | latin5   |  30 | Yes     | Yes      |       1 | PAD SPACE     |
| latin7_bin                 | latin7   |  79 |         | Yes      |       1 | PAD SPACE     |
| latin7_estonian_cs         | latin7   |  20 |         | Yes      |       1 | PAD SPACE     |
| latin7_general_ci          | latin7   |  41 | Yes     | Yes      |       1 | PAD SPACE     |
| latin7_general_cs          | latin7   |  42 |         | Yes      |       1 | PAD SPACE     |
| macce_bin                  | macce    |  43 |         | Yes      |       1 | PAD SPACE     |
| macce_general_ci           | macce    |  38 | Yes     | Yes      |       1 | PAD SPACE     |
| macroman_bin               | macroman |  53 |         | Yes      |       1 | PAD SPACE     |
| macroman_general_ci        | macroman |  39 | Yes     | Yes      |       1 | PAD SPACE     |
| sjis_bin                   | sjis     |  88 |         | Yes      |       1 | PAD SPACE     |
| sjis_japanese_ci           | sjis     |  13 | Yes     | Yes      |       1 | PAD SPACE     |
| swe7_bin                   | swe7     |  82 |         | Yes      |       1 | PAD SPACE     |
| swe7_swedish_ci            | swe7     |  10 | Yes     | Yes      |       1 | PAD SPACE     |
| tis620_bin                 | tis620   |  89 |         | Yes      |       1 | PAD SPACE     |
| tis620_thai_ci             | tis620   |  18 | Yes     | Yes      |       4 | PAD SPACE     |
| ucs2_bin                   | ucs2     |  90 |         | Yes      |       1 | PAD SPACE     |
| ucs2_croatian_ci           | ucs2     | 149 |         | Yes      |       8 | PAD SPACE     |
| ucs2_czech_ci              | ucs2     | 138 |         | Yes      |       8 | PAD SPACE     |
| ucs2_danish_ci             | ucs2     | 139 |         | Yes      |       8 | PAD SPACE     |
| ucs2_esperanto_ci          | ucs2     | 145 |         | Yes      |       8 | PAD SPACE     |
| ucs2_estonian_ci           | ucs2     | 134 |         | Yes      |       8 | PAD SPACE     |
| ucs2_general_ci            | ucs2     |  35 | Yes     | Yes      |       1 | PAD SPACE     |
| ucs2_general_mysql500_ci   | ucs2     | 159 |         | Yes      |       1 | PAD SPACE     |
| ucs2_german2_ci            | ucs2     | 148 |         | Yes      |       8 | PAD SPACE     |
| ucs2_hungarian_ci          | ucs2     | 146 |         | Yes      |       8 | PAD SPACE     |
| ucs2_icelandic_ci          | ucs2     | 129 |         | Yes      |       8 | PAD SPACE     |
| ucs2_latvian_ci            | ucs2     | 130 |         | Yes      |       8 | PAD SPACE     |
| ucs2_lithuanian_ci         | ucs2     | 140 |         | Yes      |       8 | PAD SPACE     |
| ucs2_persian_ci            | ucs2     | 144 |         | Yes      |       8 | PAD SPACE     |
| ucs2_polish_ci             | ucs2     | 133 |         | Yes      |       8 | PAD SPACE     |
| ucs2_romanian_ci           | ucs2     | 131 |         | Yes      |       8 | PAD SPACE     |
| ucs2_roman_ci              | ucs2     | 143 |         | Yes      |       8 | PAD SPACE     |
| ucs2_sinhala_ci            | ucs2     | 147 |         | Yes      |       8 | PAD SPACE     |
| ucs2_slovak_ci             | ucs2     | 141 |         | Yes      |       8 | PAD SPACE     |
| ucs2_slovenian_ci          | ucs2     | 132 |         | Yes      |       8 | PAD SPACE     |
| ucs2_spanish2_ci           | ucs2     | 142 |         | Yes      |       8 | PAD SPACE     |
| ucs2_spanish_ci            | ucs2     | 135 |         | Yes      |       8 | PAD SPACE     |
| ucs2_swedish_ci            | ucs2     | 136 |         | Yes      |       8 | PAD SPACE     |
| ucs2_turkish_ci            | ucs2     | 137 |         | Yes      |       8 | PAD SPACE     |
| ucs2_unicode_520_ci        | ucs2     | 150 |         | Yes      |       8 | PAD SPACE     |
| ucs2_unicode_ci            | ucs2     | 128 |         | Yes      |       8 | PAD SPACE     |
| ucs2_vietnamese_ci         | ucs2     | 151 |         | Yes      |       8 | PAD SPACE     |
| ujis_bin                   | ujis     |  91 |         | Yes      |       1 | PAD SPACE     |
| ujis_japanese_ci           | ujis     |  12 | Yes     | Yes      |       1 | PAD SPACE     |
| utf16le_bin                | utf16le  |  62 |         | Yes      |       1 | PAD SPACE     |
| utf16le_general_ci         | utf16le  |  56 | Yes     | Yes      |       1 | PAD SPACE     |
| utf16_bin                  | utf16    |  55 |         | Yes      |       1 | PAD SPACE     |
| utf16_croatian_ci          | utf16    | 122 |         | Yes      |       8 | PAD SPACE     |
| utf16_czech_ci             | utf16    | 111 |         | Yes      |       8 | PAD SPACE     |
| utf16_danish_ci            | utf16    | 112 |         | Yes      |       8 | PAD SPACE     |
| utf16_esperanto_ci         | utf16    | 118 |         | Yes      |       8 | PAD SPACE     |
| utf16_estonian_ci          | utf16    | 107 |         | Yes      |       8 | PAD SPACE     |
| utf16_general_ci           | utf16    |  54 | Yes     | Yes      |       1 | PAD SPACE     |
| utf16_german2_ci           | utf16    | 121 |         | Yes      |       8 | PAD SPACE     |
| utf16_hungarian_ci         | utf16    | 119 |         | Yes      |       8 | PAD SPACE     |
| utf16_icelandic_ci         | utf16    | 102 |         | Yes      |       8 | PAD SPACE     |
| utf16_latvian_ci           | utf16    | 103 |         | Yes      |       8 | PAD SPACE     |
| utf16_lithuanian_ci        | utf16    | 113 |         | Yes      |       8 | PAD SPACE     |
| utf16_persian_ci           | utf16    | 117 |         | Yes      |       8 | PAD SPACE     |
| utf16_polish_ci            | utf16    | 106 |         | Yes      |       8 | PAD SPACE     |
| utf16_romanian_ci          | utf16    | 104 |         | Yes      |       8 | PAD SPACE     |
| utf16_roman_ci             | utf16    | 116 |         | Yes      |       8 | PAD SPACE     |
| utf16_sinhala_ci           | utf16    | 120 |         | Yes      |       8 | PAD SPACE     |
| utf16_slovak_ci            | utf16    | 114 |         | Yes      |       8 | PAD SPACE     |
| utf16_slovenian_ci         | utf16    | 105 |         | Yes      |       8 | PAD SPACE     |
| utf16_spanish2_ci          | utf16    | 115 |         | Yes      |       8 | PAD SPACE     |
| utf16_spanish_ci           | utf16    | 108 |         | Yes      |       8 | PAD SPACE     |
| utf16_swedish_ci           | utf16    | 109 |         | Yes      |       8 | PAD SPACE     |
| utf16_turkish_ci           | utf16    | 110 |         | Yes      |       8 | PAD SPACE     |
| utf16_unicode_520_ci       | utf16    | 123 |         | Yes      |       8 | PAD SPACE     |
| utf16_unicode_ci           | utf16    | 101 |         | Yes      |       8 | PAD SPACE     |
| utf16_vietnamese_ci        | utf16    | 124 |         | Yes      |       8 | PAD SPACE     |
| utf32_bin                  | utf32    |  61 |         | Yes      |       1 | PAD SPACE     |
| utf32_croatian_ci          | utf32    | 181 |         | Yes      |       8 | PAD SPACE     |
| utf32_czech_ci             | utf32    | 170 |         | Yes      |       8 | PAD SPACE     |
| utf32_danish_ci            | utf32    | 171 |         | Yes      |       8 | PAD SPACE     |
| utf32_esperanto_ci         | utf32    | 177 |         | Yes      |       8 | PAD SPACE     |
| utf32_estonian_ci          | utf32    | 166 |         | Yes      |       8 | PAD SPACE     |
| utf32_general_ci           | utf32    |  60 | Yes     | Yes      |       1 | PAD SPACE     |
| utf32_german2_ci           | utf32    | 180 |         | Yes      |       8 | PAD SPACE     |
| utf32_hungarian_ci         | utf32    | 178 |         | Yes      |       8 | PAD SPACE     |
| utf32_icelandic_ci         | utf32    | 161 |         | Yes      |       8 | PAD SPACE     |
| utf32_latvian_ci           | utf32    | 162 |         | Yes      |       8 | PAD SPACE     |
| utf32_lithuanian_ci        | utf32    | 172 |         | Yes      |       8 | PAD SPACE     |
| utf32_persian_ci           | utf32    | 176 |         | Yes      |       8 | PAD SPACE     |
| utf32_polish_ci            | utf32    | 165 |         | Yes      |       8 | PAD SPACE     |
| utf32_romanian_ci          | utf32    | 163 |         | Yes      |       8 | PAD SPACE     |
| utf32_roman_ci             | utf32    | 175 |         | Yes      |       8 | PAD SPACE     |
| utf32_sinhala_ci           | utf32    | 179 |         | Yes      |       8 | PAD SPACE     |
| utf32_slovak_ci            | utf32    | 173 |         | Yes      |       8 | PAD SPACE     |
| utf32_slovenian_ci         | utf32    | 164 |         | Yes      |       8 | PAD SPACE     |
| utf32_spanish2_ci          | utf32    | 174 |         | Yes      |       8 | PAD SPACE     |
| utf32_spanish_ci           | utf32    | 167 |         | Yes      |       8 | PAD SPACE     |
| utf32_swedish_ci           | utf32    | 168 |         | Yes      |       8 | PAD SPACE     |
| utf32_turkish_ci           | utf32    | 169 |         | Yes      |       8 | PAD SPACE     |
| utf32_unicode_520_ci       | utf32    | 182 |         | Yes      |       8 | PAD SPACE     |
| utf32_unicode_ci           | utf32    | 160 |         | Yes      |       8 | PAD SPACE     |
| utf32_vietnamese_ci        | utf32    | 183 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_0900_ai_ci         | utf8mb4  | 255 | Yes     | Yes      |       0 | NO PAD        |
| utf8mb4_0900_as_ci         | utf8mb4  | 305 |         | Yes      |       0 | NO PAD        |
| utf8mb4_0900_as_cs         | utf8mb4  | 278 |         | Yes      |       0 | NO PAD        |
| utf8mb4_0900_bin           | utf8mb4  | 309 |         | Yes      |       1 | NO PAD        |
| utf8mb4_bin                | utf8mb4  |  46 |         | Yes      |       1 | PAD SPACE     |
| utf8mb4_croatian_ci        | utf8mb4  | 245 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_cs_0900_ai_ci      | utf8mb4  | 266 |         | Yes      |       0 | NO PAD        |
| utf8mb4_cs_0900_as_cs      | utf8mb4  | 289 |         | Yes      |       0 | NO PAD        |
| utf8mb4_czech_ci           | utf8mb4  | 234 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_danish_ci          | utf8mb4  | 235 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_da_0900_ai_ci      | utf8mb4  | 267 |         | Yes      |       0 | NO PAD        |
| utf8mb4_da_0900_as_cs      | utf8mb4  | 290 |         | Yes      |       0 | NO PAD        |
| utf8mb4_de_pb_0900_ai_ci   | utf8mb4  | 256 |         | Yes      |       0 | NO PAD        |
| utf8mb4_de_pb_0900_as_cs   | utf8mb4  | 279 |         | Yes      |       0 | NO PAD        |
| utf8mb4_eo_0900_ai_ci      | utf8mb4  | 273 |         | Yes      |       0 | NO PAD        |
| utf8mb4_eo_0900_as_cs      | utf8mb4  | 296 |         | Yes      |       0 | NO PAD        |
| utf8mb4_esperanto_ci       | utf8mb4  | 241 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_estonian_ci        | utf8mb4  | 230 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_es_0900_ai_ci      | utf8mb4  | 263 |         | Yes      |       0 | NO PAD        |
| utf8mb4_es_0900_as_cs      | utf8mb4  | 286 |         | Yes      |       0 | NO PAD        |
| utf8mb4_es_trad_0900_ai_ci | utf8mb4  | 270 |         | Yes      |       0 | NO PAD        |
| utf8mb4_es_trad_0900_as_cs | utf8mb4  | 293 |         | Yes      |       0 | NO PAD        |
| utf8mb4_et_0900_ai_ci      | utf8mb4  | 262 |         | Yes      |       0 | NO PAD        |
| utf8mb4_et_0900_as_cs      | utf8mb4  | 285 |         | Yes      |       0 | NO PAD        |
| utf8mb4_general_ci         | utf8mb4  |  45 |         | Yes      |       1 | PAD SPACE     |
| utf8mb4_german2_ci         | utf8mb4  | 244 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_hr_0900_ai_ci      | utf8mb4  | 275 |         | Yes      |       0 | NO PAD        |
| utf8mb4_hr_0900_as_cs      | utf8mb4  | 298 |         | Yes      |       0 | NO PAD        |
| utf8mb4_hungarian_ci       | utf8mb4  | 242 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_hu_0900_ai_ci      | utf8mb4  | 274 |         | Yes      |       0 | NO PAD        |
| utf8mb4_hu_0900_as_cs      | utf8mb4  | 297 |         | Yes      |       0 | NO PAD        |
| utf8mb4_icelandic_ci       | utf8mb4  | 225 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_is_0900_ai_ci      | utf8mb4  | 257 |         | Yes      |       0 | NO PAD        |
| utf8mb4_is_0900_as_cs      | utf8mb4  | 280 |         | Yes      |       0 | NO PAD        |
| utf8mb4_ja_0900_as_cs      | utf8mb4  | 303 |         | Yes      |       0 | NO PAD        |
| utf8mb4_ja_0900_as_cs_ks   | utf8mb4  | 304 |         | Yes      |      24 | NO PAD        |
| utf8mb4_latvian_ci         | utf8mb4  | 226 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_la_0900_ai_ci      | utf8mb4  | 271 |         | Yes      |       0 | NO PAD        |
| utf8mb4_la_0900_as_cs      | utf8mb4  | 294 |         | Yes      |       0 | NO PAD        |
| utf8mb4_lithuanian_ci      | utf8mb4  | 236 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_lt_0900_ai_ci      | utf8mb4  | 268 |         | Yes      |       0 | NO PAD        |
| utf8mb4_lt_0900_as_cs      | utf8mb4  | 291 |         | Yes      |       0 | NO PAD        |
| utf8mb4_lv_0900_ai_ci      | utf8mb4  | 258 |         | Yes      |       0 | NO PAD        |
| utf8mb4_lv_0900_as_cs      | utf8mb4  | 281 |         | Yes      |       0 | NO PAD        |
| utf8mb4_persian_ci         | utf8mb4  | 240 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_pl_0900_ai_ci      | utf8mb4  | 261 |         | Yes      |       0 | NO PAD        |
| utf8mb4_pl_0900_as_cs      | utf8mb4  | 284 |         | Yes      |       0 | NO PAD        |
| utf8mb4_polish_ci          | utf8mb4  | 229 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_romanian_ci        | utf8mb4  | 227 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_roman_ci           | utf8mb4  | 239 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_ro_0900_ai_ci      | utf8mb4  | 259 |         | Yes      |       0 | NO PAD        |
| utf8mb4_ro_0900_as_cs      | utf8mb4  | 282 |         | Yes      |       0 | NO PAD        |
| utf8mb4_ru_0900_ai_ci      | utf8mb4  | 306 |         | Yes      |       0 | NO PAD        |
| utf8mb4_ru_0900_as_cs      | utf8mb4  | 307 |         | Yes      |       0 | NO PAD        |
| utf8mb4_sinhala_ci         | utf8mb4  | 243 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_sk_0900_ai_ci      | utf8mb4  | 269 |         | Yes      |       0 | NO PAD        |
| utf8mb4_sk_0900_as_cs      | utf8mb4  | 292 |         | Yes      |       0 | NO PAD        |
| utf8mb4_slovak_ci          | utf8mb4  | 237 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_slovenian_ci       | utf8mb4  | 228 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_sl_0900_ai_ci      | utf8mb4  | 260 |         | Yes      |       0 | NO PAD        |
| utf8mb4_sl_0900_as_cs      | utf8mb4  | 283 |         | Yes      |       0 | NO PAD        |
| utf8mb4_spanish2_ci        | utf8mb4  | 238 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_spanish_ci         | utf8mb4  | 231 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_sv_0900_ai_ci      | utf8mb4  | 264 |         | Yes      |       0 | NO PAD        |
| utf8mb4_sv_0900_as_cs      | utf8mb4  | 287 |         | Yes      |       0 | NO PAD        |
| utf8mb4_swedish_ci         | utf8mb4  | 232 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_tr_0900_ai_ci      | utf8mb4  | 265 |         | Yes      |       0 | NO PAD        |
| utf8mb4_tr_0900_as_cs      | utf8mb4  | 288 |         | Yes      |       0 | NO PAD        |
| utf8mb4_turkish_ci         | utf8mb4  | 233 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_unicode_520_ci     | utf8mb4  | 246 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_unicode_ci         | utf8mb4  | 224 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_vietnamese_ci      | utf8mb4  | 247 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_vi_0900_ai_ci      | utf8mb4  | 277 |         | Yes      |       0 | NO PAD        |
| utf8mb4_vi_0900_as_cs      | utf8mb4  | 300 |         | Yes      |       0 | NO PAD        |
| utf8mb4_zh_0900_as_cs      | utf8mb4  | 308 |         | Yes      |       0 | NO PAD        |
| utf8_bin                   | utf8     |  83 |         | Yes      |       1 | PAD SPACE     |
| utf8_croatian_ci           | utf8     | 213 |         | Yes      |       8 | PAD SPACE     |
| utf8_czech_ci              | utf8     | 202 |         | Yes      |       8 | PAD SPACE     |
| utf8_danish_ci             | utf8     | 203 |         | Yes      |       8 | PAD SPACE     |
| utf8_esperanto_ci          | utf8     | 209 |         | Yes      |       8 | PAD SPACE     |
| utf8_estonian_ci           | utf8     | 198 |         | Yes      |       8 | PAD SPACE     |
| utf8_general_ci            | utf8     |  33 | Yes     | Yes      |       1 | PAD SPACE     |
| utf8_general_mysql500_ci   | utf8     | 223 |         | Yes      |       1 | PAD SPACE     |
| utf8_german2_ci            | utf8     | 212 |         | Yes      |       8 | PAD SPACE     |
| utf8_hungarian_ci          | utf8     | 210 |         | Yes      |       8 | PAD SPACE     |
| utf8_icelandic_ci          | utf8     | 193 |         | Yes      |       8 | PAD SPACE     |
| utf8_latvian_ci            | utf8     | 194 |         | Yes      |       8 | PAD SPACE     |
| utf8_lithuanian_ci         | utf8     | 204 |         | Yes      |       8 | PAD SPACE     |
| utf8_persian_ci            | utf8     | 208 |         | Yes      |       8 | PAD SPACE     |
| utf8_polish_ci             | utf8     | 197 |         | Yes      |       8 | PAD SPACE     |
| utf8_romanian_ci           | utf8     | 195 |         | Yes      |       8 | PAD SPACE     |
| utf8_roman_ci              | utf8     | 207 |         | Yes      |       8 | PAD SPACE     |
| utf8_sinhala_ci            | utf8     | 211 |         | Yes      |       8 | PAD SPACE     |
| utf8_slovak_ci             | utf8     | 205 |         | Yes      |       8 | PAD SPACE     |
| utf8_slovenian_ci          | utf8     | 196 |         | Yes      |       8 | PAD SPACE     |
| utf8_spanish2_ci           | utf8     | 206 |         | Yes      |       8 | PAD SPACE     |
| utf8_spanish_ci            | utf8     | 199 |         | Yes      |       8 | PAD SPACE     |
| utf8_swedish_ci            | utf8     | 200 |         | Yes      |       8 | PAD SPACE     |
| utf8_tolower_ci            | utf8     |  76 |         | Yes      |       1 | PAD SPACE     |
| utf8_turkish_ci            | utf8     | 201 |         | Yes      |       8 | PAD SPACE     |
| utf8_unicode_520_ci        | utf8     | 214 |         | Yes      |       8 | PAD SPACE     |
| utf8_unicode_ci            | utf8     | 192 |         | Yes      |       8 | PAD SPACE     |
| utf8_vietnamese_ci         | utf8     | 215 |         | Yes      |       8 | PAD SPACE     |
+----------------------------+----------+-----+---------+----------+---------+---------------+
272 rows in set (0.03 sec)
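
The full listing is long, so in practice it helps to narrow it. The statements below are a minimal sketch, assuming a MySQL 8.0 server: SHOW COLLATION accepts either a LIKE pattern on the collation name or a WHERE clause on the output columns shown above.

```sql
-- List only the collations that belong to the utf8mb4 character set.
SHOW COLLATION WHERE Charset = 'utf8mb4';

-- List collations whose name matches a pattern (here: the binary collations of utf8mb4).
SHOW COLLATION LIKE 'utf8mb4%bin';

-- Show the default collation of each character set.
-- `Default` is a reserved word, so it has to be quoted with backticks.
SHOW COLLATION WHERE `Default` = 'Yes';
```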

Viewing the server's current character sets and collations

You can check them with show variables like '%character%' and show variables like '%collation%'.

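character_set_client is the character set of the statements sent by the client.

character_set_connection is the character set the server converts incoming statements to after receiving them.

character_set_results is the character set in which query results are returned to the client.

character_set_server is the server's default character set.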

mysql> show variables like '%character%';
+--------------------------+--------------------------------+
| Variable_name            | Value                          |
+--------------------------+--------------------------------+
| character_set_client     | utf8mb4                        |
| character_set_connection | utf8mb4                        |
| character_set_database   | utf8mb4                        |
| character_set_filesystem | binary                         |
| character_set_results    | utf8mb4                        |
| character_set_server     | utf8mb4                        |
| character_set_system     | utf8                           |
| character_sets_dir       | /usr/share/mysql-8.0/charsets/ |
+--------------------------+--------------------------------+
8 rows in set (0.01 sec)
 
mysql> show variables like '%collation%';
+-------------------------------+--------------------+
| Variable_name                 | Value              |
+-------------------------------+--------------------+
| collation_connection          | utf8mb4_0900_ai_ci |
| collation_database            | utf8mb4_0900_ai_ci |
| collation_server              | utf8mb4_0900_ai_ci |
| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |
+-------------------------------+--------------------+
4 rows in set (0.01 sec)
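
The same variables can also be read directly, and the three client-side ones can be changed together for the current session. The following is a minimal sketch, assuming a MySQL 8.0 session; the utf8mb4_0900_ai_ci collation is used only because it matches the output above.

```sql
-- Read individual system variables instead of pattern-matching with SHOW VARIABLES.
SELECT @@character_set_client,
       @@character_set_connection,
       @@character_set_results,
       @@character_set_server,
       @@collation_connection,
       @@collation_server;

-- SET NAMES sets character_set_client, character_set_connection and
-- character_set_results in one statement, for the current session only;
-- the COLLATE clause also sets collation_connection.
SET NAMES utf8mb4 COLLATE utf8mb4_0900_ai_ci;
```

SET NAMES does not touch character_set_server or character_set_database; those come from the server configuration and from the default database, respectively.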

Viewing the character set of a database, a table, and its columns

mysql> use TEST;
Database changed
mysql> show variables like 'character_set_database';
+------------------------+---------+
| Variable_name          | Value   |
+------------------------+---------+
| character_set_database | utf8mb4 |
+------------------------+---------+
1 row in set (0.01 sec)
 
mysql> show table status from TEST like 'T1';
+------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time | Check_time | Collation   | Checksum | Create_options | Comment |
+------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-------------+----------+----------------+---------+
| T1   | InnoDB |      10 | Dynamic    |    0 |              0 |       16384 |               0 |            0 |         0 | NULL           | 2023-03-22 09:10:17 | NULL        | NULL       | utf8mb4_bin | NULL     |                |         |
+------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-------------+----------+----------------+---------+
1 row in set (0.01 sec)
 
mysql> show full columns from TEST.T1;
+-------+--------------+-------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type         | Collation   | Null | Key | Default | Extra | Privileges                      | Comment |
+-------+--------------+-------------+------+-----+---------+-------+---------------------------------+---------+
| C1    | varchar(255) | utf8mb4_bin | YES  |     | NULL    |       | select,insert,update,references |         |
+-------+--------------+-------------+------+-----+---------+-------+---------------------------------+---------+
1 row in set (0.01 sec)
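
The SHOW statements above report only the collation for the table and the column, and the table status output is very wide. As an alternative sketch, assuming MySQL 8.0 and the same TEST database and T1 table used above, the INFORMATION_SCHEMA tables return the character set and the collation side by side.

```sql
-- Default character set and collation of the database.
SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'TEST';

-- Collation of the table. TABLES stores only the collation, so the matching
-- character set is looked up through COLLATION_CHARACTER_SET_APPLICABILITY.
SELECT t.TABLE_NAME, t.TABLE_COLLATION, a.CHARACTER_SET_NAME
FROM information_schema.TABLES AS t
JOIN information_schema.COLLATION_CHARACTER_SET_APPLICABILITY AS a
  ON a.COLLATION_NAME = t.TABLE_COLLATION
WHERE t.TABLE_SCHEMA = 'TEST'
  AND t.TABLE_NAME = 'T1';

-- Character set and collation of every column in the table.
-- Both values are NULL for non-string columns.
SELECT COLUMN_NAME, COLUMN_TYPE, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'TEST'
  AND TABLE_NAME = 'T1';
```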
