Symptom:
In Git, a Chinese file name such as 中文.txt is displayed as an escaped string:
"src/components/\344\270\255\346\226\207.txt"
Solution:
$ git config core.quotepath false
Cause:
The character "中" is stored on disk in UTF-8 as three bytes. Its actual binary encoding is:
11 100 100 10 111 000 10 101 101
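In octal (note: each byte is converted on its own, by padding its 8 bits with a leading 0 to 9 bits and reading them in groups of 3; the bytes are not simply concatenated and converted as one number):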
011 100 100 | 010 111 000 | 010 101 101
3 4 4 | 2 7 0 | 2 5 5
In hexadecimal:
e4 b8 ad
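The per-byte conversion described above can be checked with a short Python sketch. The variable names are illustrative only; the last lines show that converting all 24 bits as a single number gives a different octal string, which is why the bytes have to be handled one at a time.

```python
# Per-byte view of the UTF-8 encoding of "中".
data = "中".encode("utf-8")

binary = [format(b, "08b") for b in data]  # 8 bits per byte
octal = [format(b, "03o") for b in data]   # 3 octal digits per byte
hexa = [format(b, "02x") for b in data]    # 2 hex digits per byte

print(binary)  # ['11100100', '10111000', '10101101']
print(octal)   # ['344', '270', '255']
print(hexa)    # ['e4', 'b8', 'ad']

# Treating the three bytes as one 24-bit number does NOT give 344 270 255.
combined = format(int.from_bytes(data, "big"), "o")
print(combined)  # 71134255
```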
Reference code
There are many ways to get the hexadecimal bytes:
- python:
print('中'.encode('utf8'))
or b16=[hex(b) for b in '中'.encode('utf8')]
- ES6:
encodeURIComponent("中");
- Java:
System.out.println(URLEncoder.encode("中","utf8"));
- MySQL:
select '中', HEX('中'), char(0xE4B8AD using utf8mb4)
Ways to get the octal bytes:
- python:
b8=[oct(b) for b in '中'.encode('utf8')]
- ES6:
let [b10,b8,b2]=[[],[],[]];new TextEncoder().encode("中").forEach(b=>{b10.push(b); b8.push(b.toString(8)); b2.push(b.toString(2));});console.log(b10,b8,b2);
- Java (assuming Bytes is Guava's com.google.common.primitives.Bytes):
Bytes.asList("中".getBytes("utf8")).stream().forEach(_b->System.out.println(Integer.toOctalString(_b & 0xFF) + " "));
- MySQL:
select OCT(0xE4), OCT(0xB8), OCT(0xAD)
Ways to get the binary bytes:
- python:
b2=[bin(b) for b in '中'.encode('utf8')]
- ES6:
let [b10,b8,b2]=[[],[],[]];new TextEncoder().encode("中").forEach(b=>{b10.push(b); b8.push(b.toString(8)); b2.push(b.toString(2));});console.log(b10,b8,b2);
- Java (assuming Bytes is Guava's com.google.common.primitives.Bytes):
Bytes.asList("中".getBytes("utf8")).stream().forEach(_b->System.out.println(Integer.toBinaryString(_b & 0xFF) + " "));
- MySQL:
select BIN(0xE4), BIN(0xB8), BIN(0xAD)
Git source code reference
Excerpt from the source file quote.c:
if (cq_lookup[ch] >= ' ') {
    EMIT(cq_lookup[ch]);
} else {
    EMIT(((ch >> 6) & 03) + '0');
    EMIT(((ch >> 3) & 07) + '0');
    EMIT(((ch >> 0) & 07) + '0');
}
Explanation:
#11 100 100 converted to octal is 344
#11 100 100 (ch >> 6) & 03 >>> 11 & 11 >>> 11 >>> 3
#11 100 100 (ch >> 3) & 07 >>> 11100 & 111 >>> 100 >>> 4
#11 100 100 (ch >> 0) & 07 >>> 11100100 & 111 >>> 100 >>> 4
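The shift-and-mask steps above can be reproduced in Python as a rough sketch. The function name quote_octal is hypothetical, and the sketch is deliberately simplified: it escapes only bytes of 0x80 and above, and it assumes a backslash precedes each group of three digits, as in the sample output at the top of this article. It does not attempt to cover the other cases handled by quote.c (control characters, quotes, the cq_lookup table).

```python
def quote_octal(path: str) -> str:
    # Simplified, hypothetical re-creation of the octal branch only:
    # every non-ASCII byte becomes a backslash plus three octal digits.
    out = []
    for ch in path.encode("utf-8"):
        if ch < 0x80:
            out.append(chr(ch))  # plain ASCII is kept as-is in this sketch
        else:
            out.append("\\")
            out.append(chr(((ch >> 6) & 0o3) + ord("0")))  # top 2 bits
            out.append(chr(((ch >> 3) & 0o7) + ord("0")))  # middle 3 bits
            out.append(chr(((ch >> 0) & 0o7) + ord("0")))  # low 3 bits
    return "".join(out)


quoted = quote_octal("src/components/中文.txt")
print(quoted)  # src/components/\344\270\255\346\226\207.txt
```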
Supplement: converting hexadecimal, octal, or binary bytes back to Chinese:
- python:
# b16, b8 and b2 come from the python snippets above; b10 is not defined there, so build it here
b10=[str(b) for b in '中'.encode('utf8')]
print('------------------------------decode---------------------------------')
print(bytes([int(b,2) for b in b2]).decode("utf-8"))
print(bytes([int(b,8) for b in b8]).decode("utf-8"))
print(bytes([int(b,10) for b in b10]).decode("utf-8"))
print(bytes([int(b,16) for b in b16]).decode("utf-8"))
- ES6:
new TextDecoder().decode(Uint8Array.from(b10))
new TextDecoder().decode(Uint8Array.from(b8.map(b=>{return parseInt(''+b,8)})))
new TextDecoder().decode(Uint8Array.from(b2.map(b=>{return parseInt(''+b,2)})))
- Java:
// bytes to char: each int holds the digits of one byte written in the given radix
System.out.println(newString(new int[]{228,184,173},10, "utf8"));
System.out.println(newString(new int[]{344,270,255},8, "utf8"));
System.out.println(newString(new int[]{11100100,10111000,10101101},2, "utf8"));
public static String newString(int[] i, int radix, String encoding) throws UnsupportedEncodingException {
    byte[] b2 = new byte[i.length];
    for (int j = 0; j < i.length; j++) {
        // re-read the decimal digits of i[j] as a number in the given radix
        int i1 = Integer.parseInt("" + i[j], radix);
        b2[j] = (byte) (i1 & 0xFF);
    }
    return new String(b2, encoding);
}
- MySQL:
select '中', char(0xE4B8AD using utf8mb4)
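Going in the same direction as the snippets above, an escaped path like the one shown at the top of this article can be turned back into readable text. This is a minimal sketch: the function name unquote_octal is hypothetical, the input is assumed to be pure ASCII without the surrounding double quotes, and only three-digit octal escapes are handled (other escapes such as \n or \\ are ignored here).

```python
import re


def unquote_octal(quoted: str) -> str:
    # Replace each \ooo escape (000 to 377) with the byte it stands for,
    # then decode the resulting byte string as UTF-8.
    raw = re.sub(
        rb"\\([0-3][0-7]{2})",
        lambda m: bytes([int(m.group(1), 8)]),
        quoted.encode("ascii"),
    )
    return raw.decode("utf-8")


restored = unquote_octal(r"src/components/\344\270\255\346\226\207.txt")
print(restored)  # src/components/中文.txt
```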