【hive】hive修复分区或修复表以及msck命令的使用-Toy模板网

这篇具有很好参考价值的文章主要介绍了【hive】hive修复分区或修复表以及msck命令的使用。希望对大家有所帮助。如果存在错误或未考虑完全的地方，请大家不吝赐教，您也可以点击"举报违法"按钮提交疑问。

【hive】hive修复分区或修复表以及msck命令的使用

问题原因：

之前hive里有数据，后面存储元数据信息的MySQL数据库坏了，导致hive元数据信息丢失，但是hdfs上hive表的数据并没有丢失，重新建表后查看hive分区没有，数据也没有。

解决方法：

MSCK REPAIR TABLE 是在 Hive 中用于修复分区表（Partitioned Table）元数据的命令。在 Hive 中，当您向分区表添加、删除或更改分区数据时，有时会导致分区元数据不一致的情况。使用MSCK REPAIR TABLE命令可以检测并修复这些不一致，从而确保分区表的元数据与实际数据保持一致。

msck repair table <table_name>;

msck命令解析：

MSCK REPAIR TABLE 命令会扫描表的数据存储位置，检测缺失的分区并更新元数据。这在以下情况下特别有用：

当手动添加或删除分区时，元数据可能不会自动更新。
当通过外部操作更改了表的分区数据时，元数据可能不会自动同步。

我们知道hive有个服务叫metastore，这个服务主要是存储一些元数据信息，比如数据库名，表名或者表的分区等等信息。如果不是通过hive的insert等插入语句，很多分区信息在metastore中是没有的，如果插入分区数据量很多的话，你用 ALTER TABLE table_name ADD PARTITION 一个个分区添加十分麻烦。这时候MSCK REPAIR TABLE就派上用场了。只需要运行MSCK REPAIR TABLE命令，hive就会去检测这个表在hdfs上的文件，把没有写入metastore的分区信息写入metastore。

例子:

我们先创建一个分区表，然后往其中的一个分区插入一条数据，在查看分区信息

CREATE TABLE repair_test (col_a STRING) PARTITIONED BY (par STRING);
INSERT INTO TABLE repair_test PARTITION(par="partition_1") VALUES ("test");
SHOW PARTITIONS repair_test;

查看分区信息的结果如下

　jdbc:hive2://localhost:10000> show partitions repair_test;

INFO  : Compiling command(queryId=hive_20180810175151_5260f52e-10bb-4589-ad48-31ba72a81c21): show partitions repair_test
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:partition, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20180810175151_5260f52e-10bb-4589-ad48-31ba72a81c21); Time taken: 0.029 seconds
INFO  : Executing command(queryId=hive_20180810175151_5260f52e-10bb-4589-ad48-31ba72a81c21): show partitions repair_test
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20180810175151_5260f52e-10bb-4589-ad48-31ba72a81c21); Time taken: 0.017 seconds
INFO  : OK
+------------------+--+
|    partition     |
+------------------+--+
| par=partition_1  |
+------------------+--+
1 row selected (0.073 seconds)
0: jdbc:hive2://localhost:10000>

然后我们通过hdfs的put命令手动创建一个数据

[ericsson@h3cnamenode1 pcc]$ echo "123123" > test.txt
[ericsson@h3cnamenode1 pcc]$ hdfs dfs -mkdir -p /user/hive/warehouse/test.db/repair_test/par=partition_2/
[ericsson@h3cnamenode1 pcc]$ hdfs dfs -put -f test.txt /user/hive/warehouse/test.db/repair_test/par=partition_2/
[ericsson@h3cnamenode1 pcc]$ hdfs dfs -ls -R /user/hive/warehouse/test.db/repair_test
drwxrwxrwt   - ericsson hive          0 2018-08-10 17:46 /user/hive/warehouse/test.db/repair_test/par=partition_1
drwxrwxrwt   - ericsson hive          0 2018-08-10 17:46 /user/hive/warehouse/test.db/repair_test/par=partition_1/.hive-staging_hive_2018-08-10_17-45-59_029_1594310228554990949-1
drwxrwxrwt   - ericsson hive          0 2018-08-10 17:46 /user/hive/warehouse/test.db/repair_test/par=partition_1/.hive-staging_hive_2018-08-10_17-45-59_029_1594310228554990949-1/-ext-10000
-rwxrwxrwt   3 ericsson hive          5 2018-08-10 17:46 /user/hive/warehouse/test.db/repair_test/par=partition_1/000000_0
drwxr-xr-x   - ericsson hive          0 2018-08-10 17:57 /user/hive/warehouse/test.db/repair_test/par=partition_2
-rw-r--r--   3 ericsson hive          7 2018-08-10 17:57 /user/hive/warehouse/test.db/repair_test/par=partition_2/test.txt
[ericsson@h3cnamenode1 pcc]$

这时候我们查询分区信息，发现partition_2这个分区并没有加入到hive中

0: jdbc:hive2://localhost:10000> show partitions repair_test;
INFO  : Compiling command(queryId=hive_20180810175959_e7cefe8c-57b5-486c-8e03-b1201dac4d79): show partitions repair_test
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:partition, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20180810175959_e7cefe8c-57b5-486c-8e03-b1201dac4d79); Time taken: 0.029 seconds
INFO  : Executing command(queryId=hive_20180810175959_e7cefe8c-57b5-486c-8e03-b1201dac4d79): show partitions repair_test
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20180810175959_e7cefe8c-57b5-486c-8e03-b1201dac4d79); Time taken: 0.02 seconds
INFO  : OK
+------------------+--+
|    partition     |
+------------------+--+
| par=partition_1  |
+------------------+--+
1 row selected (0.079 seconds)
0: jdbc:hive2://localhost:10000>

运行MSCK REPAIR TABLE 命令后再查询分区信息,可以看到通过put命令放入的分区已经可以查询了文章来源地址https://www.toymoban.com/news/detail-737150.html

0: jdbc:hive2://localhost:10000> MSCK REPAIR TABLE repair_test;
INFO  : Compiling command(queryId=hive_20180810180000_7099daf2-6fde-44dd-8938-d2a02589358f): MSCK REPAIR TABLE repair_test
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20180810180000_7099daf2-6fde-44dd-8938-d2a02589358f); Time taken: 0.004 seconds
INFO  : Executing command(queryId=hive_20180810180000_7099daf2-6fde-44dd-8938-d2a02589358f): MSCK REPAIR TABLE repair_test
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20180810180000_7099daf2-6fde-44dd-8938-d2a02589358f); Time taken: 0.138 seconds
INFO  : OK
No rows affected (0.154 seconds)
0: jdbc:hive2://localhost:10000> show partitions repair_test;
INFO  : Compiling command(queryId=hive_20180810180000_ff711820-6f41-4d5d-9fee-b6e1cdbe1e25): show partitions repair_test
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:partition, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=hive_20180810180000_ff711820-6f41-4d5d-9fee-b6e1cdbe1e25); Time taken: 0.045 seconds
INFO  : Executing command(queryId=hive_20180810180000_ff711820-6f41-4d5d-9fee-b6e1cdbe1e25): show partitions repair_test
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=hive_20180810180000_ff711820-6f41-4d5d-9fee-b6e1cdbe1e25); Time taken: 0.016 seconds
INFO  : OK
+------------------+--+
|    partition     |
+------------------+--+
| par=partition_1  |
| par=partition_2  |
+------------------+--+
2 rows selected (0.088 seconds)
0: jdbc:hive2://localhost:10000> select * from repair_test;
INFO  : Compiling command(queryId=hive_20180810180101_1225075e-43c8-4a49-b8ef-a12f72544a38): select * from repair_test
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:repair_test.col_a, type:string, comment:null), FieldSchema(name:repair_test.par, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20180810180101_1225075e-43c8-4a49-b8ef-a12f72544a38); Time taken: 0.059 seconds
INFO  : Executing command(queryId=hive_20180810180101_1225075e-43c8-4a49-b8ef-a12f72544a38): select * from repair_test
INFO  : Completed executing command(queryId=hive_20180810180101_1225075e-43c8-4a49-b8ef-a12f72544a38); Time taken: 0.001 seconds
INFO  : OK
+--------------------+------------------+--+
| repair_test.col_a  | repair_test.par  |
+--------------------+------------------+--+
| test               | partition_1      |
| 123123             | partition_2      |
+--------------------+------------------+--+
2 rows selected (0.121 seconds)
0: jdbc:hive2://localhost:10000>

到了这里，关于【hive】hive修复分区或修复表以及msck命令的使用的文章就介绍完了。如果您还想了解更多内容，请在右上角搜索TOY模板网以前的文章或继续浏览下面的相关文章，希望大家以后多多支持TOY模板网！