Symptom
An alert fired in production: a DataNode in the Hadoop cluster had gone offline.
Troubleshooting
1. Check the DataNode status
$ jps
24752 Jps
1428 JournalNode
No DataNode process is running.
2. Try starting the DataNode on its own. It still fails; the command and the resulting error log are shown below:
$ hadoop-daemon.sh start datanode
2022-11-25 15:58:43,267 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool (Datanode Uuid cc451ed7-45c6-460c-a30b-bb68e54ef8fb) service to jjhxxxx/10.x.x.x:8020 All specified directories have failed to load.
2022-11-25 15:58:43,268 INFO org.apache.hadoop.hdfs.server.common.Storage: Using 11 threads to upgrade data directories (dfs.datanode.parallel.volumes.load.threads.num=11, dataDirs=11)
2022-11-25 15:58:43,313 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /disk1/hdfs/datanode/in_use.lock acquired by nodename 18522@jjhxxx
2022-11-25 15:58:43,314 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to add storage directory [DISK]file:/disk1/hdfs/datanode
java.io.IOException: Incompatible clusterIDs in /disk1/hdfs/datanode: namenode clusterID = CID-28fd667c-4411-4a5d-a2b0-fb5190fec245; datanode clusterID = CID-bca80872-89c0-428f-af56-3cff6e6e16c2
$ grep -C 5 ERROR hadoop-hdfs-datanode-jjhxxx.log|grep namenode
java.io.IOException: Incompatible clusterIDs in /disk1/hdfs/datanode: namenode clusterID = CID-28fd667c-4411-4a5d-a2b0-fb5190fec245; datanode clusterID = CID-bca80872-89c0-428f-af56-3cff6e6e16c2
This points to the problem:
namenode clusterID = CID-28fd667c-4411-4a5d-a2b0-fb5190fec245
datanode clusterID = CID-bca80872-89c0-428f-af56-3cff6e6e16c2
The two clusterIDs do not match. According to the references consulted, a clusterID mismatch is what causes the "Initialization failed for Block pool" error.
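To avoid comparing the two long IDs by eye, they can be pulled out of the log with a small helper. This is only a minimal sketch: the function name extract_cluster_ids is hypothetical, and it assumes the log line has exactly the "Incompatible clusterIDs" format shown above.

```shell
# Print "<namenode clusterID> <datanode clusterID>" from the first
# "Incompatible clusterIDs" line in the given log file.
# Hypothetical helper; assumes the log format shown in this article.
extract_cluster_ids() {
  grep -m1 'Incompatible clusterIDs' "$1" \
    | sed -E 's/.*namenode clusterID = ([^;]+); datanode clusterID = ([^[:space:]]+).*/\1 \2/'
}

# Usage (log file name taken from the grep command above):
#   extract_cluster_ids hadoop-hdfs-datanode-jjhxxx.log
```

If the two printed values differ, the DataNode's storage directory belongs to a different clusterID than the one the NameNode reports.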
Fix
Copy the clusterID value from the NameNode's current/VERSION file into the DataNode's current/VERSION file, placing it after the "=" on the clusterID line, so that the NameNode and DataNode clusterIDs match. The two files look like this:
# On one of the DataNode nodes
hdfs@localhost:/disk1/hdfs/datanode/current$ cat VERSION
#Fri Nov 25 16:28:58 CST 2022
storageID=DS-beab9a85-2dc8-4111-a269-2322ad2f7458
clusterID=CID-bca80872-89c0-428f-af56-3cff6e6e16c2
cTime=0
datanodeUuid=cc451ed7-45c6-460c-a30b-bb68e54ef8fb
storageType=DATA_NODE
layoutVersion=-57
# On the standalone NameNode node
cat /disk1/hdfs/namenode/current/VERSION
#Fri Apr 15 18:26:07 CST 2022
namespaceID=1752898827
clusterID=CID-bca80872-89c0-428f-af56-3cff6e6e16c2
cTime=1570465355748
storageType=NAME_NODE
blockpoolID=BP-1290606271-10.x.x.x-1570465355748
layoutVersion=-64
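The manual edit can also be scripted. The following is a minimal sketch, not a definitive procedure: read_cluster_id and set_cluster_id are hypothetical helper names, the DataNode is assumed to be stopped, and the NameNode's clusterID is assumed to have been read on the NameNode host first. The log above reports dataDirs=11, so the same change would presumably be needed in the VERSION file of each configured data directory.

```shell
# Print the clusterID value stored in a VERSION file.
read_cluster_id() {
  sed -n 's/^clusterID=//p' "$1"
}

# Replace the clusterID line in a DataNode VERSION file.
# A copy of the original is kept next to it as VERSION.bak.
set_cluster_id() {
  version_file="$1"
  new_id="$2"
  cp "$version_file" "$version_file.bak"
  sed "s/^clusterID=.*/clusterID=$new_id/" "$version_file.bak" > "$version_file"
}

# Usage (paths taken from the listings above):
#   on the NameNode host:  read_cluster_id /disk1/hdfs/namenode/current/VERSION
#   on the DataNode host:  set_cluster_id /disk1/hdfs/datanode/current/VERSION "<clusterID printed above>"
```

Keeping the .bak copy makes it easy to restore the original file if the DataNode still refuses to start.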
Then try starting the DataNode process again.