1. Shell$ExitCodeException
Symptom: the following exception appears when running a Hadoop job:
14/07/09 14:42:50 INFO mapreduce.Job: Task Id : attempt_1404886826875_0007_m_000000_1, Status : FAILED
Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Container exited with a non-zero exit code 1
Cause and solution: the root cause is unknown; restarting the cluster restores normal operation.
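Before restarting, it is usually worth checking why the container actually exited. A minimal sketch, assuming the application ID taken from the attempt ID above and that YARN log aggregation is enabled:
# pull the aggregated container logs for the failed application
yarn logs -applicationId application_1404886826875_0007
# if a NodeManager is the suspect, restarting only the YARN daemons is often enough
stop-yarn.sh
start-yarn.sh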
2. libhadoop.so.1.0.0 which might have disabled stack guard
Symptom: Hadoop 2.2.0 - warning: You have loaded library /home/hadoop/2.2.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard.
Cause and solution:
Add the following to /etc/profile:
#hadoop configuration
export PATH=$PATH:/home/jediael/hadoop-2.4.1/bin:/home/jediael/hadoop-2.4.1/sbin
export HADOOP_HOME=/home/jediael/hadoop-2.4.1
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
The warning appears because the last two entries (HADOOP_COMMON_LIB_NATIVE_DIR and HADOOP_OPTS) were missing.
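To apply the change, reload the profile in the current shell and verify that the native library is now picked up. A sketch, assuming a Hadoop release recent enough to ship the checknative command:
source /etc/profile
hadoop checknative -a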
3. Retrying connect to server: master166/10.252.48.166:9000. Already tried 0 time(s)
When running HDFS commands on a datanode, the following error appears:
[jediael@slave156 ~]$ hadoop fs -ls /
14/08/31 15:00:37 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:38 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:39 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:40 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:41 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:42 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:43 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:44 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:45 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/08/31 15:00:46 INFO ipc.Client: Retrying connect to server: master166/10.252.48.166:9000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
ls: Call to master166/10.252.48.166:9000 failed on connection exception: java.net.ConnectException: Connection refused
This error usually means the datanode cannot connect to the namenode. One common case:
/etc/hosts contains a 127.0.0.1 entry for the hostname, e.g.
127.0.0.1 localhost
Remove these entries, reformat the namenode, and restart the Hadoop processes to resolve the problem.
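When the cause is not obvious from /etc/hosts, it helps to confirm what address the NameNode RPC port is actually bound to and whether the datanode can reach it. A sketch, assuming master166 is the namenode host and 9000 is the RPC port from core-site.xml:
# on master166: is the NameNode process running, and which address is port 9000 bound to?
jps
netstat -tln | grep 9000
# on the datanode: can the namenode's RPC port be reached at all?
telnet master166 9000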
Another possible cause:
After installing Hadoop, HDFS must first be formatted with hadoop namenode -format before it can be used. If the machine is rebooted and Hadoop is started again, hadoop fs -ls keeps reporting errors such as
10/09/25 18:35:29 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).
and jps shows no NameNode process; the filesystem only works again after running hadoop namenode -format once more.
The reason is that Hadoop's default configuration stores its temporary files under /tmp, and /tmp is wiped on reboot, hence the error.
Solution: add the following to conf/core-site.xml:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/log/hadoop/tmp</value>
  <description>A base for other temporary directories</description>
</property>
After this change, reformat the namenode and restart Hadoop.
4. Permission denied: user=liaoliuqing, access=WRITE, inode="":jediael:supergroup:rwxr-xr-x
The cause is insufficient user permissions: the user cannot write to the files in HDFS.
Solution:
Disable HDFS permission checking by adding the following to hdfs-site.xml:
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
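Disabling permission checking opens HDFS to every user, so it is only suitable for test clusters. A narrower alternative is sketched below; the path is a placeholder (the inode in the message above is elided), and HADOOP_USER_NAME only takes effect with simple authentication:
# run the client as the HDFS owner shown in the message
export HADOOP_USER_NAME=jediael
# or widen permissions on just the directory being written (placeholder path)
hadoop fs -chmod -R 775 /path/being/written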
5. Incompatible namespaceIDs
2015-02-02 15:10:57,526 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2015-02-02 15:10:57,543 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2015-02-02 15:10:57,543 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2015-02-02 15:10:57,544 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2015-02-02 15:10:57,699 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2015-02-02 15:10:58,090 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs in /mnt/tmphadoop/dfs/data: namenode namespaceID = 2017454015; datanode namespaceID = 1238467850
at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:414)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:321)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1712)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1651)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1669)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1795)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1812)
Cause:
Each time the namenode is formatted, a new namespaceID is generated, while ${hadoop.tmp.dir}/dfs/data still holds the ID from the previous format. Reformatting clears the namenode's data but not the datanodes' data, so the namespaceID on the namenode no longer matches the namespaceID on the datanodes, which triggers the exception above and makes startup fail.
Solution:
(1) Stop Hadoop:
stop-all.sh
(2) On each slave, delete the contents of dfs.data.dir. If this property has not been changed, its default value is:
<property>
  <name>${dfs.data.dir}</name>
  <value>${hadoop.tmp.dir}/dfs/data</value>
  <description>Determines where on the local filesystem an DFS data node
  should store its blocks. If this is a comma-delimited
  list of directories, then data will be stored in all named
  directories, typically on different devices.
  Directories that do not exist are ignored.
  </description>
</property>
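A minimal sketch of step (2), assuming the datanodes use the data directory reported in the error log above (/mnt/tmphadoop/dfs/data); run it on every slave:
rm -rf /mnt/tmphadoop/dfs/data/*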
(3) Reformat the namenode:
hadoop namenode -format
Then start Hadoop with start-all.sh.
The solution above requires deleting the existing data. If the data cannot be deleted, use one of the following methods instead:
(1) Modify the ${dfs.data.dir}/current/VERSION file and change the datanode's namespaceID so that it matches the namenode's namespaceID.
(2) Modify the ${dfs.data.dir}
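A minimal sketch of alternative (1), assuming the data directory from the error log (/mnt/tmphadoop/dfs/data) and the namenode namespaceID reported there (2017454015):
# on each affected datanode, replace the stale namespaceID in the VERSION file
sed -i 's/^namespaceID=.*/namespaceID=2017454015/' /mnt/tmphadoop/dfs/data/current/VERSION
# then start the datanode again
hadoop-daemon.sh start datanode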