Hive on Spark: HQL insert fails with "Failed to create Spark client for Spark session" (error code 30041)


I. The problem

In an offline data warehouse running Hive on Spark, inserting data with SQL from the Hive client fails with:
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session 50cec71c-2636-4d99-8de2-a580ae3f1c58)'
FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session 50cec71c-2636-4d99-8de2-a580ae3f1c58

The full error output:

[hadoop@hadoop102 ~]$ hive
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/datafs/module/jdk1.8.0_212/bin:/datafs/module/hadoop-3.1.3/bin:/datafs/module/hadoop-3.1.3/sbin:/datafs/module/zookeeper-3.5.7/bin:/datafs/module/kafka/bin:/datafs/module/flume/bin:/datafs/module/mysql-5.7.35/bin:/datafs/module/hive/bin:/datafs/module/spark/bin:/home/hadoop/.local/bin:/home/hadoop/bin)
Hive Session ID = 7db87c21-d9fb-4e76-a868-770691199377

Logging initialized using configuration in jar:file:/datafs/module/hive/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive Session ID = 24cd3001-0726-482f-9294-c901f49ace29
hive (default)> show databases;
OK
database_name
default
Time taken: 1.582 seconds, Fetched: 1 row(s)
hive (default)> show tables;
OK
tab_name
student
Time taken: 0.118 seconds, Fetched: 1 row(s)
hive (default)> select * from student;
OK
student.id      student.name
Time taken: 4.1 seconds
hive (default)> insert into table student values(1,'abc');
Query ID = hadoop_20220728195619_ded278b4-0ffa-41f2-9f2f-49313ea3d752
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create Spark client for Spark session 50cec71c-2636-4d99-8de2-a580ae3f1c58)'
FAILED: Execution Error, return code 30041 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create Spark client for Spark session 50cec71c-2636-4d99-8de2-a580ae3f1c58
hive (default)> [hadoop@hadoop102 ~]$
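
When the failure shows up more than once, it can help to pull the Spark session IDs out of the console output so they can be matched against other logs. This is a minimal sketch that assumes the message has exactly the form shown above; the function name `failed_spark_sessions` is made up for illustration.

```shell
# Hypothetical helper: print the unique Spark session IDs found in
# "Failed to create Spark client for Spark session <id>" messages.
# Reads the files given as arguments, or stdin if none are given.
failed_spark_sessions() {
    grep -Eoh 'Failed to create Spark client for Spark session [0-9a-f-]+' "$@" \
        | awk '{ print $NF }' \
        | sort -u
}
```

For example, the output of a failing statement could be piped in with `hive -e "insert into table student values(1,'abc');" 2>&1 | failed_spark_sessions`.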

II. Troubleshooting

0. Confirm the Hive and Spark versions

Hive 3.1.2: apache-hive-3.1.2-bin.tar.gz (recompiled)

Spark 3.0.0:
+spark-3.0.0-bin-hadoop3.2.tgz
+spark-3.0.0-bin-without-hadoop.tgz

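Compatibility notes
Note: Hive 3.1.2 and Spark 3.0.0 as downloaded from the official sites are not compatible out of the box. Hive 3.1.2 supports Spark 2.4.5, so Hive 3.1.2 has to be recompiled.
Build steps:
Download the Hive 3.1.2 source from the official site and change the Spark version referenced in the pom file to 3.0.0. If it compiles, package it and take the resulting jars. If it fails, fix the affected methods as the error messages indicate until the build passes, then package it and take the jars.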

1. Confirm the SPARK_HOME environment variable

[hadoop@hadoop102 software]$ sudo vim /etc/profile.d/my_env.sh
# Add the following (adjust the path to where Spark is actually installed)
# SPARK_HOME
export SPARK_HOME=/opt/module/spark
export PATH=$PATH:$SPARK_HOME/bin

Run source so the change takes effect:

[hadoop@hadoop102 software]$ source /etc/profile.d/my_env.sh
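
A quick sanity check after sourcing the file is to confirm that the variable points at a real Spark install. This is a minimal sketch, assuming a usable install is one that contains an executable `bin/spark-submit`; the function name `check_spark_home` is hypothetical.

```shell
# Hypothetical helper: succeed only if the directory looks like a Spark
# install, i.e. it contains an executable bin/spark-submit.
check_spark_home() {
    # Use the first argument if given, otherwise fall back to $SPARK_HOME.
    set -- "${1:-${SPARK_HOME:-}}"
    if [ -z "$1" ]; then
        echo "SPARK_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$1/bin/spark-submit" ]; then
        echo "no executable bin/spark-submit under $1" >&2
        return 1
    fi
    echo "SPARK_HOME looks valid: $1"
}
```

Calling `check_spark_home` with no argument checks the current `$SPARK_HOME`. Passing a path, for example `check_spark_home /opt/module/spark`, checks that directory instead.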

2. The Spark configuration file created under Hive

Create the Spark configuration file in Hive's conf directory:

[atguigu@hadoop102 software]$ vim /opt/module/hive/conf/spark-defaults.conf
# Add the following (jobs are run with these parameters)
spark.master                             yarn
spark.eventLog.enabled                   true
spark.eventLog.dir                       hdfs://hadoop102:8020/spark-history
spark.executor.memory                    1g
spark.driver.memory                      1g
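
To double-check what a given key is set to, the file can be read back with a small helper. This is a sketch only: it assumes keys and values are separated by spaces or tabs as in the file above, and the name `spark_conf_get` is hypothetical.

```shell
# Hypothetical helper: print the value of one key from a spark-defaults.conf
# style file. Assumes whitespace-separated "key value" lines; lines starting
# with '#' never match because their first field is '#'.
# Usage: spark_conf_get <file> <key>
spark_conf_get() {
    awk -v key="$2" '
        $1 == key { $1 = ""; sub(/^[ \t]+/, ""); print; found = 1; exit }
        END { exit !found }
    ' "$1"
}
```

For example, `spark_conf_get /opt/module/hive/conf/spark-defaults.conf spark.eventLog.dir` prints the event log directory, which is the path checked in the next step. The helper returns a non-zero status when the key is absent.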

3. Confirm the HDFS path for the history logs exists

Check whether the history log directory has been created:

[hadoop@hadoop102 conf]$ hdfs dfs -ls /
Found 4 items
drwxr-xr-x   - hadoop supergroup          0 2022-07-28 20:31 /spark-history
drwxr-xr-x   - hadoop supergroup          0 2022-03-15 16:42 /test
drwxrwx---   - hadoop supergroup          0 2022-03-16 09:14 /tmp
drwxrwxrwx   - hadoop supergroup          0 2022-07-28 18:38 /user

If it does not exist, create it in HDFS:

[hadoop@hadoop102 software]$ hadoop fs -mkdir /spark-history
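
The check and the create can be folded into one step that is safe to rerun. This is a minimal sketch built on `hdfs dfs -test -d`, which exits with 0 when the path is an existing directory; the function name `ensure_hdfs_dir` is hypothetical.

```shell
# Hypothetical helper: create an HDFS directory only if it is missing.
# Usage: ensure_hdfs_dir <hdfs-path>
ensure_hdfs_dir() {
    if hdfs dfs -test -d "$1"; then
        echo "exists: $1"
    else
        # -p also creates missing parent directories.
        hdfs dfs -mkdir -p "$1" && echo "created: $1"
    fi
}
```

For this setup the call would be `ensure_hdfs_dir /spark-history`. The same helper also works for the `/spark-jars` directory.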

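4. Confirm the "pure" Spark jars have been uploaded

Note 1: The regular (non-pure) Spark 3.0.0 build supports Hive 2.3.7 by default, so using it directly causes compatibility problems with the installed Hive 3.1.2. The pure (without-hadoop) Spark jars are used instead. They contain no Hadoop or Hive dependencies, which avoids the conflict.

Note 2: Hive tasks are ultimately executed by Spark, and YARN schedules the resources for Spark tasks, so a task may be assigned to any node in the cluster. The Spark dependencies therefore have to be uploaded to an HDFS path, where every node in the cluster can reach them.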

[hadoop@hadoop102 software]$ tar -zxvf /opt/software/spark-3.0.0-bin-without-hadoop.tgz

Upload the pure Spark jars to HDFS:

[hadoop@hadoop102 software]$ hadoop fs -mkdir /spark-jars
[hadoop@hadoop102 software]$ hadoop fs -put spark-3.0.0-bin-without-hadoop/jars/* /spark-jars

5. Confirm the hive-site.xml configuration

[hadoop@hadoop102 ~]$ vim /opt/module/hive/conf/hive-site.xml

Add the following:

<!-- Location of the Spark dependencies (note: port 8020 must match the NameNode port) -->
<property>
    <name>spark.yarn.jars</name>
    <value>hdfs://hadoop102:8020/spark-jars/*</value>
</property>
  
<!-- Hive execution engine -->
<property>
    <name>hive.execution.engine</name>
    <value>spark</value>
</property>
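
To confirm that the file on disk really contains these values, a property can be read back from the shell. This is a rough sketch, not a general XML parser: it assumes `<name>` and `<value>` are laid out as in the snippet above, and the function name `hive_site_get` is hypothetical.

```shell
# Hypothetical helper: print the <value> that follows a given <name> in a
# Hadoop-style XML config file.
# Usage: hive_site_get <file> <property-name>
hive_site_get() {
    awk -v key="$2" '
        /<name>/ {
            # Strip everything around the text between <name> and </name>.
            name = $0
            sub(/.*<name>[ \t]*/, "", name)
            sub(/[ \t]*<\/name>.*/, "", name)
            want = (name == key)
        }
        want && /<value>/ {
            value = $0
            sub(/.*<value>[ \t]*/, "", value)
            sub(/[ \t]*<\/value>.*/, "", value)
            print value
            found = 1
            exit
        }
        END { exit !found }
    ' "$1"
}
```

For example, `hive_site_get /opt/module/hive/conf/hive-site.xml hive.execution.engine` should print `spark` if the property above was saved correctly. The helper returns a non-zero status when the property is missing.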

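III. The fix

Append the following to hive/conf/hive-site.xml.
(This lengthens the time Hive waits to connect to Spark, which effectively avoids the timeout error.)

<!-- Connection timeout between Hive and Spark -->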
<property>
    <name>hive.spark.client.connect.timeout</name>
    <value>100000ms</value>
</property>
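
Before making the change permanent, the same setting can be tried for a single invocation with the Hive CLI's `--hiveconf` option. This is a minimal sketch, assuming the property is honored when passed on the command line; the wrapper name `hive_with_timeout` is hypothetical.

```shell
# Hypothetical wrapper: run one HQL statement with a longer client connect
# timeout for this invocation only, without editing hive-site.xml.
# Usage: hive_with_timeout <timeout> <statement>
hive_with_timeout() {
    hive --hiveconf hive.spark.client.connect.timeout="$1" -e "$2"
}
```

For example: `hive_with_timeout 100000ms "insert into table student values(1,'abc');"`. If the insert succeeds this way, the timeout is the likely culprit and the property can be added to hive-site.xml as above.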

Now reopen the Hive client. The insert runs without errors:

[hadoop@hadoop102 conf]$ hive
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/datafs/module/jdk1.8.0_212/bin:/datafs/module/hadoop-3.1.3/bin:/datafs/module/hadoop-3.1.3/sbin:/datafs/module/zookeeper-3.5.7/bin:/datafs/module/kafka/bin:/datafs/module/flume/bin:/datafs/module/mysql-5.7.35/bin:/datafs/module/hive/bin:/datafs/module/spark/bin:/home/hadoop/.local/bin:/home/hadoop/bin)
Hive Session ID = b7564f00-0c04-45fd-9984-4ecd6e6149c2

Logging initialized using configuration in jar:file:/datafs/module/hive/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive Session ID = e4af620a-8b6a-422e-b921-5d6c58b81293
hive (default)> 

The first insert is slow because the Spark session has to be initialized:

hive (default)> insert into table student values(1,'abc');
Query ID = hadoop_20220728201636_11b37058-89dc-4050-a4bf-1dcf404bd579
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Running with YARN Application = application_1659005322171_0009
Kill Command = /datafs/module/hadoop-3.1.3/bin/yarn application -kill application_1659005322171_0009
Hive on Spark Session Web UI URL: http://hadoop104:38030

Query Hive on Spark job[0] stages: [0, 1]
Spark job[0] status = RUNNING
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED
--------------------------------------------------------------------------------------
Stage-0 ........         0      FINISHED      1          1        0        0       0
Stage-1 ........         0      FINISHED      1          1        0        0       0
--------------------------------------------------------------------------------------
STAGES: 02/02    [==========================>>] 100%  ELAPSED TIME: 40.06 s
--------------------------------------------------------------------------------------
Spark job[0] finished successfully in 40.06 second(s)
WARNING: Spark Job[0] Spent 16% (3986 ms / 25006 ms) of task time in GC
Loading data to table default.student
OK
col1    col2
Time taken: 127.46 seconds
hive (default)> 

Subsequent inserts are fast:

hive (default)> insert into table student values(2,'ddd');
Query ID = hadoop_20220728202000_1093388b-3ec6-45e5-a9f1-1b07c64f2583
Total jobs = 1
Launching Job 1 out of 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Running with YARN Application = application_1659005322171_0009
Kill Command = /datafs/module/hadoop-3.1.3/bin/yarn application -kill application_1659005322171_0009
Hive on Spark Session Web UI URL: http://hadoop104:38030

Query Hive on Spark job[1] stages: [2, 3]
Spark job[1] status = RUNNING
--------------------------------------------------------------------------------------
          STAGES   ATTEMPT        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED
--------------------------------------------------------------------------------------
Stage-2 ........         0      FINISHED      1          1        0        0       0
Stage-3 ........         0      FINISHED      1          1        0        0       0
--------------------------------------------------------------------------------------
STAGES: 02/02    [==========================>>] 100%  ELAPSED TIME: 2.12 s
--------------------------------------------------------------------------------------
Spark job[1] finished successfully in 3.20 second(s)
Loading data to table default.student
OK
col1    col2
Time taken: 6.0 seconds
hive (default)> 

Query the data:

hive (default)> select * from student;
OK
student.id      student.name
1       abc
2       ddd
Time taken: 0.445 seconds, Fetched: 2 row(s)
hive (default)> [hadoop@hadoop102 conf]$

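IV. Afterword

When you hit a problem, don't give up.
I searched through many solutions online, and a lot of them were unreliable.
The reliable one came from a comment posted under https://b23.tv/hzvzdJc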


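The third approach suggested there solved the problem immediately.

The moment the first row was inserted successfully brought a long-missed sense of achievement. I was happy.

I'm sharing this post first to record how the problem was solved, and second to help beginners.

See you next time, bye!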
