Integrating Kyuubi with Spark on YARN

This article walks through integrating Kyuubi with Spark on YARN. If anything here is wrong or incomplete, corrections are welcome.

Overview

Goals:

  • 1. Run the Kyuubi Spark SQL engine on YARN
  • 2. Enable dynamic resource allocation for the Kyuubi Spark engine on YARN

Note: versions used are Kyuubi 1.8.0, Spark 3.4.2, and Hadoop 3.3.6.

For prerequisites, see the following articles:

Article                                        Link
Hadoop installation (one master, three workers)  (link)
Spark on YARN                                    (link)

Walkthrough

Download

Official download page
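A minimal fetch-and-unpack sketch; the mirror URL and target directory are assumptions based on this article's layout, so verify them against the official download page:

wget https://archive.apache.org/dist/kyuubi/kyuubi-1.8.0/apache-kyuubi-1.8.0-bin.tgz
tar -zxvf apache-kyuubi-1.8.0-bin.tgz -C /data/hadoop/soft/
# the shell prompts below use apache-kyuubi-1.8.0, so rename the unpacked directory to match
mv /data/hadoop/soft/apache-kyuubi-1.8.0-bin /data/hadoop/soft/apache-kyuubi-1.8.0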


Configuration

Official documentation

Edit the configuration files

# Configuration files to edit
[root@hadoop01 conf]# ls
kyuubi-defaults.conf.template  kyuubi-env.sh.template  log4j2.xml.template
[root@hadoop01 conf]# pwd
/data/hadoop/soft/apache-kyuubi-1.8.0/conf

# Step 1: enable the log config
[root@hadoop01 conf]# mv log4j2.xml.template log4j2.xml
# Step 2: kyuubi-env.sh -- point it at the Spark install, e.g.
# export SPARK_HOME=/data/hadoop/soft/spark-3.4.2
[root@hadoop01 conf]# ls
kyuubi-defaults.conf.template  kyuubi-env.sh.template  log4j2.xml
[root@hadoop01 conf]# mv kyuubi-env.sh.template kyuubi-env.sh
[root@hadoop01 conf]# vi kyuubi-env.sh
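# A fuller kyuubi-env.sh sketch -- JAVA_HOME and HADOOP_CONF_DIR below are
# assumptions based on this article's layout, so adjust them to your cluster:
export JAVA_HOME=/usr/local/java/jdk1.8.0        # assumption: your JDK path
export SPARK_HOME=/data/hadoop/soft/spark-3.4.2
export HADOOP_CONF_DIR=/data/hadoop/soft/hadoop-3.3.6/etc/hadoop   # assumption: Hadoop conf dir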
# Step 3: kyuubi-defaults.conf
[root@hadoop01 conf]# mv kyuubi-defaults.conf.template kyuubi-defaults.conf
[root@hadoop01 conf]# vi kyuubi-defaults.conf

kyuubi.engine.type                       SPARK_SQL
kyuubi.engine.share.level                USER
# ============ Spark Config ============
spark.master=yarn
spark.submit.deployMode=cluster
spark.driver.memory 4g
spark.executor.memory 4g
spark.executor.cores 2
# Dynamic allocation
spark.dynamicAllocation.enabled=true
# set to false if you prefer shuffle tracking over the external shuffle service (ESS)
spark.shuffle.service.enabled=true
spark.dynamicAllocation.initialExecutors=2
# note: Spark starts with max(initial, min) executors, and with min == max the
# executor count is effectively pinned at 6; widen this range to actually scale
spark.dynamicAllocation.minExecutors=6
spark.dynamicAllocation.maxExecutors=6
spark.dynamicAllocation.executorAllocationRatio=0.5
spark.dynamicAllocation.executorIdleTimeout=200s
spark.dynamicAllocation.cachedExecutorIdleTimeout=30min
# set to true if you prefer shuffle tracking over ESS
spark.dynamicAllocation.shuffleTracking.enabled=false
spark.dynamicAllocation.shuffleTracking.timeout=30min
spark.dynamicAllocation.schedulerBacklogTimeout=1s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=1s
spark.cleaner.periodicGC.interval=5min
# aqe
spark.sql.adaptive.enabled=true
spark.sql.adaptive.forceApply=false
spark.sql.adaptive.logLevel=info
spark.sql.adaptive.advisoryPartitionSizeInBytes=128m
spark.sql.adaptive.coalescePartitions.enabled=true
spark.sql.adaptive.coalescePartitions.minPartitionNum=30
spark.sql.adaptive.coalescePartitions.initialPartitionNum=5120
spark.sql.adaptive.fetchShuffleBlocksInBatch=true
spark.sql.adaptive.localShuffleReader.enabled=true
spark.sql.adaptive.skewJoin.enabled=true
spark.sql.adaptive.skewJoin.skewedPartitionFactor=3
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=256m
spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin=0.2
# left without a value, i.e. no AQE optimizer rules are excluded
spark.sql.adaptive.optimizer.excludedRules
spark.sql.autoBroadcastJoinThreshold=-1
# limits on data returned to the driver
spark.kryoserializer.buffer.max=2047m
spark.driver.maxResultSize=4096m
# paimon
#spark.sql.catalog.hive.warehouse=hdfs:///data/paimon
spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog
spark.sql.catalog.paimon.warehouse=hdfs:///data/paimon
spark.sql.extensions=org.apache.paimon.spark.PaimonSparkSessionExtension


# Spark event logging for the history server
spark.eventLog.enabled true
spark.eventLog.dir hdfs://hadoop01:9000/spark-eventlog
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.eventLog.compress true
# integrate the Spark history server with the YARN history server; event logs go to HDFS
spark.yarn.historyServer.address hadoop01:18080

# Spark UI retention settings
spark.ui.retainedJobs 50
spark.ui.retainedStages 300
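
Two prerequisites are implied by this file. spark.shuffle.service.enabled=true requires the Spark external shuffle service inside every NodeManager: yarn.nodemanager.aux-services must include spark_shuffle, yarn.nodemanager.aux-services.spark_shuffle.class must be org.apache.spark.network.yarn.YarnShuffleService, and the Spark YARN shuffle jar has to be on the NodeManager classpath. Likewise spark.eventLog.dir must already exist in HDFS. A sketch under assumed paths (HADOOP_HOME pointing at the Hadoop 3.3.6 install from the prerequisite article):

# copy the shuffle jar where NodeManagers can load it, then restart the NodeManagers
cp $SPARK_HOME/yarn/spark-3.4.2-yarn-shuffle.jar $HADOOP_HOME/share/hadoop/yarn/lib/
# pre-create the event log directory referenced by spark.eventLog.dir
hdfs dfs -mkdir -p /spark-eventlog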

Start

# start the Kyuubi server
[root@hadoop01 apache-kyuubi-1.8.0]# ./bin/kyuubi start
# connect with the bundled beeline
bin/beeline -u 'jdbc:hive2://10.32.36.142:10009/' -n root
24/02/26 15:47:30 INFO Client: Application report for application_1708505130791_0007 (state: RUNNING)
24/02/26 15:47:31 INFO Client: Application report for application_1708505130791_0007 (state: RUNNING)
24/02/26 15:47:32 INFO Client: Application report for application_1708505130791_0007 (state: RUNNING)
24/02/26 15:47:33 INFO Client: Application report for application_1708505130791_0007 (state: RUNNING)
2024-02-26 15:47:35.793 INFO KyuubiSessionManager-exec-pool: Thread-58 org.apache.kyuubi.ha.client.zookeeper.ZookeeperDiscoveryClient: Get service instance:hadoop04:39670 engine id:application_1708505130791_0007 and version:1.8.0 under /kyuubi_1.8.0_USER_SPARK_SQL/root/default
24/02/26 15:47:35 INFO Client: Application report for application_1708505130791_0007 (state: RUNNING)
Connected to: Spark SQL (version 3.4.2)
Driver: Kyuubi Project Hive JDBC Client (version 1.8.0)
Beeline version 1.8.0 by Apache Kyuubi
0: jdbc:hive2://10.32.36.142:10009/> 
0: jdbc:hive2://10.32.36.142:10009/> show databases;
2024-02-26 15:48:03.130 INFO KyuubiSessionManager-exec-pool: Thread-66 org.apache.kyuubi.operation.ExecuteStatement: Processing root's query[81c88029-474d-4ed2-98fc-21013bf10cc3]: PENDING_STATE -> RUNNING_STATE, statement:
show databases
2024-02-26 15:48:03.261 INFO KyuubiSessionManager-exec-pool: Thread-66 org.apache.kyuubi.operation.ExecuteStatement: Processing root's query[81c88029-474d-4ed2-98fc-21013bf10cc3]: RUNNING_STATE -> FINISHED_STATE, time taken: 0.131 seconds
+------------+
| namespace  |
+------------+
| default    |
+------------+
1 row selected (0.34 seconds)
0: jdbc:hive2://10.32.36.142:10009/> 
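
On first connect, Kyuubi submits the Spark SQL engine to YARN (the application_1708505130791_0007 reports above). A quick way to confirm it from the shell, using only the standard YARN CLI:

# the engine runs as a YARN application whose name contains "kyuubi"
yarn application -list 2>/dev/null | grep -i kyuubi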


The Paimon catalog can also be registered at runtime from beeline, mirroring the kyuubi-defaults.conf entries above:

set spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog;
set spark.sql.catalog.paimon.warehouse=hdfs:///data/paimon;
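
For the catalog to resolve, the Paimon-Spark bundle must be on the engine's classpath; the jar name below is a placeholder (substitute the bundle matching your Paimon release), and the beeline probe afterwards is a minimal sanity check:

# assumption: download the paimon-spark-3.4 bundle for your Paimon version first
cp paimon-spark-3.4-<version>.jar $SPARK_HOME/jars/

0: jdbc:hive2://10.32.36.142:10009/> show namespaces in paimon;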


The engine log shows the filter being pushed down into the Paimon scan:

24/02/26 20:30:08 INFO ExecuteStatement: Execute in full collect mode
24/02/26 20:30:08 INFO V2ScanRelationPushDown: 
Pushing operators to trace_log_refdes_hive_ro
Pushed Filters: IsNotNull(id), EqualTo(id,11C0928D5A29E048E063AA2C200ABEF3)
Post-Scan Filters: isnotnull(id#354),(id#354 = 11C0928D5A29E048E063AA2C200ABEF3)
         
24/02/26 20:30:08 INFO V2ScanRelationPushDown: 
Output: pcbid#345, rid#346, refdes#347, bm_circuit_no#348, timestamp#349, pickupstatus#350, serial_number#351, flag#352, kitid#353, id#354, createdate#355, etl#356, opt1#357, opt2#358, opt3#359, opt4#360, opt5#361, nozzleid#362, laneno#363, componentbarcode#364, pn#365, lotcode#366, datecode#367, verdor#368, workorder#369, dt#370
         
24/02/26 20:30:08 INFO SparkContext: Starting job: collect at ExecuteStatement.scala:72
24/02/26 20:30:08 INFO SQLOperationListener: Query [4a52f7cf-f446-40b0-b16f-6b895ab84224]: Job 5 started with 1 stages, 1 active jobs running
24/02/26 20:30:08 INFO SQLOperationListener: Query [4a52f7cf-f446-40b0-b16f-6b895ab84224]: Stage 6.0 started with 1 tasks, 1 active stages running

Test data: about thirty-five million rows.

Single-row lookup by primary key id

0: jdbc:hive2://10.32.36.142:10009/> select count(*) from trace_log_refdes_hive_ro where id='11C0928D5A26E048E063AA2C200ABEF3';
24/02/27 08:43:33 INFO AdaptiveSparkPlanExec: Final plan:
*(1) HashAggregate(keys=[], functions=[count(1)], output=[count(1)#523L])
+- *(1) HashAggregate(keys=[], functions=[partial_count(1)], output=[count#527L])
   +- *(1) Project
      +- *(1) Filter (isnotnull(id#505) AND (id#505 = 11C0928D5A26E048E063AA2C200ABEF3))
         +- BatchScan trace_log_refdes_hive_ro[id#505] PaimonScan: [trace_log_refdes_hive_ro], PushedFilters: [IsNotNull(id),Equal(id, 11C0928D5A26E048E063AA2C200ABEF3)] RuntimeFilters: []

24/02/27 08:43:33 INFO ExecuteStatement: Processing root's query[b4889a64-d943-41ae-819e-d8a0491c60c6]: RUNNING_STATE -> FINISHED_STATE, time taken: 1.871 seconds
24/02/27 08:43:33 INFO SQLOperationListener: Query [b4889a64-d943-41ae-819e-d8a0491c60c6]: Job 7 succeeded, 0 active jobs running
2024-02-27 08:43:33.005 INFO KyuubiSessionManager-exec-pool: Thread-496 org.apache.kyuubi.operation.ExecuteStatement: Query[b4889a64-d943-41ae-819e-d8a0491c60c6] in FINISHED_STATE
2024-02-27 08:43:33.006 INFO KyuubiSessionManager-exec-pool: Thread-496 org.apache.kyuubi.operation.ExecuteStatement: Processing root's query[b4889a64-d943-41ae-819e-d8a0491c60c6]: RUNNING_STATE -> FINISHED_STATE, time taken: 1.874 seconds
+-----------+
| count(1)  |
+-----------+
| 1         |
+-----------+
1 row selected (1.883 seconds)

The equality filter on id is pushed down to Paimon (see PushedFilters in the plan above), so only matching data files are scanned and the lookup over ~35 million rows returns in under two seconds.

count(*) test on a non-primary-key column

0: jdbc:hive2://10.32.36.142:10009/> select count(*) from trace_log_refdes_hive_ro where pcbid='E23MPM42201540';
24/02/27 08:41:56 INFO AdaptiveSparkPlanExec: Final plan:
*(2) HashAggregate(keys=[], functions=[count(1)], output=[count(1)#489L])
+- ShuffleQueryStage 0
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=109]
      +- *(1) HashAggregate(keys=[], functions=[partial_count(1)], output=[count#493L])
         +- *(1) Project
            +- *(1) Filter (isnotnull(pcbid#462) AND (pcbid#462 = E23MPM42201540))
               +- BatchScan trace_log_refdes_hive_ro[pcbid#462] PaimonScan: [trace_log_refdes_hive_ro], PushedFilters: [IsNotNull(pcbid),Equal(pcbid, E23MPM42201540)] RuntimeFilters: []

2024-02-27 08:41:56.274 INFO KyuubiSessionManager-exec-pool: Thread-495 org.apache.kyuubi.operation.ExecuteStatement: Processing root's query[f473dde1-ba80-4d3e-8dc7-fa0a3e32df1f]: RUNNING_STATE -> FINISHED_STATE, time taken: 5.368 seconds
+-----------+
| count(1)  |
+-----------+
| 108       |
+-----------+
1 row selected (5.393 seconds)

Full-table count: roughly sixty million rows.

24/02/27 16:54:54 INFO SQLOperationListener: Query [96e1d2f4-8230-4cc7-9b84-ce888387bb7d]: Job 2 succeeded, 0 active jobs running
24/02/27 16:54:54 INFO AdaptiveSparkPlanExec: Final plan:
*(2) HashAggregate(keys=[], functions=[count(1)], output=[count(1)#61L])
+- ShuffleQueryStage 0
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=52]
      +- *(1) HashAggregate(keys=[], functions=[partial_count(1)], output=[count#64L])
         +- *(1) Project
            +- BatchScan trace_log_refdes_hive_ro[] PaimonScan: [trace_log_refdes_hive_ro] RuntimeFilters: []

24/02/27 16:54:54 INFO CodeGenerator: Code generated in 7.970386 ms
24/02/27 16:54:54 INFO ExecuteStatement: Processing root's query[96e1d2f4-8230-4cc7-9b84-ce888387bb7d]: RUNNING_STATE -> FINISHED_STATE, time taken: 21.921 seconds
2024-02-27 16:54:54.490 INFO KyuubiSessionManager-exec-pool: Thread-806 org.apache.kyuubi.operation.ExecuteStatement: Query[96e1d2f4-8230-4cc7-9b84-ce888387bb7d] in FINISHED_STATE
2024-02-27 16:54:54.490 INFO KyuubiSessionManager-exec-pool: Thread-806 org.apache.kyuubi.operation.ExecuteStatement: Processing root's query[96e1d2f4-8230-4cc7-9b84-ce888387bb7d]: RUNNING_STATE -> FINISHED_STATE, time taken: 21.924 seconds
+-----------+
| count(1)  |
+-----------+
| 59480487  |
+-----------+
1 row selected (21.942 seconds)
0: jdbc:hive2://10.32.36.142:10009/> select *  from trace_log_refdes_hive_ro limit 10;
24/02/27 16:55:27 INFO ExecuteStatement: Execute in full collect mode
24/02/27 16:55:27 INFO V2ScanRelationPushDown: 
Output: pcbid#67, rid#68, refdes#69, bm_circuit_no#70, timestamp#71, pickupstatus#72, serial_number#73, flag#74, kitid#75, id#76, createdate#77, etl#78, opt1#79, opt2#80, opt3#81, opt4#82, opt5#83, nozzleid#84, laneno#85, componentbarcode#86, pn#87, lotcode#88, datecode#89, verdor#90, workorder#91, dt#92
         
2024-02-27 16:55:28.414 INFO KyuubiSessionManager-exec-pool: Thread-807 org.apache.kyuubi.operation.ExecuteStatement: Processing root's query[cadc193d-1343-4973-828c-23f23250b3d3]: RUNNING_STATE -> FINISHED_STATE, time taken: 0.735 seconds
+-----------------+------------------------------+---------+----------------+----------------------+---------------+--------------------+-------+--------+-----------------------------------+----------------------+------+-------+-------+-------+-------+-------+-----------+---------+-------------------+-------+----------+-----------+---------+------------+-------------+
|      pcbid      |             xxx              | yyyyff  | zzzzzxxxxxxxx  |      timestamp       | gggggggfffff  |   ddddddddddddd    | cccc  | aaaaa  |                id                 |      createdate      | etl  | opt1  | opt2  | opt3  | opt4  | opt5  | nozzleid  | laneno  | componentbarcode  |  pn   | xyzzzzx  | xzzzzzzz  | txxxxx  | zxxxxxxxx  |     dt      |
+-----------------+------------------------------+---------+----------------+----------------------+---------------+--------------------+-------+--------+-----------------------------------+----------------------+------+-------+-------+-------+-------+-------+-----------+---------+-------------------+-------+----------+-----------+---------+------------+-------------+
| E23MPM42203175  | 514S00292-11420240109000060  | J0200   | 5              | 2024-02-23 18:16:42  | 0             | DLC4084004RPQVLAG  | 0     | NXT    | 11C0928D5A26E048E063AA2C200ABEF3  | 2024-02-23 18:14:50  | N    | NULL  | NULL  | NULL  | NULL  | NULL  | NULL      | NULL    | NULL              | NULL  | NULL     | NULL      | NULL    | NULL       | 2024-02-23  |
| E23MPM42203175  | 514S00292-11420240109000060  | J0200   | 4              | 2024-02-23 18:16:42  | 0             | DLC4084004SPQVLAF  | 0     | NXT    | 11C0928D5A27E048E063AA2C200ABEF3  | 2024-02-23 18:14:50  | N    | NULL  | NULL  | NULL  | NULL  | NULL  | NULL      | NULL    | NULL              | NULL  | NULL     | NULL      | NULL    | NULL       | 2024-02-23  |
| E23MPM42203175  | 514S00292-11420240117000057  | J0200   | 7              | 2024-02-23 18:16:42  | 0             | DLC4084004PPQVLAJ  | 0     | NXT    | 11C0928D5A28E048E063AA2C200ABEF3  | 2024-02-23 18:14:50  | N    | NULL  | NULL  | NULL  | NULL  | NULL  | NULL      | NULL    | NULL              | NULL  | NULL     | NULL      | NULL    | NULL       | 2024-02-23  |
| E23MPM42203175  | 514S00292-11420240117000057  | J0200   | 9              | 2024-02-23 18:16:42  | 0             | DLC4084004MPQVLAL  | 0     | NXT    | 11C0928D5A29E048E063AA2C200ABEF3  | 2024-02-23 18:14:50  | N    | NULL  | NULL  | NULL  | NULL  | NULL  | NULL      | NULL    | NULL              | NULL  | NULL     | NULL      | NULL    | NULL       | 2024-02-23  |
| E23MPM42203175  | 514S00292-11420240117000056  | J0200   | 12             | 2024-02-23 18:16:42  | 0             | DLC4084004VPQVLAC  | 0     | NXT    | 11C0928D5A2CE048E063AA2C200ABEF3  | 2024-02-23 18:14:50  | N    | NULL  | NULL  | NULL  | NULL  | NULL  | NULL      | NULL    | NULL              | NULL  | NULL     | NULL      | NULL    | NULL       | 2024-02-23  |
| E23MPM42203176  | 514S00292-11420240117000056  | J0200   | 1              | 2024-02-23 18:16:42  | 0             | DLC4084005NPQVLAG  | 0     | NXT    | 11C0928D5A31E048E063AA2C200ABEF3  | 2024-02-23 18:14:50  | N    | NULL  | NULL  | NULL  | NULL  | NULL  | NULL      | NULL    | NULL              | NULL  | NULL     | NULL      | NULL    | NULL       | 2024-02-23  |
| E23MPM42203176  | 514S00292-11420240117000059  | J0200   | 4              | 2024-02-23 18:16:42  | 0             | DLC4084005GPQVLAN  | 0     | NXT    | 11C0928D5A34E048E063AA2C200ABEF3  | 2024-02-23 18:14:50  | N    | NULL  | NULL  | NULL  | NULL  | NULL  | NULL      | NULL    | NULL              | NULL  | NULL     | NULL      | NULL    | NULL       | 2024-02-23  |
| E23MPM42203176  | 514S00292-11420240116000073  | J0200   | 10             | 2024-02-23 18:16:42  | 0             | DLC4084005MPQVLAH  | 0     | NXT    | 11C0928D5A3AE048E063AA2C200ABEF3  | 2024-02-23 18:14:50  | N    | NULL  | NULL  | NULL  | NULL  | NULL  | NULL      | NULL    | NULL              | NULL  | NULL     | NULL      | NULL    | NULL       | 2024-02-23  |
| E23MPM42201540  | 117S0158-A420240115001804    | R0514   | 7              | 2024-02-23 18:16:42  | 0             | DLC40860DTVPQVLAW  | 0     | NXT    | 11C0928D5A59E048E063AA2C200ABEF3  | 2024-02-23 18:14:51  | N    | NULL  | NULL  | NULL  | NULL  | NULL  | NULL      | NULL    | NULL              | NULL  | NULL     | NULL      | NULL    | NULL       | 2024-02-23  |
| E23MPM42201540  | 117S0158-A420240115001804    | R0401   | 1              | 2024-02-23 18:16:42  | 0             | DLC40860DU4PQVLAJ  | 0     | NXT    | 11C0928D5A5DE048E063AA2C200ABEF3  | 2024-02-23 18:14:51  | N    | NULL  | NULL  | NULL  | NULL  | NULL  | NULL      | NULL    | NULL              | NULL  | NULL     | NULL      | NULL    | NULL       | 2024-02-23  |
+-----------------+------------------------------+---------+----------------+----------------------+---------------+--------------------+-------+--------+-----------------------------------+----------------------+------+-------+-------+-------+-------+-------+-----------+---------+-------------------+-------+----------+-----------+---------+------------+-------------+
10 rows selected (0.777 seconds)
0: jdbc:hive2://10.32.36.142:10009/> select *  from trace_log_refdes_hive_ro where id ='11C0928D5A5DE048E063AA2C200ABEF3';
24/02/27 16:55:49 INFO V2ScanRelationPushDown: 
Pushing operators to trace_log_refdes_hive_ro
Pushed Filters: IsNotNull(id), EqualTo(id,11C0928D5A5DE048E063AA2C200ABEF3)
Post-Scan Filters: isnotnull(id#181),(id#181 = 11C0928D5A5DE048E063AA2C200ABEF3)
         
24/02/27 16:55:49 INFO V2ScanRelationPushDown: 
Output: pcbid#172, rid#173, refdes#174, bm_circuit_no#175, timestamp#176, pickupstatus#177, serial_number#178, flag#179, kitid#180, id#181, createdate#182, etl#183, opt1#184, opt2#185, opt3#186, opt4#187, opt5#188, nozzleid#189, laneno#190, componentbarcode#191, pn#192, lotcode#193, datecode#194, verdor#195, workorder#196, dt#197
         
2024-02-27 16:55:52.191 INFO KyuubiSessionManager-exec-pool: Thread-808 org.apache.kyuubi.operation.ExecuteStatement: Processing root's query[1dd09511-efc5-4e64-ad2a-15d4b59fb553]: RUNNING_STATE -> FINISHED_STATE, time taken: 2.859 seconds
+-----------------+----------------------------+---------+----------------+----------------------+---------------+--------------------+-------+--------+-----------------------------------+----------------------+------+-------+-------+-------+-------+-------+-----------+---------+-------------------+-------+----------+-----------+---------+------------+-------------+
|      pcbid      |             xxx              | yyyyff  | zzzzzxxxxxxxx  |      timestamp       | gggggggfffff  |   ddddddddddddd    | cccc  | aaaaa  |                id                 |      createdate      | etl  | opt1  | opt2  | opt3  | opt4  | opt5  | nozzleid  | laneno  | componentbarcode  |  pn   | xyzzzzx  | xzzzzzzz  | txxxxx  | zxxxxxxxx  |     dt      |
+-----------------+----------------------------+---------+----------------+----------------------+---------------+--------------------+-------+--------+-----------------------------------+----------------------+------+-------+-------+-------+-------+-------+-----------+---------+-------------------+-------+----------+-----------+---------+------------+-------------+
| E23MPM42201540  | 117S0158-A420240115001804  | R0401   | 1              | 2024-02-23 18:16:42  | 0             | DLC40860DU4PQVLAJ  | 0     | NXT    | 11C0928D5A5DE048E063AA2C200ABEF3  | 2024-02-23 18:14:51  | N    | NULL  | NULL  | NULL  | NULL  | NULL  | NULL      | NULL    | NULL              | NULL  | NULL     | NULL      | NULL    | NULL       | 2024-02-23  |
+-----------------+----------------------------+---------+----------------+----------------------+---------------+--------------------+-------+--------+-----------------------------------+----------------------+------+-------+-------+-------+-------+-------+-----------+---------+-------------------+-------+----------+-----------+---------+------------+-------------+
1 row selected (2.875 seconds)
0: jdbc:hive2://10.32.36.142:10009/> select count(*)  from trace_log_refdes_hive_ro where pcbid ='E23MPM42203176';
2024-02-27 16:57:01.984 INFO KyuubiSessionManager-exec-pool: Thread-809 org.apache.kyuubi.operation.ExecuteStatement: Processing root's query[92377a4f-2dad-4039-be2d-4762946eaea8]: PENDING_STATE -> RUNNING_STATE, statement:
select count(*)  from trace_log_refdes_hive_ro where pcbid ='E23MPM42203176'
24/02/27 16:57:01 INFO ExecuteStatement: Processing root's query[92377a4f-2dad-
24/02/27 16:57:10 INFO SparkContext: Starting job: collect at ExecuteStatement.scala:72
24/02/27 16:57:10 INFO SQLOperationListener: Query [92377a4f-2dad-4039-be2d-4762946eaea8]: Job 6 started with 2 stages, 1 active jobs running
24/02/27 16:57:10 INFO SQLOperationListener: Query [92377a4f-2dad-4039-be2d-4762946eaea8]: Stage 8.0 started with 1 tasks, 1 active stages running
24/02/27 16:57:10 INFO SQLOperationListener: Finished stage: Stage(8, 0); Name: 'collect at ExecuteStatement.scala:72'; Status: succeeded; numTasks: 1; Took: 47 msec
24/02/27 16:57:10 INFO AdaptiveSparkPlanExec: Final plan:
*(2) HashAggregate(keys=[], functions=[count(1)], output=[count(1)#304L])
+- ShuffleQueryStage 0
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=115]
      +- *(1) HashAggregate(keys=[], functions=[partial_count(1)], output=[count#308L])
         +- *(1) Project
            +- *(1) Filter (isnotnull(pcbid#277) AND (pcbid#277 = E23MPM42203176))
               +- BatchScan trace_log_refdes_hive_ro[pcbid#277] PaimonScan: [trace_log_refdes_hive_ro], PushedFilters: [IsNotNull(pcbid),Equal(pcbid, E23MPM42203176)] RuntimeFilters: []

2024-02-27 16:57:10.578 INFO KyuubiSessionManager-exec-pool: Thread-809 org.apache.kyuubi.operation.ExecuteStatement: Query[92377a4f-2dad-4039-be2d-4762946eaea8] in FINISHED_STATE
2024-02-27 16:57:10.578 INFO KyuubiSessionManager-exec-pool: Thread-809 org.apache.kyuubi.operation.ExecuteStatement: Processing root's query[92377a4f-2dad-4039-be2d-4762946eaea8]: RUNNING_STATE -> FINISHED_STATE, time taken: 8.594 seconds
+-----------+
| count(1)  |
+-----------+
| 3         |
+-----------+
1 row selected (8.604 seconds)
0: jdbc:hive2://10.32.36.142:10009/> 
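
Goal 2, dynamic allocation, can be confirmed by watching the engine's container count rise while a query runs and drop back after spark.dynamicAllocation.executorIdleTimeout. A rough check with the standard YARN CLI, substituting the ids from your own run:

# list attempts for the engine application, then the live containers of that attempt
yarn applicationattempt -list application_1708505130791_0007
yarn container -list <appattempt_id>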

Conclusion
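
To shut the server down when finished:

[root@hadoop01 apache-kyuubi-1.8.0]# ./bin/kyuubi stop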

That concludes integrating Kyuubi with Spark on YARN.
