1. Prerequisites
1. Install Hadoop (a high-availability setup is recommended).
If Hadoop is not installed yet, see the earlier post 采集项目(HA)(五台服务器)_ha数据采集 on CSDN.
2. Install Hive
1. Extract the archive
[atguigu@hadoop100 software]$ tar -zxvf /opt/software/apache-hive-3.1.3.tar.gz -C /opt/module/
[atguigu@hadoop100 software]$ mv /opt/module/apache-hive-3.1.3-bin/ /opt/module/hive
2. Environment variables
[atguigu@hadoop100 software]$ sudo vim /etc/profile.d/my_env.sh
#HIVE_HOME
export HIVE_HOME=/opt/module/hive
export PATH=$PATH:$HIVE_HOME/bin
[atguigu@hadoop100 software]$ source /etc/profile.d/my_env.sh
Resolve the logging jar conflict: go to /opt/module/hive/lib and rename Hive's bundled SLF4J binding.
[atguigu@hadoop100 lib]$ mv log4j-slf4j-impl-2.17.1.jar log4j-slf4j-impl-2.17.1.jar.bak
3. Configure the Hive metastore in MySQL
Copy the JDBC driver
[atguigu@hadoop102 lib]$ cp /opt/software/mysql/mysql-connector-j-8.0.31.jar /opt/module/hive/lib/
Configure the metastore connection to MySQL
[atguigu@hadoop102 conf]$ vim hive-site.xml
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://hadoop100:3306/metastore?useSSL=false&useUnicode=true&characterEncoding=UTF-8</value> </property> <property> <name>javax.jdo.option.ConnectionDriverName</name> <value>com.mysql.cj.jdbc.Driver</value> </property> <property> <name>javax.jdo.option.ConnectionUserName</name> <value>root</value> </property> <property> <name>javax.jdo.option.ConnectionPassword</name> <value>000000</value> </property> <property> <name>hive.metastore.warehouse.dir</name> <value>/user/hive/warehouse</value> </property> <property> <name>hive.metastore.schema.verification</name> <value>false</value> </property> <property> <name>hive.server2.thrift.port</name> <value>10000</value> </property> <property> <name>hive.server2.thrift.bind.host</name> <value>hadoop101</value> </property> <property> <name>hive.metastore.event.db.notification.api.auth</name> <value>false</value> </property> <property> <name>hive.cli.print.header</name> <value>true</value> </property> <property> <name>hive.cli.print.current.db</name> <value>true</value> </property> </configuration>
4. Start Hive
1. Log in to MySQL
[atguigu@hadoop100 conf]$ mysql -uroot -p000000
2. Create the Hive metastore database
mysql> create database metastore;
3. Initialize the Hive metastore schema
[atguigu@hadoop100 conf]$ schematool -initSchema -dbType mysql -verbose
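An optional sanity check: after initialization the metastore database should contain Hive's metadata tables, which you can list directly in MySQL:
mysql> use metastore;
mysql> show tables;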
4. Change the metadata character set
mysql> use metastore;
mysql> alter table COLUMNS_V2 modify column COMMENT varchar(256) character set utf8;
mysql> alter table TABLE_PARAMS modify column PARAM_VALUE mediumtext character set utf8;
mysql> quit;
5. Start the Hive client
[atguigu@hadoop100 hive]$ bin/hive
6. When connecting with a client tool, start HiveServer2 first
[atguigu@hadoop100 bin]$ hiveserver2
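Once hiveserver2 is up, a minimal way to verify the connection from the command line is beeline; the host and port below follow the hive.server2.thrift settings in the hive-site.xml above, so adjust them to wherever HiveServer2 is actually listening, and use your own user name:
[atguigu@hadoop100 bin]$ beeline -u jdbc:hive2://hadoop101:10000 -n atguigu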
3. Install the pure (without-Hadoop) Spark package
1. Download the pure package
Downloads | Apache Spark (https://spark.apache.org/downloads.html)
2. Extract the archive
[atguigu@hadoop102 software]$ tar -zxvf spark-3.3.1-bin-without-hadoop.tgz -C /opt/module/
[atguigu@hadoop102 software]$ mv /opt/module/spark-3.3.1-bin-without-hadoop /opt/module/spark
3. Edit the Spark environment file
[atguigu@hadoop102 software]$ mv /opt/module/spark/conf/spark-env.sh.template /opt/module/spark/conf/spark-env.sh
[atguigu@hadoop102 software]$ vim /opt/module/spark/conf/spark-env.sh
Add the following line:
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
4. Environment variables
[atguigu@hadoop102 software]$ sudo vim /etc/profile.d/my_env.sh
# SPARK_HOME
export SPARK_HOME=/opt/module/spark
export PATH=$PATH:$SPARK_HOME/bin
[atguigu@hadoop102 software]$ source /etc/profile.d/my_env.sh
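As an optional sanity check (assuming the Hadoop environment variables are already configured on this node), spark-submit should now pick up its Hadoop classes through SPARK_DIST_CLASSPATH and print its version without errors:
[atguigu@hadoop102 software]$ spark-submit --version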
5. Create the Spark configuration file in Hive's conf directory
[atguigu@hadoop102 software]$ vim /opt/module/hive/conf/spark-defaults.conf
spark.master                     yarn
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://mycluster:8020/spark-history
spark.executor.memory            1g
spark.driver.memory              1g
Note: hdfs://mycluster:8020/spark-history in the config above points at the NameNode. In my setup the HA nameservice is mycluster (if you are not running HA, use your NameNode's IP/hostname instead). When addressing an HA nameservice, the port is usually omitted, e.g. hdfs://mycluster/spark-history.
[atguigu@hadoop102 software]$ hadoop fs -mkdir /spark-history
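If you are unsure of your HA nameservice name, it can be read from the HDFS configuration (a quick check, assuming a standard HA setup; in my case it prints mycluster):
[atguigu@hadoop102 software]$ hdfs getconf -confKey dfs.nameservices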
6. Upload the pure Spark jars to HDFS
Note 1: The pure Spark package does not bundle the Hadoop and Hive dependencies, which avoids dependency conflicts.
Note 2: Hive jobs are ultimately executed by Spark, and Spark task resources are scheduled by YARN, so a task may land on any node in the cluster. The Spark dependencies therefore have to be uploaded to an HDFS path that every node can read.
[atguigu@hadoop102 software]$ hadoop fs -mkdir /spark-jars
[atguigu@hadoop102 software]$ hadoop fs -put /opt/module/spark/jars/* /spark-jars
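A rough check that the upload completed (the exact jar count varies by Spark version, so just confirm it is in the low hundreds rather than zero):
[atguigu@hadoop102 software]$ hadoop fs -ls /spark-jars | wc -l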
7. Edit hive-site.xml
[atguigu@hadoop102 ~]$ vim /opt/module/hive/conf/hive-site.xml
<!-- Location of the Spark dependencies (note: port 8020 must match the NameNode port) -->
<property>
    <name>spark.yarn.jars</name>
    <value>hdfs://mycluster:8020/spark-jars/*</value>
</property>
<!-- Hive execution engine -->
<property>
    <name>hive.execution.engine</name>
    <value>spark</value>
</property>
Note: as above, hdfs://mycluster:8020/spark-jars/* points at the NameNode; mycluster is my HA nameservice (if you are not running HA, use the NameNode's IP/hostname instead).
4. YARN configuration
1. Edit the capacity scheduler configuration
vim /opt/module/hadoop/etc/hadoop/capacity-scheduler.xml
If this property already exists, modify it; otherwise add it:
<property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.8</value>
</property>
2. Distribute the file
[atguigu@hadoop102 hadoop]$ xsync capacity-scheduler.xml
3. Restart YARN
[atguigu@hadoop103 hadoop]$ stop-yarn.sh
[atguigu@hadoop103 hadoop]$ start-yarn.sh
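After the restart, a simple check that the ResourceManager is back up and the NodeManagers have re-registered:
[atguigu@hadoop103 hadoop]$ yarn node -list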
5. Test
1. Run a test job
[atguigu@hadoop102 hive]$ hive
hive (default)> create table student(id int, name string);
hive (default)> insert into table student values(1,'abc');
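The insert above should be submitted as a Spark job on YARN (the first one takes a while because the Spark session has to start). A minimal follow-up check that the engine is set as expected and the row landed:
hive (default)> set hive.execution.engine;
hive (default)> select * from student;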
2. Remote connection
[atguigu@hadoop102 hive]$ hiveserver2
Note: if you are on a cloud server, open the required port in the security group.