【Sqoop】Sqoop的使用(连接MySQL、Hive)

这篇具有很好参考价值的文章主要介绍了【Sqoop】Sqoop的使用(连接MySQL、Hive)。希望对大家有所帮助。如果存在错误或未考虑完全的地方,请大家不吝赐教,您也可以点击"举报违法"按钮提交疑问。

Sqoop的使用

使用sqoop前需要启动Hadoop,检查Hadoop的运行状态,我们这里使用的是伪分布式

[root@hr conf]# jps
1538 DataNode
2114 NodeManager
1689 SecondaryNameNode
1834 ResourceManager
1421 NameNode
2831 Jps

Sqoop连接MySQL需要MySQLjar包,吧jar包放置Sqooplib目录下

先测试Sqoop能否连接MySQL查看存在的数据库

sqoop list-databases --connect jdbc:mysql://192.168.0.161:3306/ --username root --password 123456

结果中的警告是因为没有配置Hbase,忽略即可

Warning: /home/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
22/03/16 04:27:14 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
22/03/16 04:27:14 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/03/16 04:27:14 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
Wed Mar 16 04:27:14 EDT 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
information_schema
hive
mysql
performance_schema
sys

为了连接HIve我们需要将Hive组件lib目录下的hive-common-3.1.2.jar放到Sqoop组件的lib目录下

cp /home/hive/lib/hive-common-3.1.2.jar /home/sqoop/lib/

在MySQL中创建一个数据库和数据表,并随机插入一些数据

创建数据库和表

create database sample;
create table student(number char(9) primary key, name varchar(10));

添加数据

insert into student values('01','zhangsan');
insert into student values('02','lisi');
insert into student values('03','wangwu');

查看数据

mysql> select * from student;
+--------+----------+
| number | name     |
+--------+----------+
| 01     | zhangsan |
| 02     | lisi     |
| 03     | wangwu   |
+--------+----------+
3 rows in set (0.00 sec)

mysql> 

Hive中创建数据库和数据表

hive> create database sample; 
OK
Time taken: 0.566 seconds
hive> use sample;
OK
Time taken: 0.015 seconds
hive>  create table student(number STRING, name STRING) row format delimited fields terminated by ',';
Time taken: 0.499 seconds

MySQL 导出数据,导入Hive

sqoop参数说明

参数 意义
–connect MySQL 数据库连接 URL。
–username 和–password MySQL 数据库的用户名和密码。
–table 导出的数据表名。
–fields-terminated-by Hive 中字段分隔符。
–delete-target-dir 删除导出目的目录。
–num-mappers Hadoop 执行 Sqoop 导入导出启动的 map 任务数。
–hive-import --hive-database 导出到 Hive 的数据库名。
–hive-table 导出到 Hive 的表名。
sqoop import --connect jdbc:mysql://192.168.0.161:3306/sample --username root --password 123456 --table student --fields-terminated-by ',' --delete-target-dir --num-mappers 1 --hive-import --hive-database sample --hive-table student

运行日志

Warning: /home/sqoop/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/sqoop/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
22/03/16 05:03:24 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
22/03/16 05:03:24 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/03/16 05:03:24 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
22/03/16 05:03:24 INFO tool.CodeGenTool: Beginning code generation
Wed Mar 16 05:03:24 EDT 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/03/16 05:03:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
22/03/16 05:03:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
22/03/16 05:03:25 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop
Note: /tmp/sqoop-root/compile/05aad1252e73aa2f62b414a7722fe98d/student.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
22/03/16 05:03:27 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/05aad1252e73aa2f62b414a7722fe98d/student.jar
22/03/16 05:03:28 INFO tool.ImportTool: Destination directory student is not present, hence not deleting.
22/03/16 05:03:28 WARN manager.MySQLManager: It looks like you are importing from mysql.
22/03/16 05:03:28 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
22/03/16 05:03:28 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
22/03/16 05:03:28 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
22/03/16 05:03:28 INFO mapreduce.ImportJobBase: Beginning import of student
22/03/16 05:03:28 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
22/03/16 05:03:28 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
22/03/16 05:03:28 INFO client.RMProxy: Connecting to ResourceManager at hr/192.168.0.161:8032
Wed Mar 16 05:03:30 EDT 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/03/16 05:03:30 INFO db.DBInputFormat: Using read commited transaction isolation
22/03/16 05:03:30 INFO mapreduce.JobSubmitter: number of splits:1
22/03/16 05:03:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1647416961901_0001
22/03/16 05:03:31 INFO impl.YarnClientImpl: Submitted application application_1647416961901_0001
22/03/16 05:03:31 INFO mapreduce.Job: The url to track the job: http://hr:8088/proxy/application_1647416961901_0001/
22/03/16 05:03:31 INFO mapreduce.Job: Running job: job_1647416961901_0001
22/03/16 05:03:40 INFO mapreduce.Job: Job job_1647416961901_0001 running in uber mode : false
22/03/16 05:03:40 INFO mapreduce.Job:  map 0% reduce 0%
22/03/16 05:03:47 INFO mapreduce.Job:  map 100% reduce 0%
22/03/16 05:03:48 INFO mapreduce.Job: Job job_1647416961901_0001 completed successfully
22/03/16 05:03:48 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=133750
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=30
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=3708
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=3708
		Total vcore-seconds taken by all map tasks=3708
		Total megabyte-seconds taken by all map tasks=3796992
	Map-Reduce Framework
		Map input records=3
		Map output records=3
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=68
		CPU time spent (ms)=1020
		Physical memory (bytes) snapshot=110944256
		Virtual memory (bytes) snapshot=2081656832
		Total committed heap usage (bytes)=30474240
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=30
22/03/16 05:03:48 INFO mapreduce.ImportJobBase: Transferred 30 bytes in 20.1651 seconds (1.4877 bytes/sec)
22/03/16 05:03:48 INFO mapreduce.ImportJobBase: Retrieved 3 records.
22/03/16 05:03:48 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table student
Wed Mar 16 05:03:48 EDT 2022 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
22/03/16 05:03:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
22/03/16 05:03:48 INFO hive.HiveImport: Loading uploaded data into Hive
22/03/16 05:03:48 INFO conf.HiveConf: Found configuration file file:/home/hive/conf/hive-site.xml
22/03/16 05:03:49 INFO hive.HiveImport: which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/home/jdk/bin:/home/jdk/bin:/home/sqoop/bin)
22/03/16 05:03:51 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
22/03/16 05:03:51 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/home/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
22/03/16 05:03:51 INFO hive.HiveImport: SLF4J: Found binding in [jar:file:/home/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
22/03/16 05:03:51 INFO hive.HiveImport: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
22/03/16 05:03:51 INFO hive.HiveImport: SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
22/03/16 05:03:52 INFO hive.HiveImport: Hive Session ID = 4b655330-3a4e-4ed4-b40f-599c1bff9652
22/03/16 05:03:52 INFO hive.HiveImport: 
22/03/16 05:03:52 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/home/hive/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
22/03/16 05:03:59 INFO hive.HiveImport: Hive Session ID = 57f33b0b-6433-4c3e-8277-4f020467d73b
22/03/16 05:04:00 INFO hive.HiveImport: OK
22/03/16 05:04:00 INFO hive.HiveImport: Time taken: 1.046 seconds
22/03/16 05:04:00 INFO hive.HiveImport: Loading data to table sample.student
22/03/16 05:04:01 INFO hive.HiveImport: OK
22/03/16 05:04:01 INFO hive.HiveImport: Time taken: 0.805 seconds
22/03/16 05:04:02 INFO hive.HiveImport: Hive import complete.
22/03/16 05:04:02 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.

去查看HIve上的表

hive> select * from sample.student;
OK
01	zhangsan
02	lisi
03	wangwu
Time taken: 2.154 seconds, Fetched: 3 row(s)

Hive 导出数据,导入到 MySQL

参数 意义
–connect MySQL 数据库连接 URL。
–username 和–password MySQL 数据库的用户名和密码。
–table 导出的数据表名。
–fields-terminated-by Hive 中字段分隔符。
–export-dir Hive 数据表在 HDFS 中的存储路径。
sqoop export --connect "jdbc:mysql://192.168.0.161:3306/sample?useUnicode=true&characterEncoding=utf-8" --username root --password 123456 --table student --input-fields-terminated-by ',' --export-dir /user/hive/warehouse/sample.db/student/*

Sqoop其他的命令
列出 MySQL 数据库中的所有数据库。

sqoop list-databases -connect jdbc:mysql://localhost:3306/ -username root -password 123456

连接MySQL并列出 sample数据库中的表。

sqoop list-tables -connect jdbc:mysql://localhost:3306/sample -username root -password 123456

将关系型数据的表结构复制到hive中,只是复制表的结构,表中的内容没有复制过去。

sqoop create-hive-table -connect jdbc:mysql://localhost:3306/sample -table student -username root -password 123456 -hive-table test

其中–table studentMySQL 中的数据库 sample中的表–hive-table testhive中新建的表名称。


从关系数据库导入文件到Hive中。

sqoop import --connect jdbc:mysql://localhost:3306/sample --username root --password 123456 --table student --delete-target-dir --num-mappers 1 --hive-import --hive-database default --hive-table test

Hive中的表数据导入到 MySQL 中,在进行导入之前,MySQL中的表hive_test表必须已经提前创建好。

sqoop export -connect jdbc:mysql://localhost:3306/sample -username root -password 123456 -table student --input-fields-terminatedby '\001' -export-dir /user/hive/warehouse/test

从数据库导出表的数据到 HDFS 上文件。

sqoop import -connect jdbc:mysql://localhost:3306/sample -username root -password 123456 -table student --num-mappers 1 -targetdir /user/test

从数据库增量导入表数据到 HDFS中。文章来源地址https://www.toymoban.com/news/detail-468061.html

sqoop import -connect jdbc:mysql://localhost:3306/sample -username root -password 123456 -table student --num-mappers 1 -targetdir /user/test -check-column number -incremental append -last-value 0

到了这里,关于【Sqoop】Sqoop的使用(连接MySQL、Hive)的文章就介绍完了。如果您还想了解更多内容,请在右上角搜索TOY模板网以前的文章或继续浏览下面的相关文章,希望大家以后多多支持TOY模板网!

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处: 如若内容造成侵权/违法违规/事实不符,请点击违法举报进行投诉反馈,一经查实,立即删除!

领支付宝红包 赞助服务器费用

相关文章

  • Sqoop【实践 01】Sqoop1最新版 MySQL与HDFS\Hive\HBase 核心导入导出案例分享+多个WRAN及Exception问题处理(一篇即可学会在日常工作中使用Sqoop)

    1️⃣ 查看所有命令【 sqoop help 】 2️⃣ 查看某条命令的具体使用方法【 sqoop help COMMAND 】 查询MySQL所有数据库,通常用于 Sqoop 与 MySQL 连通测试: 【具体参数可以使用 sqoop help list-databases 查看】实例: 结果: 【2个】WARN处理: 查询指定数据库中所有数据表,这里要注意一下

    2024年03月18日
    浏览(41)
  • 大数据bug-sqoop(二:sqoop同步mysql数据到hive进行字段限制。)

    新增加三个参数 –query “${sql}” 这个参数添加对应表的sql语句。注意结尾必须添加 $CONDITIONS ,必须添加where 条件,如果没有where条件,写成where 1=1。案例如下: \\\"select id,key_id,key_type,\\\'\\\' as encryption_cert_chain,device_type,account_id_hash,user_identifier,user_id,request_id,device_id,vehicle_id,vehicl

    2024年02月12日
    浏览(32)
  • Sqoop导出hive/hdfs数据到mysql中---大数据之Apache Sqoop工作笔记006

    然后我们看看数据利用sqoop,从hdfs hbase中导出到mysql中去   看看命令可以看到上面这个   这里上面还是mysql的部分,然后看看 下面--num-mappers 这个是指定mapper数 然后下面这个export-dir这里是,指定hdfs中导出数据的目录 比如这里指定的是hive的一个表/user/hive/warehouse/sttaff_hive 然后下

    2024年02月03日
    浏览(28)
  • sqoop把hive中的数据导入mysql

    记录简单案例 首先开启集群: 在hive中建库建表,并插入一条数据来为自己做实验: 验证一下,是否插入成功: 在mysql中建表,并且要与hive中传过来的数据字段相对应: 建表后为空: 用sqoop将hive中的数据传到mysql中: export 导出数据, –connect 连接数据库的参数, –userna

    2024年02月07日
    浏览(70)
  • sqoop同步命令从mysql同步到hive

    sqoop import --connect jdbc:mysql://192.168.253.142:8066/youfanshop  --username mycat --password 123456 --query \\\'SELECT id,name,age,address,telphone,qq,weixin,email,sex,birthday,account FROM user WHERE $CONDITIONS\\\' --fields-terminated-by \\\'t\\\'  --null-string \\\'**\\\'  --target-dir /user/hive/warehouse/youfanshop.db/user  --hive-table youfanshop.user  --m 1 -

    2024年02月06日
    浏览(27)
  • sqoop(DataX)-MySQL导入HIVE时间格问题

    用公司的大数据平台(DataX)导数,已经开发上线一个多月的一批报表,突然有同事说有个报表数据不准。出在时间字段上。 分析: 1、先看了原数据MySQL字段类型为datetime,目标字段为timestamp类型; 2、经发现所有时间的差距都是8小时,怀疑是因为时区转换的原因; 3、对比其他

    2024年02月02日
    浏览(37)
  • Sqoop将hive处理的数据导出到MySQL

    目录 部署前提 一、创建数据上传到hive中 创建学生表数据:vim stu.txt 创建课程表数据:vim course.txt 在hive中创建学生表和课程表: 上传数据到hdfs上的hive中: 二、处理hive中的数据 查询学生姓名和课程信息,创建结果表stu_result: 插入数据: 三、在mysql中创建结果表 四、将hive中处

    2024年02月03日
    浏览(31)
  • [sqoop]将hive查询后的数据导入到MySQL

    一、知识点 export:将Hive的表导入到mysql叫导出 搜了很多,发现sqoop在hive导出到mysql时 1)不支持where参数对数据进行过滤。 2)不支持指定hive表的方式导出,只能指定Hive目录进行导出。 二、操作 1、在MySQL中建表 2、在hive中建表 3、将hive表导入到Mysql对应的表

    2024年02月15日
    浏览(32)
  • 用sqoop导出hive parquet 分区表到mysql

    确保你已经安装并配置好了Sqoop工具,并且可以连接到Hadoop集群和MySQL数据库。 创建一个MySQL表来存储导出的数据。请确保MySQL表的结构与Hive Parquet分区表的结构匹配。 使用Sqoop的export命令来执行导出操作。以下是一个示例命令: 替换 mysql_host、database_name、mysql_username 和 mysq

    2024年02月14日
    浏览(33)
  • MySQL中的业务数据该如何正确导入到Hive中 - Sqoop

    水善利万物而不争,处众人之所恶,故几于道💦   1. 使用Sqoop脚本将MySQL中的数据迁移到HDFS   2. 在Hive中建立与之对应的表   3. 将HDFS中的数据load到 Hive 数仓的ODS层的表中 1 . 使用Sqoop 将 MySQL中的数据导入到HDFS上 使用示例: mysql_to_hdfs.sh all 2021-02-01 导出的数据用lzo压缩,并

    2024年02月11日
    浏览(37)

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

博客赞助

微信扫一扫打赏

请作者喝杯咖啡吧~博客赞助

支付宝扫一扫领取红包,优惠每天领

二维码1

领取红包

二维码2

领红包