[Big Data: Sqoop, Hive and HDFS data operations]

🥗Preface:

I thought it was over; it turned out to be only the beginning. Speechless.

🥗Integrating Sqoop with Hive and HDFS to import and export data

🥗Dependencies:

<dependency>
    <groupId>org.apache.sqoop</groupId>
    <artifactId>sqoop</artifactId>
    <version>1.4.7</version>
</dependency>

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>3.1.2</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>3.3.1</version>
</dependency>

🥗Configuration file:

# Hive configuration
hive.jdbc.driverClassName=org.apache.hive.jdbc.HiveDriver
hive.jdbc.url=jdbc:hive2://localhost:10000/default
hive.jdbc.username=hive
hive.jdbc.password=

# HDFS configuration
fs.defaultFS=hdfs://localhost:9000

# Sqoop configuration
sqoop.bin.path=/path/to/sqoop/bin
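Outside Spring, the same keys can be read with plain java.util.Properties. The following is a minimal sketch, assuming the file content is available as a string; the class name SqoopConfig and its fail-fast check for missing keys are hypothetical additions, not part of the original project.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: reads the article's configuration keys without Spring.
final class SqoopConfig {
    final String hiveUrl;
    final String hiveUsername;
    final String hivePassword;
    final String defaultFS;
    final String sqoopBinPath;

    private SqoopConfig(Properties p) {
        hiveUrl = require(p, "hive.jdbc.url");
        hiveUsername = require(p, "hive.jdbc.username");
        // An empty password ("hive.jdbc.password=") is allowed, as in the sample file
        hivePassword = p.getProperty("hive.jdbc.password", "");
        defaultFS = require(p, "fs.defaultFS");
        sqoopBinPath = require(p, "sqoop.bin.path");
    }

    // Fails fast when a required key is missing or blank
    private static String require(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing property: " + key);
        }
        return value.trim();
    }

    static SqoopConfig parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text)); // '#' lines are treated as comments
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new SqoopConfig(p);
    }
}
```

In a real deployment you would load the file with a FileReader instead of a string.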

🥗Code implementation:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
// Sqoop 1.4.x has no SqoopToolImport/SqoopToolExport classes; Sqoop.runTool
// selects the tool ("import" or "export") from the first element of the argument array.
import org.apache.sqoop.Sqoop;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;

import javax.annotation.Resource;
import java.io.IOException;

@Component
public class SqoopHiveHDFSImportExport {
    @Resource
    private JdbcTemplate jdbcTemplate;

    @Value("${hive.jdbc.driverClassName}")
    private String hiveDriverClassName;

    @Value("${hive.jdbc.url}")
    private String hiveUrl;

    @Value("${hive.jdbc.username}")
    private String hiveUsername;

    @Value("${hive.jdbc.password}")
    private String hivePassword;

    @Value("${fs.defaultFS}")
    private String defaultFS;

    @Value("${sqoop.bin.path}")
    private String sqoopBinPath;

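    /** MySQL table -> Hive table (a Sqoop "import" with --hive-import). */
    public void importDataToHive() {
        String[] sqoopArgs = new String[]{
                "import", // tool name, read by Sqoop.runTool
                "--connect", "jdbc:mysql://localhost:3306/test",
                "--username", "root",
                "--password", "password",
                "--table", "table_name",
                "--hive-import",
                "--hive-table", "hive_table_name",
                "--hive-overwrite",
                "--hive-drop-import-delims",
                // Imports set the output delimiters; --input-* options only affect parsing on export
                "--fields-terminated-by", ",",
                "--lines-terminated-by", "\n",
                // Sqoop embeds these values in generated Java code, so Hive's \N marker is passed escaped (\\N)
                "--null-string", "\\\\N",
                "--null-non-string", "\\\\N"
                // No --direct: MySQL direct mode rejects --hive-drop-import-delims
        };
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", defaultFS);
        int res = Sqoop.runTool(sqoopArgs, conf);
        if (res != 0) {
            throw new RuntimeException("Sqoop import to Hive failed, exit code " + res);
        }
    }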

    /** MySQL table -> HDFS directory. In Sqoop terms this is an "import" (database -> Hadoop). */
    public void exportDataToHDFS() throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", defaultFS);
        FileSystem fs = FileSystem.get(conf);
        Path outputPath = new Path("/path/to/hdfs/directory");
        // The import job fails if --target-dir already exists, so clear it first
        if (fs.exists(outputPath)) {
            fs.delete(outputPath, true);
        }
        String[] sqoopArgs = new String[]{
                "import",
                "--connect", "jdbc:mysql://localhost:3306/test",
                "--username", "root",
                "--password", "password",
                "--table", "table_name",
                "--target-dir", outputPath.toString(),
                "--fields-terminated-by", ",",
                "--lines-terminated-by", "\n"
        };
        int res = Sqoop.runTool(sqoopArgs, conf);
        if (res != 0) {
            throw new RuntimeException("Sqoop import to HDFS failed, exit code " + res);
        }
    }

    /**
     * HDFS directory -> existing Hive table.
     * Sqoop transfers data between HDFS and relational databases, so a HiveServer2 URL is
     * not a usable --connect target here; the files are loaded with HiveQL LOAD DATA INPATH.
     * LOAD DATA moves (not copies) the files, and their delimiters must match the table's ROW FORMAT.
     */
    public void importDataFromHDFS() {
        String sql = "LOAD DATA INPATH '/path/to/hdfs/directory' "
                + "INTO TABLE hive_table_name";
        try {
            Class.forName(hiveDriverClassName);
            try (java.sql.Connection conn =
                         java.sql.DriverManager.getConnection(hiveUrl, hiveUsername, hivePassword);
                 java.sql.Statement stmt = conn.createStatement()) {
                stmt.execute(sql);
            }
        } catch (ClassNotFoundException | java.sql.SQLException e) {
            throw new RuntimeException("Loading HDFS data into Hive failed", e);
        }
    }

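    /** Hive table (files in HDFS) -> MySQL table. In Sqoop terms this is an "export". */
    public void exportDataFromHive() {
        String[] sqoopArgs = new String[]{
                "export",
                "--connect", "jdbc:mysql://localhost:3306/test",
                "--username", "root",
                "--password", "password",
                "--table", "table_name",
                // The Hive table's data directory, e.g. /user/hive/warehouse/<db>.db/<table>
                "--export-dir", "/path/to/hdfs/directory",
                // Must match the Hive table's delimiter (Hive's default is \001, not ',')
                "--input-fields-terminated-by", ",",
                "--input-lines-terminated-by", "\n",
                "--input-null-string", "\\\\N",
                "--input-null-non-string", "\\\\N",
                "--update-key", "id",
                "--update-mode", "allowinsert",
                "--batch"
                // No --direct: the MySQL direct connector does not support update/upsert exports
        };
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", defaultFS);
        int res = Sqoop.runTool(sqoopArgs, conf);
        if (res != 0) {
            throw new RuntimeException("Sqoop export to MySQL failed, exit code " + res);
        }
    }
}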

🥗What the options mean:

🥗The exportDataFromHive method:

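export: Sqoop's export tool, which moves data from HDFS into a relational database.
--connect: the JDBC connection URL of the MySQL database.
--username: the MySQL user name.
--password: the MySQL password.
--table: the MySQL table to export into (it must already exist).
--export-dir: the HDFS directory whose files are exported.
--input-fields-terminated-by: the field delimiter of the input files.
--input-lines-terminated-by: the line delimiter of the input files.
--update-key: the column used to match incoming rows against existing rows.
--update-mode: what happens to rows that match no existing row; updateonly (the default) skips them, allowinsert inserts them.
--batch: use JDBC batch statements for the inserts and updates.
--direct: use MySQL's native bulk tool (mysqlimport) instead of JDBC. The MySQL direct connector does not support update or upsert exports, so --direct cannot be combined with --update-key/--update-mode.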
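The export options above map directly onto an argument array. The sketch below is a hypothetical builder (the class SqoopExportArgs is not part of Sqoop) that puts the tool name first, as Sqoop.runTool and the sqoop CLI expect, and rejects --direct together with --update-key. That rejection encodes our understanding of the MySQL direct connector, so treat it as an assumption to verify against your Sqoop version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that assembles "sqoop export" arguments for MySQL.
final class SqoopExportArgs {
    static List<String> build(String jdbcUrl, String user, String password,
                              String table, String exportDir,
                              String updateKey, boolean direct) {
        // Assumption: mysqlimport-based direct exports cannot update existing rows
        if (direct && updateKey != null) {
            throw new IllegalArgumentException(
                    "--direct cannot be combined with --update-key for MySQL");
        }
        List<String> args = new ArrayList<>(Arrays.asList(
                "export",
                "--connect", jdbcUrl,
                "--username", user,
                "--password", password,
                "--table", table,
                "--export-dir", exportDir,
                "--input-fields-terminated-by", ",",
                "--input-lines-terminated-by", "\n"));
        if (updateKey != null) {
            // allowinsert: update rows whose key matches, insert the rest
            args.addAll(Arrays.asList("--update-key", updateKey, "--update-mode", "allowinsert"));
        }
        if (direct) {
            args.add("--direct");
        } else {
            args.add("--batch"); // batching applies to JDBC statements, so only on the JDBC path
        }
        return args;
    }
}
```

The list can then be handed to Sqoop with `Sqoop.runTool(args.toArray(new String[0]), conf)`.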

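In exportDataFromHive, --export-dir points to an HDFS directory because a Hive table's data already lives in HDFS: a managed table is a directory of files under the Hive warehouse (by default /user/hive/warehouse/<db>.db/<table>). Sqoop reads those files directly and writes the rows to MySQL, so no separate step that first copies the Hive data into HDFS is needed. This follows from how Sqoop works: it is a Hadoop-based transfer tool whose "import" moves data from a relational database into HDFS, Hive or HBase, and whose "export" moves files from HDFS back into a relational database.

For the export to succeed, the table must be stored as delimited text, and the delimiters and NULL marker given to Sqoop must match the Hive table definition; otherwise Sqoop cannot parse the rows.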
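Because --export-dir has to be the table's directory, it helps to know where Hive puts it. The sketch below computes the conventional location of a managed table, assuming the default warehouse directory and Hive's lower-casing of names; the class name HiveWarehousePath is hypothetical. Some Hive 3 distributions use a different managed-table location, so confirm the path with `DESCRIBE FORMATTED <table>` (the Location field).

```java
// Hypothetical helper: conventional HDFS directory of a managed Hive table.
final class HiveWarehousePath {
    static final String DEFAULT_WAREHOUSE = "/user/hive/warehouse";

    static String tableDir(String warehouseDir, String database, String table) {
        String base = warehouseDir.endsWith("/")
                ? warehouseDir.substring(0, warehouseDir.length() - 1)
                : warehouseDir;
        // Tables of the "default" database sit directly under the warehouse directory;
        // other databases get a "<db>.db" subdirectory
        if ("default".equalsIgnoreCase(database)) {
            return base + "/" + table.toLowerCase();
        }
        return base + "/" + database.toLowerCase() + ".db/" + table.toLowerCase();
    }
}
```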

🥗The importDataToHive method:

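import: Sqoop's import tool, which moves data from a relational database into HDFS or Hive.
--connect: the JDBC connection URL of the MySQL database.
--username: the MySQL user name.
--password: the MySQL password.
--table: the MySQL table to import.
--hive-import: load the imported data into Hive (Sqoop creates the table if it does not exist).
--hive-table: the name of the target Hive table.
--hive-overwrite: overwrite the existing data in the Hive table.
--hive-drop-import-delims: drop \n, \r and \01 characters from string fields so they cannot break Hive's row and field layout.
--input-fields-terminated-by / --input-lines-terminated-by: these describe how input files are parsed during an export; for an import, the delimiters of the files Sqoop writes are set with --fields-terminated-by / --lines-terminated-by.
--null-string: the value written for NULL in string columns.
--null-non-string: the value written for NULL in non-string columns.
--direct: use mysqldump instead of JDBC for the transfer. Sqoop rejects --hive-drop-import-delims in MySQL direct mode, so these two options cannot be used together.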
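The two pitfalls in this list (input-only delimiter options in an import, and --direct combined with --hive-drop-import-delims) are easy to catch before a job is launched. The sketch below is a hypothetical pre-flight check; SqoopArgLint is not a Sqoop class, and it encodes only these two rules, not Sqoop's full validation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-flight check covering only the two pitfalls discussed above.
final class SqoopArgLint {
    static List<String> check(String tool, String... args) {
        List<String> a = Arrays.asList(args);
        List<String> problems = new ArrayList<>();
        // --input-* options describe parsing on export; they do not set import output delimiters
        if ("import".equals(tool) && (a.contains("--input-fields-terminated-by")
                || a.contains("--input-lines-terminated-by"))) {
            problems.add("import: set output delimiters with --fields-terminated-by/--lines-terminated-by");
        }
        // MySQL direct mode does not support dropping Hive delimiters
        if (a.contains("--direct") && a.contains("--hive-drop-import-delims")) {
            problems.add("--direct does not support --hive-drop-import-delims");
        }
        return problems;
    }
}
```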

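See the official Sqoop documentation for the exact semantics of each option. Here a comma and a newline are used as the field and line delimiters. --null-string and --null-non-string set how NULL values are written, and \N is the marker Hive reads as NULL. Because Sqoop copies these values into the Java code it generates, the marker must be escaped: '\\N' on the command line, or the Java string "\\\\N". --hive-overwrite replaces the existing data in the Hive table, and --hive-drop-import-delims keeps line breaks and \01 characters inside string values from being misread as row or field boundaries; together they keep the imported table consistent with the source.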
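To make the effect of --hive-drop-import-delims concrete, the sketch below applies the documented rule (remove \n, \r and \01 from a string field) to a single value, plus the variant offered by --hive-delims-replacement. It is an illustration of the behaviour under that assumption, not Sqoop's own code; the class name HiveDelims is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: mimics what --hive-drop-import-delims and
// --hive-delims-replacement do to one string field.
final class HiveDelims {
    // Newline, carriage return and Hive's default field delimiter (Ctrl-A)
    private static final Pattern HIVE_DELIMS = Pattern.compile("[\\n\\r\\u0001]");

    static String drop(String field) {
        return field == null ? null : HIVE_DELIMS.matcher(field).replaceAll("");
    }

    static String replace(String field, String replacement) {
        return field == null ? null
                : HIVE_DELIMS.matcher(field).replaceAll(Matcher.quoteReplacement(replacement));
    }
}
```

Without this cleanup, a value such as "line1\nline2" would be read by Hive as two rows.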

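For the export methods, --update-key names the column used to find existing rows (id here), and --update-mode allowinsert turns the export into an upsert: rows whose key already exists are updated, and all other rows are inserted. With MySQL, allowinsert is carried out as INSERT ... ON DUPLICATE KEY UPDATE, so the --update-key column should have a primary key or unique index in the MySQL table.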
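The statement shape behind an allowinsert export to MySQL looks roughly like the one built below. This is a hypothetical helper for illustration, not the SQL Sqoop generates verbatim. Note that MySQL fires ON DUPLICATE KEY UPDATE on a collision with any primary or unique key, not only on the --update-key column.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustration of the MySQL upsert shape behind --update-mode allowinsert.
final class MySqlUpsertSql {
    static String build(String table, List<String> columns) {
        String cols = String.join(", ", columns);
        String placeholders = columns.stream().map(c -> "?").collect(Collectors.joining(", "));
        // On a key collision, every column is refreshed from the incoming row
        String updates = columns.stream()
                .map(c -> c + "=VALUES(" + c + ")")
                .collect(Collectors.joining(", "));
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + placeholders + ")"
                + " ON DUPLICATE KEY UPDATE " + updates;
    }
}
```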

🥗Calling from a controller:

@Autowired
private SqoopHiveHDFSImportExport sqoopHiveHDFSImportExport;

public void importDataToHive() {
    sqoopHiveHDFSImportExport.importDataToHive();
}

public void exportDataToHDFS() throws IOException {
    sqoopHiveHDFSImportExport.exportDataToHDFS();
}

public void importDataFromHDFS() {
    sqoopHiveHDFSImportExport.importDataFromHDFS();
}

public void exportDataFromHive() {
    sqoopHiveHDFSImportExport.exportDataFromHive();
}

🥗Import and export with Linux commands:

Preparation:

Install Hadoop (HDFS), Hive and Sqoop; all of them run on top of a Java environment.
Create the Hive table.
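For the commands below to work with comma-delimited files, the Hive table can be declared with matching delimiters. The sketch builds such a HiveQL statement; the helper name HiveDdl and the example columns are hypothetical, and the resulting statement can be run in beeline or through Hive JDBC.

```java
// Hypothetical helper: HiveQL for a comma-delimited text table that matches
// the delimiters used by the Sqoop commands in this section.
final class HiveDdl {
    static String createTable(String db, String table, String... columnDefs) {
        return "CREATE TABLE IF NOT EXISTS " + db + "." + table + " (\n  "
                + String.join(",\n  ", columnDefs) + "\n)\n"
                + "ROW FORMAT DELIMITED\n"
                + "FIELDS TERMINATED BY ','\n"
                + "LINES TERMINATED BY '\\n'\n" // Hive only supports '\n' as line terminator
                + "STORED AS TEXTFILE";
    }
}
```

For example, `HiveDdl.createTable("mydatabase", "mytable", "id INT", "name STRING")` yields a statement for a two-column table.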

🥗Use Sqoop to import data into a Hive table. For example:

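sqoop import --connect jdbc:mysql://localhost/mydatabase --username myusername \
--password mypassword --table mytable --hive-import --create-hive-table \
--hive-table mytable --hive-database mydatabase --null-string '\\N' --null-non-string '\\N'

Note that --create-hive-table makes the job fail if mydatabase.mytable already exists; leave it out when the table was created beforehand. The NULL marker is written as '\\N' because Sqoop embeds it in the Java code it generates.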
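The same command can be launched from Java instead of embedding the Sqoop classes, which also puts the sqoop.bin.path setting from the configuration file to use. This is a minimal sketch with a hypothetical class name (SqoopCli). Since ProcessBuilder passes each argument as-is without a shell, the Java string "\\\\N" produces the same \\N value that '\\N' produces in the shell.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: runs the sqoop CLI found in sqoop.bin.path.
final class SqoopCli {
    static List<String> command(String sqoopBinPath, String... args) {
        List<String> cmd = new ArrayList<>();
        cmd.add(new File(sqoopBinPath, "sqoop").getPath());
        // No shell is involved, so arguments need no shell quoting
        cmd.addAll(Arrays.asList(args));
        return cmd;
    }

    // Streams Sqoop's output to this process's console and returns the exit code
    static int run(List<String> cmd) throws IOException, InterruptedException {
        return new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

A non-zero exit code from `run` means the Sqoop job failed.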

🥗Use Sqoop to export data from a Hive table to MySQL. For example:

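sqoop export --connect jdbc:mysql://localhost/mydatabase --username myusername \
--password mypassword --table mytable --export-dir /user/hive/warehouse/mydatabase.db/mytable \
--input-fields-terminated-by ',' --input-lines-terminated-by '\n'

The delimiters must match the Hive table definition; if the table was created with Hive's default field delimiter (Ctrl-A), pass --input-fields-terminated-by '\001' instead of ','. The target MySQL table must already exist.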
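To see what the --input-* options tell Sqoop, the sketch below splits one line of an export file using ',' as the field delimiter and \N as the NULL marker (the marker Sqoop is told about with --input-null-string/--input-null-non-string). It is an illustration only, with a hypothetical class name; Sqoop's generated parser also handles enclosing and escape characters, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: how one line of an --export-dir file is split into fields.
final class ExportLineParser {
    static List<String> parse(String line, char delimiter, String nullMarker) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= line.length(); i++) {
            if (i == line.length() || line.charAt(i) == delimiter) {
                String field = line.substring(start, i);
                // The NULL marker becomes a SQL NULL; anything else stays a string
                fields.add(field.equals(nullMarker) ? null : field);
                start = i + 1;
            }
        }
        return fields;
    }
}
```

If the delimiter passed to Sqoop differs from the one in the files, every line collapses into a single field and the export fails.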

🥗Use Sqoop to import data into HDFS. For example:

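sqoop import --connect jdbc:mysql://localhost/mydatabase --username myusername \
--password mypassword --table mytable --target-dir /user/hive/warehouse/mydatabase.db/mytable \
--null-string '\\N' --null-non-string '\\N'

The --target-dir directory must not exist yet; add --delete-target-dir to replace it, or --append to add new files next to the existing ones.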
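After the import, --target-dir typically contains one data file per mapper (Sqoop uses 4 mappers unless -m/--num-mappers says otherwise), usually alongside a _SUCCESS marker. The sketch lists the expected data file names, assuming a map-only job writing uncompressed text; the class name SqoopOutputFiles is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: data file names a map-only Sqoop import usually writes.
final class SqoopOutputFiles {
    static List<String> partFiles(int mappers) {
        if (mappers < 1) {
            throw new IllegalArgumentException("mappers must be >= 1");
        }
        List<String> names = new ArrayList<>();
        for (int i = 0; i < mappers; i++) {
            names.add(String.format("part-m-%05d", i)); // e.g. part-m-00000
        }
        return names;
    }
}
```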

🥗Use Sqoop to export data from HDFS to MySQL. For example:

sqoop export --connect jdbc:mysql://localhost/mydatabase --username myusername \
--password mypassword --table mytable --export-dir /user/hive/warehouse/mydatabase.db/mytable \
--input-fields-terminated-by ',' --input-lines-terminated-by '\n' \
--outdir ./ --bindir ./ -m 1 --direct

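This command reads the files under the HDFS directory /user/hive/warehouse/mydatabase.db/mytable and writes the rows into the MySQL table mytable using a single mapper (-m 1). It does not write data to the local file system: --outdir and --bindir only choose where Sqoop places the Java source and the compiled classes/jar it generates for the table, here the current directory. To copy the HDFS files themselves to the local file system, use the HDFS shell, for example hdfs dfs -get /user/hive/warehouse/mydatabase.db/mytable ./mytable.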
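If the files should end up on the local disk, the HDFS shell can be driven from Java the same way as the sqoop CLI. This is a minimal sketch with a hypothetical class name (HdfsGet); it assumes the hdfs launcher is on the PATH or given as an absolute path.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: copies HDFS files to the local file system via the HDFS shell.
final class HdfsGet {
    static List<String> command(String hdfsBin, String hdfsDir, String localPath, boolean merge) {
        // -getmerge concatenates every file under hdfsDir into one local file;
        // -get copies the directory as-is
        return Arrays.asList(hdfsBin, "dfs", merge ? "-getmerge" : "-get", hdfsDir, localPath);
    }

    static int run(List<String> cmd) throws IOException, InterruptedException {
        return new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

With merge enabled, localPath names a single file (for example ./mytable.csv) rather than a directory.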
