通过源代码修改使 Apache Hudi 支持 Kerberos 访问 Hive 的功能

这篇具有很好参考价值的文章主要介绍了通过源代码修改使 Apache Hudi 支持 Kerberos 访问 Hive 的功能。希望对大家有所帮助。如果存在错误或未考虑完全的地方,请大家不吝赐教,您也可以点击"举报违法"按钮提交疑问。

Hudi 0.10.0 Kerberos-support 适配文档

文档说明

本文档主要用于阐释如何基于 Hudi 0.10.0 添加支持 Kerberos 认证权限的功能。

主要贡献:

  1. 针对正在使用的 Hudi 源代码进行 Kerberos-support 功能扩展,总修改规模囊括了 12 个文件约 20 处代码共计 约 200 行代码;
  2. 对 Hudi 0.10.0 的源代码进行了在保持所有自定义特性的基础上,支持了基于 Kerberos 权限认证并同步 Hive 表的功能;
  3. 对此类工作大致的工作思路进行主要流程的汇总并予以实例

主要思路及操作如下所示:

  1. 根据博客 将hudi同步到配置kerberos的hive3 中的阐述添加 Kerberos 认证功能并检验其可行性
  2. 根据目前未被 Merge 到 Main Branch 的 Hudi 官方的代码 PR Hudi-2402 和 xiaozhch5 的分支 基于本地分支进行代码比对、梳理和修改
  3. 添加了从 Hudi 表同步 Hive 表时支持 Kerberos 的功能及相应各配置项
  4. 根据修改后的代码添加了 pom 文件中的各种依赖关系及配置项

分支目录

如下为进行修改的分支编号。查看 commit 内容并找到自定义分支与 Hudi 0.10.1 主分支的 commit 时间线差距。

可以定位到当前所在 commit 的 hashcode_id 为 4c65ca544,而 Hudi 0.10.1 主分支的 hashcode_id 为 84fb390e4


commit 4c65ca544b91e828462419bbc12e116bfe1dbc2c (origin/0.10.1-release-hive3-kerberos-enabled)
Author: xiaozhch5 <xiaozhch5@mail2.sysu.edu.cn>
Date:   Wed Mar 2 00:15:05 2022 +0800

    新增krb5.conf文件路径,模式使用/etc/krb5.conf

commit 116352beb2e028357d0ffca385dd2f11a9cef72b
Author: xiaozhch5 <xiaozhch5@mail2.sysu.edu.cn>
Date:   Tue Mar 1 23:30:08 2022 +0800

    添加编译命令

commit ffc26256ba4cbb52ea653551ea88d109bc26e315
Author: xiaozhch5 <xiaozhch5@mail2.sysu.edu.cn>
Date:   Tue Mar 1 23:14:21 2022 +0800

    适配hdp3.1.4编译,解决找不到包的问题

commit fbc53aa29e63dc5b097a3014d05f6b82cfcf2a70
Author: xiaozhch5 <xiaozhch5@mail2.sysu.edu.cn>
Date:   Tue Mar 1 22:20:03 2022 +0800

    [MINOR] Remove org.apache.directory.api.util.Strings import

commit 05fee3608d17abbd0217818a6bf02e4ead8f6de8
Author: xiaozhch5 <xiaozhch5@mail2.sysu.edu.cn>
Date:   Tue Mar 1 21:07:34 2022 +0800

    添加flink引擎支持将hudi同步到配置kerberos的hive3 metastore,仅针对Flink 1.13引擎,其他引擎未修改

commit 84fb390e42cbbb72d1aaf4cf8f44cd6fba049595 (tag: release-0.10.1, origin/release-0.10.1)
Author: sivabalan <n.siva.b@gmail.com>
Date:   Tue Jan 25 20:15:31 2022 -0500

    [MINOR] Update release version to reflect published version 0.10.1

对比两版本之间的修改信息

在定位到 commit 版本差异后,使用 git diff 命令比对代码内容差异,并将对比结果输出至文件。

具体命令为 git diff 4c65ca544 84fb390e4 >> commit.diff

diff --git a/compile-command.sh b/compile-command.sh
deleted file mode 100644
index c5536c86c..000000000
--- a/compile-command.sh
+++ /dev/null
@@ -1,9 +0,0 @@
-mvn clean install -DskipTests \
--Dhadoop.version=3.1.1.3.1.4.0-315 \
--Dhive.version=3.1.0.3.1.4.0-315 \
--Dscala.version=2.12.10 \
--Dscala.binary.version=2.12 \
--Dspark.version=3.0.1 \
--Dflink.version=1.13.5 \
--Pflink-bundle-shade-hive3 \
--Pspark3
\ No newline at end of file
diff --git a/hudi-aws/pom.xml b/hudi-aws/pom.xml
index 8c7f6dc73..d853690c0 100644
--- a/hudi-aws/pom.xml
+++ b/hudi-aws/pom.xml
@@ -116,12 +116,6 @@
             <artifactId>mockito-junit-jupiter</artifactId>
             <scope>test</scope>
         </dependency>
-
-        <dependency>
-            <groupId>com.google.code.findbugs</groupId>
-            <artifactId>jsr305</artifactId>
-            <version>3.0.0</version>
-        </dependency>
     </dependencies>
 
     <build>
diff --git a/hudi-common/src/test/java/org/apache/hudi/common/testutils/FileCreateUtils.java b/hudi-common/src/test/java/org/apache/hudi/common/testutils/FileCreateUtils.java
index b7d6adf38..1968ef422 100644
--- a/hudi-common/src/test/java/org/apache/hudi/common/testutils/FileCreateUtils.java
+++ b/hudi-common/src/test/java/org/apache/hudi/common/testutils/FileCreateUtils.java
@@ -19,6 +19,7 @@
 
 package org.apache.hudi.common.testutils;
 
+import org.apache.directory.api.util.Strings;
 import org.apache.hudi.avro.model.HoodieCleanMetadata;
 import org.apache.hudi.avro.model.HoodieCleanerPlan;
 import org.apache.hudi.avro.model.HoodieCompactionPlan;
@@ -72,8 +73,6 @@ public class FileCreateUtils {
 
   private static final String WRITE_TOKEN = "1-0-1";
   private static final String BASE_FILE_EXTENSION = HoodieTableConfig.BASE_FILE_FORMAT.defaultValue().getFileExtension();
-  /** An empty byte array */
-  public static final byte[] EMPTY_BYTES = new byte[0];
 
   public static String baseFileName(String instantTime, String fileId) {
     return baseFileName(instantTime, fileId, BASE_FILE_EXTENSION);
@@ -222,7 +221,7 @@ public class FileCreateUtils {
   }
 
   public static void createCleanFile(String basePath, String instantTime, HoodieCleanMetadata metadata, boolean isEmpty) throws IOException {
-    createMetaFile(basePath, instantTime, HoodieTimeline.CLEAN_EXTENSION, isEmpty ? EMPTY_BYTES : serializeCleanMetadata(metadata).get());
+    createMetaFile(basePath, instantTime, HoodieTimeline.CLEAN_EXTENSION, isEmpty ? Strings.EMPTY_BYTES : serializeCleanMetadata(metadata).get());
   }
 
   public static void createRequestedCleanFile(String basePath, String instantTime, HoodieCleanerPlan cleanerPlan) throws IOException {
@@ -230,7 +229,7 @@ public class FileCreateUtils {
   }
 
   public static void createRequestedCleanFile(String basePath, String instantTime, HoodieCleanerPlan cleanerPlan, boolean isEmpty) throws IOException {
-    createMetaFile(basePath, instantTime, HoodieTimeline.REQUESTED_CLEAN_EXTENSION, isEmpty ? EMPTY_BYTES : serializeCleanerPlan(cleanerPlan).get());
+    createMetaFile(basePath, instantTime, HoodieTimeline.REQUESTED_CLEAN_EXTENSION, isEmpty ? Strings.EMPTY_BYTES : serializeCleanerPlan(cleanerPlan).get());
   }
 
   public static void createInflightCleanFile(String basePath, String instantTime, HoodieCleanerPlan cleanerPlan) throws IOException {
@@ -238,7 +237,7 @@ public class FileCreateUtils {
   }
 
   public static void createInflightCleanFile(String basePath, String instantTime, HoodieCleanerPlan cleanerPlan, boolean isEmpty) throws IOException {
-    createMetaFile(basePath, instantTime, HoodieTimeline.INFLIGHT_CLEAN_EXTENSION, isEmpty ? EMPTY_BYTES : serializeCleanerPlan(cleanerPlan).get());
+    createMetaFile(basePath, instantTime, HoodieTimeline.INFLIGHT_CLEAN_EXTENSION, isEmpty ? Strings.EMPTY_BYTES : serializeCleanerPlan(cleanerPlan).get());
   }
 
   public static void createRequestedRollbackFile(String basePath, String instantTime, HoodieRollbackPlan plan) throws IOException {
@@ -250,7 +249,7 @@ public class FileCreateUtils {
   }
 
   public static void createRollbackFile(String basePath, String instantTime, HoodieRollbackMetadata hoodieRollbackMetadata, boolean isEmpty) throws IOException {
-    createMetaFile(basePath, instantTime, HoodieTimeline.ROLLBACK_EXTENSION, isEmpty ? EMPTY_BYTES : serializeRollbackMetadata(hoodieRollbackMetadata).get());
+    createMetaFile(basePath, instantTime, HoodieTimeline.ROLLBACK_EXTENSION, isEmpty ? Strings.EMPTY_BYTES : serializeRollbackMetadata(hoodieRollbackMetadata).get());
   }
 
   public static void createRestoreFile(String basePath, String instantTime, HoodieRestoreMetadata hoodieRestoreMetadata) throws IOException {
diff --git a/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java b/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java
index 621aea3d2..77c3f15e5 100644
--- a/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java
+++ b/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java
@@ -653,36 +653,6 @@ public class FlinkOptions extends HoodieConfig {
       .withDescription("INT64 with original type TIMESTAMP_MICROS is converted to hive timestamp type.\n"
           + "Disabled by default for backward compatibility.");
 
-  public static final ConfigOption<Boolean> HIVE_SYNC_KERBEROS_ENABLE = ConfigOptions
-      .key("hive_sync.kerberos.enable")
-      .booleanType()
-      .defaultValue(false)
-      .withDescription("Whether hive is configured with kerberos");
-
-  public static final ConfigOption<String> HIVE_SYNC_KERBEROS_KRB5CONF = ConfigOptions
-      .key("hive_sync.kerberos.krb5.conf")
-      .stringType()
-      .defaultValue("")
-      .withDescription("kerberos krb5.conf file path");
-
-  public static final ConfigOption<String> HIVE_SYNC_KERBEROS_PRINCIPAL = ConfigOptions
-      .key("hive_sync.kerberos.principal")
-      .stringType()
-      .defaultValue("")
-      .withDescription("hive metastore kerberos principal");
-
-  public static final ConfigOption<String> HIVE_SYNC_KERBEROS_KEYTAB_FILE = ConfigOptions
-      .key("hive_sync.kerberos.keytab.file")
-      .stringType()
-      .defaultValue("")
-      .withDescription("Hive metastore keytab file path");
-
-  public static final ConfigOption<String> HIVE_SYNC_KERBEROS_KEYTAB_NAME = ConfigOptions
-      .key("hive_sync.kerberos.keytab.name")
-      .stringType()
-      .defaultValue("")
-      .withDescription("Hive metastore keytab file name");
-
   // -------------------------------------------------------------------------
   //  Utilities
   // -------------------------------------------------------------------------
diff --git a/hudi-flink/src/main/java/org/apache/hudi/sink/utils/HiveSyncContext.java b/hudi-flink/src/main/java/org/apache/hudi/sink/utils/HiveSyncContext.java
index bedc20f9b..1c051c8cd 100644
--- a/hudi-flink/src/main/java/org/apache/hudi/sink/utils/HiveSyncContext.java
+++ b/hudi-flink/src/main/java/org/apache/hudi/sink/utils/HiveSyncContext.java
@@ -86,11 +86,6 @@ public class HiveSyncContext {
     hiveSyncConfig.skipROSuffix = conf.getBoolean(FlinkOptions.HIVE_SYNC_SKIP_RO_SUFFIX);
     hiveSyncConfig.assumeDatePartitioning = conf.getBoolean(FlinkOptions.HIVE_SYNC_ASSUME_DATE_PARTITION);
     hiveSyncConfig.withOperationField = conf.getBoolean(FlinkOptions.CHANGELOG_ENABLED);
-    hiveSyncConfig.enableKerberos = conf.getBoolean(FlinkOptions.HIVE_SYNC_KERBEROS_ENABLE);
-    hiveSyncConfig.krb5Conf = conf.getString(FlinkOptions.HIVE_SYNC_KERBEROS_KRB5CONF);
-    hiveSyncConfig.principal = conf.getString(FlinkOptions.HIVE_SYNC_KERBEROS_PRINCIPAL);
-    hiveSyncConfig.keytabFile = conf.getString(FlinkOptions.HIVE_SYNC_KERBEROS_KEYTAB_FILE);
-    hiveSyncConfig.keytabName = conf.getString(FlinkOptions.HIVE_SYNC_KERBEROS_KEYTAB_NAME);
     return hiveSyncConfig;
   }
 }
diff --git a/hudi-hadoop-mr/pom.xml b/hudi-hadoop-mr/pom.xml
index ef0ea945a..7283d74f0 100644
--- a/hudi-hadoop-mr/pom.xml
+++ b/hudi-hadoop-mr/pom.xml
@@ -67,17 +67,6 @@
     <dependency>
       <groupId>${hive.groupid}</groupId>
       <artifactId>hive-jdbc</artifactId>
-      <exclusions>
-        <exclusion>
-          <groupId>org.apache.hadoop</groupId>
-          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-        </exclusion>
-      </exclusions>
-    </dependency>
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-      <version>${hadoop.version}</version>
     </dependency>
     <dependency>
       <groupId>${hive.groupid}</groupId>
diff --git a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
index 2701820b8..9b6385120 100644
--- a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
+++ b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
@@ -123,21 +123,6 @@ public class HiveSyncConfig implements Serializable {
   @Parameter(names = {"--conditional-sync"}, description = "If true, only sync on conditions like schema change or partition change.")
   public Boolean isConditionalSync = false;
 
-  @Parameter(names = {"--enable-kerberos"}, description = "Whether hive configs kerberos")
-  public Boolean enableKerberos = false;
-
-  @Parameter(names = {"--krb5-conf"}, description = "krb5.conf file path")
-  public String krb5Conf = "/etc/krb5.conf";
-
-  @Parameter(names = {"--principal"}, description = "hive metastore principal")
-  public String principal = "hive/_HOST@EXAMPLE.COM";
-
-  @Parameter(names = {"--keytab-file"}, description = "hive metastore keytab file path")
-  public String keytabFile;
-
-  @Parameter(names = {"--keytab-name"}, description = "hive metastore keytab name")
-  public String keytabName;
-
   // enhance the similar function in child class
   public static HiveSyncConfig copy(HiveSyncConfig cfg) {
     HiveSyncConfig newConfig = new HiveSyncConfig();
@@ -162,11 +147,6 @@ public class HiveSyncConfig implements Serializable {
     newConfig.sparkSchemaLengthThreshold = cfg.sparkSchemaLengthThreshold;
     newConfig.withOperationField = cfg.withOperationField;
     newConfig.isConditionalSync = cfg.isConditionalSync;
-    newConfig.enableKerberos = cfg.enableKerberos;
-    newConfig.krb5Conf = cfg.krb5Conf;
-    newConfig.principal = cfg.principal;
-    newConfig.keytabFile = cfg.keytabFile;
-    newConfig.keytabName = cfg.keytabName;
     return newConfig;
   }
 
@@ -199,11 +179,6 @@ public class HiveSyncConfig implements Serializable {
       + ", sparkSchemaLengthThreshold=" + sparkSchemaLengthThreshold
       + ", withOperationField=" + withOperationField
       + ", isConditionalSync=" + isConditionalSync
-      + ", enableKerberos=" + enableKerberos
-      + ", krb5Conf=" + krb5Conf
-      + ", principal=" + principal
-      + ", keytabFile=" + keytabFile
-      + ", keytabName=" + keytabName
       + '}';
   }
 }
diff --git a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
index 56553f1ed..b37b28ed2 100644
--- a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
+++ b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
@@ -23,7 +23,6 @@ import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.hive.conf.HiveConf;
 import org.apache.hadoop.hive.metastore.api.Partition;
-import org.apache.hadoop.security.UserGroupInformation;
 import org.apache.hudi.common.fs.FSUtils;
 import org.apache.hudi.common.model.HoodieFileFormat;
 import org.apache.hudi.common.model.HoodieTableType;
@@ -44,7 +43,6 @@ import org.apache.parquet.schema.MessageType;
 import org.apache.parquet.schema.PrimitiveType;
 import org.apache.parquet.schema.Type;
 
-import java.io.IOException;
 import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.List;
@@ -77,20 +75,8 @@ public class HiveSyncTool extends AbstractSyncTool {
     super(configuration.getAllProperties(), fs);
 
     try {
-      if (cfg.enableKerberos) {
-        System.setProperty("java.security.krb5.conf", cfg.krb5Conf);
-        Configuration conf = new Configuration();
-        conf.set("hadoop.security.authentication", "kerberos");
-        conf.set("kerberos.principal", cfg.principal);
-        UserGroupInformation.setConfiguration(conf);
-        UserGroupInformation.loginUserFromKeytab(cfg.keytabName, cfg.keytabFile);
-        configuration.set(HiveConf.ConfVars.METASTORE_USE_THRIFT_SASL.varname, "true");
-        configuration.set(HiveConf.ConfVars.METASTORE_KERBEROS_PRINCIPAL.varname, cfg.principal);
-        configuration.set(HiveConf.ConfVars.METASTORE_KERBEROS_KEYTAB_FILE.varname, cfg.keytabFile);
-      }
-
       this.hoodieHiveClient = new HoodieHiveClient(cfg, configuration, fs);
-    } catch (RuntimeException | IOException e) {
+    } catch (RuntimeException e) {
       if (cfg.ignoreExceptions) {
         LOG.error("Got runtime exception when hive syncing, but continuing as ignoreExceptions config is set ", e);
       } else {
diff --git a/hudi-utilities/pom.xml b/hudi-utilities/pom.xml
index ad32458d2..474e0499d 100644
--- a/hudi-utilities/pom.xml
+++ b/hudi-utilities/pom.xml
@@ -352,18 +352,8 @@
           <groupId>org.eclipse.jetty.orbit</groupId>
           <artifactId>javax.servlet</artifactId>
         </exclusion>
-        <exclusion>
-          <groupId>org.apache.hadoop</groupId>
-          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-        </exclusion>
       </exclusions>
     </dependency>
-
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-      <version>${hadoop.version}</version>
-    </dependency>
     <dependency>
       <groupId>${hive.groupid}</groupId>
       <artifactId>hive-service</artifactId>
diff --git a/packaging/hudi-flink-bundle/pom.xml b/packaging/hudi-flink-bundle/pom.xml
index c4ee23017..640c71d68 100644
--- a/packaging/hudi-flink-bundle/pom.xml
+++ b/packaging/hudi-flink-bundle/pom.xml
@@ -138,7 +138,6 @@
                   <include>org.apache.hive:hive-service-rpc</include>
                   <include>org.apache.hive:hive-exec</include>
                   <include>org.apache.hive:hive-metastore</include>
-                  <include>org.apache.hive:hive-standalone-metastore</include>
                   <include>org.apache.hive:hive-jdbc</include>
                   <include>org.datanucleus:datanucleus-core</include>
                   <include>org.datanucleus:datanucleus-api-jdo</include>
@@ -445,7 +444,6 @@
       <groupId>${hive.groupid}</groupId>
       <artifactId>hive-exec</artifactId>
       <version>${hive.version}</version>
-      <scope>${flink.bundle.hive.scope}</scope>
     </dependency>
     <dependency>
       <groupId>${hive.groupid}</groupId>
@@ -489,17 +487,8 @@
           <groupId>org.eclipse.jetty</groupId>
           <artifactId>*</artifactId>
         </exclusion>
-        <exclusion>
-          <groupId>org.apache.hadoop</groupId>
-          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-        </exclusion>
       </exclusions>
     </dependency>
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-      <version>${hadoop.version}</version>
-    </dependency>
     <dependency>
       <groupId>${hive.groupid}</groupId>
       <artifactId>hive-common</artifactId>
@@ -694,12 +683,6 @@
           <version>${hive.version}</version>
           <scope>${flink.bundle.hive.scope}</scope>
         </dependency>
-        <dependency>
-          <groupId>org.apache.hive</groupId>
-          <artifactId>hive-standalone-metastore</artifactId>
-          <version>${hive.version}</version>
-          <scope>${flink.bundle.hive.scope}</scope>
-        </dependency>
       </dependencies>
     </profile>
   </profiles>
diff --git a/packaging/hudi-integ-test-bundle/pom.xml b/packaging/hudi-integ-test-bundle/pom.xml
index ee2605de3..30704c8c9 100644
--- a/packaging/hudi-integ-test-bundle/pom.xml
+++ b/packaging/hudi-integ-test-bundle/pom.xml
@@ -408,10 +408,6 @@
           <groupId>org.pentaho</groupId>
           <artifactId>*</artifactId>
         </exclusion>
-        <exclusion>
-          <groupId>org.apache.hadoop</groupId>
-          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-        </exclusion>
       </exclusions>
     </dependency>
 
@@ -428,19 +424,9 @@
           <groupId>javax.servlet</groupId>
           <artifactId>servlet-api</artifactId>
         </exclusion>
-        <exclusion>
-          <groupId>org.apache.hadoop</groupId>
-          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-        </exclusion>
       </exclusions>
     </dependency>
 
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-      <version>${hadoop.version}</version>
-    </dependency>
-
     <dependency>
       <groupId>${hive.groupid}</groupId>
       <artifactId>hive-common</artifactId>
diff --git a/packaging/hudi-kafka-connect-bundle/pom.xml b/packaging/hudi-kafka-connect-bundle/pom.xml
index d2cc84df7..bf395a411 100644
--- a/packaging/hudi-kafka-connect-bundle/pom.xml
+++ b/packaging/hudi-kafka-connect-bundle/pom.xml
@@ -306,17 +306,6 @@
             <artifactId>hive-jdbc</artifactId>
             <version>${hive.version}</version>
             <scope>${utilities.bundle.hive.scope}</scope>
-            <exclusions>
-                <exclusion>
-                    <groupId>org.apache.hadoop</groupId>
-                    <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-                </exclusion>
-            </exclusions>
-        </dependency>
-        <dependency>
-            <groupId>org.apache.hadoop</groupId>
-            <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-            <version>${hadoop.version}</version>
         </dependency>
 
         <dependency>
diff --git a/packaging/hudi-spark-bundle/pom.xml b/packaging/hudi-spark-bundle/pom.xml
index 44f424540..d8d1a1d2d 100644
--- a/packaging/hudi-spark-bundle/pom.xml
+++ b/packaging/hudi-spark-bundle/pom.xml
@@ -289,17 +289,6 @@
       <artifactId>hive-jdbc</artifactId>
       <version>${hive.version}</version>
       <scope>${spark.bundle.hive.scope}</scope>
-      <exclusions>
-        <exclusion>
-          <groupId>org.apache.hadoop</groupId>
-          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-        </exclusion>
-      </exclusions>
-    </dependency>
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-      <version>${hadoop.version}</version>
     </dependency>
 
     <dependency>
diff --git a/packaging/hudi-utilities-bundle/pom.xml b/packaging/hudi-utilities-bundle/pom.xml
index 9384c4f01..360e8c7f1 100644
--- a/packaging/hudi-utilities-bundle/pom.xml
+++ b/packaging/hudi-utilities-bundle/pom.xml
@@ -308,18 +308,6 @@
       <artifactId>hive-jdbc</artifactId>
       <version>${hive.version}</version>
       <scope>${utilities.bundle.hive.scope}</scope>
-      <exclusions>
-        <exclusion>
-          <groupId>org.apache.hadoop</groupId>
-          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-        </exclusion>
-      </exclusions>
-    </dependency>
-
-    <dependency>
-      <groupId>org.apache.hadoop</groupId>
-      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
-      <version>${hadoop.version}</version>
     </dependency>
 
     <dependency>
diff --git a/pom.xml b/pom.xml
index 36aed4785..470f7db2d 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1164,10 +1164,6 @@
       <id>confluent</id>
       <url>https://packages.confluent.io/maven/</url>
     </repository>
-    <repository>
-      <id>hdp</id>
-      <url>https://repo.hortonworks.com/content/repositories/releases/</url>
-    </repository>
   </repositories>
 
   <profiles>

代码阅读及理解

针对上述进行修改的各个 java 文件及相应类或方法,详细阅读并理解其整体代码结构及思维逻辑,并针对改动理解其改动意义。

针对正在使用的 Hudi 版本进行代码修改

针对我们正在使用的 Hudi 源代码进行 Kerberos-support 功能扩展,总修改规模囊括了 12 个文件约 20 处代码共计 约 200 行代码。

具体修改结果如下:

diff --git a/README.md b/README.md
index 2b32591..f31070b 100644
--- a/README.md
+++ b/README.md
@@ -17,6 +17,17 @@
 
 # 寮€鍙戞棩蹇? 
+## November/28th/2022
+
+1. 鏍规嵁鍗氬 [灏唄udi鍚屾鍒伴厤缃甼erberos鐨刪ive3](https://cloud.tencent.com/developer/article/1949358) 涓殑闃愯堪娣诲姞 Kerberos 璁よ瘉鍔熻兘骞舵楠屽叾鍙鎬?+2. 鏍规嵁鐩墠鏈 Merge 鍒?Main Branch 鐨?Hudi 瀹樻柟鐨勪唬鐮?PR [Hudi-2402](https://github.com/apache/hudi/pull/3771) 鍜?[xiaozhch5 鐨勫垎鏀痌(https://github.com/xiaozhch5/hudi/tree/0.10.1-release-hive3-kerberos-enabled) 鍩轰簬鏈湴鍒嗘敮杩涜浠g爜姣斿銆佹⒊鐞嗗拰淇敼
+3. 娣诲姞浜嗕粠 Hudi 琛ㄥ悓姝?Hive 琛ㄦ椂鏀寔 Kerberos 鐨勫姛鑳藉強鐩稿簲鍚勯厤缃」
+4. 鏍规嵁淇敼鍚庣殑浠g爜娣诲姞浜?pom 鏂囦欢涓殑鍚勭渚濊禆鍏崇郴鍙婇厤缃」
+
+Ps: 鏈浣跨敤鐨勭紪璇戝懡浠や负 `mvn clean install -^DskipTests -^Dcheckstyle.skip=true -^Dmaven.test.skip=true -^DskipITs -^Dhadoop.version=3.0.0-cdh6.3.2 -^Dhive.version=3.1.2 -^Dscala.version=2.12.10 -^Dscala.binary.version=2.12 -^Dflink.version=1.13.2 -^Pflink-bundle-shade-hive3`
+
+// Ps: 鏈浣跨敤鐨勭紪璇戝懡浠や负 `mvn clean install -^DskipTests -^Dmaven.test.skip=true -^DskipITs -^Dcheckstyle.skip=true -^Drat.skip=true -^Dhadoop.version=3.0.0-cdh6.3.2  -^Pflink-bundle-shade-hive2 -^Dscala-2.12 -^Pspark-shade-unbundle-avro`
+
 ## August/2nd/2022
 
 1. 淇敼浜?Hudi 涓?Flink 鏁版嵁娌夐檷鐩稿叧鐨勪富瑕佽繍琛屾祦绋嬶紝骞舵坊鍔犱簡璇稿鐢ㄤ簬杈呭姪鏂板姛鑳藉疄鐜扮殑绫诲睘鎬с€佺被鏂规硶鍜屽姛鑳藉嚱鏁帮紱
diff --git a/hudi-aws/pom.xml b/hudi-aws/pom.xml
index 34114fc..636b29c 100644
--- a/hudi-aws/pom.xml
+++ b/hudi-aws/pom.xml
@@ -116,6 +116,13 @@
             <artifactId>mockito-junit-jupiter</artifactId>
             <scope>test</scope>
         </dependency>
+
+        <dependency>
+            <groupId>com.google.code.findbugs</groupId>
+            <artifactId>jsr305</artifactId>
+            <version>3.0.0</version>
+        </dependency>
+
     </dependencies>
 
     <build>
diff --git a/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java b/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java
index e704a34..413e9ed 100644
--- a/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java
+++ b/hudi-flink/src/main/java/org/apache/hudi/configuration/FlinkOptions.java
@@ -653,6 +653,40 @@ public class FlinkOptions extends HoodieConfig {
       .withDescription("INT64 with original type TIMESTAMP_MICROS is converted to hive timestamp type.\n"
           + "Disabled by default for backward compatibility.");
 
+  // ------------------------------------------------------------------------
+  //  Kerberos Related Options
+  // ------------------------------------------------------------------------
+
+  public static final ConfigOption<Boolean> HIVE_SYNC_KERBEROS_ENABLE = ConfigOptions
+          .key("hive_sync.kerberos.enable")
+          .booleanType()
+          .defaultValue(false)
+          .withDescription("Whether hive is configured with kerberos");
+
+  public static final ConfigOption<String> HIVE_SYNC_KERBEROS_KRB5CONF = ConfigOptions
+          .key("hive_sync.kerberos.krb5.conf")
+          .stringType()
+          .defaultValue("")
+          .withDescription("kerberos krb5.conf file path");
+
+  public static final ConfigOption<String> HIVE_SYNC_KERBEROS_PRINCIPAL = ConfigOptions
+          .key("hive_sync.kerberos.principal")
+          .stringType()
+          .defaultValue("")
+          .withDescription("hive metastore kerberos principal");
+
+  public static final ConfigOption<String> HIVE_SYNC_KERBEROS_KEYTAB_FILE = ConfigOptions
+          .key("hive_sync.kerberos.keytab.file")
+          .stringType()
+          .defaultValue("")
+          .withDescription("Hive metastore keytab file path");
+
+  public static final ConfigOption<String> HIVE_SYNC_KERBEROS_KEYTAB_NAME = ConfigOptions
+          .key("hive_sync.kerberos.keytab.name")
+          .stringType()
+          .defaultValue("")
+          .withDescription("Hive metastore keytab file name");
+
   // ------------------------------------------------------------------------
   //  Custom Flush related logic
   // ------------------------------------------------------------------------
diff --git a/hudi-flink/src/main/java/org/apache/hudi/sink/utils/HiveSyncContext.java b/hudi-flink/src/main/java/org/apache/hudi/sink/utils/HiveSyncContext.java
index 1c051c8..a1e1da3 100644
--- a/hudi-flink/src/main/java/org/apache/hudi/sink/utils/HiveSyncContext.java
+++ b/hudi-flink/src/main/java/org/apache/hudi/sink/utils/HiveSyncContext.java
@@ -86,6 +86,13 @@ public class HiveSyncContext {
     hiveSyncConfig.skipROSuffix = conf.getBoolean(FlinkOptions.HIVE_SYNC_SKIP_RO_SUFFIX);
     hiveSyncConfig.assumeDatePartitioning = conf.getBoolean(FlinkOptions.HIVE_SYNC_ASSUME_DATE_PARTITION);
     hiveSyncConfig.withOperationField = conf.getBoolean(FlinkOptions.CHANGELOG_ENABLED);
+    // Kerberos Related Configurations
+    hiveSyncConfig.enableKerberos = conf.getBoolean(FlinkOptions.HIVE_SYNC_KERBEROS_ENABLE);
+    hiveSyncConfig.krb5Conf = conf.getString(FlinkOptions.HIVE_SYNC_KERBEROS_KRB5CONF);
+    hiveSyncConfig.principal = conf.getString(FlinkOptions.HIVE_SYNC_KERBEROS_PRINCIPAL);
+    hiveSyncConfig.keytabFile = conf.getString(FlinkOptions.HIVE_SYNC_KERBEROS_KEYTAB_FILE);
+    hiveSyncConfig.keytabName = conf.getString(FlinkOptions.HIVE_SYNC_KERBEROS_KEYTAB_NAME);
+    // Kerberos Configs END
     return hiveSyncConfig;
   }
 }
diff --git a/hudi-hadoop-mr/pom.xml b/hudi-hadoop-mr/pom.xml
index df2a23b..e61dbd4 100644
--- a/hudi-hadoop-mr/pom.xml
+++ b/hudi-hadoop-mr/pom.xml
@@ -67,6 +67,17 @@
     <dependency>
       <groupId>${hive.groupid}</groupId>
       <artifactId>hive-jdbc</artifactId>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+      <version>${hadoop.version}</version>
     </dependency>
     <dependency>
       <groupId>${hive.groupid}</groupId>
diff --git a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
index 9b63851..624300f 100644
--- a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
+++ b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
@@ -123,6 +123,22 @@ public class HiveSyncConfig implements Serializable {
   @Parameter(names = {"--conditional-sync"}, description = "If true, only sync on conditions like schema change or partition change.")
   public Boolean isConditionalSync = false;
 
+  // Kerberos Related Configuration
+  @Parameter(names = {"--enable-kerberos"}, description = "Whether hive configs kerberos")
+  public Boolean enableKerberos = false;
+
+  @Parameter(names = {"--krb5-conf"}, description = "krb5.conf file path")
+  public String krb5Conf = "/etc/krb5.conf";
+
+  @Parameter(names = {"--principal"}, description = "hive metastore principal")
+  public String principal = "hive/_HOST@EXAMPLE.COM";
+
+  @Parameter(names = {"--keytab-file"}, description = "hive metastore keytab file path")
+  public String keytabFile;
+
+  @Parameter(names = {"--keytab-name"}, description = "hive metastore keytab name")
+  public String keytabName;
+
   // enhance the similar function in child class
   public static HiveSyncConfig copy(HiveSyncConfig cfg) {
     HiveSyncConfig newConfig = new HiveSyncConfig();
@@ -147,6 +163,13 @@ public class HiveSyncConfig implements Serializable {
     newConfig.sparkSchemaLengthThreshold = cfg.sparkSchemaLengthThreshold;
     newConfig.withOperationField = cfg.withOperationField;
     newConfig.isConditionalSync = cfg.isConditionalSync;
+    // Kerberos Related Configs
+    newConfig.enableKerberos = cfg.enableKerberos;
+    newConfig.krb5Conf = cfg.krb5Conf;
+    newConfig.principal = cfg.principal;
+    newConfig.keytabFile = cfg.keytabFile;
+    newConfig.keytabName = cfg.keytabName;
+    // Kerberos Related Configs END
     return newConfig;
   }
 
diff --git a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
index 3bbaee1..2fa0e86 100644
--- a/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
+++ b/hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
@@ -38,6 +38,7 @@ import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.hive.conf.HiveConf;
 import org.apache.hadoop.hive.metastore.api.Partition;
+import org.apache.hadoop.security.UserGroupInformation;
 import org.apache.log4j.LogManager;
 import org.apache.log4j.Logger;
 import org.apache.parquet.schema.GroupType;
@@ -45,6 +46,7 @@ import org.apache.parquet.schema.MessageType;
 import org.apache.parquet.schema.PrimitiveType;
 import org.apache.parquet.schema.Type;
 
+import java.io.IOException;
 import java.util.ArrayList;
 import java.util.HashMap;
 import java.util.List;
@@ -77,8 +79,23 @@ public class HiveSyncTool extends AbstractSyncTool {
     super(configuration.getAllProperties(), fs);
 
     try {
+
+      // Start Kerberos Processing Logic
+      if (cfg.enableKerberos) {
+        System.setProperty("java.security.krb5.conf", cfg.krb5Conf);
+        Configuration conf = new Configuration();
+        conf.set("hadoop.security.authentication", "kerberos");
+        conf.set("kerberos.principal", cfg.principal);
+        UserGroupInformation.setConfiguration(conf);
+        UserGroupInformation.loginUserFromKeytab(cfg.keytabName, cfg.keytabFile);
+        configuration.set(HiveConf.ConfVars.METASTORE_USE_THRIFT_SASL.varname, "true");
+        configuration.set(HiveConf.ConfVars.METASTORE_KERBEROS_PRINCIPAL.varname, cfg.principal);
+        configuration.set(HiveConf.ConfVars.METASTORE_KERBEROS_KEYTAB_FILE.varname, cfg.keytabFile);
+      }
+
       this.hoodieHiveClient = new HoodieHiveClient(cfg, configuration, fs);
-    } catch (RuntimeException e) {
+    } catch (RuntimeException | IOException e) {
+      // Support IOException e
       if (cfg.ignoreExceptions) {
         LOG.error("Got runtime exception when hive syncing, but continuing as ignoreExceptions config is set ", e);
       } else {
diff --git a/hudi-utilities/pom.xml b/hudi-utilities/pom.xml
index 470ad47..5b95ffb 100644
--- a/hudi-utilities/pom.xml
+++ b/hudi-utilities/pom.xml
@@ -352,8 +352,19 @@
           <groupId>org.eclipse.jetty.orbit</groupId>
           <artifactId>javax.servlet</artifactId>
         </exclusion>
+        <exclusion>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+        </exclusion>
       </exclusions>
     </dependency>
+
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+      <version>${hadoop.version}</version>
+    </dependency>
+
     <dependency>
       <groupId>${hive.groupid}</groupId>
       <artifactId>hive-service</artifactId>
diff --git a/packaging/hudi-flink-bundle/pom.xml b/packaging/hudi-flink-bundle/pom.xml
index fc8d183..27b52d3 100644
--- a/packaging/hudi-flink-bundle/pom.xml
+++ b/packaging/hudi-flink-bundle/pom.xml
@@ -139,6 +139,7 @@
                   <include>org.apache.hive:hive-service-rpc</include>
                   <include>org.apache.hive:hive-exec</include>
                   <include>org.apache.hive:hive-metastore</include>
+                  <include>org.apache.hive:hive-standalone-metastore</include>
                   <include>org.apache.hive:hive-jdbc</include>
                   <include>org.datanucleus:datanucleus-core</include>
                   <include>org.datanucleus:datanucleus-api-jdo</include>
@@ -442,6 +443,7 @@
       <groupId>${hive.groupid}</groupId>
       <artifactId>hive-exec</artifactId>
       <version>${hive.version}</version>
+      <scope>${flink.bundle.hive.scope}</scope>
       <exclusions>
         <exclusion>
           <groupId>javax.mail</groupId>
@@ -503,8 +505,17 @@
           <groupId>org.eclipse.jetty</groupId>
           <artifactId>*</artifactId>
         </exclusion>
+        <exclusion>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+        </exclusion>
       </exclusions>
     </dependency>
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+      <version>${hadoop.version}</version>
+    </dependency>
     <dependency>
       <groupId>${hive.groupid}</groupId>
       <artifactId>hive-common</artifactId>
@@ -706,6 +717,12 @@
           <version>${hive.version}</version>
           <scope>${flink.bundle.hive.scope}</scope>
         </dependency>
+        <dependency>
+          <groupId>org.apache.hive</groupId>
+          <artifactId>hive-standalone-metastore</artifactId>
+          <version>${hive.version}</version>
+          <scope>${flink.bundle.hive.scope}</scope>
+        </dependency>
       </dependencies>
     </profile>
   </profiles>
diff --git a/packaging/hudi-kafka-connect-bundle/pom.xml b/packaging/hudi-kafka-connect-bundle/pom.xml
index d5f90db..8d1e1a4 100644
--- a/packaging/hudi-kafka-connect-bundle/pom.xml
+++ b/packaging/hudi-kafka-connect-bundle/pom.xml
@@ -306,6 +306,18 @@
             <artifactId>hive-jdbc</artifactId>
             <version>${hive.version}</version>
             <scope>${utilities.bundle.hive.scope}</scope>
+            <exclusions>
+                <exclusion>
+                    <groupId>org.apache.hadoop</groupId>
+                    <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+                </exclusion>
+            </exclusions>
+        </dependency>
+
+        <dependency>
+            <groupId>org.apache.hadoop</groupId>
+            <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+            <version>${hadoop.version}</version>
         </dependency>
 
         <dependency>
diff --git a/packaging/hudi-spark-bundle/pom.xml b/packaging/hudi-spark-bundle/pom.xml
index 3544e31..8dd216f 100644
--- a/packaging/hudi-spark-bundle/pom.xml
+++ b/packaging/hudi-spark-bundle/pom.xml
@@ -293,6 +293,18 @@
       <artifactId>hive-jdbc</artifactId>
       <version>${hive.version}</version>
       <scope>${spark.bundle.hive.scope}</scope>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>
+
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+      <version>${hadoop.version}</version>
     </dependency>
 
     <dependency>
diff --git a/packaging/hudi-utilities-bundle/pom.xml b/packaging/hudi-utilities-bundle/pom.xml
index a3da0a8..d5e944a 100644
--- a/packaging/hudi-utilities-bundle/pom.xml
+++ b/packaging/hudi-utilities-bundle/pom.xml
@@ -312,6 +312,18 @@
       <artifactId>hive-jdbc</artifactId>
       <version>${hive.version}</version>
       <scope>${utilities.bundle.hive.scope}</scope>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.hadoop</groupId>
+          <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>
+
+    <dependency>
+      <groupId>org.apache.hadoop</groupId>
+      <artifactId>hadoop-yarn-server-resourcemanager</artifactId>
+      <version>${hadoop.version}</version>
     </dependency>
 
     <dependency>
diff --git a/pom.xml b/pom.xml
index 58f6130..ff760bb 100644
--- a/pom.xml
+++ b/pom.xml
@@ -44,20 +44,20 @@
     <module>hudi-timeline-service</module>
     <module>hudi-utilities</module>
     <module>hudi-sync</module>
-    <!--<module>packaging/hudi-hadoop-mr-bundle</module>-->
-    <!--<module>packaging/hudi-hive-sync-bundle</module>-->
-    <!--<module>packaging/hudi-spark-bundle</module>-->
-    <!--<module>packaging/hudi-presto-bundle</module>-->
-    <!--<module>packaging/hudi-utilities-bundle</module>-->
-    <!--<module>packaging/hudi-timeline-server-bundle</module>-->
-    <!--<module>docker/hoodie/hadoop</module>-->
-    <!--<module>hudi-integ-test</module>-->
-    <!--<module>packaging/hudi-integ-test-bundle</module>-->
-    <!--<module>hudi-examples</module>-->
+    <module>packaging/hudi-hadoop-mr-bundle</module>
+    <module>packaging/hudi-hive-sync-bundle</module>
+    <module>packaging/hudi-spark-bundle</module>
+    <module>packaging/hudi-presto-bundle</module>
+    <module>packaging/hudi-utilities-bundle</module>
+    <module>packaging/hudi-timeline-server-bundle</module>
+    <module>docker/hoodie/hadoop</module>
+<!--    <module>hudi-integ-test</module>-->
+<!--    <module>packaging/hudi-integ-test-bundle</module>-->
+    <module>hudi-examples</module>
     <module>hudi-flink</module>
     <module>hudi-kafka-connect</module>
     <module>packaging/hudi-flink-bundle</module>
-    <!--<module>packaging/hudi-kafka-connect-bundle</module>-->
+    <module>packaging/hudi-kafka-connect-bundle</module>
   </modules>
 
   <licenses>
@@ -1084,6 +1084,10 @@
       <id>confluent</id>
       <url>https://packages.confluent.io/maven/</url>
     </repository>
+    <repository>
+      <id>hdp</id>
+      <url>https://repo.hortonworks.com/content/repositories/releases/</url>
+    </repository>
   </repositories>
 
   <profiles>


编译命令

使用如下命令进行编译,并将编译出的包放置到集群的相应位置。

编译命令如下:

mvn clean install -^DskipTests -^Dcheckstyle.skip=true -^Dmaven.test.skip=true -^DskipITs -^Dhadoop.version=3.0.0-cdh6.3.2 -^Dhive.version=3.1.2 -^Dscala.version=2.12.10 -^Dscala.binary.version=2.12 -^Dflink.version=1.13.2 -^Pflink-bundle-shade-hive3

与 Hudi, Flink 和 Hive 相关的 Jar 包如下所示。共计三个:

packaging/hudi-hive-sync-bundle/target/hudi-hive-sync-bundle-0.10.0.jar
packaging/hudi-hadoop-mr-bundle/target/hudi-hadoop-mr-bundle-0.10.0.jar
packaging/hudi-flink-bundle/target/hudi-flink-bundle_2.12-0.10.0.jar

环境部署

分别放置于集群环境的如下位置:文章来源地址https://www.toymoban.com/news/detail-621793.html

  1. 于 $HIVE_HOME/auxlib 中放置 hudi-hive-sync-bundle-0.10.0.jarhudi-hadoop-mr-bundle-0.10.0.jar
cd $HIVE_HOME/auxlib
ls ./
hudi-hive-sync-bundle-0.10.0.jar
hudi-hadoop-mr-bundle-0.10.0.jar
  1. 于 $FLINK_HOME/lib 中放置 hudi-flink-bundle_2.12-0.10.0.jar
cd  $FLINK_HOME/lib
ls ./
...
hudi-flink-bundle_2.12-0.10.0.jar
...

到了这里,关于通过源代码修改使 Apache Hudi 支持 Kerberos 访问 Hive 的功能的文章就介绍完了。如果您还想了解更多内容,请在右上角搜索TOY模板网以前的文章或继续浏览下面的相关文章,希望大家以后多多支持TOY模板网!

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处: 如若内容造成侵权/违法违规/事实不符,请点击违法举报进行投诉反馈,一经查实,立即删除!

领支付宝红包 赞助服务器费用

相关文章

  • Mac上传项目源代码到GitHub的修改更新

    最近在学习把代码上传到github,不得不说,真的还挺方便 这是一个关于怎样更新项目代码的教程。 首先,在本地终端命令行打开至项目文件下 第一步:查看当前的git仓库状态,可以使用git status git status 第二步:更新项目文件 git add * 第三步:输入git commit -m “更新说明”

    2024年02月12日
    浏览(39)
  • Java版本+企业电子招投标系统源代码+支持二开+Spring cloud

    功能模块: 待办消息,招标公告,中标公告,信息发布 描述: 全过程数字化采购管理,打造从供应商管理到采购招投标、采购合同、采购执行的全过程数字化管理。通供应商门户具备内外协同的能力,为外部供应商集中推送展示与其相关的所有采购业务信息(历史合作、考

    2024年02月07日
    浏览(49)
  • 【Java】智慧工地管理系统源代码,支持二次开发,SaaS模式

    智慧工地系统围绕工程现场人、机、料、法、环及施工过程中质量、安全、进度、成本等各项数据满足工地多角色、多视角的有效监管,实现工程建设管理的降本增效。 1、施工现场管理难:安全事故频发,人工巡检难度大,质量进度协同难等问题仍没有得到解决; 2.人员管理

    2024年02月04日
    浏览(44)
  • VS2022 C++修改Window系统DNS源代码V2.0

    这是自己使用VS2022 C++编写开发的Window系统下修改DNS脚本程序第2个版本,适合Win10系统和Win7系统。cfg.txt文件存放要修改的DNS,最多4个。 详细源代码如下: setdns.cpp

    2024年02月11日
    浏览(52)
  • Python版基于pygame的玛丽快跑小游戏源代码,玛丽冒险小游戏代码,支持双人模式

    基于pygame的玛丽快跑小游戏源代码,玛丽冒险小游戏代码,支持双人模式 按空格进入单人模式,按‘t’进入双人模式,双人模式下玛丽1采用空格键上跳,玛丽2采用方向上键上跳。 完整代码下载地址:Python版基于pygame的玛丽快跑小游戏源代码 完整代码下载地址:Python版基于

    2024年02月11日
    浏览(63)
  • 【精选】各种节日祝福(C语言,可修改),Easyx图形库应用+源代码分享

    博主:命运之光✨✨ 专栏:Easyx图形库应用📂 目录 ✨一、程序展示  范例一:❤新年祝福❤ 范例二:❤母亲节祝福❤ ✨二、项目环境 简单介绍一下easyx图形库应用 Easyx图形库 ✨三、运行效果展示(视频) ✨四、程序源代码分享 🍓文字可以自由输入(●\\\'◡\\\'●)🍓 🍓输入格

    2024年02月05日
    浏览(57)
  • Java企业电子招投标系统源代码,支持二次开发,采用Spring cloud框架

    在数字化采购领域,企业需要一个高效、透明和规范的管理系统。通过采用Spring Cloud、Spring Boot2、Mybatis等先进技术,我们打造了全过程数字化采购管理平台。该平台具备内外协同的能力,通过待办消息、招标公告、中标公告和信息发布等功能模块,实现了对供应商的集中管理

    2024年02月03日
    浏览(61)
  • Python拼图游戏源代码,可定制拼图图片,支持多种难度,可九宫格、十六宫格、二十五宫格

    配置环境 安装pygame模块 pip install pygame 引入资源 将照片,添加到resources/pictures路径下 照片.jpg格式 主函数代码 pintu.py 一个配置文件cfg,一个资源路径resources,存放字体和图片。 主函数代码放在这里: 1、定义四个可移动函数,在存在空格的情况下,允许向空格的方向移动。

    2024年02月12日
    浏览(58)
  • Java版本+企业电子招投标系统源代码+支持二开+招投标系统+中小型企业采购供应商招投标平台

      功能模块: 待办消息,招标公告,中标公告,信息发布 描述: 全过程数字化采购管理,打造从供应商管理到采购招投标、采购合同、采购执行的全过程数字化管理。通供应商门户具备内外协同的能力,为外部供应商集中推送展示与其相关的所有采购业务信息(历史合作、

    2024年02月07日
    浏览(44)
  • 【UE Unreal Camera】【保姆级教程二】【包含源代码】手把手教你通过UE获取摄像头帧数据

       【UE Unreal Camera】【保姆级教程二】【包含源代码】手把手教你通过UE获取摄像头帧数据~ c6ebbaddb1aff.png)   在UE 摄像头教程一中,我们已经通过Unreal自带的媒体播放器打开了摄像头,并且将摄像头的数据展示在了游戏画面中。当然这只是最基本的功能,一般情况下,我们

    2024年02月01日
    浏览(54)

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

博客赞助

微信扫一扫打赏

请作者喝杯咖啡吧~博客赞助

支付宝扫一扫领取红包,优惠每天领

二维码1

领取红包

二维码2

领红包