Hive UDF

When Hive's built-in functions cannot meet the needs of a query, users can write their own user-defined functions (UDFs) for their business logic and then call them from HiveQL.

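Consider the following requirement: to protect user privacy, the middle four digits of a user's phone number must be replaced with asterisks whenever the data is queried. For example, the phone number 18001292688 should be displayed as 180****2688. A custom function is a good fit for this.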
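Before bringing Hive into the picture, the masking rule itself can be sketched in plain Java. This is only an illustration of the requirement above, not the UDF: the class name PhoneMaskSketch and the method name maskMiddleFour are hypothetical, and the sketch assumes an 11-character phone number.

```java
final class PhoneMaskSketch {

    // Hypothetical helper: keeps the first three and the last four characters
    // and replaces the four characters in between with asterisks.
    static String maskMiddleFour(String phone) {
        if (phone == null || phone.length() != 11) {
            throw new IllegalArgumentException("expected an 11-character phone number");
        }
        return phone.substring(0, 3) + "****" + phone.substring(7);
    }

    public static void main(String[] args) {
        System.out.println(maskMiddleFour("18001292688")); // prints 180****2688
    }
}
```

The sketch rejects malformed input with an exception to keep the rule visible; a function called once per row in a query would more likely return a placeholder value instead.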

Create a new project named MyUDF and add the Maven dependencies

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.example</groupId>
    <artifactId>MyUDF</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <hive.version>2.1.1-cdh6.1.0</hive.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>jdk.tools</groupId>
            <artifactId>jdk.tools</artifactId>
            <version>1.8</version>
            <scope>system</scope>
            <systemPath>${JAVA_HOME}/lib/tools.jar</systemPath>
        </dependency>
        <!-- Hadoop common package -->
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.10.2</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.hive/hive-exec -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>${hive.version}</version>
        </dependency>
    </dependencies>
    <!-- Add the Aliyun mirror and the Cloudera (CDH) repository -->
    <repositories>
        <repository>
            <id>nexus-aliyun</id>
            <url>http://maven.aliyun.com/nexus/content/groups/public</url>
        </repository>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
        </repository>
    </repositories>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.6.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
        </plugins>
    </build>

</project>

Create the class hive.demo.MyUDF

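package hive.demo;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * Hive user-defined function that masks the middle four digits of a phone number.
 */
public class MyUDF extends UDF {
    /**
     * A UDF class needs a method named evaluate(); Hive calls this method.
     *
     * @param text the argument passed in when the function is called
     * @return the masked phone number, or an error message if the input is not 11 characters long
     */
    public String evaluate(Text text) {
        String result = "Invalid phone number!";
        if (text != null) {
            String inputStr = text.toString();
            // Check the character count: Text.getLength() reports UTF-8 bytes, not characters.
            if (inputStr.length() == 11) {
                // Keep the first three and last four digits and mask the middle four.
                result = inputStr.substring(0, 3) + "****" + inputStr.substring(7);
            }
        }
        return result;
    }
}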
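One detail is worth keeping in mind when validating the input. Hadoop's Text stores its contents as UTF-8, and Text.getLength() returns the number of bytes rather than the number of characters. A string containing multi-byte characters can therefore be 11 bytes long while having fewer than 11 characters, in which case substring(7) can fail. The sketch below uses only the JDK to show the difference. The class PhoneLengthCheck and the helper isElevenDigits are hypothetical names, and the digit-only check is an assumption about what a stricter validation might look like, not part of the original function.

```java
import java.nio.charset.StandardCharsets;

final class PhoneLengthCheck {

    // Hypothetical stricter check: exactly 11 ASCII digits, nothing else.
    static boolean isElevenDigits(String s) {
        return s != null && s.matches("[0-9]{11}");
    }

    public static void main(String[] args) {
        // Three CJK characters (3 bytes each in UTF-8) followed by "ab".
        String tricky = "\u4e2d\u4e2d\u4e2dab";

        System.out.println(tricky.length());                                // prints 5
        System.out.println(tricky.getBytes(StandardCharsets.UTF_8).length); // prints 11

        System.out.println(isElevenDigits(tricky));        // prints false
        System.out.println(isElevenDigits("18001292688")); // prints true
    }
}
```

For input that really consists of 11 ASCII digits the byte count and the character count agree, so the distinction only matters for malformed data.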

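Package the project as a jar (for example with mvn clean package) and upload it to a path on the server, such as /home/hadoop/. With the pom above, Maven names the file MyUDF-1.0-SNAPSHOT.jar by default, so rename it to MyUDF.jar to match the path used in the next step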

Run the following in the Hive CLI

hive> ADD JAR /home/hadoop/MyUDF.jar;

Register the class as a function named formatPhone

CREATE TEMPORARY FUNCTION formatPhone AS 'hive.demo.MyUDF';

Create a table to test the custom function

CREATE TABLE t_user(id INT, phone STRING);
INSERT INTO TABLE t_user
SELECT 1, '13123567589'
UNION ALL SELECT 2, '15898705673'
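UNION ALL SELECT 3, '18001292688';

Then call the function on the phone column to check the result

SELECT id, formatPhone(phone) AS phone FROM t_user;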
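As a cross-check, the same masking rule can be applied to the three sample rows in plain Java, which gives the values formatPhone would be expected to return for them. This is a sketch outside Hive; the class name SampleRowsPreview and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SampleRowsPreview {

    // Same rule as the UDF: keep the first three and last four digits.
    static String mask(String phone) {
        return phone.substring(0, 3) + "****" + phone.substring(7);
    }

    // Builds the three sample rows (id -> phone) and masks each phone number.
    static Map<Integer, String> maskedRows() {
        Map<Integer, String> rows = new LinkedHashMap<>();
        rows.put(1, "13123567589");
        rows.put(2, "15898705673");
        rows.put(3, "18001292688");

        Map<Integer, String> masked = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> row : rows.entrySet()) {
            masked.put(row.getKey(), mask(row.getValue()));
        }
        return masked;
    }

    public static void main(String[] args) {
        // Prints:
        // 1	131****7589
        // 2	158****5673
        // 3	180****2688
        for (Map.Entry<Integer, String> row : maskedRows().entrySet()) {
            System.out.println(row.getKey() + "\t" + row.getValue());
        }
    }
}
```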
