Problem Statement: Suppose you have a Hive table called person, and some values in its ‘name’ column begin with whitespace that you need to trim. Hive does ship built-in trim, ltrim, and rtrim functions, but assume for this exercise that you must implement the trimming yourself. How do you achieve this?
Solution: You need to write your own custom UDF (user-defined function).
Steps to follow when writing your own UDF:
- Your class should extend org.apache.hadoop.hive.ql.exec.UDF
- Your class must have at least one evaluate() method
- Build a jar from your class
- Add the jar to the Hive classpath
- Apply your custom UDF
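The UDF base class declares no abstract evaluate() method; Hive finds the method by name through reflection, which is why at least one must exist. The sketch below is a simplified illustration of that kind of lookup in plain Java, not Hive's actual resolver code. The class names EvaluateLookupSketch and Trimmer are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Simplified illustration only; this is not Hive's resolver code.
final class EvaluateLookupSketch {

    // Hypothetical stand-in for a UDF class with two evaluate() overloads.
    static final class Trimmer {
        public String evaluate(String s) { return s == null ? null : s.trim(); }
        public Integer evaluate(Integer i) { return i; }
    }

    // Collects the public methods named "evaluate" that are visible on a class.
    static List<Method> findEvaluateMethods(Class<?> cls) {
        List<Method> found = new ArrayList<>();
        for (Method m : cls.getMethods()) {
            if (m.getName().equals("evaluate")) {
                found.add(m);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Lists every evaluate() overload found on the stand-in class.
        for (Method m : findEvaluateMethods(Trimmer.class)) {
            System.out.println(m);
        }
    }
}
```

A class with no evaluate() method yields an empty list here, which corresponds to the case where Hive cannot call the function.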
The detailed implementation follows, step by step.
Step 1: Create a file called person on your local machine. Leave a leading space before the name ‘kumar’, and separate the fields with tabs so that they match the table definition in Step 2.
Saroj:~ saroj$ cat > person
saroj software CA
kumar finance NYC
john sales CA
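Typing tabs and a leading space by hand is easy to get wrong, so the file can also be generated. This is a minimal sketch, assuming one leading space before ‘kumar’ and a default output file named person in the current directory; the class name PersonFileSketch is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: writes the three sample rows as tab-separated text
// so that the delimiters and the leading space are unambiguous.
final class PersonFileSketch {

    // Each row is name<TAB>occupation<TAB>city.
    // Assumption: "kumar" gets exactly one leading space.
    static List<String> rows() {
        return Arrays.asList(
            String.join("\t", "saroj", "software", "CA"),
            String.join("\t", " kumar", "finance", "NYC"),
            String.join("\t", "john", "sales", "CA"));
    }

    public static void main(String[] args) throws IOException {
        // The output path defaults to "person" in the current directory.
        Path out = Paths.get(args.length > 0 ? args[0] : "person");
        Files.write(out, rows(), StandardCharsets.UTF_8);
        System.out.println("Wrote " + rows().size() + " rows to " + out.toAbsolutePath());
    }
}
```

Pass a path as the first argument to write the file somewhere else, for example the location used in the load statement of Step 2.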
Step 2: Now start the Hive server, open the Hive console, create the table, and load the file.
hive> create table person(name string, occupation string, city string)
> row format delimited
> fields terminated by '\t';
hive> load data local inpath '/Users/saroj/person' into table person;
Step 3: Now create a project in Eclipse or IntelliJ and add the class below. Add the hadoop-core jar and the Hive jars (found in /apache-hive-1.*/lib) to the build path as dependencies.
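package com.hadoop.hive;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Strips leading and trailing whitespace from a string column.
public class SampleUDF extends UDF {
    // Reused across rows to avoid allocating a new Text per call.
    private final Text t = new Text();

    public Text evaluate(Text str) {
        // Pass NULL column values through unchanged.
        if (str == null) {
            return null;
        }
        t.set(StringUtils.strip(str.toString()));
        return t;
    }
}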
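Before packaging the jar, it can help to check the trimming behavior without any Hive or Hadoop jars on the classpath. The sketch below mirrors what evaluate() does (null passes through, and whitespace as defined by Character.isWhitespace is removed from both ends) using only the JDK. The class name StripSketch and its strip method are hypothetical stand-ins, not part of the UDF itself.

```java
// Hive-free sketch of the null handling and whitespace stripping that
// evaluate() performs. The class and method names here are hypothetical.
final class StripSketch {

    // Returns null for null input; otherwise removes leading and trailing
    // characters for which Character.isWhitespace is true.
    static String strip(String s) {
        if (s == null) {
            return null;
        }
        int start = 0;
        int end = s.length();
        while (start < end && Character.isWhitespace(s.charAt(start))) {
            start++;
        }
        while (end > start && Character.isWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println("[" + strip("  kumar") + "]"); // prints [kumar]
        System.out.println("[" + strip("john ") + "]");   // prints [john]
        System.out.println(strip(null));                   // prints null
    }
}
```

Note that this removes trailing whitespace as well as leading whitespace, which is also what StringUtils.strip does in the UDF.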
Step 4: Build the project as a jar.
Step 5: Now go back to the Hive CLI, add the jar to the Hive classpath, register the function, and call it.
hive> add jar /Users/saroj/sampleUdf.jar;
hive> create temporary function myudf as 'com.hadoop.hive.SampleUDF';
hive> select myudf(name), occupation, city from person;
The result shows ‘kumar’ without the leading space:
saroj software CA
kumar finance NYC
john sales CA
