Problem Statement: Let's say you have a Hive table called person, and some values in the ‘name’ column have extra spaces at the beginning that you need to trim. Hive does ship with a built-in trim() function, but trimming is a simple way to learn how custom functions work. So how would you implement it yourself?

Solution: Write your own custom UDF (user-defined function).

Steps to follow when writing your own UDF:

  1. Your class should extend org.apache.hadoop.hive.ql.exec.UDF
  2. Your class must have at least one evaluate() method
  3. Build a jar from your class
  4. Add the jar to the Hive classpath
  5. Apply your custom UDF
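
Hive finds the evaluate() method by name at query time, which is why the method must be called exactly that. The sketch below illustrates the idea in plain Java with reflection. It is a simplified illustration and not Hive's actual resolution code; UpperCaseFunction and EvaluateLookupSketch are hypothetical names that do not depend on Hive.

```java
import java.lang.reflect.Method;
import java.util.Locale;

// Hypothetical stand-in for a UDF class; it has no Hive dependency.
class UpperCaseFunction {
    public String evaluate(String input) {
        return input == null ? null : input.toUpperCase(Locale.ROOT);
    }
}

final class EvaluateLookupSketch {
    // Looks up a public method named "evaluate" that takes one String,
    // then invokes it on the given object.
    static Object callEvaluate(Object function, String argument) {
        try {
            Method evaluate = function.getClass().getMethod("evaluate", String.class);
            return evaluate.invoke(function, argument);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No usable evaluate(String) method", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callEvaluate(new UpperCaseFunction(), "kumar")); // prints KUMAR
    }
}
```

If the method were named anything other than evaluate, the lookup in this sketch would fail, which mirrors why rule 2 above exists.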

The detailed implementation follows, step by step.

Step 1: Create a file called person on your local machine. Leave some extra spaces next to the name ‘kumar’, and separate the fields with tabs so they match the table definition in Step 2.


Saroj:~ saroj$ cat > person
saroj software CA
kumar     finance NYC
john sales CA

Step 2: Now start the Hive server and open the Hive console.


hive> create table person(name string, occupation string, city string)
> row format delimited
> fields terminated by '\t';

hive> load data local inpath '/Users/saroj/person' into table person;
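
To see why the padding survives the load, it can help to look at what splitting a row on the tab delimiter produces. This is a plain Java sketch, not Hive's own row parsing; TabSplitSketch and the sample line (with five spaces after the name) are assumptions made for illustration.

```java
final class TabSplitSketch {
    // Splits one line of the person file into its tab-separated fields.
    static String[] splitFields(String line) {
        return line.split("\t", -1);
    }

    public static void main(String[] args) {
        // Hypothetical row: five spaces after the name, then tab-separated fields.
        String line = "kumar     \tfinance\tNYC";
        String[] fields = splitFields(line);
        // Brackets make the padding visible.
        System.out.println("[" + fields[0] + "]"); // prints [kumar     ]
        System.out.println("[" + fields[1] + "]"); // prints [finance]
    }
}
```

Only the tab acts as a separator, so the spaces stay part of the name value until something removes them.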

 

Step 3: Now create a project in Eclipse or IntelliJ and add the class below. Add the hadoop-core jar and the Hive jars (found in /apache-hive-1.*/lib) to the build path as dependencies.


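package com.hadoop.hive;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class SampleUDF extends UDF {
    // Reused for every row to avoid allocating a new Text per call.
    private final Text result = new Text();

    // Hive calls evaluate() once per row with the column value.
    public Text evaluate(Text str) {
        if (str == null) {
            return null; // pass NULL values through unchanged
        }
        // Strip whitespace from both ends of the value.
        result.set(StringUtils.strip(str.toString()));
        return result;
    }
}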
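
The trimming logic can be tried out without a Hive installation. The sketch below is a plain Java approximation of what StringUtils.strip does for this use case: it removes characters for which Character.isWhitespace returns true from both ends, and passes null through. StripSketch is a hypothetical name and is not part of the UDF above.

```java
final class StripSketch {
    // Removes leading and trailing whitespace; returns null for null input.
    static String strip(String value) {
        if (value == null) {
            return null;
        }
        int start = 0;
        int end = value.length();
        // Move start forward past leading whitespace.
        while (start < end && Character.isWhitespace(value.charAt(start))) {
            start++;
        }
        // Move end backward past trailing whitespace.
        while (end > start && Character.isWhitespace(value.charAt(end - 1))) {
            end--;
        }
        return value.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println("[" + strip("   kumar   ") + "]"); // prints [kumar]
        System.out.println("[" + strip("john") + "]");        // prints [john]
        System.out.println(strip(null));                      // prints null
    }
}
```

Spaces inside the value are left alone; only the two ends are affected.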

Step 4: Build the project as a jar.

Step 5: Now go back to the Hive CLI, add the jar to the Hive classpath, register the function, and call it.


hive> add jar /Users/saroj/sampleUdf.jar;
hive> create temporary function myudf as 'com.hadoop.hive.SampleUDF';
hive> select myudf(name) from person;

The query returns only the name column, with the extra spaces removed from ‘kumar’:


saroj
kumar
john