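A Combiner improves performance by reducing the volume of data transferred from the Map phase to the Reduce phase.
A combiner runs on the map side, on the output of an individual mapper. The framework first sorts that output and groups the values by key. The combiner then processes each key together with its list of values and emits aggregated key-value pairs, and this smaller output, rather than the raw mapper output, is what gets shuffled to the reducers for further processing. Hadoop treats the combiner purely as an optimization, so it may run zero, one, or several times on a given map output.
A Combiner is also called a mini-reducer. It sits between the Mapper and the Reducer.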
Let's say you have the following input file:
hello hi saroj rout kumar
hey hello what hi rout hello
hi nishant saroj rout what
The Mapper reads the input file and emits a (word,1) pair for every word:
(hello,1) (hi,1) (saroj,1) (rout,1) (kumar,1)
(hey,1) (hello,1) (what,1) (hi,1) (rout,1) (hello,1)
(hi,1) (nishant,1) (saroj,1) (rout,1) (what,1)
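The map step can be sketched in plain Java. This is a simulation under stated assumptions, not the Hadoop Mapper API: the class name WordCountMapSketch and its map method are hypothetical, and words are assumed to be separated by whitespace.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for a word-count map() call (hypothetical, not Hadoop's Mapper API).
class WordCountMapSketch {
    // Emits one (word, 1) pair per whitespace-separated token, in input order.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) { // skips the single "" token produced by a blank line
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] input = {
            "hello hi saroj rout kumar",
            "hey hello what hi rout hello",
            "hi nishant saroj rout what"
        };
        for (String line : input) {
            System.out.println(map(line)); // one list of word=1 entries per line
        }
    }
}
```

Note that the mapper does no counting at all: a word that appears twice on a line, like hello on the second line, produces two separate (hello,1) pairs.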
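Assuming this small file is handled by a single mapper, the framework sorts the Mapper output and groups the values by key before the Combiner runs, so the Combiner receives each word with its list of counts:
(hello,[1,1,1]) (hey,[1]) (hi,[1,1,1]) (kumar,[1])
(nishant,[1]) (rout,[1,1,1]) (saroj,[1,1]) (what,[1,1])
The Combiner sums each list and emits one record per word, for example (hello,3) and (saroj,2), instead of one record per occurrence.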
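A minimal plain-Java sketch of these map-side steps for one mapper, assuming the three input lines are processed by a single map task. LocalCombineSketch, groupByKey and combine are hypothetical names; in Hadoop the framework performs the sort and grouping itself, and a TreeMap stands in for that sorted, grouped view here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simulates what happens to ONE mapper's output on the map side (not Hadoop code).
class LocalCombineSketch {
    // Sort and group: TreeMap keeps keys sorted; each occurrence adds the mapper's value 1.
    static TreeMap<String, List<Integer>> groupByKey(String[] lines) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (word.isEmpty()) continue;
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        return grouped;
    }

    // Combine: the same summing logic as the word-count reducer, applied locally.
    static TreeMap<String, Integer> combine(Map<String, List<Integer>> grouped) {
        TreeMap<String, Integer> combined = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) sum += v;
            combined.put(e.getKey(), sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        String[] input = {
            "hello hi saroj rout kumar",
            "hey hello what hi rout hello",
            "hi nishant saroj rout what"
        };
        TreeMap<String, List<Integer>> grouped = groupByKey(input);
        int mapRecords = 0;
        for (List<Integer> values : grouped.values()) mapRecords += values.size();
        TreeMap<String, Integer> combined = combine(grouped);
        System.out.println("grouped:  " + grouped);
        System.out.println("combined: " + combined);
        System.out.println(mapRecords + " map records -> " + combined.size() + " combined records");
    }
}
```

For the sample input this turns 16 (word,1) records into 8 (word,count) records, which is exactly the data-volume saving the Combiner exists for.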
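In this word-count example the reducer can also serve as the combiner, so no separate combiner code is needed. This works because summing counts gives the same result whether it is done in one step or in several partial steps, and because the reducer's input and output types (word, count) match the mapper's output types.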
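Reusing the reducer as the combiner only works when the reduce function can safely be applied to partial results, as summation can. The plain-Java sketch below contrasts a sum with an average; the class CombinerSafetySketch and the sample numbers are hypothetical, and the two inner lists stand for the values seen by two different mappers.

```java
import java.util.Arrays;
import java.util.List;

// Shows why summing can be split across a combiner and a reducer, but averaging cannot.
class CombinerSafetySketch {
    static int sum(List<Integer> xs) {
        int s = 0;
        for (int x : xs) s += x;
        return s;
    }

    static double mean(List<Double> xs) {
        double s = 0;
        for (double x : xs) s += x;
        return s / xs.size();
    }

    public static void main(String[] args) {
        // Sum: summing the partial sums {1,1} and {1} equals summing all values directly.
        int direct = sum(Arrays.asList(1, 1, 1));
        int viaCombiner = sum(Arrays.asList(sum(Arrays.asList(1, 1)), sum(Arrays.asList(1))));
        System.out.println(direct + " == " + viaCombiner);

        // Mean: averaging the partial averages of {2,4} and {9} is NOT the overall average.
        double directMean = mean(Arrays.asList(2.0, 4.0, 9.0));
        double meanOfMeans = mean(Arrays.asList(mean(Arrays.asList(2.0, 4.0)), mean(Arrays.asList(9.0))));
        System.out.println(directMean + " != " + meanOfMeans);
    }
}
```

The sum check prints 3 == 3, while the mean check prints 5.0 != 6.0, so a reducer that averages values could not simply be registered as the combiner.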
To use the reducer as the combiner, set the combiner class in the driver program:
job.setMapperClass(WordCountMapper.class);
job.setCombinerClass(WordCountReducer.class);
job.setReducerClass(WordCountReducer.class);
The final reducer output is:
(hello,3) (hey,1) (hi,3) (kumar,1) (nishant,1) (rout,3)
(saroj,2) (what,2)
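To see how the combiner and the reducer interact when there is more than one map task, here is a plain-Java simulation. It assumes a hypothetical split in which one map task reads the first two input lines and a second task reads the third; CombinerPipelineSketch, mapAndCombine and reduce are illustrative names, and the map and combine steps are folded into one counting loop for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Two hypothetical map tasks combine locally; one reducer sums their partial counts.
class CombinerPipelineSketch {
    // Map + combine for one task: count the words in that task's lines only.
    static Map<String, Integer> mapAndCombine(List<String> split) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String line : split) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) partial.merge(word, 1, Integer::sum);
            }
        }
        return partial;
    }

    // Reduce: sum the partial counts that every task sent for each key.
    static Map<String, Integer> reduce(List<Map<String, Integer>> partials) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, count) -> result.merge(word, count, Integer::sum));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical split: task 1 reads the first two lines, task 2 the third.
        List<String> split1 = Arrays.asList("hello hi saroj rout kumar", "hey hello what hi rout hello");
        List<String> split2 = Arrays.asList("hi nishant saroj rout what");
        List<Map<String, Integer>> partials = new ArrayList<>();
        partials.add(mapAndCombine(split1));
        partials.add(mapAndCombine(split2));
        int shuffled = 0;
        for (Map<String, Integer> p : partials) shuffled += p.size();
        System.out.println("records shuffled: " + shuffled);
        System.out.println("reducer output:   " + reduce(partials));
    }
}
```

With this split, 12 partial records are shuffled instead of 16 raw (word,1) records, and the reducer still produces hello=3, hi=3, rout=3, saroj=2, what=2 and 1 for each remaining word.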
