Let us see the word count example in Java. There are many Java classes involved, but we will mainly be focusing on three: the Mapper class, the Reducer class, and the driver class from which the Mapper and Reducer are run.
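To make the walkthrough easier to follow, here is a minimal sketch of the mapper we are about to discuss. The import list and exact formatting are reasonable assumptions; the class name and method signatures follow the description below.

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class WordCountMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            // Convert the line to a String and split it on spaces.
            String line = value.toString();
            String[] words = line.split(" ");
            for (String word : words) {
                // Emit each word as the key with a count of 1 as the value.
                output.collect(new Text(word), new IntWritable(1));
            }
        }
    }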
In the above example we have named the mapper class 'WordCountMapper', which extends MapReduceBase and implements the Mapper<LongWritable, Text, Text, IntWritable> interface, where the generic type parameters of the Mapper represent the input key type (LongWritable), the input value type (Text), the output key type (Text) and the output value type (IntWritable).
Then we have the map() function, which takes four parameters. 'LongWritable key' is the byte offset of the line within the file (loosely, the line number), 'Text value' is the entire line, 'OutputCollector<Text, IntWritable> output' collects the output as key-value pairs, and the fourth parameter, 'Reporter', lets the task report its progress.
Now, what we have to do is convert this line to the String type and split it on a space (" "), so that we get the individual words from that line.
Then we run a 'for' loop over those words, so that we can emit the output of the map as key-value pairs.
output.collect(new Text(word), new IntWritable(1));
The above line says: collect each word (which will be the key) and pair it with an initial count of '1' (which is the value).
Relate this with the earlier example and the code will become clearer.
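Here is a matching sketch of the reducer discussed next. Again, the imports and the exact shape of the summing loop are assumptions consistent with the description that follows.

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class WordCountReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            int sum = 0;
            // Add up all the 1's collected for this key.
            while (values.hasNext()) {
                sum += values.next().get();
            }
            // Emit the word together with its total count.
            output.collect(key, new IntWritable(sum));
        }
    }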
In the above example we have named the reducer class 'WordCountReducer', which extends MapReduceBase and implements the Reducer<Text, IntWritable, Text, IntWritable> interface, where the generic type parameters of the Reducer represent the input key type (Text), the input value type (IntWritable), the output key type (Text) and the output value type (IntWritable).
Next we have the reduce() function, which has 'Text key' as the first parameter and 'Iterator<IntWritable> values' as the second parameter.
Now, the beauty of Hadoop is that it does not pass all the keys to the reduce() function at once. It passes one key at a time to reduce(), and the 'Iterator<IntWritable> values' contains all the values associated with that key. If you remember the example where {In, 1} occurred in Map1 and {In, 1} also occurred in Map2: 'In' is the key, and its value '1' is present in two places. So, all we have to do is add up all the 1's to get the desired word count.
And that is exactly what the next step does: a loop iterates over 'values' and adds up all the 1's to produce the desired output, i.e. {In, 2}.
So, we have the Mapper and Reducer defined. Now we need a class which will run this Mapper and Reducer.
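Here is a minimal sketch of such a driver class. The job name and the use of Configured/Tool are assumptions on my part; the configuration calls follow the classic org.apache.hadoop.mapred API described below.

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class WordCount extends Configured implements Tool {

        public int run(String[] args) throws Exception {
            // JobConf holds the configuration details of the job.
            JobConf conf = new JobConf(getConf(), WordCount.class);
            conf.setJobName("wordcount"); // assumed job name

            // Set the Mapper and Reducer classes.
            conf.setMapperClass(WordCountMapper.class);
            conf.setReducerClass(WordCountReducer.class);

            // Declare the data types of the Mapper and Reducer output keys/values.
            conf.setMapOutputKeyClass(Text.class);
            conf.setMapOutputValueClass(IntWritable.class);
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            // Input file (e.g. story.txt) and output directory, taken from the command line.
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            // Submit the job and run it.
            JobClient.runJob(conf);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            int exitCode = ToolRunner.run(new WordCount(), args);
            System.exit(exitCode);
        }
    }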
In the above example we can see there is a run() method which contains all the necessary configuration to run the job. There we have defined an object of the 'JobConf' class; 'JobConf' holds the configuration details of the job.
The setMapperClass() and setReducerClass() lines set the Mapper and the Reducer class.
The next four lines (setMapOutputKeyClass(), setMapOutputValueClass(), setOutputKeyClass() and setOutputValueClass()) define the Mapper output key-value classes and the Reducer output key-value classes, i.e. they declare what data types the keys and values of the Mapper and the Reducer are going to be.
Next we set the input path of the file to be parsed (i.e. story.txt) and the output path; both are given on the command line and forwarded as arguments from the main() method.
Finally, the runJob() method of JobClient submits the job and runs it.
Now, if you come to the main() method, we have used the run(new WordCount(), args) method of the ToolRunner class, passing a new instance of the class (i.e. new WordCount()) and the arguments received by main(), for the actual execution.
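Assuming the compiled classes are packaged into a jar (the jar name and HDFS paths below are only placeholders), the job could then be launched with something like:

    hadoop jar wordcount.jar WordCount /input/story.txt /output

A nice side effect of going through ToolRunner is that standard Hadoop command-line options such as -D and -conf are parsed before the remaining arguments reach run().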