Today, the internet craze is at its peak. The number of people accessing the web is growing from millions to billions, and so is the data they generate. This data is so huge that it goes beyond the processing and storage capacity of conventional systems. This type of data is called Big Data.
Now think for a moment. Can a single computer handle this, no matter how powerful it is? The answer is no. The main problem is not just storage but processing as well. What if Google returned your search results after 15 minutes? What if Facebook took half an hour to upload a photo?
The solution to this problem is a distributed computing system.
In simple terms, think of how a manager works in an office. When a new assignment comes in, he distributes the work among his co-workers.
The same applies to a distributed computing system. A single computer is not responsible for handling all the tasks. Instead, the work is distributed among several computers connected to each other. Each computer is called a node, and the group of connected computers is called a cluster.
The standalone computers are commodity hardware (low-cost computers). If we increase the number of nodes (standalone computers), not only the storage capacity increases but the processing capability as well.
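To make the idea concrete, here is a minimal sketch of the manager-and-co-workers pattern in Python. It is only an illustration, not how Hadoop itself is implemented: the "nodes" are simulated with threads on one machine, and the names split_into_chunks, node_task and run_on_cluster are hypothetical helpers made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, num_nodes):
    """Divide the work into roughly equal parts, at most one per node."""
    if not items:
        return []
    chunk_size = -(-len(items) // num_nodes)  # ceiling division
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def node_task(chunk):
    """Work done by a single node: here, summing its share of the numbers."""
    return sum(chunk)


def run_on_cluster(items, num_nodes):
    """Hand one chunk to each simulated node, then combine the partial results."""
    chunks = split_into_chunks(items, num_nodes)
    # Assumption: threads stand in for separate machines in a real cluster.
    with ThreadPoolExecutor(max_workers=num_nodes) as cluster:
        partial_results = list(cluster.map(node_task, chunks))
    return sum(partial_results)


numbers = list(range(1, 101))
total = run_on_cluster(numbers, num_nodes=4)
print(total)  # 5050
```

No single worker sees the whole list: each one handles only its own chunk, and the partial results are combined at the end. In a real cluster the chunks would live on different machines, which is why adding nodes adds both storage and processing power.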
So, there should be someone who is responsible for:
To solve the above problems, Hadoop came into the picture.