Map Reduce - Definition, Principal Features & Uses

Definition : A programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.

Map Reduce has two main operations: Map & Reduce

Map : The job is split into parts, and each part is mapped to a system that holds the relevant data, where it is processed into intermediate key & value pairs.

Reduce : The output of the map phase is shuffled and grouped by key. Each worker node then processes its groups in parallel and produces a collection of values, which are combined later to give the final output.
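
The two operations can be sketched in a few lines of single-process Python. This is only an illustration under stated assumptions: the function names map_phase, shuffle, reduce_phase and map_reduce are hypothetical, the job is a simple word count, and a real framework would run the map and reduce calls on separate worker nodes rather than in one loop.

```python
from collections import defaultdict

def map_phase(record):
    """Emit one (word, 1) pair for every word in an input record."""
    return [(word.lower(), 1) for word in record.split()]

def shuffle(pairs):
    """Group intermediate values by key, as happens between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values that share a key into a single result."""
    return key, sum(values)

def map_reduce(records):
    # Map: turn every record into intermediate (key, value) pairs.
    intermediate = []
    for record in records:
        intermediate.extend(map_phase(record))
    # Shuffle, then reduce each group independently.
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

if __name__ == "__main__":
    print(map_reduce(["big data", "big cluster"]))
```

Because each call to reduce_phase only sees the values for one key, the groups could be handed to different machines without any coordination between them.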

Principal features of Map Reduce

Scheduling : A Map Reduce job is divided into smaller map and reduce tasks, which are run in parallel on different computing resources.
Synchronization : The framework tracks all the tasks along with their timings and starts the reduce phase once the map computation has completed.
Co-location of Code/Data (Data Locality) : The effectiveness of data processing depends on where the required data is located. Best results are obtained when both the code and the data it processes reside on the same machine.
Handling of Errors/Faults : The framework needs a high level of fault tolerance and robustness in handling errors, because in a cluster of many machines the chance that some of them fail is high. Map Reduce must be able to recognize a fault and recover from it.
Scale out Architecture : Map Reduce systems are built so that they can accommodate more machines as and when required. This makes the Map Reduce programming model well suited to the higher computational demands of big data.
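
Three of these features (scheduling, synchronization and fault handling) can be illustrated with a small sketch. Everything here is an assumption made for illustration: threads stand in for the machines of a cluster, and the names run_with_retry, count_words and run_job, along with the chunk_size, workers and max_attempts parameters, are hypothetical and not part of any real Map Reduce framework.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def run_with_retry(task, chunk, max_attempts=3):
    """Fault handling: re-run a failed task instead of failing the whole job."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(chunk)
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt

def count_words(chunk):
    """Map task: emit (word, 1) for each word in one chunk of lines."""
    return [(word, 1) for line in chunk for word in line.split()]

def run_job(lines, map_task=count_words, chunk_size=2, workers=4):
    # Scheduling: split the input into chunks, one map task per chunk.
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_with_retry, map_task, chunk) for chunk in chunks]
        # Synchronization: result() blocks, so the reduction below starts
        # only after every map task has finished.
        mapped = [future.result() for future in futures]
    # Reduce: sum the values for each key.
    totals = defaultdict(int)
    for pairs in mapped:
        for key, value in pairs:
            totals[key] += value
    return dict(totals)
```

Calling run_job(["a b", "a", "b b", "c"]) schedules two map tasks and returns the combined counts. Scaling out in this sketch would only mean raising workers; in a real cluster it means adding machines.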

Uses of Map Reduce

Web Page Visits : A researcher wants to know how many times the website of a particular newspaper was visited.
The map task would read the web server logs and emit one pair per visit, giving a complete list which may look like:

(emailURL,1)
(newspaperURL,1)
(sportsnewsURL,1)
(newspaperURL,1)
(politicsnewsURL,1)
(newspaperURL,1)

The reduce function would sum the values for newspaperURL and output (newspaperURL,3)
Web Page Visitor Paths : Consider an advocacy group that wishes to know how visitors reach its website. The map function scans the page links, identifying the linking page (source) and the page reached (target).
The final output will be of the form (target,source)
Word Frequency : A researcher wishes to read articles about floods but does not want to read those articles in which flood is a minor topic.
The map function will count the number of times the specific word appears in each document and emit (document,frequency)
Word Count : A person wishes to know the number of times celebrities talk about the work of a bestselling writer in their blogs, posts & tweets.
The resulting list will be in the format (word,count)
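
The uses above can be tried out with one generic helper. This is a minimal sketch, assuming a single-process map_reduce function (a hypothetical name) that takes a mapper and a reducer; the link records searchURL, blogURL, advocacyURL and newsURL and the two sample documents are made-up inputs, while the page visit log reuses the URLs listed earlier.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Run mapper over every record, group values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}

# Web page visits: each log entry is assumed to be just the URL requested.
log = ["emailURL", "newspaperURL", "sportsnewsURL",
       "newspaperURL", "politicsnewsURL", "newspaperURL"]
visits = map_reduce(log, lambda url: [(url, 1)], sum)

# Web page visitor paths: each record is an assumed (source, target) link,
# and the mapper emits it as (target, source).
links = [("searchURL", "advocacyURL"),
         ("blogURL", "advocacyURL"),
         ("searchURL", "newsURL")]
sources = map_reduce(links, lambda link: [(link[1], link[0])], sorted)

# Word frequency: count how often "flood" appears in each document.
docs = {"doc1": "flood warning as flood waters rise",
        "doc2": "markets fall after flood"}
flood = map_reduce(docs.items(),
                   lambda item: [(item[0], item[1].split().count("flood"))],
                   sum)

print(visits["newspaperURL"])   # 3
print(sources["advocacyURL"])   # ['blogURL', 'searchURL']
print(flood)                    # {'doc1': 2, 'doc2': 1}
```

Only the mapper and reducer change between the three jobs; the grouping step in the middle stays the same, which is the part a framework takes care of.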

While this method is not guaranteed to be fast, the main reason for using it is to make efficient use of all the system resources and to share the load. Map Reduce does not work efficiently on small jobs where results can be obtained quickly, because breaking the query into parts and processing them separately consumes more time & system resources than it saves. It is efficient where there is a large amount of data & a huge amount of computation is needed to get the results. Because the work is distributed, the dependency on any one machine decreases, and adding or removing servers & data storage becomes easy, which makes this method more durable & reliable for large systems.
Map Reduce can distribute work across thousands of systems, make efficient use of their resources and generate the required results from petabytes of data.

Read my articles on Technology for more.

Chetan Jindal
