MapReduce: The programming model and practice
Abstract
In this tutorial, we first introduce the MapReduce programming model and illustrate its power with a couple of examples. We then discuss MapReduce and its relationship to MPI and DBMSs. Performance is a key goal of the Google MapReduce implementation, and we discuss a few of the techniques used to achieve it. Google MapReduce exploits data locality to reduce network overhead. We use several scheduling techniques to ensure that a job keeps making progress under variable system load. Finally, since failures are common in our data centers, we provide a number of failure avoidance and recovery features to ensure that jobs complete in such an environment.
