Refinement: Locality Optimization
Master scheduling policy:
- Asks GFS for locations of replicas of input file blocks
- Input typically split into 64 MB pieces, one per map task (== GFS block size)
- Map tasks scheduled so that a replica of each task's GFS input block is on the same machine or the same rack
Effect: Thousands of machines read input at local disk speed
- Without this, rack switches limit read rate
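
The scheduling policy above can be sketched as a preference order: same machine first, then same rack, then any idle worker. This is only an illustrative sketch, not the actual master implementation. The function name `pick_worker`, the `rack_of` map, and the locality labels are hypothetical, and the replica locations are assumed to have already been obtained from GFS.

```python
def pick_worker(replica_hosts, idle_hosts, rack_of):
    """Choose an idle host for one map task (hypothetical helper).

    replica_hosts: hosts holding a replica of the task's input block
    idle_hosts:    hosts currently free to run a map task, in any order
    rack_of:       assumed mapping from host name to rack name
    Returns (host, locality), or (None, "none") if no host is idle.
    """
    replicas = set(replica_hosts)
    replica_racks = {rack_of[h] for h in replicas}

    # First choice: a worker that already stores the block on local disk.
    for host in idle_hosts:
        if host in replicas:
            return host, "same machine"

    # Second choice: a worker that shares a rack with a replica,
    # so the read does not cross a rack switch uplink.
    for host in idle_hosts:
        if rack_of[host] in replica_racks:
            return host, "same rack"

    # Fallback: any idle worker; the read goes over the wider network.
    if idle_hosts:
        return idle_hosts[0], "remote"
    return None, "none"


if __name__ == "__main__":
    rack_of = {"a1": "rackA", "a2": "rackA", "b1": "rackB", "c1": "rackC"}
    # The block has replicas on a1 and b1; a2 and c1 are idle.
    print(pick_worker(["a1", "b1"], ["c1", "a2"], rack_of))
```

In the usage example no idle host holds a replica, so the sketch falls through to the same-rack choice and picks `a2`, which shares a rack with the replica on `a1`.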