Refinement: Redundant Execution
Slow workers ("stragglers") significantly lengthen completion time
- Other jobs consuming resources on machine
- Bad disks with soft errors transfer data very slowly
- Weird things: processor caches disabled (!!)
Solution: Near end of phase, spawn backup copies of the tasks still in progress
- Whichever copy finishes first "wins"; the task is then marked complete
Effect: Dramatically shortens job completion time (the MapReduce paper's sort benchmark ran 44% longer with backup tasks disabled)
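
Sketch: the backup-copy idea can be illustrated on a single machine with a thread pool. This is only a minimal illustration, not how any real scheduler implements it. The names run_with_backup, FlakyWorker, and backup_after are hypothetical, and the trigger here is a simple per-task timeout rather than "near end of phase". It also assumes the task is deterministic and safe to run twice.

```python
import threading
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_with_backup(pool, task, arg, backup_after):
    """Run task(arg); if it has not finished after `backup_after` seconds,
    launch a duplicate copy and return the result of whichever finishes first.

    Returns (result, number_of_copies_launched).
    Assumption: `task` is deterministic and has no harmful side effects,
    so running it twice is safe.
    """
    copies = [pool.submit(task, arg)]
    done, _ = wait(copies, timeout=backup_after)
    if not done:
        # The primary looks like a straggler: start a redundant copy.
        copies.append(pool.submit(task, arg))
        done, _ = wait(copies, return_when=FIRST_COMPLETED)
    # The first finisher "wins"; the loser's result is simply ignored.
    return next(iter(done)).result(), len(copies)


class FlakyWorker:
    """Hypothetical task: the first call is slow (simulating a straggler),
    later calls return immediately."""

    def __init__(self, slow_seconds):
        self.slow_seconds = slow_seconds
        self._calls = 0
        self._lock = threading.Lock()

    def __call__(self, x):
        with self._lock:
            self._calls += 1
            is_first_call = self._calls == 1
        if is_first_call:
            time.sleep(self.slow_seconds)
        return x * x


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        result, copies = run_with_backup(pool, FlakyWorker(0.5), 7, backup_after=0.05)
        print(result, copies)
```

Note that the losing copy keeps running until it finishes on its own. This sketch just ignores its result, so the redundant work is the price paid for lower latency.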