Mahout in Action

Robin Anil
Sean Owen
Ted Dunning
Ellen Friedman
Manning, Manning Publications Co. Sound View Ct. #3B Greenwich, CT 06830 (2010), pp. 350

Abstract

A computer system that learns and adapts as it collects data is an extraordinarily interesting and powerful concept. With new technologies to capture, store, and process information, machine learning has moved from the academic edges of computer science to the middle of the mainstream. Mahout, an open source machine learning library, captures the core algorithms of recommendation systems, classification, and clustering in ready-to-use, scalable libraries. With Mahout, you can immediately apply the machine learning techniques that drive Amazon, Netflix, and other data-centric businesses to your own projects.

Mahout in Action explores machine learning through Apache's scalable machine learning project, Mahout. Following real-world examples, it introduces practical use cases and then illustrates how Mahout can be applied to solve them. It places particular focus on scalability and on how to apply these techniques to large data sets using the Apache Hadoop framework.

In this book, you'll use Mahout to dive into three practical applications of machine learning:

Recommendations. Using the history and preferences of a group of users, you can make accurate recommendations for individual users. This is an extremely powerful principle, because accurate recommendations benefit both customers and vendors.

Clustering. Learn to automatically discover logical groupings within collections of data, such as documents or lists. This technique is especially useful in search and data mining applications.

Classification. Determining on the fly whether a thing fits a category, based on its attributes and previous history, can help instantaneously organize unstructured groups. For instance, you'll learn about filtering techniques that decide whether email messages should be considered "spam."

Mahout in Action is written primarily for developers who need to become better practitioners of machine learning techniques. It is also appropriate for researchers who understand the techniques and want to learn how to apply them effectively at scale. It assumes familiarity with Java and some basic grounding in machine learning techniques, but no previous exposure to Mahout is necessary.