Mahout in Action
Venue
Manning Publications Co., Sound View Ct. #3B, Greenwich, CT 06830 (2010), pp. 350
Publication Year
2010
Authors
Robin Anil, Sean Owen, Ted Dunning, Ellen Friedman
Abstract
A computer system that learns and adapts as it collects data is an extraordinarily
interesting and powerful concept. With new technologies to capture, store, and
process information, machine learning has moved from the academic edges of computer
science to the middle of the mainstream. Mahout, an open source machine learning
library, captures the core algorithms of recommendation systems, classification,
and clustering in ready-to-use, scalable libraries. With Mahout, you can
immediately apply the machine learning techniques that drive Amazon, Netflix, and
other data-centric businesses to your own projects. Mahout in Action explores
machine learning through Apache's scalable machine learning project, Mahout.
Following real-world examples, it introduces practical use cases and then
illustrates how Mahout can be applied to solve them. It places particular focus on
scalability and on how to apply these techniques to large data sets
using the Apache Hadoop framework.

In this book, you'll use Mahout to dive into three practical applications of
machine learning:

Recommendations. Using the history and preferences of a group of users, you can
make accurate recommendations for individual users. This is an extremely powerful
principle, because accurate recommendations benefit both customers and vendors.

Clustering. Learn to automatically discover logical groupings within data sets,
such as collections of documents or lists. This technique is especially useful for
search and data mining applications.

Classification. Determining on the fly whether a thing fits a category, based on
its attributes and previous history, can help instantly organize unstructured
groups. For instance, you'll learn about filtering techniques that decide whether
email messages should be considered "spam."

Mahout in Action is written primarily
for developers who need to become better practitioners of machine learning
techniques. It is also appropriate for researchers who understand the techniques
and want to understand how to apply them effectively at scale. It assumes
familiarity with Java, and some basic grounding in machine learning techniques, but
no previous exposure to Mahout is necessary.
