Google hostload prediction based on Bayesian model with optimized feature combination
Venue
Journal of Parallel and Distributed Computing (2014)
Publication Year
2014
Authors
Sheng Di, Derrick Kondo, Walfredo Cirne
Abstract
We design a novel prediction method based on a Bayes model to predict the load
fluctuation pattern over a long-term interval, in the context of Google data centers. We
exploit a set of features that capture the expectation, trend, stability and
patterns of recent host loads. We also investigate the correlations among these
features and explore the most effective combinations of features with various
training periods. All of the prediction methods are evaluated using a Google trace
with 10,000+ heterogeneous hosts. Experiments show that our Bayes method improves
the long-term load prediction accuracy by 5.6%–50% compared to other
state-of-the-art methods based on moving averages, auto-regression, and/or noise
filters. The mean squared error of pattern prediction with the Bayes method stays
approximately within [10^-8, 10^-5]. Through a load balancing scenario, we
confirm that the precision of pattern prediction in finding a set of idlest/busiest
hosts from among 10,000+ hosts can be improved by about 7% on average.
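
The abstract does not spell out the model or the exact feature definitions, so the following is only a rough illustrative sketch, not the authors' implementation. It assumes load samples normalized to [0, 1], three hypothetical features standing in for expectation, trend, and stability, and a plain naive Bayes classifier with Laplace smoothing that predicts the discretized mean load level of the next interval. All names (`extract_features`, `NaiveBayesLoadPredictor`, `levels`) are invented for this example.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def extract_features(window, levels=5):
    """Discretize three illustrative features of a recent load window.

    Assumes at least two samples, each in [0, 1]. The feature definitions
    are assumptions for this sketch, not the ones used in the paper.
    """
    def bucket(x):
        # Map a value in [0, 1] to one of `levels` equal-width bins.
        return min(int(x * levels), levels - 1)

    half = len(window) // 2
    expectation = mean(window)
    # Trend: sign of the change between first-half and second-half means.
    diff = mean(window[half:]) - mean(window[:half])
    trend = (diff > 0.02) - (diff < -0.02)  # -1, 0, or 1
    # Stability: population std dev, at most 0.5 for values in [0, 1].
    stability = min(pstdev(window) * 2, 1.0)
    return (bucket(expectation), trend, bucket(stability))


class NaiveBayesLoadPredictor:
    """Naive Bayes over discretized features (hypothetical sketch)."""

    def __init__(self, levels=5):
        self.levels = levels
        self.class_counts = Counter()
        # (feature index, class label) -> counts of observed feature values
        self.feature_counts = defaultdict(Counter)
        # Number of distinct values each feature can take (for smoothing).
        self.n_values = (levels, 3, levels)

    def _label(self, future_mean):
        return min(int(future_mean * self.levels), self.levels - 1)

    def fit(self, windows, future_means):
        for window, future in zip(windows, future_means):
            label = self._label(future)
            self.class_counts[label] += 1
            for i, value in enumerate(extract_features(window, self.levels)):
                self.feature_counts[(i, label)][value] += 1

    def predict(self, window):
        feats = extract_features(window, self.levels)
        total = sum(self.class_counts.values())
        best_label, best_score = None, -1.0
        for label, count in self.class_counts.items():
            score = count / total  # class prior
            for i, value in enumerate(feats):
                seen = self.feature_counts[(i, label)][value]
                # Laplace-smoothed likelihood of this feature value.
                score *= (seen + 1) / (count + self.n_values[i])
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

To use the sketch, call `fit` with a list of recent-load windows and the mean load observed in the interval following each one, then call `predict` on a new window to get a load level index between 0 and `levels - 1`. Comparing feature subsets, as the abstract describes, would amount to varying which entries of the feature tuple are included.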
