Projecting Disk Usage Based on Historical Trends in a Cloud Environment
Venue
ScienceCloud 2012 Proceedings of the 3rd International Workshop on Scientific Cloud Computing, ACM, pp. 63-70
Publication Year
2012
Authors
Murray Stokely, Amaan Mehrabian, Christoph Albrecht, Francois Labelle, Arif Merchant
Abstract
For three years, we collected detailed usage information for data stored in distributed filesystems in a large private cloud spanning dozens of clusters on multiple continents. Specifically, we measured the disk space usage, I/O rate, and age of stored data for thousands of different engineering users and teams. We find that although individual time series often show unstable usage trends, regional aggregations, user classification, and ensemble forecasting methods can be combined to predict future use more accurately for the majority of users.
We applied this methodology to the storage users in one geographic region and back-tested these techniques over the past three years to compare our forecasts against actual usage. We find that by separately classifying a small subset of users whose trend changes cannot be forecast because they stem from known product launches, we can generate three-month-out forecasts with mean absolute errors below roughly 12%. This compares favorably to the amount of allocated but unused quota that is generally wasted with manual operator-set quotas.
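
As a rough illustration of what back-testing an ensemble forecast involves, the sketch below replays a usage history, forecasts a fixed horizon ahead from each cut-off point, and reports the mean absolute percentage error against the values actually observed. The abstract does not say which models make up the ensemble or how they are combined, so the three simple forecasters, the median combination, and every function name here are assumptions chosen for illustration, not the authors' method.

```python
from statistics import mean, median

# Hypothetical ensemble members; the paper's actual models are not given here.

def linear_trend_forecast(history, horizon):
    """Least-squares line through the history, extrapolated `horizon` steps."""
    n = len(history)
    x_mean = (n - 1) / 2
    y_mean = mean(history)
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(history))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + horizon)

def last_value_forecast(history, horizon):
    """Assume usage stays flat at the most recent observation."""
    return history[-1]

def mean_growth_forecast(history, horizon):
    """Extend the last observation by the average step-to-step growth."""
    steps = [b - a for a, b in zip(history, history[1:])]
    return history[-1] + horizon * mean(steps)

def ensemble_forecast(history, horizon):
    """Combine the member forecasts with a median (an assumed choice)."""
    members = (linear_trend_forecast, last_value_forecast, mean_growth_forecast)
    return median(m(history, horizon) for m in members)

def backtest_mape(series, horizon, min_history):
    """Mean absolute percentage error over every usable cut-off point.

    `min_history` must be at least 2 so the trend models are defined,
    and the observed values must be non-zero.
    """
    errors = []
    for cut in range(min_history, len(series) - horizon + 1):
        history = series[:cut]
        actual = series[cut - 1 + horizon]
        predicted = ensemble_forecast(history, horizon)
        errors.append(abs(predicted - actual) / abs(actual))
    return mean(errors)
```

With monthly samples, `backtest_mape(usage, horizon=3, min_history=6)` would score three-month-out forecasts; the result is a fraction, so 0.12 corresponds to a 12% error.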
