Beyond Heuristics: Learning to Classify Vulnerabilities and Predict Exploits
Venue
Proceedings of the Sixteenth ACM Conference on Knowledge Discovery and Data Mining (KDD-2010), pp. 105-113
Publication Year
2010
Authors
Mehran Bozorgi, Lawrence Saul, Stefan Savage, Geoffrey M. Voelker
Abstract
The security demands on modern system administration are enormous and getting
worse. Chief among these demands, administrators must monitor the continual
disclosure of software vulnerabilities that have the potential to compromise their
systems in some way. Such vulnerabilities include buffer overflow errors,
improperly validated inputs, and other unanticipated attack modalities. In 2008,
over 7,400 new vulnerabilities were disclosed—well over 100 per week. While no
enterprise is affected by all of these disclosures, administrators commonly face
many outstanding vulnerabilities across the software systems they manage. A key
question for systems administrators is which vulnerabilities to prioritize. From
publicly available databases that document past vulnerabilities, we show how to
train classifiers that predict whether and how soon a vulnerability is likely to be
exploited. As input, our classifiers operate on high dimensional feature vectors
that we extract from the text fields, time stamps, cross-references, and other
entries in existing vulnerability disclosure reports. Compared to current
industry-standard heuristics based on expert knowledge and static formulas, our
classifiers predict much more accurately whether and how soon individual
vulnerabilities are likely to be exploited.
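
The pipeline outlined in the abstract (extract a high-dimensional feature vector from each disclosure report, then train a classifier on it) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the report field names (`description`, `references`), the reference labels (`SOURCE_A`), the four toy reports, and the choice of a plain perceptron, since the abstract does not name the learning algorithm. The sketch covers only the "whether exploited" question, not the "how soon" one.

```python
from collections import Counter


def extract_features(report):
    """Turn one report (a dict with hypothetical field names) into a sparse feature dict."""
    feats = Counter()
    # Bag-of-words counts over the free-text description.
    for token in report["description"].lower().split():
        feats["text:" + token] += 1.0
    # One binary indicator per cross-reference source.
    for ref in report["references"]:
        feats["ref:" + ref] = 1.0
    feats["bias"] = 1.0
    return feats


def score(weights, feats):
    """Linear score: dot product of the weight vector and a sparse feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def train_perceptron(examples, epochs=10):
    """Train on (features, label) pairs; label is +1 (exploited) or -1 (not exploited)."""
    weights = {}
    for _ in range(epochs):
        for feats, label in examples:
            # Update only on misclassified (or zero-margin) examples.
            if label * score(weights, feats) <= 0:
                for name, value in feats.items():
                    weights[name] = weights.get(name, 0.0) + label * value
    return weights


def predict(weights, report):
    """Return +1 if the report is predicted to be exploited, otherwise -1."""
    return 1 if score(weights, extract_features(report)) > 0 else -1


# Toy, made-up reports purely for illustration; these are not real disclosure data.
reports = [
    {"description": "remote buffer overflow in ftp server", "references": ["SOURCE_A"]},
    {"description": "local information disclosure in log viewer", "references": []},
    {"description": "remote code execution via crafted packet", "references": ["SOURCE_A"]},
    {"description": "minor denial of service in local tool", "references": []},
]
labels = [1, -1, 1, -1]

weights = train_perceptron([(extract_features(r), y) for r, y in zip(reports, labels)])
```

Calling `predict(weights, report)` on a new report dict with the same two fields returns +1 or -1. A real system along the lines of the abstract would also draw on time stamps and other report entries, and would be evaluated on held-out reports rather than on a handful of toy examples.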
