Publication Data
Traffic Anomaly Detection Based on the IP Size Distribution
Abstract: In this paper we present a data-driven framework for
detecting machine-generated traffic based on the IP size, i.e., the number of users
sharing the same source IP. Our main observation is that diverse machine-generated
traffic attacks share a common characteristic: they induce an anomalous deviation from
the expected IP size distribution. We develop a principled framework that automatically
detects and classifies these deviations using statistical tests and ensemble learning.
We evaluate our approach on a massive dataset collected at Google for 90 consecutive
days. We argue that our approach combines desirable characteristics: it can accurately
detect fraudulent machine-generated traffic; it is based on a fundamental
characteristic of these attacks and is thus robust (e.g., to DHCP re-assignment) and
hard to evade; it has low complexity and is easy to parallelize, making it suitable for
large-scale detection; and finally, it does not entail profiling users, but leverages
only aggregate statistics of network traffic.
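
To make the core idea concrete, the following is a minimal sketch, not the method from the paper: it groups IP sizes into logarithmic buckets and scores how far an observed sample deviates from an expected bucket distribution using a Pearson chi-square statistic. The function names, the log2 bucketing, the choice of statistic, and the example distribution are all assumptions made for illustration; the abstract says only that statistical tests and ensemble learning are used.

```python
from collections import Counter


def bucket_index(ip_size, n_buckets):
    """Map an IP size (>= 1) to a log2 bucket: 1 -> 0, 2-3 -> 1, 4-7 -> 2, ...

    Sizes beyond the last bucket are clamped into it. The bucketing scheme
    is an assumption made for this sketch.
    """
    return min(ip_size.bit_length() - 1, n_buckets - 1)


def chi_square_deviation(observed_ip_sizes, expected_probs):
    """Pearson chi-square statistic between the observed IP sizes and a
    hypothetical expected bucket distribution (probabilities summing to 1).

    Larger values mean a larger deviation from the expected distribution.
    Buckets with zero expected probability are skipped in this sketch.
    """
    n = len(observed_ip_sizes)
    n_buckets = len(expected_probs)
    counts = Counter(bucket_index(s, n_buckets) for s in observed_ip_sizes)
    stat = 0.0
    for b, p in enumerate(expected_probs):
        expected = n * p
        if expected > 0:
            stat += (counts.get(b, 0) - expected) ** 2 / expected
    return stat


# Hypothetical expected distribution over three buckets (illustrative only).
expected = [0.5, 0.25, 0.25]

# Traffic whose IP sizes match the expected distribution exactly.
normal_traffic = [1, 1, 2, 4]

# Traffic concentrated on a single IP size bucket.
suspicious_traffic = [4, 4, 4, 4]

normal_score = chi_square_deviation(normal_traffic, expected)
suspicious_score = chi_square_deviation(suspicious_traffic, expected)
```

In this toy example the second sample scores higher than the first because all of its traffic falls into one bucket. Only bucket counts are used, in line with the abstract's point that the approach relies on aggregate statistics rather than per-user profiles. A real detector would need an expected distribution estimated from data and a calibrated decision threshold, neither of which is specified here.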
