Cloak of Visibility: Detecting When Machines Browse a Different Web
The contentious battle between web services and miscreants involved in blackhat search engine optimization and malicious advertisements has driven the underground to develop increasingly sophisticated techniques that hide the true nature of malicious sites. These web cloaking techniques hinder the effectiveness of security crawlers and potentially expose Internet users to harmful content. In this work, we study the spectrum of blackhat cloaking techniques that target browser, network, or contextual cues to detect organic visitors. As a starting point, we investigate the capabilities of ten prominent cloaking services marketed within the underground. This includes a first look at multiple IP blacklists that contain over 50 million addresses tied to the top five search engines and tens of anti-virus and security crawlers. We use our findings to develop an anti-cloaking system that detects split-view content returned to two or more distinct browsing profiles with an accuracy of 95.5% and a false positive rate of 0.9% when tested on a labeled dataset of 94,946 URLs. We apply our system to an unlabeled set of 135,577 search and advertisement URLs keyed on high-risk terms (e.g., luxury products, weight loss supplements) to characterize the prevalence of threats in the wild and expose variations in cloaking techniques across traffic sources. Our study provides the first broad perspective of cloaking as it affects Google Search and Google Ads and underscores the minimum capabilities necessary of security crawlers to bypass the state of the art in mobile, rDNS, and IP cloaking.