Let's Parse to Prevent Pwnage
Abstract: Software that processes rich content suffers from endemic
security vulnerabilities. Frequently, these bugs are due to data confusion:
discrepancies in how content data is parsed, composed, and otherwise processed by
different applications, frameworks, and language runtimes. Data confusion often enables
code injection attacks, such as cross-site scripting or SQL injection, by leading to
incorrect assumptions about the encodings and checks applied to rich content of
uncertain provenance. Even for well-structured, value-only content, however, data
confusion can critically undermine security, as XML signature vulnerabilities
demonstrate [12]. This paper advocates the position that data confusion can be
effectively prevented through simple, parsing-based mechanisms that eliminate
ambiguities by fully resolving content data to normalized, clearly understood
forms. Using code injection on the Web as our motivation, we make the case that
automatic defense mechanisms should be integrated with programming languages,
application frameworks, and runtime libraries, and applied with little or no
developer intervention. We outline a scalable, sustainable approach for developing and
maintaining these mechanisms. The resulting tools can offer comprehensive protection
against data confusion, even when multiple types of rich content data are processed and
composed in complex ways.
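As a minimal illustration of the parsing-based idea, the Python sketch below resolves an untrusted URL to a parsed, re-serialized form before composing it into HTML. It is a hypothetical example, not the mechanism the paper proposes. The scheme allowlist and the function names `normalize_url` and `link_html` are assumptions made for this sketch. Only the standard-library calls `urllib.parse.urlsplit`, `urllib.parse.urlunsplit`, and `html.escape` are relied on.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Hypothetical policy for this sketch: the only URL schemes accepted.
ALLOWED_SCHEMES = {"http", "https"}


def normalize_url(untrusted: str) -> str:
    """Parse an untrusted URL string and re-serialize it from its components.

    Raises ValueError unless the input resolves to an absolute URL with an
    allowed scheme. Downstream code only ever sees the re-serialized value,
    never the raw input string.
    """
    parts = urlsplit(untrusted.strip())
    scheme = parts.scheme.lower()
    if scheme not in ALLOWED_SCHEMES or not parts.netloc:
        raise ValueError(f"rejected URL: {untrusted!r}")
    # Rebuild from parsed components; an empty path is normalized to "/".
    return urlunsplit((scheme, parts.netloc, parts.path or "/",
                       parts.query, parts.fragment))


def link_html(untrusted_url: str, label: str) -> str:
    """Compose an <a> element from a normalized URL and a plain-text label."""
    url = normalize_url(untrusted_url)
    # Encode each value for the HTML context it is placed in:
    # a double-quoted attribute for the URL, element text for the label.
    return f'<a href="{escape(url, quote=True)}">{escape(label)}</a>'
```

Because `link_html` never interpolates the raw input, a string such as `javascript:alert(1)` is rejected when it is parsed. It does not depend on a later check to recognize it, and any quotes or angle brackets that survive parsing are encoded for the context they land in.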
