We propose a simple neural architecture for natural language inference. Our
approach uses attention to decompose the problem into subproblems that can be
solved separately, thus making it trivially parallelizable. On the Stanford Natural
Language Inference (SNLI) dataset, we obtain state-of-the-art results with almost
an order of magnitude fewer parameters than previous work and without relying on
any word-order information. Adding intra-sentence attention, which takes a minimal
amount of word order into account, yields further improvements.
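To make the decomposition concrete, the following is a minimal, conceptual NumPy sketch, not the paper's implementation: the functions F, G, and H are toy stand-ins for the learned feed-forward networks, and random vectors replace word embeddings. It illustrates how each token is compared only against a soft alignment of the other sentence, so the per-token comparisons are independent of one another and trivially parallelizable, and the final aggregation is an order-independent sum.

```python
# Minimal sketch of an attend / compare / aggregate decomposition.
# F, G, H are toy stand-ins (assumptions) for the learned feed-forward networks.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def F(x):  # stand-in for the "attend" network
    return np.tanh(x)

def G(x):  # stand-in for the "compare" network
    return np.tanh(x)

def H(x):  # stand-in for the final classifier
    return x

def decomposable_attention(a, b):
    """a: (len_a, d) premise embeddings; b: (len_b, d) hypothesis embeddings."""
    # Attend: unnormalized alignment scores between every pair of tokens.
    e = F(a) @ F(b).T                      # (len_a, len_b)
    beta = softmax(e, axis=1) @ b          # soft alignment of b to each token of a
    alpha = softmax(e, axis=0).T @ a       # soft alignment of a to each token of b
    # Compare: each token is processed with its aligned counterpart,
    # independently of every other token (hence trivially parallelizable).
    v1 = G(np.concatenate([a, beta], axis=1))
    v2 = G(np.concatenate([b, alpha], axis=1))
    # Aggregate: order-independent sums are fed to the final classifier.
    return H(np.concatenate([v1.sum(axis=0), v2.sum(axis=0)]))

# Toy usage: random "embeddings" for a 3-token premise and a 4-token hypothesis.
rng = np.random.default_rng(0)
print(decomposable_attention(rng.normal(size=(3, 8)), rng.normal(size=(4, 8))).shape)
```

Because nothing in the sketch depends on token positions, it also illustrates why the base model uses no word-order information at all.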