Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task

Abstract