Mariusz Bojarski, Anna Choromanska, Krzysztof Choromanski, Bernard Firner, Larry Jackel, Urs Muller, Karol
This paper proposes a new method, which we call VisualBackProp, for visualizing
which sets of pixels of the input image contribute most to the predictions made by
a convolutional neural network (CNN). The method hinges on the intuition that,
when moving deeper into the network, the feature maps contain less and less
information that is irrelevant to the prediction. The technique we propose was
developed as a debugging tool for CNN-based systems for steering self-driving cars
and is therefore required to run in real time, i.e., it was designed to require
fewer computations than a single forward pass per image.
This makes the presented visualization method a valuable debugging tool that can
easily be used during training or inference. We furthermore justify our approach
with a theoretical argument, confirming that the proposed method identifies sets
of input pixels, rather than individual pixels, that collaboratively contribute
to the prediction. Our theoretical findings stand in agreement with the
experimental results: the empirical evaluation demonstrates the plausibility of
the proposed approach on road data.
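The core idea described above, that deeper feature maps retain mostly prediction-relevant activations, can be sketched in a few lines: channel-average the feature maps of each convolutional layer, then propagate the deepest averaged map back toward the input, combining it multiplicatively with the shallower averaged maps at each step. The following numpy sketch is illustrative only; it uses nearest-neighbor upsampling in place of the deconvolutions used in the paper, and the function name and shapes are assumptions.

```python
import numpy as np

def visual_backprop_sketch(feature_maps):
    """Illustrative VisualBackProp-style relevance mask.

    feature_maps: list of arrays shaped (channels, H, W), ordered from the
    shallowest to the deepest convolutional layer (post-ReLU activations).
    Returns a mask at the spatial resolution of the shallowest layer.
    """
    # Average each layer's feature maps over the channel dimension.
    averaged = [fm.mean(axis=0) for fm in feature_maps]

    # Start from the deepest averaged map and move back toward the input.
    mask = averaged[-1]
    for prev in reversed(averaged[:-1]):
        # Upsample the running mask to the previous layer's spatial size
        # (nearest-neighbor here; the paper uses deconvolution) and
        # combine it pointwise with that layer's averaged map.
        fh = prev.shape[0] // mask.shape[0]
        fw = prev.shape[1] // mask.shape[1]
        up = np.kron(mask, np.ones((fh, fw)))
        mask = up * prev[: up.shape[0], : up.shape[1]]

    # Normalize the final mask to [0, 1] for visualization.
    mask = mask - mask.min()
    if mask.max() > 0:
        mask = mask / mask.max()
    return mask
```

Because the procedure only averages, upsamples, and multiplies existing activations, its cost is a small fraction of a forward pass, which is what makes the method usable in real time during training or inference.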