We present the Latent Sequence Decompositions (LSD) framework. LSD decomposes
sequences into variable-length output units as a function of both the input
sequence and the output sequence. We present a training algorithm that samples
valid extensions and an approximate decoding algorithm. We experiment with the Wall
Street Journal speech recognition task. Our LSD model achieves 12.9% WER compared
to a character baseline of 14.8% WER. When combined with a convolutional network on
the encoder, we achieve 9.6% WER.
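To make "decomposition" and "valid extensions" concrete, here is a minimal sketch that uses a toy, hypothetical word-piece vocabulary. A valid extension at some position is any vocabulary token that matches the target string starting there. The set of all full decompositions is the latent space that a single target sequence can map to. The helper names (`valid_extensions`, `all_decompositions`, `sample_decomposition`) are illustrative and not from the paper. Choosing uniformly among the valid extensions is a simplifying assumption, not the sampling distribution the training algorithm actually uses. The sketch also assumes the vocabulary contains every single character, so that sampling never dead-ends.

```python
import random
from typing import List, Sequence


def valid_extensions(target: str, pos: int, vocab: Sequence[str]) -> List[str]:
    """Return the non-empty vocab tokens that match `target` starting at `pos`."""
    return [tok for tok in vocab if tok and target.startswith(tok, pos)]


def all_decompositions(target: str, vocab: Sequence[str]) -> List[List[str]]:
    """Enumerate every way to spell `target` as a sequence of vocab tokens."""
    if not target:
        return [[]]
    results: List[List[str]] = []
    for tok in valid_extensions(target, 0, vocab):
        # Recurse on the remainder after consuming this token.
        for rest in all_decompositions(target[len(tok):], vocab):
            results.append([tok] + rest)
    return results


def sample_decomposition(target: str, vocab: Sequence[str],
                         rng: random.Random) -> List[str]:
    """Build one decomposition left to right by picking a valid extension.

    Assumption: a uniform choice stands in for model-guided sampling.
    """
    pieces: List[str] = []
    pos = 0
    while pos < len(target):
        options = valid_extensions(target, pos, vocab)
        if not options:
            # Only possible if the vocab lacks some single character.
            raise ValueError(f"no valid extension at position {pos}")
        tok = rng.choice(options)
        pieces.append(tok)
        pos += len(tok)
    return pieces


# Hypothetical toy vocabulary: all characters of "cat" plus a few multi-character pieces.
VOCAB = ["c", "a", "t", "ca", "at", "cat"]

decomps = all_decompositions("cat", VOCAB)
print(decomps)  # [['c', 'a', 't'], ['c', 'at'], ['ca', 't'], ['cat']]
print(sample_decomposition("cat", VOCAB, random.Random(0)))
```

Even this three-character word has four decompositions. The number grows quickly with length, which is why a training procedure that samples valid extensions, rather than enumerating them all, is attractive.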