Speech recognition systems have used the concept of states as a way to decompose
words into sub-word units for decades. As the number of such states now reaches the
number of words used to train acoustic models, it is interesting to consider
approaches that relax the assumption that words are made of states. We present here
an alternative construction in which words are projected into a continuous embedding
space such that words that sound alike are nearby in the Euclidean sense. We show how
these embeddings still make it possible to score words that were not in the training dictionary.
Initial experiments using a lattice rescoring approach and model combination on a
large realistic dataset show improvements in word error rate.
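
As a rough illustration of scoring words by Euclidean proximity in an embedding space, the following minimal sketch ranks candidate words by their distance to the embedding of an acoustic segment. The two-dimensional vectors, the function name `rank_words`, and the way the out-of-dictionary embedding is obtained are all hypothetical and chosen only for the example; they are not the models or values used in this work.

```python
import math

# Hypothetical 2-D word embeddings; the values are made up for illustration.
# Words that sound alike ("right", "write") are placed close together.
word_embeddings = {
    "right": (1.0, 0.0),
    "write": (0.9, 0.1),
    "cat": (-0.8, 0.7),
}


def rank_words(acoustic_embedding, embeddings):
    """Return (word, distance) pairs sorted by increasing Euclidean distance."""
    scored = [
        (word, math.dist(acoustic_embedding, vector))
        for word, vector in embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical embedding of an acoustic segment.
acoustic = (1.0, 0.05)
ranking = rank_words(acoustic, word_embeddings)

# A word absent from the original dictionary can be scored as soon as it has
# an embedding (assumed here to be supplied by some separate mapping).
extended = dict(word_embeddings, rite=(0.5, 0.5))
extended_ranking = rank_words(acoustic, extended)
```

In this sketch a smaller distance means a better match, so candidates in a lattice could be reordered by such a distance, possibly in combination with other model scores.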