State-of-the-art automatic speech recognition (ASR) systems typically rely on
pre-processed features. This paper studies the time-frequency duality in ASR
feature extraction methods and proposes extending the standard acoustic model with
a complex-valued linear projection layer to learn and optimize features that
minimize standard cost functions such as cross entropy. The proposed Complex Linear
Projection (CLP) features outperform pre-processed Log Mel features.
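
As an illustration only, a complex-valued linear projection layer of this kind
might be sketched as follows. The abstract does not specify the layer's input or
the nonlinearity that follows it, so this sketch assumes the projection is
applied to the complex FFT of a single frame and is followed by a magnitude and
log compression. The names `clp_features`, `W_real`, `W_imag`, and `eps`, the
frame length of 512 samples, and the 128 output filters are all hypothetical
choices, and the random weights stand in for parameters that would be learned
jointly with the acoustic model.

```python
import numpy as np


def clp_features(frame, W_real, W_imag, eps=1e-6):
    """Sketch of a complex linear projection followed by log-magnitude.

    frame:  real-valued waveform frame, shape (n_samples,)
    W_real: real part of the projection matrix, shape (n_filters, n_bins)
    W_imag: imaginary part of the projection matrix, same shape as W_real
    eps:    small constant to keep the logarithm finite (assumed, not from the paper)
    """
    # Complex spectrum of the frame; n_bins = n_samples // 2 + 1.
    X = np.fft.rfft(frame)

    # Complex matrix-vector product Y = (W_real + j W_imag) X,
    # written out with real-valued arithmetic.
    y_real = W_real @ X.real - W_imag @ X.imag
    y_imag = W_real @ X.imag + W_imag @ X.real

    # Magnitude of each complex output, then log compression (assumed).
    magnitude = np.sqrt(y_real ** 2 + y_imag ** 2)
    return np.log(magnitude + eps)


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
n_samples, n_filters = 512, 128
n_bins = n_samples // 2 + 1

frame = rng.standard_normal(n_samples)
W_real = 0.01 * rng.standard_normal((n_filters, n_bins))
W_imag = 0.01 * rng.standard_normal((n_filters, n_bins))

feats = clp_features(frame, W_real, W_imag)
```

Writing the complex product with separate real and imaginary weight matrices
keeps every parameter real-valued, which is one way such a layer could be
trained with a standard cost function such as cross entropy.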