Hinton, Geoffrey, et al. "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups." IEEE Signal Processing Magazine 29.6 (2012): 82–97.
DNNs achieved great success in speech recognition. The previous state-of-the-art acoustic modeling method was the GMM-HMM, which was already deployed in real-world systems. However, GMMs are inefficient at modeling data that lie on or near a nonlinear manifold. By replacing the GMM with a DNN, the DNN-HMM outperforms the GMM-HMM by a large margin.
The DNN's input is MFCC features, and its output is a distribution over HMM acoustic states. It is pretrained as a DBN (deep belief network), which is built by stacking multiple RBMs (restricted Boltzmann machines).
The RBMs are first trained separately, layer by layer; they are then stacked together with a softmax layer added on top, yielding a pretrained DNN that is subsequently fine-tuned with backpropagation.
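The greedy layer-wise procedure can be sketched as follows: each RBM is trained with contrastive divergence (CD-1), and its hidden activations become the training data for the next RBM. This is a minimal NumPy sketch; the layer sizes, learning rate, and epoch count are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.1):
    """Train one RBM with 1-step contrastive divergence (CD-1)."""
    n_visible = data.shape[1]
    W = rng.normal(0, 0.01, (n_visible, n_hidden))
    b_v = np.zeros(n_visible)   # visible biases
    b_h = np.zeros(n_hidden)    # hidden biases
    for _ in range(epochs):
        # Positive phase: hidden probabilities and samples given the data.
        h_prob = sigmoid(data @ W + b_h)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: reconstruct visible units, then hidden probs again.
        v_prob = sigmoid(h_sample @ W.T + b_v)
        h_prob_neg = sigmoid(v_prob @ W + b_h)
        # CD-1 update: positive statistics minus negative statistics.
        W += lr * (data.T @ h_prob - v_prob.T @ h_prob_neg) / len(data)
        b_v += lr * (data - v_prob).mean(axis=0)
        b_h += lr * (h_prob - h_prob_neg).mean(axis=0)
    return W, b_h

def pretrain_stack(data, layer_sizes):
    """Greedy layer-wise pretraining: each RBM's hidden activations
    serve as the visible data for the next RBM in the stack."""
    weights = []
    x = data
    for n_hidden in layer_sizes:
        W, b_h = train_rbm(x, n_hidden)
        weights.append((W, b_h))
        x = sigmoid(x @ W + b_h)  # deterministic up-pass to next layer
    return weights

# Toy run: 100 binarized "frames" of dimension 39 (MFCC-like, hypothetical).
data = (rng.random((100, 39)) < 0.5).astype(float)
stack = pretrain_stack(data, [64, 64])
print([W.shape for W, _ in stack])  # [(39, 64), (64, 64)]
```

The resulting weight matrices initialize the DNN's hidden layers; a randomly initialized softmax layer is placed on top before discriminative fine-tuning.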
The experiments are based on the TIMIT dataset, whose test set contains 192 sentences.
On this benchmark, the GMM-HMM baseline gets a 25.6% error rate, while the best DNN-based model gets 20.0%.