Paper Note: Deep neural networks for acoustic modeling in speech recognition

Hinton, Geoffrey, et al. "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups." IEEE Signal Processing Magazine 29.6 (2012): 82-97.

DNNs have achieved great success in speech recognition. The previous state-of-the-art acoustic model was the GMM-HMM, which was already deployed in real-world systems. However, GMMs are poor at modeling data that lies on or near a nonlinear manifold. Replacing the GMM with a DNN gives a DNN-HMM that outperforms the GMM-HMM by a large margin.

 

DBN-DNN

The DNN takes MFCC features as input and outputs a distribution over acoustic (HMM) states. It is pretrained as a DBN (deep belief network), which is a stack of RBMs (restricted Boltzmann machines).
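The input/output mapping can be sketched as a plain feed-forward pass. This is a minimal illustration, not the paper's implementation: the layer sizes, the 11-frame context window, the 183 output states, and the random weights are all assumptions chosen only to make the shapes concrete.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dnn_forward(x, hidden_layers, output_layer):
    """Map feature vectors (batch, n_in) to state posteriors (batch, n_states)."""
    h = x
    for W, b in hidden_layers:
        h = sigmoid(h @ W + b)
    W_out, b_out = output_layer
    return softmax(h @ W_out + b_out)

# Illustrative sizes only (assumptions): 39-dim MFCC frames, 11-frame window, 183 states.
rng = np.random.default_rng(0)
n_in, n_hidden, n_states = 39 * 11, 256, 183
sizes = [n_in, n_hidden, n_hidden]
hidden_layers = [
    (0.01 * rng.standard_normal((a, b)), np.zeros(b))
    for a, b in zip(sizes[:-1], sizes[1:])
]
output_layer = (0.01 * rng.standard_normal((n_hidden, n_states)), np.zeros(n_states))

frames = rng.standard_normal((5, n_in))  # stand-in for real MFCC windows
posteriors = dnn_forward(frames, hidden_layers, output_layer)
print(posteriors.shape)  # (5, 183)
```

Each row of `posteriors` sums to 1, so it can be read as a probability for every acoustic state given one input window.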

The RBMs are first trained one at a time, each on the hidden activations of the one below it. They are then stacked, and a softmax layer is added on top, which gives the pretrained DNN.
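The pretraining recipe can be sketched roughly as follows. This is a simplified toy version under stated assumptions: it uses binary (Bernoulli-Bernoulli) RBMs trained with one-step contrastive divergence on random binary data, whereas real-valued features such as MFCCs would need a Gaussian visible layer. The names `RBM`, `pretrain_stack`, and `init_dnn_from_rbms` are hypothetical helpers, not from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class RBM:
    """Bernoulli-Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_vis = np.zeros(n_visible)
        self.b_hid = np.zeros(n_hidden)

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_hid)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_vis)

    def cd1_step(self, v0, lr=0.1):
        # Positive phase: hidden probabilities given the data.
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one Gibbs step to get a reconstruction.
        v1 = self.visible_probs(h0_sample)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_vis += lr * (v0 - v1).mean(axis=0)
        self.b_hid += lr * (h0 - h1).mean(axis=0)

def pretrain_stack(data, hidden_sizes, epochs=5, seed=0):
    """Greedily train one RBM per layer, feeding each the activations of the previous one."""
    rng = np.random.default_rng(seed)
    rbms, layer_input = [], data
    for n_hidden in hidden_sizes:
        rbm = RBM(layer_input.shape[1], n_hidden, rng)
        for _ in range(epochs):
            rbm.cd1_step(layer_input)
        rbms.append(rbm)
        layer_input = rbm.hidden_probs(layer_input)
    return rbms

def init_dnn_from_rbms(rbms, n_states, seed=0):
    """Reuse RBM weights as hidden layers and add a randomly initialized softmax layer."""
    rng = np.random.default_rng(seed)
    layers = [(r.W.copy(), r.b_hid.copy()) for r in rbms]
    softmax_W = 0.01 * rng.standard_normal((rbms[-1].W.shape[1], n_states))
    layers.append((softmax_W, np.zeros(n_states)))
    return layers

# Toy binary data (assumption), just to exercise the code path.
toy_data = (np.random.default_rng(1).random((64, 20)) > 0.5).astype(float)
rbms = pretrain_stack(toy_data, hidden_sizes=[16, 8])
dnn_layers = init_dnn_from_rbms(rbms, n_states=5)
print([W.shape for W, _ in dnn_layers])  # [(20, 16), (16, 8), (8, 5)]
```

The resulting list of `(weights, bias)` pairs is only an initialization; the network would still need supervised fine-tuning with backpropagation on labeled frames.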

[Image: Screenshot 2016-05-26 11.32.02 PM.png]

Evaluation

The experiments use the TIMIT dataset, whose core test set contains 192 sentences.

[Image: Screenshot 2016-05-26 11.34.58 PM.png]

In the results above, the GMM-HMM baseline has a phone error rate of 25.6%, while the best DNN-based model gets 20.0% on the same benchmark.
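To put the two numbers in perspective, the gap can be expressed as an absolute and a relative error reduction. The snippet below is plain arithmetic on the two figures quoted above; the variable names are illustrative.

```python
baseline_error = 25.6  # GMM-HMM error rate, in percent
best_dnn_error = 20.0  # best DNN-based model, in percent

absolute_reduction = baseline_error - best_dnn_error
relative_reduction = absolute_reduction / baseline_error

print(f"absolute: {absolute_reduction:.1f} points")  # absolute: 5.6 points
print(f"relative: {relative_reduction:.1%}")         # relative: 21.9%
```

So the best DNN-based model removes roughly a fifth of the baseline's errors.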
