Date of Publication: 7th June 2018
Abstract: This paper discusses a novel approach to detecting speech files using a frame classifier. When a speech file is run through a frame classifier, the subphones corresponding to a phone tend to be recognized in sequence. The duration of the subphone sequence corresponding to a phone also tends to differ between speech and noise. Distributions are used to capture the count statistics of the recognized subphone sequences, along with the phone duration. A probabilistic framework is formulated to score a wave file for the presence of speech. Relevant speech and noise datasets are used to benchmark the approach.
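The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the label format (a phone name paired with a subphone index), the function names `phone_segments` and `speech_score`, the log-likelihood-ratio form of the score, and the toy probability tables are all assumptions made for illustration only.

```python
import math
from itertools import groupby


def phone_segments(frame_labels):
    """Group per-frame (phone, subphone_index) labels into phone segments.

    Returns a list of (phone, in_order_count, duration_in_frames), where
    in_order_count is the number of subphone transitions idx -> idx + 1.
    """
    segments = []
    for phone, group in groupby(frame_labels, key=lambda label: label[0]):
        indices = [idx for _, idx in group]
        duration = len(indices)
        # Collapse repeated subphones, e.g. [0, 0, 1, 2] -> [0, 1, 2].
        collapsed = [idx for idx, _ in groupby(indices)]
        in_order = sum(1 for a, b in zip(collapsed, collapsed[1:]) if b == a + 1)
        segments.append((phone, in_order, duration))
    return segments


def log_prob(table, key, floor=1e-6):
    """Log-probability lookup with a small floor for unseen values."""
    return math.log(table.get(key, floor))


def speech_score(frame_labels, speech_model, noise_model):
    """Average per-segment log-likelihood ratio; positive favors speech."""
    segments = phone_segments(frame_labels)
    if not segments:
        return 0.0
    total = 0.0
    for _, in_order, duration in segments:
        for feature, value in (("count", in_order), ("duration", duration)):
            total += log_prob(speech_model[feature], value)
            total -= log_prob(noise_model[feature], value)
    return total / len(segments)


# Hypothetical toy distributions; real ones would be estimated from data.
speech_model = {
    "count": {0: 0.1, 1: 0.2, 2: 0.7},
    "duration": {1: 0.05, 2: 0.1, 3: 0.25, 4: 0.3, 5: 0.3},
}
noise_model = {
    "count": {0: 0.7, 1: 0.2, 2: 0.1},
    "duration": {1: 0.6, 2: 0.25, 3: 0.1, 4: 0.03, 5: 0.02},
}

# Speech-like output: subphones of each phone appear in order, several frames long.
speech_like = [("a", 0), ("a", 0), ("a", 1), ("a", 2),
               ("b", 0), ("b", 1), ("b", 1), ("b", 2)]
# Noise-like output: short, unordered subphone hits.
noise_like = [("a", 2), ("b", 0), ("c", 1), ("a", 0)]

print(speech_score(speech_like, speech_model, noise_model))
print(speech_score(noise_like, speech_model, noise_model))
```

With these toy tables the first score is positive and the second is negative. In practice the distributions would be estimated from labeled speech and noise recordings, and a decision threshold on the score would be tuned on held-out data.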