Pannala, Vishala; Yegnanarayana, B. (2021) A neural network approach for speech activity detection for Apollo corpus. Computer Speech & Language, 65, p. 101137. ISSN 0885-2308
Full text not available from this repository.
Official URL: http://doi.org/10.1016/j.csl.2020.101137
Related URL: http://dx.doi.org/10.1016/j.csl.2020.101137
Abstract
This paper describes a new method for speech activity detection (SAD) based on the recently proposed single frequency filtering (SFF) analysis of speech signals and a neural network model. The SFF analysis gives the instantaneous spectrum of the speech signal at each sampling instant. The frequency resolution of the spectrum is decided by the number of frequencies used in the SFF analysis, which in turn depends on the frequency spacing. Using a frequency spacing of 10 Hz and a sampling frequency of 8 kHz, a 401-dimensional spectrum, covering 0–4 kHz, is obtained at each sampling instant. This is used as a feature vector to train an artificial neural network (ANN) model to discriminate between (noisy) speech and nonspeech (mostly noise). The output of the trained ANN model for a given test utterance gives a speech/nonspeech decision at every sampling instant. Post-processing of these decisions is used for SAD. The system-generated SAD is evaluated on the Apollo corpus for the SAD task in terms of the detection cost function (DCF). The DCF values of the proposed system on the development and evaluation datasets are 3.1% and 4.6%, respectively, whereas the DCF values of the reported baseline system are 8.6% and 11.7%, respectively.
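The feature dimension quoted in the abstract follows from the frequency grid: 0 to 4000 Hz in 10 Hz steps gives 4000/10 + 1 = 401 analysis frequencies. The sketch below reproduces that count and illustrates how a per-sample spectrum might be computed. The abstract does not specify the filter, so the code assumes the formulation commonly described for SFF: difference the signal, shift each analysis frequency to half the sampling frequency, apply a single-pole filter with its pole at z = -r, and take the magnitude. The value r = 0.995, the differencing step, and the function names `sff_frequencies` and `sff_envelopes` are assumptions made for illustration and are not taken from this record.

```python
import cmath
import math


def sff_frequencies(fs_hz=8000, spacing_hz=10):
    """Analysis frequencies from 0 Hz to fs/2 inclusive, at the given spacing."""
    count = fs_hz // 2 // spacing_hz + 1
    return [k * spacing_hz for k in range(count)]


def sff_envelopes(signal, fs_hz, freqs_hz, r=0.995):
    """Return envelopes[k][n], the amplitude envelope at freqs_hz[k] and sample n.

    Assumed formulation (not given in the abstract): single-pole filter
    y[n] = -r * y[n-1] + x[n] * exp(j * (pi - w_k) * n), envelope = |y[n]|.
    """
    # First-order difference of the input signal (assumed pre-processing step).
    x = [signal[0]] + [signal[n] - signal[n - 1] for n in range(1, len(signal))]
    envelopes = []
    for f in freqs_hz:
        # Shift the analysis frequency f to fs/2, i.e. to angular frequency pi.
        shift = math.pi - 2.0 * math.pi * f / fs_hz
        y = 0j
        env = []
        for n, xn in enumerate(x):
            y = -r * y + xn * cmath.exp(1j * shift * n)
            env.append(abs(y))
        envelopes.append(env)
    return envelopes


# Grid from the abstract: 8 kHz sampling, 10 Hz spacing.
freqs = sff_frequencies(8000, 10)
print(len(freqs))  # 401

# Small demo on a coarse 500 Hz grid to keep the pure-Python loop fast.
fs = 8000
tone = [math.sin(2.0 * math.pi * 1000 * n / fs) for n in range(400)]
coarse = sff_frequencies(fs, spacing_hz=500)
env = sff_envelopes(tone, fs, coarse)
peak = max(range(len(coarse)), key=lambda k: env[k][-1])
print(coarse[peak])  # 1000
```

For a 1 kHz test tone, the largest envelope at the final sample falls at the 1000 Hz analysis frequency. Stacking the envelope values across all 401 frequencies at a single sample index would give one feature vector of the kind the abstract describes as input to the ANN.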
Item Type: | Article |
---|---|
Source: | Copyright of this article belongs to Elsevier Science. |
ID Code: | 135771 |
Deposited On: | 17 Aug 2023 05:08 |
Last Modified: | 17 Aug 2023 05:08 |