Combining evidence from source, suprasegmental and spectral features for a fixed-text speaker verification system

Yegnanarayana, B.; Prasanna, S. R. M.; Zachariah, J. M.; Gupta, C. S. (2005) Combining evidence from source, suprasegmental and spectral features for a fixed-text speaker verification system. IEEE Transactions on Speech and Audio Processing, 13 (4). pp. 575-582. ISSN 1063-6676

Full text not available from this repository.

Official URL: http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arn...

Related URL: http://dx.doi.org/10.1109/TSA.2005.848892

Abstract

This paper proposes a text-dependent (fixed-text) speaker verification system that uses different types of information to decide on the identity claim of a speaker. The baseline system uses the dynamic time warping (DTW) technique for matching. Detecting the end-points of an utterance is crucial for the performance of DTW-based template matching, and a method based on the vowel onset point (VOP) is proposed for locating them. In addition to spectral features, the proposed method uses suprasegmental and source features. Suprasegmental features such as pitch and duration are extracted using the warping path information from the DTW algorithm. Features of the excitation source, extracted using neural network models, are also used in the text-dependent speaker verification system. Although the suprasegmental and source features may not individually yield good performance, combining the evidence from these features seems to improve the performance of the system significantly. Neural network models are used to combine the evidence from the multiple sources of information.
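To make the DTW matching step more concrete, here is a minimal Python sketch. It assumes each utterance has already been end-pointed and converted to a sequence of spectral feature vectors (one list of floats per frame), plus a per-frame pitch (F0) value. It implements plain DTW with a symmetric step pattern, backtracks the warping path, and then reuses that path for two hypothetical suprasegmental measures: `aligned_pitch_distance` (mean F0 difference over aligned frame pairs) and `path_slope_deviation` (how far the path strays from uniform time scaling, a rough duration cue). The `fuse` function is a plain weighted average standing in for the neural-network combiner described in the abstract, and its weights are placeholders. None of these names, measures, or normalisations are taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]        # one spectral feature vector per analysis frame
Path = List[Tuple[int, int]]   # (test_frame, reference_frame) index pairs


def euclidean(a: Frame, b: Frame) -> float:
    """Local distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(test: Sequence[Frame], ref: Sequence[Frame]) -> Tuple[float, Path]:
    """Align test to reference; return length-normalised cost and warping path."""
    n, m = len(test), len(ref)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning test[:i] with ref[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(test[i - 1], ref[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the final cell to recover the warping path.
    path: Path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        # Tuple comparison breaks cost ties in favour of the diagonal step.
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    # Normalising by n + m is an assumption, not the paper's choice.
    return cost[n][m] / (n + m), path


def aligned_pitch_distance(test_f0: Sequence[float], ref_f0: Sequence[float], path: Path) -> float:
    """Hypothetical pitch cue: mean |F0 difference| over DTW-aligned frame pairs."""
    return sum(abs(test_f0[i] - ref_f0[j]) for i, j in path) / len(path)


def path_slope_deviation(path: Path) -> float:
    """Hypothetical duration cue: mean distance of the path from the straight diagonal."""
    n, m = path[-1][0] + 1, path[-1][1] + 1
    if n == 1 or m == 1:
        return 0.0
    return sum(abs(j / (m - 1) - i / (n - 1)) for i, j in path) / len(path)


def fuse(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of per-feature distances; a stand-in for a neural-network combiner."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
```

For example, aligning the five-frame test sequence `[[0.0], [0.0], [1.0], [2.0], [3.0]]` with the four-frame reference `[[0.0], [1.0], [2.0], [3.0]]` gives a spectral DTW distance of 0.0 along the path `[(0, 0), (1, 0), (2, 1), (3, 2), (4, 3)]`. The lengthened first frame is invisible to the spectral distance and shows up only in `path_slope_deviation` (about 0.1). This illustrates why warping-path information can carry speaker-specific timing evidence that spectral matching alone discards.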

Item Type: Article
Source: Copyright of this article belongs to IEEE.
ID Code: 57761
Deposited On: 29 Aug 2011 11:58
Last Modified: 29 Aug 2011 11:58
