Speaker change detection in casual conversations using excitation source features

Dhananjaya, N.; Yegnanarayana, B. (2008) Speaker change detection in casual conversations using excitation source features. Speech Communication, 50 (2), pp. 153-161. ISSN 0167-6393

Official URL: http://www.sciencedirect.com/science/article/pii/S...

Related URL: http://dx.doi.org/10.1016/j.specom.2007.08.003

Abstract

In this paper we propose a method for speaker change detection using features of the excitation source of the speech production mechanism. The method uses neural network models to capture the speaker-specific information from a signal that predominantly represents the excitation source. The focus of this paper is on speaker change detection in casual telephone conversations, in which short (<5 s) speaker turns are common. When only a limited amount of speech data is available, excitation source features are a better choice for modeling a speaker than vocal tract system features. The linear prediction residual is used as an estimate of the excitation source signal. Autoassociative neural network models are proposed to capture the higher order relations among the samples of the residual signal. Speaker models are generated for every one second of voiced speech from the first few seconds of the conversation. These models are used to detect the speaker change points. The performance of the proposed method is evaluated on a database containing several two-speaker conversations.
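
The abstract's use of the linear prediction (LP) residual as an estimate of the excitation source can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes NumPy, the autocorrelation method solved by the Levinson-Durbin recursion, and a hypothetical default LP order of 10. The function names lp_coefficients and lp_residual are illustrative only. The residual is obtained by passing a frame through the inverse filter A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.

```python
import numpy as np


def lp_coefficients(frame, order):
    """Estimate inverse-filter coefficients [1, a_1, ..., a_p] for one frame.

    Uses the autocorrelation method with the Levinson-Durbin recursion.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: stop early
        # Reflection coefficient for model order i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lp_residual(frame, order=10):
    """Inverse-filter the frame with A(z) to obtain the LP residual.

    The order of 10 is an assumed default, not a value taken from the paper.
    """
    frame = np.asarray(frame, dtype=float)
    a = lp_coefficients(frame, order)
    # e[n] = x[n] + sum_j a_j * x[n - j], truncated to the frame length.
    return np.convolve(frame, a)[:len(frame)]
```

In a sketch like this, the residual of each voiced frame would then serve as the input from which a speaker model is trained; the autoassociative neural network stage described in the abstract is not shown here.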

Item Type: Article
Source: Copyright of this article belongs to Elsevier Science.
Keywords: Change Detection; Multispeaker Conversation; Autoassociative Neural Network (AANN) Models; Excitation Source Features; Linear Prediction (LP) Residual
ID Code: 57730
Deposited On: 29 Aug 2011 12:08
Last Modified: 18 May 2016 09:02
