Automatic Speaker Indexing in Corrupted Speech
H. Sayoud, S. Ouamour, M. Boudraa

DOI: 10.14209/its.2002.876
Event: 2002 International Telecommunications Symposium (ITS2002)
Abstract
"Speaker indexing can broadly be divided into two problems: locating the points of speaker change (segmentation) and identifying the speaker in each segment (labeling). An important obstacle in speaker tracking is the corruption of the speech signal during recording or during a telephone conversation. In this paper, we study the corruption of the speech signal by the noises most likely to occur during audiovisual recording, and the mixing of the speech signal with music, in order to test the robustness of our speaker tracking method. For this purpose, we choose the SOSM method (Second Order Statistical Measures), applied to segments of 2 seconds with an overlap of 50%. Speaker indexing becomes very difficult if the recordings are made in a noisy environment or if music is mixed with the speech. The method is evaluated on TIMIT: each discussion consists of sequences of speech signals uttered by 2 different speakers, concatenated into one speech file (the speakers are arbitrarily chosen from a population of 37 different speakers), so each speech file contains several speaker transitions. In a second step, we corrupt the database with three types of noise: office noise, human noise and background noise. Moreover, we insert music into the discussion signals, for example at the beginning, in the middle and at the end of the discussion. The results are discussed in detail for each case: clean environment, noisy environment, presence of music, etc. As an example, the tracking error rate varies from 5% in a clean environment to 34% in a noisy environment (+6 dB). Moreover, we observe that the error rate increases as the SNR decreases. Concerning music, we observe that speaker indexing is not perturbed by the concatenation of music sequences, which is of interest in the case of musical advertisements."
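
The two preprocessing steps named in the abstract, cutting the signal into 2-second segments with 50% overlap and corrupting speech with noise at a chosen SNR, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `segment_bounds` and `mix_at_snr` are hypothetical, the 16 kHz sampling rate in the usage note is an assumption, signals are plain lists of floats, and the SOSM distance itself is not shown because the abstract does not specify it.

```python
import math


def segment_bounds(n_samples, sample_rate, seg_seconds=2.0, overlap=0.5):
    """Return (start, end) sample indices of fixed-length overlapping segments.

    Defaults follow the abstract: 2-second segments, 50% overlap.
    A trailing remainder shorter than one segment is dropped (an assumption).
    """
    seg_len = int(round(seg_seconds * sample_rate))
    hop = int(round(seg_len * (1.0 - overlap)))
    if seg_len <= 0 or hop <= 0:
        raise ValueError("segment length and hop must be positive")
    return [(s, s + seg_len) for s in range(0, n_samples - seg_len + 1, hop)]


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so that the speech-to-noise power ratio
    of the mixture equals snr_db (in decibels)."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")
    # Solve p_speech / (gain^2 * p_noise) = 10^(snr_db / 10) for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]
```

For instance, `segment_bounds(5 * 16000, 16000)` splits a 5-second signal sampled at 16 kHz into four segments starting every second, and `mix_at_snr(speech, noise, 6.0)` would produce a mixture at +6 dB, the noisy condition quoted in the abstract.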
