VTLN Through Frequency Warping Based on Pitch
Carla Lopes, Fernando Perdigão
DOI: 10.14209/its.2002.844
Event: 2002 International Telecommunications Symposium (ITS2002)
This article describes a vocal tract length normalization (VTLN) procedure based on pitch-driven frequency warping. The procedure aims to reduce the inter-speaker variability present in speech signals. A method for compensating coarticulation phenomena is also described, which reduces the speech signal variability due to phonetic context. This method operates at the phonetic level, since it models coarticulation events, and at the linguistic level, since these units lead to alternative pronunciation rules. Inter-speaker variability is removed by a traditional speaker normalization method, which expands or compresses the bandwidths of the Mel filter bank in order to normalize the vocal tract length (VTL) of each speaker to a standard one. In previous works the VTL estimate is based on formant information, but their authors pointed to the difficulty of formant frequency estimation as an obstacle to better results. The method presented in this paper overcomes this problem by estimating the warping factor (WF) from pitch. The recognition results presented in this paper, for a telephone digit recognition task, show that this procedure leads to improvements similar to those obtained with traditional methods based on formant information.
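To make the idea of expanding or compressing the Mel filter bank concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give the pitch-to-WF mapping, so `warp_factor_from_pitch`, its reference pitch of 150 Hz, the slope `k` and the clipping range are all hypothetical placeholders. The 23 filters and the 4 kHz upper limit (telephone speech sampled at 8 kHz) are likewise assumptions. The Hz-to-mel conversion is the common 2595 log10(1 + f/700) formula.

```python
import math


def hz_to_mel(f_hz):
    # Common Hz -> mel conversion.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_edges(n_filters, f_low_hz, f_high_hz):
    """Edge frequencies (Hz) of n_filters triangular filters, equally spaced in mel.

    Filter i spans edges[i] .. edges[i + 2] and peaks at edges[i + 1].
    """
    m_low, m_high = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + i * step) for i in range(n_filters + 2)]


def warp_factor_from_pitch(f0_hz, ref_f0_hz=150.0, k=0.1, lo=0.88, hi=1.12):
    """HYPOTHETICAL pitch-to-WF mapping; the paper's actual rule is not in the abstract.

    A pitch above the reference suggests a shorter vocal tract, so the factor
    is above 1 and the filter bank is expanded; a lower pitch compresses it.
    """
    alpha = 1.0 + k * math.log2(f0_hz / ref_f0_hz)
    return min(max(alpha, lo), hi)


def warp_edges(edges_hz, alpha, nyquist_hz):
    # Linear warping of the filter edges, clipped so no edge exceeds Nyquist.
    return [min(f * alpha, nyquist_hz) for f in edges_hz]


# Example: assumed 23-filter bank for 8 kHz telephone speech, speaker with 220 Hz pitch.
edges = mel_filter_edges(23, 0.0, 4000.0)
alpha = warp_factor_from_pitch(220.0)
warped = warp_edges(edges, alpha, 4000.0)
```

Scaling the filter edges rather than resampling the spectrum keeps the normalization inside the feature extraction front end. The simple clipping at Nyquist used here collapses the top filters when the factor is above 1; practical systems usually use a piecewise-linear warp near the band edge instead.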