2002 International Telecommunications Symposium

GMM versus AR-Vector Models for Text Independent Speaker Verification
Charles B. de Lima, Abraham Alcaim, José A. Apolinário Jr.

DOI: 10.14209/its.2002.817
Keywords:
Abstract
"This paper presents a performance evaluation of two classification systems for text-independent speaker verification: the Gaussian Mixture Model (GMM) and the AR-Vector Model. For the GMM, 32, 16, 8 and Gaussians were evaluated. For the AR-Vector, an order-2 model with the Itakura symmetric distance was used. Both classification systems made no errors when training and testing times were at least 60 s and 30 s, respectively. Using s as the test time, the errors of the most accurate classification systems were between 0.4 and 3.3%. With a 3 s test, the GMM errors were around 6 to 7%, whereas those of the AR-Vector were above 10%. However, the best results with 10 s of testing and training time were obtained with the AR-Vector, with errors around 3.2%."
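As an illustration of the GMM side of the comparison, verification is typically decided by scoring the test frames against the claimed speaker's GMM. The sketch below assumes diagonal covariances and normalizes the score with a background GMM (a log-likelihood ratio against a threshold); the background-model normalization, the threshold, and all function names are assumptions, not details taken from the abstract, and the AR-Vector/Itakura classifier is not shown.

```python
import math
from typing import List, Sequence, Tuple

# A mixture component is (weight, mean vector, diagonal variance vector).
Component = Tuple[float, List[float], List[float]]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_frame_loglik(x: Sequence[float], gmm: List[Component]) -> float:
    """log p(x | GMM) = log sum_k w_k N(x; mu_k, Sigma_k), computed with log-sum-exp."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def average_loglik(frames: List[Sequence[float]], gmm: List[Component]) -> float:
    """Mean per-frame log-likelihood of an utterance (frames treated as independent)."""
    return sum(gmm_frame_loglik(f, gmm) for f in frames) / len(frames)


def verify(frames: List[Sequence[float]], speaker_gmm: List[Component],
           background_gmm: List[Component], threshold: float = 0.0) -> Tuple[bool, float]:
    """Accept the identity claim when the average log-likelihood ratio exceeds the threshold."""
    llr = average_loglik(frames, speaker_gmm) - average_loglik(frames, background_gmm)
    return llr > threshold, llr
```

Averaging per frame keeps scores comparable across test durations, which matters when comparing the 3 s, 10 s and 30 s conditions reported above.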

Comparison of a Low Bit Rate Speech Compression Algorithm with the MELP Standard for Fricatives and Stops
Rodrigo C. de Lamare, Abraham Alcaim

DOI: 10.14209/its.2002.822
Keywords:
Abstract
"In this paper we examine the encoding of fricatives and stops by a very low bit rate speech compression algorithm based on a mixed multiband excitation system. The algorithm incorporates several improvements over previously reported coders, one of which is a specific modelling and synthesis strategy for fricatives and stops at 400 b/s. The codec, which operates at an average rate of 1.2 kb/s, is compared with the North American standard 2.4 kb/s MELP coder on sentences with a high concentration of fricative and stop sounds. Subjective listening tests indicate that the two codecs are comparable for both types of sounds, even though the sound-specific scheme operates at 400 b/s whereas MELP operates at 2.4 kb/s."
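To make the rate gap concrete, a bit rate can be converted into a per-frame bit budget. MELP codes 22.5 ms frames (54 bits each at 2.4 kb/s); applying the same 22.5 ms frame to the 1.2 kb/s codec and its 400 b/s fricative/stop mode is an assumption made only for this comparison, since the abstract does not state their frame length.

```python
def bits_per_frame(rate_bps: float, frame_ms: float) -> float:
    """Bits available in one frame at a given bit rate (rate times frame duration)."""
    return rate_bps * frame_ms / 1000.0


MELP_FRAME_MS = 22.5  # MELP frame length (180 samples at 8 kHz)

# The 1200 and 400 b/s rows assume the same 22.5 ms frame (hypothetical).
for label, rate in [("MELP", 2400), ("average codec rate", 1200), ("fricative/stop scheme", 400)]:
    print(f"{label}: {bits_per_frame(rate, MELP_FRAME_MS):.1f} bits per 22.5 ms frame")
```

Under that assumption the fricative/stop mode would have 9 bits per frame against MELP's 54, a six-fold reduction.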

Analysis of Postfilters for Low Bit Rate Speech Coders in Tandem Connections
Rodrigo C. de Lamare, Abraham Alcaim

DOI: 10.14209/its.2002.825
Keywords:
Abstract
"In this paper we analyse postfiltering techniques for very low bit rate speech coders in tandem connections. A mixed multiband excitation (MMBE) linear predictive coding (LPC) algorithm, which encodes voiced frames at 1.75 kb/s and unvoiced frames at 0.4 kb/s, is used to assess the performance of different postfilters in tandem connections. Using the same MMBE platform, we compare the well-known adaptive spectral enhancement (ASE) technique with a recently reported approach, the spectral envelope restoration combined with noise reduction (SERNR) postfilter. Subjective listening tests in tandem connections show that the SERNR technique is clearly superior to the ASE postfilter."
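For orientation, a common family of LPC-based short-term postfilters has the pole-zero form H(z) = A(z/beta) / A(z/alpha) with 0 < beta < alpha < 1, which sharpens formant peaks relative to spectral valleys. The abstract does not give the ASE or SERNR equations, so this is only a generic sketch of that family, not either technique; the default beta and alpha values are illustrative assumptions, and the tilt compensation and gain control that practical postfilters usually add are omitted.

```python
from typing import List, Sequence


def bandwidth_expand(a: Sequence[float], gamma: float) -> List[float]:
    """Coefficients of A(z/gamma): the k-th coefficient a_k becomes a_k * gamma**k."""
    return [ak * gamma ** (k + 1) for k, ak in enumerate(a)]


def pole_zero_postfilter(x: Sequence[float], a: Sequence[float],
                         beta: float = 0.5, alpha: float = 0.8) -> List[float]:
    """Filter x through H(z) = A(z/beta) / A(z/alpha), where A(z) = 1 + sum_k a_k z^-k."""
    num = bandwidth_expand(a, beta)   # zeros: mildly bandwidth-expanded inverse filter
    den = bandwidth_expand(a, alpha)  # poles: less expanded, so formant peaks are kept
    p = len(a)
    y: List[float] = []
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, p + 1):
            if n - k >= 0:
                acc += num[k - 1] * x[n - k]  # FIR (numerator) part
                acc -= den[k - 1] * y[n - k]  # IIR (denominator) part
        y.append(acc)
    return y
```

Setting beta equal to alpha makes the numerator and denominator cancel, so the filter passes the signal unchanged; the gap between them controls how strongly the formants are emphasized.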

On the effect of the language in CMS channel normalization
Dirceu Gonzaga da Silva, José A. Apolinário Jr., Charles B. de Lima

DOI: 10.14209/its.2002.829
Keywords:
Abstract
"This paper presents a modification of a widely known channel normalization technique, Cepstral Mean Subtraction (CMS). The modification is based on introducing a language-dependent phonetic term. A careful investigation using Brazilian Portuguese showed that CMS channel estimation can be improved by means of a constant vector, associated with the language, obtained by estimating the time-averaged cepstral coefficients of clean speech. As a consequence of better channel estimation, better feature normalization is attained. Computer simulations were carried out with Mel-frequency cepstral coefficients in a speaker identification experiment, in which the proposed technique, in some cases, improved the recognition rate beyond the already good results of CMS."
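The reasoning can be written out briefly. In the cepstral domain a fixed linear channel adds a constant vector h to every frame, so the utterance mean is mean(c_obs) = mean(c_speech) + h. Classic CMS treats mean(c_speech) as zero and subtracts mean(c_obs). The modification described above instead estimates h as mean(c_obs) minus a language-dependent clean-speech mean. A minimal sketch, assuming that language mean vector (the hypothetical `lang_mean` below) has already been estimated offline from clean speech:

```python
from typing import List, Sequence


def mean_vector(frames: Sequence[Sequence[float]]) -> List[float]:
    """Time average of the cepstral vectors, one value per coefficient."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def cms(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Classic CMS: subtract the utterance mean cepstrum from every frame."""
    mu = mean_vector(frames)
    return [[c - m for c, m in zip(f, mu)] for f in frames]


def cms_with_language_constant(frames: Sequence[Sequence[float]],
                               lang_mean: Sequence[float]) -> List[List[float]]:
    """Channel estimate = utterance mean minus a language-dependent clean-speech mean."""
    mu = mean_vector(frames)
    channel = [m - l for m, l in zip(mu, lang_mean)]  # estimated h
    return [[c - h for c, h in zip(f, channel)] for f in frames]
```

When the clean-speech mean really equals `lang_mean`, the second function recovers the clean cepstra exactly, whereas classic CMS also removes the speech mean along with the channel.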

Estimation of the Subjective Quality of Speech Signals using the Kohonen Self-Organizing Maps
Jayme G. A. Barbedo, Moisés V. Ribeiro, Amauri Lopes, João M. T. Romano

DOI: 10.14209/its.2002.834
Keywords:
Abstract
"This paper deals with the application of Kohonen Self-Organizing Maps (KSOM) to objective speech quality assessment. The performance of the objective methods proposed so far depends on many factors, among which the mapping between the objective and subjective domains is one of the most important. The purpose of this paper is to present new results on the use of KSOM networks to replace the traditional third-order polynomial mapping. Several network topologies and signal parameter extraction techniques are presented and tested within the “Medida Objetiva de Qualidade de Voz” (MOQV, Objective Voice Quality Measure) objective method, applied to conditions present in a traditional database."
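One way a KSOM can stand in for a polynomial objective-to-subjective mapping is to train a map on objective feature vectors and then label each node with the mean subjective score (for example MOS) of the training samples it wins. The abstract does not describe the topologies actually used, so this one-dimensional, label-per-node design, the learning-rate and neighborhood schedules, and all names are assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights: List[List[float]], x: Sequence[float]) -> int:
    return min(range(len(weights)), key=lambda i: sq_dist(weights[i], x))


def train_som(data: List[List[float]], weights: List[List[float]],
              epochs: int = 20, lr0: float = 0.5, sigma0: float = 1.0) -> List[List[float]]:
    """Train a 1-D Kohonen map in place; nodes sit at positions 0..N-1 on a line."""
    total = epochs * len(data)
    t = 0
    for _ in range(epochs):
        for x in data:
            frac = t / total
            lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.1)   # shrinking neighborhood width
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))  # Gaussian neighborhood
                weights[i] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
            t += 1
    return weights


def label_nodes(data: List[List[float]], scores: List[float],
                weights: List[List[float]]) -> List[Optional[float]]:
    """Attach to each node the mean subjective score of the training samples it wins."""
    sums = [0.0] * len(weights)
    counts = [0] * len(weights)
    for x, s in zip(data, scores):
        b = best_matching_unit(weights, x)
        sums[b] += s
        counts[b] += 1
    return [sums[i] / counts[i] if counts[i] else None for i in range(len(weights))]


def predict(x: Sequence[float], weights: List[List[float]],
            labels: List[Optional[float]]) -> Optional[float]:
    """Estimated subjective score: the label of the winning node."""
    return labels[best_matching_unit(weights, x)]
```

Unlike a single third-order polynomial, this mapping is piecewise constant over the map's Voronoi cells, so its resolution grows with the number of nodes.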

A Sequential Search Algorithm with Signal-Selected Pulse Amplitudes
Lucas M. J. Barbosa, Luís G. Meloni

DOI: 10.14209/its.2002.840
Keywords:
Abstract
"With the objective of reducing computational complexity in algebraic code-excited linear predictive (ACELP) coders, this paper describes a low-complexity sequential search algorithm with signal-selected pulse amplitudes. The algorithm was inserted into both the ETSI GSM-AMR and ITU-T G.729 codecs and, compared with some standard search algorithms, proved quite efficient while causing only a slight degradation in voice quality."
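The general idea of a sequential ACELP search can be sketched as follows: pulse signs are preset from the sign of the backward-filtered target d(n), and one pulse per track is then chosen greedily to maximize C^2/E. Here C is the correlation with d and E is the energy of the filtered pulse pattern, computed from the impulse-response correlation matrix phi. This greedy version is an assumption-based sketch; it is neither the authors' algorithm nor the G.729 or GSM-AMR reference search, and all names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def correlation_matrix(h: Sequence[float], n: int) -> List[List[float]]:
    """phi[i][j] = sum over k in [max(i, j), n) of h[k-i] * h[k-j]."""
    def hv(k: int) -> float:
        return h[k] if 0 <= k < len(h) else 0.0
    return [[sum(hv(k - i) * hv(k - j) for k in range(max(i, j), n)) for j in range(n)]
            for i in range(n)]


def sequential_pulse_search(d: Sequence[float], phi: List[List[float]],
                            tracks: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """Greedy one-pulse-per-track search maximizing C^2 / E, signs preset from sign(d)."""
    sign = [1 if v >= 0 else -1 for v in d]
    chosen: List[int] = []
    corr, energy = 0.0, 0.0
    for track in tracks:
        best: Optional[Tuple[float, int, float, float]] = None
        for m in track:
            c = corr + abs(d[m])  # sign[m] * d[m] equals |d[m]|
            # New energy: old energy + self term + cross terms with already chosen pulses.
            e = energy + phi[m][m] + 2 * sum(sign[m] * sign[j] * phi[m][j] for j in chosen)
            score = c * c / e     # assumes phi is positive definite, so e > 0
            if best is None or score > best[0]:
                best = (score, m, c, e)
        _, m, corr, energy = best
        chosen.append(m)
    return [(m, sign[m]) for m in chosen]
```

Fixing the signs in advance and visiting each track once keeps the cost roughly linear in the number of candidate positions, instead of growing with the product of the track sizes as in an exhaustive search.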

VTLN Through Frequency Warping Based on Pitch
Carla Lopes, Fernando Perdigão

DOI: 10.14209/its.2002.844
Keywords:
Abstract
"This article describes a vocal tract length normalization (VTLN) procedure based on pitch-driven frequency warping. The procedure aims to reduce the inter-speaker variability present in speech signals. A method for compensating coarticulation phenomena, which reduces speech signal variability due to phonetic context, is also described. This method operates at the phonetic level, since it models coarticulation events, and at the linguistic level, since these units lead to alternative pronunciation rules. Inter-speaker variability is removed by a traditional speaker normalization method, which expands or compresses the Mel filter bank bandwidth in order to normalize the vocal tract length (VTL) of each speaker to a standard one. In previous works, VTL estimation was based on formant information, but the authors pointed out the difficulty of estimating formant frequencies as an obstacle to better results. The method presented in this paper overcomes this problem by estimating the warping factor (WF) from pitch. The recognition results presented in this paper for a telephone digit recognition task show that this procedure yields improvements similar to those obtained with traditional formant-based methods."
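The filter-bank side of VTLN can be sketched with the piecewise-linear warping often used in practice: frequencies are scaled by the warping factor alpha up to a cutoff and then joined linearly to the band edge, so the warped axis still ends at f_max. The abstract does not say how the WF is derived from pitch, so alpha is simply an input here. The 4 kHz band edge (telephone speech), the 0.85 cutoff ratio, and the function names are assumptions.

```python
import math
from typing import List


def warp_frequency(f: float, alpha: float, f_max: float = 4000.0,
                   cut_ratio: float = 0.85) -> float:
    """Piecewise-linear VTLN warp: f -> alpha*f below a cutoff, then a line ending at (f_max, f_max)."""
    f_cut = cut_ratio * f_max / max(alpha, 1.0)  # keeps alpha * f_cut below f_max
    if f <= f_cut:
        return alpha * f
    slope = (f_max - alpha * f_cut) / (f_max - f_cut)
    return alpha * f_cut + slope * (f - f_cut)


def mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def inv_mel(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def warped_filter_centers(n_filters: int, alpha: float, f_max: float = 4000.0) -> List[float]:
    """Center frequencies of a Mel filter bank (equally spaced in mel), then warped by alpha."""
    top = mel(f_max)
    centers = [inv_mel(top * (i + 1) / (n_filters + 1)) for i in range(n_filters)]
    return [warp_frequency(c, alpha, f_max) for c in centers]
```

Because the warp is monotonic and maps f_max onto itself, the warped filters stay ordered and inside the band whether alpha expands (alpha > 1) or compresses (alpha < 1) the axis.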

CELP speech coding: comparisons in terms of quantization techniques for the synthesis filter parameters
R. S. Maia, R. J. R. Cirigliano, D. Rojtenberg, F. G. V. Resende Jr.

DOI: 10.14209/its.2002.850
Keywords:
Abstract
"Vector quantization of the synthesis filter parameters in Code-Excited Linear Prediction (CELP) speech coders is a common procedure nowadays. This paper describes a CELP coder implementation and compares quality and bit rate when vector and scalar quantization of the synthesis filter parameters are employed. Using vector instead of scalar quantization reduces the bit rate by 340 bps, yielding a 4.06 kbps CELP coder with similar subjective quality."
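Since vector quantization saved 340 bps, the scalar-quantized coder ran at about 4.06 + 0.34 = 4.40 kbps. The saving comes from sending one codebook index for a whole parameter vector instead of one index per parameter. The toy sketch below compares the bit cost of both schemes; the quantizer levels and codebook are hypothetical values, because the abstract does not give the paper's parameter set or codebook sizes.

```python
import math
from typing import List, Sequence, Tuple


def scalar_quantize(vec: Sequence[float], levels: Sequence[float]) -> Tuple[List[int], int]:
    """Quantize each parameter independently; cost = len(vec) * ceil(log2(len(levels))) bits."""
    idx = [min(range(len(levels)), key=lambda i: abs(levels[i] - v)) for v in vec]
    bits = len(vec) * math.ceil(math.log2(len(levels)))
    return idx, bits


def vector_quantize(vec: Sequence[float], codebook: Sequence[Sequence[float]]) -> Tuple[int, int]:
    """Pick the nearest codeword (squared error); cost = ceil(log2(codebook size)) bits."""
    idx = min(range(len(codebook)),
              key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)))
    bits = math.ceil(math.log2(len(codebook)))
    return idx, bits
```

For example, quantizing [0.9, -0.2] with the three levels [-1, 0, 1] costs 4 bits, while a four-entry codebook codes the same vector with a single 2-bit index.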

A New On-Line Robust Approach to Design Noise-Immune Speech Recognition Systems
Fabian Vargas, Rubem D. R. Fagundes, Daniel Barros Jr.

DOI: 10.14209/its.2002.855
Keywords: Digital Signal Processing (DSP); Speech-Recognition Systems (SRS); Noise Immunity; On-Line Testing; Software-Based Recovery Blocks (SBRB); Area overhead; Performance Degradation
Abstract
"We present a new approach to cope with the harmful effects of noise on speech recognition systems (SRS). The approach is oriented toward hardware redundancy and is essentially a modification of the classic software-based recovery blocks scheme. Compared with conventional approaches using the Fast Fourier Transform (FFT) and Hamming codes, its primary benefit is improved system performance in real (i.e., noisy) environments. Its second advantage is the considerably low complexity and reduced area overhead required for implementation. We implemented a full version of the proposed algorithm in the C language. This implementation serves two purposes: first, it is used in this paper to illustrate the effectiveness of the proposed ideas and to support further discussion; second, it will be used in the near future as a single-source specification of the system, to be partitioned and implemented according to a HW-SW codesign methodology we have proposed in previous works."
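For readers unfamiliar with the baseline, the classic recovery block scheme saves a checkpoint, runs a primary version, checks the result with an acceptance test, and falls back to alternate versions from the same checkpoint if the test fails. The sketch below shows only that classic scheme; the authors' modification is not detailed in the abstract, and the names and the example acceptance tests are hypothetical.

```python
import copy
from typing import Any, Callable, Sequence


class RecoveryBlockFailure(Exception):
    """Raised when neither the primary nor any alternate passes the acceptance test."""


def recovery_block(x: Any,
                   versions: Sequence[Callable[[Any], Any]],
                   acceptance_test: Callable[[Any, Any], bool]) -> Any:
    """Classic recovery block: try the primary, then each alternate, from the same checkpoint."""
    checkpoint = copy.deepcopy(x)  # saved state, restored before every attempt
    for version in versions:
        try:
            result = version(copy.deepcopy(checkpoint))
        except Exception:
            continue  # an exception counts as a failed attempt
        if acceptance_test(checkpoint, result):
            return result
    raise RecoveryBlockFailure("no version passed the acceptance test")
```

In a recognizer, the versions might be alternative front ends applied to the same noisy frame, and the acceptance test a plausibility check on the recognition score; both are illustrative assumptions.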

A Speech Recognition Back-End Algorithm for Portuguese Language with Unlimited Vocabulary
Francisco J. Fraga

DOI: 10.14209/its.2002.861
Keywords:
Abstract
"It is possible to implement a speech-to-text system with unlimited vocabulary by connecting two subsystems: a phoneme recognizer, which works by segmenting the incoming speech into sub-syllabic units, and a phonologic-graphemic converter. This paper presents an automatic speech recognition system with these features. The segmentation method and the phoneme recognizer are briefly described, while the phonologic-graphemic converter is detailed. The algorithm that performs the transition from the phoneme level to the word level is based on rules derived from the structure of the Portuguese language. This is achieved without any pronunciation tables, which allows the system to recognize any word belonging to the Portuguese lexicon, with no limit on vocabulary size."
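A rule-based phonologic-graphemic converter applies context-dependent rules rather than looking words up in a table. The toy sketch below encodes a few genuine Portuguese spelling conventions: /k/ is written "qu" and /g/ is written "gu" before e and i, and the palatal lateral and nasal are written "lh" and "nh". It is not the author's rule base. The ASCII phoneme codes ("L", "N") are hypothetical, and a real converter also needs rules for ambiguous cases such as /s/ (s, ss, ç, c, x, z).

```python
from typing import List, Optional, Sequence

FRONT_VOWELS = {"e", "i"}
# Hypothetical ASCII codes: "L" = palatal lateral, "N" = palatal nasal.
DIGRAPHS = {"L": "lh", "N": "nh"}


def to_graphemes(phonemes: Sequence[str]) -> str:
    """Convert a phoneme sequence to Portuguese spelling using context-dependent rules."""
    out: List[str] = []
    for i, p in enumerate(phonemes):
        nxt: Optional[str] = phonemes[i + 1] if i + 1 < len(phonemes) else None
        if p == "k":
            out.append("qu" if nxt in FRONT_VOWELS else "c")  # casa / queda
        elif p == "g":
            out.append("gu" if nxt in FRONT_VOWELS else "g")  # gato / guia
        elif p in DIGRAPHS:
            out.append(DIGRAPHS[p])                           # calha, banho
        else:
            out.append(p)                                     # default: same letter
    return "".join(out)
```

Each rule looks only one phoneme ahead. This is enough for the c/qu and g/gu alternations, but harder cases need wider context.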
