Biblioteca da SBrT

GMM versus AR-Vector Models for Text Independent Speaker Verification

Charles B. de Lima, Abraham Alcaim, José A. Apolinário Jr.

DOI: 10.14209/its.2002.817

Abstract

"This paper presents a performance evaluation of two classification systems for text-independent speaker verification: the Gaussian Mixture Model (GMM) and the AR-Vector Model. For the GMM, 32, 16, 8 and Gaussians are evaluated. For the AR-Vector, an order 2 model with the Itakura symmetric distance was used. Both classification systems presented no errors when training and testing times were not smaller than 60s and 30s, respectively. Using s as the test time, the errors of the most accurate classification systems were between 0.4 and 3.3%. With a 3s test, the errors presented by the GMM were around 6 to 7%, whereas those for the AR-Vector were above 10%. However, the best results using 10s as testing and training times were obtained with the AR-Vector, with errors around 3.2%."

Comparison of a Low Bit Rate Speech Compression Algorithm with the MELP Standard for Fricatives and Stops

Rodrigo C. de Lamare, Abraham Alcaim

DOI: 10.14209/its.2002.822

Abstract

"In this paper we examine the encoding of fricatives and stops in a very low bit rate speech compression algorithm based on a mixed multiband excitation system. The algorithm incorporates several improvements over previously reported coders. One of them is the use of a specific modelling and synthesis strategy for fricatives and stops at 400 b/s. The codec, which operates at an average rate of 1.2 kb/s, is compared with the North American standard 2.4 kb/s MELP coder for sentences with a large concentration of fricative and stop sounds. Subjective listening tests indicate that the two codecs are comparable for both types of sounds, although the sound-specific scheme operates at 400 b/s, whilst the MELP operates at 2.4 kb/s."

Analysis of Postfilters for Low Bit Rate Speech Coders in Tandem Connections

Rodrigo C. de Lamare, Abraham Alcaim

DOI: 10.14209/its.2002.825

Abstract

"In this paper we analyse postfiltering techniques for very low bit rate speech coders in tandem connections. A mixed multiband excitation (MMBE) linear predictive coding (LPC) algorithm that encodes voiced frames at 1.75 kb/s and unvoiced frames at 0.4 kb/s is employed to assess the performance of different postfilters in tandem connections. We compare the well-known adaptive spectral enhancement (ASE) technique with a recently reported approach, the spectral envelope restoration combined with noise reduction (SERNR) postfilter, using the same MMBE platform. Subjective listening tests in tandem connections show that the SERNR technique is clearly superior to the ASE postfilter."

On the effect of the language in CMS channel normalization

Dirceu Gonzaga da Silva, José A. Apolinário Jr., Charles B. de Lima

DOI: 10.14209/its.2002.829

Abstract

"This paper presents a modification of a widely known channel normalization technique, Cepstral Mean Subtraction (CMS). The modification is based on the introduction of language-dependent phonetic information. A careful investigation using Brazilian Portuguese showed that it is possible to improve the CMS channel identification through a constant vector, associated with the language, obtained by estimating the mean cepstral coefficients of clean speech signals over time. As a consequence of the better channel estimation, better feature normalization is attained. Computer simulations were carried out with Mel-scale cepstral coefficients in a speaker identification experiment where the proposed technique, in some cases, improved the recognition rate beyond the already good CMS results."
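The idea in this abstract can be illustrated with a minimal sketch. It assumes, as is usual for CMS, that a fixed channel adds a constant offset to every cepstral frame, and it interprets the language-dependent vector as the long-term mean cepstrum of clean speech, which is subtracted from the utterance mean to obtain the channel estimate. The function names and the exact form of the correction are assumptions for illustration, not the authors' implementation.

```python
from typing import List, Sequence

Frame = List[float]


def cepstral_mean(frames: Sequence[Sequence[float]]) -> Frame:
    """Average each cepstral coefficient over all frames of an utterance."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[k] for f in frames) / n for k in range(dim)]


def cms(frames: Sequence[Sequence[float]]) -> List[Frame]:
    """Plain CMS: treat the utterance mean itself as the channel estimate."""
    mean = cepstral_mean(frames)
    return [[c - m for c, m in zip(f, mean)] for f in frames]


def cms_with_language_vector(
    frames: Sequence[Sequence[float]],
    language_mean: Sequence[float],
) -> List[Frame]:
    """Hypothetical variant: remove the language-dependent clean-speech mean
    from the utterance mean before using it as the channel estimate."""
    mean = cepstral_mean(frames)
    channel = [m - l for m, l in zip(mean, language_mean)]
    return [[c - h for c, h in zip(f, channel)] for f in frames]


# Toy data: two clean frames plus a constant channel offset of [0.5, -1.0].
observed = [[1.5, 1.0], [3.5, 3.0]]
language_mean = [2.0, 3.0]  # assumed long-term mean of clean speech
normalized = cms_with_language_vector(observed, language_mean)
```

In this toy case plain CMS removes the speech mean together with the channel, while the variant recovers the clean frames because the assumed language vector matches the clean-speech mean exactly. With real speech the match would only be approximate.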

Estimation of the Subjective Quality of Speech Signals using the Kohonen Self-Organizing Maps

Jayme G. A. Barbedo, Moisés V. Ribeiro, Amauri Lopes, João M. T. Romano

DOI: 10.14209/its.2002.834

Abstract

"This paper deals with the application of the Kohonen Self-Organizing Maps (KSOM) to methods of objective speech quality assessment. The performance of the objective methods proposed so far depends on many factors, among which the required mapping between the objective and subjective domains is one of the most important. The purpose of this paper is to present new results on the application of KSOM networks to replace the traditional third-order polynomial mapping. Some distinct network topologies and techniques for the extraction of the signal parameters are presented and tested in the context of the “Medida Objetiva de Qualidade de Voz” (MOQV) objective method applied to situations present in a traditional database."

A Sequential Search Algorithm with Signal-Selected Pulse Amplitudes

Lucas M. J. Barbosa, Luís G. Meloni

DOI: 10.14209/its.2002.840

Abstract

"With the objective of reducing computational complexity in algebraic code-excited linear predictive (ACELP) coders, this paper describes the use of a low-complexity sequential search algorithm with signal-selected pulse amplitudes. The described algorithm was inserted in both the ETSI GSM-AMR and ITU-T G.729 codecs, and when compared to some standard algorithms it proved to be quite efficient while causing only a slight degradation in the voice quality."

VTLN Through Frequency Warping Based on Pitch

Carla Lopes, Fernando Perdigão

DOI: 10.14209/its.2002.844

Abstract

"This article describes a vocal tract length normalization (VTLN) procedure through pitch-based frequency warping. This procedure aims to reduce the inter-speaker variability present in speech signals. A method for compensating coarticulation phenomena, which reduces speech signal variability due to phonetic context, is also described. This method operates at the phonetic level, since it models coarticulation events, and at the linguistic level, since these units lead to alternative pronunciation rules. Inter-speaker variability removal is performed by a traditional speaker normalization method, which consists of expanding or compressing the Mel filter bank bandwidth in order to normalize the vocal tract length (VTL) of each speaker to a standard one. In previous works the estimation of VTL is based on formant information, but their authors pointed out the difficulty of formant frequency estimation as an obstacle to better results. The method presented in this paper overcomes this problem by estimating the warping factor (WF) from pitch. The recognition results presented in this paper for a telephone digit recognition task show that this procedure leads to improvements similar to those obtained with traditional methods based on formant information."
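The filter bank warping mentioned in this abstract can be sketched as follows. The sketch assumes a simple linear warping in which every Mel filter centre frequency is multiplied by a given warping factor. The abstract does not state how the warping factor is derived from pitch, so it is taken here as an input. The function names and the 4 kHz upper limit (a typical telephone bandwidth) are assumptions for illustration.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    """Common Mel-scale formula."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters: int, f_low: float, f_high: float) -> List[float]:
    """Centre frequencies (Hz) of filters equally spaced on the Mel scale."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + step * (i + 1)) for i in range(n_filters)]


def warp_center_frequencies(
    centers: List[float], warping_factor: float, f_max: float
) -> List[float]:
    """Linear warping (an assumption): a factor above 1 expands the bank,
    a factor below 1 compresses it. Values are clipped at f_max."""
    return [min(warping_factor * fc, f_max) for fc in centers]


centers = mel_center_frequencies(20, 0.0, 4000.0)
expanded = warp_center_frequencies(centers, 1.1, 4000.0)
compressed = warp_center_frequencies(centers, 0.9, 4000.0)
```

Because the centre frequencies are scaled together, the spacing between neighbouring filters, and hence their bandwidth, scales by the same factor.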

CELP speech coding: comparisons in terms of quantization techniques for the synthesis filter parameters

R. S. Maia, R. J. R. Cirigliano, D. Rojtenberg, F. G. V. Resende Jr.

DOI: 10.14209/its.2002.850

Abstract

"Vector quantization of the synthesis filter parameters in Code-Excited Linear Prediction (CELP) speech coders is a common procedure nowadays. This paper describes a CELP coder implementation and compares quality and bit rate when vector and scalar quantization of the synthesis filter parameters are employed. Using vector quantization instead of scalar quantization allows for a bit rate reduction of 340 bps, giving rise to a 4.06 kbps CELP coder while keeping a similar subjective evaluation."
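The figures quoted in this abstract imply a bit rate for the scalar-quantized coder, which can be worked out as a short check. The sketch assumes that the 340 bps reduction is measured against the scalar-quantized version of the same coder. The variable names are illustrative.

```python
VQ_RATE_BPS = 4060  # coder with vector quantization (4.06 kbps, from the abstract)
SAVING_BPS = 340    # reduction attributed to vector quantization

# Implied rate of the scalar-quantized coder under the assumption above.
scalar_rate_bps = VQ_RATE_BPS + SAVING_BPS
relative_saving = SAVING_BPS / scalar_rate_bps

print(f"scalar-quantized coder: {scalar_rate_bps} bps")  # 4400 bps
print(f"relative saving: {relative_saving:.1%}")         # 7.7%
```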

A New On-Line Robust Approach to Design Noise-Immune Speech Recognition Systems

Fabian Vargas, Rubem D. R. Fagundes, Daniel Barros Jr.

DOI: 10.14209/its.2002.855

Keywords: Digital Signal Processing (DSP); Speech-Recognition Systems (SRS); Noise Immunity; On-Line Testing; Software-Based Recovery Blocks (SBRB); Area overhead; Performance Degradation

Abstract

"We present a new approach to cope with the harmful effects of noise on speech recognition systems (SRS). This approach is oriented to hardware redundancy and is essentially a modification of the classic software-based recovery blocks scheme. When compared to conventional approaches using the Fast Fourier Transform (FFT) and Hamming Code, the primary benefit of this technique is improved system performance when operating in real (i.e., noisy) environments. The second advantage is the considerably low complexity and reduced area overhead required for implementation. We implemented a full version of the proposed algorithm in the C language. The goal of this implementation is twofold: first, it is used in this paper to illustrate the effectiveness of the proposed ideas and enrich further discussions. Second, it will be used in the near future as a single-source system code specification that will be partitioned and implemented according to a HW-SW codesign methodology we have proposed in previous works."

A Speech Recognition Back-End Algorithm for Portuguese Language with Unlimited Vocabulary

Francisco J. Fraga

DOI: 10.14209/its.2002.861

Abstract

"It is possible to implement a speech-to-text system with unlimited vocabulary by connecting two subsystems: a phoneme recognizer, which operates by sub-syllabic segmentation of the incoming speech, and a phonologic-graphemic converter. This paper presents an automatic speech recognition system with these features. The segmentation method and the phoneme recognizer are briefly described, while the phonologic-graphemic converter is detailed. The algorithm that allows the transition from the phoneme level to the word level is based on rules obtained from the structure of the Portuguese language. This task is achieved without any kind of pronunciation table, which allows the system to recognize any word that belongs to the Portuguese lexicon, without limitation on the size of the vocabulary."

2002 International Telecommunications Symposium
