2002 International Telecommunications Symposium


A Brazilian Portuguese TTS Based on HMMs
Guilherme de Oliveira Pinto, Filipe Leandro de F. Barbosa, Fernando Gil V. Resende Jr.

DOI: 10.14209/its.2002.867
Abstract
"This paper presents a Brazilian Portuguese TTS based on HMMs, which uses mel-cepstral coefficients as speech parameters. We implemented an algorithm that performs a phoneme-based transcription of the Portuguese spoken in Rio de Janeiro. For a given text to be synthesized, after the phoneme transcription, static features are extracted from a sentence HMM, generated by concatenating the HMMs of sub-word units. An algorithm for choosing the best set of those speech units was also developed. Subjective tests show that the proposed TTS gives better results than a PSOLA system based on syllabic units, with the advantage of easy speaker adaptation."

Introducing a new Phonetic Model for Continuous Speech Recognition Systems
Rubem Dutra Ribeiro Fagundes, Juarez Sagebin Corrêa, Pierre Dumouchel

DOI: 10.14209/its.2002.872
Abstract
"The main goal of this work is to describe a new model for a large vocabulary continuous speech recognition system using a phonetic-phonological approach. This work proposes a statistical phonetic structure, applied at the phonetic-phonological level, to improve speech recognition performance in systems with phonetic-phonological modeling. It is shown that the general likelihood scores increase, indicating better recognition performance. This is because the statistical phonetic structure reinforces frequent phonetic combinations of the language itself. Such a structure should be considered an additional knowledge base, containing information about the real phonetic structure of the language. This new phonetic-phonological approach is also strongly recommended for use in spontaneous speech recognition systems."

Automatic Speaker Indexing in Corrupted Speech
H. Sayoud, S. Ouamour, M. Boudraa

DOI: 10.14209/its.2002.876
Abstract
"Speaker indexing can broadly be divided into two problems: locating the points of speaker change (segmentation) and identifying the speaker in each segment (labeling). An important obstacle in speaker tracking is the corruption of the speech signal during recording or in a telephone conversation. In this paper, we study the corruption of the speech signal by the noises most likely to occur during audiovisual recording, and the mixing of the speech signal with music, in order to test the robustness of our speaker tracking method. For this purpose, we choose the SOSM method (Second Order Statistical Measures), applied to segments of 2 seconds duration with an overlap of 50%. Speaker indexing becomes very difficult if the recordings are made in a noisy environment or if music is mixed with the speech. Our method is evaluated on TIMIT: each discussion consists of sequences of speech signals uttered by 2 different speakers, concatenated into one speech file (the speakers are arbitrarily chosen from a population of 37 different speakers), so each speech file contains several speaker transitions. In a second step, we corrupted the database with three types of noise: office noise, human noise and background noise. Moreover, we inserted music into the discussion signals, for example at the beginning, in the middle and at the end of the discussion. The results obtained are discussed in detail for each case: clean environment, noisy environment, presence of music, etc. As an example, the tracking error rate varies from 5% in a clean environment to 34% in a noisy environment (+6 dB). Moreover, we observe that the error rate increases when the SNR decreases. Concerning music, we observe that speaker indexing is not perturbed by the concatenation of music sequences, which is of interest in the case of musical advertisements."
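The windowing described in this abstract (2-second segments with 50% overlap) can be illustrated with a short sketch. This is not the authors' code: the function name `segment_windows` and its parameters are hypothetical, and the 16 kHz sampling rate used in the usage note is the standard TIMIT rate rather than a figure taken from the abstract. The SOSM measure itself is not reproduced here.

```python
def segment_windows(num_samples, sample_rate, win_seconds=2.0, overlap=0.5):
    """Return (start, end) sample indices of fixed-length overlapping windows.

    Hypothetical helper: only full windows are returned, so a trailing
    partial window is dropped.
    """
    win = int(round(win_seconds * sample_rate))   # window length in samples
    hop = int(round(win * (1.0 - overlap)))       # step between window starts
    if win <= 0 or hop <= 0:
        raise ValueError("window length and hop must be positive")
    return [(start, start + win)
            for start in range(0, num_samples - win + 1, hop)]


if __name__ == "__main__":
    # 6 seconds of audio at 16 kHz yields five 2-second windows, 1 second apart.
    for start, end in segment_windows(6 * 16000, 16000):
        print(start / 16000, end / 16000)
```

With a 50% overlap, each window after the first adds one second of new audio, so a speaker change can be localized to roughly one-second resolution under these assumptions.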

Prosodic Speech Modification Using RELP
Fernando S. Pacheco, Rui Seara

DOI: 10.14209/its.2002.882
Abstract
"This paper proposes a method for prosodic speech modification based on residual-excited linear predictive coding (RELP), applied to speech synthesis. With this method, pitch and time scale modifications are carried out at very low computational complexity. Formal subjective tests comparing the proposed approach with TD-PSOLA, under conditions of increasing and decreasing pitch, are carried out. For all tested cases, the results show better performance for the proposed approach than for TD-PSOLA."

Tree Organized Lexicons and Beam Search: Implementational Issues
Carlos Alberto Ynoguti, Fábio Violaro

DOI: 10.14209/its.2002.887
Abstract
"The aim of this work is to clarify some implementation issues in two widely used techniques for reducing the search space of large vocabulary speech recognition systems: tree-organized lexicons and Viterbi beam search. We show how some symmetries of the search space can be exploited to use these techniques together and achieve a large reduction in computational load and, consequently, in recognition times. Tests showed that recognition times fall by a factor of 5 using this procedure."
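As a rough illustration of the two techniques named in this abstract, the sketch below builds a prefix tree over a toy pronunciation lexicon and applies beam pruning to a set of hypothesis scores. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the three-word lexicon, and the phone symbols are all hypothetical, and the symmetry exploitation described in the paper is not reproduced.

```python
def build_lexicon_tree(lexicon):
    """Build a prefix tree from a dict mapping each word to its phone list."""
    root = {"children": {}, "words": []}
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            # Words sharing a phone prefix share the same path in the tree.
            node = node["children"].setdefault(
                phone, {"children": {}, "words": []})
        node["words"].append(word)
    return root


def count_arcs(node):
    """Count phone arcs in the tree (each arc needs one acoustic evaluation)."""
    return sum(1 + count_arcs(child) for child in node["children"].values())


def beam_prune(scores, beam_width):
    """Keep hypotheses whose log score is within beam_width of the best one."""
    if not scores:
        return {}
    best = max(scores.values())
    return {state: s for state, s in scores.items() if s >= best - beam_width}


if __name__ == "__main__":
    # Toy lexicon with hypothetical phone symbols.
    lexicon = {
        "casa": ["k", "a", "z", "a"],
        "caso": ["k", "a", "z", "u"],
        "cama": ["k", "a", "m", "a"],
    }
    tree = build_lexicon_tree(lexicon)
    flat = sum(len(p) for p in lexicon.values())
    print("flat lexicon arcs:", flat, "tree arcs:", count_arcs(tree))
    print(beam_prune({"a": -10.0, "b": -12.5, "c": -30.0}, beam_width=5.0))
```

Sharing prefixes reduces the number of phone models evaluated per frame, and beam pruning discards hypotheses that score far below the current best; the two reductions compound when used together.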

LSP Transcoding Between G729A and GSM AMR
Helder C. Bertan, Luís G. P. Meloni

DOI: 10.14209/its.2002.892
Abstract
"This paper presents a new method for Line Spectral Pair (LSP) transcoding between the G.729A and GSM AMR codecs. Transcoding in the bitstream domain gives better quality and lower complexity than the conventional method in the speech domain. Several simulation results are presented using the Perceptual Quality Speech Measure, showing a gain of about 0.45 in this measure. This scheme is part of a complete transcoding process under development."

Speaker Adaptation Using Eigenvoices Technique
Liselene de Abreu Borges, Miguel Arjona Ramírez, Rubem Dutra Ribeiro Fagundes

DOI: 10.14209/its.2002.897
Abstract
"This paper discusses speech recognition systems (SRS) using speaker adaptation techniques. The most recent speech recognition systems use Hidden Markov Models (HMM). For such systems, the eigenvoices speaker adaptation technique presents the best performance among the techniques usually suggested by researchers. This performance is due mainly to the limited amount of data necessary to perform speaker adaptation. In our experiments we reached improvements of around 10% with the speaker-adapted system compared with the corresponding speaker-independent speech recognition system, using only a very small fraction of the speakers' data."

Analysis of a Semi-Blind Beamspace-Time Interference Cancellation for WCDMA Systems in Microcellular Environments
Ivan R. S. Casella, Elvino S. Sousa, Paul Jean E. Jeszensky

DOI: 10.14209/its.2002.902
Abstract
"In this paper, we investigate the performance of a semi-blind spatial-temporal beamforming receiver for an asynchronous high data rate direct sequence wideband code division multiple access (DS-WCDMA) system in a microcellular environment. The presented receiver uses subspace channel identification to perform joint channel equalization, multipath energy combination and spatial interference cancellation. The simulation results show a significant performance improvement and reduction of required training symbols when compared against a receiver employing a training-based spatial-temporal recursive least squares (RLS) beamformer."

Hierarchical Recursive Least Squares Space-Time Interference Cancellation for DS-WCDMA Systems
Ivan R. S. Casella, Elvino S. Sousa, Paul Jean E. Jeszensky

DOI: 10.14209/its.2002.908
Abstract
"In this paper, a new training-based spatial-temporal receiver is proposed for an asynchronous wideband direct sequence code division multiple access (DS-CDMA) system. The presented receiver employs a hierarchical recursive least squares (HRLS) algorithm to reduce the computational complexity."

Multirate Performance in Multiuser MMSE and Decorrelating Detectors using Random Spreading Sequences
Gustavo Fraidenraich, Renato Baldini F., Celso de Almeida

DOI: 10.14209/its.2002.913
Keywords: Multiuser, CDMA, multirate, MMSE, decorrelating
Abstract
"This paper presents simplified expressions for the mean bit error probability using random spreading sequences on AWGN and multipath Rayleigh fading channels with the multiuser MMSE and decorrelating detectors. A multiple-processing-gain multirate scheme is assumed."
