Exploratory Analysis of Linguistic Data Based on Genetic Algorithm and Its Application to Robust Modeling of Speech Segmental Duration
Edmilson Morais, Fábio Violaro, Alex Meireles
DOI: 10.14209/sbrt.2005.434
Event: XXII Simpósio Brasileiro de Telecomunicações (SBrT2005)
Keywords: Speech and language technology, speech prosody, multivariate linear regression, genetic algorithm, hierarchical data clustering
Abstract
This work presents a new method for exploratory analysis of linguistic data. The method is based on a genetic algorithm and is used to improve the performance of linear regression models that predict the segmental duration of speech. The proposed method was compared with Regression Trees and with a baseline Linear Regression model (a Linear Regression model whose topology was selected using multivariate analysis of variance). The experimental results show that the proposed method achieves better generalization performance (that is, it copes better with database imbalance) than both the Regression Trees and the baseline Linear Regression model. All evaluations presented in this article were carried out using an American English database from the Toshiba Speech Technology Laboratory in Cambridge, UK.
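The abstract does not say how the genetic algorithm is applied to the regression models. As a generic, hypothetical illustration of combining a genetic algorithm with linear regression, the sketch below evolves binary masks that choose a subset of predictors for a least-squares model, scoring each mask by held-out RMSE plus a small per-predictor penalty. This is a standard GA feature-selection scheme, not the authors' method. The function names, the synthetic data, the penalty weight (0.01), and the GA settings (tournament size 3, uniform crossover, bit-flip mutation, two elites) are all assumptions chosen for illustration.

```python
import random

def solve(a, b):
    """Solve the square system a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_predict(x_train, y_train, x_test, mask, ridge=1e-6):
    """Fit least squares (with intercept) on the masked columns; predict x_test."""
    cols = [j for j, keep in enumerate(mask) if keep]
    def design(rows):
        return [[1.0] + [r[j] for j in cols] for r in rows]
    xd = design(x_train)
    p = len(cols) + 1
    # Normal equations (X^T X + ridge * I) w = X^T y; the tiny ridge keeps them nonsingular.
    xtx = [[sum(row[i] * row[k] for row in xd) + (ridge if i == k else 0.0)
            for k in range(p)] for i in range(p)]
    xty = [sum(row[i] * yv for row, yv in zip(xd, y_train)) for i in range(p)]
    w = solve(xtx, xty)
    return [sum(wi * xi for wi, xi in zip(w, row)) for row in design(x_test)]

def make_fitness(data, penalty=0.01):
    """Hypothetical fitness: validation RMSE + penalty per selected predictor (lower is better)."""
    xtr, ytr, xva, yva = data
    cache = {}
    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            pred = fit_predict(xtr, ytr, xva, mask)
            err = (sum((p - t) ** 2 for p, t in zip(pred, yva)) / len(yva)) ** 0.5
            cache[key] = err + penalty * sum(mask)
        return cache[key]
    return fitness

def evolve(data, n_features, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Simple generational GA over boolean predictor masks."""
    rng = random.Random(seed)
    fitness = make_fitness(data)
    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        nxt = [m[:] for m in pop[:2]]  # elitism: keep the two best masks unchanged
        while len(nxt) < pop_size:
            # Tournament selection (size 3) for each parent.
            a = min(rng.sample(pop, 3), key=fitness)
            b = min(rng.sample(pop, 3), key=fitness)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            child = [(not g) if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return min(pop, key=fitness)

def make_data(n, rng):
    """Synthetic data: only predictors 0 and 2 influence the target."""
    xs = [[rng.random() for _ in range(6)] for _ in range(n)]
    ys = [2.0 * x[0] - 3.0 * x[2] + rng.gauss(0, 0.05) for x in xs]
    return xs, ys

if __name__ == "__main__":
    rng = random.Random(1)
    xtr, ytr = make_data(80, rng)
    xva, yva = make_data(40, rng)
    print(evolve((xtr, ytr, xva, yva), n_features=6))
```

Scoring on a held-out split rather than on training error is what ties the search to generalization; the per-predictor penalty discourages masks that add predictors without reducing validation error.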