Towards an end-to-end speech recognizer for Portuguese using deep neural networks
Igor Macedo Quintanilha, Luiz Wagner Pereira Biscainho, Sergio Lima Netto

DOI: 10.14209/sbrt.2017.73
Event: XXXV Simpósio Brasileiro de Telecomunicações e Processamento de Sinais (SBrT2017)
Keywords: deep learning, speech recognition, recurrent networks, connectionist temporal classification
Abstract
This paper presents an open-source, character-based, end-to-end speech recognition system for Brazilian Portuguese (PT-BR). The first step of the work was the development of a PT-BR dataset, an ensemble of four previous datasets (three of which are publicly available). The model trained on this dataset is a bidirectional long short-term memory network that uses connectionist temporal classification for end-to-end training. Several tests were conducted to find the best set of hyperparameters. Without a language model, the system achieves a label error rate of 31.53% on the test set, about 17% higher than commercial systems with a language model. This first effort shows that an all-neural, high-performance speech recognition system for PT-BR is feasible.
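
For readers unfamiliar with the metric, the following is a minimal sketch of how a label error rate can be computed for a character-based system. It assumes the common definition (edit distance between the predicted and reference character sequences, normalized by the reference length and averaged over utterances); the abstract does not spell out the exact formula used, and the function names `edit_distance` and `label_error_rate` are illustrative, not taken from the authors' code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (characters here)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def label_error_rate(references, hypotheses):
    """Mean per-utterance edit distance, normalized by reference length.

    Assumption: every reference transcription is non-empty.
    """
    rates = [edit_distance(r, h) / len(r)
             for r, h in zip(references, hypotheses)]
    return sum(rates) / len(rates)
```

Under this definition, a hypothesis that gets one character wrong in a four-character reference contributes a rate of 0.25 for that utterance.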
