A TTS System Based on Deep Neural Networks Using Pitch-Synchronous Parameters
Ranniery Maia, Rui Seara

DOI: 10.14209/sbrt.2017.148
Event: XXXV Simpósio Brasileiro de Telecomunicações e Processamento de Sinais (SBrT2017)
Keywords: Deep learning, deep neural networks, speech synthesis
Abstract
In speech synthesis systems based on deep neural networks (DNNs), training is usually conducted using acoustic feature vectors extracted from the speech signal at a fixed frame rate. This paper presents some approaches to using pitch-synchronous acoustic features in DNN-based speech synthesizers, with the goal of improving synthetic speech quality. Experimental results show that the use of frame-based linguistic features, along with pitch-synchronously extracted acoustic parameters, produces better results in terms of objective quality measures.
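
To illustrate the distinction the abstract draws between fixed-frame-rate and pitch-synchronous extraction, the following minimal sketch computes the analysis instants (in samples) under each scheme. It is not the authors' implementation. The function names, the 16 kHz sample rate, the 5 ms frame shift, and the assumption that an F0 value is available at every sample (that is, fully voiced speech) are all hypothetical choices made for illustration only.

```python
def fixed_rate_centers(num_samples, frame_shift=80):
    """Analysis instants spaced by a constant shift (80 samples = 5 ms at 16 kHz)."""
    return list(range(0, num_samples, frame_shift))


def pitch_synchronous_centers(num_samples, f0_at, sample_rate=16000):
    """Analysis instants spaced by the local pitch period.

    f0_at is a hypothetical callable returning F0 in Hz at a sample index.
    Assumes voiced speech throughout; unvoiced regions are not handled here.
    """
    centers = []
    t = 0
    while t < num_samples:
        centers.append(t)
        # One pitch period, in samples, at the current instant.
        period = round(sample_rate / f0_at(t))
        t += max(period, 1)  # guard against a zero-length step
    return centers


# 100 ms of speech at 16 kHz with a constant 100 Hz pitch (assumed values).
fixed = fixed_rate_centers(1600)
synced = pitch_synchronous_centers(1600, lambda t: 100.0)
```

With these assumed values the fixed-rate scheme yields one instant every 80 samples, while the pitch-synchronous scheme yields one every 160 samples, so the number of acoustic vectors per utterance depends on the speaker's pitch rather than on a constant shift. This mismatch with frame-based linguistic features is the kind of alignment issue a pitch-synchronous system has to address.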
