A TTS System Based on Deep Neural Networks Using Pitch-Synchronous Parameters (original title: Um Sistema TTS Baseado em Redes Neurais Profundas Usando Parâmetros Síncronos de Pitch)

Ranniery Maia; Rui Seara

doi:10.14209/sbrt.2017.148

Event: XXXV Simpósio Brasileiro de Telecomunicações e Processamento de Sinais (SBrT2017)

Keywords: deep learning, deep neural networks, speech synthesis

Abstract

In speech synthesis systems based on deep neural networks (DNNs), training is usually conducted using acoustic feature vectors extracted from the speech signal at a fixed frame rate. This paper presents several approaches to using pitch-synchronous acoustic features in DNN-based speech synthesizers, with the goal of improving synthetic speech quality. Experimental results show that the use of frame-based linguistic features, along with pitch-synchronously extracted acoustic parameters, produces better results in terms of objective quality measures.
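
To make the distinction between fixed-rate and pitch-synchronous analysis concrete, the following is a minimal illustrative sketch, not the authors' method. The abstract does not describe how analysis instants are obtained, so the function names, the `f0_at` callback, the 5 ms fallback shift for unvoiced speech, and the toy F0 contour are all assumptions made for this example.

```python
def fixed_rate_centers(num_samples, frame_shift):
    """Frame centers placed at a constant shift, regardless of pitch."""
    return list(range(0, num_samples, frame_shift))


def pitch_synchronous_centers(num_samples, sample_rate, f0_at, unvoiced_shift):
    """Frame centers spaced one pitch period apart in voiced speech.

    f0_at is a hypothetical callback returning F0 in Hz at a sample
    index, or 0 for unvoiced speech. In unvoiced regions there is no
    pitch period, so this sketch falls back to a fixed shift
    (an assumption, not something stated in the abstract).
    """
    centers = []
    pos = 0
    while pos < num_samples:
        centers.append(pos)
        f0 = f0_at(pos)
        if f0 > 0:
            step = max(1, round(sample_rate / f0))  # one pitch period
        else:
            step = unvoiced_shift
        pos += step
    return centers


# Toy signal: 100 ms at 16 kHz, voiced at 100 Hz for the first 50 ms.
SAMPLE_RATE = 16000
NUM_SAMPLES = 1600
FRAME_SHIFT = 80  # 5 ms


def toy_f0(sample_index):
    return 100.0 if sample_index < 800 else 0.0


fixed = fixed_rate_centers(NUM_SAMPLES, FRAME_SHIFT)
synced = pitch_synchronous_centers(NUM_SAMPLES, SAMPLE_RATE, toy_f0, FRAME_SHIFT)
```

In this sketch the number of acoustic frames depends on the F0 contour, so it generally differs from the number of fixed-rate frames. Any system that pairs frame-based linguistic features with pitch-synchronous acoustic parameters needs some way to align the two sequences.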
