Sociedade Brasileira de Telecomunicações

A TTS System Based on Deep Neural Networks Using Pitch-Synchronous Parameters

In speech synthesis systems based on deep neural networks (DNNs), training is usually conducted using acoustic feature vectors extracted from the speech signal at a fixed frame rate. This paper presents approaches to using pitch-synchronous acoustic features in DNN-based speech synthesizers, with the goal of improving synthetic speech quality. Experimental results show that using frame-based linguistic features together with pitch-synchronously extracted acoustic parameters produces better results in terms of objective quality measures.
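As a rough illustration of the distinction the abstract draws, the sketch below contrasts fixed-rate frame placement with pitch-synchronous placement. It is not the paper's method: the function names, the 16 kHz sample rate, the 80-sample (5 ms) frame shift, the `f0_at` callback, and the fallback period for unvoiced regions are all assumptions chosen for the example.

```python
def fixed_rate_frame_centers(n_samples, frame_shift=80):
    """Frame centers spaced by a constant shift.

    frame_shift=80 samples is an assumed value (5 ms at 16 kHz).
    """
    return list(range(0, n_samples, frame_shift))


def pitch_synchronous_frame_centers(n_samples, f0_at, sample_rate=16000,
                                    unvoiced_period=80):
    """Frame centers spaced by one local pitch period.

    f0_at is a hypothetical callback returning F0 in Hz at a sample index.
    Where F0 is not positive (treated here as unvoiced), an assumed fixed
    period is used instead.
    """
    centers = []
    pos = 0
    while pos < n_samples:
        centers.append(pos)
        f0 = f0_at(pos)
        if f0 > 0:
            period = max(1, round(sample_rate / f0))
        else:
            period = unvoiced_period
        pos += period
    return centers


# 50 ms of signal at 16 kHz
n = 800
fixed = fixed_rate_frame_centers(n)
low_pitch = pitch_synchronous_frame_centers(n, lambda i: 100.0)
high_pitch = pitch_synchronous_frame_centers(n, lambda i: 400.0)
print(len(fixed), len(low_pitch), len(high_pitch))
```

With a fixed rate, the number of frames depends only on duration. With pitch-synchronous placement, it depends on the speaker's F0, so the same 50 ms yields fewer frames for a low-pitched voice and more for a high-pitched one. Any pairing with frame-based linguistic features therefore needs some alignment step, which this sketch leaves out.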

Authors:
