Neutral TTS Female Voice Corpus in Brazilian Portuguese

Pedro H. L. Leite; Edmundo Hoyle; Álvaro Antelo; Luiz F. Kruszielski; Luiz W. P. Biscainho

doi:10.14209/sbrt.2023.1570917697

Event: XLI Simpósio Brasileiro de Telecomunicações e Processamento de Sinais (SBrT2023)

Keywords: Text-to-speech; Brazilian Portuguese; dataset; speech synthesis

Abstract

This paper introduces a new dataset designed to address the limited availability of high-quality, diverse, and representative datasets for training text-to-speech (TTS) models, specifically for female voices in Brazilian Portuguese. The dataset features a single female voice recorded in a professional, controlled environment with neutral emotion and comprises more than 20 hours of recordings. The goal is to facilitate transfer learning and enable the development of more natural-sounding, high-quality, and gender-balanced TTS systems. Alongside the dataset, gender-aware voice transfer experiments are performed to assess the impact of using gender-specific pretrained models for speech synthesis. The results show that same-gender voice transfer yields better speech similarity and intelligibility than cross-gender transfer, which underscores the importance of gender-aware training procedures and the need for gender-balanced data.
