Neural Vocoding for CycleGAN-Based Voice Conversion
Victor P da Costa, Ranniery Maia, Igor Quintanilha, Sergio Lima Netto, Luiz W. P. Biscainho

DOI: 10.14209/SBRT.2020.1570661673
Event: XXXVIII Simpósio Brasileiro de Telecomunicações e Processamento de Sinais (SBrT2020)
Keywords: Voice Conversion, Voice Synthesis, Generative Adversarial Networks
Abstract
We propose a voice conversion system that leverages recent developments in both voice synthesis and image morphing: a CycleGAN converts mel-spectrograms, and a neural vocoder synthesizes the converted signals. To evaluate how different vocoders perform in this task, we synthesize the converted mel-spectrograms using WaveNet, WaveRNN, and MelGAN vocoders. We compare their performance via listening tests and find that MelGAN and WaveRNN obtain comparable results, while WaveNet performs worse on converted speech.
