Neural Vocoding for CycleGAN-Based Voice Conversion
Victor P da Costa, Ranniery Maia, Igor Quintanilha, Sergio Lima Netto, Luiz W. P. Biscainho

DOI: 10.14209/SBRT.2020.1570661673
Event: XXXVIII Simpósio Brasileiro de Telecomunicações e Processamento de Sinais (SBrT2020)
Keywords: Voice Conversion, Voice Synthesis, Generative Adversarial Networks
We propose a voice conversion system that leverages recent developments in both voice synthesis and image morphing: a CycleGAN converts mel-spectrograms, and a neural vocoder synthesizes the converted signals. To evaluate how different vocoders perform in this task, we synthesize the converted mel-spectrograms with WaveNet, WaveRNN, and MelGAN vocoders. We compare their performance via listening tests and find that MelGAN and WaveRNN obtain comparable results, while WaveNet performs worse on converted speech.
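
As a rough illustration of the CycleGAN idea applied to mel-spectrograms, the sketch below computes an L1 cycle-consistency term of the kind used in the original CycleGAN formulation. It is not taken from the paper: the function name `cycle_consistency_loss`, the toy generators `toy_g_xy` and `toy_g_yx`, and the 80-bin by 64-frame spectrogram shape are all assumptions chosen for illustration, and the adversarial losses and the vocoder stage are omitted.

```python
import numpy as np


def cycle_consistency_loss(x, y, g_xy, g_yx):
    """L1 cycle-consistency term in the style of CycleGAN (illustrative only).

    x, y       : mel-spectrograms of shape (n_mels, n_frames) from the
                 source and target speakers.
    g_xy, g_yx : hypothetical generator callables mapping source -> target
                 and target -> source, respectively.
    """
    x_cycled = g_yx(g_xy(x))  # source -> target -> back to source
    y_cycled = g_xy(g_yx(y))  # target -> source -> back to target
    return np.mean(np.abs(x_cycled - x)) + np.mean(np.abs(y_cycled - y))


# Toy stand-ins for trained generators (assumption: real generators would be
# neural networks; these simple shifts are exact inverses of each other).
def toy_g_xy(mel):
    return mel + 1.0


def toy_g_yx(mel):
    return mel - 1.0


rng = np.random.default_rng(0)
x = rng.standard_normal((80, 64))  # assumed shape: 80 mel bins, 64 frames
y = rng.standard_normal((80, 64))

loss = cycle_consistency_loss(x, y, toy_g_xy, toy_g_yx)
print(f"cycle-consistency loss: {loss:.3e}")
```

Because the two toy generators invert each other, the loss here is close to zero up to floating-point rounding. In a full system, the output of the trained source-to-target generator would be passed to a neural vocoder such as WaveNet, WaveRNN, or MelGAN to produce the waveform.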
