Comparison of real-time multi-speaker neural vocoders on CPUs
[Abstract] Text-to-speech (TTS) and voice conversion are important speech processing technologies that have long been actively researched. In recent years, deep neural networks have greatly improved the performance of these technologies, and the naturalness of synthetic speech now approaches that of natural speech [1,2]. Notably, neural vocoders, which use neural networks to reconstruct speech waveforms from acoustic features, can synthesize higher-quality speech waveforms than conventional source-filter vocoders. Since the introduction of the WaveNet vocoder [3], various neural vocoders have been proposed, and they have contributed greatly to the development of neural speech synthesis.
[Subject classification] Acoustics and ultrasonics
[Keywords] Speech synthesis; Neural vocoder; HiFi-GAN; MWDLP; LPCNet