Karhila, Reima, Remes, Ulpu, Kurimo, Mikko (2013): HMM-Based Speech Synthesis Adaptation Using Noisy Data: Analysis and Evaluation Methods. In: Proceedings of ICASSP-13, 2013 (accepted to ICASSP 2013).
@inproceedings{Karhila_icassp13,
title = {HMM-Based Speech Synthesis Adaptation Using Noisy Data: Analysis and Evaluation Methods},
author = {Karhila, Reima and Remes, Ulpu and Kurimo, Mikko},
year = {2013},
date = {2013-01-14},
booktitle = {Proceedings of ICASSP-13},
journal = {ICASSP 13},
abstract = {This paper investigates the role of noise in speaker-adaptation of HMM-based text-to-speech (TTS) synthesis and presents a new evaluation procedure. Both a new listening test based on ITU-T recommendation 835 and a perceptually motivated objective measure, frequency-weighted segmental SNR, improve the evaluation of synthetic speech when noise is present. The evaluation of voices adapted with noisy data shows that the noise plays a relatively small but noticeable role in the quality of synthetic speech: Naturalness and speaker similarity are not affected in a significant way by the noise, but listeners prefer the voices trained from cleaner data. Noise removal, even when it degrades natural speech quality, improves the synthetic voice.},
note = {Accepted to ICASSP 2013},
keywords = {Adaptation, Evaluation, Feature extraction, Noise robustness, Speech synthesis}
}