Antti Suni, Tuomo Raitio, Dhananjaya Gowda, Reima Karhila, Matt Gibson, Oliver Watts (2014): The Simple4All entry to the Blizzard Challenge 2014. In: Proc. Blizzard Challenge 2014 Workshop, Singapore, 2014.
@inproceedings{Suni2014,
title = {The Simple4All entry to the Blizzard Challenge 2014},
author = {Antti Suni and Tuomo Raitio and Dhananjaya Gowda and Reima Karhila and Matt Gibson and Oliver Watts},
url = {http://consortium.simple4all.org/files/2014/10/suni14.pdf},
year = {2014},
date = {2014-09-19},
booktitle = {Proc. Blizzard Challenge 2014 Workshop},
address = {Singapore},
keywords = {Deep neural network, glottal flow pulse library, glottal inverse filtering, statistical parametric speech synthesis, unsupervised learning, vector space model}
}
Tuomo Raitio, Heng Lu, John Kane, Antti Suni, Martti Vainio, Simon King, Paavo Alku (2014): Voice source modelling using deep neural networks for statistical parametric speech synthesis. In: Proc. of the 22nd European Signal Processing Conference (EUSIPCO), Lisbon, Portugal, 2014.
@inproceedings{Raitio14b,
title = {Voice source modelling using deep neural networks for statistical parametric speech synthesis},
author = {Tuomo Raitio and Heng Lu and John Kane and Antti Suni and Martti Vainio and Simon King and Paavo Alku},
url = {http://consortium.simple4all.org/files/2014/10/raitio14b.pdf},
year = {2014},
date = {2014-09-01},
booktitle = {Proc. of the 22nd European Signal Processing Conference (EUSIPCO)},
address = {Lisbon, Portugal},
keywords = {Deep neural network, DNN, glottal flow, statistical parametric speech synthesis, voice source modelling}
}
Tuomo Raitio, Antti Suni, Martti Vainio, Paavo Alku (2013): Comparing Glottal-Flow-Excited Statistical Parametric Speech Synthesis Methods. In: Proc. ICASSP 2013, 2013.
@inproceedings{Raitio13a,
title = {Comparing Glottal-Flow-Excited Statistical Parametric Speech Synthesis Methods},
author = {Tuomo Raitio and Antti Suni and Martti Vainio and Paavo Alku},
url = {http://consortium.simple4all.org/files/2013/01/icassp_raitio_et_al.pdf},
year = {2013},
date = {2013-01-14},
booktitle = {Proc. ICASSP 2013},
abstract = {This paper studies the performance of glottal flow signal based excitation methods in statistical parametric speech synthesis. The current state of the art in excitation modeling is reviewed and three excitation methods are selected for experiments. Two of the methods are based on the principal component analysis (PCA) decomposition of estimated glottal flow pulses. While the first one uses only the mean of the pulses, the second method uses 12 principal components in addition to the mean signal for modeling the glottal flow waveform. The third method utilizes a glottal flow pulse library from which pulses are selected according to target and concatenation costs. Subjective listening tests are carried out to determine the quality and similarity of the synthetic speech of one male and one female speaker. The results show that the PCA-based methods are rated best both in quality and similarity, but adding more components does not yield any improvements.},
keywords = {excitation, glottal flow, principal component analysis, pulse library, statistical parametric speech synthesis}
}
Tuomo Raitio, Antti Suni, Martti Vainio, Paavo Alku (2013): Synthesis and Perception of Breathy, Normal, and Lombard Speech in the Presence of Noise. In: Special issue of Computer Speech and Language on 'The Listening Talker', 2013.
@article{Raitio13b,
title = {Synthesis and Perception of Breathy, Normal, and Lombard Speech in the Presence of Noise},
author = {Tuomo Raitio and Antti Suni and Martti Vainio and Paavo Alku},
url = {http://dx.doi.org/10.1016/j.csl.2013.03.003},
year = {2013},
date = {2013-01-14},
journal = {Special issue of Computer Speech and Language on 'The Listening Talker'},
abstract = {This paper studies the synthesis of speech on a wide vocal effort continuum and its perception in the presence of noise. Three types of speech are recorded and studied along the continuum: breathy, normal, and Lombard speech. Corresponding synthetic voices are created by training and adapting the statistical parametric speech synthesis system GlottHMM. Natural and synthetic speech along the continuum is assessed in listening tests that evaluate the intelligibility, quality, and suitability of speech in three different realistic multichannel noise conditions: silence, moderate street noise, and extreme street noise. The evaluation results are encouraging in showing that the synthesized voices with varying vocal effort are rated similarly to their natural counterparts both in terms of intelligibility and suitability.},
keywords = {Adaptation, Breathy speech, intelligibility, Lombard speech, statistical parametric speech synthesis, Vocal effort}
}
Antti Suni, Tuomo Raitio, Martti Vainio, Paavo Alku (2012): The GlottHMM Entry for Blizzard Challenge 2012: Hybrid Approach. In: Proc. of the Blizzard Challenge 2012 Workshop, 2012.
@inproceedings{suni2012blizzard,
title = {The GlottHMM Entry for Blizzard Challenge 2012: Hybrid Approach},
author = {Antti Suni and Tuomo Raitio and Martti Vainio and Paavo Alku},
year = {2012},
date = {2012-09-14},
booktitle = {Proc. of the Blizzard Challenge 2012 Workshop},
abstract = {This paper describes the GlottHMM speech synthesis system for Blizzard Challenge 2012. The aim of the GlottHMM system is to combine high-quality vocoding and detailed prosody modeling in order to produce expressive, high-quality synthetic speech. GlottHMM is based on statistical parametric speech synthesis, but it uses a glottal flow pulse library for generating the excitation signal. Thus, it can be regarded as a hybrid system using the pulses as concatenative units that are selected according to the statistically generated voice source feature trajectories. This year’s speech material was challenging, but despite that we were able to achieve a clean, intelligible voice with decent, above-average prosody characteristics.},
keywords = {glottal inverse filtering, glottal flow pulse library, hybrid, statistical parametric speech synthesis}
}
Tuomo Raitio, Antti Suni, Martti Vainio, Paavo Alku (2012): Wideband Parametric Speech Synthesis Using Warped Linear Prediction. In: Proc. Interspeech 2012, 2012, ISSN: 1990-9770.
@inproceedings{Raitio12,
title = {Wideband Parametric Speech Synthesis Using Warped Linear Prediction},
author = {Tuomo Raitio and Antti Suni and Martti Vainio and Paavo Alku},
url = {http://consortium.simple4all.org/files/2012/04/is2012.pdf},
issn = {1990-9770},
year = {2012},
date = {2012-09-09},
booktitle = {Proc. Interspeech 2012},
abstract = {This paper studies the use of warped linear prediction (WLP) for wideband parametric speech synthesis. As the sampling frequency is increased from the usual 16 kHz, linear frequency resolution of conventional linear prediction (LP) cannot efficiently model the speech spectrum. By using frequency warping that weights perceptually the most important formant information, spectral models with better accuracy and lower model orders can be utilized. In this work, WLP is embedded in a parametric speech synthesizer to efficiently create wideband synthetic speech. Experiments show that WLP-based wideband synthetic speech is rated better compared to narrowband speech and wideband LP-based speech.},
keywords = {statistical parametric speech synthesis, warped linear prediction, wide-band, WLP}
}