Romanian Broadcast News

This speech and text dataset was collected and processed with the available Simple4All tools. It demonstrates speaking style adaptation: the Romanian baseline synthetic voice, already built from the standard RSS corpus, is adapted towards the prosody and expressive style of the main presenter of the broadcast news.

The speaker diarization toolkit Dexter is used to automatically select the speech of the main presenter from the broadcast news (discriminating between music, noise, and speakers); the Voice Cloning Toolkit is then used for speaking style adaptation.

The dataset contains about 6 hours of Romanian broadcast news: speech, speech with overlapping noise, music, and overlapping speakers. Each recording is about 10 minutes long, with the text available for the main presenter. To evaluate speaker diarization performance, five of the news broadcasts are labeled at the speaker level, and the corresponding RTTM labels are provided. The speech of the main presenter is extracted automatically, yielding about 50 minutes of speech, which is labeled with the available text at the sentence level. This data is used to demonstrate the speaking style adaptation.

- the synthetic baseline voice: [sample 1][sample 2];
- the natural voice of the main presenter of the news: [sample 1][sample 2];
- the new adapted voice: [sample 1][sample 2].
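
For readers who want to inspect the speaker-level RTTM labels mentioned above, the following is a minimal sketch of how the total speaking time per speaker could be computed. It assumes the files follow the standard NIST RTTM layout for `SPEAKER` records (turn duration in the fifth field, speaker name in the eighth). The function name `speaker_durations`, the file ID `news_01`, and the speaker names `presenter` and `reporter_1` are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def speaker_durations(rttm_lines):
    """Sum turn durations (in seconds) per speaker from RTTM SPEAKER records."""
    totals = defaultdict(float)
    for line in rttm_lines:
        fields = line.split()
        # Skip blank lines and any record that is not a SPEAKER turn.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        duration = float(fields[4])  # fifth field: turn duration
        speaker = fields[7]          # eighth field: speaker name
        totals[speaker] += duration
    return dict(totals)


# Hypothetical records for illustration only; not actual dataset content.
sample = [
    "SPEAKER news_01 1 0.00 12.50 <NA> <NA> presenter <NA> <NA>",
    "SPEAKER news_01 1 12.50 4.25 <NA> <NA> reporter_1 <NA> <NA>",
    "SPEAKER news_01 1 16.75 7.50 <NA> <NA> presenter <NA> <NA>",
]

totals = speaker_durations(sample)
# Treat the speaker with the most speech as the likely main presenter.
main = max(totals, key=totals.get)
print(main, totals[main])
```

Picking the speaker with the largest total duration is only a simple heuristic for locating the main presenter in a labeled file; it is not the selection method used by Dexter.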

The provided speech and text datasets are licensed under a Creative Commons Attribution 3.0 Unported License.