The dataset contains speech recordings and transcripts of Romanian parliamentary speeches from various public meetings held between 2011 and 2014. The text is intended for modeling and detecting the political speech style (genre). The audio data may be used to create a synthetic, neutral political voice.
CONTENTS:
The speech material is labeled at the speaker level, and the text has been checked against the speech. The recordings were made at a sampling frequency of 44 kHz with 16 bits/sample. The corpus totals 21 hours of speech from 339 speakers across 1031 speech interventions. The audio contains some reverberation, which reflects realistic recording conditions in the parliamentary meeting room.
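Before processing, it can help to confirm that a recording matches the stated format. The sketch below reads a WAV header with Python's standard-library `wave` module. It assumes the files are uncompressed PCM WAV (which `wave` can open) and that "44 kHz" is a nominal figure, so the exact frame rate reported (for example 44100) should be taken from the file itself. The `AudioInfo` and `inspect_wav` names are hypothetical helpers, not part of the dataset.

```python
import wave
from dataclasses import dataclass


@dataclass
class AudioInfo:
    sample_rate: int      # frames per second as stored in the header
    bits_per_sample: int  # 8 * sample width in bytes (16 expected here)
    channels: int         # number of interleaved channels
    duration_s: float     # total duration in seconds


def inspect_wav(path: str) -> AudioInfo:
    """Read format information from a PCM WAV file header (hypothetical helper)."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        return AudioInfo(
            sample_rate=rate,
            bits_per_sample=8 * wf.getsampwidth(),
            channels=wf.getnchannels(),
            duration_s=wf.getnframes() / rate,
        )
```

For example, `inspect_wav("2012_03_05.wav")` (a hypothetical meeting date) returns the header values, and summing `duration_s` over all meeting files gives a rough check against the stated total, bearing in mind that whole-meeting recordings may also include non-speech portions.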
Each folder contains the following files:
* YYYY_MM_DD.wav (the audio recording of the meeting from DD/MM/YYYY);
* YYYY_MM_DD.txt (the text corresponding to the whole meeting);
* Label_Track.txt (the speaker-level annotation in the format: t_start t_stop Speaker_Id, where Speaker_Id is encoded as V1, V2, V3, …);
* Vi.txt and Vi-j.txt (the text of speaker “Vi” for their first and j-th intervention, respectively).
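The folder layout above can be read with a short script that pairs each labeled segment with its intervention text. This is a minimal sketch under several assumptions not stated in the description: the fields of Label_Track.txt are whitespace- or tab-separated, t_start and t_stop are in seconds, the j-th intervention of a speaker is counted in chronological order of the label track, and the text files are UTF-8 encoded. The `Segment`, `parse_label_track`, and `load_meeting` names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Segment:
    start: float       # t_start (assumed to be in seconds)
    stop: float        # t_stop (assumed to be in seconds)
    speaker: str       # Speaker_Id, e.g. "V1"
    intervention: int  # 1 for the speaker's first intervention, j for the j-th

    @property
    def text_file(self) -> str:
        # Naming rule from the description: "Vi.txt" for the first
        # intervention, "Vi-j.txt" for the j-th one.
        if self.intervention == 1:
            return f"{self.speaker}.txt"
        return f"{self.speaker}-{self.intervention}.txt"


def parse_label_track(lines):
    """Parse 't_start t_stop Speaker_Id' lines into Segment objects."""
    rows = []
    for line in lines:
        fields = line.split()  # splits on tabs or spaces
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        rows.append((float(fields[0]), float(fields[1]), fields[2]))
    rows.sort(key=lambda r: r[0])  # assumption: j counts in time order
    counts = defaultdict(int)
    segments = []
    for start, stop, speaker in rows:
        counts[speaker] += 1
        segments.append(Segment(start, stop, speaker, counts[speaker]))
    return segments


def load_meeting(folder):
    """Pair each labeled segment with its text (None if the file is missing)."""
    folder = Path(folder)
    with open(folder / "Label_Track.txt", encoding="utf-8") as f:
        segments = parse_label_track(f)
    pairs = []
    for seg in segments:
        txt = folder / seg.text_file
        text = txt.read_text(encoding="utf-8") if txt.exists() else None
        pairs.append((seg, text))
    return pairs
```

Returning `None` for a missing text file, rather than raising, makes it easy to spot segments whose text file does not follow the assumed naming or counting rule.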
SAMPLES:
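– Text (in Romanian): “… Doamnelor şi domnilor, Vă rog să luaţi loc în bănci, pentru a începe şedinţa comună a Camerei Deputaţilor şi Senatului.” (English: “… Ladies and gentlemen, please take your seats in the benches so that we can begin the joint session of the Chamber of Deputies and the Senate.”)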
– Audio: the corresponding .wav file.
LICENSE:
The provided speech and text datasets are licensed under a Creative Commons Attribution 3.0 Unported License.