
TOSCA-MP Speech Ground Truth

This multilingual dataset was created within the TOSCA-MP project as ground truth data for evaluating automatic transcription and spoken language translation technologies. It covers four languages and two video genres: television broadcast news and talk shows.
Besides segmentation, turn and speaker identification, and orthographic transcription, the audio signal was richly annotated at both the linguistic level (overlapped speech and foreign speech) and the acoustic level (e.g. background noise, applause and coughs, and music such as songs and jingles).
Orthographic transcriptions were generated by non-expert workers through crowdsourcing and revised by expert transcribers. Rich annotation was carried out by expert transcribers only.

Annotated and transcribed videos:

  • Flemish: 5h:51m (news), 6h:13m (talk shows)
  • English: 5h:07m (news only)
  • German: 4h:03m (news), 5h:02m (talk shows)
  • Italian: 3h:54m (news), 7h:21m (talk shows)
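As a quick sanity check on the amount of material, the per-language durations listed above can be tallied by genre. The following is a minimal sketch that simply re-adds the figures on this page; the helper names (to_minutes, fmt) are hypothetical and not part of any tooling shipped with the dataset.

```python
import re

# Durations as listed above, in this page's "Hh:MMm" notation.
DURATIONS = {
    "Flemish": {"news": "5h:51m", "talk shows": "6h:13m"},
    "English": {"news": "5h:07m"},
    "German": {"news": "4h:03m", "talk shows": "5h:02m"},
    "Italian": {"news": "3h:54m", "talk shows": "7h:21m"},
}


def to_minutes(text: str) -> int:
    """Convert a string such as '5h:51m' into a number of minutes."""
    match = re.fullmatch(r"(\d+)h:(\d+)m", text)
    if match is None:
        raise ValueError(f"unexpected duration format: {text!r}")
    hours, minutes = map(int, match.groups())
    return hours * 60 + minutes


def fmt(total: int) -> str:
    """Format a number of minutes back into the 'Hh:MMm' notation."""
    return f"{total // 60}h:{total % 60:02d}m"


# Sum minutes per genre across all languages.
per_genre = {}
for genres in DURATIONS.values():
    for genre, text in genres.items():
        per_genre[genre] = per_genre.get(genre, 0) + to_minutes(text)

grand_total = sum(per_genre.values())
for genre, minutes in per_genre.items():
    print(f"{genre}: {fmt(minutes)}")
print(f"total: {fmt(grand_total)}")
```

With the figures above, the script prints news: 18h:55m, talk shows: 18h:36m, and total: 37h:31m.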

Furthermore, a subset of the broadcast news data (around two hours, corresponding to about 20,000 words) was translated by professional translators in the following directions:

  • Flemish to English
  • English to Italian
  • German to English
  • German to Italian

The TOSCA-MP Speech Ground Truth is distributed under a Creative Commons Attribution 4.0 International license (CC BY 4.0). Due to copyright restrictions, only the generated ground truth is distributed here; the corresponding videos are available separately (links are provided in the ground truth documentation).

To obtain the data, please follow this link:

 

Publications or presentations containing results obtained through the use of TOSCA-MP Speech Ground Truth should cite the following reference:

R. Sprugnoli, G. Moretti, M. Fuoli, D. Giuliani, L. Bentivogli, E. Pianta, R. Gretter, F. Brugnara. 2013. "Comparing two methods for crowdsourcing speech transcription". In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8116-8120.

Contacts

For more information, please contact Diego Giuliani [ giuliani_at_fbk.eu ]
