What is the MODALITY Corpus?

The MODALITY corpus consists of over 30 hours of multimodal recordings. The database contains high-resolution, high-framerate stereoscopic video streams and audio signals obtained from a microphone array and a laptop microphone. Because every utterance is labelled, the corpus can be used to develop audio-visual speech recognition (AVSR) systems. Recordings made in noisy conditions can be used to test the robustness of speech recognition systems.

License

Distribution and use of this corpus are allowed under the following conditions:

The corpus is provided as is. The authors do not warrant that the corpus will be free from errors or suitable for any particular purpose.
The authors of the corpus are not responsible for any direct or indirect problems that use of the corpus may cause to the user.
The use of the corpus is limited to research and educational purposes only.
Any work (e.g. journal articles, technical reports, conference papers) resulting from the use of the MODALITY corpus must cite the following papers:

Czyzewski, A., Kostek, B., Bratoszewski, P. et al. J Intell Inf Syst (2017) 49: 167. https://doi.org/10.1007/s10844-016-0438-z

Jachimski, D. & Czyżewski, A. A comparative study of English viseme recognition methods and algorithms. Multimed Tools Appl (2018) 77: 16495. https://doi.org/10.1007/s11042-017-5217-5

Kawaler, M. & Czyżewski, A. Speech database including facial expressions recorded with the Face Motion Capture system. J Intell Inf Syst (2019) 53: 381. https://doi.org/10.1007/s10844-019-00547-y

Corpus Features

35 speakers: 17 native and 18 non-native speakers

Full HD / 100 FPS video capture

Different recording conditions: recordings in both clean and noisy conditions

Labeled material: hand-made label files serve as ground truth for AVSR algorithms

2.1 TB of high-quality audio-visual material

8 PCM audio streams gathered from a microphone array

Commands/sentences: separate commands and continuous sentences

Time-of-Flight camera recordings providing depth images for further analysis
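
To work with the eight microphone-array streams individually, the interleaved PCM samples have to be split by channel. The following is a minimal sketch, not the corpus authors' tooling. It assumes the audio is stored as a multichannel WAV file with 16-bit little-endian samples, which this page does not state. The function names and any file path passed to them are hypothetical.

```python
import struct
import wave
from typing import List


def deinterleave_pcm16(raw: bytes, channels: int = 8) -> List[List[int]]:
    """Split interleaved 16-bit little-endian PCM bytes into per-channel lists."""
    frame_size = 2 * channels  # 2 bytes per sample, one sample per channel
    if len(raw) % frame_size != 0:
        raise ValueError("byte count is not a whole number of frames")
    count = len(raw) // 2
    samples = struct.unpack("<%dh" % count, raw)
    # Sample i of channel ch sits at position i * channels + ch.
    return [list(samples[ch::channels]) for ch in range(channels)]


def read_wav_channels(path: str) -> List[List[int]]:
    """Read a WAV file (assumed 16-bit PCM) and return one sample list per channel."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        raw = wf.readframes(wf.getnframes())
        return deinterleave_pcm16(raw, wf.getnchannels())
```

If the actual files use a different container or sample width, only the unpacking format would need to change; the channel-slicing step stays the same.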

What do I need to get started?

Fast connection

A lot of disk space

VLC Media Player

Many ideas
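
Since the full corpus is 2.1 TB, it may be worth checking free disk space before starting a download. The sketch below uses only the Python standard library. It assumes decimal terabytes (2.1 × 10^12 bytes) and a 10% safety margin; both are illustrative choices, and the function name is hypothetical.

```python
import shutil

# 2.1 TB as stated in the corpus features, assuming decimal terabytes.
CORPUS_BYTES = 2_100_000_000_000


def has_room_for_corpus(path: str, required: int = CORPUS_BYTES,
                        margin: float = 0.1) -> bool:
    """Return True if the filesystem holding `path` has enough free space.

    `margin` adds headroom on top of `required` (0.1 means 10% extra).
    """
    free = shutil.disk_usage(path).free
    return free >= required * (1 + margin)


if __name__ == "__main__":
    # "." is a placeholder for the intended download directory.
    print("Enough space:", has_room_for_corpus("."))
```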