TU-Berlin: MPEG-7 Spoken Content Demonstrator

Technische Universität Berlin
Communication Systems Group
Project: MPEG-7-based Audio Annotation for the Archival of Digital Video


Introduction

This demonstration tool extracts an MPEG-7 SpokenContent description from an input speech signal.

The MPEG-7 SpokenContent Description Scheme (DS) is a standardized representation of the output of an Automatic Speech Recognition (ASR) system.

It consists of:

A header containing general information about the speech signal and the ASR system (notably the word or phone lexicon used by the recognizer and some phone confusion statistics).
A lattice, that is, a directed graph in which the different paths represent the different possible transcriptions. Each node in the graph represents a time point between the beginning and the end of the speech signal. A link between two nodes corresponds to a recognition hypothesis (e.g. a word or a phone).

This information is stored in a specific MPEG-7 XML format.
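
To make the lattice structure more concrete, the following Python sketch models it as a directed graph of time points joined by scored hypotheses. It is only an illustration under stated assumptions, not the demonstrator's implementation and not the MPEG-7 XML schema: the names Link, Lattice and best_path, the phone symbols, the time values and the probabilities are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Link:
    """One recognition hypothesis joining two nodes (hypothetical structure)."""
    target: int         # index of the node where the hypothesis ends
    label: str          # phone (or word) symbol
    probability: float  # score attached to the hypothesis


@dataclass
class Lattice:
    """Directed graph whose nodes are time points in the speech signal."""
    times: List[float]  # times[i] = time of node i, in seconds
    links: Dict[int, List[Link]] = field(default_factory=dict)

    def add_link(self, source, target, label, probability):
        # Links must point forward in time, so the graph has no cycles.
        if self.times[target] <= self.times[source]:
            raise ValueError("a link must end later than it starts")
        self.links.setdefault(source, []).append(Link(target, label, probability))

    def paths(self, node=0):
        """Yield (labels, probability) for every path from `node` to the last node."""
        if node == len(self.times) - 1:
            yield [], 1.0
            return
        for link in self.links.get(node, []):
            for labels, prob in self.paths(link.target):
                yield [link.label] + labels, link.probability * prob


def best_path(lattice):
    """Return the labels of the most probable path through the lattice."""
    return max(lattice.paths(), key=lambda item: item[1])[0]


# Toy example (made-up values): three time points, two competing phones per segment.
lattice = Lattice(times=[0.0, 0.12, 0.30])
lattice.add_link(0, 1, "h", 0.6)
lattice.add_link(0, 1, "f", 0.4)
lattice.add_link(1, 2, "a:", 0.9)
lattice.add_link(1, 2, "o:", 0.1)
transcriptions = list(lattice.paths())
```

Each element of transcriptions is one possible transcription together with the product of its link scores. Enumerating every path is only practical for a toy lattice like this one; a real lattice would normally be searched with dynamic programming instead.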

This demonstration is based on a phone recognizer using a lexicon of 45 German phonetic units (including silence). Because no word models are defined here, the resulting lattices contain only phone hypotheses. The extracted SpokenContent descriptions can be used in different types of applications, especially spoken document retrieval (SDR).

Steps

Upload an audio file in WAV or MP3 format.
Start the SpokenContent extraction process.
Download the resulting MPEG-7 XML SpokenContent description.

Since Apr 2004. Last changes: Mar 2012.