Technische Universität Berlin
Communication Systems Group
Project: MPEG-7-based Audio Annotation for the Archival of Digital Video
Spoken Content Demonstrator
This demonstration tool extracts an MPEG-7 SpokenContent description from
an input speech signal.
The MPEG-7 SpokenContent Description Scheme (DS) is a standardized
representation of the output of an Automatic Speech Recognition (ASR)
system. This information is stored in a specific MPEG-7 XML format.
It consists of:
- A header containing general information about the speech signal and
the ASR system, notably the word or phone lexicon used by the recognizer
and phone confusion statistics.
- A lattice: a directed graph whose paths represent the possible
transcriptions. Each node in the graph represents a time point between
the beginning and the end of the speech signal. A link between two nodes
corresponds to a recognition hypothesis (e.g. a word or a phone).
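The lattice structure described above can be illustrated with a minimal Python sketch. It is not the MPEG-7 XML schema and not the demonstrator's code: the class names `Link` and `Lattice`, the `prob` field, the phone labels and the time values are all hypothetical and chosen only for illustration. It shows nodes as time points, links as phone hypotheses, and each path from the first to the last node as one possible transcription.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Link:
    start: int    # index of the source node
    end: int      # index of the target node
    label: str    # hypothesised phone (or word)
    prob: float   # assumed link probability, for illustration only


@dataclass
class Lattice:
    node_times: List[float]                 # time point of each node, in seconds
    links: List[Link] = field(default_factory=list)

    def paths(self, node: int = 0, prefix: Tuple[str, ...] = (),
              score: float = 1.0) -> Iterator[Tuple[Tuple[str, ...], float]]:
        """Yield (labels, score) for every path from `node` to the last node."""
        if node == len(self.node_times) - 1:
            yield prefix, score
            return
        for link in self.links:
            if link.start == node:
                yield from self.paths(link.end, prefix + (link.label,),
                                      score * link.prob)


# Toy lattice with two competing phone hypotheses between nodes 1 and 2.
lattice = Lattice(
    node_times=[0.0, 0.12, 0.25, 0.40],
    links=[
        Link(0, 1, "h", 1.0),
        Link(1, 2, "a", 0.75),
        Link(1, 2, "o", 0.25),
        Link(2, 3, "t", 1.0),
    ],
)

# All transcriptions, most probable first.
transcriptions = sorted(lattice.paths(), key=lambda p: -p[1])
for labels, score in transcriptions:
    print(" ".join(labels), score)
```

Enumerating every path is only practical for toy lattices like this one; the number of paths grows quickly with the number of competing links.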
This demonstration is based on a phone recognizer using a lexicon of 45
German phonetic units (including silence). Since we do not define any word
model here, the resulting lattices only contain phone hypotheses.
The extracted SpokenContent DSs can be used for different types of
applications, especially for spoken document retrieval (SDR).
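As a rough illustration of how phone-level descriptions could support SDR, the sketch below ranks documents by the overlap of phone trigrams between a query and each document. This is an assumed, generic matching technique, not the retrieval method of this project; the function names `ngrams` and `overlap_score`, the document names and the phone sequences are hypothetical, and each document is reduced to a single phone sequence rather than a full lattice.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def ngrams(phones: Sequence[str], n: int = 3) -> List[Tuple[str, ...]]:
    """Return all contiguous phone n-grams of a sequence."""
    return [tuple(phones[i:i + n]) for i in range(len(phones) - n + 1)]


def overlap_score(query: Sequence[str], doc: Sequence[str], n: int = 3) -> float:
    """Fraction of the query's n-grams that also occur in the document."""
    q = Counter(ngrams(query, n))
    d = Counter(ngrams(doc, n))
    if not q:
        return 0.0
    shared = sum(min(count, d[gram]) for gram, count in q.items())
    return shared / sum(q.values())


# Hypothetical phone transcriptions, one per spoken document.
documents: Dict[str, List[str]] = {
    "doc1": ["b", "E", "r", "l", "i:", "n"],
    "doc2": ["m", "Y", "n", "C", "@", "n"],
}
query = ["b", "E", "r", "l"]

# Rank documents by decreasing overlap with the query.
ranking = sorted(
    ((name, overlap_score(query, phones)) for name, phones in documents.items()),
    key=lambda item: -item[1],
)
print(ranking)
```

Matching on short phone sequences rather than words is one way to tolerate recognition errors and out-of-vocabulary query terms, which is relevant when, as here, the lattices contain only phone hypotheses.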
The demonstrator works in three steps:
- Upload an audio file in WAV or MP3 format.
- Start the SpokenContent extraction process.
- Download the resulting MPEG-7 XML SpokenContent description.
>> Since Apr 2004, Last changes: Mar 2012