by Martha Larson, Thomas Beckers and Volker Schlögell
The archives departments of radio broadcasters currently face two significant challenges: how to store rapidly increasing amounts of radio content, and how to satisfy the rising demand for easy retrieval of audio clips that can be recycled into new programs. A pilot project demonstrates that digital audio processing techniques have the potential to provide much-needed support.
Radio broadcasters rely on highly specialized staff to archive broadcast content and respond to requests from journalists and editors for audio content on certain topics. As radio expands rapidly into the digital world, the amount of radio content produced and the demand for a convenient way to access this content for recycling have been growing at a rate that threatens to overwhelm archives departments. The pilot project Audiomining is being undertaken in Germany by Westdeutscher Rundfunk (WDR) and Deutsche Welle (DW) in cooperation with the Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS). It has developed an audio archive prototype which demonstrates that automatic audio processing methods have clear and concrete potential to provide critical support for archivists, journalists and editors in the face of these challenges.
Currently, many radio broadcasters maintain extensive databases containing annotations of analogue radio recordings, painstakingly compiled by the archive staff. When the archives department receives a request from a journalist or editor, the metadata in these databases is searched and the corresponding analogue recording can be located in storage. Information concerning the recorded content that is not noted in the annotations is effectively 'lost' in the archive, since it cannot be retrieved. As radio broadcasters move towards completely digital workflows, it becomes possible to use automatically generated metadata to supplement the human-produced annotations.
The Audiomining prototype system provides both an indexing interface - which allows archivists to load new radio content into the system for processing - and a search interface. The search interface allows archivists not only to retrieve programs from the archive using titles and production dates, but also to type in keywords, which are then searched for in speech recognition transcripts. This option means that the content of radio broadcasts is directly searchable. The search interface returns a hit list, and individual hits can be opened with a simple click in the graphic audio browser. The audio browser displays a radio program as a series of cuts corresponding to segments of the program containing music or speech. Those containing speech are further divided into segments spoken by the individual speakers, who are assigned speaker index numbers. The graphic audio browser displays keywords that have been found in the radio program at their relative positions, and it is possible to click on keywords and jump into the audio at the exact point when the keyword is spoken.
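The keyword search described above can be illustrated with a minimal sketch. It assumes that the speech recognizer outputs one time-stamped record per word; the `Word` record, the `find_keyword` function and the sample transcript are hypothetical names and data chosen for illustration, not part of the Audiomining prototype.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str      # recognized word
    start: float   # seconds from the start of the program
    speaker: int   # speaker index of the segment containing the word


def find_keyword(transcript, keyword):
    """Return (start_time, speaker_index) for each occurrence of keyword."""
    needle = keyword.casefold()
    return [(w.start, w.speaker) for w in transcript if w.text.casefold() == needle]


# Hypothetical transcript fragment; the words and times are invented.
transcript = [
    Word("the", 12.0, 1),
    Word("election", 12.3, 1),
    Word("results", 12.9, 1),
    Word("this", 47.5, 2),
    Word("election", 47.8, 2),
]

hits = find_keyword(transcript, "Election")
for start, speaker in hits:
    print(f"speaker {speaker} at {start:.1f} s")
```

Each hit carries the time offset an audio browser would need in order to start playback at the point where the keyword is spoken.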
The interfaces of the Audiomining system were developed in close cooperation with archivists from WDR and DW. The project blended tried-and-true techniques used by the archives departments with new digital audio technology in order to create a concept for a new integrated workflow, which would provide comfortable and intuitive support for archivists for both annotation and retrieval of radio content. Archivists feel that the structured browsing offered by the graphic audio interface will allow them to listen to radio programs in a targeted way, using their annotation time to concentrate on adding high-level semantic labels to targeted radio segments. The keyword search also has clear potential to help archivists locate sections of radio broadcasts, in particular interviews, that are relevant to user requests.
The indexing module of the Audiomining stand-alone prototype produces metadata in MPEG-7 format. First, it uses audio segmentation, based on the well-known Bayesian Information Criterion, to determine boundaries at which the quality of the audio changes (for example at a speaker turn). It then applies a classifier that separates speech from non-speech, which is generally music. In the next step, it groups all the speech segments into classes that are acoustically similar. These classes correspond to speakers and are assigned a speaker index. Finally, the speech segments are sent to the speech recognizer for the generation of speech recognition transcripts on which the keyword search is carried out.
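The segmentation step can be sketched as follows. This is a simplified illustration of change detection with the Bayesian Information Criterion, not the prototype's implementation: it assumes a single one-dimensional feature per frame and models each segment with one Gaussian, whereas practical systems typically use multi-dimensional acoustic features. The function names, the `margin` and `lam` parameters and the toy signal are assumptions made for this example.

```python
import math


def log_var(xs):
    """Log of the maximum-likelihood variance of xs (floored to avoid log(0))."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return math.log(max(var, 1e-12))


def delta_bic(xs, i, lam=1.0):
    """BIC gain of modelling xs[:i] and xs[i:] separately instead of jointly.

    For one-dimensional Gaussians the penalty 0.5 * (d + d(d+1)/2) * log(n)
    reduces to log(n), since d = 1. A positive value favours a boundary at i.
    """
    n = len(xs)
    gain = 0.5 * (n * log_var(xs) - i * log_var(xs[:i]) - (n - i) * log_var(xs[i:]))
    return gain - lam * math.log(n)


def find_change_point(xs, margin=10, lam=1.0):
    """Return the index with the largest positive delta BIC, or None."""
    best_i, best_score = None, 0.0
    for i in range(margin, len(xs) - margin):
        score = delta_bic(xs, i, lam)
        if score > best_score:
            best_i, best_score = i, score
    return best_i


# Toy signal: 100 frames around 0 followed by 100 frames around 5.
first = [(-1.0) ** k for k in range(100)]
second = [5.0 + (-1.0) ** k for k in range(100)]
boundary = find_change_point(first + second)
print(boundary)

# A homogeneous signal should yield no boundary.
no_boundary = find_change_point([(-1.0) ** k for k in range(200)])
print(no_boundary)
```

In a full pipeline, the detected boundaries would delimit the segments that are then passed to the speech/non-speech classifier and grouped by acoustic similarity.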
The Audiomining project is in its final evaluation stage and has accomplished its goal of demonstrating that digital audio processing technology can be smoothly incorporated into archivists' workflows. Automatic systems will not replace human archivists in the foreseeable future. However, automatic structuring and audio keyword search promise to provide significant relief for radio broadcasters, who are inundated with audio content and sorely in need of techniques that make spoken audio as easily accessible as text.
Please contact:
Martha Larson, Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Germany
Tel: +49 2241 14 1980
E-mail: martha.larson@iais.fraunhofer.de
Thomas Beckers, WDR Dokumentation & Archive, Germany
Tel: +49 221 220 4799
E-mail: thomas.beckers@wdr.de
Volker Schlögell, Deutsche Welle Archive-Bibliothek-Dokumentation, Germany
Tel: +49 228 429 4368
E-mail: volker.schloegell@dw-world.de