Grammar Enables Effective Multimedia Search Queries

by Fedde van der Lijn and Menzo Windhouwer

Retrieving files from a multimedia database is like finding a book in a library: without a catalogue of keywords you could search for ages. However, generating and updating such a catalogue is almost as difficult. CWI introduces so-called feature grammar systems to facilitate these tasks.

Multimedia is everywhere. Libraries and museums digitize their collections and make them available to interested parties. Cheap disk space has brought the storage of large amounts of multimedia within everyone’s reach. At the same time, however, it has complicated the retrieval of objects.

With the advent of digital cameras, DVDs and MP3s it has become very easy to compile large multimedia libraries. At the same time, it has become increasingly difficult to effectively search collections that contain a wide range of media types like text, images, movies and audio.

Not only have databases become larger, they also contain more types of media. While most people are able to find a text containing certain keywords, anyone who has ever used Google Image Search to look for a specific picture knows that finding what you need can be far from easy. The majority of search engines can only interpret textual data. They cannot ‘see’ what is depicted in an image or ‘hear’ what is on an MP3.

Annotation
One way of dealing with this problem is to annotate media objects in advance. Annotations describe particular features of the stored media objects and can be used to guide semantic search queries. For example, when MP3s are annotated for genre and background information on the performing artist, search queries like ‘find me all blues songs played by guitarists from Mississippi’ can give meaningful results.
Annotation can be done manually, but for large multimedia collections this quickly becomes impossible. It is therefore necessary to turn to automatic annotation using extraction algorithms. These algorithms can perform easy tasks, such as finding the length of an MP3, or complicated ones, like detecting human faces in images. Designing clever extraction algorithms is a necessary condition for an automatic annotation system.
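
The blues query mentioned above can be made concrete with a small sketch. It assumes annotations are stored as simple key-value pairs per object; the field names (`genre`, `instrument`, `artist_origin`), the placeholder records and the `find` helper are all hypothetical and only illustrate how annotations turn a semantic question into a plain lookup.

```python
# Hypothetical annotated collection; titles and values are placeholders.
SONGS = [
    {"title": "Track A", "genre": "blues", "instrument": "guitar", "artist_origin": "Mississippi"},
    {"title": "Track B", "genre": "blues", "instrument": "piano", "artist_origin": "Mississippi"},
    {"title": "Track C", "genre": "jazz", "instrument": "guitar", "artist_origin": "Louisiana"},
]


def find(songs, **wanted):
    """Return the titles of all songs whose annotations match every wanted value."""
    return [
        song["title"]
        for song in songs
        if all(song.get(key) == value for key, value in wanted.items())
    ]


# 'Find me all blues songs played by guitarists from Mississippi'
matches = find(SONGS, genre="blues", instrument="guitar", artist_origin="Mississippi")
```

Without the annotations, the same question would require interpreting the audio itself at query time.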

Context Dependency
Yet high-quality extraction algorithms alone are not enough. Equally important is a system to coordinate the extraction of annotations. The main problem is that annotations depend on context, and in practice this means they depend on each other. This has consequences for the way extraction algorithms should be used. For example, feeding charts or logos to a face extractor would be a waste of system resources and time. The annotation system makes sure the extractor is only applied to images with a high chance of containing faces, like photos or drawings.
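
A minimal sketch of this kind of coordination is given below. It is not Acoi's implementation: the `classify_image` and `detect_faces` functions are hypothetical stand-ins for a cheap image classifier and an expensive face extractor, and images are represented by plain dictionaries for the sake of the example.

```python
# Image kinds for which running the face extractor is considered worthwhile.
FACE_CANDIDATES = {"photo", "drawing"}


def classify_image(image):
    """Hypothetical cheap extractor: returns 'photo', 'drawing', 'chart' or 'logo'."""
    return image["kind"]


def detect_faces(image):
    """Hypothetical expensive extractor: returns the number of faces found."""
    return image.get("faces", 0)


def annotate(image):
    """Run the cheap extractor first and call the expensive one only in a suitable context."""
    annotations = {"kind": classify_image(image)}
    if annotations["kind"] in FACE_CANDIDATES:
        annotations["faces"] = detect_faces(image)
    return annotations
```

For a chart, `annotate` stops after the classification step, so no time is spent on face detection.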

A lack of context also complicates incremental maintenance. Without knowledge of context and interdependencies, the entire annotation process must be rerun every time an extraction algorithm is added or replaced. The annotation system finds dependencies and determines exactly which annotations should be updated and which can be re-used.
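
The idea can be sketched with an explicit dependency map. The annotation names and the `DEPENDS_ON` table below are hypothetical, and the breadth-first traversal is just one straightforward way to find everything downstream of a changed extractor; the article does not describe how Acoi does this internally.

```python
from collections import deque

# Hypothetical dependency map: each annotation lists the annotations it is derived from.
DEPENDS_ON = {
    "mime_type": [],
    "color_histogram": ["mime_type"],
    "is_photo": ["color_histogram"],
    "faces": ["is_photo"],
    "duration": ["mime_type"],
}


def stale_annotations(changed, depends_on):
    """Return the changed annotation plus every annotation that depends on it."""
    # Invert the map so that each annotation knows its direct dependents.
    dependents = {name: [] for name in depends_on}
    for name, sources in depends_on.items():
        for source in sources:
            dependents[source].append(name)

    # Breadth-first walk over the dependents of the changed annotation.
    stale = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in dependents[current]:
            if child not in stale:
                stale.add(child)
                queue.append(child)
    return stale


# Replacing the colour histogram extractor invalidates only part of the index.
to_redo = stale_annotations("color_histogram", DEPENDS_ON)
reusable = set(DEPENDS_ON) - to_redo
```

Here only `color_histogram`, `is_photo` and `faces` need to be recomputed; `mime_type` and `duration` can be re-used as they are.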

Context dependency of annotations can also cause extraction algorithms to give ambiguous answers, especially when dealing with complex features. This can result in outputs like ‘this image either contains a human or a pig’. The annotation system should be able to handle these ambiguities.

Valid Sentences
CWI solves these difficulties by combining database technology with ideas from formal language theory to form the Acoi annotation management system. Its basis is the theory of feature grammar systems. Such a system not only describes the annotations themselves, but also their dependencies and contexts. Feature grammar can be compared to grammar in natural language. Grammar determines which word classes can be combined and in what order to form a valid sentence. A feature grammar system does the same for annotations and extraction algorithms. It determines what extraction algorithms must be called to form a valid annotation ‘sentence’.
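
The analogy with grammar can be illustrated with a toy example. The rules below are hypothetical and do not use Acoi's actual feature grammar notation: each rule rewrites a symbol into the symbols that make it up, and any symbol without a rule is treated as a terminal whose value an extraction algorithm must supply.

```python
# Hypothetical toy grammar for annotating an image.
RULES = {
    "image": ["header", "content"],
    "header": ["mime_type"],
    "content": ["color_histogram", "kind"],
}


def extractor_order(symbol, rules):
    """Expand a symbol depth-first, left to right, and list the terminals reached.

    The resulting list is the order in which extractors would have to run
    to produce a complete annotation 'sentence' for the start symbol.
    """
    if symbol not in rules:
        return [symbol]  # terminal: supplied by an extraction algorithm
    order = []
    for part in rules[symbol]:
        order.extend(extractor_order(part, rules))
    return order


sentence = extractor_order("image", RULES)
```

In this sketch the sentence for `image` consists of `mime_type`, `color_histogram` and `kind`, in that order.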

Since the feature grammar system also stores each annotation’s place in the network of interdependencies, incremental maintenance is possible. When updating the database, sentences can be reinterpreted to determine which extractions must be redone. Furthermore, techniques from formal language theory could be modified and used in the annotation system. Resolving ambiguities, for example, is a classic problem in this branch of computer science.

Case Studies
Acoi has proven its capabilities in a variety of case studies. Together with a number of basic extraction algorithms it was used to create an annotation index for a collection of Web pages. Furthermore, it was used in combination with a presentation generator to unlock the digitized collection of the Rijksmuseum Amsterdam to the public. The generator, also developed at CWI, uses the annotations to automatically compose a semantically structured multimedia presentation on a user-defined subject.

CWI’s feature grammar system is unique. Other annotation systems have been developed, but all lack the explicit storage of annotation context. As a result, Acoi is the only system that elegantly handles ambiguities and allows for incremental maintenance. In the near future, Acoi will be used in the MultimediaN project in which CWI participates.

Links:
http://www.cwi.nl/~acoi
http://www.windhouwer.nl/menzo/professional/index.html
http://www.cwi.nl/ins1

Please contact:
Menzo Windhouwer,
University of Amsterdam, The Netherlands
Tel: +31 20 525 3104
E-mail: M.A.Windhouwer@uva.nl

Martin Kersten, CWI, The Netherlands
Tel: +31 20 592 4066
E-mail: Martin.Kersten@cwi.nl