ERCIM News No.35 - October 1998
SARI - A System for Semantical Information Retrieval
by Kuldar Taveter
TILA (Tools for Information Retrieval and Organization) is a project of the Finnish National Multimedia Research Programme. Its goal is to design and develop an agent-based system for the retrieval and organization of heterogeneous information that can take different forms and reside in different locations. The SARI (Software Agents for Retrieval of Information) system is intended to act as a broker between human users or other computerized systems (i.e. applications) needing information at one end, and heterogeneous information sources with different search engines at the other.
SARI's architecture reflects the system's role as a broker between its users and information sources. The figure depicts SARI's agents of the following types:
- Application Agents represent the users (humans or other computerized systems) to the SARI system. They send agent messages containing information retrieval requests to Control Agents.
- Search Agents mediate information sources. They compile queries coming from Control Agents into the query languages of their information sources, and send the results back to the Control Agents.
- Control Agents act as brokers in the SARI system. Each Control Agent receives agent messages containing information retrieval requests from Application Agents, decides to which Search Agents it forwards the requests, sends messages containing the retrieval requests to the appropriate Search Agents, receives messages containing search results from the Search Agents, combines them into information retrieval results, and sends the retrieval results back to the Application Agents.
- The Ontology Agent contains metadata in the form of ontologies that describe the conceptual structure of the information present in the information sources used by SARI.
In addition, there are Content Provider Agents that represent content providers to the SARI system. Content providers are organizations or individuals who own one or more information sources that are accessible to the SARI system. A Content Provider Agent takes care of, for example, conveying metadata about the information in its provider's information sources to SARI.
Control Agents form the heart of SARI. They base their brokering decisions on the user information held in user profiles and on the metadata, held in ontologies, about the information to be retrieved. In principle, Control Agents can form federations with each other, but the present pilot version of SARI has just one Control Agent.
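The brokering flow described above can be sketched in a few lines of Python. This is only an illustration, not SARI's implementation: the class names, the `allowed_sources` field standing in for a user profile, and the per-agent `concepts` set standing in for ontology metadata are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class SearchAgent:
    """Hypothetical stand-in for a Search Agent wrapping one information source."""
    name: str
    concepts: Set[str]                  # concepts the source covers (assumed metadata)
    search: Callable[[str], List[str]]  # runs a query against the source


@dataclass
class UserProfile:
    """Hypothetical user profile: the sources this user may consult."""
    allowed_sources: Set[str]


@dataclass
class ControlAgent:
    """Hypothetical broker: picks Search Agents, forwards, combines results."""
    search_agents: List[SearchAgent] = field(default_factory=list)

    def handle(self, profile: UserProfile, concept: str, query: str) -> List[str]:
        # Brokering decision: use only sources that cover the concept
        # and that the user profile permits.
        chosen = [
            agent for agent in self.search_agents
            if concept in agent.concepts and agent.name in profile.allowed_sources
        ]
        # Forward the request to each chosen agent and combine the results.
        combined: List[str] = []
        for agent in chosen:
            combined.extend(agent.search(query))
        return combined


trade = SearchAgent("trade_db", {"Commodity"}, lambda q: [f"trade_db:{q}"])
web = SearchAgent("web_index", {"Commodity", "Product"}, lambda q: [f"web_index:{q}"])
broker = ControlAgent([trade, web])
profile = UserProfile(allowed_sources={"trade_db", "web_index"})
results = broker.handle(profile, "Commodity", "timber")
```

In this sketch a request about Commodity reaches both sources, while a request about Product reaches only the second one.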
The content of any information retrieval request originating at an Application Agent is translated into the internal query language SAL (SAri query Language) before it is forwarded to the Control Agent. Each Search Agent then translates the query into the query language of its information source. In this way, for n applications and m information sources, only n+m compilers need to be built, rather than the n×m that direct translation between every application and every source would require.
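The saving from a common intermediate language can be illustrated with a small Python sketch. SAL's actual syntax is not given here, so a plain dictionary stands in for a SAL query; the request formats, the `items` table and all function names are hypothetical.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode

# Stand-in for a SAL query (the real SAL syntax is not shown in the article).
Query = Dict[str, str]


# n = 2 application-side translators: request format -> intermediate query.
def from_web_form(form: Dict[str, str]) -> Query:
    return {"concept": form["topic"], "keyword": form["q"]}


def from_command_line(text: str) -> Query:
    concept, keyword = text.split(":", 1)
    return {"concept": concept.strip(), "keyword": keyword.strip()}


# m = 3 source-side translators: intermediate query -> source query language.
def to_sql(query: Query) -> Tuple[str, Tuple[str, str]]:
    # Parameterized SQL against a hypothetical "items" table.
    return ("SELECT * FROM items WHERE concept = ? AND name LIKE ?",
            (query["concept"], f"%{query['keyword']}%"))


def to_keywords(query: Query) -> str:
    return f"{query['concept']} {query['keyword']}"


def to_url_params(query: Query) -> str:
    return urlencode(query)


FRONT_ENDS = {"web_form": from_web_form, "command_line": from_command_line}
BACK_ENDS = {"sql": to_sql, "keywords": to_keywords, "url": to_url_params}


def translate(application: str, source: str, request):
    """Any application can reach any source through the intermediate form."""
    return BACK_ENDS[source](FRONT_ENDS[application](request))


built = len(FRONT_ENDS) + len(BACK_ENDS)     # translators actually written: n + m
pairwise = len(FRONT_ENDS) * len(BACK_ENDS)  # direct translators otherwise needed: n * m
```

With two request formats and three sources, five translators cover all six application/source combinations; the gap widens as either side grows.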
The conceptual structure of the information contained in the information sources available to SARI is described by ontologies. An ontology is a description of the concepts and inter-concept relationships of some problem domain. The ontologies for relational databases used by SARI are derived from their schemas. An ontology can also be a classification on which the information in an information source is based. One example is the classification underlying Ultika, an APL database used by SARI that contains statistical information about Finnish foreign trade. Since SARI includes an implementation of the Resource Description Framework (RDF) proposed by the World Wide Web Consortium (W3C), the ontologies describing Web resources are specified for SARI as RDF schemas and descriptions. Ontologies can be browsed graphically in SARI.
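As a rough illustration of deriving an ontology from a relational schema, the following Python sketch reads tables as concepts and foreign keys as inter-concept relationships. How SARI actually performs this derivation is not described here; the mapping rule, the use of SQLite and the example tables are assumptions.

```python
import sqlite3


def ontology_from_schema(conn: sqlite3.Connection):
    """Return (concepts, relationships) read from an SQLite schema.

    Assumed mapping: each table becomes a concept with its columns as
    attributes; each foreign key becomes a relationship between concepts.
    """
    concepts = {}
    relationships = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        concepts[table] = [
            col[1] for col in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            relationships.append((table, fk[3], fk[2]))
    return concepts, relationships


# Hypothetical schema, loosely inspired by foreign trade statistics.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (code TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE shipment (
        id INTEGER PRIMARY KEY,
        destination TEXT REFERENCES country(code),
        value REAL
    );
""")
concepts, relationships = ontology_from_schema(conn)
```

Here `relationships` records that a shipment refers to a country through its `destination` column.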
One of the most important problems to be solved in semantical information retrieval from heterogeneous sources is reconciling the different conceptualizations of the world represented by different information sources. In SARI the concepts of different ontologies are linked to each other by means of the notions of viewpoint and bridge. Ontologies interlinked in this way form an ontological structure that can be viewed from different perspectives. For example, there is a bridge between the concepts Commodity and Product, which are the root classes of the classifications under the foreign trade and manufacturing viewpoints, respectively.
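A minimal Python sketch of the viewpoint and bridge notions follows. It assumes that a bridge is a symmetric link between two concepts and that each concept belongs to exactly one viewpoint; neither assumption is confirmed by the description above, and the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class Concept:
    viewpoint: str  # e.g. "foreign trade"
    name: str       # e.g. "Commodity"


class OntologyLinks:
    """Hypothetical store of bridges between concepts of different ontologies."""

    def __init__(self) -> None:
        self._bridges: Dict[Concept, Set[Concept]] = {}

    def add_bridge(self, a: Concept, b: Concept) -> None:
        # Assumption: bridges are symmetric, so record both directions.
        self._bridges.setdefault(a, set()).add(b)
        self._bridges.setdefault(b, set()).add(a)

    def view_from(self, concept: Concept, viewpoint: str) -> Set[Concept]:
        """Return the concepts bridged to `concept` under another viewpoint."""
        return {
            other for other in self._bridges.get(concept, set())
            if other.viewpoint == viewpoint
        }


commodity = Concept("foreign trade", "Commodity")
product = Concept("manufacturing", "Product")
links = OntologyLinks()
links.add_bridge(commodity, product)
```

A query phrased in terms of Commodity can then be restated under the manufacturing viewpoint by looking up `links.view_from(commodity, "manufacturing")`.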
Future goals for SARI include semiautomatic formation of bridges between the concepts of different ontologies, and semiautomatic generation of RDF metadata from Web resources.
The SARI system is being developed in Finland jointly by VTT Information Technology, Tampere University of Technology, and Tampere University. The project started in March 1996 and will continue until March 1999.
Please contact:
Kuldar Taveter - VTT
Tel: +358 9 456 6044
E-mail: Kuldar.Taveter@vtt.fi