by Lothar Rostek, Dietrich Fischer and Wiebke Möhr
Efficient publishing of large-scale reference works should be based on a central knowledge resource that can be updated continually and extended while preserving consistency. Print and hypermedia publications, possibly individualized ones, could be drawn from such a resource. If the publisher's knowledge base is sufficiently fine-grained in its structure, it allows for the automatic generation of graphics and text representing its contents. Current publication systems do not yet meet these demands, the latter in particular, and they will remain long-term research goals.
A GMD team set out to develop a model publication environment that makes it possible to build and maintain such a publisher's knowledge base and that demonstrates the feasibility of innovative publication processes and products. As a concrete reference-work application, we use the Dictionary of Art, which Macmillan Publishers is to publish as a 34-volume print edition in 1995. More than 6,000 authors and 60 editors have been involved in its conventional publication process over a total period of 15 years.
Within the RACE II Europublishing Project 2042, the Dictionary of Art material created during the conventional production process is exploited to build up a knowledge base. To create and maintain this resource, we developed the central component of our publication environment, the Editor's Workbench, an ensemble of innovative tools. The skeleton of the knowledge base is an object network consisting of Dictionary of Art articles and domain-specific objects, such as representations of art styles, artists, and works of art. The schema and update behaviour of these object types are modelled using the frame-based representation tool SFK (Smalltalk Frame Kit), which supports a variety of consistency checks as well as atomic execution of complex update operations.
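SFK itself is a Smalltalk tool whose interface is not described here, so the following is only a minimal Python sketch, under our own assumptions, of the two properties just mentioned: per-slot consistency checks on frame objects and all-or-nothing execution of a complex update. The names `FrameType`, `Frame`, `atomic_update` and the `Artist` frame type are hypothetical. Invariants that relate several slots are checked only when the whole update commits, and every frame the update touched is rolled back if any check fails.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


class ConsistencyError(Exception):
    """Raised when an update would leave the object network inconsistent."""


@dataclass
class FrameType:
    name: str
    # Slot name -> check applied to every single assignment (type or range test).
    slots: dict[str, Callable[[Any], bool]]
    # Invariants relating several slots; checked only when an update commits.
    invariants: list[Callable[["Frame"], bool]] = field(default_factory=list)


@dataclass(eq=False)  # frames keep identity semantics and stay hashable
class Frame:
    ftype: FrameType
    values: dict[str, Any] = field(default_factory=dict)

    def set(self, slot: str, value: Any) -> None:
        check = self.ftype.slots.get(slot)
        if check is None:
            raise ConsistencyError(f"{self.ftype.name} has no slot {slot!r}")
        if not check(value):
            raise ConsistencyError(f"bad value for {self.ftype.name}.{slot}: {value!r}")
        self.values[slot] = value


def atomic_update(operations: list[tuple[Frame, str, Any]]) -> None:
    """Apply (frame, slot, value) assignments all-or-nothing."""
    touched = list(dict.fromkeys(frame for frame, _, _ in operations))
    saved = [dict(frame.values) for frame in touched]
    try:
        for frame, slot, value in operations:
            frame.set(slot, value)
        for frame in touched:
            for invariant in frame.ftype.invariants:
                if not invariant(frame):
                    raise ConsistencyError(f"invariant violated on {frame.ftype.name}")
    except ConsistencyError:
        for frame, old in zip(touched, saved):
            frame.values = old  # roll back every frame this update touched
        raise


def dates_ordered(frame: Frame) -> bool:
    born, died = frame.values.get("born"), frame.values.get("died")
    return born is None or died is None or born <= died


ARTIST = FrameType(
    "Artist",
    slots={
        "name": lambda v: isinstance(v, str) and v != "",
        "born": lambda v: isinstance(v, int),
        "died": lambda v: isinstance(v, int),
    },
    invariants=[dates_ordered],
)

rembrandt = Frame(ARTIST)
atomic_update([(rembrandt, "name", "Rembrandt"), (rembrandt, "born", 1606), (rembrandt, "died", 1669)])
try:
    atomic_update([(rembrandt, "born", 1700)])  # would put birth after death
except ConsistencyError as err:
    print("rejected:", err)  # rejected: invariant violated on Artist
print(rembrandt.values)      # {'name': 'Rembrandt', 'born': 1606, 'died': 1669}
```

Because invariants are evaluated only at commit, an update that moves both dates at once, such as `[(rembrandt, "born", 1700), (rembrandt, "died", 1750)]`, would pass even though its first step on its own would not. This is one reason why atomic complex updates matter for keeping an object network consistent.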
For knowledge acquisition, i.e., building up the object network, we are experimenting with automatic text-processing techniques, ranging from pattern-oriented parsing to full-text analysis, applied to biography articles. The biography head is a particularly well-structured, densely phrased piece of text that, according to editorial guidelines, contains the essential facts of a person's life. These guidelines are encoded in the rules of the parsing and text-to-object conversion tool. For browsing and editing the object network, we designed and implemented PooH, the Pilot object-oriented Hypernet Editor. It supports the intensional and extensional definition of subnets (i.e., definition by a query or by selection), object presentation styles, and network views (list, text, graphic). We have also achieved first results in the automatic generation of text and graphics that present knowledge stored in the database.
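To illustrate the pattern-oriented end of the parsing spectrum described above, here is a minimal Python sketch of text-to-object conversion for biography heads. The head layout it assumes, `Name (b Place, Date; d Place, Date). Nationality roles.`, is a hypothetical stand-in for the actual editorial guidelines, and `Person`, `Place` and `parse_head` are illustrative names, not part of the tool described here. Places are looked up in a shared registry, so several heads mentioning the same town link to one object in the network.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical editorial rule for a biography head (assumed layout, for illustration):
#   Name (b Place, Date; d Place, Date). Nationality role, role and role.
HEAD = re.compile(
    r"^(?P<name>[^(]+?)\s*"
    r"\(b (?P<bplace>[^,;]+), (?P<bdate>[^;)]+)"
    r"(?:; d (?P<dplace>[^,;]+), (?P<ddate>[^)]+))?\)\.\s*"
    r"(?P<nationality>[A-Z][a-z]+) (?P<roles>[^.]+)\.$"
)


@dataclass(eq=False)  # identity semantics: one object per real-world place
class Place:
    name: str


@dataclass(eq=False)
class Person:
    name: str
    nationality: str
    roles: list[str]
    born: tuple[Place, str]                   # (place object, date as written)
    died: Optional[tuple[Place, str]] = None


def parse_head(text: str, places: dict[str, Place]) -> Person:
    """Convert one biography head into linked objects, reusing known places."""
    m = HEAD.match(text.strip())
    if m is None:
        raise ValueError(f"head does not follow the assumed layout: {text!r}")

    def place(name: str) -> Place:
        name = name.strip()
        if name not in places:
            places[name] = Place(name)
        return places[name]

    died = (place(m["dplace"]), m["ddate"].strip()) if m["dplace"] else None
    return Person(
        name=m["name"].strip(),
        nationality=m["nationality"],
        roles=re.split(r",\s*|\s+and\s+", m["roles"].strip()),
        born=(place(m["bplace"]), m["bdate"].strip()),
        died=died,
    )


places: dict[str, Place] = {}
rem = parse_head("Rembrandt van Rijn (b Leiden, 15 July 1606; d Amsterdam, 4 Oct 1669). "
                 "Dutch painter, draughtsman and etcher.", places)
liev = parse_head("Jan Lievens (b Leiden, 24 Oct 1607; d Amsterdam, 4 June 1674). "
                  "Dutch painter and etcher.", places)
print(rem.roles)                    # ['painter', 'draughtsman', 'etcher']
print(rem.born[0] is liev.born[0])  # True: both heads link to the same Leiden object
print(sorted(places))               # ['Amsterdam', 'Leiden']
```

A rule this rigid either rejects or misreads heads that deviate from the assumed layout (a two-word nationality such as "South African", for instance, would be split wrongly), which is where fuller text analysis would have to take over.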