Creating Hypertextual Dossiers in XML for the Italian Parliament

ERCIM News No.41 - April 2000

by Andrea Marchetti and Patrizia Andronico

Over the last two years, a working group at the Institute for Telematic Applications of the Italian National Research Council (IAT-CNR) has been following the evolution of XML-related technologies. The Italian Chamber of Deputies asked this group to collaborate on the creation of a website to be presented at the annual European Parliamentary Technology Assessment (EPTA) Conference, held in Rome in December 1999.

The EPTA Conference signalled the introduction of new high-tech equipment into the Italian Chamber of Deputies. A meeting-room for members of the Italian parliament was equipped with desks, each with its own built-in PC and Internet access. The Conference website provided access to the texts of the speakers' presentations and to documentation on the themes of the Conference. Links were built between the various documents on the basis of a thesaurus, consisting of topics identified by experts in the fields of interest.

For each term of the thesaurus, documentation was created containing:

a definition
associated terms in the thesaurus
one or more quotations
pointers to related websites
bibliography.
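
The components listed above can be pictured as a small XML record. The following is a minimal sketch only: the article does not reproduce the DTDs that were actually used, so every element and attribute name (entry, definition, related, quotation, website, bibliography) and all of the sample content are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: names and sample content are illustrative only,
# not the schema used for the EPTA dossiers.
ENTRY_XML = """<entry term="telework">
  <definition>Placeholder definition of the term.</definition>
  <related term="privacy"/>
  <related term="e-government"/>
  <quotation speaker="Speaker A">A placeholder quotation.</quotation>
  <website href="http://example.org/">Example site</website>
  <bibliography>
    <item>Placeholder reference.</item>
  </bibliography>
</entry>"""

def parse_entry(xml_text):
    """Read one thesaurus entry into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "term": root.get("term"),
        "definition": root.findtext("definition"),
        "related": [r.get("term") for r in root.findall("related")],
        "quotations": [q.text for q in root.findall("quotation")],
        "websites": [w.get("href") for w in root.findall("website")],
        "bibliography": [i.text for i in root.findall("bibliography/item")],
    }

entry = parse_entry(ENTRY_XML)
print(entry["term"], entry["related"])
```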

This documentation had to be collected, processed and rendered homogeneous in a very short period (approximately 10 days) by people working in different geographical areas. The application was also required to support real-time processing of the documentation, so that phrases could be extracted from the texts as they were presented and inserted as enrichments of the thesaurus entries.
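
The enrichment step can be sketched as follows. This is an illustration under stated assumptions, not the group's actual tool: it supposes that phrases in a presentation are wrapped in a hypothetical topic element whose ref attribute names a thesaurus term, and that each matching phrase is appended to the entry as a quotation element. The sample texts are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <topic ref="..."> marks a phrase about a thesaurus term.
PRESENTATION = """<presentation speaker="Speaker A">
  <p>Opening remarks. <topic ref="telework">Telework changes how offices are organised.</topic></p>
  <p><topic ref="privacy">Personal data needs protection.</topic> Closing remarks.</p>
</presentation>"""

ENTRY = """<entry term="telework"><definition>Placeholder definition.</definition></entry>"""

def enrich_entry(entry, presentation):
    """Append each phrase marked for this entry's term as a <quotation>."""
    term = entry.get("term")
    speaker = presentation.get("speaker")
    added = 0
    for topic in presentation.iter("topic"):
        if topic.get("ref") == term:
            quotation = ET.SubElement(entry, "quotation", {"speaker": speaker})
            quotation.text = "".join(topic.itertext()).strip()
            added += 1
    return added

entry = ET.fromstring(ENTRY)
count = enrich_entry(entry, ET.fromstring(PRESENTATION))
print(ET.tostring(entry, encoding="unicode"))
```

Because the function only appends to the entry, it could be run again each time a new presentation text arrives.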

For each of the topics identified, the logical schemata used to structure the material made available (presentations, articles, thesaurus with its associated documentation) were defined in the form of XML DTDs. All the material was edited in XML, and experts then inserted further markup to signal thematic associations between different items. The material was then sent to IAT for validation and for initial processing, which transferred information extracted from the thesaurus into the documentation associated with the thesaurus terms. Selected images were also associated with this documentation.
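
The initial processing step, which moves information from the thesaurus into the documentation, might look like the sketch below. It assumes, purely for illustration, that the thesaurus records associated terms as related elements and that the documentation is a dossier of entry elements keyed by term; neither structure is taken from the project's DTDs. DTD validation itself is not shown, since it requires a validating parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical thesaurus: each <term> lists the terms it is associated with.
THESAURUS = """<thesaurus>
  <term id="telework"><related ref="privacy"/></term>
  <term id="privacy"><related ref="telework"/><related ref="e-government"/></term>
  <term id="e-government"/>
</thesaurus>"""

# Hypothetical documentation file: one <entry> per thesaurus term.
DOSSIER = """<dossier>
  <entry term="privacy"><definition>Placeholder definition.</definition></entry>
  <entry term="telework"><definition>Placeholder definition.</definition></entry>
</dossier>"""

def copy_relations(thesaurus, dossier):
    """Copy each term's associated terms into its documentation entry."""
    related = {
        term.get("id"): [r.get("ref") for r in term.findall("related")]
        for term in thesaurus.findall("term")
    }
    for entry in dossier.findall("entry"):
        for ref in related.get(entry.get("term"), []):
            ET.SubElement(entry, "related", {"term": ref})
    return dossier

dossier = copy_relations(ET.fromstring(THESAURUS), ET.fromstring(DOSSIER))
privacy = dossier.find("entry[@term='privacy']")
related_to_privacy = [r.get("term") for r in privacy.findall("related")]
print(related_to_privacy)
```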

At present, very few browsers can display XML pages. For this reason, all of the material produced was translated into HTML and inserted into the website.
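
A translation of this kind can be sketched in a few lines. The article does not say which tool performed the conversion, so the function below is only one possible approach; the entry markup, the class names and the convention of linking each related term to a file named after it are all assumptions.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical entry markup with placeholder content.
ENTRY = """<entry term="privacy">
  <definition>Protection of personal data &amp; communications.</definition>
  <related term="telework"/>
  <quotation speaker="Speaker A">A placeholder quotation.</quotation>
</entry>"""

def entry_to_html(entry):
    """Render one thesaurus entry as an HTML fragment."""
    term = html.escape(entry.get("term"))
    parts = [f'<h1>{term}</h1>']
    definition = entry.findtext("definition", default="")
    parts.append(f'<p class="definition">{html.escape(definition)}</p>')
    related = entry.findall("related")
    if related:
        parts.append('<ul class="related">')
        for r in related:
            name = html.escape(r.get("term"))
            # Assumed convention: one HTML file per thesaurus term.
            parts.append(f'  <li><a href="{name}.html">{name}</a></li>')
        parts.append('</ul>')
    for q in entry.findall("quotation"):
        parts.append(f'<blockquote>{html.escape(q.text or "")}</blockquote>')
    return "\n".join(parts)

page = entry_to_html(ET.fromstring(ENTRY))
print(page)
```

Escaping every text value before it is written keeps characters such as the ampersand valid in the generated HTML.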

Layout of the Website

The information displayed in the browser had to include:

the programme of the day;
the documents available for each speaker (full paper, abstract, bibliography, others), directly accessible from the programme;
the index of terms from the EPTA Conference Thesaurus;
the associated documentation.

This requirement led us to adopt a frame architecture. Each frame contained one type of document. The frames were organised so that the page was read in a circular direction, following the order given in the above list. The thesaurus documentation frame, placed at the centre of the page, formed the focal point of navigation.

For each document type, a specific CSS stylesheet was defined.
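
One simple way to attach a stylesheet per document type when generating the HTML pages is a lookup table, as in the sketch below. The document type names and the CSS file names are hypothetical; the article does not list them.

```python
import html

# Hypothetical document types and stylesheet file names.
STYLESHEETS = {
    "programme": "programme.css",
    "paper": "paper.css",
    "thesaurus-index": "index.css",
    "dossier": "dossier.css",
}

def wrap_page(doc_type, title, body_html):
    """Wrap an HTML fragment in a page that links the stylesheet for its type."""
    css = STYLESHEETS[doc_type]  # a KeyError signals an unknown document type
    return (
        "<html>\n<head>\n"
        f"  <title>{html.escape(title)}</title>\n"
        f'  <link rel="stylesheet" type="text/css" href="{css}">\n'
        "</head>\n"
        f"<body>\n{body_html}\n</body>\n</html>"
    )

page = wrap_page("dossier", "privacy", "<h1>privacy</h1>")
print(page)
```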

Conclusions

This application involved defining a complex model to organise the information. The IT and document experts worked together on the development of this model: the IT operators were involved in the choice of documentation, and the documentation experts were similarly consulted with respect to the choice of technologies. This integration contributed greatly to the success of this experimental application; the website was set up much more quickly than could have been expected.

The site has functions (such as the consultation of the documents in various languages) which have not yet been activated, partly because of lack of time and partly because available browsers do not yet fully support XML and associated standards. Currently, the conversion into HTML is performed off-line. In the near future, we expect that it will be possible to develop a version of the site in which the documents remain in XML format and are translated into HTML only when requested by a browser that does not support XML. The results of the Apache XML project may be used for this. The site can be consulted on the Internet at http://epta.camera.it by selecting the link 'Hypertextual Dossier'.
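
The on-request translation described above could follow the logic sketched below. This is not a description of any existing server: it assumes that a browser able to display XML says so by listing an XML media type in its HTTP Accept header, and the function names are invented for the example. Quality values in the header are ignored for brevity.

```python
XML_TYPES = ("application/xml", "text/xml")

def accepts_xml(accept_header):
    """Return True if the Accept header explicitly lists an XML media type."""
    for item in accept_header.split(","):
        # Drop parameters such as ";q=0.9" and compare the bare media type.
        media_type = item.split(";")[0].strip().lower()
        if media_type in XML_TYPES:
            return True
    return False

def respond(accept_header, xml_source, to_html):
    """Serve the stored XML as-is, or convert it only for HTML-only clients."""
    if accepts_xml(accept_header):
        return "application/xml", xml_source
    return "text/html", to_html(xml_source)

# Usage with a stand-in converter (a real one would transform the XML).
content_type, body = respond(
    "text/html, */*;q=0.8",
    "<entry term='privacy'/>",
    lambda xml: "<h1>privacy</h1>",
)
print(content_type, body)
```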

Links:

EPTA: http://www.atkinsoft.ndirect.co.uk/epta/
Apache project: http://xml.apache.org

Please contact:
Andrea Marchetti or Patrizia Andronico - IAT-CNR
Tel: +39 050 315 2649 or +39 050 315 2090
E-mail: andrea.marchetti@iat.cnr.it or patrizia.andronico@iat.cnr.it