ERCIM News No.25 - April 1996 - CNR
DBT on Internet
by Eugenio Picchi, Lisa Biagini and Luca Fiorani
We describe the Internet versions of DBT, a textual database system developed
in Pisa, at the Istituto di Linguistica Computazionale (ILC-CNR). The aim
is to provide a system that allows linguists and language scholars to access
and query textual archives located on servers throughout the world, offering
them a range of search and language analysis tools.
Over the last ten years, DBT, a textual database system designed to meet
the needs of literary and linguistic text analysis applications, has been
widely adopted by the academic and research world in Italy and abroad.
Structuring a machine-readable text in DBT format is so simple that a new
text can be prepared for database inclusion within a few minutes. As a
result, there are now thousands of texts already structured in DBT form in
the archives of universities and research institutes throughout Italy and
beyond, and this number is growing rapidly.
The importance of rendering language and text resources reusable is strongly
felt in the scientific community. There is thus a concerted move towards
making existing resources available as widely as possible, while respecting
copyright and intellectual property rights. For this reason we began to
study the best way of making it possible not only for local users but for
scholars working anywhere in the world to consult the geographically distributed
DBT archives. It was clear that the ideal medium for this is the Internet.
Two distinct approaches have been developed to make the DBT system and DBT
structured archives accessible over the network. In the first approach,
known as DBTNET, a client-server version of the system has been created
which allows the user to directly access and query texts located on servers
at geographically remote sites in the same way as when using the stand-alone
DBT system. The client-server dialog uses the TCP/IP protocol. Our objective
has been to offer the same user interface and the full range of sophisticated
search and analysis capabilities of the stand-alone system. As much information
as possible is maintained at the client site in order to reduce the client-server
operations to the minimum, thus optimizing the system response times.
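The idea of keeping information at the client site can be illustrated with a minimal Python sketch. It is not the DBTNET implementation, whose message format is not described here; `CachingClient`, `send_query`, `fake_server` and the query words are hypothetical names chosen for illustration. The client remembers every answer it has already received, so a repeated query costs no further round trip to the server.

```python
from typing import Callable, Dict, List


class CachingClient:
    """Hypothetical client that keeps earlier answers locally."""

    def __init__(self, send_query: Callable[[str], List[str]]) -> None:
        # send_query stands in for the real network call to the server.
        self._send_query = send_query
        self._cache: Dict[str, List[str]] = {}
        self.round_trips = 0  # number of queries actually sent to the server

    def query(self, expression: str) -> List[str]:
        # Only contact the server for queries not answered before.
        if expression not in self._cache:
            self.round_trips += 1
            self._cache[expression] = self._send_query(expression)
        return self._cache[expression]


def fake_server(expression: str) -> List[str]:
    # Stand-in for a remote archive: returns a made-up context.
    return [f"context containing '{expression}'"]


client = CachingClient(fake_server)
client.query("amore")
client.query("amore")    # answered from the local cache
client.query("virtute")
print(client.round_trips)
```

A real client would also have to decide when cached answers become stale, for example when a text is added to the remote archive; the sketch leaves this out.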
The alternative version, DBTWEB, developed in parallel, uses the HTTP protocol,
the HTML formatting language and the most common Web browsers (Mosaic
and Netscape). The main advantage of adopting this technology is that it
facilitates navigation over the network. Information is made available in
the form of pages of multimedia and hypertext data. The hypertext links
point to other pages which can be located anywhere in the network. These
standards do not depend on the platform or computing system employed by
the client; this means that they are directly usable
by all platforms that can communicate with the Internet. This has contributed
greatly to their popularity.
DBTWEB is now in an advanced stage of development. Distributed textual databases
can be consulted through an IR system based on a traditional client-server
model. Using CGI (Common Gateway Interface) scripts, the HTTP server can
retrieve structured or compound information that most well-known browsers
cannot generally access directly. The gateways assume the role of interfaces,
in both directions, between the Web and the database.
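The gateway role can be sketched in Python as follows. This is an illustration under stated assumptions, not the DBTWEB code: the parameter name `word`, the function `handle_request` and the two-line corpus in `demo_search` are hypothetical. Under CGI the HTTP server passes the request's query string to the script (in the `QUERY_STRING` environment variable) and returns the script's output, headers followed by a blank line and the body, to the browser.

```python
import html
from typing import Callable, List
from urllib.parse import parse_qs


def handle_request(query_string: str,
                   search: Callable[[str], List[str]]) -> str:
    """Build a CGI-style response: header, blank line, HTML body."""
    params = parse_qs(query_string)
    word = params.get("word", [""])[0]
    # Web -> database: forward the decoded query to the search routine.
    hits = search(word) if word else []
    # Database -> Web: wrap the results in HTML, escaping special characters.
    items = "\n".join(f"<li>{html.escape(hit)}</li>" for hit in hits)
    body = (
        "<html><body>\n"
        f"<h1>Results for {html.escape(word)}</h1>\n"
        f"<ul>\n{items}\n</ul>\n"
        "</body></html>"
    )
    return "Content-Type: text/html\r\n\r\n" + body


def demo_search(word: str) -> List[str]:
    # Hypothetical stand-in for the textual database.
    corpus = ["Nel mezzo del cammin di nostra vita",
              "mi ritrovai per una selva oscura"]
    return [line for line in corpus if word.lower() in line.lower().split()]


print(handle_request("word=selva", demo_search))
```

Passing the search routine in as an argument keeps the gateway independent of the database behind it, which is the sense in which it acts as an interface in both directions.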
DBTWEB creates a hypertextual study environment, dynamically transforming
the results of a generic query into an HTML page, which can in turn
be the starting point for further queries. It offers all the main
functionalities of the standard DBT system, such as the retrieval of
frequency values, contexts, extended contexts and structured searches.
Routines for user identification and
authentication make it possible to organise the consultation in work sessions;
save and restore facilities are also provided.
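As an illustration of how a query result can itself become a page to query from, the following Python sketch computes a frequency value and short contexts for a word and renders them as HTML in which every word is a link to a new query. It is a simplified assumption of how such a page might be built, not the DBTWEB algorithm; the function names, the script path `/cgi-bin/query` and the sample line are hypothetical.

```python
import html
import re
from typing import List
from urllib.parse import quote


def tokenize(text: str) -> List[str]:
    # Lower-cased word forms; \w also matches accented letters.
    return re.findall(r"\w+", text.lower())


def frequency(text: str, word: str) -> int:
    return tokenize(text).count(word.lower())


def contexts(text: str, word: str, width: int = 2) -> List[List[str]]:
    """Return each occurrence with up to `width` words on either side."""
    tokens = tokenize(text)
    return [tokens[max(0, i - width): i + width + 1]
            for i, tok in enumerate(tokens) if tok == word.lower()]


def results_page(text: str, word: str, script: str = "/cgi-bin/query") -> str:
    """Render the contexts as HTML; every word links to a further query."""
    lines = []
    for ctx in contexts(text, word):
        links = " ".join(
            f'<a href="{script}?word={quote(tok)}">{html.escape(tok)}</a>'
            for tok in ctx)
        lines.append(f"<li>{links}</li>")
    heading = f"<h1>{html.escape(word)}: {frequency(text, word)} occurrence(s)</h1>"
    return heading + "\n<ul>\n" + "\n".join(lines) + "\n</ul>"


sample = "La selva oscura, la selva selvaggia e aspra e forte"
print(results_page(sample, "selva"))
```

Widening `width` gives a rough analogue of an extended context; a real system would work on indexed archives rather than re-tokenizing the text for each query.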
Please contact:
Eugenio Picchi - ILC-CNR
Tel: +39 50 540681
E-mail: picchi@ilc.pi.cnr.it