ERCIM News No.26 - July 1996 - SZTAKI
Natural Language Processing at the Artificial Intelligence Group in
Szeged
by Tibor Gyimóthy
The Research Group on Artificial Intelligence (Hungarian Academy
of Sciences and József Attila University, Szeged) has a long tradition
in Machine Learning and Natural Language Processing. Recent work includes
investigating the application of Inductive Logic Programming to Natural
Language Processing and Speech Recognition.
Natural languages have become increasingly important in information processing
as demand grows for easy access to knowledge, both in translation and in
user interfaces. Several ongoing practical and research projects address
problems involving natural languages, such as database queries and
translation.
In the Research Group on Artificial Intelligence (RGAI) we have developed
a natural language interface (NLI) method and system for plane geometric
construction. In this system, complex constructions can be specified in correct
English sentences. The system is able to process and execute these constructions.
The NLI part of the system has been formulated on the basis of an attribute
grammar specification. This work has been done in cooperation with researchers
at SZTAKI, the Computer and Automation Research Institute, Hungarian Academy
of Sciences.
A basic question in producing NLIs for similar systems is which parts can
be reused for other languages or other domains and which cannot.
We argue that the parts of an NLI which can be generated from some high-level
specifications are the lexical, syntactic and semantic analysers, prepared
on the basis of attribute grammar definitions. Special attention is devoted
to problems related to the specification of static semantics of NLIs. The
transportability of the implemented NLI to another language (Hungarian)
is investigated.
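The division into lexical, syntactic and semantic analysis can be illustrated
with a minimal sketch. Everything in it is an assumption made for illustration:
the toy grammar, the vocabulary, the command tuple and the function names are
hypothetical and are not taken from the RGAI system. Each parse function
returns the synthesized attribute of its nonterminal, and a context condition
on the two points stands in for a static-semantic check.

```python
# Hypothetical toy grammar (an illustration, not the RGAI grammar):
#   command -> "draw" "a" shape "through" point "and" point
#   shape   -> "line" | "segment"
#   point   -> "point" NAME

class ParseError(Exception):
    """Raised when a sentence violates the toy grammar or its context conditions."""

def expect(tokens, pos, word):
    # Syntactic check: the next token must be the given keyword.
    if pos >= len(tokens) or tokens[pos] != word:
        raise ParseError(f"expected {word!r} at token {pos}")
    return pos + 1

def parse_shape(tokens, pos):
    # Synthesized attribute of `shape`: the shape name.
    if pos < len(tokens) and tokens[pos] in ("line", "segment"):
        return tokens[pos], pos + 1
    raise ParseError(f"expected a shape at token {pos}")

def parse_point(tokens, pos):
    # Synthesized attribute of `point`: the upper-cased point name.
    pos = expect(tokens, pos, "point")
    if pos >= len(tokens) or not tokens[pos].isalpha():
        raise ParseError(f"expected a point name at token {pos}")
    return tokens[pos].upper(), pos + 1

def parse_command(sentence):
    # Lexical analysis: lower-case, drop full stops, split on whitespace.
    tokens = sentence.lower().replace(".", " ").split()
    pos = expect(tokens, 0, "draw")
    pos = expect(tokens, pos, "a")
    shape, pos = parse_shape(tokens, pos)
    pos = expect(tokens, pos, "through")
    first, pos = parse_point(tokens, pos)
    pos = expect(tokens, pos, "and")
    second, pos = parse_point(tokens, pos)
    if pos != len(tokens):
        raise ParseError("unexpected words after the command")
    # Context condition (static semantics): the two points must differ.
    if first == second:
        raise ParseError("the two points must be distinct")
    # Synthesized attribute of `command`: an executable construction step.
    return (shape, first, second)

print(parse_command("Draw a line through point A and point B."))
```

In this sketch only the keyword strings are tied to English, which hints at
why parts of such a specification might carry over to another language.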
An Inductive Logic Programming learning method (called IMPUT) is applied
to improve the correctness of the hand-written NLI in the domain of plane
geometry, as well as to adapt this NLI to other languages and application
fields. The NLI specified by an attribute grammar has been transformed into
the Definite Clause Grammar formalism. The IMPUT interactive learning system
combines the unfolding specialization algorithm with an algorithmic debugging
technique. The main idea is that identifying the clause to be unfolded
is crucial to the effectiveness of the specialization process.
If a negative example is covered by the current version of the initial program,
there is at least one clause which is responsible for the incorrect coverage.
The algorithmic debugging method is used to identify this buggy clause instance.
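The search for a buggy clause instance can be sketched as follows. This is a
simplified illustration of the general algorithmic debugging idea, not the
IMPUT implementation: the proof tree, the clause labels and the oracle are
hypothetical, and the oracle (in an interactive system, the user) is replaced
here by a fixed set of atoms assumed to be true in the intended interpretation.
Starting from the proof of a covered negative example, the search descends
into a false subgoal whenever one exists; a node whose head is false while all
of its subgoals are true marks the clause instance to blame.

```python
from dataclasses import dataclass, field

@dataclass
class ProofNode:
    atom: str                      # the atom proved at this node
    clause: str                    # label of the clause instance used to prove it
    children: list = field(default_factory=list)  # proofs of the body atoms

def find_buggy_clause(node, is_true):
    """Return the clause label of a buggy instance in a proof of a false atom.

    Assumes node.atom is false in the intended interpretation, as is the
    case for the root of a proof of a covered negative example.
    """
    for child in node.children:
        if not is_true(child.atom):
            # A false subgoal was proved: the bug lies somewhere below it.
            return find_buggy_clause(child, is_true)
    # Head is false but every body atom is true: this clause instance is buggy.
    return node.clause

# Hypothetical toy grammar program that wrongly accepts "draw the the line".
proof = ProofNode(
    "sentence(draw the the line)", "sentence --> verb, np",
    [
        ProofNode("verb(draw)", "verb --> [draw]"),
        ProofNode(
            "np(the the line)", "np --> det, np",
            [
                ProofNode("det(the)", "det --> [the]"),
                ProofNode(
                    "np(the line)", "np --> det, noun",
                    [
                        ProofNode("det(the)", "det --> [the]"),
                        ProofNode("noun(line)", "noun --> [line]"),
                    ],
                ),
            ],
        ),
    ],
)

# Stand-in oracle: atoms assumed true in the intended interpretation.
TRUE_ATOMS = {"verb(draw)", "det(the)", "noun(line)", "np(the line)"}

print(find_buggy_clause(proof, lambda atom: atom in TRUE_ATOMS))
```

In this toy case the search blames the over-general recursive clause for noun
phrases, which would then be the candidate for unfolding.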
In another project, the problem of speech processing using higher-level
language models is investigated. The idea is that instead of considering
a particular low-level speech recognition system, a model of the errors of
low-level (phoneme or phonetic level) speech recognition is built. What
makes this possible is that the difficulties encountered during recognition
characterize the data and the input device. A large amount of information is
available on typical errors encountered during speech recognition. An error
model for Hungarian is built. Various interaction schemes (top-down, bottom-up,
recurrent) are implemented, and experiments are conducted to measure the
influence of the interaction on the probability of error. Candidate
higher-level language models include vocabularies, N-grams, context
dictionaries, syntactic grammars, Bayesian networks and NLIs based on
attribute grammars.
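One possible way to combine a low-level error model with a higher-level
language model is a noisy-channel decoder, sketched below. This is an
assumption about how such an interaction could look, not a description of the
project's method. The phoneme symbols, confusion probabilities and word priors
are invented for illustration (they are not Hungarian data), each character
stands for one phoneme, and only substitution errors are modelled.

```python
import math

# Hypothetical error model: P(recognized phoneme | spoken phoneme),
# keyed as (spoken, recognized). All values are invented.
CONFUSION = {
    ("p", "p"): 0.8, ("p", "b"): 0.2,
    ("b", "b"): 0.7, ("b", "p"): 0.3,
    ("a", "a"): 0.9, ("a", "o"): 0.1,
    ("o", "o"): 0.9, ("o", "a"): 0.1,
    ("t", "t"): 1.0,
}

# Hypothetical vocabulary-level language model: prior probability of each word.
PRIOR = {"pat": 0.1, "bat": 0.6, "pot": 0.3}

def log_likelihood(recognized, word):
    """Log P(recognized | word) under the substitution-only error model."""
    if len(recognized) != len(word):
        return -math.inf  # insertions and deletions are not modelled here
    total = 0.0
    for spoken, heard in zip(word, recognized):
        p = CONFUSION.get((spoken, heard), 0.0)
        if p == 0.0:
            return -math.inf
        total += math.log(p)
    return total

def decode(recognized):
    """Return the word maximizing P(recognized | word) * P(word)."""
    return max(
        PRIOR,
        key=lambda word: log_likelihood(recognized, word) + math.log(PRIOR[word]),
    )

print(decode("pat"))
print(decode("bot"))
```

With these invented numbers the language model can override the raw
recognizer output: the string "pat" is decoded as "bat" because the prior for
"bat" outweighs the cost of assuming one substitution error.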
Please contact:
Tibor Gyimóthy - Hungarian Academy of Sciences & József
Attila University
Tel: +36 62 454 139
E-mail: gyimi@inf.u-szeged.hu