ERCIM News No.26 - July 1996 - SZTAKI

Natural Language Processing at the Artificial Intelligence Group in Szeged

by Tibor Gyimóthy

The Research Group on Artificial Intelligence (Hungarian Academy of Sciences and József Attila University, Szeged) has a long tradition in Machine Learning and Natural Language Processing. Recent work investigates the application of Inductive Logic Programming to Natural Language Processing and Speech Recognition.

Natural languages have become increasingly important in information processing as demand grows for easy access to knowledge, both through translation and through user interfaces. Several ongoing practical and research projects address problems involving natural languages, such as database queries and translation.

In the Research Group on Artificial Intelligence (RGAI) we have developed a natural language interface (NLI) method and system for plane geometric constructions. In this system, complex constructions can be specified in correct English sentences, which the system is able to process and execute. The NLI part of the system is based on an attribute grammar specification. This work was done in cooperation with researchers at SZTAKI, the Computer and Automation Research Institute of the Hungarian Academy of Sciences.

A basic question in producing NLIs for similar systems is determining which parts can be reused in other languages or domains and which cannot. We argue that the parts of an NLI that can be generated from high-level specifications are the lexical, syntactic and semantic analysers, prepared on the basis of attribute grammar definitions. Special attention is devoted to problems related to specifying the static semantics of NLIs. The portability of the implemented NLI to another language (Hungarian) is also investigated.

An Inductive Logic Programming learning method called IMPUT is applied to improve the correctness of the hand-written NLI in the domain of plane geometry, and to adapt this NLI to other languages and application fields. The NLI specified by an attribute grammar has been transformed into the Definite Clause Grammar formalism. The interactive IMPUT learning system combines the unfolding specialization algorithm with an algorithmic debugging technique. The main idea is that identifying the clause to be unfolded is crucial to the effectiveness of the specialization process. If a negative example is covered by the current version of the initial program, at least one clause is responsible for the incorrect coverage. The algorithmic debugging method is used to identify this buggy clause instance.
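The debugging step can be illustrated with a minimal sketch. It is not the IMPUT system: IMPUT works on definite clause grammars in a logic programming setting, whereas the sketch below uses a toy context-free grammar in Python, simulates the user's answers with a fixed set of intended results, and leaves out the unfolding step altogether. The grammar, the names RULES, INTENDED, parse, find_buggy_clause and diagnose, and the example sentences are all assumptions made for illustration. The sketch searches the proof of a wrongly accepted sentence for a node whose result is incorrect although all of its sub-results are correct, and reports the rule used there.

```python
# Hypothetical toy grammar: (head, body). Symbols that never appear as a
# head are treated as terminal words.
RULES = [
    ("command", ("verb", "object")),
    ("verb", ("draw",)),
    ("object", ("det", "noun")),
    ("object", ("noun",)),      # overly general rule: accepts "draw circle"
    ("det", ("a",)),
    ("noun", ("circle",)),
]
NONTERMINALS = {head for head, _ in RULES}

# Simulated oracle (an assumption): the (symbol, phrase) pairs the user
# would confirm as correct. An interactive system would ask instead.
INTENDED = {
    ("verb", ("draw",)),
    ("det", ("a",)),
    ("noun", ("circle",)),
    ("object", ("a", "circle")),
    ("command", ("draw", "a", "circle")),
}


def parse(symbol, words, start):
    """Yield (end, proof_tree) for each way symbol derives words[start:end]."""
    if symbol not in NONTERMINALS:
        # Terminal: it must match the next input word; no proof node is built.
        if start < len(words) and words[start] == symbol:
            yield start + 1, None
        return
    for rule in RULES:
        head, body = rule
        if head != symbol:
            continue
        # Expand the body left to right, keeping (position, children) pairs.
        partial = [(start, [])]
        for part in body:
            partial = [
                (end, children + ([tree] if tree else []))
                for pos, children in partial
                for end, tree in parse(part, words, pos)
            ]
        for end, children in partial:
            yield end, {
                "rule": rule,
                "span": tuple(words[start:end]),
                "children": children,
            }


def find_buggy_clause(tree, is_intended):
    """Return the rule of an incorrect node whose children are all correct."""
    head = tree["rule"][0]
    if is_intended(head, tree["span"]):
        return None                       # this sub-proof is correct
    for child in tree["children"]:
        culprit = find_buggy_clause(child, is_intended)
        if culprit is not None:
            return culprit                # the error originates further down
    return tree["rule"]                   # wrong result from correct premises


def diagnose(sentence):
    """Return the suspect rule for a sentence, or None if nothing is wrong
    (the sentence is not accepted, or it is accepted as intended)."""
    words = sentence.split()
    for end, tree in parse("command", words, 0):
        if end == len(words):
            return find_buggy_clause(
                tree, lambda head, span: (head, span) in INTENDED
            )
    return None
```

For the negative example "draw circle", diagnose returns the rule ("object", ("noun",)), which is the one that should be specialized; for "draw a circle" it returns None. The toy parser assumes a grammar without left recursion.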

In another project, speech processing using higher level language models is investigated. The idea is that, instead of considering a particular low level speech recognition system, a model of the errors made by low level (phoneme or phonetic level) speech recognition is built. This is possible because the difficulties encountered during recognition characterize the data and the input device, and a large amount of information is available on typical recognition errors. An error model for Hungarian is built. Various interaction schemes (top-down, bottom-up, recurrent) are implemented, and experiments are conducted to measure the influence of the interaction on the probability of error. Candidate higher level language models include vocabularies, N-grams, context dictionaries, syntactic grammars, Bayesian networks and NLIs based on attribute grammars.
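One simple way to picture how an error model and a higher level model can work together is sketched below. It is not the group's system: it assumes a noisy-channel style combination in which a vocabulary with prior probabilities (the higher level model) is weighed against per-symbol confusion probabilities (the error model). The symbols, words and all probabilities are invented for illustration, and only substitutions between strings of equal length are handled.

```python
from math import prod

# Hypothetical error model: P(recognized symbol | intended symbol).
# Keys are (intended, recognized); all numbers are made up.
CONFUSION = {
    ("b", "b"): 0.7, ("b", "p"): 0.3,
    ("p", "p"): 0.8, ("p", "b"): 0.2,
    ("a", "a"): 0.9, ("a", "o"): 0.1,
    ("o", "o"): 0.9, ("o", "a"): 0.1,
    ("t", "t"): 0.9, ("t", "d"): 0.1,
}
FLOOR = 1e-4  # fallback probability for confusions not listed above

# Hypothetical higher level model: a vocabulary with prior probabilities.
VOCABULARY = {"bat": 0.6, "pat": 0.3, "pot": 0.1}


def channel_prob(intended, recognized):
    """P(recognized | intended), allowing substitutions only."""
    if len(intended) != len(recognized):
        return 0.0
    return prod(
        CONFUSION.get((i, r), FLOOR) for i, r in zip(intended, recognized)
    )


def decode(recognized):
    """Return the best vocabulary word and the score of every candidate.

    The score of a word is prior(word) * P(recognized | word).
    """
    scores = {
        word: prior * channel_prob(word, recognized)
        for word, prior in VOCABULARY.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

With these made-up numbers, decode("bot") selects "bat": the low level output "bot" is not in the vocabulary, and the frequent word "bat" with a single likely confusion outweighs the rarer "pot". A real system would also need insertions, deletions and probabilities estimated from data.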

Please contact:
Tibor Gyimóthy - Hungarian Academy of Sciences & József Attila University
Tel: +36 62 454 139
E-mail: gyimi@inf.u-szeged.hu
