ERCIM News No.26 - July 1996 - CWI

Computational Linguistics at CWI - the Logic of Ambiguity

by Jan van Eijck

Computational Linguistics combines insights from formal language theory, empirical linguistics, and logic, with the overall aim to implement natural language understanding systems on computers. A group of applied logicians at CWI has looked at the logical underpinnings of computational linguistic tools such as semantic representation languages, feature logics, and tree description logics.

CWI has been involved in a large-scale national project on the application of tools from dynamic logic to natural language understanding and in the European (LRE) project FraCaS (a Framework for Computational Semantics), and currently has an ongoing effort to analyse large collections of key phrases in scientific documents. The main aim here is to build a thesaurus of key phrases for mathematics, on the basis of a very large collection of key phrases for mathematical papers (from the Zentralblatt für Mathematik).

The work on dynamic logic has resulted in a proposal for a framework for dynamic semantics, in an analysis of Discourse Representation Theory in terms of dynamic logic, and in various publications on modal tree logics. One of the yields of the FraCaS work has been an analysis of the logic of ambiguity by J. van Eijck and J. Jaspars. To this we now turn.

In the formal study of natural language semantics the representation of ambiguous information is one of the major problems. Initial representations of NL expressions are often ambiguous, due to lack of information about the meanings of lexical items (lexical ambiguity), the ways in which anaphoric elements are to be resolved (anaphoric under-specification), attachment ambiguities (structural ambiguity) and the choice between various possible scope orderings between operators (scope ambiguity).

The principal reason for wanting to construct a meaning representation for a natural language sentence is to get a handle on the information conveyed by that sentence. Is the sentence consistent with a given body of information? If the sentence is true, what follows from it? If a natural language sentence is ambiguous, as many natural language sentences are, the key question becomes: how can we find a representation for it that we can reason with?

There are many kinds of ambiguity in natural language. The most local ambiguities are lexical ambiguities, like the one in "The ball was splendid" or "I went to the bank", and referential ambiguities, like "John addressed her", when there is a fixed list of possible antecedents for the pronoun. A different kind of ambiguity has to do with scope under-specification caused by the interaction of parts of speech. Examples are "Every boy didn't appear" or "Everybody in this room has to sign one document". Related are ambiguities of distribution, as in "The boys ordered two sandwiches", where it is left unspecified whether the object distributes over the subject (two sandwiches each) or not (two sandwiches altogether). Finally, there is an open-ended spectrum of under-specification caused by some kind of incompleteness or flaw in the linguistic data (even: corruption of the data). Under this heading we have structural ambiguities, like the two readings of "John saw the girl with the telescope", or the problem of what to make of "John ... (noise) ... the girl with the telescope".
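As a rough illustration of scope ambiguity, the two readings of the "Every boy didn't appear" example can be spelled out over a tiny model. This is only a sketch: the function names, the boys' names, and the set-based model are assumptions made for illustration, not notation from the work described here.

```python
def no_boy_appeared(boys, appeared):
    """Reading with 'every' scoping over 'not': for every boy, he did not appear."""
    return all(b not in appeared for b in boys)


def not_every_boy_appeared(boys, appeared):
    """Reading with 'not' scoping over 'every': it is not the case that every boy appeared."""
    return not all(b in appeared for b in boys)


# Hypothetical model: two boys, and three possible situations.
boys = {"jan", "piet"}
for appeared in (set(), {"jan"}, {"jan", "piet"}):
    print(
        sorted(appeared),
        no_boy_appeared(boys, appeared),
        not_every_boy_appeared(boys, appeared),
    )
```

The two readings come apart in the middle situation, where some but not all of the boys appeared: the first reading is false there and the second is true.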

Suppose a sentence A is ambiguous between readings A1 and A2. Here are some desiderata for what A means:

if someone informs us that A is true, then one should be allowed to conclude that at least one of A1, A2 is true
if one is sure that A1 and A2 are both true, then one can safely assert that A is true, the ambiguity of A notwithstanding
if someone informs us that not A is true, then one should be allowed to conclude that at least one of not A1, not A2 is true
if one is sure that neither A1 nor A2 is true, then one can safely assert that not A is true, the ambiguity of A notwithstanding
unless A1 and A2 are logically equivalent, A or not A cannot be a logical truth
unless A1 and A2 are logically equivalent, A and not A need not be a contradiction.

To explain this final point a bit further, note that in this example we do not insist that several occurrences of the same expression be disambiguated in the same way. To take a real-life example, consider the following sentence:

Every boy did not appear, and it is not the case that every boy did not appear.

If both occurrences of the ambiguous "Every boy did not appear" in this example sentence are disambiguated in the same way, then the example sentence is indeed contradictory. If we do not insist on this, however, then it is not. In the work of Van Eijck and Jaspars (Ambiguity and Reasoning, CWI Report CS-R9616, Amsterdam 1996), ambiguous logical languages are introduced which extend classical propositional and predicate logic and which can deal with lexical ambiguity and scope ambiguity. It turns out that an ambiguous consequence relation can be defined and axiomatized that satisfies all of the desiderata given above.
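To make the desiderata concrete, here is a minimal propositional sketch of one relation that happens to satisfy them. It rests on an assumption that is not taken from the report: an ambiguous premise is read weakly (at least one reading is true) and an ambiguous conclusion strongly (every reading is true), with each occurrence disambiguated independently. The tuple encoding of formulas and all function names are hypothetical, and the relation actually defined and axiomatized in CS-R9616 may well differ.

```python
from itertools import product

# Formulas are nested tuples (a hypothetical encoding chosen for this sketch):
#   ("var", name), ("not", f), ("and", f, g), ("or", f, g),
#   ("amb", f1, f2, ...)  an expression ambiguous between the listed readings


def readings(f):
    """Yield every classical reading of f; each occurrence is disambiguated independently."""
    tag = f[0]
    if tag == "var":
        yield f
    elif tag == "amb":
        for alternative in f[1:]:
            yield from readings(alternative)
    elif tag == "not":
        for r in readings(f[1]):
            yield ("not", r)
    elif tag in ("and", "or"):
        for left, right in product(readings(f[1]), readings(f[2])):
            yield (tag, left, right)
    else:
        raise ValueError(f"unknown formula tag: {tag!r}")


def holds(f, v):
    """Evaluate an unambiguous formula f under the valuation v (a dict from names to bools)."""
    tag = f[0]
    if tag == "var":
        return v[f[1]]
    if tag == "not":
        return not holds(f[1], v)
    if tag == "and":
        return holds(f[1], v) and holds(f[2], v)
    if tag == "or":
        return holds(f[1], v) or holds(f[2], v)
    raise ValueError(f"cannot evaluate formula tag: {tag!r}")


def valuations(names):
    """Yield every assignment of truth values to the given variable names."""
    for bits in product([False, True], repeat=len(names)):
        yield dict(zip(names, bits))


def entails(premise, conclusion, names):
    """Assumed relation: some reading of the premise true implies every reading of the conclusion true."""
    return all(
        all(holds(c, v) for c in readings(conclusion))
        for v in valuations(names)
        if any(holds(p, v) for p in readings(premise))
    )


def valid(f, names):
    """A logical truth here: every reading is true under every valuation."""
    return all(holds(r, v) for v in valuations(names) for r in readings(f))


def contradictory(f, names):
    """A contradiction here: no reading is true under any valuation."""
    return not any(holds(r, v) for v in valuations(names) for r in readings(f))


names = ["a1", "a2"]
A1, A2 = ("var", "a1"), ("var", "a2")
A = ("amb", A1, A2)  # A is ambiguous between the logically independent readings A1 and A2

checks = {
    "A entails (A1 or A2)": entails(A, ("or", A1, A2), names),
    "(A1 and A2) entails A": entails(("and", A1, A2), A, names),
    "not A entails (not A1 or not A2)": entails(("not", A), ("or", ("not", A1), ("not", A2)), names),
    "(not A1 and not A2) entails not A": entails(("and", ("not", A1), ("not", A2)), ("not", A), names),
    "(A or not A) is a logical truth": valid(("or", A, ("not", A)), names),
    "(A and not A) is a contradiction": contradictory(("and", A, ("not", A)), names),
}
for description, result in checks.items():
    print(f"{description}: {result}")
```

Under these assumptions the first four checks come out true and the last two false, matching the six desiderata. The mixed reading A1 and not A2 is what keeps "A and not A" from being a contradiction.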

Computational linguistics research at CWI concentrates on theoretical issues and emphasises the use of tools from programming language analysis for the analysis of natural language. It is expected, however, that the insights thus gained will be of great use for the more down-to-earth endeavour of building practically useful natural language interfaces.

Please contact:
Jan van Eijck - CWI
Tel: +31 20 592 4052
E-mail: jve@cwi.nl
