Interpretation in Dynamic Text Understanding (1990)
Henrik Prebensen
Faculty of Letters
Institute of Romance Languages
University of Copenhagen
This contribution presents some of the ideas behind CODEXUS (Computational Dynamic Text Understanding), a project at the University of Copenhagen. It discusses certain problems in relation to the use of a model-theoretic framework in natural language semantics, notably the role of local syntax rules, of global structure rules such as rules of priority, and of dynamic interpretation, which adds new resources to the process of interpretation in the form of representations of discourse referents. It is claimed that the principle of compositionality cannot control interpretation of natural language entirely and that garden-pathlike phenomena in discourse show that meaning is essentially sketchy. Hence semantic interpretation can be neither precise nor complete in the model-theoretic (i.e. denotational) sense.
1. Standard Models of Language Understanding
Natural language understanding is a domain in which logic, linguistics, psychology and cognitive science, mathematics and computer science, and even philosophy are involved. Logic is involved as far as logical form, semantic interpretation and inference are concerned. Linguistics treats the relationships between surface form and logical form. Psychology and cognitive science study the nature of mental representations and simulation of mental processes. Mathematics and computer science treat effective computation and computer simulations. Philosophy is concerned with questions of language, mind and world.
Logic has become central since the happy days of Generative Semantics in the late sixties and Montague Grammar. Due to its central position, the run-of-the-mill model of a language comprehension system is model-theoretic. It comprises two algebraic levels: a level of meaning (possible world or set of situations), represented by some set-theoretic construct over a domain of basic entities, and a level of formal language (logical form), which is structurally disambiguated. The level of language is mapped onto the level of meaning by a homomorphic mapping.
From a level of natural language input, surface structures are translated into logical form by another mapping, preferably homomorphic, too. This ought to be where linguistics comes in. Procedural concerns are the domain of cognitive and computer scientists.
As in all multidisciplinary enterprises, it is difficult to obtain coherence. A couple of examples may show this:
Logicians discuss semantic interpretation of sentences in terms of truth conditions and propositions, as if declarative sentences were the only kind of sentences. The treatment of interrogative and imperative sentences is equally important, and may even be more basic to a theory of meaning. In natural language discourse, for example, interrogatives are used when we want to examine the truth value of propositions. Declaratives are presupposed to be true by a conversational convention. Therefore, they can be used to convey information (change mind states). Their semantic value, therefore, must be somewhat richer than a sheer truth value.
Logicians usually handle problems of anaphora as if pronouns were the only anaphoric devices of natural language and should be identified with variables. But one ought not to overlook the linguistic fact that definite noun phrases (definite descriptions) and pronouns are both anaphoric expressions:
(1) A man and a woman entered. He/the man was tall and slim, she/the woman short and podgy.
Pronouns and definite descriptions follow the same 'deep' anaphoric rules. The choice between them depends on surface conditions of identifying links with antecedents.
The term semantics has a very precise usage among logicians, especially when compared with its usage among linguists. On the other hand, the logician's concept is very narrow. Linguistics is basically an empirical, not a formal, science. Formal models can falsify beliefs about language, meaning and reality. They certainly cannot justify anything. They are indispensable building blocks in theories of natural language, especially when extended.
2. Static Interpretation And The Principle of Compositionality
The principle of compositionality is essential in model-theoretic approaches to natural language semantics:
(2) The meaning of a compound expression is determined by the meanings of its immediate syntactic constituents and their mode of combination.
The principle of compositionality constrains semantic interpretation. If, for example, two interpretations of the same logical connective were possible, say the classical truth-functional one (the familiar truth-tables) and the one in Lukasiewicz's three-valued system from 1920, then interpretation would be non-deterministic, i.e. not a homomorphism, and not even a function. Non-determinism shows that some information is missing, namely the basis on which the choice should be made. A supplementary parameter should be added as a remedy.
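The role of the supplementary parameter can be made concrete in a small sketch. It assumes a numeric encoding of the truth values (0, 1/2, 1) and takes min(1, 1 - a + b) as the Lukasiewicz implication; the names `SYSTEMS` and `interpret_implication` are illustrative only and do not belong to any system described here. On the values 0 and 1 the two tables agree; they come apart once the third value is admitted.

```python
from fractions import Fraction

HALF = Fraction(1, 2)  # the third truth value, encoded as 1/2 (an assumption)


def implies_classical(a, b):
    """Classical truth table, defined only on the values 0 and 1."""
    if a not in (0, 1) or b not in (0, 1):
        raise ValueError("classical implication is two-valued")
    return max(1 - a, b)


def implies_lukasiewicz(a, b):
    """Lukasiewicz three-valued implication on 0, 1/2 and 1."""
    return min(1, 1 - a + b)


# One connective, two candidate semantic operations: without a further
# parameter the symbol is paired with a set of operations, not with one.
SYSTEMS = {"classical": implies_classical, "lukasiewicz": implies_lukasiewicz}


def interpret_implication(a, b, system):
    """With the extra parameter `system`, interpretation is a function again."""
    return SYSTEMS[system](a, b)


print(interpret_implication(1, 0, "classical"))
print(interpret_implication(HALF, 0, "lukasiewicz"))
```

The parameter `system` plays the part of the missing information: once it is supplied, each input is mapped to exactly one value.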
The modes of combination of expressions are governed by rules of syntax. The principle forbids rules that produce expressions with more than one derivation. The role of syntax in semantic interpretation is thus to control the order in which semantic operations take place, given that basic expressions have been assigned meanings and that each syntactic operation is paired with a semantic rule. In other terms, the meaning of a sentence is determined by the lexical meaning of the 'words' and its syntactic derivation. The language of logical form is structurally unambiguous.
Therefore, the only sources of information that semantic interpretation has to access, are: (i) a dictionary, which will give the meanings of 'words' in terms of their extensions, (ii) a grammar, i.e. the set of syntax rules which will yield the meanings of expressions built by the rules, (iii) a set-theoretic construct (model, possible world or world fragment), which satisfies the true sentences of the language.
Semantic interpretation takes a logical form (sentence) as input, and yields its denotation (truth value) as output. If it is implemented as a process, it runs through a sequence of intermediate states, none of which leaves any trace. The relevant sources of information remain unchanged from the beginning to the end. In this respect, semantic interpretation of model theory can be called static.
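A minimal sketch of static interpretation, under stated assumptions: logical forms are encoded as nested tuples, and the dictionary and the tiny world fragment (two individuals, two predicates) are invented for illustration. The point is only that the three resources are read, never written.

```python
# Hypothetical world fragment: the dictionary maps names to entities and
# predicates to their extensions (sets of entities).
DICTIONARY = {
    "John": "john",
    "Mary": "mary",
    "walks": frozenset({"john"}),
    "talks": frozenset({"john", "mary"}),
}


def interpret(form):
    """Map a disambiguated logical form (nested tuples) to a truth value."""
    operator = form[0]
    if operator == "pred":            # ("pred", name, predicate)
        _, name, predicate = form
        return DICTIONARY[name] in DICTIONARY[predicate]
    if operator == "not":             # ("not", form)
        return not interpret(form[1])
    if operator == "and":             # ("and", form, form)
        return interpret(form[1]) and interpret(form[2])
    raise ValueError(f"no semantic rule for {operator!r}")


snapshot = dict(DICTIONARY)
sentence = ("and", ("pred", "John", "walks"), ("not", ("pred", "Mary", "walks")))
print(interpret(sentence))         # True
print(DICTIONARY == snapshot)      # True: no trace is left behind
```

Each syntactic operation is paired with one semantic rule, and the order of evaluation follows the nesting of the form.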
3. Natural Language And Logical Form
How do we translate ambiguous sentences of natural language into disambiguated logical forms? Which translation process can map ambiguous input onto unambiguous output?
The most radical proposal is Montague Grammar. Montague wanted translation to be a homomorphic mapping from source language (natural language) to target language (logical form), i.e. a purely syntactic process making use of nothing but algebraic structures in both languages. Montague Grammar defines an ambiguous language, L, as a pair <A,R>. A is a disambiguated language (actually it is rather a grammar). R is a relation whose domain is a subset of all proper expressions of A, and whose range is a set of expressions generated by rules derived from rules of A. They produce expressions with more than one structural description (derivation). Hence, R is a one-many relation of structure deletion.
If structural information is deleted from an expression, we can no longer retrace its mode of derivation. The process of translation cannot, then, be deterministic and controlled by syntax alone. Either it must have control-information at its disposal (i.e. besides dictionary, syntax rules and model). Or it must iterate over rules, producing a set of alternative translations, each of which is an unambiguous logical form. As a consequence of this, semantic interpretation will produce a set of meanings for a given sentence, each relative to a certain translation (or choice of rules). A given interpretation must display its own history of translation. Thus a sentence of the form
(3)a. p and q implies r, [with assignments {false,true,true}]
(e.g. Thackeray wrote 'Ivanhoe' and 'Ivanhoe' enchanted the world so 'Ivanhoe' brought wealth to its author), can only be interpreted if the question
(3)b. does conjunction have priority over implication?
can be answered from a set of priority rules, or if the set
(3)c. {<Yes, conjunction has priority over implication>,<No, implication has priority over conjunction>}
is accepted as an answer.
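The two options can be sketched as follows, using the assignment given in (3)a. The function and parameter names are illustrative assumptions. Without a priority rule, the interpreter can only return a set of readings, each paired with the choice that produced it, as in (3)c; with a priority rule as an extra parameter, as in (3)b, it returns a single value.

```python
ASSIGNMENT = {"p": False, "q": True, "r": True}   # the assignment of (3)a


def conj(a, b):
    return a and b


def impl(a, b):
    return (not a) or b


def conjunction_first(v):
    """(p and q) implies r"""
    return impl(conj(v["p"], v["q"]), v["r"])


def implication_first(v):
    """p and (q implies r)"""
    return conj(v["p"], impl(v["q"], v["r"]))


READINGS = {"conjunction": conjunction_first, "implication": implication_first}


def interpret(v, priority=None):
    """Return one truth value if a priority rule is given, else all readings."""
    if priority is None:
        return {name: reading(v) for name, reading in READINGS.items()}
    return READINGS[priority](v)


print(interpret(ASSIGNMENT))                  # {'conjunction': True, 'implication': False}
print(interpret(ASSIGNMENT, "conjunction"))   # True
```

With this assignment the two bracketings differ in truth value, so the choice between them cannot be left open if a single denotation is wanted.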
Montague's proposal is (3)c. It is meant to be 'first assigning meanings to expressions of an unambiguous 'language', and, then, pairing unambiguous expressions with the expressions we wish to describe'. But the relationship between an ambiguous expression and a plurality of unambiguous expressions is of course one-many and hence cannot be stated as a function, unless extra structural information is introduced in new parameters. Ambiguous languages, as defined by Montague, are not really ambiguous, but are pairs consisting of a sentence (string of words) and information about a unique analysis 'disambiguating' the sentence.
The solution (3)b constrains interpretation by a conventional rule of priority. This is an extension of the basic stock of resources (lexicon, syntax and model) with a new set: priority rules. These are really global rules of control different from local rules of syntactic combination. With (3)b, translation is still a (homomorphic) function, but with an additional parameter.
Only with an additional parameter can translation be construed as a function. Only if it is a (homomorphic) function, can semantic interpretation be induced as wanted by Montague. And only if interpretation is induced, can the level of logical form be dispensed with as an independent level of representation.
The Montagovian concept is declarative, 'mathematical'. Montague, apparently, did not think in terms of procedural feasibility. The information about the structural analysis of a given ambiguous sentence, which is needed to generate an unambiguous translation as input for interpretation, must come from somewhere and in a principled way. But there are no indications, in Montague's papers, of the proper course to a solution.
The principle of compositionality cannot be kept as it stands, with the stock of syntax rules as rules of control. The meaning of an expression is no longer controlled solely by rules of syntax, but also by other structural rules, e.g. rules of priority. The principle may be restated like this:
(4) The meaning of a compound expression is determined by the meanings of its immediate syntactic constituents and an n-tuple of local and global structural rules controlling their combination.
In the following, I shall no longer preserve a strict distinction between translation and interpretation in a model of language understanding, but refer to them as one composite process of interpretation.
4. Dynamic Interpretation
Ambiguity in natural language has been discussed mostly in connection with quantifiers, determiners and anaphora, attitudinal contexts, and indexicals.
One problem with these expressions is that in order to interpret them, information about the interpretation of previous expressions must sometimes be available. Such antecedent-consequent links between expressions contribute to coherence of discourse.
In other terms, to interpret the italicized expressions:
(5) Every farmer owns a donkey
(6) The man was tall and slim
(7) She was short and podgy
(8) Mary was looking for a unicorn to pat on the back
(9) I am fine now
in discourse, we must know 'what we are talking about': farmers in general or the farmers on a certain latifundium (5)? Which individual, and unique with respect to which property (6), (7)? A specific animal (8)? The speaker in which discourse situation (9)? The meaning of the noun phrases cannot be determined only by lexical lookup, syntax rules or rules of semantic interpretation in a model.
What we need, is the possibility to access representations generated in the interpretation of previous noun phrases. The interpretation of (5)-(8) in contexts like:
(10) This latifundium is not average. Every farmer owns a donkey.
(11) A man and a woman entered. The man was tall and slim.
(12) A man and a woman entered. He was tall and slim. She was short and podgy.
(13) Mary and Laura had a little lamb and five cute unicorns. Laura was playing with the little lamb. Mary was looking for a unicorn to pat on the back
(14) My uncle wrote a postcard to me. 'I am fine now', it said.
must limit iteration to the set of farmers of the latifundium, stored among the discourse referents (10), limit the range of uniqueness to the two salient discourse referents (11), (12), limit the choice of individual to the set of five pet unicorns (13), or anchor the speaker to the uncle's discourse situation (14). So sets of previously established discourse referents must be available, i.e. a new parameter must be added to the process of interpretation.
I shall leave aside the discussion of how to represent discourse referents. I shall only mention that linguistic form plays a role in anaphoric links: morphological person, gender and number, grammatical function (e.g. subject/object) and focussing. Discourse referents are not just entities in a model, possible world or situation. They are made salient by the way they are mentioned.
The internal structure of discourse databases is not important here either. What is important is that they are dynamic. We can take them to be empty from the beginning. Then they grow as interpretation of discourse proceeds. They contain referents of all kinds of expressions, not only noun phrases. Practically any expression can enter into anaphoric relations. Some discourse referents are ephemeral. They arise in the scope of a sentential operator (e.g. negation or modality) and live as long as a stretch of discourse continues in the same 'mode'.
In short, the output of the interpretation process is not only the denotation of the input sentence. During processing, information concerning subparts is stored in a dynamic discourse memory. This information is available to subsequent interpretation. It is continuously updated by additions and deletions.
Interpretation is a dynamic process. It is not limited to lexical lookup, syntactic parsing and satisfaction in a model. It uses a very wide range of information resources.
It is very difficult to see how the principle of compositionality can be entirely preserved. Anaphora, for example, can hardly be processed along the lines that their meaning is determined by the meanings of their immediate syntactic constituents and the set of local and global structural rules controlling their combination. The interpretation of anaphora must use special processes (rules) searching for their antecedents in a discourse environment created by previous interpretation.
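A minimal sketch of such a discourse memory, applied to a discourse like (11) and (12). Since the representation of discourse referents is deliberately left open here, everything in the sketch is an assumption: the class names, the two features (noun and gender), and the rule that an anaphor needs exactly one matching antecedent.

```python
from dataclasses import dataclass, field


@dataclass
class Referent:
    noun: str
    gender: str
    properties: list = field(default_factory=list)


class DiscourseMemory:
    """Starts empty and is updated by additions and deletions."""

    def __init__(self):
        self.referents = []

    def introduce(self, noun, gender):
        referent = Referent(noun, gender)
        self.referents.append(referent)
        return referent

    def resolve(self, noun=None, gender=None):
        """Find the unique stored referent matching a pronoun or definite NP."""
        matches = [
            r for r in self.referents
            if (noun is None or r.noun == noun)
            and (gender is None or r.gender == gender)
        ]
        if len(matches) != 1:
            raise LookupError(f"{len(matches)} candidate antecedents")
        return matches[0]

    def forget(self, referent):
        """Delete an ephemeral referent once its stretch of discourse ends."""
        self.referents.remove(referent)


memory = DiscourseMemory()
# "A man and a woman entered."
memory.introduce("man", "masculine")
memory.introduce("woman", "feminine")
# "He was tall and slim."
memory.resolve(gender="masculine").properties.append("tall and slim")
# "The woman was short and podgy."
memory.resolve(noun="woman").properties.append("short and podgy")
print([(r.noun, r.properties) for r in memory.referents])
```

The pronoun and the definite description go through the same `resolve` call and differ only in which features they supply, which is in keeping with the claim that both follow the same 'deep' anaphoric rules.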
5. Garden-pathlike Phenomena In Dynamic Interpretation
The model of dynamic text interpretation outlined above is still rather orthodox. It proceeds by an online, sentence by sentence interpretation of a text. As a result of this process, each sentence will be assigned a complete and unambiguous meaning. The text will have been 'compiled'. The dynamic memory base of heterogeneous representations generated during the process will be a kind of side-effect.
There is still a very close correspondence between the syntactic structure of the individual sentence and its meaning. However, there is a phenomenon which fits badly into the orthodox picture. In a certain number of cases, the meaning of a sentence will not be disambiguable online, but only afterwards. In linguistic analysis, the sentence
(15) I saw a boy on a hill with a telescope
would be structurally ambiguous:
(16)a. I [saw [a boy on a hill with a telescope]]
object NP-node dominated by VP
b. I [saw [a boy] [on a hill with a telescope]]
object NP + locative PP-node dominated by VP
c. I [saw [a boy on a hill] [with a telescope]]
object NP + concomitant PP dominated by VP
d. I [saw [a boy] [on a hill] [with a telescope]]
object NP + locative PP + concomitant PP (relative to a boy) dominated by VP
e. I [saw [a boy on a hill]] [with a telescope]
object NP + instrumental PP dominated by the S-node
f. I [saw [a boy] [on a hill]] [with a telescope]
object NP + locative PP dominated by VP, instrumental PP dominated by S
g. I [saw [a boy]] [on a hill with a telescope]
object NP + location PP dominated by the S-node
h. I [saw [a boy]] [on a hill] [with a telescope]
object NP + location PP dominated by S + instrumental PP dominated by S
i. ?I [saw [a boy]] [on a hill] [[with a telescope]]
object NP + concomitant PP discontinuously dominated by VP, locative PP dominated by S.
Certain linguists have tried to resolve this kind of ambiguity by using rules of priority. This kind of solution is well-known in mathematics (multiplication before addition), logic (conjunction before implication) and computer languages like PASCAL, where nested if then else statements are interpreted in accordance with a rule of local attachment. A similar rule of local attachment has been proposed in the analysis of prepositional complements, giving (16)a priority over all the rest.
A rule of priority does not work in this case. First, it would be empirically hard to motivate an order to be imposed on all these readings. The rule would be ad hoc. Second, the rule would only work in combination with a rule of backtracking, creating a garden-pathlike situation:
(16) The boat floated down the stream sank
'Floated' is first taken as the verb. When 'sank' is encountered, we are in a cul-de-sac. So we wind our thread back up to make 'floated' a complement to 'boat'. Backtracking works fine with a small sentence. With a whole text it becomes problematic. Can one backtrack two hundred pages if one interpreted the first sentence in the text wrongly?
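The backtracking behaviour can be illustrated with a toy parser. The grammar and lexicon below are hypothetical and cover only this sentence; the parser tries the alternatives for each category in order and records the analyses that end before the last word has been consumed.

```python
# Hypothetical toy lexicon and grammar, just large enough for the example.
LEXICON = {
    "the": {"Det"}, "boat": {"N"}, "stream": {"N"},
    "floated": {"V", "Vpart"}, "sank": {"V"}, "down": {"P"},
}
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "N", "RC"]],   # RC: reduced relative clause
    "RC": [["Vpart", "PP"]],
    "VP": [["V", "PP"], ["V"]],
    "PP": [["P", "NP"]],
}


def parse(symbol, tokens, pos):
    """Yield (tree, next_position) for every way symbol can start at pos."""
    if symbol not in GRAMMAR:                       # lexical category
        if pos < len(tokens) and symbol in LEXICON.get(tokens[pos], ()):
            yield (symbol, tokens[pos]), pos + 1
        return
    for rhs in GRAMMAR[symbol]:                     # alternatives, in order
        yield from expand(symbol, rhs, tokens, pos, [])


def expand(symbol, rhs, tokens, pos, children):
    """Match the remaining right-hand side, backtracking over alternatives."""
    if not rhs:
        yield (symbol, *children), pos
        return
    for child, nxt in parse(rhs[0], tokens, pos):
        yield from expand(symbol, rhs[1:], tokens, nxt, children + [child])


def attempts(sentence):
    """Split the sentence analyses into complete ones and dead ends."""
    tokens = sentence.lower().split()
    complete, dead_ends = [], []
    for tree, end in parse("S", tokens, 0):
        (complete if end == len(tokens) else dead_ends).append(tree)
    return complete, dead_ends


complete, dead_ends = attempts("The boat floated down the stream sank")
print(len(dead_ends), "dead ends before", len(complete), "complete analysis")
```

With this grammar, the analyses that take 'floated' as the main verb are tried first and fail to reach 'sank'; only the analysis with a reduced relative clause covers the whole sentence. The cost is negligible for one sentence, which is precisely why the sketch says nothing about two hundred pages.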
All the analyses correspond to perfectly good semantic situations, e.g.:

(17) a.+b., c.+d., e.+f., g. [illustrations of the corresponding situations not reproduced]
Unfortunately, the same structural ambiguity exists in:
(18) I saw the boy on the hill with the telescope
In a dynamic framework, however, the interpretation of (18) will not give rise to the same multiple interpretations. Normally, (18) will be unambiguous. The definite determiners will trigger a search among established discourse referents. They are presented as 'known' and will help disambiguate the NPs. Thus, in the following cases, the meaning is clear:
(19) The son of the man next door profoundly wanted to play with my new telescope. One day, when he had been in our house to play with my children, I discovered that it had disappeared. Struck with suspicion, I went for a round of inspection. I came to a hill in the neighbourhood. And what did I see? - I saw the boy on the hill with the telescope.
(20) A month ago, I bought a brand new telescope. The next day, my neighbour's son had disappeared, and they asked me to help them find him. I brought my new acquisition with me, went to the edge of a little forest near to a hill. And lo! I saw the boy on the hill with the telescope.
In (15), the NPs have indefinite determiners. This signals that no information is available concerning discourse referents. They are unknown. There are no cues to guide our choice of structure.
Now, everything may become transparent later on in the same discourse:
(21) I saw a boy on a hill with a telescope. From there, he had better chances to see the unicorn he was looking for.
(22) I saw a boy on a hill with a telescope. When I came home, I realized that I would never have seen him without the telescope.
However, the garden-path solution will not work in an online text understanding system. It is a well-known fact that the complete meaning of certain sentences in a text may not be clear until the last sentence has been interpreted.
6. Sketchiness Of Meaning
Something seems to be wrong with the idea of text comprehension as resting on a complete and unambiguous interpretation of sentences, and of sentence interpretation as basically resting on complete processing of syntactic structure.
First, no formal grammar of natural language can determine syntactic structure completely and uniquely without access to meaning:
(15) I saw a boy on a hill with a telescope
has nine structures and nine meanings. It can be satisfied by at least nine different situations. It is open until we know which meaning was intended.
Next, it is simply not true that a comprehensor has to interpret every single sentence uniquely and completely, i.e. down to the level of satisfaction in one situation, in order to understand text meaning. Let sentence (15) be the first sentence in a discourse. Clearly, one would never stop the speaker to ask: 'What exactly do you mean? What is the role of the hill? and of the telescope?' This would launch a kind of unlimited, left recursive search, because many other circumstances are not clear: When did the event take place (year, day, hour, minute, second and hundredth of second)? Where did it take place (degree, minute, etc. of latitude and longitude)? Who exactly were the participants? How exactly was the boy? the hill? the telescope? ... The dialogue turns into a Kafkaesque police interrogation. We would never come to the next meaning in this discourse. There are always more empty slots in a sentence that could be filled.
Sketchiness is a necessary ingredient of communication. We simply must have confidence in our conversational partners, in their ability to organize sentences and discourse structure so that we grasp what is essential. Having heard or read (15), we normally assume a wait-and-see attitude. If the role of the telescope is important, the speaker surely will tell us in due time. If not, why bother? Speakers have the right to colour their discourse with redundant elements. Listeners can understand sentences without having an exhaustive meaning representation.
Finally, computational language understanding is not feasible with a design which requires completeness. Assume that the dialogue continues with new ambiguous sentences like: He wanted to see a unicorn. His sister wanted to marry a Norwegian. She admired her father, and so did I... This would end with memory overflow or with a 'combinatorial' output of X*Y*Z*...*W different meanings or meaning representations.
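The arithmetic behind the 'combinatorial' output can be made explicit. In the sketch below, the count of nine readings for (15) is taken from the discussion above; the counts for the three continuation sentences are assumptions (two readings each), chosen only to show how the product grows.

```python
from itertools import product
from math import prod

readings = {
    "I saw a boy on a hill with a telescope.": 9,   # count given above
    "He wanted to see a unicorn.": 2,               # assumed
    "His sister wanted to marry a Norwegian.": 2,   # assumed
    "She admired her father, and so did I.": 2,     # assumed
}

# A complete interpretation of the discourse picks one reading per sentence.
total = prod(readings.values())
combinations = list(product(*(range(n) for n in readings.values())))

print(total)               # 72
print(len(combinations))   # 72
```

Every further ambiguous sentence multiplies the total, whereas a sketchy representation would need only one entry per sentence, four in this case.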
A system of text understanding which cannot handle sketchiness as an essential feature of sentence meaning is neither linguistically correct, nor cognitively adequate, nor computationally realistic.
In practice, language users expect understanding to be as precise as is relevant for an actual purpose. Between people, the purpose of communication is normally to achieve change in views or in action. People exchange information in order to attune their concepts. Suppose two persons talk about a mathematical subject. One uses the constant π. What does π mean exactly? Both know that the other doesn't know. π is the name of a procedure by which they can, in principle, calculate a value which is good enough for their purpose, whether the level of precision is 22/7 or 3.1415927 or maybe 3.14159 26535 89793 23846 26433 83279 50288 41971 69399 37510. They may even communicate a whole day without having the same notion of π in mind, like a Christian and a Moslem talking about God. It doesn't really matter, as long as both know how to agree on a value if it becomes important. The meaning of an expression, the concept behind it, is a decision procedure we can use, if necessary, but never use unnecessarily.
The meaning or denotation of a sentence is to be taken in much the same way. It is not a truth value. Nor is it a relation or a set of relations. When one talks about a boy one saw on a hill with a telescope, people take the sentence as the name of a sketchy representation related to a set of procedures giving the whos and whats and whens and whys of the intended meaning. In brief, they know a set of questions to ask.
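The idea of a constant as the name of a procedure can be sketched for the constant under discussion (pi). The choice of Machin's formula and the function names are mine, made for illustration; any procedure that delivers the digits on demand would serve. The value is truncated, not rounded, at the requested number of decimals.

```python
def arctan_inverse(x, scale):
    """Return arctan(1/x) * scale, computed with integer arithmetic."""
    total = term = scale // x
    x_squared = x * x
    n, sign = 1, -1
    while term:
        term //= x_squared
        n += 2
        total += sign * (term // n)
        sign = -sign
    return total


def pi_digits(decimals):
    """Return pi truncated to `decimals` decimal places (decimals >= 1)."""
    guard = 10                                    # extra digits absorb rounding
    scale = 10 ** (decimals + guard)
    # Machin's formula: pi = 16 arctan(1/5) - 4 arctan(1/239)
    scaled = 4 * (4 * arctan_inverse(5, scale) - arctan_inverse(239, scale))
    digits = str(scaled // 10 ** guard)
    return digits[0] + "." + digits[1:]


print(pi_digits(5))    # 3.14159
print(pi_digits(20))   # 3.14159265358979323846
```

The two speakers need not hold any particular output of `pi_digits` in mind; it is enough that each can call the procedure with whatever precision the purpose requires.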
Knowing one's language, then, could be described as knowing elementary processes, named by words, and how to combine them into procedures, named by phrases. However, sentence structure and discourse structure are not like the control structures in the ALGOL-family of programming languages. They are much more like clauses of a PROLOG program which are processed only if they are activated by a goal. Or like shell programming and piping in UNIX. I.e. they are like systems with processes stored in libraries and eventually linked together to form a running program.
This analogy, however, breaks down on one point. Understanding does not imply that a program is actually linked and executed. The comprehensor may only check whether the necessary procedures are in the library, and whether a control structure is sufficiently specified to execute the essential parts of a program.
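The difference between checking and executing can be shown in a few lines. The library contents and the names below are hypothetical; the list `executed` exists only to demonstrate that no procedure is run during the check.

```python
executed = []   # records any procedure that is actually run


def make_procedure(name):
    """Build a stand-in procedure that leaves a trace when executed."""
    def procedure():
        executed.append(name)
        return name
    return procedure


# Hypothetical library of elementary processes, named by words.
LIBRARY = {name: make_procedure(name) for name in ("see", "boy", "hill", "telescope")}


def is_understood(required):
    """Check that every required procedure is available, without calling any."""
    return all(callable(LIBRARY.get(name)) for name in required)


print(is_understood(["see", "boy", "hill", "telescope"]))   # True
print(is_understood(["see", "unicorn"]))                    # False
print(executed)                                             # []
```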
Understanding the meaning of a sentence or a text, therefore, has a sketchy character. The comprehensor outlines a draft or general representation, useful as a memorandum for a later occasion.
7. Interpretation In Dynamic Text Understanding
A computational system of text understanding can use some ideas from model-theoretic semantics and dynamic interpretation of natural language. But it should not fall into the trap of complete and unambiguous interpretation. The first simple version of the system that we have started work on in Copenhagen, CODEXUS (Computational Dynamic Text Understanding), is conceived along the following lines.
First, a computational text understanding system should not primarily simulate a person. The fundamental requirement is that it can read a text and answer questions concerning its meaning. This means that the system must be able to take a text as input and store it in a database, having first verified that all words figure in its dictionary. One might compare the task of this module with the task of a schoolchild preparing a page of Caesar's Gallic Wars. This is understanding of a very limited and formal kind.
Second, the system must interpret natural language questions concerning the meaning of the text. This implies that questions must be translated into a formal representation. The translation module borrows as much as possible from dynamic interpretation as discussed above. It treats questions (and eventually declarative comments) sentence by sentence.
It performs lexical lookup and as much lexical disambiguation as possible on a morpho-syntactic basis. Then it performs a 'poor' syntactic analysis. This means that it identifies major phrases, but not a complete phrase structure. Thus the sentence
(23) Who saw a boy on a hill with a telescope
will be analysed as having a verb phrase, saw, two noun phrases, who and a boy, and two prepositional phrases, on a hill and with a telescope. Only phrases directly dependent on the verb (subject, object, ...) will receive structural descriptions: who and a boy. However, as the system can perform anaphoric resolution, it would be able to identify complex NPs having established discourse referents as antecedents, e.g. the boy on the hill. Finally the system builds a simple predicate structure: saw(who, a boy), leaving the two PPs as 'free'. Thus the representation of the sentence is something like (tense is not taken into consideration in this short sketch):
(24) saw(who,a boy) - on a hill - with a telescope
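A sketch of how a 'poor' analysis of this kind might arrive at (24). It is not the CODEXUS module: the mini-lexicon, the chunking rule (a preposition opens a new free phrase, the verb separates subject from object) and the function names are assumptions made for this one example.

```python
VERBS = {"saw"}                  # hypothetical mini-lexicon
PREPOSITIONS = {"on", "with"}


def poor_parse(sentence):
    """Identify major phrases only; prepositional phrases stay unattached."""
    subject, obj, free_pps = [], [], []
    verb = None
    current = subject
    for token in sentence.rstrip("?.!").lower().split():
        if verb is None and token in VERBS:
            verb = token
            current = obj                 # what follows the verb is the object
        elif token in PREPOSITIONS:
            free_pps.append([token])      # open a new, 'free' PP
            current = free_pps[-1]
        else:
            current.append(token)
    return {
        "predicate": verb,
        "arguments": (" ".join(subject), " ".join(obj)),
        "free": [" ".join(pp) for pp in free_pps],
    }


def show(representation):
    """Print a representation in the notation of (24)."""
    arguments = ",".join(representation["arguments"])
    head = representation["predicate"] + "(" + arguments + ")"
    return " - ".join([head] + representation["free"])


print(show(poor_parse("Who saw a boy on a hill with a telescope")))
```

No decision is taken about where the two prepositional phrases attach, so none of the structures in (16) is singled out.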
Third, the system will try to unify this structure with any information in the text having the same meaning. In the first place, it must search for not only the verb see, but also synonyms and hyperonyms meaning visual perception. In the next place, it must interpret the matching sentences, using the same module as for questions. Finally, it must try out unifications with a boy, on a hill and with a telescope along the same lines. If unifications are possible, a unification of 'subjects' with who should provide the answer.
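The unification step can be sketched over representations of the kind shown in (24). The two stored sentences, the synonym set for visual perception and the matching policy (the free phrases of the question must all occur in the stored sentence) are illustrative assumptions, not a description of the actual module.

```python
# Assumed set of verb forms meaning visual perception.
PERCEPTION = {"saw", "spotted", "noticed"}

# Hypothetical text database, already analysed into sketchy representations.
TEXT = [
    {"predicate": "spotted", "arguments": ("the shepherd", "a boy"),
     "free": ["on a hill", "with a telescope"]},
    {"predicate": "saw", "arguments": ("mary", "a unicorn"),
     "free": ["in a forest"]},
]

QUESTION = {"predicate": "saw", "arguments": ("who", "a boy"),
            "free": ["on a hill", "with a telescope"]}


def same_meaning(verb_a, verb_b):
    """Verbs match if identical or if both belong to the synonym set."""
    return verb_a == verb_b or (verb_a in PERCEPTION and verb_b in PERCEPTION)


def answer(question, text):
    """Unify the question with each stored sentence; bind 'who' to subjects."""
    _, question_object = question["arguments"]
    answers = []
    for sentence in text:
        subject, obj = sentence["arguments"]
        if not same_meaning(question["predicate"], sentence["predicate"]):
            continue
        if obj != question_object:
            continue
        if not set(question["free"]) <= set(sentence["free"]):
            continue
        answers.append(subject)           # the value unified with 'who'
    return answers


print(answer(QUESTION, TEXT))   # ['the shepherd']
```

Interpretation of the stored sentences is driven by the question: a sentence that fails on the verb or the object is never examined further.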
During interpretation, information concerning discourse referents is dynamically stored for use in later discourse.
8. Conclusion
The standard concept of text understanding systems implies that the comprehensor must process the text online and generate a total meaning representation. This meaning representation is a kind of compiled version of the text. And - as always with compilations - it is static. The idea of the CODEXUS concept is that the basic text database is as similar to the original text as possible. The basic idea of the system is that text understanding is dynamic in the sense that the questions you ask determine the interpretations you get out of the text.
References
Dowty, David R. et al.: Introduction to Montague Semantics, 1981
Dyer, M.G. et al.: BORIS - An Experiment in In-Depth Understanding of Narratives, Artificial Intelligence, 20, 1983, p. 15-62
Groenendijk, Jeroen & Martin Stokhof: Dynamic Predicate Logic, unpublished paper, 1987
Harel, David: Dynamic Logic, in Handbook of Philosophical Logic, II, ed. D. Gabbay and F. Guenthner, 1984
Heim, Irene: E-type Pronouns And Donkey Anaphora, Linguistics And Philosophy, 13, 1990, p. 137-77
Kratzer, A.: An Investigation of The Lumps of Thought, Linguistics And Philosophy, 12, 1989, p. 607-653
Lukasiewicz, Jan: Philosophische Bemerkungen zu mehrwertigen Systemen des Aussagenkalküls, in Comptes Rendus de la société des sciences et des lettres de Varsovie, Classe III, 1931
Montague, Richard: Universal Grammar, 1970, in Formal Philosophy, Selected Papers of Richard Montague, ed. Richmond H. Thomason, 1974
Partee, Barbara H. et al.: Mathematical Methods in Linguistics, 1990
Zeevat, H.: A Compositional Approach to Discourse Representation Theory, Linguistics And Philosophy, 12, 1989, p. 95-131