As is often the case in so-called classical cognitive science, the best way to understand what is going on in SHRDLU is to work from the top down – to start by looking at the overall structure and then drill down into the details. Strictly speaking, SHRDLU consists of twelve different systems. Winograd himself divides these into three groups. Each group carries out a specific job. The jobs that Winograd identifies are not particularly surprising. They are exactly the jobs that one would expect any language-processing system to carry out.
Does the shortest thing the tallest pyramid’s support supports support anything green?
Figure 2.1 A question for SHRDLU about its virtual micro-world. (Adapted from Winograd 1972)
1 The job of syntactic analysis: SHRDLU needs to be able to “decode” the grammatical structure of the sentences that it encounters. It needs to be able to identify which units in the sentence are performing which linguistic function. In order to parse any sentence, a language user needs to work out which linguistic units are functioning as nouns (i.e. are picking out objects) and which are functioning as verbs (i.e. characterizing events and processes).
2 The job of semantic analysis: Understanding a sentence involves much more than decoding its syntactic structure. The system also needs to assign meanings to the individual words in a way that reveals what the sentence is stating (if it is a statement), or requesting (if it is a request). This takes us from syntax to semantics.
3 The job of integrating the information acquired with the information the system already possesses: The system has to be able to explore the implications of what it has just learnt for the information it already has. It also has to be able to call upon information it already has in order to obey a command, fulfill a request, or answer a question. These all require ways of deducing and comparing the logical consequences of stored and newly acquired information.
We can identify distinct components for each of these jobs – the syntactic system, the semantic system, and the cognitive-deductive system. As mentioned earlier, Winograd does not see these as operating in strict sequence. It is not the case that the syntactic system first produces a syntactic analysis and then hands that analysis over to the semantic system, which plugs meanings into the abstract syntactic structure before passing the result on to the cognitive-deductive system. In SHRDLU all three systems operate concurrently and are able to call upon each other at specific points. What makes this possible is that, although the three systems store and deploy different forms of knowledge, these different forms of knowledge are all represented in a similar way. They are all represented in terms of procedures.
The best way to understand what procedures are is to look at some examples. Let us start with the syntactic system, since this drives the whole process of language understanding. (We cannot even get started on thinking about what words might mean until we know what syntactic jobs those words are doing – even if we have to make some hypotheses about what words mean in order to complete the process of syntactic analysis.) One very fundamental “decision” that the syntactic system has to make is whether its input is a sentence or not. Let us assume that we are dealing with a very simple language that only contains words in the following syntactic categories: Noun (e.g. “block” or “table”), Intransitive Verb (e.g. “___ is standing up”), Transitive Verb (e.g. “___ is supporting ___”), Determiner (e.g. “the” or “a”).
Figure 2.2 presents a simple procedure for answering this question. Basically, what the SENTENCE program does is exploit the fact that every grammatical sentence must contain a noun phrase (NP) and a verb phrase (VP). It tests for the presence of an NP; tests for the presence of a VP; and then checks that there is no extra “junk” in the sentence.
Of course, in order to apply this procedure the syntactic system needs procedures for testing for the presence of noun phrases and verb phrases. This can be done in much the same way – by checking in an algorithmic manner whether the relevant syntactic units are present. Figure 2.3 gives two procedures that will work in our simple language.
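The SENTENCE, NP, and VP procedures can be sketched in a few lines of Python. This is only an illustration of the flowcharts in Figures 2.2 and 2.3, not Winograd’s own code. The lexicon, the function names, and the use of one-word verbs (“stands,” “supports”) in place of “is standing up” and “is supporting” are assumptions made to keep the example short.

```python
# Hypothetical toy lexicon for the simple four-category language.
LEXICON = {
    "the": "DETERMINER",
    "a": "DETERMINER",
    "block": "NOUN",
    "table": "NOUN",
    "pyramid": "NOUN",
    "stands": "INTRANSITIVE",   # one-word stand-in for "is standing up"
    "supports": "TRANSITIVE",   # one-word stand-in for "is supporting"
}


def parse_np(words, pos):
    """PARSE a DETERMINER, then PARSE a NOUN. Return the next position or None."""
    if (pos + 1 < len(words)
            and LEXICON.get(words[pos]) == "DETERMINER"
            and LEXICON.get(words[pos + 1]) == "NOUN"):
        return pos + 2
    return None


def parse_vp(words, pos):
    """PARSE a VERB, then branch on its type. Return the next position or None."""
    if pos >= len(words):
        return None
    category = LEXICON.get(words[pos])
    if category == "TRANSITIVE":
        return parse_np(words, pos + 1)  # a transitive verb needs an object NP
    if category == "INTRANSITIVE":
        return pos + 1
    return None  # not a verb at all


def is_sentence(text):
    """SENTENCE: an NP, then a VP, then no words left over."""
    words = text.lower().split()
    pos = parse_np(words, 0)
    if pos is None:
        return False
    pos = parse_vp(words, pos)
    if pos is None:
        return False
    return pos == len(words)  # any leftover "junk" means failure


print(is_sentence("the block supports a pyramid"))  # True
print(is_sentence("the block stands table"))        # False (leftover word)
```

Each procedure simply calls the others where it needs them, which is the sense in which the knowledge of grammar is stored as procedures rather than as a table of rules.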
Moving to the job of semantic analysis, SHRDLU represents the meanings of words by means of comparable procedures. Instead of procedures for picking out syntactic categories, these procedures involve information about the micro-world and actions that the system can perform in the micro-world. One of the words in SHRDLU’s vocabulary is CLEARTOP. We can say that something (say, a block) is CLEARTOP when it does not have anything on it. CLEARTOP can also function as a command (as the command to remove anything resting on the block). CLEARTOP is represented in SHRDLU by the very simple procedure presented in Figure 2.4. This definition of CLEARTOP exploits other “concepts,” such as SUPPORT and GET RID OF. Each of these other concepts has its own procedure, which may well call upon the CLEARTOP procedure.
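A minimal sketch of the CLEARTOP procedure, following the flowchart in Figure 2.4, might look like this in Python. The micro-world state (`rests_on`), the helper names, and the decision to have GET-RID-OF clear an object before moving it to the table are all assumptions made for illustration; SHRDLU’s actual procedures were not written this way.

```python
# Hypothetical micro-world state: each object maps to whatever it rests on.
rests_on = {
    "green block": "red block",
    "red block": "table",
    "blue pyramid": "table",
}
cleartop_facts = set()  # objects the system has asserted to be CLEARTOP


def find_supported(x):
    """SUPPORT test: return some object Y that X supports, or None."""
    for obj, base in rests_on.items():
        if base == x:
            return obj
    return None


def get_rid_of(y):
    """Stand-in for GET-RID-OF: clear Y's own top, then move Y to the table."""
    cleartop(y)  # assumption: Y must itself be clear before it is moved
    rests_on[y] = "table"


def cleartop(x):
    """Remove everything resting on X, then assert that X is CLEARTOP."""
    if x == "table":
        raise ValueError("the table is where removed objects go")
    y = find_supported(x)
    while y is not None:      # Does X SUPPORT an object Y?
        get_rid_of(y)         # Yes: GET-RID-OF Y and ask again
        y = find_supported(x)
    cleartop_facts.add(x)     # No: assert that X is CLEARTOP
```

Calling `cleartop("red block")` on this state moves the green block to the table and records both blocks as CLEARTOP. Note that `cleartop` and `get_rid_of` call each other, which mirrors the point that the procedure for one concept may call upon the procedures for others.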
In order to see how these procedures work to allow SHRDLU to follow instructions in the micro-world and answer questions about it, we can look at the first few exchanges in a sample dialog described by Winograd in his 1973 paper. SHRDLU’s contributions are in capital letters, while the sentences in italics were typed by a person. The commentary in normal type is by Winograd.
[Flowchart: PARSE a NP (if none, RETURN failure); PARSE a VP (if none, RETURN failure); Any words left? (if yes, RETURN failure; if no, RETURN success).]
Figure 2.2 An algorithm for determining whether a given input is a sentence or not. (Adapted from Winograd 1972)
1. Pick up a big, red block.
OK
The system answers “OK” when it carries out a command. In order to pick up the red block, it had to clear it off by finding a space for the green one and moving it away. The result is shown in Figure 2.5.
[Flowcharts with the steps: PARSE a DETERMINER, PARSE a NOUN (noun phrase); PARSE a VERB, Is it TRANSITIVE?, Is it INTRANSITIVE?, PARSE a NP (verb phrase). Each ends in RETURN success or RETURN failure.]
Figure 2.3 Algorithms for identifying noun phrases and verb phrases. (Adapted from Winograd 1973)
2. Grasp the pyramid.
I DON’T UNDERSTAND WHICH PYRAMID YOU MEAN.
There are three pyramids in the scene and the program knows that phrases beginning with “the” are intended to refer to a specific object the speaker has in mind. So SHRDLU does nothing.
3. Find a block which is taller than the one you are holding and put it in the box.
The situation is depicted in Figure 2.5. The system has to figure out what is meant by “one” and “it,” as well as use deduction to decide which block it is holding.
BY “IT” I ASSUME YOU MEAN THE BLOCK WHICH IS TALLER THAN THE ONE I AM HOLDING
There were two possible referents for “it”: the one selected, and “the one you are holding.”
OK
The hand has to set down the red block to pick up the taller blue one. We are now at Figure 2.6.
4. What does the box contain?
THE BLUE PYRAMID AND THE BLUE BLOCK
5. What is the pyramid supported by?
THE BOX
Earlier the system responded to “the pyramid” by saying “I DON’T UNDERSTAND WHICH PYRAMID YOU MEAN”. Now SHRDLU has identified the pyramid that was meant – the pyramid just mentioned in the response to the previous question.
Certainly, SHRDLU has its limitations. No one (certainly not Winograd) has ever claimed that it provides an accurate model of human language use – or even of a small segment of human language use. As Winograd himself was quick to acknowledge, SHRDLU does not really do justice to how conversations actually work in real life. Conversations are social interactions, not simply sequences of unconnected questions and answers. They involve agents trying to make sense of the world and of each other simultaneously.
[Flowchart, “To cleartop X”: Does X SUPPORT an OBJECT Y? If yes, GET-RID-OF Y; if no, assert that X is CLEARTOP.]
Figure 2.4 Procedure for applying the concept CLEARTOP. (Adapted from Winograd 1972)
Every level of linguistic understanding involves assumptions and guesses about what the other partner in the conversation is trying to communicate. It also involves assumptions and guesses about what they are trying to achieve. These are not always the same. In making and assessing those assumptions and guesses we use all sorts of heuristics and principles. We tend to assume, for example, that people generally tell the truth; that they don’t say things that are pointless and uninformative; and that what they say reflects what they are doing more generally. This is all part of what linguists call the pragmatics of conversation. But there is nothing in SHRDLU’s programming that even attempts to do justice to pragmatics.
But to criticize SHRDLU for neglecting pragmatics, or for steering clear of complex linguistic constructions such as counterfactuals (statements about what would have happened, had things been different) is to miss what is genuinely pathbreaking about it. SHRDLU illustrates a view of linguistic understanding as resulting from the interaction of many independently specifiable cognitive processes. Each cognitive process does a particular job – the job of identifying noun phrases, for example. We make sense of the complex process of understanding a sentence by seeing how it is performed by the interaction of many simpler processes (or procedures). These cognitive processes are themselves understood algorithmically (although this is not something that Winograd himself stresses). They involve processing inputs according to rules. Winograd’s procedures are sets of instructions that can be followed mechanically, just as in the classical model of computation (see section 1.2 above).
Figure 2.5 SHRDLU acting on the initial command to pick up a big red block. See the dialog in the text for what led up to this. (Adapted from Winograd 1972: 8)
2.2 How do mental images represent?
One way to try to understand a complex cognitive ability is to try to build a machine that has that ability (or at least some primitive form of it). The program that the machine runs is a model of the ability. Often the ability being modeled is a very primitive and simplified form of the ability that we are trying to understand. This is the case with SHRDLU, which was intended to model only a very basic form of linguistic understanding. But even in cases like that, we can still learn much about the basic principles of cognitive information processing by looking to see how well the model works. This is why the history of cognitive science has been closely bound up with the history of artificial intelligence.
Figure 2.6 Instruction 3 in the SHRDLU dialog: “Find a block which is taller than the one you are holding and put it in the box.” (Adapted from Winograd 1972: fig. 3)
We can think of artificial intelligence, or at least some parts of it, as a form of experimentation. Particular ideas about how the mind works are written into programs and then we “test” those ideas by seeing how well the programs work. But artificial intelligence is not the only way of developing and testing hypotheses open to cognitive scientists. Cognitive scientists have also learnt a great deal from the more direct forms of experiment carried out by cognitive psychologists. As we saw in the previous chapter, the emergence of cognitive psychology as a serious alternative to behaviorism in psychology was one of the key elements in the emergence of cognitive science. A good example of how cognitive psychology can serve both as an inspiration and as a tool for cognitive science came with what has come to be known as the imagery debate.
The imagery debate began in the early 1970s, inspired by a thought-provoking set of experiments on mental rotation carried out by the psychologist Roger Shepard in collaboration with Jacqueline Metzler, Lynn Cooper, and other scientists. This was one of the first occasions when cognitive scientists got seriously to grips with the nature and format of mental representation – a theme that has dominated cognitive science ever since. The initial experiments (and many of the follow-up experiments) are rightly recognized as classics of cognitive psychology. From the perspective of cognitive science, however, what is most interesting about them is the theorizing to which they gave rise about the format in which information is stored and the way in which it is processed.