Sinclair, Corpus, Concordance, Collocation


Evaluating instances

Introduction

For the last four chapters, we have been studying concordances in one form or another. Each instance has been taken to be as important as any other, and has had to be accounted for. This is a valuable discipline, but only the very first step towards the automation of text study. In this chapter and the next, we begin to evaluate concordances and devise new kinds of information about language. The starting point of this chapter is probably unexpected: it is that most actual examples are unrepresentative of the pattern of the word or phrase for which they are chosen.

Such is the intricate nature of the ties between one segment of text and the surrounding text, and the relation between the text and the world and the intended outcomes of the communication, that the act of plucking a few words from any text is not likely to provide a freestanding instance of its constituent words, each acting typically. The vast majority can be safely discarded when their statistical contribution to the concordance as a whole has been recorded. We need a lot of text so that there will always be a sufficient residue of useful examples, and also to provide criteria for discarding the others in the first place.

Throw away your evidence

The policy of discarding examples, and particularly examples which do not fit a description, is likely to have to struggle for popularity in linguistics. The Cult of the Counter-example is still very strong, in myth if not always in observance, and it is important for students of text to define a careful position in this regard, which will be quite different from that of students of sentences. For example, the computer corpora of the early sixties, Brown (Kučera and Francis 1967) and LOB (Hofland and Johansson 1982), represent a transitional stage; they

were most carefully constructed in an attempt to be representative, and each instance is cherished. The corpora are just small enough (one million words) for this to be possible for determined [...].

The received wisdom of corpus linguistics is that fairly small corpora, of one million words or even fewer, are adequate for grammatical purposes, since the frequency of occurrence of so many grammatical or function words is quite high. In the LOB corpus, for example (one million words of English printed in the UK in the year 1961), the commonest 100 words are almost all grammatical, and range in frequency from the at 68,315 to people at 953 (the 'lexical' words in that range are said 2,074, time 1,654, man 1,072, years 1,067). There are a few stragglers, like shall 348, itself 272, nor 200, else [...], down to whatsoever 7, and whichsoever 1; but there are numerous instances of grammatical words, sufficient to enable conventional grammatical statements to be made.

However, the availability nowadays of much larger corpora makes it possible to evaluate conventional grammatical statements. Presumably, the shorter original corpora did little more than confirm the generally agreed positions on English grammar. The new evidence suggests that grammatical generalizations do not rest on a rigid foundation, but are the accumulation of the patterns of hundreds of individual words and phrases. The language looks different when you look at a lot of it at once.

Such evidence has not been available before. Linguists have had to rely on their intuitions, their limited capacity for thorough textual analysis, and whatever has caught their eye or ear as they have encountered large extents of language behaviour, in their daily lives or in their professional work. The great dictionaries of English used human beings to [...] their examples, and there is as yet no substitute. This method is likely to highlight the unusual in English and perhaps miss some of the regular, humdrum patterns.

In grammars, the tradition of citing examples has faded away since A Modern English Grammar (Jespersen 1909; 1949). Even A Comprehensive Grammar of the English Language (CGEL) (Quirk et al. 1985) relies heavily on invented examples. CGEL had the corpus of the Survey of English Usage available to it. This is a corpus approaching one million words, spanning twenty-five years and including a substantial proportion of spoken English.

Occasional reference is given in the grammar to Survey material, but no attempt is made to confront and account for the evidence. Hardly any of the examples are citations, though citations must have been readily available. One is forced to conclude that the authors were following a methodology which gave low priority to one of the concerns of this book, which is to press for the use of actual language data as a basis for all descriptive statements.

A valid generalization about data must relate to the data in a systematic way; each relevant instance must either support the generalization or exhibit features which make the generalization subordinate to some other descriptive statement. Hence, it is important to fix on a particular body of data (which is best chosen on non-linguistic principles), and then engage with every instance. If that procedure is adopted in language work, it soon becomes necessary to acquire very large quantities of data, or else generalizations cannot be made. Language is very complex, and people use it for their own ends, without normally being conscious of the relation between their verbal behaviour and the way that behaviour is characterized. They are creative, or expedient, or casual, or confused; or they have unusual matters to put into usual words, so they have to combine them in unusual ways.

It is, therefore, necessary to have access to a large corpus because the normal use of language is highly specific, and good representative examples are hard to find. This is as true of grammar as of lexis, because grammar is not made of just the patterns of the common grammatical words, but relies on the whole vocabulary of the language.

One further factor makes it essential to collect a large number of instances. Many words have more than one meaning, sense, or usage, and these occur in very uneven distribution. As far as I know, no systematic research has yet been done in this area, so the following remarks are speculations based on observation and occasional probings.

Frequent words have, in general, a more complex set of senses than infrequent words. If we divide and number senses in the conventional dictionary manner, we may discover a statistical relationship between the number of occurrences of a word and the number of different senses it realizes. Hence, the accumulation of instances of a frequent word is not just more of the same, but ever more clear evidence of complexity.

In addition to this, we must allow that, just as some words are much more common than others, some senses of one word are much more common than other senses of the same word [...] common. So if we need, say, fifty occurrences of a sense in order to describe it thoroughly, then the corpus has to be large enough [...] fifty instances of the least common sense. In practice, [...] the decision about the 'least common sense' is an art[...]; no matter what the size, there are always loose ends, [...] odd examples, etc. But wherever this limit is fixed, we shall find huge discrepancies in the frequency of the recognized senses, and this will produce a heavy demand for very long texts.
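The arithmetic behind this demand for long texts can be sketched roughly as follows. This is only an illustration under simple assumptions: occurrences are taken to be evenly spread through the corpus, and the share of a word's occurrences that realize a given sense is taken to be known in advance. The function name `required_corpus_size` and the 2 per cent sense share are hypothetical; the rate for second (a little over 1,200 occurrences in 7.3 million words) is the figure reported in the Findings section of this chapter.

```python
import math


def required_corpus_size(needed_instances, word_rate, sense_share):
    """Estimate how many running words are needed to expect
    `needed_instances` occurrences of one sense of a word.

    word_rate   -- occurrences of the word per running word of text
    sense_share -- assumed fraction of those occurrences realizing the sense
    """
    if needed_instances <= 0:
        raise ValueError("needed_instances must be positive")
    if not 0 < word_rate <= 1 or not 0 < sense_share <= 1:
        raise ValueError("word_rate and sense_share must lie in (0, 1]")
    # Expected instances of the sense per running word of corpus text.
    expected_per_word = word_rate * sense_share
    return math.ceil(needed_instances / expected_per_word)


# Rate for 'second' as reported in the chapter; the sense share is invented.
second_rate = 1200 / 7_300_000
print(required_corpus_size(50, second_rate, 0.02))
```

Even for a fairly frequent word, a sense that accounts for only a small share of its occurrences pushes the required corpus well beyond the one-million-word scale of the early corpora.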

Text and language

The distinction has often been made between text and language on a dimension of abstraction. Language is an abstract system; it is realized in text, which is a collection of instances. This is clearly an inadequate point of view, because we do not end up with anything like text by 'generating' word strings from grammars. In particular, there is hardly any allowance for the combinatorial meanings in text. If text (including, and in particular, spoken text) is not a strict realization of meaningful abstract decisions, then either it is subject to random distortion, or it is in part the result of decisions which are not recorded in the abstract system, but which take precedence over those which are. Many of these are components of the rather mystical notion of 'coherence' that is beyond the generative competence of grammars. Random factors will certainly not explain how coherence arises; so we are forced to conclude that the realization route is not through conventional grammars, but through some kind of functional analysis.

Actual text will always be deviant with respect to structural rules of the conventional kind. Some of the factors that lead to deviance have already been mentioned: creativity, [...], expediency, inattention, confusion, and the need to express the unusual. Another major factor is shared knowledge among communicators, which leads to the actual occurrence of many utterances which are proscribed by rule (for example, an obligatory transitive verb occurring without an object where the real-world thing that could give rise to an expressed object is obvious).

The grammarian's dilemma is this: does he or she study actual instances, knowing that most of them are untypical, or does he or she ignore actual instances and study a set of instances which have not occurred, including a high proportion that could never occur? The former choice characterized the field linguists in the first half of this century, and the latter choice has been evident in the linguistics of the last thirty years.

The new option opened up by the computer is to evaluate actual instances and select the most typical. A complete set of typical instances should exemplify the dominant structural patterns of the language without recourse to abstraction, or indeed to generalization. The mass [...] each contain just a small element of typicality, but a few contain several typical features. In such circumstances, although it may sound paradoxical, examples which are typical are rather uncommon, and have to be found by statistical methods.

It is, therefore, unnecessary to make a sharp distinction between abstract and actual language structure, the sort of distinction embodied in Saussure's langue and parole or Chomsky's competence and performance. The existence of these dichotomies is to allow us to abstract from the chaos of life a system of meaningful choices and to insulate the abstract system. I have already conceded that some proportion of the complexity of text may be attributable to accidental or random factors, but that is far from sufficient explanation. It may indeed have obscured what actually goes on. In fact, the main simplification that is introduced by conventional grammar has nothing to do with the purity of abstraction as against the chaos of life. It is merely the decoupling of lexis and syntax.

In the explicit theoretical statement of linguistics, grammatical and lexical patterns vary independently of each other. In most grammars, it is an assumption that is obviously taken for granted. For example, it is rare for a grammar to note that a certain structure is only appropriate for a particular sense of a word. The same goes for morphology. In contrast, grammars attribute independent meaning to syntactic arrangements. Equally, it is rare for a dictionary to note the common syntactic patterns of a word in a particular sense. Pedagogical dictionaries are increasingly seeing this as essential information for learners, but it [...] added in the form of afterthoughts such as usage notes. The implicit stance of a conventional dictionary entry is that most of the words in daily use have several meanings, and any occurrence of the word could signal any one of the meanings. If this were actually the case, communication would be virtually impossible.

The decoupling of lexis and syntax leads to the creation of a rubbish dump that is called 'idiom', 'phraseology', 'collocation', and the like. If two systems are held to vary independently of each other, then any instances of one constraining the other will be consigned to a limbo for odd features, occasional observations, usage notes, etc. But if evidence accumulates to suggest that a substantial proportion of the language description is of this mixed nature, then the original decoupling must be called into question. The evidence now becoming available casts grave doubts on the wisdom of postulating separate domains of lexis and syntax.

In modern lexical research, it is part of the long-term task to specify accurately the established phrases of a language. A phrase can be defined for the moment as a co-occurrence of words which creates a sense that is not the simple combination of the sense of each of the words. One is first struck by the fixity and regularity of phrases, then by their flexibility and variability, then by the characteristically creative extensions and adaptations which occur, sometimes more often than the 'ordinary' form.

In this work, it is much more fruitful to start by supposing that lexical and syntactic choices correlate, rather than that they vary independently of each other.

Meaning and structure

For the remainder of this chapter (and Chapter 4), I should like to widen the domain of syntax to include lexical structure as well, and call the broader domain structure. In the spirit of the preceding argument, I shall define structure as any privileges of occurrence of morphemes; we do not in the first analysis have to decide whether these are lexical or syntactic, often a bit of both.

Is it then best to hypothesize that sense and structure are inseparable? Unfortunately not. If that were so, ambiguity would be impossible. More than one sense can be realized by the same structure, and, in the simplest case, by the same word.

We must, then, consider whether ambiguity is incidental or systematic. If it is much more than incidental and [...], it will constitute strong evidence for the independence of lexis and syntax. However, although ambiguity causes great headaches in automatic parsing, if we look at the way people actually operate with language we see it as a sporadic and almost accidental coincidence of [...], rarely constituting a communicative problem (as in [...]). [...] would threaten the basic notion of realization in language, that structure realizes sense and therefore normally differentiates one sense from another.

If sense and structure are not independent of each other and not inseparable, then they must be associated. Here we can frame a hypothesis that can act as a substitute for the langue/parole distinction. We can postulate that the underlying unit of composition is an integrated sense-structure complex, but that the exigencies of text frequently obscure this. This position offers a sharp contrast to the atomistic model featured by most grammars, and the argument is developed in the next chapter.

Our descriptive task then becomes the identification of the regular and typical associations, leading to the identification of one or more 'citation forms' for each distinct sense. The distinguishing features of the citation forms could then be stated, and explanations could be offered for the occurrence of non-citation forms. A citation form would involve a modest step in abstraction. It is also likely that many citation forms contain some systematic variables, such as pronoun selections, which leaves a modicum of independence to the grammar.

Procedure

How, then, do we find the citation forms, especially since we believe text to be largely composed of non-citation forms? I propose to outline a method for tackling one area of structure, in this case collocation, which gives promise of valuable results. The same principle can be applied to other structural features.

The procedure begins with a machine-generated concordance to a large corpus, as we have used in previous studies in this book. The usual kind of concordance is adequate, where all the occurrences of a word-form are retrieved, each in the middle of a line of text. A line of text may contain as many as eight or nine words on either side of the central word, or node, and we do not expect to need more than four or five on either side.

A concordance has many of the properties of a natural text, and it is reasonable for the purposes of statistical analysis to treat each cited line as if it were a sentence, and so to examine the vocabulary of the concordance. In order to do this, a list is compiled in frequency order, of all the word-forms in the concordance. These are called the collocates of the node. This raw list is then processed as follows:

a. The lines are trimmed so that only those words that are reasonably likely to be attracted by the node are left in. It was strongly suggested by Sinclair, Jones, and Daley (1970) that beyond four words from the node there were no statistical indications of the attractive power of the node. At present we are experimenting with between one and five words on either side of the node, both balanced and unbalanced, [...].

b. There is no point in considering very infrequent collocates, and there is usually a long tail to the frequency lists. A suitable cut-off point (for example, less than ten per cent of the frequency of the node) should be determined.

c. Each of the remaining collocates is given a weighting by relating its frequency in the concordance to its overall frequency in the full corpus. So a common word gets a low rating, and a word which makes a distinctive collocation with the node will score high.

d. Each line of the concordance is now examined for the typicality of its collocates, by adding up the weightings of each collocate in the environment. The concordance is now re-sorted into an order [...], and the most typical instances should come to the [...].

From this point onward, an automatic procedure is not yet fully established, and the study continues largely on a subjective basis. First, any obvious phrases are identified and removed. Next, there is a search for the clustering of collocates and their mutual attraction and repulsion, for example, pairs and groups of collocates which frequently occur in the same line, and others which never do. The other aspects of structural patterning are brought in: the occurrence of the very frequent words, ordering of items in the line, and syntactic structure.

If it is suspected that there are two or more principal senses of a word, an attempt is made to isolate a sense, using explicit criteria. When a sense is fully described, all the lines that exemplify it are then removed, and the new, shorter concordance is reprocessed from the beginning.

Gradually, this procedure should identify the distinct senses of a word. Each cycle will, however, reduce the size of the remaining concordance substantially, and the overall size of the corpus will quickly become a limiting factor.

Findings

This technique, in manual form, was recently applied to the concordance of the word second. The word was chosen as being fairly frequent (over 1,200 occurrences in 7.3 million), and as having two rather distinctive major senses. It was found that the first pass identified the Second World War as a phrase which had 14 occurrences in the 50 most typical. The next pass, omitting the 14 phrases, identified a major sense which was strongly associated with preceding the, occasionally his or her, and with words like first, third, time, year, act, child, and wife in the environment. The next pass identified a sense which was strongly associated with preceding per, and before that a word like cycles, radians. A number of similar instances had a instead of per, but a is also occasionally used in the other main sense.

There was little else except a hint of possible phrases second hand and second class.

The two main meanings of second, then, are associated one with definiteness and the other with indefiniteness. This is at least as important as the observation that one is a modifier and one a noun.

A closer look at the full concordance confirms these findings. There is, however, a third fairly prominent use of second which does not emerge in the collocational analysis. This requires neither a definite nor an indefinite determiner, and the word functions as a discourse organizer. It is quite often preceded by and. It is not surprising that this use does not attract strong lexical collocations, because it occurs according to the exigencies of the discourse and should be largely independent of such things as content, topic, message.

These findings are crude, preliminary, and partial. No doubt a study of secondly would identify the third sense of second as a discourse organizer to be absorbed into the lemma second(ly). The study of seconds might add new features and new uses, and so on. In due course, we shall see. The technique has at least managed to isolate the most basic contrast of meaning.

In another trial at an international conference in [...], the [...] system successfully distinguished among sole = bottom of shoes or feet, [...] sole = on[...].
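The collocate-weighting procedure described in steps a to d of the Procedure section can be sketched in a few lines. This is an illustrative reading, not the system used in the study: the text says only that a collocate's frequency in the concordance is set against its frequency in the full corpus, so the simple ratio used here is an assumption, as are the function names, the default span of four words, and the ten per cent cut-off (which the text offers only as an example).

```python
from collections import Counter


def concordance(tokens, node, span=4):
    """Step a: collect up to `span` words on either side of each node."""
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            lines.append((left, right))
    return lines


def collocate_weights(lines, corpus_freq, node_freq, cutoff=0.10):
    """Steps b and c: drop infrequent collocates, then weight the rest."""
    in_concordance = Counter(w for left, right in lines for w in left + right)
    weights = {}
    for word, freq in in_concordance.items():
        if freq < cutoff * node_freq:
            continue  # step b: ignore the long tail of rare collocates
        # Step c (assumed formula): concordance frequency relative to
        # overall corpus frequency, so very common words rate low.
        weights[word] = freq / corpus_freq[word]
    return weights


def rank_by_typicality(lines, weights):
    """Step d: score each line by summing its collocate weights."""
    scored = [
        (sum(weights.get(w, 0.0) for w in left + right), left, right)
        for left, right in lines
    ]
    # Most typical lines first; ties keep their original order.
    return sorted(scored, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    text = ("the second world war began . one cycle per second . "
            "the second act . ten radians per second . the second child .")
    tokens = text.split()
    lines = concordance(tokens, "second", span=1)
    weights = collocate_weights(lines, Counter(tokens), node_freq=len(lines),
                                cutoff=0.3)
    for score, left, right in rank_by_typicality(lines, weights):
        print(round(score, 2), " ".join(left), "[second]", " ".join(right))
```

The later cycle, in which the lines exemplifying an isolated sense are removed and the shorter concordance is reprocessed, would amount to filtering `lines` and calling the last two functions again; deciding which lines belong to a sense remains the subjective part.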

Conclusion

The conclusion that can be drawn from this and other [...] is that it is folly to decouple lexis and syntax, or either of those and semantics. The realization of meaning is much more explicit than is suggested by abstract grammars. The model of a highly generalized formal syntax, with slots into which fall neat lists of words, is suitable only in rare uses and specialized texts. By far the majority of text is made of the occurrence of common words in common patterns, or in slight variants of those common patterns. Most everyday words do not have an independent meaning, or meanings, but are components of a rich repertoire of multi-word patterns that make up text. This is totally obscured by the procedures of conventional grammar.

The next chapter takes up this argument in detail. The notion of citation forms is developed in a separate [...] (Sinclair 1984).

Introduction

This chapter concludes the description of word co-occurrence as we currently conceive it. The next stage is to write a dictionary of collocations, and the project is in hand (Sinclair et al. forthcoming).

The argument brings together a number of themes that have been developing throughout the book, in particular, the notions of dependent and independent meaning, and the relation of texts to grammar.

Two models of interpretation

It is contended here that in order to explain the way in which meaning arises from language text, we have to advance two different principles of interpretation. One is not enough. No single principle has been advanced which accounts for the evidence in a satisfactory way. The two principles are:

The open-choice principle

This is a way of seeing language text as the result of a very large number of complex choices. At each point where a unit is completed (a word or a phrase or a clause), a large range of choice opens up and the only restraint is grammaticalness.

This is probably the normal way of seeing and describing language. It is often called a 'slot-and-filler' model, envisaging texts as a series of slots which have to be filled from a lexicon which satisfies local restraints. At each slot, virtually any word can occur. Since language is believed to operate simultaneously on several levels, there is a very complex pattern of choices in progress at any moment, but the underlying principle is simple enough.

Any segmental approach to description which deals with progressive choices is of this type. Any tree structure shows it clearly: the nodes on the tree are the choice points. Virtually all grammars are constructed on the open-choice principle.
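A toy version of the slot-and-filler model may make the open-choice principle concrete. Everything in this sketch is hypothetical: the word classes, the six-slot frame, and the tiny lexicon are invented for illustration, and the only restraint applied at each slot is word class, standing in for grammaticalness.

```python
import random

# Hypothetical lexicon: each slot type lists the words allowed to fill it.
LEXICON = {
    "DET": ["the", "a"],
    "ADJ": ["hard", "second", "sole", "natural"],
    "NOUN": ["evidence", "course", "corpus", "fire", "luck"],
    "VERB": ["pursues", "sets", "attracts", "realizes"],
}

# Hypothetical clause frame: a sequence of slots to be filled in order.
FRAME = ["DET", "ADJ", "NOUN", "VERB", "DET", "NOUN"]


def open_choice_sentence(frame, lexicon, rng):
    """Fill every slot independently; earlier choices never restrain later ones."""
    return " ".join(rng.choice(lexicon[slot]) for slot in frame)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(open_choice_sentence(FRAME, LEXICON, rng))
```

Every string produced this way is grammatical by construction, yet nothing in the model favours word combinations that actually occur in text over those that do not.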


(7)

structed in of course is not the preposition of that is found in gram! S.

The preposition of is normally found after the noun head o a1 group, or in a quantifier like a pint of

... .

In an open-choic 3f can be followed bv anv nominal group (see Chapter 6 for details).

Similarly, c untable noun that dictionaries mention;

its meanin: 3f the word, but of the phrase. If it were a countablF llv,,, ,,I the singular it would have to be preceded by a determiner t o be grammatical, so it clearly is not.

It would be reasonable to add phrases like of course t o the list of compounds, like cupboard, whose elements have lost their semantic identity, and make allowance for the intrusive word space. The same treatment could be given to hundreds of similar phrases-any occasion I where one decision leads to more than one word in text. Idioms,

I

proverbs, clichks, technical terms, jargon expressions, phrasal verbs, and the like could a red by a fairly simple statement.

However, the pri idiom is far more pervasive and elusive than we have allovc

.

It has been noted by many writers o n language, but its importance nas been largely neglected. Some features of the idiom principle follow:

the t: on tt ree are thc ie open-cl e choice

F

hoice prin mar book f a nomin e model, ( .

.

.. ~oints. Virtually all grammars are con:

ciple.

zourse is I

g is not a

I, en.... ;*

lot the col property (

. .

clear that words dc [-choice principle (

nts on consecutive cnolces. w e wvulu not prouuce normal text ly by operating the open-choice principle.

) some extent, the nature of the world around us :d in the nization of language and contributes to the unrandomness. Things

which occur physicallj chance of being

mentioned together; als~ ;ophical area, and

the results of exercisin features such as

contrasts or series. But even allowlng for these, there are many ways of saying things, many choices withi ;e that have little or nothing t o do with the world outside.

There are sets of linguistic choi come under the heading of ;ister, and which can be seen as large-scale conditioning choices.

ice a register choice is made, a: choices,

:n all the slot-by-slot choices ar or even,

some cases, pre-empted.

Allowing for register as well, there is still f, ch opportunity for 1 choice in the model, and the principle of idio orward to account : for the restraints that are not captured by th oice model.

"le principle of idiom is that a language user nas available t o him -r a large number of semi-preconstructed phrases that constitute e choices, even though they might appear to be analysable into lents. T o some extent, this may reflect the recurrence of similar ltions in human affairs; it may illust tural tendency t o .omy of effort; or it may be motivated the exigencies of time conversation. However it arises, :n relegated to an ior position in most current linguistics, v ~ ~ c x u a i it does not fit the

I-choice model. : its simplest, the p~ ~ltaneous choice of I ,ates effectively as :turally bogus, may another.

..'here there is no v a r i a r ~ o ~ ~ 111 me unrasr. we are ueallrlr wlrn a ralrlv

trivi. that the u g h re- ..- 1 3 not o c c ~ joes not - I . - : - . .

.

provide i I",. srrali simp Tc orga r togethe] o concept ~g a num r have a s i n thesa ber of 01 . , - a stronger me philos rganizing 11 be cove nciple of red so far. n languap ces which nd these a e massive Ire norma ly reducec lly social J in scope

a. Many phrases have an indeterminate extent. As a n example, consider set eyes seems to attract a pronoun subject, and either never or a

1

conjunction like the moment, the first time, and the wc an auxiliary t o set. H o w much of this is integral t o the yluaaL, and how much is in the nature of

collocational attraction?

b. Many phrases allow internal lexical variation. For example, there seems t o be little t o choose bemeen in some cases and in some in-

stances gn fire and set fire to x .

on. This I tempora )rd has as ,LC-'.#., ar too mu1 Im is put fi le open-ch .~ 1.. situa econ real- ;..$-. rate a nal in part b) it has bet c h,,n..c.3 een set x ( ow intern; 1";- -"I.,""

a1 lexical syntactic variation. ( he

i r 5 rrw I ~ L v ~ J

, I U ~ ~ , ~

to

...

.

The word it is part of the phrase, and -though this verb can vary to was and perhaps can

Jot can be replaced by any 'broad' negative, including

,-, ,,.,, dtc. In is fixed, but hiscan be replaced by any possessive )me names with 's. Nature is

c. Many I - L A - - - )hrases all :z9- - - - L

;.-

, 1 1 1 1 L L oper A1 simc pnrase so is th include hardly, e verb is- ! modals.

P

I.,",mlN 0 .inciple ol w o word F idiom ca S, for exal n be seen nple, . . of cc in the a p 3urse. Thi parently s phrase . . oper a single \. disappea vord, anc r in time, : I the war' 3s we see i~ d space, n maybe, I which is gnyway, pronot d. Many .

.

m and per phrases a! fixed. er. Contir is not in i strw and 'VCI

IIOW some some variation in word ordl

ing the last example, we can postulate to recriminate

:n thc wri ting syste ' grammai " o t in the ; nature of

11 1 m and the

(8)

Collocatic us, Conco Iany uses ,110cation ~idence. of word I; for exal s and ph mple, har rases attl d work, r -act other bard luck words il , hard fa( n strong :ts, hard between t frequent c .

-.

he sense t me.

.o which our intuitions give priority, a .nd the ml

4 I.he commonest mean~ngs ot many less common words are not tnose d by introspection. Sense 1 offered in the CED 1 is

)w (a fugitive etc.) in order to capture or overtak far nmonest meaning is sense 5, 'to apply onesc ~e's stuales, hobbies, interests etc.)'.

Lany uses ot words and phrases show a tendency to co-occur with main grammatical choices. For example, it was pointed out in hapter 5 that the phrasal verb set about, in its meaning of some- king like 'inaugurate', is closelv associated with a following verb in le -ing form, for example, sei

:cond verb is usually transiti ery often, set will be found I

Many uses of words and phrases show a tendency to occur in a certain semantic environment. For example, the verb happen is associated with unpleasant things, accidents and the like.

i

I

supplie 'to follc the col , . for pursuc e', yet by

:If

to (on t ;?bout /el ve, for ex in co-occi wing

... .

:ample, sc Jrrence p; What is n !t about tc atterns. lore, the

?sting it. From this we can put forward some tentative generalizations:

1 There is a broad general tendency for frequent words, or frequent senses of words, to have less of a clear and independent meaning than less frequent words or senses. These meanings of frequent words are difficult to identify and explain; and, with the very frequent words, we are reduced to talking about uses rather than meanings. The tendency can be seen as a progressive delexicalization, or reduction of the distinctive contribution made by that word to the meaning.

s and phr vironmen leasant th .ases s h o ~ t. For ex ings-act

The overwhelming nature of this evidence leads us to elevate the principle of idiom from being a rather minor feature, compared with grammar, to being at least as important as grammar in the explanation of how meaning arises in text. Support comes unexpectedly from a different quarter.

ence fron

L 1

2 This dependency of meaning correlates with the operation of the idiom principle to make fewer and larger choices. The evidence of collocation supports the point. If the words collocate significantly, then to the extent of that significance, their presence is the result of a single choice.

Evidence from long texts

In current lexical analysis of long texts, a number of problems have arisen, not all of which were anticipated:

1 The 'meanings' of very frequent, so-called grammatical words are a headache in ..., but the problem they typify fits in with some of the newer difficulties.

3 The 'core' meaning of a word, the one that first comes to mind for most people, will not normally be a delexical one. A likely hypothesis is that the 'core' meaning is the most frequent independent sense. This hypothesis would have to be extensively tested, but if it proved to hold it would help to explain the discrepancy referred to above between the most frequent sense and what intuition suggests is the most important or central one.

2 Some 'meanings' of frequent words seem to have very little meaning at all, for example, take, in take a look at this; make, in make up your mind.

3 The commonest meanings of the commonest words are not the meanings supplied by introspection; for example, the meaning of back as 'the posterior part of the human body, extending from the neck to the pelvis' (Collins English Dictionary (CED) 2nd edition 1986 sense 1) is not a very common meaning. Not until sense 47, the second adverbial sense, do we come to 'in, to or towards the original starting point, place or condition', which is closer to the commonest usage in our evidence.

4 Most normal text is made up of the occurrence of frequent words, and the frequent senses of less frequent words. Hence, normal text is largely delexicalized, and appears to be formed by exercise of the idiom principle, with occasional switching to the open-choice principle.

5 Just as it is misleading and unrevealing to subject of course to grammatical analysis, it is also unhelpful to attempt to analyse grammatically any portion of text which appears to be constructed on the idiom principle.

o f :

link most speakers of

senses, whatever the eviuence rrom rrequency. wnat 1s alsquletlng

is the apparer

'

good re;

vould agrt c ~ - ~ - - c ~ ee with th

-

e CED's c r,

.

I . "-- - on the : idiom pr , L inciple. ason for 112


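The last point contains an implication that a description must indicate how users know which way to interpret each portion of an utterance. The boundaries between stretches constructed on different principles will not normally be clear-cut, and not all stretches carry as much evidence as of course does to suggest that it is not constructed by the normal rules of grammar.

It should be recognized that the two models of language that are in use are incompatible with each other. There is no shading of one into another; the switch from one model to the other will be sharp. The models are diametrically opposed.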

The last two points taken together suggest one reason why language text in use is often indeterminate in its interpretation and hence very flexible. If the 'switch points' between two modes of interpretation are not always explicitly signalled, and the two modes offer sharply contrasting ways of interpreting the data, then it is quite likely that an utterance will not be interpreted in exactly the same way in which it was constructed. Also, two listeners, or two readers, will not interpret in precisely the same way.

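For normal texts, we can put forward the proposal that the first mode to be applied is the idiom principle, since most of the text will be interpretable by this principle. Whenever there is good reason, the interpretive process switches to the open-choice principle, and quickly back again. Lexical choices which are unexpected in their environment will presumably occasion a switch; choices which, if grammatically interpreted, would be unusual are an affirmation of the operation of the idiom principle.

Some texts may be composed in a tradition which makes greater than normal use of the open-choice principle; legal statements, for example. Some poems may contrast the two principles of interpretation. But these are specialized genres that require additional practice in understanding.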

It thus appears that a model of language which divides grammar and lexis, and which uses the grammar to provide a string of lexical choice points, is a secondary model. It cannot be relinquished, because a text still has many switch points where the open-choice model will come into play. It has an abstract relevance, in the sense that much of the text shows a potential for being analysed as the result of open choices, but the other principle, the idiom principle, dominates. The open-choice analysis could be imagined as an analytical process which goes on in principle all the time, but whose results are only intermittently called for.

This view of how the two principles are deployed in interpretation can be used to make predictions about the way people behave, and the accuracy of the predictions can be used as a measure of the accuracy of the model. Areas of relevant study include: the transitional probabilities of words; the prevalent notion of chunking (see Chapter 9); the occurrence of hesitations, etc., and the placement of boundaries; and the behaviour of subjects trying to guess the next word in a mystery text.

Collocation


The above is the framework within which I would like to consider the role of collocation. Collocation, as has been mentioned, illustrates the idiom principle. On some occasions, words appear to be chosen in pairs or groups and these are not necessarily adjacent.

One aspect of collocation has been of enduring interest. When two words of different frequencies collocate significantly, the collocation has a different value in the description of each of the two words. If word a is twice as frequent as word b, then each time they occur together is twice as important for b as it is for a. This is because that particular event accounts for twice the proportion of the occurrence of b than of a.

So when all the occurrences of a with b are counted up and evaluated, one figure is recorded in the profile of a, and another figure, double the size, is recorded in the profile of b.
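The asymmetry described here can be made concrete with a small calculation. The following is only a sketch: the function name and all the counts are invented for illustration and are not drawn from the corpus evidence.

```python
def collocation_shares(freq_a, freq_b, joint):
    """Return the share of each word's occurrences that is accounted for
    by its co-occurrences with the other word.

    Hypothetical helper for illustration; `joint` is the number of times
    the two words occur together.
    """
    if joint > min(freq_a, freq_b):
        raise ValueError("joint count cannot exceed either word's frequency")
    return joint / freq_a, joint / freq_b


# Invented figures: a is twice as frequent as b, and they co-occur 50 times.
share_a, share_b = collocation_shares(freq_a=2000, freq_b=1000, joint=50)
print(share_a, share_b)
```

With these figures the same 50 events make up 2.5 per cent of the occurrences of a but 5 per cent of the occurrences of b, which is the doubling recorded in the two profiles.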

By entering the same set of events twice, once as the collocation of a with b and again as the collocation of b with a, one incurs the strictures of Benson, Brainerd, and Greaves (1985) who say 'there are ... problems here: double counting of nodes and double counting of ... The parts now add up to considerably more than the whole, ... makes computation under any statistical model inaccurate'. In practice, the possibility of double entry allows us to highlight two different aspects of collocation.

I would like to consider separately the two types of collocation instanced above, using the term node for the word that is being studied, and the term collocate for any word that occurs in the specified environment of a node. Each successive word in a text is thus both node and collocate, though never at the same time.

When a is node and b is collocate, I shall call this downward collocation: collocation of a with a less frequent word b.

whose co-occurrence with back is infrequent and carries no conviction of general significance. Of the last category, the form anger only occurs in the title of the play Look Back in Anger.

The nouns and verbs listed below as collocating with back are representative only. Given the uncertainty at the limits of statistical significance, it could be more misleading to include doubtful contenders. Thus, while get, go, and bring are unlikely to be challenged, beach, box, and ... are much less convincing when the actual instances are examined.

The qualification for an instance being scrutinized is co-occurrence within four words of back, on either side, this being the cut-off point established some years ago (Jones and Sinclair 1974). No account is taken of syntax, punctuation, change of speaker, or anything other than the word-forms themselves.
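The qualification just stated amounts to a simple counting procedure, which can be sketched as follows. This is a minimal illustration, not the procedure actually used for the study: the function name, the whitespace tokenization, and the sample sentence are all assumptions made for the example.

```python
from collections import Counter


def collocates(tokens, node, span=4):
    """Count the word-forms found within `span` positions of each
    occurrence of `node`, on either side.

    Nothing but the word-forms is consulted: no syntax, no punctuation,
    no change of speaker. (Hypothetical helper, for illustration only.)
    """
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        left = tokens[max(0, i - span):i]      # up to `span` words before the node
        right = tokens[i + 1:i + 1 + span]     # up to `span` words after the node
        counts.update(left + right)
    return counts


# Invented sample text, split on whitespace.
sample = "he leaned back in his chair and then went back to the house"
result = collocates(sample.split(), "back")
print(result["chair"], result["his"], result["house"])
```

In the sample, chair falls within four words of both occurrences of back and is counted twice, while his is five words away from the second occurrence and is counted only once.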

No doubt the studies which succeed this one will sharpen up the picture considerably. For example, the evidence of back suggests that few intuitively interesting collocations cross a punctuation mark. But it would be unwise to generalize from the pattern of one word, particularly such an unusual one as back. Now that tagged and parsed texts are becoming available, the co-patterning of lexical and grammatical choices is open to research. But it is still important to draw attention to the strength of patterning which emerges from the rawest of unprocessed data.

In pushing forward into new kinds of observation of language, the computer is simultaneously pulling us back to some very basic facts that are often ignored in linguistics. The set of four choices, a, b, c, k, from the alphabet, arranged in the sequence b, a, c, k with nothing in between them, that is, back, is an important linguistic event in its own right, long before it is ascribed a word-class or a meaning. It is difficult for users of English to notice this, but it is the computer's starting point.

When b is node and a is collocate, I shall call this upward collocation. The whole of a given word list may be treated in this way.

There appears to be a systematic difference between upward and downward collocation. Upward collocation, of course, is the weaker pattern in statistical terms, and the words tend to be elements of grammatical frames, or superordinates. Downward collocation by contrast gives us a semantic analysis of a word.

No standard of statistical significance is claimed at present, because many typical collocations are of such low frequency compared with the overall length of a text. Because of the low frequency of the vast majority of words, almost any repeated collocation is a most unlikely event, but because the set of texts is so large, unlikely events of this kind may still be the result of chance factors.

However, no speaker of English would doubt the importance of these patterns. One recognizes them immediately, because they are features of the organization of texts; often subliminal, they cannot be reliably retrieved by introspection.

In distinguishing upward and downward collocation I have made a buffer area of (plus or minus) 15 per cent of the frequency of the node word. For example, let us take a word occurring 1,000 times; when it is examined as a node, collocates are grouped into:

upward collocates: those whose own occurrence is over 115 per cent of the node frequency (that is, 1,150);
neutral collocates: between 85 per cent and 115 per cent of the node frequency (in this instance, 850 and 1,150), this is the buffer area;
downward collocates: less than 85 per cent (in this case, 850).

Let us illustrate collocational patterns, in a provisional way, with the word back. I shall make no attempt to differentiate separate senses, but will put the collocates into ad hoc groups.
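The buffer rule for grouping collocates can be written out as a short function. This is a sketch only: the function name and the test frequencies are assumptions for the example, and integer arithmetic is used so that the 85 and 115 per cent boundaries are compared exactly.

```python
def classify_collocate(collocate_freq, node_freq, buffer_percent=15):
    """Label a collocate as upward, neutral, or downward relative to a node,
    using a buffer of plus or minus `buffer_percent` of the node frequency.

    Hypothetical helper for illustration. Frequencies are whole-number counts.
    """
    # Compare collocate_freq * 100 with node_freq * (100 +/- buffer) to stay in integers.
    if collocate_freq * 100 > node_freq * (100 + buffer_percent):
        return "upward"
    if collocate_freq * 100 < node_freq * (100 - buffer_percent):
        return "downward"
    return "neutral"


# A node occurring 1,000 times, as in the example in the text.
for freq in (1200, 1150, 1000, 850, 849):
    print(freq, classify_collocate(freq, 1000))
```

Collocates occurring exactly 850 or 1,150 times fall inside the buffer area and are treated as neutral; only those over 1,150 are upward and only those under 850 are downward.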

Analysis of the collocational pattern of back

Upward collocates: back

Prepositions/adverbs/conjunctions: at, (down), from, into, now, ..., then, to, up, when
Pronouns: her, him, me, she, them, we
Possessive pronouns: ..., my, (your)

Neutral collocates are added on an ad hoc basis to upward or downward groups, and are given round brackets. Since this has to be a summary account of a very large set of data, I have removed some items which seem to be of little general significance. These include personal names, contracted forms like I'll, and word-forms


The meaning of back as 'return' attracts expressions of time and place; after and where are also prominent. The presence of four subject pronouns may have a more general explanation than anything to do with back, but the absence of you and I from the list may be worth pursuing. Possessive pronouns suggest the anatomical sense of back and would explain why they and their do not figure prominently. The two verbs get and go are superordinates of a large number of verbs of motion, many of which will be found in the downward collocates.

I have selected a few examples of these words to show the way in which the basic syntax of back is established. The sets of examples follow the four categories mentioned above.

Nouns: camp, flat, garden, home, hotel, office, road, streets, village, yard
bed, chair, couch, door, sofa, wall, window, feet, forehead, hair, hand, head, neck, shoulder, car, seat
mind, sleep,
kitchen, living room, porch, room.

The word-class groupings above are based on frequency with back; many words actually occur in more than one word-class. Verbs are given in their most frequent form. Note the preponderance of past tense verbs, reflecting the temporal meaning of back.

The prepositions and adverbs suggest some typical phrases with back, and the nouns are largely those of direction, physical space, and human anatomy. A few typical examples follow:

motic I h whicl L - l l - - .eally was : drive bac

--..

-..-

-

g back at to the ten ,,l.,,I, 4 school race L,.*" n-..:. like bein :k d o w n 1 ---..+.. ? A .

Verbs:
You arrive back on the Thursday
May bring it back into fashion
We climbed back up on the stepladder
They had come back to England
She never cut back on flowers
It possibly dates back to the war
The bearer drew back in fear
We drove back to Cambridge
You can fall back on something definite
I flew back home in a light aircraft
He flung back the drapes joyously
Don't try to hold her back
She lay back in the darkness
He leaned back in his chair
He looked back at her, and their eyes met
Pay me back for all you took from me
Pulled back the bedclothes and climbed into bed
I pushed back my chair and made to rise
Shall I put it back in the box for you
I rolled back onto the grass
She sat back and crossed her legs
Edward was sent back to school
He shouted back
The girl stared back

'

walking back to Fifth Aven

w I I F I I VUL pal C I I L ~ L ~ I I I C uaLn 1 I u 1 r 6 I all:

~llowed him back into the u '

lefty slap o n the back

n turned back t o the booksht

have him

ck t o her nice t o h;

? went Pack to the Pungalov

Ten can I went ba ~ o u l d be back hot typing ave them 1 ne, docto back :y had cot never cut ossibly dh E has goni b want bat

. .

e back to :k into hi an back to m y cabi.. your dorr get back

L y uaik to the sallic u L a L

her pare1 s office n ) back to

:

Downward collocates: back

Verbs: arrive, bring, etc., climbed, come, etc., cut, dates, etc., drew, etc., drove, etc., fall, etc., flew, flung, handed, hold, etc., ...ked, lay, etc., leaned, etc., looked, looking, etc., pay, pulled, etc., pushed, etc., put, ran, rocking, rolled, rush, sank, sat, etc., sent, etc., shouted, snapped, stared, stepped, steps, etc., stood, threw, traced, turned, etc., walked, etc., waved.
Prepositions: along, behind, onto, past, toward, towards
Adverbs: again, forth, further, slowly, straight
Adjective: right


He stepped back and said ...
He then stood back for a minute
The woman threw her head back
These could be traced back to the early sixties
He turned back to the book...
She walked back to the bus ...
We waved back like anything
Hands held behind ...

Walked back towa: ;ater we came bacl Rock us gently bac If you look further The straight back t -4e went slowly bac

rhings wc

:

crawled . . ~ u l d soon back t o c . . ~y back tc ry back tc ,ven a bac

-

L,.J his back rd the h o ~ k a ~ a i n .. .. 0 - - - k and for back in n o his cabi , . . . :k to his I get back amp '11 drive you back to your flat Vot a bit like his back garden -Ie turned and went back home We had t o go back to the hotel You've just got back from the ofice Set back from the road

The back streets of Glasgow 911 the wa ) the village 3 n his WL; ) the apartment

Without e .k yard

30 back t~ " C U

i e leaned back in I

jtepping outside th

9 man standing by the back rom went back t o

3ritain would be b; l e brushed back h With the back of his nana

;he put her head back against the seat The hairs o n the back of my neck l e gestured back over his shoulder rhey got back into

is chair e back dc

. . .

the wind( ~ c k on its is hair r I th ny files n 3ook 21 the car 20 )or wall >w feet Collocation :re was so he back o :n we go

I

,me beer on the back seat tf his mind

lack to sleep again You must come back to the kitchen She went back into the living room Beside me here on the back porch He came back into the room

Conclusion

All the evidence points to an underlying rigidity of phraseology, despite a rich superficial variation. Hardly any collocates occur more than once in more than two patterns. The phraseology is frequently discriminatory in terms of sense; for example, there are almost as many instances of flat on her back as back to her flat. Some, like arrive, seem characteristic of the spoken language, some, like hotel, show the wisdom of allowing a nine-word span for collocation.

Early predictions of lexical structure were suitably cautious; there was no reason to believe that the patterns of lexis should map onto semantic structures. For one thing, lexis was syntagmatic and semantics was paradigmatic; for another, lexis was limited to evidence of physical co-occurrence, whereas semantics was intuitive and associative.

The early results given here are characteristic of present evidence; there is a great deal of overlap with semantics, and very little reason to posit an independent semantics for the purpose of text description.


Words about words

Introduction

In the final chapter, we look at the way in which people explain the meaning of words, especially in dictionaries. Although lexicography is a practical skill, a dictionary is a systematic description of a language. In turn, it must be assumed that any such description rests on the foundations of a theoretical position, whether articulated or not.

The argument in this chapter makes something of a contrast with that in earlier work (Sinclair 1984), where I make a case against the attempt to devise a theory of lexicography. At that time, lexicography seemed to me to be almost entirely a matter of managing a number of routine factors like resources and project aims. The relevant theory was linguistic theory, pure and simple. Expertise in computation, printing, book design, reference, and other skills was required from time to time, but this was not felt to be of a theoretical nature, even when, as in computational science, theory was readily available. Lexicography was held up by the practitioners to be a largely practical matter, and ... theories in the way.

However, in the later stages of compiling the Cobuild dictionary (Sinclair et al. 1987) it was decided to develop a new style of presenting lexicographical information. The process began in a straightforward attempt to explain the meaning and use of words in ordinary English sentences, and it ended in a radical critique of conventional lexicography. This exercise now appears to be the first step in articulating a theory of language reflexivity, the capacity of language to talk about itself. The importance of this capacity has not been properly recognized as yet, or even the extent of its occurrence in everyday usage. This chapter hopes to contribute to a better understanding of language about language.

The rationale is set out in Hanks (1987). Each entry for a word in the Cobuild dictionary begins with some formal matters like a listing of word-forms, and a guide to pronunciation. Then, paragraph by paragraph, the meanings and uses of the word ... explanation ... and is usually ... an example. To the side of the main text is an extra column with abbreviated notes on grammar and semantics.

In this chapter, I shall concentrate on the structure of explanations. The explanations lead to hypotheses about inference, metalanguage, and the general nature of lexical statement.

Here are some representative lexical statements from the Cobuild dictionary:

A house is a building in which people live ...
If you defeat someone, you win a victory over them in a contest such as a battle, game or argument.
A pure substance is not mixed with anything else.
If something happens often, it happens many times or much of the time.

Structure

These statements are divisible into two principal parts:

A house is a building in which people live ...
If you defeat someone you win a victory over them in a contest ...
A pure substance is not mixed with anything else.
If something happens often it happens many times or much of the time.

The first parts of each sentence break down further into two sub-parts, shown by the type-face changes. One or more words are in bold type, and the rest is in roman. The word or phrase in bold type is called the topic of the sentence, and the rest of the first part can be called the co-text. For example, in the second statement above, defeat is the topic, and you and someone constitute the co-text.

The second part of each sentence is an explanatory comment on the topic, and is called the comment. Comments are sometimes divisible according to the surface syntax. This is called chunking; in this kind of sentence, successive chunks express gradually increasing depth of detail.

There is another element of structure in each of the statements, and it can occur physically in either the first part or the second part. This element is an indication of the actual structure, and is called the operator. In the statements above, the operators are:

a. at the outset of the first part: if;
b. at the outset of the second part: is.

Table 1 shows the analysis so far:


Var

The1 nresc

iation i

n

co-te: Kt

'the first I: .rt

'topic' 'operator' 'comment'

a woman a s a COW

...

First Pa

e types of a r .

!art that a re quite d 'e are som

ented so f TOPIC

About the word itself

The statement may be about the word itself, and may not use the device of putting the word as topic in an appropriate context, for example:

You use naturally to indicate that you think something is very obvious.

Table 2: Report structures

The other examples can be described in similar ways. The terms topic, operator, and comment are re-used but in lower case and inside inverted commas to make it clear that they are embedded. Note that the comment at the lower level is the topic at the text level.

The Cobuild dictionary is sparing with explanations of this type, and only uses them when it would be misleading to ignore the subjective quality of the meaning. For example, it was implied above that 'smooth' and 'strong' were inherent qualities, but on closer inspection they are seen to be quite subjective. Something is smooth only if there is general agreement about that as a description of it. Some objective qualities of the object referred to will be relevant in deciding whether or not it can be called smooth. In contrast, if you consider something or someone 'smashing', that seems to be a very personal judgement.

.-3 .-..

In ordinary written English, the word naturally in the above example would be highlighted in some way, italics or inverted commas usually. Since it is a dictionary headword and in bold face, it is not further distinguished. However, this type of sentence is a different way of tackling the job of explanation. Other examples are:

Naturalistic describes people or things that ...
Meanwhile means while a particular thing is happening.
... means ... to rise and float in the air.

What people mean

The statement may be about what people mean when they use a word or phrase, rather than what the word or phrase means:

Structure: verb explanations

If you describe a woman as a cow ...
If you say that something gets up your nose ...
If you say that something is smashing, ...
If you refer to someone as ..., you ...

Animate subjects

I return to the normal type of entry, to consider further the structure of the co-text. The focus is on the explanation of verbs.

In nearly every entry there is reference to a person; the sort of person who will be using English. The neutral way of referring to this person is with the pronoun you, and in this sense it is used many times on each page of the Cobuild dictionary.

Occasionally, though, you is felt not to be appropriate. The implication of using you is that the sentence expresses something that anyone might reasonably and normally do, so when we are explaining things which are socially undesirable, the pronoun you may be replaced by someone, for example:

In these cases, it is important to point out that the opinion of the speaker is the crucial topic. Nothing is inherently 'smashing', in the way that it might be 'smooth' or 'strong'. The co-text includes a verb such as describe, say, refer, call, and the topic is found in a subordinate clause or a similar secondary structure. This strategy is part of the 'report' category in grammar, and typically a report contains a statement inside a sentence, in: 'If you describe a woman as a cow ...' you are reported as saying that the woman is a cow. That woman is a cow is close in structure to a house is a building, which is a structure already analysed. The new structure including report can be represented as follows in Table 2:

If someone burps, they make a noise ...
If someone totters, they walk in an unsteady way ...
If someone fling... prison, ...
