Definitions and classifications

Susanne Handl

2. Basic framework

2.1 Definitions and classifications

The classic and most basic definition of word co-occurrences is, of course, J. R. Firth’s (1957) “You shall know a word by the company it keeps”. Firth is generally considered the father of collocation. Many other definitions followed, but they can all be more or less assigned to four major categories.

The first group consists of text-oriented definitions like the one by Sinclair (1991), who sees collocation as “the occurrence of two or more words within a short space of each other in a text” (1991:170).5 Although this seems trivial at first glance, it is the basic tenet for recognising collocations, the one on which all classifications have to be based, since without text syntagmatic relations would not exist.
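In computational terms, a text-oriented definition of this kind amounts to counting the words that fall within a fixed span around a node word. The following Python sketch illustrates the idea. The function name `cooccurrences`, the default span of four tokens and the whitespace tokenisation are assumptions made for illustration; they are not taken from Sinclair (1991). The sample sentence is example (5a) from Section 2.2.

```python
from collections import Counter


def cooccurrences(tokens, node, span=4):
    """Count the words found within `span` tokens to the left or right of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        left = tokens[max(0, i - span):i]    # up to `span` tokens before the node
        right = tokens[i + 1:i + 1 + span]   # up to `span` tokens after the node
        counts.update(left + right)
    return counts


# Example (5a), lower-cased and split on whitespace (assumed tokenisation)
tokens = "they collect many things but chiefly stamps".split()
print(cooccurrences(tokens, "collect", span=5))
```

With a span of five, stamps is counted as a co-occurring word of collect; with a span of two it is not, which shows how strongly the result depends on what is taken to be a "short space".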

Other definitions emphasise the associative nature of collocation: Firth points out that it is an order of mutual expectancy (cf. Palmer 1968:181). There is a certain associative bond between two words that collocate. Aitchison (2003:91) assigns an even greater role to collocations, when she says that “[w]ord meaning is probably learned by noting the words which come alongside”. Sinclair (1991:109ff.) goes one step further in postulating the idiom principle of language, which holds that for a large part of text production we use semi-preconstructed phrases that we choose simultaneously when speaking or writing. His example here is of course, which is not the result of combining the words of and course, but the outcome of a single choice. Using the term ‘semi-’ in the explanation of this principle allows for a certain variation in the preconstruction of phrases. An expression like of course would be a fully preconstructed item, whereas classical collocations like hard + work/luck/facts are less fixed. This illustrates the fact that syntagmatic lexical relations are a gradable phenomenon of language. So, the idiom principle does not only apply to compounds that almost have the status of separate lexemes, or to idioms with their non-compositional meaning, but also to looser combinations of words that are simply activated together, such as:

 (4) to pay attention

  a clear conscience

  closely tied to.

A third type of definition is mainly statistically oriented. The question is whether the co-occurrence of two words only occurs by chance or whether it reappears with greater than random probability (cf. Halliday 1961; Sinclair 1966). This is the major definition used as a basis for all corpus linguistic studies of collocation, and it also plays an important role in my analysis.

The last group of definitions can be mainly seen as a counter-position to the statistical definition. It could be called the semantic type, since researchers like Hausmann (1979, 1984, 1985), Benson (1985), Benson et al. (1991) and Klotz (2000) try to put the relation between co-occurring words down to aspects of meaning. This leads to a distinction between the basis and the collocator – later renamed the autosemantic and synsemantic components (Hausmann 1997)6 – for the two elements of a collocation, and to a typology of lexical collocations using semantic features to determine the collocator. Thus the verb in reach a verdict is assigned the meaning [CREATION], whereas [ACTIVATION] is the feature present in fly a kite (Benson 1985:191). The driving force behind this approach is a lexicographic description of collocations.

5. See also Halliday & Hasan (1976) and Hoey (1991).

These four categories are only tentative groupings, and it has to be noted that a definition can, of course, be assigned to more than one type. The variety of definitions entails a variety of classifications, since classification systems depend on the point of view taken towards collocation and on the criteria used. Again, four major groups of approaches, that sometimes overlap or merge into one another, can be distinguished. First, there are the binary classifications, where collocation is simply contrasted with free combination without further subdivision or specification (cf. Firth 1957; Sinclair 1966; Greenbaum 1970). Second, there are typological classifications with fixed classes that theoretically should have neatly defined labels – although in reality this does not always work – such as free construction vs. collocation vs. idiom (Weinreich 1969; Heid 1994), or collocation vs. co-creation vs. counter-creation (Hausmann 1984). Then, there is the most convincing type of gradual classification, where collocation is seen as a stretch on the continuum between free word combinations and fully fixed idioms or compounds (cf. Cowie 1978; Benson et al. 1986; Carter 1987). Finally, there is a proposal by Schmid (2003) to classify collocations as a prototypical category with the most typical examples in the centre and more peripheral members at the edges. The problem with this type of classification is that not only is the collocation itself a gradual phenomenon, but the criteria used to determine a collocation can also be gradual. So it may be helpful to have a more detailed look at the criteria commonly used to describe and delimit collocation.

2.2 Criteria

As with the classifications involved, collocational criteria also present a very diverse picture. The different definitions are based on various sets of criteria applied to lexical co-occurrences to a major or minor degree. Basically, two main types of criteria can be distinguished: prerequisites and continua.

6. Although Hausmann recognises that collocation is an oriented relation, he allocates the roles of basis and collocator simply on semantic grounds. He does not consider frequencies in real language, which is the method employed in this study.

The prerequisites are conditions that have to be fulfilled in order to be able to talk of collocations at all. They can also be seen as the defining criteria, in contrast to the classifying criteria. The indispensable criterion for defining a collocation is, of course, the co-occurrence of two or more words (cf. Sinclair 1966; Stubbs 1995; Moon 1998). This could be considered too obvious to mention, but, as a prerequisite, it has logical consequences for the potential areas that provide material for collocations. It means that the words in question must be open to combination; they must belong, for example, to the same register or text type (cf. Lipka 2002:184f.), since otherwise they will usually not occur together. As a second criterion, they also have to occur in a common context (cf. Sinclair 1966; Carter 1987; Hoey 1991) or, to be more precise, in a common co-text.7 This does not, however, imply that they necessarily have to be part of the same sentence or the same clause. It is often possible for the elements of a collocation to be separated by intervening linguistic material as in this famous example by Greenbaum (1970:11) with the collocational components collect and stamps:

 (5) a. They collect many things, but chiefly stamps.

  b. They collect many things, but [they] chiefly [collect] stamps.

The only condition that has to be fulfilled is that the syntactical relation between the constituents in question allows a reconstruction of an adjacent collocation as given in (5b). Example (6) shows a text sample, taken from a report on the internet, where stamp and collect do not form a collocation; rather, collect collocates with revenue.

 (6) The first adhesive postage stamp was used in Great Britain in 1840. At the time, the British post office was having trouble collecting revenue.

  (Jim Watson on http://pages.ebay.co.uk/community/library/catindex-stamps-hist.html)

Continua are more difficult criteria, in that they are themselves gradable. They do not simply apply or not apply, rather they are applicable to varying degrees to different kinds of collocations. The first continuum is semantic transparency, which can be seen as the counterpart to idiomaticity. It is largely responsible for the distinction between collocations and idioms, although a clear boundary has not been determined (cf. Carter 1987; Fernando 1996). So, in terms of prototype theory, we are dealing with two categories with fuzzy boundaries, depending on the degree of semantic transparency a word combination exhibits. This also has to do with the notion of compositionality, the quality normally ascribed to collocations. Taking a closer look at regularly co-occurring words, it is clear that they cannot be divided into one group where the meaning of a larger expression is simply the sum of its parts, and another group that conveys a meaning totally independent of the semantic components of its elements (cf. Carter 1987:63f.). With real examples, there is always something in-between. A recurring word combination can acquire a new meaning. This can be either a feeble connotation, as in the case of the phrasal verb to set in, which according to Sinclair (1994:21) usually refers to something unpleasant. Or it can be a completely new denotation, acquired from the frequency of its usage in this combination or in a specific context. This holds for the verb to run, which has its literal meaning ‘move quickly’ in the combination to run a race, but has a new denotation ‘to organise or control’ in to run a farm (cf. Gläser 1986:43). So, depending on the semantic contribution an element makes to the meaning of the whole expression, there are different degrees of transparency or opacity8 (for an in-depth discussion of the notion of non-compositionality, see Svensson in press).

7. This also implies co-occurrence in the same text, as mentioned by Sinclair (1991) in his text-oriented definition.

Another criterion that is scaled on a continuum is the so-called collocational range, which is simply the number of potential collocates a node (i.e. the word being analysed) can take. Thus, a node can have a very restricted, or a rather wide range. The larger the list of potential combinatory partners is, the less typical it is as a collocation. A combination with a very restricted range, on the other hand, is either an idiom or a complex lexeme. The examples with the verb to face in (7) show a narrowing of the collocational range, and its consequences. (7a) clearly has the status of collocation because of its collocational range, whereas (7b) is a sort of transition area, and (7c) only has one possible collocate and must be assigned to the class of idioms (cf. Aisenstadt 1979:71f.).

 (7) a. to face + the facts/truth/problems/reality etc.

  b. to face + charges/counts

  c. to face + the music

What also becomes evident here is a further complication for the classification, namely the fact that the criteria are interdependent. There seems to be a parallel between collocational ranges and semantic transparency. In more restricted ranges, like (7b) and (7c), there is a growing tendency towards semantic opacity in at least one of the elements.

8. The syntactic fixedness of word co-occurrences could, of course, be added here. However, to my mind this is more properly considered a criterion for subclassifying idioms, and is not of great help for the concept of collocation.

The third gradable criterion is the essential one for corpus linguistic studies of collocation, namely frequency. Especially in recent decades, as corpus research has become more and more prominent, collocation studies have increasingly been frequency-based (cf. Sinclair 1991; Tognini-Bonelli 2001; Hunston 2002; Stubbs 2002; Bartsch 2004). The study presented here is also frequency-based, and I assume that the question of whether or not two words frequently co-occur is of prime importance in deciding on the relevance of that collocation for learners of a language. But one has to be careful, because frequency alone is not a reliable criterion. Further statistical aspects which take questions of probability and interrelation between the elements into account also have to be incorporated. Taken together, these two continua (collocational range and frequency) can be used to derive a fundamental criterion for collocations which is observable and easy to grasp, namely the predictability or mutual expectancy of words. Predictability is a cognitive or psychological feature which is decisive for collocations. This can easily be experienced in association tests, or even in everyday conversation, when a hearer feels that s/he is able to continue an utterance begun by a speaker. Native speakers often only become aware of collocations when they are used creatively or inappropriately in a text: you immediately stumble over such unusual expressions when reading or hearing them. So, the observability of this criterion is usually restricted to artificial experimental situations or depends on chance. But with the help of large datasets and corpus linguistic methods, the role that predictability plays can be at least approximately measured.

Based on these criteria, I have developed a multi-dimensional classification, where each item can be positioned at different points along the dimensions, thus incorporating all the characteristics of a collocation instead of highlighting only one feature. This integrative method has also been used in other approaches. For example, Barkema (1996) criticises traditional terminology and claims that, for the classification of idioms,

[...] a well-defined model is required that distinguishes between various descriptive dimensions and at the same time pays heed to the scalar nature of the different types of characteristics. (Barkema 1996:154)

3. A multi-dimensional framework

3.1 A detailed view of the three dimensions

The continua described above were used to establish three dimensions, each ranging from minimum to maximum on one criterion. For this learner-oriented approach, the extreme points on the scale were excluded from the area of collocation, since they outline the border zone between a collocation and an idiom or compound, on the one hand, and free ad-hoc combination, on the other. Instead, a core area was determined, which contains the most obvious and clear examples, and I concentrated on this area of prototypical collocations because it makes up a large part of the syntagmatic relations that cannot easily be assigned to hard and fast categories. Still within the collocational area, it is possible to grade word co-occurrences along these dimensions, thus characterising more or less typical examples for the concept of collocation. Figure 2 gives an idea of the three dimensions.

The first dimension is based on the variation of the semantic contribution of one element to the whole expression. By comparing the meanings of isolated items with those of the items within the combination the collocation can be positioned along the continuum. If the meaning inside the combination is the same as the meaning outside (e.g., in to run a race) the expression is maximally transparent and is positioned towards the free-combination endpoint of the dimension. If knowing the meanings outside the combination does not help in understanding the whole expression (e.g. in to run the gauntlet or to face the music), this is a semantically opaque idiom.

The lexical dimension is guided by the size of the collocational range. In a corpus query, the range of a node word can be determined by retrieving the list of all the co-occurring lexical items from its concordance. A typical collocation may consist of elements chosen from a restricted set of lexical items, i.e. from a small collocational range. There may be alternative combinations for similar meanings (as in (8a)), or completely different collocations built with the same node (as in (8b)).

 (8) a. in the near/not-too-distant/immediate/foreseeable + future

  b. uncertain/painful/bright + future

Figure 2. A multi-dimensional classification

The endpoints of the scale are again reserved for idioms and compounds in the case of very small ranges, and for free combinations if there is a large range.
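The size of a collocational range can be read off a list of co-occurring items by counting the distinct collocates recorded for each node. The following Python sketch shows one way of doing this. The function name `collocational_ranges` and the input format of (node, collocate) pairs are assumptions for illustration, and the pairs themselves are invented stand-ins for concordance output rather than corpus data.

```python
from collections import defaultdict


def collocational_ranges(pairs):
    """Map each node to the sorted list of distinct collocates observed with it."""
    ranges = defaultdict(set)
    for node, collocate in pairs:
        ranges[node].add(collocate)  # repeated pairs do not enlarge the range
    return {node: sorted(collocates) for node, collocates in ranges.items()}


# Invented (node, collocate) pairs; the word forms echo examples (8) and
# "to run the gauntlet", but the list is not taken from any corpus.
pairs = [
    ("future", "near"), ("future", "immediate"), ("future", "foreseeable"),
    ("future", "bright"), ("future", "near"),
    ("gauntlet", "run"),
]
ranges = collocational_ranges(pairs)
for node, collocates in ranges.items():
    print(node, len(collocates), collocates)
```

In this toy input the node future has a range of four and gauntlet a range of one. In a real query the range would additionally depend on the span, on frequency thresholds and on whether word forms or lemmas are counted.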

The last continuum in the model, the statistical dimension, shows a similar distribution of collocation, idiom/compound and free co-occurrence. While the latter two hold the extreme positions, determined by either the highest or the lowest statistical scores, collocation occupies the core area of the dimension. As well as the probability measures normally used in large corpora, the decisive criterion is the relation between the independent frequencies of the single items and the frequency of their combination.

As each collocational partner has its own overall frequency in the corpus, two different scores for the collocation can be determined depending on which constituent is chosen. The resulting collocational factor (see Section 3.2 below) describes the impact a lexical item has on the collocation it occurs in. This gives rise to the general observation that collocations cannot be allocated to the three dimensions described here as single spots; rather the dimensional classification has to be effected for each collocational partner separately. This doubling of the classification holds not only for the statistical dimension, but also for both the semantic and the lexical dimensions, so that we end up with an even more complex picture of collocation. The criteria of semantic transparency or contribution, collocational range and frequency have to be considered for each element of a collocation, so that it can be assigned its own position; and the position of the whole collocation is then a collocational profile defined by the single positions on each dimension. Figure 3 provides an illustration of the three dimensions, doubled for two-word collocations.

Figure 3. A collocational profile

3.2 The statistical dimension as a starting point for a revised account of collocation

The new classification of collocation described in this article results from a large-scale corpus analysis of syntagmatic lexical relations in the British National Corpus. The aim was to devise a method of determining the scope of relevant collocation for advanced learners of English. I ran 250 highly frequent words as the nodes for analysis through the BNC, each returning 200 statistically significant collocates. Available significance scores9 in corpora integrate the question of random co-occurrence into various association measures. Irrespective of which kind of measure is chosen, they all share the assumption that collocation is not just a random co-occurrence, but a unit made up of elements that have a certain connection to each other. According to this, a word combination is judged to be significant if its partners co-occur more often than they would if the words in the corpus were distributed by chance.
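The comparison between observed and chance co-occurrence can be made concrete with a Z score. The Python sketch below uses one common formulation, the one associated with Berry-Rogghe (1973). Whether the software used for the present analysis computes exactly this variant is not stated in the text, so the formula, the function name `z_score` and all parameter names should be read as assumptions, and the figures in the example are toy values rather than BNC frequencies.

```python
import math


def z_score(observed, node_freq, collocate_freq, corpus_size, span):
    """Z score for a node-collocate pair, with `span` tokens inspected per node occurrence."""
    # Probability of meeting the collocate at any position not occupied by the node
    p = collocate_freq / (corpus_size - node_freq)
    # Number of co-occurrences expected if words were distributed by chance
    expected = p * node_freq * span
    # Standardised difference between observed and expected co-occurrences
    return (observed - expected) / math.sqrt(expected * (1 - p))


# Toy figures, not taken from the BNC
print(z_score(observed=13, node_freq=100, collocate_freq=100, corpus_size=1100, span=1))
```

A score of zero means that the pair occurs exactly as often as chance would predict; the higher the score, the less plausible it is that the co-occurrence is random.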

These scores do not, however, distinguish between the collocational partners in terms of relevance. The mutual dependency expressed is hypothesised to be a constant and balanced relationship, i.e. the score is the same for each constituent of a collocation (cf. Berry-Rogghe 1973; Barnbrook 1996; McEnery & Wilson 1996; Kennedy 1998; Hunston 2002; Meyer 2002). But the undisputed criterion of predictability suggests that the status of the elements in a collocation must be unequal, or, at least, that each constituent has a certain force to predict the other one. In order to capture this unequal status of the partners in a collocation, I propose a new score that relates the frequency of the single item (i.e. all occurrences of the word) to the frequency of the item within the collocation (i.e. its occurrences in the combination in question). This automatically leads to the development of two different factors for each collocation, one for each partner. The so-called collocational factor (CF) is calculated as a ratio between the frequency of the collocation (Fcombined) and the frequency of the independent word (Fisolated). The formula is given in Figure 4.
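Since Figure 4 is not reproduced here, the following Python sketch illustrates only the ratio as it is described in the running text, i.e. Fcombined divided by Fisolated, computed once for each partner. The text also mentions that a Z score is incorporated in the collocational factor; how this is done is not specified, so that step is deliberately left out. The function name `collocational_factors`, the parameter names and the frequencies are hypothetical.

```python
def collocational_factors(f_combined, f_node, f_collocate):
    """Return (CF of the node, CF of the collocate) as plain frequency ratios.

    Assumption: CF = Fcombined / Fisolated, without the Z score component
    mentioned in the text.
    """
    if f_node <= 0 or f_collocate <= 0:
        raise ValueError("isolated frequencies must be positive")
    if f_combined > min(f_node, f_collocate):
        raise ValueError("a pair cannot occur more often than either of its words")
    return f_combined / f_node, f_combined / f_collocate


# Toy frequencies, not taken from the BNC
cf_node, cf_collocate = collocational_factors(f_combined=50, f_node=1000, f_collocate=100)
print(cf_node, cf_collocate)
```

In this toy case the rarer word spends half of its occurrences in the combination, whereas the more frequent word spends only a twentieth of them there. This is the kind of asymmetry between the partners that a single shared score cannot express.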

In the corpus analysis, the method produces a spreadsheet, as shown in Figure 5, which lists every node with its collocate, their part-of-speech tags, the various frequencies, a probability measure, in this case the Z score, that is incorporated in the collocational factor, and the CFs for each of the partners.

9. The most widely used tests are t-tests, chi-squared, MI (mutual information) scores and Z scores (for details see, for example, Barnbrook 1996; McEnery & Wilson 1996 and Hunston 2002). Z scores were chosen for this study mainly because of their ease of use and the fact that