Susanne Handl
2. Basic framework
2.1 Definitions and classifications
The classic and most basic definition of word co-occurrences is, of course, J. R. Firth’s (1957) “You shall know a word by the company it keeps”. Firth is generally considered the father of collocation. Many other definitions followed, but they can all be assigned, more or less, to four major categories.
The first group consists of text-oriented definitions like the one by Sinclair (1991), who sees collocation as “the occurrence of two or more words within a short space of each other in a text” (1991: 170).⁵ Although this seems trivial at first glance, it is the basic tenet for recognising collocations, the one on which all classifications have to be based, since without text, syntagmatic relations would not exist.
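In corpus work, Sinclair’s “short space” is commonly operationalised as a fixed span of words to the left and right of the node word. The following Python sketch illustrates that operationalisation under simplifying assumptions (lower-cased regex tokenisation, punctuation discarded, span size chosen by the analyst); the function names are illustrative and do not belong to any particular concordance program.

```python
from __future__ import annotations

import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-cased word tokens; punctuation is discarded (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def window_cooccurrences(tokens: list[str], node: str, span: int = 4) -> Counter:
    """Count every word occurring within `span` tokens left or right of each occurrence of `node`."""
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - span)                 # left edge of the window
        hi = min(len(tokens), i + span + 1)   # right edge (exclusive)
        for j in range(lo, hi):
            if j != i:                        # skip the node itself
                counts[tokens[j]] += 1
    return counts


tokens = tokenize("They collect many things, but chiefly stamps.")
print(window_cooccurrences(tokens, "collect", span=5)["stamps"])  # 1
print(window_cooccurrences(tokens, "collect", span=4)["stamps"])  # 0
```

Applied to Greenbaum’s sentence They collect many things, but chiefly stamps, the pair collect … stamps falls inside a span of five words but outside a span of four. A purely positional count therefore depends heavily on the chosen span, and it cannot by itself check the syntactic relation between the two words.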
Other definitions emphasise the associative nature of collocation: Firth points out that it is an order of mutual expectancy (cf. Palmer 1968: 181). There is a certain associative bond between two words that collocate. Aitchison (2003: 91) assigns an even greater role to collocations when she says that “[w]ord meaning is probably learned by noting the words which come alongside”. Sinclair (1991: 109ff.) goes one step further in postulating the idiom principle of language, which holds that for a large part of text production we use semi-preconstructed phrases that we choose simultaneously when speaking or writing. His example here is of course, which is not the result of combining the words of and course, but the outcome of a single choice. Using the term ‘semi-’ in the explanation of this principle allows for a certain variation in the preconstruction of phrases. An expression like of course would be a fully preconstructed item, whereas classical collocations like hard + work/luck/facts are less fixed. This illustrates the fact that syntagmatic lexical relations are a gradable phenomenon of language. So, the idiom principle applies not only to compounds that almost have the status of separate lexemes, or to idioms with their non-compositional meaning, but also to looser combinations of words that are simply activated together, such as:
(4) to pay attention
    a clear conscience
    closely tied to
A third type of definition is mainly statistically oriented. The question is whether two words co-occur only by chance or whether their co-occurrence recurs with greater than random probability (cf. Halliday 1961; Sinclair 1966). This is the major definition used as a basis for all corpus linguistic studies of collocation, and it also plays an important role in my analysis.
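The notion of “greater than random probability” can be made concrete by comparing an observed frequency with the frequency expected if words were distributed by chance. The sketch below uses pointwise mutual information (MI), one widely used association measure, restricted to adjacent word pairs; the frequencies in the example are invented, and MI is not the measure chosen for this study.

```python
import math


def expected_bigram(f_w1: int, f_w2: int, n_tokens: int) -> float:
    """Expected frequency of w1 immediately followed by w2 if words were distributed by chance."""
    return f_w1 * f_w2 / n_tokens


def pointwise_mi(observed: int, f_w1: int, f_w2: int, n_tokens: int) -> float:
    """log2(observed / expected): values above 0 mean the pair recurs more often than chance predicts."""
    if observed <= 0:
        raise ValueError("MI is undefined for pairs that never co-occur")
    return math.log2(observed / expected_bigram(f_w1, f_w2, n_tokens))


# Invented figures: w1 occurs 1,000 times, w2 500 times, in a 1,000,000-token corpus,
# and the pair w1 w2 is observed 20 times.
print(expected_bigram(1000, 500, 1_000_000))       # 0.5
print(pointwise_mi(20, 1000, 500, 1_000_000))      # log2(40), about 5.32
```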
The last group of definitions can be seen mainly as a counter-position to the statistical definition. It could be called the semantic type, since researchers like Hausmann (1979, 1984, 1985), Benson (1985), Benson et al. (1991) and Klotz (2000) try to put the relation between co-occurring words down to aspects of meaning. This leads to a distinction between the basis and the collocator (later renamed the autosemantic and synsemantic components, Hausmann 1997)⁶ for the two elements of a collocation, and to a typology of lexical collocations using semantic features to determine the collocator. Thus the verb in reach a verdict is assigned the meaning [CREATION], whereas [ACTIVATION] is the feature present in fly a kite (Benson 1985: 191). The driving force behind this approach is a lexicographic description of collocations.
5. See also Halliday & Hasan (1976) and Hoey (1991).
These four categories are only tentative groupings, and it has to be noted that a definition can, of course, be assigned to more than one type. The variety of definitions entails a variety of classifications, since classification systems depend on the point of view taken towards collocation and on the criteria used. Again, four major groups of approaches, which sometimes overlap or merge into one another, can be distinguished. First, there are the binary classifications, where collocation is simply contrasted with free combination without further subdivision or specification (cf. Firth 1957; Sinclair 1966; Greenbaum 1970). Second, there are typological classifications with fixed classes that theoretically should have neatly defined labels (although in reality this does not always work), such as free construction vs. collocation vs. idiom (Weinreich 1969; Heid 1994), or collocation vs. co-creation vs. counter-creation (Hausmann 1984). Third, there is the most convincing type, the gradual classification, where collocation is seen as a stretch on the continuum between free word combinations and fully fixed idioms or compounds (cf. Cowie 1978; Benson et al. 1986; Carter 1987). Finally, there is a proposal by Schmid (2003) to classify collocations as a prototypical category, with the most typical examples in the centre and more peripheral members at the edges. The problem with this type of classification is that not only is collocation itself a gradual phenomenon, but the criteria used to determine a collocation can also be gradual. So it may be helpful to take a more detailed look at the criteria commonly used to describe and delimit collocation.
2.2 Criteria
As with the classifications involved, collocational criteria also present a very diverse picture. The different definitions are based on various sets of criteria applied to lexical co-occurrences to a greater or lesser degree. Basically, two main types of criteria can be distinguished: prerequisites and continua.
6. Although Hausmann recognises that collocation is an oriented relation, he allocates the roles of basis and collocator simply on semantic grounds. He does not consider frequencies in real language, which form the basis of the method employed in this study.
The prerequisites are conditions that have to be fulfilled in order to be able to talk of collocations at all. They can also be seen as the defining criteria, in contrast to the classifying criteria. The indispensable criterion for defining a collocation is, of course, the co-occurrence of two or more words (cf. Sinclair 1966; Stubbs 1995; Moon 1998). This could be considered too obvious to mention, but, as a prerequisite, it has logical consequences for the potential areas that provide material for collocations. It means that the words in question must be open to combination; they must belong, for example, to the same register or text type (cf. Lipka 2002: 184f.), since otherwise they will usually not occur together. As a second criterion, they also have to occur in a common context (cf. Sinclair 1966; Carter 1987; Hoey 1991) or, to be more precise, in a common co-text.⁷ This does not, however, imply that they necessarily have to be part of the same sentence or the same clause. It is often possible for the elements of a collocation to be separated by intervening linguistic material, as in this famous example by Greenbaum (1970: 11) with the collocational components collect and stamps:
(5) a. They collect many things, but chiefly stamps.
b. They collect many things, but [they] chiefly [collect] stamps.
The only condition that has to be fulfilled is that the syntactic relation between the constituents in question allows the reconstruction of an adjacent collocation, as given in (5b). Example (6) shows a text sample, taken from a report on the internet, where stamp and collect do not form a collocation; rather, collect collocates with revenue.
(6) The first adhesive postage stamp was used in Great Britain in 1840. At the time, the British post office was having trouble collecting revenue.
(Jim Watson on http://pages.ebay.co.uk/community/library/catindex-stamps-hist.html)
Continua are more difficult criteria, in that they are themselves gradable. They do not simply apply or not apply; rather, they are applicable to varying degrees to different kinds of collocations. The first continuum is semantic transparency, which can be seen as the counterpart to idiomaticity. It is largely responsible for the distinction between collocations and idioms, although a clear boundary has not been determined (cf. Carter 1987; Fernando 1996). So, in terms of prototype theory, we are dealing with two categories with fuzzy boundaries, depending on the degree of semantic transparency a word combination exhibits. This also has to do with the notion of compositionality, the quality normally ascribed to collocations. Taking a closer look at regularly co-occurring words, it is clear that they cannot be divided into one group where the meaning of a larger expression is simply the sum of its parts, and another group that conveys a meaning totally independent of the semantic components of its elements (cf. Carter 1987: 63f.). With real examples, there is always something in between. A recurring word combination can acquire a new meaning. This can be either a faint connotation, as in the case of the phrasal verb to set in, which according to Sinclair (1994: 21) usually refers to something unpleasant, or a completely new denotation, acquired from the frequency of its usage in this combination or in a specific context. This holds for the verb to run, which has its literal meaning ‘move quickly’ in the combination to run a race, but has a new denotation ‘to organise or control’ in to run a farm (cf. Gläser 1986: 43). So, depending on the semantic contribution an element makes to the meaning of the whole expression, there are different degrees of transparency or opacity⁸ (for an in-depth discussion of the notion of non-compositionality, see Svensson in press).
7. This also implies co-occurrence in the same text, as mentioned by Sinclair (1991) in his text-oriented definition.
Another criterion that is scaled on a continuum is the so-called collocational range, which is simply the number of potential collocates a node (i.e. the word being analysed) can take. Thus, a node can have a very restricted or a rather wide range. The larger the list of potential combinatory partners, the less typical the combination is as a collocation. A combination with a very restricted range, on the other hand, is either an idiom or a complex lexeme. The examples with the verb to face in (7) show a narrowing of the collocational range and its consequences. (7a) clearly has the status of collocation because of its collocational range, whereas (7b) is a sort of transition area, and (7c) has only one possible collocate and must be assigned to the class of idioms (cf. Aisenstadt 1979: 71f.).
(7) a. to face + the facts/truth/problems/reality etc.
b. to face + charges/counts
c. to face + the music
What also becomes evident here is a further complication for the classification, namely the fact that the criteria are interdependent. There seems to be a parallel between collocational ranges and semantic transparency. In more restricted ranges, like (7b) and (7c), there is a growing tendency towards semantic opacity in at least one of the elements.
8. The syntactic fixedness of word co-occurrences could, of course, be added here. However, to my mind this is more properly considered a criterion for subclassifying idioms, and is not of great help for the concept of collocation.
The third gradable criterion is the essential one for corpus linguistic studies of collocation, namely frequency. Especially in recent decades, as corpus research has become more and more prominent, collocation studies have increasingly been frequency-based (cf. Sinclair 1991; Tognini-Bonelli 2001; Hunston 2002; Stubbs 2002; Bartsch 2004). The study presented here is also frequency-based, and I assume that the question of whether or not two words frequently co-occur is of prime importance in deciding on the relevance of that collocation for learners of a language. But one has to be careful, because frequency alone is not a reliable criterion. Further statistical aspects, which take questions of probability and interrelation between the elements into account, also have to be incorporated. Taken together, these two continua (collocational range and frequency) can be used to derive a fundamental criterion for collocations which is observable and easy to grasp, namely the predictability or mutual expectancy of words. Predictability is a cognitive or psychological feature which is decisive for collocations. This can easily be experienced in association tests, or even in everyday conversation, when a hearer feels that s/he is able to continue an utterance begun by a speaker. Native speakers often only become aware of collocations when they are used creatively or inappropriately in a text: you immediately stumble over such unusual expressions when reading or hearing them. So, the observability of this criterion is usually restricted to artificial experimental situations or depends on chance. But with the help of large data sets and corpus linguistic methods, the role that predictability plays can be at least approximately measured.
Based on these criteria, I have developed a multi-dimensional classification, where each item can be positioned at different points along the dimensions, thus incorporating all the characteristics of a collocation instead of highlighting only one feature. This integrative method has also been used in other approaches. For example, Barkema (1996) criticises traditional terminology and claims that, for the classification of idioms,
[...] a well-defined model is required that distinguishes between various descriptive dimensions and at the same time pays heed to the scalar nature of the different types of characteristics. (Barkema 1996: 154)
3. A multi-dimensional framework
3.1 A detailed view of the three dimensions
The continua described above were used to establish three dimensions, each ranging from minimum to maximum on one criterion. For this learner-oriented approach, the extreme points on the scale were excluded from the area of collocation, since they outline the border zone between a collocation and an idiom or compound, on the one hand, and a free ad-hoc combination, on the other. Instead, a core area was determined, which contains the most obvious and clear examples, and I concentrated on this area of prototypical collocations because it makes up a large part of the syntagmatic relations that cannot easily be assigned to hard and fast categories. Still within the collocational area, it is possible to grade word co-occurrences along these dimensions, thus characterising more or less typical examples of the concept of collocation. Figure 2 gives an idea of the three dimensions.
The first dimension is based on the variation of the semantic contribution of one element to the whole expression. By comparing the meanings of isolated items with those of the items within the combination, the collocation can be positioned along the continuum. If the meaning inside the combination is the same as the meaning outside (e.g. in to run a race), the expression is maximally transparent and is positioned towards the free-combination end point of the dimension. If knowing the meanings outside the combination does not help in understanding the whole expression (e.g. in to run the gauntlet or to face the music), this is a semantically opaque idiom.
The lexical dimension is guided by the size of the collocational range. In a corpus query, the range of a node word can be determined by retrieving the list of all the co-occurring lexical items from its concordance. A typical collocation may consist of elements chosen from a restricted set of lexical items, i.e. from a small collocational range. There may be alternative combinations for similar meanings (as in (8a)), or completely different collocations built with the same node (as in (8b)).
(8) a. in the near/not-too-distant/immediate/foreseeable + future
b. uncertain/painful/bright + future
Figure 2. A multi-dimensional classification
The end points of the scale are again reserved for idioms and compounds in the case of very small ranges, and for free combinations if there is a large range.
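The range query described above can be sketched as collecting the distinct collocates a node occurs with. In the sketch below, the (node, collocate) pairs are assumed to have been extracted from concordance lines already, and the cut-offs idiom_max and free_min are hypothetical values chosen only to illustrate the three zones; the text does not fix numerical boundaries.

```python
from __future__ import annotations

from typing import Iterable


def collocational_range(pairs: Iterable[tuple[str, str]], node: str) -> set[str]:
    """Distinct collocates observed with `node`; the size of this set is the node's range."""
    return {collocate for n, collocate in pairs if n == node}


def lexical_zone(range_size: int, idiom_max: int = 1, free_min: int = 50) -> str:
    """Place a range on the lexical dimension. Both cut-offs are hypothetical illustration values."""
    if range_size <= idiom_max:
        return "idiom/compound"      # very small range, e.g. to face + the music
    if range_size >= free_min:
        return "free combination"    # very large range
    return "collocation"             # core area between the two end points


# Toy pairs loosely based on examples (7) and (8); not corpus data.
pairs = [("future", "near"), ("future", "immediate"), ("future", "near"),
         ("future", "foreseeable"), ("face", "music")]
print(sorted(collocational_range(pairs, "future")))           # ['foreseeable', 'immediate', 'near']
print(lexical_zone(len(collocational_range(pairs, "face"))))  # idiom/compound
```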
The last continuum in the model, the statistical dimension, shows a similar distribution of collocation, idiom/compound and free co-occurrence. While the latter two hold the extreme positions, determined by either the highest or the lowest statistical scores, collocation occupies the core area of the dimension. Besides the probability measures normally used with large corpora, the decisive criterion here is the relation between the independent frequencies of the single items and the frequency of their combination.
As each collocational partner has its own overall frequency in the corpus, two different scores for the collocation can be determined, depending on which constituent is chosen. The resulting collocational factor (see Section 3.2 below) describes the impact a lexical item has on the collocation it occurs in. This gives rise to the general observation that collocations cannot be allocated to the three dimensions described here as single points; rather, the dimensional classification has to be carried out for each collocational partner separately. This doubling of the classification holds not only for the statistical dimension, but also for both the semantic and the lexical dimensions, so that we end up with an even more complex picture of collocation. The criteria of semantic transparency or contribution, collocational range and frequency have to be considered for each element of a collocation, so that it can be assigned its own position; the position of the whole collocation is then a collocational profile defined by the single positions on each dimension. Figure 3 provides an illustration of the three dimensions, doubled for two-word collocations.
Figure 3. A collocational profile
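One way to record such a profile is as a pair of positions, one per partner, each with a value on all three dimensions. The data structure below is a hypothetical sketch: the 0 to 1 scale, the field names and the example values for run a farm are assumptions made for illustration, not measurements from the study.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PartnerPosition:
    """Hypothetical 0.0-1.0 scores for one partner on each of the three dimensions."""
    semantic: float     # 0.0 = opaque contribution, 1.0 = fully transparent
    lexical: float      # 0.0 = very small collocational range, 1.0 = very large range
    statistical: float  # scaled statistical score; the direction of this scale is an assumption

    def __post_init__(self) -> None:
        for name in ("semantic", "lexical", "statistical"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")


@dataclass(frozen=True)
class CollocationalProfile:
    """A two-word collocation: one position per partner on every dimension (six values in all)."""
    first: str
    second: str
    first_pos: PartnerPosition
    second_pos: PartnerPosition

    def dimension(self, name: str) -> tuple[float, float]:
        """Both partners' values on one dimension: 'semantic', 'lexical' or 'statistical'."""
        return getattr(self.first_pos, name), getattr(self.second_pos, name)


# Invented values: 'run' contributes a shifted meaning ('to organise'), 'farm' its literal one.
run_a_farm = CollocationalProfile(
    "run", "farm",
    PartnerPosition(semantic=0.3, lexical=0.6, statistical=0.5),
    PartnerPosition(semantic=1.0, lexical=0.4, statistical=0.5),
)
print(run_a_farm.dimension("semantic"))  # (0.3, 1.0)
```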
3.2 The statistical dimension as a starting point for a revised account of collocation
The new classification of collocation described in this article results from a large-scale corpus analysis of syntagmatic lexical relations in the British National Corpus (BNC). The aim was to devise a method of determining the scope of relevant collocation for advanced learners of English. I ran 250 highly frequent words through the BNC as nodes for analysis, each returning 200 statistically significant collocates. Available significance scores⁹ in corpora integrate the question of random co-occurrence into various association measures. Irrespective of which kind of measure is chosen, they all share the assumption that collocation is not just a random co-occurrence, but a unit made up of elements that have a certain connection to each other. Accordingly, a word combination is judged to be significant if its partners co-occur more often than they would if the words in the corpus were distributed by chance.
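The Z score is one such association measure. A minimal sketch of a common span-based formulation, in the form usually attributed to Berry-Rogghe (1973), follows; whether the tools used for the BNC analysis compute exactly this variant is an assumption, and the frequencies in the example are invented.

```python
import math


def z_score(f_node: int, f_coll: int, observed: int, n_tokens: int, span: int) -> float:
    """Span-based Z score for a collocate around a node (one common formulation).

    f_node   -- corpus frequency of the node
    f_coll   -- corpus frequency of the collocate
    observed -- number of times the collocate occurs within the span around the node
    n_tokens -- corpus size in tokens
    span     -- positions examined per node occurrence (e.g. 8 for four words either side)
    """
    p = f_coll / (n_tokens - f_node)   # chance of the collocate at any non-node position
    expected = p * f_node * span       # expected co-occurrences if words were distributed by chance
    return (observed - expected) / math.sqrt(expected * (1 - p))


# Invented figures: node 1,000 times, collocate 500 times, 20 co-occurrences
# within a span of four words either side, in a 1,000,000-token corpus.
print(round(z_score(1000, 500, 20, 1_000_000, 8), 3))  # 7.996
```

Positive scores indicate that the pair co-occurs more often than chance would predict; a score near zero means the observed count is close to the expected one.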
These scores do not, however, distinguish between the collocational partners in terms of relevance. The mutual dependency expressed is hypothesised to be a constant and balanced relationship, i.e. the score is the same for each constituent of a collocation (cf. Berry-Rogghe 1973; Barnbrook 1996; McEnery & Wilson 1996; Kennedy 1998; Hunston 2002; Meyer 2002). But the undisputed criterion of predictability suggests that the status of the elements in a collocation must be unequal or, at least, that each constituent has a certain force to predict the other one.
In order to capture this unequal status of the partners in a collocation, I propose a new score that relates the frequency of the single item (i.e. all occurrences of the word) to the frequency of the item within the collocation (i.e. its occurrences in the combination in question). This automatically leads to the development of two different factors for each collocation, one for each partner. The so-called collocational factor (CF) is calculated as a ratio between the frequency of the collocation (F_combined) and the frequency of the independent word (F_isolated). The formula is given in Figure 4.
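The frequency ratio at the heart of the collocational factor can be sketched as follows. This covers only the ratio F_combined / F_isolated described above; the full formula in Figure 4, which according to the text also incorporates the Z score, is not reproduced here, and the example frequencies for pay attention are invented.

```python
def collocational_factor_ratio(f_combined: int, f_isolated: int) -> float:
    """Share of a word's corpus occurrences that fall inside one collocation (F_combined / F_isolated).

    Only the frequency ratio is computed; the Z-score component of the full CF is omitted.
    """
    if f_isolated <= 0:
        raise ValueError("f_isolated must be positive")
    if not 0 <= f_combined <= f_isolated:
        raise ValueError("f_combined must lie between 0 and f_isolated")
    return f_combined / f_isolated


def partner_ratios(f_combined: int, f_first: int, f_second: int) -> tuple:
    """One ratio per partner: the same F_combined divided by each partner's own frequency."""
    return (collocational_factor_ratio(f_combined, f_first),
            collocational_factor_ratio(f_combined, f_second))


# Invented figures: 'pay attention' 200 times, 'pay' 4,000 times, 'attention' 800 times.
print(partner_ratios(200, 4000, 800))  # (0.05, 0.25)
```

The two values differ because each partner’s own corpus frequency enters the denominator: in this invented example a quarter of all occurrences of attention fall inside the combination, but only a twentieth of those of pay, so attention is the more strongly bound partner.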
In the corpus analysis, the method produces a spreadsheet, as shown in Figure 5, which lists every node with its collocate, their part-of-speech tags, the various frequencies, a probability measure (in this case the Z score) that is incorporated in the collocational factor, and the CFs for each of the partners.
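A spreadsheet row of the kind described can be modelled as one record per node-collocate pair. The column names, the generic part-of-speech labels and all values below are hypothetical placeholders; in particular, the CF columns are simply filled in, not computed with the formula in Figure 4.

```python
import csv
import io

# Hypothetical column names mirroring the fields listed in the text.
FIELDS = ["node", "node_pos", "collocate", "collocate_pos",
          "f_node", "f_collocate", "f_combined", "z_score", "cf_node", "cf_collocate"]


def write_sheet(rows: list) -> str:
    """Serialise collocation records to CSV text, highest Z score first."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in sorted(rows, key=lambda r: r["z_score"], reverse=True):
        writer.writerow(row)
    return buf.getvalue()


# Placeholder records with invented values.
SAMPLE_ROWS = [
    {"node": "face", "node_pos": "VERB", "collocate": "facts", "collocate_pos": "NOUN",
     "f_node": 3000, "f_collocate": 900, "f_combined": 40, "z_score": 12.5,
     "cf_node": 0.1, "cf_collocate": 0.3},
    {"node": "pay", "node_pos": "VERB", "collocate": "attention", "collocate_pos": "NOUN",
     "f_node": 4000, "f_collocate": 800, "f_combined": 200, "z_score": 55.0,
     "cf_node": 0.2, "cf_collocate": 0.6},
]
print(write_sheet(SAMPLE_ROWS))
```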
9. The most widely used tests are t-tests, chi-squared, MI (mutual information) scores and Z scores (for details see, for example, Barnbrook 1996; McEnery & Wilson 1996 and Hunston 2002). Z scores were chosen for this study mainly because of their ease of use and the fact that