Masaki Murata Kiyotaka Uchimoto Qing Ma Hitoshi Isahara
Communications Research Laboratory, Ministry of Posts and Telecommunications
588-2, Iwaoka, Nishi-ku, Kobe, 651-2492, Japan
tel:+81-78-969-2181 fax:+81-78-969-2189 http://www-karc.crl.go.jp/ips/murata
{murata,uchimoto,qma,isahara}@crl.go.jp
Abstract
This paper describes two new bunsetsu identification methods using supervised learning. Since Japanese syntactic analysis is usually done after bunsetsu identification, bunsetsu identification is important for analyzing Japanese sentences. In experiments comparing the four previously available machine-learning methods (decision tree, maximum-entropy method, example-based approach, and decision list) and the two new methods using category-exclusive rules, the new method using category-exclusive rules with the highest similarity performed best.
1 Introduction
This paper is about machine-learning methods for identifying bunsetsus, which correspond to English phrasal units such as noun phrases and prepositional phrases. Since Japanese syntactic analysis is usually done after bunsetsu identification (Uchimoto et al., 1999), identifying bunsetsus is important for analyzing Japanese sentences. Conventional studies on bunsetsu identification[1] have used hand-made rules (Kameda, 1995; Kurohashi, 1998), but bunsetsu identification is not an easy task. These studies used many hand-made rules developed at the cost of many man-hours. Kurohashi, for example, made 146 rules for bunsetsu identification (Kurohashi, 1998).
In an attempt to reduce the number of man-hours, we used machine-learning methods for bunsetsu identification. Because it was not clear which machine-learning method would be the most appropriate for bunsetsu identification, we tried a variety of them. In this paper we report experiments comparing four machine-learning methods (decision-tree, maximum-entropy, example-based, and decision-list methods) and our new methods using category-exclusive rules.
[1] Bunsetsu identification is a problem similar to chunking (Ramshaw and Marcus, 1995; Sang and Veenstra, 1999) in English.
2 Bunsetsu identification problem
We conducted experiments on the following supervised learning methods for identifying bunsetsus:
Decision-tree method
Maximum-entropy method
Example-based method (use of similarity)
Decision-list method (use of probability and frequency)
Method 1 (use of exclusive rules)
Method 2 (use of exclusive rules with the highest similarity)
In general, bunsetsu identification is done after morphological analysis and before syntactic analysis. Morphological analysis corresponds to part-of-speech tagging in English. Japanese syntactic structures are usually represented by the relations between bunsetsus, which correspond to phrasal units such as a noun phrase or a prepositional phrase in English. So, bunsetsu identification is important in Japanese sentence analysis.
In this paper, we identify a bunsetsu by using information from a morphological analysis. Bunsetsu identification is treated as the task of deciding whether to insert a "|" mark to indicate the partition between two bunsetsus, as in Figure 1. Therefore, bunsetsu identification is done by judging whether or not a partition mark should be inserted between two adjacent morphemes. (For the sake of simplicity, we do not use the inserted partition marks in the following analysis in this paper.)
Our bunsetsu identification method uses the morphological information of the two preceding and the two succeeding morphemes of an analyzed space between two adjacent morphemes. We use the following morphological information:
(i) Major part-of-speech (POS) category,
(ii) Minor POS category or inflection type,
(iii) Semantic information (the first three-digit number of a category number as used in "BGH" (NLRI, 1964)),
(iv) Word (lexical information).
(I) nominative-case particle | (bunsetsu) objective-case particle | (identify) .
(I identify bunsetsu.)
Figure 1: Example of identified bunsetsus
Morpheme:    bun          wo             kugiru        .
Gloss:       (sentence)   (obj)          (divide)      .
Major POS:   Noun         Particle       Verb          Symbol
Minor POS:   Normal Noun  Case-Particle  Normal Form   Punctuation
Semantics:   —            None           217           —
Word:        —            wo             kugiru        —
Figure 2: Information used in bunsetsu identification
For simplicity we do not use the "Semantic information" and "Word" of either of the two outside morphemes.
Figure 2 shows the information used to judge whether or not to insert a partition mark in the space between two adjacent morphemes, "wo (obj)" and "kugiru (divide)," in the sentence "bun wo kugiru. ((I) divide sentences)."
3 Bunsetsu identification process for each machine-learning method
3.1 Decision-tree method
In this work we used the program C4.5 (Quinlan, 1995) for the decision-tree learning method. The four types of information, (i) major POS, (ii) minor POS, (iii) semantic information, and (iv) word, mentioned in the previous section were also used as features with the decision-tree learning method. As shown in Figure 3, the number of features is 12 (2 + 4 + 4 + 2) because we do not use (iii) semantic information and (iv) word information for the two outside morphemes.
In Figure 2, for example, the value of the feature 'the major POS of the far left morpheme' is 'Noun.'
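To make the feature layout concrete, here is a small Python sketch of how the 12 features for one space could be assembled; it is our illustration of the scheme described above, and the names Morpheme and extract_features are hypothetical, not the authors' code.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Morpheme:
        major_pos: str           # (i) major POS category
        minor_pos: str           # (ii) minor POS category or inflection type
        semantic: Optional[str]  # (iii) first three digits of the BGH category number
        word: str                # (iv) lexical form

    def extract_features(far_left, left, right, far_right):
        """Features for the space between `left` and `right` (2 + 4 + 4 + 2 = 12)."""
        feats = {}
        # Outside morphemes: only (i) major POS and (ii) minor POS are used.
        for name, m in (("far_left", far_left), ("far_right", far_right)):
            feats[name + "_major"] = m.major_pos
            feats[name + "_minor"] = m.minor_pos
        # Middle morphemes: all four types of information are used.
        for name, m in (("left", left), ("right", right)):
            feats[name + "_major"] = m.major_pos
            feats[name + "_minor"] = m.minor_pos
            feats[name + "_semantic"] = m.semantic or "None"
            feats[name + "_word"] = m.word
        return feats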
3.2 Maximum-entropy method
The maximum-entropy method is useful under sparse-data conditions and has been used by many researchers (Berger et al., 1996; Ratnaparkhi, 1996; Ratnaparkhi, 1997; Borthwick et al., 1998; Uchimoto et al., 1999). In our maximum-entropy experiment we used Ristad's system (Ristad, 1998). The analysis is performed by calculating, from the output of the system, the probabilities of inserting and of not inserting a partition mark; whichever probability is larger determines the answer.
In the maximum-entropy method, we use the same four types of morphological information, (i) major POS, (ii) minor POS, (iii) semantic information, and (iv) word, as in the decision-tree method. However, this method does not consider combinations of features by itself. As a result, unlike with the decision-tree method, we had to combine features manually.
First we considered combinations of the pieces of morphological information within each morpheme. Because there were four types of information, the total number of combinations was 2^4 - 1. Since this number is large and intractable, we considered that (i) major POS, (ii) minor POS, (iii) semantic information, and (iv) word information gradually become more specific in this order, and we combined the four types of information in the following way:
Information A: (i) major POS
Information B: (i) major POS and (ii) minor POS
Information C: (i) major POS, (ii) minor POS, and (iii) semantic information
Information D: (i) major POS, (ii) minor POS, (iii) semantic information, and (iv) word
(1)
We used only Information A and B for the two outside morphemes because, as in the decision-tree method, we did not use their semantic and word information.
Next, we considered the combinations of each type of information across the four morphemes. As shown in Figure 4, the number of combinations is 64 (2 × 4 × 4 × 2).
To cope with data sparseness, in addition to the above combinations, we considered the cases in which, first, one of the two outside morphemes is not used; secondly, neither of the two outside morphemes is used; and thirdly, only one of the two middle morphemes is used. The number of features used in the maximum-entropy method is 152, which is obtained as follows:
No. of features = 2 × 4 × 4 × 2 + 4 × 4 × 2 + 2 × 4 × 4 + 4 × 4 + 4 + 4 = 152

Figure 3: Features used in the decision-tree method
Figure 4: Features used in the maximum-entropy method (far left morpheme, left morpheme, right morpheme, far right morpheme)
In Figure 2, the feature that uses Information B for the far left morpheme, Information D for the left morpheme, Information C for the right morpheme, and Information A for the far right morpheme is "Noun: Normal Noun; Particle: Case-Particle: none: wo; Verb: Normal Form: 217; Symbol". In the maximum-entropy method we used, for each space, 152 features such as this one.
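As a sanity check of this count, the following short Python sketch (ours, under the assumptions above; the level labels and the "-" marker for an unused morpheme are hypothetical notation) enumerates the pattern shapes and arrives at 152.

    from itertools import product

    OUTSIDE = ["A", "B"]           # levels allowed for the far left / far right morphemes
    MIDDLE = ["A", "B", "C", "D"]  # levels allowed for the left / right morphemes

    patterns = set()
    # Base combinations: 2 x 4 x 4 x 2 = 64
    patterns.update(product(OUTSIDE, MIDDLE, MIDDLE, OUTSIDE))
    # One of the two outside morphemes unused ("-"): 32 + 32
    patterns.update(product(["-"], MIDDLE, MIDDLE, OUTSIDE))
    patterns.update(product(OUTSIDE, MIDDLE, MIDDLE, ["-"]))
    # Neither outside morpheme used: 16
    patterns.update(product(["-"], MIDDLE, MIDDLE, ["-"]))
    # Only one of the two middle morphemes used: 4 + 4
    patterns.update(product(["-"], MIDDLE, ["-"], ["-"]))
    patterns.update(product(["-"], ["-"], MIDDLE, ["-"]))
    print(len(patterns))  # 152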
3.3 Example-based method (use of similarity)
An example-based method was proposed by Nagao (Nagao, 1984) in an attempt to solve problems in machine translation. To resolve a problem, it uses the most similar example. In the present work, the example-based method impartially used the same four types of information (see Eq. (1)) as used in the maximum-entropy method.
To use this method, we must define the similarity of an input to an example. We use the 152 patterns from the maximum-entropy method to establish the level of similarity. We define the similarity S between an input and an example according to which one of these 152 levels is the matching level, as follows. (The equation reflects the importance of the two middle morphemes.)
S = s(m-1) × s(m+1) × 10,000 + s(m-2) × s(m+2)    (2)

where m-1, m+1, m-2, and m+2 are the left, right, far left, and far right morphemes, respectively, and s(x) is the morphological similarity of a morpheme x, which is defined as follows:

s(x) = 1 (when no information of x is matched)
       2 (when Information A of x is matched)
       3 (when Information B of x is matched)
       4 (when Information C of x is matched)
       5 (when Information D of x is matched)
(3)
Figure 5 shows an example of the levels of similarity. When a pattern matches Information A of all four morphemes, such as "Noun; Particle; Verb; Symbol", its similarity is 40,004 (2 × 2 × 10,000 + 2 × 2). When a pattern matches only the left morpheme at Information D, such as "—; Particle: Case-Particle: none: wo; —; —", its similarity is 50,001 (5 × 1 × 10,000 + 1 × 1).
The example-based method extracts the example with the highest level of similarity and checks whether or not that example is marked. A partition mark is inserted in the input data only when the example is marked. When multiple examples have the same highest level of similarity, the selection of the best example is ambiguous. In this case, we count the number of marked and unmarked spaces in all of those examples and choose the larger.
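The scoring can be summarized in a few lines of Python; this is a minimal sketch assuming the reconstruction of Eq. (2) above, and the function and dictionary names are ours, not the authors'.

    # Similarity level of one morpheme position, as in Eq. (3).
    LEVELS = {"none": 1, "A": 2, "B": 3, "C": 4, "D": 5}

    def similarity(matched):
        """matched maps each of the four positions to its deepest matched level;
        the two middle morphemes dominate through the factor 10,000."""
        s = {pos: LEVELS[lvl] for pos, lvl in matched.items()}
        return s["left"] * s["right"] * 10_000 + s["far_left"] * s["far_right"]

    # Figure 5 examples:
    print(similarity({"far_left": "A", "left": "A", "right": "A", "far_right": "A"}))            # 40004
    print(similarity({"far_left": "none", "left": "D", "right": "none", "far_right": "none"}))   # 50001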
                   s(x)   m-2          m-1            m+1          m+2
No information     1      —            —              —            —
Information A      2      Noun         Particle       Verb         Symbol
Information B      3      Normal Noun  Case-Particle  Normal Form  Punctuation
Information C      4      —            None           217          —
Information D      5      —            wo             kugiru       —
Figure 5: Example of levels of similarity (for the sentence "bun wo kugiru ." of Figure 2)

3.4 Decision-list method (use of probability and frequency)
The decision-list method was proposed by Rivest (Rivest, 1987). Rules in this method are expanded by combining all the features and are stored in a one-dimensional list. A priority order is defined in a certain way, and all of the rules are arranged in this order. The decision-list method searches for rules from the top of the list and analyzes a particular problem by using only the first applicable rule.
In this study we used in the decision-list method the same 152 types of patterns that were used in the maximum-entropy method.
To determine the priority order of the rules, we referred to Yarowsky's method (Yarowsky, 1994) and Nishiokayama's method (Nishiokayama et al., 1998) and used the probability and frequency of each rule as measures of this priority order. When multiple rules had the same probability, the rules were arranged in order of their frequency.
Suppose, for example, that Pattern A "Noun: Normal Noun; Particle: Case-Particle: none: wo; Verb: Normal Form: 217; Symbol: Punctuation" occurs 13 times in a learning set and that ten of the occurrences include the inserted partition mark. Suppose also that Pattern B "Noun; Particle; Verb; Symbol" occurs 123 times in a learning set and that 90 of the occurrences include the mark.
This example is recognized by the following rules:

Pattern A ⇒ Partition  76.9% (10/13),  Freq. 13
Pattern B ⇒ Partition  73.2% (90/123), Freq. 123

Many similar rules were made and were then listed in order of their probabilities and, for any one probability, in order of their frequencies. This list was searched from the top, and the answer was obtained by using the first applicable rule.
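The priority ordering itself is simple to express; the sketch below is our illustration under an assumed rule representation (the Rule tuple and apply_decision_list are hypothetical names): rules are sorted by probability, then frequency, and the first applicable rule gives the answer.

    from collections import namedtuple

    # pattern: predicate over the four morphemes; label: "partition" or "non-partition"
    Rule = namedtuple("Rule", "pattern label probability frequency")

    def apply_decision_list(rules, instance):
        """Use the first applicable rule in (probability, frequency) order."""
        ordered = sorted(rules, key=lambda r: (r.probability, r.frequency), reverse=True)
        for rule in ordered:
            if rule.pattern(instance):
                return rule.label
        return None  # no applicable rule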
3.5 Method 1 (use of category-exclusive rules)
So far, we have described the four existing machine-learning methods. In the next two sections we describe our methods.
It is reasonable to consider the 152 patterns used in three of the previous methods. Now, let us suppose that the 152 patterns applied to the learning set yield rules such as those shown in Figure 6. "Partition" means that the rule determines that a partition mark should be inserted in the input data, and "non-partition" means that the rule determines that a partition mark should not be inserted.

Rule B: Pattern B ⇒ probability of partition 100% (33/33), frequency 33
Rule C: Pattern C ⇒ probability of partition 100% (25/25), frequency 25
Rule D: Pattern D ⇒ probability of partition 100% (19/19), frequency 19
Rule E: Pattern E ⇒ probability of partition 81.3% (100/123), frequency 123
Rule F: Pattern F ⇒ probability of partition 76.9% (10/13), frequency 13
Rule G: Pattern G ⇒ probability of non-partition 57.4% (310/540), frequency 540
...
Figure 6: An example of rules used in Method 1
Suppose that when we solve a hypothetical problem, Patterns A to G are applicable. If we use the decision-list method, only Rule A, which is applied first, is used, and this determines that a partition mark should not be inserted. For Rules B, C, and D, although the frequency of each rule is lower than that of Rule A, the sum of their frequencies is higher, so we think that it is better to use Rules B, C, and D than Rule A. Method 1 follows this idea, but we do not simply sum up the frequencies. Instead, we count the number of examples used in Rules B, C, and D and judge the category having the largest number of examples satisfying a pattern with the highest probability to be the desired answer.
For example, suppose that in the above example the number of examples satisfying Rules B, C, and D is 65. (Because some examples overlap in multiple rules, the total number of examples is actually smaller than the sum of the frequencies of the three rules.) In this case, among the examples used by the rules having 100% probability, the number of examples of partition is 65, and the number of examples of non-partition is 34. So, we determine that the desired answer is to partition.
A rule having 100% probability is called a category-exclusive rule because all the data satisfying it belong to one category, either partition or non-partition. Because for any given space the number of applicable rules can be as large as 152, category-exclusive rules are applied often.[4] Method 1 uses all of these category-exclusive rules, so we call it the method using category-exclusive rules.
[4] The ratio of the spaces analyzed by using category-exclusive rules is 99.30% (16,864/16,983) in the test set of Experiment 1.
Solving problems by using rules whose probabilities are not 100% may result in wrong solutions. Almost all of the traditional machine-learning methods solve problems by using rules whose probabilities are not 100%. By using such methods, we cannot hope to improve accuracy. If we want to improve accuracy, we must use category-exclusive rules. There are some cases, however, in which, even if we take this approach, category-exclusive rules are rarely applied. In such cases, we must add new features to the analysis to create a situation in which many category-exclusive rules can be applied.
However, it is not sufficient simply to use category-exclusive rules. There are many meaningless rules that happen to be category-exclusive only in a learning set. We must consider how to eliminate such meaningless rules.
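The decision step of Method 1 can be sketched as follows; this is our reading of the procedure above, with hypothetical rule attributes (probability, label, example_ids), not the authors' implementation.

    def method1(applicable_rules):
        """applicable_rules: rules whose patterns match the current space."""
        exclusive = [r for r in applicable_rules if r.probability == 1.0]
        if not exclusive:
            return None  # the fallback for this case is not specified here
        examples = {"partition": set(), "non-partition": set()}
        for rule in exclusive:
            # Union, so an example covered by several rules is counted only once.
            examples[rule.label] |= rule.example_ids
        return max(examples, key=lambda label: len(examples[label]))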
3.6 Method 2 (using category-exclusive rules with the highest similarity)
Method 2 combines the example-based method and Method 1. That is, it combines the method using similarity and the method using category-exclusive rules in order to eliminate the meaningless category-exclusive rules mentioned in the previous section.
Method 2 also uses the 152 patterns for identifying bunsetsus. These patterns are used as rules in the same way as in Method 1. Desired answers are determined by using the rule having the highest probability. When multiple rules have the same probability, Method 2 uses the value of the similarity described in the section on the example-based method and analyzes the problem with the rule having the highest similarity. When multiple rules have the same probability and similarity, the method takes the examples used by the rules having the highest probability and the highest similarity, and chooses the category with the larger number of examples as the desired answer, in the same way as in Method 1.
However, when category-exclusive rules having a frequency of more than one exist, the above procedure is performed after eliminating all of the category-exclusive rules having a frequency of one. In other words, category-exclusive rules having a frequency of more than one are given a higher priority than category-exclusive rules having a frequency of only one but a higher similarity. This is because category-exclusive rules having a frequency of only one are only weakly supported by the learning set.
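Method 2's selection logic, under the same hypothetical rule representation as in the Method 1 sketch (again our illustration, not the authors' code), could look like this: category-exclusive rules seen more than once suppress those seen only once, and remaining ties are broken by probability, then similarity, and finally by the example majority of Method 1.

    def method2(applicable_rules, similarity_of):
        """similarity_of(rule) returns the Eq. (2) similarity of the rule's pattern."""
        rules = list(applicable_rules)
        if any(r.probability == 1.0 and r.frequency > 1 for r in rules):
            # Drop category-exclusive rules seen only once; keep everything else.
            rules = [r for r in rules if not (r.probability == 1.0 and r.frequency == 1)]
        best_prob = max(r.probability for r in rules)
        candidates = [r for r in rules if r.probability == best_prob]
        best_sim = max(similarity_of(r) for r in candidates)
        candidates = [r for r in candidates if similarity_of(r) == best_sim]
        # Remaining ties: majority vote over the covered examples, as in Method 1.
        examples = {"partition": set(), "non-partition": set()}
        for rule in candidates:
            examples[rule.label] |= rule.example_ids
        return max(examples, key=lambda label: len(examples[label]))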
4 Experiments and discussion
In our experiments we used the Kyoto University text corpus (Kurohashi and Nagao, 1997), which is a tagged corpus made up of articles from the Mainichi newspaper. All experiments reported in this paper were performed using articles dated from January 1 to 5, 1995. We obtained the correct information on morphology and bunsetsu identification from the tagged corpus.
The following experiments were conducted to determine which supervised learning method achieves the highest accuracy rate.
Experiment 1
Learning set: January 1, 1995
Test set: January 3, 1995
Experiment 2
Learning set: January 4, 1995
Test set: January 5, 1995
Because we used Experiment 1 in developing Method 1 and Method 2, Experiment 1 is a closed data set for Method 1 and Method 2. So, we also performed Experiment 2.
The results are listed in Tables 1 to 4. In addition to the six methods described in Section 3, we used KNP 2.0b4 (Kurohashi, 1997) and KNP 2.0b6 (Kurohashi, 1998), which are bunsetsu identification and syntactic analysis systems using many hand-made rules. Because KNP is based not on a machine-learning method but on many hand-made rules, the "Learning set" and "Test set" labels in the tables have no meaning for the KNP results. In the KNP experiments, we also used the morphological information in the corpus. The "F" in the tables indicates the F-measure, which is the harmonic mean of recall and precision. Recall is the fraction of correctly identified partitions out of all the partitions. Precision is the fraction of correctly identified partitions out of all the spaces that were judged to have a partition mark inserted.
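In the usual notation, the F-measure referred to here is

    F = \frac{2 \times \mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}}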
Tables 1 to 4 show the following results:

Table 1: Results of learning set of Experiment 1
Method             F        Recall    Precision
Decision Tree      99.58%   99.66%    99.51%
Maximum Entropy    99.20%   99.35%    99.06%
Example-Based      99.98%   100.00%   99.97%
Decision List      99.98%   100.00%   99.97%
Method 1           99.98%   100.00%   99.97%
Method 2           99.98%   100.00%   99.97%
KNP 2.0b4          99.23%   99.78%    98.69%
KNP 2.0b6          99.73%   99.77%    99.69%
(The number of spaces between two morphemes is 25,814. The number of partitions is 9,523.)
Table 2: Results of test set of Experiment 1
Method             F        Recall    Precision
Decision Tree      98.87%   98.67%    99.08%
Maximum Entropy    98.90%   98.75%    99.06%
Example-Based      99.02%   98.69%    99.36%
Decision List      98.95%   98.43%    99.48%
Method 1           98.98%   98.54%    99.43%
Method 2           99.16%   98.88%    99.45%
KNP 2.0b4          99.13%   99.72%    98.54%
KNP 2.0b6          99.66%   99.68%    99.64%
(The number of spaces between two morphemes is 16,983. The number of partitions is 6,166.)
The maximum-entropy method was better than the decision-tree method. Although the maximum-entropy method has a weak point in that it does not learn combinations of features, we could overcome this weakness by constructing almost all of the combinations of features manually, producing a higher accuracy rate.
The decision-list method was better than the maximum-entropy method in this experiment.
The example-based method obtained the highest accuracy rate among the four existing methods.
Although Method 1, which uses category-exclusive rules, was worse than the example-based method, it was better than the decision-list method. One reason for this is that the decision-list method chooses rules randomly when multiple rules have identical probabilities and frequencies.
Method 2, which uses the category-exclusive rules with the highest similarity, achieved the highest accuracy rate among the supervised learning methods.
The example-based method, the decision-list method, Method 1, and Method 2 obtained accuracy rates of about 100% for the learning set.
Table 3: Results of learning set of Experiment 2
Method             F        Recall    Precision
Decision Tree      99.70%   99.71%    99.69%
Maximum Entropy    99.07%   99.23%    98.92%
Example-Based      99.99%   100.00%   99.98%
Decision List      99.99%   100.00%   99.98%
Method 1           99.99%   100.00%   99.98%
Method 2           99.99%   100.00%   99.98%
KNP 2.0b4          98.94%   99.50%    98.39%
KNP 2.0b6          99.47%   99.47%    99.48%
(The number of spaces between two morphemes is 27,665. The number of partitions is 10,143.)
Table 4: Results of test set of Experiment 2
Method             F        Recall    Precision
Decision Tree      98.50%   98.51%    98.49%
Maximum Entropy    98.57%   98.55%    98.59%
Example-Based      98.82%   98.71%    98.93%
Decision List      98.75%   98.27%    99.23%
Method 1           98.79%   98.54%    99.43%
Method 2           98.90%   98.65%    99.15%
KNP 2.0b4          99.07%   99.43%    98.71%
KNP 2.0b6          99.51%   99.40%    99.61%
(The number of spaces between two morphemes is 32,304. The number of partitions is 11,756.)
The two methods using similarity (the example-based method and Method 2) were always better than the other methods, indicating that the use of similarity is effective if we can define it appropriately.
We carried out experiments using KNP, a system that uses many hand-made rules. The F-measure of KNP was the highest in the test sets.
We used two versions of KNP, KNP 2.0b4 and KNP 2.0b6. The latter was much better than the former, indicating that the improvements made by hand are effective. But the maintenance of rules by hand has a limit, so improvements made by hand are not always effective.
The above experiments indicate that Method 2 is the best among the machine-learning methods.[5]
In Table 5 we show some cases that were partitioned incorrectly by KNP but correctly by Method 2.
[5] In these experiments, the differences were very small. But we think that the differences are significant to some extent because we performed both Experiment 1 and Experiment 2, and the data we used form a large corpus containing a few tens of thousands of morphemes, tagged objectively in advance.
Table 5: Cases in which KNP was incorrect and Method 2 was correct

kotsukotsu |(NEED) gaman-shi
(steadily) (be patient with)
(... be patient with ... steadily)

yoyuu wo | motte |(NEED) shirizoke
(enough strength) (obj) (have) (beat off)
(... beat off ... having enough strength)

kaisha wo | gurupu-wake |(WRONG) shite
(company) (obj) (grouping) (do)
(... do grouping of companies)
A partition marked "NEED" indicates that KNP missed inserting the partition mark, and a partition marked "WRONG" indicates that KNP inserted the partition mark incorrectly. In the test set of Experiment 1, the F-measure of KNP 2.0b6 was 99.66%. The F-measure increases to 99.83% under the assumption that the answer is correct whenever KNP 2.0b6 or Method 2 is correct. Although the accuracy rate of KNP 2.0b6 was high, there were some cases in which KNP partitioned incorrectly and Method 2 partitioned correctly. A combination of Method 2 with KNP 2.0b6 may be able to improve the F-measure.
The only previous research resolving bunsetsu identification by machine-learning methods is the work by Zhang (Zhang and Ozeki, 1998). The decision-tree method was used in that work. However, that work used only a small amount of information for bunsetsu identification[6] and did not achieve high accuracy rates. (The recall rate was 97.6% (= 2502/(2502+62)), the precision rate was 92.4% (= 2502/(2502+205)), and the F-measure was 94.2%.)
5 Conclusion
To solve the problem of accurate bunsetsu identification, we carried out experiments comparing four existing machine-learning methods (the decision-tree method, the maximum-entropy method, the example-based method, and the decision-list method). We obtained the following order of accuracy in bunsetsu identification:

Example-Based > Decision List > Maximum Entropy > Decision Tree

We also described a new method that uses category-exclusive rules with the highest similarity. This method performed better than the other learning methods in our experiments.
[6] This work used only the POS information of the two morphemes.

References

Adam L. Berger, Stephen A. Della Pietra, and Vincent J. Della Pietra. 1996. A Maximum Entropy Approach to Natural Language Processing. Computational Linguistics, 22(1):39-71.
Andrew Borthwick, John Sterling, Eugene Agichtein, and Ralph Grishman. 1998. Exploiting Diverse Knowledge Sources via Maximum Entropy in Named Entity Recognition. In Proceedings of the Sixth Workshop on Very Large Corpora, pages 152-160.
Masayuki Kameda. 1995. Simple Japanese analysis tool qjp. The Association for Natural Language Processing, the 1st National Convention, pages 349-352. (in Japanese).
Sadao Kurohashi and Makoto Nagao. 1997. Kyoto University text corpus project. pages 115-118. (in Japanese).
Sadao Kurohashi and Makoto Nagao. 1998. Japanese Morphological Analysis System JUMAN version 3.5. Department of Informatics, Kyoto University. (in Japanese).
Sadao Kurohashi. 1997. Japanese Dependency/Case Structure Analyzer KNP version 2.0b4. Department of Informatics, Kyoto University. (in Japanese).
Sadao Kurohashi. 1998. Japanese Dependency/Case Structure Analyzer KNP version 2.0b6. Department of Informatics, Kyoto University. (in Japanese).
Makoto Nagao. 1984. A Framework of a Mechanical Translation between Japanese and English by Analogy Principle. Artificial and Human Intelligence, pages 173-180.
Shigeyuki Nishiokayama, Takehito Utsuro, and Yuji Matsumoto. 1998. Extracting preference of dependency between Japanese subordinate clauses from corpus. IEICE-WGNLC 98-11, pages 31-38. (in Japanese).
NLRI (National Language Research Institute). 1964. Word List by Semantic Principles. Syuei Syuppan. (in Japanese).
J. R. Quinlan. 1995. C4.5: Programs for Machine Learning. Morgan Kaufmann.
Lance A. Ramshaw and Mitchell P. Marcus. 1995. Text chunking using transformation-based learning. In Proceedings of the Third Workshop on Very Large Corpora, pages 82-94.
Adwait Ratnaparkhi. 1996. A Maximum Entropy Model for Part-of-Speech Tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 133-142.
Adwait Ratnaparkhi. 1997. A Linear Observed Time Statistical Parser Based on Maximum Entropy Models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Eric Sven Ristad. 1998. Maximum Entropy Modeling Toolkit, Release 1.6 beta. http://www.mnemonic.com/software/memt.
Ronald L. Rivest. 1987. Learning Decision Lists. Machine Learning, 2:229-246.
Erik F. Tjong Kim Sang and Jorn Veenstra. 1999. Representing text chunks. In EACL'99.
Kiyotaka Uchimoto, Satoshi Sekine, and Hitoshi Isahara. 1999. Japanese dependency structure analysis based on maximum entropy models. In Proceedings of the Ninth Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 196-203.
David Yarowsky. 1994. Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French. In 32nd Annual Meeting of the Association for Computational Linguistics, pages 88-95.
Yujie Zhang and Kazuhiko Ozeki. 1998. The application of classification trees to bunsetsu segmentation of Japanese sentences. Journal of Natural Language Processing.