
4.3 Learning Morphological Paradigms Using

4.3.4 Merging Paradigms

To capture more general paradigms, we merge paradigms. Candidate pairs are ranked by the ratio of the stems they share to the total number of stems each paradigm captures. More precisely, given paradigms P1, P2, let S be the

total number of common stems. Let S1 be the total number of stems in P1 that

{ed, ing}: reclaim, aggravat, hogg, trimm, expell, administer, divert, register, stimulat, shap, rehabilitat, exempt, stiffen, spar, deceiv, contaminat, disciplin, implement, stabiliz, feign, mistreat, extricat, mimick, alert, seal, etc.

{s, d}: implicate, ditche, amuse, overcharge, equate, despise, torpedoe, curse, plie, supersede, preclude, snare, tangle, eclipse, relinquishe, ambushe, reimburse, alienate, conceive, vetoe, waive, envie, negotiate, diagnose, etc.

{er, ing}: brows, wring, worship, cropp, cater, stroll, zipp, moneymak, tun, chok, hustl, angl, windsurf, swindl, cricket, painkill, climb, heckl, improvis, scream, scaveng, panhandl, lawmak, bark, clean, lifesav, beekeep, toast, matchmak, bodybuild, etc.

{e, ed}: subsid, liquidat, redecorat, exorcis, amputat, fertiliz, reshap, regulat, foreclos, infring, eradicat, reverberat, chim, centralis, restructur, crippl, rehabilitat, symbolis, reinstat, etc.

{ly, er}: dark, cheap, slow, quiet, fair, light, high, poor, rich, cool, quick, broad, deep, bright, calm, crisp, mild, clever, etc.

{0, s}: benchmark, instrument, pretzel, wheelchair, scapegoat, spike, infomercial, catastrophe, beard, paycheck, reserve, abduction, etc.

Table 4.4: Sample paradigms in English

are not present in P2. Then, we can define the expected paradigm accuracy of P1

with respect to P2 by:

$$\mathrm{Acc}_1 = \frac{S}{S + S_1} \tag{4.4}$$

Acc2 is defined analogously.

We use the average of Acc1 and Acc2 to compute the combined (averaged)

expected accuracy of the merged paradigms P1, P2:

$$\mathrm{Acc}(P_1, P_2) = \frac{\frac{S}{S + S_1} + \frac{S}{S + S_2}}{2} \tag{4.5}$$
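As a rough illustration of Equations 4.4 and 4.5, the following Python sketch computes the averaged expected accuracy of two stem sets. The function name `expected_accuracy` and the toy stems are hypothetical and are not taken from the system described here. Stems are treated as sets, with S1 (S2) counting the stems that occur only in P1 (P2).

```python
def expected_accuracy(stems1, stems2):
    """Averaged expected accuracy of two paradigms (Eq. 4.5).

    Hypothetical helper for illustration only; stems are treated as sets.
    """
    p1, p2 = set(stems1), set(stems2)
    s = len(p1 & p2)    # S: stems common to both paradigms
    s1 = len(p1 - p2)   # S1: stems that occur only in P1
    s2 = len(p2 - p1)   # S2: stems that occur only in P2
    if s == 0:
        # No shared stems: both ratios are zero (also avoids 0/0 for empty sets).
        return 0.0
    acc1 = s / (s + s1)  # Eq. 4.4
    acc2 = s / (s + s2)  # defined analogously for P2
    return (acc1 + acc2) / 2


# Toy example (made-up stems): S = 3, S1 = 1, S2 = 2.
toy_p1 = {"walk", "talk", "jump", "play"}
toy_p2 = {"walk", "talk", "jump", "find", "alter"}
toy_acc = expected_accuracy(toy_p1, toy_p2)
```

In this toy case Acc1 = 3/4 and Acc2 = 3/5, so the averaged value is 0.675.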

During each iteration, all paradigm pairs having an expected accuracy greater than a given threshold value are merged (see Figure 4.3). Once two paradigms are merged, stems that occur in only one of the paradigms inherit the morphemes from the other paradigm. This mechanism helps create a more general paradigm

{i, e}: zemin, faaliyetin, törenler, seçim, incelemeler, eyalet, nem, takvim, makineler, yöntemin, becerisin, görüşmeler, tekniğin, merkezin, iklim, görüntüler, etc.

{i, a}: cevab, bakımın, mektuplar, esnaf, olayın, akışın, miktar, kayd, yaşamay, bulgular, sular, masrafların, heyecanın, kalan, hakların, anlamın, etc.

{i, in}: sanayiin, değerlerin, eşin, denizler, duman, teminat, erkekler, kurulların, birbirin, vatandaşlarımız, gelişmesin, milletvekillerin, partisin, etc.

{de, e}: bölgesin, düzeyin, yönetimin, dergisin, sektörün, birimlerin, bölgelerin, tümün, bölümlerin, tesislerin, dönemin, kongresin, evin, etc.

{mesi, en}: izlen, yürütül, degiş, üretil, gerçekleştiril, desteklen, geliştiril, etc.

{0, i}: iman, çekim, mahkemelerin, örneklem, gaflet, yazman, sanat, trendler, mahalleler, eviniz, hamamlar, piller, öğretim, olimpiyat, etc.

Table 4.5: Sample paradigms in Turkish

{r, n}: kurze, ehemalige, eidgenoessische, professionelle, erste, bescheidene, ungewoehnliche, ethnische, unbekannte, besondere, nationalsozialistische, deutsche, etc.

{e, en}: praechtig, gesichert, dauerhaft, bescheiden, vereinbart, biologisch, natuerlich, oekumenisch, kantonal, unterirdisch, wissenschaftlich, nahegelegen, chinesisch, etc.

{t, en}: funktionier, konkurrier, schneid, mitwirk, ansteig, plaedier, pfeif, aufklaer, schluck, ausgleich, weitermach, abhol, ankomm, spazier, speis, aussteig, aufhoer, etc.

{er, ung}: versteiger, unterdrueck, erneuer, vermarkt, beschleunig, besetz, geschaeftsfuehr, wirtschaftsfoerder, finanzverwalt, verhandl, etc.

{0, s}: potential, instrument, flohmarkt, vorhang, pilotprojekt, idol, rechner, thriller, ensemble, bebauungsplan, empfinden, defekt, aufschwung, etc.

Table 4.6: Sample paradigms in German

and helps recover missing word forms. Thus, although some of the word forms do not exist in the corpus, it becomes possible to capture these forms.

P1:{ed, ing}{confirm, detain, affirm, allow, complement, reject, absorb,

protect}

P2:{s, 0}{betray, alter, affirm, reject, protect, confirm, absorb, find, allow,

confirm, detain}

Acc = 0.76,

P1+P2:{ed, ing, s, 0}{confirm, detain, affirm, allow, complement, reject, absorb, protect, betray, alter, find}

Figure 4.3: An illustration of paradigm merging. P1 and P2 are merged with an accuracy measure of Acc = 0.76.
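The merging step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the system's actual implementation: the names `expected_accuracy` and `merge_pass` are hypothetical, the threshold 0.7 is an arbitrary illustrative value, and letting each paradigm take part in at most one merge per pass is an assumption the text does not specify. Stems are stored as sets, so the repeated stem in P2 of Figure 4.3 is counted once and the computed score need not equal the value reported in the figure.

```python
from itertools import combinations


def expected_accuracy(stems1, stems2):
    """Averaged expected accuracy (Eq. 4.5); S + S1 equals len(stems1)."""
    common = len(stems1 & stems2)
    if common == 0:
        return 0.0
    return (common / len(stems1) + common / len(stems2)) / 2


def merge_pass(paradigms, threshold):
    """One merging iteration over (morphemes, stems) pairs of frozensets.

    Assumption: a paradigm is merged at most once per pass.
    """
    merged, used = [], set()
    for i, j in combinations(range(len(paradigms)), 2):
        if i in used or j in used:
            continue
        (m1, s1), (m2, s2) = paradigms[i], paradigms[j]
        if expected_accuracy(s1, s2) > threshold:
            # Union of stems: each stem inherits the other paradigm's morphemes.
            merged.append((m1 | m2, s1 | s2))
            used.update((i, j))
    # Paradigms that were not merged are carried over unchanged.
    merged.extend(p for k, p in enumerate(paradigms) if k not in used)
    return merged


# Stems taken from Figure 4.3 (duplicates collapse in a set).
p1 = (frozenset({"ed", "ing"}),
      frozenset({"confirm", "detain", "affirm", "allow", "complement",
                 "reject", "absorb", "protect"}))
p2 = (frozenset({"s", "0"}),
      frozenset({"betray", "alter", "affirm", "reject", "protect", "confirm",
                 "absorb", "find", "allow", "detain"}))

result = merge_pass([p1, p2], threshold=0.7)  # 0.7 is a hypothetical threshold
```

With these inputs the two paradigms are merged into a single paradigm whose morpheme set is {ed, ing, s, 0} and whose eleven stems match P1+P2 in Figure 4.3. A stem such as "find", seen only with {s, 0}, thereby also licenses the unseen forms with "ed" and "ing".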

{es, ing, e, ed}: sketch, chew, nipp, debut, met, factor, profit, occurr, err, trudg, participat, necessitat, stomp, streak, siphon, stroll, sprint, drizzl, firm, climax, gestur, whipp, roll, tripp, stemm, dangl, shuffl, kindl, broker, chalk, latch, rippl, collaborat, chok, summ, propp, pedal, paralyz, parad, plough, cramm, slack, wad, saddl, conjur, tipp, gallop, totall, catalogu, bundl, barg, whittl, retaliat, straighten, tick, peek, jabb, slimm.

{s, ing, ed, 0}: benchmark, mothball, weed, snicker, thread, queue, jack, paw, yacht, implement, import, bracket, whoop, conflict, spoof, stunt, bargain, honor, bird, fingerprint, excerpt, handcuff, veil, comment.

Table 4.7: Merged paradigms in English

Some examples of merged paradigms found by the system are given in Table 4.7, Table 4.8, and Table 4.9 for English, Turkish, and German, respectively.

4.4 Morphological Segmentation

Once the words of a corpus have been clustered into POS clusters, the steps described earlier are followed to capture paradigms. Given these paradigms, words are analysed using different algorithms for known, unknown, and compound

{u, a, e, i}: yapabileceklerin, kredisin, hizmetleri’n, sevdikleriniz, yeter’, transferlerin, sevkin, elimiz, tehlikelerin, sas, mucizey, tehditlerin, bakir, muhasebesin, gayrimenkuller, ecevit’, defterim, izlemelerin, tescilin, minarey, tahsilin, lastikler, yerlestirmey.

{i, lar, li, in}: ruhsat, semt, ikilem, reaksiyonlar, harc, tip, prim, gidilmis, kaldirmis, degistirmis, bulunmayacak, aktarmis, bulunacak, kapanacak, yazilabilecek, devredilmis, degisecek, gelmemis.

Table 4.8: Merged paradigms in Turkish

{er, 0, e, en}: kassiert, beguenstigt, eingeholt, genuegt, angelastet, beruehrt, beinhaltet, zurueckgegeben, beschleunigt, initiiert, abgestellt, bewirkt, mitgenommen, abgebrochen, beruhigt, besichtigt.

{te, ung, er, ten, t, en, lich, e}: fahr, gebrauch, blockier, identifizier, studier, entfalt, gestalt, agier, passier, sprech, berat, tausch, kauf, such, weck, beug, erreich, bearbeit, beobacht, erleid, ueberrasch, halt, helf, oeffn, pruef, uebertreff, bezahl, spring, fuell, toet.

{0, te, t, er}: lichtenberg, limburg, hill, trier, elmshorn, dreieich, praunheim, heusenstamm, heddernheim, hellersdorf, schmitt, muehlheim, lueneburg, kassel, schluechtern, preungesheim, rodgau, bieber, osnabrueck, rodheim, muenchen, london, lissabon, seoul, wedding, treptow.

Table 4.9: Merged paradigms in German

words: