• No results found

An Intelligent Multilingual Information Browsing and Retrieval System Using Information Extraction

N/A
N/A
Protected

Academic year: 2020

Share "An Intelligent Multilingual Information Browsing and Retrieval System Using Information Extraction"

Copied!
8
0
0

Loading.... (view fulltext now)

Full text

(1)

An Intelligent Multilingual Information Browsing and Retrieval

System Using Information Extraction

C h i n a t s u A o n e a n d N i c h o l a s C h a r o c o p o s a n d J a m e s G o r l i n s k y S y s t e m s R e s e a r c h a n d A p p l i c a t i o n s C o r p o r a t i o n ( S R A )

4300 Fair L a k e s C o u r t F a i r f a x , VA 22033

a o n e c @ s r a . c o m

A b s t r a c t

In this paper, we describe our multilingual (or cross-linguistic) information browsing and retrieval system, which is aimed at monolingual users who are interested in in- formation from multiple language sources.

The system takes advantage of information

extraction (IE) technology in novel ways to improve the accuracy o f cross-linguistic retrieval and to provide innovative meth- ods for browsing and exploring multilin- gual document collections. The system in- dexes texts in different languages (e.g., En- glish and Japanese) and allows the users to retrieve relevant texts in their native lan- guage (e.g., English). The retrieved text is then presented to the users with proper names and specialized domain terms trans- lated and hyperlinked. Moreover, the sys- tem allows interactive information discov- ery from a multilingual document collec- tion.

1 I n t r o d u c t i o n

More and more multilingual information is available on-line every day. The World Wide Web (WWW), for example, is becoming a vast depository of mul- tilingual information. However, monolingual users can currently access information only in their na- tive language. For example, it is not easy for a monolingual English speaker to locate necessary in- formation written in Japanese. The users would not know the query terms in Japanese even if the search engine accepts Japanese queries. In addition, even when the users locate a possibly relevant text in Japanese, they will have little idea about what is in the text. Outputs of off-the-shelf machine trans- lation (MT) systems are often of low-quality, and even "high-end" MT systems have problems partic- ularly in translating proper names and specialized domain terms, which often contain the most critical information to the users.

In this paper, we describe our multilingual (or cross-linguistic) information browsing and retrieval system, which is aimed at monolingual users who are interested in information from multiple language

sources. The system takes advantage of information

extraction (IE) technology in novel ways to improve the accuracy of cross-linguistic retrieval and to pro- vide innovative methods for browsing and exploring multilingual document collections. The system in- dexes texts in different languages (e.g., English and Japanese) and allows the users to retrieve relevant texts in their native language (e.g., English). The retrieved text is then presented to the users with proper names and specialized domain terms trans- lated and hyperlinked. The system also allows the

user in their native language to browse and discover

information buried in the database derived from the entire document collection.

2 S y s t e m D e s c r i p t i o n

The system consists of the Indexing Module, the Client Module, the Term Translation Module, and the Web Crawler. The Indexing Module creates and loads indices into a database while the Client Module allows browsing and retrieval of information in the database through a Web browser-based graphical user interface (GUI). The Term Translation Mod- ule is bi-directional; it dynamically translates user queries into target foreign languages and the indexed terms in retrieved documents into the user's native language. The Web Crawler can be used to add tex- tual information from the WWW; it fetches pages from user-specified Web sites at specified intervals, and queues them up for the Indexing Module to in- gest regularly.

For our current application, the system indexes names of people, entities, and locations, and scien- tific and technical (S~zT) terms in both English and Japanese texts, and allows the user to query and browse the database in English. When Japanese texts are retrieved, indexed terms are translated into English.

(2)

guages besides English and J a p a n e s e and other do- mains beyond S & T terms. Moreover, the English- centric browsing and retrieval m o d e can be switched according to the users' language preference so that, for example, a J a p a n e s e user can query and browse English documents in Japanese.

2.1 T h e I n t e l l i g e n t I n d e x i n g M o d u l e

T h e Indexing Module indexes names of people, enti- ties, and locations and a list of scientific and techni- cal (S~zT) terms using state-of-the-art IE technol-

ogy. It uses different configurations of the same

fast indexing engine called N a m e T a g T M for differ-

ent languages. T w o separate configurations ("index- ing servers") are used for English and Japanese, and how the English and J a p a n e s e indexing servers work is described in (Krupka, 1995; Aone, 1996).

In the Sixth Message Understanding Conference (MUC-6), the English system was benchmarked against the Wall Street Journal blind test set for the name tagging task, and achieved a 96% F- measure, which is a combination of recall and preci- sion measures (Adv, 1995),. Our internal testing of the Japanese system against blind test sets of various Japanese newspaper articles indicates t h a t it achieves from high-80 to low-90% accuracy, de- pending on the types of corpora. Indexing names in Japanese texts is usually more challenging t h a n English for two main reasons. First, there is no case distinction in Japanese, whereas English names in newspapers are capitalized, and capitalization is a

very strong clue for English n a m e tagging. Sec-

ond, Japanese words are not separated by spaces and therefore must be segmented into separate words be- fore the n a m e tagging process. As segmentation is not 100% accurate, segmentation errors can some- times cause name tagging rules not to fire or to mis- fire.

Indexing of names is particularly useful in the Japanese case as it can improve overall segmenta- tion and thus indexing accuracy. In English, since words are separated by spaces, there is no issue of in- dexing accuracy for individual words. On the other hand, in languages like Japanese, where word bound- aries are not explicitly marked by spaces, indexing accuracy of individual words depends on accuracy of word segmentation. However, most segmentation algorithms are more likely to make errors on names, as these are less likely to be in the lexicons. Name tagging can reduce such errors by identifying names as single units.

Both indexing servers are "intelligent" because

they

identify

and

disambiguate

names with high

speed and accuracy. T h e y identify names in texts dynamically rather t h a n relying on finite lists of names. Thus, they can identify names which they have never seen before. In addition, they can dis- a m b i g u a t e types of names so t h a t a person n a m e d "Washington" is distinguished from a place called

Washington, and a c o m p a n y "Apple" can be dis- tinguished from a c o m m o n noun "apple." In addi-

tion, they can

generate

aliases of names a u t o m a t -

ically (e.g., "ANA" for "All Nippon Airline") and

link

variants of names within a document.

As the indexing servers process texts, the in- dexed terms are stored in a relational d a t a b a s e with their semantic type information (person, entity, place, S&:T term) and alias information along with such m e t a d a t a as source, date, language, and fre- quency information. T h e system can use any O D B C (Open D a t a B a s e Connectivity)-compliant database, and form-based Boolean queries from the Client Module, similar to those seen in any Web search engine, are translated into s t a n d a r d SQL queries automatically. We have decided to use commercial databases for our applications as we are not only in- dexing strings of terms but also adding much richer information on indexed terms available through the use of IE technology. Furthermore, we plan to apply d a t a - m i n i n g algorithms to the resulting databases to conduct advanced d a t a analysis and knowledge discovery.

2.2 T h e C l i e n t M o d u l e

T h e Client Module lets the user both retrieve and browse information in the d a t a b a s e through the Web browser-based GUI. In the query m o d e (cf. Fig- ure 1), a form-based Boolean query issued by a user is a u t o m a t i c a l l y translated into an SQL query, and the English terms in the query are sent to the T e r m Translation Module. T h e Client Module then re- trieves documents which m a t c h either the original English query or the translated J a p a n e s e query. As the indices are names and terms which m a y con- sist of multiple words (e.g, "Bill Clinton," "personal computer"), the query terms are delimited in sep- arate boxes in the form, making sure no ambiguity occurs in b o t h translation and retrieval. T h e user has the choice of selecting the sources (e.g, Washing- ton Post, Nikkei Newspaper, Web pages), languages (e.g., English, Japanese, or both), and specific date ranges of documents to constrain queries.

(3)

iD~S.r.

..:'--:'."i" . : . . . : . . . . : ' " - = -

W

~ 4 : ~ ... ~;;~; ... ~ ~:

= : ~ . : . . . .

... x ~ ... re l :

. . . - . . . . , .. . . . .

to

c ] ~ Form °ti sub~t seart.~

[image:3.612.169.481.182.602.2]
(4)

( 7 j g 3 1 ~ ] 1 { i : 1 9 ) . ' . . ~ • . " / . - . . . " . "

: .fi : / i

.. .... . - " . 7 ' -

,y~:!> k : 7 7 . . .

. . . w . . . . . . . w . : . v . w . v . v - - - - - . w . . . v . w : . . : : . w : : . w : : . = = . . = w . = = = = w : : : . v : , : . . . . : v . - . v . v . v v . . . v . . . v . . : . v = v . v . v : v : : . = . v : . v . . = ,

Figure 2: Translated and Hyperlinked Terms

perlinking is based on the original or translated

En-

glish

terms, the user can follow the links to b o t h En-

glish and J a p a n e s e documents transparently. In ad- dition, the Client Module is integrated with a com- mercial M T s y s t e m for rough translation. A docu- ment which the user is browsing can be translated on the fly by clicking the T R A N S L A T E button.

2.3 T h e T e r m T r a n s l a t i o n M o d u l e

T h e T e r m Translation Module is used by the Client Module bi-directionally in two different modes. It translates English query terms into Japanese in the query m o d e and translates Japanese indexed terms into English for viewing of a retrieved Japanese text in the browsing mode.

This translation module is sensitive to the seman- tic types of terms it is translating to resolve trans- lation ambiguity. Thus, if a t e r m can be translated in one way for one type and in another way for an- other type, the T e r m Translation Module can o u t p u t a p p r o p r i a t e translations based on the type informa- tion. For example, in translating J a p a n e s e text into

English, a single

kanji

(Chinese) character standing

for England can be also a first name of a Japanese personal name, which should be translated to "Hide" and not "England." In translating an English query into Japanese, a c o m p a n y "Apple" should be trans-

lated into a transliteration in

katakana

and not into

a Japanese word meaning a fruit apple.

T h e T e r m Translation Module uses various re-

sources and m e t h o d s to translate English and J a p a n e s e names. We use a u t o m a t e d m e t h o d s as much as possible to reduce the cost of creating a large n a m e lexicon manually.

First, this module is unique in t h a t it creates on

the fly English translations of

hiragana

names and

personal names. H i r a g a n a names are transliterated into English using the h i r a g a n a - t o - r o m a j i m a p p i n g rules. Japanese personal names are translated by finding a combination of first and last names which spans the i n p u t ) Then, each of the n a m e parts is translated using the Japanese-English first and last name lexicons.

In addition, in order to develop a large lexicon of English names and their J a p a n e s e translations,

which are transliterated into

katakana,

we have au-

tomatically generated k a t a k a n a names from pho-

netic transcriptions of English names. We have

written rules which m a p s phonetic transcriptions to k a t a k a n a letters, and generated possible Japanese k a t a k a n a translations for given English names. As transliterations of the same English names m a y dif- fer, multiple k a t a k a n a translations m a y be generated for single English n a m e s 3

T h e remaining terms are currently translated us- ing the English-Japanese translation lexicons, and we are expanding the lexicons by utilizing on-line resources and corpora and a translation aiding tool.

3

Utilizing IE in Multilingual

I n f o r m a t i o n A c c e s s

T h e system applies

information extraction

technol-

ogy (Adv, 1995) to index names accurately and ro- bustly. In this section, we describe how we have in- corporated this technology to improve multilingual information access in several innovative ways.

3.1 Q u e r y D i s a m b i g u a t i o n

As described in Section 2.1, the Indexing Module not only identifies names of people, entities and locations but also disambiguates types a m o n g themselves and between names and non-names. Thus, if the user is searching for documents with the location "Wash- ington (not a person or a c o m p a n y n a m e d "Wash- ington"), a person "Clinton" (not a location), or an entity "Apple" (not fruit), the s y s t e m allows the user to specify, through the GUI, the type of each query t e r m (cf. Figure 1). This ability to disambiguate types of queries not only constrains the search and hence improves retrieval precision but also speeds

1The Japanese Indexing Module does not specify if an identified name is a first name, a last name, or a combination of first and last name. Since there is no space between first and last names in Japanese, this must be automatically determined.

[image:4.612.94.299.73.351.2]
(5)

up the search time considerably especially when the database is very large.

3.2 T r a n s l a t i o n D i s a m b i g u a t i o n

In developing the system, we have intentionally avoided an approach where we first translate foreign- language documents into English and index the translated English texts (Fluhr, 1995; Kay, 1995; Oard and Dorr, 1996). In (Aone et al., 1994), we have shown t h a t , in an application of extracting in- f o r m a t i o n f r o m foreign language texts and present- ing the results in English, the " M T first, IE second" approach was less accurate t h a n the approach in the reverse order, i.e., "IE first, M T second". In partic- ular, translation quality of names by even the best M T systems is poor.

There are two cases where an M T system fails to translate names. First, it fails to recognize where a n a m e starts and ends in a text string. This is a non-trivial p r o b l e m in languages such as Japanese where words are not segmented by spaces and there is no capitalization convention. Often, an M T sys- t e m "chops up" names into words and translates

each word individually. For example, a m o n g the

errors we have encountered, an M T system failed to recognize a person n a m e "Mori Hanae" in kanji characters, segmented it into three words "mori," "hana," and "e" and translated t h e m into "forest," "England" and "blessing," respectively.

Another c o m m o n M T system error is where the s y s t e m fails to m a k e a distinction between names and non-names. This distinction is very i m p o r t a n t

in

getting correct translations as names are usu-

ally translated very differently f r o m non-names. For example, a personal n a m e "Dole" in k a t a k a n a was translated into a c o m m o n noun "doll" as the two have the same k a t a k a n a string in Japanese. Abbre- viated country names for J a p a n and United States in single kanji characters, which often occurs in news- papers, were sometimes translated by an M T s y s t e m into their literal kanji meanings, "day" and "rice," respectively.

Our s y s t e m avoids these c o m m o n but serious translation errors by taking advantage of the Index- ing Module's ability to identify and disambiguate names. In translating terms f r o m J a p a n e s e to En- glish in the browsing mode, the Indexing Module identifies names correctly, avoiding the first type of translation errors. Then, the T e r m Translation Module utilizes type information obtained by the In- dexing Module to decide which translation strategies to use, thus overcoming the second type of error.

3.3

Intelligent Query Expansion and

Hyperlinking

As described in Section 2.1, the Indexing Module a u t o m a t i c a l l y identifies aliases of names and keeps track of such alias links in the database. For exam- ple, if "International Business Machine" and "IBM"

appears in the same document, the system records in the database t h a t they are aliases.

T h e system uses this information in a u t o m a t i c a l l y expanding terms for query expansion and hyper- linking. At the query time, when the user types

"IBM" and chooses the alias option in the search

screen (see Figure 1), the query is a u t o m a t i c a l l y ex- panded to include its variant names b o t h in English and Japanese, e.g., "International Business Ma- chine," "International Business Machine Corp." and J a p a n e s e translations for "IBM" and their aliases

in Japanese. This is especially useful in retriev-

ing J a p a n e s e documents because typically the user would not know various ways to say "IBM" in

Japanese. T h e a u t o m a t e d query expansion thus

improves retrieval recall without m a n u a l l y creating alias lexicons.

T h e same alias capability is also used in hyper- linking indexed t e r m s in browsing a document. For example, when a user follows a hyperlink "United States," it takes the user to a collection of documents which contains the English t e r m "United States" and its aliases (e.g., "US," "U.S.A." etc.), and the J a p a n e s e translations of "United States" and their aliases. T h e result is a truly t r a n s p a r e n t multilin- gual document browsing and access capability.

3.4 I n f o r m a t i o n D i s c o v e r y

One of the biggest advantages of introducing IE tech- nology into information access systems is the ability to create rich structured d a t a which can be analyzed for "buried" information. Our multilingual capabil- ity enables the merging of possibly c o m p l e m e n t a r y d a t a from b o t h English and J a p a n e s e sources and enriching the available information.

Currently t h e s y s t e m offers the user several ways to explore and discover hidden information. Our search capability allows interactive information dis- covery methods. For example, using the query inter- face, the user can in effect ask "Which c o m p a n y was mentioned along with Intel in regard to micropro- cessors?" and the s y s t e m will return all the articles which mentions "Intel," "microprocessors," and one or more c o m p a n y names. T h e user might see t h a t NexGen and Cyrix often occurs with Intel and find out t h a t they are c o m p e t i t o r s of Intel in this field. Or the user might ask " W h o is related to "Shinshin- tou Party," a J a p a n e s e political party, and the user can find out all the people associated with this party. This type of search capabilities cannot be offered by typical information retrieval systems as they treat words as just strings and do not distinguish their semantic attributes.

(6)

: u ~ * P r o ~.=.= ~ : ~ ..,a~:t~P~t~,

.... : : : " . ~ . . s . ~ t ~ . : ~ : ..: ..

: . + l ~ > ~ m ~ 6 ~ o • .:: ~ ; ~ , . : e : : :

{: :~: : . : ~ o : : ... :... : u ~ :::. s : : : . . . : . :

:..*e~l+To~: .::. :: : ;:: : U ~ o ~ : :' : ..:"::~ :.::.:. :: ::.i

• ~ " ; " ~ ; ~ ° m " : ~ ~ ':: :~0~: !:.~ : : :-.i+

:' ]

+

: . . . . .. , " . : : : ~ + ~ + m ::. ... : +

Figure 3: Person Names Co-occurring with Peru

:IEI:]NI : +

[To~z5] T=p25] : I T o p 2 : ] [To.~25J [ T o ~ ' ~ ]

. . . : ~ . t T m ~ ] : T o e f ~ } [~'~p ~ ] [To_~.~] . [ T m . 5 ~ ]

Figure 4: The 25 and 50 Most Frequent Names

"Toshiba" mentioned in this document, the user can establish an immediate connection and follow the

link from "Toshiba" to other English and Japanese

documents which contain that term.

In addition, for each indexed term, the user can explore co-occurring persons, entities, places and technology. For example, Figure 3 shows a list of people co-occurring with the place "Peru." It lists the Japanese prime minister and the Peruvian pres- ident at the top (as the Japanese embassy hostage incident occurred recently.)

4 T h e S y s t e m T o u r

In this section, we give a tour of the system. Figure 4 shows the main Browse screen where the user can browse the top 25 or 50 names of people, entities, locations, and S&:T terms. This can provide the user with a snapshot of what is in the database and what types of information are likely to be available. By following the top 50 entity name link, the user sees the list of entity names in order of frequency

(cf. Figure 5). T h e Subtype column in the screen

indicates more detailed types of the entity (e.g., or- ganization, company, facility, etc.) From this screen, the user can go to a list of all English and Japanese documents which mention, for example, "Bank of Japan" by clicking the link (cf. Figure 6). The list provides information on the title, length, source, lan- guage, and date of each article.

: : ; . : e ~ w o ~ : ~ , , . " ~ . ~ = , , ~ ::; i i ~ o ~ , ~ . ::....11+ : : / . : i ; : ; i i . : . ~ , ~ i m ~ : ~ m N , , ~ ^ ~ i ~ i ~ , i:. i

... i ~ * ~ m m m ~ p : A ~ : . ' i ..~=~,.~,, ~. 14. : : :.: • : i : . ? " ~ : : : : L T. • :: - :

:: : ' .e~A++r~:oeJ+~^~ :.:,: : a , ; ~ = . . : i{. ~. :

[image:6.612.95.301.72.351.2] [image:6.612.339.548.118.259.2] [image:6.612.338.544.376.656.2]
(7)

! ~ ' ] Browse Results

fu~ow~ IS dtocumm~ ~ ~'~MJqK O1~ JAPAN".

Lagth Som~ [~ -*"2"--2"- na~

t - 6 ! ~ ) . . .

• Bp.'C~',~COn~de~,~oe~ndcp~d, en~ 2001 .Till " ent~b~., :t996-~-32

-. . • .

" . . . . • . . Next5 .:. : ..

:.: ~ ] ' ~ : .::,- .. . .

. . . . " : . T ~ ::-- : - : " :::

Figure 6: Documents Containing "Bank of J a p a n "

• " " !

< Weshington 30 flay ~. Hyama degance hhtoly > ~ ~ e ~7 F4~pt p r ~ i d ~ t

who is in the mid~ of vifiting:the United States on the 30th. cony•wed with t

American I~aident Clinton at Whitelumse, Conferred c0ncermng Middle

Eutetn peace negotiation ~md the ~'Torist meamre which are aagnant with

strong sroup neva item -~ ~' ~ adminiRr atio n start of h reel.

I n j o i n t press ~ n f e n m c e after converting Is for s. rose ~ president," in order

f o r our ~dOrt to mcceed, A m Q i c a n rOleis indisl~ns~ble, " t h a t doing, in order

to Pull back hrael m ~ proton, it made that it um~ht the Ix~ifive

mediation o f t h e United States clear.

Vis-a-vis thh~ az forFre~ident o i n t o ~ ; ,~ at .for ut, at you ¢gpreu; that it

: agreed,b~_ thefact t h t t the prenmt Middle Easter n peace p r v c m which expands Pale~inian p r ~ I o n a l autonomy b firmlymaintained " i n the ::filmre,it'includes thettart of Syl.i.an. L ~ a n o n and Isr~e~ which ereleft

.negotiatlon; ~ d ~ ~ u r . ~ b l e t h h ~ entirely" that dete/'minatio n w u Shown,

• is, lint" it d ~ : n o t e~ape t s o m e di~conthm ance and ~tagnadon," that

attendant up0a hraeli ~ n i s t r a t i o n allefntlion it did.did not show the

~encre~ ~ t~.zituaUon brea~ . . .

. . . : . .~

Ononelhand bethlead¢~ did opinion ~¢changeconcerning te~oHs t " .

,precenU.'~n t ~ ; :u, f0r :~ ra~e-:~' P ~ d e ~ C * u : far Problem 0f tefrorit~n,. :~tarted from M i ( l f i l ¢ . E a s ~ region " t h a t d o i i ~ , if.indm~e peace of the • Middle EMt a~cm alizes~ to conclude it. probiblY:iS p0~ible.95 % ~ te~rorin

] a c t i v k y o f ~ e ~ d d ' ~ t h a t y c u ~ l ~ e i ~ d . . . . : : : . . . .

:::(~y~t~10i:~): : :: " -~ ' . ' . :

• . . . , . - . . . . : . . / . . . . .

Figure 7: Translation by a C o m m e r c i a l M T s y s t e m

In the m a i n Search screen (cf. Figure 1), the user types in each query term, including multi-words like "personal computer," in each numbered box. T h e user can formulate a Boolean query using the box numbers and boolean operators. If not specified, the query t e r m s are joined by " O R " . When the A l i a s b u t t o n is on, query terms are expanded to include their aliases. T h e T y p e menu allows the user to dis- a m b i g u a t e types of query terms. In the L a n g u a g e box, the user has the choice of selecting documents in English, Japanese, or both. In addition, the user can constrain sources and the date range of docu- ments, and also sort the results by date, title, and

s o u r c e s .

As discussed in Section 2.2, when the user selects a J a p a n e s e article, they can optionally send the article to a commercial M T s y s t e m for rough translation by pushing the TRANSLATE b u t t o n (cf. Figure2). Fig- ure 7 shows the translation result for the J a p a n e s e d o c u m e n t in Figure 2.

5 S u m m a r y

We have described an advanced multilingual cross- linguistic information browsing and retrieval sys- t e m which takes a d v a n t a g e of information extraction technology in unique ways. In addition to its basic capability of allowing a user to send Boolean queries in English against English and J a p a n e s e documents and to view the results in semi- and fully translated

forms, the s y s t e m has m a n y innovative capabilities. It can disambiguate query terms to increase preci- sion, expand query terms a u t o m a t i c a l l y using aliases to increase recall, and improve translation accuracy significantly by finding and d i s a m b i g u a t i n g names accurately. Moreover, the system allows interactive information discovery f r o m a multilingual d o c u m e n t collection by combining IE and M T technologies.

T h e Indexing Module is currently running on a Sun p l a t f o r m and is designed to scale for a multi-user operational environment. T h e Web browser-based user interface will work in any Web browser sup- porting H T M L 3.0 on any p l a t f o r m which the Web browser supports, and this ensures a large user base. T h e system is customizable in several ways. For our current application, the system indexes names and S & T terms, but for other applications we can cus- tomize the s y s t e m to index different types of names

and terms. For example, the s y s t e m can be cus-

[image:7.612.343.551.79.362.2] [image:7.612.100.308.80.361.2]
(8)

R e f e r e n c e s

Advanced Research Projects Agency. 1995. Proceed-

ings of Sixth Message Understanding Conference (MUC-6). Morgan Kaufmann Publishers.

Aone, Chinatsu. 1996. NameTag Japanese and

Spanish Systems as Used for MET. In Proceedings

of Tipster Phase II. Morgan Kaufmann Publish- ers.

Aone, Chinatsu, Hatte Blejer, Mary Ellen

Okurowski, and Carol Van Ess-Dykema. 1994. A Hybrid Approach to Multilingual Text Processing: Information Extraction and Machine Translation. In Proceedings of the First Conference of the As- sociation for Machine Translation in the Americas (AMTA).

Fluhr, Christian. 1995. Multilingual information re- trieval. In Ronald A. Cole, Joseph Mariani, Hans Uszkoreit, Annie Zaenen, and Victor Zue, editors,

Survey of the State of the Art in Human Language Technology. Oregon Graduate Institute.

Kay, Martin. 1995. Machine translation: The dis- appointing past and present. In Ronald A. Cole, Joseph Mariani, Hans Uszkoreit, Annie Zaenen,

and Victor Zue, editors, Survey of the State of

the Art in Human Language Technology. Oregon Graduate Institute.

Krupka, George. 1995. SRA: Description of the

SRA System as Used for MUC-6. In Proceed-

ings of Sixth Message Understanding Conference

(MUG-6).

Oard, Douglas W. and Bonnie J. Dorr, editors. 1996.

Figure

Figure 1: The Search Screen
Figure 2: Translated and Hyperlinked Terms
Figure 4: The 25 and 50 Most Frequent Names
Figure 6: Documents Containing "Bank of Japan"

References

Related documents

Resiliency Fund Priorities  Health Equity  

According to Fonseca (1998), supported by the propositions of the Public Relations Society of America, the functions that delimit the work of a Public Relations professional are

To obtain adequate osmolarity and pH values for the intravenous administration, a suitable amount of glycerol (2.28% w / v for osmolarity adjustment to 290 ±

In an attempt to quantify plant stress, the activity of cotton foliar antioxidant enzymes after aphid feeding was examined.. Cotton aphids were col- lected from cotton fields

First, traditional approaches used by practition- ers, like complete case analysis (CC), linear interpola- tion (LI), nearest neighbor algorithm (NN) and uncon- ditional mean

According to the Bank’s Articles of Association the Supervisory Board appoints and dismisses the Chairman, Deputy Chairmen and the Members of Management Board.

create reusable knowledge chunks Organizational Knowledge Framework AFTER Project Needs viewed through knowledge COLLABORATIVE PROCESS.. Update

We analyze equilibrium problems arising from interacting markets and market partic- ipants, first competing markets with feedback and asymmetric information and then