
Information Access across Languages on the Web: From Search Engines to Digital Libraries

Jiangping Chen, Yu Bao

Department of Library and Information Sciences, University of North Texas, 1155 Union Circle #311068, Denton, Texas 76203-5017

{jpchen, yb0033}@unt.edu

Abstract:

Information access across languages challenges researchers and practitioners in many disciplines, especially machine translation (MT) and Cross-language Information Retrieval (CLIR). Google’s cross-language search is a model of integrating MT and CLIR technologies to help users find web pages that are not written in their familiar languages. This paper examines the functions of Google’s cross-language search. It reports a small-scale evaluation to investigate Google’s performance on translating keywords and sentences. Then the authors propose several strategies that digital libraries can apply for implementing multilingual information access through the discussion of the cases of five bilingual or multilingual digital libraries.

Keywords:

Digital Libraries, multilingual information access, Google, cross-language search

1. Introduction

Recent and continuing advances in online information systems are creating many opportunities and also new problems in information access and retrieval. Although English is the language of the Internet (Flammia & Saunders, 2007), online documents are available internationally in many different languages. This gives users the opportunity to directly access previously unavailable sources of information. Information access across languages is the dream of many Web users: information that can be understood and used no matter what language it is written in.

Researchers in various disciplines have been diligently working on exploring computing algorithms and systems. In the past ten years, research in these areas has been vigorously pursued through TREC (http://trec.nist.gov/), CLEF (http://www.clef-campaign.org/), the Asian Language Retrieval and Question-answering Workshop called NTCIR (http://research.nii.ac.jp/ntcir/), and other forums. Significant experimental results have been obtained in cross-language summarization workshops and cross-language named entity extraction challenges by the Association for Computational Linguistics (ACL) and the Geographic Information Retrieval track (GeoCLEF) of CLEF (Gey et al., 2006). Many related research projects have been funded by U.S. government agencies and the governments of other countries.

Exploration and application of related technologies in areas such as digital libraries are ongoing and becoming more and more popular (Larson, Gey, & Chen, 2002; Wang, Lu, & Chien, 2004; Monroy, Furuta, & Castro, 2007). However, there is little application of these technologies in existing information access systems such as commercial online services and digital libraries. In 2004, search engines began to provide various language supports. Zhang and Lin (2007) investigated the multiple language support features in 21 search engines. The selected search engines were categorized into regular search engines (such as Google, Yahoo, and MSN), meta-search engines (such as Excite, HotBot, and WebCrawler), and visualization search engines (Kartoo, Onlinelink, and Ujiko). Zhang and Lin summarized the characteristics and functions of these search engines in the following five aspects: the number of supported languages, visibility of language support, translation ability, result presentation, and interface design. Google was identified as the regular search engine with the best multiple language support (Zhang and Lin, 2007, p. 530).

On May 23, 2007, Google launched its “Translated Search” in its Google Language Tools (http://www.google.com/language_tools) in addition to other language support services and tools. Here we use the term Cross-Language Search instead of “Translated Search” in order to reflect its relationship with the field of Cross-Language Information Retrieval (CLIR). Greg Notess (2008) found Google was the only search engine providing cross-language search. He briefly described the procedures of this new service and considered it useful for monolingual searchers to explore information content in other languages.

The launch of the cross-language search by Google was a breakthrough event because it signified the transition from CLIR research to its real application. It was the first time that CLIR and machine translation (MT) were integrated to provide a real application on the Internet. In this paper, we examine the cross-language search function provided by Google Language Tools (we use the acronym GLT to represent Google Language Tools in the remainder of the paper), and then discuss possible strategies that can put existing technologies to practical use. The remainder of the paper is organized as follows. The next section overviews the current research and progress in MT and CLIR, which constitute the major challenges for cross-language search, and then reports GLT’s cross-language search service and its performance through a small-scale evaluation. After that, we describe five digital libraries that provide bilingual or multilingual information access to understand their general characteristics. The following section proposes strategies that digital libraries can apply in order to serve global users, based on the previous analysis of Google’s cross-language search and the status of multilingual information access in digital libraries. The paper concludes with suggestions for future study on research and development in multilingual information access on the Web.

Google’s Cross-Language Search

The General Process of Cross-Language Search

Cross-language search aims at facilitating information access across languages. It is built upon more than fifty years of research and development in Machine Translation (MT) and more than ten years of research in Cross-Language Information Retrieval (CLIR).

MT has long been a field within Artificial Intelligence. MT aims to automate the process of translation, which normally includes analyzing and understanding information in one language and expressing it in another language. Translation is difficult because the process involves interpretation of the meaning in the original language and its expression in the target language using correct terminology and syntax. Automatic translation by a computer system, i.e. machine translation (MT), is even more difficult since the computer has not achieved human-like understanding of languages. MT systems apply various translation strategies to automatically convert text or speech from one language to one or more other languages. Manning and Schutze (1999, p. 464) summarized four different levels of translation strategy for machine translation. The simplest approach is word-level translation, or word-for-word substitution, in which the system attempts to find a word in the target language for each word in the original language. Other methods include syntax-based, semantics-based, and knowledge-based translation, which also consider the structure and semantics of the translated text. The desired translation is the one that expresses the exact meaning of the source text with correct syntax.

Current MT research explores various statistical modeling based approaches for translation. MT systems build statistical models automatically “learned” from parallel corpora (texts with the same meaning but written in the two languages of interest) and use the models to translate one language to the other. Web users can find several online machine translation services such as SYSTRAN (http://www.systransoft.com/), Live Translation by Microsoft (http://www.windowslivetranslator.com/), and Google Language Tools. Google Language Tools will be discussed in this paper.
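As a rough illustration of how a translation model can be “learned” from a parallel corpus, the following sketch runs a few rounds of expectation-maximization over a word-translation table, in the style of the classic IBM Model 1. The three German-English sentence pairs and all function names are invented for this example; it is not a description of how SYSTRAN, Microsoft, or Google build their systems.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus (German -> English), invented for illustration.
PARALLEL_CORPUS = [
    ("das haus", "the house"),
    ("das buch", "the book"),
    ("ein buch", "a book"),
]


def train_word_translation(corpus, iterations=10):
    """Estimate t[(target_word, source_word)] with IBM Model 1 style EM."""
    pairs = [(src.split(), tgt.split()) for src, tgt in corpus]
    target_vocab = {word for _, tgt in pairs for word in tgt}

    # Start with a uniform score for every word pair that co-occurs.
    t = {}
    for src, tgt in pairs:
        for e in tgt:
            for f in src:
                t[(e, f)] = 1.0 / len(target_vocab)

    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence counts
        total = defaultdict(float)  # expected counts per source word
        # E-step: share each target word among the source words in its sentence.
        for src, tgt in pairs:
            for e in tgt:
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: re-estimate probabilities from the expected counts.
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return t


def best_translation(t, source_word):
    """Return the target word with the highest learned probability."""
    candidates = [e for (e, f) in t if f == source_word]
    return max(candidates, key=lambda e: t[(e, source_word)])


model = train_word_translation(PARALLEL_CORPUS)
for word in ("das", "haus", "buch", "ein"):
    print(word, "->", best_translation(model, word))
```

On this toy corpus the most probable pairings come out as das/the, haus/house, buch/book, and ein/a, even though no dictionary was supplied. Production systems use far larger corpora and much richer models.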

Cross-Language Information Retrieval (CLIR) is a subfield of traditional Information Retrieval (IR). It provides users with access to information that is in a different language from their queries (Chen, 2006). The basic strategy for information retrieval is to match documents to queries. When the queries and documents are not written in the same language, as in the case of CLIR, the match cannot be conducted directly, so a transformation on either side or both is necessary. Oard and Diekema (1999) identified three basic transformation approaches to CLIR: query translation, document translation, and interlingual techniques. Query translation based CLIR systems translate user queries into the language in which the documents are written. Document translation is the reverse of query translation: documents are translated into the query language. The interlingual approach translates both documents and queries to a third representation. Among the three approaches, query translation is the generally accepted approach and has been applied by most CLIR experimental systems because of its simplicity and effectiveness. Query translation based CLIR systems use various knowledge resources, such as bilingual dictionaries, MT systems, parallel texts, or a combination of these, to translate queries from one language into another, and then conduct a monolingual search to retrieve relevant documents. Most CLIR experimental systems emphasize finding the relevant documents from the collections, but do little with the translation of returned documents.
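The query translation approach can be sketched in a few lines: translate each query term with a bilingual dictionary, then run an ordinary monolingual match against the target-language collection. The dictionary entries, the three documents, and the term-overlap ranking below are all hypothetical and chosen only to keep the example small; real CLIR systems use much larger resources and proper retrieval models.

```python
# Hypothetical bilingual dictionary (English -> Spanish); a term may have
# several candidate translations.
BILINGUAL_DICT = {
    "digital": ["digital"],
    "library": ["biblioteca"],
    "children": ["niños", "infantil"],
}

# Hypothetical target-language collection: document id -> text.
DOCUMENTS = {
    "d1": "biblioteca digital infantil internacional",
    "d2": "historia de la frontera americana",
    "d3": "la biblioteca del congreso",
}


def translate_query(query, dictionary):
    """Replace each query term with all of its dictionary translations."""
    translated = []
    for term in query.lower().split():
        # Terms missing from the dictionary (e.g. proper nouns) are kept as-is.
        translated.extend(dictionary.get(term, [term]))
    return translated


def monolingual_search(terms, documents):
    """Rank documents by how many distinct translated terms they contain."""
    scores = {}
    for doc_id, text in documents.items():
        tokens = set(text.split())
        score = sum(1 for term in set(terms) if term in tokens)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


def cross_language_search(query, dictionary, documents):
    """Query translation followed by monolingual retrieval."""
    return monolingual_search(translate_query(query, dictionary), documents)


print(cross_language_search("digital library", BILINGUAL_DICT, DOCUMENTS))
```

Keeping every candidate translation is one simple way to cope with ambiguity. It trades precision for recall, which is why translation quality is often described as the bottleneck for CLIR.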

Google’s cross-language search integrates CLIR and MT to provide the full function of finding information in languages different from users’ queries. Figure 1 shows the screen shot of GLT homepage.

Figure 1. GLT’s Cross-Language Search Page

On the GLT homepage, as illustrated in Figure 1, a user can type a search query into the search textbox, specify the language of the query, specify the language of the result pages, and then click the button “Translate and Search”. GLT then conducts the cross-language search and presents the search results in both the query language and the intended language in two separate columns. For each result, GLT provides the title, a short summary, and the URL of the page, just like the results presented by Google Web search. For example, to find information resources on autism treatment in Chinese, we type “autism treatments” into the search box of GLT. We specify the query language as English, and search pages written in Simplified Chinese. GLT returns the translated pages (in English) and the original pages (in Simplified Chinese). Figure 2 shows the screen shot of the search result.

Figure 2: Screen Shot of A GLT Search Result Page

How does GLT do cross-language search? Google integrates Web search and machine translation to provide the cross-language search service. The system consists of the following components:

A. Search Interface: It allows a user to type in the search terms and to specify the language of these search terms and the language of the retrieved Web pages;

B. Query Translation: the system will translate the users’ queries into the languages of the Web pages so that matching between the queries and the pages can be conducted;

C. Web Search or Information Retrieval: the actual search for relevant pages based on a retrieval algorithm;

D. Machine Translation of Results: the retrieved Web pages are translated into the languages of the queries;

E. Result Interface: the translated pages are presented to the users. The system may also present the results in their original languages simultaneously.

The Search Interface and Result Interface are for interaction with users, while the other components, which are transparent to users, handle the most difficult issues of cross-language search: query translation, search, and machine translation of result pages.

As of Oct. 28 2008, GLT supports cross-language search for 35 languages: Arabic, Bulgarian, Catalan, Chinese (Simplified), Chinese (Traditional), Croatian, Czech, Danish, Dutch, English, Filipino, Finnish, French, German, Greek, Hebrew, Hindi, Indonesian, Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Ukrainian, and Vietnamese. This means one can use GLT to submit search terms in one of the above languages, to search pages written in the remaining 34 languages, and to read search results in the language he/she understands, i.e. the language of his/her search terms.
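The division of labour among the components can be pictured as a simple chain: query translation feeds the search step, and every hit is translated back before being shown next to its original. The sketch below is only a schematic of that flow. The function names, the tiny word tables, and the two “pages” are hypothetical stand-ins, and nothing here reflects Google’s actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ResultRow:
    original: str    # snippet in the language of the retrieved page
    translated: str  # the same snippet rendered in the query language


def cross_language_pipeline(
    query: str,
    translate_query: Callable[[str], str],   # query translation step
    search: Callable[[str], List[str]],      # monolingual search step
    translate_result: Callable[[str], str],  # machine translation of results
) -> List[ResultRow]:
    """Chain the three hidden steps; the caller plays the two interfaces."""
    target_query = translate_query(query)
    hits = search(target_query)
    return [ResultRow(original=h, translated=translate_result(h)) for h in hits]


# Hypothetical stand-ins for the translation and search components.
EN_TO_FR = {"library": "bibliothèque"}
FR_TO_EN = {"bibliothèque": "library", "numérique": "digital"}
PAGES = ["bibliothèque numérique", "musée national"]


def word_by_word(table):
    """Build a naive word-for-word translator from a lookup table."""
    return lambda text: " ".join(table.get(w, w) for w in text.split())


def toy_search(query):
    """Return pages sharing at least one word with the query."""
    return [p for p in PAGES if any(w in p.split() for w in query.split())]


rows = cross_language_pipeline(
    "library", word_by_word(EN_TO_FR), toy_search, word_by_word(FR_TO_EN)
)
for row in rows:
    # Two columns: translated result on the left, original on the right.
    print(f"{row.translated:<30}{row.original}")
```

The word-for-word stand-in renders the French snippet as “library digital”, a small reminder that translating result pages is harder than translating short queries.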

Google’s Translation Mechanism

Every language related service provided by GLT, including the cross-language search, involves machine translation. How does Google carry out the translation then? The Google Translate FAQ (http://www.google.com/intl/en/help/faq_translation.html) explains Google’s strategy for translation. To build its machine translation system, Google feeds “…the computer billions of words of text, both monolingual text in the target language, and aligned text consisting of examples of human translations between the languages.” It then applies “…statistical learning techniques to build a translation model.” (Google Translate FAQ, 2008). Google continues to work on translation quality and attempts to improve the performance of MT by understanding the context of words.

The Performance of Translation

What is the performance of GLT’s cross-language search? Can Web users rely on GLT to find information they previously could not access? These questions are important to whoever wants to use GLT.

GLT’s cross-language search consists of three major processes: query translation, search, and machine translation of result pages. Both query translation and machine translation of result pages are conducted by Google’s machine translation service. Literature indicates that machine translation is the bottleneck for CLIR; we therefore conducted a small-scale evaluation of GLT’s machine translation using 50 topics that have been evaluated at the NTCIR-5 Cross-Lingual Information Retrieval Task (http://research.nii.ac.jp/ntcir-ws5/cfp-en.html). These topics were originally presented in four languages: Traditional Chinese, Korean, Japanese, and English (http://research.nii.ac.jp/ntcir/workshop/OnlineProceedings5/cdrom/CLIR/ntc5-CLIR-eval.html).

Each topic has several attributes that describe the content, such as TITLE, DESC (descriptors), NARRATIVE, and CONC (concepts). Table 1 shows one of the topics in both Traditional Chinese and English in XML format. The TITLE of a topic consists of short phrases or words separated by the punctuation “,”, and the DESC of a topic is a short sentence describing the information that needs to be found for the topic.

In our evaluation experiment, we constructed two queries from each topic. One query consisted of the texts in the TITLE attribute and one query consisted of the texts in the DESC attribute. For example, we extracted two queries from the sample topic in Table 1:

Query 001-1: 時代華納,美國線上,合併案,後續影響

Query 001-2: 查詢時代華納與美國線上合併案的後續影響

In total, we generated 100 queries in Traditional Chinese from the 50 CLIR NTCIR-5 topics. We divided the 100 queries into two groups: the TITLE group and the DESC group. The TITLE group included the 50 queries from the TITLE of each topic, and the DESC group included the 50 queries from the DESC of each topic. We considered that queries in the TITLE group were easier than those in the DESC group. Queries in the TITLE group are composed of Chinese words or short phrases that do not need word segmentation. However, queries in the DESC group are short sentences that need word segmentation in most cases. Word segmentation is an important and complicated task for MT systems, and segmentation errors may greatly affect the accuracy of machine translation (Nie & Ren, 1999). In our study, we wanted to see whether there was any significant difference between the translation results of the two groups.

We then sent each query to GLT manually. We instructed GLT to search pages in English. The result pages returned by GLT, as shown in Figure 2, included the translation of the query and the summaries of pages that satisfied the query in both the query language (Traditional Chinese in this evaluation) and the intended language (English). We then compared the translation results of the queries with their English version in order to assess whether the translation was correct. The two authors served as evaluators in this experiment. The translation of a query was judged “Correct” if it was exactly the same as the English version of the topic, or was judged correct semantically and grammatically by the two authors. Otherwise the translation was judged “Other”, which included multiple situations such as false translation, incomprehensible translation, or partially correct translation.

Table 1. A Sample Topic Extracted from NTCIR-5 CLIR Task

Chinese Version

<TOPIC>
<NUM>001</NUM>
<SLANG>CH</SLANG>
<TLANG>CH</TLANG>
<TITLE>時代華納,美國線上,合併案,後續影響</TITLE>
<DESC>查詢時代華納與美國線上合併案的後續影響。</DESC>
<NARR>
<BACK>時代華納與美國線上於2000年1月10日宣佈合併,總市值估計為3500億美元,為當時美國最大宗合併案。</BACK>
<REL>評論時代華納與美國線上的合併對於網路與娛樂媒體事業產生的影響為相關。敘述時代華納與美國線上合併案的發展過程為部分相關。內容僅提及合併的金額與股權結構轉換則為不相關。</REL>
</NARR>
<CONC>時代華納,美國線上,李文,Gerald Levin,合併案,合併及採購,媒體業,娛樂事業</CONC>
</TOPIC>

English Version

<TOPIC>
<NUM>001</NUM>
<SLANG>CH</SLANG>
<TLANG>EN</TLANG>
<TITLE>Time Warner, American Online (AOL), Merger, Impact</TITLE>
<DESC>Find reports about the impact of AOL/Time Warner merger.</DESC>
<NARR>
<BACK>Time Warner and American Online (AOL) announced a merger on January 10th, 2000. The market value was estimated at $US350 billion making it the biggest merger in the US.</BACK>
<REL>Comments on AOL/Time Warner merger's effects on Internet and entertainment media businesses are relevant. Descriptions of the development of the AOL/Time Warner merger are partially relevant. Information about the total amount and the transformation of ownership structure are irrelevant.</REL>
</NARR>
<CONC>Time Warner, American Online, AOL, Gerald Levin, merger, M&A, Merger and Acquisition, media, entertainment business</CONC>
</TOPIC>
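The construction of the two queries per topic (one from TITLE, one from DESC) can be sketched with Python’s standard XML parser. The snippet uses an abbreviated copy of the Chinese topic from Table 1. The function name and the query-id scheme are assumptions modelled on the “Query 001-1” and “Query 001-2” labels, and removing the sentence-final full stop is inferred from the sample queries rather than stated by the authors.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the Chinese topic in Table 1 (NARR and CONC omitted).
TOPIC_XML = """
<TOPIC>
<NUM>001</NUM>
<SLANG>CH</SLANG>
<TLANG>CH</TLANG>
<TITLE>時代華納,美國線上,合併案,後續影響</TITLE>
<DESC>查詢時代華納與美國線上合併案的後續影響。</DESC>
</TOPIC>
"""


def build_queries(topic_xml):
    """Return {query_id: text} with one TITLE query and one DESC query."""
    topic = ET.fromstring(topic_xml.strip())
    num = topic.findtext("NUM").strip()
    title = topic.findtext("TITLE").strip()
    # Assumption: drop the trailing full stop, as in the sample Query 001-2.
    desc = topic.findtext("DESC").strip().rstrip("。")
    return {f"{num}-1": title, f"{num}-2": desc}


queries = build_queries(TOPIC_XML)
for query_id, text in queries.items():
    print(f"Query {query_id}: {text}")
```

Applied to all 50 topics, the same step would yield the 50 TITLE queries and 50 DESC queries used in the evaluation.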

We also submitted the 100 Chinese queries in the exact format to SYSTRAN (http://www.systransoft.com/), and conducted the same evaluation. Table 2 is the result of our evaluation of the 100 queries.

Table 2. Results of GLT’s Translation Evaluation

                 Queries in TITLE group      Queries in DESC group
                 Google       SYSTRAN        Google       SYSTRAN
Correct          38 (76%)     26 (52%)       9 (18%)      13 (26%)
Other            12 (24%)     24 (48%)       41 (82%)     37 (74%)
Total queries    50           50             50           50

Table 2 demonstrates that both Google and SYSTRAN did much better on the TITLE group than on the DESC group, and the difference in performance between the TITLE and DESC groups was significant for both systems. Also, Google could correctly translate more TITLE queries but fewer DESC queries than SYSTRAN. Assuming the queries were random samples of all queries submitted to these two systems, we did a Chi-square test on Google’s query translation performance and compared it with SYSTRAN. For TITLE queries, Google did significantly better (χ² = 11.54, p value < 0.001) than SYSTRAN. But it didn’t significantly differ from SYSTRAN on translating DESC queries (χ² = 1.66, p value > 0.1).

This small-scale evaluation on query translation has actually assessed two types of machine translation: machine translation of words or phrases, and machine translation of sentences. Queries in the TITLE group are more similar to queries a Web user submits to a search engine. They are composed of words and short phrases. Google’s translation service can perform quite well on these queries. Queries in the DESC group are sentences that are similar to those constituting Web pages. Google, like SYSTRAN, failed to translate many of those correctly. Because most Web pages include texts composed of sentences in natural languages, we consider GLT’s query translation to be less of a concern than its machine translation of result pages.
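The paper does not say which form of the Chi-square test was applied. The reported statistics (11.54 for TITLE queries and 1.66 for DESC queries) are reproduced by a goodness-of-fit computation with one degree of freedom that treats SYSTRAN’s counts in Table 2 as the expected frequencies for Google, i.e. the sum of (O − E)² / E over the “Correct” and “Other” cells. The sketch below works through that arithmetic under this assumption, reading the 38/26 column pair as the TITLE group and the 9/13 pair as the DESC group. The function names are illustrative only.

```python
import math


def goodness_of_fit_chi2(observed, expected):
    """Pearson chi-square statistic for observed vs. expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# (Correct, Other) counts taken from Table 2.
TITLE_GOOGLE, TITLE_SYSTRAN = (38, 12), (26, 24)
DESC_GOOGLE, DESC_SYSTRAN = (9, 41), (13, 37)

# Assumption: SYSTRAN's counts serve as the expected frequencies.
chi2_title = goodness_of_fit_chi2(TITLE_GOOGLE, TITLE_SYSTRAN)
chi2_desc = goodness_of_fit_chi2(DESC_GOOGLE, DESC_SYSTRAN)

print(f"TITLE: chi2 = {chi2_title:.2f}, p = {p_value_df1(chi2_title):.4f}")
print(f"DESC:  chi2 = {chi2_desc:.2f}, p = {p_value_df1(chi2_desc):.4f}")
```

A test of independence on the same 2×2 counts would yield different statistics, so the choice of test matters when interpreting these values.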

Digital Library with Bilingual or Multilingual Information Access

Because machine translation often produces translations that are hard to understand, many organizations and information systems still rely on human translators for translating documents or files from one language to other languages. As for digital libraries, very few have realized multilingual information access (Chen, 2007).

We analyzed about 150 digital libraries that we could find through DL literature and search engines; only five of them can be accessed by using more than one language. Table 3 lists the five digital libraries.

Table 3. Digital Libraries with Multilingual Information Access

Library Name    URL    Language

Meeting of Frontiers    http://frontiers.loc.gov/intldl/mtfhtml/mfsplash.html    English/Russian

France in America    http://international.loc.gov/intldl/fiahtml/fiahome.html    English/French

Parallel Histories    http://international.loc.gov/intldl/eshtml/    English/Spanish

International Children's Digital Library    http://www.icdlbooks.org/    Digital Objects in 11 languages. Users can do the keyword search in 51 languages.

The Perseus Digital Library    http://www.perseus.tufts.edu    Greek, English, Latin

Meeting of Frontiers is a “bilingual, multimedia English-Russian digital library that tells the story of the American exploration and settlement of the West, the parallel exploration and settlement of Siberia and the Russian Far East, and the meeting of the Russian-American frontier in Alaska and the Pacific Northwest” (The Library of Congress, 2002, About the Project, ¶ 1). It is intended for use in U.S. and Russian schools and libraries and by the general public in both countries. The bilingual collection includes Books and Other Printed Materials, Manuscripts, Maps, Photographs and Prints, Mixed Format Collections, Sheet Music, Motion Pictures and Recorded Sound, and Exhibitions. Users can search and browse either in English or Russian.

France in America is “a bilingual, multi-format English-French digital library that tells the story of the French presence in America and the interactions between the French and American peoples from the early 16th to the late 19th centuries” (The Library of Congress, n.d., About the Site, ¶ 1). It is “part of the Library of Congress’s Global Gateway project to establish cooperative digital libraries with national libraries from around the world.” The collection is “intended for use by students, scholars, and researchers worldwide”. Users can browse the collections in the original languages of the collections. In terms of searching functions, the “text of the descriptive information records, the full text transcriptions, and the themes” in the Web site “are presented using Latin 1 character encoding - this encoding includes diacritic marks commonly found in west European languages (that is, accent marks).”

Parallel Histories: Spain, the United States, and the American Frontier is “a bilingual, multi-format English-Spanish digital library site that explores the interactions between Spain and the United States in America from the fifteenth to the early nineteenth centuries” (The Library of Congress, n.d., About the Project, ¶ 1). It is “part of the Library of Congress' Global Gateway project to establish cooperative digital libraries with national libraries around the world.” The digital library aims at “making available to students, researchers, and lifelong learners unique documents from the cultural heritage of Spain and the United States”. Users can browse the collections in the original languages of the collections, and searching works in the same way as for France in America.

The above three digital libraries are all developed by the Library of Congress in partnership with the libraries in the respective countries.

International Children’s Digital Library “was created by an interdisciplinary research team at the University of Maryland in cooperation with the Internet Archive.” “The ICDL collection has two primary audiences. The first audience is children ages 3-13, as well as librarians, teachers, parents, and caregivers who work with children of these ages. The second audience is international scholars and researchers in the area of children's literature” (International Children’s Digital Library, n.d., Frequently Asked Questions). The library can be accessed in 11 languages.

The Perseus Digital Library is in the Department of the Classics, Tufts University. Its collection includes Classics, Papyri, Renaissance, London, California, Upper Midwest, and Tufts History. It is for both general readers and specialists. The library provides many tools for accessing the collection, such as English to Greek and Latin Word Search, Greek and Latin Morphological Analysis, and Greek and Latin Vocabulary Tool.

The above digital libraries share the following characteristics:

• They have been funded by various funding agencies, especially from the federal government;

• They are the products of collaboration. People from different countries work together to produce the bilingual or multilingual collections;

• They serve broader or global user communities where users speak different languages;

• They do not employ cross-language search or any cross-language information retrieval techniques or machine translation.

Now, we are asking the questions: Shall digital libraries employ cross-language search? How can digital libraries achieve bilingual or multi-lingual information access with minimum cost and effort?

Multilingual Information Access in Digital Libraries: Possible Strategies

The CLIR community has been focusing on improving retrieval performance for more than 10 years. Numerous matching strategies have been explored to realize CLIR between various language pairs. However, CLIR service has been offered by very few information systems. Translation performance has been considered the major obstacle to applying the technologies to practical systems (Gey, Kando, and Peters, 2005). Also, the lack of knowledge of the users is among the major reasons for this situation. As pointed out by Petrelli, Beaulieu and Sanderson (2002), little effort has been made to identify the users of CLIA systems and to fully understand how these users can make use of such systems.

Search has become an essential component of all information systems including digital libraries. A large amount of information stored in digital libraries is not accessible to search engines. It would benefit global users if digital libraries were to offer multilingual information access services so that more people in the world could use precious information in those digital libraries. The following strategies can be considered to develop multilingual digital libraries with limited funding:

(1) Digital library (DL) developers should collaborate with researchers in CLIR and MT to explore solutions that are appropriate for the specific digital objects while seeking funding to support cross-language search as a value-added service. As digital objects are more organized than Web pages crawled by search engines, it is possible that better performance of machine translation could be achieved through the construction of a customized knowledge base for machine translation software.

(2) Collaborate with DL developers in other countries to increase the number of languages that users can access. Many digital libraries manage precious digital assets that can be attractive to people on the other side of the earth. Collaboration with colleagues in other countries would make information resources more useful.

(3) Collaborate with the users. In the current digital age, even monolingual digital libraries are accessed by people who don’t know the language (Sorid, 2008). Social computing has been widely used on the Internet, and it can play a big role in involving users in multilingual information access services: users may volunteer to translate digital objects into another language. They may help to correct errors produced by machine translation systems. They can donate money to help the DL offer the new service if they know the significance of the service, or the information needs from the other side of the earth.

(4) Take a step-by-step approach. DL developers can first implement a multilingual interface, then the metadata, and then the whole collection.

Conclusions

Google Language Tools provide multiple language support tools for Web users. These tools include a cross-language search service for 35 languages, monolingual search in the user’s preferred language or country, machine translation of texts or Web pages, and online dictionary lookup. GLT’s cross-language search integrates MT and CLIR, which have been investigated for many years by separate research communities. Although systematic user evaluation of GLT’s cross-language search remains to be conducted, our evaluation shows that GLT can do a reasonably good job translating short queries from Chinese into English. GLT’s cross-language search service enables Web users to access information that was not accessible before.

Information systems such as digital libraries would better serve their users if language support services were integrated as part of the systems. Very few digital libraries in the United States have implemented multilingual information access. However, digital libraries with less money should also consider adding bilingual or multilingual information access through collaboration with CLIR and MT researchers, colleagues in other countries, and the users. As for the specific implementation approach, the model of Google’s cross-language search should not be considered the only approach. A digital library may choose to conduct machine translation for all the documents before indexing instead of doing query translation at the time of searching. Digital libraries with stable collections can apply computer-assisted mechanisms to build their translation knowledge base for query translation. The study of user needs under specific conditions will help developers to build efficient and effective systems for information access across languages. As for future research, we would like to understand more about the needs and information behavior of bilingual users, because bilingual users have been identified as the most likely users of CLIR systems. Also, we will collaborate with small digital libraries to investigate effective and efficient solutions for providing multilingual information access to digital library users.

References

Chen, J. (2006). A lexical knowledge base approach for English-Chinese cross-language information retrieval. Journal of the American Society for Information Science and Technology, 57(2), 233-243.

Flammia, M. & Saunders, C. (2007). Language as power on the Internet. Journal of the American Society for Information Science and Technology, 58(12), 1899-1903.

Gey, F. C., Kando, N., & Peters, C. (2005). Cross-language information retrieval: the way ahead. Information Processing and Management, 41, 415-431.

Gey, F., Larson, R., Sanderson, M., Bischoff, K., Mandl, T., Womser-Hacker, C., et al. (2006). GeoCLEF 2006: the CLEF 2006 Cross-Language Geographic Information Retrieval track overview. Retrieved Nov. 16, 2007 from http://www.clef-campaign.org/2006/working_notes/workingnotes2006/geyOCLEF2006.pdf.

Manning, C. D., & Schutze, H. (1999). Foundations of statistical natural language processing. Cambridge, MA: The MIT Press.

Monroy, C., Furuta, R., & Castro, F. (2007). A multilingual approach to technical manuscripts: 16th and 17th-century Portuguese shipbuilding treatises. Proceedings of the 2007 International Conference on Digital Libraries, 413-414.

Nie, J.Y. & Ren, F.J. (1999). Chinese information retrieval: using characters or words? Information Processing & Management, 35(4), 443-463.

Notess, G. R. (2008). Multilingual Searching: Search Engine Language Tools. Online, May/June 2008, 32(3), 40-42. Retrieved September 1, 2008 at: http://www.infotoday.com/ONLINE/may08/Notess.shtml

Oard, D. W., & Diekema, A. R. (1999). Cross-language information retrieval. In M. Williams (Ed.), Annual Review of Information Science and Technology, 33 (pp. 223-256).

Petrelli, D., Beaulieu, M. & Sanderson, M. (2002). User participation in CLIR research. Proceedings of SIGIR 2002 Workshop: Cross-Language Information Retrieval: A Research Roadmap. Retrieved online Nov. 12, 2007 at: http://ucdata.berkeley.edu:7101/sigir-2002/sigir2002CLIR-12-petrelli.pdf.

Sorid, D. (2008). Writing the Web’s Future in Numerous Languages. The New York Times, December 31, 2008.

Wang, J., Lu, W. & Chien, L. (2004). Toward Web mining of cross-language query translation in digital libraries. International Journal of Digital Libraries, 4(4), 247-257.

Zhang, J. & Lin, S. (2007). Multiple language supports in search engines. Online Information Review, 31(4), 516-532. Retrieved August 18, 2008, from http://www.emeraldinsight.com/Insight/ViewContentServlet;jsessionid=26559230F1B5E51B9C81A07DB8D54796?Filename=Published/EmeraldFullTextArticle/Articles/2640310408.html.
