Related Terms - Ontology Localization


4.1.2 Related Terms

As explained earlier, ontology localization cannot be fully understood without placing it in the context of two interdependent processes: internationalization and translation.

Internationalization.

When an ontology is developed, its design is inevitably influenced by the culture and native language of its developers. To adapt an ontology successfully to international regions or markets, the culturally and linguistically dependent parts of the ontology must be carefully designed, a process referred to as ontology internationalization. This process covers, for example, naming conventions for ontology terms and hyphenation and morphological rules for ontology elements.

Thus, ontology internationalization can be defined as the process of generalizing an ontology so that it can handle multiple languages and cultural conventions without needing to be redesigned. Internationalization takes place at the design level. There are two key reasons for ontology internationalization:

1. To ensure that an ontology is properly designed and therefore can be accepted in international markets, and

2. To ensure that an ontology is localizable.

In the first case, the aim is that the labels and descriptions used in the ontology are concise and clear and contain no jargon or slang. The second reason helps to reduce localization costs, because the ontology is developed in a way that ensures a smooth localization process. One way to achieve this is to follow a standard for naming labels. Some works [Flied et al., 2007, Schober et al., 2007] have proposed naming conventions for ontology terms. These guides are used in specific applications such as ontology verbalization (see footnote 1). We claim that the definition and use of style guidelines should also be extended to ontology engineering.

Translation.

Translation can be generally defined as the process of “transferring a text from a source language and culture into a target language and culture with a certain purpose” (adapted from [Nord, 1997]). Given this definition, we agree with [Montiel-Ponsoda, 2011a] that translation may be considered the parent activity that encompasses ontology localization. Depending on the purpose of the localization, translation can be categorized as instrumental or documental [Montiel-Ponsoda, 2011a]. In the first case, the goal is for the target ontology to have the same function in the target community as the original ontology has in the source community. In the second case, the purpose is to “document” the ontology in another language so that it becomes accessible to a community that speaks that language. In both cases, just as in software localization, the emphasis in ontology localization should be placed on automatic translation tools that spare users the manual effort of building a multilingual ontology.

Some authors believe that this solution is not viable, since machine translation (MT) today suffers from several critical limitations when it comes to translating ontology labels [Segev and Gal, 2008]. The general view is that automatic translation tools have yet to reach a level of proficiency comparable to human translation. However, since ontologies consist of concepts, attributes, and relations that are stated clearly and succinctly, we hypothesize that ontology components are more readily translatable than full-length text [Espinoza et al., 2008a].

Footnote 1: Verbalization makes ontologies accessible to people with no training in formal methods. The goal of ontology verbalization is to produce natural language from the definitions of classes or properties.

When translating ordinary text, one has to deal with textual phenomena such as anaphora or metaphors, and much care must go into ensuring that one obtains clear and natural-sounding sentences. This is less of an issue for ontology labels, which tend to consist of single words, compound words, named entities, short phrases, or short sentence fragments. In addition, ontology labels have characteristics that make them amenable to MT:

• Consistency. The lexical formats used for naming ontology terms are very similar [Espinoza et al., 2008a]. In addition, the labels that describe ontology elements commonly use an upper/lower case distinction. This is an advantage for MT because it makes word segmentation possible. Some works have shown that even a basic word segmenter improves MT performance [Koehn and Knight, 2003, Habash and Sadat, 2006, Chang et al., 2008]. Additionally, we can rely on the initial uppercase letter to identify the first word of a phrase.

• Accuracy. The spelling accuracy of ontology labels is reported to be approximately 97.0%-99.5% [Espinoza et al., 2008a]. These values matter because typographical errors can affect translation quality. Furthermore, sentence boundaries (used in ontology term comments), which are crucial for parsing in MT, are usually clear in ontologies because punctuation is used accurately.
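The case-based word segmentation mentioned under Consistency can be illustrated with a minimal sketch. It is not taken from the thesis: the function name `segment_label` and the regular expression are assumptions made for illustration, and the pattern handles ASCII letters and digits only.

```python
import re
from typing import List

# Hypothetical tokenizer (an assumption, not the thesis's segmenter).
# Alternatives, tried in order:
#   1. a run of capitals not followed by a lowercase letter (acronyms)
#   2. an optional capital followed by lowercase letters (ordinary words)
#   3. a run of digits
# Underscores, hyphens, and spaces match nothing and so act as separators.
_TOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def segment_label(label: str) -> List[str]:
    """Split an ontology label into words using case changes and separators."""
    return _TOKEN.findall(label)


if __name__ == "__main__":
    for label in ("hasPart", "ResearchProject", "XMLSchema_type", "is-part-of"):
        print(label, "->", segment_label(label))
```

Labels that use accented or non-Latin characters would need a Unicode-aware pattern, which this sketch deliberately leaves out.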

We define the ontology label translation task as finding, for an individual label l in the source language S, the correct translation, either a word or a phrase, in the target language T. Clearly, there are cases where l is part of a multi-word term that needs to be translated as a unit. In such cases, the approach can be extended by preprocessing the data in S to find short phrases and then running the entire algorithm with those short phrases treated as atomic units. In this thesis, we do not explore extending the approach to the translation of sentences (e.g., the comments of ontology terms). Instead, we focus on the translation of simple and compound labels. The technical details of this process are explained in section 4.4.2.
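To make the idea of treating short phrases as atomic units concrete, the sketch below groups the words of a label by greedy longest match against a lexicon before looking up translations. It is only an illustration under stated assumptions: the toy English-to-Spanish `LEXICON`, the function names, and the greedy strategy are all hypothetical and are not the algorithm of section 4.4.2.

```python
from typing import Dict, List, Sequence, Tuple

Unit = Tuple[str, ...]

# Toy English-to-Spanish lexicon; every entry is an illustrative assumption.
LEXICON: Dict[Unit, str] = {
    ("research",): "investigación",
    ("project",): "proyecto",
    ("research", "project"): "proyecto de investigación",
}


def group_phrases(words: Sequence[str], lexicon: Dict[Unit, str]) -> List[Unit]:
    """Group words into units, preferring the longest phrase found in the lexicon."""
    max_len = max((len(key) for key in lexicon), default=1)
    units: List[Unit] = []
    i = 0
    while i < len(words):
        # Try the longest candidate first; a single word is always accepted.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = tuple(w.lower() for w in words[i:i + n])
            if n == 1 or candidate in lexicon:
                units.append(candidate)
                i += n
                break
    return units


def translate_label(words: Sequence[str], lexicon: Dict[Unit, str]) -> List[str]:
    """Translate each unit; units missing from the lexicon are left unchanged."""
    return [lexicon.get(unit, " ".join(unit)) for unit in group_phrases(words, lexicon)]


if __name__ == "__main__":
    print(translate_label(["Research", "Project"], LEXICON))
    print(translate_label(["Project", "Leader"], LEXICON))
```

Because the phrase "research project" is matched as one unit, it receives its own translation instead of a word-by-word rendering. The sketch does not reorder units or choose among alternative senses, both of which a real label translation process would have to address.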
