
Term Context Interpretation

In document Ontology Localization (Page 96-100)


5.1.1 Term Context Interpretation

The Term Context Interpretation classification is concerned with how the term context used to disambiguate the candidate translations is modeled:

• The first level is categorized according to the size or depth of the context: without context and with local context. These categories are assumed to be independent of the modalities used for encoding the context of each ontology term to be translated.

a. The without context approach uses only the information related to the term itself as context. This option is sometimes disregarded, but it contains important information about the internal structure of the ontology term, e.g., term annotation (see rdfs:comment), or type of term (concept, relation, or instance).

b. In the with local context approach, the context involves a narrow group of terms centered on the ontology term itself, which fairly well approximates contexts ranging from the immediate surroundings of directly related terms to the whole ontology.

We wish to point out that dividing the context into different sizes makes it possible to show their relative influence on translation techniques. One can argue by example that there are no distinct boundaries between a local context that uses a small set of terms and a local context that uses many related terms. There are only more or less influential context features, whose general tendency is that their influence diminishes with increasing distance from the ontology term itself.

• The second level of this classification decomposes these categories, taking into consideration the way of encoding the context of each ontology term to be translated. There are two different points of view for context pre-processing: linguistic and semantic.

a. The linguistic encoding processes the context of an ontology term as linguistic objects. Basically, the linguistic encoding approach uses the information obtained from the lexicon of the ontology in order to generate the term context.

b. The semantic encoding processes the context as the entities that appear hierarchically organized in an ontological structure. In this encoding approach, the context is obtained from the entities that are part of both the lexicon and the core ontology. In other words, the semantic encoding makes use of all the information in the ontology.

• The third level of this classification particularizes the above-mentioned categories into seven groups of syntactic and semantic context knowledge: term description, term POS tagging, term list association, term description association, term verbalization, and structural context.

The first two groups have as context the information of the term itself. The remaining categories use a local context, with the difference that the term verbalization and structural context groups use a semantic encoding, whereas the other groups use a linguistic encoding approach.

To illustrate the different ways of modeling the context of an ontological term, this section contains an example ontology from the university domain (see Figure 5.2). Concepts are depicted as rectangular boxes, relations as ellipses, annotation values as hexagons, and instances as rounded boxes. Ontology relations are drawn as solid arrows, whilst the instantiations of concepts and relations are depicted as dotted, arrowed lines. The example contains five concepts: person, professor, full professor, associate professor, and faculty; one object relationship: belongsTo; two attribute relationships: hasFullName and hasName; and three instances: Computer, Edu, and Asun. The example is fictitious and any resemblance to the real world is purely coincidental.

a. Term description. This category is represented by a short natural language description of the ontology term under consideration. Usually these descriptions help clarify the meaning of the ontology terms. The rdfs:comment property can be used to define an ontology term description in natural language (see RDF(S)² for more details). The term description context of the concept professor of our sample ontology can be:

ctx_professor := ( a professor is a member of the faculty ... )

b. Term POS tagging. In this case the context is represented by the grammatical category of the term. To obtain the grammatical information of a term, Part-of-Speech (POS) tagging [Church, 1988, DeRose, 1988, Garside, 1987] is a natural option. POS tagging is the process of assigning a part-of-speech tag, such as noun, verb, pronoun, preposition, adverb, adjective or other lexical class marker, to each word of a text. Most POS taggers³ need at least one short phrase from which it is possible to derive the lexical categories, or parts of speech, of each word.

Figure 5.2: Ontology Example.

We have identified that for the majority of ontology compound labels (e.g., AssociateProfessor) no additional processing is necessary to determine the part of speech of each token. However, obtaining the POS of a single term (e.g., Professor) requires additional information, i.e., the relationship with adjacent and related words in a phrase, sentence, or paragraph. One way to solve this problem is to use empirical rules to annotate a simple term. Based on our experience, we propose the following rules:

- The concepts, instances and attribute relations are considered nouns.

- All the rest of the terms (e.g., object relations) are considered verbs.
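The two empirical rules above can be sketched as a small lookup. This is only an illustration: the string labels for the term kinds and the function name `rule_based_pos` are hypothetical, and the verb tag `VB` is an assumed Penn Treebank style tag chosen to match the `NN` tag used in this section.

```python
# Term kinds treated as nouns by the first rule.
# The string labels are illustrative, not taken from the thesis.
NOUN_KINDS = {"concept", "instance", "attribute_relation"}


def rule_based_pos(term_kind):
    """Apply the two empirical rules to a single ontology term.

    Concepts, instances and attribute relations are tagged as nouns (NN);
    every other kind of term (e.g. object relations) is tagged as a verb.
    The tag "VB" is an assumption made for this sketch.
    """
    return "NN" if term_kind in NOUN_KINDS else "VB"


# Kinds of some terms of the sample university ontology.
sample_terms = {
    "professor": "concept",
    "hasName": "attribute_relation",
    "belongsTo": "object_relation",
}
pos_context = {term: rule_based_pos(kind) for term, kind in sample_terms.items()}
```

Here `pos_context["professor"]` evaluates to `"NN"`, which agrees with the context given for professor in this section.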

Another option is to try to generate a natural language sentence from the ontology term. In the literature this process is known as ontology verbalization. Some authors have studied this problem extensively (see, for example, [Hewlett et al., 2005, Flied et al., 2007, Schober et al., 2007]). This approach can be used to verbalize the ontology term and then apply a part-of-speech tagger to discover the POS of the term.

³ A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns parts of speech to each word (and other token), such as noun, verb, adjective, etc. Available POS tagger tools can be consulted on the web page of the Stanford Natural Language Processing Group (http://www-nlp.stanford.edu/links/statnlp.html#Taggers).

The context of the concept term professor will be:

ctx_professor := ( NN )

where NN denotes a singular noun.

c. Term lists. This category is represented by a bag of words consisting of the n words adjacent to the target ontology term. The list of terms obtained is independent of the semantic relationship between adjacent terms. Thus, for instance, the context to depth two of the ontology term professor can be:

ctx_professor := ( person, fullProfessor, associateProfessor, ... )
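One possible way to collect such a term list is a breadth-first traversal that ignores the type of each relationship. The sketch below is a minimal illustration under stated assumptions: the edge list is a hypothetical reading of the sample ontology (the exact links of Figure 5.2 are not reproduced here), relations are treated as nodes of their own, and the name `term_list_context` is invented for this example.

```python
from collections import deque

# Assumed adjacency of the sample ontology; the relation belongsTo is
# modelled as a node so that it can appear in the term list.
EDGES = [
    ("professor", "person"),
    ("fullProfessor", "professor"),
    ("associateProfessor", "professor"),
    ("professor", "belongsTo"),
    ("belongsTo", "faculty"),
]


def term_list_context(term, depth, edges=EDGES):
    """Return the terms reachable from `term` within `depth` steps.

    Edge direction and relation type are deliberately ignored, since the
    term list is independent of the semantic relationship between terms.
    """
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    seen = {term}
    context = []
    queue = deque([(term, 0)])
    while queue:
        current, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand beyond the requested depth
        for nxt in neighbours.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                context.append(nxt)
                queue.append((nxt, dist + 1))
    return context


context_depth_two = term_list_context("professor", 2)
```

Under these assumed edges, the depth-two context of professor starts with person, fullProfessor and associateProfessor, as in the example above, and then continues with the terms reached through belongsTo.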

d. Term list descriptions. In this category, the natural language descriptions of the surrounding terms in the context are expanded to include the descriptions of the terms related by subsumption relations in the ontology. The natural language description of each term can be extracted from the rdfs:comment property. For the ontological term professor the context can be:

ctx_professor := ( a professor is a member of the faculty .... ; a person is a human, that has capacities or attributes .... )

In the example, the second description belongs to the broader term Person.
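The expansion along subsumption relations can be sketched as a walk up the chain of broader terms, collecting each description on the way. This is a simplified illustration, not the procedure of the thesis: the dictionaries stand in for rdfs:comment values and subsumption links that a real system would read from the ontology, the description strings are the elided ones from the example above, and the name `description_context` is hypothetical.

```python
# Stand-ins for rdfs:comment values; the trailing "...." mirrors the
# elided descriptions of the example and is not real content.
COMMENTS = {
    "professor": "a professor is a member of the faculty ....",
    "person": "a person is a human, that has capacities or attributes ....",
}

# Assumed subsumption links: each term maps to its broader term.
BROADER = {"professor": "person"}


def description_context(term, comments=COMMENTS, broader=BROADER):
    """Collect the description of `term` and of its broader terms."""
    context = []
    visited = set()
    current = term
    while current is not None and current not in visited:
        visited.add(current)  # guards against cycles in the links
        if current in comments:
            context.append(comments[current])
        current = broader.get(current)
    return context


professor_descriptions = description_context("professor")
```

For professor this yields two descriptions, the term's own and that of the broader term person. Only broader terms are followed in this sketch; narrower terms could be added in the same way.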

e. Term verbalization.⁴ To model the context using this approach, it is necessary to transform an ontology term into a natural language sentence. As noted previously, recent works have already studied how to generate natural language sentences from ontology elements.

Intuitively, we can see that this option is an alternative to the approach previously described. Ontology term verbalization also has some advantages. In contrast to the term list description approach, where the descriptions do not always define the exact meaning of the term, term verbalization reflects exactly the meaning of the ontological term. An example of the term verbalization context for the sample term professor is the following:

ctx_professor := ( a Professor is a Person; a Professor belongs to Faculty; .... )

⁴ According to the Merriam Webster dictionary, one of the definitions of verbalization is to use words to express or communicate meaning. In this thesis we use this term in the same sense.
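A very simple template-based verbalizer gives a feel for how such a context could be produced. It is a deliberately naive sketch and does not reproduce the verbalization methods cited in this section: the triples are an assumed encoding of the sample ontology, and the names `split_camel_case`, `verbalize` and `verbalization_context` are hypothetical.

```python
import re

# Assumed (subject, predicate, object) encoding of the sample ontology.
TRIPLES = [
    ("Professor", "subClassOf", "Person"),
    ("Professor", "belongsTo", "Faculty"),
]


def split_camel_case(label):
    """Turn a camel-case label such as 'belongsTo' into 'belongs to'."""
    words = re.findall(r"[A-Za-z][a-z]*", label)
    return " ".join(word.lower() for word in words)


def verbalize(triple):
    """Render one triple as a short natural language sentence."""
    subject, predicate, obj = triple
    if predicate == "subClassOf":
        # Subsumption gets its own template.
        return f"a {subject} is a {obj}"
    return f"a {subject} {split_camel_case(predicate)} {obj}"


def verbalization_context(term, triples=TRIPLES):
    """Verbalize every triple whose subject is `term`."""
    return [verbalize(t) for t in triples if t[0] == term]


professor_sentences = verbalization_context("Professor")
```

With the assumed triples, the sentences produced for Professor are "a Professor is a Person" and "a Professor belongs to Faculty".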

g. Structural term context. The structural context is encoded exactly as the entities appear together in an ontological structure. The structural context also uses all the logical relations represented in an ontology, such as equivalence, subsumption, disjointness, etc.
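In contrast to the term list, a structural context keeps the relation that links each pair of entities. A minimal sketch, assuming the sample ontology is available as typed triples, is shown below; the triples themselves (in particular the subsumption links of fullProfessor and associateProfessor) and the name `structural_context` are assumptions made for illustration.

```python
# Assumed typed triples of the sample ontology:
# (subject, logical relation or property, object).
TRIPLES = [
    ("professor", "subClassOf", "person"),
    ("fullProfessor", "subClassOf", "professor"),
    ("associateProfessor", "subClassOf", "professor"),
    ("professor", "belongsTo", "faculty"),
]


def structural_context(term, triples=TRIPLES):
    """Return every triple in which `term` takes part.

    The triples are kept intact, so the kind of relation (subsumption,
    object property, ...) remains available to later processing.
    """
    return [t for t in triples if term in (t[0], t[2])]


professor_structure = structural_context("professor")
```

Keeping whole triples rather than bare term names is the design choice that separates this sketch from a term list: equivalence or disjointness statements, if present in the ontology, could be added as further triples without changing the function.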
