
Chapter 3   Problem Analysis

3.1   A Thorough Discussion on Fundamental Terms

3.1.2   Information Semantics

3.1.2.1   Semantics Fundamentals

Information semantics and semantic integration have become active topics in several disciplines, such as databases, information integration, and ontologies. Researchers and practitioners have produced a large body of work on semantic integration to facilitate interoperability between different information systems [Noy, 2004].

According to [Meersman, 1995], semantics refers to a user’s interpretation of the computer representation of the world, i.e., the way users relate computer representations to the real world. The ability to incorporate detailed data semantics into computers will provide greater consistency in how data are used, understood, and applied [Magnini, et al., 2003]. One of the principal benefits of introducing semantics is reducing human involvement in information understanding and information integration.

Vetere and Lenzerini [Vetere and Lenzerini, 2005] regard semantics as a mapping (also known as an “interpretation function”) that involves:

• Expressions: a system of manifested symbols (e.g., a formal language).

• Contents: a system of something else that is not necessarily apparent (e.g., sets of objects or events in (some abstraction of) the “real world”).
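To make the mapping concrete, the following is a minimal Python sketch, assuming a toy world whose contents are a handful of named objects and whose expressions are predicate names. The names `WORLD`, `INTERPRETATION`, `interpret`, and `holds` are hypothetical and are not taken from [Vetere and Lenzerini, 2005]; the sketch simply follows the usual model-theoretic reading, in which each expression is mapped to the set of objects it denotes.

```python
from typing import Dict, FrozenSet

# Contents: a toy "world" of objects (hypothetical names, for illustration only).
WORLD: FrozenSet[str] = frozenset({"oak_1", "pine_7", "dog_3", "cat_9"})

# Expressions: symbols of a tiny formal language (unary predicate names).
# The interpretation function maps each symbol to the set of objects it denotes.
INTERPRETATION: Dict[str, FrozenSet[str]] = {
    "Tree": frozenset({"oak_1", "pine_7"}),
    "Animal": frozenset({"dog_3", "cat_9"}),
}


def interpret(symbol: str) -> FrozenSet[str]:
    """Map an expression (symbol) to its content (the objects it denotes)."""
    try:
        return INTERPRETATION[symbol]
    except KeyError:
        raise KeyError(f"no interpretation for expression {symbol!r}") from None


def holds(predicate: str, obj: str) -> bool:
    """The atomic statement predicate(obj) is true iff obj is in its denotation."""
    return obj in interpret(predicate)


# Sanity check: every denotation lies inside the world of contents.
assert all(ext <= WORLD for ext in INTERPRETATION.values())
```

With this mapping, `holds("Tree", "oak_1")` is true and `holds("Animal", "oak_1")` is false: the expression side stays purely symbolic, and truth is decided only by consulting the contents it is mapped to.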

Roughly speaking, semantics refers to “the intended meaning of something”. This simple definition involves two aspects: what “something” is and what “meaning” is. “Something” is an abstraction of the external world in human minds, expressed in specific forms such as symbols, formulas, texts, voices, or graphs. Put simply, it may be a concept abstracted from concrete objects such as trees, animals, cars, rocks, and persons; from logical ideas such as time, space, weight, and volume; or from actions such as eating, looking, and walking. In more complex cases, “something” can refer to a comprehensive fact composed of concepts and relationships, such as the statement “Dr. Jackson introduces us to many interesting topics in ES 250”, as depicted in Figure 3-3.

[Figure 3-3: The statement “Dr. Jackson introduces us to many interesting topics in ES 250.”, with its concepts and relationships marked.]
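One common way to record such a fact in machine-readable form is as subject-predicate-object triples, in the style of RDF: concepts appear as subjects and objects, and relationships appear as predicates. The Python sketch below uses a hypothetical decomposition of the example statement (it is not necessarily the decomposition shown in Figure 3-3); the names `Triple`, `FACT`, `concepts`, and `relationships` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Triple:
    subject: str    # a concept
    predicate: str  # a relationship between two concepts
    obj: str        # a concept


# Hypothetical decomposition of the example statement; other
# decompositions are equally possible.
FACT: List[Triple] = [
    Triple("Dr. Jackson", "introduces", "Topic"),
    Triple("Topic", "introducedTo", "Us"),
    Triple("Topic", "coveredIn", "ES 250"),
]


def concepts(fact: List[Triple]) -> Set[str]:
    """Collect every concept that appears as a subject or an object."""
    return {t.subject for t in fact} | {t.obj for t in fact}


def relationships(fact: List[Triple]) -> Set[str]:
    """Collect every relationship name used in the fact."""
    return {t.predicate for t in fact}
```

Here `concepts(FACT)` yields the four concepts `Dr. Jackson`, `Topic`, `Us`, and `ES 250`, while `relationships(FACT)` yields the three relationship names. Binary triples force the three-place relation “X introduces Y to Z” to be split into two triples; that splitting is a modelling choice, not something the statement itself dictates.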

It is difficult to define “meaning”. Instead, we can interpret it as the intension of specific concepts, relationships, or comprehensive facts. This intension is expressed through some formalism that represents the meaning explicitly, and it becomes meaningful only once readers understand the expressions correctly. In some cases the reader is a non-human entity such as a computer or a software agent; enabling such readers to understand expressions is an important research issue in semantic integration. For example, consider the following expression (in a specific formalism) that represents a fact:

I DB 100

Very few people are likely to be familiar with this expression (an assembly-language data definition), so most people cannot understand it without an explanation of its semantics. Within the computer programming domain, we can restate it in another form; in other words, we can say that its meaning is:

int I = 100;

It is reasonable to expect that more people will understand this form; its semantics is trivial for anyone familiar with C, C++, or Java programming.

We can extend this interpretation “path” further and explain that “int I = 100” means the same as:

Dim I as Integer = 100

This is a variable declaration statement in the Visual Basic (VB) language, so people who know VB but not C or C++ can now understand it. For people who are familiar only with Perl, a further interpretation can be provided:

my $I = 100;

The semantics implied by these expressions can be interpreted further in a natural language sentence: define a variable whose name is I, whose type is integer, and whose initial value is 100. Note that here we use natural language (itself a formalism for expressing the semantics of something) to explain the meaning of the previous formalisms. For people who are not familiar with programming but have basic knowledge of computer science, we can add that a variable is a storage unit in memory that is referenced by its name. Someone unfamiliar with computer science may need still more detail to grasp the semantics of these expressions.

Note that, from the beginning, we have limited the domain of discourse to computer programming; in other domains, “I DB 100” may have a completely different meaning. Another key point is that we assume people with a similar background and normal intelligence will reach a common understanding of the same expression, at least within one specific domain. We must be aware, however, that exceptions exist. For instance, a programmer may interpret the expression “int I = 100” differently, unconsciously or on purpose, which complicates matters. We do not consider this exceptional case, because it is very rare and lies outside the problem we can solve. Some research has addressed the discovery of malicious semantic interpretation [Doan and McCann, 2003; McCann, et al., 2003], but more work remains to be done.

The interpretation process above shows that semantics in a specific domain can be represented in one kind of language (or formalism) and interpreted by other kinds of languages (or formalisms). A natural language such as English is the ultimate formalism we use to interpret the intension of something. Successive interpretations at different levels form an interpretation chain, as shown in Figure 3-4: the same semantics can be interpreted by multiple formalisms, and at each level a specific formalism interprets the levels above it and is most readable for a particular group of readers.

Ultimately, all semantics must be interpreted in some natural language, and an interpretation expressed in natural language can itself be interpreted in more detail in the same language. We can therefore assume that there is a level N in Figure 3-4 such that, from the Nth level downward, all interpretation formalisms are natural languages. Note that reader groups may overlap, i.e., some readers (people or machines) can read and understand formalisms at multiple levels.
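The interpretation chain can be sketched as a simple data structure: an ordered list of levels, each holding a formalism, the expression written in it, and the reader group for which it is most readable. The Python sketch below is illustrative only; it assumes the running “I DB 100” example (treated as an assembly-language data definition), and the class `Level`, the reader names, and the helper functions are all hypothetical. `natural_language_start` computes the level N described above, and the reader `polyglot`, who belongs to two groups, shows how reader groups can overlap.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Level:
    number: int               # 1 is the top of the chain
    formalism: str
    expression: str
    readers: FrozenSet[str]   # reader group for which this form is most readable
    natural_language: bool = False


# Hypothetical chain for the "I DB 100" example; reader names are illustrative.
# "polyglot" belongs to two groups, so reader groups overlap.
CHAIN: List[Level] = [
    Level(1, "assembly", "I DB 100", frozenset({"assembler", "asm_programmer"})),
    Level(2, "C", "int I = 100;", frozenset({"c_compiler", "c_programmer", "polyglot"})),
    Level(3, "Visual Basic", "Dim I As Integer = 100",
          frozenset({"vb_programmer", "polyglot"})),
    Level(4, "English",
          "define a variable whose name is I, type is integer, and initial value is 100",
          frozenset({"english_reader"}), natural_language=True),
    Level(5, "English",
          "a variable is a storage unit in memory that is referenced by its name",
          frozenset({"cs_background_reader"}), natural_language=True),
]


def readable_levels(chain: List[Level], reader: str) -> List[int]:
    """Return the numbers of all levels whose reader group includes the reader."""
    return [level.number for level in chain if reader in level.readers]


def first_readable_level(chain: List[Level], reader: str) -> Optional[Level]:
    """Walk down from the top and return the first level the reader can read."""
    for level in chain:
        if reader in level.readers:
            return level
    return None


def natural_language_start(chain: List[Level]) -> Optional[int]:
    """Return N, the level from which all lower levels are natural language."""
    n: Optional[int] = None
    for level in reversed(chain):
        if not level.natural_language:
            break
        n = level.number
    return n
```

For this chain, `readable_levels(CHAIN, "polyglot")` returns `[2, 3]` and `natural_language_start(CHAIN)` returns `4`. The machine readers (`assembler`, `c_compiler`) appear only in the upper levels; a real chain would of course have many more levels and far larger reader groups.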

Based on the nature of human thinking, we draw the following conclusions about semantics:

Conclusion 1: Every level in the interpretation chain is readable by humans. Any formalism is an explicit representation of semantics held in human thought. People create various forms to express semantics for different purposes; therefore, people can understand any of them, although some are understandable only by very few people.

[Figure 3-4: The interpretation chain. The semantics of a domain is represented by Form 1 at Level 1 and interpreted by Form 2 at Level 2, Form 3 at Level 3, ..., Form N at Level N, Form N+1 at Level N+1, Form N+2 at Level N+2, ...; each form is readable for the corresponding Reader Group 1, 2, 3, ..., N, N+1, N+2.]

Conclusion 2: The interpretation chain is infinite. It can grow both upward and downward as new representation forms are created.

Conclusion 3: Reader groups are not mutually disjoint. Some individuals are familiar with several formalisms.

Conclusion 4: Machines (computers) can be members of some high-level reader groups; that is, the corresponding forms are machine readable. The levels readable by machines are limited, although they may extend to lower levels as machine design advances.

Our work focuses on machine-readable formalisms and their semantics.