
A STUDY IN USER-CENTRIC DATA INTEGRATION




1 School of Business Informatics and Mathematics, University of Mannheim, 68159 Mannheim, Germany

2 Institute for Enterprise Systems (InES), L 15, 1-6, 68131 Mannheim, Germany

3 ontoprise GmbH, An der RaumFabrik 33a, 76227 Karlsruhe, Germany


Data integration maps different data sources to a consistent target structure.

Motivation 1

Figure: data sources (direct extraction from the different data sources), data integration rules, and the target structure (an ontology providing an encompassing, consistent view of the data).
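As a minimal sketch of what such a mapping does, the Python snippet below translates records from two hypothetical sources with differently named fields into one target vocabulary. The source names (`dealer_csv`, `catalog_db`), the field names, and the rule format (a dictionary from source field to target property) are assumptions made for illustration only; they are not the rule language used in this work.

```python
from typing import Any

# Hypothetical integration rules: for each source, map source field names
# to properties of the common target structure.
RULES: dict[str, dict[str, str]] = {
    "dealer_csv": {"model_name": "name", "kmh_max": "maxSpeed"},
    "catalog_db": {"title": "name", "top_speed": "maxSpeed"},
}


def integrate(source: str, record: dict[str, Any]) -> dict[str, Any]:
    """Translate one source record into the target structure.

    Fields without a rule are dropped, so every integrated record
    uses only the target vocabulary.
    """
    mapping = RULES[source]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


if __name__ == "__main__":
    print(integrate("dealer_csv", {"model_name": "Roadster", "kmh_max": 210, "dealer_id": 7}))
    # {'name': 'Roadster', 'maxSpeed': 210}
    print(integrate("catalog_db", {"title": "Van", "top_speed": 160}))
    # {'name': 'Van', 'maxSpeed': 160}
```

Both sources end up in the same shape, which is what allows queries against the target structure to ignore where a record came from.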


Jan Noessner, Chair of Artificial Intelligence (Lehrstuhl für künstliche Intelligenz), University of Mannheim

Outline

1. Motivation
2. Related Work
3. User-Centric Mapping Assistant Approach
4. Study Design and Datasets
5. Experimental Results
6. Conclusion and Future Work


Automatic data integration approaches are still error-prone and need to be supervised by human domain experts.

The problem of data integration has been studied intensively on a technical level in different areas of computer science.

Researchers have investigated the automatic identification of semantic relations between different datasets (Euzenat and Shvaiko, 2007).

A prominent line of research investigates the use of ontologies (formal representations of the conceptual structure of an application domain) as a basis for both identifying and using semantic relations.


Existing work in user-centric data integration has investigated rather simple scenarios.

In a recent study, Gass and Maedche have investigated the problem of data integration in the context of personal information management from a user-centric point of view (Gass and Maedche, 2011).

The scenario addressed in their work, however, focuses on the integration of rather simple data schemas, in this case personal data, where the task is mainly to map properties describing a person (e.g., name or bank account number).


Traditional user interfaces try to visualize integration rules.

Related Work 2

Most approaches are based on advanced visualization of the models to be integrated and the mappings created by the user (Granitzer et al., 2010).


A drawback is that visualizations are limited in the number and complexity of integration rules they can display.


Visualizations quickly reach their limits if many integration rules exist or if the mapping rules are so complex that they are hard to visualize.

@{c#CCMappingRule1305046217116} ?_SIID:HighComfortAndLowSavetyCar :-
  ?_SIID:<http://www.owl-ontologies.com/autos.owl#Cars>@<http://www.owl-ontologies.com/autos.owl>
  AND (?_SIID[<http://www.owl-ontologies.com/autos.owl#hasSafetyFeaturesRating>->?_VAR0]@<http://www.owl-ontologies.com/autos.owl> AND ?_VAR0 <= 2.0)
  AND (?_SIID[<http://www.owl-ontologies.com/autos.owl#hasComfortAndConvenienceRating>->?_VAR1]@<http://www.owl-ontologies.com/autos.owl> AND ?_VAR1 >= 3.5).

A high level of expert knowledge is needed to interpret the consequences of the mapping rules.
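To make the mapping rule above easier to read, the following Python sketch checks the same conditions on hypothetical in-memory instances: an instance of `Cars` is classified as `HighComfortAndLowSavetyCar` (spelled as in the rule) if its `hasSafetyFeaturesRating` is at most 2.0 and its `hasComfortAndConvenienceRating` is at least 3.5. The `Instance` class and the sample cars are invented for this example; only the concept names, property names, and thresholds come from the rule.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical in-memory stand-in for an ontology instance."""
    iid: str
    concepts: set[str]
    properties: dict[str, float]


def high_comfort_and_low_safety(inst: Instance) -> bool:
    """Mirror of the rule body: a Cars instance with
    hasSafetyFeaturesRating <= 2.0 AND hasComfortAndConvenienceRating >= 3.5."""
    if "Cars" not in inst.concepts:
        return False
    safety = inst.properties.get("hasSafetyFeaturesRating")
    comfort = inst.properties.get("hasComfortAndConvenienceRating")
    # Like the rule, both ratings must exist before they can be compared.
    if safety is None or comfort is None:
        return False
    return safety <= 2.0 and comfort >= 3.5


cars = [
    Instance("car1", {"Cars"}, {"hasSafetyFeaturesRating": 1.5, "hasComfortAndConvenienceRating": 4.0}),
    Instance("car2", {"Cars"}, {"hasSafetyFeaturesRating": 3.0, "hasComfortAndConvenienceRating": 4.5}),
    Instance("car3", {"Cars"}, {"hasSafetyFeaturesRating": 2.0}),
]

derived = [c.iid for c in cars if high_comfort_and_low_safety(c)]
print(derived)  # ['car1']
```

Even in this small form, judging whether `car3` should have been classified requires knowing that a missing rating makes the rule fail silently, which illustrates why interpreting such rules takes expert knowledge.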


The need for user-centric data integration has been recognized.

Recently, researchers in ontology and schema matching have recognized the need for user support in aligning complex conceptual models (Falconer, 2009; Falconer and Storey, 2007).


The cognitive support model for data integration by Falconer and Noy (2011) emphasizes user interaction.


Our modified cognitive support model is based on identifying wrong instances and asking questions in natural language.

User-Centric Mapping Assistant Approach 3

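The model comprises the following steps:

1. Analysis and decision making: the user inspects the data and decides which concept to examine.
2. Interaction: the user identifies instances that have been classified incorrectly.
3. Analysis and generation: a diagnostic algorithm generates the minimal number of user questions.
4. Representation: the questions are presented to the user as natural-language sentences in a to-do list.

The user then answers the questions.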


Our interactive user interface enables users to investigate data on the instance level.


In the Analysis and Decision Making phase, the user decides which concept to examine.



In the Interaction phase, the user identifies incorrectly classified instances.



In the Analysis and Generation phase, the system generates the minimal number of user questions.

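The slides do not spell out the diagnostic algorithm. Purely as an illustration of what "a minimal number of questions" can mean, the sketch below assumes that each incorrectly classified instance is linked to the set of integration rules that could have produced its classification, and that the system asks one question per rule. The smallest set of rules that touches every wrong instance (a minimum hitting set) then gives the fewest questions that still cover all marked instances. The input format, the rule identifiers, and the brute-force search are hypothetical choices for this example.

```python
from itertools import combinations


def minimal_questions(candidates: dict[str, set[str]]) -> set[str]:
    """Return a smallest set of rules that covers every wrong instance.

    `candidates` maps each incorrectly classified instance to the rules
    that may have caused its classification (hypothetical input format).
    Subsets are tried in order of increasing size, so the first hit is minimal.
    """
    rules = sorted(set().union(*candidates.values()))
    for size in range(len(rules) + 1):
        for subset in combinations(rules, size):
            chosen = set(subset)
            if all(chosen & causes for causes in candidates.values()):
                return chosen
    raise ValueError("some instance has no candidate rule")


marked = {
    "car7": {"r1", "r2"},
    "car9": {"r2", "r3"},
    "wheel4": {"r4"},
}
print(sorted(minimal_questions(marked)))  # ['r2', 'r4']
```

Exhaustive search grows exponentially with the number of candidate rules; a greedy set-cover heuristic would scale better but no longer guarantees the minimum.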


In the Representation phase, the questions are presented to the user in natural language.

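A hedged sketch of the Representation phase: the snippet below verbalizes a rule as a yes/no question and lists the questions as a to-do list. The question template, the `Condition` structure, the operator phrases, and the camel-case splitting in `verbalize` are all assumptions; the example reuses the concept and property names of the HighComfortAndLowSavetyCar mapping rule.

```python
import re
from dataclasses import dataclass

# Hypothetical phrases for verbalizing comparison operators.
OPERATOR_PHRASES = {"<=": "at most", ">=": "at least", "=": "equal to"}


@dataclass
class Condition:
    prop: str     # property name, e.g. "hasSafetyFeaturesRating"
    op: str       # one of the keys of OPERATOR_PHRASES
    value: float


def verbalize(prop: str) -> str:
    """Turn a camel-case property name into lower-case words, dropping a leading 'has'."""
    name = prop[3:] if prop.startswith("has") else prop
    return " ".join(w.lower() for w in re.findall(r"[A-Z][a-z]*|[a-z]+", name))


def to_question(target: str, source: str, conditions: list[Condition]) -> str:
    """Render one rule as a yes/no question for the user's to-do list."""
    clauses = [f"{verbalize(c.prop)} is {OPERATOR_PHRASES[c.op]} {c.value}" for c in conditions]
    return (
        f"Should every instance of {source} whose "
        + " and whose ".join(clauses)
        + f" be classified as {target}?"
    )


todo = [
    to_question(
        "HighComfortAndLowSavetyCar",
        "Cars",
        [
            Condition("hasSafetyFeaturesRating", "<=", 2.0),
            Condition("hasComfortAndConvenienceRating", ">=", 3.5),
        ],
    )
]
for number, question in enumerate(todo, start=1):
    print(f"[ ] {number}. {question}")
```

Keeping concept names as identifiers while verbalizing the property names is one possible compromise; a real system would likely need a curated lexicon for both.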


The source dataset is an instructional dataset from the web. The target schema was created manually.

Study Design and Datasets 4

Source dataset: an instructional dataset from the car-selling domain (http://gaia.isti.cnr.it/~straccia/download/teaching/SI/2006/Autos.owl). The dataset contains:

324 data records (cars, car parts, etc.)

100 attributes (e.g., speed, fuel consumption, ...)

91 concepts organized in a concept hierarchy

It is complex enough, yet small enough to be handled in a user study.


Ten integration rules were wrong and had to be identified by the subjects (dependent variable).


Two datasets containing 10 wrong integration rules each.

Type 1: easy mistakes

Type 2: complex mistakes

The subjects had to find as many wrong integration rules as possible.

The dependent variable is the number of errors the subjects found in the

Figure: example integration rule. Concepts shown: Wheel, Engine, AirCondition, AutomaticOneZoneAirCondition. Filter on AutomaticOneZoneAirCondition: hasZoneNumber = 2, hasAutomatic = false.
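The example above appears to show a rule for the concept AutomaticOneZoneAirCondition whose filter is hasZoneNumber = 2 and hasAutomatic = false, which contradicts the concept name on both counts. The sketch below, with hypothetical air-condition records, shows the effect a subject would see at the instance level. The "expected" values (one zone, automatic) are our reading of the concept name, not taken from the study.

```python
# Hypothetical air-condition instances from the source data.
air_conditions = [
    {"id": "ac1", "hasZoneNumber": 1, "hasAutomatic": True},
    {"id": "ac2", "hasZoneNumber": 2, "hasAutomatic": False},
    {"id": "ac3", "hasZoneNumber": 2, "hasAutomatic": True},
]

# Filter of the rule as shown in the figure.
rule_filter = {"hasZoneNumber": 2, "hasAutomatic": False}
# What the concept name AutomaticOneZoneAirCondition suggests (assumption).
expected = {"hasZoneNumber": 1, "hasAutomatic": True}


def matches(record: dict, conditions: dict) -> bool:
    """True if the record has every property value required by the conditions."""
    return all(record.get(prop) == value for prop, value in conditions.items())


classified = [r["id"] for r in air_conditions if matches(r, rule_filter)]
# Instances put into the concept although they contradict its name.
suspicious = [r["id"] for r in air_conditions if matches(r, rule_filter) and not matches(r, expected)]
# Instances that fit the name but are missing from the concept.
missed = [r["id"] for r in air_conditions if matches(r, expected) and not matches(r, rule_filter)]
print(classified, suspicious, missed)  # ['ac2'] ['ac2'] ['ac1']
```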


We compared the conventional approach with the MappingAssistant approach (independent variable).


To simulate background knowledge, the subjects were given an information sheet.


Both the order of the tasks and the order of the datasets were switched.
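A common way to realize such counterbalancing is to rotate subjects through all four combinations of task order and dataset order. The sketch below does this for 22 subjects, assuming each subject used each approach on a different one of the two datasets. The labels and the round-robin assignment are assumptions, since the slides do not say how subjects were assigned to orders.

```python
from collections import Counter
from itertools import product

# Hypothetical labels for the two approaches (tasks) and the two datasets.
TASK_ORDERS = [("conventional", "MappingAssistant"), ("MappingAssistant", "conventional")]
DATASET_ORDERS = [("dataset1", "dataset2"), ("dataset2", "dataset1")]

# All four combinations of task order and dataset order.
CONDITIONS = list(product(TASK_ORDERS, DATASET_ORDERS))


def assign(num_subjects: int) -> list[list[tuple[str, str]]]:
    """Assign subjects round-robin to the four order conditions.

    Each schedule is a list of (approach, dataset) sessions in the order
    they are performed; each approach is paired with a different dataset.
    """
    schedules = []
    for subject in range(num_subjects):
        tasks, datasets = CONDITIONS[subject % len(CONDITIONS)]
        schedules.append(list(zip(tasks, datasets)))
    return schedules


schedules = assign(22)
print(schedules[0])  # [('conventional', 'dataset1'), ('MappingAssistant', 'dataset2')]
# Subjects per condition; with 22 subjects the four groups cannot be equal.
print(sorted(Counter(tuple(s) for s in schedules).values()))  # [5, 5, 6, 6]
```

Rotating through all four orders spreads learning and fatigue effects evenly across both approaches and both datasets, so neither approach systematically benefits from being used second.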


We performed the study with 22 subjects.


Twenty-two subjects participated in the user study; each performed both tasks on both datasets.

6 female, 16 male

average age: 27.8 years (min = 21, max > 50). 54% of the subjects were students.


Precision, Recall, and F-Measure

Experimental Results 5

$$\text{Precision} = \frac{\text{number of errors correctly identified by a subject}}{\text{number of errors identified by a subject}} \tag{1}$$

$$\text{Recall} = \frac{\text{number of errors correctly identified by a subject}}{\text{number of all existing errors } (10)} \tag{2}$$
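Precision and recall translate directly into code. The sketch below also computes the F-measure, assuming the balanced F1 score (the harmonic mean of precision and recall), since the slides name the F-measure but do not give its formula. The subject's answers in the example are made up.

```python
TOTAL_ERRORS = 10  # wrong integration rules per dataset, as stated in the study


def scores(reported: set[str], actual: set[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for one subject.

    reported: rules the subject marked as wrong
    actual:   rules that really are wrong
    F1 is the balanced F-measure (an assumption).
    """
    correct = len(reported & actual)
    precision = correct / len(reported) if reported else 0.0
    recall = correct / len(actual) if actual else 0.0
    # With no correct hits both measures are 0, so F1 is defined as 0.
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


actual_errors = {f"rule{i}" for i in range(1, TOTAL_ERRORS + 1)}
# Hypothetical subject who flags five rules, four of which are really wrong.
flagged = {"rule1", "rule2", "rule3", "rule4", "rule42"}
precision, recall, f1 = scores(flagged, actual_errors)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")  # P=0.80 R=0.40 F1=0.53
```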


Average performance of subjects: the recall was one third higher with the MappingAssistant approach than with the conventional approach.


Comparing the performance on the subject level, 91% of the subjects found more mistakes with the MappingAssistant approach.


In the standard approach, subjects with low technical knowledge reached lower F-scores.


In the MappingAssistant approach, the F-score reached is independent of the level of knowledge.


The user feedback is better for the MappingAssistant approach than for the standard approach.



Conclusion

Conclusion and Future Work 6

The goal of our research was to enable people with little or no technical knowledge to integrate their data. We presented a user-centric approach to data integration that is based on a cognitive support model.

We presented the results of a user study demonstrating that our MappingAssistant approach empowers users to solve data integration problems more effectively and efficiently. In particular, we showed that users were able to find more errors in mapping rules in a given period of time.

Further, we were able to show that while conventional mapping technology requires a high level of expertise, the MappingAssistant approach significantly reduces the performance difference between experienced and inexperienced users.



In future work, we will focus on correcting the wrong integration rules.

Planned workflow:

1. Selection of a concept and marking of wrong instances
2. Feedback questions from the system to the user
3. Identification of the wrong integration rules
4. Selection and marking of an integration rule
5. Calculation of correction suggestions
6. Selection of a correction suggestion
7. Updating of the integration rule


End

… for your attention!
