2013 © Trivadis

BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN

Introduction to Ontologies

Technological challenges

Combining relational databases and ontologies

Author : Marc Lieber

AGENDA

1. Introduction to Semantic Web

2. Graph databases / Triple Stores

  • Overview
  • Oracle Graph databases
  • Franz AllegroGraph

3. Use cases

  • Novartis
  • Fraud detection

16-Oct-2013

Semantic Web or how to turn Big Data into actionable «Smart Data»

Semantic technologies

"Semantic technologies" generally refers to a broad spectrum of techniques for finding signal in large or complex data sources:

• Link analysis
• Distance
• Pattern
• Detect anomalies
• Complex search

Ontology Editing and Engineering

Semantic Web in Use

1. Industries include:

  • Life Science, Health care and Pharma
  • Energy sector, Oil & Gas
  • Google, Facebook, LinkedIn
  • Financial services
  • Digital libraries
  • Libraries & museums
  • Defense & Intelligence Service
  • eGovernment
  • Media, Sport (BBC, NFL)
  • Networks & Communication
  • Department Stores (Walmart)

W3C Semantic Web technologies

• Goes back a few years now
• Large set of specifications for many application domains
• RDF, RDFS, OWL, SKOS, SNOMED, etc.
• Google's schema.org initiative to federate the definition of ontologies
• Ontologies: FOAF (Friend of a Friend)
• Serialisation as N3 triples, RDF/XML, Turtle or RDFa (XHTML)

Graph DBs

1. Graph databases can be split into:

  • W3C Semantic Web databases, also called triple stores or RDF graph databases
  • General graph databases; property graphs and hypergraphs are the two main types

2. Triple stores store the relationships between nodes and their properties as triples or quads

3. Property graphs store the relationships between nodes and the properties of each node separately

4. Some databases, such as AllegroGraph, can be considered both a W3C Semantic Web database and a property graph database, since they support graph traversals as well as the W3C SPARQL query language

Property graphs and hypergraphs

1. In a property graph both nodes and links can have properties

[Figure: property graph in which accounts are linked by "pays" edges. Example properties shown: amount 2000, email [email protected], account# 96777543, lat/long 37.30|121.90, time 2013-01-01T12:12:12]
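As a rough illustration of "both nodes and links can have properties", here is a minimal Python sketch. The account identifiers, the second account number, and the choice of which properties sit on the node versus the edge are assumptions made for the example; the figure does not spell them out.

```python
# Minimal property-graph sketch: nodes and edges each carry a property dict.
# Node ids and the placement of properties are illustrative assumptions.
nodes = {
    "acct:A": {"account#": "96777543"},
    "acct:B": {"account#": "hypothetical-0002"},
}

edges = [
    {
        "src": "acct:A",
        "label": "pays",
        "dst": "acct:B",
        # Properties attached to the link itself, not to either node.
        "props": {
            "amount": 2000,
            "time": "2013-01-01T12:12:12",
            "lat_long": (37.30, 121.90),
        },
    },
]


def outgoing(node_id, label):
    """Return all edges with the given label that leave node_id."""
    return [e for e in edges if e["src"] == node_id and e["label"] == label]


# Edge properties can be filtered directly during a traversal.
large = [e for e in outgoing("acct:A", "pays") if e["props"]["amount"] > 1000]
```

In a triple store the same edge properties would have to be attached through an extra node or a named graph, which is one reason the two models are usually presented separately.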

Resource Description Framework Graphs

• URIs are used to identify Resources, entities, relationships, concepts

• Creates Subject-Property-Object “triples”

• Properties of subjects are triples

RDF Triples

• RDF as core data format

  Uniform structure to represent data (triples):

  [subject]   [predicate]    [object]
  JFK         president of   the United States
  [resource]  [property]     [value]

  quad = triple + named graph, quint = quad + technical ID (rowid)

• Use of namespaces to differentiate terms

• Some are predefined, but you can create your own namespaces

[Figure: JFK --PresidentOf--> The United States]

<http://www.world.org/celibrity#JFK> <http://www.w3.org/2000/01/rdf-schema#label> "John Fitzgerald Kennedy"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://www.world.org/airport#JFK> <http://www.world.org/airport#isLocatedIn> "New York City" .
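The triple and quad structure above can be sketched in a few lines of Python. This is an illustration only, not a full N-Triples or N-Quads writer: the `Quad` type and the `uri`, `literal` and `serialize` helpers are hypothetical names, and the named-graph URI in the usage note is made up.

```python
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """A triple plus an optional named graph (quad = triple + named graph)."""
    subject: str
    predicate: str
    obj: str                      # URI or literal, already in N-Triples syntax
    graph: Optional[str] = None   # None means a plain triple


# Namespaces differentiate terms: the airport JFK is not the person JFK.
AIRPORT = "http://www.world.org/airport#"


def uri(namespace, local):
    """Build a full URI reference from a namespace and a local name."""
    return f"<{namespace}{local}>"


def literal(text):
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def serialize(q):
    """Render one statement as a line: triple form, or quad form if a graph is set."""
    parts = [q.subject, q.predicate, q.obj]
    if q.graph is not None:
        parts.append(q.graph)
    return " ".join(parts) + " ."


t = Quad(uri(AIRPORT, "JFK"), uri(AIRPORT, "isLocatedIn"), literal("New York City"))
line = serialize(t)
```

Calling `serialize(t._replace(graph="<http://example.org/g1>"))` yields the same statement with the graph name appended as a fourth term.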

Data migration : Where do triples come from?

1. Relational storage

ID    Name   Hiredate    Job      Salary  Deptno
7982  Scott  12-02-1998  Clerk    4800    30
7855  Adams  27-09-2001  Manager  7500    30

2. Equivalent in triples

Subject        Predicate          Object
<...emp:7982>  rdfs:label         Scott       xsd:string
<...emp:7982>  <..HR#Hiredate>    12-02-1998  xsd:date
<...emp:7982>  <..HR#hasJob>      Clerk       xsd:string
<...emp:7982>  <..HR#HasSalary>   4800        xsd:int
<...emp:7982>  <..HR#worksIn>     <...dept:30>
<...dept:30>   rdfs:label         Sales       xsd:string
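The row-to-triples mapping above could be sketched as follows. The base URIs are hypothetical, since the slide elides the real prefixes, and the sketch assumes the dates are day-month-year, converting them to the ISO form that `xsd:date` expects.

```python
from datetime import datetime

# Hypothetical base URIs: the slide abbreviates the real ones as "<...emp:7982>".
EMP = "http://example.org/emp/"
DEPT = "http://example.org/dept/"
HR = "http://example.org/HR#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Column name -> (predicate URI, XSD datatype), following the slide's mapping.
COLUMN_MAP = {
    "Name": (RDFS_LABEL, "string"),
    "Hiredate": (HR + "Hiredate", "date"),
    "Job": (HR + "hasJob", "string"),
    "Salary": (HR + "HasSalary", "int"),
}


def row_to_triples(row):
    """Turn one employee row into (subject, predicate, object) triples."""
    subject = f"<{EMP}{row['ID']}>"
    triples = []
    for column, (predicate, datatype) in COLUMN_MAP.items():
        value = row[column]
        if datatype == "date":
            # Assumed input format DD-MM-YYYY; xsd:date needs YYYY-MM-DD.
            value = datetime.strptime(value, "%d-%m-%Y").date().isoformat()
        triples.append((subject, f"<{predicate}>", f'"{value}"^^<{XSD}{datatype}>'))
    # The foreign key becomes a link to another resource, not a literal.
    triples.append((subject, f"<{HR}worksIn>", f"<{DEPT}{row['Deptno']}>"))
    return triples


scott = {"ID": 7982, "Name": "Scott", "Hiredate": "12-02-1998",
         "Job": "Clerk", "Salary": 4800, "Deptno": 30}
triples = row_to_triples(scott)
```

The primary key becomes the subject URI, each remaining column becomes one typed literal, and the department number becomes an edge to a department resource.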

Databases – Market Overview

The database world is changing rapidly

NoSQL databases are often used in conjunction with Big Data

Graph databases can be split into W3C Semantic Web databases and others

Triple store comparison

Triple store | Scalability (billion triples, BT) | Query | Reasoning support | Full-text search support | Programming
Jena (TDB) | up to 1.7 BT | SPARQL 1.1 | OWL, RDFS | Yes (Lucene integration) | Java
Sesame | Millions of triples | SPARQL 1.1 | RDFS | Yes (through Lucene SAIL) | Java
OpenLink Virtuoso | 15.4 BT | SPARQL 1.1 | RDFS, subsets of OWL | Yes | Java
Oracle | >500 billion triples | SPARQL 1.0 (11g), SPARQL 1.1 (12c), SEM_MATCH, SEM_RELATED | RDFS, OWL, OWLIM, SKOS, SNOMED | Yes (Oracle Text) | Java, SQL, PL/SQL
OWLIM | 20 BT | SPARQL 1.1 | RDFS, OWL, OWLIM | Yes | Java
AllegroGraph | >500 billion triples | SPARQL 1.1, Prolog | RDFS, Prolog rules | Yes | Java, LISP, Python, Ruby, C#
4store | 15 BT | SPARQL 1.1 | RDFS | Yes | Java
BigData | over 10 BT | SPARQL 1.1 | RDFS, OWL Lite | Internal, external through Lucene | Java
Urika (YarcData) | Trillions | SPARQL 1.1 | RDFS | Yes | Java, Python
Anzo Cambridge | unknown | SPARQL 1.1 | RDFS, OWL | Yes (Information Mining) |

SPARQL Protocol and RDF Query Language

• Latest version: 1.1
• SELECT returns all, or a subset of, the variables bound in a query pattern match
• CONSTRUCT returns triples
• ASK returns a boolean
• DESCRIBE asks for triples that describe a particular resource
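To make the difference between the four query forms concrete, here is a toy Python sketch that matches a single triple pattern against an in-memory set of triples. It is not a SPARQL engine: real queries combine many patterns, filters and optional parts, and the result of DESCRIBE is left to the implementation. The `ex:` terms and all function names are assumptions for illustration.

```python
# Toy data set; the "ex:" prefixed names are illustrative only.
TRIPLES = {
    ("ex:JFK", "ex:presidentOf", "ex:UnitedStates"),
    ("ex:JFK", "rdfs:label", "John Fitzgerald Kennedy"),
    ("ex:JFKAirport", "ex:isLocatedIn", "ex:NewYorkCity"),
}


def is_var(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")


def match(pattern, triples=TRIPLES):
    """Yield one {variable: value} binding per triple that fits the pattern."""
    for triple in sorted(triples):
        binding = {}
        for term, value in zip(pattern, triple):
            if is_var(term):
                # A variable used twice must bind to the same value both times.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def select(variables, pattern):
    """SELECT: return the requested variables for every match."""
    return [tuple(b[v] for v in variables) for b in match(pattern)]


def ask(pattern):
    """ASK: report whether at least one match exists."""
    return next(match(pattern), None) is not None


def construct(template, pattern):
    """CONSTRUCT: build new triples by filling a template from each match."""
    return {tuple(b.get(t, t) for t in template) for b in match(pattern)}


def describe(resource):
    """DESCRIBE (one possible reading): all triples mentioning the resource."""
    return {t for t in TRIPLES if resource in (t[0], t[2])}
```

For example, `select(["?o"], ("ex:JFK", "ex:presidentOf", "?o"))` returns the bound objects, while `construct(("?o", "ex:hasPresident", "?s"), ("?s", "ex:presidentOf", "?o"))` produces the inverted triples.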

SPARQL compared to SQL

A SPARQL query of this type would be quite difficult to translate into SQL queries :

Inferencing / Reasoning

Inferencing is the ability to make logical deductions based on ontology rules.

• The reasoning tools use the rules defined in the RDF model (RDFS, OWL, SKOS, …) to detect new properties and new relationships.

The ability to draw inferences from existing data using the precision and rigor of mathematical logic is probably the most important property that distinguishes semantic data from other data.

• Example of use: LinkedIn or Facebook discovering new links between persons

Reasoning example

Graph representation and data modelling

Reasoning builds the missing relation

Can take time…

Some DBs do it on the fly or materialize the generated triples
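A small Python sketch of what "materializing the generated triples" can mean. It applies only two RDFS entailment rules (subclass transitivity and type inheritance) by naive forward chaining; production reasoners cover far more rules and use much more efficient algorithms. The class and instance names are made up.

```python
def materialize(triples):
    """Apply two RDFS rules repeatedly until no new triple appears."""
    inferred = set(triples)
    while True:
        new = set()
        subclass = [(s, o) for s, p, o in inferred if p == "rdfs:subClassOf"]
        # Rule rdfs11: subClassOf is transitive.
        for a, b in subclass:
            for c, d in subclass:
                if b == c:
                    new.add((a, "rdfs:subClassOf", d))
        # Rule rdfs9: an instance of a class is an instance of its superclasses.
        for s, p, o in inferred:
            if p == "rdf:type":
                for c, d in subclass:
                    if o == c:
                        new.add((s, "rdf:type", d))
        if new <= inferred:
            return inferred          # fixed point reached
        inferred |= new


facts = {
    ("ex:Manager", "rdfs:subClassOf", "ex:Employee"),
    ("ex:Employee", "rdfs:subClassOf", "ex:Person"),
    ("ex:Adams", "rdf:type", "ex:Manager"),
}
closure = materialize(facts)
```

Storing `closure` corresponds to materialization: queries get faster, but the inferred triples must be recomputed when the facts or rules change. Running the same rules at query time is the "on the fly" alternative.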

LOD: Linked Open Data Initiative

Semantic Web query federation

• Searching multiple

Semantic Web in relation to Big Data, or how to transform Big Data into Smart Data

• Sample vs. All
• Clean vs. Dirty
• Many Undiscovered causation (Why) vs. Correlation
• Table vs. Graph
• Planned Path vs. Discovery

Data Science example using R and SPARQL

1. Extracts data from http://spatial.linkedscience.org and represents the

Linked Data in Enterprise

[Diagram: a semantic graph model (W3C RDF metadata model) shown as an index over data sources and types: content management, BI server, data warehouse, machine-generated data, transaction systems, Hadoop, appliance, subscription services, human-sourced information, social media, event server, data servers]

Franz Corp. Allegrograph

1. AllegroGraph is licensed under a proprietary commercial license

2. Focuses on high scalability

3. Development languages: Java, Python or LISP

4. Alternative to SPARQL queries: Prolog

5. RESTful HTTP protocol to maintain triples in the DB

Oracle Spatial & Graph

1. The Oracle RDF triple store is embedded in the relational database

  • Schema MDSYS contains the RDF_LINK$ and RDF_VALUE$ tables
  • SPARQL 1.1 supported in 12c
  • Native support of most of the W3C rules
  • Use of named graphs (quads) since 11.2.0.3
  • Scales up to hundreds of billions of triples
  • Oracle-specific adapters available for Jena, Sesame, TopBraid, Protégé and

Oracle Spatial & Graph: other features

1. Support of temporal reasoning and spatial reasoning

2. Fine-grained security at triple level and for inferred graphs

3. The Oracle reasoner persists the inferred triples in the DB. As an alternative, integration with Pellet or TrOWL as an external OWL 2 reasoner

4. Jena and Sesame adapters

  1. To build SPARQL endpoints
  2. Bulk load triples from Java
  3. Develop applications in Java

5. Integration with OBIEE, RDF browser

SPARQL and "SPARQL in SQL" Architecture

[Diagram: Jena API and Jena Adapter; Sesame API and Sesame Adapter; standard SPARQL endpoint, enhanced with query management control; access via SQL, Java and HTTP; SPARQL-to-SQL translation logic; SEM_MATCH]

ORACLE Database RDF Query engine

Can be joined with any other relational table or view

W3C R2RML

Oracle Spatial and Graph 12c can represent a relational schema as a graph view

Integrate content from distributed sources

Federate distributed databases

Apply SPARQL queries on tables, views, SQL query results

No duplication of data and storage

Relational to RDF Modeling

RDF Graph support in Oracle NoSQL Database Enterprise Edition

High-performance key-value store

Standard access to graph data: SPARQL 1.1, Jena & Joseki, SPARQL endpoint Web Services

Massive horizontal scalability: petabytes of triples

Support for World Wide Web Consortium (W3C) Semantic Web standards

Graph Feature for NoSQL

Graph Support on Oracle NoSQL

Novartis Institutes for BioMedical Research (NIBR)

Use case: project Metastore

NIBR is the global pharmaceutical organization for Novartis, committed to discovering innovative medicines to treat diseases with high unmet medical need

6000+ scientists, physicians, business professionals worldwide

METASTORE is a scientific knowledge portal used by many applications to search over ontology-oriented data

• Organized around scientific concept types: genes, proteins, indications, anatomy, diseases, taxonomy, etc.

  • Can be hierarchically organized and classified
  • Builds a semantic network of scientific concepts

Solution implemented : Oracle Spatial & Graph

1. Accessible through dedicated service layer and reusable widgets

Use case Fraud detection

A real world fraud detection example

• Find any circle of payments between accounts that all happened within 10 miles of San Jose within the last day and where the payments > $1000

• Requires:

  • Graph analytics
  • Temporal reasoning
  • Geospatial reasoning
  • Social network analysis
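The following Python sketch shows how the three ingredients (amount filter, temporal filter, geospatial filter) could feed a cycle search over the payment graph. It is a plain in-memory illustration, not how a triple store evaluates such a query; the payment record layout, the account names and the reference time are all assumptions.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

SAN_JOSE = (37.3382, -121.8863)   # latitude, longitude in degrees


def miles_between(a, b):
    """Great-circle distance in miles between two (lat, lon) points (haversine)."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 3958.8 * asin(sqrt(h))


def suspicious(payments, now, centre=SAN_JOSE, radius=10, min_amount=1000):
    """Keep payments passing the amount, time-window and distance filters."""
    return [p for p in payments
            if p["amount"] > min_amount
            and timedelta(0) <= now - p["time"] <= timedelta(days=1)
            and miles_between(p["where"], centre) <= radius]


def find_cycles(payments):
    """Return each payment circle once, as a list starting at its smallest account."""
    graph = {}
    for p in payments:
        graph.setdefault(p["src"], set()).add(p["dst"])
    cycles = []

    def walk(start, node, path):
        for nxt in sorted(graph.get(node, ())):
            if nxt == start:
                cycles.append(path[:])
            elif nxt > start and nxt not in path:
                # Only visit accounts larger than the start to avoid duplicates.
                walk(start, nxt, path + [nxt])

    for start in sorted(graph):
        walk(start, start, [start])
    return cycles


now = datetime(2013, 1, 2, 12, 0, 0)   # hypothetical "current" time
payments = [
    {"src": "A", "dst": "B", "amount": 2000, "time": datetime(2013, 1, 1, 12, 12, 12), "where": (37.30, -121.90)},
    {"src": "B", "dst": "C", "amount": 1500, "time": datetime(2013, 1, 1, 18, 0, 0), "where": (37.33, -121.89)},
    {"src": "C", "dst": "A", "amount": 3000, "time": datetime(2013, 1, 2, 9, 0, 0), "where": (37.35, -121.95)},
    {"src": "C", "dst": "D", "amount": 500, "time": datetime(2013, 1, 2, 9, 30, 0), "where": (37.34, -121.88)},
    {"src": "D", "dst": "A", "amount": 5000, "time": datetime(2013, 1, 2, 8, 0, 0), "where": (37.7749, -122.4194)},
]
circles = find_cycles(suspicious(payments, now))
```

Filtering first and searching for cycles second keeps the graph small; the small payment from C to D and the payment made in San Francisco are dropped before the search, so only the A, B, C circle remains.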

Social Network Analysis answers 4 questions

How far is P1 from P2 and how strong is the relation?

To what groups does this person belong (ego groups, cliques)?

How important is this person in the group?

Does this group have a leader,

Activity recognition

Find all meetings that happened in November within 5 miles of Berkeley that were attended by the most important person among Jans' friends and friends of friends.

(select (?x)
  (ego-group person:jans knows ?group 2)                 ; SNA
  (actor-centrality-members ?group knows ?x ?num)        ; SNA
  (q ?event fr:actor ?x)                                 ; DB lookup
  (qs ?event rdf:type fr:Meeting)                        ; RDFS
  (interval-during ?event "2008-11-01" "2008-11-06")     ; Temporal
  (geo-box-around geoname:Berkeley ?event 5 miles)       ; Spatial
  !)
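The two social-network steps of the query above (build the ego group, then rank its members) can be approximated in plain Python. This is only a sketch: the "knows" data is invented, the ego is included in its own group, and simple degree centrality stands in for whatever measure `actor-centrality-members` actually computes, which the slide does not specify.

```python
from collections import deque

# Hypothetical "knows" relation, stored symmetrically.
KNOWS = {
    "jans": {"anna", "bob"},
    "anna": {"jans", "bob", "carl"},
    "bob": {"jans", "anna"},
    "carl": {"anna", "dora"},
    "dora": {"carl"},
}


def ego_group(graph, ego, depth):
    """Breadth-first search: everyone within `depth` hops of `ego`, ego included."""
    hops = {ego: 0}
    queue = deque([ego])
    while queue:
        person = queue.popleft()
        if hops[person] == depth:
            continue                      # do not expand beyond the depth limit
        for friend in graph.get(person, ()):
            if friend not in hops:
                hops[friend] = hops[person] + 1
                queue.append(friend)
    return set(hops)


def degree_centrality(graph, group):
    """Count, for each member, the ties that stay inside the group."""
    return {p: len(graph.get(p, set()) & group) for p in group}


group = ego_group(KNOWS, "jans", 2)       # friends and friends of friends
scores = degree_centrality(KNOWS, group)
most_important = max(sorted(scores), key=scores.get)
```

The remaining clauses of the query would then filter events by actor, type, time interval and location for the person selected here.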

Fraud detection example using SPARQL

Find the circle

Inspect the property graph

Geo Temporal

Find any circle of payments between accounts that all happened within 10 miles of San Jose within the last day and where the payments > $1000

Conclusion : Why should you choose Semantic Web?

1. You want a flexible, adaptable, transparent information architecture

2. The project requires complex structures and a large number of relations between classes as well as properties

3. The project requires integration of data from different sources

4. You have heterogeneous sets of metadata and vocabulary concepts originating from multiple sources

5. You need semantic annotations using controlled vocabularies and thesauri such as FOAF, OWL, SKOS, etc.

6. You need to make logical deductions based on rules

THANK YOU.

Marc Lieber

[email protected] www.trivadis.com
