2013 © Trivadis
BASEL BERN LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MÜNCHEN STUTTGART WIEN
Introduction to Ontologies
Technological challenges: combining relational databases and ontologies
Author: Marc Lieber
AGENDA
1. Introduction to Semantic Web
2. Graph databases / triple stores
   • Overview
   • Oracle graph databases
   • Franz AllegroGraph
3. Use cases
   • Novartis
   • Fraud detection
16-Oct-2013
Semantic Web or how to turn Big Data into actionable «Smart Data»
Semantic technologies
1. “Semantic technologies” generally refers to a broad spectrum of techniques for finding signal in large or complex data sources, such as:
• Link analysis
• Distance analysis
• Pattern matching
• Anomaly detection
• Complex search
Ontology Editing and Engineering
Semantic Web in Use
1. Industries include:
• Life science, health care and pharma
• Energy sector, oil & gas
• Google, Facebook, LinkedIn
• Financial services
• Digital libraries
• Libraries & museums
• Defense & intelligence services
• eGovernment
• Media, sport (BBC, NFL)
• Networks & communication
• Department stores (Walmart)
W3C Semantic Web technologies
• Have been around for a number of years now
• Large set of specifications for many application domains
• RDF, RDFS, OWL, SKOS, etc., plus domain vocabularies such as SNOMED
• Google’s schema.org initiative (together with Bing and Yahoo!) to federate the definition of shared vocabularies
• Example ontology: FOAF (Friend of a Friend)
• Serialization formats: N3, N-Triples, RDF/XML, Turtle or RDFa (embedded in XHTML)
Graph DBs
1. Graph databases can be split into:
• W3C Semantic Web databases, also called triple stores or RDF graph databases
• General graph databases, whose two main types are property graphs and hypergraphs
2. Triple stores store both the relationships between nodes and the node properties as triples or quads
3. Property graphs store the relationships between nodes separately from the properties of each node
4. Some databases, such as AllegroGraph, can be considered both a W3C Semantic Web database and a property graph database, since they support graph traversals as well as the W3C SPARQL query language
Property graphs and hypergraphs
1. In a property graph both nodes and links can have properties
[Figure: accounts connected by “pays” links; a link carries properties such as amount 2000 and time 2013-01-01T12:12:12, and an account node carries properties such as email, account# 96777543 and lat/long 37.30|121.90]
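To make the difference between the two storage models concrete, here is a minimal Python sketch of the payment figure as a property graph, followed by one way to flatten it into triples. The `ex:` prefix, the second account number and the `ex:target` predicate are hypothetical. Turning each property-carrying link into its own node is only one common workaround (a plain triple has no slot for properties of the relationship itself); it is not how any particular product stores data.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    props: dict = field(default_factory=dict)

@dataclass
class Edge:
    src: str
    label: str
    dst: str
    props: dict = field(default_factory=dict)

# Hypothetical accounts and one "pays" link; property values echo the figure.
nodes = {
    "acc1": Node("acc1", {"accountNo": "96777543", "latlong": "37.30|121.90"}),
    "acc2": Node("acc2", {"accountNo": "12345678"}),  # invented account number
}
edges = [Edge("acc1", "pays", "acc2", {"amount": 2000, "time": "2013-01-01T12:12:12"})]

def to_triples(nodes, edges):
    """Flatten a property graph into (subject, predicate, object) triples.

    Node properties map directly to triples. A link that has its own
    properties becomes an intermediate node (ex:pays0, ex:pays1, ...),
    because a single triple cannot carry properties of the link itself.
    """
    triples = []
    for node in nodes.values():
        for key, value in node.props.items():
            triples.append((f"ex:{node.id}", f"ex:{key}", value))
    for i, edge in enumerate(edges):
        if not edge.props:
            # A bare link is just one triple
            triples.append((f"ex:{edge.src}", f"ex:{edge.label}", f"ex:{edge.dst}"))
            continue
        link = f"ex:{edge.label}{i}"
        triples.append((f"ex:{edge.src}", f"ex:{edge.label}", link))
        triples.append((link, "ex:target", f"ex:{edge.dst}"))  # hypothetical predicate
        for key, value in edge.props.items():
            triples.append((link, f"ex:{key}", value))
    return triples

for triple in to_triples(nodes, edges):
    print(triple)
```

The property graph keeps the amount and time on the link; in the triple version they hang off the extra `ex:pays0` node, which is the extra hop a triple store has to traverse.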
Resource Description Framework Graphs
• URIs are used to identify resources: entities, relationships and concepts
• Data is expressed as subject-property-object “triples”
• The properties of a subject are themselves expressed as triples
RDF Triples
• RDF as core data format
• Uniform structure to represent data (triples): [subject] [predicate] [object], also read as [resource] [property] [value]
  Example: JFK (subject) PresidentOf (predicate) The United States (object)
• quad = triple + named graph; quint = quad + technical ID (rowid)
• Namespaces are used to differentiate terms; some are predefined, but you can create your own
  Example: the same local name JFK identifies a person in one namespace and an airport in another:
<http://www.world.org/celibrity#JFK> <http://www.w3.org/2000/01/rdf-schema#label> "John Fitzgerald Kennedy"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://www.world.org/airport#JFK> <http://www.world.org/airport#isLocatedIn> "New York City" .
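The two JFK lines show why namespaces matter: the same local name `JFK` denotes a person in one namespace and an airport in another. The Python sketch below expands prefixed names into full IRIs and prints N-Triples lines, or N-Quads lines when a named graph is given (the quad case above). The prefix labels `celeb` and `airport` and the graph IRI are hypothetical, and literal escaping only covers quotes and backslashes, which is enough for these strings but not a full implementation of the N-Triples grammar.

```python
# Prefix labels are hypothetical; the IRIs are the ones on the slide
# ('celibrity' is kept exactly as spelled in the original URI).
PREFIXES = {
    "celeb": "http://www.world.org/celibrity#",
    "airport": "http://www.world.org/airport#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}

def iri(curie):
    """Expand a prefixed name such as 'airport:JFK' into a full IRI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local

def literal(value, datatype=None):
    """Quote a literal, optionally typed; only \\ and \" are escaped here."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    text = f'"{escaped}"'
    return f"{text}^^<{iri(datatype)}>" if datatype else text

def serialize(subject, predicate, obj, graph=None):
    """One N-Triples line, or an N-Quads line when a named graph is given."""
    parts = [f"<{iri(subject)}>", f"<{iri(predicate)}>", obj]
    if graph:
        parts.append(f"<{graph}>")
    return " ".join(parts) + " ."

print(serialize("celeb:JFK", "rdfs:label",
                literal("John Fitzgerald Kennedy", "xsd:string")))
print(serialize("airport:JFK", "airport:isLocatedIn", literal("New York City"),
                graph="http://www.world.org/graphs/airports"))  # hypothetical graph IRI
# Same local name, different namespaces: two distinct resources
print(iri("celeb:JFK") != iri("airport:JFK"))
```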
Data migration: where do triples come from?
1. Relational storage (one row per employee)
ID | Name | Hiredate | Job | Salary | Deptno
7982 | Scott | 12-02-1998 | Clerk | 4800 | 30
7855 | Adams | 27-09-2001 | Manager | 7500 | 30
2. Equivalent in triples (employee 7982 and department 30)
Subject | Predicate | Object
<...emp:7982> | rdfs:label | Scott (xsd:string)
<...emp:7982> | <..HR#Hiredate> | 12-02-1998 (xsd:date)
<...emp:7982> | <..HR#hasJob> | Clerk (xsd:string)
<...emp:7982> | <..HR#HasSalary> | 4800 (xsd:int)
<...emp:7982> | <..HR#worksIn> | <...dept:30>
<...dept:30> | rdfs:label | Sales (xsd:string)
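The row-to-triples mapping in the data migration example can be automated: each row becomes a subject, each column a predicate, and the foreign key `Deptno` becomes a link to a department resource instead of a plain number. A minimal Python sketch, assuming the hypothetical base IRIs `http://example.org/emp/`, `http://example.org/dept/` and `http://example.org/HR#` in place of the elided prefixes on the slide. Predicate names keep the slide's mixed capitalization, dates are read as DD-MM-YYYY, and literals are plain (value, datatype) pairs.

```python
from datetime import date

EMP = "http://example.org/emp/"     # hypothetical, stands in for <...emp:...>
DEPT = "http://example.org/dept/"   # hypothetical, stands in for <...dept:...>
HR = "http://example.org/HR#"       # hypothetical, stands in for <..HR#...>
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

emp_rows = [
    # (ID, Name, Hiredate, Job, Salary, Deptno), dates read as DD-MM-YYYY
    (7982, "Scott", date(1998, 2, 12), "Clerk", 4800, 30),
    (7855, "Adams", date(2001, 9, 27), "Manager", 7500, 30),
]
dept_rows = [(30, "Sales")]

def emp_to_triples(row):
    emp_id, name, hiredate, job, salary, deptno = row
    s = f"{EMP}{emp_id}"
    return [
        (s, RDFS_LABEL, (name, "xsd:string")),
        (s, HR + "Hiredate", (hiredate.isoformat(), "xsd:date")),
        (s, HR + "hasJob", (job, "xsd:string")),
        (s, HR + "HasSalary", (str(salary), "xsd:int")),
        # Foreign key becomes a link to another resource, not a literal
        (s, HR + "worksIn", f"{DEPT}{deptno}"),
    ]

def dept_to_triples(row):
    deptno, dname = row
    return [(f"{DEPT}{deptno}", RDFS_LABEL, (dname, "xsd:string"))]

triples = [t for r in emp_rows for t in emp_to_triples(r)]
triples += [t for r in dept_rows for t in dept_to_triples(r)]
for t in triples:
    print(t)
```

Because `worksIn` points at a resource rather than the number 30, the department can later be described, typed or linked further without touching the employee triples.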
Databases – Market Overview
The database world is changing rapidly
NoSQL databases are often used in conjunction with Big Data
Graph databases can be split into W3C Semantic Web databases (triple stores) and other graph databases
Triple stores comparison
Triple store | Scalability | Query | Reasoning support | Full-text search support | Programming
Jena (TDB) | up to 1.7 billion triples | SPARQL 1.1 | OWL, RDFS | Yes (Lucene integration) | Java
Sesame | millions of triples | SPARQL 1.1 | RDFS | Yes (through Lucene SAIL) | Java
OpenLink Virtuoso | 15.4 billion triples | SPARQL 1.1 | RDFS, subsets of OWL | Yes | Java
Oracle | >500 billion triples | SPARQL 1.0 (11g), SPARQL 1.1 (12c), SEM_MATCH, SEM_RELATED | RDFS, OWL, OWLIM, SKOS, SNOMED | Yes (Oracle Text) | Java, SQL, PL/SQL
OWLIM | 20 billion triples | SPARQL 1.1 | RDFS, OWL, OWLIM | Yes | Java
AllegroGraph | >500 billion triples | SPARQL 1.1, Prolog | RDFS, Prolog rules | Yes | Java, LISP, Python, Ruby, C#
4store | 15 billion triples | SPARQL 1.1 | RDFS | Yes | Java
BigData | over 10 billion triples | SPARQL 1.1 | RDFS, OWL Lite | Internal, external through Lucene | Java
Urika (YarcData) | trillions of triples | SPARQL 1.1 | RDFS | Yes | Java, Python
Anzo (Cambridge Semantics) | unknown | SPARQL 1.1 | RDFS, OWL | Yes (Information Mining) | (not given)
SPARQL Protocol and RDF Query Language
• Latest Version 1.1
• SELECT returns all, or a subset of, the variables bound in a query pattern match
• CONSTRUCT returns triples (an RDF graph built from a template)
• ASK returns a boolean
• DESCRIBE returns triples that describe a particular resource
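To illustrate what the four query forms return, here is a toy Python evaluator for a SPARQL basic graph pattern (a list of triple patterns in which terms starting with `?` are variables) over an in-memory list of triples. It is a teaching sketch only: no FILTER, OPTIONAL, property paths or query parsing, and the `ex:`/`foaf:` data is hypothetical. DESCRIBE is approximated as "all triples with the resource as subject", since the SPARQL 1.1 specification leaves the exact DESCRIBE output to the query service.

```python
DATA = [  # hypothetical triples
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:carol", "foaf:name", "Carol"),
]

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None
            new[p] = t
        elif p != t:
            return None
    return new

def solutions(bgp, data):
    """All variable bindings that satisfy every triple pattern in the BGP."""
    results = [{}]
    for pattern in bgp:
        extended = []
        for binding in results:
            for triple in data:
                new = match(pattern, triple, binding)
                if new is not None:
                    extended.append(new)
        results = extended
    return results

def select(variables, bgp, data):
    return [tuple(b[v] for v in variables) for b in solutions(bgp, data)]

def ask(bgp, data):
    return bool(solutions(bgp, data))

def construct(template, bgp, data):
    """Instantiate the template once per solution, skipping unbound variables."""
    out = set()
    for b in solutions(bgp, data):
        for tpl in template:
            triple = tuple(b.get(x, x) for x in tpl)
            if not any(is_var(x) for x in triple):
                out.add(triple)
    return out

def describe(resource, data):
    return [t for t in data if t[0] == resource]

# Friends of friends of Alice, with their names
fof = [("ex:alice", "foaf:knows", "?f"), ("?f", "foaf:knows", "?ff"),
       ("?ff", "foaf:name", "?name")]
print(select(["?ff", "?name"], fof, DATA))                 # [('ex:carol', 'Carol')]
print(ask([("ex:carol", "foaf:knows", "?x")], DATA))       # False
print(construct([("ex:alice", "ex:friendOfFriend", "?ff")], fof, DATA))
print(describe("ex:bob", DATA))
```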
SPARQL compared to SQL
A SPARQL query of this type would be quite difficult to translate into SQL:
Inferencing / Reasoning
• Inferencing is the ability to make logical deductions based on ontology rules.
• Reasoning tools use the rules defined in the RDF model (RDFS, OWL, SKOS, …) to detect new properties and new relationships.
• The ability to draw inferences from existing data using the precision and rigor of mathematical logic is probably the most important property that distinguishes semantic data from other data.
• Example of use: LinkedIn or Facebook discovering new links between persons
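As a concrete illustration of rule-based inference, the Python sketch below applies three RDFS entailment rules until no new triple appears: transitivity of rdfs:subClassOf, propagation of rdf:type along rdfs:subClassOf, and rdfs:subPropertyOf (roughly rules rdfs11, rdfs9 and rdfs7 of the RDF Semantics specification). The `ex:` data is hypothetical, and real reasoners use far more efficient algorithms than this naive fixpoint loop.

```python
TYPE, SUBCLASS, SUBPROP = "rdf:type", "rdfs:subClassOf", "rdfs:subPropertyOf"

def rdfs_closure(triples):
    """Naive forward chaining: apply the rules until a fixpoint is reached."""
    facts = set(triples)
    while True:
        new = set()
        for (a, p1, b) in facts:
            for (c, p2, d) in facts:
                # rdfs11: subClassOf is transitive
                if p1 == SUBCLASS and p2 == SUBCLASS and b == c:
                    new.add((a, SUBCLASS, d))
                # rdfs9: an instance of a subclass is an instance of the superclass
                if p1 == TYPE and p2 == SUBCLASS and b == c:
                    new.add((a, TYPE, d))
                # rdfs7: a statement using a subproperty implies the superproperty
                if p2 == SUBPROP and p1 == c:
                    new.add((a, d, b))
        if new <= facts:      # nothing new: fixpoint reached
            return facts
        facts |= new

data = {  # hypothetical ontology and instance data
    ("ex:Manager", SUBCLASS, "ex:Employee"),
    ("ex:Employee", SUBCLASS, "ex:Person"),
    ("ex:adams", TYPE, "ex:Manager"),
    ("ex:colleagueOf", SUBPROP, "foaf:knows"),
    ("ex:adams", "ex:colleagueOf", "ex:scott"),
}
for t in sorted(rdfs_closure(data) - data):
    print("inferred:", t)
```

The new link `ex:adams foaf:knows ex:scott` is exactly the kind of "new relationship between persons" mentioned above: it was never stated, only implied by the ontology.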
Reasoning example
• Graph representation and data modeling
• Reasoning builds the missing relation
• Can take time…
• Some databases compute the inferences on the fly; others materialize the generated triples
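The trade-off in the last bullet can be sketched in Python with a single transitive property (here a hypothetical `ex:partOf`). Materialization computes and stores every inferred triple once, so queries become simple lookups at the cost of load time and storage; on-the-fly (query-time) reasoning stores nothing extra and walks the graph for each query instead. Both must return the same answer, which the sketch checks.

```python
from collections import defaultdict

PART_OF = "ex:partOf"   # hypothetical transitive property
base = {                # hypothetical data; every triple uses ex:partOf
    ("ex:piston", PART_OF, "ex:engine"),
    ("ex:engine", PART_OF, "ex:car"),
    ("ex:car", PART_OF, "ex:fleet"),
}

def materialize(triples):
    """Forward chaining: compute and store the full transitive closure up front."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, _, b) in list(closure):
            for (c, _, d) in list(closure):
                if b == c and (a, PART_OF, d) not in closure:
                    closure.add((a, PART_OF, d))
                    changed = True
    return closure

def on_the_fly(triples, start):
    """Query-time reasoning: follow partOf links from `start` only when asked."""
    succ = defaultdict(set)
    for (a, _, b) in triples:
        succ[a].add(b)
    seen, stack = set(), [start]
    while stack:
        for nxt in succ[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

stored = materialize(base)
answer_materialized = {o for (s, _, o) in stored if s == "ex:piston"}
answer_on_the_fly = on_the_fly(base, "ex:piston")
print(sorted(answer_materialized))   # ['ex:car', 'ex:engine', 'ex:fleet']
print(answer_materialized == answer_on_the_fly, len(stored) - len(base))   # True 3
```

Three base triples already yield three extra stored triples; on long chains the materialized closure grows much faster than the data, which is why "can take time" applies to both strategies, just at different moments.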
LOD: Linked Open Data Initiative
Semantic Web query federation
• Searching multiple
Semantic Web in relation to Big Data
or how to transform Big Data into Smart Data
• Sample vs. all
• Clean vs. dirty
• (Many undiscovered) causation (why) vs. correlation
• Table vs. graph
• Planned path vs. discovery
Data Science example using R and SPARQL
1. Extracts data from http://spatial.linkedscience.org and represents the
Linked Data in Enterprise
[Figure: a semantic graph model (W3C RDF metadata model) links enterprise data sources and types: index, content management, BI server, data warehouse, transaction systems, Hadoop appliance, subscription services, event server, data servers, machine-generated data, human-sourced information and social media]
Franz Inc. AllegroGraph
1. AllegroGraph is distributed under a proprietary commercial license
2. Focuses on high scalability
3. Development languages: Java, Python or Lisp
4. Alternative to SPARQL queries: Prolog
5. RESTful HTTP protocol to maintain triples in the database
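AllegroGraph's HTTP interface follows the Sesame-style REST protocol. Assuming that, statements are added with a POST to `/repositories/{name}/statements` and SPARQL queries are sent to `/repositories/{name}`. The Python sketch below only builds the two requests with the standard library and sends nothing; the host, port 10035, the repository name `demo` and the `text/plain` content type for N-Triples are assumptions to check against the HTTP protocol documentation of your AllegroGraph version.

```python
import urllib.parse
import urllib.request

# Assumed server location and repository name (placeholders, not defaults you can rely on)
BASE = "http://localhost:10035/repositories/demo"

def add_statements_request(ntriples: str) -> urllib.request.Request:
    """Build (but do not send) a request that adds N-Triples to the repository."""
    return urllib.request.Request(
        BASE + "/statements",
        data=ntriples.encode("utf-8"),
        headers={"Content-Type": "text/plain"},  # assumed media type for N-Triples
        method="POST",
    )

def query_request(sparql: str) -> urllib.request.Request:
    """Build (but do not send) a SPARQL query request asking for JSON results."""
    url = BASE + "?" + urllib.parse.urlencode({"query": sparql})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}, method="GET")

req = add_statements_request('<http://example.org/a> <http://example.org/p> "x" .\n')
print(req.get_method(), req.full_url)
q = query_request("SELECT * WHERE { ?s ?p ?o } LIMIT 10")
print(q.get_method(), q.full_url)
# Sending would be urllib.request.urlopen(req) against a running server,
# plus whatever authentication the installation requires.
```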
Oracle Spatial and Graph
1. The Oracle RDF triple store is embedded in the relational database
• The MDSYS schema contains the RDF_LINK$ and RDF_VALUE$ tables
• SPARQL 1.1 supported in 12c
• Native support for most of the W3C rules
• Use of named graphs (quads) since 11.2.0.3
• Scales up to hundreds of billions of triples
• Oracle-specific adapters available for Jena, Sesame, TopBraid, Protégé and
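Storing each distinct string once in a value table and the triples themselves as rows of numeric IDs in a link table is a common triple-store design (dictionary encoding), and the RDF_VALUE$ / RDF_LINK$ split suggests this approach. The Python sketch below illustrates the idea with in-memory structures; the `model` field loosely mirrors grouping triples per semantic model, and all names and the layout are simplified assumptions that do not reproduce Oracle's actual MDSYS tables.

```python
class TripleStore:
    """Toy dictionary-encoded store: a value table plus a link table of IDs."""

    def __init__(self):
        self.value_id = {}   # lexical value -> id (role of a value table)
        self.values = []     # id -> lexical value
        self.links = set()   # (model, s_id, p_id, o_id) (role of a link table)

    def _encode(self, term):
        if term not in self.value_id:
            self.value_id[term] = len(self.values)
            self.values.append(term)
        return self.value_id[term]

    def add(self, model, s, p, o):
        self.links.add((model, self._encode(s), self._encode(p), self._encode(o)))

    def match(self, model, s=None, p=None, o=None):
        """Yield decoded triples of `model` matching the given terms (None = any)."""
        # Unknown terms get id -1, which can never match a stored id
        want = [None if t is None else self.value_id.get(t, -1) for t in (s, p, o)]
        for m, *ids in self.links:
            if m == model and all(w is None or w == i for w, i in zip(want, ids)):
                yield tuple(self.values[i] for i in ids)

store = TripleStore()   # hypothetical HR data
store.add("hr", "emp:7982", "rdfs:label", "Scott")
store.add("hr", "emp:7855", "rdfs:label", "Adams")
store.add("hr", "emp:7982", "hr:worksIn", "dept:30")
store.add("hr", "emp:7855", "hr:worksIn", "dept:30")
print(len(store.values))   # each distinct string is stored only once
print(sorted(store.match("hr", p="hr:worksIn")))
```

Joins on the link table then compare small integers instead of long URI strings, which is one reason this layout scales well inside a relational engine.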
Oracle Spatial and Graph: other features
1. Support for temporal reasoning and spatial reasoning
2. Fine-grained security at triple level and for inferred graphs
3. The Oracle reasoner persists the inferred triples in the database. As an alternative, Pellet or TrOWL can be integrated as an external OWL 2 reasoner
4. Jena and Sesame adapters, used to:
   1. build SPARQL endpoints
   2. bulk load triples from Java
   3. develop applications in Java
5. Integration with OBIEE, RDF browser
SPARQL and “SPARQL in SQL” Architecture
[Figure: access paths via Java (Jena API with the Jena Adapter, Sesame API with the Sesame Adapter), HTTP (standard SPARQL endpoint enhanced with query management control) and SQL (SEM_MATCH), with SPARQL-to-SQL translation logic in front of the Oracle Database RDF query engine]
SEM_MATCH results can be joined with any other relational table or view.
W3C R2RML
Oracle Spatial and Graph 12c can represent a relational schema as an RDF graph view:
• Integrate content from distributed sources
• Federate distributed databases
• Apply SPARQL queries on tables, views and SQL query results
• No duplication of data or storage
Relational to RDF modeling
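The "no duplication" point works because an R2RML view generates triples from rows on demand, following a declarative mapping, instead of copying them into a triple store. The Python sketch below mimics this with a mapping loosely modelled on R2RML's subject template (rr:template) and predicate-object maps with column- or template-valued objects. It is not an R2RML processor (real mappings are written in Turtle and produce typed literals), and the table, column names and `http://example.org/` IRIs are hypothetical.

```python
# A hypothetical EMP table, as a relational source would return it.
EMP_ROWS = [
    {"EMPNO": 7982, "ENAME": "Scott", "JOB": "Clerk", "DEPTNO": 30},
    {"EMPNO": 7855, "ENAME": "Adams", "JOB": "Manager", "DEPTNO": 30},
]

# Mapping loosely modelled on an R2RML TriplesMap (keys are illustrative).
EMP_MAPPING = {
    "subject_template": "http://example.org/emp/{EMPNO}",
    "predicate_object_maps": [
        ("http://www.w3.org/2000/01/rdf-schema#label", {"column": "ENAME"}),
        ("http://example.org/HR#hasJob", {"column": "JOB"}),
        ("http://example.org/HR#worksIn", {"template": "http://example.org/dept/{DEPTNO}"}),
    ],
}

def virtual_triples(rows, mapping):
    """Yield triples lazily from rows; nothing is materialized or stored."""
    for row in rows:
        subject = mapping["subject_template"].format(**row)
        for predicate, obj_map in mapping["predicate_object_maps"]:
            if "column" in obj_map:
                obj = row[obj_map["column"]]              # literal from a column
            else:
                obj = obj_map["template"].format(**row)   # IRI built from a template
            yield (subject, predicate, obj)

def query(rows, mapping, predicate, obj):
    """Answer one triple pattern (?s predicate obj) directly over the rows."""
    return [s for (s, p, o) in virtual_triples(rows, mapping)
            if p == predicate and o == obj]

print(query(EMP_ROWS, EMP_MAPPING,
            "http://example.org/HR#worksIn", "http://example.org/dept/30"))
```

If a row changes in the source table, the next query sees it immediately, since the triples are regenerated at query time rather than kept as a second copy.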
RDF Graph support in Oracle NoSQL Database Enterprise Edition
• High-performance key-value store
• Standard access to graph data: SPARQL 1.1, Jena & Joseki SPARQL endpoint, web services
• Massive horizontal scalability: petabytes of triples
• Support for World Wide Web Consortium (W3C) Semantic Web standards
[Figure: Graph feature for NoSQL / graph support on Oracle NoSQL]
Novartis Institutes for BioMedical Research (NIBR)
Use case: project METASTORE
• NIBR is the global pharmaceutical research organization of Novartis, committed to discovering innovative medicines to treat diseases with high unmet medical need
• 6000+ scientists, physicians and business professionals worldwide
• METASTORE is a scientific knowledge portal used by many applications to search over ontology-oriented data
• Organized around scientific concept types: genes, proteins, indications, anatomy, diseases, taxonomy, etc.
• Concepts can be hierarchically organized and classified
• Builds a semantic network of scientific concepts
Solution implemented: Oracle Spatial and Graph
1. Accessible through dedicated service layer and reusable widgets
Use case: fraud detection
A real-world fraud detection example
• Find any circle of payments between accounts that all happened within 10 miles of San Jose within the last day and where the payments exceed $1000
• Requires:
  • Graph analytics
  • Temporal reasoning
  • Geospatial reasoning
  • Social network analysis
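A minimal Python sketch of the query above, written in plain Python rather than SPARQL, combines the ingredients listed: a temporal filter (last 24 hours), a geospatial filter (haversine distance to San Jose), an amount threshold, and a graph search for circles among the surviving payment links. The accounts, payments, the San Jose coordinates (approximately 37.3382, -121.8863), the reference time `now` and the reading that every payment in the circle must exceed $1000 are assumptions for illustration.

```python
import math
from datetime import datetime, timedelta

SAN_JOSE = (37.3382, -121.8863)   # approximate city-centre coordinates (assumption)
EARTH_RADIUS_MILES = 3958.8

def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))

# (payer, payee, amount in $, timestamp, (lat, lon)): hypothetical data
payments = [
    ("A", "B", 2000, datetime(2013, 1, 1, 12, 12, 12), (37.30, -121.90)),
    ("B", "C", 1500, datetime(2013, 1, 1, 14, 0, 0), (37.33, -121.89)),
    ("C", "A", 1800, datetime(2013, 1, 1, 20, 30, 0), (37.35, -121.95)),
    ("C", "D", 5000, datetime(2013, 1, 1, 21, 0, 0), (37.77, -122.42)),  # San Francisco
    ("D", "A", 900, datetime(2013, 1, 1, 22, 0, 0), (37.31, -121.88)),   # too small
]

def suspicious_edges(payments, now, radius=10, min_amount=1000):
    """Keep payments > min_amount, made in the last 24 h, within `radius` miles."""
    graph = {}
    for src, dst, amount, ts, loc in payments:
        if (amount > min_amount and now - timedelta(days=1) <= ts <= now
                and haversine_miles(loc, SAN_JOSE) <= radius):
            graph.setdefault(src, set()).add(dst)
    return graph

def find_cycles(graph):
    """DFS for simple directed cycles; each is reported once, from its smallest node."""
    cycles = []
    def dfs(start, node, path):
        for nxt in sorted(graph.get(node, ())):
            if nxt == start:
                cycles.append(path + [start])
            elif nxt not in path and nxt > start:
                dfs(start, nxt, path + [nxt])
    for start in sorted(graph):
        dfs(start, start, [start])
    return cycles

now = datetime(2013, 1, 2, 0, 0, 0)   # hypothetical "current" time
print(find_cycles(suspicious_edges(payments, now)))   # [['A', 'B', 'C', 'A']]
```

Filtering edges first keeps the cycle search small; a graph database would push the same temporal and geospatial constraints into the query so that traversal only touches qualifying payments.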
Social Network Analysis answers 4 questions
• How far is P1 from P2, and how strong is the relation?
• To what groups does this person belong (ego groups, cliques)?
• How important is this person in the group?
• Does this group have a leader?
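The first three questions map onto classic graph measures. The Python sketch below answers them on a small undirected "knows" graph: breadth-first search for the number of hops between two people (tie strength is left out, since the slide does not define it), an ego group as everyone within k hops of a person (excluding the person), and degree centrality inside that group as a simple importance score. The people and links are hypothetical, and degree centrality is only one of several centrality measures (betweenness and closeness are common alternatives).

```python
from collections import deque

KNOWS = [("jans", "ann"), ("jans", "bob"), ("ann", "bob"),   # hypothetical data
         ("ann", "carl"), ("carl", "dora"), ("dora", "eve")]

def adjacency(edges):
    adj = {}
    for a, b in edges:                     # "knows" treated as symmetric here
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def hops_from(adj, source):
    """Breadth-first search: number of hops from `source` to every reachable person."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def ego_group(adj, person, depth):
    """Everyone within `depth` hops of `person` (friends, friends of friends, ...)."""
    return {n for n, d in hops_from(adj, person).items() if 0 < d <= depth}

def degree_centrality(adj, group):
    """Share of the other group members each member is directly connected to."""
    size = len(group)
    return {n: len(adj[n] & group) / (size - 1) for n in group} if size > 1 else {}

adj = adjacency(KNOWS)
print(hops_from(adj, "jans")["eve"])          # distance jans -> eve: 4
group = ego_group(adj, "jans", 2)
scores = degree_centrality(adj, group)
print(sorted(group), max(scores, key=scores.get))   # ['ann', 'bob', 'carl'] ann
```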
Activity recognition
Find all meetings that happened in November within 5 miles of Berkeley and were attended by the most important person among Jans’ friends and friends of friends.
    (select (?x)
      (ego-group person:jans knows ?group 2)              ; SNA
      (actor-centrality-members ?group knows ?x ?num)     ; SNA
      (q ?event fr:actor ?x)                              ; DB lookup
      (qs ?event rdf:type fr:Meeting)                     ; RDFS
      (interval-during ?event "2008-11-01" "2008-11-06")  ; temporal
      (geo-box-around geoname:Berkeley ?event 5 miles))   ; spatial
Fraud detection example using SPARQL
Query: find any circle of payments between accounts that all happened within 10 miles of San Jose within the last day and where the payments exceed $1000.
[Figure: the SPARQL solution in steps: find the circle, inspect the property graph, apply the geo and temporal constraints]
Conclusion: Why should you choose Semantic Web?
1. You want a flexible, adaptable, transparent information architecture
2. The project requires complex structures and a large number of relations between classes as well as properties
3. The project requires integration of data from different sources
4. Heterogeneous sets of metadata and vocabulary concepts, originating from multiple sources
5. Need for semantic annotations using controlled vocabularies and thesauri such as FOAF, OWL, SKOS, etc.
6. There is a need for making logical deductions based on rules
THANK YOU.
Marc Lieber, www.trivadis.com