The Semantic Web:
Web of (integrated) Data
Frank van Harmelen Vrije Universiteit Amsterdam
Take home message
Semantic Web = Web of Data
(no longer only web of text, web of pictures)
Set of open, stable W3C standards
Rapidly emerging tools & vendors
Use cases:
• data integration
• web services
• knowledge management
• search (intranets)
Outline
The vision
What is required
Machine representation
• XML, RDF, OWL
Where are we now?
Examples
Things we would
“Intelligent” things we can’t do today
Search engines
• concepts, not keywords
• semantic narrowing/widening of queries
Shopbots
• semantic interchange, not screenscraping
E-commerce
• Negotiation, catalogue mapping, personalisation
Web Services
• Need semantic characterisations to find them and to combine them
Navigation
• by semantic proximity, not hardwired links
...
Other use cases are
• personalisation
• semantic linking
• data integration
• web services
• ...
Sounds good, so...
how is this tackled?
Outline
The vision
What is required
Machine representation
• XML, RDF, OWL
Where are we now?
Examples
machine accessible meaning
(What it’s like to be a machine)
disease name symptoms drug administration
Meta-data !
What is meta-data?
it's just data
it's data describing other data
it's meant for machine consumption
disease name symptoms drug administration
meta-data + ontologies
<name> <symptoms> <drug> <drug administration> <disease> <treatment> IS-A reduces
What’s inside an ontology?
• terms + specialisation hierarchy
• classes + class-hierarchy
• instances
• slots/values
• inheritance (multiple? defaults?)
• restrictions on slots (type, cardinality)
• properties of slots (symmetric, transitive, …)
• relations between classes (disjoint, covers)
• reasoning tasks: classification, subsumption
Increasing semantic “weight”
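As a rough illustration of the ingredients listed above, the following Python sketch models a toy ontology: classes, a class hierarchy (IS-A links), and a subsumption check. The class names loosely follow the disease/drug labels used earlier; the hierarchy itself and the names `SUBCLASS_OF`, `ancestors` and `subsumes` are assumptions made for this example only.

```python
# Toy ontology: each class maps to its direct superclasses (IS-A links).
# The hierarchy is invented for illustration; it is not a real thesaurus.
SUBCLASS_OF = {
    "Thing": [],
    "Disease": ["Thing"],
    "Treatment": ["Thing"],
    "Drug": ["Treatment"],
    "InfectiousDisease": ["Disease"],
    "Antibiotic": ["Drug"],
}


def ancestors(cls):
    """Return every class that `cls` inherits from, directly or indirectly."""
    seen = set()
    stack = list(SUBCLASS_OF[cls])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(SUBCLASS_OF[parent])
    return seen


def subsumes(general, specific):
    """Subsumption test: is every `specific` also a `general`?"""
    return general == specific or general in ancestors(specific)
```

Because each class lists all of its direct superclasses, the same structure would also cover multiple inheritance; restrictions on slots and defaults are deliberately left out of this sketch.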
In short
(for the duration of this tutorial)
Ontologies are not definitive descriptions of
what exists in the world (= philosophy)
Ontologies are shared models of the world
constructed to facilitate communication
Yes, ontologies exist
(because we build them)
Real life examples
handcrafted (often by communities)
• music: CDnow (2410/5), MusicMoz (1073/7)
• biomedical: SNOMED (200k), GO (15k), Emtree (45k+190k)
ranging from lightweight (Yahoo, UNSPSC) to heavyweight (Cyc)
ranging from small (METAR) to large (UNSPSC)
All right,
but how to represent all this
in a computer?
Outline
The vision
What is required
machine representation
• XML, RDF, OWL
Where are we now?
Examples
What was XML again?
[Tree diagram: country has name “Netherlands” and a capital; capital has name “Amsterdam” and areacode “020”]
<country name="Netherlands">
  <capital name="Amsterdam">
    <areacode>020</areacode>
  </capital>
</country>
So why not just use XML?
No agreement on:
• structure
  • is country a:
    – object?
    – class?
    – attribute?
    – relation?
    – something else?
  • what does nesting mean?
• vocabulary
  • is country the same as nation?
<country name="Netherlands">
  <capital name="Amsterdam">
    <areacode>020</areacode>
  </capital>
</country>

<nation>
  <name>Netherlands</name>
  <capital>Amsterdam</capital>
  <capital_areacode>020</capital_areacode>
</nation>
• Are the above XML documents the same?
• Do they convey the same information?
• Is the answer machine-derivable?
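To make the three questions above concrete, here is a minimal Python sketch that parses both documents with the standard-library `xml.etree.ElementTree` module and collects everything a generic parser can see: element paths, attributes and text values. The helper name `element_paths` is an assumption for this example. The two documents carry the same values, but no path is shared, so nothing in the XML itself tells a machine that they say the same thing.

```python
import xml.etree.ElementTree as ET

DOC_A = """<country name="Netherlands">
  <capital name="Amsterdam">
    <areacode>020</areacode>
  </capital>
</country>"""

DOC_B = """<nation>
  <name>Netherlands</name>
  <capital>Amsterdam</capital>
  <capital_areacode>020</capital_areacode>
</nation>"""


def element_paths(xml_text):
    """Return the set of (path, value) pairs visible to a generic XML parser."""
    facts = set()

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        # Attributes are recorded as path/@attribute.
        for attr, value in elem.attrib.items():
            facts.add((f"{here}/@{attr}", value))
        # Non-whitespace element text is recorded under the element path.
        text = (elem.text or "").strip()
        if text:
            facts.add((here, text))
        for child in elem:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")
    return facts


facts_a = element_paths(DOC_A)
facts_b = element_paths(DOC_B)
shared_facts = facts_a & facts_b                  # structural overlap
values_a = {value for _, value in facts_a}
values_b = {value for _, value in facts_b}
print("shared (path, value) pairs:", shared_facts)
print("same raw values:", values_a == values_b)
```

A human reader maps `country` to `nation` and `capital/@name` to `capital` without effort; the parser has no basis for doing so.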
So: XML ≠ machine accessible meaning
<CV> <name> <education> <work> <private>
<Χς> <ναμε> <εδυχατιον> <ωορκ> <πριϖατε>
W3C Stack
XML:
• Surface syntax, no semantics
XML Schema:
• Describes structure of XML documents
RDF:
• Datamodel for “relations” between “things”
RDF Schema:
• RDF Vocabulary Definition Language
OWL:
• A more expressive Vocabulary Definition Language
RDF & RDF Schema
RDF =
• relations between things
• all objects are URLs (both things and relations)
RDF Schema =
• hierarchical organisation of an RDF vocabulary
• all things are URLs (classes of things, subclass relations)
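A small Python sketch may help here. It represents RDF statements as plain (subject, relation, object) tuples in which both things and relations are URLs, and uses RDF Schema style subclass links to derive the types of a thing. The `rdf:type` and `rdfs:subClassOf` URLs are the standard W3C ones; the `http://example.org/` namespace, the sample facts and the helper names are assumptions made for this example, and no RDF library is used.

```python
EX = "http://example.org/"  # hypothetical namespace for this example
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# RDF: relations between things, everything named by a URL.
GRAPH = {
    (EX + "Netherlands", EX + "hasCapital", EX + "Amsterdam"),
    (EX + "Amsterdam", RDF_TYPE, EX + "Capital"),
    # RDF Schema: hierarchical organisation of the vocabulary.
    (EX + "Capital", RDFS_SUBCLASS_OF, EX + "City"),
    (EX + "City", RDFS_SUBCLASS_OF, EX + "Place"),
}


def superclasses(cls, graph):
    """Follow subClassOf links upwards and return all classes reached."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if s == current and p == RDFS_SUBCLASS_OF and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def types_of(thing, graph):
    """Asserted types of `thing` plus the types implied by the class hierarchy."""
    direct = {o for s, p, o in graph if s == thing and p == RDF_TYPE}
    implied = set()
    for cls in direct:
        implied |= superclasses(cls, graph)
    return direct | implied


print(sorted(types_of(EX + "Amsterdam", GRAPH)))
```

Only one type is asserted for Amsterdam; the other two follow from the schema, which is the kind of derivation plain XML does not support.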
The semantic pyramid again
OWL:
things RDF Schema can’t do
equality
enumeration
number restrictions
• Single-valued/multi-valued
• Optional/required values
inverse, symmetric, transitive
boolean algebra
• Union, complement
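To give a feel for what declaring a relation inverse, symmetric or transitive buys, the Python sketch below applies those three property characteristics to a handful of facts until nothing new can be derived. It is a naive illustration, not an OWL reasoner; the function name `infer`, its parameters and the sample facts are assumptions made for this example.

```python
def infer(triples, symmetric=(), transitive=(), inverse_of=None):
    """Repeatedly apply property characteristics until no new facts appear."""
    inverse_of = inverse_of or {}
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in symmetric:
                new.add((o, p, s))                  # p(s, o) implies p(o, s)
            if p in inverse_of:
                new.add((o, inverse_of[p], s))      # p(s, o) implies q(o, s)
            if p in transitive:
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))         # p(s, o), p(o, x) imply p(s, x)
        if new <= facts:
            return facts
        facts |= new


TRIPLES = {
    ("Amsterdam", "locatedIn", "Netherlands"),
    ("Netherlands", "locatedIn", "Europe"),
    ("Netherlands", "borders", "Belgium"),
}

derived = infer(
    TRIPLES,
    symmetric={"borders"},
    transitive={"locatedIn"},
    inverse_of={"locatedIn": "contains"},
)
```

Here `derived` contains facts that were never stated, such as Amsterdam being located in Europe and Belgium bordering the Netherlands. Number restrictions and boolean class expressions are not covered by this sketch.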
Again:
Sounds good in theory.
How far are you with this
in practice?
Where are we now: tools
Languages are stable (W3C)
Tooling is rapidly emerging
• HP, IBM, Oracle, Adobe, …
• Parsers
• Editors
• Visualisers
• Large scale storage and querying
• Portal generation
Aduna
Three example use-cases
Closed-world data integration:
DOPE browser @ Elsevier
Open-world data integration:
streaming media @ Philips
Semantic Web services
Conclusions
This section joint with Aduna and Anita de Waard @ Elsevier
Closed-world data integration:
Background
Vertical Information Provision
• Buy a topic instead of a journal!
• Web provides new opportunities
Business driver: drug development
• Rich, information-hungry market
• Good thesaurus (EMTREE)
The Data
Document repositories:
• ScienceDirect: approx. 500.000 fulltext articles
• MEDLINE: approx. 10.000.000 abstracts
Extracted Metadata
• The Collexis Metadata Server: concept-extraction ("semantic fingerprinting")
Thesauri and Ontologies
• EMTREE:
Architecture:
[Diagram labels: Query interface; RDF Schema (EMTREE); RDF Datasource 1 … RDF Datasource n]
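The architecture named above, a query interface over several RDF data sources that share one thesaurus, can be approximated by the following Python sketch. It is only a guess at the general idea and not the DOPE implementation: the thesaurus fragment, the two sources, the document ids and the helper names `expand` and `query` are all invented for illustration.

```python
# Shared vocabulary: narrower-term links of a tiny, made-up thesaurus.
NARROWER = {
    "analgesic agent": ["aspirin", "paracetamol"],
    "aspirin": [],
    "paracetamol": [],
}

# Two hypothetical data sources, each mapping document ids to thesaurus terms.
SOURCE_1 = {"doc-a": {"aspirin"}, "doc-b": {"paracetamol"}}
SOURCE_2 = {"doc-c": {"analgesic agent"}, "doc-d": {"insulin"}}


def expand(term):
    """Return the term together with all of its narrower terms."""
    terms = {term}
    for child in NARROWER.get(term, []):
        terms |= expand(child)
    return terms


def query(term, sources):
    """Return ids of documents, from any source, annotated with the concept."""
    wanted = expand(term)
    hits = set()
    for source in sources:
        for doc_id, terms in source.items():
            if terms & wanted:
                hits.add(doc_id)
    return hits


print(sorted(query("analgesic agent", [SOURCE_1, SOURCE_2])))
```

Because every source is annotated with terms from the same vocabulary, one concept-level query reaches all of them, including documents that mention only a narrower term.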
This section material from Zharko Aleksovski @ VU & Philips
Web-based data integration scenario:
• heterogeneous
Motivating scenario
[Figure labels: consumer.philips.com; User devices; Semantic Web; Providers: iTunes, Wal*Mart, Buy.com, Napster, eMusic, Musicmatch, Rhapsody, MusicNet, MusicNow, LaunchCast]
Example
Evergreens and Golden hits are related: Golden hits is mostly subclass of Evergreens
Music Ontology
Mediator
Domain characteristics
Many music providers
Wide variety of music offered
Constantly increasing in size and evolving
Cumbersome to browse and retrieve music
There is no agreement
• Different terms are used
• The same terms contain different sets of artists
CDNow (Amazon.com)
All Music Guide
MusicMoz
ArtistGigs
Artist Direct Network
CD baby
Yahoo
Size: 96 classes, Depth: 2 levels
Size: 2410 classes, Depth: 5 levels (CDNow)
Size: 382 classes, Depth: 4 levels
Size: 222 classes, Depth: 2 levels
Size: 1073 classes, Depth: 7 levels (MusicMoz)
Size: 465 classes, Depth: 2 levels
Size: 403 classes, Depth: 3 levels
data-sources
Why
approximate matching
Genre is not precisely defined
Pop and Rock have no common definition
on the big portals AllMusic.com,
Amazon.com and MP3.com
Exact reasoning will not be useful
[Figure: classes A and X, labelled 1% and 99%]
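One way to read "mostly subclass" is as a degree of overlap between the sets of artists filed under two genres. The Python sketch below follows that reading; it is an assumption about the general idea rather than the matching method actually used in this work, and the function names, the threshold and the sample data are invented for illustration.

```python
def subclass_degree(a, b):
    """Fraction of the artists in genre `a` that also appear in genre `b`."""
    if not a:
        return 0.0
    return len(a & b) / len(a)


def approx_relation(a, b, threshold=0.8):
    """Classify the relation of `a` to `b`, tolerating some exceptions."""
    a_in_b = subclass_degree(a, b) >= threshold
    b_in_a = subclass_degree(b, a) >= threshold
    if a_in_b and b_in_a:
        return "equivalent"
    if a_in_b:
        return "subclass"
    if b_in_a:
        return "superclass"
    return "unrelated"


# Made-up data: 9 of the 10 "Golden hits" artists are also "Evergreens".
golden_hits = {f"artist{i}" for i in range(10)}
evergreens = {f"artist{i}" for i in range(1, 20)}

print(golden_hits <= evergreens)                  # exact subset test
print(subclass_degree(golden_hits, evergreens))   # degree of inclusion
print(approx_relation(golden_hits, evergreens))   # approximate verdict
```

A single artist filed differently is enough to make the exact subset test fail, while the approximate test still reports the relation; the threshold controls how many such exceptions are tolerated.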
Results
[Plot: A = AllMusicGuide, B = ArtistDirectNetwork; axis ticks 0 to 600000 and 0.0 to 1.0; series: B subClass of A, A subClass of B, equivalences]
This section material from Marta Sabou @ VU
Semantic Web Services
What are web services?
A software system designed to support interoperable machine-to-machine interaction over a network.
It has an interface described in a machine-processable format (specifically WSDL).
Other systems interact with a web service in a manner prescribed by its description, using SOAP messages.
Web Service Tasks
Web Service Discovery & Selection
• Find an airline that can fly me to Marina del Rey
Web Service Invocation
• Book flight tickets from NWA to arrive 12th Oct.
Web Service Composition & Interoperation
• Arrange taxis, flights and hotel for travel from Southampton to Portland, OR, via Marina del Rey, CA.
Web Service Execution Monitoring
• Has the taxi to Gatwick Airport been reserved yet?
Limitations of WS Technology
Manual Discovery
Manual Invocation
Manual (ad hoc) Mediation
Use of Semantics: Example
<do:HotelBooking rdf:ID="WS1">
  <owls:hasInput rdf:resource="do:Hotel"/>
</do:HotelBooking>
<do:HostelBooking rdf:ID="WS2">
  <owls:hasInput rdf:resource="do:Hostel"/>
</do:HostelBooking>
R: (BookingService, Hotel) =>
• exact match with WS1
• plug-in match for WS2
Degrees of WS Matching
Match Advertisement with Request:
Exact: Adv equals Req
Plug-In: Adv is more general than Req
Subsume: Adv is less general than Req
Intersection: Adv and Req overlap (a bit)
Disjoint: Adv and Req don’t overlap
Matchmaking algorithms (primarily) employ subsumption reasoning over the knowledge provided by the domain ontologies.
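The degrees of matching listed above can be sketched in Python as a subsumption test over a small class hierarchy. This is a simplified illustration under stated assumptions, not a real matchmaker: the ontology in `PARENT` and the names `is_subclass` and `match_degree` are hypothetical, the Intersection degree is left out because a plain tree of classes cannot express partial overlap, and classes on different branches are simply treated as disjoint.

```python
# Hypothetical domain ontology: class -> direct superclass (None for a root).
PARENT = {
    "Accommodation": None,
    "Hotel": "Accommodation",
    "LuxuryHotel": "Hotel",
    "Flight": None,
}


def is_subclass(specific, general):
    """True if `specific` equals `general` or lies below it in the hierarchy."""
    current = specific
    while current is not None:
        if current == general:
            return True
        current = PARENT[current]
    return False


def match_degree(adv, req):
    """Compare an advertised input class with a requested one."""
    if adv == req:
        return "exact"
    if is_subclass(req, adv):
        return "plug-in"    # Adv is more general than Req
    if is_subclass(adv, req):
        return "subsume"    # Adv is less general than Req
    return "disjoint"       # simplification: different branches never overlap


for advertised in ("Hotel", "Accommodation", "LuxuryHotel", "Flight"):
    print(advertised, "->", match_degree(advertised, "Hotel"))
```

A matchmaker built along these lines would typically rank advertisements by degree, preferring exact matches over plug-in matches and those over subsume matches.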
Take home message again:
Take home message
Semantic Web = Web of Data
(no longer web of text, web of pictures)
Set of open, stable W3C standards
Rapidly emerging tools & vendors
Use cases:
• data integration
• web services
• knowledge management
• search (intranets)