Semantic Web
Tim Berners-Lee’s Vision:
Web as a means of collaboration for people
Web as a means of collaboration for machines
Difficulties for the SemWeb
How is information represented in the actual Web?
As documents written in natural language
As graphs, pictures, tables, videos, and other multimedia
Humans are good at:
deducing facts from some (incomplete) information
creating associations between facts
aggregating information from several sources
But machines:
cannot use partial (or incomplete) information
Semantic Web (1998 – 2008)
Semantic Web Layers
URI/IRI
Uniform Resource Identifier
Internationalized Resource Identifier
XML
eXtensible Markup Language
RDF
Resource Description Framework
RDFS
RDF Schema
RIF
Rule Interchange Format
SPARQL
SPARQL Protocol and RDF Query Language
OWL
Web Ontology Language
What is needed for the SemWeb?
The technologies shown in the previous picture.
Existing data (which are meaningful only to people) must be represented in a form that machines can understand. This means annotating data with metadata.
Ontologies: documents that define relations among terms.
Software agents that can process the data on behalf of humans, and automated web services that provide data.
XML (eXtensible Markup Language)
XML is a flexible text format that is widely used to structure, store, and transport data.
XML is different from HTML because it is not about displaying data.
In XML (unlike HTML) you create your own tags to annotate data.
XML is used to define other languages and formats, such as XHTML, RSS, and the XML syntaxes of RDF and OWL.
An XML Example
<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J. K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>
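Because the tags are self-chosen, a program needs to know the tag names in advance to read the data. A minimal sketch in Python, assuming the bookstore document above is complete and well-formed (the function name `read_books` is made up for illustration):

```python
import xml.etree.ElementTree as ET

# The bookstore document from the example, closed with </bookstore>.
BOOKSTORE_XML = """
<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J. K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>
""".strip()


def read_books(xml_text):
    """Return one dict per <book> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.findall("book"):
        books.append({
            "category": book.get("category"),      # XML attribute
            "title": book.findtext("title"),       # child element text
            "author": book.findtext("author"),
            "year": int(book.findtext("year")),
            "price": float(book.findtext("price")),
        })
    return books


books = read_books(BOOKSTORE_XML)
for b in books:
    print(b["title"], "by", b["author"])
```

Note that the code only works because its author already knows what "book" and "price" mean. The XML itself carries structure, not meaning.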
RDF (Resource Description Framework)
RDF: a standard for describing resources on the Web.
The meaning of data is encoded in sets of triples.
Triples are “subject, predicate, object” statements.
Resources and relations are identified by URIs. The object of a triple can be either a URI or a literal value.
RDF is commonly written in XML (RDF/XML); other syntaxes such as Turtle and N-Triples also exist.
RDF is to the Semantic Web what HTML was to the Web.
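The triple model can be illustrated without any RDF library. The sketch below, in Python, stores triples as plain tuples and looks them up by pattern. The `match` helper and the `dc:title` statement are assumptions added for illustration; a real application would use an RDF toolkit instead.

```python
# Triples are plain (subject, predicate, object) tuples.
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
HARRY_POTTER = "http://en.wikipedia.org/wiki/Harry_Potter"
ROWLING = "http://en.wikipedia.org/wiki/J._K._Rowling"

triples = {
    (HARRY_POTTER, DC_CREATOR, ROWLING),       # object is a URI
    (HARRY_POTTER, DC_TITLE, "Harry Potter"),  # object is a literal (illustrative)
}


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# "Who is a creator of anything?" expressed as the pattern (?, dc:creator, ?).
creators = [o for _, _, o in match(triples, predicate=DC_CREATOR)]
print(creators)
```

Pattern matching over triples with wildcards is, in a very reduced form, the idea behind SPARQL queries.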
An RDF Example
Triple: http://en.wikipedia.org/wiki/Harry_Potter  --dc:creator-->  http://en.wikipedia.org/wiki/J._K._Rowling
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Harry_Potter">
    <dc:creator rdf:resource="http://en.wikipedia.org/wiki/J._K._Rowling"/>
  </rdf:Description>
</rdf:RDF>
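To show how such a document maps back to triples, here is a rough Python sketch using only the standard library. It assumes the complete form of the example (one `rdf:Description` holding a single `dc:creator` property that points at the J. K. Rowling page) and handles only that small subset of RDF/XML. The function name `extract_triples` is made up for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Assumed complete form of the example document.
RDF_XML = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Harry_Potter">
    <dc:creator rdf:resource="http://en.wikipedia.org/wiki/J._K._Rowling"/>
  </rdf:Description>
</rdf:RDF>
""".strip()


def extract_triples(rdf_xml):
    """Read rdf:Description blocks into (subject, predicate, object) tuples.

    Covers only rdf:about subjects with rdf:resource or plain-text objects.
    """
    root = ET.fromstring(rdf_xml)
    found = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree names tags "{namespace}local"; joining both parts
            # yields the full predicate URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            obj = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            found.append((subject, predicate, obj))
    return found


rdf_triples = extract_triples(RDF_XML)
for s, p, o in rdf_triples:
    print(s, p, o)
```

The namespace prefix `dc:` disappears in the result: what identifies the relation is the full URI, not the abbreviation used in the file.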
Other RDF related technologies
RDFS supports expression of structured vocabulary.
It can be used to represent minimal ontologies.
RDF triples are stored in special repositories (triple stores). For an example, see openRDF.org
GRDDL - Gleaning Resource Descriptions from Dialects of Languages (a means to extract RDF from XML or XHTML documents)
Ontologies and OWL
An ontology is an explicit description of things and their relations.
OWL (Web Ontology Language) is used to write ontologies for the Web. OWL is built on top of RDF and is commonly written in XML.
You can think of OWL as an object-oriented language that defines classes, class hierarchies, attributes, relations, etc.
OWL is designed to support inference (subsumption and classification).
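Subsumption asks whether one class is a special case of another; classification here means deriving every class an individual belongs to. The Python sketch below illustrates both over a made-up class hierarchy. All class and individual names are hypothetical, and this is a toy transitive-closure walk, not an OWL reasoner.

```python
# Hypothetical hierarchy: each class maps to its direct superclasses.
SUBCLASS_OF = {
    "Novel": {"Book"},
    "Book": {"Publication"},
    "Publication": set(),
}

# Hypothetical instance data: each individual maps to its asserted classes.
TYPE_OF = {"HarryPotter": {"Novel"}}


def superclasses(cls, hierarchy):
    """Return cls plus every class reachable by following subclass links."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(hierarchy.get(current, ()))
    return seen


def is_subsumed_by(sub, sup, hierarchy):
    """True if every instance of sub must also be an instance of sup."""
    return sup in superclasses(sub, hierarchy)


def classify(individual, types, hierarchy):
    """Infer all classes of an individual from its asserted classes."""
    inferred = set()
    for cls in types.get(individual, ()):
        inferred |= superclasses(cls, hierarchy)
    return inferred


print(is_subsumed_by("Novel", "Publication", SUBCLASS_OF))
print(sorted(classify("HarryPotter", TYPE_OF, SUBCLASS_OF)))
```

Only "HarryPotter is a Novel" was stated, yet the program can conclude that it is also a Book and a Publication. This kind of derived knowledge is what the inference support in OWL is meant to provide.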
An Ontology Example
Visit http://protege.stanford.edu/ to learn about creating ontologies.
Friend of a Friend (FOAF)
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person>
    <foaf:name>Peter Parker</foaf:name>
    <foaf:gender>Male</foaf:gender>
    <foaf:title>Mr</foaf:title>
    <foaf:givenname>Peter</foaf:givenname>
    <foaf:family_name>Parker</foaf:family_name>
    <foaf:mbox_sha1sum>cf2f4bd069302febd8d7c26d803f63fa7f20bd82</foaf:mbox_sha1sum>
    <foaf:homepage rdf:resource="http://www.peterparker.com"/>
  </foaf:Person>
</rdf:RDF>
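The `foaf:mbox_sha1sum` value identifies a person by e-mail address without publishing the address itself: it is the SHA-1 hash of the mailbox's `mailto:` URI. A small Python sketch of how such a value can be computed follows; the address `peter@example.org` is a placeholder, not the one behind the hash in the example.

```python
import hashlib


def mbox_sha1sum(email):
    """Return the hex SHA-1 digest of the mailto: URI for an e-mail address."""
    mailto_uri = "mailto:" + email
    return hashlib.sha1(mailto_uri.encode("utf-8")).hexdigest()


# Placeholder address, used only to show the shape of the result.
digest = mbox_sha1sum("peter@example.org")
print(digest)
```

Two FOAF files that contain the same hash can be recognized as describing the same person, which lets software merge descriptions from several sources.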
Repositories of SW data
DBpedia: a community effort to extract structured information from Wikipedia
Bio2RDF: a Semantic Web atlas of postgenomic knowledge
UniProt: the Universal Protein Resource, a central repository of protein data
Semantic Web Search
A search engine for Semantic Web documents
Semantic Applications
Summary
The Semantic Web is an ambitious vision with an uncertain future. Not all of the technologies it needs are in place yet, but progress is steady.
The biggest challenge is to convince people to make their data available in an annotated form (e.g., RDF).
There are big research opportunities in the SemWeb:
automatically annotating data
creating and aligning ontologies
Where to learn more
W3C Semantic Web Activity: http://www.w3.org/2001/sw/
Prof. James Hendler: http://www.cs.rpi.edu/~hendler/
Prof. Steffen Staab: http://www.uni-koblenz.de/~staab/
Resources:
Jena: a Semantic Web framework for Java. It provides a programmatic environment for RDF, RDFS, OWL, and SPARQL, and includes a rule-based inference engine. http://jena.sourceforge.net/
Yahoo! SearchMonkey: http://developer.yahoo.com/searchmonkey/
LinkingOpenData: