Best practices for Linked Data
Asunción Gómez-Pérez
Facultad de Informática, Universidad Politécnica de Madrid
Avda. Montepríncipe s/n, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net
Phone: 34.91.3367417, Fax: 34.91.3524819
Acknowledgements: M. Poveda, V. Rodríguez-Doncel, D. Vila
Linked Data: why is it important?
• Facilitate data integration
§ From heterogeneous sources
§ In different formats
§ At different granularity
§ In different languages
§ From different countries
© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
Data Integration
[Figure: facts about M. Cervantes gathered from several databases (BNE, VIAF, AEMET, IGN, Prisa, DBpedia). Recoverable labels: M. Cervantes, creator (author) of Don Quixote (El Quijote); year of publication 1605, located in BNE; translated into Hebrew, year of publication 1960, located in VIAF; birthPlace Alcalá de Henares, with "same as" links between the sources that mention it; temperature 20º; guide: Tapas, Siglo de Oro.]
Foundations
Unique identifiers: URI
identify or name a resource
RDF(S) models
[Figure: RDF graph. Cervantes (http://datos.bne.es/resource/XX1718747) is a Person (http://iflastandards.info/ns/fr/frbr/frbrer/C1005) and is creator of El Quijote (http://datos.bne.es/resource/XX3383563), which is a Work (http://iflastandards.info/ns/fr/frbr/frbrer/C1001).]
Equivalence links to other datasets
Cervantes Same As http://viaf.org/viaf/17220427
Cervantes Same As http://dbpedia.org/resource/Miguel_de_Cervantes
Data navigation
http://www.w3.org/DesignIssues/LinkedData.html
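The foundations on this slide (URIs as names, RDF statements, equivalence links) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how datos.bne.es is implemented: statements are plain tuples, and the predicate names `is_creator_of` and `same_as` are hypothetical stand-ins for real vocabulary terms. Only the four URIs come from the slides.

```python
# Minimal sketch: RDF-like statements as (subject, predicate, object) tuples.
# "is_creator_of" and "same_as" are illustrative stand-ins, not real vocabulary terms.
CERVANTES = "http://datos.bne.es/resource/XX1718747"
QUIJOTE = "http://datos.bne.es/resource/XX3383563"
VIAF = "http://viaf.org/viaf/17220427"
DBPEDIA = "http://dbpedia.org/resource/Miguel_de_Cervantes"

TRIPLES = [
    (CERVANTES, "is_creator_of", QUIJOTE),
    (CERVANTES, "same_as", VIAF),
    (CERVANTES, "same_as", DBPEDIA),
]


def equivalents(uri, triples):
    """Return every URI reachable from `uri` through same_as links.

    same_as is treated as symmetric and transitive, so links are
    followed in both directions until no new URI turns up.
    """
    found = {uri}
    frontier = [uri]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            if predicate != "same_as":
                continue
            if subject == current:
                neighbour = obj
            elif obj == current:
                neighbour = subject
            else:
                continue
            if neighbour not in found:
                found.add(neighbour)
                frontier.append(neighbour)
    return found


def describe(uri, triples):
    """Collect the statements made about `uri` under any of its equivalent names."""
    names = equivalents(uri, triples)
    return [t for t in triples if t[0] in names and t[1] != "same_as"]


if __name__ == "__main__":
    print(sorted(equivalents(VIAF, TRIPLES)))
    print(describe(DBPEDIA, TRIPLES))
```

Starting from the DBpedia name, `describe` still finds the statement published under the BNE URI, which is the kind of navigation across datasets that equivalence links make possible.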
The model (Ontology) and the data for humans
[Figure: two layers with human-readable labels.
Ontology: Person "is creator of" Work; Work "has subject"; translation into a Language; Year as "publication date"; Library as "located at"; Person "birthPlace" Place.
Data: Cervantes is creator of El Quijote; Catalán translation; publication date 1960; located in BNE; has subject Vida de Cervantes; birthPlace Alcalá de Henares.]
The model and the data for machines
[Figure: the same two layers, with URIs in place of labels.
Ontology: http://iflastandards.info/ns/fr/frbr/frbrer/C1001, http://iflastandards.info/ns/fr/frbr/frbrer/C1002, http://iflastandards.info/ns/fr/frbr/frbrer/C1005, http://xmlns.com/foaf/0.1/Organization, http://geo.linkeddata.es/ontology/Municipio; namespace http://datos.bne.es/# (Language, work, Library, Person); properties: is creator of, has subject, translation, publication date, located in, birthPlace.
Data: http://datos.bne.es/resource/XX3383563, http://datos.bne.es/resource/XX1718747, http://datos.bne.es/resource/XX1924295, http://datos.bne.es/resource/bimo0002045496, http://geo.linkeddata.es/resource/Alcalá de Henares; values: Vida de Miguel de Cervantes Saavedra; Don Quijote de la Mancha; Cervantes Saavedra, Miguel de; Catalán; 1960; BNE.]
Linked Data is to be processed by machines
The generation process
Domains
Sources
Providers
Languages
The Linked Data Generation Process
Specification
Modelling
Generation
Publication
Exploitation
Linking
Data Curation
Lots of data in many domains
…
Music
Geographic
Life Sciences
Publications
E-Gov
On-line activities
I want to use Linked Open Data
§ Who generated the LD dataset?
§ When was the LD dataset created?
§ How was the LD dataset created?
§ Is it the latest version of the LD dataset?
§ Is the license information clearly stated in the LD dataset?
§ How are LD licenses offered?
LOD observations
• How does the LD generation process influence the use of the data by third parties?
• Vocabularies
• Licenses
• Language
How to prevent GIGO
Asunción Gómez-Pérez W3C @ Spain – 2013 Madrid, 18th December
[Figure: garbage in, garbage out (GIGO). Labels: GARBAGE, PROCESS]
Vocabularies
Cervantes at the data level
[Figure: five URIs that all appear to name "Cervantes", joined by "same as" links:
http://www.server1.org/resource/Cervantes
http://www.server2.es/resource/Cervantes
http://datos.bne.es/resource/XX1718747
http://d-nb.info/gnd/11851993X
http://geo.linkeddata.es/page/resource/Municipio/Cervantes
Attached values: phone 914 296 093; size 276,4 km²; #people 1547; date of birth 1547; author of D. Quijote.]
Cervantes and a bit of semantics
[Figure: the same five URIs, each now carrying an rdf:type (Person, Restaurant, Street, Municipality). A single "same as" link remains. Cervantes (Person): date of birth 1547, author of D. Quijote.]
Cervantes foaf
[Figure: FOAF description of an author.
Classes: owl:Thing, foaf:Agent, foaf:Person, foaf:Group, foaf:Organization, foaf:Document, foaf:Image.
Properties: foaf:knows, foaf:publications, foaf:firstName, foaf:surname, foaf:mbox, foaf:birthday, foaf:homepage, foaf:img, foaf:depiction.
Instances (linked to their classes by instanceOf): bibliothek:Cervantes with foaf:firstName "Miguel", foaf:surname "de Cervantes Saavedra", foaf:birthday "29-09", foaf:img http://.../authors/cervantes.png, foaf:publications http://www.BibliothekBerlin.com/.../3-538-06892-5; foaf:depiction http://www.BibliothekBerlin/…/images/Quixote.tif]
License Information
LOD observations: Licenses
How Open
http://oeg-dev.dia.fi.upm.es/licensius/static/ldr/
Language
Rationale: LOD is dominated by the English Language
Questions:
1. Searching resources in a particular language
2. Distribution of natural languages across RDF datasets?
3. Usage of language tags to indicate the natural language of RDF literals?
   1. Distribution of usage of language tags
   2. Distribution of literals tagged as English vs other languages
   3. Distribution of literals tagged in languages other than English
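The three distributions listed above can be computed with a simple counting pass. The sketch below is a hypothetical illustration, not the tooling used for the study: literals are modelled as (text, language tag) pairs with `None` for a missing tag, the sample values are made up for the example, and `primary_language` and `language_tag_stats` are names introduced here.

```python
from collections import Counter

# Each literal is (lexical value, language tag or None). Sample data is illustrative only.
LITERALS = [
    ("Miguel de Cervantes", "es"),
    ("Miguel de Cervantes", "en"),
    ("Don Quixote", "en-GB"),
    ("El viejo y el mar", None),
    ("Ernest Hemingway", None),
]


def primary_language(tag):
    """Return the lowercase primary subtag of a language tag, e.g. "en-GB" -> "en"."""
    return tag.split("-")[0].lower()


def language_tag_stats(literals):
    """Count untagged literals, literals tagged as English, and literals tagged otherwise."""
    counts = Counter()
    for _text, tag in literals:
        if tag is None:
            counts["untagged"] += 1
        elif primary_language(tag) == "en":
            counts["english"] += 1
        else:
            counts["other"] += 1
    return counts


def languages_other_than_english(literals):
    """Distribution of primary languages among tagged, non-English literals."""
    return Counter(
        primary_language(tag)
        for _text, tag in literals
        if tag is not None and primary_language(tag) != "en"
    )


if __name__ == "__main__":
    print(language_tag_stats(LITERALS))
    print(languages_other_than_english(LITERALS))
```

Regional variants such as "en-GB" are folded into their primary language here; a study that cares about regional usage would keep the full tag instead.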
Example of multilingual library resource
“Ernest Hemingway” and “El viejo y el mar” MARC 21 records
The dataset publisher does not tag the language of the content of different fields
Multilingualism and the Linked Data Process
How to represent language information for datasets?
• How to represent language information in Linked Data?
§ Traditional annotation properties for most cases
§ Richer models for more demanding applications
dbpedia:Miguel_de_Cervantes
    rdfs:label "Miguel de Cervantes"@es ,
               "ミゲル・デ・セルバンテス"@ja ,
               "미겔데세르반테스"@ko .
# LEMON
isbd:T1001 lemon:isReferenceOf [ lemon:isSenseOf :cartographic ] .
:cartographic a lemon:LexicalEntry ;
    lemon:form [ lemon:writtenRep "cartográfico"@es ; isocat:grammaticalGender isocat:masculine ] ;
    lemon:form [ lemon:writtenRep "cartográfica"@es ; isocat:grammaticalGender isocat:feminine ] .
isocat:grammaticalGender rdfs:subPropertyOf lemon:property .

# VoID description
:bne a void:Dataset ;
    dcterms:language <http://id.loc.gov/vocabulary/iso639-1/es> .

# DCAT description
:bne a dcat:Dataset ;
    dcterms:language <http://id.loc.gov/vocabulary/iso639-1/es> .
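On the consumer side, language-tagged labels like the rdfs:label values on this slide let an application choose what to display. The following is a minimal sketch under assumptions: labels are held as (text, language) pairs, `pick_label` is a hypothetical helper introduced here, and falling back to the first available label is one possible policy among several.

```python
# Labels as (text, language tag) pairs, taken from the rdfs:label example on the slide.
LABELS = [
    ("Miguel de Cervantes", "es"),
    ("ミゲル・デ・セルバンテス", "ja"),
    ("미겔데세르반테스", "ko"),
]


def pick_label(labels, preferred):
    """Return the label for the first language in `preferred` that is available.

    Falls back to the first label in the list when none of the preferred
    languages is present, and returns None when there are no labels at all.
    """
    by_language = {language: text for text, language in labels}
    for language in preferred:
        if language in by_language:
            return by_language[language]
    return labels[0][0] if labels else None


if __name__ == "__main__":
    print(pick_label(LABELS, ["en", "ja"]))
    print(pick_label(LABELS, ["fr"]))
```

This only works when the publisher has tagged the literals; with untagged labels the application has no reliable way to match a user's language.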
Implementation of the recording of data and metadata provenance
[Figure: an RDF store holding a PROVENANCE model (RDF(S)).]
Generation process
• PROV-O @W3C
[Figure: Filev1.txt generatedBy a Revision Process, which used File.txt.]
Resource provenance
• DC
[Figure: a resource with creator John, creationDate 12-2-1900, rights GPL.]
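The two kinds of provenance on this slide can be written down as RDF statements. The sketch below emits them as N-Triples lines using only string formatting. Several things are assumptions made for illustration: the `http://example.org/` namespace and the resource names under it are hypothetical, the slide's "generatedBy" is mapped to the PROV-O property `prov:wasGeneratedBy`, "creationDate" is mapped to `dcterms:created`, and the Dublin Core metadata is attached to Filev1.txt although the slide does not say which resource it describes.

```python
PROV = "http://www.w3.org/ns/prov#"
DCT = "http://purl.org/dc/terms/"
EX = "http://example.org/"  # hypothetical namespace, not from the slides


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Quote a plain literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def triple(subject, predicate, obj):
    """Format one N-Triples statement."""
    return f"{subject} {predicate} {obj} ."


def provenance_triples():
    """Build the generation-process and resource-provenance statements."""
    new_file = uri(EX + "Filev1.txt")
    old_file = uri(EX + "File.txt")
    process = uri(EX + "RevisionProcess")
    return [
        # Generation process (PROV-O)
        triple(new_file, uri(PROV + "wasGeneratedBy"), process),
        triple(process, uri(PROV + "used"), old_file),
        # Resource provenance (Dublin Core); values copied from the slide
        triple(new_file, uri(DCT + "creator"), literal("John")),
        triple(new_file, uri(DCT + "created"), literal("12-2-1900")),
        triple(new_file, uri(DCT + "rights"), literal("GPL")),
    ]


if __name__ == "__main__":
    print("\n".join(provenance_triples()))
```

The date is kept as the plain string shown on the slide; a real dataset would more likely use an ISO 8601 value with an xsd:date datatype, and a license URI rather than the literal "GPL".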
Conclusions
The use of
§ Curated data
§ Widely known vocabularies
§ License metadata in RDF
§ Language metadata in RDF
§ Provenance metadata in RDF
will influence the use of the linked data by third parties
Thanks for your attention!
Asuncion Gomez-Perez Guidelines for Multilingual Linked Data. WIMS – 2013 Madrid, 12-14 June
There is no One-Size-Fits-All Formula
[Table: one row per phase, one column per dataset.
Phases (rows): Modeling, RDF generation, Links generation, Publication, Exploitation
Datasets (columns): BNE, IGN, AEMET, PRISA, INE
Cell values (their row and column assignment is not recoverable from the extracted text):
Vocabularies: Scovo, Data cube, SSN ontology, SIOC, DC, hydrOntology, Wgs84, time
Tools: geometry2rdf, NOR2O, MARiMbA, Silk, Pubby, sitemap4rdf, map4rdf, SPARQL
Link targets: DNB, VIAF, LIBRIS, DBPEDIA, Geonames, Geolinkeddata.es]
http://oa.upm.es/14465/1/2.formulaLD.pdf
The multilingual Web of Data: Current state

1. Number of monolingual and multilingual datasets
                                      January 2012   June 2012    December 2012
Monolingual datasets                  1,906          2,201        1,984
Multilingual datasets                 349            635          676

2. Current usage of language tagging capabilities in RDF
                                      January 2012   June 2012    December 2012
RDF literals without language tag     10,250,936     10,594,338   12,272,806
RDF literals with language tag        2,567,324      3,154,779    3,365,930

RDF literals with English tag vs. other language tag
                                      January 2012   June 2012    December 2012
RDF literals with English tag         2,135,664      2,751,065    2,808,145
RDF literals with other language tag  431,660        403,714      557,785

4. Evolution of top-10 languages