
Academic year: 2021


(1)

Best practices for Linked Data

Asunción Gómez-Pérez

Facultad de Informática, Universidad Politécnica de Madrid

Avda. Montepríncipe s/n, 28660 Boadilla del Monte, Madrid

http://www.oeg-upm.net

[email protected]

Phone: 34.91.3367417, Fax: 34.91.3524819

Acknowledgements: M. Poveda, V. Rodríguez-Doncel, D. Vila

(2)

Linked Data: why is it important?

• Facilitate data integration
§ From heterogeneous sources
§ In different formats
§ At different granularity
§ In different languages
§ From different countries

© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig

(3)

Data Integration

[Figure: two graphs describing the same facts. English-labelled graph: M. Cervantes, creator of Don Quixote; translated into Hebrew; year of publication 1960; located at VIAF. Spanish-labelled graph: M. Cervantes, author of El Quijote; year of publication 1605; located at BNE; birthPlace Alcalá de Henares, which is linked by "Same as" to other Alcalá de Henares resources carrying temperature 20º, Tapas and a Siglo de Oro guide. Source databases: BNE, VIAF, AEMET, IGN, Prisa, DBpedia.]

(4)

Foundations

Unique identifiers: URI
identify or name a resource

RDF(S) models
[Figure: Cervantes (http://datos.bne.es/resource/XX1718747) "Is creator of" El Quijote (http://datos.bne.es/resource/XX3383563). Cervantes "Is a" Person (http://iflastandards.info/ns/fr/frbr/frbrer/C1005); El Quijote "Is a" Work (http://iflastandards.info/ns/fr/frbr/frbrer/C1001).]

Equivalence links to other datasets
Cervantes "Same As" http://viaf.org/viaf/17220427
Cervantes "Same As" http://dbpedia.org/resource/Miguel_de_Cervantes

Data navigation

http://www.w3.org/DesignIssues/LinkedData.html
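The three foundations on this slide (URIs, an RDF(S) model, equivalence links) can be sketched in a few lines of plain Python that emit N-Triples. This is a minimal illustration, not the BNE pipeline. The pairing of the two datos.bne.es URIs with Cervantes and El Quijote, and of C1005/C1001 with Person/Work, is a reading of the slide, and the `isCreatorOf` predicate URI is hypothetical because the slide only shows the label "Is creator of".

```python
# Minimal sketch: the slide's graph written as N-Triples using only the standard library.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
# Hypothetical predicate URI standing in for the slide's "Is creator of" label.
IS_CREATOR_OF = "http://example.org/vocab/isCreatorOf"

# Assumed reading of the slide: which URI names which resource.
CERVANTES = "http://datos.bne.es/resource/XX1718747"
EL_QUIJOTE = "http://datos.bne.es/resource/XX3383563"
FRBR_PERSON = "http://iflastandards.info/ns/fr/frbr/frbrer/C1005"
FRBR_WORK = "http://iflastandards.info/ns/fr/frbr/frbrer/C1001"

triples = [
    (CERVANTES, RDF_TYPE, FRBR_PERSON),
    (EL_QUIJOTE, RDF_TYPE, FRBR_WORK),
    (CERVANTES, IS_CREATOR_OF, EL_QUIJOTE),
    (CERVANTES, OWL_SAME_AS, "http://viaf.org/viaf/17220427"),
    (CERVANTES, OWL_SAME_AS, "http://dbpedia.org/resource/Miguel_de_Cervantes"),
]


def to_ntriples(rows):
    """Serialize (subject, predicate, object) URI triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in rows)


def same_as(uri, rows):
    """Return the URIs that `uri` is linked to through owl:sameAs."""
    return sorted(o for s, p, o in rows if s == uri and p == OWL_SAME_AS)


if __name__ == "__main__":
    print(to_ntriples(triples))
    print(same_as(CERVANTES, triples))
```

Following the `owl:sameAs` links is what lets a client move from the library record to VIAF or DBpedia, which is the "data navigation" the slide refers to.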

(5)

The model (Ontology) and the data for humans

Ontology
[Figure: classes Person, Work, Idiom, Year, Library, Place; relations "Is creator of", "Has subject", "translation", "Publication date", "Located at", "birthPlace".]

Data
[Figure: Cervantes "Is creator of" El Quijote; "Has subject" links Vida de Cervantes; translation Catalán; Publication date 1960; Located in BNE; birthPlace Alcalá de Henares.]
(6)

The model and the data for Machines

Ontology
[Figure: the same model expressed with URIs: http://iflastandards.info/ns/fr/frbr/frbrer/C1001, http://iflastandards.info/ns/fr/frbr/frbrer/C1002, http://iflastandards.info/ns/fr/frbr/frbrer/C1005, http://xmlns.com/foaf/0.1/Organization, http://geo.linkeddata.es/ontology/Municipio, and terms under http://datos.bne.es/# (Language, work, Library, Person); relations "Is creator of", "Has subject", "translation", "Year", "Publication date", "Located in", "birthPlace".]

Data
[Figure: resources http://datos.bne.es/resource/XX3383563, http://datos.bne.es/resource/XX1718747, http://datos.bne.es/resource/XX1924295, http://datos.bne.es/resource/bimo0002045496 and http://geo.linkeddata.es/resource/Alcalá de Henares, connected by "is author of", "translation", "Publication date" (1960), "Located in" (BNE), "Has subject" and "birthPlace". Labels: "Vida de Miguel de Cervantes Saavedra", "Don Quijote de la Mancha", "Cervantes Saavedra, Miguel de", "Catalán".]

(7)

Linked Data is to be processed by machines

(8)

The generation process

Domains

Sources

Providers

Languages

(9)

The Linked Data Generation Process

Specification

Modelling

Generation

Publication

Exploitation

Linking

Data Curation

(10)

Lots of data in many domains

Music

Geographic

Life Sciences

Publications

E-Gov

On-line activities

(11)

I want to use Linked Open Data

§ Who generated the LD dataset?
§ When was the LD dataset created?
§ How was the LD dataset created?
§ Is this the latest version of the LD dataset?
§ Is the license information clearly stated in the LD dataset?
§ How are LD licenses offered?
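These questions can be turned into a simple completeness check on a dataset description. The sketch below assumes the description is available as a plain mapping from property names to values. The choice of properties (`dcterms:creator`, `dcterms:created`, `dcterms:modified`, `dcterms:license`, `prov:wasGeneratedBy`) is one possible mapping, not something the slide prescribes, and the example description is invented for illustration.

```python
# Sketch: which of the reuse questions does a dataset description leave open?
# The property chosen for each question is an assumption, not taken from the slide.
REQUIRED = {
    "dcterms:creator": "Who generated the LD dataset?",
    "dcterms:created": "When was the LD dataset created?",
    "prov:wasGeneratedBy": "How was the LD dataset created?",
    "dcterms:modified": "Is this the latest version of the LD dataset?",
    "dcterms:license": "Is the license information clearly stated?",
}


def missing_answers(description):
    """Return the questions whose property is absent or empty in the description."""
    return [question for prop, question in REQUIRED.items()
            if not description.get(prop)]


# Hypothetical description used only to exercise the check.
example_description = {
    "dcterms:creator": "http://example.org/agents/library",
    "dcterms:created": "2013-12-18",
    "dcterms:license": "http://creativecommons.org/publicdomain/zero/1.0/",
}

if __name__ == "__main__":
    for question in missing_answers(example_description):
        print("Unanswered:", question)
```

A consumer could run such a check before deciding whether a dataset is safe to build on.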

(12)

LOD observations

How does the LD generation process influence the use of the data by third parties?

• Vocabularies
• Licenses
• Language

(13)

How to prevent GIGO

Asunción Gómez-Pérez W3C @ Spain – 2013 Madrid, 18th December

GARBAGE

PROCESS

(14)

Vocabularies


(15)

Cervantes at the data level

[Figure: five resources named "Cervantes", each with its own URI: http://www.server1.org/resource/Cervantes, http://www.server2.es/resource/Cervantes, http://datos.bne.es/resource/XX1718747, http://d-nb.info/gnd/11851993X, http://geo.linkeddata.es/page/resource/Municipio/Cervantes. "Same as" links are drawn between them. Attached values: Phone 914 296 093; Size 276,4 km²; #People 1547; Date of Birth 1547; Author D. Quijote.]
(16)

Cervantes and a bit of semantics

[Figure: the same five URIs (http://www.server1.org/resource/Cervantes, http://www.server2.es/resource/Cervantes, http://datos.bne.es/resource/XX1718747, http://d-nb.info/gnd/11851993X, http://geo.linkeddata.es/page/resource/Municipio/Cervantes), now each with an rdf:type: Person, Person, Restaurant, Street, Municipality. A single "Same as" link remains. Cervantes (Person): Date of Birth 1547; Author D. Quijote.]
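The point of the two Cervantes slides can be sketched as a filter: candidates that share a name are only linked when their `rdf:type` agrees. The type assigned to each URI below is an assumption made for illustration. The slide shows the five types but the extraction does not preserve which URI carries which, apart from what the URIs themselves suggest.

```python
# Sketch: use rdf:type to decide which same-named resources may be linked.
# The type given to each URI is assumed for illustration only.
CANDIDATES = {
    "http://www.server1.org/resource/Cervantes": "Restaurant",
    "http://www.server2.es/resource/Cervantes": "Street",
    "http://datos.bne.es/resource/XX1718747": "Person",
    "http://d-nb.info/gnd/11851993X": "Person",
    "http://geo.linkeddata.es/page/resource/Municipio/Cervantes": "Municipality",
}


def resources_of_type(candidates, wanted_type):
    """Return the URIs whose declared type equals `wanted_type`, sorted."""
    return sorted(uri for uri, rdf_type in candidates.items()
                  if rdf_type == wanted_type)


def same_as_pairs(candidates):
    """Propose sameAs links only between resources that share a type."""
    uris = sorted(candidates)
    return [(a, b) for i, a in enumerate(uris) for b in uris[i + 1:]
            if candidates[a] == candidates[b]]


if __name__ == "__main__":
    print(resources_of_type(CANDIDATES, "Person"))
    print(same_as_pairs(CANDIDATES))
```

Matching on type is a necessary filter, not a sufficient one: two resources of type Person with the same name may still be different people.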
(17)

Cervantes foaf

[Figure: FOAF vocabulary: owl:Thing, foaf:Agent, foaf:Person, foaf:Group, foaf:Organization, foaf:Document, foaf:Image; properties foaf:knows, foaf:publications, foaf:firstName, foaf:surname, foaf:mbox, foaf:birthday, foaf:homepage, foaf:img, foaf:depiction. Instance level (linked by instanceOf): bibliothek:Cervantes with foaf:firstName "Miguel"; foaf:surname "de Cervantes Saavedra"; foaf:birthday "29-09"; foaf:img http://.../authors/cervantes.png; foaf:publications http://www.BibliothekBerlin.com/.../3-538-06892-5; foaf:depiction http://www.BibliothekBerlin/…/images/Quixote.tif.]

(18)

License Information

(19)

LOD observations: Licenses

How Open

(20)
(21)
(22)
(23)
(24)

http://oeg-dev.dia.fi.upm.es/licensius/static/ldr/

(25)

Language

(26)

Rationale: LOD is dominated by the English language

Questions:

1. Searching resources in a particular language
2. Distribution of natural languages across RDF datasets?
3. Usage of language tags to indicate the natural language of RDF tags?
   1. Distribution of usage of language tags
   2. Distribution of literals tagged as English vs other languages
   3. Distribution of literals tagged in languages other than English
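The third question can be approximated on any N-Triples dump by counting literals per language tag. The sketch below uses a regular expression over each line rather than a full RDF parser, so it is only an approximation. The sample statements are illustrative, and `http://example.org/book/1` is a made-up URI.

```python
import re
from collections import Counter

# Matches an N-Triples statement whose object is a literal and captures an
# optional language tag such as @es or @en-GB. Datatyped literals
# ("..."^^<datatype>) match too and are counted as untagged.
LITERAL = re.compile(
    r'"(?:[^"\\]|\\.)*"'
    r'(?:@([A-Za-z]+(?:-[A-Za-z0-9]+)*)|\^\^<[^>]*>)?'
    r'\s*\.\s*$'
)


def language_tag_stats(ntriples):
    """Count literals per language tag; untagged literals are counted under None."""
    counts = Counter()
    for line in ntriples.splitlines():
        match = LITERAL.search(line)
        if match:
            tag = match.group(1)
            counts[tag.lower() if tag else None] += 1
    return counts


# Illustrative statements; the book URI is hypothetical.
SAMPLE = "\n".join([
    '<http://dbpedia.org/resource/Miguel_de_Cervantes> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "Miguel de Cervantes"@es .',
    '<http://dbpedia.org/resource/Miguel_de_Cervantes> '
    '<http://www.w3.org/2000/01/rdf-schema#label> "ミゲル・デ・セルバンテス"@ja .',
    '<http://example.org/book/1> '
    '<http://purl.org/dc/terms/title> "El viejo y el mar" .',
    '<http://dbpedia.org/resource/Miguel_de_Cervantes> '
    '<http://www.w3.org/2002/07/owl#sameAs> <http://viaf.org/viaf/17220427> .',
])

if __name__ == "__main__":
    stats = language_tag_stats(SAMPLE)
    tagged = sum(n for tag, n in stats.items() if tag is not None)
    print(f"tagged: {tagged}, untagged: {stats[None]}")
```

Statements whose object is a URI are skipped, so the counts cover literals only, as in the three distributions listed above.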

(27)


Example of multilingual library resource

“Ernest Hemingway” and “El viejo y el mar” MARC 21 records

The dataset publisher does not tag the language of the content of different fields

(28)

Multilingualism and the Linked Data Process

How to represent language information for datasets?

How to represent language information in Linked Data?

§ Traditional annotation properties for most cases
§ Richer models for more demanding applications

dbpedia:Miguel_de_Cervantes
    rdfs:label "Miguel de Cervantes"@es ,
               "ミゲル・デ・セルバンテス"@ja ,
               "미겔데세르반테스"@ko .


# LEMON
isbd:T1001 lemon:isReferenceOf [ lemon:isSenseOf :cartographic ] .
:cartographic a lemon:LexicalEntry ;
    lemon:form [ lemon:writtenRep "cartográfico"@es ;
                 isocat:grammaticalGender isocat:masculine ] ;
    lemon:form [ lemon:writtenRep "cartográfica"@es ;
                 isocat:grammaticalGender isocat:feminine ] .
isocat:grammaticalGender rdfs:subPropertyOf lemon:property .

# VoID description
:bne a void:Dataset ;
    dcterms:language <http://id.loc.gov/vocabulary/iso639-1/es> .

# DCAT description
:bne a dcat:Dataset ;
    dcterms:language <http://id.loc.gov/vocabulary/iso639-1/es> .
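On the consuming side, language-tagged `rdfs:label` values such as those for dbpedia:Miguel_de_Cervantes let a client pick a label in the user's language. The sketch below assumes the labels have already been read into (text, tag) pairs. The fallback rule of returning the first label when no preferred language matches is a design choice made here, not something the slides specify.

```python
# Sketch: choose a display label from language-tagged literals.
# The labels are the rdfs:label values shown for dbpedia:Miguel_de_Cervantes.
LABELS = [
    ("Miguel de Cervantes", "es"),
    ("ミゲル・デ・セルバンテス", "ja"),
    ("미겔데세르반테스", "ko"),
]


def pick_label(labels, preferred_languages):
    """Return the label for the first preferred language that is available.

    Falls back to the first label in the list (an assumption of this sketch),
    or None when there are no labels at all.
    """
    by_language = {}
    for text, language in labels:
        by_language.setdefault(language, text)
    for language in preferred_languages:
        if language in by_language:
            return by_language[language]
    return labels[0][0] if labels else None


if __name__ == "__main__":
    print(pick_label(LABELS, ["en", "ja"]))
    print(pick_label(LABELS, ["fr"]))
```

Without language tags this selection is not possible, which is why untagged literals limit reuse in multilingual applications.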

(29)

Implementation of the recording of data and metadata provenance

RDF Store
PROVENANCE Model (RDF(S))

1. Generation process: PROV-O @W3C
[Figure: File.txt, Filev1.txt and a Process, related by "Revision", "generatedBy" and "used".]

Resource provenance
• DC
creator: John
creationDate: 12-2-1900
rights: GPL
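The two provenance layers on this slide can be sketched as N-Triples: PROV-O for the generation process and Dublin Core for the resource itself. Several things here are assumptions. The `http://example.org/` namespace is hypothetical. The slide's labels "generatedBy", "Revision" and "creationDate" are mapped to `prov:wasGeneratedBy`, `prov:wasRevisionOf` and `dcterms:created`. The direction of the revision link (File.txt as a revision of Filev1.txt) is a guess, since the figure does not survive extraction.

```python
# Sketch: process provenance (PROV-O) plus resource provenance (Dublin Core).
PROV = "http://www.w3.org/ns/prov#"
DCT = "http://purl.org/dc/terms/"
EX = "http://example.org/"  # hypothetical namespace, not from the slide


def iri(namespace, local_name):
    """Format a URI reference in N-Triples syntax."""
    return f"<{namespace}{local_name}>"


def literal(text):
    """Format a plain literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


FILE = iri(EX, "File.txt")
FILE_V1 = iri(EX, "Filev1.txt")
PROCESS = iri(EX, "Process")

statements = [
    # Generation process (direction of the revision link is assumed).
    (FILE, iri(PROV, "wasGeneratedBy"), PROCESS),
    (PROCESS, iri(PROV, "used"), FILE_V1),
    (FILE, iri(PROV, "wasRevisionOf"), FILE_V1),
    # Resource provenance, using the values shown on the slide.
    (FILE, iri(DCT, "creator"), literal("John")),
    (FILE, iri(DCT, "created"), literal("12-2-1900")),
    (FILE, iri(DCT, "rights"), literal("GPL")),
]


def to_ntriples(rows):
    """Join pre-formatted terms into one N-Triples statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in rows)


if __name__ == "__main__":
    print(to_ntriples(statements))
```

Keeping both layers in the same RDF store means one query can ask who created a resource and which process produced it.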

(30)

Conclusions

The use of
§ Curated data
§ Widely known vocabularies
§ License metadata in RDF
§ Language metadata in RDF
§ Provenance metadata in RDF
will influence the use of the linked data by third parties

(31)

Thanks for your attention!

Asuncion Gomez-Perez Guidelines for Multilingual Linked Data. WIMS – 2013 Madrid, 12-14 June

(32)

There is no One-Size-Fits-All Formula

http://oa.upm.es/14465/1/2.formulaLD.pdf

[Table: phases (Modeling, RDF generation, Links generation, Publication, Exploitation) against datasets (BNE, IGN, AEMET, PRISA, INE). Entries shown: Scovo, Data cube, SSN ontology, SIOC, DC, hydrOntology, Wgs84, time; geometry2rdf, NOR2O, MARiMbA; Silk; DNB, VIAF, LIBRIS, DBpedia, Geonames, Geolinkeddata.es; sitemap4rdf, Pubby; map4rdf, SPARQL.]

(33)

The multilingual Web of Data: Current state

1. Number of Monolingual and multilingual datasets

                          January 2012    June 2012    December 2012
Monolingual datasets             1,906        2,201            1,984
Multilingual datasets              349          635              676

2. Current usage of language tagging capabilities in RDF

                                       January 2012     June 2012    December 2012
RDF literals without language tag        10,250,936    10,594,338       12,272,806
RDF literals with language tag            2,567,324     3,154,779        3,365,930

RDF literals with English tag vs other language tag

                                         January 2012     June 2012    December 2012
RDF literals with English tag               2,135,664     2,751,065        2,808,145
RDF literals with other language tag          431,660       403,714          557,785

4. Evolution of top-10 languages
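The chart figures can be turned into shares with a few lines of arithmetic. The assignment of each number to a series is an assumption of this sketch: the larger count in each pair is read as "without language tag" and as "English tag" respectively. One check supports this reading: for every month, the English and other-language counts add up exactly to the count of literals with a language tag.

```python
# Sketch: shares derived from the slide's counts.
# Which number belongs to which series is an assumed reading of the charts.
COUNTS = {
    "January 2012": {"tagged": 2_567_324, "untagged": 10_250_936,
                     "english": 2_135_664, "other": 431_660},
    "June 2012": {"tagged": 3_154_779, "untagged": 10_594_338,
                  "english": 2_751_065, "other": 403_714},
    "December 2012": {"tagged": 3_365_930, "untagged": 12_272_806,
                      "english": 2_808_145, "other": 557_785},
}


def tagged_share(row):
    """Fraction of all literals that carry a language tag."""
    return row["tagged"] / (row["tagged"] + row["untagged"])


def english_share(row):
    """Fraction of language-tagged literals that are tagged as English."""
    return row["english"] / row["tagged"]


if __name__ == "__main__":
    for month, row in COUNTS.items():
        # Consistency check: English + other must equal all tagged literals.
        assert row["english"] + row["other"] == row["tagged"]
        print(f"{month}: {tagged_share(row):.1%} tagged, "
              f"{english_share(row):.1%} of tagged literals in English")
```

Under this reading, roughly one literal in five carries a language tag, and more than four in five tagged literals are English, which matches the rationale that LOD is dominated by the English language.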
