• No results found

Entity Resolution on Web Knowledge Graphs. Mayank Kejriwal

N/A
N/A
Protected

Academic year: 2021

Share "Entity Resolution on Web Knowledge Graphs. Mayank Kejriwal"

Copied!
25
0
0

Loading.... (view fulltext now)

Full text

(1)

Entity Resolution on Web

Knowledge Graphs

(2)
(3)
(4)

...to a Web of Linked ‘Data’

Cyganiak and Jentzsch (2014)

Linkeddata.org

▪ ‘Linked Open Data’ started in 2007 with just 12 RDF datasets ▪ By mid-2010s, contained: ▪ Millions of resources ▪ 1000 datasets ▪ 900,000 documents ▪ 500 million inter-dataset links ▪ Many domains! ▪ Applications include

schema.org, Google Knowledge Graph, the Constitute Project...

Media

Social Networking

Cross-domain

(5)

A set of four best practices for publishing and connecting structured data on the Web

Linked Data

(6)

Resource Description Framework (RDF)

An RDF dataset is a set of triples, visualized as a directed labeled graph

A triple is a 3-element tuple (subject, property, object) and represents an edge in the graph

▪ Subjects and properties are necessarily URIs

▪ Objects may be URIs or literals

http://www.w3.org/RDF

(7)

Entity Resolution (ER)

Jaffri et al. (2008)

Papadakis et al. (2010) Nikolov et al. (2011) ▪ Connecting pairs of entities that refer to the same underlying entity

(8)

What’s the vision? A thesaurus for entities called an

Entity Name

System (ENS)

Populating an ENS requires solutions to ER

Many applications

Paul Gardner Allen

Microsoft

...

freebase:Paul_G._Allen dbpedia:Allen_,Paul

...

dbpedia:Microsoft Corp.

...

freebase:Microsoft

(9)

Research question

What

requirements

need to be fulfilled in order to populate a

(10)
(11)

Linked Open Data

Cyganiak and Jentzsch (2014)

Linkeddata.org

▪ ‘Linked Open Data’ started in 2007 with just a handful of datasets

▪ At last survey (2014), contains: ▪ Millions of resources ▪ 1000 datasets ▪ 900,000 documents ▪ 500 million inter-dataset links ▪ Many domains! Media Social Networking Cross-domain Publications

(12)

Hypothesis

Populating a Linked Data Entity Name System requires

simultaneously

fulfilling the four

DASH

requirements of

domain-independence, automation, scalability and

heterogeneity

(13)

Step 1: Type alignment

Kejriwal and Miranker (2014) Euzenat and Shvaiko (2007)

(14)

Step 2: Property alignment

(15)
(16)

Step 3: blocking and similarity

Blocks

1

2

3

4

5

Apply

blocking key

e.g. Tokens(LastName)

Generate

candidate set (7

pairs), apply

similarity function

on each pair

?

?

?

?

?

?

?

Dataset 1

Dataset 2

‘Exhaustive’ set: 4 X 6=24 pairs

(17)
(18)

Supervised schematic (post type-alignment)

Elmagarmid et al. (2007) Learn Property Alignment Learn blocking key Learn Similarity function Training set of duplicates/ non-duplicates

Aligned training set

T rained Classifier Execute blocking Execute similarity Blocking key Candidate set :sameAs links RDF dataset 1 RDF dataset 2

(19)

Semi-supervised schematic (post type-alignment)

Kejriwal and Miranker (2015)

Learn Property Alignment Learn blocking key Learn Similarity function

Seed training set of duplicates/ non-duplicates

Aligned training set

T

rained

Classifier

Execute

blocking similarityExecute

Blocking key Candidate set :sameAs links RDF dataset 1 RDF dataset 2 Most con fid ent sa mples

(20)

Unsupervised schematic?

Learn Property Alignment Learn blocking key Learn Similarity function

Seed training set of duplicates/ non-duplicates

Aligned training set

T

rained

Classifier

Execute

blocking similarityExecute

Blocking key Candidate set :sameAs links RDF dataset 1 RDF dataset 2 Most con fid ent sa mples

(21)

Unsupervised schematic?

Kejriwal and Miranker (2013-2015)

Learn Property Alignment Learn blocking key Learn Similarity function Noisy seed training set of duplicates/ non-duplicates

Aligned training set

T rained Classifier Execute blocking Execute similarity Blocking key Candidate set :sameAs links RDF dataset 1 RDF dataset 2 Most con fid ent sa mples Training set generator?

(22)

A complete, unsupervised schematic

▪ Implemented both serially and in MapReduce (using standard cloud services) ▪ Feasible for linking large, cross-domain graphs like Dbpedia and Freebase

(23)
(24)

Let’s think about a complete KG ecosystem

Information extraction Co-reference resolution … Entity Resolution Knowledge Graph Embeddings … KG Construction KG Completion KG Applications Corpus (usually documents, but also webpages, tables, reviews, social media…) Domain Model (e.g., ontology) Corpus/Raw Data (reviews, social media etc.) Entity Linking Knowledge Graph representati on learning KG Identification End-user tasks and applications

Semantic Web Natural Language

Processing KDD, Core Machine Learning/AI, Semantic Web, Databases Semantic Web, Databases, Information Retrieval, Applied AI

(25)

Avenues for future research

Fully end-to-end, scalable deployment

of a complete KG pipeline (which

includes Entity Resolution)

Visualizing and interactively

manipulating KGs

Long-tail vs. short-tail ER

Systems-level vs. reductionist

understanding of the problem

For example, how does noise in

information extraction affect

performance of ER?

Easy-to-use open-source software

development

References

Related documents

With a full understanding of the government-wide initiatives, of DoD’s strategic plan, of DoN’s strategic plan, of the Navy’s vision, strategic context, and combined metrics effort,

• Source material for developing the grid layer (Reference: RFP Section II. Potential Address Ranges, number 1. Development of County wide Address Grid Layer, page 5).. •

[r]

The HR Manager will be the only HR presence in our Birmingham office but will be supported by and work closely with the wider HR team based in London. Key

In effect, the financial information from the accounting systems of hotels requires to be supplemented by qualitative operational measures, such as on the spot (simultaneous)

Cruise ships that deliver source separated waste, as confirmed and documented by the waste disposal company, are entitled to a 40% fee discount. To obtain this discount the

But overall, only little is known about the effect in- prison punishments, and studies have not identified a misconduct-reducing effect (Labrecque, 2015; Lucas & Jones,

ethnically diverse sexual offenders may present to public safety. In addition, the current researcher investigated the risk posed to public safety by Latino sexual offenders