
(1)

© 2018 IBM Corporation

Inducing Implicit Relations from text using distantly supervised Deep Nets

Michael Glass¹, Alfio Gliozzo¹, Oktie Hassanzadeh¹, Nandana Mihindukulasooriya¹,², Gaetano Rossiello¹,³

¹ IBM Research AI
² Universidad Politécnica de Madrid
³ University of Bari

(2)

Outline

• Distant Supervision for Relation Extraction
• Detecting Implicit Knowledge with Unary Relations
• Detecting Implicit Knowledge with Composite contexts
• Evaluation
• Application to ISWC 2017 Challenge

(3)

Extending Knowledge Graphs using Distant Supervision

• Given a knowledge graph, an Entity Detection and Linking (EDL) system connects it to text
• Training data is collected by identifying contexts in the corpus containing pairs of related entities
• A Relation Extraction (RE) system is trained to recognize new relations between entities

[Figure: pipeline diagram with the labels Domain Corpus, EDL, RE, and Open PermID]
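The data collection step can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the toy knowledge graph `KG`, the entity list, the two example sentences, and the exact-string-match `link_entities` stand-in for an EDL system are all hypothetical.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical toy knowledge graph: (subject, object) -> relation.
KG = {("LafargeHolcim", "Switzerland"): "country"}
ENTITIES = ["LafargeHolcim", "Switzerland", "Eagle Cement", "San Miguel"]

def link_entities(sentence):
    """Stand-in for an EDL system: exact string match on known entity names."""
    return [entity for entity in ENTITIES if entity in sentence]

def collect_training_contexts(sentences):
    """Group sentences by the ordered entity pair they mention, then label
    each pair with its KG relation, or "NONE" if the KG has no such pair."""
    contexts = defaultdict(list)
    for sentence in sentences:
        for pair in permutations(link_entities(sentence), 2):
            contexts[pair].append(sentence)
    return {pair: (KG.get(pair, "NONE"), sents) for pair, sents in contexts.items()}

# Assumed example sentences, not taken from the corpus used in the talk.
corpus = [
    "LafargeHolcim is headquartered in Switzerland.",
    "Eagle Cement is backed by San Miguel.",
]
training = collect_training_contexts(corpus)
```

Pairs found in the knowledge graph become positive examples for their relation; co-occurring pairs the graph does not relate become negatives.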

(4)

Binary Contexts

• The company competes with Holcim Philippines, the local unit of Swiss company LafargeHolcim, and Eagle Cement, a company backed by diversified local conglomerate San Miguel which is aggressively expanding into infrastructure.

[The slide shows this sentence twice, each time with a different (Company, Country) pair marked: one context is labeled Positive, the other Negative.]

(5)

Deep Learning Model

Inspired by Lin et al., 2016
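The model diagram is not preserved in this text. As a rough illustration of the idea the slide credits, the selective attention over instances described by Lin et al. (2016) weights each sentence vector x_i in a bag by alpha = softmax(e), with e_i = x_i A r, and represents the bag as the weighted sum of the x_i. The sketch below assumes the sentence vectors are already encoded and that A is diagonal; the vectors and weights are made-up values, and none of this is claimed to match the network used in the talk.

```python
import math

def selective_attention(sentence_vecs, relation_query, diag_weights):
    """Return (alphas, bag) where alphas = softmax(e), e_i = x_i . (A r) with
    A diagonal, and bag = sum_i alphas[i] * x_i."""
    scores = [
        sum(x * a * r for x, a, r in zip(vec, diag_weights, relation_query))
        for vec in sentence_vecs
    ]
    peak = max(scores)  # subtract the max before exponentiating, for stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(sentence_vecs[0])
    bag = [
        sum(alpha * vec[d] for alpha, vec in zip(alphas, sentence_vecs))
        for d in range(dim)
    ]
    return alphas, bag

# Hypothetical 2-dimensional sentence encodings for one entity pair.
sentence_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
relation_query = [2.0, 0.0]  # assumed query vector for one relation
diag_weights = [1.0, 1.0]    # assumed diagonal of A
alphas, bag = selective_attention(sentence_vecs, relation_query, diag_weights)
```

Sentences whose encoding aligns with the relation query receive more weight, which reduces the influence of noisy contexts in the bag.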

(6)

Socrates Architecture for Knowledge Graph Population

[Figure: architecture diagram with the labels KB, Corpus, TODs, EDL, Context Sets (Binary, Unary, Composite), DL Binary, DL Unary, DL Composite, Merging, and Extended KB; example entities IBM, Apple, Tesla]

(7)

Unary Relations

Arg. 1                       Relation   Arg. 2
Bionik Laboratories Corp     location   USA
Tanaka Properties LLC        location   USA
Aardvark IT Solutions Inc    location   USA
Sammy Usa Corp               location   USA
Qualcorp Inc                 location   USA
Multinet Communications      location   Canada
Sundial Growers Inc          location   Canada
Catamaran Corp Ltd           location   India
Jeet Properties P Ltd        location   India
Stafford Funding Ltd         location   UK
Barking Cash And Carry       location   UK
Easedandy                    location   UK

Group by frequent argument:

• X:location:USA
  – Bionik Laboratories Corp, Tanaka Properties LLC, Aardvark IT Solutions Inc, Sammy Usa Corp, Qualcorp Inc, …
• X:location:Canada
  – Multinet Communications Services Inc, Sundial Growers Inc, …
• X:location:India
  – Jeet Properties P Ltd, Catamaran Corp Ltd, …
• X:location:UK
  – Stafford Funding Ltd, Barking Cash And Carry, Easedandy, …

Create unary relations: one fixed argument, one filler argument. In X:location:USA, USA is the fixed argument and X is the filler argument.
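Grouping by frequent argument can be sketched as below. This is a minimal illustration that assumes the fixed argument is always the object of the triple and that "frequent" means the `top_k` most common (relation, object) pairs; the function name and the `top_k` parameter are hypothetical.

```python
from collections import Counter, defaultdict

# A few of the triples from the table above.
triples = [
    ("Bionik Laboratories Corp", "location", "USA"),
    ("Tanaka Properties LLC", "location", "USA"),
    ("Sundial Growers Inc", "location", "Canada"),
    ("Jeet Properties P Ltd", "location", "India"),
    ("Stafford Funding Ltd", "location", "UK"),
    ("Easedandy", "location", "UK"),
]

def create_unary_relations(triples, top_k):
    """Keep the top_k most frequent (relation, fixed argument) pairs and list
    the filler arguments under a unary relation named X:relation:fixed."""
    counts = Counter((rel, obj) for _, rel, obj in triples)
    frequent = {key for key, _ in counts.most_common(top_k)}
    unary = defaultdict(list)
    for subj, rel, obj in triples:
        if (rel, obj) in frequent:
            unary[f"X:{rel}:{obj}"].append(subj)
    return dict(unary)

unary = create_unary_relations(triples, top_k=2)
```

With this sample, only `X:location:USA` and `X:location:UK` survive the frequency cut, each listing its filler arguments.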

(8)

Unary Contexts

• Woolworths, Coles owner Wesfarmers, JB Hi-Fi and Harvey Norman were also trading higher.
• JB Hi-Fi in talks to buy The Good Guys
• In equities news, protective glove and condom maker Ansell and JB Hi-Fi are slated to post half year results, while Bitcoin group is expected to list on ASX.

Slide annotations: Australian Companies; Australian Stock Exchange

Unary relation: X#[dbo:locationCountry#Australia]

(9)

Locate single occurrences of entities

[Figure: pipeline with the labels Domain Corpus, Entity Detection and Linking, Entity Mention Set, and Unary Deep Network]

• The fixed argument is considered part of the relation
• To predict the relation we only find occurrences of the filler argument
• We gather all occurrences of the relevant entities, group by entity and predict the unary relations
• We collect thousands of different unary relations from a large KB
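The gather, group, and predict steps can be sketched as follows. The substring match stands in for entity detection and linking, and `score_unary` is a deliberately trivial placeholder for the unary deep network (it only checks for a cue word); both are assumptions made for illustration, as is the cue list. The sentences are abridged from the unary context examples.

```python
from collections import defaultdict

def group_mentions(sentences, entities):
    """Entity mention set: every sentence that mentions an entity, grouped by entity."""
    mentions = defaultdict(list)
    for sentence in sentences:
        for entity in entities:
            if entity in sentence:  # stand-in for entity detection and linking
                mentions[entity].append(sentence)
    return mentions

def score_unary(contexts, cue_words):
    """Placeholder for the unary deep network: the fraction of an entity's
    contexts that contain at least one cue word."""
    hits = sum(any(cue in context for cue in cue_words) for context in contexts)
    return hits / len(contexts)

sentences = [
    "JB Hi-Fi in talks to buy The Good Guys",
    "Ansell and JB Hi-Fi are slated to post half year results, "
    "while Bitcoin group is expected to list on ASX.",
]
mentions = group_mentions(sentences, ["JB Hi-Fi", "Ansell"])
# Hypothetical cue list for the unary relation X#[dbo:locationCountry#Australia].
scores = {entity: score_unary(ctxs, ["ASX"]) for entity, ctxs in mentions.items()}
```

The point of the sketch is the data flow: only the filler argument has to appear in the text, and all of its mentions are scored together against each unary relation.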

(10)

Socrates Architecture for Knowledge Graph Population

[Figure: the architecture diagram from slide 6, repeated]

(11)

Gathering Composite Contexts from Title Oriented Documents

Focus Entity: ABC Electronique Canada

Document text: contact form below. Our representatives would be glad to help you! Phone: 514-822-4431 Toll-free: 1-800-387-9666 Fax: 514-812-8641 E-mail:

Candidate type: Phone Number
Relation: X#HeadquartersPhoneNumber#Phone
Knowledge base: Open PermID

Composite contexts (the slide shows the same context three times, labeled in this order):

• ABC Electronique Canada … Our representatives would be glad to help you! Phone: 514-822-4431 Toll-free: 1-800-387-9666 Fax: 514-812-8641 E-mail: (Positive)
• ABC Electronique Canada … Our representatives would be glad to help you! Phone: 514-822-4431 Toll-free: 1-800-387-9666 Fax: 514-812-8641 E-mail: (Negative)
• ABC Electronique Canada … Our representatives would be glad to help you! Phone: 514-822-4431 Toll-free: 1-800-387-9666 Fax: 514-812-8641 E-mail: (Negative)
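One way to build such composite contexts is sketched below. Several things here are assumptions rather than details from the talk: the phone-number regular expression, marking the candidate with a `<PHONE>` placeholder, labeling by digit-wise comparison with a knowledge base value, and the choice of 514-822-4431 as that value.

```python
import re

# Assumed pattern for North American numbers such as 514-822-4431 or 1-800-387-9666.
PHONE = re.compile(r"(?:1-)?\d{3}-\d{3}-\d{4}")

def only_digits(text):
    """Strip everything except digits so differently formatted numbers compare equal."""
    return re.sub(r"\D", "", text)

def composite_contexts(focus_entity, text, kb_phone):
    """Build one context per candidate phone number: the focus entity of the
    document plus the text, with the candidate replaced by a placeholder.
    A context is positive when the candidate equals the KB value."""
    examples = []
    for match in PHONE.finditer(text):
        marked = text[:match.start()] + "<PHONE>" + text[match.end():]
        context = f"{focus_entity} ... {marked}"
        label = only_digits(match.group()) == only_digits(kb_phone)
        examples.append((match.group(), context, label))
    return examples

text = ("Our representatives would be glad to help you! Phone: 514-822-4431 "
        "Toll-free: 1-800-387-9666 Fax: 514-812-8641 E-mail:")
# Hypothetical KB value for the focus entity.
examples = composite_contexts("ABC Electronique Canada", text, "514-822-4431")
```

Each candidate value yields its own training example, so one document produces one positive and several negatives for the same relation.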

(12)

Outline

• Distantly Supervised approaches for Relation Extraction: State of the art
• Detecting Implicit Knowledge with Unary Relations
• Detecting Implicit Knowledge with Composite contexts
• Evaluation
• Application to ISWC 2017 Challenge

(13)

Binary Context: evaluation in New York Times - Freebase

Dataset   Corpus size (# sent.)   Relation types   # contexts
NYT-FB    1.8M                    56               157K
CC-DBP    173M                    298              3M

Riedel, Sebastian; Yao, Limin; McCallum, Andrew. Modeling relations and their mentions without labeled text. Machine Learning and Knowledge Discovery in Databases, 2010.

Lin et al. Neural Relation Extraction with Selective Attention over Instances. ACL 2016.

(14)

Unary Contexts: Evaluation in Common Crawl – DBpedia

[Figure: precision (0 to 0.9) plotted against recall count (0 to 50000) for three systems: Unary, Binary, and Binary & Unary]

Michael Glass and Alfio Gliozzo. A Dataset for Web-scale Knowledge Base Population, ESWC 2018


Aggregating binary and unary relations by max score

We pick the 1000 most frequent unary relations
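Aggregation by max score can be sketched as below. The sketch assumes that binary and unary predictions have already been mapped to the same (subject, relation, object) key; the confidence values are invented for illustration.

```python
def merge_by_max(binary_scores, unary_scores):
    """Merge two {triple: confidence} maps, keeping the higher score per triple."""
    merged = dict(binary_scores)
    for triple, score in unary_scores.items():
        merged[triple] = max(score, merged.get(triple, 0.0))
    return merged

# Hypothetical confidences; the unary relation X#[dbo:locationCountry#Australia]
# with filler JB Hi-Fi is keyed by the triple it stands for.
binary = {("JB Hi-Fi", "dbo:locationCountry", "Australia"): 0.40}
unary = {
    ("JB Hi-Fi", "dbo:locationCountry", "Australia"): 0.85,
    ("Ansell", "dbo:locationCountry", "Australia"): 0.70,
}
merged = merge_by_max(binary, unary)
# Rank predictions by confidence, as needed for a precision/recall curve.
ranked = sorted(merged.items(), key=lambda item: item[1], reverse=True)
```

A triple predicted by only one of the two models keeps that model's score, so the merged ranking covers the union of both prediction sets.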


(16)

ISWC 2017 Challenge

Attributes: Headquarters Country², Headquarters Phone Number, Year Founded, Website URL¹

Using the public portion of the PermID knowledge base, predict (Task 1) or validate (Task 2) attributes of companies in the private portion.

(17)

ISWC 2017 Challenge: Data Collection and Methodology

[Figure: pipeline with the labels Website Text & Search Result Snippets; Entity Detection and Linking (Company Names, Phone Numbers, Years); Extract Context and Group by Entity Pair]

• Collected TOD for 80k training companies
• Collected TOD for 90% of the companies in test
• Used Socrates (Composite) for YearFounded and HeadquartersPhoneNumber
• Used Text Categorization approach for HeadquartersCountry

(18)

Attribute Prediction

Example Input/Output

Query Company: Allen & Shariff Engineering LLC, http://www.allenshariff.com

Output:
country: United States
phone: 14103817100
year: 1993

(19)

Attribute Validation

Example Input/Output

Company Name: A-Cute Derm Inc Country: United States

Website: http://www.a-cutederm.com?

Year Founded: 1977?

Phone Number: 17137715018?

(20)

Conclusion and Future work

• Distant supervision is a scalable and effective solution to extend knowledge bases using information derived from text …
• … however, binary relations occurring in the same sentence provide low recall
• Socrates is a framework to overcome this limitation in two ways
  – Identifying Unary Relations
  – Exploiting Composite Contexts from Title Oriented Documents
• Experiments demonstrate large recall improvements …
• … and we won the ISWC 2017 Challenge!
• Working on
  – Using Distant Supervision for Entity Detection
  – Using Probabilistic Reasoning to leverage constraints from the ontology
  – Using Knowledge Base Completion and Validation on top of extracted
