• No results found

UZH BIO390 Semantic web, RDF, Ontologies and Knowledge Graphs in biomedical sciences

N/A
N/A
Protected

Academic year: 2022

Share "UZH BIO390 Semantic web, RDF, Ontologies and Knowledge Graphs in biomedical sciences"

Copied!
56
0
0

Loading.... (view fulltext now)

Full text

(1)

UZH BIO390

Semantic web, RDF, Ontologies and Knowledge Graphs in biomedical sciences

Ahmad Aghaebrahimian

Zurich University of Applied Sciences

[email protected]

(2)

- Ahmad Aghaebrahimian

- Research Associate at ZHAW

- Ph.D. Computer Sciences focusing on Computational Linguistics

- Area of interests: Machine Learning Deep Neural Networks Biomedical text analytics Natural Language Processing Semantic Web

Email: [email protected]

Introduction

(3)

- Introduction

- Stack of standards (URI, XML, RDF, SPARQL, OWL, …) - RDF: Entities and Relationships

- Ontology

- Knowledge graphs

Session Content

(4)

LOF

ART1

Melanoma

(5)

LOF

ART1

Melanoma

Melanoma

Melanoma Tumors

Same_as

(6)

LOF

ART1

Melanoma

Melanoma

Melanoma Tumors

Same_as

Caused_by

Type

UV

Melanoma Tumors

Cancer

(7)

LOF

ART1

Melanoma

Melanoma

Melanoma Tumors

Same_as

Caused_by

Type

UV

Melanoma Tumors

Cancer Disable

ADP-ribosyltransferase 1

UV

(8)

The linked open data

• Linked open data example

LOF

ART1

Melanoma

Melanoma

Melanoma Tumors

Same_as

Caused_by

Type

UV

Melanoma Tumors

Cancer Disable

ADP-ribosyltransferase 1

UV Same_as

ART1

ADP-ribosyltransferase 1

(9)

The linked open data cloud

(10)

The life sciences data cloud

(11)

Basics of the web

- Web structure:

Server vs. Client

(12)

Basics of the web

- Web structure:

Server vs. Client

- Web Components:

Uniform Resource Locator (URL): identify document

Hypertext Markup Language (HTML): access document

Hypertext Transfer Protocol (HTTP): transfer document

(13)

Basics of the web

- Web structure:

Server vs. Client

- Web Components:

Uniform Resource Locator (URL): identify document Hypertext Markup Language (HTML): access document Hypertext Transfer Protocol (HTTP): transfer document

- Moving from pages to resources

Interactive web, Web 2.0 or semantic web

(14)

Semantic Web

(15)

Semantic Web

What?

Semantic Web (SW) is an extension of the World Wide Web that uses the Resource

Description Framework (RDF) and Web Ontology Language (OWL), among other

standards, to make the Internet machine-readable.

(16)

Semantic Web

What?

Semantic Web (SW) is an extension of the World Wide Web that uses the Resource

Description Framework (RDF) and Web Ontology Language (OWL), among other

standards, to make the Internet machine-readable.

(17)

Semantic Web

What?

Semantic Web (SW) is an extension of the World Wide Web that uses the Resource Description Framework (RDF) and Web Ontology Language (OWL), among other standards, to make the Internet machine-readable.

Why?

(18)

Semantic Web

What?

Semantic Web (SW) is an extension of the World Wide Web that uses the Resource Description Framework (RDF) and Web Ontology Language (OWL), among other standards, to make the Internet machine-readable.

Why?

- Presenting knowledge about data

(19)

Semantic Web

What?

Semantic Web (SW) is an extension of the World Wide Web that uses the Resource Description Framework (RDF) and Web Ontology Language (OWL), among other standards, to make the Internet machine-readable.

Why?

- Presenting knowledge about data

(20)

Semantic Web

What?

Semantic Web (SW) is an extension of the World Wide Web that uses the Resource Description Framework (RDF) and Web Ontology Language (OWL), among other standards, to make the Internet machine-readable.

Why?

- Presenting knowledge about data

- Allowing data integration from data silos

(21)

Semantic Web

What?

Semantic Web (SW) is an extension of the World Wide Web that uses the Resource Description Framework (RDF) and Web Ontology Language (OWL), among other standards, to make the Internet machine-readable.

Why?

- Presenting knowledge about data

- Allowing data integration from data silos

- Introduce intelligence to systems

(22)

Semantic Web Standards

URI:

What is a Resource?

URL → URI → IRI

Physically located → conceptually identified → conceptually identified in all languages

(23)

Semantic Web Standards

URI:

What is a Resource?

URL → URI → IRI

Physically located → conceptually identified → conceptually identified in all languages

XML:

Open family of languages represent structured data using tags and in textual format Rules:

- Only one root <root> </root>

- Opening with closing <Gene></Gene>

- no tag begin with number or xml - Case sensitive <Gene> != <gene>

- Order matters <Gene> <nucl> </nucl></Gene>

- Tags may have attributes <Gene inherited=’true’ />

(24)

Semantic Web Standards

OWL:

OWL provides a rich vocabulary to add semantics and context and allow reasoning and inference

(25)

Ontology

• Ontology is

• A model of a domain

• A vocabulary consisting of classes and properties

• Machine-readable knowledge representation

(26)

Ontology

• Ontology is

• A model of a domain

• A vocabulary consisting of classes and properties

• Machine-readable knowledge representation

• How to build an ontology?

• Define a domain

• Define the classes and properties

• Extend existing ontology (RDF schema, dbpedia,...)

(27)

Ontology

• Ontology is

• A model of a domain

• A vocabulary consisting of classes and properties

• Machine-readable knowledge representation

• How to build an ontology?

• Define a domain

• Define the classes and properties

• Extend existing ontology (RDF schema, dbpedia,...)

• Benefits of an ontology in Biomedical research? (And why they are important)

• Data integration

• Language processing via domain vocabulary

• Defining the precise meaning of classes

• Automated processing

(28)

Ontology Continued

• Ontology as a set of:

• Definitions

• Terms and their synonyms

• Relationships

(29)

Ontology Continued

• Ontology as a set of:

• Definitions

• Terms and their synonyms

• Relationships

• OBO : ChEBI Access via : ‘https://github.zhaw.ch/agha/D-Heath’

[Term]

id: CHEBI:60871 name: selenium(2+)

def: "The selenium ion with two positive charges." []

synonym: "Se(2+)" RELATED [UniProt:]

synonym: "selenium dication" RELATED [ChEBI:]

synonym: "Se2+" RELATED [SUBMITTER:]

synonym: "Se" RELATED FORMULA [ChEBI:]

synonym: "[Se++]" RELATED SMILES [ChEBI:]

synonym: "InChI=1S/Se/q+2" RELATED InChI [ChEBI:]

synonym: "InChIKey=MFSBVGSNNPNWMD-UHFFFAOYSA-N" RELATED InChIKey [ChEBI:]

is_a: CHEBI:60250 is_a: CHEBI:30412 [Term]

id: CHEBI:60250 name: selenium ion

def: "A selenium atom having a net electric charge." []

is_a: CHEBI:36904 is_a: CHEBI:36914

(30)

Ontology Continued

• Ontology as a set of:

• Definitions

• Terms and their synonyms

• Relationships

• OBO : ChEBI Access via : ‘https://github.zhaw.ch/agha/D-Heath’

• UMLS:

• Metathesaurus

• Semantic network

• Specialized Lexicon

[Term]

id: CHEBI:60871 name: selenium(2+)

def: "The selenium ion with two positive charges." []

synonym: "Se(2+)" RELATED [UniProt:]

synonym: "selenium dication" RELATED [ChEBI:]

synonym: "Se2+" RELATED [SUBMITTER:]

synonym: "Se" RELATED FORMULA [ChEBI:]

synonym: "[Se++]" RELATED SMILES [ChEBI:]

synonym: "InChI=1S/Se/q+2" RELATED InChI [ChEBI:]

synonym: "InChIKey=MFSBVGSNNPNWMD-UHFFFAOYSA-N" RELATED InChIKey [ChEBI:]

is_a: CHEBI:60250 is_a: CHEBI:30412 [Term]

id: CHEBI:60250 name: selenium ion

def: "A selenium atom having a net electric charge." []

is_a: CHEBI:36904 is_a: CHEBI:36914

(31)

Semantic Web Standards

RDF:

RDF is a graph-based data model and the set of syntax that allows us to write description about the resources on

the web and to exchange them. It presents data in the triple format and gives it structures and unique identifiers so

that data can be easily linked

(32)

Semantic Web Standards

RDF:

RDF is a graph-based data model and the set of syntax that allows us to write description about the resources on the web and to exchange them. It presents data in the triple format and gives it structures and unique identifiers so that data can be easily linked.

Principles:

Triple structure: (subject, predicate, object) - subject → a URI resource

- predicate → binary type URI - object → a URI resource or literal Predicates are labeled

Predicates are directed

RDF is a graph model

(33)

Semantic Web Standards

RDF:

RDF is a graph-based data model and the set of syntax that allows us to write description about the resources on the web and to exchange them. It presents data in the triple format and gives it structures and unique identifiers so that data can be easily linked.

Principles:

Triple structure: (subject, predicate, object) - subject → a URI resource

- predicate → binary type URI - object → a URI resource or literal Predicates are labeled

Predicates are directed RDF is a graph model RDF serialization:

XML, N-triple, Turtle, TriG, JSON-LD

(34)

The Graph data model

• Storing data in form of triplets: (Subject, Predicate, Object) e.g. (ART, LOF, Melanoma_Tumors)

Subject and Predicate must be in URI form

(35)

The Graph data model

• Storing data in form of triplets: (Subject, Predicate, Object) e.g. (ART, LOF, Melanoma_Tumors)

Subject and Predicate must be in URI form

• Triplets follow the RDF standard.

• Triplets are easily Expanded and Interlinked.

• Triplets can be queried via SPARQL:

(36)

The Graph data model

• Storing data in form of triplets: (Subject, Predicate, Object) e.g. (ART, LOF, Melanoma_Tumors)

Subject and Predicate must be in URI form

• Triplets follow the RDF standard.

• Triplets are easily Expanded and Interlinked.

• Triplets can be queried via SPARQL:

SELECT ?gene ?relation WHERE {

?gene ?relation Melanoma_Tumors .

}

(37)

Subjects, Predicates, Objects

Named Entity Relationship

NLP components:

• Named Entity Recognition (NER) • Named Entity Disambiguation (NED) • Relation Extraction (RE)

(38)

Named Entity Recognition (NER)

B O B O B O O B O O O

Conditional Random Fields Long Short-Term Memory Convolutional Neural Network

Atorvastatin lowers LDL and triglycerides and raises HDL in the blood.

Atorvastatin lowers LDL and triglycerides and raises HDL in the blood.

B O B O B O O B O O O

(39)

NER Evaluation:

Accuracy:

(40)

NER Evaluation:

Accuracy:

F1 score:

Example: Cancer Diagnostics

Cancer No-cancer

Cancer TP FP

No-cancer FN TN

True labels

P re d ic te d la be ls

(41)

NER Evaluation:

Accuracy:

F1 score:

Example: Cancer Diagnostics

Cancer No-cancer

Cancer TP FP

No-cancer FN TN

True labels

P re d ic te d la be ls

(42)

NER Evaluation:

Accuracy:

F1 score:

Example: Cancer Diagnostics

Cancer No-cancer

Cancer TP FP

No-cancer FN TN

True labels

P re d ic te d la be ls

(43)

CHEBI:39548

lowers

CHEBI:47774

and

IUPAC:46823

and raises

CHEBI:47775

in the blood.

Atorvastatin lowers LDL and triglycerides and raises HDL in the blood.

Named Entity Disambiguation (NED)

(44)

CHEBI:39548

lowers

CHEBI:47774

and

IUPAC:46823

and raises

CHEBI:47775

in the blood.

Problem:

- Different order

- Morphological forms - Synonymous names - Abbreviation

Atorvastatin lowers LDL and triglycerides and raises HDL in the blood.

Named Entity Disambiguation (NED)

(45)

CHEBI:39548

lowers

CHEBI:47774

and

IUPAC:46823

and raises

CHEBI:47775

in the blood.

Problem:

- Different order

- Morphological forms - Synonymous names - Abbreviation

True match Entity False match

LSTM LSTM LSTM

Attention Attention Attention

Hinge Loss

Aghaebrahimian, A., Cieliebak, M.(2020), Named Entity Disambiguation at Scale, ANNPR, Winterthur, Switzerland

Atorvastatin lowers LDL and triglycerides and raises HDL in the blood.

Named Entity Disambiguation (NED)

(46)

Atorvastatin lowers LDL and triglycerides and raises HDL in the blood.

Atorvastatin lowers LDL and triglycerides and raises HDL in the blood.

Relation Extraction (RE)

(47)

Atorvastatin lowers LDL and triglycerides and raises HDL in the blood.

Atorvastatin lowers LDL and triglycerides and raises HDL in the blood.

Single hop RE:

- Atorvastatin, LDL => lowers - Atorvastatin, triglycerides => lowers - Atorvastatin, HDL => raises - LDL, triglycerides => None - LDL, HDL => None

Atorvastatin lowers LDL and triglycerides and raises HDL in the blood

Embedding CNN Classifier

Embedding

}

Encode(.)

Aghaebrahimian, A. and Jurcicek, F., (2016), Open-domain Factoid Question Answering via Knowledge Graph Search, NAACL, San Diego, USA

Relation Extraction (RE)

(48)

Knowledge Graph

Collection of billions of triplet graph structures known as Assertion modeled in the RDF model

(49)

Knowledge Graph

Collection of billions of triplet graph structures known as Assertion modeled in the RDF model

(ART , LOF, Melanoma Tumors) (ego-3 , REG , paralysis) (ego-3 , REG , sterility)

(STAT1 , GOF , immunodeficiency and autoimmunity)

(50)

Knowledge Graph

Collection of billions of triplet graph structures known as Assertion modeled in the RDF model

(ART , LOF, Melanoma Tumors) (ego-3 , REG , paralysis) (ego-3 , REG , sterility)

(STAT1 , GOF , immunodeficiency and autoimmunity)

Which genes are related to paralysis?

How STAT1 impacts the immune system?

(51)

Knowledge Graph

Collection of billions of triplet graph structures known as Assertion modeled in the RDF model

(ART , LOF, Melanoma Tumors) (ego-3 , REG , paralysis) (ego-3 , REG , sterility)

(STAT1 , GOF , immunodeficiency and autoimmunity)

Which genes are related to paralysis?

How STAT1 impacts the immune system?

What proteins are associated with adverse events caused by Fulvestrant?

Fulvestrant

causes

events

associated

Protein A

Protein B

(52)

The linked open data

• Linked open data example

LOF

ART1

Melanoma

Melanoma

Melanoma Tumors

Same_as

Caused_by

Type

UV

Melanoma Tumors

Cancer Disable

ADP-ribosyltransferase 1

UV Same_as

ART1

ADP-ribosyltransferase 1

(53)

The linked open data

• Linked open data example

LOF

ART1

Melanoma

Melanoma

Melanoma Tumors

Same_as

Caused_by

Type

UV

Melanoma Tumors

Cancer Disable

ADP-ribosyltransferase 1

UV Same_as

ART1

ADP-ribosyltransferase 1

Question: How do we know that the dotted entities are the same entities.

(54)

Semantic Web tools

- RDFa:

Extracting triples from HTML pages via markups https://rdfa.info/play/

- Gleaning Resource Descriptions from Dialects of Languages (GRDDL):

Algorithms instead of markups

<link rel="transformation" href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" />

- JSON for Linked Data: JSON-LD

Attaching context to JSON files

- R2RML: Transforming tables to RDF

(55)

SPARQL

W3C standard

SPARQL Protocol And RDF Query Language

Lab work: https://bit.ly/3wjyHpf

(56)

Life Sciences RDF data and SPARQL Endpoints

A SPARQL endpoint gets queries and returns their results using HTTP protocol

• Generic

- http://sparql.org/sparql.html - http://demo.openlinksw.com/sparql

• Specific

• Dbpedia

- https://dbpedia.org/sparql

• SIB Swiss Institute of Bioinformatics - UniProt: http://sparql.uniprot.org - neXtProt: http://snorql.nextprot.org

• EBI European Bioinformatics Institute:

- BioSamples, BioModels, ChEMBL, Expression Atlas, Reactome, Ensembl - https://www.ebi.ac.uk/rdf/services/sparql

• NCBI National Center for Biotechnology Information:

- PubChemRDF (rdf only, no SPARQL endpoint) - https://pubchem.ncbi.nlm.nih.gov/rdf/

• http://sparql-playground.sib.swiss/

References

Related documents

Their comments were analysed by means of a qualitative content analysis and revealed that the experienced teachers independently used the wide range of principles of

The aim of our study was to examine how the prediction model of prolonged postoperative ventilation can change within the same postoperative ICU in two adjacent and relatively

Finally, it was shown that the combination in the same molecule of peptides derived from filaggrin (cyclic peptide cfc1cyc) and from the alpha chain of fibrin

The fine grain configuration of the High Level Trigger farm with the control system developed for the usage of the farm for offline tasks, and the usage of CVMFS for the

Hence, the purpose of this study was to identify factors that are consistent and valid in driving the job satisfaction of private hospital nurses in the Klang Valley which lead

The block diagram of the linear automatic voltage regulator (AVR) with the PID controller and using Particle swarm Optimization (PSO) and Moth Flame Optimization

Die Tatsache, dass aus den beiden Behandlungsgruppen Propofol und Sevofluran im Gegensatz zur Kontroll-Gruppe lediglich ein einziges Tier eosinophil markierte

Notre étude a montré une baisse significative du calcium séminal chez les témoins par rapport aux patients, alors que la concentration en magnésium est significativement plus