
Open PHACTS Data integration for all. Andrew Leach


Academic year: 2021


(1)

Open PHACTS

“Data integration for all”

(2)

Task, workflow and results

AUREUS search

targets: voltage-gated potassium channels

Apply filters

(MW, cLogP, Lipinski + remove undesirable target)

⇒ ~1000 molecules

Similarity searches (RG, TP, Daylight)

Cluster analysis

⇒ ~10000 molecules selected

IonWorks© single shot screening

240 single shot hits progressed into full curve assay

5 full curve actives

(in at least one test occasion)

Series for lead optimisation

Stefan Senger, ca. 2004

Task: create a focussed set to identify leads against voltage-gated potassium channels
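The filter step above names MW, cLogP and Lipinski criteria but not the exact cut-offs that were used. As a minimal sketch, assuming the descriptors have already been computed by a cheminformatics toolkit, a rule-of-five filter that also drops compounds flagged against an undesirable target could look like this. The `Compound` record, the `focused_set` helper and the compound IDs are hypothetical, and the thresholds are the standard rule-of-five values (MW ≤ 500, cLogP ≤ 5, ≤ 5 H-bond donors, ≤ 10 H-bond acceptors), not values taken from the original screen.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    # Hypothetical record; descriptors are assumed to be precomputed elsewhere
    compound_id: str
    mol_weight: float   # Da
    clogp: float
    h_donors: int
    h_acceptors: int


def lipinski_violations(c: Compound) -> int:
    """Count how many of Lipinski's rule-of-five criteria a compound breaks."""
    return sum([
        c.mol_weight > 500,
        c.clogp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def focused_set(compounds, undesirable_hits=frozenset(), max_violations=1):
    """Keep compounds with at most max_violations and no flag on an undesirable target."""
    return [
        c.compound_id
        for c in compounds
        if lipinski_violations(c) <= max_violations
        and c.compound_id not in undesirable_hits
    ]


library = [  # hypothetical descriptor values
    Compound("CPD-1", 180.2, 1.2, 1, 4),
    Compound("CPD-2", 720.9, 6.1, 3, 12),
    Compound("CPD-3", 410.5, 3.4, 2, 6),
]
print(focused_set(library, undesirable_hits={"CPD-3"}))  # ['CPD-1']
```

Allowing one violation follows the usual reading of the rule of five (poor absorption becomes more likely when more than one criterion is broken); set `max_violations=0` for a stricter filter.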

(3)

We (may) know where the data is, but integrating it is a pain, bespoke, and often only for experts

Q: Identify all oxidoreductase inhibitors with an activity <100nM in both mouse and human

Q: The current Factor Xa lead series is characterised by substructure X. Retrieve all bioactivity data in serine protease assays for molecules that contain substructure X.

Q: For a given interaction profile, give me compounds similar to it.

ChEMBL, DrugBank, Gene Ontology, WikiPathways, UniProt, ChemSpider, UMLS, ConceptWiki, ChEBI, etc., plus internal sources
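Once bioactivity records from these sources have been harmonised (common compound identifiers, target classes, species and units), the first question reduces to a simple group-and-filter. The sketch below assumes records are plain dictionaries with hypothetical keys `compound`, `target_class`, `species` and `activity_nm`; producing that harmonised shape is the hard part Open PHACTS set out to solve, and it is not shown here.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def potent_in_all_species(records: Iterable[Mapping],
                          target_class: str = "oxidoreductase",
                          threshold_nm: float = 100.0,
                          species: tuple = ("human", "mouse")) -> list:
    """Return compounds below threshold_nm on the target class in every listed species."""
    seen = defaultdict(set)  # compound -> species in which it meets the threshold
    for r in records:
        if r["target_class"] == target_class and r["activity_nm"] < threshold_nm:
            seen[r["compound"]].add(r["species"])
    required = set(species)
    return sorted(c for c, found in seen.items() if required <= found)


records = [  # hypothetical, already-harmonised records
    {"compound": "CPD-A", "target_class": "oxidoreductase", "species": "human", "activity_nm": 40.0},
    {"compound": "CPD-A", "target_class": "oxidoreductase", "species": "mouse", "activity_nm": 85.0},
    {"compound": "CPD-B", "target_class": "oxidoreductase", "species": "human", "activity_nm": 12.0},
    {"compound": "CPD-B", "target_class": "oxidoreductase", "species": "mouse", "activity_nm": 310.0},
    {"compound": "CPD-C", "target_class": "kinase", "species": "human", "activity_nm": 5.0},
    {"compound": "CPD-C", "target_class": "kinase", "species": "mouse", "activity_nm": 7.0},
]
print(potent_in_all_species(records))  # ['CPD-A']
```

Requiring the set of species to be a superset, rather than counting records, means a compound with several potent human measurements but none in mouse is still excluded.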

(4)

The Innovative Medicines Initiative

Biggest public-private partnership in the area of medicine

Collaboration between the European Commission and the European Federation of Pharmaceutical Industries and Associations (EFPIA)

Promotion of medical innovation in Europe

Tackle key bottlenecks

Recognises “in kind” contributions

Focus on key problems

– Efficacy, Safety, Education & Training, Knowledge

(5)

Public Domain Drug Discovery Data

Pharma are accessing, processing, storing & re-processing:

Literature, PubChem, GenBank, Patents, Databases, Downloads ⇒ Data Integration ⇒ Data Analysis ⇒ Firewalled Databases

Why repeat at each company?

The same pipeline runs separately at GSK, AZ, Pfizer and Merck.

(6)

Information Tombs

– Built for primary use-case

– Tailored indexes

– Tailored GUIs

– Unique language & metadata

– Poor interoperability/integration

Literature, HR, Synthesis, Portfolio, SAR, Docs, Safety

(7)

Pfizer Limited – Coordinator
Universität Wien – Managing entity
Technical University of Denmark
University of Hamburg, Center for Bioinformatics
BioSolveIT GmbH
Consorci Mar Parc de Salut de Barcelona
Leiden University Medical Centre
Royal Society of Chemistry
Vrije Universiteit Amsterdam
Spanish National Cancer Research Centre
University of Manchester
Maastricht University
Aqnowledge
University of Santiago de Compostela
Rheinische Friedrich-Wilhelms-Universität Bonn
AstraZeneca
GlaxoSmithKline
Esteve
Novartis
Merck Serono
H. Lundbeck A/S
Eli Lilly
Netherlands Bioinformatics Centre
Swiss Institute of Bioinformatics
ConnectedDiscovery
EMBL-European Bioinformatics Institute
Janssen
OpenLink

(8)

A use-case driven approach, focussed on delivery for the real world

Main architecture, technical implementation and primary capabilities driven by a set of prioritised research questions

Based on the main research questions, define prioritised data sources

Develop three exemplars to demonstrate the capabilities of the Open PHACTS System and to define interfaces and

(9)

Work Streams

Build: Service layer and resource integration

Drive: Development of exemplar work packages & Applications

Sustain: Community engagement and long-term sustainability

[Diagram: OPS Service Layer (Assertion & Meta Data Mgmt; Transform / Translate; Integrator) with corpora, databases, standard public vocabularies, ‘consumer’ and supplier firewalls, Target Dossier, Compound Dossier, Pharmacological Networks and Business Rules]

Work Stream 1: Open Pharmacological Space (OPS) Service Layer

Standardised software layer to allow public DD resource integration

− Define standards and construct OPS service layer
− Develop interface (API) for data access, integration and analysis
− Develop secure access models

Existing Drug Discovery (DD) Resource Integration

Work Stream 2: Exemplar Drug Discovery Informatics tools

Develop exemplar services to test OPS Service Layer

Target Dossier (Data Integration)

Pharmacological Network Navigator (Data Visualisation)

Compound Dossier (Data Analysis)

(10)

Platform Explorer

Standards Apps

(11)

Prioritised research questions

Number | Sum | Nr of 1 | Question
15 | 12 | 9 | All oxidoreductase inhibitors active <100 nM in both human and mouse
18 | 14 | 8 | Given compound X, what is its predicted secondary pharmacology? What are the on- and off-target safety concerns for a compound? What is the evidence and how reliable is that evidence (journal impact factor, KOL) for findings associated with a compound?
24 | 13 | 8 | Given a target, find me all actives against that target. Find/predict polypharmacology of actives. Determine ADMET profile of actives.
32 | 13 | 8 | For a given interaction profile, give me compounds similar to it.
37 | 13 | 8 | The current Factor Xa lead series is characterised by substructure X. Retrieve all bioactivity data in serine protease assays for molecules that contain substructure X.
38 | 13 | 8 | Retrieve all experimental and clinical data for a given list of compounds defined by their chemical structure (with options to match stereochemistry or not).
41 | 13 | 8 | A project is considering Protein Kinase C Alpha (PRKCA) as a target. What are all the compounds known to modulate the target directly? What are the compounds that may modulate the target directly? i.e. return all compounds active in assays where the resolution is at least at the level of the target family (i.e. PKC), both from structured assay databases and the literature.
44 | 13 | 8 | Give me all active compounds on a given target with the relevant assay data
46 | 13 | 8 | Give me the compound(s) that hit most specifically the multiple targets in a given pathway (disease)
59 | 14 | 8 | Identify all known protein-protein interaction inhibitors

(12)

Pathways, Pharmacological Activities, Biological Processes, Transcripts, Pathological Processes, Diseases, Genes, Proteins, Interactions, Clinical Drug Applications, Indications, Drugs, Compounds, Chemicals

(13)

Open PHACTS will be built upon semantic technologies and standards, providing an opportunity to:

• Demonstrate that semantic technologies can perform as well as existing systems

• Provide an open platform to address common drug discovery questions; expose pharma’s use-cases and knowledge

• Create a pre-competitive infrastructure that can be sustained and expanded into new areas, providing the platform for future collaboration

Why Semantic Technologies?

• Rapidly developing technology, powerful algorithms for integration and querying of data

• “Schema free”

• Open standards, facilitating the sharing of public, private and commercial data

• A community of developers; leverage work going on elsewhere

(14)

User Interfaces & Applications

Linked Data API

Linked Data Cache

Identity Mapping Service | Identity Resolution Service | Domain Specific Services

Data

(15)

Core Platform: Data Cache (Triple Store) with Nanopub and VoID databases; Semantic Workflow Engine (LARKC); Linked Data API (RDF/XML, TTL, JSON); Domain Specific Services; Identity Resolution Service (ConceptWiki); Chemistry Normalisation & Q/C (ChemSpider); Identifier Management Service (BridgeDb+); Data Import

Applications: Open PHACTS Explorer; 1st Gen Apps; App Framework; Partner Apps

Data: Public Content; Commercial; Public Ontologies; User Annotations

Example identifiers in the diagram: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a” (Oct. 2012)

(16)

Building Quality

High-quality chemical names and synonyms: leverage ChemSpider and ConceptWiki curation, Q/C and mapping

ChemSpider Validation and Standardization Platform (CVSP) for flagging chemical representation issues

Basic curation interface for editing concept terms available through Concept Wiki

Data quality issues detected in data sources reported back to depositors for their evaluation

(17)

STANDARD_TYPE   UNIT_COUNT
-------------   ----------
AC50            7
Activity        421
EC50            39
IC50            46
ID50            42
Ki              23
Log IC50        4
Log Ki          7
Potency         11
log IC50        0

STANDARD_TYPE   STANDARD_UNITS     COUNT(*)
-------------   --------------     --------
IC50            nM                 829448
IC50            ug.mL-1            41000
IC50            (blank)            38521
IC50            ug/ml              2038
IC50            ug ml-1            509
IC50            mg kg-1            295
IC50            molar ratio        178
IC50            ug                 117
IC50            %                  113
IC50            uM well-1          52
IC50            p.p.m.             51
IC50            ppm                36
IC50            uM-1               25
IC50            nM kg-1            25
IC50            milliequivalent    22
IC50            kJ m-2             20

~ 100 units

>5000 types

Implemented using the Quantities, Dimension, Units, Types Ontology (http://www.qudt.org/)
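The tables above show why raw IC50 values cannot be compared directly. Open PHACTS used QUDT for this; the sketch below does not use QUDT but a small hand-written alias and conversion table, assuming values should be normalised to nM. Mass concentrations such as ug/mL can only be converted with the compound's molecular weight (1 ug/mL = 1 mg/L, so nM = value × 10^6 / MW), and units that are not concentrations at all (for example mg kg-1, %, kJ m-2) are returned as `None` for review. `UNIT_ALIASES`, `MOLAR_TO_NM` and `ic50_to_nm` are hypothetical names.

```python
from typing import Optional

# Hypothetical alias table: spellings seen in the source data -> one canonical symbol
UNIT_ALIASES = {
    "pM": "pM", "nM": "nM", "uM": "uM", "µM": "uM", "mM": "mM", "M": "M",
    "ug/mL": "ug/mL", "ug.mL-1": "ug/mL", "ug/ml": "ug/mL", "ug ml-1": "ug/mL",
}

# Multiplicative factor from each molar unit to nM
MOLAR_TO_NM = {"pM": 1e-3, "nM": 1.0, "uM": 1e3, "mM": 1e6, "M": 1e9}


def ic50_to_nm(value: float, unit: str, mol_weight: Optional[float] = None) -> Optional[float]:
    """Normalise an IC50 value to nM, or return None if it cannot be converted."""
    canonical = UNIT_ALIASES.get(unit.strip())
    if canonical in MOLAR_TO_NM:
        return value * MOLAR_TO_NM[canonical]
    if canonical == "ug/mL" and mol_weight:
        # 1 ug/mL = 1e-3 g/L; divide by g/mol for mol/L, multiply by 1e9 for nM
        return value * 1e6 / mol_weight
    # Blank, dose-based (mg kg-1), percentage or other non-concentration units
    return None


print(ic50_to_nm(2.5, "uM"))             # 2500.0
print(ic50_to_nm(1.0, "ug/ml", 180.16))  # about 5550.6 (aspirin's MW)
print(ic50_to_nm(3.0, "mg kg-1"))        # None
```

Returning `None` instead of guessing keeps unconvertible records visible, so they can be reported back to depositors rather than silently mixed into nM values.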

(18)

Chemistry within Open PHACTS

The challenges associated with handling chemistry data require the support of a publicly accessible platform to integrate, standardise and host the data.

ChemSpider, an online database from the Royal Society of Chemistry, hosts the chemical compound collection underpinning Open PHACTS and is responsible for standardising the chemical compounds and providing both regular updates and ongoing data curation.

To serve the Open PHACTS platform, a structure validation and standardisation platform (CVSP) has been developed to ensure chemical structures are normalised to rules derived from the FDA structure standardisation guidelines and modified based on input from the EFPIA members.

(19)
(20)

Identities within Open PHACTS

Open PHACTS integrates information from multiple different databases, many of which use their own identifiers. The Identity Mapping Service (IMS) ensures these identifiers are linked and can be used interchangeably throughout the Open PHACTS platform.

To accommodate vocabulary heterogeneity while providing interoperability, the ConceptWiki is used. The ConceptWiki is an open access system that accepts essentially unlimited numbers of synonyms, in multiple languages, and then maps all the terms correctly back to one unique concept identifier, alleviating vocabulary problems and identifier differences.

Synonyms: Aspirin; Dispril; 2-Acetoxybenzoic acid; Acetyl salicylic acid; Salicylic acid, acetyl-

Identifiers: ChemSpider ID: 2157; FDA: 16030; ChEBI ID: CHEBI:15365; DrugBank ID: APRD00264
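The mapping idea described above can be illustrated with a toy, in-memory identity map in which every (namespace, identifier) pair, including synonyms, points to exactly one concept. This is only a sketch: the platform's identifier management was built on BridgeDb and its concept identifiers came from ConceptWiki, and neither API is reproduced here. The class, the namespace strings and the `concept:aspirin` identifier are hypothetical; the aspirin identifiers are the ones listed for this example.

```python
from collections import defaultdict


class IdentityMap:
    """Toy identity map: each (namespace, identifier) pair resolves to one concept."""

    def __init__(self):
        self._concept_of = {}               # (namespace, identifier) -> concept id
        self._members = defaultdict(set)    # concept id -> {(namespace, identifier)}

    @staticmethod
    def _key(namespace, identifier):
        # Synonyms match case-insensitively; database identifiers are kept verbatim
        if namespace == "synonym":
            identifier = identifier.casefold()
        return (namespace, identifier)

    def add(self, concept_id, namespace, identifier):
        key = self._key(namespace, identifier)
        existing = self._concept_of.get(key)
        if existing is not None and existing != concept_id:
            raise ValueError(f"{key} is already mapped to {existing}")
        self._concept_of[key] = concept_id
        self._members[concept_id].add(key)

    def resolve(self, namespace, identifier):
        return self._concept_of.get(self._key(namespace, identifier))

    def map_to(self, namespace, identifier, target_namespace):
        """Translate an identifier into every identifier of the same concept in another namespace."""
        concept = self.resolve(namespace, identifier)
        if concept is None:
            return []
        return sorted(i for ns, i in self._members[concept] if ns == target_namespace)


ims = IdentityMap()
aspirin = "concept:aspirin"  # hypothetical concept identifier
for ns, ident in [
    ("chemspider", "2157"),
    ("chebi", "CHEBI:15365"),
    ("drugbank", "APRD00264"),
    ("synonym", "Aspirin"),
    ("synonym", "Acetyl salicylic acid"),
    ("synonym", "2-Acetoxybenzoic acid"),
]:
    ims.add(aspirin, ns, ident)

print(ims.map_to("synonym", "acetyl salicylic acid", "chebi"))  # ['CHEBI:15365']
```

Raising on a conflicting mapping is a deliberate design choice for the sketch: silently re-pointing an identifier to a second concept is exactly the kind of error a mapping service should surface.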

(21)

Why Provenance Matters

Uses a community specification known as “VoID” (Vocabulary of Interlinked Datasets) to record version, author and derivations

Builds trust with users: know what you are querying (and why it might have changed)

Provides a mechanism to report usage statistics back to providers, helping them understand the value of their data

Makes it easier to track errors and ensure quality

Active participation in the community

(22)

What does Open PHACTS do?

Currently integrated databases

Database                  Number of triples (million)
ACD Labs / ChemSpider     161.34
ChEBI                     0.91
ChEMBL_v13                146.08
ConceptWiki               3.74
DrugBank                  0.52
Enzyme                    0.07
Gene Ontology             0.85
SwissProt                 156.57
WikiPathways              0.14
TOTAL                     470.21

Open PHACTS draws together multiple sources of publicly-available pharmacological and chemical data, allowing public access to the information via the Open PHACTS Explorer, an intuitive interface.

(23)

Licensing: 3 “public” databases

Comparative Toxicogenomics Database

OMIM

DrugBank

(24)

“CUTTING THE GORDIAN KNOT”

What are the problems with licensing we had to address?

– To make the data and software generated by the project usable and reusable

– Multiplicity of unclear or non-standard licences on original data sources

• ‘Public’ can mean use but not redistribute, or use in a commercial environment

• Legal position on use and reuse is extremely unclear

• Different issues than just linking to data

– What is the legal status of integrated collections of the above, and of knowledge derived from such a collection?

– Appropriate software licence selection

– Legal clarity for EFPIA and end users

– Approaches for commercial data integration, EFPIA in-house data

AIM: to enable maximum possible dissemination and usability of the integrated data and architecture generated by the project - with approaches that will be applicable in other data integration projects

(25)

Chose John Wilbanks as consultant

A framework built around STANDARD, well-understood Creative Commons licences – and how they interoperate

Deal with the problems by:

Interoperable licences

Appropriate terms

Declare expectations to users and data publishers

One size won’t fit all requirements

(26)

Open PHACTS and the scientific community

Consortium
MoU + Annexe
28 current members

Development partnerships
Influence on API developments
Opportunities to demo ideas & use cases to core team
Need MoU and annexe

Associated partners
Support, information
Exchange of ideas, data, technology
Opportunities to demo at community webinars
Need MoU

(27)

Example applications

Advanced analytics

ChemBioNavigator: Navigating at the interface of chemical and biological data with sorting and plotting options

TargetDossier: Interconnecting Open PHACTS with multiple target-centric services; exploring target similarity using diverse criteria

PharmaTrek: Interactive polypharmacology space of experimental annotations

UTOPIA: Semantic enrichment of scientific PDFs

Predictions

GARFIELD: Prediction of target pharmacology based on the Similarity Ensemble Approach

eTOX collector: Automatic extraction of data for building

(28)

ChemBioNavigator Matthias Rarey et al

PharmaTrek

(29)

Call for expressions of interest

Open PHACTS ENSO proposal

Open PHACTS intends to submit a proposal for IMI ENSO funding.

We are currently drafting our ENSO proposal and invite all EFPIA companies with an interest in Open PHACTS to contact us to discuss opportunities for involvement.

The Open PHACTS Foundation

Open PHACTS has a successor organisation, the Open PHACTS Foundation.

Please register your interest with us for further information on membership and other opportunities to get involved within Open PHACTS.

For more information and/or to register interest email us at [email protected]

(30)

Acknowledgements

Stefan Senger Gerhard Ecker

(31)
(32)

Data: Targets; Chemistry; Pharmacology; Literature; Patents

Standards: Ontology/taxonomy; Minimum information guide; Dictionaries; Interchange mapping

Assertions: e.g. Gene-to-Disease; Compound-to-Target; Compound-to-ADR

Application (Knowledge): Fact visualisation, e.g. Target Dossiers; SAR visualisation

SERVICES

(33)

Nanopublications – Capturing scientific information in the Triple Store
