Open PHACTS
“Data integration for all”
Task, workflow and results
AUREUS search
targets: voltage-gated potassium channels
Apply filters
(MW, cLogP, Lipinski + remove undesirable target)
⇒ ~1000 molecules
Similarity searches
(RG, TP, Daylight)
Cluster analysis
⇒ ~10000 molecules selected
IonWorks© single shot screening
240 single shot hits progressed into full curve assay
5 full curve actives
(in at least one test occasion)
Series for lead optimisation
Stefan Senger, ca. 2004
Task: create a focussed set to
identify leads against
voltage-gated potassium channels
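The filter step of this workflow can be sketched as follows, assuming molecular descriptors have already been computed. This is a minimal illustration only: the `Molecule` record, `apply_filters` and the thresholds are hypothetical (the standard rule-of-five cut-offs, allowing one violation). The cut-offs actually used in the original workflow are not stated.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    """Hypothetical record holding precomputed descriptors."""
    name: str
    mw: float     # molecular weight (Da)
    clogp: float  # calculated logP
    hbd: int      # hydrogen-bond donors
    hba: int      # hydrogen-bond acceptors


def lipinski_violations(m: Molecule) -> int:
    """Count how many of Lipinski's rule-of-five criteria a molecule breaks."""
    return sum([m.mw > 500, m.clogp > 5, m.hbd > 5, m.hba > 10])


def apply_filters(mols, max_violations=1, excluded=frozenset()):
    """Keep molecules with at most `max_violations` rule-of-five violations
    that are not in `excluded` (e.g. known actives on an undesirable target)."""
    return [
        m for m in mols
        if lipinski_violations(m) <= max_violations and m.name not in excluded
    ]


mols = [
    Molecule("cpd_a", 350.4, 2.1, 1, 4),
    Molecule("cpd_b", 620.0, 6.3, 2, 8),  # breaks the MW and cLogP criteria
    Molecule("cpd_c", 410.0, 3.0, 2, 6),  # passes, but hits an undesirable target
]
print([m.name for m in apply_filters(mols, excluded={"cpd_c"})])  # ['cpd_a']
```

Allowing one violation mirrors common practice with the rule of five; setting `max_violations=0` gives a stricter filter.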
We (may) know where the data is, but integrating
is a pain, bespoke, and often only for experts
Q: Identify all oxidoreductase inhibitors with an activity <100nM in both mouse and human
Q: The current Factor Xa lead series is characterised by substructure X. Retrieve all bioactivity data in serine protease assays for molecules that contain substructure X.
Q: For a given interaction profile, give me compounds similar to it.
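Answering the first question means joining data held in different sources: target classification (oxidoreductases are enzyme class EC 1) and species from one, potencies from another. A minimal sketch, assuming both extracts are already loaded and all potencies are already normalised to nM; every identifier and value below is hypothetical.

```python
from collections import defaultdict

# Hypothetical extract from a target/enzyme source: EC number and species per target.
targets = {
    "TGT_H1": ("1.1.1.1", "human"),
    "TGT_M1": ("1.1.1.1", "mouse"),
    "TGT_H2": ("3.4.21.6", "human"),  # EC 3 (hydrolase): not an oxidoreductase
}

# Hypothetical extract from a bioactivity source: (compound, target, potency in nM).
activities = [
    ("CPD1", "TGT_H1", 12.0),
    ("CPD1", "TGT_M1", 45.0),
    ("CPD2", "TGT_H1", 30.0),
    ("CPD2", "TGT_M1", 250.0),  # too weak in mouse
    ("CPD3", "TGT_H2", 5.0),    # potent, but wrong enzyme class
]


def oxidoreductase_inhibitors(activities, targets, cutoff_nm=100.0):
    """Compounds active below `cutoff_nm` on an EC 1 enzyme in both human and mouse."""
    species_hit = defaultdict(set)
    for compound, target, potency_nm in activities:
        ec, species = targets.get(target, ("", ""))
        if ec.startswith("1.") and potency_nm < cutoff_nm:
            species_hit[compound].add(species)
    return sorted(c for c, hit in species_hit.items() if {"human", "mouse"} <= hit)


print(oxidoreductase_inhibitors(activities, targets))  # ['CPD1']
```

The sketch accepts any EC 1 target in each species; restricting the match to orthologues of the same enzyme would need an additional orthology mapping.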
ChEMBL, DrugBank, Gene Ontology, WikiPathways, UniProt, ChemSpider, UMLS, ConceptWiki, ChEBI, etc.
Internal
The Innovative Medicines Initiative
Biggest public-private partnership in the area of medicine
Collaboration between the European Commission and the European Federation of Pharmaceutical Industries and Associations (EFPIA)
Promotion of medical innovation in Europe
Tackle key bottlenecks
Recognises “in kind” contributions
Focus on key problems
– Efficacy, Safety, Education & Training, Knowledge Management
Public Domain Drug Discovery Data
Pharma are accessing, processing, storing & re-processing:
Literature, PubChem, GenBank, Patents, Databases, Downloads
Data Integration, Data Analysis, Firewalled Databases
The same process is repeated at GSK, AZ, Pfizer and Merck. Why repeat at each company?
Information Tombs
– Built for primary use-case
– Tailored indexes
– Tailored GUIs
– Unique language & metadata
– Poor interoperability/integration
Examples: Literature, HR, Synthesis, Portfolio, SAR, Docs, Safety
Pfizer Limited – Coordinator
Universität Wien – Managing entity
Technical University of Denmark
University of Hamburg, Center for Bioinformatics
BioSolveIT GmbH
Consorci Mar Parc de Salut de Barcelona
Leiden University Medical Centre
Royal Society of Chemistry
Vrije Universiteit Amsterdam
Spanish National Cancer Research Centre
University of Manchester
Maastricht University
Aqnowledge
University of Santiago de Compostela
Rheinische Friedrich-Wilhelms-Universität Bonn
AstraZeneca
GlaxoSmithKline
Esteve
Novartis
Merck Serono
H. Lundbeck A/S
Eli Lilly
Netherlands Bioinformatics Centre
Swiss Institute of Bioinformatics
ConnectedDiscovery
EMBL-European Bioinformatics Institute
Janssen
OpenLink
A use-case driven approach, focussed on delivery for the real world
Main architecture, technical implementation and primary capabilities driven by a set of prioritised research questions
Based on the main research questions, define prioritised data sources
Develop three Exemplars to demonstrate the capabilities of the Open PHACTS System and to define interfaces and Work Streams
Build: Service layer and resource integration
Drive: Development of exemplar work packages & applications
Sustain: Community engagement and long-term sustainability
Figure: OPS Service Layer (Assertion & Metadata Mgmt; Transform / Translate; Integrator) connecting corpora and databases (Corpus 1, Db 2–4, Corpus 5) across ‘Consumer’ and Supplier firewalls, using Standard Public Vocabularies and Business Rules, and serving the Target Dossier, Compound Dossier and Pharmacological Networks
Work Stream 1: Open Pharmacological Space (OPS) Service Layer
Standardised software layer to allow public DD resource integration
− Define standards and construct OPS service layer
− Develop interface (API) for data access, integration and analysis
− Develop secure access models
Existing Drug Discovery (DD) Resource Integration
Work Stream 2: Exemplar Drug Discovery Informatics tools
Develop exemplar services to test OPS Service Layer
Target Dossier (Data Integration)
Pharmacological Network Navigator (Data Visualisation)
Compound Dossier (Data Analysis)
Platform Explorer
Standards Apps
Number | Sum | Nr of 1 | Question
15 | 12 | 9 | All oxidoreductase inhibitors active < 100 nM in both human and mouse
18 | 14 | 8 | Given compound X, what is its predicted secondary pharmacology? What are the on- and off-target safety concerns for a compound? What is the evidence and how reliable is that evidence (journal impact factor, KOL) for findings associated with a compound?
24 | 13 | 8 | Given a target, find me all actives against that target. Find/predict polypharmacology of actives. Determine ADMET profile of actives.
32 | 13 | 8 | For a given interaction profile, give me compounds similar to it.
37 | 13 | 8 | The current Factor Xa lead series is characterised by substructure X. Retrieve all bioactivity data in serine protease assays for molecules that contain substructure X.
38 | 13 | 8 | Retrieve all experimental and clinical data for a given list of compounds defined by their chemical structure (with options to match stereochemistry or not).
41 | 13 | 8 | A project is considering Protein Kinase C Alpha (PRKCA) as a target. What are all the compounds known to modulate the target directly? What are the compounds that may modulate the target directly? i.e. return all compounds active in assays where the resolution is at least at the level of the target family (i.e. PKC), both from structured assay databases and the literature.
44 | 13 | 8 | Give me all active compounds on a given target with the relevant assay data
46 | 13 | 8 | Give me the compound(s) which hit most specifically the multiple targets in a given pathway (disease)
59 | 14 | 8 | Identify all known protein-protein interaction inhibitors
Prioritised research questions
Pathways, Pharmacological Activities, Biological Processes, Transcripts, Pathological Processes, Diseases, Genes, Proteins, Interactions, Clinical Drug Applications, Indications, Drugs, Compounds, Chemicals
Open PHACTS will be built upon semantic technologies and standards, providing an opportunity to:
• Demonstrate that semantic technologies can perform as well as existing systems
• Provide an open platform to address common drug discovery questions; expose pharma’s use-cases and knowledge
• Create a pre-competitive infrastructure that can be sustained and
expanded into new areas; providing the platform for future collaboration
Why Semantic Technologies?
• Rapidly developing technology, powerful algorithms for integration and querying of data
• “schema free”
• Open standards – facilitating the sharing of public, private and commercial data
• A community of developers: leverage work going on elsewhere
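What “schema free” means in practice can be shown with a toy triple store: every fact is a (subject, predicate, object) triple, so a new kind of fact is just one more triple, with no table to alter. A minimal sketch only; the `ex:` names are hypothetical, and real semantic stacks use RDF stores queried with SPARQL rather than a class like this.

```python
Triple = tuple[str, str, str]


class TripleStore:
    """Toy in-memory store: a set of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s=None, p=None, o=None) -> list[Triple]:
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TripleStore()
store.add("ex:aspirin", "ex:name", "Aspirin")
store.add("ex:aspirin", "ex:inhibits", "ex:PTGS1")
# A new kind of fact needs no schema change: it is simply one more triple.
store.add("ex:aspirin", "ex:chemspiderId", "2157")
print(store.match(s="ex:aspirin", p="ex:chemspiderId"))
# [('ex:aspirin', 'ex:chemspiderId', '2157')]
```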
Figure: Open PHACTS architecture and core platform (Oct. 2012)
User Interfaces & Applications: Open PHACTS Explorer, 1st Gen Apps, Partner Apps, App Framework
Linked Data API (RDF/XML, TTL, JSON)
Semantic Workflow Engine (LARKC)
Domain Specific Services
Identity Resolution Service (ConceptWiki)
Identity Mapping Service / Identifier Management Service (BridgeDb+)
Chemistry Normalisation & Q/C (ChemSpider)
Linked Data Cache (Triple Store)
Data Import: VoID Db, Nanopub Db
Content: Public Content, Commercial, Public Ontologies, User Annotations
Example identifiers shown: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”
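The Linked Data API serves the same data in several serialisations. One common way for a client to ask for a particular one is HTTP content negotiation; the sketch below builds (but does not send) such a request with the Python standard library. The base URL, the `uri` query parameter and the use of the Accept header are assumptions for illustration, not the documented Open PHACTS API.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Registered media types for the three serialisations named in the architecture.
MEDIA_TYPES = {
    "rdfxml": "application/rdf+xml",
    "ttl": "text/turtle",
    "json": "application/json",
}


def build_request(base_url: str, concept_uri: str, fmt: str = "json") -> Request:
    """Build a GET request for one resource, asking for `fmt` via the Accept header."""
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unsupported format: {fmt!r}")
    query = urlencode({"uri": concept_uri})
    return Request(f"{base_url}?{query}", headers={"Accept": MEDIA_TYPES[fmt]})


# Hypothetical endpoint; nothing is sent until urllib.request.urlopen(req) is called.
req = build_request("https://api.example.org/compound", "http://example.org/aspirin", "ttl")
print(req.full_url)
print(req.get_header("Accept"))  # text/turtle
```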
Building Quality
High-quality chemical names and synonyms. Leverage ChemSpider and ConceptWiki curation, Q/C and mapping
ChemSpider Validation and Standardization Platform (CVSP) for flagging chemical representation issues
Basic curation interface for editing concept terms available through Concept Wiki
Data quality issues detected in data sources reported back to depositors for their evaluation
STANDARD_TYPE | UNIT_COUNT
AC50 | 7
Activity | 421
EC50 | 39
IC50 | 46
ID50 | 42
Ki | 23
Log IC50 | 4
Log Ki | 7
Potency | 11
log IC50 | 0

STANDARD_TYPE | STANDARD_UNITS | COUNT(*)
IC50 | nM | 829448
IC50 | ug.mL-1 | 41000
IC50 | (blank) | 38521
IC50 | ug/ml | 2038
IC50 | ug ml-1 | 509
IC50 | mg kg-1 | 295
IC50 | molar ratio | 178
IC50 | ug | 117
IC50 | % | 113
IC50 | uM well-1 | 52
IC50 | p.p.m. | 51
IC50 | ppm | 36
IC50 | uM-1 | 25
IC50 | nM kg-1 | 25
IC50 | milliequivalent | 22
IC50 | kJ m-2 | 20
~ 100 units
>5000 types
Implemented using the Quantities, Dimension, Units, Types Ontology (http://www.qudt.org/)
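The core of the normalisation problem can be sketched in a few lines: collapse spelling variants onto one canonical unit, convert to a common scale, and refuse what cannot be converted. This is a minimal illustration assuming a hand-written lookup table; `CANONICAL` and `ic50_to_nm` are hypothetical and are neither the QUDT ontology nor its API. Mass concentrations need the compound's molecular weight, and dose units such as mg kg-1 are not concentrations at all.

```python
# Hypothetical map from raw unit spellings (as in the IC50 counts) to one symbol.
CANONICAL = {
    "nM": "nM", "uM": "uM", "mM": "mM", "M": "M",
    "ug.mL-1": "ug/mL", "ug/ml": "ug/mL", "ug ml-1": "ug/mL",
}

# Multipliers from molar concentration units to nM.
TO_NM = {"nM": 1.0, "uM": 1e3, "mM": 1e6, "M": 1e9}


def ic50_to_nm(value, raw_unit, mol_weight=None):
    """Return the IC50 in nM, or None when the unit cannot be converted."""
    unit = CANONICAL.get(raw_unit.strip()) if raw_unit else None
    if unit in TO_NM:
        return value * TO_NM[unit]
    if unit == "ug/mL" and mol_weight:
        # x ug/mL = x mg/L = (x / MW) mmol/L = (x / MW) * 1e6 nM, with MW in g/mol
        return value * 1e6 / mol_weight
    # Blank units, dose units (mg kg-1), %, ppm, etc. are left for curation.
    return None


print(ic50_to_nm(2.5, "uM"))        # 2500.0
print(ic50_to_nm(10.0, "mg kg-1"))  # None
```

Returning None rather than guessing keeps unconvertible records visible for curation instead of silently mixing them with molar values.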
Chemistry within Open PHACTS
The challenges associated with handling chemistry data require the
support of a publicly accessible platform to integrate, standardise and host the data.
ChemSpider, an online database from the Royal Society of Chemistry, hosts the chemical compound collection underpinning Open PHACTS and is responsible for standardising the chemical compounds and
providing both regular updates and ongoing data curation.
To serve the Open PHACTS platform, a structure validation and
standardisation platform (CVSP) has been developed to ensure chemical structures are normalised to rules derived from the FDA
structure standardisation guidelines and modified based on input from the EFPIA members.
Identities within Open PHACTS
Open PHACTS integrates information from multiple different databases, many of which use unique identifiers. The
Identity Mapping Service (IMS)
ensures these identifiers are linked and available for use interchangeably throughout the Open PHACTS platform.
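The idea behind identifier mapping can be sketched as grouping identifiers declared equivalent, so that a lookup on any one of them returns them all. A minimal sketch using a union-find structure, with identifier strings taken from the aspirin example later in this section; the `IdentityMap` class and the prefixes are hypothetical and are not the IMS or BridgeDb API.

```python
class IdentityMap:
    """Hypothetical sketch: union-find over identifiers declared equivalent."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def _find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def link(self, a: str, b: str) -> None:
        """Record that identifiers a and b denote the same thing."""
        ra, rb = self._find(a), self._find(b)
        if ra != rb:
            self.parent[ra] = rb

    def equivalents(self, x: str) -> list[str]:
        """All identifiers in the same group as x (including x), sorted."""
        root = self._find(x)
        return sorted(y for y in list(self.parent) if self._find(y) == root)


ims = IdentityMap()
ims.link("chemspider:2157", "chebi:CHEBI:15365")
ims.link("chebi:CHEBI:15365", "drugbank:APRD00264")
print(ims.equivalents("drugbank:APRD00264"))
# ['chebi:CHEBI:15365', 'chemspider:2157', 'drugbank:APRD00264']
```

Equivalence is transitive here, which is convenient but also means one wrong link merges two groups; that is one reason curated mappings matter.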
To accommodate vocabulary heterogeneity while providing interoperability, the
ConceptWiki is used. The ConceptWiki is an open access system that accepts an essentially unlimited number of synonyms, in multiple languages, and maps each term back to one unique concept identifier, alleviating vocabulary problems and identifier differences.
Synonyms: Aspirin; Dispril; 2-Acetoxybenzoic acid; Acetyl salicylic acid; Salicylic acid, acetyl-
ChemSpider ID: 2157
FDA: 16030
ChEBI ID: CHEBI:15365
DrugBank ID: APRD00264
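Resolving many names to one concept can be sketched as a lookup keyed on a normalised form of each synonym. A minimal sketch using the aspirin synonyms above; the concept identifier `concept:aspirin` and the `ConceptIndex` class are hypothetical, not the ConceptWiki API, and simple case-folding stands in for the richer matching a curated system would need.

```python
def normalise(term: str) -> str:
    """Case-fold and collapse whitespace so trivial variants of a name match."""
    return " ".join(term.casefold().split())


class ConceptIndex:
    """Hypothetical sketch: many synonyms -> one concept identifier."""

    def __init__(self) -> None:
        self._by_term: dict[str, str] = {}

    def add_concept(self, concept_id: str, synonyms: list[str]) -> None:
        for s in synonyms:
            key = normalise(s)
            existing = self._by_term.get(key)
            if existing is not None and existing != concept_id:
                # A real system must handle ambiguous names; here we just refuse.
                raise ValueError(f"{s!r} already maps to {existing}")
            self._by_term[key] = concept_id

    def resolve(self, term: str):
        """Return the concept identifier for a name, or None if unknown."""
        return self._by_term.get(normalise(term))


index = ConceptIndex()
index.add_concept("concept:aspirin", [
    "Aspirin", "Dispril", "2-Acetoxybenzoic acid",
    "Acetyl salicylic acid", "Salicylic acid, acetyl-",
])
print(index.resolve("acetyl  SALICYLIC acid"))  # concept:aspirin
print(index.resolve("ibuprofen"))               # None
```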
Why Provenance Matters
Uses a community specification known as “VoID” (Vocabulary of Interlinked Datasets) to record version, author and derivations
Builds trust with users: know what you are querying (and why it might have changed)
Provides a mechanism to feed usage statistics back to providers, helping them understand the value of their data
Makes it easier to track errors and ensure quality
Active participation in the community
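A VoID description is itself a small piece of RDF. The sketch below writes one in Turtle for a single dataset, combining VoID with Dublin Core, PAV and PROV terms for title, version, author and derivation. All URIs and values are hypothetical placeholders (the triple count echoes the ChEMBL_v13 figure of 146.08 million listed for Open PHACTS), and a real pipeline would use an RDF library, since plain string formatting does no escaping.

```python
def void_description(dataset_uri: str, title: str, version: str,
                     creator_uri: str, source_uri: str, triples: int) -> str:
    """Return a Turtle VoID description of one dataset (no escaping is done)."""
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix pav: <http://purl.org/pav/> .",
        "@prefix prov: <http://www.w3.org/ns/prov#> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{title}" ;',
        f'    pav:version "{version}" ;',
        f"    dcterms:creator <{creator_uri}> ;",
        f"    prov:wasDerivedFrom <{source_uri}> ;",
        f"    void:triples {triples} .",
    ]
    return "\n".join(lines)


print(void_description(
    dataset_uri="http://example.org/dataset/chembl-rdf",
    title="ChEMBL RDF",
    version="13",
    creator_uri="http://example.org/people/curator",
    source_uri="http://example.org/dataset/chembl-source",
    triples=146_080_000,
))
```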
What does Open PHACTS do?
Currently integrated databases:
Database | Number of triples (million)
ACD Labs / ChemSpider | 161.34
ChEBI | 0.91
ChEMBL_v13 | 146.08
ConceptWiki | 3.74
DrugBank | 0.52
Enzyme | 0.07
Gene Ontology | 0.85
SwissProt | 156.57
WikiPathways | 0.14
TOTAL | 470.21
Open PHACTS draws together multiple sources of publicly-available pharmacological and chemical data, allowing public access to the information via the Open PHACTS Explorer, an intuitive interface.
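As a quick consistency check on the table of integrated databases, the per-source figures can be summed. This is plain arithmetic on the listed values: they add up to 470.22 million rather than the listed 470.21, a difference presumably due to each source being rounded to two decimals. The three largest sources account for about 98.7% of the total.

```python
# Triples per source, in millions, as listed in the table.
triples_million = {
    "ACD Labs / ChemSpider": 161.34,
    "ChEBI": 0.91,
    "ChEMBL_v13": 146.08,
    "ConceptWiki": 3.74,
    "DrugBank": 0.52,
    "Enzyme": 0.07,
    "Gene Ontology": 0.85,
    "SwissProt": 156.57,
    "WikiPathways": 0.14,
}

total = round(sum(triples_million.values()), 2)
top3 = sorted(triples_million.values(), reverse=True)[:3]
top3_share = round(100 * sum(top3) / total, 1)

print(total)       # 470.22
print(top3_share)  # 98.7
```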
Licensing: 3 “public” databases
Comparative Toxicogenomics Database
OMIM
DrugBank
“CUTTING THE GORDIAN KNOT”
What are the problems with licensing we had to address?
– To make the data and software generated by the project usable and reusable
– Multiplicity of unclear or non-standard licences on original data sources
• ‘Public’ can mean, for example, use but not redistribute, or use in a commercial environment
• Legal position on use and reuse extremely unclear
• Different issues than just linking to data
– What is the legal status of integrated collections of the above, and of knowledge derived from such a collection?
– Appropriate software licence selection
– Legal clarity for EFPIA and end users
– Approaches for commercial data integration, EFPIA in-house data
AIM: to enable maximum possible dissemination and usability of the integrated data and architecture generated by the project - with approaches that will be applicable in other data integration projects
Chose John Wilbanks as consultant
A framework built around STANDARD, well-understood Creative Commons licences, and how they interoperate
Deal with the problems by:
Interoperable licences
Appropriate terms
Declaring expectations to users and data publishers
One size won’t fit all requirements
Development partnerships
– Influence on API developments
– Opportunities to demo ideas & use cases to core team
– Need MoU and annexe
Associated partners
– Support, information
– Exchange of ideas, data, technology
– Opportunities to demo at community webinars
– Need MoU
Consortium (MoU + Annexe): 28 current members
Open PHACTS and the
scientific community
Example applications
Advanced analytics
ChemBioNavigator Navigating at the interface of chemical and biological data with sorting and plotting options
TargetDossier Interconnecting Open PHACTS with multiple target centric services. Exploring target
similarity using diverse criteria
PharmaTrek Interactive Polypharmacology space of experimental annotations
UTOPIA Semantic enrichment of scientific PDFs
Predictions
GARFIELD Prediction of target pharmacology based on the Similar Ensemble Approach
eTOX collector Automatic extraction of data for building
ChemBioNavigator Matthias Rarey et al
Call for expressions of interest
Open PHACTS ENSO proposal
Open PHACTS intends to submit a proposal for IMI ENSO funding.
We are currently drafting our ENSO proposal and invite all EFPIA companies with an interest in Open PHACTS to contact us to discuss
opportunities for involvement.
The Open PHACTS Foundation
Open PHACTS has a successor organisation, the Open PHACTS Foundation.
Please register your interest with us for further information on membership and other opportunities to get involved with Open PHACTS.
For more information and/or to register interest email us at [email protected]
Acknowledgements
Stefan Senger Gerhard Ecker
Data: Targets; Chemistry; Pharmacology; Literature; Patents
Standards: Ontology/taxonomy; Minimum information guide; Dictionaries; Interchange mapping
Assertions: e.g. Gene-to-Disease; Compound-to-Target; Compound-to-ADR
Application (Knowledge): Fact visualisation, e.g. Target Dossiers; SAR visualisation
Services