Logical Semantic Warehouse - Developing Your Own Semantic Ecosystem

Peter Lawrence, TopQuadrant


(2)

Slide 2

Semantic Ecosystem Solution Value Chain

TopQuadrant Semantic Ecosystem

... turn data into semantic assets

Enrich

... searching and locating information using EVN to manage vocabularies and meta-data

Inform

... decision makers with dynamic applications that are aligned with their business needs using TBS

Discover

... information throughout distributed sources using TBI to provide logical data warehouse/data virtualization

(3)

Slide 4

Metcalfe’s Law¹ of Connectivity

¹ Metcalfe's law states that the value of a telecommunications network is proportional to the square of the number of connected users of the system (n^2).

• The value of data as knowledge is proportional to the square of the number of other databases with which it is connected

– Connecting the Patent database with the Research database helps you identify which compounds have no patents

– Connecting your Instrument database with your CAD system reveals inconsistent specifications; connecting these with the Material Requisition database then reveals that the wrong instrument might be ordered.

To extract full value from the enterprise data, any database must be able to be connected to any other database.
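As a rough illustration of the scaling claimed above, the sketch below computes a value proportional to n^2 alongside the number of distinct database pairs, n(n-1)/2. The function names and the proportionality constant `k` are assumptions made for this example only; they are not part of the presentation.

```python
def metcalfe_value(n: int, k: float = 1.0) -> float:
    """Value proportional to the square of n; k is an arbitrary constant (assumption)."""
    return k * n * n


def pairwise_links(n: int) -> int:
    """Number of distinct pairs among n databases: n * (n - 1) / 2."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Doubling the number of connected databases quadruples the n^2 value.
    for n in (2, 5, 10):
        print(n, metcalfe_value(n), pairwise_links(n))
```

Under this reading, going from 5 to 10 connected databases quadruples the value term, while the number of possible database pairs rises from 10 to 45.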

(4)

Slide 5

(5)

Slide 6

TopBraid Insight Information Bazaar

Empowering the end-user to answer their own questions

(6)

Slide 8

Syngenta TBI deployment

[Figure: cycle of Design, Synthesize, Test and Analyse steps, with Efficacy and Safety. Other labels: $$$$/cycle, $/cycle, Left Hand Shift, Valuable data, Improved DSTA]

(7)

Slide 9

Scoped to 8 primary CQs - examples

• Critical Data Domains Include

– Bio activity of small molecules for a known protein target

– Mammalian & environmental safety data (future need beyond pilot)

– IP space for a known chemistry space

– Availability of test material for experimentation

– Phys/Chem properties for a given substance

• 8 competency questions, some examples below:

ID: C008
Competency Question: All the ChemicalSubstances (and their structure properties) that have activity (of type inhibition) against target 't' with an assay result where assayOrganism is a plant.
Concepts: ChemicalSubstance, Target, Assay, Bioactivity, AssayOrganism
Properties: hasSubstanceStructure, activityType, assayResult

ID: C011
Competency Question: All protein binding sites of the ChemicalSubstances that are similar to ChemicalSubstance 's' that have activity (of type inhibition) against target 't'
Concepts: ChemicalSubstance, Bioactivity, Target
Properties: activityType, inhibitionType

ID: C013
Competency Question: References to all published documents where ChemicalSubstances that are similar to ChemicalSubstance 's' with structure property 'sp' are SubjectOfPaper.
Concepts: PatentDocuments, ChemicalSubstance
Properties: subjectOfPaper, chemicalSimilarity, patentReference
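To make competency question C008 concrete, here is a minimal sketch that answers it over a handful of in-memory triples. The property names hasSubstanceStructure, activityType and assayResult come from the table above; the properties `substance`, `target` and `assayOrganismKind`, as well as every identifier and value, are invented for illustration and do not describe the real data model.

```python
# Hypothetical (subject, property, value) triples; all identifiers and values are invented.
TRIPLES = [
    ("sub:1", "hasSubstanceStructure", "struct:A"),
    ("sub:2", "hasSubstanceStructure", "struct:B"),
    ("act:1", "substance", "sub:1"),
    ("act:1", "target", "target:t"),
    ("act:1", "activityType", "inhibition"),
    ("act:1", "assayOrganismKind", "plant"),
    ("act:1", "assayResult", 0.42),
    ("act:2", "substance", "sub:2"),
    ("act:2", "target", "target:t"),
    ("act:2", "activityType", "inhibition"),
    ("act:2", "assayOrganismKind", "mammal"),
    ("act:2", "assayResult", 0.10),
]


def values(subject, prop):
    """Return all values of `prop` for `subject`."""
    return [o for s, p, o in TRIPLES if s == subject and p == prop]


def c008(target):
    """Substances (with structures and assay results) inhibiting `target` in a plant assay."""
    results = []
    activities = sorted({s for s, p, o in TRIPLES if p == "target" and o == target})
    for act in activities:
        if "inhibition" not in values(act, "activityType"):
            continue
        if "plant" not in values(act, "assayOrganismKind"):
            continue
        for sub in values(act, "substance"):
            results.append(
                (sub, values(sub, "hasSubstanceStructure"), values(act, "assayResult"))
            )
    return results


if __name__ == "__main__":
    print(c008("target:t"))
```

In a real deployment the same pattern would presumably be expressed as a SPARQL query over the connected sources; the plain-Python form is used here only to keep the example self-contained.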

(8)

Slide 10

Success Criteria – Evaluation of value

• The actual value for the scientists and the business was assessed in terms of:

– Increased efficiency in lead finding, reducing the time and effort for the scientists

– Improved quality of generated leads and lead evaluation, by helping the scientists develop insights into databases and research data, both internal & external

– Improved access to the internal data sources by linking them and providing a uniform way of accessing them

• Competency questions provide the benchmark to determine the success of data access.

(9)

Now what is the question…

What Chemical-Substances (and their structure properties) are we working on that have activity (of type inhibition) against target 't' with an assay result where assayOrganism is a plant and this substance is similar to any patented substance?

(10)

Now the technical problem …

[Diagram: data sources SIDER, Chembl, Chebi, Derwent Patent Index, Sample Test Results, Bio activity of small molecules, Phys/Chem properties, Documents/IP and ChemSpider, grouped under the labels "Internal Data Sources" and "External/Public Data Sources"]

What Chemical-Substances (and their structure properties) are we working on that have activity (of type inhibition) against target 't' with an assay result where assayOrganism is a plant and this substance is similar to any patented substance?

(11)

The ‘lift and shift’ solution

[Diagram: data sources SIDER, Chembl, Chebi, Derwent Patent Index, Sample Test Results, Bio activity of small molecules, Phys/Chem properties, Documents/IP and ChemSpider]

(12)

The physical data warehouse solution

[Diagram: data sources SIDER, Chembl, Chebi, Derwent Patent Index, Sample Test Results, Bio activity of small molecules, Phys/Chem properties, Documents/IP and ChemSpider, with two ETL steps]

(13)

Slide 16

Solution: The Semantic Logical Data Warehouse Solution*

*According to Gartner Hype Cycle for Information Infrastructure, 2012, “the Logical Data Warehouse (LDW) is a new data management architecture for analytics which combines the strengths of traditional repository warehouses with alternative data management and access strategy. The LDW will form a new best practices by the end of 2015.”

(14)

The implemented Logical Data Warehouse solution:

Step 1 SPARQL/RDF adapters to data-sources

[Diagram: SPARQL/RDF adapters attached to the data sources SIDER, Chembl, Chebi, Derwent Patent Index, Sample Test Results, Bio activity of small molecules, Phys/Chem properties, Documents/IP and ChemSpider]
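The slide does not show how the adapters are built. As a hedged sketch of the general idea in Step 1, the function below exposes tabular rows from one source as subject/property/value triples. `rows_to_triples`, the `sub:` prefix, the column names and the sample rows are all hypothetical, and plain tuples stand in for real RDF terms.

```python
from typing import Dict, Iterable, Iterator, Tuple

Triple = Tuple[str, str, object]


def rows_to_triples(
    rows: Iterable[Dict[str, object]],
    key: str,
    prefix: str,
    column_map: Dict[str, str],
) -> Iterator[Triple]:
    """Yield one triple per mapped, non-empty column of each row."""
    for row in rows:
        subject = f"{prefix}{row[key]}"
        for column, prop in column_map.items():
            value = row.get(column)
            if value is not None:  # skip empty cells instead of emitting null facts
                yield (subject, prop, value)


# Invented sample rows from a hypothetical internal substance table.
ROWS = [
    {"id": 17, "mw": 128.2, "structure": "struct:A"},
    {"id": 18, "mw": None, "structure": "struct:B"},
]
# Hypothetical mapping from source columns to model properties.
COLUMN_MAP = {"mw": "molecularWeight", "structure": "hasSubstanceStructure"}

if __name__ == "__main__":
    for triple in rows_to_triples(ROWS, key="id", prefix="sub:", column_map=COLUMN_MAP):
        print(triple)
```

Keeping the column-to-property mapping as data rather than code is one way to support quick configuration of a new source, since only the mapping needs to change.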

(15)

Slide 18

The implemented Logical Data Warehouse solution:

Step 2 Query Map-Reduce engine

Map / Reduce over the Concept Map: Substances, Activities, Patents, Organisms, Targets

Full question:

What Chemical-Substances (and their structure properties) are we working on that have activity (of type inhibition) against target 't' with an assay result where assayOrganism is a plant and this substance is similar to any patented substance?

Sub-questions:

• What Chemical-Substances and their structure properties?

• What activity (of type inhibition) against target 't' with an assay result where assayOrganism is a plant?

• What Chemical-Substances are we working on?

• What Chemical-Substances are similar to any patented substance?

• What Chemical-Substances have activity?
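One possible reading of Step 2, sketched under stated assumptions: each sub-question is run against the source holding that concept (map), and the partial answers are then joined on a shared substance identifier (reduce). The slides do not describe how the actual engine works, so the source layout, predicates and identifiers below are hypothetical.

```python
from functools import reduce

# Hypothetical per-concept sources, each keyed by substance id; all data is invented.
SOURCES = {
    "Substances": {
        "sub:1": {"structure": "struct:A"},
        "sub:2": {"structure": "struct:B"},
        "sub:3": {"structure": "struct:C"},
    },
    "Activities": {
        "sub:1": {"activityType": "inhibition", "target": "t", "organism": "plant"},
        "sub:2": {"activityType": "inhibition", "target": "t", "organism": "mammal"},
    },
    "Patents": {
        "sub:1": {"similarPatent": "pat:9"},
        "sub:3": {"similarPatent": "pat:4"},
    },
}


def any_substance(facts):
    return True


def plant_inhibition_of_t(facts):
    return (
        facts["activityType"] == "inhibition"
        and facts["target"] == "t"
        and facts["organism"] == "plant"
    )


def similar_to_patented(facts):
    return "similarPatent" in facts


# One sub-question (predicate) per concept.
SUB_QUERIES = {
    "Substances": any_substance,
    "Activities": plant_inhibition_of_t,
    "Patents": similar_to_patented,
}


def map_step(sources, sub_queries):
    """Run each sub-question against its own source and collect the partial answers."""
    return [
        {sid: facts for sid, facts in sources[concept].items() if predicate(facts)}
        for concept, predicate in sub_queries.items()
    ]


def reduce_step(partials):
    """Join the partial answers on substance id, merging the facts from each source."""
    def join(left, right):
        return {sid: {**left[sid], **right[sid]} for sid in left if sid in right}
    return reduce(join, partials)


if __name__ == "__main__":
    print(reduce_step(map_step(SOURCES, SUB_QUERIES)))
```

Only substances that satisfy every sub-question survive the join, which corresponds to the "and" clauses of the full question.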

(16)

The implemented Logical Data Warehouse solution:

Step 3 Browsing the virtualized cube

[Diagram: data sources SIDER, Chembl, Chebi, Derwent Patent Index, Sample Test Results, Bio activity of small molecules, Phys/Chem properties, Documents/IP and ChemSpider, with two Map-Reduce components]

(17)

What an LDW looks like to a user:

Formulating the question

A user fills in a form that is created dynamically to match the concept being searched. Changes to the underlying model are automatically reflected in this form.

For example: a Substance with MW < 130 and an Activity associated with %skin% less than 0.5
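The example criteria can be read as a simple filter. The sketch below assumes that "%skin%" is an SQL-LIKE-style wildcard on the activity name and that "less than 0.5" applies to the activity value; both are interpretations, and the records, field names and helper functions are invented for illustration.

```python
import re


def like(pattern: str, text: str) -> bool:
    """Minimal SQL-LIKE-style match: only '%' is a wildcard; comparison is case-insensitive."""
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, text, flags=re.IGNORECASE | re.DOTALL) is not None


# Invented sample records.
SUBSTANCES = [
    {"id": "sub:1", "mw": 120.0, "activities": [{"name": "Skin irritation", "value": 0.3}]},
    {"id": "sub:2", "mw": 125.0, "activities": [{"name": "Skin irritation", "value": 0.8}]},
    {"id": "sub:3", "mw": 210.0, "activities": [{"name": "Skin sensitisation", "value": 0.1}]},
    {"id": "sub:4", "mw": 90.0, "activities": [{"name": "Eye irritation", "value": 0.2}]},
]


def search(substances, max_mw, activity_pattern, max_value):
    """Ids of substances below max_mw with a matching activity whose value is below max_value."""
    return [
        s["id"]
        for s in substances
        if s["mw"] < max_mw
        and any(
            like(activity_pattern, a["name"]) and a["value"] < max_value
            for a in s["activities"]
        )
    ]


if __name__ == "__main__":
    print(search(SUBSTANCES, max_mw=130, activity_pattern="%skin%", max_value=0.5))
```

With these sample records only `sub:1` satisfies both conditions: the others fail on molecular weight, activity value, or activity name.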

(18)

What an LDW looks like to a user:

Searching all of the data sources

The query is then executed and results are returned within a few seconds.

(19)

What an LDW looks like to a user:

Discovering details

The user can then explore these results, which have been drawn from multiple data sources; anything to do with this concept can be retrieved into the logical data warehouse. Note that this can mean a lot of information: typically a concept might pull together 100+ facts.

(20)

What an LDW looks like to a user:

Exploring related concepts

The data might include cross-references to other concepts, such as activities, targets, patents and documents, which appear as links. A user can follow these links to explore the cross-referenced information, expanding the contents of the logical data warehouse much like snowflake navigation through a physical data warehouse. This can continue indefinitely.
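Following cross-reference links outward from one concept amounts to a graph traversal. A minimal sketch, assuming the links are available as an adjacency map: a breadth-first expansion that records how many hops each concept is from the starting point. The `LINKS` data and the `expand` function are hypothetical.

```python
from collections import deque

# Hypothetical cross-reference links between concepts; all identifiers are invented.
LINKS = {
    "sub:1": ["act:1", "pat:9"],
    "act:1": ["target:t", "doc:4"],
    "pat:9": ["doc:4"],
    "target:t": [],
    "doc:4": [],
}


def expand(start, max_hops):
    """Breadth-first expansion from `start`; returns {concept: hops from start}."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not follow links beyond the requested depth
        for neighbour in LINKS.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    return seen


if __name__ == "__main__":
    print(expand("sub:1", max_hops=1))
    print(expand("sub:1", max_hops=2))
```

The hop limit is a design choice for the sketch: since navigation can continue indefinitely, some bound is needed to keep each expansion step small.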

(21)

What an LDW looks like to a user:

Exploring external links

The user can also link to external sites. Note that external data is treated just like internal data: it is merged into the query results. This is visual integration.

(22)

Slide 26

Success Criteria – Evaluation of value

• The actual value for the scientists and the business was assessed in terms of:

– Increased efficiency in lead finding, reducing the time and effort for the scientists

– Improved quality of generated leads and lead evaluation, by helping the scientists develop insights into databases and research data, both internal & external

– Improved access to the internal data sources by linking them and providing a uniform way of accessing them

– Competency questions provide the benchmark to determine the success of data access.

(23)

Semantic Logical Data Warehouse Benefits

• Rapid configuration of new data sources.

• Resolves data variety.

• Eliminates data replication.
