
Nathan Brown. The Application of Consensus Modelling and Genetic Algorithms to Interpretable Discriminant Analysis.


Academic year: 2021


(1)

Nathan Brown

The Application of Consensus Modelling and Genetic Algorithms to Interpretable Discriminant Analysis


(2)

Discriminant Analysis Using a GA

Predictive vs. Diagnostic Modelling

Discriminant Analysis with a Genetic Algorithm

Consensus and Splice Modelling

Experimental Studies

• MDDR: 1130 renin and 636 COX inhibitors¹

(3)

Predictive versus Diagnostic Models

• Highly predictive models tend to obfuscate what is important for the property being modelled

• Highly interpretable models tend to be less effective in prediction power

• However, both objectives are very important

• We want highly predictive models that can also guide our decision-making processes

(4)

Discriminant Analysis

Supervised learning

• Dependent variable is known for dataset and used in training the model with the independent variables

Optimize separation of classes

• Evolve weights for binned descriptors

• Score solutions according to ability to separate objects

Discover descriptor ranges that are important for discrimination which can then be applied to make informed decisions

(5)

Chromosome Encoding

Selection of $N$ descriptors

Each descriptor partitioned into $B_i$ bins

Each bin can take any value in the range $\{0, \dots, W\}$

Chromosome length is then $(N \cdot B_i)$

Example: ClogP, PSA, MW ($N = 3$)
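The encoding above can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the descriptor value ranges, the number of bins, the maximum weight, equal-width binning, and the rule of scoring a molecule by summing the weights of the bins it falls into are all assumptions made for the example.

```python
import random

# Hypothetical setup: value ranges, bin count and W are illustrative
# assumptions, not values taken from the slides.
DESCRIPTORS = {
    "ClogP": (-2.0, 8.0),
    "PSA": (0.0, 200.0),
    "MW": (100.0, 700.0),
}
BINS_PER_DESCRIPTOR = 5   # B_i, assumed equal for every descriptor
MAX_WEIGHT = 9            # W


def random_chromosome(rng):
    """One integer weight in {0..W} per bin, N * B_i genes in total."""
    length = len(DESCRIPTORS) * BINS_PER_DESCRIPTOR
    return [rng.randint(0, MAX_WEIGHT) for _ in range(length)]


def bin_index(value, low, high, n_bins):
    """Map a value to an equal-width bin, clamping out-of-range values."""
    if value <= low:
        return 0
    if value >= high:
        return n_bins - 1
    return min(int((value - low) / (high - low) * n_bins), n_bins - 1)


def score_molecule(chromosome, molecule):
    """Assumed scoring rule: sum the weights of the bins the molecule hits."""
    total = 0
    for d, (name, (low, high)) in enumerate(DESCRIPTORS.items()):
        b = bin_index(molecule[name], low, high, BINS_PER_DESCRIPTOR)
        total += chromosome[d * BINS_PER_DESCRIPTOR + b]
    return total
```

With three descriptors and five bins each, a chromosome has 15 genes; a GA would evolve these weights so that the resulting scores separate the two classes.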

(6)

Descriptor Selection

Calculate physicochemical descriptors

Cluster descriptors (not objects)

Select descriptors that are:

• more orthogonal, and

• more interpretable for the medicinal chemist
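One simple way to approximate this selection is a greedy correlation filter, shown below as a stand-in for the clustering step (the slides do not say which clustering method was used). Descriptors are visited in an order of preference, which here plays the role of interpretability, and a descriptor is kept only if it is not strongly correlated with one already kept. The function names, the threshold of 0.8, and the example data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_descriptors(columns, preference, max_abs_r=0.8):
    """Keep descriptors, in preference order, that are weakly correlated
    with every descriptor kept so far."""
    kept = []
    for name in preference:
        if all(abs(pearson(columns[name], columns[k])) <= max_abs_r for k in kept):
            kept.append(name)
    return kept


# Made-up values for four molecules; HeavyAtoms is collinear with MW.
columns = {
    "MW": [100.0, 200.0, 300.0, 400.0],
    "HeavyAtoms": [7.0, 14.0, 21.0, 28.0],
    "ClogP": [1.0, 3.0, 0.5, 2.0],
}
selected = select_descriptors(columns, ["MW", "ClogP", "HeavyAtoms"])
```

Each kept descriptor acts as the representative of a group of correlated descriptors, which is the effect that clustering descriptors rather than objects is meant to achieve.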

(7)

Fitness Functions 1

Initial Enhancement (IE)

• Emphasises enrichment in top NACT% of recalled molecules

• i.e. mean rank of all actives recalled after NACT

Global Enhancement (GE)

• Emphasises enrichment of all actives in recalled molecules

• i.e. mean rank of all actives

Maximum Difference Enhancement (MDE)

• Emphasises maximum difference in scores between the two classes

(8)

Fitness Functions 2

Existing fitness function used a combination of evaluations:

• Number of actives in the top N%

• Average rank of actives over entire rank

Maximised Difference Enhancement (MDE)

• The difference of the average rank of the two classes being discriminated

MDE tends to yield solutions in which the separation between the two classes is maximised globally, while also rewarding rankings that place more interesting molecules in the initial part of the rank
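The three evaluations named above can be written down directly from a ranked list. The sketch below assumes labels of 1 for actives and 0 for inactives, ranks starting at 1 for the highest score, and an MDE sign convention in which larger values mean better separation; none of these conventions is stated in the slides.

```python
def rank_by_score(scores, labels):
    """Return the labels ordered by descending score (position 0 is rank 1)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in order]


def actives_in_top(ranked_labels, top_fraction):
    """Number of actives (label 1) in the top fraction of the ranking."""
    cutoff = max(1, int(len(ranked_labels) * top_fraction))
    return sum(ranked_labels[:cutoff])


def mean_rank(ranked_labels, wanted):
    """Average 1-based rank of all molecules carrying the wanted label."""
    ranks = [pos + 1 for pos, lab in enumerate(ranked_labels) if lab == wanted]
    return sum(ranks) / len(ranks)


def mde(ranked_labels):
    """Mean rank of inactives minus mean rank of actives (assumed sign)."""
    return mean_rank(ranked_labels, 0) - mean_rank(ranked_labels, 1)


# Made-up scores for six molecules; 1 = active, 0 = inactive.
ranked = rank_by_score([9, 2, 7, 1, 5, 3], [1, 0, 1, 0, 0, 1])
```

Here the ranking is [1, 1, 0, 1, 0, 0]: two actives fall in the top 50%, the actives have a mean rank of 7/3 and the inactives 14/3, so the MDE value is 7/3.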

(9)

Consensus Models

Aim to reduce stochastic effects of using a single chromosome
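The slides do not state how the individual models are combined. One plausible reading, sketched here purely as an assumption, is to score every molecule with several independently evolved chromosomes and average the results, so that the quirks of any single GA run are damped. The function name is hypothetical.

```python
def consensus_scores(per_model_scores):
    """Average each molecule's score over several independently evolved models.

    per_model_scores[k][m] is the score that model k gives molecule m.
    """
    n_models = len(per_model_scores)
    n_molecules = len(per_model_scores[0])
    if any(len(s) != n_molecules for s in per_model_scores):
        raise ValueError("every model must score the same molecules")
    return [
        sum(s[m] for s in per_model_scores) / n_models
        for m in range(n_molecules)
    ]


# Three hypothetical GA runs scoring the same three molecules.
combined = consensus_scores([[4, 10, 1], [6, 2, 1], [5, 3, 4]])
```

In this toy case the second molecule is scored 10 by one run but only 2 and 3 by the others; the averaged score of 5.0 reflects the agreement between runs rather than the outlier.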

(10)

Splice Models

Essentially a manual recombination operator that produces a better model based on feedback and intuition
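A manual recombination of this kind might look like the following sketch, which assumes that splicing happens at descriptor boundaries: for each descriptor, the modeller picks which parent chromosome supplies that descriptor's block of bin weights. The slides do not specify where the splice points lie, so the function and its arguments are hypothetical.

```python
def splice(chromosomes, choice, bins_per_descriptor):
    """Build a chromosome by copying each descriptor's bin weights
    from the parent chosen for that descriptor.

    choice[d] is the index (into chromosomes) of the parent for descriptor d.
    """
    spliced = []
    for d, parent in enumerate(choice):
        start = d * bins_per_descriptor
        spliced.extend(chromosomes[parent][start:start + bins_per_descriptor])
    return spliced


# Two hypothetical parents with three descriptors of two bins each.
parent_a = [1, 1, 2, 2, 3, 3]
parent_b = [7, 7, 8, 8, 9, 9]
child = splice([parent_a, parent_b], [0, 1, 0], 2)
```

The child takes the first and third descriptor blocks from parent_a and the second from parent_b, giving [1, 1, 8, 8, 3, 3].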

(11)

Renin Consensus Discrimination Model

[Figure: Global Enhancement (axis 0 to 6) for M1TR, M1TS1, M1TS2, CM1TS1, CM1TS2, M5TR, M5TS1, M5TS2, CM5TS1, CM5TS2, M6TR, M6TS1, M6TS2, CM6TS1, CM6TS2]


(13)

Renin Splice Discrimination Model

[Figure: Global Enhancement (axis 0 to 7) for M1TS1, M1TS2, CM1TS1, CM1TS2, M5TS1, M5TS2, CM5TS1, CM5TS2, M6TS1, M6TS2, CM6TS1, CM6TS2, SM6TR, SM6TS1, SM6TS2]


(15)

COX Splice Discrimination Model

[Figure: Global Enhancement (axis 0 to 3.5) for M1TR, M1TS1, M1TS2, CM1TS1, CM1TS2, M2TR, M2TS1, M2TS2, CM2TS1, CM2TS2]


(17)

Comparative Study: Oral vs. Non-Oral Drugs

Oral vs. non-oral drugs dataset¹

GA model compared with models generated with

• Naïve Bayes Classifier (NBC)

• Support Vector Machines (SVM)

Investigating:

• Consistency of results

(18)

Oral Drug Discrimination Model

[Figure: Global Enhancement (axis 0.0 to 1.4) for Training Set, Test Set 1, Test Set 2]

(19)

Model Interpretability

Model weights indicate

• Important descriptors

• Important ranges

Used to guide decision-making processes

• Similarity searching

• Filtering rules

Rules are focused on domain of interest

(20)

Conclusions

Consensus and splice models provide consistently improved results

GA models provide interpretability greater than or similar to that of the other methods applied here

• Models are transparent as to which descriptors, and which of their ranges, are of greatest importance in discriminating

Indications that the GA and NBC methods could be applied in combination

(21)

Areas the Student Covered

Cluster analysis

Druglikeness

Discriminant analysis

Variable selection

Genetic algorithms

Statistical learning methods

Java programming

(22)

What does the student gain?

Coding and adapting software

Tackling everyday challenges of research

Performing research in industry

Application-context drug research

(23)

What do the mentors gain?

Freedom to pursue an avenue of interest

Developing skills in student mentoring

A new viewpoint with new ideas

Assisting in training the next generation of

(24)

Acknowledgements

University of Sheffield

• Milan Ganguly
• Val Gillet
• Peter Willett

UCSF

• Jérôme Hert

Cheminformatics

• Peter Ertl
• Stephen Jelfs

Computer-Aided Drug Discovery

• Paulette Greenidge
• Richard Lewis
• Nikolaus Stiefl

Molecular & Library Informatics

• Kamal Azzaoui
• Edgar Jacoby
