
Graph partitioning algorithms for detecting functional module from yeast protein interaction network


Academic year: 2021


TABLE OF CONTENTS

CHAPTER TITLE PAGE

DECLARATION ii

DEDICATION iii

ACKNOWLEDGEMENTS iv

ABSTRACT v

ABSTRAK vi

TABLE OF CONTENTS vii

LIST OF TABLES xi

LIST OF FIGURES xii

LIST OF ABBREVIATIONS xiv

1 INTRODUCTION 1

1.1 Background 1

1.2 Existing Graph Partitioning Algorithms for Protein Interaction Network 3

1.3 Challenges in Graph Partitioning Algorithms 4

1.4 Statement of the Problem 5


1.6 Significance and Scope of Study 8

1.7 Thesis Outline 9

1.8 Summary 10

2 LITERATURE REVIEW 11

2.1 Introduction 11

2.2 Overview of Molecular Biology 13

2.2.1 The DNA 14

2.2.2 The RNA 14

2.2.3 The Protein 16

2.3 High-Throughput Technologies for Protein Interaction Network Detection 18

2.4 Graph Modelling in Protein Interaction Network 20

2.4.1 Global Network Analysis 22

2.4.2 Network Modularity 24

2.5 Graph Partitioning Strategies for Detecting Functional Modules 25

2.5.1 Divisive Approach 25

2.5.2 Highly Interacted Module Approach 27

2.5.3 Clique Finding Approach 28

2.6 Comparative Analysis of Graph Partitioning Strategies 30

2.7 Summary 33

3 RESEARCH METHODOLOGY 34

3.1 Introduction 34

3.2 Research Framework 35

3.3 Testing Datasets 37


3.4 Validation Datasets 38

3.5 Post-Processing 39

3.6 Evaluation Measurement 40

3.6.1 Biological Significance Measurement 40

3.6.2 Accuracy Performance Measurement 41

3.7 Hardware and Software Requirements 43

3.8 Summary 44

4 MINING RELIABLE FUNCTIONAL MODULES FROM INCOMPLETE AND NOISY PROTEIN INTERACTION NETWORK 45

4.1 Introduction 45

4.2 Experimental Framework 46

4.3 The Reliable Local Dense Neighbourhood (RELODEN) Algorithm 48

4.4 Experimental Results 53

4.4.1 Biological Significance of Detected Modules 53

4.4.2 Accuracy Performance of Proposed Algorithm 57

4.5 Discussion 61

4.6 Summary 64

5 DETECTING OVERLAPPING FUNCTIONAL MODULES FROM PROTEIN INTERACTION NETWORK 66

5.1 Introduction 66


5.3 The Overlap-RELODEN Algorithm 69

5.4 Experimental Results 72

5.4.1 Biological Significance of Detected Modules 73

5.4.2 Accuracy Performance of Proposed Algorithm 76

5.4.3 Discard and Overlapping Rate of Proposed Algorithm 80

5.5 Discussion 83

5.6 Summary 87

6 CONCLUSION AND FUTURE WORKS 88

6.1 Conclusion 88

6.2 Future Works 92


LIST OF TABLES

TABLE NO. TITLE PAGE

2.1 Comparative study of different graph partitioning strategies 31

2.2 Advantages and disadvantages of graph partitioning strategies 32

3.1 Protein-protein interaction datasets 38

4.1 Biological significance of detected modules 54

4.2 Number of proteins predicted and matched with protein complexes 59

4.3 The comparison of overall accuracy performance 59

5.1 Biological significance of detected modules 74


LIST OF FIGURES

FIGURE NO. TITLE PAGE

2.1 Central dogma of molecular biology (copyrighted by John Wiley and Sons, Inc., 1997) 13

2.2 The difference between RNA and DNA (retrieved from National Human Genome Research Institute, 2009) 15

2.3 Nicotinic acid phosphoribosyltransferase protein structure (downloaded from National Institute of General Medical Sciences, 2009) 17

2.4 Y2H screening process (Pandey and Mann, 2000) 19

2.5 TAP process (Huber, 2003) 20

2.6 Example of graph modelling for protein interaction network (Jonsson et al., 2006b) 21

2.7 Global network analysis of yeast protein interaction network 23

2.8 Example of divisive approach (Fortunato and Castellano, 2007) 26

2.9 Example of module detected by highly interacted module


2.10 Overlapping modules detected by clique finding approach (Palla et al., 2005) 29

3.1 Research framework 36

3.2 Graph modelling in protein interaction network 37

4.1 Experimental framework 47

4.2 Proposed local clique searching procedure 49

4.3 Example of clique detected in a graph 50

4.4 Proposed local dense sub-graph detection procedure 51

4.5 Example of dense sub-graph detection process 52

4.6 Example of functional modules detected by RELODEN algorithm 56

4.7 The comparison of recall and precision score for four algorithms using MIPS and DIP dataset 58

4.8 The comparison of the number of known complexes predicted by four algorithms using MIPS and DIP dataset 60

5.1 Experimental framework 68

5.2 Proposed informative protein selection procedure 69

5.3 Proposed informative sub-graph construction and dense sub-graph searching procedure 71

5.4 Example of dense sub-graph detection 72

5.5 The comparison of recall and precision scores for three algorithms using MIPS and DIP dataset 77

5.6 The comparison of the number of detected modules by three algorithms using MIPS and DIP dataset 79

5.7 The comparison of the number of detected modules by three algorithms using MIPS and DIP dataset 82

5.8 The overlapping rate of different degree in detected


LIST OF ABBREVIATIONS

CPM - Clique Percolation Method

CYGD - Comprehensive Yeast Genome Database

DIP - Database of Interacting Proteins

DNA - Deoxyribonucleic Acid

Dr - Discard Rate

FDR - False Discovery Rate

FN - False Negative

FP - False Positive

G-N - Girvan and Newman Algorithm

GO - Gene Ontology

HCS - Highly Connected Sub-graph

MCL - Markov Clustering

MCODE - Molecular Complex Detection

MIPS - Munich Information Center for Protein Sequences

mRNA - Messenger Ribonucleic Acid

PI - Informative Proteins

PPI - Protein-Protein Interaction

RELODEN - Reliable Local Dense Neighbourhood

RNA - Ribonucleic Acid

RNSC - Restricted Network Searching Clustering

SAGA - Spt-Ada-Gcn5 acetyltransferase


SNAP - S-Nitroso-N-acetylpenicillamine

TAP - Tandem Affinity Purification

TP - True Positive
