(1)

Programming for Chemical and Life Science Informatics

I573 Week 3 (Graph Theory - I)

Rajarshi Guha
24th January, 2008

(2)

Outline

• What are graphs?
• Where can we use graphs and graph theory?
• Basic concepts
• Algorithms on graphs
• Toolkits for graph theory
• Books / references

(3)

What Is a Graph?

• In abstract terms, a collection of nodes (or vertices), some or all of which are connected by edges
• Nodes and edges can have properties associated with them
• Nodes and edges can represent a variety of things
  − atoms, bonds
  − people, friendship
  − cities, roads

[Figure: a small graph with one node and one edge labeled]

(4)

Where Do We Use Graph Theory?

• Cheminformatics
  − Ring perception
  − Substructure searching
  − Reaction transformations
  − Molecular descriptors

(5)

Where Do We Use Graph Theory?

• Chemistry
  − Stability of fullerenes
  − Indications of chemical reactivity
  − Electronic structure

(6)

Where Do We Use Graph Theory?

• Biology
  − Metabolic networks
  − Ecological systems
  − Analysis of gene expression
  − Proteomics
• Many other areas
  − Wherever there's a set of objects that can be

(7)

Size – number of edges in a graph

Degree – the number of edges connected to a vertex

Loop – an edge whose start and end nodes are the same

Directed graph – a graph in which one can traverse edges in a specific direction only
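
The definitions above can be tried out on a small example. This is a minimal sketch, assuming an undirected graph stored as a plain edge list; the node names and the edges are made up for illustration.

```python
from collections import Counter

# Hypothetical undirected graph as an edge list; ("C", "C") is a loop.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("C", "C")]

# Size: the number of edges in the graph.
size = len(edges)

# Degree: the number of edge ends attached to each node.
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1  # a loop therefore adds 2 to the degree of its node

# Loops: edges whose start and end nodes are the same.
loops = [(u, v) for u, v in edges if u == v]

print(size)         # 4
print(degree["C"])  # 4
print(loops)        # [('C', 'C')]
```

Counting a loop twice in the degree is a common convention, not something the slide specifies.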

(8)

Graph Theory Concepts

Subgraph – a graph g is a subgraph of G if all nodes and edges of g are contained in the set of nodes and edges of G

• The red nodes and edges constitute

(9)

Data Structures in Graph Theory

• Adjacency matrix

[Figure: nodes A, B, C, D connected in a chain]

      D  C  B  A
  D   1  1  0  0
  C   1  1  1  0
  B   0  1  1  1
  A   0  0  1  1
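
As a sketch of how such a matrix might be built in code, the example below assumes the chain A - B - C - D that the matrix describes. The diagonal is left at 0 here, the usual convention for a graph without loops, whereas the matrix on the slide marks the diagonal with 1. All variable names are illustrative.

```python
# Assumed graph: the chain A - B - C - D.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D")]

# Map each node name to a row/column index.
index = {name: i for i, name in enumerate(nodes)}

# Start with an all-zero square matrix.
adjacency = [[0] * len(nodes) for _ in nodes]

for u, v in edges:
    i, j = index[u], index[v]
    adjacency[i][j] = 1
    adjacency[j][i] = 1  # undirected graph, so the matrix is symmetric

for name, row in zip(nodes, adjacency):
    print(name, row)
```

An adjacency matrix needs space proportional to the square of the number of nodes, so for sparse graphs such as molecules an adjacency list is often preferred.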

(10)

Data Structures in Graph Theory

• Distance matrix

[Figure: nodes A, B, C, D connected in a chain]

      D  C  B  A
  D   0  1  2  3
  C   1  0  1  2
  B   2  1  0  1
  A   3  2  1  0
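
One possible way to compute such a matrix is to run a breadth first search from every node and record the number of edges to each other node. The sketch below assumes the chain A - B - C - D, unweighted edges, and a connected graph; the function and variable names are illustrative.

```python
from collections import deque

# Assumed graph: the chain A - B - C - D, stored as an adjacency list.
adjacency_list = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}

def distances_from(start, graph):
    """Number of edges on the shortest route from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

nodes = sorted(adjacency_list)
# Assumes the graph is connected, so every pair of nodes has a distance.
distance_matrix = [
    [distances_from(u, adjacency_list)[v] for v in nodes] for u in nodes
]

for name, row in zip(nodes, distance_matrix):
    print(name, row)
```

The rows here are ordered A to D, so the output is the slide's matrix with rows and columns reversed.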

(11)

Isomorphic Graphs

• Two graphs with the same number of nodes, connected in the same way
• Two graphs G1 and G2 are isomorphic if
  − there is a one-to-one correspondence between their nodes and
  − if two nodes are adjacent in G1, the corresponding nodes are also adjacent in G2
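
For very small graphs the definition can be checked directly by trying every one-to-one correspondence between the two node sets. This is a brute-force sketch, not how a toolkit would do it; it assumes simple undirected graphs, and the function name and argument layout are made up for illustration.

```python
from itertools import permutations

def are_isomorphic(nodes1, edges1, nodes2, edges2):
    """Brute-force isomorphism test for small, simple, undirected graphs."""
    edge_set1 = {frozenset(e) for e in edges1}
    edge_set2 = {frozenset(e) for e in edges2}
    if len(nodes1) != len(nodes2) or len(edge_set1) != len(edge_set2):
        return False
    # Try every one-to-one correspondence between the node sets.
    for candidate in permutations(nodes2):
        mapping = dict(zip(nodes1, candidate))
        mapped = {frozenset(mapping[n] for n in e) for e in edge_set1}
        if mapped == edge_set2:
            return True
    return False

# A three-node chain is isomorphic to another three-node chain ...
print(are_isomorphic(["A", "B", "C"], [("A", "B"), ("B", "C")],
                     [1, 2, 3], [(1, 3), (3, 2)]))          # True
# ... but a four-node chain is not isomorphic to a four-node star.
print(are_isomorphic([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)],
                     [1, 2, 3, 4], [(1, 2), (1, 3), (1, 4)]))  # False
```

The number of correspondences grows as n!, so this is only practical for a handful of nodes.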

(12)

Subgraph Isomorphism

• Given a graph G1, find a subgraph of G1 that is isomorphic to a graph G2
• This is known to be NP-complete
  − No polynomial-time algorithm is known
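
To see why the problem is expensive, a naive search can try every way of placing the nodes of G2 onto distinct nodes of G1. The sketch below assumes simple undirected graphs and only requires that every edge of G2 lands on an edge of G1; the function name and the example graphs are hypothetical.

```python
from itertools import permutations

def find_subgraph_match(nodes1, edges1, nodes2, edges2):
    """Return a mapping of G2 nodes onto G1 nodes, or None if there is none."""
    edge_set1 = {frozenset(e) for e in edges1}
    # Try every assignment of G2's nodes to distinct nodes of G1.
    for candidate in permutations(nodes1, len(nodes2)):
        mapping = dict(zip(nodes2, candidate))
        if all(frozenset((mapping[u], mapping[v])) in edge_set1
               for u, v in edges2):
            return mapping
    return None

# Hypothetical G1: a four-membered ring (1-2-3-4) with a substituent 5 on node 4.
g1_nodes = [1, 2, 3, 4, 5]
g1_edges = [(1, 2), (2, 3), (3, 4), (4, 1), (4, 5)]

# A three-node chain is found in G1 ...
print(find_subgraph_match(g1_nodes, g1_edges,
                          ["a", "b", "c"], [("a", "b"), ("b", "c")]))
# ... but a three-membered ring is not.
print(find_subgraph_match(g1_nodes, g1_edges,
                          ["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")]))
```

With n nodes in G1 and k nodes in G2 the loop may examine n!/(n-k)! assignments, which is why practical substructure search relies on pruning and heuristics.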

(13)

Maximum Common Subgraph Isomorphism

• Given graphs G1 and G2, what is the largest common subgraph?
• This is known to be NP-hard (the decision version is NP-complete)
  − No polynomial-time algorithm is known

(15)

Traversing Graphs

Walk – an alternating sequence of nodes and edges, beginning and ending at a node

Length – number of edges in a walk

Path – a walk with unique vertices

Cycle – a walk with unique vertices, but starting and ending at the same node
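
These definitions translate into short checks. The sketch below assumes a simple undirected graph, represents a walk by its node sequence only (the edges are implied), and takes a cycle to be a walk that returns to its start node with all other nodes unique. The six-membered ring and the function names are illustrative.

```python
def is_walk(sequence, edges):
    """Every pair of consecutive nodes must be joined by an edge."""
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset((u, v)) in edge_set
               for u, v in zip(sequence, sequence[1:]))

def is_path(sequence, edges):
    """A walk in which no node is repeated."""
    return is_walk(sequence, edges) and len(set(sequence)) == len(sequence)

def is_cycle(sequence, edges):
    """A walk that returns to its start node, with all other nodes unique."""
    return (len(sequence) > 3
            and sequence[0] == sequence[-1]
            and is_path(sequence[:-1], edges)
            and is_walk(sequence[-2:], edges))

def length(sequence):
    """Number of edges traversed."""
    return len(sequence) - 1

# Hypothetical six-membered ring.
ring = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1)]

print(is_walk([1, 2, 3, 2, 1], ring), length([1, 2, 3, 2, 1]))          # True 4
print(is_path([1, 2, 3, 2, 1], ring))                                    # False
print(is_cycle([1, 2, 3, 4, 5, 6, 1], ring), length([1, 2, 3, 4, 5, 6, 1]))  # True 6
```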

(16)

Examples of Walks, Paths & Cycles

A walk of length 4

A cycle of length 6

(17)

Algorithms

• Graph traversal
  − Breadth first
  − Depth first
• Finding cycles
• Shortest paths

(18)

Breadth First Traversal

• Uninformed search
• Start from a node S
• Examine all the nearest neighbors of S
• For each neighbor look at its nearest neighbors
• Stop when all nodes have been visited
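
The steps above can be sketched with a queue, which guarantees that all neighbors of S are examined before any of their neighbors. The graph below is a made-up example stored as an adjacency list, and the function name is illustrative.

```python
from collections import deque

def breadth_first(graph, start):
    """Return the nodes reachable from start in breadth first order."""
    visited = [start]       # order in which nodes are first seen
    seen = {start}          # fast membership test
    queue = deque([start])  # nodes whose neighbors still need examining
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                visited.append(neighbor)
                queue.append(neighbor)
    return visited

# Hypothetical graph: S is bonded to A and B, both lead to C, and C leads to D.
graph = {
    "S": ["A", "B"],
    "A": ["S", "C"],
    "B": ["S", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

print(breadth_first(graph, "S"))  # ['S', 'A', 'B', 'C', 'D']
```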

(19)

Breadth First Traversal

(20)

What Is It Good For?

• Finding all connected components
  − The set of nodes reached by a BFS is the connected component containing the start node
• Finding the shortest path between 2 nodes (in an unweighted graph)
• Used for path finding in computer games
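
Both uses can be sketched with the same queue-based search. The example assumes an unweighted, undirected graph stored as an adjacency list; the graph, with an isolated node Z, and the function names are made up for illustration.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Shortest path by edge count, or None if goal is unreachable."""
    parent = {start: None}  # also serves as the set of visited nodes
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back along the parent links
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None

def reachable(graph, start):
    """All nodes a BFS from start can reach: one connected component."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

def connected_components(graph):
    """Run a BFS from every node that has not been assigned a component yet."""
    components, assigned = [], set()
    for node in graph:
        if node not in assigned:
            component = reachable(graph, node)
            components.append(component)
            assigned |= component
    return components

# Hypothetical graph: a five-membered ring A-B-C-D-E plus an isolated node Z.
graph = {"A": ["B", "E"], "B": ["A", "C"], "C": ["B", "D"],
         "D": ["C", "E"], "E": ["D", "A"], "Z": []}

print(shortest_path(graph, "A", "D"))   # ['A', 'E', 'D']
print(shortest_path(graph, "A", "Z"))   # None
print(len(connected_components(graph))) # 2
```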

(21)

Depth First Traversal

• Start from a node S
• Follow the first neighbor until a node is reached that has no new (unvisited) neighbors
• Repeat for each nearest neighbor of S
• Stop when all nodes have been visited
• Typically requires less memory than BFS
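
A recursive sketch of these steps follows. It assumes an adjacency-list graph in which the order of each neighbor list decides which neighbor is "first"; the graph and the function name are made up for illustration.

```python
def depth_first(graph, start):
    """Return the nodes reachable from start in depth first order."""
    order, seen = [], set()

    def visit(node):
        seen.add(node)
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in seen:
                visit(neighbor)  # go as deep as possible before backtracking

    visit(start)
    return order

# Hypothetical graph: S is bonded to A and B, both lead to C, and C leads to D.
graph = {
    "S": ["A", "B"],
    "A": ["S", "C"],
    "B": ["S", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}

print(depth_first(graph, "S"))  # ['S', 'A', 'C', 'B', 'D']
```

On the same graph a breadth first traversal would visit B before C; here B is only reached by way of C. For very deep graphs an explicit stack avoids Python's recursion limit.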

(22)

Depth First Traversal

• The order of the nodes visited starting from A is A, B, D, F, C, G, E

or

(23)

What Is It Good For?

• Finding connected components
• Topological sorting, used in
  − instruction scheduling
  − Makefile dependencies
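
As an illustration of topological sorting with a depth first search, the sketch below orders a set of Makefile-style targets so that every prerequisite comes before the target that needs it. The target names and the dictionary layout are hypothetical.

```python
def topological_order(dependencies):
    """dependencies maps each target to the list of targets it depends on."""
    order, done, in_progress = [], set(), set()

    def visit(target):
        if target in done:
            return
        if target in in_progress:
            raise ValueError("dependency cycle at " + target)
        in_progress.add(target)
        for prerequisite in dependencies.get(target, []):
            visit(prerequisite)
        in_progress.discard(target)
        done.add(target)
        order.append(target)  # emitted only after all its prerequisites

    for target in dependencies:
        visit(target)
    return order

# Hypothetical build: app needs two object files, each built from one source file.
dependencies = {
    "app": ["main.o", "util.o"],
    "main.o": ["main.c"],
    "util.o": ["util.c"],
    "main.c": [],
    "util.c": [],
}

print(topological_order(dependencies))
# ['main.c', 'main.o', 'util.c', 'util.o', 'app']
```

A topological order only exists for a directed graph without cycles, which is why the sketch raises an error when it meets a target that is still being processed.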

(24)

Spanning Tree

• Subgraph of a connected, undirected graph that is a tree and connects all the vertices

The nodes connected by the dark edges represent a spanning tree of the original graph

(25)

Minimum Spanning Tree

• If edges have weights, then the minimum spanning tree is the spanning tree with the lowest total cost
• Found with Kruskal's algorithm or Prim's algorithm

The dark edges form the spanning tree that has the lowest total cost among all possible spanning trees

[Figure: weighted graph with the minimum spanning tree drawn in dark edges]
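
A compact sketch of Kruskal's algorithm is given below: edges are considered from cheapest to most expensive, and an edge is kept only if it joins two parts of the graph that are not yet connected. The four-node graph and its weights are made up and are not the ones in the figure; the function name and the (weight, node, node) edge layout are assumptions.

```python
def kruskal(nodes, weighted_edges):
    """Return the edges of a minimum spanning tree as (weight, u, v) tuples."""
    parent = {n: n for n in nodes}  # union-find forest: each node starts alone

    def find(n):
        # Follow parent links to the representative of n's component.
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # shorten the chain as we go
            n = parent[n]
        return n

    tree = []
    for weight, u, v in sorted(weighted_edges):  # cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:         # the edge joins two separate components
            parent[root_u] = root_v  # merge them
            tree.append((weight, u, v))
    return tree

# Hypothetical weighted graph.
nodes = ["A", "B", "C", "D"]
weighted_edges = [(1, "A", "B"), (4, "A", "C"), (3, "B", "C"),
                  (2, "C", "D"), (5, "B", "D")]

tree = kruskal(nodes, weighted_edges)
print(tree)                     # [(1, 'A', 'B'), (2, 'C', 'D'), (3, 'B', 'C')]
print(sum(w for w, _, _ in tree))  # 6
```

A spanning tree of a connected graph with n nodes always has n - 1 edges, which gives a quick sanity check on the result.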

(26)

What Is It Good For?

• Determining optimal paths
• Laying cables for cable TV
  − Cables must be laid along certain paths
  − Some paths are more costly than others
  − Find the set of paths connecting all houses that is cheapest

(27)

Toolkits for Graph Theory

• JGraphT - Java library for graph theory algorithms
• Supports
  − directed and undirected graphs
  − weighted/unweighted edges
  − variety of graph types
• This is different from JGraph which is a graph

(28)

Toolkits for Graph Theory

• Boost Graph Library (BGL) – a C++ library
• Based on STL so it makes heavy use of templates and parametrization
• A wide variety of algorithms that include
  − Shortest paths
  − Spanning trees
  − Sorting and ordering

(29)

References

• Books for graph theory in chemistry
  − Bonchev
  − Trinajstić
• General books
  − Bonchev & Rouvray
  − Chartrand
  − Trudeau
