Programming for Chemical
and Life Science Informatics
I573 Week 3 (Graph Theory - I)
Rajarshi Guha 24th January, 2008
Outline
What are graphs?
Where can we use graphs and graph theory?
Basic concepts
Algorithms on graphs
Toolkits for graph theory
Books / references
What Is a Graph?
In abstract terms, a collection of nodes (or vertices),
some or all of which are connected by edges
Nodes and edges can have
properties associated with them
Nodes and edges can represent
a variety of things
− atoms, bonds
− people, friendship
− cities, roads
[Figure: example graph with one node and one edge labelled]
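As a minimal sketch of nodes and edges carrying properties, a small molecule can be stored as a graph whose nodes hold an element symbol and whose edges hold a bond order. The dictionary layout and the names `atoms`, `bonds` and `neighbors` are illustrative assumptions, not part of the lecture.

```python
# Heavy atoms of ethanol (hydrogens omitted): C-C-O.
# Nodes are atoms, keyed by an arbitrary integer id, each with a property dict.
atoms = {
    0: {"element": "C"},
    1: {"element": "C"},
    2: {"element": "O"},
}

# Edges are bonds. Each undirected edge is stored once, keyed by the
# frozenset of its two end nodes, with a property dict for the bond order.
bonds = {
    frozenset({0, 1}): {"order": 1},
    frozenset({1, 2}): {"order": 1},
}

def neighbors(node):
    """Return the sorted ids of all nodes that share an edge with `node`."""
    return sorted(other for bond in bonds if node in bond for other in bond - {node})

print(neighbors(1))          # neighbors of the middle carbon
print(atoms[2]["element"])   # property lookup on a node
```

The same layout works for people and friendships or cities and roads; only the property dictionaries change.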
Where Do We Use Graph Theory?
Cheminformatics
− Ring perception
− Substructure searching
− Reaction transformations
− Molecular descriptors
Chemistry
− Stability of fullerenes
− Indications of chemical reactivity
− Electronic structure
Biology
− Metabolic networks
− Ecological systems
− Analysis of gene expression
− Proteomics
Many other areas
− Wherever there's a set of objects that can be
Graph Theory Concepts
Size – number of edges in a graph
Degree – the number of edges connected to a vertex
Loop – an edge whose start and end nodes are the same
Directed graph – a graph in which one can traverse edges in a specific direction only
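The terms size, degree, loop and directed graph can be made concrete on a small edge list. This is only a sketch: the example edges are hypothetical, and counting a loop twice towards the degree of its node is a common convention assumed here rather than something stated in the lecture.

```python
from collections import Counter

# A small graph given as a list of edges; ("C", "C") is a loop.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("C", "C")]

# Size: the number of edges.
size = len(edges)

# Degree: the number of edge ends meeting at each node
# (assumed convention: a loop contributes 2 to the degree of its node).
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Loops: edges whose start and end nodes are the same.
loops = [(u, v) for u, v in edges if u == v]

# Directed reading of the same list: edge (u, v) may only be
# traversed from u to v, so each node gets a list of successors.
successors = {}
for u, v in edges:
    successors.setdefault(u, []).append(v)

print(size, dict(degree), loops, successors)
```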
Subgraph – a graph g is a subgraph of G if all
nodes and edges of g are contained in the set of nodes and edges of G
The red nodes and edges constitute
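The subgraph definition translates almost directly into two set comparisons. In this sketch the function name `is_subgraph` and the example graphs are hypothetical, edges are treated as undirected, and each input is assumed to be a valid graph (every edge joins two of its own nodes).

```python
def is_subgraph(g_nodes, g_edges, G_nodes, G_edges):
    """True if every node and every undirected edge of g also occurs in G."""
    def as_sets(edges):
        # frozenset makes ("B", "C") and ("C", "B") the same undirected edge
        return {frozenset(e) for e in edges}
    return set(g_nodes) <= set(G_nodes) and as_sets(g_edges) <= as_sets(G_edges)

G_nodes = ["A", "B", "C", "D"]
G_edges = [("A", "B"), ("B", "C"), ("C", "D")]

print(is_subgraph(["B", "C"], [("C", "B")], G_nodes, G_edges))  # edge B-C is in G
print(is_subgraph(["A", "C"], [("A", "C")], G_nodes, G_edges))  # edge A-C is not in G
```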
Data Structures in Graph Theory
Adjacency matrix
     A  B  C  D
A    1  1  0  0
B    1  1  1  0
C    0  1  1  1
D    0  0  1  1
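An adjacency matrix can be built from an edge list in a few lines. The sketch assumes the four-node graph is the chain A-B-C-D, which is inferred from the matrix entries rather than stated in the text. Note that the slide's matrix carries 1s on the diagonal, whereas this sketch follows the more common convention of 0 on the diagonal for a graph without loops.

```python
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D")]  # assumed chain A-B-C-D

index = {name: i for i, name in enumerate(nodes)}
n = len(nodes)

# Start from an all-zero n x n matrix and mark each edge in both directions.
adjacency = [[0] * n for _ in range(n)]
for u, v in edges:
    i, j = index[u], index[v]
    adjacency[i][j] = 1
    adjacency[j][i] = 1  # undirected graph: the matrix is symmetric

for name, row in zip(nodes, adjacency):
    print(name, row)
```

A matrix costs n x n entries regardless of how many edges exist, so for sparse graphs such as molecules an adjacency list is often the more economical choice.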
Distance matrix
     A  B  C  D
A    0  1  2  3
B    1  0  1  2
C    2  1  0  1
D    3  2  1  0
Isomorphic Graphs
Two graphs with the same number of nodes,
connected in the same way
Two graphs G1 and G2 are isomorphic if
− there is a one-to-one correspondence between their nodes and
− if two nodes are adjacent in G1, the corresponding nodes are also adjacent in G2, and vice versa
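For very small graphs the definition can be checked by brute force: try every one-to-one correspondence between the node sets and test whether it maps the edge set of one graph exactly onto the edge set of the other. This is a sketch only; the name `are_isomorphic` and the example graphs are hypothetical, the graphs are assumed to be simple and undirected, and the n! candidate mappings make it impractical beyond a handful of nodes.

```python
from itertools import permutations

def are_isomorphic(nodes1, edges1, nodes2, edges2):
    """Brute-force test: try every one-to-one mapping of nodes1 onto nodes2."""
    if len(nodes1) != len(nodes2) or len(edges1) != len(edges2):
        return False
    target = {frozenset(e) for e in edges2}
    for perm in permutations(nodes2):
        mapping = dict(zip(nodes1, perm))
        mapped = {frozenset((mapping[u], mapping[v])) for u, v in edges1}
        if mapped == target:  # adjacency preserved in both directions
            return True
    return False

# The same 4-cycle drawn with different labels and edge order.
square = ([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)])
relabelled = (["a", "b", "c", "d"], [("a", "c"), ("c", "b"), ("b", "d"), ("d", "a")])
# A chain and a star have equal node and edge counts but different connectivity.
chain = ([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)])
star = (["a", "b", "c", "d"], [("a", "b"), ("a", "c"), ("a", "d")])

print(are_isomorphic(*square, *relabelled))
print(are_isomorphic(*chain, *star))
```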
Subgraph Isomorphism
Given a graph G1, find a subgraph of G1 that is
isomorphic to a graph G2
This is known to be NP-complete
− No polynomial-time algorithm is known
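A brute-force sketch shows why the problem gets expensive: every way of placing the nodes of G2 onto distinct nodes of G1 is a candidate, and each candidate must be checked edge by edge. The name `find_subgraph_match` and the example graphs are hypothetical, and the match is non-induced (extra edges in G1 between the matched nodes are allowed), which is an assumption about the intended definition.

```python
from itertools import permutations

def find_subgraph_match(nodes1, edges1, nodes2, edges2):
    """Return a mapping from G2's nodes to G1's nodes, or None if there is none.
    Every edge of G2 must land on an edge of G1 (non-induced matching)."""
    target = {frozenset(e) for e in edges1}
    # Try every ordered choice of len(nodes2) distinct nodes from G1.
    for image in permutations(nodes1, len(nodes2)):
        mapping = dict(zip(nodes2, image))
        if all(frozenset((mapping[u], mapping[v])) in target for u, v in edges2):
            return mapping
    return None

# G1: a triangle 1-2-3 with a tail 3-4.
g1 = ([1, 2, 3, 4], [(1, 2), (2, 3), (1, 3), (3, 4)])
triangle = (["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")])
four_cycle = (["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")])

print(find_subgraph_match(*g1, *triangle))
print(find_subgraph_match(*g1, *four_cycle))
```

With n nodes in G1 and k in G2 there are n!/(n-k)! candidates, which is why practical substructure search relies on pruning and screening rather than plain enumeration.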
Maximum Common Subgraph Isomorphism
Given graphs G1 and G2, what is the largest common subgraph?
This is known to be NP-hard (the decision version is NP-complete)
− No polynomial-time algorithm is known
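For tiny graphs the largest common subgraph can be found by exhaustive search, trying node subsets from largest to smallest. The sketch below makes several assumptions that the lecture does not fix: "largest" is measured in nodes, the common subgraph is induced (all edges between the chosen nodes must agree), and it need not be connected. The function names and example graphs are hypothetical.

```python
from itertools import combinations, permutations

def induced_edges(nodes, edges):
    """Edges of the graph whose two end nodes both lie in `nodes`."""
    keep = set(nodes)
    return [(u, v) for u, v in edges if u in keep and v in keep]

def max_common_subgraph(nodes1, edges1, nodes2, edges2):
    """Brute-force maximum common induced subgraph of two small graphs.
    Returns a dict mapping the chosen nodes of G1 to their partners in G2."""
    for k in range(min(len(nodes1), len(nodes2)), 0, -1):  # largest size first
        for subset in combinations(nodes1, k):
            sub1 = induced_edges(subset, edges1)
            for image in permutations(nodes2, k):
                mapping = dict(zip(subset, image))
                mapped = {frozenset((mapping[u], mapping[v])) for u, v in sub1}
                sub2 = {frozenset(e) for e in induced_edges(image, edges2)}
                if mapped == sub2:  # same edges on both sides: common subgraph
                    return mapping
    return {}

triangle = ([1, 2, 3], [(1, 2), (2, 3), (1, 3)])
chain4 = (["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")])
chain3 = (["x", "y", "z"], [("x", "y"), ("y", "z")])

print(max_common_subgraph(*triangle, *chain4))  # at best a single shared edge
print(max_common_subgraph(*chain4, *chain3))    # a three-node chain
```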
Traversing Graphs
Walk – an alternating sequence of nodes and edges, beginning and ending at a node
Length – number of edges in a walk
Path – a walk with unique vertices
Cycle – a walk with unique vertices, but starting and ending at the same vertex
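These definitions can be checked mechanically. In the sketch below a walk is written as a sequence of nodes only, since in a simple undirected graph the edges are implied by consecutive nodes; that simplification, the function names, and the six-node ring used as an example are all assumptions for illustration.

```python
def is_walk(sequence, edges):
    """A walk: every consecutive pair of nodes is joined by an edge."""
    edge_set = {frozenset(e) for e in edges}
    return all(frozenset((u, v)) in edge_set for u, v in zip(sequence, sequence[1:]))

def is_path(sequence, edges):
    """A path: a walk in which no node is repeated."""
    return is_walk(sequence, edges) and len(set(sequence)) == len(sequence)

def is_cycle(sequence, edges):
    """A cycle: nodes are unique except that the last equals the first."""
    return (len(sequence) > 3 and sequence[0] == sequence[-1]
            and is_path(sequence[:-1], edges)      # no repeats before closing
            and is_walk(sequence[-2:], edges))     # closing edge exists

def walk_length(sequence):
    """Length: the number of edges traversed."""
    return len(sequence) - 1

ring = [(i, (i + 1) % 6) for i in range(6)]  # hypothetical six-membered ring

print(is_walk([0, 1, 0, 1, 2], ring), walk_length([0, 1, 0, 1, 2]))
print(is_path([0, 1, 2, 3], ring))
print(is_cycle([0, 1, 2, 3, 4, 5, 0], ring), walk_length([0, 1, 2, 3, 4, 5, 0]))
```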
Examples of Walks, Paths & Cycles
A walk of length 4
A cycle of length 6
Algorithms
Graph traversal
− Breadth first
− Depth first
Finding cycles
Shortest paths
Breadth First Traversal
Uninformed search
Start from a node S
Examine all the nearest neighbors of S
For each neighbor, look at its nearest neighbors that have not yet been visited
Stop when all nodes have been visited
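The steps above map naturally onto a first-in, first-out queue. This is a minimal sketch assuming the graph is stored as a dictionary of adjacency lists; the function name `bfs_order` and the example graph are hypothetical.

```python
from collections import deque

def bfs_order(graph, start):
    """Return the nodes reachable from `start` in breadth-first order."""
    visited = [start]
    seen = {start}
    queue = deque([start])          # nodes whose neighbors are still to be examined
    while queue:
        node = queue.popleft()      # oldest entry first: this is what makes it BFS
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                visited.append(neighbor)
                queue.append(neighbor)
    return visited

graph = {
    "S": ["A", "B"],
    "A": ["S", "C"],
    "B": ["S", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
print(bfs_order(graph, "S"))  # ['S', 'A', 'B', 'C', 'D']
```

Nodes come out in layers: first S, then everything one edge away, then everything two edges away, and so on.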
What Is It Good For?
Find all connected components
− The set of nodes reached by a BFS is the connected component containing the start node
Finding the shortest path (fewest edges) between 2 nodes
Used for path finding in computer games
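Both uses follow from one routine that records how many edges a BFS needed to reach each node. The sketch assumes an unweighted graph stored as a dictionary of adjacency lists; the function names and the example graph (the chain A-B-C-D plus a separate pair X-Y) are hypothetical.

```python
from collections import deque

def bfs_distances(graph, start):
    """Number of edges on a shortest path from `start` to each reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:            # first visit is the shortest route
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def connected_components(graph):
    """Run a BFS from every node not yet reached; each run yields one component."""
    components, assigned = [], set()
    for node in graph:
        if node not in assigned:
            component = set(bfs_distances(graph, node))
            components.append(component)
            assigned |= component
    return components

graph = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"],  # chain A-B-C-D
    "X": ["Y"], "Y": ["X"],                                      # separate pair
}
print(bfs_distances(graph, "A"))
print(connected_components(graph))
```

For the chain, the distances from A come out as 0, 1, 2, 3, which is the first row of the distance matrix given earlier for the four-node graph.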
Depth First Traversal
Start from a node S
Follow the first neighbor till a node is reached
that has no new neighbors
Repeat for each nearest neighbor of S
Stop when all nodes have been visited
Typically requires less memory than BFS
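A recursive sketch of depth-first traversal, assuming the graph is stored as a dictionary of adjacency lists. The function name `dfs_order` is hypothetical, and the example graph is a made-up tree that does not reproduce any figure from the lecture.

```python
def dfs_order(graph, start):
    """Return the nodes reachable from `start` in depth-first (pre-order) order."""
    visited = []
    seen = set()

    def visit(node):
        seen.add(node)
        visited.append(node)
        for neighbor in graph[node]:
            if neighbor not in seen:
                visit(neighbor)   # go deeper before trying the next neighbor

    visit(start)
    return visited

# Hypothetical tree: A has children B, C, E; B has D, F; C has G.
graph = {
    "A": ["B", "C", "E"],
    "B": ["A", "D", "F"],
    "C": ["A", "G"],
    "D": ["B"],
    "E": ["A"],
    "F": ["B"],
    "G": ["C"],
}
print(dfs_order(graph, "A"))  # ['A', 'B', 'D', 'F', 'C', 'G', 'E']
```

The result depends on the order in which each node lists its neighbors, so one graph can have several valid depth-first orders. Only the current branch is kept on the call stack, whereas BFS may hold an entire layer in its queue.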
The order of the nodes
visited starting from A is A, B, D, F, C, G, E
or
What Is It Good For?
Finding connected components
Topological sorting, used in
− instruction scheduling
− Makefile dependencies
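Topological sorting can be written as a depth-first traversal that emits a node only after everything it depends on has been emitted. The sketch below uses a hypothetical Makefile-like dependency table; the function name `topological_sort` and the file names are illustrative assumptions.

```python
def topological_sort(dependencies):
    """Order targets so that each one comes after everything it depends on.
    `dependencies` maps a target to the list of targets it needs first."""
    order, done, in_progress = [], set(), set()

    def visit(target):
        if target in done:
            return
        if target in in_progress:
            raise ValueError(f"dependency cycle involving {target!r}")
        in_progress.add(target)
        for prerequisite in dependencies.get(target, []):
            visit(prerequisite)          # depth first: prerequisites come first
        in_progress.discard(target)
        done.add(target)
        order.append(target)             # emitted only after all prerequisites

    for target in dependencies:
        visit(target)
    return order

makefile = {
    "app": ["main.o", "util.o"],
    "main.o": ["main.c", "util.h"],
    "util.o": ["util.c", "util.h"],
}
build_order = topological_sort(makefile)
print(build_order)
```

A topological order exists only for directed graphs without cycles, which is why the sketch raises an error when it meets a target that is still being processed.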
Spanning Tree
Subgraph of a connected, undirected graph that
is a tree and connects all the vertices
The nodes connected by the dark edges represent a spanning tree of the original graph
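Any traversal of a connected graph yields a spanning tree if it records the edge by which each node was first reached. This is a sketch under that idea; the function name `spanning_tree` and the example graph (a square with one diagonal) are hypothetical.

```python
def spanning_tree(graph, start):
    """Collect the edges by which a traversal first reaches each node."""
    seen = {start}
    tree_edges = []
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                tree_edges.append((node, neighbor))  # first edge into neighbor
                stack.append(neighbor)
    return tree_edges

# Square 1-2-3-4 with the diagonal 1-3: five edges, four nodes.
graph = {1: [2, 4, 3], 2: [1, 3], 3: [2, 4, 1], 4: [3, 1]}
tree = spanning_tree(graph, 1)
print(tree)
```

A spanning tree of a connected graph with n nodes always has n - 1 edges, so here three of the five edges are kept.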
Minimum Spanning Tree
If edges have weights, then the minimum spanning tree is the spanning tree with the lowest total edge weight (cost)
It can be found with Kruskal's algorithm or Prim's algorithm
The dark edges form the spanning tree that has the lowest cost amongst all possible spanning trees
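A compact sketch of Kruskal's algorithm: sort the edges by weight and keep each edge that joins two components that are not yet connected. The union-find helper, the function name `kruskal`, and the example weights are illustrative assumptions; the weights are not those of the lecture's figure.

```python
def kruskal(nodes, weighted_edges):
    """Return (total_cost, tree_edges) of a minimum spanning tree.
    `weighted_edges` is a list of (weight, u, v) tuples of an undirected graph."""
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent links to the representative of node's component.
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # shorten the chain as we go
            node = parent[node]
        return node

    total, tree = 0, []
    for weight, u, v in sorted(weighted_edges):   # cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                      # joins two components: keep it
            parent[root_u] = root_v
            tree.append((u, v, weight))
            total += weight
    return total, tree

nodes = ["A", "B", "C", "D"]
weighted_edges = [(1, "A", "B"), (4, "A", "C"), (3, "B", "C"), (2, "C", "D"), (5, "B", "D")]
total, tree = kruskal(nodes, weighted_edges)
print(total, tree)  # 6 [('A', 'B', 1), ('C', 'D', 2), ('B', 'C', 3)]
```

Edges A-C and B-D are rejected because, by the time they are considered, their end nodes are already connected through cheaper edges.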
What Is It Good For?
Determining the lowest-cost network of paths
Laying cables for cable TV
− Cables must be laid along certain paths
− Some paths are more costly than others
− Find the cheapest set of paths that connects all houses
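The cable example can be sketched with Prim's algorithm, which grows the tree outwards from one starting point and always adds the cheapest cable that reaches a house not yet connected. The house names, costs, and the function name `prim` are hypothetical.

```python
import heapq

def prim(graph, start):
    """Grow a minimum spanning tree from `start`.
    `graph` maps each node to a dict {neighbor: cost}."""
    connected = {start}
    frontier = [(cost, start, nbr) for nbr, cost in graph[start].items()]
    heapq.heapify(frontier)                 # cheapest candidate cable on top
    total, cables = 0, []
    while frontier and len(connected) < len(graph):
        cost, u, v = heapq.heappop(frontier)
        if v in connected:
            continue                        # would only create a cycle
        connected.add(v)
        cables.append((u, v, cost))
        total += cost
        for nbr, c in graph[v].items():
            if nbr not in connected:
                heapq.heappush(frontier, (c, v, nbr))
    return total, cables

# Allowed cable routes and their costs (made-up numbers).
houses = {
    "hub": {"h1": 4, "h2": 3},
    "h1": {"hub": 4, "h2": 1, "h3": 7},
    "h2": {"hub": 3, "h1": 1, "h3": 5},
    "h3": {"h1": 7, "h2": 5},
}
total, cables = prim(houses, "hub")
print(total, cables)  # 9 [('hub', 'h2', 3), ('h2', 'h1', 1), ('h2', 'h3', 5)]
```

Note that the result minimises the total cable laid, not the distance from the hub to any individual house.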
Toolkits for Graph Theory
JGraphT - Java library for graph theory algorithms
Supports
− directed and undirected graphs
− weighted/unweighted edges
− variety of graph types
This is different from JGraph, which is a graph visualization library
Boost Graph Library (BGL) – a C++ library
Based on the STL, so it makes heavy use of templates and parametrization
A wide variety of algorithms that include
− Shortest paths
− Spanning trees
− Sorting and ordering
References
Books for graph theory in chemistry
− Bonchev
− Trinajstić
General books
− Bonchev & Rouvray
− Chartrand
− Trudeau