• No results found

Design of a Connected Component Analysis Algorithm to Con-

2.1 The Graph Formalism for Comparing Worms

2.1.1 Design of a Connected Component Analysis Algorithm to Con-

Planaria in the CellSim platform are composed of a collection of discrete cells rather than interconnected regions. Thus, a first step to deriving a graph-based representa- tion of the cell-based planaria is to translate the cells within a simulation snapshot into discrete regions, and to determine which regions are connected to each other. In order to do this, an algorithm that uses a connected component analysis approach derived from similar methods used in computer vision and document analysis has been developed [4]. The algorithm first iterates through all the cells in a snapshot and assigns each cell a region type (e.g., head or tail). The assignment of cell type can be complicated, examining many different factors for each cell, or can simply depend on the molecular concentrations of some user-defined indicator resources associated with a particular cell. Three resources, hCell (head), iCell (trunk), and tCell (tail) have been defined to serve as cell-state indicators in this study. For this work, a simple approach is used where a cell is assigned to a region type based on the highest total concentration of each indicator resource. For example, a cell is assigned a trunk

Figure 2.1: The assignment of a cell to a state depends on the molecular concen- trations of cell-state indicators. The differentiate state of cells are color-coded to enable visual distinction of cells and the composition of a region: head (blue), trunk (yellow), and tail (purple). Within a given cell, the location of a given resource can be distributed between the internal compartment (e.g., cytosol) and the surface (e.g., membrane). Concentrations of representative resources inside (I) and on the surface (S) are provided. In this example, both Cell 1 and Cell 2 are assigned to a trunk state and Cell 3 is assigned to a trunk state, since the concentrations of the indicator molecules (iCell and tCell, respectively) for these states are the highest.

state if its concentration of iCell is greater than hCell and tCell, as shown in Figure 2.1. The approach of assigning a cell to a region based on the highest concentration of an indicator resource is a simplification of reality, since many more factors may go into differentiation of a cell in an organism. Chapter 3 examines a more complex way of evaluating a cell’s region type.

After calculating cell type, the algorithm must determine all of the spatially cohesive regions of cells sharing the same type using connected component analysis. The connected component analysis algorithm (Algorithm 2.2) starts with a call to the ProcessConnectedComponents function with a simulation snapshot as a parameter. A simulation snapshot is a complete description of a particular step in a simulation; this includes a list of all cells, including cell contents, resources, and locations.

ProcessConnectedComponents(snapshot):

list = new list of connected components for each unprocessed cell c in snapshot comp = new connected component add c to comp

set c as processed

GatherConnected(c, comp, snapshot) add comp to list

for comp in components:

calculate parameters for comp

GatherConnected(c1, comp, snapshot):

for each unprocessed cell c2 in snapshot if c1 and c2 are connected:

if c1 and c2 are not of the same type: mark c1 and c2 as border cells else:

add c2 to comp set c2 as processed

GatherConnected(c2, comp, snapshot)

Figure 2.2: Pseudocode for connected component analysis recursive algorithm to separate a list of cells taken from the simulation snapshots into discrete morphology regions.

The ProcessConnectedComponents function iterates through all cells and calls the GatherConnected function for each unassigned/marked cell.

The GatherConnected function recursively collects and marks all other cells in the snapshot that belong to the same spatially cohesive region as the starting cell. A cell is defined to be in the same spatially cohesive region as the starting cell if it is of the same type as the starting cell and is either connected to the starting cell or to some other cell already determined to be in the starting cell’s region. Two cells are considered connected if the Euclidean distance between them is below a user-specified threshold. Additionally, if two cells are close enough to each other to be considered

connected, but are assigned to different regions because they are of different types, those cells are identified as border cells. Border cells are used to determine which regions are linked to each other.

Once each cell in the snapshot is assigned to a specific region, links between regions are calculated. The algorithm determines how many neighbors each region is connected to using the border cells found during the recursive process in Algorithm 2.2, and creates links between the connected regions. Two regions are be considered linked if they share border cells.

By the graph formalism, each link between regions is parametrized by the distance between the connected regions’ centers, the angle of the link measuring its tilt relative to the x-axis, and the location along the link where the two regions meet. The number of parameters for a region depends on the number of links it has. A component parameter is defined as the Euclidean distance from the center of a region to a region border in a specific direction. The center of a region is calculated by averaging the spatial centers of every cell in a particular region. The border of a region is calculated by finding the furthermost cell in a specific direction.

The connected component gathering algorithm allows the cell-based representa- tion of the worm to be converted into a graph, and this graph can be subsequently compared to a target graph pulled from the Planform database. The result of the graph-to-graph comparison is used as a fitness value to guide the genetic algorithm search process.