
Network Analysis Methods

In document Stanley_unc_0153D_17596.pdf (Page 149-153)

6.2.1 Creating Networks with SparCC

Given the OTU count table that we generated, the next objective was to construct two networks: the OTU co-occurrence network for the 24 patients with acute lung injury, and the OTU co-occurrence network for the healthy patients. To construct these networks, we used SparCC (Friedman and Alm, 2012), the method discussed in Chapter 2. As a recap, the objective of this method is to create sparse correlation networks between OTUs based on their counts. Because the data are count-based, SparCC is a more appropriate approach for constructing correlation networks than Pearson correlation, as it produces fewer spurious connections. The authors further point out that spurious correlations are often worse when the diversity of the sample (defined as some function of the number of OTUs present) is low. SparCC also takes this diversity issue into account, which is part of why we consider it a state-of-the-art method. In each case, the correlation network is between OTUs, according to their co-occurrence patterns across patients in the associated cohort (healthy vs. acute lung injury).

The correlation networks returned by SparCC had both positive and negative edge weights. In this analysis, we consider only positive edges, as most of the community detection literature is not amenable to signed networks. We were further curious to see whether we could home in on the important structures by thresholding the network, that is, by removing edges whose weight falls below some threshold.
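The filtering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `positive_thresholded_edges`, the toy correlation matrix, and the cutoff of 0.2 are hypothetical and are not taken from the study.

```python
def positive_thresholded_edges(corr, labels, threshold):
    """Return (u, v, weight) for OTU pairs whose correlation is
    positive and at least `threshold`. `corr` is a symmetric matrix
    given as a list of lists; only the upper triangle is read."""
    edges = []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            w = corr[i][j]
            # Drop negative (and zero) edges, then drop weak positive edges.
            if w > 0 and w >= threshold:
                edges.append((labels[i], labels[j], w))
    return edges


# Toy symmetric correlation matrix (made-up values for illustration only).
labels = ["otu_a", "otu_b", "otu_c", "otu_d"]
corr = [
    [1.00, 0.30, -0.40, 0.10],
    [0.30, 1.00, 0.20, -0.05],
    [-0.40, 0.20, 1.00, 0.14],
    [0.10, -0.05, 0.14, 1.00],
]

edges = positive_thresholded_edges(corr, labels, threshold=0.2)
for u, v, w in edges:
    print(u, v, w)
```

Negative correlations are discarded before the cutoff is applied, so the resulting edge list can be passed directly to a community detection method that expects non-negative weights.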

We based this threshold on two criteria. First, we sought a stable threshold, where slight variations in the threshold would not dramatically change the number of communities detected by a modularity-based community detection algorithm. Second, we sought a threshold producing a node-to-community partition similar to the partition produced at an adjacent threshold. In summary, both criteria seek a stable threshold that does not dramatically change the community structure. For both networks, this threshold turned out to be 0.14; in other words, we discarded edges with a weight less than 0.14. This produced 4 communities (identified through modularity optimization) in each network. We show the acute lung injury and healthy OTU co-occurrence networks (left and right, respectively) in Figure 6.1. Here, nodes are colored by their community assignment. Even from this early glance, we observe that the structures of these networks are quite different. This observation motivated us to further analyze the difference in biological function reflected in these different network structures.
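The two stability criteria can be sketched as a scan over candidate thresholds. The text does not name the specific modularity-based algorithm or the partition similarity measure, so this sketch assumes networkx's `greedy_modularity_communities` and a plain Rand index; the helper names and the toy edge list are hypothetical.

```python
from itertools import combinations

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities


def communities_at(edges, threshold):
    """Detect communities on the graph restricted to edges with weight >= threshold."""
    g = nx.Graph()
    g.add_weighted_edges_from((u, v, w) for u, v, w in edges if w >= threshold)
    if g.number_of_edges() == 0:
        return []
    # Assumption: greedy modularity maximization stands in for the
    # unspecified modularity-based method.
    return [set(c) for c in greedy_modularity_communities(g, weight="weight")]


def rand_index(part_a, part_b):
    """Fraction of node pairs on which two partitions agree
    (together in both, or apart in both), over nodes present in both."""
    a = {node: i for i, comm in enumerate(part_a) for node in comm}
    b = {node: i for i, comm in enumerate(part_b) for node in comm}
    pairs = list(combinations(sorted(set(a) & set(b)), 2))
    if not pairs:
        return 0.0
    agree = sum((a[u] == a[v]) == (b[u] == b[v]) for u, v in pairs)
    return agree / len(pairs)


def scan_thresholds(edges, thresholds):
    """For each threshold, record (threshold, number of communities,
    similarity to the partition at the previous threshold)."""
    rows, previous = [], None
    for t in thresholds:
        part = communities_at(edges, t)
        similarity = None if previous is None else rand_index(previous, part)
        rows.append((t, len(part), similarity))
        previous = part
    return rows


# Toy graph: two tight triangles joined by one weak edge (made-up weights).
toy_edges = [
    ("a", "b", 0.6), ("b", "c", 0.6), ("a", "c", 0.6),
    ("d", "e", 0.6), ("e", "f", 0.6), ("d", "f", 0.6),
    ("c", "d", 0.1),
]
rows = scan_thresholds(toy_edges, [0.05, 0.2])
for row in rows:
    print(row)
```

A threshold would be considered stable in this sketch when neighboring rows report the same number of communities and a similarity close to 1.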

Figure 6.1: Microbial co-occurrence networks for each patient cohort. We constructed networks with SparCC for the ALI and No ALI cohorts (left and right, respectively). Four communities were identified in each network. Nodes are colored by their community assignment.

6.3 Results

6.3.1 Community overlap between networks

After constructing the network for each cohort, we first evaluated the similarity between all pairs of communities across the two networks, and used bioinformatics tools to further uncover the biological differences. We denote the communities in the ALI network by ALI 1, ALI 2, ALI 3, and ALI 4.

         No ALI A   No ALI B   No ALI C   No ALI D
ALI 1        1          1         27          2
ALI 2       50          5          9         22
ALI 3       66         46         37          5
ALI 4       77          5         15          4

Table 6.1: Comparing Networks in Each Patient Cohort. We compare the OTUs in each pair of communities in the ALI and No ALI cohort networks. Large overlaps are denoted by pink shading in the table.

Similarly, we denote the four communities in the No ALI network by No ALI A, No ALI B, No ALI C, and No ALI D. In Table 6.1, we show the contingency table used to compare the communities in the two networks. Each entry counts the number of OTUs shared between the community pair. We denoted the large overlaps (i.e. pairs sharing many common OTUs) by pink shading in the table. In particular, we highlight the similarity between ALI 4 and No ALI A, ALI 3 and No ALI B, ALI 1 and No ALI C, and ALI 2 and No ALI D.
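A contingency table of this kind can be computed directly from the two node-to-community assignments. The following is a minimal sketch; the function name `overlap_table`, the OTU identifiers, and the toy assignments are hypothetical, and OTUs that appear in only one network are assumed to be skipped.

```python
from collections import Counter


def overlap_table(assign_a, assign_b):
    """Count OTUs shared by each (community in network A, community in
    network B) pair. Each argument maps an OTU id to a community label."""
    counts = Counter()
    for otu, comm_a in assign_a.items():
        comm_b = assign_b.get(otu)
        if comm_b is not None:  # assumption: ignore OTUs absent from network B
            counts[(comm_a, comm_b)] += 1
    return counts


# Toy assignments (made-up OTU ids, not the study's data).
ali = {"otu1": "ALI 1", "otu2": "ALI 1", "otu3": "ALI 2", "otu4": "ALI 2"}
no_ali = {"otu1": "No ALI C", "otu2": "No ALI C", "otu3": "No ALI A", "otu5": "No ALI B"}

table = overlap_table(ali, no_ali)
for (comm_a, comm_b), n_shared in sorted(table.items()):
    print(comm_a, comm_b, n_shared)
```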

6.3.2 Evaluating functional differences

Next, we sought to study functional differences in the airway microbiota between patients with and without acute lung injury. Each OTU contains different genes, which lead to different functions (e.g. many OTUs contain genes that encode glycoside hydrolase activity). We hypothesized that the functions of the communities would differ between the ALI and No ALI networks. To investigate this, we used PICRUSt (Langille et al., 2013), a bioinformatic approach used to predict the function of each community. PICRUSt works by looking at each community and the known genetic information about the OTUs assigned to that community, and determining the enrichment of particular functions.

6.3.3 Classifying each community according to predicted function

PICRUSt originally returned 6,911 unique functions across the communities of both networks. We were interested to see if we could train a classifier to predict an OTU's community assignment based on its inferred functions, according to PICRUSt. In other words, each OTU has several predicted functions, according to its genetic content, and we wished to test whether we could predict an OTU's community assignment in each network based on the presence or absence of certain functions.

From the 6,911 features returned by PICRUSt, we reduced the set by filtering out the functions that were not associated with any OTU. We further filtered functions using the ratio of within-class variance to between-class variance, as we only wanted features that varied between classes. Using a ratio threshold of 0.05 brought the number of features in our model to 328. A random forest model was trained on half of the data, independently for the ALI and No ALI datasets, using the 328 PICRUSt functions as features. To visualize which features were most predictive for classifying an OTU into a community based on function, we measured their Gini importance (Menze et al., 2009). In Figure 6.2, the biological functions are ranked from top to bottom by their Gini score for the ALI and No ALI networks (left and right, respectively).
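The classification and ranking step can be sketched with scikit-learn, whose `feature_importances_` attribute on a random forest is the impurity-based (Gini) importance. Everything below is an assumption made for illustration: the data are synthetic, the matrix sizes and hyperparameters are arbitrary, only the "not associated with any OTU" filter is shown, and the variance-ratio filter is omitted.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_otus, n_functions = 200, 12

# Synthetic community labels (4 communities) and a binary
# presence/absence matrix of predicted functions per OTU.
communities = rng.integers(0, 4, size=n_otus)
X = rng.integers(0, 2, size=(n_otus, n_functions))
X[:, 0] = communities // 2  # informative by construction
X[:, 1] = communities % 2   # informative by construction
X[:, -1] = 0                # a function absent from every OTU

# Filter out functions that are not associated with any OTU.
keep = X.sum(axis=0) > 0
X_kept = X[:, keep]

# Train on half of the data, as described in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X_kept, communities, test_size=0.5, stratify=communities, random_state=0
)
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Rank the kept functions by Gini importance, highest first,
# and map positions back to the original column indices.
importances = forest.feature_importances_
ranking = np.argsort(importances)[::-1]
kept_indices = np.flatnonzero(keep)
top_functions = kept_indices[ranking[:2]]
print("Top functions by Gini importance:", top_functions)
print("Held-out accuracy:", forest.score(X_test, y_test))
```

In a real analysis the same procedure would be run once per cohort, and the two ranked lists compared.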

Figure 6.2: Predictive functions for community classification. We used a set of 328 filtered functions to predict OTU-to-community assignment in the ALI and No ALI networks. Here we show the functions identified as the strongest predictors for each community in the ALI and No ALI networks (left and right, respectively). Functions with more discriminative ability in the random forest classifier are ranked higher on the list.

When these highly ranked predictors were compared between the ALI and No ALI networks, very little overlap was observed, which suggests that acute lung injury severely alters the microbiome and its function.
