DBLP Case Studies
For the DBLP dataset, the yearly co-authorship relations among the authors are divided into 50 clusters based on the titles of the papers (Section 4.2). Because the clusters are derived from publication titles, we describe the topic each cluster represents by its most frequent words. Two of the CIRMs, shown in Figure 8.13, capture frequent co-authorship relational changes that are thematically different. The first CIRM shows periodic changes between research topics 8 and 2, whose topic similarity is 0.15. It captures the periodic transitions of the relations as authors a and b collaborate with other authors c, d, and e over time. The embeddings show relations among four different sets of authors. The second CIRM shows periodic changes between research topics 2 and 20, whose topic similarity is also 0.15.
[Figure 8.13: two CIRM diagrams, panels (a) and (b). Edges are labeled with publication topics and vertices with author letters.]

Publication Topics
2: System, Distributed, Base, and Model
8: Image, Segment, Color, and Retrieval
20: Recognition, Speech, Pattern, and Feature

Panel (a), topics 8 and 2
Embedding 1: a: Marc Rioux, b: François Blais, c: Guy Godin, d: J.-Angelo Beraldin, e: Luc Cournoyer
Embedding 2: a: Nikola Pavesic, b: France Mihelic, c: Ivo Ipsic, d: Jerneja Gros, e: Bostjan Vesnicer
Embedding 3: a: R. Schettini, b: Gianluigi Ciocca, c: Carla Brambilla, d: Isabella Gagliardi, e: Silvia Zuffi
Embedding 4: a: Shigemi Nagata, b: Yusuke Uehara, c: Rujie Liu, d: Takayuki Baba, e: Daiki Masumoto

Panel (b), topics 2 and 20
Embedding 1: a: Javier Ferreiros, b: Rubén S. Segundo, c: Javier M. Guarasa, d: Ricardo de Córdoba
Embedding 2: a: Brian Kingsbury, b: Nelson Morgan, c: Steven Greenberg, d: Adam Janin
Embedding 3: a: David Llorens, b: Federico Prat, c: Rafael Ramos-Garijo, d: Juan Miguel Vilar
Embedding 4: a: Karmele López de Ipiña, b: Luis J. Rodríguez, c: M. Inés Torres, d: Mikel Larrañaga
Figure 8.13: Two CIRMs capturing co-authorship patterns. The edge labels represent the domain of the publications. The vertices are labeled to identify the authors in an embedding. These CIRMs are collected using φ = 100 and β = 0.50.
[Figure 8.14: GT Class Distribution. Axes: Support (70, 60, 50) and Embeddings (0% to 100%). Series: Good_c, Bad_c, Good_ic, Bad_ic.]
Figure 8.14: Class distribution of the CIRM embeddings. The Good class represents the production runs with high yield and the Bad class represents the runs with poor yield. CIRMs were collected using φ = 70, 60, and 50, β = 0.50, m_min = 3, k_min = 3, and k_max = 8.
Genentech Case Studies
The authors of [106] showed that CRMs can be used as features for building a predictive model. Of the 247 production runs in the GT dataset, 48 are labeled Good and 48 are labeled Bad based on the quality of their yields; the remaining 151 runs are not labeled. They analyzed the embeddings of the discovered CRMs and showed that the CRMs occur mostly in the Good runs. We wanted to verify that the underlying network characteristics captured by CRMs are still captured by CIRMs.
Figure 8.14 shows the class distribution of the embeddings for the discovered CIRMs and CRMs. The CIRMs occur mostly in the high-yield runs: more than 75% of their embeddings belong to the Good class (Good_ic). Note that the ratio of embeddings supporting the Good class remains consistent between CRMs (Good_c) and CIRMs (Good_ic). Even though fewer CIRMs are detected than CRMs, the discovered CIRMs represent the characteristics of the underlying dynamic network as well as the CRMs do.
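The class distribution discussed above can be illustrated with a minimal sketch. It assumes each embedding is recorded by the identifier of the production run it occurs in, and that unlabeled runs are left out of the ratio. The function name, run identifiers, and toy data are hypothetical and are not taken from the GT dataset.

```python
from collections import Counter


def class_distribution(embedding_runs, run_labels):
    """Return the fraction of labeled embeddings that fall in each class.

    embedding_runs: one run ID per embedding (hypothetical representation).
    run_labels: maps a run ID to "Good" or "Bad"; unlabeled runs are absent.
    """
    counts = Counter(
        run_labels[run] for run in embedding_runs if run in run_labels
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Toy data, made up for illustration only.
run_labels = {"r1": "Good", "r2": "Good", "r3": "Bad"}
embedding_runs = ["r1", "r1", "r2", "r3", "r4"]  # r4 is unlabeled and skipped

dist = class_distribution(embedding_runs, run_labels)
print(dist)
```

Computing the same ratio separately for the CRM and CIRM embeddings would give the four series compared in Figure 8.14, under the assumptions stated above.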
Chapter 9
Conclusion
In this dissertation we presented several algorithms that can efficiently and effectively analyze the changes in dynamic relational networks. The new classes of dynamic patterns enable the identification of hidden coordination mechanisms underlying the networks, provide information on the recurrence and stability of their relational patterns, and improve the ability to predict the relations and their changes in these networks. Specifically, the qualitative analysis of each class of patterns has shown what information these patterns capture about the underlying networks and that the patterns are useful for building models.
9.1 Thesis Summary
The objective of this dissertation has been to identify different evolving relational patterns in dynamic relational networks that capture valuable information characterizing the underlying network. In this section we summarize the contributions and the results.
Mining the Evolution of Conserved Relational States
We presented an algorithm for finding all maximal non-redundant evolution paths of the induced relational states in a dynamic network. It can be used to discover the transitions of the conserved relational states over time and to better understand the causes of such changes in the stable patterns of a dynamic network. Our experimental evaluation on multiple real-world datasets shows that the algorithm is able to discover interesting evolution paths from all datasets and scales well to large and dense dynamic networks.
Mining Coevolving Relational Motifs
We introduced coevolving relational motifs (CRMs) to represent patterns that change in a consistent way over time in a dynamic network and presented an algorithm to efficiently find all frequent coevolving relational motifs. The algorithm can be used to discover unknown coordination mechanisms in a system by identifying the patterns that evolve and move in a similar and highly conserved fashion in the dynamic networks. The experimental evaluation using multiple real-world datasets shows that CRMminer is able to discover CRMs from all datasets and that CRMminerx scales better than CRMminer for large and dense dynamic networks. Further, the qualitative analysis shows that the discovered patterns capture important information and can be used as differentiating features for other mining problems.
Mining Coevolving Induced Relational Motifs
We presented coevolving induced relational motifs (CIRMs) to capture patterns that identify all relations among a set of entities and how that complete set of relations changes in a consistent way across different snapshots of the network. The algorithm efficiently handles the additional complexity of ensuring induced isomorphism and allows the anchored CIRMs to grow beyond their initial size. Using multiple real-world datasets, the experimental evaluation shows the efficiency and scalability of the algorithm. Further, the qualitative analysis shows that the fewer induced evolving patterns capture the same level of characteristics of the underlying network as the arbitrarily evolving patterns.