

3.4 Applications

3.4.3 Engineering Communities with Different Research Interests

The network metric lower bounds succeed in distinguishing the different collaboration patterns in engineering and mathematics communities. We now illustrate that the lower bounds can also identify distinctive features of engineering communities with different research interests. To see this, we consider the networks constructed from the annual publications of TAC, TSP, and TWC. Figure 21 shows the two-dimensional Euclidean embeddings of the networks with respect to the summation of the lower bounds b0, b1, and b2. We expect more variation among annual networks, because each one covers a shorter period over which collaboration behavior is averaged. Besides, it is hard to argue that intrinsic and obvious differences exist among the collaboration patterns of the automatic control, signal processing, and wireless communication communities. Still, networks constructed from the same journal in different years tend to be close to each other and form clusters. An unsupervised classification with one linear boundary in the embeddings based on the summation of lower bounds would generate 4 errors out of 20 networks (20%) in each of the three classification problems considered. The less obvious clustering structure formed by networks from different journals in Figure 21 (c), compared to (a) and (b), also suggests that the collaboration patterns of the signal processing and wireless communication communities are more similar to each other than to that of the automatic control community.

Chapter 4

Frequency Representation of Networks by Persistent Homology

Leveraging the association between networks and persistent homology established in Chapter 3, in this chapter we propose to define the homofrequency of tuples in the network as the duration of the homological feature generated by the tuples (Definition 21 in Section 4.1). Tuples forming a long-lasting homological feature, i.e. a high homofrequency, represent a core feature of the network that should not be considered as noise. Conversely, tuples that generate a short-lasting homological feature, i.e. a low homofrequency, denote an unimportant feature of the network that is more likely to be noise. The homospectrum of a network collects all of its homofrequencies and the corresponding homological generators. This definition of homospectrums enables us to easily distinguish networks generated from different processes (Section 4.1.1). The first key theoretical result is that a network can be recovered from its homospectrum (Definition 22 and Theorem 7). In other words, homospectrums offer a different representation of the same information represented in the network space. The pair of domains formed by networks and homospectrums behaves like the time and frequency domains in classical signal processing. With this definition of homofrequencies, we can then define homofilters that remove undesired homofrequencies and keep the homofrequencies of interest (Section 4.2). The filtering can be applied in the homospectrum domain (Definition 23) as well as directly in the network domain (Definition 24). We illustrate that, as expected, the h-lowpass filter enables us to remove the unimportant features and keep the core representation of point cloud data (Section 4.2.1). Another important consequence of defining frequency via homological features is that the difference between the original network and the network filtered with an h-highpass filter is small (Theorem 8 in Section 4.3). In fact, this difference is no larger than the duration of the longest-lasting homological feature removed during filtering. In time-series filtering, Parseval’s theorem for the Fourier transform guarantees that the energy of the filtered signal stays close to that of the original if we only remove frequency components with small coefficient magnitudes. Theorem 8 has a similar interpretation, with the durations of the homological features playing the role of “energy”. Moreover, if we apply a filter to a pair of networks, the difference between the original network distance and the distance between the filtered networks is no greater than the longest duration removed (Theorem 9).

[Figure 22 panels: Dissimilarity Network → Persistence Diagrams (birth vs. death) → Homospectrums $s_0$ and $s_1$; large homofrequency: perennial duration; low homofrequency: ephemeral duration.]

Figure 22: From dissimilarity networks to homospectrums through persistence diagrams. Persistence diagrams are constructed from the dissimilarity networks, with the generators of each homological feature recorded. For each order k, the homofrequency $s_k$ denotes the duration of the homological feature. The aggregation of homofrequencies over all homological features forms the homospectrum.
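The filtering idea and the bound of Theorem 8 can be illustrated in the homospectrum domain with a minimal sketch. It assumes the simplest possible homofilter: discard every homofrequency below a threshold `tau` and keep the rest. Definitions 23 and 24 are not reproduced in this section, so `Feature` and `threshold_homofilter` are hypothetical names and not the thesis's definitions. The function also returns the largest removed homofrequency, which is the quantity Theorem 8 uses to bound how far the filtered network can move from the original.

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    """A homological feature reduced to its birth and death times (hypothetical container)."""
    birth: float
    death: float

    @property
    def homofrequency(self) -> float:
        # Definition 21: duration between death and birth.
        return self.death - self.birth


def threshold_homofilter(spectrum: List[Feature], tau: float) -> Tuple[List[Feature], float]:
    """Keep features with homofrequency >= tau and drop the rest (assumed filter rule).

    Returns the kept features and the largest removed homofrequency, i.e. the
    duration of the longest-lasting feature removed (0.0 if nothing is removed).
    """
    kept = [f for f in spectrum if f.homofrequency >= tau]
    removed = [f.homofrequency for f in spectrum if f.homofrequency < tau]
    return kept, max(removed, default=0.0)


# Illustrative (made-up) 0-homospectrum: two long-lived and two short-lived features.
spectrum = [Feature(0.1, 1.0), Feature(0.2, 0.7), Feature(0.3, 0.4), Feature(0.5, 0.6)]
kept, bound = threshold_homofilter(spectrum, tau=0.2)
print(kept, bound)
```

With `tau = 0.2` the two ephemeral features (durations of about 0.1) are dropped. The returned bound is therefore about 0.1, which is small compared with the perennial features that survive.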

4.1 Homospectrum of Networks

The appearance and disappearance of homological features induce a natural notion of duration, i.e. the difference between the disappearance time of a homological feature and its appearance time. This observation motivates us to represent the unstructured information encoded by the network through the more organized information given by the durations of its homological features. In classical signal processing, a time series can be represented by its coefficients in the Fourier frequency domain, where low frequencies denote slow temporal variation and high frequencies represent rapid variation. In homological signal processing, a network can be represented by its homospectrums, in which high homofrequencies indicate perennial homological features and low homofrequencies represent ephemeral homological features.

To formalize this intuition, we need to slightly adapt the definition of persistent homology from classical topology. The reason is that the birth and death times of persistent homology only capture the relationship values of the nodes, not which nodes are involved in the relationship. For that purpose, we also record the generators of each homological feature, as we formally state next.

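Definition 20 Given a filtration L, its k-dimensional extended persistence diagram $Q_k$ is a collection of points of the form $q = [q_b, q_d]$, each associated with a tuple $(G_b, G_d)$, where $q_b$ and $q_d > q_b$ represent the birth and death time of a homological feature, $G_b = \{x_{0:k}\}$ denotes the collection of simplices that exhibit the homological feature at its birth, and $G_d = x_{0:k+1}$ denotes the simplex whose addition at time $q_d$ trivializes the feature.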

As per Definition 20, for each point in the persistence diagram, the time $q_b$ represents the resolution $\alpha = q_b$ at which the k-homological feature generated by the simplices in the collection $G_b$ starts to appear, and the time $q_d$ describes the resolution $\alpha = q_d$ at which the feature is no longer a homological feature due to the addition of the simplex $G_d = x_{0:k+1}$. For the same example as in Figure 14, there are still two points, $[0.5, 1]$ and $[0.7, 0.8]$, in the 1-persistence diagram, but each of them is now associated with a tuple denoting its generators. Specifically, the point $[0.5, 1]$ is associated with $(\{[a,b],[b,c],[c,d],[d,a]\}, [b,c,d])$, denoting that its birth at resolution 0.5 is due to the cycle formed by $[a,b]$, $[b,c]$, $[c,d]$, and $[d,a]$, and its death at resolution 1 is due to the simplex $[b,c,d]$. In fact, the death generator in this case is inessential, as it represents the maximum dissimilarity, i.e. the simplex does not actually exist. Similarly, the point $[0.7, 0.8]$ is associated with $(\{[a,b],[b,d],[d,a]\}, [a,b,d])$. Compared to the ordinary persistence diagram, the extended diagram additionally records the generators of each homological feature.
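A minimal sketch of how the extended diagram of the Figure 14 example could be stored follows. Each point keeps its birth and death times together with the generator tuple $(G_b, G_d)$. Dropping the generators gives back the ordinary persistence diagram. The names `ExtendedPoint`, `sx`, `plain_diagram`, and `generator_simplices` are hypothetical. Simplices are assumed to be represented as tuples of node labels in sorted order, so `[d,a]` and `[a,d]` are the same key.

```python
from typing import FrozenSet, List, NamedTuple, Set, Tuple

Simplex = Tuple[str, ...]  # node labels kept in sorted order, e.g. ("a", "b")


def sx(*nodes: str) -> Simplex:
    """Canonical (sorted) form of the simplex spanned by the given nodes."""
    return tuple(sorted(nodes))


class ExtendedPoint(NamedTuple):
    """One point q = [q_b, q_d] of an extended persistence diagram with its generators."""
    birth: float                    # q_b
    death: float                    # q_d
    birth_gens: FrozenSet[Simplex]  # G_b: simplices exhibiting the feature at birth
    death_gen: Simplex              # G_d: simplex whose addition trivializes the feature


# 1-dimensional extended diagram of the Figure 14 example, as described in the text.
Q1: List[ExtendedPoint] = [
    ExtendedPoint(0.5, 1.0,
                  frozenset({sx("a", "b"), sx("b", "c"), sx("c", "d"), sx("d", "a")}),
                  sx("b", "c", "d")),  # inessential: stands for the maximum dissimilarity
    ExtendedPoint(0.7, 0.8,
                  frozenset({sx("a", "b"), sx("b", "d"), sx("d", "a")}),
                  sx("a", "b", "d")),
]


def plain_diagram(points: List[ExtendedPoint]) -> List[Tuple[float, float]]:
    """Forget the generators, recovering the ordinary persistence diagram."""
    return [(p.birth, p.death) for p in points]


def generator_simplices(points: List[ExtendedPoint]) -> Set[Simplex]:
    """All simplices recorded as a birth or a death generator."""
    recorded: Set[Simplex] = set()
    for p in points:
        recorded |= p.birth_gens
        recorded.add(p.death_gen)
    return recorded


print(plain_diagram(Q1))
print(sorted(generator_simplices(Q1)))
```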

With the extended persistence diagram, we define the homofrequency as the difference between the death time and the birth time of each homological feature, as we state next.

Definition 21 Given a point $q$ in the k-dimensional extended persistence diagram $Q_k$, its homofrequency is $s_k(q) = q_d - q_b$. The k-homospectrum $S_k = \{s_k\}$ collects the homofrequencies of all k-dimensional homological features. The K-order homospectra $S^K = \{S_k\}_{k=0}^{K-1}$ collects all k-homospectrums up to order $K-1$. Denote the space of all K-order homospectra as $\mathcal{S}^K$ and the h-transform from networks to homospectrums as $\mathcal{Z} : \tilde{\mathcal{D}}^K \to \mathcal{S}^K$.
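The 0-homospectrum in Definition 21 can be computed for a small network with a minimal sketch. It uses the standard union-find (Kruskal-style) procedure with the elder rule, a common way to obtain 0-dimensional persistence that we assume matches the filtration here. Node $x$ is assumed to enter at $d_X^0(x)$ and edge $(x, x')$ at $d_X^1(x, x')$, strictly after both endpoints. Components still alive at the end are closed at `max_diss = 1.0`, following the Figure 14 convention that death at resolution 1 stands for the maximum dissimilarity, and they get no death generator. The names `zero_extended_diagram` and `homospectrum` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str]
Point0 = Tuple[float, float, str, Optional[Edge]]  # (q_b, q_d, G_b node, G_d edge)


def zero_extended_diagram(node_diss: Dict[str, float], edge_diss: Dict[Edge, float],
                          max_diss: float = 1.0) -> List[Point0]:
    """0-dimensional extended persistence points via union-find and the elder rule."""
    parent = {x: x for x in node_diss}
    oldest = {x: x for x in node_diss}  # root -> earliest-born node of its component

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    points: List[Point0] = []
    for (u, v), t in sorted(edge_diss.items(), key=lambda item: item[1]):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # the edge closes a loop: a 1-dimensional birth, not a 0-dimensional death
        # Elder rule: the component whose oldest node was born later dies at time t.
        if node_diss[oldest[ru]] > node_diss[oldest[rv]]:
            ru, rv = rv, ru
        young = oldest[rv]
        points.append((node_diss[young], t, young, (u, v)))
        parent[rv] = ru  # the merged component keeps the older node of ru
    for root in sorted({find(x) for x in node_diss}):
        points.append((node_diss[oldest[root]], max_diss, oldest[root], None))
    return points


def homospectrum(points: List[Point0]) -> List[float]:
    """Homofrequencies s_0(q) = q_d - q_b."""
    return [p[1] - p[0] for p in points]


nodes = {"a": 0.1, "b": 0.2, "c": 0.3}
edges = {("a", "b"): 0.5, ("b", "c"): 0.6, ("a", "c"): 0.7}
pts = zero_extended_diagram(nodes, edges)
print(pts)
print(homospectrum(pts))
```

In this toy network, node b is absorbed by the older component of a at 0.5, and c is absorbed at 0.6. Edge (a, c) only closes a loop, and a survives to the end. This gives homofrequencies of about 0.3, 0.3, and 0.9, each with its generator recorded.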

The homofrequency of a homological feature represents how long the feature exists in the filtration. The lower the homofrequency, the shorter the duration over which the homological feature exists; the higher the homofrequency, the longer it exists. For networks representing the temporal dynamics of the formation of a research community, as in Figure 4, a homological feature with a high 1-homofrequency describes a group of authors, each pair of whom started collaborating a while ago, but who only recently started, or never started, to collaborate together as a group; in other words, it describes a “hole” in the community that lasts a relatively long time. On the other hand, a homological feature with a low 1-homofrequency describes a group of researchers who start to write publications together shortly after their initial pairwise collaborations; in other words, it describes a “hole” in the community that gets filled very soon. In general, a network up to order K can have K+1 different notions of homofrequency: starting from the 0-homofrequency, describing when different communities join together, then the 1-homofrequency, quantifying holes within communities, and so on. An illustrative example of how we construct the homofrequencies of networks via persistence diagrams is shown in Figure 22. Inherited from the extended persistence diagram, each homofrequency records the associated generator tuple. This enables the recovery of the original network from its homospectrums, as we state next.

Definition 22 Given a homospectra $S^K = \{S_k\}_{k=0}^{K-1}$, its corresponding dissimilarity network $D_X^K$ is recovered as follows. For a homofrequency $s_0 \in S_0$, take its birth and death time $[q_b, q_d]$ as well as the recorded generators $(G_b, G_d)$. Set the dissimilarities of all simplices in $G_b$ that are not defined yet to $q_b$, and set the dissimilarity of the simplex in $G_d$ to $q_d$. Iterate this process over all $s_0$ and then over all $S_k$ up to $K-1$. If a tuple $x_{0:k}$ appears in the birth generators of multiple k-homological features, its dissimilarity $d_X^k(x_{0:k})$ is set to the smallest of the birth times of these features.

The inverse h-transform from homospectrums to networks is denoted as $\mathcal{Z}^{-1} : \mathcal{S}^K \to \tilde{\mathcal{D}}^K$.
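Definition 22 can be sketched in a few lines, assuming each homospectrum is given as a list of points carrying $(q_b, q_d, G_b, G_d)$ and that simplices are tuples of sorted node labels. Visiting the points of each order by increasing birth time means that the first value written to a tuple is the smallest birth time, matching the "smallest birth time" rule. `setdefault` leaves untouched any tuple already defined, for instance by a lower-order death generator. `Point` and `inverse_h_transform` are hypothetical names. A death generator of `None` marks a feature that never dies within the network.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Optional, Tuple

Simplex = Tuple[str, ...]  # sorted node labels


class Point(NamedTuple):
    birth: float                  # q_b
    death: float                  # q_d
    birth_gens: FrozenSet[Simplex]  # G_b
    death_gen: Optional[Simplex]  # G_d, or None for a feature that never dies


def inverse_h_transform(spectra: List[List[Point]]) -> Dict[Simplex, float]:
    """Rebuild dissimilarities from homospectra S_0, S_1, ..., S_{K-1} (Definition 22 sketch)."""
    diss: Dict[Simplex, float] = {}
    for order_k in spectra:  # bottom-up: S_0 first, then S_1, ...
        for p in sorted(order_k, key=lambda q: q.birth):
            for simplex in p.birth_gens:
                # Only set tuples not defined yet; increasing birth order gives the minimum.
                diss.setdefault(simplex, p.birth)
            if p.death_gen is not None:
                diss[p.death_gen] = p.death  # one death simplex per feature
    return diss


S0 = [Point(0.1, 1.0, frozenset({("a",)}), None),
      Point(0.2, 0.5, frozenset({("b",)}), ("a", "b")),
      Point(0.3, 0.6, frozenset({("c",)}), ("b", "c"))]
S1 = [Point(0.7, 0.9, frozenset({("a", "b"), ("b", "c"), ("a", "c")}), ("a", "b", "c"))]
print(inverse_h_transform([S0, S1]))
```

In this toy example, edges (a, b) and (b, c) are fixed by the 0-dimensional death generators. The remaining edge (a, c) is recovered only from the birth generator of the 1-dimensional feature, which mirrors the bottom-up argument used for Theorem 7.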

The network is recovered in a bottom-up fashion. For each homological feature, there is only one simplex in the death generator $G_d$, and the dissimilarity of that simplex is set to the death time of the feature. Each homological feature may have multiple simplices in its birth generator $G_b$, and those whose dissimilarities have not yet been defined are set to the birth time of the feature. The rule for a tuple appearing in multiple birth generators ensures that the order increasing property is satisfied. The main result of this section is that $\mathcal{Z}^{-1}$ is well defined for dissimilarity networks, as we state next.

Theorem 7 The transform $\mathcal{Z}^{-1}$ defined in Definition 22 is a well-defined inverse map of $\mathcal{Z}$ defined in Definition 21, i.e. for any dissimilarity network $D_X^K \in \mathcal{D}^K$, we have $\mathcal{Z}^{-1} \circ \mathcal{Z}(D_X^K) = D_X^K$.

Proof: To prove Theorem 7, we first show that every tuple with a recorded dissimilarity appears in the generators of the extended persistence diagram, as we state next.

Lemma 1 Given a dissimilarity network $D_X^K$, any of its tuples $x_{0:k}$ with a recorded dissimilarity and with unique elements appears either in the death generator $G_d$ of a (k−1)-dimensional homological feature or in the birth generator $G_b$ of a k-dimensional homological feature.

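Proof: Given any tuple $x_{0:k}$ with unique nodes, Definition 10 indicates that the k-simplex $\varphi^k$ defined by the convex hull $\mathrm{conv}\{x_{0:k}\}$ appears in the filtration strictly after each of its faces $\mathrm{conv}\{x_{0:\hat{s}:k}\}$ (the tuple with $x_s$ removed). Suppose $\varphi^k$ appears at time $\alpha$ and denote $\partial_k \varphi^k = \sum_i \beta_i \psi_i^{k-1}$ with $\beta_i$ the coefficients; then each $\psi_i^{k-1}$ appears strictly before time $\alpha$.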

Now suppose that the appearance of $\varphi^k$ trivializes a (k−1)-dimensional homological feature. This means that $\varphi^k$ provides the boundary that trivializes the (k−1)-cycle $\partial_k \varphi^k$. Since each face $\psi_i^{k-1}$ of $\varphi^k$ appears strictly before time $\alpha$, the cycle $\partial_k \varphi^k$ results in a homological feature. The death time of this homological feature is $\alpha$, or equivalently, the time represented by the relationship $r_X^k(x_{0:k})$. This indicates that $x_{0:k}$ is the death generator of the homological feature.

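On the other hand, if the appearance of $\varphi^k$ does not trivialize a (k−1)-dimensional homological feature, then the (k−1)-cycle $\partial_k \varphi^k$ is already a boundary of chains formed by the simplices appearing before or on time $\alpha$. This means that $\partial_k \varphi^k$ can be represented by a sum of the boundaries of some k-chains $\Phi_i^k$,

$$\partial_k \varphi^k = \sum_i \beta_i \, \partial_k \Phi_i^k, \qquad (4.1)$$

with coefficients $\beta_i$ and k-chains $\Phi_i^k$ appearing before or on time $\alpha$. By the definition of k-chains, $\Phi_i^k = \sum_j \beta'_{ij} \psi_j^k$ with coefficients $\beta'_{ij}$ and k-simplices $\psi_j^k$ appearing before or on time $\alpha$. Therefore, (4.1) can be written as $\partial_k \varphi^k = \sum_j \beta''_j \, \partial_k \psi_j^k$ with $\beta''_j = \sum_i \beta_i \beta'_{ij}$. Rearranging terms,

$$\partial_k \Big( \sum_j \beta''_j \psi_j^k - \varphi^k \Big) = 0. \qquad (4.2)$$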

This implies that $\sum_j \beta''_j \psi_j^k - \varphi^k$ is a k-cycle. Since $\varphi^k$ has just appeared, this cycle is new. The cycle cannot be trivialized immediately, since any (k+1)-chain $\Psi^{k+1}$ with $\partial_{k+1} \Psi^{k+1} = \sum_j \beta''_j \psi_j^k - \varphi^k$ would involve a simplex $[x_{0:k,l}]$ for some node $x_l$, with the tuple $x_{0:k,l}$ consisting of non-repeating elements, and this simplex $[x_{0:k,l}]$ appears strictly after $\alpha$. Therefore we have a k-dimensional homological feature with birth time $\alpha$, or equivalently, the time denoted by the relationship $r_X^k(x_{0:k})$. Consequently, $x_{0:k}$ is in the birth generator of the homological feature formed.

Finally, combining these observations with the fact that the death and birth generators of all homological features are recorded in the extended persistence diagram concludes the proof.
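The chain manipulations in the proof of Lemma 1 can be checked on small examples with a minimal sketch of the simplicial boundary operator. It assumes integer coefficients and the orientation induced by sorted node labels. The coefficient ring is not stated in this section, so both choices are assumptions. `boundary` and `is_cycle` are hypothetical helpers. The example reproduces (4.2) for $k = 1$. With $\varphi^1 = [a,c]$ and $\psi^1$'s $[a,b]$ and $[b,c]$, both sides of (4.1) have the same boundary, so $[a,b] + [b,c] - [a,c]$ is a 1-cycle.

```python
from collections import defaultdict
from typing import Dict, Tuple

Simplex = Tuple[str, ...]  # oriented simplex: node labels in sorted order
Chain = Dict[Simplex, int]  # integer coefficients (assumed coefficient ring)


def boundary(chain: Chain) -> Chain:
    """Simplicial boundary: d[x_0..x_k] = sum_s (-1)^s [x_0..x_k with x_s removed]."""
    out: Dict[Simplex, int] = defaultdict(int)
    for simplex, coeff in chain.items():
        if len(simplex) < 2:
            continue  # vertices have zero boundary
        for s in range(len(simplex)):
            face = simplex[:s] + simplex[s + 1:]  # drop x_s; stays sorted
            out[face] += (-1) ** s * coeff
    return {face: c for face, c in out.items() if c != 0}


def is_cycle(chain: Chain) -> bool:
    """A chain is a cycle when its boundary vanishes."""
    return not boundary(chain)


phi = {("a", "c"): 1}
psis = {("a", "b"): 1, ("b", "c"): 1}
print(boundary(phi) == boundary(psis))  # the two chains share a boundary, as in (4.1)
print(is_cycle({("a", "b"): 1, ("b", "c"): 1, ("a", "c"): -1}))  # the cycle of (4.2)
```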

Back to the proof of Theorem 7: since network homofrequencies also record the birth and death generators, all non-trivial tuples can be found in the birth and/or death generators of the network homofrequencies as well. We reconstruct the network starting from the 0-homofrequencies. The birth generator of each 0-persistent homological feature is just a 0-simplex, i.e. a single node, and by Lemma 1 all 0-simplices can be found in generators; hence we can perfectly recover all single nodes and their dissimilarities $d_X^0(x)$. Besides the dissimilarities of single nodes coming from the birth generators of 0-persistent homology, the death generators of 0-persistent homology reveal information about some of the edges. Consequently, we can recover some edges and their dissimilarities $d_X^1(x, x')$. Examining the generators of 0-persistent homology alone cannot recover all edges. Nonetheless, by Lemma 1, all 1-simplices can be found in generators: the edges that do not appear in the death generators of 0-persistent homology must appear in the birth generators of some 1-persistent homological features. As a result, we can recover the edges left uncovered while investigating 0-persistent homology, together with their dissimilarities. After examining the death generators of 0-persistent homological features and the birth generators of 1-persistent homological features, we recover all edges and their dissimilarities. Following the same approach, after examining the death generators of 1-persistent homological features and the birth generators of 2-persistent homological features, we perfectly recover all triangles and their dissimilarities $d_X^2(x, x', x'')$. Iterating in increasing order of homological features eventually recovers all dissimilarities defined in the original network.

The result in Theorem 7 justifies the definitions of homofrequency and homospectrum, i.e. a network can be equally represented in the network domain and in the homospectrum domain. Compared to networks, which depend on the underlying node space and labelling, the homospectrum provides a universal description and also has the advantage of an implied ordering of frequencies. Theorem 7 only applies to dissimilarity networks, because simplices in a relaxed dissimilarity network may generate homological features that are born and killed at the same time, resulting in information lost in the homospectrums. We next illustrate what the homospectrum looks like for example networks.

[Figure 23 panels: average homospectrum versus homofrequency for 50-node networks; (a) to (c) show the average 0-homospectrum and (d) to (f) the average 1-homospectrum for the Erdős–Rényi, unit circle, and correlation models, respectively.]

Figure 23: The average homospectrums of the networks constructed from three different models with the same number of nodes. For each network model, we generate 50 random networks with 50 nodes. We evaluate the 0-homofrequencies and 1-homofrequencies of all networks, where homofrequency coefficients are computed up to a resolution of 0.001. The average of these homofrequency coefficients across all networks is examined and visualized. Networks generated from different models exhibit highly distinguishable patterns in their homospectrums.
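Whether Theorem 7 applies to a given network can be tested with a minimal sketch. It assumes, as used in the proof of Lemma 1, that a dissimilarity network requires every simplex to appear strictly after each of its faces. It further assumes that a relaxed dissimilarity network only requires non-strict inequality. The second reading is our assumption and is not restated in this section. `faces` and `is_order_increasing` are hypothetical helpers, and simplices are tuples of sorted node labels.

```python
from typing import Dict, List, Tuple

Simplex = Tuple[str, ...]  # sorted node labels


def faces(simplex: Simplex) -> List[Simplex]:
    """Codimension-one faces: drop one node at a time."""
    return [simplex[:s] + simplex[s + 1:] for s in range(len(simplex))]


def is_order_increasing(diss: Dict[Simplex, float], strict: bool = True) -> bool:
    """Check that every recorded simplex appears after each of its faces.

    strict=True demands d(simplex) > d(face); strict=False only demands
    d(simplex) >= d(face) (our assumed reading of a relaxed network).
    """
    for simplex, value in diss.items():
        if len(simplex) < 2:
            continue  # single nodes have no faces to compare against
        for face in faces(simplex):
            if face not in diss:
                return False  # a simplex cannot exist before its faces
            if value < diss[face] or (strict and value == diss[face]):
                return False
    return True


net = {("a",): 0.1, ("b",): 0.2, ("c",): 0.3,
       ("a", "b"): 0.5, ("b", "c"): 0.6, ("a", "c"): 0.7, ("a", "b", "c"): 0.9}
tied = dict(net)
tied[("a", "b", "c")] = 0.7  # triangle appears together with edge (a, c)
print(is_order_increasing(net), is_order_increasing(tied), is_order_increasing(tied, strict=False))
```

In the `tied` network, the 1-cycle created when edge (a, c) appears at 0.7 is filled by the triangle at the same resolution, so it has zero duration. This is the kind of coincidence that the text identifies as a source of information loss in relaxed dissimilarity networks.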