Efficient Core Consistency - Rank Computation

2.5 Rank Computation

2.5.4 Efficient Core Consistency

The core consistency is a simple way to detect the rank of a PARAFAC decomposition, but it is computationally expensive for large tensors.

This is due to the term $(C \otimes B \otimes A)^{\dagger}$, which requires the Kronecker products of the three factor matrices and the pseudo-inverse of the result. To avoid both the products and the pseudo-inverse, Papalexakis and Faloutsos [139] devised the so-called efficient CORCONDIA, which uses the Singular Value Decomposition (SVD) to rewrite this term in a computationally cheaper form and speed up the computation of the core consistency.

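Property 12. The pseudo-inverse $(C \otimes B \otimes A)^{\dagger}$ can be written as

$$(C \otimes B \otimes A)^{\dagger} = (V_c \otimes V_b \otimes V_a)\,(\Sigma_c^{-1} \otimes \Sigma_b^{-1} \otimes \Sigma_a^{-1})\,(U_c^T \otimes U_b^T \otimes U_a^T),$$

where $A = U_a \Sigma_a V_a^T$, $B = U_b \Sigma_b V_b^T$, and $C = U_c \Sigma_c V_c^T$ are the respective SVDs.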


Proof. By using the properties of the Kronecker product and the SVD of a matrix, it is possible to write

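$$A \otimes B = (U_a \Sigma_a V_a^T) \otimes (U_b \Sigma_b V_b^T)$$

$$= \left[(U_a \Sigma_a) \otimes (U_b \Sigma_b)\right] (V_a \otimes V_b)^T$$

$$= (U_a \otimes U_b)\,(\Sigma_a \otimes \Sigma_b)\,(V_a \otimes V_b)^T.$$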

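The matrices $U_a \otimes U_b$ and $V_a \otimes V_b$ have orthonormal columns, and $\Sigma_a \otimes \Sigma_b$ is diagonal with non-negative entries. Thus, since the SVD is unique up to the ordering of the singular values,

$$A \otimes B = (U_a \otimes U_b)\,(\Sigma_a \otimes \Sigma_b)\,(V_a \otimes V_b)^T$$

is the SVD of $A \otimes B$, whose pseudo-inverse is equal to

$$(V_a \otimes V_b)\,(\Sigma_a^{-1} \otimes \Sigma_b^{-1})\,(U_a^T \otimes U_b^T).$$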

The proof for three matrices is straightforward. The resulting equation that has to be solved in place of Eq. (2.24) is

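$$\operatorname{vec}(G) = (V_c \otimes V_b \otimes V_a)\,(\Sigma_c^{-1} \otimes \Sigma_b^{-1} \otimes \Sigma_a^{-1})\,(U_c^T \otimes U_b^T \otimes U_a^T)\,\operatorname{vec}(X).$$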
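As a rough numerical illustration of Property 12, the following NumPy sketch computes $\operatorname{vec}(G)$ from the three small SVDs without ever forming the Kronecker product, and compares the result with the direct pseudo-inverse on a small random example. The function name `vec_core_efficient`, the random test data, the column-major vec convention, and the assumption that the factor matrices have full column rank are illustrative choices, not details taken from [139].

```python
import numpy as np

def vec_core_efficient(X, A, B, C):
    """Sketch: vec(G) = (C kron B kron A)^+ vec(X), using only the factor SVDs.

    X has shape (I, J, K); A, B, C have shapes (I, R), (J, R), (K, R).
    Assumes full column rank, so all singular values are positive.
    """
    Ua, sa, Vat = np.linalg.svd(A, full_matrices=False)
    Ub, sb, Vbt = np.linalg.svd(B, full_matrices=False)
    Uc, sc, Vct = np.linalg.svd(C, full_matrices=False)
    # Multiply every mode by U^T: same as (Uc kron Ub kron Ua)^T vec(X).
    G = np.einsum("ijk,ip,jq,kr->pqr", X, Ua, Ub, Uc)
    # Diagonal factor: divide entry (p, q, r) by sa[p] * sb[q] * sc[r].
    G = G / (sa[:, None, None] * sb[None, :, None] * sc[None, None, :])
    # Multiply every mode by V: same as (Vc kron Vb kron Va) applied to vec(G).
    G = np.einsum("pqr,ap,bq,cr->abc", G, Vat.T, Vbt.T, Vct.T)
    return G.reshape(-1, order="F")  # column-major vectorization

# Small random example (hypothetical data) compared with the direct computation.
rng = np.random.default_rng(0)
A, B, C = rng.random((4, 2)), rng.random((5, 2)), rng.random((3, 2))
X = rng.random((4, 5, 3))
direct = np.linalg.pinv(np.kron(C, np.kron(B, A))) @ X.reshape(-1, order="F")
print(np.allclose(vec_core_efficient(X, A, B, C), direct))
```

The direct route builds a matrix with $IJK \times R^3$ entries before inverting it, whereas the sketch only decomposes the three factor matrices and contracts them with the tensor one mode at a time.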

In this chapter, we introduced the fundamental notions of tensor factorization problems that are needed for the work developed in the next chapters.

In particular, we have illustrated the techniques and computational methods that we will use as a basis to extend the tensor decomposition framework and thus tackle the questions raised in the Introduction.

Chapter 3 Datasets

In this chapter we introduce data on human proximity, which will be used to test our methods in Chapters 4 and 5. Recording data on human proximity is particularly important for understanding the mechanisms underlying social interactions. A standard method to record interactions among people relies on surveys, diaries and questionnaires filled in by volunteers.

However, surveys provide only a partial picture of people's interactions [144].

To obtain a more complete picture, electronic devices can be used to record human proximity. For instance, Wi-Fi or Bluetooth signals, which have different spatial ranges, can be considered as a proxy of human proximity, i.e. physical co-presence.

Face-to-face interactions can be recorded with high temporal resolution by using radio-frequency identification (RFID) devices. These devices are worn by individuals and exchange data packets within specific spatial ranges. If two devices exchange packets through the same antenna, the devices are effectively in the same area.

Here, we consider data collected through RFID in the context of the SocioPatterns collaboration (www.sociopatterns.org). SocioPatterns is an interdisciplinary research collaboration started in 2008, involving researchers and developers from several institutions: the Institute for Scientific Interchange (ISI Foundation) of Torino, Italy; the Center of Theoretical Physics (CPT) of Marseilles, France; the Physics Laboratory of the École Normale Supérieure (ENS) of Lyon, France; and the Bitmanufaktur of Berlin, Germany.

They devised a protocol to record the interactions of people wearing RFID sensors in closed environments, e.g., hospitals, schools, military camps, and conferences. The protocol is defined as follows; details are provided by Cattuto et al. [145]:

1. people participating in an experiment wear small wearable RFID sensors,

2. once started, the devices exchange radio packets at close range,

3. the power of the signal can be tuned to extend the area covered by each sensor,

4. the human body acts as a shield for the radio signal, thus allowing the recording of face-to-face interactions only,

5. the information on the contacts is both stored by the sensors and sent to antennas, placed all over the environment.

This protocol makes it possible to collect a stream of contacts that supports the study of human dynamics. One application is the development of models for the transmission of infectious diseases, such as influenza.

In the present work we will test several models based on tensor decomposition techniques on four SocioPatterns datasets, chosen for their diverse features. All the datasets concern human face-to-face proximity measured with RFID sensors in closed environments. In particular, we describe datasets collected in two different types of environment: elementary schools and scientific conferences.

The data collected in elementary schools are summarized in the Lyon primary school (LSCH) dataset, and in the Hong Kong primary school (HKSCH) dataset, while the other data were collected during the ACM Hypertext 2009 conference (HT09) in Italy, and the conference of the Société Française d’Hygiène Hospitalière (SFHH) in France.

In the Lyon primary school, 231 students and 10 teachers took part in the experiment as volunteers. Students and teachers were divided into 10 school classes. During the experiment, both the face-to-face interactions of people and their location in time were recorded. In particular, there were 15 locations in the school covered by antennas: 10 classrooms, 2 stairs, 1 playground, 1 cafeteria, and 1 control room. Data were collected over 2 days of observation.

In the Hong Kong primary school, the volunteers were 709 students and 65 teachers divided into 30 school classes. Data were collected over 10 days of observation.

The HT09 dataset was collected during the ACM Hypertext 2009 conference, where the face-to-face proximity contacts of 113 conference attendees were recorded over 3 consecutive days.

Similarly, the SFHH dataset was collected during the SFHH conference, where the face-to-face proximity contacts of 417 volunteers were recorded during the 2 days of the conference.

Finally, data were recorded with a temporal resolution of 20 seconds, but we aggregated the resulting networks in time, with aggregation levels that depend on the application. The resulting time-varying networks and the related aggregation levels are summarized in Tab. 3.1.

Table 3.1 Time-varying network data and aggregation levels. The table reports the number of nodes and snapshots of the time-varying networks built from the different datasets. For each dataset we give the aggregation level (in minutes) used in the applications shown throughout the thesis.

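| Dataset        | Aggregation (min) | Nodes | Snapshots |
|----------------|-------------------|-------|-----------|
| LSCH           | 13                | 241   | 150       |
| LSCH           | 15                | 241   | 131       |
| LSCH Locations | 15                | 241   | 131       |
| HKSCH          | 5                 | 774   | 2680      |
| HT09           | 15                | 113   | 237       |
| SFHH           | 15                | 417   | 129       |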
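The temporal aggregation described above can be sketched in a few lines of Python. The input format, a list of `(t, i, j)` contacts with `t` in seconds, the function name `aggregate_contacts`, and the toy contact list are assumptions made for illustration and do not reproduce the actual SocioPatterns file layout or the exact procedure used in the thesis.

```python
from collections import Counter, defaultdict

def aggregate_contacts(contacts, window_s, t0=None):
    """Group raw contacts (t, i, j) into snapshots of window_s seconds.

    Returns {snapshot index: Counter mapping an undirected link (i, j)
    to the number of raw records of that link inside the window}.
    """
    if t0 is None:
        t0 = min(t for t, _, _ in contacts)  # start of the recording
    snapshots = defaultdict(Counter)
    for t, i, j in contacts:
        s = int((t - t0) // window_s)            # index of the time window
        link = (min(i, j), max(i, j))            # undirected: order the pair
        snapshots[s][link] += 1
    return snapshots

# Toy stream of 20-second records (hypothetical), aggregated over 15 minutes.
raw = [(0, 1, 2), (20, 1, 2), (40, 2, 3), (900, 3, 1), (920, 1, 2)]
snaps = aggregate_contacts(raw, window_s=15 * 60)
for s in sorted(snaps):
    print(s, dict(snaps[s]))
```

In this sketch the weight of a link counts how many 20-second records fall in the window; a binary snapshot would instead keep only whether the count is positive.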

Chapter 4 Non-negative Tensor Decomposition for Mesoscale Structure Detection in Time-varying Networks

Part of the work described in this chapter has been previously published in [12] and [13].

In this chapter, we show that tensor decomposition techniques can be applied to time-varying networks with the aim of extracting meaningful temporal and topological patterns. We explain how to apply NTF to time-varying networks, focusing our analysis on time-varying social networks whose links represent the physical proximity of people in closed environments (as described in Chapter 3).

In the first part of the chapter, we provide the general procedure to carry out NTF on time-varying networks. First, we show how to represent the time-varying network as a tensor. Second, we apply the decomposition to this tensor to approximate the network as a combination of sub-networks. Third, we interpret the NTF results by analysing the overall approximated network and the factor matrices returned by the method.
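The first two steps can be pictured with a minimal NumPy sketch: snapshots of an undirected network are stacked into a three-way tensor (node, node, time), and a rank-$R$ approximation is the sum of $R$ rank-one terms, each of which can be read as a sub-network with its own activity in time. The function names `network_to_tensor` and `reconstruct`, the `(i, j, s, w)` link format, and the toy data are hypothetical; the factor matrices are written by hand here rather than obtained from an NTF solver.

```python
import numpy as np

def network_to_tensor(links, n_nodes, n_snapshots):
    """Stack weighted undirected links (i, j, s, w), with i != j, into a tensor.

    Entry T[i, j, s] is the weight of link (i, j) in snapshot s.
    """
    T = np.zeros((n_nodes, n_nodes, n_snapshots))
    for i, j, s, w in links:
        T[i, j, s] += w
        T[j, i, s] += w  # undirected network: symmetric frontal slices
    return T

def reconstruct(A, B, C):
    """Sum of R rank-one terms a_r (outer) b_r (outer) c_r from the factor columns."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

# Toy network: 3 nodes, 2 snapshots (hypothetical data).
links = [(0, 1, 0, 2.0), (1, 2, 0, 1.0), (0, 1, 1, 1.0)]
T = network_to_tensor(links, n_nodes=3, n_snapshots=2)

# Hand-written non-negative factors with R = 1, for illustration only.
A = np.array([[1.0], [1.0], [0.0]])   # node membership (rows of the links)
B = np.array([[1.0], [1.0], [0.0]])   # node membership (columns of the links)
C = np.array([[2.0], [1.0]])          # activity of the component over time
approx = reconstruct(A, B, C)
print(T.shape, approx.shape)
```

In this toy case the single component groups nodes 0 and 1 and is twice as active in the first snapshot as in the second, which is the kind of reading of the factor matrices that the third step refers to.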
