Massive multi-aspect datasets have emerged from many fields. In this section, we review the different ways that multi-aspect data can be structured. Existing applications of multi-aspect data are typically categorized by domain, e.g., social networks, healthcare, chemistry, or computer vision. The following schema instead looks at the nature of the data and classifies it by the objects around which the data has been collected.

2.2.1 Multi-Aspect Data that Describes Singular Objects

One type of multi-aspect data describes various sets of variables related to a single set of objects. In this case, different sets of variables can be measured on different samples, e.g., under different conditions or at different times. The objects can be any meaningful entities of research interest, such as process batches (batch × time × variables [156,157]), physical locations (sites × time × indicators [16,58,115,148,179,206]), patients (patient × medication × diagnosis [86,87,229]), users in a social network (nodes × time × measurements [158,168]), or authors in text-based systems (author × time × keyword [106,207,208]).

In process control, Nomikos et al. [156] use multi-aspect data to monitor batch processes. Each batch is associated with a set of measurements, including the flow rates of styrene, the temperatures of the feeds and the reactor, and the density of the latex in the reactor, sampled at 5-minute intervals. In environmental research, Lee et al. [115] construct a multi-aspect dataset of hourly indoor air quality measurements (e.g., NO, NO2, NOx, CO, and PM2.5) at various sites. In the healthcare domain, the authors of [87] work with patients' diagnoses and their corresponding procedures to derive phenotypes. In social networks, one dimension could be the actors in the network and the other dimensions could be different types of measurements related to the actors. For example, Oliveira and Gama [158] build a tensorial representation of a student network by measuring the degree, eigenvector centrality, closeness, and betweenness centrality of each student at different snapshots of time, in order to track the evolution of dynamic social networks. In text-based systems, Sun et al. [207] extract a third-order tensor from DBLP data to encode the keywords that authors use in their publications in each year.
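
As a minimal sketch of this kind of construction, the following shows how long-format records could be arranged into an object × time × measurement tensor with NumPy. The records, the names, and the helper `build_tensor` are hypothetical and chosen only for illustration; they are not taken from any of the cited works. Unobserved cells are left as NaN, which is one possible convention among several.

```python
import numpy as np

# Hypothetical long-format records: (object, snapshot, measurement, value).
records = [
    ("ann", 0, "degree", 3.0),
    ("ann", 1, "degree", 4.0),
    ("bob", 0, "degree", 1.0),
    ("bob", 1, "closeness", 0.5),
]

def build_tensor(records):
    """Return a dense object x time x measurement tensor and its index maps."""
    objects = sorted({r[0] for r in records})
    times = sorted({r[1] for r in records})
    measures = sorted({r[2] for r in records})
    obj_idx = {o: i for i, o in enumerate(objects)}
    time_idx = {t: k for k, t in enumerate(times)}
    meas_idx = {m: j for j, m in enumerate(measures)}
    # Cells without an observation stay NaN (an assumed convention).
    tensor = np.full((len(objects), len(times), len(measures)), np.nan)
    for obj, t, meas, value in records:
        tensor[obj_idx[obj], time_idx[t], meas_idx[meas]] = value
    return tensor, obj_idx, time_idx, meas_idx

tensor, obj_idx, time_idx, meas_idx = build_tensor(records)
```

Each frontal slice `tensor[:, k, :]` is then an ordinary object × measurement matrix for snapshot `k`.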

Regardless of the domain, multi-aspect data can be used to characterize a certain type of entity using longitudinal or cross-sectional measurements. When longitudinal measurements are taken, the multi-aspect data can also be regarded as multivariate data.

2.2.2 Multi-Aspect Data that Describes Pairwise Objects

The second category of multi-aspect data records measurements related to two sets of objects. A simple case is a multivariate image, in which each pixel is located by its x-coordinate and y-coordinate and the various wavelengths serve as variables, giving row × column × measurements [116,117,183,217,253,256]. The goal of such a data construction is to discover the relationships between the two sets of objects. Researchers have used this scheme to construct corresponding multi-aspect datasets in various other domains, e.g., network security (originIP × destinationIP × time [12,135,137]), transportation (origin × destination × time [98,214,215,226]), and social networks (person × person × time [170,177,239]).

With this multi-aspect representation, hyper-spectral images can be represented simply as third-order tensors: two ways for rows and columns and one way for the spectral band [183]. The signal subspace that integrates the spatial and spectral information has led to significant improvement in target detection. Video data has also been represented as a 3D tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$, where I and J are the spatial dimensions of a video frame and K is the total number of frames [217]. Designating two ways to represent the same set of entities makes their interactions more expressive. In the domain of network security, Maruhashi et al. [137] build a four-way tensor of network traffic indexed by source IP, destination IP, port number, and time tick. Leveraging a tensor-based representation of the heterogeneous traffic network enables the discovery of structured relationships. The same ideas are often applied to social networks and transportation networks. Peng and Li [177] construct a three-way tensor from the email exchanges among 184 users over 44 months, based on the Enron email dataset. Wang et al. segment the Beijing area and build a three-way tensor based on taxi trips between 651 zones over 24 hours to understand the spatial-temporal structure of the traffic dynamics. Each element of the tensor indicates the volume of traffic from the i-th origin area to the j-th destination area in the k-th time interval.
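
The origin × destination × time construction described above can be sketched as follows. The trip records and the zone and hour counts are hypothetical toy values, not the data used in the cited studies, and zones and hours are assumed to be already encoded as integer indices.

```python
import numpy as np

# Hypothetical trips: (origin_zone, destination_zone, hour_of_day).
trips = np.array([
    [0, 1, 8],
    [0, 1, 8],
    [1, 0, 18],
    [2, 1, 8],
])
n_zones, n_hours = 3, 24  # assumed toy sizes

# volume[i, j, k] = number of trips from zone i to zone j during hour k.
volume = np.zeros((n_zones, n_zones, n_hours), dtype=np.int64)
# np.add.at accumulates over repeated index triples, so duplicate trips
# are counted rather than overwritten.
np.add.at(volume, (trips[:, 0], trips[:, 1], trips[:, 2]), 1)
```

Because the first two ways index the same set of zones in different roles, `volume[i, j, k]` and `volume[j, i, k]` generally differ, which is what lets the tensor capture directed interactions.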

Multi-relational data [154] has also been structured in tensorial form. In this scenario, the multi-aspect data represents dyadic relational data consisting of n entities and m relations. One example is the knowledge graph [23], where different entities in the graph are linked by various types of relations. Modeling such data as multi-aspect data is an effective, straightforward solution for multiple binary relations [154]. High-dimensional, sparse spaces are a generic setting in which factorization models achieve competitive results [153,219].
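
To illustrate how n entities and m binary relations fit into one tensor, the sketch below encodes a handful of (subject, relation, object) triples as an entity × entity × relation array. The triples and names are hypothetical, and a dense array is used only for readability; in practice such tensors are very sparse and would normally be stored in a sparse format.

```python
import numpy as np

# Hypothetical knowledge-graph triples: (subject, relation, object).
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "works_with", "carol"),
]

entities = sorted({e for s, _, o in triples for e in (s, o)})
relations = sorted({r for _, r, _ in triples})
ent_idx = {e: i for i, e in enumerate(entities)}
rel_idx = {r: k for k, r in enumerate(relations)}

# X[i, j, k] = 1 if entity i is linked to entity j by relation k, else 0.
X = np.zeros((len(entities), len(entities), len(relations)), dtype=np.int8)
for s, r, o in triples:
    X[ent_idx[s], ent_idx[o], rel_idx[r]] = 1
```

Each frontal slice `X[:, :, k]` is the adjacency matrix of one relation, so the tensor stacks the m binary relations over a shared entity set.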

Most of the surveyed work builds tensors that describe pairwise objects rather than singular objects. Mining such representations can decouple the latent embeddings of the entities into separate parts, each of which carries meaningful semantics. In transportation data, an area can have one representation that indicates where people tend to depart and another that indicates where people tend to arrive. In relational data, an entity can have dedicated representations for its role as a subject and for its role as an object. Additional modes, such as time and measurements, are used to describe how the relations between these entities vary under different contexts. For example, the traffic in the morning is different from that in the evening.
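
One simple way to see this decoupling is to compute a separate low-rank embedding for each of the two entity modes. The sketch below does so with a truncated SVD of each mode unfolding (an HOSVD-style step chosen here purely for illustration; the surveyed works use a variety of factorization models). The random tensor, its sizes, the rank, and the helper `mode_embedding` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical origin x destination x time traffic volumes.
volume = rng.poisson(lam=3.0, size=(5, 5, 4)).astype(float)

def mode_embedding(tensor, mode, rank):
    """Leading left singular vectors of the unfolding along `mode`."""
    # Bring the chosen mode to the front and flatten the remaining modes.
    unfolded = np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)
    u, _, _ = np.linalg.svd(unfolded, full_matrices=False)
    return u[:, :rank]

# The same five zones receive two distinct embeddings, one per role.
origin_emb = mode_embedding(volume, mode=0, rank=2)
destination_emb = mode_embedding(volume, mode=1, rank=2)
```

Row `i` of `origin_emb` summarizes zone `i` as a departure point, while row `i` of `destination_emb` summarizes the same zone as an arrival point.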