A Tensor-Based Approach for Big Data Representation and Dimensionality Reduction

Liwei Kuang, Fei Hao, Laurence T. Yang, Man Lin, Changqing Luo, and Geyong Min

Abstract: Variety and veracity are two distinct characteristics of large-scale and heterogeneous data. It has been a great challenge to efficiently represent and process big data with a unified scheme. In this paper, a unified tensor model is proposed to represent unstructured, semi-structured and structured data. With a tensor extension operator, various types of data are represented as sub-tensors and then merged into a unified tensor. In order to extract the core tensor, which is small but contains valuable information, an Incremental High Order Singular Value Decomposition (IHOSVD) method is presented. By recursively applying the incremental matrix decomposition algorithm, IHOSVD is able to update the orthogonal bases and compute the new core tensor. Analyses of the time complexity, memory usage and approximation accuracy of the proposed method are provided. A case study illustrates that approximate data reconstructed from a core set containing 18% of the elements can guarantee 93% accuracy in general. Theoretical analyses and experimental results demonstrate that the proposed unified tensor model and IHOSVD method are efficient for big data representation and dimensionality reduction.

Index Terms: Tensor, HOSVD, Dimensionality Reduction, Data Representation

1 INTRODUCTION

Big data are collections of datasets consisting of massive unstructured, semi-structured, and structured data. The four main characteristics of big data are volume (amount of data), variety (range of data types and sources), veracity (data quality), and velocity (speed of incoming data). Although many studies have been done on big data processing, very few have addressed the following two key issues: (1) how to represent the various types of data with a simple model; (2) how to extract the core data sets, which are smaller but still contain valuable information, especially for streaming data. The purpose of this paper is to explore these two issues, which are closely related to the variety and veracity characteristics of big data.

Logic and Ontology [1], two knowledge representation methodologies, have been investigated widely. Composed of syntax, semantics and proof theory, Logic is used for making statements about the world. Although Logic is concise, unambiguous and expressive, it works with statements that are either true or false and is hard to use for reasoning with unstructured data. Ontology is the set of concepts and relationships that can help people communicate and share knowledge. It is definitive and exhaustive, but it also causes incompatibility among different application domains, and thus is not suitable for representing and integrating heterogeneous big data.

• L. Kuang, F. Hao and C. Luo are with the School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China.

• L.T. Yang is with the School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China, and the Department of Computer Science, St. Francis Xavier University, Antigonish, NS, Canada.

• M. Lin is with the Department of Computer Science, St. Francis Xavier University, Antigonish, NS, Canada.

• Geyong Min is with the College of Engineering, Mathematics and Physical Sciences, University of Exeter, Exeter, EX4 4QF, United Kingdom.

The study of data dimensionality reduction has been reported in the literature. Previous approaches include Principal Component Analysis (PCA) [2], Incremental Singular Value Decomposition (SVD) [3], and Dynamic Tensor Analysis (DTA) [4]. These methods work for low-dimensional reduction but suffer from some limitations: they are time-consuming when performed on high-dimensional data and fail to extract the core data sets from streaming big data.

This paper presents a unified tensor model for big data representation and an incremental dimensionality reduction method for high-quality core set extraction. Data with different formats are employed to illustrate the representation approach, and equivalence theorems are proven to support the proposed reduction method. The major contributions are summarized as follows.

• Unified Data Representation Model: We propose a unified tensor model to integrate and represent the unstructured, semi-structured, and structured data. The tensor model has extensible orders to which new orders can be dynamically appended through the proposed tensor extension operator.

• Core Tensor Equivalence Theorem: To tackle the recalculation and order inconsistency problems in big data processing with the tensor model, we prove a core tensor equivalence theorem which can serve as the theoretical foundation for designing incremental decomposition algorithms.

• Recursive Incremental HOSVD Method: We present a recursive Incremental High Order Singular Value Decomposition method for streaming data dimensionality reduction. Detailed analyses in terms of time complexity, memory usage and approximation accuracy are also provided.

The remainder of this paper is organized as follows. Section 2 recalls the preliminaries of tensor decomposition. Section 3 presents a framework for big data representation and processing. A unified tensor model for big data representation is proposed in Section 4. Section 5 presents a novel incremental dimensionality reduction method. A case study of intelligent transportation is investigated in Section 6. After reviewing the related works in Section 7, we conclude the paper in Section 8.

2 PRELIMINARIES

This section reviews the preliminaries of singular value decomposition [5] and tensor decomposition [6]. The core tensor and truncated bases described in the preliminaries can be employed to make big data smaller.

Definition 1: Singular Value Decomposition (SVD). Let $M \in \mathbb{R}^{m \times n}$ denote a matrix. The factorization

$$M = U \Sigma V^{\top} \tag{1}$$

is called the SVD of $M$. Matrices $U$ and $V$ refer to the left singular vector space and the right singular vector space of matrix $M$, respectively. Both $U$ and $V$ are unitary orthogonal matrices. Matrix $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k, \ldots, \sigma_l)$, $l = \min\{m, n\}$, is a diagonal matrix that contains the singular values of $M$. In particular,

$$M_k = U_k \Sigma_k V_k^{\top} \tag{2}$$

is called the rank-$k$ truncated SVD of $M$, where $U_k = [u_1, \ldots, u_k]$, $V_k = [v_1, \ldots, v_k]$, $\Sigma_k = \mathrm{diag}(\sigma_1, \ldots, \sigma_k)$, $k < l$. The truncated SVD of $M$ is much smaller to store and faster to compute. Among all matrices $X$ of rank at most $k$, $M_k$ minimizes $\|M - X\|_F$; the minimizer is unique when $\sigma_k > \sigma_{k+1}$.
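As a minimal sketch of Definition 1, the following NumPy snippet computes a rank-$k$ truncated SVD and compares the Frobenius error with the discarded singular values. The helper name `truncated_svd`, the matrix size and the random data are assumptions made for illustration only.

```python
import numpy as np

def truncated_svd(M, k):
    """Return U_k, the first k singular values, and V_k of Eq. (2)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T

rng = np.random.default_rng(0)
M = rng.standard_normal((6, 4))      # hypothetical 6 x 4 data matrix
k = 2
Uk, sk, Vk = truncated_svd(M, k)
Mk = Uk @ np.diag(sk) @ Vk.T         # rank-k approximation M_k

# The Frobenius error equals the root of the sum of squared discarded
# singular values (Eckart-Young).
s_all = np.linalg.svd(M, compute_uv=False)
err = np.linalg.norm(M - Mk, "fro")
expected = np.sqrt(np.sum(s_all[k:] ** 2))
```

Only $U_k$, the $k$ singular values and $V_k$ need to be stored, which is where the storage saving mentioned above comes from.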

Definition 2: Tensor Unfolding. Given a $P$-order tensor $T \in \mathbb{R}^{I_1 \times I_2 \times \ldots \times I_P}$, the tensor unfolding [7] $T_{(p)} \in \mathbb{R}^{I_p \times (I_{p+1} I_{p+2} \ldots I_P I_1 I_2 \ldots I_{p-1})}$ contains the element $t_{i_1 i_2 \ldots i_p i_{p+1} \ldots i_P}$ at the position with row number $i_p$ and column number equal to

$$(i_{p+1}-1) I_{p+2} \ldots I_P I_1 \ldots I_{p-1} + (i_{p+2}-1) I_{p+3} \ldots I_P I_1 \ldots I_{p-1} + \ldots + (i_2-1) I_3 I_4 \ldots I_{p-1} + \ldots + i_{p-1}.$$

Example 1. Consider a three-order tensor $T \in \mathbb{R}^{2 \times 4 \times 3}$. Fig. 1 shows the three unfolded matrices $T_{(1)}$, $T_{(2)}$ and $T_{(3)}$.
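The unfolding of Definition 2 can be sketched in NumPy as follows. The function name `unfold`, the 0-based mode index and the tensor entries are assumptions for illustration; the entries are not those of Fig. 1. The column ordering follows the cyclic convention of Definition 2, in which index $i_{p+1}$ varies slowest and $i_{p-1}$ varies fastest.

```python
import numpy as np

def unfold(T, p):
    """Mode-p unfolding (p is 0-based) with cyclic column ordering
    i_{p+1}, ..., i_P, i_1, ..., i_{p-1}; the last index varies fastest."""
    order = [p] + list(range(p + 1, T.ndim)) + list(range(p))
    return np.transpose(T, order).reshape(T.shape[p], -1)

T = np.arange(24).reshape(2, 4, 3)            # illustrative 2 x 4 x 3 tensor
shapes = [unfold(T, p).shape for p in range(3)]
```

For a $2 \times 4 \times 3$ tensor the three unfolded matrices have sizes $2 \times 12$, $4 \times 6$ and $3 \times 8$.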

Fig. 1. Three-order tensor unfolding; tensor $T$ is unfolded to three matrices.

Definition 3: $p$-mode product of a tensor by a matrix. Suppose a tensor $T \in \mathbb{R}^{I_1 \times I_2 \times \ldots \times I_{p-1} \times I_p \times I_{p+1} \times \ldots \times I_P}$ and a matrix $U \in \mathbb{R}^{J_p \times I_p}$. The $p$-mode product $(T \times_p U) \in \mathbb{R}^{I_1 \times I_2 \times \ldots \times I_{p-1} \times J_p \times I_{p+1} \times \ldots \times I_P}$ is defined as

$$(T \times_p U)_{i_1 i_2 \ldots i_{p-1} j_p i_{p+1} \ldots i_P} = \sum_{i_p=1}^{I_p} \left( a_{i_1 i_2 \ldots i_{p-1} i_p i_{p+1} \ldots i_P} \times u_{j_p i_p} \right), \tag{3}$$

where $a_{i_1 i_2 \ldots i_P}$ denotes an element of $T$ and $u_{j_p i_p}$ an element of $U$.

The $p$-mode product is a key linear operation for dimensionality reduction, and the truncated left singular vector matrix $U \in \mathbb{R}^{J_p \times I_p}$ ($J_p < I_p$) is used to reduce the dimensionality of order $I_p$ from $I_p$ to $J_p$.

Fig. 2. Tensor dimensionality reduction with $p$-mode product; the dimensionality of the 2nd order is reduced from 8 to 2 by a $2 \times 8$ matrix.
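The reduction in Fig. 2 can be sketched with NumPy as below. The helper `mode_product`, the 0-based mode index, the three-order tensor (Fig. 2 shows six orders) and the random matrix standing in for a truncated basis are all assumptions for illustration.

```python
import numpy as np

def mode_product(T, U, p):
    """p-mode product T x_p U (p is 0-based); U has shape (J_p, I_p)."""
    # Contract U's second axis with T's p-th axis. tensordot puts the new
    # axis first, so it is moved back to position p.
    return np.moveaxis(np.tensordot(U, T, axes=(1, p)), 0, p)

rng = np.random.default_rng(0)
T = rng.standard_normal((3, 8, 4))   # the 2nd order has dimension 8
U = rng.standard_normal((2, 8))      # a 2 x 8 matrix
R = mode_product(T, U, 1)            # the 2nd order is reduced from 8 to 2

# Reference computed directly from the summation in Eq. (3).
ref = np.einsum("ijk,aj->iak", T, U)
```

The number of orders is unchanged; only the dimensionality of the chosen order shrinks.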

Definition 4: Core Tensor and Approximate Tensor. For an initial tensor $T$, the core tensor $S$ [8] and the approximate tensor $\hat{T}$ are defined as

$$S = T \times_1 U_1^{\top} \times_2 U_2^{\top} \ldots \times_P U_P^{\top}, \tag{4}$$

and

$$\hat{T} = S \times_1 U_1 \times_2 U_2 \ldots \times_P U_P. \tag{5}$$

The core tensor $S$ is viewed as a compressed version of the initial tensor $T$. By keeping only the first $k$ unitary orthogonal vectors (left singular vectors) of each unfolded matrix, the principal characteristics are preserved. Big data applications can simply keep the core tensor $S$ and the truncated bases $U_1, U_2, \ldots, U_P$. When needed, data can be reconstructed by generating the approximate tensor with Eq. (5). The right singular vector matrices $V_1, V_2, \ldots, V_P$ and the singular values are absorbed into the core tensor, which contains the coordinates of the approximate tensor with respect to the left singular vector matrices. In general, the reconstructed data are more efficient than the original data because noise, inconsistency and redundancy are removed.

Fig. 3. Illustration of the core tensor and the approximate tensor. The core tensor and the truncated unitary orthogonal bases ($U_1$, $U_2$, $U_3$) are called core data sets that can be used to make big data smaller, while the reconstructed approximate tensor is a substitute for the initial tensor.
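Eqs. (4) and (5) can be sketched as a plain (non-incremental) truncated HOSVD. The helper names, the tensor size and the truncation ranks below are assumptions for illustration; this is not the IHOSVD method of Section 5.

```python
import numpy as np

def unfold(T, p):
    order = [p] + list(range(p + 1, T.ndim)) + list(range(p))
    return np.transpose(T, order).reshape(T.shape[p], -1)

def mode_product(T, U, p):
    return np.moveaxis(np.tensordot(U, T, axes=(1, p)), 0, p)

def hosvd(T, ranks):
    """Return the core tensor S of Eq. (4) and the truncated bases U_1..U_P."""
    bases = []
    for p, k in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, p), full_matrices=False)
        bases.append(U[:, :k])               # keep the first k left singular vectors
    S = T
    for p, U in enumerate(bases):
        S = mode_product(S, U.T, p)          # S = T x_1 U_1^T ... x_P U_P^T
    return S, bases

def reconstruct(S, bases):
    """Approximate tensor of Eq. (5)."""
    T_hat = S
    for p, U in enumerate(bases):
        T_hat = mode_product(T_hat, U, p)
    return T_hat

rng = np.random.default_rng(0)
T = rng.standard_normal((6, 5, 4))
S_full, B_full = hosvd(T, T.shape)           # no truncation: exact reconstruction
S, B = hosvd(T, (3, 3, 2))                   # hypothetical truncation ranks
T_hat = reconstruct(S, B)
rel_err = np.linalg.norm(T - T_hat) / np.linalg.norm(T)
```

Random data compress poorly, so `rel_err` is large here; data with redundancy would give a smaller error at the same ranks.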

3 DATA REPRESENTATION AND PROCESSING FRAMEWORK

In this section, a tensor-based data representation and processing framework is proposed. Fig. 4 depicts a three-tier framework in which different modules are enabled in each layer. We describe the functions and responsibilities of each module from the bottom up.

[Figure labels: Streaming Data (unstructured data such as video and audio, semi-structured data such as XML and HTML, structured data such as GPS and EHR); Data Collection; Data Tensorization; Data Dimensionality Reduction; Data Analysis (Mining Algorithm, Inference Method, Data Visualization); Data Service (Healthcare, Finance, Transportation).]

Fig. 4. Data representation and processing framework.

1) Data Collection Module. This module is in charge of collecting various types of data from different areas, for example, video clips, XML documents and GPS data. The streaming data arrive incrementally and are temporarily gathered together without changing their original format.

2) Data Tensorization Module. Since the collected unstructured, semi-structured and structured data are not uniform, these data need to be represented with a unified tensor model. Sub-tensors with various orders are generated to model the data according to their initial format. Then, all the sub-tensors are integrated as a unified heterogeneous tensor.

3) Data Dimensionality Reduction Module. This module efficiently processes the high-dimensional tensorized data and extracts the core data sets, which are much smaller for storage and computation. The reduction can be enhanced by the proposed IHOSVD algorithm, which incrementally updates the orthogonal bases of each unfolded matrix.

4) Data Analysis Module. Numerous algorithms, such as clustering algorithms and multi-aspect prediction algorithms, are included in this module. The module helps uncover the potential value behind large-scale heterogeneous data. The data visualization module in this layer helps users easily understand the data values.

5) Data Service Module. The data service module provides services according to the requirements of different applications. For instance, with smart monitoring appliances, proactive health care services can be provided to users based on a thorough understanding of their physical status.

This paper mainly focuses on the data tensorization module and the data dimensionality reduction module.

4 A UNIFIED DATA REPRESENTATION MODEL

This section proposes a tensor-based data representation model and a tensorization approach for transforming heterogeneous data to a unified model. Firstly, an extensible order tensor model and a tensor extension operator are presented. Secondly, we illustrate how to tensorize unstructured, semi-structured and structured data as sub-tensors. Thirdly, the integration of sub-tensors as a unified tensor is studied. Tensor order and tensor dimension, two easily confused concepts, are discussed at the end.

4.1 Extensible Order Tensor

In general, time and space are two basic characteristics of data collected from different areas, while users are major recipients of data services. Therefore, a general tensor-based data model is defined as

$$T \in \mathbb{R}^{I_t \times I_s \times I_u \times I_1 \times \ldots \times I_P}. \tag{6}$$

Eq. (6) shows a $(P+3)$-order tensor which contains two parts, namely, the fixed part $\mathbb{R}^{I_t \times I_s \times I_u}$ and the extensible part $\mathbb{R}^{I_1 \times \ldots \times I_P}$. The tensor orders $I_t$, $I_s$ and $I_u$ denote time, space and user, respectively.

In the tensor model, data characteristics are represented as tensor orders. For example, the color space characteristic of unstructured video data can be modeled as $I_c$. For heterogeneous data, various characteristics are represented as tensor orders and attached to the fixed part using the proposed tensor extension operator.

Definition 5: Tensor Extension Operator. Let $A \in \mathbb{R}^{I_t \times I_s \times I_u \times I_1}$ and $B \in \mathbb{R}^{I_t \times I_s \times I_u \times I_2}$. The tensor extension operator is given by the following function

$$f : A \,\vec{\times}\, B \to C, \quad C \in \mathbb{R}^{I_t \times I_s \times I_u \times I_1 \times I_2}. \tag{7}$$

Operator $\vec{\times}$ satisfies the associative law. In other words, $(A \,\vec{\times}\, B) \,\vec{\times}\, C = A \,\vec{\times}\, (B \,\vec{\times}\, C)$. By virtue of Eq. (7), heterogeneous data can first be tensorized as low-order sub-tensors and then extended to a high-order unified tensor. The operator merges the identical orders while keeping the diverse orders. Elements of the identical order are accumulated together. For instance, sub-tensor $T_{sub1}$ and sub-tensor $T_{sub2}$ have time orders denoted as $I_{t-1}$, $I_{t-2}$, where $I_{t-1} \in \{i_1, i_2\}$, $I_{t-2} \in \{i_1, i_3\}$. After extension, the time order of the new tensor $T = T_{sub1} \,\vec{\times}\, T_{sub2}$ becomes $I_t \in \{i_1, i_2, i_3\}$.
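The order bookkeeping behind Definition 5 can be sketched as follows. This is an assumption-laden illustration, not the operator itself: it tracks only the index labels of each order (shared orders are merged, distinct orders are appended) and does not merge element values. The function `extend_orders` and all labels are hypothetical.

```python
def extend_orders(orders_a, orders_b):
    """Merge two {order name: list of index labels} maps.

    Orders present in both sub-tensors are merged (labels united, first-seen
    order kept); orders present in only one sub-tensor are appended unchanged.
    """
    merged = {name: list(labels) for name, labels in orders_a.items()}
    for name, labels in orders_b.items():
        if name in merged:
            merged[name] += [x for x in labels if x not in merged[name]]
        else:
            merged[name] = list(labels)
    return merged

# Hypothetical sub-tensors sharing the fixed orders t, s, u.
sub1 = {"t": ["i1", "i2"], "s": ["s1"], "u": ["u1"], "I1": ["a", "b"]}
sub2 = {"t": ["i1", "i3"], "s": ["s1"], "u": ["u1"], "I2": ["x", "y", "z"]}

unified = extend_orders(sub1, sub2)
shape = tuple(len(labels) for labels in unified.values())
```

The time order of the result holds the labels i1, i2, i3, matching the example in Definition 5, and the unified tensor gains one order per distinct extensible order.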

4.2 Tensorization Method

Examples of unstructured data include video data and audio data, while semi-structured data are composed of XML documents, ontology data, etc. Representatives of structured data are numbers and character strings stored in relational databases. In this paper, a video clip, an XML document and GPS data are employed to illustrate the tensorization process.

Fig. 5. Represent video clip as four-order tensor.

Video data can be represented as a four-order tensor or a three-order tensor. To represent a video clip in MPEG-4 format with 25 frames per second, $768 \times 576$ resolution and RGB color space, a four-order tensor $\mathbb{R}^{I_f \times I_w \times I_h \times I_c}$ is adopted, with $I_f$, $I_w$, $I_h$, $I_c$ indicating frame, width, height and color space. For instance, a 750-frame MPEG-4 video clip with a resolution of $768 \times 576$ and RGB color can be tensorized as $\mathbb{R}^{750 \times 768 \times 576 \times 3}$. In some applications, RGB color is transformed to gray level using the equation $Gray = 0.299R + 0.587G + 0.114B$, and the representation is replaced by a three-order tensor $\mathbb{R}^{750 \times 768 \times 576}$. Fig. 5 shows the process of transforming a video clip to a four-order tensor.
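The gray-level transformation above amounts to contracting the color order with the three coefficients. The sketch below uses a small random array as a stand-in for a decoded clip; the array sizes and variable names are assumptions, and a real $750 \times 768 \times 576 \times 3$ clip would use the same layout.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a decoded clip with orders (frame, width, height, color).
video = rng.integers(0, 256, size=(10, 16, 12, 3)).astype(np.float64)

weights = np.array([0.299, 0.587, 0.114])   # coefficients for R, G, B
gray = video @ weights                      # four-order -> three-order tensor
```

The color order disappears, which is why the unified tensor in Section 4.3 does not need $I_c$ when gray level is adopted.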

Extensible Markup Language (XML) is semi-structured. Fig. 6 shows a simple XML document with seven elements and one attribute. The elements contain tags and contents, both consisting of characters from the Unicode repertoire. An XML document has a hierarchical structure and can be parsed as a tree. Fig. 6(b) is the parsed tree of Fig. 6(a). An XML document can be tensorized as a three-order tensor, where $I_{er}$ and $I_{ec}$ indicate the row and column orders of the markup matrix, and $I_{en}$ represents the content vector order. For example, the XML document in Fig. 6(a) is tensorized as $T \in \mathbb{R}^{12 \times 12 \times 28}$, where 28 is the length of element 'Focus' (its content 'Architecture;Sensor Ontology' has 28 characters). Relationships among elements, attributes and text are represented as numbers. In Fig. 6(c), number 1 is used to indicate the parent-child relationship.

    <?xml version='1.0' encoding='UTF-8' ?>
    <University>
      <Student Category='doctoral'>
        <ID>20128803</ID>
        <Name>Liang Chen</Name>
        <Research>
          <Area>Internet of Things</Area>
          <Focus>Architecture;Sensor Ontology</Focus>
        </Research>
      </Student>
    </University>

Fig. 6. Represent XML document data as a three-order tensor; (a) gives an initial XML document, (b) is the parsed tree, (c) shows the relationships between elements, and the three-order tensor is illustrated in (d).
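One way to arrive at the $12 \times 12 \times 28$ shape is sketched below with Python's standard `xml.etree.ElementTree` parser. Several details are assumptions rather than the paper's scheme: the node numbering (which may differ from Fig. 6(b)), the relation codes 2 and 3, and the placement of relation codes and content vectors inside the tensor. The XML declaration is omitted from the string for brevity.

```python
import numpy as np
import xml.etree.ElementTree as ET

doc = """<University><Student Category='doctoral'><ID>20128803</ID>
<Name>Liang Chen</Name><Research><Area>Internet of Things</Area>
<Focus>Architecture;Sensor Ontology</Focus></Research></Student></University>"""

nodes, edges = [], []        # node contents; (parent, child, relation code)

def walk(elem, parent=None):
    idx = len(nodes)
    nodes.append(elem.tag)
    if parent is not None:
        edges.append((parent, idx, 1))       # 1: parent-child, as in Fig. 6(c)
    for name in elem.attrib:
        edges.append((idx, len(nodes), 2))   # 2: element-attribute (assumed code)
        nodes.append(name)
    text = (elem.text or "").strip()
    if text:
        edges.append((idx, len(nodes), 3))   # 3: element-text (assumed code)
        nodes.append(text)
    for child in elem:
        walk(child, idx)

walk(ET.fromstring(doc))
n = len(nodes)                               # 7 elements + 1 attribute + 4 texts
depth = max(len(content) for content in nodes)

T = np.zeros((n, n, depth))
for r, c, code in edges:
    T[r, c, 0] = code                        # markup relation (assumed slot)
for i, content in enumerate(nodes):
    # Content vector of node i as Unicode code points (assumed placement).
    T[i, i, :len(content)] = [ord(ch) for ch in content]
```

The twelve nodes and the longest content of 28 characters reproduce the shape quoted in the text.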

    Record  StudentID  Longitude     Latitude     Time
    1       D20128803  114.41225837  30.51989529  07-28 10:36:15
    2       D20128803  114.41209096  30.51987968  07-28 10:36:25
    3       D20128803  114.41194219  30.51992848  07-28 10:36:35
    ...     ...        ...           ...          ...

    Record  StudentID  StudentName
    1       D20128803  Liang Chen
    ...     ...        ...

Fig. 7. The upper table is modeled as a four-order tensor, the lower table is modeled as a two-order sub-tensor, and the two sub-tensors are unified as a five-order tensor.

Relational databases are widely used to manage structured data. In a database table, simple fields of number or character string type can be modeled as a matrix. For complex fields, e.g. BLOB, new orders are needed for representation. In Fig. 7, the structured GPS data and student data are unified as a five-order tensor.

4.3 Unified Tensor Representation Model

Big data are composed of unstructured data $d_u$, semi-structured data $d_{semi}$ and structured data $d_s$. Due to the requirement of processing all types of heterogeneous data, a unified data tensorization operation is performed using the following equation

$$f : (d_u \cup d_{semi} \cup d_s) \to \underbrace{T_u \cup T_{semi} \cup T_s}_{T}. \tag{8}$$

With Eq. (7) and Eq. (8), $d_u$, $d_{semi}$ and $d_s$ are transformed to sub-tensors $T_u$, $T_{semi}$ and $T_s$, which are later integrated as a unified tensor $T$. For example, on the basis of the transformed video clip, XML document and structured tables described in Figs. 5−7, the final tensor is obtained as follows,

$$T \in \mathbb{R}^{I_t \times I_s \times I_u \times I_w \times I_h \times I_{er} \times I_{ec} \times I_{en} \times I_{id} \times I_{na}}. \tag{9}$$

In Eq. (9), order $I_f$ is identical to order $I_t$, orders $I_x$, $I_y$ are combined to order $I_s$, and order $I_c$ is unnecessary because gray level is adopted. Since too many orders may increase the decomposition complexity, fewer orders are preferable at the data representation stage.

An element of the ten-order tensor in Eq. (9) is described as an eleven-tuple

$$e = (TI, SP, U, W, H, ER, EC, EN, ID, NA, V), \tag{10}$$

where $TI$, $SP$ and $U$ refer to the fixed orders time, space and user, $W$ and $H$ denote the orders from video data, $ER$, $EC$ and $EN$ are XML document characteristics, $ID$ and $NA$ are for GPS data, and $V$ is the value of element $e$. Such tuples generated from a heterogeneous tensor are usually sparse, and only the nonzero elements are essential for storage and computation. The generalized tuple format $e$ according to Eq. (6) is defined as

$$e = (TI, SP, U, i_1, \ldots, i_P, V). \tag{11}$$
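The tuple format of Eq. (11) corresponds to a coordinate-style sparse representation. A minimal sketch, assuming a small dense array as input and hypothetical helper names `to_tuples` and `from_tuples`:

```python
import numpy as np

def to_tuples(T):
    """List the nonzero elements of T as (index_1, ..., index_N, value)."""
    idx = np.nonzero(T)
    return [tuple(int(i) for i in pos) + (float(T[pos]),) for pos in zip(*idx)]

def from_tuples(tuples, shape):
    """Rebuild the dense tensor from its nonzero tuples."""
    T = np.zeros(shape)
    for *pos, value in tuples:
        T[tuple(pos)] = value
    return T

# Hypothetical tensor with orders (time, space, user, one extensible order).
T = np.zeros((2, 3, 2, 4))
T[0, 1, 0, 2] = 5.0
T[1, 2, 1, 3] = 7.5
tuples = to_tuples(T)
```

Only two tuples are stored for the 48 positions of this tensor, which is the storage benefit of keeping nonzero elements only.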

Fig. 8 illustrates the extensible order tensor model from another point of view. The fixed part containing $TI$, $SP$ and $U$ is seen as an overall layer, while the extensible part is deemed an inner layer. The tensor $T$ is simplified as a two-layer model where the inner model is embedded in the three-order ($I_t \times I_s \times I_u$) overall model. Using the tensorization method, the heterogeneous data are modeled as sub-tensors that are inserted into the two-layer model to generate the unified tensor.

Fig. 8. Visualization of the two-layer model for data representation.

4.4 Tensor Order and Tensor Dimension

As tensor order and tensor dimension are two key concepts for data representation, we give a brief comparison between them. Tensor $T \in \mathbb{R}^{I_1 \times I_2 \times \ldots \times I_P}$ has $P$ orders, and order $i$ ($1 \le i \le P$) has $I_i$ dimensions. A $P$-order tensor can be unfolded to $P$ matrices. For the mode-$i$ unfolded matrix $T_{(i)}$, the number of rows is equal to $I_i$, while the number of columns is equal to $\prod_{1 \le j \le P,\, j \ne i} I_j$. In many big data applications, it is impractical to store all dimensions of big data, which contain redundancy, uncertainty, inconsistency and incompleteness; thus it is essential to extract valuable core data. During the extraction of the core data set, the number of tensor orders remains the same while the dimensionality is significantly reduced.

5 INCREMENTAL TENSOR DIMENSIONALITY REDUCTION

A novel method is proposed for dimensionality reduction on streaming data in this section. Firstly, two problems of tensor decomposition are defined. Then two equivalence theorems are proven and an Incremental High-Order Singular Value Decomposition (IHOSVD) method that can efficiently compute the core data sets on streaming data is presented. Finally, complexity and accuracy of the proposed method are discussed.

5.1 Problems Definition

Two important problems related to incremental tensor dimensionality reduction are: (1) the recalculation problem; (2) the order inconsistency problem. They are formally defined below.

Problem 1: Tensor Decomposition Recalculation. Let $S_1$ denote the core tensor obtained from the previous tensor $T_1$, and let $T$ denote a new tensor. Combining $T_1$ with $T$, we obtain $T_2 = T_1 \cup T$. According to Eq. (4), the new core tensor $S_2$ of the new tensor $T_2$ is computed with

$$S_2 = T_2 \times_1 U_1^{\top} \times_2 U_2^{\top} \ldots \times_P U_P^{\top}. \tag{12}$$

Decomposition recalculation occurs in Eq. (12) because the previous decomposition results obtained while computing core tensor $S_1$ are not reused.

Problem 1 can be solved using Algorithm 1 and Algorithm 2 that are designed with the proposed recursive incremental singular value decomposition method.

Problem 2: Tensor Order Inconsistency. Assume $T_1$, $S_2$ and $T_2$ are defined as the previous tensor, the new core tensor and the new combined tensor. To compute $S_2$ with Eq. (4), the row number of the truncated orthogonal matrix $U$ must be consistent with the dimensionality of the tensor order $I_n$. However, the dimensionality of one order of the combined tensor $T_2$ is not equal to the row number of the truncated orthogonal matrix $U$.

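For instance, let $T_1 \in \mathbb{R}^{2 \times 2 \times 2}$ be a three-order tensor, and let $T_{1(1)} \in \mathbb{R}^{2 \times 4}$, $T_{1(2)} \in \mathbb{R}^{2 \times 4}$ and $T_{1(3)} \in \mathbb{R}^{2 \times 4}$ be the three unfolded matrices of $T_1$. Given a new tensor $\chi \in \mathbb{R}^{2 \times 2 \times 2}$, combining it with the previous tensor $T_1$ along the third order $I_3$, we obtain $T_2 \in \mathbb{R}^{2 \times 2 \times 4}$. The third order dimensionality of $T_2$ is 4, while the row number of the truncated orthogonal basis computed from matrix $T_{1(3)}$ is 2. This leads to order inconsistency.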
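The mismatch in this example can be reproduced in a few lines of NumPy. Random data and the variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T1 = rng.standard_normal((2, 2, 2))      # previous tensor
chi = rng.standard_normal((2, 2, 2))     # new tensor

# Left singular basis of the mode-3 unfolding of the previous tensor.
T1_3 = np.moveaxis(T1, 2, 0).reshape(2, -1)          # 2 x 4
U3, _, _ = np.linalg.svd(T1_3, full_matrices=False)  # 2 rows

T2 = np.concatenate([T1, chi], axis=2)   # combined along the third order

try:
    # U3^T expects a third order of dimension 2, but T2 has dimension 4.
    np.tensordot(U3.T, T2, axes=(1, 2))
    consistent = True
except ValueError:
    consistent = False
```

The product in Eq. (4) cannot be formed with the old basis, so `consistent` ends up `False`.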

In this paper, Theorem 1, Theorem 2 and Algorithm 3 are presented to address Problem 2.

5.2 Basis and Core Tensor Equivalence Theorems

The left singular vector matrix $U$ plays a key role in dimensionality reduction and data reconstruction. Similarly, the truncated rank-$k$ unitary orthogonal bases $U_1, U_2, \ldots, U_P$ of the unfolded matrices construct the most basic coordinate axes of a $P$-order tensor. For heterogeneous big data dimensionality reduction, the major difficulty lies in computing the bases when the dimension varies. Our approach extends the dimension to a fixed length and finds an equivalent basis. In this paper, two theorems are presented and proven to support our approach.

Theorem 1: Basis Equivalence of SVD. Let $M_1$ be an $n$ by $m_1$ matrix, and $M_2$ be an $n$ by $m_2$ matrix whose left $m_1$ columns contain matrix $M_1$ and whose right $m_2 - m_1$ columns are zeros. Namely, $M_2 = [M_1 \ 0]$, $M_1 \in \mathbb{R}^{n \times m_1}$, $M_2 \in \mathbb{R}^{n \times m_2}$, $m_1 < m_2$. If the singular value decompositions of matrix $M_1$ and matrix $M_2$ are expressed as

$$M_1 = U_1 \Sigma_1 V_1^{\top}, \quad M_2 = U_2 \Sigma_2 V_2^{\top}, \tag{13}$$

then the unitary orthogonal basis $U_1$ is equivalent to $U_2$.

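Proof. From Eq. (13), we obtain

$$M_2 M_2^{\top} = [M_1 \ 0] \times \begin{bmatrix} M_1^{\top} \\ 0 \end{bmatrix} = M_1 M_1^{\top}. \tag{14}$$

Consider

$$M_2 \times M_2^{\top} = U_2 \Sigma_2 V_2^{\top} \times V_2 \Sigma_2^{\top} U_2^{\top} = U_2 (\Sigma_2 \Sigma_2^{\top}) U_2^{\top}, \tag{15}$$

and

$$M_1 \times M_1^{\top} = U_1 \Sigma_1 V_1^{\top} \times V_1 \Sigma_1^{\top} U_1^{\top} = U_1 (\Sigma_1 \Sigma_1^{\top}) U_1^{\top}, \tag{16}$$

we obtain

$$U_1 (\Sigma_1 \Sigma_1^{\top}) U_1^{\top} = U_2 (\Sigma_2 \Sigma_2^{\top}) U_2^{\top}. \tag{17}$$

Note that both sides of Eq. (17) are spectral decompositions of two equal symmetric matrices. Additionally, the diagonal matrices $\Sigma_1 \Sigma_1^{\top}$ and $\Sigma_2 \Sigma_2^{\top}$ consist of the eigenvalues of the equal matrices. According to the uniqueness of eigenvalues, $\Sigma_1 \Sigma_1^{\top}$ and $\Sigma_2 \Sigma_2^{\top}$ are equal. It can be concluded that $U_1$ is equivalent to $U_2$. The equivalence implies that $U_1$ can be calculated by multiplying $U_2$ with a series of elementary matrices [9].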

Based on Theorem 1, the following two corollaries can be derived.

Corollary 1: Let $M_1 = [v_1, v_2, \ldots, v_n]$, $M_2 = [v_1, v_2, \ldots, 0, \ldots, 0, \ldots, v_n]$, where $v_i$ is a column vector; then the two matrices have equivalent left singular vector bases.

Corollary 2: Suppose $M_2 = \begin{bmatrix} M_1 \\ 0 \end{bmatrix}$; then matrix $M_1$ and matrix $M_2$ have equivalent left singular vector bases. With Corollary 2, the orthogonal basis $U_1$ can be obtained by trimming the bottom zeros of the orthogonal basis $U_2$.
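Theorem 1 and Corollary 2 can be checked numerically on a small example. The matrix sizes and random data are assumptions; since singular vectors are determined only up to sign, the comparison uses absolute values of inner products and presumes distinct singular values, which holds for generic random data.

```python
import numpy as np

rng = np.random.default_rng(0)
M1 = rng.standard_normal((4, 3))

# Theorem 1: append zero columns, M2 = [M1 0].
M2 = np.hstack([M1, np.zeros((4, 2))])
U1, s1, _ = np.linalg.svd(M1, full_matrices=False)
U2, s2, _ = np.linalg.svd(M2, full_matrices=False)
r = s1.size

# Corollary 2: append zero rows, M3 = [M1; 0].
M3 = np.vstack([M1, np.zeros((2, 3))])
U3, s3, _ = np.linalg.svd(M3, full_matrices=False)

same_basis_cols = np.allclose(np.abs(U1.T @ U2[:, :r]), np.eye(r), atol=1e-8)
same_basis_rows = np.allclose(np.abs(U1.T @ U3[:4]), np.eye(r), atol=1e-8)
bottom_is_zero = np.allclose(U3[4:], 0.0, atol=1e-10)
```

Trimming the zero bottom rows of `U3` recovers `U1` up to column signs, as Corollary 2 states.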

Theorem 1 and Corollaries 1 and 2 are employed to prove Theorem 2, defined as follows. Before the proof, we introduce a special matrix which will be used in Theorem 2.

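Definition 6: Extension Matrix. An extension matrix is defined as

$$M = \begin{bmatrix} I \\ 0 \end{bmatrix}, \quad M \in \mathbb{R}^{J_p \times I_p}, \ J_p > I_p.$$

Multiplying the $P$-order tensor $T \in \mathbb{R}^{I_1 \times I_2 \times \ldots \times I_p \times \ldots \times I_P}$ with the extension matrix $M$ along order $p$ extends the dimensionality of this order from $I_p$ to $J_p$.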

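Theorem 2: Core Tensor Equivalence of HOSVD. Let $T$ and $G$ be $P$-order tensors, where $T \in \mathbb{R}^{I_1 \times I_2 \times \ldots \times I_P}$ and $G \in \mathbb{R}^{I_1 \times I_2 \times \ldots \times (l I_p) \times \ldots \times I_P}$, and $l$ is a non-negative integer. Define $M$ as an extension matrix, $M \in \mathbb{R}^{I_p \times (l I_p)}$. Suppose tensors $T$ and $G$ satisfy

$$T = G \times_p M = G \times_p [\, I_p \ \ 0_{lp} \,].$$

Then the corresponding core tensors $S_T$ and $S_G$ satisfy $S_T = S_G \times_p M$.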

Proof. Unfold tensor $T$ and tensor $G$ to $P$ matrices $T_{(1)}, T_{(2)}, \ldots, T_{(P)}$ and $G_{(1)}, G_{(2)}, \ldots, G_{(P)}$. According to Theorem 1 and Corollaries 1 and 2, the corresponding unfolded matrices of tensors $T$ and $G$ have equivalent left singular vector bases. Besides, the $p$-mode product of tensor $T$ by matrices $A$, $B$ possesses the following properties

$$T \times_i A \times_j B = T \times_j B \times_i A, \tag{18}$$

and

$$(T \times_i A) \times_i B = T \times_i (BA). \tag{19}$$

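Employing Eq. (4), core tensors $S_T$, $S_G$ are calculated with the following equations

$$S_T = T \times_1 U_1^{\top} \times_2 U_2^{\top} \times_3 \ldots \times_P U_P^{\top}, \tag{20}$$

and

$$S_G = G \times_1 U_1^{\top} \times_2 U_2^{\top} \times_3 \ldots \times_P U_P^{\top}. \tag{21}$$

With Eqs. (18)−(21), we obtain

$$\begin{aligned} S_T &= T \times_1 U_1^{\top} \times_2 U_2^{\top} \times_3 \ldots \times_P U_P^{\top} \\ &= (G \times_p M) \times_1 U_1^{\top} \times_2 U_2^{\top} \times_3 \ldots \times_P U_P^{\top} \\ &= G \times_1 U_1^{\top} \times_2 U_2^{\top} \times_3 \ldots \times_P U_P^{\top} \times_p M \\ &= S_G \times_p M. \end{aligned} \tag{22}$$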

Theorem 2 reveals that extending a tensor by padding zero elements does not transform the core tensor. After unified representation of big data, the incremental tensor and the initial tensor have the same number of orders, but their dimensionalities differ. Theorem 2 can be used to solve this problem by resizing the dimensionality.
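A small numerical check of the zero-padding idea is sketched below. It verifies that $T = G \times_p M$ holds for a zero-padded $G$ and that every unfolding of $G$ has the same nonzero singular values as the matching unfolding of $T$, which is the ingredient taken from Theorem 1 and its corollaries. It does not verify Eq. (22) itself. The sizes, the choice $l = 2$ and the helper names are assumptions.

```python
import numpy as np

def unfold(T, p):
    order = [p] + list(range(p + 1, T.ndim)) + list(range(p))
    return np.transpose(T, order).reshape(T.shape[p], -1)

rng = np.random.default_rng(0)
T = rng.standard_normal((3, 4, 2))
p = 1                                   # 0-based index of the extended order

G = np.zeros((3, 8, 2))                 # l = 2: dimension 4 is padded to 8
G[:, :4, :] = T

M = np.hstack([np.eye(4), np.zeros((4, 4))])    # 4 x 8 matrix [I 0]
T_back = np.moveaxis(np.tensordot(M, G, axes=(1, p)), 0, p)

def leading_singular_values(X, q, r):
    return np.linalg.svd(unfold(X, q), compute_uv=False)[:r]

same_spectra = all(
    np.allclose(
        leading_singular_values(T, q, min(unfold(T, q).shape)),
        leading_singular_values(G, q, min(unfold(T, q).shape)),
    )
    for q in range(T.ndim)
)
```

Padding adds only zero rows or zero columns to each unfolded matrix, so the nonzero part of every mode spectrum is unchanged.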

5.3 Incremental High Order Singular Value Decomposition

We propose an IHOSVD method for incremental dimensionality reduction on streaming data. The IHOSVD method consists of three algorithms that are used for recursive matrix singular value decomposition and incremental tensor decomposition. The three algorithms are described in detail below.

Algorithm 1 is a recursive algorithm with the recursive function given in Eq. (23). While running, function $f$ calls itself (Step 4) repeatedly to decompose matrices $M_i$ and $C_i$. Each successive call reduces the size of the matrix and moves closer to a solution; when matrix $M_1$ is finally reached, the recursion stops and the function exits.

$$f(M_i, C_i) = \begin{cases} svd(M_1), & i = 1 \\ mix(f(M_{i-1}, C_{i-1}), C_i), & i > 1 \end{cases} \tag{23}$$

Algorithm 1 Recursive matrix singular value decomposition, $(U, \Sigma, V) = R\text{-}MSvd(M_i, C_i)$.

Input:
    Initial matrix $M_i$.
    Incremental matrix $C_i$.
Output:
    Decomposition results $U$, $\Sigma$, $V$ of matrix $[M_i \ C_i]$.

1: if ($i == 1$) then
2:     $[U, \Sigma, V] = svd(M_1)$.
3: else
4:     $[U_j, \Sigma_j, V_j] = R\text{-}MSvd(M_{i-1}, C_{i-1})$.
5:     $[U, \Sigma, V] = mix(M_{i-1}, C_{i-1}, U_j, \Sigma_j, V_j)$.
6: end if
7: return $U$, $\Sigma$, $V$.

Algorithm 1 calls function mix (Step 5) to merge column vectors of the incremental matrix with the decomposed components of the initial matrix. Additional vectors are projected onto the orthogonal bases and the coordinates are combined with the singular values. Detailed procedures of function mix are described in Algorithm 2.

For most tensor unfolding, the number of rows is less than the number of columns. For such type of matrices, Algorithm 1 can efficiently compute the singular values and singular vectors by splitting the columns for recursive decomposition.

Fig. 9. (a) Incrementally incoming column vectors are projected on unitary orthogonal bases; (b) The middle quasi-diagonal matrix is diagonalized and the previous singular vector matrices are updated.

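Algorithm 2 Merge incremental matrix with decomposition results, $(U, \Sigma, V) = mix(M_{i-1}, C_{i-1}, U_j, \Sigma_j, V_j)$.

Input:
    Initial matrix $M_{i-1}$ and incremental matrix $C_{i-1}$.
    Decomposition results $U_j$, $\Sigma_j$, $V_j$ of matrix $M_{i-1}$.
Output:
    New decomposition results $U$, $\Sigma$, $V$.

1: Project $C_{i-1}$ on the orthogonal space spanned by $U_j$, $L = U_j^{\top} \times C_{i-1}$.
2: Compute $H$ which is orthogonal to $U_j$, $H = C_{i-1} - U_j \times L$.
3: Obtain the unitary orthogonal basis $J$ from matrix $H$.
4: Compute the coordinates of matrix $H$, $K = J^{\top} \times H$.
5: Execute SVD on the middle quasi-diagonal matrix of Eq. (25), $[U', \Sigma', V'] = svd\left(\begin{bmatrix} \Sigma_j & L \\ 0 & K \end{bmatrix}\right)$.
6: Obtain new decomposition results, $[U_j \ J] \times U' \to U$, $\Sigma' \to \Sigma$, $\begin{bmatrix} V_j & 0 \\ 0 & I \end{bmatrix} \times V' \to V$.
7: return $U$, $\Sigma$, $V$.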

Algorithm 2 applies the SVD updating technique [3] for incremental matrix factorization. The additional columns in matrix $C_{i-1}$ are projected on the unitary orthogonal bases of the previous matrix $M_{i-1}$ (Step 1). Some column vectors are linear combinations of the unitary orthogonal bases $U_j$; others have components orthogonal to the space spanned by $U_j$. As illustrated in Fig. 9(a), the incoming column vectors are decomposed to obtain the bases $U_j$, $J$ and the coordinates $L$, $K$. The operations are implemented as Steps 2 ∼ 4. The column space of the singular vector matrix $U$ is spanned by the direct sum of the above two unitary orthogonal bases as follows

$$CS(U) = span(U_j \oplus J). \tag{24}$$

Combining the coordinates with the previous singular values, we obtain a quasi-diagonal sparse matrix which is easy to decompose. The new equation consisting of the above orthogonal bases and coordinates is defined as

$$[M_{i-1}, C_{i-1}] = [U_j, J] \begin{bmatrix} \Sigma_j & L \\ 0 & K \end{bmatrix} \begin{bmatrix} V_j & 0 \\ 0 & I \end{bmatrix}^{\top}. \tag{25}$$

Let $\bar{U}$ and $\bar{V}$ denote the unitary orthogonal bases of the quasi-diagonal matrix in Eq. (25); the updated singular vector matrices are

$$U = [U_j \ J] \times \bar{U}, \quad V = \begin{bmatrix} V_j & 0 \\ 0 & I \end{bmatrix} \bar{V}. \tag{26}$$

Eq. (4) suggests that only the left singular vector matrix $U$ is essential for tensor decomposition. Therefore, computation of matrix $V$ can be omitted in Step 6 of Algorithm 2.
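The update in Eqs. (25) and (26) and the recursion of Eq. (23) can be sketched in NumPy as follows. This is one interpretation under stated assumptions: the function signatures differ from Algorithms 1 and 2 (the input is a list of column blocks), QR factorization is used to obtain $J$ and $K$, no truncation is applied, and the matrices have more rows than total columns so that the residual $H$ is nonzero.

```python
import numpy as np

def mix(U, s, V, C):
    """Update the thin SVD M = U diag(s) V^T with new columns C and
    return the SVD factors of [M C], following Eqs. (25)-(26)."""
    L = U.T @ C                       # coordinates of C in span(U)
    H = C - U @ L                     # component of C orthogonal to span(U)
    J, K = np.linalg.qr(H)            # orthonormal basis J, coordinates K = J^T H
    r, c = s.size, C.shape[1]
    Q = np.block([[np.diag(s), L],
                  [np.zeros((K.shape[0], r)), K]])   # quasi-diagonal middle matrix
    Uq, sq, Vqt = np.linalg.svd(Q, full_matrices=False)
    V_ext = np.block([[V, np.zeros((V.shape[0], c))],
                      [np.zeros((c, V.shape[1])), np.eye(c)]])
    return np.hstack([U, J]) @ Uq, sq, V_ext @ Vqt.T

def r_msvd(blocks):
    """Recursive decomposition of [B_1 B_2 ... B_i] in the spirit of Eq. (23)."""
    if len(blocks) == 1:
        U, s, Vt = np.linalg.svd(blocks[0], full_matrices=False)
        return U, s, Vt.T
    U, s, V = r_msvd(blocks[:-1])
    return mix(U, s, V, blocks[-1])

rng = np.random.default_rng(0)
blocks = [rng.standard_normal((10, 3)),
          rng.standard_normal((10, 2)),
          rng.standard_normal((10, 2))]
U, s, V = r_msvd(blocks)
full = np.hstack(blocks)              # the matrix decomposed in one shot below
s_direct = np.linalg.svd(full, compute_uv=False)
```

Each update decomposes only the small middle matrix rather than the full matrix, which is the source of the savings; dropping `V_ext` and returning only `U` and the singular values corresponds to the omission of $V$ mentioned above.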

Employing the above two algorithms, we propose Algorithm 3 for incrementally computing the core tensor. In this algorithm, the extension matrix is used to ensure order consistency (Step 1). Unitary orthogonal bases $U_{(1)}, \ldots, U_{(P)}$ are updated from Step 2 to Step 4, and the new core tensor $S$ is obtained in Step 6. For demonstration purposes, Fig. 10 shows a simple example with a three-order tensor.

Fig. 10. Example of incremental tensor decomposition; truncated orthogonal bases $U_1$, $U_2$, $U_3$ of the new tensor $T$ are updated incrementally.

Algorithm 3 Incremental tensor singular value decomposition, (S, [U, Σ, V]new) = I-TSvd(χ, T, [U, Σ, V]initial).

Input:
  New tensor $\chi \in \mathbb{R}^{I_1 \times I_2 \times \dots \times I_P}$.
  Previous tensor $T \in \mathbb{R}^{I_1 \times I_2 \times \dots \times I_P}$.
  Previous unfolded matrices SVD results [U, Σ, V]initial.
Output:
  New truncated SVD results [U, Σ, V]new.
  New core tensor S.
1: Extend tensor χ and tensor T to identical dimensionality.
2: Unfold new tensor χ to matrices χ(1), ..., χ(P).
3: Call algorithm R-MSvd to update the above unfolded matrices.
4: Truncate the new orthogonal bases.
5: Combine new tensor χ with initial tensor T.
6: Obtain new core tensor S with n-mode product.
7: return S and [U, Σ, V]new.

5.4 Complexity and Approximation Accuracy

5.4.1 Time Complexity

The execution time of the proposed IHOSVD method consists of matrix unfolding, incremental singular value decomposition of each unfolded matrix, and the product of a tensor by the truncated bases. Let $Time_{unf}$, $Time_{isvd}$ and $Time_{prod}$ denote the time used by these three processes respectively. The total time consumption $Time$ satisfies

$$Time = Time_{unf} + Time_{isvd} + Time_{prod}. \tag{27}$$

Tensor unfolding is a simple transformation with O(1) time complexity. $Time_{isvd}$ is equal to $Time_1 + Time_2 + \dots + Time_P = \sum_{i=1}^{P} Time_i$, where $Time_i$ refers to the time consumed by unfolded matrix $T_{(i)}$. According to Eq. (23), $Time_{isvd}$ can be obtained with

$$Time(i) = \begin{cases} C_1, & i = 1 \\ Time(i-1) + C_2, & i > 1, \end{cases} \tag{28}$$

where $C_1$ and $C_2$ are constants. The recursive calling process first adds columns and then updates them with the previous decomposition results. The time complexity of decomposing one unfolded matrix is $O(k^2 n)$, where k refers to the number of the truncated left singular vectors. For a truncated orthogonal basis U with k column vectors, the time complexity of the product of a tensor by a matrix is $O(k^2 n)$. To decompose a p-order tensor T with p unfolded matrices, the time complexity of the proposed IHOSVD method is $O(1) + O(pk^2 n) + O(pk^2 n)$, namely $O(pk^2 n)$.
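As a worked step, the recurrence in Eq. (28) can be unrolled to a closed form. The derivation below assumes only what Eq. (28) states, namely that $C_1$ and $C_2$ do not depend on i.

```latex
\begin{aligned}
Time(i) &= Time(i-1) + C_2 \\
        &= Time(i-2) + 2\,C_2 \\
        &\;\;\vdots \\
        &= Time(1) + (i-1)\,C_2 \\
        &= C_1 + (i-1)\,C_2 .
\end{aligned}
```

Under this assumption the cost of the recursive process grows linearly with the recursion depth i.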

5.4.2 Memory Usage

Let $Mem_u$ denote the memory used to store all truncated orthogonal bases, and let $Mem_{r-msvd}$ and $Mem_{mix}$ refer to the memory usages of the recursive process in Algorithm 1. The total memory used by the proposed IHOSVD method is then defined as

$$Mem = Mem_u + Mem_{r-msvd} + Mem_{mix}. \tag{29}$$

The complexity of $Mem_u$ is O(kn). To incrementally compute the core tensor, the IHOSVD method needs to keep all the truncated orthogonal bases, and the memory usage is $\sum_{i=1}^{P} k_i I_i$. According to Eq. (23), the memory needed during the recursive process is equal to

$$M_i + C_i + M_{i-1} + C_{i-1} + \dots + M_1 + C_1. \tag{30}$$

The complexity of the above memory usage is O(kn). Therefore, the complexity of the total memory usage is O(kn) + O(kn), i.e. O(kn). For a p-order tensor T with p unfolded matrices, the complexity is O(pkn).

5.4.3 Approximation Accuracy

The reconstruction error between the initial tensor and the approximate tensor can be measured exactly with the Frobenius norm [10] as

$$\|T - \hat{T}\|_F = \left( \sum_{i_1=1}^{I_1} \cdots \sum_{i_P=1}^{I_P} \left( a_{i_1,\dots,i_P} - \hat{a}_{i_1,\dots,i_P} \right)^2 \right)^{1/2}. \tag{31}$$

For the unfolded matrix $T_{(i)}$ of the initial tensor T, the approximate matrix is $\hat{T}_{(i)} = U_i \Sigma_i V_i^T$. The reconstruction error is caused by the approximation of all unfolded matrices. To clearly analyze the degree of tensor dimensionality reduction and the degree of tensor approximation, we present two ratios.

Definition 7: The Dimensionality Reduction Ratio of tensor T is defined as

$$\rho = \frac{nnz(S) + \sum_{i=1}^{N} nnz(U_i)}{nnz(T)}, \tag{32}$$

where S denotes the core tensor, and $U_i$ is the mode-i truncated orthogonal basis. The core data sets of tensor T are composed of S (core tensor) and $U_1, U_2, \dots, U_P$. Because only nonzero elements of the core data sets are stored, ratio ρ can accurately reflect the degree of dimensionality reduction.

Definition 8: The Reconstruction Error Ratio of tensor T is defined as

$$e = \frac{\|T - \hat{T}\|_F}{\|T\|_F}. \tag{33}$$

Ratio e reflects the degree of reconstruction error in terms of the tensor Frobenius norm. In this paper, the pair (ρ, e) is employed to describe the degree of dimensionality reduction and the degree of reconstruction error. The two ratios move in opposite directions: as ρ decreases, e increases.

Computation accuracy is important for tensor data approximation, and in most applications, HOSVD type algorithms can find a good approximation. To obtain higher accuracy, the High-Order Orthogonal Iteration (HOOI) [11] method can be utilized to find the best rank approximation. Both the High-Order Singular Value Decomposition (HOSVD) and the Higher Order Orthogonal Iteration (HOOI) of tensors can be viewed as extensions of the Singular Value Decomposition (SVD).
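The two ratios of Definitions 7 and 8 are straightforward to compute. The sketch below is a minimal illustration, assuming dense NumPy arrays and hypothetical function names `reduction_ratio` and `error_ratio`; the toy arrays at the end are made up purely to show the arithmetic and are not data from the paper.

```python
import numpy as np

def reduction_ratio(S, bases, T):
    """rho of Eq. (32): nonzeros of the core data sets over nonzeros of T."""
    stored = np.count_nonzero(S) + sum(np.count_nonzero(U) for U in bases)
    return stored / np.count_nonzero(T)

def error_ratio(T, T_hat):
    """e of Eq. (33): relative reconstruction error in the Frobenius norm."""
    # With no axis or ord given, np.linalg.norm flattens the array and
    # returns the 2-norm, which equals the tensor Frobenius norm.
    return np.linalg.norm(T - T_hat) / np.linalg.norm(T)

# Toy numbers: a 4x4x4 tensor, a 2x2x2 core and three 4x2 bases.
T = np.ones((4, 4, 4))
S = np.ones((2, 2, 2))
bases = [np.ones((4, 2)) for _ in range(3)]
rho = reduction_ratio(S, bases, T)   # (8 + 3 * 8) / 64
e = error_ratio(T, 0.9 * T)          # every entry is off by 10 percent
print(rho, e)
```

In this toy case the core data sets hold 32 of the 64 original elements, so ρ is one half, while a uniform 10 percent deviation in every entry gives e of about 0.1.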

6 CASE STUDY

In this section, we illustrate the proposed unified data representation model and incremental dimensionality reduction method with an Intelligent Transportation case. The test data used in the experiments consist of unstructured video data collected with fixed cameras and mobile phones, semi-structured XML documents about traffic information, and structured trajectory data. After dimensionality reduction, the core tensor and the truncated bases are small enough to store cheaply, yet they allow accurate and fast reconstruction of the big data.

6.1 Demonstration of Tensor Unfolding

We construct a five-order tensor $T \in \mathbb{R}^{480 \times 640 \times 3 \times 2 \times 3}$ by extracting three frames from an unstructured video clip and three users from a semi-structured XML document. Fig. 11(a) shows the five unfolded matrices of tensor T. The five orders represent height, width, color space, time and user respectively.

Fig. 11. Heterogeneous tensor unfolding and incremental tensor unfolding. (a) Five unfolded matrices of the five-order tensor. (b) Incremental data on unfolded matrices of the eight-order tensor.

To demonstrate incremental tensor unfolding, an eight-order tensor $T \in \mathbb{R}^{I_t \times I_s \times I_u \times I_h \times I_w \times I_c \times I_{ec} \times I_{er}}$ is constructed. Incremental data are appended along the time order $I_t$. Unfolded matrices of the combined new tensor (initial tensor and incremental tensor) are shown in Fig. 11(b). Order inconsistency of the new tensor occurs in order $I_t$, because the incremental data are appended as rows at the bottom of the unfolded matrix.
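The shapes of the unfolded matrices and the row-versus-column behaviour of appended data can be checked with a few lines of NumPy. This is a sketch under assumptions: the helper `unfold` and its column ordering are hypothetical choices rather than the paper's exact convention, and a small three-order tensor with time as the first order stands in for the eight-order tensor.

```python
import numpy as np

def unfold(T, mode):
    """Mode unfolding: rows index the chosen order, columns all the others."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# Five-order tensor with the sizes used above
# (height, width, color space, time, user).
T5 = np.zeros((480, 640, 3, 2, 3), dtype=np.uint8)
shapes = [unfold(T5, m).shape for m in range(T5.ndim)]
print(shapes)

# Stand-in for incremental data: time is order 0, one new time step arrives.
prev = np.arange(24).reshape(2, 3, 4)
inc = -np.ones((1, 3, 4), dtype=prev.dtype)
combined = np.concatenate([prev, inc], axis=0)

time_unfolding = unfold(combined, 0)    # new data show up as an extra row
other_unfolding = unfold(combined, 1)   # new data show up as extra columns
print(time_unfolding.shape, other_unfolding.shape)
```

In the unfolding along the time order the increment adds rows, whereas in the other unfoldings it adds columns. A column-appending update therefore does not apply directly to the time order, which is the order inconsistency described above.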


Fig. 11(a), Fig. 11(b) and Fig. 8 in Section 4 illustrate the tensor model from different viewpoints and demonstrate how the heterogeneous data are stacked together. Fig. 8 demonstrates the procedure of embedding unstructured video data and a semi-structured XML document into a three-order tensor, while Fig. 11(a) and Fig. 11(b) show the inner elements of the unified tensor model.

6.2 Dimensionality Reduction and Approximation Error

There exists a tradeoff between dimensionality reduction and approximation error. Fig. 12 shows two video frames reconstructed from the above five-order tensor under three different approximation error ratios, namely 4%, 7%, and 24%. Fig. 13(a) plots the two ratios together, and illustrates that the reconstruction error ratio increases gradually as the dimensionality reduction ratio decreases. The core data sets are composed of the core tensor S and the truncated orthogonal bases U1, ..., U5. Fig. 13(b) shows their proportions of the dimensionality reduction ratio. Generally, the proportion of the core tensor is bigger than that of the truncated bases.

Fig. 12. Video frames reconstructed with different approximation error ratios.

Diverse data types can result in different dimensionality reduction ratios and approximation error ratios. With repeated experiments on video clips, XML documents and GPS data, the results show that the core set containing 18% of the elements can guarantee 93% accuracy in general. In practice, the balance between dimensionality reduction and computation accuracy is determined by the application requirements.

6.3 Time and Memory Comparison

Compared with the general High Order Singular Value Decomposition method, the proposed incremental High Order Singular Value Decomposition method is efficient and memory saving. To evaluate the two decomposition methods, we run them on computers with an Intel Core (TM) i5 CPU at 3.2 GHz with 4 cores in total and 8 GB RAM. We divide the unified tensor into four blocks and normalize the tensor size as well as the decomposition time for better comparison. During the process of dimensionality reduction, the general HOSVD method integrates the additional tensor blocks with the previous tensor blocks to generate a new tensor, which is then repeatedly decomposed. Different from this type of repeated HOSVD method, the incremental HOSVD method updates the truncated orthogonal bases and dynamically computes the core tensor. Fig. 14 demonstrates that the decomposition time of the repeated HOSVD method is greater than that of the incremental HOSVD method. Additionally, the decomposition time of the incremental HOSVD method increases more gently than that of the repeated HOSVD method from the normalized tensor size 0.25 onward. As the normalized tensor size grows beyond 0.75, the repeated HOSVD method runs out of memory while the incremental HOSVD method continues to run. From a theoretical point of view, as more orthogonal bases are appended to the left singular vector matrix, the middle quasi-diagonal matrix contains fewer orthogonal columns, and the time consumed by the diagonalization process decreases. In brief, the incremental HOSVD method is more efficient because it projects the additional tensor unfolding onto the previous truncated orthogonal bases rather than directly executing the orthogonalization procedure.

Fig. 13. (a) Tradeoff between dimensionality reduction and reconstruction error (x-axis: experiment number; y-axis: ratio; series: dimensionality reduction ratio ρ and reconstruction error ratio e); (b) Proportion of the core tensor to truncated orthogonal bases (x-axis: experiment number; y-axis: proportion).

Fig. 14. Comparison between the repeated HOSVD method and the incremental HOSVD method (x-axis: normalized tensor size; y-axis: normalized decomposition time; the repeated HOSVD curve ends at an out-of-memory point).

7 RELATED WORK

This section reviews related work on data representation and high order singular value decomposition.

Data Representation: Big data are composed of unstructured, semi-structured and structured data. In particular, multimedia, as unstructured data, is mostly encoded as MPEG-4 and H.264. MPEG-4 [12] is a method for defining compression of audio and visual digital data. H.264 [13] is a widely used standard for video compression. The semi-structured Extensible Markup Language (XML) [14] is a flexible text format that defines a set of rules for encoding documents. XML is both human-readable and machine-readable. The characters making up an XML document are divided into markup and content. Kim and Candan [15] proposed a tensor-based relational data model that can process multi-dimensional structured data. Ontology, such as the resource description framework (RDF) [16] and the web ontology language (OWL) [17], is playing an ever more important role in the exchange of a wide variety of data.

Higher Order Singular Value Decomposition: A tensor [6, 7] is the generalisation of a matrix and is usually called a multidimensional array. A tensor is a more effective data representation model from which valuable information can be extracted using the high order singular value decomposition (HOSVD) [8] method. Because HOSVD imposes orthogonal constraints on the truncated column bases, it may be considered as a special case of the commonly used Tucker [18] decomposition. Although low rank truncation of the HOSVD is not the best approximation of the initial data, it is considered to be sufficiently good for many applications. Analysis and mining of data with HOSVD have been adopted in many applications such as tag recommendations [19, 20], trajectory indexing and retrieval [21], and hand-written digit classification [22].

Studies of data representation and dimensionality reduction have been reported in the literature. However, a unified model for heterogeneous data representation has been neglected, and decomposition problems during incremental data processing have not been considered. The contributions of this paper are using a unified tensor model to represent large scale heterogeneous data and developing an efficient approach for extracting the high-quality core tensor, which is small but contains valuable information.

8 CONCLUSION

This paper aims at representing and processing the large scale heterogeneous data generated from multiple sources. Firstly, we present a unified tensor-based data representation model that can integrate unstructured, semi-structured and structured data. Secondly, according to the proposed model, an incremental high order singular value decomposition (IHOSVD) method is proposed for dimensionality reduction on big data. We prove two theorems that address the problems of decomposition recalculation and order inconsistency. Finally, an intelligent transportation case is investigated for evaluating the method. Theoretical analyses and experimental results of the case study provide evidence that the proposed data representation model and incremental dimensionality reduction method are promising, and they pave the way for efficient mining and analysis in big data applications.

9 ACKNOWLEDGMENT

This work was supported by the National Nature Science Foundation of China under Grant 61201219 and by the Fundamental Research Funds for the Central Universities, HUST: CXY13Q017 and 2013QN122.

REFERENCES

[1] I. F. Cruz and H. Xiao, “Ontology Driven Data Integration in Heterogeneous Networks,” in Complex Systems in Knowledge-Based Environments: Theory, Models and Applications. Springer, 2009, pp. 75–98.

[2] H. Abdi and L. J. Williams, “Principal Component Analysis,” Wiley Interdisciplinary Reviews: Computational Statistics, vol. 2, no. 4, pp. 433–459, 2010.

[3] M. Brand, “Incremental Singular Value Decomposition of Uncertain Data with Missing Values,” in Computer Vision ECCV 2002. Springer, 2002, pp. 707–720.

[4] J. Sun, D. Tao, and C. Faloutsos, “Beyond Streams and Graphs: Dynamic Tensor Analysis,” in Proc. of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2006, pp. 374–383.

[5] E. Henry, J. Hofrichter et al., “Singular Value Decomposition: Application to Analysis of Experimental Data,” Essential Numerical Computer Methods, vol. 210, pp. 81–138, 2010.

[6] C. M. Martin, “Tensor Decompositions Workshop Discussion Notes,” American Institute of Mathematics, 2004.

[7] T. G. Kolda and B. W. Bader, “Tensor Decompositions and Applications,” SIAM Review, vol. 51, no. 3, pp. 455–500, 2009.

[8] L. De Lathauwer, B. De Moor, and J. Vandewalle, “A Multilinear Singular Value Decomposition,” SIAM Journal on Matrix Analysis and Applications, vol. 21, no. 4, pp. 1253–1278, 2000.

[9] H. Anton, Elementary Linear Algebra. Wiley.com, 2010.

[10] C. Meyer, Matrix Analysis and Applied Linear Algebra Book and Solutions Manual. SIAM, 2000, vol. 2.

[11] L. De Lathauwer, B. De Moor, and J. Vandewalle, “On the Best Rank-1 and Rank-(R1, R2, ..., Rn) Approximation of Higher-Order Tensors,” SIAM Journal on Matrix Analysis and Applications, vol. 21, no. 4, pp. 1324–1342, 2000.

[12] I. E. Richardson, H.264 and MPEG-4 Video Compression: Video Coding for Next-Generation Multimedia. Wiley.com, 2004.

[13] D. Marpe, T. Wiegand, and G. J. Sullivan, “The H.264/MPEG4 Advanced Video Coding Standard and Its Applications,” IEEE Communications Magazine, vol. 44, no. 8, pp. 134–143, 2006.

[14] E. Van der Vlist, XML Schema: The W3C’s Object-Oriented Descriptions for XML. O’Reilly Media, Inc., 2011.

[15] M. Kim and K. S. Candan, “Approximate Tensor Decomposition within a Tensor-Relational Algebraic Framework,” in Proc. of the 20th ACM International Conference on Information and Knowledge Management. ACM, 2011, pp. 1737–1742.

[16] I. Horrocks, P. F. Patel-Schneider, and F. Van Harmelen, “From SHIQ and RDF to OWL: The Making of a Web Ontology Language,” Web Semantics: Science, Services and Agents on the World Wide Web, vol. 1, no. 1, pp. 7–26, 2003.

[17] D. L. McGuinness, F. Van Harmelen et al., “OWL Web Ontology Language Overview,” W3C Recommendation, vol. 10, p. 10, 2004.

[18] L. R. Tucker, “Some Mathematical Notes on Three-Mode Factor Analysis,” Psychometrika, vol. 31, no. 3, pp. 279–311, 1966.

[19] P. Symeonidis, A. Nanopoulos, and Y. Manolopoulos, “Tag Recommendations Based on Tensor Dimensionality Reduction,” in Proc. of the 2008 ACM Conference on Recommender Systems. ACM, 2008, pp. 43–50.

[20] R. Wetzker, C. Zimmermann, C. Bauckhage, and S. Albayrak, “I Tag, You Tag: Translating Tags for Advanced User Models,” in Proc. of the 3rd ACM International Conference on Web Search and Data Mining. ACM, 2010, pp. 71–80.

[21] Q. Li, X. Shi, and D. Schonfeld, “A General Framework for Robust HOSVD-Based Indexing and Retrieval with High-Order Tensor Data,” in Proc. of the 36th IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2011, pp. 873–876.

[22] B. Savas and L. Eldén, “Handwritten Digit Classification Using Higher Order Singular Value Decomposition,” Pattern Recognition, vol. 40, no. 3, pp. 993–1003, 2007.

Liwei Kuang is currently studying for the PhD degree in the School of Computer Science and Technology at Huazhong University of Science and Technology, Wuhan, China. He received the master’s degree in the School of Computer Science from Hubei University of Technology, Wuhan, China, in 2004. From 2004 to 2012, he was a Research Engineer with FiberHome Technologies Group, Wuhan, China. His research interests include Big Data, Pervasive Computing and Cloud Computing.

Fei Hao is an assistant professor at Huazhong University of Science and Technology. He received the B.S. and M.S. degrees in the School of Mathematics and Computer Engineering from Xihua University, Chengdu, China, in 2005 and 2008, respectively. He was a research assistant at Korea Advanced Institute of Science and Technology and Hangul Engineering Research Center, Korea. He has published over 30 research papers in international and national journals as well as conferences. His research interests include social computing, big data analysis and processing and mobile cloud computing.

Laurence T. Yang received the B.E. degree in Computer Science and Technology from Tsinghua University, China and the PhD degree in Computer Science from University of Victoria, Canada. He is a professor in the School of Computer Science and Technology at Huazhong University of Science and Technology, China, and in the Department of Computer Science, St. Francis Xavier University, Canada. His research interests include parallel and distributed computing, embedded and ubiquitous/pervasive computing, and Big Data. His research has been supported by the National Sciences and Engineering Research Council, and the Canada Foundation for Innovation.


Man Lin received the B.E. degree in Computer Science and Technology from Tsinghua University, China, in 1994. She received the Lic. and Ph.D degrees from the Department of Computer Science and Information at Linkopings University, Sweden, in 1997 and 2000, respectively. She is currently an associate professor in Computer Science at St. Francis Xavier University, Canada. Her research interests include system design and analysis, power aware scheduling, and optimization algorithms. Her research is supported by NSERC (National Sciences and Engineering Research Council, Canada) and CFI (Canada Foundation for Innovation).

Changqing Luo received his B.E. and M.E. degrees from Chongqing University of Posts and Telecommunications in 2004 and 2007, respectively, and the Ph.D. from Beijing University of Posts and Telecommunications in 2011, all in Electrical Engineering. After graduation, he joined the School of Computer Science and Technology, Huazhong University of Science and Technology in 2011, where he currently works as an Assistant Professor. His current research focuses on algorithms and optimization for wireless networks, cooperative communication, green communication, resource management in heterogeneous wireless networks, and mobile cloud computing.

Geyong Min is a Professor of High Performance Computing and Networking in the Department of Mathematics and Computer Science within the College of Engineering, Mathematics and Physical Sciences at the University of Exeter, United Kingdom. He received the PhD degree in Computing Science from the University of Glasgow, United Kingdom, in 2003, and the B.Sc. degree in Computer Science from Huazhong University of Science and Technology, China, in 1995. His research interests include Next Generation Internet, Wireless Communications, Multimedia Systems, Information Security, High Performance Computing, Ubiquitous Computing, Modelling and Performance Engineering.