Parallel Data Selection Based on Neurodynamic Optimization in the Era of Big Data

(1)

AI Forum 2015; Kaohsiung, Taiwan; June 5-6, 2015

Jun Wang


Department of Mechanical and Automation Engineering

The Chinese University of Hong Kong Shatin, New Territories, Hong Kong

School of Control Science and Engineering

Dalian University of Technology Dalian, Liaoning, China

[email protected]

(2)

Outline

• Introduction

• Problem formulations

• kWTA networks

• Simulation results

• Sorting application

• Filtering application

• Concluding remarks

• Future works

• References

(3)

Multiple Winners-take-all Operation

• The k-winners-take-all (kWTA) operation selects the k largest of n inputs (1 ≤ k < n).

• kWTA is a general rule in nature and society.

• kWTA has widespread applications in data mining, machine learning, classification, clustering, computer vision, etc.

• It is a common building block for many models such as ART and SOM.
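As a point of reference, the input-output behavior of the kWTA operation can be sketched in a few lines of sequential Python. This only illustrates what is being computed, not the neurodynamic models discussed later; the function name `kwta` is hypothetical, and the inputs are assumed to be distinct so that exactly k winners exist.

```python
def kwta(u, k):
    """Return a 0/1 list marking the k largest entries of u (assumes distinct values)."""
    if not 1 <= k < len(u):
        raise ValueError("k must satisfy 1 <= k < n")
    # The k-th largest input acts as the selection threshold.
    threshold = sorted(u, reverse=True)[k - 1]
    return [1 if ui >= threshold else 0 for ui in u]


x = kwta([0.3, 0.9, 0.1, 0.7, 0.5], k=2)
print(x)  # winners are the inputs 0.9 and 0.7
```

Sorting costs O(n log n) sequentially, which is the baseline that a parallel realization aims to improve on.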

(4)

k Winners-take-all Operation

When the number of inputs is large and/or the selection must run in real time, parallel algorithms and hardware implementations are desirable.

(5)

Parallel k Winners-take-all Operation

[Block diagram: a kWTA network with parameter k, inputs u₁, u₂, ..., u_n, and outputs x₁, x₂, ..., x_n]

(6)

Problem Formulations

"The mere formulation of a problem is far more essential than its solution, which may be merely a matter of mathematical or

experimental skills. To raise new questions, new possibilities, to regard old problems from a new angle requires creative imagination

and marks real advances in science."

Albert Einstein

(7)

Problem Formulations

(8)

Problem Formulations (cont’d)

(9)

Problem Formulations (cont’d)

(10)

Problem Formulations (cont’d)

(11)

Model Selection and Redesign

• The kWTA problem has been formulated as equivalent linear and quadratic programming problems.

• All existing neurodynamic optimization models for linear and quadratic programming can be applied.

• Now the question is: which is the best in terms of model complexity and computational efficiency?

(12)

QP-based Primal-Dual Network

(13)

QP-based Projection Network

(14)

LP-based Projection Network

(15)

QP-based Simplified Dual Net

(16)

LP-based Discontinuous Network

(17)

Discontinuous Activation Function

(18)

Convergence Conditions

(19)

Simulation Results

(20)

Simulation Results (cont’d)

(21)

Simulation Results (cont’d)

(22)

QP-based Discontinuous Network

(23)

Discontinuous Activation Function

(24)

Convergence Condition

(25)

Simulation Results

(26)

Simulation Results (cont’d)

(27)

Simulation Results (cont’d)

(28)

QP-based Improved Dual Network

(29)

Model Comparisons

Model                              Layers   Neurons   Connections

LP-based primal-dual network       4        3n + 1    6n + 2

QP-based primal-dual network       4        3n + 1    6n + 2

LP-based projection network        2        n + 1     2n + 2

QP-based projection network        2        n + 1     2n + 2

QP-based simplified dual network   1        n         3n

LP-based discontinuous network     1        n         2n

QP-based discontinuous network     1        n         2n

QP-based improved dual network     1        1         n

(30)

Simulation Results

(31)

Discrete-time Counterpart

(32)

Activation Function with High Gain

(33)

A New Model

(34)

Desirable Properties

• The kWTA model with the Heaviside activation function has been proven to be globally stable and globally convergent to the kWTA solutions in finite time.

• Lower and upper bounds on the convergence time have been derived.

• It essentially solves the dual problem of the linear programming formulation.
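A minimal sketch of how a single-state kWTA model with a Heaviside activation might be simulated. It assumes the state equation ε·dy/dt = Σ g(u_i − y) − k with outputs x_i = g(u_i − y), which is the form published for the single-state model; the equations on the slides did not survive in this text, so the exact form, the forward-Euler discretization, the step size, and the names `heaviside` and `simulate_kwta` are all assumptions.

```python
def heaviside(z):
    """Heaviside step activation: 1 for positive arguments, 0 otherwise."""
    return 1 if z > 0 else 0


def simulate_kwta(u, k, y0=0.0, step=0.05, max_iters=10_000):
    """Forward-Euler simulation of the assumed dynamics dy/dt ∝ sum_i g(u_i - y) - k.

    Returns the final state y, the output vector x, and the iterations used.
    """
    y = y0
    for it in range(max_iters):
        x = [heaviside(ui - y) for ui in u]
        surplus = sum(x) - k      # > 0: too many winners, so the threshold y must rise
        if surplus == 0:          # exactly k winners: y has reached an equilibrium
            return y, x, it
        y += step * surplus
    raise RuntimeError("no convergence; try a smaller step")


y, x, iters = simulate_kwta([0.3, 0.9, 0.1, 0.7, 0.5], k=2)
print(x)  # the two largest inputs (0.9 and 0.7) are selected
```

The step must be smaller than the gap between the k-th and (k+1)-th largest inputs, otherwise the discretized state can jump back and forth over the equilibrium interval; the continuous-time model has no such restriction.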

(35)

Convergence Time

• As a linear system with a discontinuous bias, the convergence time of the kWTA network can be computed as a function of the input vector u.

• The expectation and variance of the convergence time can also be computed, based on the binomial distribution, as functions of the initial state.

Y. Xiao, Y. Liu, C.-S. Leung, J. P.-F. Sum, K. Ho, “Analysis on the convergence time of dual neural network-based kWTA,” IEEE Trans. Neural Networks and Learning Systems, vol. 23, pp. 676-682, 2012.

J. P.-F. Sum, C.-S. Leung, K. Ho, “Effect of Input Noise and Output Node Stochastic on Wang's kWTA,” IEEE Trans. Neural Networks and Learning Systems, vol. 24, pp. 1472-1478, 2013.

(36)

Reformulated Problem

(37)

Reformulated Problem (cont’d)

(38)

Reformulated Problem (cont’d)

(39)

Simulation Results with Randomized Integer Inputs

(40)

Simulation Results with Low-Resolution Inputs

(41)

Initial State Estimation

• Although the state of the kWTA model is guaranteed to converge globally in finite time from any initial state, prior information helps to initialize the state close to the steady state.

• The steady state y ∈ (u_{k+1}, u_k], where u_k and u_{k+1} denote the k-th and (k+1)-th largest inputs, depends on the distribution of u₁, u₂, ..., u_n, as well as on the values of k and n.
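One way to turn such prior information into an initial state is to place y(0) near the expected position of the threshold between the k-th and (k+1)-th largest inputs. The sketch below does this for uniformly and normally distributed inputs. The estimation formulas used in the talk are not recoverable from this text, so the order-statistic midpoint for the uniform case, the (1 − k/n) quantile for the normal case, and both function names are assumptions.

```python
from statistics import NormalDist


def init_state_uniform(n, k, low=0.0, high=1.0):
    """Midpoint of the expected k-th and (k+1)-th largest of n uniform samples.

    For Uniform(0, 1), the expected j-th largest of n samples is (n + 1 - j) / (n + 1).
    """
    frac = (n + 0.5 - k) / (n + 1)
    return low + (high - low) * frac


def init_state_normal(n, k, mu=0.0, sigma=1.0):
    """Approximate the threshold by the (1 - k/n) quantile of Normal(mu, sigma)."""
    return NormalDist(mu, sigma).inv_cdf(1.0 - k / n)


print(init_state_uniform(n=9, k=3))
print(init_state_normal(n=10, k=5, mu=2.0, sigma=3.0))
```

For k = n/2 the normal estimate reduces to the mean, which matches the intuition that the median selection threshold sits at the center of a symmetric distribution.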

(42)

Initial State Estimation (cont’d)

• General distribution

• Uniform distribution

• Normal distribution

(43)

Initial State Estimation (cont’d)

(44)

Uniform Distribution

(45)

Normal Distribution

(46)

Simulation Results (convergence time) with Infinite Gain

(47)

Simulation Results (convergence time) with Unity Gain

(48)

Discrete-time Version

(49)

Simulation Results (n = 10⁶, k = n/2)

(50)

Simulation Results (n = 10⁶, k = n/2)

(51)

Monte Carlo Simulation Results

(52)

Monte Carlo Simulation Results

(53)

Estimated Complexity (uniform)

(54)

Estimated Complexity (normal)

For data with a dimension of 10¹⁰⁰ (1 googol), it would need about 8.44 iterations on average!

(55)

Histograms of Convergence Iterations

(56)

Histograms of Convergence Iterations

(57)

Histograms of Convergence Iterations

(58)

Histograms of Convergence Iterations

(59)

Sorting Operation

• Sorting is a fundamental process that arranges data in order according to their values.

• It accounts for 25% of data processing time (Knuth).

• For sorting large amounts of data or high-dimensional data, parallel sorting approaches are more desirable.

• Numerous sorting algorithms and models have been developed, with varied efficiencies.

(60)

Parallel Sorting Representation

For example, a permutation matrix:

(61)

Parallel Sorting Representation (cont’d)

A modified version:

(62)

Logic Reversal

• A simple logic can be used to flip over the redundant '1' elements after the first '1' in each row; i.e.,

(63)

Parallel Sorting based on kWTA

• Let each kWTA network compute one column of the above sorting matrix from left to right, with k increasing from 1 to n - 1.

• Specifically, a WTA network with a single state variable (i.e., k = 1) is adopted to determine the largest element of the list.

• Next, a kWTA network with k = 2 computes the second item in the list without recounting the first item.

(64)

Parallel Sorting based on kWTA

• As such, the whole list of n items can be sorted using n - 1 kWTA networks, without the need to compute the last item.

• As a result, only n - 1 neurons are needed.

• This is a substantial reduction in model complexity compared with analog sorting networks with n² neurons.
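The sorting scheme described above can be sketched sequentially as follows. Here `kwta` is a hypothetical stand-in for one single-state kWTA network, computed directly by thresholding, and the inputs are assumed to be distinct; in the parallel scheme the n - 1 selections would run concurrently rather than in a loop.

```python
def kwta(u, k):
    """Stand-in for one kWTA network: 0/1 vector marking the k largest inputs."""
    threshold = sorted(u, reverse=True)[k - 1]
    return [1 if ui >= threshold else 0 for ui in u]


def kwta_sort(u):
    """Sort distinct values in descending order using kWTA outputs for k = 1..n-1."""
    n = len(u)
    if n == 0:
        return []
    previous = [0] * n                 # winners for k - 1 (none when k = 1)
    ordered = []
    for k in range(1, n):              # conceptually, n - 1 networks in parallel
        current = kwta(u, k)
        # Logic reversal: the newly ranked item wins now but did not win before.
        new_idx = next(i for i in range(n) if current[i] and not previous[i])
        ordered.append(u[new_idx])
        previous = current
    # The last item is the only one that was never selected.
    ordered.append(next(u[i] for i in range(n) if not previous[i]))
    return ordered


print(kwta_sort([3, 1, 4, 1.5, 9, 2.6]))  # → [9, 4, 3, 2.6, 1.5, 1]
```

Comparing consecutive kWTA outputs plays the role of keeping only the first '1' in each row of the sorting matrix.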

(65)

Illustrative Example

In this case, only five neurons are needed, one for each of the five kWTA networks.

In contrast, the analog sorting network (Wang, 1995) needs 36 neurons.

(66)

Simulation Results (state variable)

(67)

Simulation Results (output variables)

(68)

Rank-order Filter

• Rank-order filters are nonlinear filters with many applications, including digital image processing, speech processing, coding, and digital TV.

• A rank-order filter works by selecting the input with a certain rank as its output.

• Rank-order filters entail substantial processing power to implement, which limits their real-time signal processing applications.

(69)

Rank-order Filter Based on kWTA

Nevertheless, rank-order filters can benefit from parallel realizations.

Specifically, a kWTA network with k = r is used in parallel with another kWTA network with k = r − 1 to select the input whose rank order is r.
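The two-network construction can be sketched as follows: the input of rank r is the one selected by the k = r network but not by the k = r − 1 network. The helper `kwta` is a hypothetical stand-in computed by direct thresholding, inputs within each window are assumed distinct, and the window handling (valid positions only, odd window length) is an assumption made for brevity.

```python
def kwta(u, k):
    """Stand-in for a kWTA network; k = 0 selects nothing."""
    if k == 0:
        return [0] * len(u)
    threshold = sorted(u, reverse=True)[k - 1]
    return [1 if ui >= threshold else 0 for ui in u]


def rank_select(u, r):
    """Return the r-th largest input as the difference of two kWTA outputs."""
    top_r = kwta(u, r)
    top_r_minus_1 = kwta(u, r - 1)
    idx = next(i for i in range(len(u)) if top_r[i] and not top_r_minus_1[i])
    return u[idx]


def median_filter(signal, window=3):
    """Sliding-window median over the valid positions of a 1-D signal."""
    r = (window + 1) // 2              # rank of the median in an odd window
    return [rank_select(signal[i:i + window], r)
            for i in range(len(signal) - window + 1)]


print(median_filter([1, 9, 2, 8, 3]))  # → [2, 8, 3]
```

A median filter is the special case r = (window + 1)/2; other ranks give min, max, or percentile filters with the same two-network structure.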

(70)

Simulation Results (median filter)

(71)

Simulation Results (median filter)

(72)

Simulation Results (median filter)

(73)

Image Processing

Percentage of speckle noise in image: 10%

(74)

Image Filtering (cont’d)

(75)

Image Filtering (cont’d)

(76)

Image Filtering (cont’d)

(77)

Image Filtering (cont’d)

(78)

Image Filtering (cont’d)

• Apply the median filter to the original image

[Figure panels: the original image; the original image after median filtering]

(79)

Color Image Filtering

Percentage of speckle noise in image: 10%

(80)

Color Image Filtering (cont’d)

Percentage of speckle noise in image: 10%

(81)

Color Image Filtering (cont’d)

(82)

Color Image Filtering (cont’d)

(83)

Color Image Filtering (cont’d)

(84)

Results & Discussion

- Image Processing

(85)

Color Image Filtering (cont’d)

(86)

Color Image Filtering (cont’d)

(87)

Information Retrieval

• The efficiency of information retrieval from large databases is essential.

• Techniques for information retrieval from large data sets play a very important role, as the World Wide Web may now exceed 30 billion pages.

(88)

Web Information Retrieval

• There are basically two parts in web information retrieval:

• One is calculating the weights of all the pages or data.

• The other is finding the k most “wanted” results, i.e., those with the highest weights.

• The second part is the top-k query, or front-page, problem.

(89)

A Toy Problem from Wikipedia

• 7 pages

• 17 links

• The PageRank weight of each page and link is provided.

(90)

Selection Results (k = 3)

Output vector x = [1, 1, 0, 0, 1, 0, 0]^T

Pages 1, 2, and 5 have the highest PageRank weights.
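Reading the answer to a top-k query off the binary output vector is a simple index lookup. The sketch below uses the output vector reported above; the function name `selected_pages` is hypothetical.

```python
def selected_pages(x):
    """Map a 0/1 kWTA output vector to 1-based page numbers."""
    return [i + 1 for i, xi in enumerate(x) if xi == 1]


x = [1, 1, 0, 0, 1, 0, 0]   # output vector for k = 3
print(selected_pages(x))    # → [1, 2, 5]
```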

(91)

Film-director-actor-writer Network

• Crawled from Wikipedia under the category of English-language films

• 34,279 pages

• 142,426 links

• Part of the square adjacency matrix is shown in the figure, where a dot in the i-th column and the j-th row indicates a directed link from the i-th page to the j-th page. All other entries of the matrix are 0.

(92)

Selection Results (k = 10)

The answer to this query, [3111, 3869, 4058, 4621, 6938, 8974, 10341, 11502, 13320, 15326]^T, can be easily obtained from the sparse representation of the output vector x, with x_i = g(u_i − y(t)), in which 10 of the elements are nonzero.

(93)
