• No results found

A Fast Parallel SGD for Matrix Factorization in Shared Memory Systems

N/A
N/A
Protected

Academic year: 2021

Share "A Fast Parallel SGD for Matrix Factorization in Shared Memory Systems"

Copied!
40
0
0

Loading.... (view fulltext now)

Full text

(1)

A Fast Parallel SGD for Matrix Factorization in Shared Memory Systems

Yong Zhuang

Department of Computer Science National Taiwan University

Joint work with Wei-Sheng Chin, Yu-Chin Juan, and Chih-Jen Lin

(2)

Outline

1

Introduction Motivation

Matrix Factorization

Parallel Matrix Factorization

2

Existing Problems in Parallel Matrix Factorization Locking Problem

Memory Discontinuity

3

Our approach

Lock-Free Scheduling Partial Random Method

4

Experiments

5

Conclusion

Yong Zhuang (National Taiwan Univ.) Oct 16, 2013 2 / 41

(3)

Introduction

Outline

1

Introduction Motivation

Matrix Factorization

Parallel Matrix Factorization

2

Existing Problems in Parallel Matrix Factorization Locking Problem

Memory Discontinuity

3

Our approach

Lock-Free Scheduling Partial Random Method

4

Experiments

5

Conclusion

(4)

Introduction Motivation

Outline

1

Introduction Motivation

Matrix Factorization

Parallel Matrix Factorization

2

Existing Problems in Parallel Matrix Factorization Locking Problem

Memory Discontinuity

3

Our approach

Lock-Free Scheduling Partial Random Method

4

Experiments

5

Conclusion

Yong Zhuang (National Taiwan Univ.) Oct 16, 2013 4 / 41

(5)

Introduction Motivation

Motivation

Matrix Factorization is an effective method for recommender systems, e.g., Netflix Prize, KDD Cup 2011, etc.

But training is slow. When we did a HW on training

KDD Cup 2011 data, it took too long for us to do

experiments

(6)

Introduction Motivation

Motivation (Cont’d)

Distributed MF is possible, but complicated

Yahoo!Music: 1 million users, 600 thousands items and 252 million ratings: can be stored in memory We didn’t see many studies on parallel MF on multi-core systems

Our goal is to develop a parallel MF system in shared memory systems, so at least others students didn’t have the same trouble as us

Yong Zhuang (National Taiwan Univ.) Oct 16, 2013 6 / 41

(7)

Introduction Matrix Factorization

Outline

1

Introduction Motivation

Matrix Factorization

Parallel Matrix Factorization

2

Existing Problems in Parallel Matrix Factorization Locking Problem

Memory Discontinuity

3

Our approach

Lock-Free Scheduling Partial Random Method

4

Experiments

5

Conclusion

(8)

Introduction Matrix Factorization

Matrix Factorization

For recommender systems: A group of users give ratings to some items

User Item Rating

1 5 100

1 10 80

1 13 30

. . . . . . . . .

u v r

. . . . . . . . . (u, v ) = r

Yong Zhuang (National Taiwan Univ.) Oct 16, 2013 8 / 41

(9)

Introduction Matrix Factorization

Matrix Factorization (Cont’d)

R

m × n

m : u : 2 1

1 2 .. v .. n

r

u,v

?2,2

m, n : numbers of users and items u, v : index for u

th

user and v

th

item

r

u,v

: u

th

user gives a rating r

u,v

to v

th

item

(10)

Introduction Matrix Factorization

Matrix Factorization (Cont’d)

q

2

R

m × n

≈ ×

P

T

m × k

Q

k × n

m : u : 2 1

1 2 .. v .. n

r

u,v

?2,2

p

T1

p

T2

: p

Tu

: p

Tm

q

1

q

2

.. q

v

.. q

n

k : number of latent dimensions r

u,v

= p

Tu

q

v

?

2,2

= p

T2

q

2

Yong Zhuang (National Taiwan Univ.) Oct 16, 2013 10 / 41

(11)

Introduction Matrix Factorization

Matrix Factorization (Cont’d)

Objective function:

min

P,Q

X

(u,v )∈R

(r

u,v

− p

Tu

q

v

)

2

+ λ

P

kp

u

k

2F

+ λ

Q

kq

v

k

2F

,

SGD: Loops over all ratings in the training set.

Prediction error: e

u,v

≡ r

u,v

− p

Tu

q

v

SGD update rule:

p

u

← p

u

+ γ (e

u,v

q

v

− λ

P

p

u

) ,

q

v

← q

v

+ γ (e

u,v

p

u

− λ

Q

q

v

)

(12)

Introduction Parallel Matrix Factorization

Outline

1

Introduction Motivation

Matrix Factorization

Parallel Matrix Factorization

2

Existing Problems in Parallel Matrix Factorization Locking Problem

Memory Discontinuity

3

Our approach

Lock-Free Scheduling Partial Random Method

4

Experiments

5

Conclusion

Yong Zhuang (National Taiwan Univ.) Oct 16, 2013 12 / 41

(13)

Introduction Parallel Matrix Factorization

Parallel Matrix Factorization

After r

3,3

selected, the ratings in gray blocks cannot be updated due to racing condition

r3,1 r3,2 r3,3 r3,4 r3,5 r3,6

r6,6

1 2 3 4 5 6

1 2 3 4 5 6

r

3,1

= p

3T

q

1

r

3,2

= p

3T

q

2

..

r

3,6

= p

3T

q

6

——————

r

3,3

= p

3T

q

3

r

6,6

= p

6T

q

6

(14)

Introduction Parallel Matrix Factorization

Parallel Matrix Factorization (Cont’d)

We can split the matrix to blocks.

Then use threads to update the blocks where ratings in different blocks don’t share p or q

1 2 3 4 5 6

1 2 3 4 5 6

Yong Zhuang (National Taiwan Univ.) Oct 16, 2013 14 / 41

(15)

Existing Problems in Parallel Matrix Factorization

Outline

1

Introduction Motivation

Matrix Factorization

Parallel Matrix Factorization

2

Existing Problems in Parallel Matrix Factorization Locking Problem

Memory Discontinuity

3

Our approach

Lock-Free Scheduling Partial Random Method

4

Experiments

5

Conclusion

(16)

Existing Problems in Parallel Matrix Factorization Locking Problem

Outline

1

Introduction Motivation

Matrix Factorization

Parallel Matrix Factorization

2

Existing Problems in Parallel Matrix Factorization Locking Problem

Memory Discontinuity

3

Our approach

Lock-Free Scheduling Partial Random Method

4

Experiments

5

Conclusion

Yong Zhuang (National Taiwan Univ.) Oct 16, 2013 16 / 41

(17)

Existing Problems in Parallel Matrix Factorization Locking Problem

Locking Problem

DSGD [Gemulla et al., 2011]

Distributed system: Communication cost is an issue

T nodes, the matrix → T × T blocks to reduce

communication cost

(18)

Existing Problems in Parallel Matrix Factorization Locking Problem

Locking Problem (Cont’d)

We apply it in shared memory system

1 2 3

1

2

3

Shared memory: Idle time will be an issuse

Block 1: 20s Block 2: 10s Block 3: 20s We have 3 threads

hi

Thread 0→10 10→20

1 Busy Busy

2 Busy Idle

3 Busy Busy

ok 10s wasted!!

Yong Zhuang (National Taiwan Univ.) Oct 16, 2013 18 / 41

(19)

Existing Problems in Parallel Matrix Factorization Memory Discontinuity

Outline

1

Introduction Motivation

Matrix Factorization

Parallel Matrix Factorization

2

Existing Problems in Parallel Matrix Factorization Locking Problem

Memory Discontinuity

3

Our approach

Lock-Free Scheduling Partial Random Method

4

Experiments

5

Conclusion

(20)

Existing Problems in Parallel Matrix Factorization Memory Discontinuity

Memory Discontinuity

HogWild [Niu et al., 2011]: assume the proability of racing condition is really low

All R, P, and P are memory discontinuous R

= ×

P

T

1 3 Q

5 2

4 6

Yong Zhuang (National Taiwan Univ.) Oct 16, 2013 20 / 41

(21)

Our approach

Outline

1

Introduction Motivation

Matrix Factorization

Parallel Matrix Factorization

2

Existing Problems in Parallel Matrix Factorization Locking Problem

Memory Discontinuity

3

Our approach

Lock-Free Scheduling Partial Random Method

4

Experiments

5

Conclusion

(22)

Our approach

Our approach

We proposes a fast parallel SGD for Matrix

Factorization in shared memory systems (FPSGD) It applies two strategies to speed up Matrix

Factorization

Lock-Free Scheduling Partial Random Method

Yong Zhuang (National Taiwan Univ.) Oct 16, 2013 22 / 41

(23)

Our approach Lock-Free Scheduling

Outline

1

Introduction Motivation

Matrix Factorization

Parallel Matrix Factorization

2

Existing Problems in Parallel Matrix Factorization Locking Problem

Memory Discontinuity

3

Our approach

Lock-Free Scheduling Partial Random Method

4

Experiments

5

Conclusion

(24)

Our approach Lock-Free Scheduling

Lock-Free Scheduling

We split the matrix to enough blocks.

E.g., for 2 threads, we split the matrix to 4 × 4 blocks

0 0 0 0

0 0 0 0

0 0 0 0

0 0 0 0

0 is the updated counter recording the number of updated times for each block

Yong Zhuang (National Taiwan Univ.) Oct 16, 2013 24 / 41

(25)

Our approach Lock-Free Scheduling

Lock-Free Scheduling (Cont’d)

Firstly, T

1

selects a block randomly For T

2

, it selects a block neither green nor gray

T

1

0 0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

(26)

Our approach Lock-Free Scheduling

Lock-Free Scheduling (Cont’d)

For T

2

, it selects a block neither green nor gray randomly For T

2

, it selects a block neither green nor gray

T

1

T

2

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

Yong Zhuang (National Taiwan Univ.) Oct 16, 2013 26 / 41

(27)

Our approach Lock-Free Scheduling

Lock-Free Scheduling (Cont’d)

After T

1

finishes, the counter for the corresponding block is added by one

T

2

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

(28)

Our approach Lock-Free Scheduling

Lock-Free Scheduling (Cont’d)

T

1

can select available blocks to update Rule: select one that is least updated

T

2

1

0

0

0

0

0

0

0

0

0

0

0

0

0

0

0

Yong Zhuang (National Taiwan Univ.) Oct 16, 2013 28 / 41

(29)

Our approach Lock-Free Scheduling

Lock-Free Scheduling (Cont’d)

FPSGD: applying Lock-Free Scheduling FPSGD**: applying DSGD-like Scheduling

0 2 4 6 8 10

0.84 0.86 0.88 0.9

RMSE

Time(s)

FPSGD**

FPSGD

(a) MovieLens 10M

0 100 200 300 400 500 600 22

22.5 23 23.5 24

Time(s)

RMSE

FPSGD**

FPSGD

(b) Yahoo!Music

MovieLens 10M: 18.71s → 9.72s (RMSE: 0.835)

Yahoo!Music: 728.23s → 462.55s (RMSE: 21.985)

(30)

Our approach Partial Random Method

Outline

1

Introduction Motivation

Matrix Factorization

Parallel Matrix Factorization

2

Existing Problems in Parallel Matrix Factorization Locking Problem

Memory Discontinuity

3

Our approach

Lock-Free Scheduling Partial Random Method

4

Experiments

5

Conclusion

Yong Zhuang (National Taiwan Univ.) Oct 16, 2013 30 / 41

(31)

Our approach Partial Random Method

Partial Random Method

For SGD, there are two types of update order Update order Advantages Disadvantages

Random Faster and stable Memory discontinuity Sequential Memory continuity Non-stable

Random Sequential

R R

Lock-free scheduling: the property of randomness

How to make it more friendly to cache?

(32)

Our approach Partial Random Method

Partial Random Method (Cont’d)

Access both ˆ R and ˆ P memory continuously R : (one block) ˆ

= ×

P ˆ

T

Q ˆ

1 2

3 4

5 6

Partial: FPSGD is sequential in each block

Random: FPSGD is random when selecting block

Yong Zhuang (National Taiwan Univ.) Oct 16, 2013 32 / 41

(33)

Our approach Partial Random Method

Partial Random Method (Cont’d)

0 20 40 60 80 100

0.8 0.9 1 1.1 1.2 1.3

Time(s)

RMSE

Random Partial Random

(a) MovieLens 10M

0 500 1000 1500 2000 2500 3000 20

25 30 35 40 45

Time(s)

RMSE

Random Partial Random

(b) Yahoo!Music

The performance of Partial Random Method is

better than that of Random Method

(34)

Experiments

Outline

1

Introduction Motivation

Matrix Factorization

Parallel Matrix Factorization

2

Existing Problems in Parallel Matrix Factorization Locking Problem

Memory Discontinuity

3

Our approach

Lock-Free Scheduling Partial Random Method

4

Experiments

5

Conclusion

Yong Zhuang (National Taiwan Univ.) Oct 16, 2013 34 / 41

(35)

Experiments

Experiments

Some of the state-of-art methods

CCD++ [Yu et al., 2012]: Coordinate descent method (Best paper award in ICDM 2012) DSGD [Gemulla et al., 2011]: Stochastic gradient descent

HogWild [Niu et al., 2011]: Stochastic gradient

descent

(36)

Experiments

Experiments (Cont’d)

We compare FPSGD with DSGD, HogWild, CCD++, and FPSGD*

FPSGD: with fast SSE instructions FPSGD*: without SSE instructions

0 20 40 60 80 100

0.84 0.86 0.88 0.9

Time(s)

RMSE

DSGD HogWild CCD++

FPSGD FPSGD*

(a) MovieLens 10M

0 500 1000 1500 2000 2500 3000 22

23 24 25

Time(s)

RMSE

DSGD HogWild CCD++

FPSGD FPSGD*

(b) Yahoo!Music

Yong Zhuang (National Taiwan Univ.) Oct 16, 2013 36 / 41

(37)

Experiments

Experiments (Cont’d)

Method Category Problem DSGD Stochastic Locking problem HogWild Stochastic Memory discontinuity

Stochastic methods may get a better solution for its randomness

Deterministic methods (e.g., CCD++) avoid tuning

learning rate which is the advantage over Stochastic

methods

(38)

Conclusion

Outline

1

Introduction Motivation

Matrix Factorization

Parallel Matrix Factorization

2

Existing Problems in Parallel Matrix Factorization Locking Problem

Memory Discontinuity

3

Our approach

Lock-Free Scheduling Partial Random Method

4

Experiments

5

Conclusion

Yong Zhuang (National Taiwan Univ.) Oct 16, 2013 38 / 41

(39)

Conclusion

Conclusion

We point out some computational bottlenecks in existing parallel SGD methods

We propose FPSGD to address these issues and confirm its effectiveness by experiments

1 SGD iteration:

1 thread: around 30s → 8 threads: around 4s for

Yahoo!Music with 252 million ratings

(40)

Conclusion

Conclusion (Cont’d)

We develop the package LIBMF available at http://www.csie.ntu.edu.tw/~cjlin/libmf Updated paper and instructions of LIBMF is at http://www.csie.ntu.edu.tw/~cjlin/papers/

libmf.pdf

Your comments to our work are very welcome

Yong Zhuang (National Taiwan Univ.) Oct 16, 2013 40 / 41

References

Related documents

Dyer et al., (2006) argue that subsistence households do adjust their supply to changes in agricultural output prices through multiple factor linkages when there is at least a

The Effect of Consumption of Unhealthy Snacks on Diet and the Risk of Metabolic Syndrome in Adults: Tehran Lipid and Glucose Study, Iran. Zahra

As William Perry (1970) demonstrated decades ago, the teaching of reading and writing skills as a thinking process can transform how learners view themselves. Perry succeeded

Three items measure solidarity, five items measure information exchange and three items measure flexibility from the perception of the franchisee towards relational norms

In conclusion, infectious and non-infectious adverse pulmonary reactions occur in patients with cancer who are administered a regimen including monoclonal antibodies. Clinicians

looking at general health, high resting heart rate has been found to be associated with higher age, low physical fitness level, hyperthyroidism, obe- sity, hypertension,

These modelling approaches were (i) a multimedia model initially explored as an opportunity to visualise supply and value chain issues for educational purposes; (ii) an agent