
Factor Models for Gender Prediction Based on E-commerce Data


Immanuel Bayer

Data Mining Competition PAKDD 2015, HoChiMinh City, Vietnam

Outline

Hierarchical Basket Model
    Tree Encoding
    Factorization Machine
Modeling Autocorrelation
Sequential Block Voting
Results & Implementation

Product Hierarchy

u1, 2014-11-13, 2014-11-14, A01/B01/C01/D01/
u2, 2014-11-14, 2014-11-15, A02/B02/C02/D02/;A02/B02/C03/D03/;
u3, 2014-11-14, 2014-11-16, A01/B01/C01/D02/;A01/B04/C05/D98/;

[Figure: product hierarchy tree rooted at A01, with second-level nodes B01 and B04, third-level nodes C01, C02 and C05, and leaf nodes D01, D02, D06, D22, D45, D98, D21, D89 and D15.]

Path Encoding

u3, 2014-11-14, 2014-11-16, A01/B01/C01/D02/;A01/B04/C05/D98/;

[Figure: product hierarchy tree (A01; B01, B04; C01, C02, C05; D01, D02, D06, D22, D45, D98, D21, D89, D15).]

$$
x_i = \{\underbrace{2, 0, \dots}_{|A|},\ \underbrace{1, 0, 0, 1, \dots}_{|B|},\ \underbrace{0, 1, \dots, 1, 0, \dots}_{|D|}\}
$$
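The count vector for u3 can be built directly from the record's category paths. The following is a minimal sketch: the function name `encode_paths` and the small vocabulary are hypothetical and chosen only for illustration; they are not taken from the competition code.

```python
from collections import Counter

def encode_paths(basket, vocabulary):
    """Count how often each category ID occurs on the paths of one basket.

    basket: string such as "A01/B01/C01/D02/;A01/B04/C05/D98/;"
    vocabulary: ordered list of category IDs (one feature per ID)
    """
    counts = Counter()
    for path in basket.split(";"):
        # every non-empty segment of a path is one node of the hierarchy
        counts.update(node for node in path.split("/") if node)
    return [counts[node] for node in vocabulary]

# Illustrative vocabulary (assumption): a few IDs per hierarchy level.
vocabulary = ["A01", "A02", "B01", "B02", "B03", "B04",
              "C01", "C05", "D01", "D02", "D98"]
x_u3 = encode_paths("A01/B01/C01/D02/;A01/B04/C05/D98/;", vocabulary)
print(x_u3)  # [2, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
```

A01 lies on both paths of u3 and is therefore counted twice, while every other visited node is counted once.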

Factorization Machine

FM model of order $d = 2$:

$$
\hat{y}^{FM}(x) := w_0 + \sum_{j=1}^{p} w_j x_j + \sum_{j=1}^{p} \sum_{j'=j+1}^{p} x_j x_{j'} \langle v_j, v_{j'} \rangle
$$

• $w_0 \in \mathbb{R}$, $w \in \mathbb{R}^{p}$, $V \in \mathbb{R}^{p \times k}$ are the model parameters.
• $k \in \mathbb{N}$ is the size (dimensionality) of the latent space.
• The model has one feature vector $v_i$ for each variable $x_i$.

Immanuel Bayer, University of Konstanz
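The order-2 model equation can be evaluated as follows. This is a plain NumPy sketch, not the fastFM API: the function names and the random parameters are assumptions made for illustration. The second function uses the well-known reformulation of the pairwise term that runs in O(p k) instead of O(p² k).

```python
import numpy as np

def fm_predict_naive(x, w0, w, V):
    """Evaluate the order-2 FM exactly as the double sum is written."""
    p = len(x)
    y = w0 + float(np.dot(w, x))
    for j in range(p):
        for jp in range(j + 1, p):
            y += x[j] * x[jp] * float(np.dot(V[j], V[jp]))
    return y

def fm_predict_fast(x, w0, w, V):
    """Same value in O(p * k), using
    sum_{j<j'} <v_j, v_j'> x_j x_j'
      = 0.5 * sum_f ((V^T x)_f^2 - ((V^2)^T x^2)_f)."""
    s = V.T @ x                   # shape (k,)
    s_sq = (V ** 2).T @ (x ** 2)  # shape (k,)
    return w0 + float(np.dot(w, x)) + 0.5 * float(np.sum(s ** 2 - s_sq))

# Illustrative inputs (assumption): p = 6 features, k = 3 latent dimensions.
rng = np.random.default_rng(0)
p, k = 6, 3
x = np.array([2.0, 0.0, 1.0, 0.0, 1.0, 1.0])
w0, w, V = 0.1, rng.normal(size=p), rng.normal(size=(p, k))
print(np.isclose(fm_predict_naive(x, w0, w, V),
                 fm_predict_fast(x, w0, w, V)))  # True
```

Both functions return the same prediction; the fast variant is the one that matters for sparse, high-dimensional encodings such as the category counts used here.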


Linear Part

An FM model of order $d = 2$:

$$
\hat{y}^{FM}(x) := w_0 + \sum_{j=1}^{p} w_j x_j + \sum_{j=1}^{p} \sum_{j'=j+1}^{p} x_j x_{j'} \langle v_j, v_{j'} \rangle
$$

$$
x_i = (\underbrace{0, \overbrace{1}^{A02}, \dots, 0}_{|A|},\ \underbrace{0, \cdots, \overbrace{1}^{B11}, \dots, 0}_{|B|},\ \underbrace{0, \dots, \overbrace{1}^{D55}, \dots, 0}_{|D|})
$$

$$
p(\text{female} \mid x_i) \sim p(\text{female} \mid A02) + p(\text{female} \mid B11) + p(\text{female} \mid D55)
$$
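As a worked instance of the linear part, assume the only non-zero entries of $x_i$ are the indicators for A02, B11 and D55, each equal to 1. Indexing the weights by category ID (a shorthand introduced here for illustration), the bias and linear terms reduce to a sum of three weights:

```latex
w_0 + \sum_{j=1}^{p} w_j x_j = w_0 + w_{A02} + w_{B11} + w_{D55}
```

Each active category therefore contributes one additive term to the score, independently of the other categories in the basket.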


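Pairwise Interactions

An FM model of order $d = 2$:

$$
\hat{y}^{FM}(x) := w_0 + \sum_{j=1}^{p} w_j x_j + \sum_{j=1}^{p} \sum_{j'=j+1}^{p} x_j x_{j'} \langle v_j, v_{j'} \rangle
$$

$$
x_i = (\underbrace{0, \overbrace{1}^{A02}, \dots, 0}_{|A|},\ \underbrace{0, \cdots, \overbrace{1}^{B11}, \dots, 0}_{|B|},\ \underbrace{0, \dots, \overbrace{1}^{D55}, \dots, \overbrace{1}^{D95}, \dots, 0}_{|D|})
$$

Example:

$$
V = \left( \dots,\ \underbrace{\begin{pmatrix} \vdots \\ \text{Summer} \\ \vdots \\ \text{Swimming} \end{pmatrix}}_{j = D55},\ \dots,\ \underbrace{\begin{pmatrix} \vdots \\ \text{Summer} \\ \vdots \\ \text{Swimming} \end{pmatrix}}_{j' = D95},\ \dots \right), \qquad V \in \mathbb{R}^{p \times k}
$$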
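To make the pairwise term concrete, assume the only non-zero entries of $x_i$ are the indicators for A02, B11, D55 and D95, each equal to 1, and index the latent vectors by category ID (a shorthand introduced here for illustration). The double sum then collapses to the six inner products between the active categories:

```latex
\sum_{j=1}^{p} \sum_{j'=j+1}^{p} x_j x_{j'} \langle v_j, v_{j'} \rangle
  = \langle v_{A02}, v_{B11} \rangle + \langle v_{A02}, v_{D55} \rangle + \langle v_{A02}, v_{D95} \rangle
  + \langle v_{B11}, v_{D55} \rangle + \langle v_{B11}, v_{D95} \rangle + \langle v_{D55}, v_{D95} \rangle,
\qquad
\langle v_{D55}, v_{D95} \rangle = \sum_{f=1}^{k} v_{D55,f} \, v_{D95,f}
```

The inner product of D55 and D95 is large when both products load on the same latent dimensions, for instance the dimensions labelled Summer and Swimming in the example.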


Factoring Joint Probabilities

[Figure: autocorrelation (y-axis, 0.80 to 1.00) plotted against lag (x-axis, -20 to 20).]

We can factorize the joint probability by conditioning on features that describe the related samples.

$$
p(y_0, \dots, y_n \mid x_0, \dots, x_n) := \prod_{i=0}^{n} p(y_i \mid x_i^{r}, x_i)
$$

Relational Features

u3, 2014-11-13, 2014-11-14, A01/B01/C05/D11/
u4, 2014-11-14, 2014-11-16, A02/B01/C01/D02/;A05/B04/C05/D98/;
u5, 2014-11-14, 2014-11-16, A05/B04/C05/D98/;
u6, 2014-11-14, 2014-11-16, A04/B03/C06/D22/;A05/B14/C45/D68/;
u7, 2014-11-14, 2014-11-16, A01/B01/C01/D03/;A01/B04/C05/D78/;

$$
x_{a1} = [0,\ \overbrace{1}^{A2},\ 0,\ \overbrace{1}^{A4},\ \overbrace{2}^{A5},\ \dots]
$$

$$
x_{a1:2} = [\overbrace{3}^{A1},\ \overbrace{1}^{A2},\ 0,\ \overbrace{1}^{A4},\ \overbrace{2}^{A5},\ \dots]
$$

Combining different lags and categories, we can describe the sample neighborhood with:

$$
x_{u5} = [x_{a1},\ x_{a1:2},\ x_{b1:3},\ x_{d1}]
$$
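The two example vectors for u5 can be reproduced by counting top-level categories in the neighboring records. The sketch below assumes that a lag window covers the records on both sides of the sample and excludes the sample itself, which is the reading that matches the numbers in the example; the function name `neighbor_counts` and the five-entry vocabulary are hypothetical.

```python
from collections import Counter

def neighbor_counts(baskets, i, max_lag, level, vocabulary):
    """Count the category IDs at hierarchy depth `level` in the baskets of
    the samples within `max_lag` positions of sample i (i itself excluded)."""
    counts = Counter()
    lo, hi = max(0, i - max_lag), min(len(baskets) - 1, i + max_lag)
    for n in range(lo, hi + 1):
        if n == i:
            continue
        for path in baskets[n].split(";"):
            nodes = [node for node in path.split("/") if node]
            if len(nodes) > level:
                counts[nodes[level]] += 1
    return [counts[c] for c in vocabulary]

baskets = [
    "A01/B01/C05/D11/",                    # u3
    "A02/B01/C01/D02/;A05/B04/C05/D98/;",  # u4
    "A05/B04/C05/D98/;",                   # u5
    "A04/B03/C06/D22/;A05/B14/C45/D68/;",  # u6
    "A01/B01/C01/D03/;A01/B04/C05/D78/;",  # u7
]
a_vocab = ["A01", "A02", "A03", "A04", "A05"]  # illustrative subset
u5 = 2
x_a1 = neighbor_counts(baskets, u5, 1, 0, a_vocab)
x_a12 = neighbor_counts(baskets, u5, 2, 0, a_vocab)
print(x_a1)   # [0, 1, 0, 1, 2]
print(x_a12)  # [3, 1, 0, 1, 2]
```

Widening the window from lag 1 to lags 1 to 2 adds u3 and u7, which together contribute the three occurrences of A01.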


Identifying Sequential Blocks

u1, 2014-11-13, 2014-11-14, A01/B01/C01/D01/
u2, 2014-11-14, 2014-11-15, A02/B02/C02/D02/;A02/B02/C03/D03/;
u3, 2014-11-14, 2014-11-16, A02/B02/C02/D02/;A02/B02/C03/D04/;

blockId[:] ← 0
count ← 0
for i ← 1, n do
    if endTime(i) […] endTime(i-1) then
        count++
    end if
    blockId[i] ← count
end for
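The block assignment can be sketched in Python as follows. The comparison between consecutive end times is not recoverable from the pseudocode, so it is passed in as a parameter; the default (a new block starts when the end time drops below the previous one) is purely an assumption, as are the function name and the sample values.

```python
import operator

def block_ids(end_times, starts_new_block=operator.lt):
    """Assign a block id to every sample of a sequentially ordered data set.

    starts_new_block(current_end, previous_end) decides whether sample i
    opens a new block. The default (current < previous) is an assumption.
    """
    ids = [0] * len(end_times)
    count = 0
    for i in range(1, len(end_times)):
        if starts_new_block(end_times[i], end_times[i - 1]):
            count += 1
        ids[i] = count
    return ids

# Hypothetical end times in seconds, not taken from the competition data.
print(block_ids([10, 12, 15, 3, 4, 9, 1]))  # [0, 0, 0, 1, 1, 1, 2]
```

Any other rule, such as `operator.ne` or `operator.le`, can be supplied through `starts_new_block` without changing the loop.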

[Figure: plot against block size (x-axis, 0 to 160); the y-axis ticks run from 0 to 7.]

Block-Based Voting

if blockSize(i) ≥ 10 AND (median(i) ≤ .6 OR median(i) ≥ .9) then
    if median(i) ≥ .9 then
        predict female
    else if median(i) ≤ .6 then
        predict male
    end if
else                                ▷ per-sample threshold
    if y_i ≥ .82 then
        predict female
    else
        predict male
    end if
end if
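The voting rule can be sketched for a single block as shown below. The thresholds 10, .6, .9 and .82 come from the pseudocode; the direction and inclusiveness of each comparison, the function name `vote`, and the example scores are assumptions of this sketch.

```python
from statistics import median

def vote(block_scores, min_block_size=10, low=0.6, high=0.9, threshold=0.82):
    """Turn the per-sample scores of one block into 'female'/'male' labels.

    Scores are the model outputs for 'female'. Whether the comparisons are
    strict or inclusive is an assumption of this sketch.
    """
    m = median(block_scores)
    if len(block_scores) >= min_block_size and (m <= low or m >= high):
        # confident block: every sample takes the block-level decision
        label = "female" if m >= high else "male"
        return [label] * len(block_scores)
    # otherwise fall back to the per-sample threshold
    return ["female" if y >= threshold else "male" for y in block_scores]

# Large block with a high median: the outlier 0.3 is overruled.
print(vote([0.95] * 9 + [0.3, 0.92, 0.97]))
# Small block: each sample is thresholded on its own score.
print(vote([0.95, 0.3, 0.85]))  # ['female', 'male', 'female']
```

Using the block median rather than the mean keeps a few outlying scores from flipping the decision for the whole block.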


Results & Implementation

              Score       Place
Final Result  0.84067348  7

• Full Competition Source Code: https://github.com/ibayer/PAKDD2015_Competition
• Factorization Machine Implementation: https://github.com/ibayer/fastFM
