Factor Models for Gender Prediction Based on E-commerce Data
Immanuel Bayer
Data Mining Competition PAKDD 2015, Ho Chi Minh City, Vietnam
Outline
Hierarchical Basket Model
Tree Encoding
Factorization Machine
Modeling Autocorrelation
Sequential Block Voting
Results & Implementation
Product Hierarchy
u1, 2014-11-13, 2014-11-14, A01/B01/C01/D01/
u2, 2014-11-14, 2014-11-15, A02/B02/C02/D02/;A02/B02/C03/D03/;
u3, 2014-11-14, 2014-11-16, A01/B01/C01/D02/;A01/B04/C05/D98/;
[Figure: product hierarchy tree with four levels of nodes: A01; B01, B04; C01, C02, C05; D01, D02, D06, D22, D45, D98, D21, D89, D15.]
Path Encoding
u3, 2014-11-14, 2014-11-16, A01/B01/C01/D02/;A01/B04/C05/D98/;
[Figure: product hierarchy tree (levels A to D), as on the Product Hierarchy slide.]
$$x_i = \{\underbrace{2, 0, \dots}_{|A|}, \underbrace{1, 0, 0, 1, \dots}_{|B|}, \underbrace{0, 1, \dots, 1, 0, \dots}_{|D|}\}$$
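The path encoding can be sketched in a few lines of Python. This is a minimal illustration, not the competition code: the function name `encode_basket` and the small `vocabulary` list are assumptions made for this example, and the C level is left out of the vocabulary only to mirror the vector shown on the slide.

```python
from collections import Counter


def encode_basket(basket, vocabulary):
    """Count how often each hierarchy node occurs in a basket.

    `basket` is a string such as "A01/B01/C01/D02/;A01/B04/C05/D98/;".
    `vocabulary` is an ordered list of node ids (hypothetical here);
    the returned list holds one count per vocabulary entry.
    """
    counts = Counter()
    for path in basket.split(";"):
        # Each path lists one node per level, e.g. A01, B01, C01, D02.
        counts.update(node for node in path.split("/") if node)
    return [counts[node] for node in vocabulary]


# Hypothetical vocabulary covering only the nodes needed for this example.
vocabulary = ["A01", "A02", "B01", "B02", "B03", "B04", "D01", "D02", "D98"]
x_u3 = encode_basket("A01/B01/C01/D02/;A01/B04/C05/D98/;", vocabulary)
print(x_u3)  # [2, 0, 1, 0, 0, 1, 0, 1, 1]
```

Both products of u3 share the root A01, so its entry is 2, while B01, B04, D02 and D98 each appear once.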
Factorization Machine
FM model of order $d = 2$:
$$\hat{y}^{FM}(x) := w_0 + \sum_{j=1}^{p} w_j x_j + \sum_{j=1}^{p} \sum_{j'=j+1}^{p} x_j x_{j'} \langle v_j, v_{j'} \rangle$$
- $w_0 \in \mathbb{R}$, $w \in \mathbb{R}^{p}$, $V \in \mathbb{R}^{p \times k}$ are the model parameters.
- $k \in \mathbb{N}$ is the size (dimensionality) of the latent space.
- The model has one feature vector $v_i$ for each variable $x_i$.
Immanuel Bayer University of Konstanz
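The second-order FM equation can be evaluated directly. The sketch below is a plain-Python illustration, not the fastFM implementation; the function names and the toy parameters are made up for this example. The second function uses the well-known sum-of-squares reformulation of the pairwise term, which gives the same score in O(p k) instead of O(p^2 k).

```python
def fm_predict(x, w0, w, V):
    """Second-order FM score: bias + linear term + pairwise interactions."""
    p = len(x)
    linear = sum(w[j] * x[j] for j in range(p))
    pairwise = 0.0
    for j in range(p):
        for j2 in range(j + 1, p):
            # Interaction weight is the inner product <v_j, v_j'>.
            dot = sum(a * b for a, b in zip(V[j], V[j2]))
            pairwise += x[j] * x[j2] * dot
    return w0 + linear + pairwise


def fm_predict_fast(x, w0, w, V):
    """Same score in O(p * k) via the sum-of-squares identity."""
    p, k = len(x), len(V[0])
    linear = sum(w[j] * x[j] for j in range(p))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[j][f] * x[j] for j in range(p))
        s_sq = sum((V[j][f] * x[j]) ** 2 for j in range(p))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


# Made-up parameters with p = 3 features and k = 2 latent dimensions.
x = [1.0, 0.0, 2.0]
w0, w = 0.5, [0.1, -0.2, 0.3]
V = [[1.0, 0.5], [0.2, -0.4], [-0.3, 0.8]]
print(fm_predict(x, w0, w, V), fm_predict_fast(x, w0, w, V))
```

Here `V[j]` plays the role of the feature vector $v_j$, so `V` has p rows and k columns.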
Linear Part
An FM model of order $d = 2$:
$$\hat{y}^{FM}(x) := w_0 + \sum_{j=1}^{p} w_j x_j + \sum_{j=1}^{p} \sum_{j'=j+1}^{p} x_j x_{j'} \langle v_j, v_{j'} \rangle$$
$$x_i = (\underbrace{0, \overbrace{1}^{A02}, \dots, 0}_{|A|}, \underbrace{0, \cdots, \overbrace{1}^{B11}, \dots, 0}_{|B|}, \underbrace{0, \dots, \overbrace{1}^{D55}, \dots, 0}_{|D|})$$
$$p(\text{female} \mid x_i) \propto p(\text{female} \mid A02) + p(\text{female} \mid B11) + p(\text{female} \mid D55)$$
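As a worked instance of the linear part, assume a basket whose only active indicators are A02, B11 and D55, each with value 1. The symbols $w_{A02}$, $w_{B11}$ and $w_{D55}$ are shorthand introduced here for the corresponding entries of $w$. The bias and linear terms of the FM then reduce to:

```latex
w_0 + \sum_{j=1}^{p} w_j x_j = w_0 + w_{A02} + w_{B11} + w_{D55}
```

Each active category contributes its own weight independently of the others, which is the additive structure expressed by the proportionality for $p(\text{female} \mid x_i)$.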
Pairwise Interactions
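An FM model of order $d = 2$:
$$\hat{y}^{FM}(x) := w_0 + \sum_{j=1}^{p} w_j x_j + \sum_{j=1}^{p} \sum_{j'=j+1}^{p} x_j x_{j'} \langle v_j, v_{j'} \rangle$$
$$x_i = (\underbrace{0, \overbrace{1}^{A02}, \dots, 0}_{|A|}, \underbrace{0, \cdots, \overbrace{1}^{B11}, \dots, 0}_{|B|}, \underbrace{0, \dots, \overbrace{1}^{D55}, \dots, \overbrace{1}^{D95}, \dots, 0}_{|D|})$$
Example:
$$V = \left( \dots, \underbrace{\begin{pmatrix} \vdots \\ \text{Summer} \\ \vdots \\ \text{Swimming} \end{pmatrix}}_{j = D55}, \dots, \underbrace{\begin{pmatrix} \vdots \\ \text{Summer} \\ \vdots \\ \text{Swimming} \end{pmatrix}}_{j' = D95}, \dots \right), \quad V \in \mathbb{R}^{p \times k}$$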
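To make the pairwise term concrete, assume the only active indicators are A02, B11, D55 and D95, each with value 1. The subscripted vectors such as $v_{D55}$ are shorthand introduced here for the corresponding feature vectors in $V$. The double sum then contains exactly one inner product per pair of active features:

```latex
\sum_{j=1}^{p} \sum_{j'=j+1}^{p} x_j x_{j'} \langle v_j, v_{j'} \rangle
  = \langle v_{A02}, v_{B11} \rangle + \langle v_{A02}, v_{D55} \rangle + \langle v_{A02}, v_{D95} \rangle
  + \langle v_{B11}, v_{D55} \rangle + \langle v_{B11}, v_{D95} \rangle + \langle v_{D55}, v_{D95} \rangle,
\qquad
\langle v_{D55}, v_{D95} \rangle = \sum_{f=1}^{k} v_{D55,f} \, v_{D95,f}
```

If both D55 and D95 have large entries in the same latent dimensions (for example the ones labeled Summer and Swimming), their inner product is large and the pair contributes strongly to the prediction.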
Factoring Joint Probabilities
[Figure: autocorrelation (y-axis, 0.80 to 1.00) plotted against lag (x-axis, -20 to 20).]
We can factorize the joint probability by conditioning on features that
describe the related samples.
$$p(y_0, \dots, y_n \mid x_0, \dots, x_n) := \prod_{i=0}^{n} p(y_i \mid x_i^{r}, x_i)$$
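The slides do not state how the plotted autocorrelation was computed. As a hedged illustration only, the sketch below uses the common sample autocorrelation estimator on a toy label sequence; the function name and the data are assumptions made for this example.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at a non-negative `lag`."""
    n = len(values)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(values)")
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values)
    if var == 0:
        raise ValueError("autocorrelation is undefined for a constant sequence")
    # Covariance between the sequence and a copy of itself shifted by `lag`.
    cov = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
    return cov / var


# Toy labels (1 = female, 0 = male) in which equal labels come in runs.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(autocorrelation(labels, 1))  # 0.625
```

A clearly positive value at small lags means that neighboring samples tend to share a label, which is what motivates conditioning on features of the related samples.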
Relational Features
u3, 2014-11-13, 2014-11-14, A01/B01/C05/D11/
u4, 2014-11-14, 2014-11-16, A02/B01/C01/D02/;A05/B04/C05/D98/;
u5, 2014-11-14, 2014-11-16, A05/B04/C05/D98/;
u6, 2014-11-14, 2014-11-16, A04/B03/C06/D22/;A05/B14/C45/D68/;
u7, 2014-11-14, 2014-11-16, A01/B01/C01/D03/;A01/B04/C05/D78/;
$$x^{a1} = [0, \overbrace{1}^{A2}, 0, \overbrace{1}^{A4}, \overbrace{2}^{A5}, \dots]$$
$$x^{a1:2} = [\overbrace{3}^{A1}, \overbrace{1}^{A2}, 0, \overbrace{1}^{A4}, \overbrace{2}^{A5}, \dots]$$
For sample u5, $x^{a1}$ counts the level-A categories in the samples directly before and after it (u4 and u6, lag 1), and $x^{a1:2}$ counts them over lags 1 to 2 (u3, u4, u6 and u7).
Combining different lags and categories, we can describe the sample neighborhood with:
$$x_{u5} = [x^{a1}, x^{a1:2}, x^{b1:3}, x^{d1}]$$
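The lagged category counts for u5 can be reproduced with a short Python sketch. The helper names `top_level_categories` and `lag_counts` are hypothetical, and the symmetric window (samples both before and after the target) is inferred from the counts shown on the slide rather than stated there.

```python
from collections import Counter


def top_level_categories(basket):
    """Return the level-A category of every product path in a basket string."""
    return [path.split("/")[0] for path in basket.split(";") if path]


def lag_counts(baskets, i, max_lag):
    """Count level-A categories in the samples within `max_lag` positions of i.

    The sample at position i itself is excluded.
    """
    counts = Counter()
    start = max(0, i - max_lag)
    stop = min(len(baskets), i + max_lag + 1)
    for j in range(start, stop):
        if j != i:
            counts.update(top_level_categories(baskets[j]))
    return counts


baskets = [
    "A01/B01/C05/D11/",                    # u3
    "A02/B01/C01/D02/;A05/B04/C05/D98/;",  # u4
    "A05/B04/C05/D98/;",                   # u5
    "A04/B03/C06/D22/;A05/B14/C45/D68/;",  # u6
    "A01/B01/C01/D03/;A01/B04/C05/D78/;",  # u7
]
categories = ["A01", "A02", "A03", "A04", "A05"]
u5 = 2  # position of u5 in the list above

x_a1 = [lag_counts(baskets, u5, 1)[c] for c in categories]
x_a1_2 = [lag_counts(baskets, u5, 2)[c] for c in categories]
print(x_a1)    # [0, 1, 0, 1, 2]
print(x_a1_2)  # [3, 1, 0, 1, 2]
```

The same function applied to the B or D level of each path, with other lag windows, would yield the remaining blocks of the combined vector.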
Identifying Sequential Blocks
u1, 2014-11-13, 2014-11-14, A01/B01/C01/D01/
u2, 2014-11-14, 2014-11-15, A02/B02/C02/D02/;A02/B02/C03/D03/;
u3, 2014-11-14, 2014-11-16, A02/B02/C02/D02/;A02/B02/C03/D04/;
blockId[:] ← 0
count ← 0
for i ← 1, n do
    if endTime(i) ≥ endTime(i-1) then
        count++
    end if
    blockId[i] ← count
end for
[Figure: histogram of block sizes; x-axis: block size (0 to 160); y-axis ticks 0 to 7, unlabeled.]
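The block identification loop translates almost line by line into Python. This is a sketch under stated assumptions: the function name `assign_block_ids` is made up, positions are 0-based, and the comparison direction is copied from the pseudocode as shown on the slide.

```python
def assign_block_ids(end_times):
    """Assign a block id to each sample from its session end time.

    `end_times` holds one comparable value per sample, in file order.
    The counter is incremented whenever the end time is greater than
    or equal to that of the previous sample, as in the pseudocode.
    """
    block_ids = [0] * len(end_times)
    count = 0
    for i in range(1, len(end_times)):
        if end_times[i] >= end_times[i - 1]:
            count += 1
        block_ids[i] = count
    return block_ids


# Toy end times (any comparable values work, e.g. ISO timestamp strings).
print(assign_block_ids([5, 4, 3, 6, 5, 7]))  # [0, 0, 0, 1, 1, 2]
```

The resulting ids can be used to group samples and to compute the block sizes shown in the histogram.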
Block based Voting
if blockSize(i) ≥ 10 AND (median(i) ≤ .6 OR median(i) ≥ .9) then
    if median(i) ≥ .9 then
        predict female
    else if median(i) ≤ .6 then
        predict male
    end if
else                                    ▷ per sample threshold
    if y_i ≥ .82 then
        predict female
    else
        predict male
    end if
end if
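The voting rule can be sketched as follows. It assumes that `y_i` is the predicted score for "female" of sample i and that `median(i)` is the median score within the block containing sample i; both readings are interpretations of the pseudocode. The function and parameter names are hypothetical, and the default thresholds are the ones shown in the pseudocode.

```python
from collections import defaultdict
from statistics import median


def block_vote(scores, block_ids, min_block_size=10,
               low=0.6, high=0.9, sample_threshold=0.82):
    """Label each sample using a block-level vote where the block is decisive."""
    blocks = defaultdict(list)
    for score, block in zip(scores, block_ids):
        blocks[block].append(score)
    medians = {block: median(values) for block, values in blocks.items()}

    labels = []
    for score, block in zip(scores, block_ids):
        large = len(blocks[block]) >= min_block_size
        if large and medians[block] >= high:
            labels.append("female")   # whole block votes female
        elif large and medians[block] <= low:
            labels.append("male")     # whole block votes male
        else:
            # Small or undecided block: fall back to a per-sample threshold.
            labels.append("female" if score >= sample_threshold else "male")
    return labels


# Toy data: a decisive block of 10, a small block of 2, an undecided block of 10.
scores = [0.95] * 9 + [0.5] + [0.5, 0.9] + [0.7] * 10
block_ids = [0] * 10 + [1] * 2 + [2] * 10
labels = block_vote(scores, block_ids)
print(labels[:10], labels[10:12], labels[12:])
```

In the first block the single low score is overruled by the block median, while the small and the undecided blocks are labeled sample by sample.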
Results & Implementation
              Score       Place
Final Result  0.84067348  7
- Full Competition Source Code: https://github.com/ibayer/PAKDD2015_Competition
- Factorization Machine Implementation: https://github.com/ibayer/fastFM