• No results found

Jaccard with Singular Value Decomposition Hybrid Recommendation Algorithm

N/A
N/A
Protected

Academic year: 2020

Share "Jaccard with Singular Value Decomposition Hybrid Recommendation Algorithm"

Copied!
7
0
0

Loading.... (view fulltext now)

Full text

(1)

2016 International Conference on Wireless Communication and Network Engineering (WCNE 2016) ISBN: 978-1-60595-403-5

Jaccard with Singular Value Decomposition

Hybrid Recommendation

Algorithm

Chen XU

1

, Ling CHEN

1,2

and Ben-jie FAN

2

1Department of Computer Science, Yangzhou University, Yangzhou, 225009, China

2State Key Lab of Novel Software Tech, Nanjing University, Nanjing, 210093, China

Keyword: Jaccard, Singular value decomposition, Hybrid recommendation.

Abstract. In this paper, a comprehensive recommendation algorithm is studied, and a combination of Jaccard and singular value decomposition algorithm is proposed to improve the recommendation accuracy and recall rate. First, we use the Jaccard algorithm to find their recommended listings ranking matrix, which joined the incremental update. Then we use SVD algorithm to complete Jaccard matrix ranking algorithm, so as to obtain a complete list of recommendations, which are recommended according to actual requirements. Experimental results show that the proposed algorithm has higher performance compared with other related recommendation algorithms.

Introduction

With the rapid development of Internet technology, the amount of information in the network[1] has brought a sharp rise, the problem of information overload[2], namely: the amount of information presented so that users can not get the interesting and useful information on their own, so the using efficiency of information and reduce. On the other hand, a large number of information that no one shows any interest became the dark information[3] in the network. Personalized recommendation system[4-10] came into being, which is a very effective way to solve this problem. It is based on the user's information needs, interest, etc., will be the user interested in the product, the information is recommended to the user. Compared with the search engine, it is recommended to make personalized computing by studying the user's interest preference[5,6]. The system can find the user's interest point, so as to guide the users to find their own information needs. A good recommendation system can not only provide users with personalized service, but also to establish a close relationship with the user, so that users can rely on the recommendation[7-10]. Recommendation algorithm is the core of the personalized recommendation system. The idea of this paper is to mix the second kinds of recommendation algorithms in the mixed recommendation: in the framework of some kind of recommendation strategy, and mix the other recommendation strategies, namely: Jaccard with SVD.

Related Work

Jaccard

The basic idea of Jaccard[11-14] index: we define the number of neighbors of the X node as (X) in the internet. The similarity between two nodes can be defined as the number of X and Y and common neighbor nodes by dividing the number of node X and Y elements in the Union, the formula is as follows:

) ( ) ( ) ( ) ( ) , ( Y X Y X Y X Jaccard       

(2)

between the two listings is 0, the value of the molecular term is 0. In order to solve this problem, we introduce a method of singular value decomposition (SVD).

Singular Value Decomposition(SVD)

The basic idea of singular value decomposition (SVD)[15-18]: We assume that the user - Housing rating matrix R is M x N matrix, and make the matrix R divided into three parts S, U, V. U is the M x M matrix. V is the N x N matrix. S is the N x M matrix, and the non-diagonal elements of S are 0. The diagonal elements are singular values of matrix R, and are satisfied from the order of large to small order. Its formula is expressed as follows: R = U x S x V.

Singular value decomposition has an advantage, which allows the existence of a simplified approximation matrix. For the matrix S to retain the K singular value, the other with 0 to replace. In this way, we can simplify the dimension of S to a matrix (K<N) with only K singular values. Because of the introduction of 0, so you can delete the S median of 0 rows and columns, get a new diagonal matrix SK. If the matrix U and V are also in accordance with this method, we can get the K x N matrix UK and K x M matrix VK, then the reconstructed matrix R1 = UK*SK*VK. The R1 approximation is equal to R, so that the dimension reduction of the matrix is achieved. Schematic diagram is as follows:

S V

R U

M x N M x K

[image:2.595.152.444.305.416.2]

K x K K x N

Figure 1. Singular value decomposition.

We want to do is on the recommendation of the housing, so we need to take out the VK matrix and calculate the similarity of the matrix by the Euclidean distance.

The basic idea of Euclidean distance: Euclidean metric (also known as Euclidean distance) is a commonly used distance definition, refers to the actual distance in m dimensional space between two points, or vector natural length (i.e., the distance to the origin). Euclidean distance in two dimensional and three dimensional space is the actual distance between two points. We consider each line of the matrix VK as a N dimensional vector, and then compute the Euclidean distance between rows and rows to obtain the similarity matrix.

Algorithm Framework and Process

First, we use Jaccard algorithm to calculate the recommended index ranking matrix; then we use SVD algorithm to complete Jaccard matrix ranking algorithm Index, so as to obtain a complete list of recommended ones. Schematic as shown below:

[image:2.595.134.469.623.760.2]
(3)

Incremental Algorithm Design

The Basic Idea of the Algorithm

Jaccard update mechanism: Offline Jaccard algorithm complexity is O(M*N2), where M for the number of users, N for the number of houses. We designed an incremental Jaccard algorithm to reduce the time complexity. Drawing as follows:

0-1 Scoring matrix Jaccard Similarity matrix Figure 3. Incremental flow chart.

[image:3.595.87.511.162.410.2]

0-1 Scoring matrix old user ID Intersection Table Set table Figure 4. Incremental flow chart.

Incremental log extracted 0-1 score array: the number of users M', the number of listings is still N. We will be five minutes of new data is divided into new users and old users, old users that: before there is a user to click on the user. We analyze the incremental log score matrix:

Case 1 new users click on a suite, did not change the intersection of the first row of the table and set in the table value, the value of Jaccard.

Case 2 new users click on two or more units of housing: any two suites

 

i j, , on the intersection

of the following table and set the table:inti j,  ; unii j,  .

Case 3 The old users click on a suite, and the suite has been visited: no impact on the Jaccard value.

Case 4 The old users click on a suite, and the suite has not been visited: the user is located in the original 0-1 scoring array of the first line, and scan all non housing. If u k_ 0, else inti k, inti k, ;

,

unii k  . If u k_ 1, else inti k,  ; unii k, unii k, .

Case 5 The old users click on a number of suites, and these rooms have been visited before: no impact on the Jaccard value.

Case 6 The old user clicks more suites, and these rooms have not been accessed: these rooms update intersection table and Set table by case 4.

Case 7 The old user clicks more suites, and Some of the rooms have been clicked: We filter out the houses that have not been visited and update intersection table and Set table by case 4.

Algorithm Time Complexity Analysis

Because the intersection and union will form a calculation table in every day, so you do not need to add complexity analysis.

case 1: 0; case 2: O(M’*N2

); case 3: 0;

case 4: O(M’*N); case 5: 0; case 6: O(M’*N2); case 7: O(M’*N2 ).

(4)

M '. This means that the incremental calculation of Jaccard can save a lot of time. According to previous experience, if 5 weeks of log is used as a training set, M’ =1300, 5 min average incremental user volume is MAX M’= 40. According to the average click on 5 sets of different listings, the efficiency of at least 13000/401 x e2 times.

We made five minutes of new data be divided into new users and old users, old users that: before there is a user to click on the user. For the old users, we will change the original matrix of the corresponding data; for the new users, we add the corresponding line number in the original matrix, calculate the similarity, and storage data. We ignore the old user's data, and calculate the new data that is all as new user data. 5 minute data stream contains new user clicks, but also the old user clicks. We take 5 minutes of all the data as a new user's click behavior. We analyze the accuracy and the error of the algorithm under the Jaccard update mechanism:

Figure 5. Error analysis and comparison.

In the figure, the red line indicates that the Jaccard algorithm to click on the accuracy rate. The blue line indicates that the incremental Jaccard algorithm to click on the accuracy rate, we can see that the error between the two is very small. In the Jaccard update, we can take 5 minutes of data flow as a result of the new user clicks.

Experimental Data Selection and Algorithm Evaluation Index

Data Set

[image:4.595.95.490.225.419.2]

We took a sample of a company's mobile phone for three months as an experimental data set, as shown in the following table:

Table 1. Topological properties of data sets.

Number of users Number of house sparsity

Three month 24803 1783 0.0061

The analysis of Table 1: A total of 1783 houses, a total of 24803 users, the sparsity of only 0.0061

Data Analysis

We selected a company three months of data as the experimental data set. The data analysis is carried out on the data set, count the number and click the new day within three months of the new user, and click new is the number of listings. Shown in figure 6:

(5)
[image:5.595.97.479.73.263.2]

Figure 6. Data analysis for three months.

Evaluation Index

Ranking Score. Score Ranking focuses on the location of the edge of the test set in the final order. H =U - ET is set for unknown edges. Ri represents the unknown edge i∈EP ranking in the

ranking. The unknown edge of the Score Ranking value is RSiri/|H|. |H| represents the number

of elements in a collection H. Traversing all the edges in the test set, the Score Ranking value of the system is:

 

 

p

p i E

i

E i

p i p

H r E

RS E

RS

| | | |

1 |

| 1

Obviously, the smaller the value of RS, the higher the accuracy of the algorithm.

Precision. Precision only considers the target user score list in L before the boundary is accurate prediction. The Precision is defined as:

L m precision

Obviously, the higher the Precision value, the higher the accuracy of the algorithm.

Hits. Hits refers to the user to click on the recommended items accounted for the proportion of total users click on the project. Suppose we recommend 10 suites, and then the user to click on a. If the user clicks on the room in the recommended 10 suites, it is recorded as a successful hit. We set M to represent the total number of users to click on the house, L said the total number of houses were recommended to be. Hits value is:

M L Hits

Obviously, the higher the value of Hits, the higher the rate of recommendation algorithm.

Comparison and Analysis of Algorithm Performance

(6)
[image:6.595.127.470.86.253.2]

Table 2. Experimental Result.

RS Precision Hits

CN 0.5082 5.3750e-004 37.1497%

Salton 0.4737 0.0025 40.8557%

Jaccard 0.4353 0.0311 41.1871%

Sorenson 0.4354 0.0300 41.1871%

HPI 0.7368 2.0767e-004 3.7662%

HDI 0.4350 0.0311 40.0723%

LHN 0.4642 0.0026 1.2052%

AA 0.5046 0.0014 38.0838%

RA 0.5014 0.0032 39.3793%

PA 0.4370 0.0266 13.8295%

NMF 0.4405 0.0227 28.834%

Jaccard With SVD 0.4753 0.0311 42.1671%

From the table data can be found, we compare the three evaluation indicators found that the performance of the Jaccard_With_SVD algorithm is optimal. So we designed the method of combining Jaccard and SVD is reasonable.

Conclusions

In this paper, we study the hybrid recommendation algorithm of Jaccard and SVD. The experimental results show that compared with the previous algorithms, such as: NMF, AA, RA and so on; Jaccard_with_SVD algorithm has a higher hit rate, better accuracy, better accuracy. At the same time, The increment that we add can make users to more quickly get their recommend the latest results witch their want .

References

[1] Hooper E, Holloway R. Intelligent techniques for effective network protocol security monitoring, measurement and prediction[J]. International Journal of Security and Its Applications, 2008, 2( 4) : 1 - 10.

[2] Resnick P, Iacovou N, et al. GroupLens: an open architecture for collaborative filtering of netnews[C]/ /Proceedings of the 1994 ACM conference on computer supported cooperative work, Chapel Hill, North Carolina, United States, October 22-26, 1994.

[3] HaiLing Xu, Xiao Wu, Xiaodong Li, etc. Comparison Study of Internet Recommendation System [J]. Journal of software, 2009.

[4] XiaoBo Tang, Zhao Zhang. Research on Personalized Recommendation System of online social network based on hybrid graph. Information theory and Practice 36.2(2013):91-95.

[5] GuoXiang Wang, HePing Liu. Summary of personalized recommendation system. Computer engineering and Applications 48.7(2012):66-76.

[6] Ansari A, Essegaier S, Kohli R. Internet Recommendation Systems[J]. Journal of Marketing Research, 2013, 37(3):363-375.

[7] Burke R. Hybrid Web Recommender Systems[M]// The adaptive web. Springer-Verlag, 2007:377-408.

[8] Adomavicius G, Tuzhilin A. Context-Aware Recommender Systems[J]. Ai Magazine, 2011, 32(3):217-253.

(7)

[10] Ma, Hao, Zhou, et al. Recommender systems with social regularization[J]. WSDM, 2011:287-296.

[11] Greenwald A G, Nosek B A, Sriram N. Consequential validity of the implicit association test: comment on Blanton and Jaccard (2006).[J]. American Psychologist, 2006, 61(1):62-71.

[12] Alvesalo J, Greco D, Leinonen M, et al. A proof for the positive definiteness of the Jaccard index matrix[J]. International Journal of Approximate Reasoning, 2013, 54(5):615-626.

[13] Shameem M U S, Ferdous R. An efficient k-means algorithm integrated with Jaccard distance measure for document clustering[C]// Asian Himalayas International Conference on Internet, 2009. Ah-Ici. IEEE, 2009:1-6.

[14] Niwattanakul S, Singthongchai J, Naenudorn E, et al. Using of Jaccard Coefficient for Keywords Similarity[J]. Lecture Notes in Engineering & Computer Science, 2013, 2202(1).

[15] Furnas G W, Deerwester S, Dumais S T, et al. Information retrieval using a singular value decomposition model of latent semantic structure[C]// International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 1988:465-480.

[16] Brand M. Incremental Singular Value Decomposition of Uncertain Data with Missing Values[C]// European Conference on Computer Vision. Springer-Verlag, 2002:707-720.

[17] Xu S, Zhang J, Han D, et al. Singular value decomposition based data distortion strategy for privacy protection[J]. Knowledge and Information Systems, 2006, 10(3):383-397.

Figure

Figure 1. Singular value decomposition.
Figure 4. Incremental flow chart.
Table 1. Topological properties of data sets.
Figure 6. Data analysis for three months.
+2

References

Related documents

: Assessment of pharmacological strategies for management of major depressive disorder and their costs after an inadequate response to first-line antidepressant treatment in

A large multicentre study of 4382 patients found a sta- tistically significant reduction in rates of transient and permanent recurrent laryngeal nerve paralysis with the use

Though banks offering cross-border financial services cannot create much liquidity, they do not put the banking sector at risk because they are able to allocate liquidity

two-stage antenna grouping scheme in generalized spatial modulation MIMO systems with. dual-polarized antennas, which leads to the lowest possible ABEP

In order to investigate the effects of HTS materials on the resonant characteristics of microstrip antennas, the resonant frequencies of equilateral triangular patch antenna of

6: Relationship between dimensionless backwater rise (h/d3) and Fr3/Fr3L for flow between triangular end-noses piers with different pier locations under subcritical condition.

residence, sex and date of birth would bring a higher disclosure risk. In this paper we address the emerging problem of de-identification techniques, namely, the problem of