User Propensity Analysis For Movie Prediction Rating Based On Collaborative Filtering And Fuzzy System


Academic year: 2020


User propensity analysis for Movie prediction rating based on Collaborative filtering and Fuzzy system

Rachit Tomar

Research Scholar: B.Tech 4th year (Computer Science), Maharaja Surajmal Institute of Technology (GGSIPU), Delhi, India

Cherag Verma

Research Scholar: B.Tech 4th year (Computer Science), Maharaja Surajmal Institute of Technology (GGSIPU), Delhi, India

Abstract

In the last few years, work has been under way on self-regulating active systems that develop recommendations for a user before the user actually requests them. Content-based and collaborative filtering techniques are used to provide intelligent individual recommendations. The proposed prediction system is based on a recommendation technique that applies collaborative filtering, and the problems of collaborative filtering are addressed using a fuzzy system. We used users' movie rating data to predict ratings and verify the results. The RMSE (Root Mean Square Error) is calculated from the system's predictions, and the accuracy of the system is assessed by comparing the predicted values with the known values. As implied by the results, the system can be used as a base for many types of recommendation systems and media.

Keywords: Collaborative filtering, Fuzzy system, RMSE (Root Mean Square Error).

I. INTRODUCTION

A recommendation system aims to filter information and provide users with results that might interest them. Suggestions for laptops on Amazon and watches on Flipkart are a few real-life examples. In designing such engines, emphasis is laid on the domain and the characteristics of the available data. Data-mining techniques such as information filtering and pattern recognition have been used to develop recommendation systems, and of these, information filtering has been researched the most. Research in the field is divided into collaborative filtering and content-based recommendation.

Content-based filtering systems are based on profile attributes, that is, item-to-item and item-to-user preference. The range of recommendable content remains narrow, because it is limited to text-type content that can be easily characterised. Collaborative filtering systems analyse historical interactions alone, using user propensity to predict user preference.

II. PREDICTION SYSTEM BASED ON MOVIE RATING

A. Data recognition

Personal propensity is usually analysed by normalizing personal information such as sex, age and occupation. This may lead to security problems with personal information. User satisfaction is also lower than expected, because such data does not perfectly match personal taste and individuality. We therefore developed a movie rating prediction system based on personal propensity, derived from records of previously watched movies, to improve user satisfaction.

The structure of the proposed system is shown in Fig. 1. All data is of text type. The data from Netflix contains ratings by about 480 thousand users of about 18 thousand movies. Ratings range from 1 to 5. Probe and qualifying data are offered to evaluate system performance. This data set is the base data for the movie evaluation and prediction system. The training data, or base data, was reorganised because it was not suitable for analysing personal interests. movie_id denotes the Movie ID, which ranges from 1 to 17,770, and the ratings of each user can be read from the 17,770 movie files. user_u denotes the User ID; u ranges from 1 to 2,649,429, and 480,189 users are distributed within this range.

The actual data is used to evaluate system performance by comparing the predicted value against the actual rating. For the qualifying data, RMSE is the error between the predicted value and the known value. The system is designed to minimise the error between actual and predicted values, thereby increasing system performance and consistency.
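As an illustration of the error measure described above, the following minimal sketch computes RMSE for a handful of ratings. The function name `rmse` and the example ratings are hypothetical and are not taken from the paper's data or implementation.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length rating sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical ratings on the 1-to-5 scale (illustration only).
example_predicted = [3.5, 4.0, 2.0, 5.0]
example_actual = [4, 4, 1, 5]
example_rmse = rmse(example_predicted, example_actual)
print(round(example_rmse, 4))
```

A lower value means the predicted ratings lie closer to the known ratings; a value of 0 would mean every prediction is exact.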

B. The sequence of movie rating prediction

Figure 3 shows the series of steps followed to predict the target user's rating of the target movie. Information on the movie and the target user is extracted from the probe data. Filtering is performed to check the reliability of user information, based on the ratings of movies previously watched by the target user. For filtered users, the mean value of the user's movie rating record is used; for unfiltered users, collaborative filtering is used to analyse personal propensity. A group of users similar to the target user is created, and a similarity calculation over this group is used to predict the target user's rating. A fuzzy inference process then helps improve the results of collaborative filtering.

C. User filtering

The reliability of watched-movie information is very important for this research, but the training data also contains meaningless information. For example, one user in the Netflix training data rated 17,635 movies in a single day.

Such records evidently degrade system performance, so user filtering is required to keep the related user group reliable. In this paper, users are filtered under two conditions. First, users who exceed the mean number of rated movies per day are filtered, that is, users who rated over 200 movies per day. Second, users who rated more than 25% of their total ratings in a single day are filtered. The calculation of section is shown in (1).

$$\text{section} = \frac{\max\left(m_{u,d}\right)}{m_{u,i}} \tag{1}$$

Here $m_{u,i}$ is the total number of movies (i) rated by user (u), and $m_{u,d}$ is the number of movies rated by the user per day. For a filtered user, personal propensity is analysed from previous movie information without comparison with other users' propensity; the mean value of all movies rated by the user is used for this analysis.
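The two filtering conditions and equation (1) can be sketched as follows. This is a minimal sketch, assuming each user is represented by a non-empty list of rating dates with one entry per rated movie; the names `section` and `is_filtered` and that data layout are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

MAX_RATINGS_PER_DAY = 200  # first condition, threshold stated in the text
MAX_SECTION = 0.25         # second condition, threshold stated in the text


def section(rating_dates):
    """Equation (1): busiest single-day rating count over the total count."""
    if not rating_dates:
        raise ValueError("user has no ratings")
    per_day = Counter(rating_dates)
    return max(per_day.values()) / len(rating_dates)


def is_filtered(rating_dates):
    """True if the user meets either filtering condition."""
    busiest_day = max(Counter(rating_dates).values())
    return busiest_day > MAX_RATINGS_PER_DAY or section(rating_dates) > MAX_SECTION


# Hypothetical users (illustration only).
steady_user = ["2005-01-%02d" % day for day in range(1, 11)]  # one rating per day
bursty_user = ["2005-03-01"] * 3 + ["2005-03-02", "2005-03-03", "2005-03-04"]
print(section(steady_user), is_filtered(steady_user))
print(section(bursty_user), is_filtered(bursty_user))
```

Users for whom `is_filtered` returns True would then be evaluated by the mean of their own ratings rather than through the related user group.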

D. Personal propensity based user grouping

Unfiltered users become candidate members of the related user group used to analyse personal propensity. The related user group is organized from the users' movie profiles, Fig. 4. The user-movie profile holds each user's rating of each movie item, and filled cells indicate watched movie items. Movie IDs range from 1 to 17,770 and user u ranges from 1 to 2,649,429. The related user group is created by analysing movie ratings in the user-movie profile between the target user and other users. In the example, the target user is user_3 and the target movie is movie_4. Related users are selected by their similarity with the target user over the movies the target user has watched; here the related users are user_1, user_4 and user_u-1.

E. Personal propensity by user grouping

Once the target user and target movie are selected, the rating is predicted from the similarity between the target user and the related user group.

The similarity is calculated as shown in (2).

$$\text{similarity}\,(s) = \frac{N\left(\text{rate}_{tar,i},\ \text{rate}_{u,i}\right)}{n_{tar}} \tag{2}$$

Here $\text{rate}_{tar,i}$ is the rating of movie i by the target user_tar, and $\text{rate}_{u,i}$ is the rating of movie i by the related user_u. $N(\text{rate}_{tar,i}, \text{rate}_{u,i})$ is the number of movies for which $\text{rate}_{tar,i}$ and $\text{rate}_{u,i}$ are the same, and $n_{tar}$ is the number of movie items rated by the target user_tar. The closer the similarity of a related user is to 1.0, the more similar that user is to the target user.
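A minimal sketch of equation (2) follows, assuming each user's ratings are held in a dictionary that maps movie ID to rating. The function name `similarity` and the example profiles are hypothetical.

```python
def similarity(target_ratings, related_ratings):
    """Equation (2): share of the target user's rated movies that the
    related user rated with exactly the same value."""
    if not target_ratings:
        raise ValueError("target user has no rated movies")
    matches = sum(
        1
        for movie_id, rate in target_ratings.items()
        if related_ratings.get(movie_id) == rate
    )
    return matches / len(target_ratings)


# Hypothetical profiles: movie ID -> rating (illustration only).
target_user = {1: 5, 2: 3, 3: 4, 5: 2}
related_user = {1: 5, 2: 3, 3: 1, 4: 4}
print(similarity(target_user, related_user))
```

Under this reading, a movie the related user has not rated simply does not count as a match, so the similarity stays between 0 and 1.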

The estimated rate is calculated from the similarities using the centre of gravity method, (3).

$$\text{Estimate} = \frac{\sum_{i=1}^{n} s_i \times \text{Rate}_i}{\sum_{i=1}^{n} s_i} \tag{3}$$

Here $s_i$ is the similarity between related user i and the target user, $\text{Rate}_i$ is the actual rating of the target movie by that related user, and n is the number of related users similar to the target user.
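Equation (3) is a similarity-weighted mean, which can be sketched as below. The function name `estimate_rating` and the example neighbours are hypothetical; each neighbour is assumed to be a pair of (similarity, rating of the target movie).

```python
def estimate_rating(neighbours):
    """Equation (3): similarity-weighted mean of the related users'
    ratings of the target movie."""
    total_weight = sum(s for s, _ in neighbours)
    if total_weight == 0:
        raise ValueError("no related user with non-zero similarity")
    return sum(s * rate for s, rate in neighbours) / total_weight


# Hypothetical related users: (similarity, rating of the target movie).
example_neighbours = [(0.5, 4), (0.25, 2), (0.25, 5)]
example_estimate = estimate_rating(example_neighbours)
print(example_estimate)
```

The sketch raises an error when all similarities are zero, which corresponds to the case where the related user group carries no usable information.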

F. Personal propensity based user grouping

In a collaborative filtering based system, recommendation is unavailable in two cases: when a new item has been added and no information about it has yet been collected, and when the related user group's information is insufficient. A fuzzy system is used to make up for these problems of collaborative filtering.

The fuzzy system has four input components: the mean of the similarity based evaluation by collaborative filtering, the mean of the propensity based evaluation by analysis of watched movie information, the mean of the date based evaluation, and the mean of the movie based evaluation.

Ratings are calculated on a scale from 1 to 5, to the first decimal place. Each input component has three Gaussian membership functions: low rate (LR), middle rate (MR) and high rate (HR), centred on mean points of 1, 3 and 5 respectively, each with a standard deviation of 0.5. To predict movie ratings, we use Mamdani's inference rule, Fig. 5.
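The membership functions described above can be sketched as follows, assuming the standard Gaussian form exp(-(x - c)^2 / (2 sigma^2)) with centres 1, 3 and 5 and sigma 0.5. The exact parameterisation used by the authors is not given, so the names `gaussian` and `fuzzify` and this form are assumptions for illustration.

```python
import math

SIGMA = 0.5                                   # standard deviation from the text
CENTRES = {"LR": 1.0, "MR": 3.0, "HR": 5.0}   # mean points from the text


def gaussian(x, centre, sigma=SIGMA):
    """Gaussian membership degree of x for a set centred on `centre`."""
    return math.exp(-((x - centre) ** 2) / (2 * sigma ** 2))


def fuzzify(x):
    """Membership degree of a rating in each of LR, MR and HR."""
    return {label: gaussian(x, centre) for label, centre in CENTRES.items()}


# Hypothetical input value (illustration only).
memberships = fuzzify(3.5)
for label, degree in memberships.items():
    print(label, round(degree, 4))
```

With these parameters a rating of 3.5 belongs mostly to MR and only marginally to HR, so rules that fire on MR would dominate a Mamdani inference step for that input.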

RMSE (Root Mean Squared Error) is calculated as the number of fuzzy rules is increased, with the rules relating similarity to the mean of the date based evaluation, the mean of the user based evaluation and the mean of the movie based evaluation. The fuzzy rules are established so that ratings are predicted well in the lower region of RMSE.

III. EXPERIMENTAL RESULTS

A. The results of RMSE using collaborative filtering

The movie rating prediction system was developed using Visual C++ 6.0 and evaluated on the training data offered by Netflix, which covers about 480,000 users and 17,770 movies. Because the total amount of data is too large to handle, random sampling is used to reflect the character of the population. The samples are divided into four sizes (100, 300, 500 and 1000 users) to compare system performance across sample sizes. The results using collaborative filtering are shown in Table I, where RMSE is reported for each sample size over 10 runs.

Table I. The results of RMSE using collaborative filtering

Run   100 users   300 users   500 users   1000 users
1     1.0394      0.9451      0.9003      0.8945
2     0.9937      0.9622      0.9108      0.9199
3     0.9168      0.9611      0.9284      0.9674
4     0.9742      0.9844      0.9252      0.9364
5     0.9603      0.8818      0.9229      0.9301
6     1.0740      0.9446      0.8782      0.9003
7     0.9792      0.9213      0.8813      0.9035
8     0.9432      0.9499      0.8837      0.9295
9     0.9549      0.9074      0.9151      0.9331
10    0.9546      0.9202      0.9108      0.9094

Table II. Results of reliability using collaborative filtering

                     100 users             300 users             500 users             1000 users
Sample mean          0.97902               0.93780               0.90567               0.92241
99% of reliability   0.9668 ≤ m ≤ 0.9913   0.9333 ≤ m ≤ 0.9424   0.9036 ≤ m ≤ 0.9078   0.9207 ≤ m ≤ 0.9241

With 99% reliability, the RMSE is predicted to lie in the range from 0.9036 to 0.9913, Table II.
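The sample means in Table II can be recomputed from the ten runs in Table I, as in the sketch below. The values are copied from Table I; the variable names are illustrative. The method used to derive the 99% intervals is not stated in the text, so it is not reproduced here.

```python
from statistics import mean

# RMSE values per sample size, copied from Table I (runs 1 to 10).
TABLE_I = {
    100: [1.0394, 0.9937, 0.9168, 0.9742, 0.9603,
          1.0740, 0.9792, 0.9432, 0.9549, 0.9546],
    300: [0.9451, 0.9622, 0.9611, 0.9844, 0.8818,
          0.9446, 0.9213, 0.9499, 0.9074, 0.9202],
    500: [0.9003, 0.9108, 0.9284, 0.9252, 0.9229,
          0.8782, 0.8813, 0.8837, 0.9151, 0.9108],
    1000: [0.8945, 0.9199, 0.9674, 0.9364, 0.9301,
           0.9003, 0.9035, 0.9295, 0.9331, 0.9094],
}

# Mean RMSE per sample size, rounded to four decimal places.
sample_means = {users: round(mean(runs), 4) for users, runs in TABLE_I.items()}
for users, value in sample_means.items():
    print(users, value)
```

To four decimal places, the recomputed means agree with the sample means listed in Table II.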

B. The results of RMSE using collaborative filtering and fuzzy system

The results of RMSE using collaborative filtering and the fuzzy system are shown in Table III, where RMSE is reported for each sample size over 10 runs. With 99% reliability, the RMSE is predicted to lie in the range from 0.7735 to 0.8150, Table IV. The system using collaborative filtering together with the fuzzy system performs better than collaborative filtering alone.

Table III. The results of RMSE using collaborative filtering and fuzzy system

Run   100 users   300 users   500 users   1000 users
1     0.8229      0.8112      0.8320      0.8167
2     0.7562      0.8170      0.8226      0.8038
3     0.7449      0.8293      0.8358      0.8196
4     0.8137      0.7968      0.8199      0.8190
5     0.8488      0.7852      0.8092      0.8344
6     0.7949      0.7977      0.7543      0.8067
7     0.8404      0.7796      0.7905      0.7890
8     0.7211      0.7802      0.7770      0.8044
9     0.8085      0.7966      0.8910      0.8500
10    0.7110      0.8021      0.8036      0.7918

Table IV. The results of reliability using collaborative filtering and fuzzy system

                     100 users             300 users             500 users             1000 users
Sample mean          0.79624               0.79957               0.80359               0.81354
99% of reliability   0.7735 ≤ m ≤ 0.7990   0.7972 ≤ m ≤ 0.8019   0.8027 ≤ m ≤ 0.8085   0.8125 ≤ m ≤ 0.8150

Across the two experiments, the range of RMSE narrows and the mean RMSE also decreases. The more users are handled, the more exact the prediction becomes.

Among the systems developed using the Netflix data, the lowest RMSE is 0.9841, by the BellKor team. It is difficult to compare our results with BellKor's results: we obtained only a range with 99% reliability from samples, whereas the BellKor team experimented on the whole Netflix data. Based on our results, the proposed system shows potential.

IV. CONCLUSION

This paper proposes a prediction system as a base technology for content recommendation systems. The proposed movie rating system predicts the target user's rating of the target movie. All data about movies and users is offered by Netflix, a company that rents movies through an online system. After a related user group similar to the target user is formed by collaborative filtering, the rating is predicted by analysis of personal propensity. A fuzzy system is applied to improve on the performance of collaborative filtering.

RMSE is calculated by the Netflix standard, and random sampling is applied to check the system performance. The range of RMSE with 99% reliability is predicted for comparison with previous movie rating systems.

The target user's rating is predicted by analysis of personal propensity, and we find that the fuzzy system compensates for the defects of collaborative filtering. The proposed system can be utilized as a base technology for recommendation systems for music, news, articles and books.

REFERENCES

1. G. Adomavicius and A. Tuzhilin, "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions," IEEE Transactions on Knowledge and Data Engineering 17, pp. 734-749, 2005.

2. C. Basu, H. Hirsh and W. Cohen, "Recommendation as Classification: Using Social and Content-Based Information in Recommendation," Proc. of the Fifteenth National Conference on Artificial Intelligence (AAAI-98), pp. 714-720, 1998.

3. M. Pazzani, "A Framework for Collaborative, Content-Based and Demographic Filtering," Artificial Intelligence Review 13(5-6), pp. 393-408, 1999.

4. P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom and J. Riedl, "GroupLens: An Open Architecture for Collaborative Filtering of Netnews," Proc. of CSCW '94, pp. 175-186, 1994.

5. R. M. Bell and Y. Koren, "Improved Neighborhood-based Collaborative Filtering," KDD 2007 Netflix Competition Workshop, 2007.

6. J. Herlocker, J. Konstan, L. Terveen and J. Riedl, "Evaluating Collaborative Filtering Recommender Systems," ACM Transactions on Information Systems 22, pp. 5-53, 2004.

7. "Augmenting Knowledge Reuse Using Collaborative Filtering Systems," dissertation presented to the faculty of the graduate school, USC (Information Systems), p. 191, 2001.

8. K. Lang, "NewsWeeder: Learning to Filter Netnews," Proc. of the 12th International Conference on Machine Learning, 1995.
