LDA Based Security in Personalized Web Search


R. Dhivya¹, B. Vinodhini², S. Karthik³

¹PG Scholar, ²Assistant Professor, ³Professor & Dean

Department of Computer Science & Engineering

SNS College of Technology, Coimbatore, India

Abstract- Personalized web search is web search tailored to an individual user's interests. Many users access the web, yet a conventional search engine shows the same results to everyone; for example, a fruit seller and a programmer may both use the word “Apple” to search for very different needs. This paper describes different personalized web search methods, such as client-side web search, together with methods used to protect users' personal information.

Keywords: Data mining, privacy preserving, personalized web search, client side web search.

I. INTRODUCTION

Data mining refers to extracting useful knowledge from large amounts of data. It is also known as knowledge discovery from data, and it is used to convert raw data into useful information. Most people around the world depend on the web to search for information, retrieve relevant information, and refine their results. Personalized web search refers to searching for information based on the user's individual interests in addition to the query the user provides. Privacy means that information is not accessible to unauthorized persons. To secure such private information, various methods have been proposed for personalized web search, such as client-side web search, re-ranking, personalized profiles, and adaptive search results.

DATA, INFORMATION, AND KNOWLEDGE

DATA

Data are any facts, numbers, or text that can be processed by a computer. Today, organizations are accumulating vast and growing amounts of data in different formats and different databases. This includes:

• operational or transactional data, such as sales, cost, inventory, payroll, and accounting

• nonoperational data, such as industry sales, forecast data, and macroeconomic data

• metadata: data about the data itself, such as logical database design or data dictionary definitions

INFORMATION

The patterns, associations, or relationships among all this data can provide information. For example, analysis of retail point of sale transaction data can yield information on which products are selling and when.
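As a minimal illustration of how raw transaction data becomes information, the sketch below aggregates hypothetical point-of-sale records into units sold per product and per hour. The records, their field layout, and the function name `sales_by_product_and_hour` are assumptions made for this example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical point-of-sale records: (timestamp, product, quantity).
transactions = [
    ("2014-03-01 09:15", "apple", 3),
    ("2014-03-01 09:40", "bread", 1),
    ("2014-03-01 18:05", "apple", 2),
    ("2014-03-01 18:30", "milk", 4),
]


def sales_by_product_and_hour(records):
    """Aggregate raw transactions into (product, hour) -> units sold."""
    totals = Counter()
    for stamp, product, qty in records:
        hour = datetime.strptime(stamp, "%Y-%m-%d %H:%M").hour
        totals[(product, hour)] += qty
    return totals


summary = sales_by_product_and_hour(transactions)
for (product, hour), units in sorted(summary.items()):
    print(f"{product:<6} {hour:02d}:00  {units}")
```

The resulting table answers "which products are selling and when", which is the kind of information the paragraph describes.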

KNOWLEDGE

Information can be converted into knowledge about historical patterns and future trends. For example, summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or retailer could determine which items are most susceptible to promotional efforts.

KNOWLEDGE DISCOVERY IN DATABASES PROCESS

The Knowledge Discovery in Databases (KDD) process comprises several steps leading from raw data collections to some form of new knowledge. Figure 1.1 shows the iterative steps of the KDD process.

Figure 1.1: Knowledge Discovery in Databases process
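Assuming the KDD process follows the commonly cited steps (selection, preprocessing, transformation, data mining, and interpretation/evaluation), the following sketch walks a hypothetical search log through each step to find terms that one user searches for repeatedly. The log, all function names, and the `min_support` threshold are illustrative assumptions.

```python
from collections import Counter

# Hypothetical raw search log: (user_id, query string).
raw_log = [
    ("u1", "Apple pie recipe"),
    ("u1", "apple   crumble"),
    ("u2", "Apple iPhone price"),
    ("u1", ""),
    ("u1", "pie crust"),
]


def select(log, user):                 # 1. Selection: keep the target data
    return [q for u, q in log if u == user]


def preprocess(queries):               # 2. Preprocessing: drop empty, normalize
    return [" ".join(q.lower().split()) for q in queries if q.strip()]


def transform(queries):                # 3. Transformation: queries -> terms
    return [q.split() for q in queries]


def mine(term_lists, min_support=2):   # 4. Data mining: frequent terms
    counts = Counter(t for terms in term_lists for t in set(terms))
    return {t: c for t, c in counts.items() if c >= min_support}


def evaluate(patterns, n_queries):     # 5. Interpretation: relative support
    return {t: c / n_queries for t, c in patterns.items()}


queries = preprocess(select(raw_log, "u1"))
patterns = mine(transform(queries))
print(evaluate(patterns, len(queries)))
```

In practice the process is iterative: an unsatisfactory evaluation sends the analyst back to an earlier step, for example to change the selection or the support threshold.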

II. RELATED TOPICS

CLIENT SIDE WEB SEARCH AGENT

Information retrieval systems (e.g., web search engines) are critical for overcoming information overload. A major deficiency of existing retrieval systems is that they generally lack user modeling and do not adapt to individual users, which results in inherently non-optimal retrieval performance [1]. Without personalization, a search engine knows nothing about the user and answers each query in isolation: when the same query is submitted by many different users, it returns the same results to all of them, and these may be irrelevant to a particular user. To avoid this, a search engine should learn the user's interests from the user's search history. In [1], an implicit user model is used to expand the user's current query. Because some information about the user's need may be lost between the query and the retrieved results, a client-side web search agent based on a decision-theoretic framework is used to improve search accuracy.

The agent exploits the immediate search context and implicit feedback information, and eagerly updates the search results to maximally benefit the user [1].
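Expanding the current query with implicit feedback can be sketched with unigram language models. In this simplified version, which is an assumption and not the exact decision-theoretic model of [1], the current query model is linearly interpolated with a model built from the user's earlier queries and clicked snippets, and results are ranked by their average word probability under the expanded model. The interpolation weight `alpha` and all example strings are hypothetical.

```python
from collections import Counter


def unigram(texts):
    """Maximum-likelihood unigram language model over a list of texts."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def expand_query(query, past_queries, clicked_snippets, alpha=0.5):
    """Interpolate the current query model with an implicit-feedback model."""
    q_model = unigram([query])
    h_model = unigram(past_queries + clicked_snippets)
    vocab = set(q_model) | set(h_model)
    return {w: alpha * q_model.get(w, 0.0) + (1 - alpha) * h_model.get(w, 0.0)
            for w in vocab}


def score(model, doc):
    """Average probability of the document's words under the expanded model."""
    words = doc.lower().split()
    return sum(model.get(w, 0.0) for w in words) / len(words)


# A programmer's ambiguous query "apple", expanded with their search context.
model = expand_query("apple",
                     past_queries=["java programming"],
                     clicked_snippets=["apple swift programming guide"])
results = ["apple orchard fruit harvest", "apple swift programming tutorial"]
ranked = sorted(results, key=lambda d: score(model, d), reverse=True)
print(ranked)
```

The same query from a user whose history is about fruit would produce a different expanded model and therefore a different ranking.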

LINK AND CONTENT PERSONALIZATION

Hyperlink-based personalized search systems have the problem that they do not clarify whether their search results actually satisfy each user's information need. This is because they do not personalize on the basis of the user's context, i.e., browsing patterns, bookmarks, and so on. Personalized Web sites have the following shortcomings: (1) in “Link Personalization,” users have to rate items or adjust sliders to obtain relevant information; and (2) in “Content Personalization” [6], the load on users becomes high because they have to answer questionnaires in advance to register their personal preferences or demographic information, and they have to update the registered information themselves if their interests change. In addition, recommender systems can provide serendipitous recommendations only if users are willing to rate items. In practice, however, most users are unwilling to rate items, even though users' ratings are key to achieving better recommendations. As a result, the accuracy of recommendations may be poor.

It is therefore not clear that approaches based on user ratings provide users with more relevant information that satisfies each user's information need. Instead, a search system should directly and exactly capture the changes in each user's preferences, without any user effort, in order to provide more relevant information to each user. Several approaches have been proposed for adapting search results to each user's information need in this way. The approach described in [6] is novel because it allows each user to perform a fine-grained search by capturing the changes in each user's preferences without any user effort. When a user submits a query to a search engine through a Web browser, the search engine returns search results corresponding to the query.

Based on the search results, the user may select a Web page in an attempt to satisfy his/her information need. The user may also access more Web pages by following the hyperlinks on the selected page and continue to browse. The system monitors the user's browsing history and updates his/her profile whenever the browsed page changes.

When the user submits a query the next time, the search results are adapted on the basis of his/her user profile.
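A minimal sketch of this monitor, update, and adapt loop, assuming the profile is a term-weight vector: each newly browsed page decays older interests by a hypothetical factor `decay` and adds the page's term counts, and the results of the next query are re-ranked by cosine similarity to the profile. This representation is an illustrative assumption; the cited system may use a different profile representation and update rule.

```python
import math
from collections import Counter


def tf_vector(text):
    return Counter(text.lower().split())


def update_profile(profile, page_text, decay=0.8):
    """Decay existing interests, then add the terms of the newly browsed page."""
    updated = Counter({t: w * decay for t, w in profile.items()})
    updated.update(tf_vector(page_text))
    return updated


def cosine(a, b):
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical browsing session: the profile is updated on every page change.
profile = Counter()
for page in ["fresh fruit market prices", "organic fruit orchard apple"]:
    profile = update_profile(profile, page)

# The next query ("apple") returns these results in the engine's order.
results = ["apple iphone release", "apple fruit nutrition"]
adapted = sorted(results, key=lambda r: cosine(tf_vector(r), profile),
                 reverse=True)
print(adapted)
```

The decay factor lets recent browsing dominate, so the profile follows changes in the user's interests without any explicit input.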

RERANKING

Information about a user can be gathered during web search, and search results can then be listed and ranked according to the user's preferences for personalization. In [2], this information is used to re-rank web search results within a relevance feedback framework. Most earlier personalized search systems relied on explicit relevance feedback; to achieve better performance, [2] instead uses implicit relevance feedback drawn from a rich collection of information about the user, including information stored on the client, the user's browsing history, and previously viewed web pages. The approach improves users' search experience by collecting search-related information such as previously issued queries and previously visited web pages [2].

The algorithms in [2] are developed for client-side personalized web search: all profile storage and processing is done on the client, where the richest user profile representation gave the best results and could still be represented efficiently. Changing a single parameter does not improve the search results; only some parameter choices lead to improvements, and these apply only to small extensions.
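One way to realize client-side re-ranking within a relevance feedback framework is to treat the documents stored on the user's machine as the relevant set and to weight each result term with the classic Robertson/Sparck Jones relevance weight. The sketch below does this under stated assumptions: the corpus statistics (`corpus_df`, `N`), the local documents, and the helper names are hypothetical, and [2] may adjust these statistics differently.

```python
import math


def rsj_weight(r, R, n, N):
    """Robertson/Sparck Jones relevance weight of one term.

    r: local (relevant) documents containing the term, R: local documents,
    n: corpus documents containing the term, N: corpus size.
    """
    return math.log(((r + 0.5) * (N - n + 0.5)) / ((n + 0.5) * (R - r + 0.5)))


def personal_score(snippet, local_docs, corpus_df, N):
    """Sum of term weights; terms missing from corpus_df are treated as n = 0."""
    R = len(local_docs)
    total = 0.0
    for term in snippet.lower().split():
        r = sum(term in doc for doc in local_docs)
        total += rsj_weight(r, R, corpus_df.get(term, 0), N)
    return total


# Hypothetical client-side data: the user's local documents as term sets.
local_docs = [{"apple", "pie", "recipe"}, {"fruit", "apple", "orchard"}]
# Hypothetical corpus document frequencies and corpus size.
corpus_df = {"apple": 100, "fruit": 50, "iphone": 40, "pie": 30, "store": 80}
N = 1000

results = ["apple iphone store", "apple pie fruit"]  # engine's original order
reranked = sorted(results,
                  key=lambda s: personal_score(s, local_docs, corpus_df, N),
                  reverse=True)
print(reranked)
```

Only the corpus statistics come from outside; the user's documents are read locally and never need to be sent to the search engine.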

Figure: Preprocessing

PERSONALIZING PROFILES

Several approaches create a user profile by capturing user information from search histories and by collecting the user's activities, which requires installing additional software. In [3], user profiles are instead built from the user's activity at the search site itself, and these profiles are used to provide improved personalized search results. A Google wrapper is used to collect the user's queries and the snippets of the examined search results, from which the profiles are built. Ontologies and semantic concepts are used to disambiguate words; the Semantic Web, an extension of the current web, supports such automatic techniques. Profiles built from queries and snippets improve the ranking of search results, although they can harm the ranking for a few queries. Moreover, these algorithms cannot dynamically adapt to the merging and splitting of concepts.
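The re-ranking step can be sketched as a rank combination: each result is classified into concepts, its conceptual similarity to the user profile produces a second ranking, and the two rankings are blended with a weight `alpha`. The concept labels, weights, and `alpha` below are hypothetical, and the sketch is only in the spirit of [3], not its exact formula.

```python
def conceptual_similarity(profile, concepts):
    """Dot product of the user's concept weights and a result's concept weights."""
    return sum(profile.get(c, 0.0) * w for c, w in concepts.items())


def rerank(results, profile, alpha=0.5):
    """results: (title, concept_weights) pairs in the search engine's order.
    alpha weights the conceptual rank; (1 - alpha) weights the original rank."""
    order = sorted(range(len(results)),
                   key=lambda i: conceptual_similarity(profile, results[i][1]),
                   reverse=True)
    concept_rank = {i: pos for pos, i in enumerate(order, start=1)}
    combined = {i: alpha * concept_rank[i] + (1 - alpha) * (i + 1)
                for i in range(len(results))}
    return [results[i][0] for i in sorted(combined, key=combined.get)]


# Hypothetical profile built from a user's past queries and snippets.
profile = {"Cooking": 0.7, "Computers": 0.1}
results = [
    ("Apple Inc. homepage", {"Computers": 1.0}),
    ("Apple (fruit) encyclopedia entry", {"Cooking": 0.3, "Science": 0.7}),
    ("Apple pie recipes", {"Cooking": 1.0}),
]
print(rerank(results, profile, alpha=0.7))
```

Setting `alpha` to 0 returns the engine's original order, which is one way to limit harm on queries where the profile is misleading.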


Figure: Personalized score frame

ADAPTIVE SEARCH RESULTS

Web search should adapt its results to users with various information needs. Sugiyama et al. [4] propose adapting search results to each user's need for relevant information without any user effort, and verify the effectiveness of this approach. A modified collaborative filtering algorithm uses the user's browsing history over the previous 24 hours to build a user profile that captures the user's preferences.
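The collaborative filtering idea can be illustrated with standard user-based filtering over term weights: a missing term weight in the target user's profile is predicted from other users' weights, weighted by their Pearson correlation with the target user. This is a generic sketch, not the specific modification used in [4]; the profiles and terms are invented for illustration.

```python
import math


def pearson(u, v):
    """Pearson correlation over the terms two profiles share."""
    common = [t for t in u if t in v]
    if len(common) < 2:
        return 0.0
    mu = sum(u[t] for t in common) / len(common)
    mv = sum(v[t] for t in common) / len(common)
    num = sum((u[t] - mu) * (v[t] - mv) for t in common)
    den = (math.sqrt(sum((u[t] - mu) ** 2 for t in common))
           * math.sqrt(sum((v[t] - mv) ** 2 for t in common)))
    return num / den if den else 0.0


def predict(target, others, term):
    """Target's mean weight plus correlation-weighted deviations of others."""
    mean_t = sum(target.values()) / len(target)
    num = den = 0.0
    for other in others:
        if term not in other:
            continue
        sim = pearson(target, other)
        mean_o = sum(other.values()) / len(other)
        num += sim * (other[term] - mean_o)
        den += abs(sim)
    return mean_t + num / den if den else mean_t


# Hypothetical term-weight profiles built from recent browsing.
target = {"apple": 0.9, "fruit": 0.8, "iphone": 0.1}
u2 = {"apple": 0.8, "fruit": 0.9, "iphone": 0.2, "orchard": 0.7}
u3 = {"apple": 0.2, "fruit": 0.1, "iphone": 0.9, "orchard": 0.1}
print(predict(target, [u2, u3], "orchard"))
```

The prediction lands above the target's mean weight because the similar user values "orchard" highly and the dissimilar user does not.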

On the web, hyperlink structure plays an important role: it underlies PageRank, which helps provide better search results. Two types of personalization are available, namely Link Personalization and Content Personalization. Link Personalization involves selecting the links that are more relevant to the user and changing the original navigation space by reducing or enhancing the relationships between web pages [4], while Content Personalization provides different information to different users. To improve the performance and effectiveness of the search engine, feedback is collected from users and their previous browsing history is analyzed. The drawbacks are that this information cannot be gathered for content such as movies and music, and that the user's long-term browsing history cannot be used.
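Link personalization is often illustrated with personalized PageRank, in which the random surfer teleports only to pages the user prefers instead of to all pages uniformly. The power-iteration sketch below assumes a hypothetical link graph and preference vector; it is a standard formulation, not necessarily the one used in [4].

```python
def personalized_pagerank(links, preference, damping=0.85, iters=100):
    """links: page -> outgoing pages; preference: page -> teleport weight."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    total = sum(preference.get(p, 0.0) for p in pages)
    teleport = {p: preference.get(p, 0.0) / total for p in pages}
    rank = dict(teleport)
    for _ in range(iters):
        new = {p: (1 - damping) * teleport[p] for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:  # dangling page: redistribute by the preference vector
                for q in pages:
                    new[q] += damping * rank[p] * teleport[q]
        rank = new
    return rank


# Hypothetical site: the same link graph, two users with different interests.
links = {
    "home": ["recipes", "gadgets"],
    "recipes": ["pie"],
    "gadgets": ["phones"],
    "pie": ["home"],
    "phones": ["home"],
}
cook = personalized_pagerank(links, {"recipes": 1.0})
tech = personalized_pagerank(links, {"gadgets": 1.0})
print(round(cook["pie"], 3), round(cook["phones"], 3))
```

Only the teleport vector differs between the two users, so the link structure is shared while the resulting importance scores are personal.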

USER CUSTOMIZABLE PRIVACY PRESERVING SEARCH

Personalized web search plays an important role in improving the performance of a variety of search engines on the internet. However, users are often unwilling to disclose their personal information during search, and this has become a major obstacle to personalized web search. Solving this problem requires collecting and analyzing user information to figure out the intention behind the issued query, while still protecting that information [5].

Two kinds of methods are used: click-log-based methods and profile-based methods. Click-log-based methods are stable, whereas profile-based methods are less stable but can improve the web search experience. Building on the profile-based approach, UPS (User customizable Privacy-preserving Search) adapts to each individual user's profile and reduces information loss when retrieving data from the search engine. It performs online generalization of user profiles [5], and the results reveal that UPS can achieve good search quality while protecting user information. The limitations of this work are that it does not capture richer relationships among topics, and that more advanced methods for building the user profile are needed to further improve performance.
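The flavor of online profile generalization can be shown on a topic hierarchy: while the privacy risk of the exposed profile exceeds a threshold, the most sensitive topic is replaced by its parent topic and its weight is merged upward. The risk measure (a plain sum of per-topic sensitivities), the greedy rule, the function names, and the example taxonomy are simplifying assumptions; UPS [5] defines its own risk and utility metrics.

```python
def parent(topic):
    """'Health/Diseases/Diabetes' -> 'Health/Diseases'; top-level -> None."""
    return topic.rsplit("/", 1)[0] if "/" in topic else None


def risk(profile, sensitivity):
    return sum(sensitivity.get(t, 0.0) for t in profile)


def generalize(profile, sensitivity, max_risk):
    """Greedily lift the most sensitive topic until risk <= max_risk (>= 0)."""
    prof = dict(profile)
    while risk(prof, sensitivity) > max_risk:
        worst = max(prof, key=lambda t: sensitivity.get(t, 0.0))
        weight = prof.pop(worst)
        up = parent(worst)
        if up is not None:  # merge the weight into the broader topic
            prof[up] = prof.get(up, 0.0) + weight
    return prof


# Hypothetical hierarchical profile and user-specified sensitivities.
profile = {"Arts/Music/Jazz": 0.3,
           "Health/Diseases/Diabetes": 0.4,
           "Computers/Programming/Java": 0.3}
sensitivity = {"Health/Diseases/Diabetes": 0.9,
               "Health/Diseases": 0.5,
               "Health": 0.1}
print(generalize(profile, sensitivity, max_risk=0.2))
```

The engine still learns that the user has some interest in health, which keeps part of the profile's utility, but not the specific sensitive condition.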

PRESERVING USER PRIVACY

Web search engines (e.g., Google, Yahoo, Microsoft Live Search) are widely used to find specific data among a huge amount of information in a minimal amount of time. However, these useful tools also pose a privacy threat to users: web search engines profile their users by storing and analyzing the past searches they submit. Current solutions to this threat propose new mechanisms that introduce a high cost in terms of computation and communication. Castellà-Roca et al. [7] propose a novel protocol specifically designed to protect users' privacy against web search profiling; the system provides a distorted user profile to the web search engine. The implementation details and the computational and communication results show that the proposed protocol improves on existing solutions in terms of query delay [7]. This scheme provides an affordable overhead while offering privacy benefits to users.
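The core idea of a distorted profile can be sketched as a query shuffle within a group: each member's query is submitted by a randomly chosen member, so the search engine attributes queries to the wrong users, and each answer is routed back to the user who asked. The cryptographic machinery that keeps the permutation secret in [7] is deliberately omitted, and the function names and example queries are hypothetical.

```python
import random


def shuffle_queries(group_queries, rng):
    """Assign each member's query to a randomly chosen submitter.

    The real protocol hides this permutation cryptographically; here it is a
    plain shuffle, so this sketch offers no actual privacy guarantee.
    """
    members = list(group_queries)
    queries = [group_queries[m] for m in members]
    rng.shuffle(queries)
    return dict(zip(members, queries))


def route_answers(submissions, answers, group_queries):
    """Submitters share the engine's answers; each member keeps the answer to
    the query it actually asked (queries are assumed to be distinct)."""
    answer_for_query = {q: answers[s] for s, q in submissions.items()}
    return {m: answer_for_query[q] for m, q in group_queries.items()}


group = {"alice": "diabetes symptoms", "bob": "cheap flights",
         "carol": "java tutorial"}
submissions = shuffle_queries(group, random.Random(7))
# What the search engine sees and answers: submitter -> query.
answers = {s: f"results for {q}" for s, q in submissions.items()}
delivered = route_answers(submissions, answers, group)
print(submissions)
print(delivered["alice"])
```

Over many rounds, each member's profile at the engine becomes a mixture of the whole group's queries, which is the distortion the protocol relies on.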

UTILITY SHARING OF PERSONAL DATA

The potential value of harnessing data about people to enhance online services, coupled with the growing ubiquity of online services, raises reasonable concerns about privacy. Both users and the hosts of online applications may benefit from the custom-tailoring of services; however, both may be uncomfortable with the access and use of personal information. There has been increasing discussion about incursions into the privacy of users implied by the general logging and storing of online data [4]. Beyond general anxieties about sharing personal information, people may more specifically be concerned about becoming increasingly identifiable: as increasing amounts of personal data are acquired, users become members of increasingly smaller groups of people associated with the same attributes.

Most work to date on personalizing online services has either ignored the challenges of privacy and focused solely on maximizing utility, or has completely bypassed the use of personal data. One vein of research has explored the feasibility of personalizing services with methods that restrict the collection and analysis of personal data to users' own computing devices (Horvitz, 2006). Research in this realm includes efforts to personalize web search by making use of the content stored on local machines, as captured within the index of a desktop search service. Rather than cutting off opportunities to make personal data available for enhancing online services, or limiting personalization to client-side analyses, Krause and Horvitz [8] introduce and study utility-theoretic methods that balance the costs of sharing personal data with online services against the benefits of personalization. Such a decision-theoretic perspective on privacy allows systems to weigh the benefits of adaptation against the costs of sensing and storage, according to users' preferences. Users can explicitly quantify their preferences about utility and privacy, and the system then solves an optimization problem to find the best trade-off.

The approach is based on two fundamental observations. The first is that, for practical applications, the utility gained by sharing personal data often has a diminishing-returns property: acquiring more information about a user adds decreasing amounts to the utility of personalization, given what is already known about the user's needs or intentions. The second is that, conversely, the more information is acquired about a user, the more concerning the breach of privacy becomes. For example, a set of individually non-identifying pieces of information may, when combined, narrow the user down to membership in a small group, or even identify an individual. The diminishing returns on utility and the concomitant accelerating costs of revelation are mapped to the combinatorial concepts of submodularity and supermodularity, respectively.
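The trade-off can be written as maximizing F(A) = U(A) - λ·C(A) over the set A of shared attributes, where U is submodular and C is supermodular. The sketch below uses a set-coverage utility (submodular) and the squared sum of attribute sensitivities as the cost (supermodular), and selects attributes with a plain greedy heuristic. These choices, the attribute names, and the parameter `lam` (standing for λ) are illustrative assumptions, not the algorithm of [8].

```python
def utility(shared, covers):
    """Submodular: distinct query intents the shared attributes help resolve."""
    return len(set().union(*(covers[a] for a in shared)))


def cost(shared, sensitivity, lam):
    """Supermodular: squared total sensitivity, so each extra attribute costs more."""
    return lam * sum(sensitivity[a] for a in shared) ** 2


def greedy_share(covers, sensitivity, lam=1.0):
    """Repeatedly add the attribute with the largest positive gain in U - lam*C."""
    shared = []
    while True:
        base = utility(shared, covers) - cost(shared, sensitivity, lam)
        best, best_gain = None, 0.0
        for a in covers:
            if a in shared:
                continue
            cand = shared + [a]
            gain = utility(cand, covers) - cost(cand, sensitivity, lam) - base
            if gain > best_gain:
                best, best_gain = a, gain
        if best is None:
            return shared
        shared.append(best)


# Hypothetical attributes, the query intents each one helps with, and how
# sensitive the user considers each attribute.
covers = {"country": {"weather", "news", "sports"},
          "language": {"news", "docs"},
          "age": {"sports", "music"},
          "zip": {"weather", "restaurants"}}
sensitivity = {"country": 0.5, "language": 0.5, "age": 1.0, "zip": 1.5}
print(greedy_share(covers, sensitivity))
```

Because the cost grows faster than the utility, the loop stops sharing once a further attribute would reveal more than it helps; with `lam=0.0` it would share everything.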

LATENT DIRICHLET ALLOCATION

Web usage is increasing day by day, and new features are needed accordingly. Common web search engines return many irrelevant answers to an individual user's search; as a result, users may lose hours of time, which makes searching inconvenient. To reduce these problems, LDA (Latent Dirichlet Allocation) is used. LDA is a relatively simple probabilistic topic model: it represents each document as a mixture of latent topics, and each topic as a probability distribution over words. It can therefore be used to match relevant text and to find relationships between keywords.

Relationships between keywords are useful for finding better search results, and they may improve the user's search experience compared with the other techniques discussed above.
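The following is a minimal collapsed Gibbs sampler for LDA in plain Python, followed by a keyword-relatedness score computed from the learned topic-word distributions. The hyperparameters (`alpha`, `beta`, `iters`), the toy corpus, and the relatedness measure (a sum over topics of the product of the two word probabilities) are assumptions for illustration, not a configuration taken from any cited work.

```python
import random


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists. Returns (theta, phi), where theta[d][j] is
    P(topic j | doc d) and phi[j][w] is P(word w | topic j).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * k for _ in docs]        # tokens in doc d assigned to topic j
    nkw = [[0] * V for _ in range(k)]    # times word v is assigned to topic j
    nk = [0] * k                         # total tokens assigned to topic j
    z = []                               # current topic of every token

    def move(d, v, t, delta):
        ndk[d][t] += delta
        nkw[t][v] += delta
        nk[t] += delta

    for d, doc in enumerate(docs):       # random initial assignment
        z.append([rng.randrange(k) for _ in doc])
        for w, t in zip(doc, z[d]):
            move(d, wid[w], t, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = wid[w]
                move(d, v, z[d][i], -1)  # remove token, then resample topic
                weights = [(ndk[d][j] + alpha) * (nkw[j][v] + beta)
                           / (nk[j] + V * beta) for j in range(k)]
                z[d][i] = rng.choices(range(k), weights=weights)[0]
                move(d, v, z[d][i], +1)

    theta = [[(ndk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
             for d, doc in enumerate(docs)]
    phi = [{w: (nkw[j][wid[w]] + beta) / (nk[j] + V * beta) for w in vocab}
           for j in range(k)]
    return theta, phi


def keyword_relatedness(phi, a, b):
    """How strongly two keywords are generated by the same topics."""
    return sum(topic.get(a, 0.0) * topic.get(b, 0.0) for topic in phi)


docs = [doc.split() for doc in [
    "apple fruit orchard juice",
    "fruit juice orchard harvest",
    "apple iphone software mac",
    "software mac iphone code",
]]
theta, phi = lda_gibbs(docs, k=2)
print(keyword_relatedness(phi, "fruit", "juice"),
      keyword_relatedness(phi, "fruit", "software"))
```

On such a tiny corpus the sampled topics can vary with the seed; for real collections an established LDA implementation would normally be used, with the same idea of comparing keywords and documents through their topic distributions.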

III. CONCLUSION

This paper presents various techniques used for personalized web search. Personalized web search is used to improve the quality of various search services on the internet; at the same time, it should protect the user's personal privacy without compromising search quality.

The methods and algorithms surveyed here aim to improve search quality while protecting the privacy of the user.

Future work is to find richer relationships between keywords and to protect the privacy of users in personalized web search.

IV. REFERENCES

[1] X. Shen, B. Tan, and C. Zhai, “Implicit User Modeling for Personalized Search,” Proc. 14th ACM Int’l Conf. Information and Knowledge Management (CIKM), 2005.

[2] J. Teevan, S.T. Dumais, and E. Horvitz, “Personalizing Search via Automated Analysis of Interests and Activities,” Proc. 28th Ann. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR), pp. 449-456, 2005.

[3] M. Speretta and S. Gauch, “Personalizing Search Based on User Search Histories,” Proc. IEEE/WIC/ACM Int’l Conf. Web Intelligence (WI), 2005.

[4] K. Sugiyama, K. Hatano, and M. Yoshikawa, “Adaptive Web Search Based on User Profile Constructed without any Effort from Users,” Proc. 13th Int’l Conf. World Wide Web (WWW), 2004.

[5] L. Shou, H. Bai, K. Chen, and G. Chen, “Supporting Privacy Protection in Personalized Web Search,” IEEE Trans. Knowledge and Data Eng., vol. 26, 2014.

[6] B. Tan, X. Shen, and C. Zhai, “Mining Long-Term Search History to Improve Search Accuracy,” Proc. ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (KDD), 2006.

[7] J. Castellà-Roca, A. Viejo, and J. Herrera-Joancomartí, “Preserving User’s Privacy in Web Search Engines,” Computer Comm., vol. 32, no. 13/14, pp. 1541-1551, 2009.

[8] A. Krause and E. Horvitz, “A Utility-Theoretic Approach to Privacy in Online Services,” J. Artificial Intelligence Research, vol. 39, pp. 633-662, 2010.
