Web Query Classification based on Click Through Approach for Personal Retrieval Performance

(1)

Saranya K, IJRIT 664

International Journal of Research in Information Technology (IJRIT)

International Journal of Research in Information Technology (IJRIT) International Journal of Research in Information Technology (IJRIT) International Journal of Research in Information Technology (IJRIT)

www.ijrit.com www.ijrit.com www.ijrit.com

www.ijrit.com ISSN 2001-5569

Web Query Classification based on Click Through Approach for Personal Retrieval Performance

Saranya K

PG Scholar, Department of CSE CMS College of Engineering Namakkal, Tamilnadu, India [email protected]

S VenKatesh M.E.,

Associate Professor (Sr.G),

Department of CSE CMS College of Engineering

Namakkal, Tamilnadu, India

Abstract— A broad-topic and unclear query, different users may have dissimilar search goals when they submit it to a search engine. Effective personalization cannot be achieved without accurate user profiles. This paper presents a method user interest is employed in the clustering process to achieve personalization effect. First, a framework to discover different user search goals for a query by clustering the proposed feedback sessions. Second, a novel approach is introduced to generate pseudo-documents. Third, a new criterion is put forward to evaluate performance of inferring search goals. Finally, several personalized systems that employ click through data to capture users’ interest have been proposed with rank.

Keywords—User search goals, feedback sessions, and pseudo-documents.

I. INTRODUCTION

In web search applications, queries are submitted to search engines to stand for the information requirements of users.

However, sometimes queries may not exactly represent users’ specific information needs since many unclear queries may cover a broad topic and dissimilar users may want to get information on different aspect when they submit the similar query. For example, when the query “the sun” is submit to a search engine, some users want to position the homepage of a United Kingdom newspaper, while some others want to learn the accepted information of the sun. Therefore, it is necessary and possible to capture different user search goals in information recovery.

Information need is a user’s particular need to find information, to satisfy his or her needs. It helps in improving user search engine importance and user familiarity. The future concept helps in optimizing the search engine base on users seek goals.

• First, a structure to infer dissimilar user search goals for a query by cluster feedback sessions. We express that clustering feedback sessions is more efficient than clustering search results or clicked URLs straightforwardly. Moreover, the distributions of dissimilar user search goals can be obtained rightfully after feedback sessions are clustered.

Second, a novel optimization method to combine the enrich URLs in a feedback session to form a pseudo-document, which can successfully return the information require of a user. Thus, we can tell what the user search goals are in specify.

Finally, a new criterion CAP to calculate the performance of user search goal termination based on restructuring web search results. Thus, we can resolve the number of user search goals for a query.

(2)

Saranya K, IJRIT 665

Fig.1 Web Data Extraction

II. FRAMEWORK OF THE APPROACH The framework consists of the two levels:

The Upper Level:

Feedback sessions are first extracted from user click through logs.

These are mapped to Pseudo-documents.

User search goals are indirect by clustering the pseudo documents, with some keywords.

The Lower Level:

The unique search results are restructed based on the user search goals indirect.

Then, evaluate the performance of restricting search results by Classified Average Precision (CAP) criterion.

This evaluation result is used as the feedback to select optimal number of search goals in the upper part.

Fig.2 System Architecture

(3)

Saranya K, IJRIT 666

III. REPRESENTATION OF FEEDBACK SESSIONS

The proposed feedback sessions includes both clicked and unclick URL in a single session.

The clicked URLs tell what users require.

Unclick URLs reflected what users do not care about. Unclick URLs after the last clicked URL are includes in the feedback sessions.

Fig.3 Sample Feedback Sessions

Map Feedback Sessions to Pseudo-Documents:

It is unsuitable to directly use feedback sessions for inferring user search goals. Thus some representation method is needed to describe feedback sessions in a more efficient and coherent way. Binary Vector Method can be used to represents the feedback session, where “1” represents “clicked” and “0” represents “unclick”.

Fig.4 Goal Text.

For a query, users will usually have some vague keywords representing their interests in their mind. These keywords called goal text, which reflect user information needs.

Building of a pseudo document includes two steps:

1) Representing the URLs in the feedback session: First develop the URLs with additional textual contents by extract the titles and snippets. In this way, every one URL in a feedback session is representing by a small text paragraph that consists of its titles and snippet. Some textual processes are implemented to those paragraphs, such as transforming all the letters to lowercases, stop and eliminating stop words.

(4)

Saranya K, IJRIT 667

2) Forming Pseudo-document based on URL representations: An optimization method to combine both clicked and

unclicked URLs in the feedback session.

Fig.5 mapping feedback sessions to pseudo-documents

It reflects what users desire and what they do not care about. It can be used to accurate goal text in user mind.

IV. EVALUATION BASED ON RESTRUCTURING WEB SEARCH RESULTS

User search goals are not predefined; hence evaluation of its inference is a huge problem. If user search goals are indirect properly, the search results can be restructured properly. Thus an evaluation method “classified Average Precision” is calculated. It helps to decide on the best cluster number.

1) Restructuring Web Search Results: It is an application of inferring user search goals. The indirect ones are represented by the feature representation of each URL in the search result.

Then categorize them into a cluster centered by the inferred search goals. This is done by choosing least distance between URL vector and user-search goal vectors. It is restructuring and ranking are complete.

2) Evaluation Criterion:

Form user click through logs, we get the relevant and irrelevant feedbacks.

The development of estimate method is as follows:

Average Precision (AP)

Voted Average Precision (VAP)

Classified Average Precision (CAP)

(5)

Saranya K, IJRIT 668

Fig.6 example for the calculation of AP, VAP, and Risk

V. EXISTING SYSTEM

In the existing system all the feedback sessions of a query are first extracted from user click-through logs and mapped to pseudo-documents. Then, user search goals are indirect by clustering these pseudo-documents and depicted with some keywords.

As we do not know the accurate number of user search goals in advance, several dissimilar values are try and the best possible value will be determined by the feedback from the lower part. In the lower part, the original search results are restructured based on the user search goals inferred from the top part. Then, evaluate the performance of restructuring search results by this evaluation criterion Classified Average Precision (CAP). And the estimate result will be used as the feedback to select the optimal number of user search goals in the upper part.

Limitations:

In accessible system it reschedules the original result based on feedback process.

1. It focus on clicked URL based feedbacks based on clicked calculation 2. It support human being URL calculation support

3. Didn’t reflect on the URL status process 4. Search history is based on the browser only.

5. Didn’t consider the history process.

V. PROPOSED SYSTEM

The existing feedback session consists of both clicked and unclick URLs and ends with the last URL that was clicked in a sole session. It is used to find what the user need on that time based on this clicked data presented feedback consists.

Most document-based methods center of attention on analyzing users’ clicking and browsing behaviors recorded in the users’

click throughout data. On Web search engines, click through data are important implicit feedback method from users. An example of click throughout data for the query “apple,” which contains a list of ranked search results accessible to the user, with classification on the results that the user has clicked on. It rearranges the result both based on user interest and most clicked URL links. The user attention is not considers by the single session or last search session. In future system create a user side view and monitor users search session any time and also from somewhere.

Advantages:

1. Dynamic interested also will change.

2. User interested based search result will focus first

(6)

Saranya K, IJRIT 669

3. URL ranking in takes place.

4. Search history will be worldwide process.

VI. CONCLUSION

First, initiate feedback sessions to be analyze to gather user search goals rather than search results or click URLs. Both the click URLs and the unclick ones before the last click are measured as user understood feedbacks and taken into account to make feedback sessions. Therefore, feedback sessions can reproduce user information needs more capably. Second, we map feedback sessions to pseudo documents to fairly accurate goal texts in user minds. The pseudo-documents can develop the URLs with additional textual contents including the titles and snippets. Based on these pseudo-documents, abuser search goals can then be exposed and depicted with various keywords. Finally, criterion CAP is formulated to estimate the performance of user search goal conclusion. Experimental results on user click-through logs from a commercial search engine express the helpfulness of our future methods.

The complication of this approach is low down and can be implementing in truth. The proposed approach and discover user search goals for some popular queries offline at first. Our proposed systems create user profile for each and monitor relevant document on ranked sequence user search session anytime from anywhere. Thus, Users can find what they want conveniently.

REFERENCES

[1] R. Baeza-Yates, C. Hurtado, and M. Mendoza, “Query Recommendation Using Query Logs in Search Engines,” Proc. Int’l Conf.

Current Trends in Database Technology (EDBT ’04), pp. 588-596, 2004.

[2] D. Beeferman and A. Berger, “Agglomerative Clustering of a Search Engine Query Log,” Proc. Sixth ACM SIGKDD Int’l Conf.

Knowledge Discovery and Data Mining (SIGKDD ’00), pp. 407-416, 2000.

[3] S. Beitzel, E. Jensen, A. Chowdhury, and O. Frieder, “Varying Approaches to Topical Web Query Classification,” Proc. 30th Ann. Int’l ACM SIGIR Conf. Research and Development (SIGIR ’07), pp. 783-784, 2007.

[4] H. Cao, D. Jiang, J. Pei, Q. He, Z. Liao, E. Chen, and H. Li, “Context-Aware Query Suggestion by Mining Click-Through,” Proc. 14th ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (SIGKDD ’08), pp. 875-883, 2008.

[5] H. Chen and S. Dumais, “Bringing Order to the Web: Automatically Categorizing Search Results,” Proc. SIGCHI Conf. Human Factors in Computing Systems (SIGCHI ’00), pp. 145-152, 2000.

[6] C.-K Huang, L.-F Chien, and Y.-J Oyang, “Relevant Term Suggestion in Interactive Web Search Based on Contextual Information in Query Session Logs,” J. Am. Soc. for Information Science and Technology, vol. 54, no. 7, pp. 638-649, 2003.

[7] T. Joachims, “Evaluating Retrieval Performance Using Clickthrough Data,” Text Mining, J. Franke, G. Nakhaeizadeh, and I. Renz, eds., pp. 79-96, Physica/Springer Verlag, 2003.

[8] T. Joachims, “Optimizing Search Engines Using Clickthrough Data,” Proc. Eighth ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining (SIGKDD ’02), pp. 133-142, 2002.

[9] T. Joachims, L. Granka, B. Pang, H. Hembrooke, and G. Gay, “Accurately Interpreting Clickthrough Data as Implicit Feedback,”Proc.

28th Ann. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR ’05), pp. 154-161, 2005.

[10] R. Jones and K.L. Klinkner, “Beyond the Session Timeout: Automatic Hierarchical Segmentation of Search Topics in Query Logs,”

Proc. 17th ACM Conf. Information and Knowledge Management (CIKM ’08), pp. 699-708, 2008.

[11] [12] R. Jones, B. Rey, O. Madani, and W. Greiner, “Generating Query Substitutions,” Proc. 15th Int’l Conf. World Wide Web (WWW

’06), pp. 387-396, 2006.

[12] U. Lee, Z. Liu, and J. Cho, “Automatic Identification of User Goals in Web Search,” Proc. 14th Int’l Conf. World Wide Web (WWW

’05), pp. 391-400, 2005.

[13] X. Li, Y.-Y Wang, and A. Acero, “Learning Query Intent from Regularized Click Graphs,” Proc. 31st Ann. Int’l ACM SIGIR Conf.

Research and Development in Information Retrieval (SIGIR ’08), pp. 339-346, 2008.

[14] M. Pasca and B.-V Durme, “What You Seek Is what You Get: Extraction of Class Attributes from Query Logs,” Proc. 20th Int’l Joint Conf. Artificial Intelligence (IJCAI ’07), pp. 2832-2837, 2007.

[15] B. Poblete and B.-Y Ricardo, “Query-Sets: Using Implicit Feedback and Query Patterns to Organize Web Documents,” Proc. 17th Int’l Conf. World Wide Web (WWW ’08), pp. 41-50, 2008.

[16] D. Shen, J. Sun, Q. Yang, and Z. Chen, “Building Bridges for Web Query Classification,” Proc. 29th Ann. Int’l ACM SIGIR Conf.

[17] X. Wang and C.-X Zhai, “Learn from Web Search Logs to Organize Search Results,” Proc. 30th Ann. Int’l ACM SIGIR Conf. Research and Development in Information Retrieval (SIGIR ’07), pp. 87-94, 2007.

[18] J.-R Wen, J.-Y Nie, and H.-J Zhang, “Clustering User Queries of a Search Engine,” Proc. Tenth Int’l Conf. World Wide Web (WWW

’01), pp. 162-168, 2001.

[19]

H.-J Zeng, Q.-C He, Z. Chen, W.-Y Ma, and J. Ma, “Learning to Cluster Web Search Results,” Proc. 27th Ann. Int’l ACM SIGIR Conf.