The work presented in this dissertation addresses a rarely investigated topic: the potential of aggregating activity across multiple applications for user interest modeling. While there are theoretical or software frameworks for distributed user modeling, assessments of modeling techniques are almost always reported in terms of single applications. In this work, we present and evaluate a multi-application modeling technique that combines implicit and semi-explicit feedback across multiple everyday applications.
Our system and tool set supports a wide range of potential applications communicating with the user interest server. To affect the contents of the user interest model an application must be augmented to capture some information about content and its usage. The features described are occasionally specific to the applications (e.g. MS Word and PowerPoint, Firefox) but similar features would be available in most content producer and consumer applications involving text. Thus, the overall architecture and approach will generalize across a wide range of software applications. To the best of our knowledge, this is the first software framework designed to share semi-explicit and implicit relevance feedback among applications.
The evaluation of the alternative modeling techniques involved collecting activity data and post-task relevance assessments for a common type of activity: rapidly browsing/reading content and writing a report or presentation based on that content. While other types of information tasks exist, this is a frequent and broad enough category of task to warrant investigation. There is considerable effort involved in creating
94
an interest model server capable of communicating integrating with real-world applications like Word, PowerPoint, Adobe Acrobat and Firefox. While there is always more that can be done, we believe this infrastructure is substantial and at a reasonable point for assessment.
We have evaluated the effectiveness of the recommendation support from both semi-explicit model (authored/annotated text) and unified model (implicit + semi-explicit) and have found that they are successful in allowing users to locate the content easily because the relevant details are selectively highlighted and recommended documents (in VKB Search List) and passages within documents (in Firefox web browser) based on what the user has indicated interest in already and based on subtle changes of user’s indirect interest indicators.
The experimental results show that incorporating implicit feedback in page-level user interest estimation resulted in significant improvements when there is only indirect evidence available for user modeling. Furthermore, incorporating semi-explicit content (e.g. annotated text) with the authored text is effective in identifying segment-level relevant content. Although the study was not designed to test the VKB architecture or basic capabilities, we found that it performed well during the study as a search support interface. We find that the unified model is reasonable in assessing the document value when the semi-explicit (authored/annotated text) data is not available and comparable with semi-explicit only model when both types of feedback are available for inferring user interests. Participants find that the recommendations helped them in locating documents (in VKB Search List) that were more relevant and, once the relevant document is displayed
95
in the Firefox browser, it helped them find paragraphs that held relevant information. Participants also find that both semi-explicit and unified models are reasonable in help them locate content when their interests shift. We find that there is no significant difference between semi-explicit and unified models for supporting interest shifts, and we believe that when the participants are focused on the task-at-hand, they rarely noticed the subtle changes in the VKB Search List. We are interested in investigating how to incorporate interest shift for passage level interactions in the future.
Our results open up many possibilities for using unified feedback in medium and long-term information tasks, especially in the context of personalization of information delivery. Since we have a model that relates the unified feedback to ratings, we can use methods designed for explicit feedback on the unified data. In the future, we plan to study how semi-explicit feedback can be combined with implicit feedback for segment-level assessment and in additional personalized information delivery contexts.
Accurate models of user interest are valuable in personalizing the presentation of the often large quantity of information relevant to a query or other form of information request. Our current software framework helps by capturing user activity across multiple applications and combining this activity data in a user interest model to aid information delivery. In the future, we are interested in extending this user modeling framework based on the non-visible anatomical structure and its characteristics of the human eye.
96 REFERENCES
Abel, F., R. Baumgartner, A. Brooks, C. Enzi, G. Gottlob, N. Henze, M. Herzog, M. Kriesell, W. Nejdl and K. Tomaschewski (2005). "The personal publication reader," International Semantic Web Conference, Springer Berlin Heidelberg, pp.1050-1053. Abel, F., D. Heckmann, E. Herder, J. Hidders, D. Krause, E. Leonardi and K. Van Der Slujis (2009). "A framework for flexible user profile mashups," Workshop on
Adaptation and Personalization for Web 2.0, Trento, Italy, pp.1-11.
Abel, F., N. Henze, D. Krause and D. Plappert (2008). "User Modeling and User Profile Exchange for SemanticWeb Applications." Computer Law & Security Review 448(1): pp.4-9.
Agichtein, E., E. Brill and S. Dumais (2006). "Improving web search ranking by incorporating user behavior information," Proceedings of the ACM SIGIR conference on Research and development in information retrieval, New York, USA, pp.19-26. Assad, M., D. J. Carmichael, J. Kay and B. Kummerfeld (2007). "PersonisAD: Distributed, active, scrutable model framework for context-aware services,"
International Conference on Pervasive Computing, Springer Berlin Heidelberg, pp.55- 72.
Badi, R., S. Bae, J. M. Moore, K. Meintanis, A. Zacchi, H. Hsieh, F. Shipman and C. C. Marshall (2006). "Recognizing user interest and document value from reading and organizing activities in document triage," Proceedings of the 11th international conference on Intelligent user interfaces, Sydney, Australia, pp.218-225.
Bae, S., D. Kim, K. Meintanis, J. M. Moore, A. Zacchi, F. Shipman, H. Hsieh and C. C. Marshall (2010). "Supporting document triage via annotation-based multi-application visualizations," Proceedings of the 10th annual joint conference on Digital libraries, Surfer's Paradise, Australia, pp.177-186.
Barla, M. (2011). "Towards social-based user modeling and personalization."
Information Sciences and Technologies Bulletin of the ACM Slovakia 3(1): pp.52-60. Bennani, N., M. Chevalier, E. Egyed-Zsigmond, G. Hubert and M. Viviani (2012). "Multi-application profile updates propagation: a semantic layer to improve mapping between applications," Proceedings of the 21st international conference companion on World Wide Web, Perth, Australia, pp.949-958.
Bennett, P., R. White, W. Chu, S. Dumais, P. Bailey, F. Borisyuk and X. Cui (2012). "Modeling the impact of short-and long-term behavior on search personalization," Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, Portland, USA, pp.185-194.
97
Berkovsky, S., T. Kuflik and F. Ricci (2008). "Mediation of user models for enhanced personalization in recommender systems." User Modeling and User-Adapted Interaction 18(3): pp.245-286.
Bishop, C. (2007). "Pattern Recognition and Machine Learning," Springer, New York. Blei, D., A. Ng and M. Jordan (2003). "Latent dirichlet allocation." Machine Learning Research 3(1): pp.993-1022.
Brusilovsky, P., S. Sosnovsky and O. Shcherbinina (2005). "User modeling in a distributed e-learning architecture." User Modeling 2005. A. Liliana, Springer Berlin Heidelberg. 3538: pp.387-391.
Buscher, G., A. Dengel and L. Van Elst (2008). "Query expansion using gaze-based feedback on the subdocument level," Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, Singapore, pp.387-394.
Buscher, G., L. Van Elst and A. Dengel (2009). "Segment-level display time as implicit feedback: a comparison to eye tracking," Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval, Boston, USA, pp.67-74.
Carmagnola, F., F. Cena and C. Gena (2011). "User model interoperability: a survey." User Modeling and User-Adapted Interaction 21(3): pp.285-331.
Carpineto, C. and G. Romano (2012). "A survey of automatic query expansion in information retrieval." ACM Computing Surveys 44(1): pp.1-50.
Casquero, O., J. Portillo, R. Ovelar, J. Romo and M. Benito (2008). "iGoogle and gadgets as a platform for integrating institutional and external services," Proceedings of 1st Workshop of Mash-Up Personal Learning Environments, pp.37-41.
Chen, L. and K. Sycara (1998). "WebMate: a personal agent for browsing and
searching," Proceedings of the second international conference on Autonomous agents, Minneapolis, USA, pp.132-139.
Chen, M., J. Anderson and M. Sohn (2001). "What can a mouse cursor tell us more?: correlation of eye/mouse movements on web browsing," Extended abstracts on Human factors in computing systems, Seattle, USA, pp.281-282.
Claypool, M., P. Le, M. Wased and D. Brown (2001). "Implicit interest indicators," Proceedings of the 6th international conference on Intelligent user interfaces, Santa Fe, USA, pp.33-40.
98
Cooke, L. (2006). "Is the Mouse a" Poor Man's Eye Tracker"?," Annual Conference- Society for Technical Communication, Las Vegas, USA, pp.252-255.
Dim, E. and T. Kuflik (2012). "User models sharing and reusability: a component-based approach," In Proceedigns of User Modeling and Personalization Workshops, Montreal, Canada,
Dumais, S. (2005). "Latent semantic analysis." Annual Review of Information Science and Technology 38(1): pp.188-230.
Fox, S., K. Karnawat, M. Mydland, S. Dumais and T. White (2005). "Evaluating implicit measures to improve web search." ACM Transactions on Information Systems 23(2): pp.147-168.
Gauch, S., M. Speretta, A. Chandramouli and A. Micarelli (2007). "User profiles for personalized information access." The Adaptive Web, Springer Berlin Heidelberg: pp.54-89.
Gerard, S. (1971). "The SMART retrieval system: Experiments in automatic document processing." IEEE Transactions on Professional Communication 15(1): pp.17-18. Germanakos, P., N. Tsianos, Z. Lekkas, C. Mourlas and G. Samaras (2008). "Capturing essential intrinsic user behaviour values for the design of comprehensive web-based personalized environments." Computers in Human Behavior 24(4): pp.1434-1451. Gonzalez, E. and Y. Zhang (2005). "Accelerating the Lee-Seung algorithm for non- negative matrix factorization." Houston, USA, Rice University: pp.1-13.
Grudin, J. (1994). "Groupware and social dynamics: eight challenges for developers." Communications of the ACM 37(1): pp.92-105.
Guo, Q. and E. Agichtein (2008). "Exploring mouse movements for inferring query intent," Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, Singapore, pp.707-708.
Guo, Q. and E. Agichtein (2012). "Beyond dwell time: estimating document relevance from cursor movements and other post-click searcher behavior," Proceedings of the 21st international conference on World Wide Web. Lyon, France: pp.569-578.
Haiqin, Z., M. Zheng and C. Qingsheng (2003). "A study for documents summarization based on personal annotation," Proceedings of the Text summarization workshop. Stroudsburg, USA: pp.41-48.
99
Hall, D., D. Jurafsky and C. Manning (2008). "Studying the history of ideas using topic models," Proceedings of the conference on empirical methods in natural language processing, Honolulu, USA, Association for Computational Linguistics, pp.363-371. Harper, D. J. and D. Kelly (2006). "Contextual relevance feedback," Proceedings of the 1st International Conference on Information Interaction in Context, Copenhagen,
Denmark, pp.129-137.
Harvey, M., F. Crestani and M. Carman (2013). "Building user profiles from topic models for personalised search," Proceedings of the 22nd ACM international conference on Conference on information & knowledge management, Burlingame, USA, pp.2309-2314.
Hijikata, Y. (2004). "Implicit user profiling for on demand relevance feedback," Proceedings of the 9th international conference on Intelligent user interfaces, Madeira, Portugal, pp.198-205.
Houben, D. K., E. Leonardi and K. Van Der Slujis (2009). "A Framework for Flexible User Profile Mashups," Adaptation and Personalization for Web 2.0, Trento, Italy, pp.1- 10.
James, P., S. Hinrich , C. Todd , C. Rob , T. Don , E. Andy , A. Eytan and B. Thomas (2002). "Personalized search." Communications of the ACM 45(9): pp.50-55.
Jansen, B., A. Spink and T. Saracevic (2000). "Real life, real users, and real needs: a study and analysis of user queries on the web." Information processing and management 36(2): pp.207-227.
Jawaheer, G., M. Szomszor and P. Kostkova (2010). "Comparison of implicit and explicit feedback from an online music recommendation service," proceedings of the 1st international workshop on information heterogeneity and fusion in recommender
systems, Barcelona, Spain, pp.47-51.
Jayarathna, S., A. Patra and F. Shipman (2015). "Unified Relevance Feedback for Multi- Application User Interest Modeling," Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries, Knoxville, USA, pp.129-138.
Joachims, T., L. Granka, B. Pan, H. Hembrooke, F. Radlinski and G. Gay (2007). "Evaluating the accuracy of implicit feedback from clicks and query reformulations in web search." ACM Transactions on Information Systems 25(2): pp.1-27.
Jones, R. and K. L. Klinkner (2008). "Beyond the session timeout: automatic hierarchical segmentation of search topics in query logs," Proceedings of the 17th ACM conference on Information and knowledge management, Napa Valley, USA, pp.699-708.
100
Kantor, P. B., E. Boros, B. Melamed, V. Meñkov, B. Shapira and D. J. Neu (2000). "Capturing human intelligence in the net." Communications of the ACM 43(8): pp.112- 115.
Kelly, D. (2009). "Methods for evaluating interactive information retrieval systems with users." Foundations and Trends in Information Retrieval 3(1): pp.1-224.
Kelly, D. and N. J. Belkin (2001). "Reading time, scrolling and interaction: exploring implicit sources of user preferences for relevance feedback," Proceedings of the 24th annual international ACM SIGIR conference on Research and development in
information retrieval, New Orleans, USA, pp.408-409.
Kelly, D. and N. J. Belkin (2002). "A user modeling system for personalized interaction and tailored retrieval in interactive IR." Proceedings of the American Society for Information Science and Technology 39(1): pp.316-325.
Kelly, D. and N. J. Belkin (2004). "Display time as implicit feedback: understanding task effects," Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, Sheffield, UK, ACM, pp.377-384. Kelly, D. and J. Teevan (2003). "Implicit feedback for inferring user preference: a bibliography," ACM SIGIR Forum, New York, USA, pp.18-28.
Kim, J. Y., K. Collins-Thompson, P. N. Bennett and S. T. Dumais (2012).
"Characterizing web content, user interests, and search behavior by reading level and topic," Proceedings of the fifth ACM international conference on Web search and data mining, Seattle, USA, pp.213-222.
Kobsa, A. (2007). "Generic user modeling systems." The adaptive web, Springer Berlin Heidelberg: pp.136-154.
Kobsa, A. and J. Fink (2006). "An LDAP-based user modeling server and its evaluation." User Modeling and User-Adapted Interaction 16(2): pp.129-169. Kong, W., E. Aktolga and J. Allan (2013). "Improving passage ranking with user behavior information," Proceedings of the 22nd ACM international conference on Conference on information & knowledge management, Burlingame, USA, pp.1999- 2008.
Kríž, J. (2012). "Keyword extraction based on implicit feedback." Information Sciences and Technologies Bulletin of the ACM Slovakia 4(2): pp.43-46.
Krulwich, B. and C. Burkey (1997). "The InfoFinder agent: Learning user interests through heuristic phrase extraction." IEEE Expert: Intelligent Systems and Their Applications 12(5): pp.22-27.
101
Lee, D. D. and H. S. Seung (1999). "Learning the parts of objects by non-negative matrix factorization." Nature 401(6755): pp.788-791.
Leonardi, E., F. Abel, D. Heckmann, E. Herder, J. Hidders and G.-J. Houben (2010). "A flexible rule-based method for interlinking, integrating, and enriching user data," 10th International Conference on Web Engineering, Vienna Austria, Springer Berlin
Heidelberg, pp.322-336.
Limbu, D. K., A. Connor, R. Pears and S. MacDonell (2006). "Contextual relevance feedback in web information retrieval," Proceedings of the 1st International Conference on Information interaction in Context, København, Denmark, pp.138-143.
Liu, J. and N. J. Belkin (2010). "Personalizing information retrieval for multi-session tasks: The roles of task stage and task type," Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, Geneva, Switzerland, pp.26-33.
Liu, N. N., E. W. Xiang, M. Zhao and Q. Yang (2010). "Unifying explicit and implicit feedback for collaborative filtering," Proceedings of the 19th ACM international conference on Information and knowledge management, Toronto, Canada, pp.1445- 1448.
Lu, Z., D. Agarwal and I. S. Dhillon (2009). "A spatio-temporal approach to
collaborative filtering," Proceedings of the third ACM conference on Recommender systems, New York, USA, pp.13-20.
Majumder, A. and N. Shrivastava (2013). "Know your personalization: learning topic level personalization in online services," Proceedings of the 22nd international conference on World Wide Web, Raleigh, USA, pp.873-884.
Marchiori, E. (2013). "Class dependent feature weighting and k-nearest neighbor classification." Pattern Recognition in Bioinformatics, Springer Berlin Heidelberg. 7986: pp.69-78.
Marios, B., P. Efi, G. Panagiotis and S. George (2013). "Modeling users on the World Wide Web based on cognitive factors, navigation behavior and clustering techniques." Journal of Systems and Software 86(12): pp.2995-3012.
Marshall, C. C. and F. M. Shipman (1995). "Spatial hypertext: designing for change." Communications of the ACM 38(8): pp.88-97.
Martinez-Villaseñor, M. d. L., M. Gonzalez-Mendoza and N. Hernandez-Gress (2012). "Towards a ubiquitous user model for profile sharing and reuse." Sensors 12(10): pp.13249-13283.
102
McCallum, A. K. (2002). "Mallet: A machine learning for language toolkit." from mallet.cs.umass.edu.
Mehrotra, R. (2015). "Topics, Tasks & Beyond: Learning Representations for
Personalization," Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, Shanghai, China, pp.459-464.
Micarelli, A., F. Gasparetti, F. Sciarrone and S. Gauch (2007). "Personalized search on the world wide web." The adaptive web, Springer Berlin Heidelberg. 4321: pp.195-230. Morita, M. and Y. Shinoda (1994). "Information filtering based on user behavior
analysis and best match text retrieval," Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval, Dublin, Ireland, pp.272-281.
Moshfeghi, Y., L. R. Pinto, F. E. Pollick and J. M. Jose (2013). "Understanding relevance: An fMRI study." Advances in Information Retrieval, Springer Berlin Heidelberg. 7814: pp.14-25.
Mueller, F. and A. Lockerd (2001). "Cheese: tracking mouse movement activity on websites, a tool for user modeling," Extended abstracts on Human factors in computing systems, Seattle, USA, pp.279-280.
Navrat, P. (2012). "Cognitive traveling in digital space: from keyword search through exploratory information seeking." Open Computer Science 2(3): pp.170-182.
Nichols, D. (1998). "Implicit Rating and Filtering," Proceedings of the Fifth DELOS Workshop on Filtering and Collaborative Filtering, Budapest, pp.31-36.
Oard, D. W. and J. Kim (2001). "Modeling information content using observable behavior," Proceedings of the 64th Annual Conference of the American Society for Information Science and Technology, Washington, USA, pp.481-488.
Paliouras, G., M. Alexandros, C. Ntoutsis, A. Alexopoulos and C. Skourlas (2006). "PNS: personalized multi-source news delivery." Knowledge-Based Intelligent
Information and Engineering Systems. Bournemouth, UK, Springer Berlin Heidelberg. 4252: pp.1152-1161.
Pasi, G. (2010). "Issues in Personalizing Information Retrieval." IEEE Intelligent Informatics Bulletin 11(1): pp.3-7.
Pasi, G. (2014). "Implicit Feedback through User-system Interactions for Defining User Models in Personalized Search." Procedia Computer Science 39(1): pp.8-11.
103
Porteous, I., D. Newman, A. Ihler, A. Asuncion, P. Smyth and M. Welling (2008). "Fast collapsed gibbs sampling for latent dirichlet allocation," Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, Las Vegas, USA, pp.569-577.
Price, M., B. Schilit and G. Golovchinsky (1998). "XLibris: The active reading machine," CHI 98 Cconference Summary on Human Factors in Computing Systems, Los Angeles, USA, pp.22-23.
Rafter, R. and B. Smyth (2001). "Passive profiling from server logs in an online
recruitment environment," Workshop on Intelligent Techniques for Web Personalization at the the 17th International Joint Conference on Artificial Intelligence, Seattle, USA, pp.1-8.
Raman, K., P. N. Bennett and K. Collins-Thompson (2013). "Toward whole-session