Course Design Document
IS414: Search Engine Technologies
Version 2.7
6 June 2011
Table of Contents
1. Revision History
2. Overview of the Search Engine Technologies Course
3. Output and Grading Summary
4. Group Allocation for Assignments
5. Classroom Planning
6. List of Information Resources and References
7. Software Tools
1. Revision History
Version | Description of Changes | Author | Date
V1.0 | | Pang Hwee Hwa | 19-08-2006
V1.01 | Update the lesson plan | Pang Hwee Hwa | 10-12-2006
V1.02 | Add an investigation assignment, remove mid-term test, change class schedule | Pang Hwee Hwa | 18-12-2006
V2.0 | Change course title, replace investigation assignment with lab exercises and report, introduce case studies & best practices | Pang Hwee Hwa | 04-05-2007
V2.01 | Adjust individual versus group assessment weight | Pang Hwee Hwa | 12-07-2007
V2.02 | Update the learning outcomes | Pang Hwee Hwa | 16-08-2007
V2.03 | Base the application scenario of the search engine comparison on the project | Pang Hwee Hwa | 10-10-2007
V2.1 | Adjust class schedule | Pang Hwee Hwa | 13-05-2008
V2.2 | Revise curriculum | Pang Hwee Hwa | 06-08-2008
V2.3 | Adjust assessment weight | Pang Hwee Hwa | 21-11-2008
V2.4 | Revise curriculum | Pang Hwee Hwa | 20-05-2009
V2.5 | Add lesson on search engine marketing and Lucene labs, remove presentation of investigation findings | Pang Hwee Hwa | 31-07-2009
V2.6 | Revise learning outcome table | Pang Hwee Hwa | 09-06-2010
2. Overview of the Search Engine Technologies Course
2.1 Synopsis
An enormous amount of information is stored as free or unstructured text in personal, corporate and public databases. Even in enterprises that generate large quantities of numeric transactional data, unstructured text has been estimated to constitute more than 80% of the data. The textual data include emails, news articles, reports, product brochures and, of course, the ubiquitous web. Information technology is needed to facilitate the retrieval and analysis of these text collections, in order to support timely and informed business decisions.
This course will study how search engines crawl the web, and how they retrieve relevant documents from a text archive to satisfy a user query. We will also introduce classification and clustering techniques for automatically grouping documents by content, to improve the understandability of search results. We will learn how to promote the visibility of a Web site to its target searchers. In addition, we will examine the deployment possibilities of various search engine technologies. Through the course, students will acquire proficiency in both the technical concepts and the applications of search technology.
2.2 Prerequisites
• Able to write HTML pages
• IS201 Object Oriented Application Development (Java programming)
• Able to write simple JSP and Java applications
2.3 Objectives
Through this course, students will:
• Study basic text retrieval and mining techniques for unstructured text documents.
• Acquire hands-on experience with an open-source, Java-based search engine, so that they can build text search functions into their future projects.
• Learn how to promote the visibility of a Web site to the target searchers.
• Gain insights into the deployment possibilities of various search engine technologies.
• Design and promote their own cool search-enabled applications.
• Learn how to search the Internet effectively, and be discerning about the quality of the information gathered.
3. Output and Grading Summary
Week | Date | Output / Assessments | Individual Weighting | Group Weighting
1 | | | |
2 | | | |
3 | | | |
4 | | | |
5 | | Web-based search engine | | 5%
6 | | Quiz | 10% |
7 | | White paper on search product | | 5%
8 | | | |
9 | | | |
10 | | Product launch | | 10%
11 | | | |
12 | | | |
13 | | Final project submission | | 25%
14 | | | |
15 | | Exam | 30% |
 | | Contribution to class learning | 10% |
 | | Review of another team’s product | 5% |
Total | | | 55% | 45%
3.1 Participation (15%)
• Contribution to class learning: 10%
• Review report on assigned team’s product/service: 5%
3.2 Assignments (5%)
• Web-based search engine: 5%
[Figure: Social Network; Text Retrieval Techniques; Text Mining Techniques; Text Processing Applications (Text Search Applications)]
3.3 Project (40%)
The project is intended to complement the class materials, by getting students to investigate selected topics in greater depth or breadth.
• White paper on search product/service: 5% (up to 5 pages)
• Product launch: 10%
• Final submission: 25%
o Revised white paper (5%)
o Web site design, landing page optimization (15%)
o Keyword selection for ad campaign (5%)
3.4 Exam (40%)
• Mid-term quiz: 10%
• Final exam: 30%
• Students are allowed to bring one sheet of notes for the final exam.
4. Group Allocation for Assignments
Students will form teams of 5 or 6 members for the following assignments:
• Lab assignment
• Project
5. Classroom Planning
There is one 3-hour class each week, split into two sessions of 1.5 hours. In general, one session is used for lectures, while the other is for labs and discussions. However, there may be variations from week to week as appropriate.
5.1 Course Schedule Summary
Composition:
• Text Search & Retrieval
• Text Mining
• Text Applications
• Class Administration
Week | Topic | Slides | Readings
1 | Document Retrieval Process, Boolean Model | | C1: Chapter 1, Sections 2.1-2.5.2
2 | Vector Space Model; Basic Vectors & Matrices | | C1: Section 2.5.3; C2: Sections 27.1-27.2
3 | Web Search Engines | |
4 | Web Link Analysis; Project Initiation | | C2: Section 27.4; C5: Pages 172-175; C10
5 | Clustering Search Engines | | C2: Section 26.5; C3: Sections 8.1-8.4
6 | Quiz; Collaborative Filtering & Rule Mining | | C2: Section 26.3
7 | Faceted Search; Project Discussion | | C6, C7
8 | Break | |
9 | Search Engine Optimization | | C9
10 | Search Engine Marketing | |
11 | Product launch: Teams #1 to #3 | |
12 | Product launch: Teams #4 to #6 | |
13 | Product launch: Teams #7 to #8; Selected Topics in Search & Retrieval; Course Review | |
14 | Study Week | |
15 | Exam (29 November 2011, 5pm to 7pm) | |
5.2 Weekly Plan
Week: 1
Session 1:
• Introduction to the course
o Instructor
o Course objectives
o Topics to be covered
o Expectations
o Grading
o Project
• Exploration on the search task
Session 2:
• Lecture: Text Retrieval Process & Boolean Model
Reading:
• C1: Chapter 1, Sections 2.1-2.5.2
Project:
Week: 2
Session 1:
• Lecture: Vector Space Model – Part I
Session 2:
• Lecture: Vector Space Model – Part II
• Lucene Lab #1: Getting Started With Lucene
Assignment:
• Homework: Vector Space Model
Reading:
• Brief Notes on Vectors and Matrices
• C1: Section 2.5.3
• C2: Sections 27.1-27.2
Project:
• Team Formation
Week: 3
Session 1:
• Lecture: Web Search Engines – Part I
Session 2:
• Lecture: Web Search Engines – Part II
• Lucene Lab #2: Lucene Query APIs and Similarity Queries
Assignment:
Reading:
Project:
Week: 4
Session 1:
Session 2:
• Project Scenarios
• Lucene Lab #3: Supporting Common Document Formats
Assignment:
• Install WEKA
Reading:
• C2: Section 27.4
• C5: Pages 172-175
• C10
Project:
• Project Initiation
Week: 5
Session 1:
• Lecture: Clustering Search Engines – Part I
Session 2:
• Lecture: Clustering Search Engines – Part II
• Lab #4: Clustering with WEKA
• Discussion
Assignment:
• Due: Lucene Programming Assignment
Reading:
• C2: Section 26.5
• C3: Sections 8.1-8.4
Project:
Week: 6
Session 1:
• Quiz
• Lecture: Collaborative Filtering & Rule Mining – Part I
Session 2:
• Lecture: Collaborative Filtering & Rule Mining – Part II
Reading:
• C2: Section 26.3
• C8: FOCI - Flexible Organizer for Competitive Intelligence
Project:
Week: 7
Session 1:
• Lecture: Faceted Search
Session 2:
• Lab #5: Mining with WEKA
• Project Discussion
Assignment:
Reading:
• C6
• C7
Project:
• Due: White paper on search product/service
Week 8: Recess
Week: 9
Session 1:
• Lecture: Search Engine Optimization
Session 2:
• Practice: Search Engine Optimization
Assignment:
Reading:
• C9
Project:
Week: 10
Session 1:
• Lecture: Search Engine Marketing (Part I)
Session 2:
• Lecture: Search Engine Marketing (Part II)
Assignment:
Reading:
Project:
Week: 11
Session 1:
• Product Launch: Team #1 and Team #2
Session 2:
• Product Launch: Team #3
Assignment:
• Review Another Team’s Product/Service
Reading:
Project:
• “Public launch” of the search product/service
Week: 12
Session 1:
• Product Launch: Team #4 and Team #5
Session 2:
• Product Launch: Team #6
Assignment:
• Review Another Team’s Product/Service
Reading:
Project:
• “Public launch” of the search product/service
Week: 13
Session 1:
• Product Launch: Team #7 and Team #8
Session 2:
• Lecture: Selected Topics in Search & Retrieval
• Course Review
Assignment:
• Student Feedback
• Peer Assessment
Reading:
Project:
• “Public launch” of the search product/service
• Due: Final Submission
Week 14: Study Week
Week 15: Final Exam
6. List of Information Resources and References
6.1 Reference Books
C1. Modern Information Retrieval, by Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Addison-Wesley, 1999.
C2. Database Management Systems (Third Edition), by Raghu Ramakrishnan and Johannes Gehrke, McGraw-Hill, 2003.
C3. Data Mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber, Morgan Kaufmann, 2000.
C4. The Text Mining Handbook – Advanced Approaches in Analyzing Unstructured Data, by Ronen Feldman and James Sanger, Cambridge University Press, 2007.
C5. The Search – How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture, by John Battelle, Penguin Group, 2006. (Course Reserves - HD9696.8.U64 BAT 2005)
6.2 Reference Papers
C6. Ka-Ping Yee, Kirsten Swearingen, Kevin Li, Marti Hearst, “Faceted Metadata for Image Search and Browsing”, ACM CHI, 2003, 401-408.
C7. Marti Hearst, “Clustering versus Faceted Categories for Information Exploration”, Communications of the ACM, 49(4), 2006, 59-61.
http://flamenco.berkeley.edu/papers/cacm06.pdf
C8. Hwee-Leng Ong, Ah-Hwee Tan, Jamie Ng, Hong Pan, Qiu-Xiang Li, “FOCI: Flexible Organizer for Competitive Intelligence”, CIKM, 2001, 523-525.
http://citeseer.ist.psu.edu/568101.html
C9. “Search Engine Optimization – What’s in it for you?”, SEO Solutions.
http://whitepapers.techrepublic.com.com/abstract.aspx?docid=382376
C10. Gordon Bell and Jim Gemmell, “A Digital Life”, Scientific American Magazine, 18 February 2007.
http://www.scientificamerican.com/article.cfm?id=a-digital-life
6.3 Best Practices and Case Studies
S/No | Title | Author
1 | Open Source Search: Elixir or Poison (pdf) | George Everitt
2 | 7 Advanced Steps to Effective Search Marketing (pdf) | Omniture
3 | Analytics: Measuring the Impact of Search (pdf); Your Users Are Talking To You (pdf) | Ian Davies; Avi Rappoport
4 | The Death of Modern (Information) Architecture (pdf) | Paul Sonderegger
5 | Searching for a Reason (pdf) | Andy Feit
6 | Nine Ways to Fix Intranet Search (pdf) | James Robertson
7 | Restoring Browse in the Enterprise (pdf) | David Feldman
8 | The Faceted Navigation and Search Revolution (pdf) | Steve Papa
9 | Taxonomies, Metadata, and Search (pdf) | Seth Earley
10 | Search: The Quiet Revolution (pdf) | Susan Feldman
11 | Principles of Effective Search (pdf) | James Robertson
12 | Search Finds Usability (pdf); Searching for Search Usability (pdf) | Carl Frappaolo; Martin White
13 | The “Other” Search (pdf) | Steve Kusmer
14 | Social Work: Adding Social Network Analysis to Search (pdf) | Bill Ives
15 | Best of Both Worlds (pdf) | Jean Graef
16 | Providing Knowledge for Healthcare Professionals (pdf) | Phillip Britt
17 | Adaptive Search and Resolution for Service and Support (pdf) | Mark Angel
18 | SharePoint Search: An Enterprise Contender? (pdf) | Jean Graef
7. Software Tools
• Search Engines:
o Lucene: http://lucene.apache.org/java/docs/
o Indri: http://www.lemurproject.org/indri/
o Windows Search: http://www.microsoft.com/windows/products/winfamily/desktopsearch/choose/windowssearch4.mspx
o Google Desktop: http://desktop.google.com
o Google Advanced Search: http://www.google.com.sg/advanced_search?hl=en
• Mining:
o WEKA: http://www.cs.waikato.ac.nz/ml/weka/
o Text Analyst: http://www.megaputer.com/textanalyst.php3
8. Learning Outcomes, Achievement Methods and Assessment
IS414 – Search Engine Technologies
For each sub-skill that the course addresses, the entry gives the coverage level (Y or YY), the course-specific core competencies which address the outcome, and the faculty methods to assess the outcome.
1 Integration of business & technology in a sector context
1.1 Business IT value linkage skills
1.2 Cost and benefits analysis skills
1.3 Business software solution impact analysis skills
2 IT architecture, design and development skills
2.1 System requirements specification skills
2.2 Software and IT architecture analysis and design skills (YY)
Competencies:
• Identify the functionalities in a Web search engine architecture.
• Explain and differentiate between the Google architecture and the FAST Search architecture.
• Apply the design principles for faceted search.
Assessment: Grade quiz. Grade exam. Grade project design.
2.3 Implementation skills (Y)
Competencies:
• Set up a Web-based search engine by implementing a JSP page on Apache Tomcat that invokes Lucene.
Assessment: Grade lab exercises.
2.4 Technology application skills (YY)
Competencies:
• Explain the general model for text retrieval.
• Apply techniques for achieving exact matching versus similarity-matching, through the Boolean model and the Vector space model.
• Understand the impact of synonymy and polysemy on precision-recall performance, and employ appropriate techniques to compensate for them.
• Explain the techniques for query refinement.
• Explain the Google PageRank and Kleinberg’s authority-and-hub model for web link analysis.
• Apply the concepts of clustering in search engines.
• Apply the concepts of classification in search engines.
• Apply collaborative filtering and association rule mining in text recommendation.
• Apply search engine concepts and best practices in search engine optimization and marketing.
• Propose a search-enabled application on an existing platform, or that extends an existing application.
Assessment: Grade quiz. Grade final exam. Grade project submission.
3 Project management skills
3.2 Risk management skills
3.3 Project integration and time management skills
3.4 Configuration management skills
3.5 Quality management skills
4 Learning to learn skills
4.1 Search skills (Y)
Competencies:
• Make use of the advanced features of common search engines.
Assessment: Grade quiz. Grade final exam.
4.2 Skills for developing a methodology for learning (Y)
Competencies:
• Investigate an existing search platform (like Google Earth) or search application.
Assessment: Grade investigation report.
5 Collaboration (or team) skills
5.1 Skills to improve the effectiveness of group processes and work products
6 Change management skills for enterprise systems
6.1 Skills to diagnose business changes
6.2 Skills to implement and sustain business changes
7 Skills for working across countries, cultures and borders
7.1 Cross-national awareness skills
7.2 Business across countries facilitation skills
8 Communication skills
8.1 Presentation skills (Y)
Competencies:
• Give a presentation on the proposed product concept and design.
Assessment: Grade project presentation.
8.2 Writing skills (Y)
Competencies:
• Write an investigation report.
• Write a product proposal.
Assessment: Grade investigation report. Grade project submissions.
Y: This sub-skill is covered partially by the course
YY: This sub-skill is a main focus for this course