Course Design Document

IS414: Search Engine Technologies

Version 2.7

6 June 2011

Table of Contents

1. Revision History
2. Overview of the Search Engine Technologies Course
3. Output and Grading Summary
4. Group Allocation for Assignments
5. Classroom Planning
6. List of Information Resources and References
7. Software Tools


1. Revision History

Version | Description of Changes | Author | Date
V1.0 | | Pang Hwee Hwa | 19-08-2006
V1.01 | Update the lesson plan | Pang Hwee Hwa | 10-12-2006
V1.02 | Add an investigation assignment, remove mid-term test, change class schedule | Pang Hwee Hwa | 18-12-2006
V2.0 | Change course title, replace investigation assignment with lab exercises and report, introduce case studies & best practices | Pang Hwee Hwa | 04-05-2007
V2.01 | Adjust individual versus group assessment weight | Pang Hwee Hwa | 12-07-2007
V2.02 | Update the learning outcomes | Pang Hwee Hwa | 16-08-2007
V2.03 | Base the application scenario of the search engine comparison on the project | Pang Hwee Hwa | 10-10-2007
V2.1 | Adjust class schedule | Pang Hwee Hwa | 13-05-2008
V2.2 | Revise curriculum | Pang Hwee Hwa | 06-08-2008
V2.3 | Adjust assessment weight | Pang Hwee Hwa | 21-11-2008
V2.4 | Revise curriculum | Pang Hwee Hwa | 20-05-2009
V2.5 | Add lesson on search engine marketing and Lucene labs, remove presentation of investigation findings | Pang Hwee Hwa | 31-07-2009
V2.6 | Revise learning outcome table | Pang Hwee Hwa | 09-06-2010


2. Overview of the Search Engine Technologies Course

2.1 Synopsis

An enormous amount of information is stored as free or unstructured text in personal, corporate and public databases. Even in enterprises that generate large quantities of numeric transactional data, unstructured text has been estimated to constitute more than 80% of the data. This textual data includes emails, news articles, reports, product brochures and, of course, the ubiquitous web. Information technology is needed to facilitate the retrieval and analysis of these text collections, in order to support timely and informed business decisions.

This course will study how search engines crawl the web, and how they retrieve relevant documents from a text archive to satisfy a user query. We will also introduce classification and clustering techniques for automatically grouping documents by content, to improve the understandability of search results. We will learn how to promote the visibility of a Web site to the target searchers. In addition, we will examine the deployment possibilities of various search engine technologies. Through the course, students will acquire proficiency in both the technical concepts and applications of search technology.

2.2 Prerequisites

• Able to write HTML pages

• IS201 Object Oriented Application Development (Java programming)

• Able to write simple JSP and Java applications

2.3 Objectives

Through this course, students will:

• Study basic text retrieval and mining techniques for unstructured text documents.

• Acquire hands-on experience with an open-source, Java-based search engine, so that they can build text search functions into their future projects.

• Learn how to promote the visibility of a Web site to the target searchers.

• Gain insights into the deployment possibilities of various search engine technologies.

• Design and promote their own cool search-enabled applications.

• Learn how to search the Internet effectively, and be discerning about the quality of the information gathered.


3. Output and Grading Summary

Week | Date | Output / Assessments | Individual Weighting | Group Weighting
1-4 | | | |
5 | | Web-based search engine | | 5%
6 | | Quiz | 10% |
7 | | White paper on search product | | 5%
8-9 | | | |
10 | | Product launch | | 10%
11-12 | | | |
13 | | Final project submission | | 25%
14 | | | |
15 | | Exam | 30% |
| | Contribution to class learning | 10% |
| | Review of another team’s product | 5% |
Total | | | 55% | 45%

3.1 Participation (15%)

• Contribution to class learning: 10%

• Review report on assigned team’s product/service: 5%

3.2 Assignments (5%)

• Web-based search engine: 5%

[Figure. Labels: Social Network; Text Retrieval Techniques; Text Mining Techniques; Text Processing Applications; Text Search Applications]


3.3 Project (40%)

The project is intended to complement the class materials, by getting students to investigate selected topics in greater depth or breadth.

• White paper on search product/service: 5% (up to 5 pages)

• Product launch: 10%

• Final submission: 25%

o Revised white paper (5%)

o Web site design, landing page optimization (15%)

o Keyword selection for ad campaign (5%)

3.4 Exam (40%)

• Mid-term quiz: 10%

• Final exam: 30%

• Students are allowed to bring one sheet of notes for the final exam.

4. Group Allocation for Assignments

Students will form teams of 5 or 6 members for the following assignments:

• Lab assignment

• Project

5. Classroom Planning

There is one 3-hour class each week, split into two sessions of 1.5 hours. In general, one session is used for lectures, while the other is used for labs and discussions. However, there may be variations from week to week as appropriate.

5.1 Course Schedule Summary

Composition: Text Search & Retrieval; Text Mining; Text Applications

Week | Topic / Slides | Readings
1 | Class Administration; Document Retrieval Process, Boolean Model | C1: Chapter 1, Sections 2.1-2.5.2
2 | Vector Space Model; Basic Vectors & Matrices | C1: Section 2.5.3; C2: Sections 27.1-27.2
3 | Web Search Engines |
4 | Web Link Analysis; Project Initiation | C2: Section 27.4; C5: Pages 172-175; C10
5 | Clustering Search Engines | C2: Section 26.5; C3: Sections 8.1-8.4
6 | Quiz; Collaborative Filtering & Rule Mining | C2: Section 26.3
7 | Faceted Search; Project Discussion | C6, C7
8 | Break |
9 | Search Engine Optimization | C9
10 | Search Engine Marketing |
11 | Product launch: Teams #1 to #3 |
12 | Product launch: Teams #4 to #6 |
13 | Product launch: Teams #7 to #8; Selected Topics in Search & Retrieval; Course Review |
14 | Study Week |
15 | Exam (29 November 2011, 5pm to 7pm) |

5.2 Weekly Plan

Week: 1

Session 1:

• Introduction to the course

o Instructor

o Course objectives

o Topics to be covered

o Expectations

o Grading

o Project

• Exploration on the search task

Session 2:

• Lecture: Text Retrieval Process & Boolean Model

Reading:

• C1: Chapter 1, Sections 2.1-2.5.2

Project:

Week: 2

Session 1:

• Lecture: Vector Space Model – Part I

Session 2:

• Lecture: Vector Space Model – Part II

• Lucene Lab #1: Getting Started With Lucene

Assignment:

• Homework: Vector Space Model

Reading:

• Brief Notes on Vectors and Matrices

• C1: Section 2.5.3

• C2: Sections 27.1-27.2

Project:

• Team Formation

Week: 3

Session 1:

• Lecture: Web Search Engines – Part I

Session 2:

• Lecture: Web Search Engines – Part II

• Lucene Lab #2: Lucene Query APIs and Similarity Queries

Assignment:

Reading:

Project:

Week: 4

Session 1:

Session 2:

• Project Scenarios

• Lucene Lab #3: Supporting Common Document Formats

Assignment:

• Install WEKA

Reading:

• C2: Section 27.4

• C5: Pages 172-175

• C10

Project:

• Project Initiation

Week: 5

Session 1:

• Lecture: Clustering Search Engines – Part I

Session 2:

• Lecture: Clustering Search Engines – Part II

• Lab #4: Clustering with WEKA

• Discussion

Assignment:

• Due: Lucene Programming Assignment

Reading:

• C2: Section 26.5

• C3: Sections 8.1-8.4

Project:

Week: 6

Session 1:

• Quiz

• Lecture: Collaborative Filtering & Rule Mining – Part I

Session 2:

• Lecture: Collaborative Filtering & Rule Mining – Part II

Reading:

• C2: Section 26.3

• C8: FOCI - Flexible Organizer for Competitive Intelligence

Project:

Week: 7

Session 1:

• Lecture: Faceted Search

Session 2:

• Lab #5: Mining with WEKA

• Project Discussion

Assignment:

Reading:

• C6

• C7

Project:

• Due: White paper on search product/service

Week 8: Recess

Week: 9

Session 1:

• Lecture: Search Engine Optimization

Session 2:

• Practice: Search Engine Optimization

Assignment:

Reading:

• C9

Project:

Week: 10

Session 1:

• Lecture: Search Engine Marketing (Part I)

Session 2:

• Lecture: Search Engine Marketing (Part II)

Assignment:

Reading:

Project:

Week: 11

Session 1:

• Product Launch: Team #1 and Team #2

Session 2:

• Product Launch: Team #3

Assignment:

• Review Another Team’s Product/Service

Reading:

Project:

• “Public launch” of the search product/service

Week: 12

Session 1:

• Product Launch: Team #4 and Team #5

Session 2:

• Product Launch: Team #6

Assignment:

• Review Another Team’s Product/Service

Reading:

Project:

• “Public launch” of the search product/service

Week: 13

Session 1:

• Product Launch: Team #7 and Team #8

Session 2:

• Lecture: Selected Topics in Search & Retrieval

• Course Review

Assignment:

• Student Feedback

• Peer Assessment

Reading:

Project:

• “Public launch” of the search product/service

• Due: Final Submission

Week 14: Study Week

Week 15: Final Exam

6. List of Information Resources and References

6.1 Reference Books

C1. Modern Information Retrieval, by Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Addison Wesley, 1999.

C2. Database Management Systems (Third Edition), by Raghu Ramakrishnan and Johannes Gehrke, McGraw-Hill, 2003.

C3. Data Mining: Concepts and Techniques, by Jiawei Han and Micheline Kamber, Morgan Kaufmann, 2000.

C4. The Text Mining Handbook – Advanced Approaches in Analyzing Unstructured Data, by Ronen Feldman and James Sanger, Cambridge University Press, 2007.

C5. The Search – How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture, by John Battelle, Penguin Group, 2006. (Course Reserves - HD9696.8.U64 BAT 2005)

6.2 Reference Papers

C6. Ka-Ping Yee, Kirsten Swearingen, Kevin Li, Marti Hearst, “Faceted Metadata for Image Search and Browsing”, ACM CHI, 2003, 401-408.

C7. Marti Hearst, “Clustering versus Faceted Categories for Information Exploration”, Communications of the ACM, 49(4), 2006, 59-61.

http://flamenco.berkeley.edu/papers/cacm06.pdf

C8. Hwee-Leng Ong, Ah-Hwee Tan, Jamie Ng, Hong Pan, Qiu-Xiang Li, “FOCI: Flexible Organizer for Competitive Intelligence”, CIKM 2001:523-525.

http://citeseer.ist.psu.edu/568101.html

C9. “Search Engine Optimization – What’s in it for you?” SEO Solutions.

http://whitepapers.techrepublic.com.com/abstract.aspx?docid=382376

C10. Gordon Bell and Jim Gemmell, “A Digital Life”, Scientific American Magazine, 18 February 2007.

http://www.scientificamerican.com/article.cfm?id=a-digital-life

6.3 Best Practices and Case Studies

S/No | Title | Author
1 | Open Source Search: Elixir or Poison (pdf) | George Everitt
2 | 7 Advanced Steps to Effective Search Marketing (pdf) | Omniture
3 | Analytics: Measuring the Impact of Search (pdf) | Ian Davies
3 | Your Users Are Talking To You (pdf) | Avi Rappoport
4 | The Death of Modern (Information) Architecture (pdf) | Paul Sonderegger
5 | Searching for a Reason (pdf) | Andy Feit
6 | Nine Ways to Fix Intranet Search (pdf) | James Robertson
7 | Restoring Browse in the Enterprise (pdf) | David Feldman
8 | The Faceted Navigation and Search Revolution (pdf) | Steve Papa
9 | Taxonomies, Metadata, and Search (pdf) | Seth Earley
10 | Search: The Quiet Revolution (pdf) | Susan Feldman
11 | Principles of Effective Search (pdf) | James Robertson
12 | Search Finds Usability (pdf) | Carl Frappaolo
12 | Searching for Search Usability (pdf) | Martin White
13 | The “Other” Search (pdf) | Steve Kusmer
14 | Social Work: Adding Social Network Analysis to Search (pdf) | Bill Ives
15 | Best of Both Worlds (pdf) | Jean Graef
16 | Providing Knowledge for Healthcare Professionals (pdf) | Phillip Britt
17 | Adaptive Search and Resolution for Service and Support (pdf) | Mark Angel
18 | SharePoint Search: An Enterprise Contender? (pdf) | Jean Graef

7. Software Tools

• Search Engines:

o Lucene: http://lucene.apache.org/java/docs/

o Indri: http://www.lemurproject.org/indri/

o Windows Search: http://www.microsoft.com/windows/products/winfamily/desktopsearch/choose/windowssearch4.mspx

o Google Desktop: http://desktop.google.com

o Google Advanced Search: http://www.google.com.sg/advanced_search?hl=en

• Mining:

o WEKA: http://www.cs.waikato.ac.nz/ml/weka/

o Text Analyst: http://www.megaputer.com/textanalyst.php3

8. Learning Outcomes, Achievement Methods and Assessment

IS414 – Search Engine Technologies

For each sub-skill that the course covers, the entry gives the coverage in IS414 (Y or YY), the course-specific core competencies which address the outcomes, and the faculty methods to assess the outcomes.

Y: This sub-skill is covered partially by the course.
YY: This sub-skill is a main focus for this course.

1 Integration of business & technology in a sector context

1.1 Business IT value linkage skills

1.2 Cost and benefits analysis skills

1.3 Business software solution impact analysis skills

2 IT architecture, design and development skills

2.1 System requirements specification skills

2.2 Software and IT architecture analysis and design skills (YY)

Competencies:

• Identify the functionalities in a Web search engine architecture.

• Explain and differentiate between Google architecture and FAST Search architecture.

• Apply the design principles for faceted search.

Assessment methods: Grade quiz. Grade exam. Grade project design.

2.3 Implementation skills (Y)

Competencies:

• Set up a Web-based search engine by implementing a JSP page on Apache Tomcat that invokes Lucene.

Assessment methods: Grade lab exercises.

2.4 Technology application skills (YY)

Competencies:

• Explain the general model for text retrieval.

• Apply techniques for achieving exact matching versus similarity-matching, through the Boolean model and the Vector space model.

• Understand the impact of synonymy and polysemy on precision-recall performance, and employ appropriate techniques to compensate for them.

• Explain the techniques for query refinement.

• Explain Google PageRank and Kleinberg’s authority-and-hub model for web link analysis.

• Apply the concepts of clustering in search engines.

• Apply the concepts of classification in search engines.

• Apply collaborative filtering and association rule mining in text recommendation.

• Apply search engine concepts and best practices in search engine optimization and marketing.

• Propose a search-enabled application on an existing platform, or that extends an existing application.

Assessment methods: Grade quiz. Grade final exam. Grade project submission.

3 Project management skills

3.2 Risk management skills

3.3 Project integration and time management skills

3.4 Configuration management skills

3.5 Quality management skills

4 Learning to learn skills

4.1 Search skills (Y)

Competencies:

• Make use of the advanced features of common search engines.

Assessment methods: Grade quiz. Grade final exam.

4.2 Skills for developing a methodology for learning (Y)

Competencies:

• Investigate an existing search platform (like Google Earth) or search application.

Assessment methods: Grade investigation report.

5 Collaboration (or team) skills

5.1 Skills to improve the effectiveness of group processes and work products

6 Change management skills for enterprise systems

6.1 Skills to diagnose business changes

6.2 Skills to implement and sustain business changes

7 Skills for working across countries, cultures and borders

7.1 Cross-national awareness skills

7.2 Business across countries facilitation skills

8 Communication skills

8.1 Presentation skills (Y)

Competencies:

• Give a presentation on the proposed product concept and design.

Assessment methods: Grade project presentation.

8.2 Writing skills (Y)

Competencies:

• Write an investigation report.

• Write a product proposal.

Assessment methods: Grade investigation report. Grade project submissions.
