
User goals and tasks have been identified as a driving force for information retrieval in OPAC and online database interaction studies. They are also essential in the context of Web search engine research. Furthermore, users try to accomplish more diverse goals in this environment. After analyzing AltaVista user surveys and

search logs, Broder (2002) classified Web searches into three categories: informational

searches, transactional searches, and navigational searches. While informational queries accounted for about 50% of searches, transactional and navigational queries

took about 30% and 20%, respectively. The findings of this study showed that users

were not always searching for information; they also sought to carry out transactions, such as downloading, and to navigate to the specific URL of a site. Rose and Levinson (2004) extended Broder’s work, further creating a hierarchy of goals instead

Interactive IR in Web Search Engine Environments

of simple classifications. In the hierarchy, informational searches were refined to

have a series of subgoals: directed, undirected, to get advice, to locate information, and to obtain a list. Transactional searches were renamed “resource searches,” as

the underlying user goal is to obtain a resource, such as to download a file. The

richness of user goals in retrieving information requires Web search engines to have corresponding interfaces. However, these studies limited user goals only to the current search goal level.

Users not only have diverse goals for their current searches, but they also hold levels of goals in the Web environment. In her Web searching at-home study, Rieh (2004) validated Xie’s (2000) four levels of user goals in the Web environment: long-term goals (e.g., gain knowledge, professional achievement, etc.), leading search goals (prepare for an event, prepare for an online class, plan for a vacation, etc.), current search goals (look for papers, products, hotels, etc.), and interactive intentions (locate,

find, read, etc.). The levels of user goals also impose a goal structure in that higher

levels of user goals have an impact on lower levels of user goals. Furthermore, the

findings of this study indicated that people in a Web-searching environment engaged

in all four levels of goals, and they had more diverse tasks in the Web-searching

environment than in the work places identified by Algon (1997) and goals in the

libraries discussed by Xie (2000). In this environment, users sometimes looked for information just for curiosity or for entertainment purposes.

Researchers have also examined the impact of levels of goals and other factors on Web searching. Based on observation and interviews with 31 participants’ Internet and Web online catalogue searching, Slone (2003) examined how three levels of

goals—broad or situational, specific, and format—plus age differences influenced

search approaches. Broad goals represent the situations that lead users to search, such as educational, recreational, personal, and so forth, and they have an impact

on other goals. Specific goals are related to what users search for, such as a specific

subject, known organization, and so forth. Format goals are associated with the types of information users want, such as full-text articles, images, e-mail, and so

forth. The findings showed that children and adults older than 45 presented similar search approaches. One possible reason is that recreational goals were identified

more by children while personal goals were highly related to older adults, and both of these goals were found less motivating than educational or job-related goals.

Another significant finding is that the homogeneity of user goals is affected by age

group. Children (recreational goals) and adults older than 45 (personal goals) have homogeneous user goals, but the age groups of 18 to 25 years, 26 to 35 years, and 36 to 45 years all have multiple goals within a group.

Task, another term related to user goals, is an important variable that affects users’ behaviors and outcomes. Bilal (2002) compared children’s behavior and success on

three tasks: assigned fact-finding tasks, assigned research-oriented tasks, and self-generated tasks. Fifty percent of the children succeeded on the fact-finding tasks,

fully self-generated tasks. The results indicated that children were more successful on the fully self-generated tasks than on the other two types of tasks. Their success on the fully self-generated tasks was attributed to the simplicity of the topics, their ability to modify the topics as they needed to, and their motivation in pursuing topics of interest. Children also exhibited different behaviors for different types of tasks. They performed the highest analytic searches on fact-based and self-generated tasks and the lowest analytic searches on research-based tasks. They used more natural language queries on fact-based tasks, fewer on research-based tasks, and none on the fully self-generated tasks. They browsed more and made more moves on the fully self-generated tasks than on the other two tasks. They looped and backtracked more

searches on the fact-based tasks than other tasks. To sum up, tasks influence users’

search behaviors and performance.

Schacter, Chung, and Dorr (1998) found a similar difference in their study between

ill-defined tasks and well-defined tasks, which are comparable to research-based tasks and fact-based tasks. Children performed better on ill-defined tasks than on well-defined tasks, because ill-defined tasks require fewer analytical strategies. Children employed more analytic behaviors in achieving the well-defined tasks than in fulfilling the ill-defined tasks. The only difference is that Schacter et al. discovered that

children overwhelmingly used browsing strategies regardless of their tasks. Ford, Miller, and Moss (2002) examined the relationships between tasks and system performance. Even though the two selected tasks both fell into the category of fact-based

tasks, they represented tasks with different levels of difficulty. The results showed that simpler tasks correlated significantly with higher relevance scores. The findings

of this study echo the results of Bilal’s and Schacter et al.’s studies: retrieval performance is affected by task differences.

Not only tasks but also the interactions between tasks and other variables have an impact on Web search activities. Kim and Allen (2002) explored the cognitive and

task influences on Web search activities and outcomes based on two experiments. The results showed that tasks had a significant effect on search outcomes as well as

search activities. Relatively high precision and recall were related to known-item tasks, which is comparable to the results of previous studies. The interactions among

task effects, cognitive abilities and problem-solving styles influenced the number of

searches completed, sites viewed, keywords searched, and bookmarks made. The interaction effect indicated that compared with other IR system environments, the

Web is more flexible for users to choose different search tools for different tasks.

Navarro-Prieto, Scaife, and Rogers (1999) associated tasks, search conditions, and levels of experience with users’ search strategies. They found that users’ cognitive

strategies were affected by types of task (fact-finding and exploratory), search conditions (whether the information they looked for was in Web-dispersed structure or category structure), and levels of users’ search experience. The type of task had

a strong influence on the experienced users’ search strategies. For example, in the


mixed strategy at the beginning, and selected a bottom-up strategy later for the

specific fact-finding task. Simultaneously, they chose a top-down strategy for the exploratory task. The interactions among multiple variables make it difficult for

researchers to uncover the relationships between tasks and searching behaviors. Further research is needed to reveal direct relationships between tasks and search behaviors/strategies.

Usage Pattern: Patterns of Query Formulation and Reformulation

Unlike studies on online databases, little research has investigated tactics or strategies in Web searching. Most studies of users’ interactions with search engines focus on patterns of query formulation and reformulation based on analysis of transaction logs of queries submitted to search engines or Web sites. AltaVista and Excite data are the most examined by researchers. Silverstein, Henzinger, Marais, and Moricz (1999) analyzed nearly one billion queries representing 285 million user sessions captured by the AltaVista search engine over a period of 6 weeks. They found some patterns of usage: (1) short sessions (average 2.02 queries per session), (2) short queries (average 2.35 terms per query), (3) minimal use of operators (average 0.41 operators per query with 80% of queries without any operators), (4) minimal viewing of results (average 1.39 screens per query), and (5) search topics mainly related to sex. In addition, researchers have analyzed Excite data over a long period (Jansen, Spink, & Saracevic, 2000; Spink & Jansen, 2004; Spink, Wolfram, Jansen, & Saracevic, 2001), and they tend to analyze logs quantitatively. They discovered the following usage patterns in Excite search queries, with results apparently consistent with those of AltaVista queries: (1) users do not frequently reformulate their queries (average 2.5 queries per session in 1997 and 2.3 queries in 2001); (2) users formulate short queries (average 2.4 terms per query in 1997 and 2.6 terms in 2001); (3) users do not view all the results (average 1.7 pages per query); (4) users increasingly submit Boolean queries over the years (5% of queries in 1997 and 10% in 2001); and (5) users’ search topics range from entertainment, recreation, and sex to e-commerce. Spink, Bateman, and Jansen’s (1999) Excite user survey results echo the results from log analysis. Users do not use many search terms or complex search strategies. They do not access many search features, either. The results of these studies indicated that users have a low level of interaction with search engines.
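The session-level measures reported in these log studies (queries per session, terms per query, share of queries with operators) can be illustrated with a minimal sketch. The log records, the `usage_summary` function, and the operator set below are all hypothetical; the cited studies used their own log formats, session definitions, and operator lists, which are not described here.

```python
from collections import defaultdict

# Hypothetical log records as (session_id, query_string); not real data.
LOG = [
    ("s1", "cheap flights"),
    ("s1", "cheap flights boston"),
    ("s2", "jaguar AND car"),
    ("s3", "weather"),
]

# Assumed operator set; real engines supported other modifiers as well.
BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def usage_summary(log):
    """Return mean queries per session, mean terms per query, and operator share."""
    if not log:
        raise ValueError("log must contain at least one query")

    # Group queries by session to count sessions.
    sessions = defaultdict(list)
    for session_id, query in log:
        sessions[session_id].append(query)

    total_terms = 0
    with_operator = 0
    for _, query in log:
        tokens = query.split()
        # Operators are counted separately from content terms.
        terms = [t for t in tokens if t not in BOOLEAN_OPERATORS]
        total_terms += len(terms)
        if len(terms) < len(tokens):
            with_operator += 1

    n_queries = len(log)
    return {
        "queries_per_session": n_queries / len(sessions),
        "terms_per_query": total_terms / n_queries,
        "operator_share": with_operator / n_queries,
    }


summary = usage_summary(LOG)
print(summary)
```

Whitespace tokenization and uppercase-only operator matching are simplifications; a different choice of either would change the averages, which is one reason figures from different log studies are only roughly comparable.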

After analyzing the transaction logs of 2000 queries derived from WebCrawler, Moukdad and Large (2001) reported similar Boolean operators’ usage by users (7.8%) and higher multiterm queries submitted by users (average 3.4 terms per

query). Although 28.7% of the queries had search modifiers, only 55.5% of them

were correctly used. This study also reported high usage of complete sentences

users were more sophisticated and able to specify the information needed or they used more complete sentences. Many users seemed to form a model that considered human-Web communication as human-human communication. Wang, Berry, and Yang (2003) analyzed longitudinal user queries submitted to an academic Web site during a 4-year period. They found that the patterns of user queries on the academic Web site and on search engines such as Excite and AltaVista were comparable; for example, most of the queries were unique, short queries. The longitudinal data presented similar patterns across time, especially the problem of null output. Thirty percent of queries consistently resulted in zero hits over the years. Lack of basic IR knowledge and misspelling contributed to the high number of zero hits.

Most of these studies focus on the identification of patterns of general query formula-

tion; significantly fewer focus on patterns of query reformulation. Silverstein et al.

(1999) reported that users did not modify their queries much (average 2.02 queries per session). Adding terms (7.1%), deleting terms (3.1%), and modifying operators

only (1.4%) together accounted for about 12% of the query reformulations, while complete modifications of queries comprised 35.2% of the query reformulations. That indicated

that users had to refine or change their information need based on the results of

their previous queries. Spink, Jansen, and Ozmutlu (2001) examined the patterns of query reformulation by Excite users based on a data set of 1,369 queries from 191 user sessions. Users made limited use of query reformulation. They found that only one in five users reformulated queries, and an average of 6.67 queries were entered by users who modified their queries. Users did not add or delete much in

their reformulations. Changing a term is the most common query reformulation,

because about 35% of queries that were modified had the same number of terms

as the preceding query. About an equal number of reformulations either increased (52%) or decreased (48%) the terms. Spink et al.’s analysis also showed less subject change, as 73% of user sessions included one topic and 27% consisted of two topics. These studies of query reformulations demonstrated limited query reformulations in the searching process, but they concentrated more on adding terms, deleting terms, and modifying operators.
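The term-level categories used in these studies (adding terms, deleting terms, changing a term, completely modifying a query) can be approximated by comparing the term sets of consecutive queries. The sketch below is illustrative only: the function names, labels, and decision rules are assumptions, not the coding schemes applied by Silverstein et al. or Spink et al.

```python
def classify_reformulation(previous, current):
    """Label how `current` differs from `previous`, using lowercase term sets."""
    prev_terms = set(previous.lower().split())
    curr_terms = set(current.lower().split())

    if prev_terms == curr_terms:
        return "repeat"       # same terms, possibly reordered
    if not prev_terms & curr_terms:
        return "new"          # no shared terms: treated as a complete change
    if prev_terms < curr_terms:
        return "add"          # every old term kept, at least one added
    if curr_terms < prev_terms:
        return "delete"       # at least one term dropped, none added
    return "substitute"       # some terms kept, some swapped


def label_session(queries):
    """Classify each consecutive pair of queries in one session."""
    return [classify_reformulation(a, b) for a, b in zip(queries, queries[1:])]


# Hypothetical session, not taken from any of the cited logs.
session = [
    "web search",
    "web search engines",
    "search engines",
    "image search",
    "hotel boston",
]
labels = label_session(session)
print(labels)
```

Set comparison ignores term order and repeated terms, so it cannot detect operator-only changes or spelling corrections; those would need additional rules.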

Bruza and Dennis (1997) analyzed the logs of a prototype search engine, manually categorizing 1040 Web queries into 11 query transformation types. They found that users frequently repeated a query that they had already submitted. Other main categories of reformulation were term substitutions, additions, and deletions, in order of frequency. The results also revealed that users did not often split compound terms; make changes to spelling, punctuation, or grammatical case; or use

derivative forms of words and abbreviations. Based on these findings, Bruza and

Dennis developed a hyperindex to aid users in query term additions and deletions

by presenting more specific terms that often contain contextual information. Lau

and Horvitz (1999) analyzed a data set of 4,960 queries on the Excite search engine. They hand-tagged the data and partitioned queries into classes representing different


refinement classes were derived from the data: new, generalization, specialization,

interruption, requests for additional results, duplicate queries, and blank queries. Their analysis revealed that most actions are either new queries or requests for additional information. Relatively few users refined their searches by specialization,

generalization, or reformulation.

Rieh and Xie (2001, 2006) examined query reformulation from a semantic level based on log data derived from Excite. They characterized the facets of query reformulation in Web searching and identified the patterns of multiple query reformulations

in sequences. The data consist of 313 search sessions from two data sets randomly sampled over two time periods. Three facets of query reformulation as well as nine subfacets were derived from the data. Most query reformulations involve changes of content, which account for 80.3% of query reformulations. About 14.4% of the

query reformulations are related to format alone, and only 2.8% of the modifications are associated with resource reformulation. More important, the analysis of

modification sequences generated eight distinct patterns: specified, generalized,

parallel, building-block, dynamic, multitasking, recurrent, and format reformulation.

Some of the identified reformulation patterns (for example, specified reformulation, parallel reformulation, generalized reformulation, recurrent reformulation, and building-block reformulation) are not necessarily new findings, as they have already been identified

in previous studies (e.g., Bruza & Dennis, 1997; Lau & Horvitz, 1999). However, this study examined these patterns of query reformulation based on analysis of sequences of multiple queries rather than of just one query movement.

In addition, this study identified new reformulation patterns, such as dynamic and multitasking reformulation. Saracevic’s (1996, 1997) stratified model, especially his

insightful comments about the direct interplay between the surface and deeper levels of interaction, was adapted as a theoretical framework for the study. The deeper-level cognitive, affective, and situational aspects are employed on the surface level to specify and modify queries. Query formulation and reformulation demonstrate the existence of the interplay. The deeper-level aspects of interactions can change frequently, which can lead to interactions on the surface level, for example, changes in queries or tactics. Rieh and Xie (2006) further developed a model of Web query reformulation and suggested interactive query reformulation tools.

Studies of patterns of query formulation and query reformulation demonstrated that users take the least-effort approach in Web searching. At the same time, their query reformulation process is dynamic in a variety of situations. It is imperative that the design of Web search engines support users’ query formulation and reformulation process. Yang (2005) calls for support features that can shift the cognitive burden from users to systems. One major problem of the above log analyses is that researchers examined only log data, which provide an overview of usage patterns. Log analysis can account for what users have done, but it cannot answer what directs user actions, and why.