CHAPTER 2. LITERATURE REVIEW
2.1 Personal information space
2.1.3 Categorization research in information science
2.1.3.2 Categorization behavior in the context of personal information management
Bowker and Star (1999) stated that “the categories represented on our desktops and in our medicine cabinets are fairly ad hoc and individual, not even legitimate anthropological folk or ethno classification … everyone uses and creates them in some form, and they are (increasingly) important in organizing computer-based work” (p. 6). In the context of personal information management, research has sought a better understanding of the kinds of categories that people construct for, and weave into, their everyday interactions with their surroundings. Of particular interest is how people organize a workspace containing a myriad of information artifacts (e.g., books, manuals, memos, and forms).
Malone (1983) presented one of the earliest studies of individuals’ information organization behavior in their own environment. Malone’s case study addressed how people organize information objects in their offices, with the aim of designing better office information systems. He interviewed various types of office workers and research scientists and explored the patterns of their behavior. He found that the two organizational units commonly used for grouping things are files and piles. While files are well organized and labeled, piles are often loosely defined stacks of mixed content. Malone observed that offices typically contain a large number of piles, in part because people tend to defer decisions about where to file things or what to do with them. He argued that the cognitive difficulty of categorizing (including labeling) information is a major factor in explaining this behavior. For example, many subjects in his study mentioned that a document often belongs to more than one category or is potentially related to several tasks, which makes it difficult to put it into a filing system. Malone found, however, that a certain pattern of organization holds whether or not a stack is well defined and coherent: people tend to arrange files and piles on their desk according to relative importance and the actions that need to be taken. From this observation, Malone concluded that people organize information in their workspace not only to make it easier to find later, but also to remind themselves of things to be done with it.
Kwasnik (1989) also explored people’s organization behavior in their offices, but specifically addressed “the influence of context on the process by which people organize and classify their own documents in their own information space” (p. 145). She interviewed eight university professors, asking them to describe the materials in their offices, explain how those materials were organized, and recall the “classification decisions” they had made about them. The basic premise of her study was that people’s classification behavior should be investigated in a natural setting because “classificatory decisions are always made in relation to something else” (p. 147). Participants were allowed to use their own words rather than choose terms from a given set, and the investigator imposed no constraints on how they described things. Indeed, the terms participants used to name or label basically similar items varied greatly and were often accompanied by further qualifying expressions. Both the nouns and the qualifying terms were analyzed to identify the dimensions along which classificatory decisions had been made. In the analysis, a set of coding categories representing these dimensions was incrementally derived from and applied to the data in order to discover which dimensions were commonly used across participants and which were most frequently invoked. The most frequently occurring dimensions were form, use, time, topic, and circumstance. When the dimensions were analyzed in groups, the group related to situational attributes (e.g., use) was the most frequently used, and the group related to document attributes (e.g., topic) was the second most frequent. These findings indicate that traditional classification systems, which rely almost entirely on document attributes, do not adequately support individuals’ needs in organizing their information space.
In addition, Kwasnik suggested that the great variety of terms participants used to label the same kinds of objects demonstrated that a document could be classified in a variety of ways, and thus pointed to the need to support multiple classification. Case (1991) investigated the way in which historians categorize and store the information they have collected. The general findings from interviews with twenty participants were comparable to those of Kwasnik (1989), in that similar dimensions (such as form, topic, and purpose or use) were found to be important. An interesting observation in this study was that historians consider four conceptual levels of storage, in order: physical space, form, topic, and treatment/purpose/quality. As Case summarized, “it is the physical space that is first given priority in document location, and then the physical form of the document. It is the third level that constrains the factor typically of most concern to the information scientists: that of the specific topics of the document” (p. 664). The most specific level, which concerns treatment (e.g., intellectual genre), purpose or use, or quality, is invoked last. Given that people deal with physical packages of information within a physical environment, it is not surprising that physical constraints are considered first in filing documents. Perhaps more interesting is that participants in this study made a topical categorization first and then proceeded to make more specific distinctions based on contextual attributes such as purpose and quality (e.g., a ‘good’ example for a specific argument). A similar pattern was in fact implied in Kwasnik’s (1989) findings: in a large number of cases, ‘form’ was used as the head noun, followed by qualifiers representing other dimensions (e.g., books to be used for a class). Kwasnik also stated that “neither the document attributes nor the situational attributes can be considered independently” (p. 156).
The dimensions interact: while a document’s intended use or value is often the most important factor in a classificatory decision (the end result), its content (topic) is a ‘given’ factor that forms the basis for any further evaluation of its relevance to a task. At this point, it is worthwhile to recall Barsalou’s (1991) theory of the complementary roles of ad hoc goal-derived categories and taxonomic categories. According to this theory, taxonomic categorization occurs first, building a world model, that is, people’s knowledge of their physical environment. Then, in the presence of a specific task, goal-derived categories are formed on top of the taxonomic categories. This account appears to explain the above findings well.
Whereas the studies reviewed above are concerned with people’s organization behavior in physical environments, other studies have addressed the same question in electronic environments. Barreau (1995) and Barreau and Nardi (1995) are well-known early examples. Barreau (1995) explored the factors that influence people’s classification decisions in their electronic environment; more specifically, she investigated “whether the factors which influence classification decisions in an electronic environment were consistent with the factors that Kwasnik observed for physical documents in an office” (p. 327). Barreau interviewed seven managers about their use of personal information management (PIM) systems, using a methodology similar to Kwasnik’s. Overall, the findings were analogous to Kwasnik’s, reaffirming that document attributes were not the sole consideration in making category decisions. The context in which documents are created and used has a significant impact on the way people identify and manage documents within their PIM systems. Document creation and use in the PIM systems were, however, more dynamic in nature, partly because of the temporal characteristics and variety of the tasks performed. Documents were organized to support the current project, and “rules that are applied for a period of time to reflect the priorities of the moment may soon be abandoned or forgotten” (p. 337). In addition, a pattern of satisficing strategies emerged: the managers usually did not use subdirectories to file documents and chose to leave them all in one directory.
People’s reluctance to use a hierarchical organization of directories or folders has also been reported in studies of how people manage and organize their bookmarks or email. Keller et al. (1997) described the various obstacles people encounter when using the bookmarking feature of a web browser, and argued that organizing bookmarks within a hierarchical folder system is particularly challenging because “a single piece of information is often relevant in multiple ways, and thus not easily categorized within a single folder” (p. 1104). A folder system not only makes it hard to decide where to put a bookmark, but also requires users to remember that decision when they need to access it. Abrahams et al. (1998) stated that “Users must continually tradeoff the cost of organizing their bookmarks and remembering which bookmarks are in which folders versus the cost of having to deal with a disorganized set of bookmarks” (p. 44). Their survey of 322 Web users and analysis of the bookmark archives of 50 users indicated that the majority of users choose not to organize bookmarks into folders. As long as the list can be scanned easily (up to a threshold of 35 bookmarks), users prefer an unstructured list, not only because it is easier but also because they want to retain its chronological order. Beyond that threshold, users create folders incrementally as the number of bookmarks grows. A similar pattern has been found in the way people manage email. Whittaker and Sidner (1996) found that people usually leave messages in their inbox rather than try to maintain a folder structure. There may be several reasons for this behavior. First, the cognitive difficulty of filing, discussed by Malone (1983), was evident. Second, the resulting folder structure may not be useful for retrieving messages later. As one participant put it, “any piece of information longer than five lines has at least several axes along which you might want to look it up and it really depends how you’re coming at it and what you’re thinking about at the time” (p. 279). That is, because the way people conceive of, and thus categorize, the information within a message can change over time, filing requires anticipating future use beyond the current context.
More recently, Gottlieb (2001, 2003) investigated users’ classificatory behavior in creating folder structures and assigning bookmarks within them. One purpose of the study was to see whether the factors that Kwasnik (1989) and Case (1991) identified as affecting people’s categorization of information in physical environments are relevant to bookmarking behavior. However, rather than examining people’s own collections of materials, Gottlieb recruited participants with similar backgrounds in finance and asked them to categorize the same set of Internet documents (websites). Based on the structures created by individual participants, customized questionnaires were developed for each participant to elicit the reasons or motivations behind particular classificatory decisions made in creating their own folder structure and placing specific items. Contrary to the highly contextual basis for organizing materials found in other studies, including Kwasnik (1995) in the physical environment and Barreau (1995) in the electronic environment, content attributes (as opposed to context attributes) were the factors most frequently cited as affecting participants’ categorization decisions. As Gottlieb acknowledged, this result might be attributed to the laboratory setting of the study. Similar observations have been made in cognitive psychology: when no specific context is given in an experiment, people tend to bring in the knowledge about categories that is most likely to be relevant across situations. That is, category decisions are made on the basis of context-independent features. Another interpretation offered by Gottlieb is that, even though an attribute such as author or topic is normally considered intrinsic to a document, its meaning may vary depending on the individual’s take on the information. By giving participants the same set of materials, the investigator could observe not only which factors or attributes were used to make classification decisions, but also whether the resulting classifications were similar or different. It was found that even when people used an identical set of attributes (e.g., topic and publisher) to make their decisions, the resulting classifications could be quite dissimilar; conversely, when people relied on different attributes, the end results could be quite similar.
2.1.4 Conclusion
Our understanding of cognitive categorization has been greatly extended over the last several decades. At first, categories were assumed to have clear, definitional boundaries and to simply mirror the discontinuities in the environment, independent of human beings. Both assumptions were shown to be incorrect by Rosch and other researchers in the 1970s. Categorization was found to depend largely on human perceptual functions, and empirical evidence indicated that categories have internal structures characterized by central tendency (prototype effects). Many models of category structure were proposed, based mainly on the idea of similarity to central prototypes or of family resemblance among exemplars. From the 1980s onward, theory-based approaches moved the research a step further, showing the impact of people’s knowledge and models of the world on categorization and, thus, the selective nature of categorization. More recently, in addition to the role of background knowledge, other contextual factors have also been studied. It is now generally accepted that categorization is a dynamic, context-dependent process.
Barsalou’s research on ad hoc or goal-derived categories was reviewed in detail in the previous section. Because it accounts for cognitive behavior in the context of everyday activities, Barsalou’s theory of goal-derived categories has been adopted by researchers in many fields, including managerial decision making (Kahneman & Miller, 1986), consumer behavior (Ratneshwar et al., 1996, 2001; Felcher et al., 2001), problem solving (Chrysikou, 2005), and medical diagnosis (Custers et al., 1996). This theory appears to be of particular value for explaining the cognitive processes behind information seeking, use, and management.
Categorization of information objects involves inextricably related dimensions, including the object’s physical package, its content, the individual’s situation (which leads them to highlight or overlook certain aspects of the object), and the object’s current use and potential relevance. Because of this complexity, the cognitive effort required to categorize information can be overwhelming.
Information organization within information systems often imposes a hierarchical structure and relies on controlled vocabularies. However, many researchers have noted the variability and subjectivity of individuals’ perceptions of and interactions with information objects. Research in personal information management further demonstrates that the way people organize and access information is highly context dependent. It follows that any information object can be categorized in a variety of ways. Balancing flexibility and stability in information access systems remains a challenge.
2.2 Social information space