CHAPTER 2. LITERATURE REVIEW

2.2 Social information space

2.2.2 Bibliometric Methods – Citation analysis

2.2.2.1 Studies using citation analysis

A basic assumption underlying most types of citation analysis is that a citation is an indicator of influence. As Wilson (1999) says, “[a] document is cited in another document because it provides information relevant to the performance and presentation of the research, such as positioning the research problem in a broader context, describing the methods used, or providing supporting data and arguments” (p. 126). This interpretation of citation, called the normative theory of citing, draws upon Merton’s norms of science, according to which scholars internalize and commit to norms that obligate them to give credit for ideas and to specify the sources of the knowledge upon which their research is based (Wouters, 1999). From this perspective, citations are regarded as a device for acknowledging intellectual debts. It then can be argued that “the research that scientists cite in their own papers represents a roughly valid indicator of influence on their work” (Cole & Cole, 1973, p. 220). If the cited work is assumed to have had an impact on, or to have been instrumental in, the research reported in the citing paper, it follows that the number of citations a paper receives in a subsequent body of literature reflects the degree of its influence or importance in the research area. This stance is supported by the findings of Cole and Cole (1971) that citation frequency was highly correlated with several other measures of prominence or quality such as peer ratings or grant awards: “The data available indicate that straight citation counts are highly correlated with virtually every refined measure of quality. . . There can be little doubt that large differences in the number of citations received by scientists do adequately reflect differences in the quality of the work” (Cole & Cole, 1971, p. 28).

This interpretation of citation counts, however, has been continually challenged. Doubt about the assumption has led in part to a number of studies investigating the actual function or role of citations and the motives of citing authors. A distinct line of citation research that is more interpretative and constructive stems from criticism of the normative theory of citing and of interpretations of citation data that rest on its presumptions (see section 2.2.2.2, Studies of citation behavior).

Perhaps a more important assumption, partly intuitive and partly informed by the normative theory, is the relatedness of content between cited and citing papers. Wilson’s (1999) description of citing behavior (quoted earlier) implies that there is a relationship between the substantive content of the two documents. Small (1978) points out that, because of the implicit assumption of a conceptual relationship, we usually expect to be able to connect a certain portion of the citing work to a cited work, and attempt to find the author’s rationale for citing particular works. The relationship between cited and citing papers further entails possible relationships between papers cited together. “If two documents are jointly cited by another document, they jointly contribute to the content and impact of that research document, and are associated by their role in that research document. Accordingly, the more two documents are co-cited from a body of literature, the greater is the association of their content, in the opinion of the authors of that body of literature. This leads to the cocitation analysis and its application in literature mapping and visualization studies” (Schneider & Borlund, 2004, p. 528).

2.2.2.1.2 Evaluative studies and relational studies

Broadly speaking, studies employing citation analysis can be divided into two groups depending on their general purpose or orientation: 1) evaluative studies, and 2) relational studies (Borgman & Furner, 2002). These two groups of studies rest on the assumption of influence and the assumption of conceptual relationship, respectively.

In evaluative studies, citation counts are used as indicators or measures of impact, quality, or performance. Commonly used units of analysis are individual publications, journals, authors, and research groups (White & McCain, 1997). The number of times a unit (a document, an author, etc.) is cited is the fundamental measure used in most studies. What this count actually measures is usually defined with reference to the unit of analysis and qualified in the specific context of a study. For instance, citation counts can indicate the relative performance of a researcher, the level of influence of a paper within a subject domain, or the quality of a research institution. This kind of analysis often produces a ranked list, which may be used as a basis for policy decisions (Borgman & Furner, 2002).

While evaluative studies are based on direct counts (or sometimes normalized frequencies) belonging to individual units, in relational studies co-occurrence of certain features (e.g., cocitation) is often used for measuring associations between units (e.g., pairs of highly cited documents). As in evaluative studies, various analytic units such as documents, authors, and journals have been used. The analytic techniques generally involve measurement of similarities, formation of clusters based on similarity measures, and spatial arrangement of clusters in a way that depicts their relatedness (Hummon & Doreian, 1989). The clusters represent topics, specialty areas, or research fields, while links between them show possible relationships (McCain, 1990). As a result of the analysis, a citation network or a visualized map can be produced. These maps or networks are then used to understand the overall patterns of communication and the intellectual structure of a domain (White & McCain, 1997).

Two basic similarity measures used in relational studies are bibliographic coupling (Kessler, 1963) and cocitation (Small, 1973; Small & Griffith, 1974). Papers are bibliographically coupled when they cite one or more papers in common. On the other hand, papers or authors are said to be co-cited when they are cited together by one or more papers published later. The difference between these two measures is clearly explained in the following figure taken from Garfield (1988). As shown in the figure, for a pair of papers A and B, bibliographic coupling measures the number of papers cited by both A and B, while cocitation measures the number of papers citing both A and B.

The basic premise for measuring similarities based on citation patterns is that authors cite earlier works that are conceptually related and relevant to the current work. A pair of papers is more likely to be related in content when they cite many of the same papers (bibliographic coupling) or when they are cited together in a great number of subsequent papers (cocitation). The level of similarity or the strength of relationships is assumed to be proportional to the count of bibliographic coupling or the frequency of cocitation (Calado et al., 2006).
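The two counts defined above can be illustrated with a minimal sketch. The reference lists below are hypothetical toy data, and the function names `bibliographic_coupling` and `cocitation_counts` are illustrative choices, not taken from any of the cited studies. The sketch computes raw counts only; actual studies typically normalize them before clustering.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each paper maps to the set of papers it cites.
references = {
    "A": {"P1", "P2", "P3"},
    "B": {"P2", "P3", "P4"},
    "C": {"A", "B", "P1"},
    "D": {"A", "B"},
    "E": {"A", "B", "P4"},
}


def bibliographic_coupling(references, a, b):
    """Number of papers cited by both a and b."""
    return len(references.get(a, set()) & references.get(b, set()))


def cocitation_counts(references):
    """For every pair of cited papers, count the papers that cite both."""
    counts = Counter()
    for cited in references.values():
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


coupling_ab = bibliographic_coupling(references, "A", "B")
cocitation = cocitation_counts(references)
cocitation_ab = cocitation[("A", "B")]

print("Coupling strength of A and B:", coupling_ab)
print("Cocitation frequency of A and B:", cocitation_ab)
```

In this toy data, A and B share two references (P2 and P3), a count that is fixed once both papers are published, while their cocitation frequency depends on how many later papers (here C, D, and E) cite them together.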

As a tool for mapping science, cocitation analysis has been more widely used because it allows an evolutionary perspective. Introducing cocitation analysis, Small (1973) posited that, “If it can be assumed that frequently cited papers represent the key concepts, methods, or experiments in a field, then cocitation patterns can be used to map out in great detail the relationships between these key concepts” (p. 265). In his view, the intellectual structure of science is composed of interconnected specialties, each of which is represented by a cluster of highly cited papers. A point of importance is that cocitation analysis enables dynamic linkages. Cocitation links between pairs of documents are not static properties of those documents. The links are constructed based on citations in later literature, outside the documents being linked. It is therefore possible to capture changes in the intellectual structure as they emerge over time, as well as the connectedness of specialties. Small and Griffith (1974) showed how cocitation analysis can be used to map the intellectual structure of specialties, starting from identifying pairs of highly cited documents linked by cocitation. Subsequently, cocitation analysis has been used to map research specialties of many disciplines (see White & McCain, 1989, pp. 140-146).

While the discussion of cocitation analysis so far has been based on the linkages between documents made by their joint citation in later documents, author cocitation analysis, developed by White and Griffith (1981), traces the linkages between authors and produces maps of prominent authors in selected domains. The unit of analysis is not a single document, but a set of documents by an author, i.e., the author’s oeuvre. Just as with document cocitation, it is assumed that “two authors are somehow related to each other if they are often jointly cited and that, the more frequently they are co-cited, the more closely they are related” (White, 1990, p. 84). The map of authors is drawn such that authors with perceived similarity (based on the frequency with which their works are jointly cited by other authors) are placed closer to one another. Maps produced from author cocitation analysis provide another representation of the intellectual structure of the chosen domains. Clusters of authors in the map may represent subject areas, specialties, schools of thought, etc. (McCain, 1990). Bayer et al. (1990) argue that, “While single works of an individual may precipitate scientific revolutions and new scientific paradigms (Kuhn, 1962), it is more generally the case that a body of writings by a scientist places that person in the intellectual and influence structure of a field” (p. 444).
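The shift from documents to oeuvres can be sketched as follows. This is a simplified illustration and not the procedure of White and Griffith (1981): the data are hypothetical, each cited document is assumed to have a single author, and the name `author_cocitation` is an illustrative choice. The resulting pair counts are the kind of raw input from which an author map would be derived.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: the (single) author of each cited document.
author_of = {"d1": "Smith", "d2": "Smith", "d3": "Jones", "d4": "Lee"}

# Hypothetical reference lists of three citing papers.
citing = {
    "c1": ["d1", "d3"],
    "c2": ["d2", "d3", "d4"],
    "c3": ["d1", "d2"],
}


def author_cocitation(citing, author_of):
    """Count, for each pair of authors, the citing papers that cite
    at least one work by each of them."""
    counts = Counter()
    for refs in citing.values():
        # Collapse cited documents into the set of cited authors, so that
        # any work from an author's oeuvre counts once per citing paper.
        authors = sorted({author_of[d] for d in refs if d in author_of})
        for pair in combinations(authors, 2):
            counts[pair] += 1
    return counts


counts = author_cocitation(citing, author_of)
for (first, second), n in sorted(counts.items()):
    print(first, second, n)
```

Note that paper c3 cites two works by the same author and therefore contributes no author pair, whereas c1 and c2 cite different works by Smith yet both count toward the Jones and Smith pair.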

2.2.2.1.3 Information retrieval/information filtering applications

Although bibliometric methods, especially citation analysis, are used mainly in the context of scholarly communication, in order to understand its processes and structures, the applicability of bibliometric methods has been discussed and tested in other areas. Most notably, from the outset of the development of citation indexes, Garfield has repeatedly emphasized their value as an information retrieval (finding) tool (Garfield, 1955; 1974; 1990; 1994). In his view, a citation index is similar to a traditional subject heading system in that a citation index can be used to bring related works together as well as to find a specific work. The difference is that a citation index is more flexible because it allows associative links and thus facilitates access from different perspectives: “By virtue of its different construction, it tends to bring together material that would never be collated by the usual subject indexing. It is best described as an association-of-ideas index, and it gives the reader as much leeway as he requires. Suggestiveness through association-of-ideas is offered by conventional subject indexes but only within the limits of a particular subject heading” (Garfield, 1955, p. 122). In the context of automatic indexing and information retrieval techniques, Salton noted the value of citation data for representing the subject content of documents and suggested that “documents processed in a retrieval system should normally carry bibliographic citation codes in addition to standard content indicators” (1971, p. 109). In a series of experimental studies, Shaw (1990a, 1990b, 1991a, 1991b) showed empirically that citation information can be employed in document representation in retrieval systems, with the following conclusion: “In the context of the CF [cystic fibrosis] Database and the single link clustering criterion, the capacity of citation descriptions to associate documents relevant to the same query and discriminate between those that are not is comparable or superior to subject descriptions” (Shaw, 1991b, p. 683).

More recently, bibliometric methods have been found useful in the context of web link analysis. Papers connected with citation relationships are analogous to web documents connected by means of hyperlinks. We could assume some kind of implicit value judgment or endorsement behind the citation decision and, even though the contexts are different, the decision to link to a specific web document implies a judgment of a similar kind, e.g., of quality, usefulness, or value. With the equivalent formal structure as well as similar assumptions, bibliometric methods have been successfully adopted for developing search algorithms. Indeed, Brin and Page’s (1998) PageRank algorithm, which is used in the Google search engine, is based on the same assumption. Specifically, the PageRank algorithm takes account of the number and quality of incoming links to a webpage in ranking the page. Another prominent example is Kleinberg’s (1999) HITS algorithm, based on the notions of ‘authority’ and ‘hub.’ The HITS algorithm is designed to search the web for authoritative sources on a topic. Based on the analysis of the link topology of the web, Kleinberg proposes that the web consists of ‘authority’ pages, which are authoritative sources with many incoming links, and ‘hub’ pages, which provide collections of links to authoritative sources. In the original algorithm, authorities and hubs are structurally defined, without relying on semantic information such as titles or link texts. Kleinberg’s algorithm was adopted in IBM’s Clever search engine (Chakrabarti et al., 1999). The algorithm also has been applied to perform various tasks, including identification of communities (Kumar et al., 1999) and clustering of web documents (Chakrabarti et al., 1998).
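The two link-based rankings described above can be sketched with simple iterative updates. This is a minimal illustration under stated assumptions, not the production form of either algorithm: the four-page link graph is hypothetical, every link target is assumed to appear as a key of `links`, the damping factor of 0.85 and the fixed iteration count are illustrative parameter choices, and the function names `pagerank` and `hits` are ours.

```python
# Hypothetical toy link graph: each page maps to the pages it links to.
links = {
    "hub1": ["auth1", "auth2"],
    "hub2": ["auth1", "auth2"],
    "auth1": ["auth2"],
    "auth2": ["auth1"],
}


def pagerank(links, damping=0.85, iterations=50):
    """Iteratively pass each page's rank along its outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            # A page without outlinks spreads its rank over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


def normalize(scores):
    """Scale scores to unit Euclidean length (no-op if all are zero)."""
    norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
    return {p: v / norm for p, v in scores.items()}


def hits(links, iterations=50):
    """Mutually reinforce hub and authority scores."""
    pages = list(links)
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages linking in.
        auth = {p: 0.0 for p in pages}
        for page, outlinks in links.items():
            for target in outlinks:
                auth[target] += hub[page]
        auth = normalize(auth)
        # Hub score: sum of authority scores of the pages linked to.
        hub = normalize({p: sum(auth[t] for t in links[p]) for p in pages})
    return hub, auth


ranks = pagerank(links)
hub_scores, auth_scores = hits(links)
print("PageRank:", ranks)
print("Hub scores:", hub_scores)
print("Authority scores:", auth_scores)
```

In this toy graph the pages with many incoming links receive the highest PageRank and authority scores, while the pages that only point to them receive the highest hub scores, using link structure alone and no textual information.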

2.2.2.2 Studies of citation behavior

In both evaluative and relational studies, citation analysis is used as a tool for describing or explaining some phenomenon, based on citation counts or patterns. There is another line of studies in which citation behavior per se is the subject to be investigated (Leydesdorff, 1998; Snyder, Cronin, & Davenport, 1995). In a recent ARIST review, Borgman and Furner (2002) noted a trend of interpretative and constructive approaches to studying citation behaviors and suggested that those studies can be categorized as theoretical.

This trend can be traced back to early critics of citation analysis questioning the assumption that citations can be used as valid indicators of impact, quality, importance, or utility. Edge (1977; 1979) argued that, since citation analysis is concerned only with formal communication manifested in publications and, thus, does not measure intellectual influence exerted by informal communication or through social relations, the result cannot adequately represent the influence structure. Gilbert (1977) interpreted citations as primarily rhetorical devices for authors to appeal to their readers. MacRoberts and MacRoberts (1986, 1987) cast doubt on the normative theory of citing by arguing that citing practices are incomplete and biased. Cole (2000) provided a historical review of the critics of citation analysis, especially criticisms of its use for evaluative purposes.

Controversies surrounding the reliability and validity of citation analysis have given rise to a series of studies adopting a qualitative method called citation context analysis or citation content analysis (Small, 1982). In these studies, the contexts of citations (for example, texts near footnote numbers or reference codes) were examined in order to identify what functions or roles citations have in citing works. Early findings showing that not all citations are of the same type (Chubin & Moitra, 1975; Moravcsik & Murugesan, 1975) led to the development of classification schemes or citation typologies based on the different cognitive functions citations may have (Cozzens, 1981). The underlying motivation for developing a classification or typology of citation is the hope that, by distinguishing different types of citations, it would be possible to more precisely define what is being measured in the quantitative analysis of citations and improve interpretation of the results.

While many classification approaches examine the nature of relationships between the citing work and the cited work, others focus more on factors affecting citers and study their reasons and motivations for citing particular works or authors. Surveys and interviews are the most common approaches. Shadish et al. (1995) surveyed authors of psychology journal papers about their reasons for citing. Case and Higgins (2000) provided a review of citer behavior studies and replicated the study of Shadish et al. in the field of communication. Although some differences between citing behaviors in the two disciplines were noted, the common high-level finding is that there is a general tendency for authors to cite what they consider exemplary work or “concept markers” in the research area, and that there is a spectrum of reasons for citing particular works, varying among authors.

In summary, studies attempting to answer questions about functions and roles of citations or about reasons and motivations of citers have contributed to the understanding that citation practices are more complex and multidimensional than the normative assumption suggests. Moreover, citer behavior is increasingly understood to be subjective, dynamically constructed, and affected by the situation. With this broadened understanding of citation behaviors, it is possible to draw an analogy between citation decisions and relevance judgments (Borgman & Furner, 2002). As Harter (1992) puts it, “An author who includes particular citations in his list of references is announcing to readers the historical relevance of these citations to the research; at some point in the research or writing process the author found each reference relevant. Relevance is the idea that connects IR to bibliometrics, and understanding it in one context should aid our understanding of it in the other” (pp. 612-613).