Active Component Repository Systems by
Yunwen Ye
B.Sc., Fudan University, China, 1987 M.S., Fudan University, China, 1990
A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment
of the requirements for the degree of Doctor of Philosophy Department of Computer Science
This thesis entitled:
Supporting Component-Based Software Development with Active Component Repository Systems
written by Yunwen Ye
has been approved for the Department of Computer Science
Gerhard Fischer
James Martin
Date
The final copy of this thesis has been examined by the signatories, and we find that both the content and the form meet acceptable presentation standards of scholarly work in the above
Supporting Component-Based Software Development with Active Component Repository Sys-tems
Thesis directed by Prof. Gerhard Fischer
It is widely believed and empirically proven that component reuse improves both the quality and productivity of software development. Before software components are reused, however, they must be located. Component repository systems provide a means to locate soft-ware components. Current component repository systems are designed to support the paradigm of development-with-reuse, which views reuse as a process independent of the whole software development process and relies on programmers to take the reuse initiative. Such systems fall short in supporting programmers who make no attempt to reuse because they do not know the existence of reusable components or they perceive reuse costs more than programming from scratch.
This dissertation advocates a paradigm shift from development-with-reuse to reuse-within-development, which views reuse as an integral part of software reuse-within-development, and component repository systems as information systems that augment programmers’ insufficient knowledge about reusable components and assist them in accomplishing their tasks. Active component repository systems—component repository systems equipped with active information delivery mechanisms—support reuse-within-development. They can be seamlessly integrated with pro-gramming environments. Through this integration, their active information delivery mechanism delivers task-relevant and user-specific components, without being given explicit reuse queries, to help programmers reuse unknown components and to reduce the cost of reuse.
An active component repository system, CodeBroker, has been developed and evaluated. CodeBroker runs continuously in the background of a programming environment and infers programmers’ needs for reusable components by monitoring their interactions with the
environ-iv ment. Potentially reusable components that match reuse queries extracted from comments and signatures in the programming environment are autonomously located and actively delivered to programmers. Formal evaluations of the CodeBroker system have indicated that it motivated programmers to reuse once relevant components were delivered, and that it was able to deliver components relevant to both the task and the background knowledge of programmers.
I feel very fortunate that my employer, Software Research Associates, Inc. (SRA), Tokyo, Japan, provided me the time and financial support to complete this research. In par-ticular, I thank Kouichi Kishida, executive vice president and technical director of SRA, for his lasting support and encouragement, without which I could not have finished this research. Yoshitaka Matsumura, Kaoru Hayashi, and Yoshikazu Hayashi have been excellent managers who have gone to great lengths to provide the best conditions for me to complete my research. I also want to thank my colleague Tomohiro Oda for his help.
I am grateful to the members of my thesis committee. Gerhard Fischer, my advisor, is simply the best advisor I could have found. His conceptual frameworks on Domain-Oriented Design Environments and on learning have provided the foundations for this research. Without his excellent skills in challenging my ideas and motivating me to think deeper, I could not have finished the research in this manner. Kumiyo Nakakoji, my mentor and role model, has provided immeasurable support, both emotionally and intellectually. She has been always there when I needed help. Brent Reeves has spent much time patiently listening to my sometimes rough ideas and reading my immature manuscripts, and has provided frank, yet friendly, critical feedback. His constructive criticism has been invaluable in guiding me to frame the research problem, prioritize my resources, and present my ideas clearly. The support from other members of my thesis committee, Ken Anderson, James Martin, and Walter Kintsch helped me to clarify my understanding, and their input is very much appreciated. In particular, I thank James Martin for his excellent course on Natural Language Processing, which introduced me to the research field
vi of information retrieval. That was one of the best courses I have ever taken.
Members of Center for LifeLong Learning and Design have been very supportive. I thank Taro Adachi for numerous, wide-ranging discussions that I have greatly enjoyed over the years. Jonathan Ostwald generously offered many times to listen to my thoughts and read my writings. His encouragement and feedback is greatly appreciated. I was extremely delighted to have as an officemate Eric Scharff, who had an answer to every computer problem I had, no matter whether it was a Mac, Windows or Linux problem. Many discussions with Rogerio de Paula helped me structure my thoughts. I thank Gerry Stahl, Hal Eden, Andy Gorman, and Francesca Iovine for their support.
Finally, I would like to thank my family members. I thank my parents, who have taught me the joy of learning and have always urged me to do my best. I thank my eldest daughter, Hanlu, for understanding when she had to spend many weekends being bored because dad had to work, and my 5-month-old daughter, Hanlei, for her innocent and sweet smiles which provided the best comfort after a day’s hard work. Most of all, I wholeheartedly acknowledge the endless love, understanding, and support that my wife, Yonghong Pan, has given to me. In particular, I thank her for her unabated confidence in me, which has cheered me greatly at times of frustration.
Chapter
1 Introduction 1
1.1 Motivation . . . 1
1.2 Goal of the Research . . . 3
1.3 Active Component Repository Systems . . . 6
1.4 The CodeBroker System . . . 7
1.5 Organization of the Dissertation . . . 9
2 Roles of Reusable Components in Programming 11 2.1 A Process Model of Programming . . . 11
2.2 Programming Knowledge . . . 13
2.3 Opportunistic Programming . . . 17
2.4 Benefits of Software Components in Programming . . . 19
3 Challenges of Software Reuse 22 3.1 Overview of Software Reuse . . . 22
3.2 General Issues of Component Reuse . . . 25
3.3 Creating Reusable Components . . . 29
3.4 Understanding the Cognitive Difficulties of Component Reuse . . . 32
4 The Component Locating Problem 40 4.1 No Attempt to Reuse . . . 40
viii 4.2 Paradigm Shift: From Development-with-Reuse to Reuse-within-Development 46
4.3 Information-Enriched Workspaces . . . 49
4.4 Active Component Repository Systems . . . 51
5 Active Information Systems 55 5.1 Basic Issues of Active Information Systems . . . 55
5.2 Acquiring Information of User Tasks . . . 59
5.3 Personalizing Information Delivery . . . 73
5.4 Dealing with Partial, Imprecise Queries . . . 75
5.5 Comparing Active Information Systems with an Example in the Real World . . 78
5.6 The Spectrum of Support for Locating Information . . . 79
6 Indexing and Retrieval Mechanisms in CodeBroker 82 6.1 Indexing and Retrieval Mechanisms . . . 83
6.2 Creating the Component Repository . . . 94
7 Locating and Delivering Components in CodeBroker 99 7.1 System Architecture . . . 99
7.2 Listener . . . 100
7.3 Fetcher . . . 103
7.4 Presenter . . . 105
7.5 The Retrieval-by-Reformulation Mechanism . . . 113
7.6 Summary of CodeBroker . . . 117
8 Evaluations of CodeBroker 119 8.1 Evaluating the Retrieval Mechanisms . . . 120
8.2 Empirical Evaluations of the CodeBroker System . . . 123
8.3 Findings about the Usage of CodeBroker . . . 128
8.5 Problems of CodeBroker and Needed Improvements . . . 143 8.6 Summary of Evaluations . . . 147
9 Related Work 149
9.1 Active Information Systems . . . 149 9.2 Component Repository Systems . . . 151 9.3 Intelligent Programming Environments . . . 154
10 Future Work and Conclusions 155
10.1 Future Work . . . 155 10.2 Summary . . . 157 10.3 Contributions . . . 159
Bibliography 161
Appendix
A The List of Queries and Relevant Components 173
B Questions Asked in the Post-Experiment Interview 176
C Abbreviations 178
x
Tables
Table
1.1 The rapid growth of the Java Core API library . . . 2
4.1 Relations between reuse mode, knowledge sources, and tool support . . . 54
5.1 A comparison between plan recognition and similarity analysis . . . 66
8.1 Average precision and recall values for LSA, Mixed (average of LSA and Okapi), and Okapi . . . 122
8.2 Programming knowledge and expertise of subjects . . . 125
8.3 Overall results of evaluation experiments with programmers . . . 129
8.4 Subjective evaluations of the CodeBroker system . . . 130
8.5 Experiment data regarding user models . . . 136
Figure
1.1 The location-comprehension-modification process of reusing components . . . 3
1.2 Software reuse failure modes . . . 4
1.3 Overview of CodeBroker . . . 9
2.1 The process model of programming . . . 14
2.2 A program and its program plans . . . 16
2.3 Orthogonality between program plans and software components . . . 17
2.4 The role of components in problem framing . . . 21
3.1 A cognitive model of the component reuse process . . . 34
4.1 Different levels of programmers’ knowledge about a component repository . . 42
4.2 The development-with-reuse paradigm . . . 47
4.3 The reuse-within-development paradigm . . . 50
5.1 Feedforward information delivery . . . 57
5.2 Autocompletion in Internet Explorer . . . 57
5.3 Feedback information delivery . . . 58
5.4 Two assumptions of similarity analysis . . . 63
5.5 The spectrum of support to information location . . . 80
xii
6.2 The process of creating a component repository from Java programs . . . 94
6.3 An example of a document generated by Javadoc . . . 96
6.4 The indexing format of method documents in CodeBroker . . . 97
7.1 The architecture of the CodeBroker system . . . 100
7.2 Component delivery based on concept queries only . . . 102
7.3 Component delivery based on both concept queries and constraint queries . . . 103
7.4 Presenting more information triggered by mouse movement . . . 106
7.5 An example discourse model . . . 107
7.6 An example user model . . . 109
7.7 An illustrative program for adaptive user modeling . . . 110
7.8 The Skip Components Menu . . . 113
7.9 The Direct Manipulation interface . . . 115
7.10 The Query Refinement interface . . . 116
7.11 Summary of CodeBroker . . . 118
Introduction
1.1 Motivation
A wide gap exists between the constantly increasing demands for complex software sys-tems and the capability of the software industry to deliver quality software syssys-tems in a timely and cost-effective manner. Software reuse, a development method of using existing reusable software components to create new programs, has been shown through empirical studies to im-prove both the quality and productivity of software development (Basili et al., 1996; Boehm, 1999). Software reuse also increases the evolvability of software systems because complex systems evolve faster when they are built from stable subsystems (Simon, 1996).
Programmers are knowledge workers, and programming is a process of progressive crys-tallization of their knowledge into a program. Knowledge needed during programming comes either from the programmer’s head or from such external sources as books, manuals, peer work-ers, and computerized information systems (Norman, 1993). A lack of needed knowledge is one of the major reasons for poor quality and productivity of programming. With the advent of objected-oriented technology, reusable software components now comprise the bulk of pro-gramming knowledge. Easy access to needed external information, in particular, reusable soft-ware components, to complement the insufficient knowledge of programmers is thus critical to the improvement of programming quality and productivity.
If programmers know a reusable software component well enough, they may integrate it into their programs whenever it is applicable without even realizing they are reusing
be-2 Version No. of Packages No. of Classes Year of Release
Java 1.0 8 211 1996
Java 1.1 23 503 1997
Java 1.2 59 1525 1998
Java 2 70+ 2100+ 1999
Table 1.1: The rapid growth of the Java Core API library
cause such reusable components become “ready-to-hand” to programmers (Winograd & Flores, 1986). However, repositories of reusable software components are often so large that program-mers cannot learn about all of the components before they start programming. Software compo-nent repositories are not static; they are constantly evolving with new compocompo-nents added and old components updated. As an example, Table 1.1 shows the rapid growth of the Java Core API (Application Programmer Interface) library—a repository of reusable components of classes and methods. Few Java programmers, if any, can claim that they know all the components in this library.
Programmers who have not learned the software component have to go through the reuse process if they want to reuse or use it in their programming. The reuse process consists of three steps: location, comprehension, and modification (Figure 1.1). Programmers have to locate those components that are potentially reusable in the current programming task from the com-ponent repository, comprehend their functionality and usage, and make necessary modifications if the components do not completely fit their needs (Fischer et al., 1991).
The foremost obstacle to the success of component reuse is that programmers cannot lo-cate needed software components quickly and easily. Locating reusable software components is often supported by component repository systems or reuse repository systems. Like many other information repository systems, browsing- and querying-oriented schemes have long served as the principal techniques for programmers to locate reusable software components. More innova-tive schemes, such as query by reformulation (Williams et al., 1982; Fischer & Nieper-Lemke, 1989; Henninger, 1993), information filtering (Belkin & Croft, 1992), and Latent Semantic
Location Modification Comprehension explanation reformulation extraction reformulation
Figure 1.1: The location-comprehension-modification process of reusing components Successful reuse requires programmers be able to locate, comprehend, and modify needed reusable components.
Analysis (Landauer & Dumais, 1997), have introduced new possibilities. Unfortunately, the problem remains that programmers simply do not actively search for components and make no attempt to reuse. According to a study by Frakes and Fox (Frakes & Fox, 1995), no attempt to reuse is the leading failure mode of software reuse (Figure 1.2). This inhibiting factor to the wide success of reuse has been reported again and again by software companies that have tried to introduce reuse into their organizations (Devanbu et al., 1991; Rosenbaum & DuCastel, 1995; Fichman & Kemerer, 1997).
1.2 Goal of the Research
Although many factors, such as the lack of managerial commitment and the difficulty in developing good reusable components, affect the widespread uptake of reuse, this research focuses on the cognitive difficulties faced by programmers who try to reuse, because only when programmers are willing and able to put reuse into their daily practice will reuse become fruitful. This research tries to create a conceptual framework to analyze what hinders program-mers from making attempts to locate reusable components and, based on the analysis, it
pro-4 poses a new approach to the design of component repository systems that can motivate and encourage programmers to reuse by reducing the difficulty of locating components.
By applying cognitive engineering (Norman, 1986) on the reuse process, a cognitive model of reuse is first built. Based on this cognitive model and past research on the effective use of large information repositories (Fischer, 2001), the following two barriers to the component locating process are identified.
Due to the large volume and constantly evolving nature of component repositories, programmers often fail to anticipate the existence of reusable components; when they do not believe that a component exists in the repository, they will not even make an
Figure 1.2: Software reuse failure modes
In the Frakes and Fox (1995) paper, seven conditions—attempting to reuse, compo-nents existing, compocompo-nents available, compocompo-nents found, compocompo-nents understood, components valid, and components integratable—form a successful reuse chain. A breakdown in any condition causes the failure of reuse. The above data were collected from 29 software development organizations. The Y-axis shows the per-centage each condition plays in causing the failure of reuse.
effort to locate it in the first place.
Even if programmers are aware of the existence of reusable components, they do not want to start the locating process if they do not know how to locate the components or if they perceive that locating the components costs more than programming from scratch.
Although reusable component repository systems have been an active research area for more than a decade, these two issues, especially the first one, have not been given enough attention. This is because those systems are designed to support the paradigm of development-with-reuse (Rada, 1995), which advocates reuse as a new paradigm for programming. Under this paradigm, the reuse process is treated as an independent process, and programmers have to change their current programming practice to embrace reuse; reusable component reposi-tory systems are researched as stand-alone systems under the assumption that programmers are always willing to use these systems and are able to use them with well-defined queries. Con-sequently, research on component repository systems has focused mainly on the information access mechanism only. Information access is an approach to obtain information that requires users1 to start the information locating process through browsing or querying.
This research proposes a paradigm shift from development-with-reuse to reuse-within-development. Development-with-reuse is a methodology-centered view of reuse that demands programmers to adapt themselves to the new methodology—reuse. It does not concern itself with the confusions and difficulties faced by programmers who try to reuse. When the ap-proach does not meet its expected success, programmers are labeled, due to their resistance to change, as having the NIH (Not Invented Here) syndrome (Fafchamps, 1994), and education of programmers about the value of reuse is called for.
Conversely, the reuse-within-development paradigm puts programmers back into the center and views reuse as an integral part of the whole programming process. It stresses that
1
Because users of component repository systems are programmers, in this thesis, the term “user” is used inter-changeably with the term “programmer”.
6 reusable component repository systems should serve as extensions to programmers’ limited knowledge. Such systems should actively participate in the programming process by provid-ing programmers immediate and easy access to reusable software components instead of beprovid-ing passively waiting for the exploration of programmers after they have made the decision to reuse. Reuse-within-development needs the support of active component repository systems. Active component repository systems are a subset of active information systems that are equipped with the information delivery mechanism. Unlike the passive information access mechanism by which users have to explicitly launch the information-seeking process by specifying their infor-mation needs in the form of well-defined queries or engaging in a series of browsing actions, the information delivery mechanism presents information to users on its own initiative without being prompted by explicit queries. With reusable components delivered by active component repository systems, programmers are able to reuse without changing their current programming practice and environment.
1.3 Active Component Repository Systems
In general, active information systems that just throw a piece of decontextualized infor-mation at a user are of little use because they ignore the user’s working context. The working context consists of the task acted upon and the user acting. The challenge of implementing an active information system or an information delivery system is to deliver context-sensitive in-formation related to both the task at hand and the background knowledge of the user. Task- and user-independent information delivery systems, or “push” systems, such as Microsoft’s “Tip of the Day,” suffer from the problem that information gets thrown at users in a decontextualized way. The “Tip of the Day” is a feature that tries to acquaint users with some arbitrarily chosen functionality in a complex system. Despite the possibility for interesting serendipitous encoun-ters of information (Roberts, 1989), most users find this feature more annoying than helpful.
The specific challenge faced by this research is to deliver context-sensitive components. In other words, how can the active component repository system capture programmers’ needs
for reusable components by understanding to some extent what their tasks at hand are and then present only those task-relevant components that are not yet known to the programmers.
Needs for reusable software components are not determined before programming starts, as most current component repository systems have assumed; they arise in the middle of the programming process (Sen, 1997). Inasmuch as programmers are using computer-based de-velopment environments to develop software systems, it is possible for component repository systems to capture the reuse needs autonomously by utilizing information available in program-ming environments when component repository systems and program development environ-ments are properly integrated. For example, in a programming editor, comenviron-ments inside pro-grams and signatures—the syntactical interfaces of program modules—are good indications of what programmers are going to develop next (Ye & Fischer, 2000). The integration of compo-nent repository systems and programming environments creates a shared workspace accessible to both programmers as well as component repository systems. This shared workspace en-ables component repository systems to play an active role in supporting reuse by programmers, with the delivery of task-relevant and user-specific reusable components. Presenting compo-nents specific to a programmer can be realized through user models (Fischer, 2001), because user models that represent programmers’ knowledge about reusable components can be used as filters by the repository system to ensure only unknown components are delivered.
1.4 The CodeBroker System
An active component repository system, CodeBroker, has been developed. CodeBroker is integrated with the program development environment—Emacs. It utilizes an information delivery mechanism to bring to the attention of Java programmers those components that are unknown to them and yet are relevant to their current programming task by
constructing a task model to capture the programming task through continuously mon-itoring programming activities in the development environment
8 identifying the domains of a programmer’s current interest by creating a discourse model based on the history of interaction between the system and the programmer
creating a user model to represent each programmer’s knowledge about reusable com-ponents to personalize the delivery.
Integrated with CodeBroker, the development environment becomes an information-enriched workspace (Ye, 2001b) consisting of the original programming environment and an augmented information display that presents reusable components dynamically based on the programming task and the programmer’s background knowledge. Programmers can access po-tentially reusable components immediately without switching working contexts. This is a dis-tinct advantage because it avoids interrupting the programming flow. The operational interface of the component repository system becomes transparent to programmers, and is replaced by three cooperative autonomous software agents (Bradshaw, 1997): Listener, Fetcher, and Pre-senter. The Listener agent creates reuse queries from the programming workspace as the task model; the Fetcher agent retrieves components matching reuse queries; and the Presenter agent presents retrieved components directly into the workspace of programmers, using discourse models and user models as filters (Figure 1.3).
The information-enriched workspace created by active component repository systems improves the “readiness-to-hand” of components because it hides the retrieval interface of com-ponent repository systems from programmers so that programmers can directly interact with reusable components rather than the repository system.
Evaluations of the system with programmers have found that the system was effective in supporting reuse along the following three dimensions:
CodeBroker effectively encouraged programmers to explore the possibility of reuse.
Programmers were able to reuse unknown software components when they were deliv-ered by the system.
Figure 1.3: Overview of CodeBroker
The programming environment is augmented with a reusable component informa-tion display (the lower buffer), which presents reusable components dynamically. These components are autonomously retrieved by three cooperative software agents (Listener, Fetcher and Presenter) based on the programming task and the program-mer’s background knowledge. In this example, the programmer can reuse the first component (highlighted) to implement the task: “Create a random number between two limits” (indicated in the doc comment), without leaving the programming envi-ronment or explicitly operating the component repository system.
The combination of task models, discourse models, and user models succeeded in most cases in delivering context-sensitive reusable components.
1.5 Organization of the Dissertation
Chapter 2 of this dissertation presents a conceptual framework of programming for ana-lyzing the roles of reusable components in programming. Most programmers follow the oppor-tunistic programming strategy, and the availability of reusable components affects the choice of different development alternatives.
After overviewing the issues of instituting systematic reuse in a software development organization, Chapter 3 analyzes in detail the difficulties of component reuse from the perspec-tive of programmers. Through cogniperspec-tive engineering, a cogniperspec-tive model of the reuse process is
10 created and the challenges faced by programmers in each step are discussed.
Chapter 4 focuses on the central theme of this research: why locating component is diffi-cult for programmers, and, in particular, what prohibits them from attempting to reuse. Drawing on past research on the use of large information repositories and on human cognition theories, the argument is made that the “no attempt to reuse” phenomenon is caused by the existence of information islands and perceived low reuse utility. The concept of active component repository system is introduced as a solution to this problem.
Chapter 5 delineates the challenges in implementing active information systems and their general solutions: Task models and discourse models contribute to the task-relevance of information delivery, and user models support the user-specific delivery. To accommodate the dynamic nature of the information-seeking process, the concept and role of retrieval-by-reformulation is discussed.
Chapter 6 describes the retrieval mechanisms used in the CodeBroker system and the CodeIndexer subsystem that creates the contents of component repository from existing pro-grams.
Chapter 7 presents the design and implementation of the CodeBroker system. Chapter 8 presents the findings from formal evaluations of CodeBroker. Chapter 9 compares this research with related work.
Chapter 10 concludes the thesis by discussing future research directions and summariz-ing the contributions of this research.
Roles of Reusable Components in Programming
With the advent of object-oriented technology, reusable software components have be-come an indispensable part of programming knowledge: “[Reusable component] library design is [programming] language design” (Stroustrup, 1995). In addition to those classes and methods included in standard libraries of programming languages, such as the Java API library, many reusable software components are developed by software development organizations specifi-cally for reuse or repackaged from previously developed systems.
Practitioners and researchers generally believe, and experiments have empirically proven that component reuse improves the quality and productivity of programming (Lange & Moher, 1989; Lim, 1994; Basili et al., 1996; Simon, 1996; Boehm, 1999). However, most analy-ses of the benefits of reusable components have been based on the products finally produced. To better understand how reusable components help programmers produce better software sys-tems faster—not a better product and a shorter production time, per se—we must analyze the roles of reusable components in the programming process. After presenting the process model of programming, drawing on design theory in general and empirical programming studies in particular, this chapter explains the benefits of reusable components in programming.
2.1 A Process Model of Programming
Viewed as a task to create a computer-executable representation—program—of a real-world problem by piecing together a set of primitive elements provided by a programming
12 language and its component libraries, programming consists of two distinctive, yet tightly in-tertwined processes: problem framing and problem solving (Sch¨on, 1983; Hoc et al., 1990; Fischer, 1994).
2.1.1 Intertwining of Problem Framing and Problem Solving
During the problem-framing process, commonly known as the specification process in software engineering, programmers try to understand the problem given in the actual problem space by building a mental representation of the programming task. This mental representa-tion is a situarepresenta-tion model that is the result of the interacrepresenta-tion between the problem and the pro-grammer’s knowledge about the problem domain (Kintsch, 1998). Different programmers with different knowledge often come up with different situation models of the same programming task. During the problem-solving process, or implementation in software engineering terminol-ogy, programmers create programs based on the situation model as a new representation in the solution space defined by the programming language and its libraries.
Although problem solving starts after problem framing, these two processes are not sep-arate. The processes of framing the problem and of solving the problem influence each other because every transformation of the framing of the problem provides the direction in which a partial solution is to be transformed, and every transformation of the constructed solution determines into which the framing is to be transformed. Just as all other designs that are the interaction between understanding (problem framing) and creation (problem solving) (Rittel, 1984; Winograd & Flores, 1986), programming is an iterative process of problem framing and problem solving. Programmers rarely complete one process before beginning the second one (Pennington & Grabowski, 1990) for the following two reasons.
(1) In most cases, programming tasks cannot be fully understood without considering the solution (Ghezzi et al., 1991). For example, given the programming task of drawing a filled circle, a programmer can define the filled circle as a trajectory of rotating one end of a fixed line 360 degrees, or as a collection of dots whose distance to a center is
not greater than the radius. Each definition is actually based on an intended solution to the problem.
(2) Programming involves many tentative problem-solving strategies. After those tentative strategies have been explored and their consequences evaluated, some become eventual commitments and some require the modification of the initial mental representation of the problem. This modification often breeds new subtasks to be solved.
2.1.2 Programming Is Knowledge Intensive
Neither problem framing nor problem solving is a process of simple transformation that converts one representation to another representation; instead, they are processes of interpreta-tion. The programming task, the situational model, and the final program are representations at different levels of formalization and abstraction intended for different purposes. Drawing on their knowledge, programmers have to interpret the previous representation by reifying abstract concepts, explicating the implicit, and structuring the symbols existing at the new representation level.
Knowledge required in programming can be divided into two categories: domain knowl-edge and programming knowlknowl-edge. Domain knowlknowl-edge is the knowlknowl-edge about the problem domain and is mainly used in the process of problem framing. Programming knowledge is the knowledge needed to construct a program in the process of problem solving. However, due to the intertwined nature of those two processes, programming knowledge also contributes to problem framing, and domain knowledge contributes to problem solving as well. Figure 2.1 illustrates the process model of programming and its reliance on knowledge.
2.2 Programming Knowledge
Among the many constituents of programming knowledge—for example, the operation of compilers and other tools, general data structure knowledge, and the capability of reasoning
14 Problem in Actual Problem Space Program in Solution Space Situation Model in Represented Problem Space Domain Specific Programming Knowledge Domain Knowledge Programming Knowledge Problem
Framing ProblemSolving
Programming
Figure 2.1: The process model of programming
Problem framing and problem solving are intertwined and they require both domain knowledge and programming knowledge. Domain knowledge and programming knowledge often overlap, and the overlap becomes domain-specific programming knowledge.
and abstracting—program plans and building blocks are two of the most important. As a series of interconnecting actions to achieve a goal (Soloway & Ehrlich, 1984; Rich & Waters, 1990), a program plan provides a skeleton structure for programs by abstracting key elements. Building blocks are the primitive elements provided by a programming language. They include basic statements of a programming language and reusable software components in repositories or libraries.
2.2.1 Program Plans
Considerable evidence exists in empirical studies of programming that program plans are the basic cognitive chunk used in program design and understanding (Soloway & Ehrlich, 1984; Rich & Waters, 1990). Programs are often added one plan chunk at a time (Rist, 1995; Detienne, 1995). Because program plans are abstract representations of a solution, during the process of programming, they need to be gradually fleshed out with building blocks. A program often contains different plans that are interlaced. Figure 2.2 shows a program and the program
plans it uses. Program plans are hierarchical. A program plan at a higher abstraction level is built upon program plans of lower levels. For example, in Figure 2.2, the planShuffling an arraycomprises three other program plans: Loop over an array,Create a ran-dom number in a range, andSwap two numbers.
2.2.2 Building Blocks
Although programmers can build a program with only the basic statements of a program-ming language, it is just as impossible to build a complex software system from basic program statements alone as it is to build a jet airplane from only nuts and bolts. Reusable software com-ponents are an indispensable part of the building blocks, especially in today’s object-oriented programming languages. A reusable software component is a software module that can be in-tegrated into a new program directly or after minor changes. A software module refers to a named and addressable abstraction—either a procedural abstraction, such as a function, or a data abstraction, such as a class. Procedures, functions, methods, and classes are all considered software modules. In this dissertation, the term module refers to software abstractions to be developed by programmers, and the term component is used to refer to those modules that have been packaged for reuse. Because basic program statements of a programming language are not of interest in this research, the term “building block” is used throughout interchangeably with the term “software component.”
2.2.3 Orthogonality of Program Plans and Software Components
Software components are used to realize program plans. Program plans and software components are orthogonal to each other: a program plan can be realized with different software components, and a software component can be used in the realization of different program plans. Figure 2.3 illustrates the orthogonal relationship between program plans and software components.
16
01 public class CardDealer{
02 static int [] cards = new int [52];
03 static { for (int i=0; i<52; i++) cards[i]=i; } 04 /** create a random number in a range */
05 public static int getRandomNumber (int from, int to) {
06 return ((int)(Math.random() * (to - from)) + from);
07 }
08 /** shuffle the cards */
09 public static void shuffleCards() { 10 int r, temp;
11 for (int i=0; i<52; i++) {
12 r = getRandomNumber(i, 52); 13 temp = cards[i]; 14 cards[i] = cards[r]; 15 cards[r] = temp: 16 } 17 }
18 public static void main(String[] args) {
19 shuffleCards();
20 for (int=0; i<52; i++) {
21 System.out.print(‘‘ ‘‘ + cards[i]);
22 }
23 }
24 }
The above program contains following program plans:
Plan Name Plan Description Lines
Realiz-ing the Plan
Create a random num-ber in a range
Get the range;
Convert a random number between [0, 1.0] to the range.
5,7
Swap two numbers
Save one data to a temporary vari-able;
Move the other data to the saved data;
Move the temporary variable to the other data.
13-15
Loop over an array
Initialize;
Set the ending condition; Perform operations;
Increase the loop variable.
11-16; 20-22 Shuffling an
array
Loop over an array;
Create a random number in a range; Swap two numbers.
11-16
Shuffling an array Swap two numbers Create a random number in a range Swap two sets of numbers
Loop over an array Plans
getInt(int, int) swapRanges()
swapInt() Math.Random() Components
Task
:
AND OR
Figure 2.3: Orthogonality between program plans and software components The taskShuffling an arraycan be implemented in at least three ways: (1) with nodes connected with solid lines (concrete implementation shown in Fig-ure 2.2, except that theswapIntwas implemented with primitive statements) (2) with nodes connected with thick dashed lines, i.e., with program plans of
Create a random number in a range, Loop over an array and
Swap two sets of numbers, and components of getInt(int, int)
andswapRanges()
(3) with the same program plans as in (2), and components connected with thin dashed lines, i.e., the components ofswapInt()andMath.Random().
2.3 Opportunistic Programming
Different strategies exist to develop a program. A top-down development strategy starts with decomposing the programming task into subtasks, choosing program plans to achieve those subtasks, and then fleshing out the program plans with reusable software components and pro-gram statements. A bottom-up development strategy starts with selecting reusable software components, and then combining them according to the structure of a program plan.
Empirical studies have revealed, however, that most programmers follow neither the top-down nor the bottom-up design strategy. In fact, their programming activities are very oppor-tunistic: They are a mixture of top-down and bottom-up strategies, and which strategy is chosen depends on the knowledge of individual programmers and the particular situation (Curtis et al., 1988; Visser, 1990). Interim decisions made during the programming process “often can lead
18 to subsequent decisions at arbitrary points in the [programming] space” (Roth & Hayes-Roth, 1979).
The opportunisticness of programming comes from the difference in each programmer’s knowledge of program plans and software components. Simon (Simon, 1996) has pointed out that cognitive activities are determined by the environment in which they take place. The en-vironment includes information present in the workspace as well as information present in the memory of human beings. Information in the workspace, including partially constructed pro-grams, talks back to the problem solvers (programmers) and serves as cues to activate relevant program plans and software components from memory (Sch¨on, 1983). Due to the difference in programmers’ familiarity with program plans and software components, which determines the link strength from cues to the activated knowledge in memory, it is quite natural that the programming process pursued by each programmer is different, and the resulting solutions vary. Taking Figure 2.3 as an example, if the programmer is more familiar with the component
swapRanges, he or she may choose the program planSwap two sets of numbers, and the final implementation will be the one connected with thick dashed lines. Conversely, if he or she is more familiar with the program plan Swap two numbers, he or she may proceed from that program plan and choose the componentswapInt.
The lack of knowledge about reusable software components needed to implement a pro-gram plan often prevents propro-grammers from considering it. However, if information about relevant reusable components is somehow present in the current workspace, it can expand pro-grammers’ solution spaces that are limited by their knowledge. Active component repository systems can complement programmers’ insufficient knowledge of reusable components by pre-senting them with immediately accessible components relevant to the current programming task in the workspace.
2.4 Benefits of Software Components in Programming
Reusable software components have both short-term and long-term benefits for the devel-opment of software systems. Short-term benefits are the immediate benefits that a programmer can attain during the implementation of a programming task. Long-term benefits may not be immediately enjoyed by the programmer who reuses the components, but they extend to the whole life cycle of the software system and to later programming activities of the programmer.
2.4.1 Short-Term Benefits
Reduced Development Time. By reusing existing software components, fewer pro-grams are written, and thus less time is spent in programming. Furthermore, because reusable components are usually carefully tested already, less time is needed in debugging and test-ing, which are the “hard and slow part” of programming (Brooks, 1995). Lim (Lim, 1994) has reported that in a Hewlett-Packard division, a nearly linear relationship exists between the percentage of reused code in the product and the productivity of programmers, measured in LOC/pm (the number of lines of noncomment source code produced by a programmer in a month). Only 5% of reused code yields an LOC/pm of 550, and as the percentage of reused code increases to 81%, the LOC/pm reaches 2,850. Similar reports can be found in (Browne et al., 1990; Hallsteinsen & Paci, 1997).
Improved Quality. Because software components are often repeatedly reused, the defect fixes from each reuse accumulate, resulting in higher quality of the developed software systems. Raymond has vividly described this incremental bug fix process as “given enough eyeballs, all bugs are shallow” in his seminal essay that explains why Open Source software systems tend to have high quality (Raymond & Bob, 2001). Basili et al. (Basili et al., 1996) have reported that the error density (errors per thousand lines of code) drops from 6.11 for systems developed without reuse to 0.12 for systems developed from reusable components. Similar formal evaluations on the contribution of reuse to the improved quality of software systems can
20 be found in (Lim, 1994; Thomas et al., 1997).
2.4.2 Long-Term Benefits
Easy Maintenance. Reusable components contribute to easy maintenance not only be-cause they have fewer defects, but also bebe-cause they facilitate communication among software developers by providing a set of common vocabulary, especially for the indirect communica-tion between system builders and system maintainers. Because reused software components are high-level abstractions, system maintainers do not need to look into the details of implementa-tion to uncover the original intenimplementa-tions of the system builders.
Improved Evolvability. To cope with constantly changing requirements and imple-mentation platforms, software systems must be able to evolve. Reusing software components improves the evolvability of software systems because it can limit the needed change to com-ponents instead of identifying and changing all occurrences distributed all over the system. Graham (Graham, 1995) has reported a very typical example as follows. Three project teams in a company had used the same formula in their software systems. Later, they discovered an error in the formula and needed to modify the systems. The team that had not created a component for the formula spent 5 weeks to find and correct each incidence of the formula. The other two teams, which had put the formula in a set of components, spent 1.5 days and 2 days, respectively to correct the system.
Increased Problem Framing Ability. The representation of a problem is an important determinant of the range of solutions that will be considered, as well as an important source of problem-solving difficulty (Hayes & Simon, 1977). Reusable software components provide programmers with higher level concepts that are both close to application domains and easy to implement. Components increase programmers’ ability to frame the problem into representa-tions that are easier to solve. A component creates an abstraction for an existing solution, and it reduces the number of items that a programmer has to hold in simultaneous contemplation because the programmer can refer to the whole solution with the abstraction, in place of the
Components Programming Languages Computer Component Developer Computer Concepts in Problem Domain Concepts in Problem Domain Programmer Programmer Programming Languages Compiler Developer Compiler Developer
Figure 2.4: The role of components in problem framing
In the top figure, programmers have to frame each concept in the problem domain based on their knowledge of the programming language. In the bottom figure, pro-grammers can represent some domain concepts with components directly (such as those having the same fill pattern in concepts and components) without thinking of their implementation.
details of the solution. Figure 2.4 illustrates the contribution software components make to the problem-framing ability. Without the support of reusable components, programmers have to frame each concept in the problem domain based on their knowledge of the programming lan-guage; with the support of software components, however, the difficulty of problem framing is reduced because certain concepts can be directly mapped to the components.
Chapter 3
Challenges of Software Reuse
This chapter consists of two parts. The first part overviews software reuse to provide a broad background for this research. It defines the concept and scope of software reuse; describes different kinds of reusable software artifacts to establish the link between component reuse and other reuse research efforts; and discusses managerial issues, legal issues, technical issues, and cognitive issues involved in instituting a reuse program within a software development organization. The second part of the chapter analyzes the difficulties of component-based reuse from the perspective of programmers who want to reuse.
3.1 Overview of Software Reuse
3.1.1 Software Reuse and Reusable Software Artifacts
A broad definition of software reuse is using existing software artifacts to construct a new software system. A software artifact can be defined as a piece of formalized knowledge that can contribute to the software development process (Dusink & Van Katwijk, 1995). There are two types of software artifacts: (1) software products that are created as “things” or deliverables during the development process, and (2) development knowledge that is applied to the process. The most commonly reused software product is source code, which is the final and most important product of software development. In addition to code, any intermediate life cycle products can be reused, which means that software developers can pursue the reuse of require-ment docurequire-ments, system specifications, modular designs, test plans, test cases, and docurequire-menta-
documenta-tion in various stages of software development.
Reusable software development knowledge and experience exists at different abstrac-tion levels: the architecture level, the modular design level, and the program (or code) level. Research on software architecture is currently aiming to define different software architecture styles for different families of software systems (Perry & Wolf, 1992). A software architecture style describes the formal arrangement of architectural elements, and can be reused by soft-ware developers to construct their new softsoft-ware systems once the style is well defined (Shaw & Garlan, 1996; Taylor et al., 1996). For example, the domain-independent multifaceted architec-ture is an architecarchitec-ture style for domain-oriented design environments, which has been reused in and refined through the development of many generations of design environments for different domains (Fischer, 1994).
Reusable knowledge on modular design can be codified in design patterns (Alexander et al., 1977) and frameworks. A design pattern is the description of a solution to recurring problems. It specifies a problem to be solved, a solution that has stood the test of time, and the context in which the solution works (Gamma et al., 1994). Design patterns provide a common vocabulary for software developers to discuss their designs and can be passed from one devel-oper to another develdevel-oper for reuse. The concept of framework comes from object-oriented programming languages (Fischer et al., 1995). A framework describes the interaction pattern among a set of collaborative classes or objects, and can be represented as a set of abstract classes that interact with each other in a particular way (Johnson, 1997). Programmers can reuse frame-works directly in their development after providing implementations for those abstract classes. Framework reuse is a mixture of knowledge reuse and code reuse.
Programming knowledge at the level of code is represented as program plans that can also be reused by programmers if a suitable representation form is defined (Rich & Waters, 1988).
24 3.1.2 Two Approaches to Reuse
Another dimension to classify reuse research is the approach it takes: Reuse can be generation-based or composition-based.
The generation-based approach reuses the process of previous software development efforts, often embodied in computer tools that automate a part of the development life cy-cle (Henderson-Sellers & Edwards, 1990). This approach weaves domain knowledge and pro-gramming knowledge into a very high-level propro-gramming language (VHLL), which is then con-verted to executable systems by a VHLL compiler or an application generator. Because VHLLs have a higher abstraction level than most high-level languages (HLLs), such as Java and C, they are relatively closer to programmers’ informal requirements. They are meant for program-mers to describe what the computer program does instead of how it is implemented. Compilers of VHLLs directly convert the VHLL programs into executable programs in HLLs. Lex and Yacc in Unix are two well-known examples; other research prototypes include SETL (Dubinsky et al., 1989) and PAISLey (Zave & Schell, 1984). Unlike VHLL compilers, which make the conversion from VHLL programs to HLL programs in one step, application generators often use a series of transformation rules to transform VHLL programs into HLL programs. A trans-formation rule maps a program in one abstraction level to a semantically equivalent but more computationally efficient program (Feather, 1989). Transformation-based application genera-tors allow programmers to control which transformation rule is applied when several applicable rules exist (Biggerstaff, 2000). Problems with the generation-based approach are the following:
(1) VHLLs are often defined for an extremely small application domain.
(2) Most VHLLs use mathematical abstractions, such as set theory or logic theory, that are actually more difficult to learn and to use than HLLs.
The VHLLs in the generation-based reuse approach have many overlaps with end-user pro-gramming languages (Repenning, 1993) that provide end-users with a simple instruction set at the abstraction level of the problem domain so they can modify the behavior of the application
systems to their own needs or add new functionality (Girgensohn, 1992; Fischer & Eisenberg, 1994).
The composition-based approach reuses existing software products in a new system to avoid repetitive work. As mentioned in the previous section, many types of software products can be reused. However, because this research focuses on the reuse of components, the dis-cussion here is limited to component reuse, although many problems and solutions discussed can be extrapolated to the reuse of other software products. Component reuse is also known as component-based development. Based on the role that components contribute to the program-ming process, component reuse is further divided into three categories.
Black-Box Reuse. In black-box reuse, a component is directly reused without mod-ification. A component can be reused as it is or reused through inheritance if the programmer creates a specialized subclass of an existing class component.
White-Box Reuse. In white-box reuse, programmers reuse the component after they have modified the components to their needs. White-box reuse does not contribute as much to the easier maintenance and evolution of software systems as black-box reuse does, but it can reduce development time.
Glass-Box Reuse. In glass-box reuse, programmers do not directly reuse the com-ponent; instead, they use it as an example for their own development. For instance, programmers can look at examples to find out how a program plan is realized and build their own system through analogy. Glass-box reuse contributes indirectly to the quality and productivity of programming because examples can reduce the cognitive load of programmers (Neal, 1996).
3.2 General Issues of Component Reuse
Despite its many benefits, component reuse has not yet received wide success in prac-tice due to the many difficulties associated with it. Component reuse introduces two different
26 processes in the life cycle of software development:
(1) the process of developing for reusable components
(2) the process of developing with reusable components
The first process creates component repositories by identifying, developing, and indexing com-ponents. The second process, commonly known as reuse process, is conducted by programmers who want to reuse. In the reuse process, programmers need to locate, comprehend, and inte-grate components (see Figure 1.1 in Chapter 1). Widespread success of reuse needs to overcome managerial, legal, technical, and cognitive issues incurred by both processes.
3.2.1 Managerial Issues
Successful systematic reuse requires the support and commitment from the managers of a software development organization. Managers should foster a reuse culture in their organi-zation by encouraging programmers to reuse. For example, managers must stop evaluating the performance of programmers based on the lines of code produced, which, unfortunately, still occurs in many software development organizations. This evaluation criterion obviously dis-courages reuse by programmers because programs developed with reusable components have fewer lines of code.
To encourage reuse, component repositories, either purchased from third parties or de-veloped in-house, should be set up. To do so, managers must be willing to make long-term investments. This also requires good metric models to analyze the economics of reuse and to identify the most effective reuse strategies. Several reuse metric models have been proposed and are in use in some companies. However, these models still lack formal validation (Frakes & Terry, 1996).
3.2.2 Legal Issues
Protecting legal rights of creators and consumers of software components is another dif-ficult aspect of instituting reuse. Currently, software is protected by copyrights that are designed to protect products in the “world of atoms”. In the world of atoms, after a product is passed from its owner to its customer, the owner no longer owns it, and the customer possesses full ownership. Software components are made of bits. In the “world of bits”, ownership does not change hands in the same way as in the world of atoms (Cox, 1996). It is extremely easy to reproduce a software component without any loss of quality, and even after the software com-ponent is passed from its owner to its customer, the owner can still have the exact same software component as the customer does. This presents a difficulty in estimating the cost of reusable components as well as defining a suitable standard mechanism to charge fees. For example, when a customer uses a purchased component in his or her application and sells many copies of the application to many end users, how much should the customer who purchased the com-ponent pay to the original creator? Should the end users of the developed application pay the original creator as well? And if that is the case, how should the law ensure that such payments are made? The lack of a standard mechanism of charging for the use or reuse of components inhibits the emerging of a robust market for components, which is essential to the widespread reuse of components.
3.2.3 Technical Issues
The implementation of systematic reuse has to overcome many technical difficulties. These difficulties exist in both phases of reuse: the initial setup of reusable component reposi-tories and the actual reuse of components during programming.
A great investment, both intellectually and financially, is required to develop and main-tain reusable component repositories. First, it is difficult to identify what kind of components should be included in component repositories. Second, the development of reusable components
28 is more difficult than usual development because reusable components should be more general and require higher quality and better documentation. It costs an estimated two or three times more to develop a reusable component than an ordinary component (Jones, 1984; Lim, 1994). Reusable component developers need to balance the inherent dilemma between the component size and its reuse potential: A larger component has more reuse value than a smaller one, but its reusability decreases because larger components are often more specific and more difficult to understand.
Another dilemma in setting up a component repository is the relationship between the number of components in a repository and the ease of finding the needed components. For a component repository to really pay off requires a critical mass of available components; how-ever, as the number of components increases, it becomes more difficult for programmers to find the needed components. Only repositories with a large number of components can reap the real benefits of reuse, so an effective searching mechanism must be provided for programmers to find the needed component.
3.2.4 Cognitive Issues
Even if good reusable component repositories do exist and a favorable reuse culture is fostered in an organization, reuse still fails if programmers do not put it into practice. After all, reuse must be carried out by programmers. This creates another dilemma for reuse: If program-mers do not reuse, the huge investment of building reuse repositories cannot be justified; and if the investment in reuse repositories cannot be justified, companies are less willing to create them, and then programmers have nothing to reuse.
Programmers’ resistance to reuse is often falsely dismissed as a mere attitude problem caused by the so-called NIH syndrome. However, recent studies have revealed that most pro-grammers do not have the NIH syndrome (Fafchamps, 1994; Frakes & Fox, 1995). On the contrary, programmers are very motivated to reuse if they know of the component or know how to locate the component (Lange & Moher, 1989; Isoda, 1995). What prevents programmers
from reusing is the fundamentally limited capability inherent in human cognition (Curtis, 1989; Fischer et al., 1991): the limitation of short-term memory, the scarcity of human attention, the mental inertia of coping with changes, the subjectiveness of evaluation, and the ambiguity of natural language. Section 3.4 provides a more detailed analysis of what kind of cognitive chal-lenges programmers have to overcome in order to reuse, and what kind of technology is needed to address such challenges.
3.3 Creating Reusable Components
Although this research is not concerned in particular with the creation of reusable com-ponents,1 because “anybody who sells a technology for reuse without providing a library of components is a snake oil salesman, a fraud, a charlatan (Zand et al., 1997),” it is worthwhile to point out the possible venues from which reusable components will come.
As mentioned before, creating reusable components is difficult, time-consuming and expensive, and repositories of components with good quality are rare. Nevertheless, stable progress has been made recently in several directions.
3.3.1 Domain Analysis and Product-Line Analysis
Domain analysis is the identification, analysis, and specification of common require-ments from a specific application domain for reuse on multiple projects within that applica-tion domain. Domain analysis produces a domain model, which is used as a starting point to construct specifications and designs for many different systems within the application do-main (Kang, 1998). The dodo-main analysis can be either synthetic or evidentiary (Fischer et al., 1995).
The synthetic domain analysis approach resembles the process of developing a single ap-plication, but it is more broadly conceived. It starts with an informal description of the
applica-1Components in the repository of the CodeBroker system come from existing libraries. For more details, see
30 tion domain, identifies the common features, and develops reusable components corresponding to each feature in the domain.
The evidentiary domain analysis approach starts with existing systems in an application domain, using reverse engineering or design recovery (Ye, 1996) to identify and repackage common components for later reuse.
Product-line analysis is a more comprehensive approach than domain analysis. A product-line is a set of products, already existing or planned to be developed, that share a common set of requirements but also exhibit significant variability in requirements (Griss, 2000). Product-line analysis differs from domain analysis in that it not only extracts the commonality of the family of systems but also provides a systematic way to treat their significant variability. In addition to common reusable components, product-line analysis often creates a product-line architecture for the family of related systems where reusable components can be plugged in (Batory et al., 2000).
3.3.2 Commercial Off-the-Shelf
Thousands of companies worldwide are developing their own information systems. There are three problems in this regard: (1) because most companies do not have enough expertise in software development, they cannot produce information systems with the highest quality; (2) because these systems are often developed internally and do not follow interoperation standards, it is very difficult to integrate them; and (3) similar functionality has been repeatedly developed. Although it may take decades for it to dominate software development, the market of COTS (Commercial Off-the-Shelf) is rapidly taking shape (Morisio et al., 2000). COTS comes in a variety of types and levels of software, e.g., components that provide specific functionality (such as subroutines, classes, frameworks, and even complete applications) or tools used to generate code (such as domain-oriented language processors and application generators). Many compa-nies are providing reusable off-the-shelf components for specific domains, and those compo-nents can be purchased by developers from the market. As this trend continues, programmers
may be able to create their own systems in the future by integrating components from different component vendors. For example, programmers or even end users may create their own word processing applications by integrating components of outline mode, spell-checking, grammar correction, and diagram drawing purchased at market in the same way as they purchase standard applications now.
3.3.3 Open-Source Components
With the advent of the Open Source movement (DiBona et al., 1999), many devel-oper communities, such as the Gamelan website2 and the Giant Java Tree,3 have formed, by which programmers can freely exchange their developed products. Moreover, some high-quality reusable component repositories, such as the Jun library (Aoki et al., 2001), have become open-source too. Traditionally, reusable components are created and maintained by creators who develop those components. Programmers who reuse those components are consumers, and do not directly contribute to the creation and evolution of components. By giving program-mers full access to the source code, Open Source breaks down the binary choice of creators and consumers (Fischer, 1998a) so that consumers can directly participate in the maintenance and improvement of reusable components or even derive new components from existing ones. This encourages the natural emergence of reusable components with good quality, following the seeding, evolutionary growth, and reseeding (SER) model (Fischer, 1998b). The initial creator develops the component (seeding), and the component experiences evolutionary growth when it is reused and modified by many other programmers (or consumers). As those modifications are incorporated back into the original component (reseeding), the quality and reusability of the component will improve. The Open Source development model is particularly promising in pushing reuse to a large scale because programmers working on complimentary projects can each leverage the results of the other freely.
2
http://www.gamelan.com
3
32 3.4 Understanding the Cognitive Difficulties of Component Reuse
The exciting advance in the creation of reusable components cannot lead to the material-ization of reuse if programmers are not able to reuse them during their own system development. To create appropriate tools to assist programmers in reusing, we need to identify the tasks faced by programmers when they try to reuse and the cognitive skills required to perform those tasks.
3.4.1 Cognitive Engineering
Component reuse is a cognitive activity, which is a goal-directed problem-solving effort. To understand the complexity of cognitive activities, Norman (Norman, 1986) has developed a method called cognitive engineering that applies what is known from cognitive science to the design and construction of tools that assists cognitive activities of human beings. Such cognitive tools, including reusable component repository systems, must address the discrepancy between a user’s goal, expressed in terms relevant to the user and his or her task, and the tool’s mechanism, expressed in terms relative to it. This discrepancy creates two gulfs: the gulf of execution and the gulf of evaluation. The gulf of execution is the gap from goals to tools, and it must be bridged by three consecutive efforts:
Intention Formation. Users decide to do something with an internal specification of the task created from their goal.
Action Specification. Users externalize the internal specification into a sequence of specified actions.
Action Execution. The actions are executed with the tool.
The gulf of evaluation is the gap from the tool output to the intended goal, and it must be bridged by another three consecutive efforts:
System Perception. Users perceive the output of the tool. Interpretation. Users interpret the perceived output.
Evaluation. Users compare the interpretation with the original goal.
3.4.2 A Cognitive Model of Component Reuse
Like other cognitive activities, component reuse starts with a goal in the mind of a pro-grammer. To achieve such goals, programmers have to translate their internal intentions into a series of physical actions constrained by a component repository system. Figure 3.1 illus-trates the actions needed for the reuse process to succeed based on the method of cognitive engineering (Ye et al., 2000):
Forming Reuse Intentions. As the first step to start a reuse process, programmers must consciously decide to use the repository with requirements for reusable compo-nents in mind. The requirements for reusable compocompo-nents arise from the goal of the programming tasks.
Formulating Reuse Queries. Programmers formulate their intentions as a reuse query in the terms provided by the component repository system. This is the step of action specification.
Retrieving Components. Reusable components matching the queries are retrieved from the repository system. This is the step of action execution.
Choosing Components. When the repository system returns matching components, programmers must choose the appropriate one by comparing them with the reuse in-tentions. This action of choosing corresponds to the gulf of evaluation for the reuse activity: Programmers read each retrieved component, interpret its meaning, and eval-uate whether the component can be reused in their tasks. The evaluation may result in the reformulation of the original query if no suitable components are fou