Reconstructing Ownership Architectures To Help Understand Software Systems


Ivan T. Bowman and Richard C. Holt

University of Waterloo

Waterloo, Ontario, Canada


Abstract

Recent research suggests that large software systems should have a documented system architecture. One form of documentation that may help describe the structure of software systems is the organization of the developers that designed and implemented the software system.

We suggest that an ownership architecture that documents the relationship between developers and source code is a valuable aid in understanding large software systems. If this document is not available, then we can reconstruct it based on the system implementation and other documentation. We examine Linux as a case study to demonstrate how to reconstruct and use this type of architecture. The reconstructed Linux ownership architecture provides information that complements other types of architectural documentation. It identifies experts for system components, shows non-functional dependencies, and provides estimates of the quality of components. Ownership architectures also allow us to find problems such as under-staffed subsystems and components that risk abandonment.

1 Introduction

Research [7, 10, 13, 14] suggests that large software systems should have a documented system architecture. An aid to program understanding, this documentation is recognized as a mechanism for improving software quality and reducing development costs. Unfortunately, not all software systems have up-to-date and accurate architectural documentation. Several researchers have shown [7, 9, 11, 16, 17] that it is possible to reconstruct a software architecture based on the implementation of the system and the documentation that does exist. This reconstructed architecture typically identifies a decomposition of the software system into a hierarchy of subsystems, and shows interactions between these subsystems. There are several forms of interaction that have been used, including control flow, data flow, and type dependencies. These reconstructed architectures have proven to be useful tools for understanding and evaluating large systems. However, these reconstructed architectures do not consider the organization of the developers that created the system. This organization is an important aspect of large software systems, and this information is useful when attempting to understand various aspects of software systems.

The relationship between the organization of a design team and the design product has long been recognized. Conway’s hypothesis [4], formulated by Brooks [3] as Conway’s law, states that the organization of a software system will be congruent to the organization of the team that designed the system. This is perhaps an oversimplification, but the concept is recognized as a valid forward-engineering concern. For example, Coplien [5] provides several patterns for organizing developers depending on the desired structure of a system. For systems developed by a large number of developers, it is reasonable to consider how the developers of the system are organized into communicating teams.

The organization of system developers can be visualized using an ownership architecture (see footnote 1). This architecture relates developers to the code that they develop.

In a previous paper [1] we showed that ownership architectures are useful when reconstructing the concrete (as-built) architecture of a software system because they can be used to predict the dependencies that are found between components. In the current paper, we show that ownership architectures are a useful formalism for describing important relations in large software systems.

It is interesting to note that Linux and Mozilla (two of the systems we studied [1]) included documentation of the software developers and the code they worked on. Linux includes a Credits file, and Mozilla includes a module owners document. These documents list the developers that have contributed to the project and their areas of contribution. There seem to be two reasons for the Linux and Mozilla groups to maintain this documentation. First, it provides a résumé of a developer’s accomplishments. Second, it identifies the developers that are responsible for components of the system. The Mozilla effort explicitly identifies module owners, and all changes to a module must be approved by the owner. The ownership concept is also well understood by developers in other development cultures; Perry et al. [12] describe how developers at AT&T responded to requests for authorization to change code they were responsible for. The meaning and extent of code ownership seems to vary between different development cultures, but it seems to be a concept that is generally understood and used by developers.

Although the Linux and Mozilla systems document the files that developers have worked on or are responsible for, the large number of developers and files makes it difficult to visualize the overall structure. If we cluster developers and files into related groups, we can view higher level abstractions and more easily understand the overall system structure. In addition to being very low-level information, the ownership documentation of the Mozilla and Linux systems may be out of date or inaccurate. Thus, it seems that the problems of recreating an ownership architecture for an existing system are similar to the problems of recreating a concrete architecture in that existing documentation may be too low-level, outdated, or inaccurate [2].

Footnote 1: We have coined the term ownership architecture because this structure is similar to a conceptual or concrete architecture. Other terms that could be used instead for this structure are ownership diagram or ownership view.

We examined the Linux system to evaluate how a detailed ownership architecture can be reconstructed from a system implementation. This reconstructed architecture provides information that is not available in other forms of architectural documentation. As we will discuss in Section 2.1, ownership architectures identify experts for system components, show non-functional dependencies, and provide quality estimates for components. This information is useful for understanding systems. In addition, as described in Section 2.2, we can use ownership architectures to find problems such as under-staffed subsystems and components that are at risk of being abandoned.

1.1 Paper organization

The remainder of this document is organized as follows. Section 2 describes ownership architectures. Section 3 describes the method we used to reconstruct the ownership architecture of Linux. Section 4 describes the ownership architecture we reconstructed from the Linux implementation. Finally, Section 5 draws conclusions from this work.

2 Ownership architectures

An ownership architecture shows how developers are grouped into teams, and shows the code that these teams have worked on. Documentation of this organization is useful for all large systems (for example, those with more than ten developers). For systems where the concept of code ownership is used (which includes all of those in our experience), the ownership architecture indicates which developer or team owns each component of the system.

We define an ownership architecture as follows. The elements of an ownership architecture are source files, subsystems, developers, and teams. We use subsystems to group source files, and teams to group developers. Figure 1 shows an E/R diagram of the elements of ownership architectures and how they are related. Subsystems can contain subsystems and source files, and similarly teams can contain (sub)teams and developers. We record a hacked relation (see footnote 2) between a developer and each source file that he or she has modified.

[Diagram omitted: E/R diagram with the entities Subsystem, Source File, Team, and Developer, connected by the contain and hacked relations.]

Figure 1: Ownership Architecture Elements

Figure 2 shows a simple example of an ownership architecture. This sample architecture contains four source files (F1, F2, F3, F4), four developers (D1, D2, D3, D4), two subsystems (S1 and S2), and two developer teams (T1 and T2).

[Diagram omitted: subsystems S1 (files F1, F2) and S2 (files F3, F4), teams T1 and T2, developers D1 to D4, and the hacked edges between them, with a legend for Subsystem, Source File, Developer, Team, hacked, and hacked all.]

Figure 2: Sample Ownership Architecture

Figure 2 shows that the files in subsystem S1 were hacked by the developers in both team T1 and team T2, and the files in S2 were hacked only by the developers in team T2. We use edges with a double arrow head to indicate that the hacked relation applies to all contained files or developers. For example, both D3 and D4 modified both F3 and F4. By replacing four edges with one, this convention reduces clutter in the diagram.

Footnote 2: We use the term hacked since that is the term predominantly used by Linux developers.

File F3 was hacked not only by T2 but also by D2. This shared developer might lead us to suspect that F3 is somehow related to S1 (perhaps a feature was implemented with code in both subsystems). This is an example of how ownership architectures can help us to make hypotheses about program dependencies.
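The elements and relations described in this section can be sketched as plain data. The following is a minimal illustration, not the representation used in our tools: the dictionary names, the helper functions, and the exact edges into S1 are assumptions made for the example, and only the S2 edges follow the description of Figure 2.

```python
from collections import defaultdict

# Containment relations (assumed from the text's description of Figure 2).
subsystem_of = {"F1": "S1", "F2": "S1", "F3": "S2", "F4": "S2"}
team_of = {"D1": "T1", "D2": "T1", "D3": "T2", "D4": "T2"}

# hacked relation as (developer, source file) pairs. The S2 edges follow the
# text; the S1 edges are illustrative guesses.
hacked = {
    ("D1", "F1"), ("D2", "F2"), ("D3", "F1"),
    ("D2", "F3"),
    ("D3", "F3"), ("D3", "F4"), ("D4", "F3"), ("D4", "F4"),
}


def lift(hacked, team_of, subsystem_of):
    """Lift developer-to-file edges to team-to-subsystem edges."""
    edges = defaultdict(set)
    for dev, path in hacked:
        edges[team_of[dev]].add(subsystem_of[path])
    return dict(edges)


def outside_hackers(hacked, team_of, subsystem_of, subsystem, home_team):
    """List (developer, file) edges into `subsystem` from outside `home_team`."""
    return sorted(
        (dev, path)
        for dev, path in hacked
        if subsystem_of[path] == subsystem and team_of[dev] != home_team
    )


team_edges = lift(hacked, team_of, subsystem_of)
cross_edges = outside_hackers(hacked, team_of, subsystem_of, "S2", "T2")
```

Here `cross_edges` isolates the D2 to F3 edge, which is the kind of shared-developer edge that prompts a hypothesis about a relationship between F3 and S1.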

2.1 Ownership architectures for software understanding

Ownership architectures provide several benefits to developers who are attempting to understand a software system.

Identification of Experts. If a developer is having difficulty understanding a subsystem, the ownership architecture can suggest a set of developers who are likely to be experts on the subsystem. This use of subsystem experts was noted by Perry et al. [12]; they found that developers received phone calls for advice about systems they had previously worked on. Sim and Holt [15] also found that newcomers to a development team relied on expert mentoring because there was insufficient accurate documentation available; most of the system knowledge resided primarily in the heads of the system developers and maintainers.

Non-Functional Dependencies. When reconstructing a software architecture, it is natural to examine dependencies that occur in the source code. Relations such as function calls, data accesses, and data type dependencies have been used to help understand large systems. However, this approach cannot easily detect relationships that are not manifested in the system’s source code. For example, in Linux we found that one team was responsible for porting Linux to the Amiga hardware platform. This team implemented a number of device drivers, and all of these drivers implemented functionality for Amiga hardware devices. We were not able to classify these device drivers as Amiga drivers using only the source code since they share no common routines, data, nor data types. However, this classification is strongly suggested by the ownership architecture. With some notable exceptions, most Linux developers worked on fairly specialized areas of the operating system. This means that if a developer worked on portions of multiple subsystems, we may reasonably hypothesize some relationship between these portions.
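One way to surface such hypotheses mechanically is to list pairs of files that share developers. The sketch below is only an illustration under assumed inputs: the function name, the developer identifiers, and the file names are hypothetical and are not taken from the Linux sources.

```python
from collections import defaultdict
from itertools import combinations


def shared_developer_pairs(hacked):
    """Map each pair of files to the set of developers who hacked both."""
    files_by_dev = defaultdict(set)
    for dev, path in hacked:
        files_by_dev[dev].add(path)
    shared = defaultdict(set)
    for dev, files in files_by_dev.items():
        # Every pair of files touched by the same developer is a candidate
        # non-functional dependency.
        for a, b in combinations(sorted(files), 2):
            shared[(a, b)].add(dev)
    return dict(shared)


# Hypothetical hacked relation: two drivers share both of their developers.
sample_hacked = [
    ("dev_a", "amiga_serial.c"), ("dev_a", "amiga_mouse.c"),
    ("dev_b", "amiga_serial.c"), ("dev_b", "amiga_mouse.c"),
    ("dev_c", "ext2_inode.c"),
]
candidate_pairs = shared_developer_pairs(sample_hacked)
```

A pair shared by several developers is a stronger hint than a pair shared by one, so a reader of the result would likely rank pairs by the size of the shared set before inspecting them.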

Quality Estimates. The experience level of developers varies widely between individuals. This variance can affect how much confidence developers have in the correctness of system components. For example, if a developer is looking for an example of how to code a particular operation, he or she would prefer to use an example developed by an expert as opposed to a newcomer to the system. By showing which developers worked on a given source file, ownership architectures help developers estimate the quality of code based on their confidence in the experience of the developers that wrote the code. However, this use of personal profile information to estimate quality raises ethical concerns that should be considered; for example, developers should be informed if ownership architectures are used in this way.

2.2 Evaluation using ownership architectures

Ownership architectures provide information that helps developers understand large software systems. They can also be used to evaluate a system for several qualities. They can detect code that risks being abandoned, they can predict subsystems that have local understaffing or overstaffing, and they can predict whether the current organization provides sufficient coverage of developers for ongoing maintenance of each subsystem.

Abandonment. If a system is actively maintained, it is likely that there is some developer that has worked on each source file in the system, and some set of these people are likely to be owners or potential owners of the file. However, there may be some source files in a system that have not been modified by any of the current system developers. All authors of the file may have left the organization. If no developer owns a source file or subsystem, it is referred to as abandoned. Abandoned source files are a source of risk in software systems since the ‘live’ knowledge in the form of expert developers is no longer available. Modifications take longer and are risky if they are performed by developers who are not familiar with the code (see footnote 3). An ownership architecture can be used to identify source files and subsystems that are at risk of being abandoned. If a critical subsystem has only been hacked by one developer, then the system’s organizers might require that an extra developer become familiar with the system to avoid the risk of abandoning the subsystem. Similarly, if a developer plans to leave an organization, the ownership architecture can be used to find the subsystems that would subsequently be abandoned; this can be used to assign replacement developers.
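The abandonment checks described above can be sketched as set operations over the hacked relation. This is an illustrative sketch only: it assumes that any current developer who hacked a file counts as a potential owner, and the function names and sample data are hypothetical.

```python
def files_at_risk(hacked, all_files, active_developers):
    """Return (abandoned, single_owner) file lists.

    A file is treated as abandoned when no active developer hacked it, and as
    single-owner when exactly one active developer did (an assumed proxy for
    ownership).
    """
    owners = {path: set() for path in all_files}
    for dev, path in hacked:
        if path in owners and dev in active_developers:
            owners[path].add(dev)
    abandoned = sorted(p for p, devs in owners.items() if not devs)
    single_owner = sorted(p for p, devs in owners.items() if len(devs) == 1)
    return abandoned, single_owner


def abandoned_if_leaves(hacked, all_files, active_developers, leaving):
    """Files that would become abandoned if `leaving` left the organization."""
    before, _ = files_at_risk(hacked, all_files, active_developers)
    after, _ = files_at_risk(hacked, all_files, active_developers - {leaving})
    return sorted(set(after) - set(before))


# Hypothetical data: dev3 has already left, so c.c has no active developer.
sample_files = {"a.c", "b.c", "c.c"}
sample_active = {"dev1", "dev2"}
sample_hacked = {("dev1", "a.c"), ("dev2", "a.c"), ("dev2", "b.c"), ("dev3", "c.c")}

abandoned, single_owner = files_at_risk(sample_hacked, sample_files, sample_active)
newly_abandoned = abandoned_if_leaves(sample_hacked, sample_files, sample_active, "dev2")
```

In this sample, b.c is the single-owner file, and it is also the file that would need a replacement developer if dev2 left.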

Understaffing / Overstaffing. In addition to the risks of subsystems becoming abandoned, ownership architectures can be used to discover areas of the system that have the wrong number of developers: either too many or too few developers can reduce development efficiency. If many subsystems depend on a single subsystem with insufficient developers, then the dependent developers may wait too long for modifications by the subsystem’s owners. Additional developers can be assigned to alleviate this situation, although as Brooks warns [3], this is not a simple solution. A converse problem is that of too many developers. If too many developers are assigned to work on a small subsystem, they may not be able to adequately partition the work.

Ownership Coverage. An issue related to development bottlenecks is ownership coverage. When a large system is planned, it is commonly designed as a hierarchy of subsystems. The ownership architecture shows which developers are expected to work on which subsystems. The architecture can be used to show that all subsystems have sufficient development resources; that is, there is sufficient ownership coverage.
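A staffing and coverage check of this kind can be sketched by counting distinct developers per subsystem. The thresholds `low` and `high` below are arbitrary assumptions for illustration (appropriate values depend on subsystem size and on how many other subsystems depend on it), and all names and data are hypothetical.

```python
from collections import defaultdict


def developers_per_subsystem(hacked, subsystem_of):
    """Count the distinct developers who hacked files in each subsystem."""
    devs = defaultdict(set)
    for dev, path in hacked:
        devs[subsystem_of[path]].add(dev)
    return {subsystem: len(members) for subsystem, members in devs.items()}


def coverage_report(counts, subsystems, low=2, high=10):
    """Label each planned subsystem using assumed staffing thresholds."""
    report = {}
    for subsystem in subsystems:
        n = counts.get(subsystem, 0)
        if n == 0:
            report[subsystem] = "no coverage"
        elif n < low:
            report[subsystem] = "possibly understaffed"
        elif n > high:
            report[subsystem] = "possibly overstaffed"
        else:
            report[subsystem] = "ok"
    return report


# Hypothetical data: the net subsystem is planned but nobody has hacked it.
sample_subsystem_of = {"a.c": "mm", "b.c": "mm", "c.c": "ipc", "d.c": "net"}
sample_hacked = [("d1", "a.c"), ("d2", "b.c"), ("d3", "c.c")]

counts = developers_per_subsystem(sample_hacked, sample_subsystem_of)
report = coverage_report(counts, ["mm", "ipc", "net"])
```

The labels are deliberately tentative: a low count is a prompt to look more closely, not evidence of a problem by itself.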

3 Reconstruction method

Since many systems do not have a documented ownership architecture, we would like to be able to reconstruct the architecture from the implementation and documentation. To reconstruct the architecture, we need to determine what developers have worked on the system, and which source files they have worked on. This section describes the approach we used to extract this information from the Linux system. We considered three sources of this information: source control logs, documentation, and copyright notices.

Footnote 3: In one development organization known to the authors, developers are anxious to avoid modifying abandoned subsystems lest they involuntarily adopt them and gain additional responsibilities. These abandoned subsystems are referred to as tar babies.

Source Control Logs. If a system’s source code is maintained in a source control system, then we can determine all of the modifications made by each developer. The Linux system uses CVS to store source code. We found that source control logs did not report all of the authors that had contributed to source files. In some cases, code was brought into Linux from other development efforts; the original authors of a file may not even have worked on the Linux kernel. In these cases and others, it appears that files were checked in to the source control system by someone other than the author.

Credits File. The Linux documentation includes a Credits file that lists developers that have contributed to the Linux system, along with the general areas of the system that they have worked on. This documentation makes it easy to determine developers that have worked on the Linux system, but it does not specify which source files were modified by each developer. In addition, there is no guarantee that this documentation is up-to-date and accurate.

Copyright and Change Notices. Most of the source files in the Linux system have a copyright notice in a comment at the top of the file. This notice identifies the original author of the file. Most of the source files also contain comments identifying changes made by various developers throughout the file’s lifetime. These comments are kept in a file even if it is imported from another system or if it is moved within the Linux system. It appears that the developers in the Linux system were generally careful to maintain copyright and change notices within the source files.

We decided to use copyright and change notices from the source files to determine which developers had worked on each source file. The Credits file was not sufficiently precise, and the source control logs did not identify all of the developers that had worked on source files. We found that the copyright and change notices appear to provide sufficient information to determine the developers that had actually worked on each source file.

We manually examined copyright and change notices within comments of the system source files to associate developer names with source files. Originally, we attempted to use automated source processing based on regular expressions. However, we found that there were many different commenting styles used within different subsystems of Linux. We also found that some developers used several variations of their name in different source files. For example, some developers used only initials in some files but their full names in others. We manually examined all source files to extract developer names and resolve all spellings of developer names into a single identifier.
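To make the two difficulties concrete, the following sketch shows what a regular-expression pass with a hand-built alias table might look like. It is not the processing we used: the pattern covers only two assumed notice styles, the alias table is a hypothetical example, and the sample comment (including the name A. Developer) is invented for illustration. Real kernel comments vary far more than this, which is why manual examination was needed.

```python
import re

# Two assumed notice styles: "Copyright (C) <years> <name>" and
# "<Written|Modified|Fixed> by <name>". Names are runs of capitalized words
# or initials on a single line.
NOTICE = re.compile(
    r"(?:Copyright\s+\(C\)\s+[\d,\s-]+"
    r"|(?:Written|Modified|Fixed)\s+by\s+)"
    r"(?P<name>[A-Z][\w.]*(?:[ \t]+[A-Z][\w.]*)*)"
)

# Hypothetical alias table, built by hand, mapping spelling variants to a
# single identifier per developer.
ALIASES = {"L. Torvalds": "Linus Torvalds"}


def extract_hacked(path, text, aliases=ALIASES):
    """Return the set of (developer, path) pairs found in the comments."""
    pairs = set()
    for match in NOTICE.finditer(text):
        name = match.group("name")
        pairs.add((aliases.get(name, name), path))
    return pairs


# Invented sample header, not copied from any kernel file.
SAMPLE = """/*
 *  linux/mm/example.c
 *
 *  Copyright (C) 1991, 1992  Linus Torvalds
 *  Modified by L. Torvalds, 1993
 *  Fixed by A. Developer
 */
"""

sample_pairs = extract_hacked("mm/example.c", SAMPLE)
```

Both spellings of the first developer collapse to one identifier only because the alias table lists the variant; an unlisted variant would silently appear as a separate developer, so the output of such a pass would still need review.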

Although our extraction approach was specific to Linux, we believe that a similar approach could be used to extract the hacked relation for other software systems. We conjecture that, for many systems, automated approaches could be used (either because a configuration management system has been used throughout the system’s lifetime, or because a single commenting style is employed).

For commercial systems, management records could be used as an alternative source of information for reconstructing the ownership architecture. However, these records are likely to be less detailed and less accurate than the results extracted from the system implementation. Comparing the management records to the extracted architecture may provide insight into problem areas in the system.

4 The Linux ownership architecture

We examined the Linux kernel, version 2.0.27. The kernel is an interesting case study because it is a widely used large system (comprising 800 KLOC). Further, the Open Source licensing of the system means there are no restrictions on discussing implementation details.

Using the method described in Section 3, we extracted the hacked relation. A developer was recorded as having hacked a file if his or her name appears in a copyright notice or change log item in the source file. This can be interpreted as a list of developers that have contributed to a source file, either directly or through migration of code. Table 1 shows the number of files, developers, and relations that we found. We found that some of the source files did not have any copyright notices or change logs within comments. We omitted these files from our reconstructed architecture due to the lack of data about the developers that worked on them. To complete the reconstruction, we would need to interview developers that know who authored these files.

[Diagram omitted: top-level subsystems (fs, mm, sched, net, ipc, scripts, init, lib) and top-level teams with their number of people: drivers (154), fs-log (42), net (172), modules (2), ipc (1), net-fs (3), sched (5), mm (2), global (4), porting (63), scripts (3), swap (2). Edges from some teams are omitted.]

Figure 3: Linux Ownership Architecture

Quantity                              Count
# .c Files                              781
# Files Without Copyright Notice         82
# Developers                            453
# Hacked Relations                     1676

Table 1: Summary of Extracted hacked Relation

With 1676 relations between developers and files, it is necessary to group files into subsystems and developers into teams in order to understand the ownership structure. We will first discuss how we grouped files into subsystems, then how we grouped developers into teams.

We used a clustering of source files that we had previously created in a case study [2] to reconstruct the Linux concrete architecture. During this process, we used the Portable Bookshelf [8] to help us by diagramming the ownership architecture. Figure 3 shows our reconstruction of the Linux ownership architecture at the highest level of abstraction. This figure shows the top-level subsystems and the top-level teams. Edges from a team indicate the subsystems that were hacked by members of the team, and parentheses indicate how many developers are in each team. Three of the teams (modules, global, and porting) modified all the subsystems in the Linux kernel. We omitted the edges from these teams to improve the readability of the diagram.

We reconstructed eight top-level subsystems in the Linux kernel.

1. The Process Scheduler (sched) is responsible for allocating CPU resources to processes.

2. The Memory Manager (mm) provides each process with a local view of system memory, and implements swapping.

3. The Inter-Process Communication (ipc) subsystem provides support for communication primitives such as shared memory and semaphores.

4. The File System (fs) allows user processes to access files on hardware devices. It provides abstractions for hardware devices and file systems that are stored on the devices.

5. The Network Interface (net) supports communication with other computers using a variety of hardware interfaces and protocols.

6. The Initialization (init) system is used when the Linux kernel starts up. It initializes the computer hardware, and initiates other subsystems.

7. The Library (lib) subsystem contains routines that are useful throughout the Linux kernel.

8. The Install Scripts (scripts) subsystem contains scripts that are used to configure and install a version of the Linux kernel.

We used system documentation to group developers that had common roles. For example, we found that a number of developers identified themselves as having worked on device drivers. We grouped these developers together into the drivers team. We grouped developers based on documentation that stated what systems each developer had worked on. We reconstructed twelve top-level teams of developers. The teams that we created are not necessarily groups of interacting developers, since we did not have access to records of their communication (Dutoit and Bruegge [6] use such records in their analysis).

The ipc, mm, net, scripts, and sched teams worked primarily on their associated subsystems. We found four teams that worked on the File System: the swap team implemented swapping in the Memory Manager, the fs-log team implemented logical file systems, the drivers team implemented device drivers for various hardware devices used in the file system, and the net-fs team implemented logical file systems that use the networking capabilities (such as Sun NFS).

In addition to these teams that worked primarily on one subsystem, we found three teams of developers (modules, porting, and global) that worked on almost all of the subsystems within the Linux kernel. First, we found that two developers in the modules team made pervasive changes to implement loadable modules. These changes affected many of the kernel subsystems. Next, the porting team is responsible for implementing the Linux kernel on a variety of different hardware platforms, such as Sun Sparc, MIPS, PPC, and so on. These porting developers implemented platform-specific code in each of the top-level subsystems of the Linux kernel. Finally, we found one team (the global team) of four developers that appear to have worked on a majority of the Linux kernel. This team includes Linus Torvalds, the original author of Linux. These developers seem to have an extensive understanding of the entire Linux system.

Figure 3 shows the number of developers that have worked on each subsystem. These numbers suggest that the majority of the developers concentrated their effort on the file system and the network interface. This is due to the wide variety of hardware devices and protocols that are implemented by Linux. The top-level ownership architecture of the Linux kernel shows major subsystems and teams of developers, but does not provide detailed information within these subsystems and teams. For that, we need to look at lower levels of abstraction by examining the ownership architecture within subsystems of the Linux kernel. Due to size limitations, we will focus on only two of these subsystems in this paper. Section 4.1 describes the ownership architecture of the Memory Manager subsystem, and Section 4.2 describes the ownership architecture of the Logical File System.

4.1 Memory Manager ownership architecture

The Memory Manager subsystem within Linux manages access to the physical memory of the computer. The Memory Manager maps virtual addresses used by user processes into physical memory addresses, and supports swapping to disk to allow the system to run more processes than fit in physical memory.

Figure 4 shows the ownership architecture of the Linux Memory Manager subsystem. There are two major subsystems within the Memory Manager: hardware-dependent code, which is specific to particular hardware platforms, and hardware-independent code, which can be shared across hardware platforms.

[Diagram omitted: Hardware Independent MM (mmap, core, syscall, util, swap) and Hardware Dependent MM (sparc, m68k, mips, ppc, 386, alpha), with the teams porting (D. Miller, O. Zborowski, G. Thomas, H. Macdonald, R. Baechle), swap (B. Haible, S. Tweedie), mm (A. Bligh, R. Wolff), and global (L. Torvalds).]

Figure 4: Ownership Architecture of Memory Manager

In the teams shown in Figure 4, we have only shown developers who hacked code in the Memory Manager. The porting developer team developed the hardware-dependent code for each of the supported memory platforms. The rest of the development in the Memory Manager was accomplished by two teams. The mm team contributed to the util subsystem, which provides support for allocating memory for use by other subsystems within the kernel. The swap team contributed to the development of the memory swapping support within the Memory Manager. In addition to the work by the porting, mm, and swap teams, Linus Torvalds has contributed to all of the subsystems within the Memory Manager.

[Diagram omitted: the logical file system subsystems (including the Net FS, 386 FS, and Unix FS groups, and the isofs, procfs, and affs subsystems) and the individual developers and teams that hacked each of them.]

Figure 5: Ownership Architecture of Linux Logical File Systems

The Memory Manager does not appear to have serious risks of subsystems becoming abandoned. Each of the subsystems has been modified by more than one developer (with the exception of the syscall subsystem that provides a user-process interface to the Memory Manager).

4.2 Logical File Systems ownership architecture

The fs subsystem (see Figure 3) is responsible for abstracting both hardware devices and different logical file systems. The support for different file systems is implemented in the Logical File System subsystem, which is contained within the fs subsystem.

Figure 5 shows the ownership architecture for the Logical File System. The variety of logical file systems allows Linux to use file systems created by operating systems such as Minix, Xenix, MS-DOS, and the Amiga. Each file system is implemented independently of the others with no shared code or data. However, it is apparent from the organization of developers that there is similarity between some of these file systems. For example, the five file systems that W. Almesberger hacked (msdos, fat, vfat, umsdos, and hpfs within the 386 FS subsystem) appear to be relatively isolated from the rest of the file systems. These five subsystems implement file systems that are compatible with other PC operating systems that are used on Intel 386 compatible computers, namely MS-DOS, Windows 95, and OS/2.

Similarly, the nfs, ncpfs, and smbfs subsystems within the Net FS subsystem seem related by the developers that have worked on them. These subsystems were all hacked by the net-fs team, and this team also hacked the net subsystem, as seen in Figure 3. The sharing of developers between the Net FS and net subsystems suggests that these subsystems are related. In fact, these three subsystems (nfs, ncpfs, and smbfs) implement file systems that are distributed over a network. The nfs file system is compatible with Sun’s NFS implementation, ncpfs is compatible with Novell Netware, and smbfs is compatible with Microsoft Windows 95.

The six subsystems within the Unix FS subsystem at the bottom of Figure 5 also appear to be related: they were all modified by S. Tweedie and R. Card, as well as a variety of other developers. We found that in fact these subsystems are related since they implement file systems that are similar to Unix file systems.

The procfs file system is used to report kernel statistics. Several teams have hacked the procfs subsystem, presumably to add statistics related to other subsystems in the kernel.

Another case where the ownership architecture predicts the function of a subsystem is the isofs subsystem near the top of Figure 5. A number of developers have hacked the isofs subsystem. One of these, E. Moenkberg, is a member of the cd-rom subteam of the drivers team. The cd-rom subteam developed device drivers for CD-ROM drives. This suggests that the isofs subsystem might somehow be related to CD-ROM drives. We found that in fact isofs implements the ISO 9660 file system that is commonly used on CD-ROMs.

Another interesting suggestion provided by the ownership architecture is the reason for the name of the xiafs subsystem. We note that developer Q.F. Xia hacked the xiafs subsystem; this indicates that the name xiafs is eponymous; this saves us from trying to interpret it as an acronym.

The predictive powers of ownership architectures are not perfect since developers have multiple independent skills and interests. For example, E. Youngdale worked on affs, isofs, and hpfs although there is no obvious dependency between these subsystems. However, the ownership architecture has been able to suggest interesting relations that we were not able to find using only the source code.

5 Conclusions

This paper has described the idea of an ownership architecture for a software system, and has shown how such a structure is useful not only in forward engineering, but also as a document for system understanding. We have demonstrated how an ownership architecture can be reconstructed from a system implementation and documentation, and we have shown an example ownership architecture for Linux.

Ownership architectures help us to identify subsystem experts so that we can use interviews to learn more about particular systems. They can suggest non-functional dependencies between subsystems that are related even when there is no source code dependency. Finally, ownership architectures can provide us with an estimate of the quality of a subsystem based on our confidence in the subsystem’s developers. Ownership architectures permit us to evaluate a software system for risks of code abandonment, local staffing problems, and overall developer coverage.

More generally, this paper suggests that ownership architectures help us understand important aspects of large systems by providing a visualization of the organization of developers and the source files they have worked on.

Acknowledgments

We would like to thank Gary Farmaner for his support with the Portable Bookshelf tools. The contribution of the Linux developer community is gratefully acknowledged. Susan Sim provided helpful feedback on an earlier draft of this paper. This work was supported in part by the CSER (Consortium for Software Engineering) as well as by CITO (Centre for Information Technology, Ontario).

About the Authors

Ivan T. Bowman is an MMath student in the Department of Computer Science at the University of Waterloo. Bowman received his BMath degree from the Computer Science Department of the University of Waterloo in 1995. His research interests include software architecture, reverse engineering, and program visualization. (See WWW http://plg.uwaterloo.ca/~itbowman).

Richard C. Holt is a professor at the University of Waterloo in the Department of Computer Science. Holt received his Ph.D. degree in Computer Science from Cornell University in 1971. His research has included work in operating systems, compiler development, and software construction methods. He is an author of the Turing programming language. His recent work has concentrated on Software Landscapes, which provide a visual formalism for software architectures. (See WWW http://plg.uwaterloo.ca/~holt).

References

[1] I. T. Bowman and R. C. Holt. Software architecture recovery using Conway’s law. In Proc. of CASCON’98, pages 123–133, Toronto, Ontario, Nov. 1998.

[2] I. T. Bowman and R. C. Holt. Linux as a case study: Its extracted software architecture. In Proc. of ICSE’99, to appear, Los Angeles, California, May 1999.

[3] F. P. Brooks, Jr. The Mythical Man-Month. Addison Wesley, Anniversary edition, 1995.

[4] M. E. Conway. How do committees invent? Datamation, 14(4):28–31, 1968.

[5] J. O. Coplien. A Development Process Generative Pattern Language, chapter 13, pages 183–237. Addison-Wesley, Reading, MA, 1995.

[6] A. H. Dutoit and B. Bruegge. Communication metrics for software development computer society. IEEE Transactions on Software Engineering, 24(8):615–628, Aug. 1998.

[7] P. J. Finnigan, R. C. Holt, I. Kalas, S. Kerr, K. Kontogiannis, H. A. Müller, J. Mylopoulos, S. G. Perelgut, M. Stanley, and K. Wong. The software bookshelf. IBM Systems Journal, 36(4):564–593, Oct. 1997.

[8] R. Holt. Software bookshelf: Overview and construction. Available at http://www-turing.toronto.edu/pbs/papers/bsbuild.html, 1997.

[9] R. Kazman and S. J. Carrière. View extraction and view fusion in architectural understanding. In Proc. of ICSR5, Toronto, Canada, June 1998.

[10] P. Kruchten. The 4+1 view model of software architecture. IEEE Software, 12(6):42–50, Nov. 1995.

[11] G. C. Murphy, D. Notkin, and K. Sullivan. Software reflexion models: Bridging the gap between source and high-level models. In Proc. of the Third ACM SIGSOFT Symp. on the Foundations of Software Engineering, pages 18–28, Washington, DC, Oct. 1995. IEEE Computer Society Press.

[12] D. E. Perry, N. Staudenmayer, and L. G. Votta. People, organizations, and process improvement. IEEE Software, 11(4):36–45, July 1994.

[13] D. E. Perry and A. L. Wolf. Foundations for the study of software architecture. ACM SIGSOFT Software Engineering Notes, 17(4):40–52, Oct. 1992.

[14] M. Shaw and D. Garlan. Software Architecture: Perspectives on an Emerging Discipline. Prentice Hall Press, April 1996.

[15] S. Sim and R. Holt. The ramp-up problem in software projects: A case study of how software immigrants naturalize. In Proc. of the 20th International Conference on Software Engineering, Kyoto, Japan, Apr. 1998.

[16] V. Tzerpos, R. Holt, and G. Farmaner. Web-based presentation of hierarchic software architecture. In Workshop on Software Engineering (on) the World-Wide Web, Boston, May 1997. International Conference on Software Engineering 1997.

[17] K. Wong, S. R. Tilley, H. A. Müller, and M.-A. D. Storey. Structural redocumentation: A case study. IEEE Software, 11(6):501–520, Jan. 1995.