A Study on the Effects of Exception Usage in Open-Source C++ Systems

(1)

A Study on the Effects of Exception

Usage in Open-Source C++ Systems

by

Kirsten Celeste Paquette Bradley

A thesis

presented to the University of Waterloo in fulfillment of the

thesis requirement for the degree of Master of Mathematics

in

Computer Science

Waterloo, Ontario, Canada, 2019

c

(2)

Author’s Declaration

I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners.

(3)

Abstract

Exception handling (EH) is a feature common to many modern programming languages, including C++, Java, and Python, that allows error handling in client code to be performed in a way that is both systematic and largely detached from the implementation of the main functionality. However, C++ developers sometimes choose not to use EH, as they feel that its use increases complexity of the resulting code: new control flow paths are added to the code, “stack unwinding” adds extra responsibilities for the developer to worry about, and EH arguably detracts from the modular design of the system. In this thesis, we perform an exploratory empirical study of the effects of exceptions usage in 2721 open source C++ systems taken from GitHub. We observed that the number of edges in an augmented call graph increases, on average, by 22% when edges for exception flow are added to a graph. Additionally, about 8 out of 9 functions that may propagate a throw from another function. These results suggest that, in practice, the use of C++ EH can add complexity to the design of the system that developers must strive to be aware of.

(4)

Acknowledgements

I’d there is one thing I have learned while completing this thesis, aside from what is presented in this document, it is that “it is dangerous to go alone”. I honestly would not have finished this thesis had it not been for the people in my life who support me. I will attempt to thank everyone here, but I feel I could double the page count if I tried to fit in everyone who deserves to be here.

Foremost, I would like to thank my supervisor, Mike Godfrey, for guiding me through this process while also giving me the space and time I needed to do things at my own pace. You’re a great supervisor, a master of dad jokes, and I am glad you gave me the chance to be your student.

To Mei Nagappan and Patrick Lam, thank you for being readers for my thesis and giving me valuable feedback to enhance this final versions.

Thank you to my parents, Sandi and Richard, for not getting too annoyed at the fact that I was so wrapped up in school at times that I would seem to go forever without communication. Thanks for all the cute pictures of cats and dogs. Thank you Kyle and Natasha for putting up with my nerdiness for as long as I can remember. Thank you Serenity and Billy for being thinking I’m a cool aunt.

I am eternally grateful for the existence coffee that is black as midnight one a moonless night. I couldn’t have done it without you.

There are many groups of people I have left to thank for the various roles you have played in my life.

• The Nerd Herd (Beth, Carolyn, Devin, Lindsay): you were the first group of peers who understood me and helped me learn about myself. We may not talk much these days due to life going separate ways, but we are there for the important things. • Grovites (Katie, Melissa, Robin, Sandra, Tannis): living at the Grove was one of the

first times I really felt like I belonged somewhere in university. We had many great adventures and I hope we have many more. May there always be a couple extra bananas and canoes in your lives.

• Teaching Option (Alex, Bre, Chelsea, Dylan, Grace, Joel, Katherine, Kendra, Niki, Sarah): you are a great group of people who made me part of their family. With such diverse backgrounds and interests, it’s amazing we never won at trivia, but we always got question 13.

(5)

• ISAs (Aabraham, Chantelle, Dhron, Ed, Kaleb, Kevin, Leonard, Max, Marie, Rob, Sana, Sean): it’s been Marvel-ous to work, watch movies, play games, and eat food with all of you. I promise I’m not a spy.

• ISCs: A group of people that saw so much potential in me and continued to hire me for just about every odd job they needed to get done. Without the support from you, I would not have been able to become a Master’s student. Thank you for giving me a chance.

• Lecturers (Jeff, Mark, Nomair): one of the most supportive group of people I’ve ever met. You all let me talk to you far more than you ought to have and always gave me confidence when I needed it most. You believe in me more than I ever have and I have truly enjoyed every coffee and adventure we’ve had together.

• Labmates (Aaron, Achyudh, Arman, Ashwin, Bushra, Cassiano, Cosmos, Davood, Gema, Jeremy, Magnus, Reza, Laksh, Sunjay): you have made my time in the lab fun. I am sorry that I was the catalyst for so many shenanigans and I’m sure now that I’m gone, there will be much more work that gets done.

• CS Department: I’ve meet many graduate students, lecturers, instructors, and pro-fessors over the several years I’ve been involved here. While I could almost never get to where I wanted to get to on time due to someone stopping me to ask how things are going (or the dreaded question of ”how’s the thesis?”), you’ve made me feel like I belong here.

• My Girl Guides (Amy, Ailbhe, Beatrix, Bella, Cassi, Charlotte Original, Charlotte 2.0, Eve, Fiona, Hanh, Hannah Bea, Gabbi, Isabel, Louise, Maya, Meredith, Mor-gana, Olivia B, Olivia Q, Parnia, Reta): while I’ve been your leader, you hooligans have taught me so much. It’s been amazing to watch you grow and get to know you all. I look forward to continuing to camp and be there over the next few years. You each have so much personality and will be great leaders one day. Thank you for putting up with how absent minded I’ve been while finishing my thesis and giving me a break from the rest of my hectic life. To your parents, thank you for sending your girls on adventures with us. Most importantly, thank you Heather and Glory for all the extra work you’ve put in this year while I’ve been finishing my thesis. You’ve been great to lead these girls with and are great leaders!

(6)

Rob, while I didn’t trust you in that first Resistance game, I (mostly) trust you now. While we’re tired and have so much work to do, I am glad we find time to chat, play games, and drink coffee.

Brad, since the first time I was an ISA, you’ve been a consistent influence in my life. Sometimes good and sometimes bad. While you don’t remember it, you taught me C++ which has been kind of crucial to my research. I think I can now safely say that I’m done at least 10% of my thesis.

Kristina, you’re a wonderful person. From baking cookies, to joyous Mario Kart vic-tories, to late nights talking about nothing in particular, all of our shenanigans have been awesome. I don’t think I’ve ever meet a person I can relate to as much as you. The influence you’ve had on my life isn’t expressible in words.

(7)

Dedication

(8)

4.2 Exception Localization . . . 27 4.3 Exception Flow . . . 28 4.3.1 Throwing Functions . . . 30 4.3.2 Exception Graphs . . . 31 4.4 Implementation . . . 32 4.4.1 Statement Usage . . . 32 4.5 Function Annotation . . . 35 4.6 Summary . . . 36 5 Conclusions 37 5.1 Limitations . . . 38 5.2 Future Work . . . 39 5.2.1 Improvements to Zelda . . . 39

5.2.2 Additional Programming Languages. . . 41

(10)

References 42

A Exception Code Example 46

A.1 Context Graph . . . 49

A.2 Exception Flow Algorithm . . . 50

A.3 Stack Unwinding . . . 51

A.4 Exception Graph . . . 54

(11)

List of Figures

2.1 Inheritance Hierarchy of Standard Exceptions . . . 11

2.2 Code example that shows aspects of exception handling written in C++. . 13

3.1 Data processing during analysis using Zelda. . . 18

3.2 Exception graph example . . . 22

4.1 Functions which throw exceptions that originate from another function. . . 29

A.1 Context graph for divisionByZeroCheck() . . . 49

A.2 Context graph for Rational(int, int). . . 49

A.3 Context graph for readRational() . . . 50

A.4 Stack unwinding example 1. . . 52

A.5 Stack unwinding example 2 . . . 53

A.6 Stack unwinding example 3 . . . 54

(12)

List of Tables

3.1 Projects for call graph comparison. . . 20

4.1 Metrics of projects with exceptions. . . 26

4.2 Prevalence of throwing functions. . . 30

4.3 Prevalence of calls to throwing functions . . . 31

4.4 Control statements usage in projects . . . 33

(13)

Chapter 1 Introduction

Robust software must be able to recover from a variety of unforeseen error conditions that may arise during run-time. Languages such as C++, Java, and Python provide a flexible language feature for this known as exception handling (EH), where functions can interrupt normal execution to allow special error handlers to run and, if possible, recover from the error condition. A key feature of EH is that errors are often handled in a part of the code that is removed from where the problem is detected; this allows developers to place error handling code in key spots of the design, rather than insisting that errors be handled when and where they are first detected.

However, using EH has both costs and risks. For example, EH may degrade performance significantly; developers sometimes believe EH makes code harder to understand [1][29] and debug [24]; “stack unwinding” adds extra responsibilities to developers to ensure resources are not lost if an exception if thrown; and if an exception is thrown but not caught, the whole program will abort. Google, for example, recommends in its C++ style guide that EH not be used at all, although this is largely because of the redesign costs that would be incurred if EH were to be added to their existing products [5].

Despite these beliefs, many modern programming languages have EH mechanisms and there are known benefits to exception use such as improving error handling across modules [1]. While there may be some inherent difficulty with exception code, this difficulty could come from the tasks for which exceptions are used [32].

Most previous exception studies have looked mainly at Java systems, so there is a dearth of empirical results about the use of EH in C++ code. While the basic EH approach

is similar in Java and C++, there are also important differences. For example, Java

(14)

throw exceptions, while C++ has deprecated the analogous feature. C++ requires careful consideration of how “stack unwinding” may affect resource management and memory leaks, while Java’s use of Garbage Collection largely obviates this need. Finally, C++ allows programmers to throw any kind of object, not just those that inherit from a master exception class.

Because of the extra concerns that C++ EH adds and given the relative absence of studies on EH in C++ systems, we decided to perform an empirical study of how EH affects C++ code. To facilitate this study, we implemented a tool that can extract expression flow through call graphs, based on LLVM tooling infrastructure; we could not find any existing tools that could perform this kind of analysis and produce textual output for further processing. Thus, a secondary contribution of our work is the development of this tool, called Zelda, which we will released as open source.

1.1 Thesis Contributions

This thesis has two main contributions:

• We present an exploratory empirical study on understanding EH usage in real-world C++ code taken from GitHub, and we evaluate the potential impact of EH usage that may not be immediately obvious from reading the code; this study helps to shed light on perceived versus observed issues in using EH in C++ systems.

• We present a static analysis tool — Zelda, which is based on LLVM tooling — that can be used to extract information about EH use in C++ code. Information about call graphs and exception flow are presented in a text format to allow analysis of projects without the need of a visual interface. This allows for many projects to be analyzed and studied, and also for further kinds of analysis to be performed using the output.

1.2 Thesis Organization

This thesis is organized into five parts. Chapter2is a summary of related work and C++ features related to exception. This summary starts with the proposal of EH mechanisms in programming languages leading to the first programming language to have an exception mechanism. This is followed by the studies of exceptions in C++ and other languages and a summary of analysis tools that influenced our tool.

(15)

Section 2.2 provides background information on the aspects of C++ that are related to EH. This includes a brief summary of C++ features that are not specific to exceptions, but are related to them, and C++ exception handling mechanisms (EHM).

Section2.3 discusses views and opinions on EH from previous research and a software company. This discussion highlights some of the perceived advantages and disadvantages to using exceptions in a project. Our research questions reflect the discussion and hope to provide meaningful results to the discussion about exception usage.

Chapter 3 explains the development of our static analysis tool and the tasks that it is currently capable of completing. Our final algorithm for exception propagation is given in detail. This chapter also explains exception graphs and throw length, both of which are concepts that we use in the analysis of our research questions.

Chapter 4contains the results of the exploratory empirical study. Each of the research questions is addressed with an empirical study on an aspect of EH code.

Chapter 5summarizes the contributions of this thesis, the limitations to this research, and describes future work that could follow.

(16)

Chapter 2 Background and related research

2.1 Related Research

2.1.1 History of Exception Research

Prior to the implementation of EHM in programming languages, there was theoretical research discussing what would be important for such features. This section focuses on research that introduced mechanisms that are similar to the approach that C++ uses.

Goodenough was one of the first researchers to discuss adding exception handling to a programming language and developed a formal notation to express exceptions and their semantics in a language [4]. He proposed that the caller of the function should be able to handle the error while the function itself should not be able to recover. The idea that exceptions should be interprocedural is still encouraged today [32]. He also discusses situations that exceptions should be used for, such as domain and range failure, and how exceptions should be implemented, such as error codes. Finally, he talks about the potential control flow of exceptions being either an escape from a procedure or to notify the user of an event occurring so the user can respond and the process can continue.

Levin proposed properties that make an EHM good, such as verifiablity and practicality, and proposed a mechanism with these properties [14]. His mechanism involved raising an exception from within a function to be handled in a calling function based on the type of the raised exception, similar to the mechanism that is used in modern C++. Different from C++, the proposed mechanism included resumption of the code where an exception was raised. Additionally, exception handling was to be performed by a handler procedure being called, as opposed to a local block of code.

(17)

PL/I and CLU were the first programming languages to have EHM implemented and was critiqued by researchers at the time. The existence of exceptions in a programming language were first researched by Maclaren by reviewing the EMH in PL/Il[16]. This work was a critique of the mechanisms which existed. The PL/I critique addressed the release of resources and changes in program state after an exception has occurred. Liskov et al. explained the EHM which they implemented in CLU and they discuss decisions which language designers should consider when developing an EHM [15].

2.1.2 Exception Usage

Previous research has addressed several topics related to the use of EH, particularly in Java and C++ systems. This includes empirical studies of how EH code is written, how EH code affects defect in systems, as well as studies of how developers perceive the benefits and drawbacks of using EH.

Shah et al. explored the question of why developers often choose not to use EH at all [30]. They interviewed Java and C++ developers about the topic, and also developed and validated a tool for visualizing EH usage. This work did not involve the analysis of existing code for EH usage.

Xie et al. studied how developers handle exceptions once they have occurred[33]. They observed that EH is often implemented to handle a conditional case, such as a file not existing, and to handle external cases, for example checking the results from a call to an API function. Their work categorizes these situations theoretically without examining real-world code or consulting developers.

Marinescu studied versions of Eclipse and showed that code that uses exceptions is more defect-prone than other code[17]. They used their tool iPlasma to inspect code at a class level to determine if there are functions in a class that throw and catch exceptions[18].

2.1.3 Exception handling in C++ systems

Bonifacio et al. studied C++ exception usage in detail [1]. They analyzed 65 C++ projects and surveyed C++ developers with a range of experience. They investigated the most common types of objects caught, what percentage of the code base is dedicated to EH, and determining what types of actions are performed in catch blocks. The survey included questions about whether developers believe C++ programmers avoid exception usage and why it may be avoided. Developers’ responses to their survey influenced our discussion on exceptions.

(18)

Schilling created a compiler that can optimise the usage of exceptions in C++ code[27]. This work is of interest because EH relies on the types of objects at run-time which is slow to determine in C++. While the goals of their work are different from ours, they present al-gorithms to determine whether functions can throw exceptions based on condition analysis. Their algorithms are more detailed than our analysis as they look at each expression and determine if it can throw an exception while we concentrate on analyzing throw statements and propagation.

Hasu argued that whether libraries use exceptions or another form of error handling in C++ (and C) should be decided by the user of the library [7]. For instance, the developer might want to have legacy code from C libraries use exceptions.

De Dinechin discussed the aspects of the assembly language for the IA-64 which must be altered to accommodate C++ EH [3]. The optimization of code cannot occur in some situations due to the possibility of a function throwing an exception and additional infor-mation which must be maintained to ensure destructors are called as the stack unwinds. Both these situations makes the final machine code more complex.

Prabhu et al. proposed an algorithm for interprocedural exception analysis for C++ [24]. This is the only other work we are aware of that explores call graphs combined with exceptions for C++ programs. Their goal was to translate C++ code with exceptions into an equivalent exception-free C++ program. The algorithm presented in [24] influenced the design of the static analysis tool that we have implemented.

2.1.4 Exception handling in other languages

Kery et al. examined EH in Java [12]. Their work used about 8,000,000 Java repositories on GitHub. They determined that superclasses, such as Exception and IOException, are the most common catch types in Java. Additionally, they categorized statements in catch blocks to determine what actions are taken to recover from exceptions, and found that rethrowing and terminating are the most common responses.

Nakshatri et al. analyzed patterns of EH in Java systems [22]. Using a corpus of 7.8 million Java projects from GitHub, they analyzed the types of exceptions and the bodies of catch statements for patterns for exception recovery. While they listed three ways to analyze the types of exceptions, they stated that “an exception thrown [...] will eventually be caught by a caller method using a try-catch block”. While this may be true in Java due to compilation failure if checked exceptions are not handled or specified for a function, there is no similar requirement in C++.

(19)

Other research has addressed concerns about the robustness of EH. For example, Kecha-gia et al. studied context-dependent exceptions in Java code [11], while Oliveira et al. ex-plored EH in Android and Java applications [23]. Robillard et al. performed static analysis of exceptions and their flow in Java [25][26]. They introduced the notions of breadth and depth of exceptions as meaningful ways to reason about global exception flow; they also included two case studies and discussed how the code would be improved by reducing both exception depth and breadth.

A concept used in the analysis in this thesis is augmenting a call graph to include exception flow. Previous work by Choi et al. and Sinha et al. suggested augmentation of Java flow graphs to represent exception usage [2][31]. Sinha proposed a control flow graph that condensed potential exception instructions to have less complicated graphs. Choi augmented the call graph by adding edges and nodes that represent exceptions at the function level that could be used to create a graph for interprocedural flow. Both performed an empirical study on their proposed graphs with Choi examining how a typical flow graph is affected by their augmentation and Sinha examining the change in graph size.

2.1.5 Static Analysis Tools

To ensure accuracy of the call graphs we produced, we compared our results to the com-mercial tool Understand by Scitools [28]. Understand is a static analysis tool that supports several tasks to understand code, such as metric calculation, dependency analysis, and control graphs, in languages including C++, Java, and Python. While the tool outputs detailed call graphs, the graphs do not consider exception flow. Additionally, call and flow graphs are produced as visuals with no easily interpreted textual option available.

LLVM is a compiler framework designed to support dynamic and static analysis de-signed by Lattner et al.[13]. LLVM consists of sub-projects, such as Clang which is a C and C++ compiler, that are available as open-source code. LLVM Libtooling API is a library designed for writing tools based on Clang. Some tools the library allow developers to specialize parsers, code generation, and Abstract Syntax Tree walkers.

The static analysis tool Rex — developed by Muscedere [20] — served as a foundation for using LLVM’s Libtooling API. Rex is a fact extraction tool for detecting hotspots for feature interaction in C++ code. This is accomplished by walking the Abstract Syntax Tree (AST) of a program and recording information about how variables are used. The appendices provide detailed instructions about what is needed to use Libtooling and a process for downloading the required systems. The source code for Rex is publicly available and it was used as a guide and foundation for the construction of our static analysis tool[21].

(20)

2.2 C++ Features

This section is a brief summary of features in C++ that are related to EH, followed by the specifics of EH mechanisms in C++. Appendix A contains a code example that contains the aspects of C++ discussed in this section with an explaination of the state of memory while the program executes.

2.2.1 Classes

Classes are a central feature in an object-oriented programming languages like C++. A class is defined as a collection of fields and methods that operate over the internal variables of those fields. An individual instance of a class is called an object. A field is a variable that is associated with each object created. A method is a function that is called on an object. The term member can be used to refer to either a field or a method. An object is created by calling a method called a constructor and it is deleted by a destructor.

Inheritance

Inheritance means that all of the members of one class are members for another class. The class that contains the members of another class is said to inherit from the other class. The class that is inherited from is a superclass and the inheriting class is a subclass. The subclass can also have additional members and override methods to perform differently on a subclass object. A subclass object can be used in place of a superclass object in any context.

A class can have multiple superclasses and subclasses at the same time. A hierarchy is a directed graph where each node represents a class, and edges indicate inheritance. A class A is a subclass of a superclass B if A inherits from B or A inherits from a class that is a subclass of B. Being a recursive definition, a subclass can have an arbitrary number of inheritances between itself and a superclass.

2.2.2 Resource Management

C++ does not have a garbage collector to reclaim heap allocated-memory or other resources that are no longer needed by the executing program. C++ uses two memory pools to store data during execution.

(21)

Stack Memory

The stack is the default location that a program stores data while it is executing. Stack memory is associated with a block of code, called a scope, such as the body of a function, if statement, and loop. When the code associated with the scope is complete, the associated block of memory goes out of scope. If an object is allocated the stack, the destructor will be called when the object goes out of scope.

Heap Memory

For data to be persistent outside of the scope it was allocated, the data must be stored on the heap. Heap-allocated memory will not be freed when it goes out of scope. Mem-ory is allocated from the heap using a new statement, which calls a constructor for the object. Heap-allocated memory should be freed using a delete statement, which calls the destructor for the object. While all classes have a trivial built-in destructor, a developer can implement their own destructor to release any resources that the object possesses.

Resource Acquisition Is Initialization

Resource Acquisition Is Initialization (RAII) is an idiom that is meant to ensure that all resources are returned when they are no longer needed. RAII promotes that all resources an object needs should be acquired when the object is created and returned when an object is destroyed. A simple way to ensure both happen is for all resources to be acquired within a constructor and to be freed in the destructor.

However, while a constructor is always executed to create an instance of an object, the destructor is not always implicitly executed. In particular, if an object is heap-allocated, the destructor of the object must be explicitly called to free the object and its resources. Fortunately, the destructor will always be called for stack-allocated objects when the block they exist in goes out of scope. RAII states that all heap-allocated data should be owned in an object on the stack, and the owner’s destructor will free the heap-allocated when the object goes out of scope. This ensures that all resources will implicitly be returned when they go out of scope.

2.2.3 Run-time Type Information

Run-time type information (RTTI) is a feature in C++ can determine that makes available the types objects at runtime. While C++ is a statically-typed language, the type of an

(22)

object is not immediately obvious in all situations. In particular, a superclass pointer or reference can refer to an instance of any of its subclasses. RTTI is used in exception handling to determine if exceptions match catch types. However, RTTI is slow as it involves traversing the entire class hierarchy to determine if an object is of a particular type.

2.2.4 Exceptions

Exception handling mechanisms vary between programming languages but typically are used as a way to interrupt control flow when an event — an exceptional event! — occurs that makes continued normal control flow impossible. For example, reading from a file that does not exist or accessing an array element that is out of bounds are actions that simply cannot be performed; EH allows the program designer to consider what to do in these situations.

The rules about which objects can be thrown and how thrown objects must be handled differ between programming languages. There are four components to exception handling in C++: throw-able data, code that throws and handles exceptions, exception specifications declared by functions, and stack unwinding.

Types of Exceptions

In C++, all objects and primitive instances can be thrown, while Java permits only classes that inherit from Throwable to be thrown. C++ provides several pre-defined exception classes, similar to Java’s Exception hierarchy, all of which descend from the minimal std::exception class, presented in Figure2.1. It is common for standard library utilities to throw these exceptions when invalid input is received.

Since any object can be thrown, developers can create their own exception classes. These classes can inherit from a standard exception, their own hierarchy or class, or can be an independent class. According to Stroustrup, the intermediate exception classes should be inherited from for different purposes [32]. For example, std::logic error is intended to be used when the error could be caught before executing the code, such as an invalid input or division by zero.

Exception Handling Code

There are three language features specific to exceptions: throw expressions, try statements, and catch statements. Figure 2.2 shows a code example that uses these features.

(23)

Figure 2.1: Inheritance Hierarchy of Standard Exceptions. Classes marked with an aster-isk(*) will be introduced in C++20.

(24)

A throw expression is used to activate exception handling. The exception causes the normal execution to stop. The runtime system travels through the call stack, popping stack frames for pending function calls, until it finds a catch statement in a calling function that is willing to catch the exception. If the main function throws an exception from its function body, or two exceptions are active at the same time, the program terminates.

A try statement surrounds a block of code that typically could throw an exception directly, or containing a function calls that throws. A catch statement follows a try statement and surrounds a block of code that is to be run if an exception occurs. If an exception from the try block matches the type of the catch statement, the associated code runs and the program continues running after the last catch statement associated with the try block of the used handler. A catch statement can catch a type if it is of the type specified or if RTTI can identify the thrown object as a subclass of the specified type. A catch statement can indicate that it is willing to catch all possible exceptions by using an ellipsis instead of a type. However, if a catch statement catches with an ellipsis, the exception and any information it might hold, such as an error message, cannot be accessed. Within any catch statement, the caught exception can be rethrown by using a throw statement without specifying the exception.

Function Exception Declaration

In C++ there are two language features that can be used to indicate that a function may throw an exception. A throw clause lists all the types of exceptions that might be thrown from a function. However, throw clauses are deprecated as of C++11 and removed as of C++17, partially due to relying on the C++ RTTI mechanism which is slow. Declaring whether functions can throw exceptions is preferred to be done with noexcept(expr) where expr evaluates to a boolean at compile time. If expr evaluates to true, then the function will not throw exceptions. Otherwise, the function can throw exceptions. If a function violates an exception specification, the program will abort.

Stack Unwinding

An exception is thrown from the context of some scope, which could be the body of a conditional statement, function, try, catch, etc. As the exception propagates, it may travel through multiple scopes. When an exception leaves a scope, the variables contained in the scope goes out of scope. The process of blocks on the stack going out of scope while an exception is propagating is called stack unwinding. When a block goes out of scope, whether due to stack unwinding or the scope being finished, the destructor will be called for

(25)

int** matrix(int size) noexcept(false) { int** n = nullptr;

try {

if ( size < 1 ) throw std::out_of_range; // calls to new may throw std::bad_alloc n = new int*[size];

for( int i = 0; i < size; ++i ){ n[i] = nullptr;

}

for( int i = 0; i < size; ++i ){ n[i] = new int[size];

}

} catch (std::bad_alloc& ba) { if ( n != nullptr ){

for (int i = 0; i < size; ++i){ delete[] n[i]; } delete[] n; } throw; } return n; }

(26)

each stack-based object and will not be called for heap-based objects. The difference with stack unwinding and a block of code going out of scope is that the remainder of the code in the associated scope will not be executed. A possible implication of not executing code is that manual resource management might not occur. Furthermore, due to destructors being called during stack unwinding and the inability for multiple exceptions to occur at once, destructors should not throw exceptions. If adhering to RAII, all resources will be released while the stack is unwinding because nothing is manually managed.

2.3 Exception Discussion

A widely-held belief that has been discussed by previous researchers — but notably without empirical evidence — is that the use of exceptions makes code more complicated. This section summarizes what various researchers, individuals, and companies have stated about how exceptions might affect design clarity.

As stated by Robillard, EH can improve error recovery across modules [1]. Its use allows the developer to be notified of incorrect usage of a module and to specify recovery actions at a location in the system’s design that the developer feels is most appropriate (i.e., not necessarily where the error is detected). The use of EH can also decrease the amount of error checking required when returning from a function, as the error can result in an exception being thrown and the appropriate handler invoked [5]. However, developers from another survey expressed that the flow of exceptions had a negative impact on modularity [1].

An important task while programming is knowing the state of the program during a function call. This requires a developer to be aware of the control flow of the code, including the expectations from any called functions. In a survey of developers, respondents stated that exception flow and handlers add new control flow paths to the code, and may run counter to their natural intuition [1]. This suggests that exceptions may have an impact on the flow of a program that makes it more complicated to understand.

The use of exceptions may also impact the design style of the code [5]. For instance, if a programmer knows that exceptions will be thrown when code is misused, they may decide to forgo checking the validity of their input and to place their error handling code in catch statements. Furthermore, to account for stack unwinding when an exception occurs, either intermediate functions have to be prepared to catch all exceptions and free their resources before rethrowing, or code has to be written that strictly adheres to Resource Acquisition Is Initialization (RAII). Apart from RAII, there are other important best practices concerning EH that developers must be aware of, such as “destructors must never throw”[32].

(27)

How exceptions are used may also influence how they are perceived by developers. Exceptions are frequently used for error handling due to their ability to travel through the call stack to a location that can recover from the error. As reported by a survey participant, “error and EH (code) is hard”[1]. While it is possible that both are difficult to write and understand, Stroustrup has stated that “exceptions make the complexity of the error handling visible. However, exceptions are not the cause of complexity” [32]. It is possible that the perceived difficulty of exceptions comes from how they are used and not that they are used.

Some programming languages allow the developer to indicate the types of exceptions which a function can throw. For example, Java has checked exceptions that must caught or declared to be thrown from a function, or the program fails to compile. Java also has unchecked exceptions, which a function does not have to catch or declare it throws for the program to compile. This leads to exceptions that can travel through functions without handlers and acknowledgement. Research has shown that unchecked exceptions result in many program crashes in Android applications [11]. C++’s exceptions are equivalent to Java’s unchecked exceptions. While a function can be specified as to whether it throws, it is up to the developer to label functions in this manner. Doing so does not force the caller to handle the exceptions at compile time like Java, but forces the program to terminate if an unspecified exception is thrown.

This discussion serves to motivate our exploration of EH practice in C++ systems. From previous research, we observed researchers and developers expressing concerns about exceptions and they stated how they believed exceptions may affect a system. However, there was no empirical evidence to support effects. Our goal was to study existing systems that use EHM and determine how exceptions affect various aspects of a system.

Our research questions were chosen based on concerns mentioned throughout previous research. We studied the localization of exceptions, the effects of exceptions on the control flow and implementation of a system, and the effects of exception specification on a system. These areas reflect previous concerns and distinct ways that the presence of exceptions may impact a system.

2.3.1 Research Questions

We have investigated four research questions:

RQ1 How localized is exception throwing and catching?

(28)

RQ3 How does the use of exceptions impact the implementation of a program?

RQ4 Do C++ exception specifications affect the outcome of exception handling efforts? We next describe our approach to answering these question.

(29)

Chapter 3 Methodology

3.1 Static Analysis Tools

To analyze the presence and usage of exceptions in a code base, a static analysis tool is needed. We could not find an existing tool that met our technical needs of performing C++ exception flow analysis and giving textual output. We thus created two static analysis tools, the second of which is used for our research.

3.2 Annie

The first tool we created was named Annie, short for Analyser. Annie read the AST that is produced from clang++ to rebuild a simplified AST containing the nodes necessary for exception analysis. Annie had two goals: determine whether exceptions are present in a code base, and track the flow of exceptions through the call graph. To determine whether exceptions are present, Annie found all nodes throw, catch, or try nodes and recorded their type and presence. This step was performed to ensure that exceptions were used frequently enough in the corpus to provide meaningful results.

The exception flow tracking involved walking the call graph to determine when functions were called and how their exceptions would flow. In this tool, the analysis started at a main function to find all functions that it called. This process was repeated recursively to ensure all called functions were analyzed. This approach was thorough, reflected the entire call graph, and was slow. It involved searching the entire call graph for exceptions, which was tedious for large programs. For these programs, large sections did not need to

(30)

Figure 3.1: Data processing during analysis using Zelda.

be analysed as they had no exceptions. Additionally, the code for uncalled functions was never analysed which resulted in code without a main function, such as libraries, not being analyzed.

While this tool worked for basic analysis, we observed that larger programs were causing Annie to terminate while still processing data. This could have been due to the program using a lot of disk space to store AST files from clang, and tracking the many paths ex-ceptions could flow involving a lot of memory to be used while the program was running. Additionally, from analyzing code, more nodes of interest were identified and the complex-ity of C++ ASTs were becoming more apparent and, despite our best efforts, the code was becoming unmanagable. Thus, we built a new tool to address these concerns.

3.3 Zee Exception Length and Destination Analyzer

The second tool we created was named Zelda — Zee1 _{Exception Length and Destination}

Analyzer — based on the LLVM Libtooling infrastructure [13]. Libtooling has a tool for walking ASTs and stores detailed information about most statements and expressions for easy access. Zelda consists of a set of AST tree walkers that extract and output information about exception handling, including the flow of exceptions, and the basic classification of expressions. The capabilities and implementation details of Zelda are discussed further below.2

3.3.1 Data Extraction

The first task was to determine the presence of exception code. As discussed in Section

2.2.4, this involves the detection of try statements, catch statements, and throw expres-sions. Each of these has a specific type of AST node, making finding all instances of these

1_{Pronounced with an outrageous French accent [}₁₀_]. 2_{We will release Zelda as open source.}

(31)

nodes a simple task in a walk of the AST. Furthermore, the declaration of the catch statement and expression of the throw expression are stored in the node, which simplifies determining the types.

The second task was to collect data about other aspects of the code in the corpus, specifically line counts and statement classification. The line counts of functions and catch statements were calculated by counting the number of newline characters present in the pretty print of the body from LLVM. Statement classification involved recording

the types of statements present. AST nodes of interest were simple to detect due to

having unique nodes in the structure, including returns, breaks, continues, throws, and

deletes. Further classification involved looking at some function calls from the C++

standard library whose use is known. For instance, calls to operator<< and printf were classified as prints.

The final task for Zelda was to extract and map exception flow through a system. This involves knowing where exceptions occur, the types of thrown exceptions and catch statements, and call information. All of this data can be determined from walking the AST; however, more information is needed than for a typical call graph. Specifically, knowing about the presence of a throw statement or a function call in a function is not enough information to understand exception flow. For this purpose, an augmented call graph, which we call a context graph, is used in this analysis. A context graph associates a call with the calling function and the context within the calling function. For our purposes, the context can be either a try statement, catch statement, or function. An example of a portion of a context graph is presented in Appendix5.2.3.

While the first two tasks are relatively simple to implement, tracking exception flow through a program is more complicated. To ensure accuracy, the call graph has to be correct. Given the size of the corpus, it was infeasible to check the accuracy of the call graphs on a large sample of projects. The commercial tool Understand by SciTools[28] was used to compare the call graphs of nine projects that were randomly chosen from the corpus. Table3.1 presents the projects used for comparison. The majority of functions in the call graph were reported by both tools, although both reported functions that the other did not detect. Each of the programs reported calls to library functions that the other did not report. Since we are not interested in exceptions thrown from libraries, this is not a concern. Additionally, Understand reported calls to parent constructors, destructors, overridden methods, and templated functions that Zelda did not report. Zelda does not report destructor calls because developers are discouraged from throwing exceptions from destructors due to a combination of a program terminating if two exceptions are active at the same time and the fact that destructors are implicitly called while the stack unwinds. The remaining calls that Zelda does not report may be important to know about and could

(32)

User Project Called(Zelda) Matching dermesser libsocket 70 94.5% Fat-Zer tdeutils 43 97.7% greg-hellings sword 914 80.5% gulp21 QeoDart 0 NA licq-im licq 10 100.0% nireis pferd 68 93.2% Nocte- rhea 81 100.0% tfarina idep 208 97.2% thaddeusbort cantranslator 33 94.0%

Table 3.1: Projects used to compare call graphs between Understand and Zelda. Matching represents the number of functions found by Zelda divided by the number of fucntions found by Understand.

be found by improving the tool.

Aside from comparing the call graphs from Zelda to the call graphs of Understand, the data extracted about exceptions and their flow was checked for accuracy. The exception data was checked first by writing small programs which used features during development. Once initial development was complete, the results of a few randomly selected projects was analyzed versus a manual traversal of exceptions. Any problems were addressed by making smaller cases that replicated the observed problem and were fixed.

3.3.2 Exception Propagation Algorithm

Once the call graph is determined, Zelda combines the context and exception information to determine the flow of exceptions through a program. This is accomplished by looking at the context of each throw statement with the following algorithm:

Find all throw statements T that are not rethrows For each throw statement t in T :

Find all context edges C for t For each context c in C:

Remove c from C

If c is a catch and t is a throw in c Add edge(context(c),t) to C

(33)

Add edge(context(c),t) to C

If c is a catch and c does not contain rethrow Continue

If c is a try

Find first catch c1 of c of type t

If no such catch exists

Add edge(context(c),t) to C else

Add edge(c1,t) to C

If c is a function

Find all function calls to c For each function f that calls c:

add edge(callContext(c,f ),t) to C

Through these steps, all exceptions are traced to either the catch statements that catch them, or to the last function that throws them.

For our empirical study, the graphs and code information are output in the Tuple Attribute Language (TA) and queried using Grok [8][9]. Grok is a programming language designed at the University of Waterloo that performs relational algebra. Previous uses of Grok include analyzing factbases representing relationships from a parser, such as Rex [20]. Operations that are present in Grok, such as transitive closure, relational composition, and set operations, make it ideal for manipulating the information from Zelda.

3.4 Exception Graphs

Exceptions unwind the stack of an arbitrary number of function calls, which means the control flow through a program can be drastically changed based on an exception being thrown. However, in a typical call graph, the semantics of returning to the calling function is implied by the graph. Thus, call graphs do not account for exceptional behaviour and the flow is inaccurately expressed. However, due to exceptions travelling up the call stack, if the developer knows a function can throw an exception, it would not typically be difficult to trace the exception through the call graph. However, some additional information is needed in the call graph

Consider a flow graph that includes both calls between functions and returns from functions. Calling a function that does not throw an exception will result in the return to

(34)

A B C D E F l:10 l:20 l:42 l:75 l:5 l:8 int(l:45) int int(uc) int(uc:E) int(uc)int int(l:89) int(uc:E)

Figure 3.2: Exception graph example. Solid lines indicate a function call and the line the call is from is a label on the edge. Dashed lines indicate a thrown exception. Dashed lines that are not directed at a node indicates the function throws the exception.

the caller to be to the same location as the call to the function. Thus, return edges are not needed and are understood to implicitly be the reverse of call edges.

Consider an exception that is caught in the function that throws it. While this does activate stack unwinding, within a function unwinding the stack is effectively equivalent to breaking out of one or more nested conditionals. Additionally, since control flow remains in the function, the graph would not be altered outside of the throwing function.

Now consider an exception that is not caught in the throwing function. All calls to this function may result in an exception throw. To represent these potential throws an edge that does not have a destination is added from the function to indicate the function throws that type. If the throwing function is called, an edge annotated with the exception type and where it is caught are added from the throwing function to the calling function. The exception being caught will travel back to the call location and then unwind the stack to find a matching catch statement. While this is occurring, destructors will run for all stack allocated objects, which means that other function calls will be occurring that should be added to the graph. For simplicity, these function calls are not considered in our exception graphs. Eventually, the exception is caught and the control flow is at a different place in a calling function than where the function call occurred. Thus, this edge, while still returning to the calling function, would be separate from the call edge. Additionally, each exception could have a unique catch location that could be represented with a unique throw edge.

Any function that throws an exception results in an edge being added to the excep-tion graph. Furthermore, these funcexcep-tions need to implement an excepexcep-tion handler or the

(35)

exception will further propagate resulting in more edges being added to the graph.

Figure 3.2 depicts an exception graph for some arbitrary function. From A, there are calls to B, C, D, and F. There are calls to E from B and D. The functions E and F each throw an exception of type int. From E, the function B catches the exception on line 45, while D does not catch the exception and results in an exception edge being added to D, annotated with “uc” for uncaught followed by E to indicate where the exception originated. Finally, A catches the exception from F on line 89 and does not catch the exception from E and adds an additional exception edge from A.

Another exception graph is in Appendix5.2.3, representing a code example.

3.5 Corpus and Data

The projects used for analysis are C++ projects that are publicly posted on GitHub. They have a main language of C++, are at least a year old, have had more than 100 commits in their lifetime, and have been committed to within the previous year. The projects were selected using a mirror of GHTorrent [6] from February 2017 with the code being the current versions from July 2017. There were 3,686 projects that matched these criteria. Removing instances of repeated projects, there was a total of 2,721 projects. The projects of the corpus and metrics are presented in Appendix B.

Various subsets of the corpus are used to address our research questions. These subsets are determined based on what aspects of code are being addressed. For example, projects that use a specific feature may be compared to projects that lack that feature, or features are checked between different aspects of the same project. If a specific subset is not specified, then the entire corpus was studied.

3.6 Exception Metrics

A metric called depth was defined by Robillard et al. to model the number of paths an exception could travel in a system [26]. Their metric inspired our metric, the throw length of an exception, which represents the number of unique functions that the exception will travel through before it is caught. This includes all functions through any path the exception could flow through. If an exception is rethrown, it is considered to be the same exception and further propagation increases the distance it travels, while an exception thrown from a catch statement is a unique exception.

(36)

The metric is influenced by McCabe’s cyclomatic complexity [19]. The intuition behind cyclomatic complexity is that more potential paths through a program likely implies that the program is more complex. The existence of additional paths due to exceptions suggests that the program may be more complex which reflects the previously expressed concern from developers.

Similar to cyclomatic complexity, throw length is about the flow of the program. How-ever, exception flow works in the reverse order of function calls. Additionally, cyclomatic complexity is calculated as the sum of the number of flows through each function, which means each function is considered individually. Throw length is calculated by investigating all the paths exceptions could take through a program. The order of called functions is crucial to this measurement unlike in cyclomatic complexity. While the order of function calls is important to the tracking the flow of exceptions, the metric is concerned with the number of functions visited and not the order of visitation. Using this metric, we can determine how many functions are affected by the introduction of exceptions.

3.7 Analysis Approach

In this section, we describe the approach used to analyze each of our research questions. RQ1 How localized is exception throwing and catching?

To determine whether an exception is intermodule, the throw nodes in our context graph contains the file name of the throw statement and a boolean indicating the throw is not intermodule yet. As the exception propagates, each time the exception visits to a function, the file of the function and throw are compared. If these two files do not match, the throw is labeled as intermodule. Similarly, when an exception visits a catch node, the file of the catch and the throw are used to determine if there is an intermodule catch.

We analyzed these results to determine how many exceptions are intermodule and compare the catch rates between intermodule exceptions and non-intermodule exceptions.

RQ2 How does the use of exceptions impact the control flow of a program?

To evaluate the impact exception have on control flow, we consider the paths that an exception may travel during the execution of a program. We identified the number of functions that throw due to a throw statement contained within in it, as well as to having an uncaught exception from a function it called. We also consider the number of calls to each function that potentially throws an exception. These results are used to determine how many edges would be added to a call graph to make it an exception graph to evaluate how many control flow paths are added due to exceptions.

(37)

During the analysis of programs, statements that dictate control flow and heap alloca-tion are counted to determine the number of occurrences in each project. Whether these nodes occur in exception code is also tracked. This data is used to determine if these state-ments are used differently in projects that contain exceptions and whether exception code uses these statements differently. Whether statements are used at different frequencies between different types of code may give insight into how exceptions are typically used.

RQ4 Do C++ exception specifications affect the outcome of exception handling efforts? We identified all functions that contain an exception specification as part of their signa-ture. The presence of a specification is used to group functions and projects. The overall throw length of exceptions is compared between these groups to determine whether there is a correlation between exception specification and throw length. Whether there is a differ-ence between throw length based on exception specification could reflect when exceptions specifications should be used.

(38)

Chapter 4 Results

In this chapter, we discuss the results of our study.

4.1 Data Set Curation

We initially considered all projects in Github that were written mainly in C++, were at least a year old, had had more than 100 commits in their lifetime, and had been committed to within the previous year; this resulted in a preliminary set of 2721 distinct projects containing over 99 MLOC. We then ran these projects through our extractor to filter out those that did not use any EH features; that is, we discarded projects that did not contain at least one catch, try, or throw node in their AST. This left us with a set of 1182 projects

comprising over 73 MLOC; Table 4.1 shows the total and median size of these systems,

and the total and median number of catch, try, and throw nodes. The 1,539 projects that did not use exceptions contained about 22 MLOC with a median of n lines of code. This suggests that larger projects tend to use exceptions more often.

Total Median Number of projects 1,182 -Lines of code 73,309,259 8,299 catch statements 90,947 13 try statements 67,686 12 throw expressions 63,269 10

(39)

While exception usage seems relatively common across the corpus, with an average of 53.5 throws per project with exceptions, the median number of throws is ten. This suggests that the majority of projects that use exceptions use them seldomly. In fact, there are n projects that have no throw expressions but have catch statements, and m projects that have no catch statements but have throw statements. A project that only catches exceptions implies that either the developer is attempting to ensure that no exceptions could be thrown without knowing if exceptions are possible, or the project uses third-party libraries that throw exceptions. Only having throw expressions suggests that either the program is meant to terminate if an exception is thrown, or the project could be a library.

4.2 Exception Localization

RQ1 How localized is exception throwing and catching?

Developers are encouraged to separate code into multiple files. A single file should contain only code which is related to a specific task. For brevity, we call all the code contained in a single file a module.

Exceptions are an encouraged way to express that an error has occurred between mod-ule. When used in this way, exceptions force the developer to be aware of and respond to improper use of code without having to check for return codes signifying an error after each call. An intermodule throw is a throw statement that unwinds the stack to or past a function that is part of a different module. If an exception is caught in at least one module it did not originate from, the catch is an intermodule catch. An exception is classified as being able to be caught if there is at least one catch statement that the exception could travel to that can catch the exception.

We found that intermodular exceptions occurred infrequently within the corpus. Of the 78,403 possible exceptions, only 9,241 (11.8%) are intermodular. This means that the vast majority of exceptions are thrown and caught within the same module. Furthermore, of the intermodule exceptions, 2,356 (25.5%) can potentially be an intermodule catch, while only 6,203 (9.9%) of exceptions that are not intermodular can be caught. Using a chi-squared test, the thrown exceptions were split by whether they were intermodule and whether they were caught. With a p-value of < 0.0001, we conclude that intermodularity and being caught are not independent variables with exceptions being caught more often if they are thrown intermodularly.

We conclude that intermodule exceptions are relatively uncommon, but still present. Considering intermodule error handling is seen as a benefit of exception handling, it is

(40)

strange that it is uncommon for exceptions to travel between modules. This may suggest that exceptions are a relatively common way to handle errors within a module while other means are used to express errors outside of a module. Developers writing a module would have the freedom to use exceptions within their module while not forcing a developer using the module to interact with exceptions.

While intermodule exceptions are uncommon, they are about 2.5 times as likely to be caught than exceptions within a module. This suggests that developers do respond to exceptions from other modules more commonly than within a module. However, the fact that exceptions are seldomly caught within a module is a contradiction to the idea that exceptions may be used within module for error handling.

The result that it is uncommon for exceptions to travel between modules and that exceptions that stay within are module are uncaught is interesting. If these exceptions are left uncaught, they must eventually propagate to the main function and cause the program to crash. This suggests that many throwing functions are in the same file as a main function, or the function is not called from any functions that can be executed. If most throws are in the same file as a main, the program could be likely be better modularized. If most throwing functions are not called, it’s possible that developers avoid these functions because they may throw exceptions.

This evidence suggests that most of the systems in our study do not handle exceptions well. The majority of exceptions remain uncaught regardless of if the exceptions travel intermodule. Further investigation could determine whether these exceptions could be executed or are dead code and give insight into whether developers fail to catch thrown exceptions, or avoid potentially throwing code. Considering thrown exceptions between modules, further analysis could be done to determine whether functions are called that perform the same task as throwing exceptions without the potential of a exception, such as vector’s methods at and operator[].

4.3 Exception Flow

RQ2 How does the use of exceptions impact the control flow of a program?

A major concern about using exceptions is the potential for increasing the complexity of control flow throughout a program. Every exception begins with a throw statement. However, the path that an uncaught exception travels depends on the state of execution when the exception is thrown. The exception could be caught by any function on the current call stack. If a function does not catch an exception thrown by a function that it

(41)

Figure 4.1: Functions which throw exceptions that originate from another function. calls, it implicitly throws the exception as well. This could lead to functions that throw exceptions that do not have throw statements and are not obvious candidates for exception analysis.

While we did not address if exceptions add complexity to a program, we studied how the precense of exceptions could affect control flow. We studied this in two ways: first by, looking at functions that throw exceptions that originated in another function; and second, by examining edges that would be added to a flow graph to express paths that are due to exceptions.

(42)

Direct Indirect Combined

Mean 22.9 237.9 260.8

Median 7.0 10.0 19.0

Standard Deviation 74.3 1491.4 1523.9

95th percentile 72.0 1100.2 1260.4

Table 4.2: Prevalence of functions that throw exceptions.

4.3.1 Throwing Functions

In general, we were curious about how many calls occur to throwing functions. We consider exceptions to be thrown directly if they originate from the function being considered. Functions throw indirectly if a function throws due to an exception travels to the function and is not handled. We addressed the following questions:

1. How many functions throw directly or indirectly? 2. How many throwing functions are called?

3. How many calls exist to throwing functions?

First, we inspected the number of throwing functions in the project. Table4.2 presents the results. While most of the results vary drastically between directly and indirectly throw-ing functions, they have similar medians. Considerthrow-ing the median is 19 when combined, more than half of the projects have fewer than 20 throwing functions present. Furthermore, we compare the count of each within a project in Figure4.1. From a linear regression with a p-value of < 0.0001, there are 8.39 functions that propagate an exception thrown from a function. This suggests that there are approximately 8 functions that indirectly throw an exception for every directly throwing function.

Knowing that there are many occurrences of throwing functions across the corpus, we wanted to know how many functions call throwing functions. Table4.3presents the results. When considering which functions call these functions, we are not concerned if the caller throws as well. Additionally, a function with multiple calls to throwing functions is counted as one function call. We observed that the median number of calls to directly throwing functions is higher than indirectly throwing calls, while the mean is significantly lower. This is interesting since there are generally more indirectly throwing functions as seen in Table 4.2.

Finally, we consider how many calls there are to each throwing function. Unlike the previous question, this count takes both caller and callee into account and represents the

(43)

Calls to Direct Calls to Indirect Throwing Calls All Calls

Mean 37.2 107.0 208.5 2417.0

Median 5.0 2.0 11.0 493.0

Standard Deviation 100.2 360.2 675.3 5370.3

95th _percentile _232.8 _562.4 _1270.6 _1996.0

Table 4.3: Prevalence of calls to throwing functions. The first two columns represent the number of unique functions that call exceptional functions. The last two columns represent the total number of calls.

total number of calls. This considers all directly and indirectly throwing functions. There is an order of magnitude more calls to all functions than to throwing functions. The impact of the throwing function calls is discussed with respect to exception graphs.

4.3.2 Exception Graphs

Given the data involved in determining the information of throwing functions, how a flow graph would be altered due to exceptions can be expressed numerically. In particular, the number of affected nodes and added edges can be determined.

The number of exception edges that do not lead to another node is the number of directly and indirectly throwing functions. This is also the number of nodes that are out-edges for exception out-edges. The calls to directly and indirectly throwing functions is the number of nodes that are in-edges for exception edges. Also, the number of added edges is the the number of calls to any throwing function plus the number of throwing functions. The number of edges in the graph prior to the augmentation is the number of function calls and is used to normalize the edges.

Finally, we investigated how the augmented call graph is changed due to exceptions. The mean growth of the call graph is 22.1% with a median of 5.1%. From a linear regression with a p-value of < 0.0001, there is a slight positive correlation between the number of functions present and the number of exception edges added, with a slope of 0.060. Comparing the number of edges added to the graph to the number of directly throwing functions, a linear regression with a p-value of < 0.0001 shows a positive correlation with a slope of 14.2. Thus, we conclude that the number of edges added to the exception graph is linearly correlated to both the number of functions present and the number of throwing functions present.

We can conclude that it is common for the number of edges in a call graphs to grow linearly when exception edges are included. While the rate at which the graph grows it

(44)

relatively small, this is concerning due to about 8 out of 9 of all throwing functions throw indirectly. This suggests that in the majority of instances, it is not immediately clear from the code that an exception could be thrown.

4.4 Implementation

There are several ways that using exceptions in a project could result in a global change to the structure of code. The most obvious change is the presence of EHM. Aside from these mechanisms, the may choose to implement other aspects of their program differently, such as using instead of checking for invalid input before a function call, they may choose write a exception handler. Also, if exceptions are used, developers are encouraged to follow RAII suggests releasing of resources to occur in destructors, and if not, deletes are likely to occur in catch statements. To answer this question, we studied how the use of statements changes depending on the presence of exceptions.

Before looking at whether exceptions being used affects the implementation of a project, we investigated the prevalence of EH code. This includes both the number of projects that use exceptions and how much of a project is dedicated to EH. EH code was defined to be the code within catch blocks. The median percent of code in a project with exceptions that is dedicated to EH is 0.64% and a median across all projects of 0.02%. We compared this to the previous result from Bonifacio [1] of 0.03% of their corpus being dedicated to exception recovery. The amount of exception handling code is consistent between the two studies.

4.4.1 Statement Usage

Comparing across projects in the corpus, we studied whether there was a difference in control statement usage based on whether any exceptions are present. The measure used is the total number of occurrences of each statement normalized by the number of lines in the project. When comparing exception code within a project to other code in the project, both are normalized by the number of lines of code present for the respective category.

The results in Table 4.4 compare the usage of statements between projects that use exceptions and those that do not. The difference in control statements used is statistically significant using a Wilcoxon rank sum test for all the statements that were studied. This

A Study on the Effects of Exception Usage in Open-Source C++ Systems