
Memory Leaks

In document After The Software Wars (Page 141-144)

Web banner ad for a tool to find memory leaks. There is a cottage industry of tools to fix problems which exists only because the programming language is broken in the first place.

One of the most common forms of memory corruption is the memory leak, a common source of computer unreliability and user frustration that worsens as our code becomes more complicated.8

7 The rover file system used RAM for every file. The rovers created a lot of system logs on their trip from Earth to Mars, and so ran out of memory just as they arrived!

8 The problem is even worse because every big C/C++ application has its own memory allocators. They grab the memory from the OS in large chunks and manage it themselves. Now if you are done with memory, you need to return it to whoever gave it to you.

Losing the address of your memory is like the sign outside a Chinese dry-cleaner: “No tickie, no laundry.” To prevent leaks, memory should be kept track of carefully. Unfortunately, C and C++ do not provide this basic feature, as you can allocate and lose track of memory in two lines of code:

char* p = new char[100]; // p points to 100 bytes of memory
p = NULL;                // p now points to NULL; the reference to the 100 bytes is lost

“New” returns the location of the newly allocated memory, stored into variable p. If you overwrite that variable, the address of your memory is lost, and you can't free it.

Writing code in C or C++, in which all unused memory is returned to the operating system, is both complicated and tiresome:

Person* p = new Person("Linus", "Torvalds", 1969);
if (p == NULL) // Out of memory or other error
    return;

Person* p2 = new Person("Richard", "Stallman", 1953);
if (p2 == NULL) // Out of memory or other error
{
    delete (p); // Clean up p because p2 failed
    return;
}

MarriageLicense* pml = new MarriageLicense(p, p2);
if (pml == NULL) // Out of memory or other error
{
    delete (p);  // Clean up p and p2 because pml failed
    delete (p2);
    return;
}

Code in C or C++ to manually handle out-of-memory conditions and other errors is itself bug-prone, adds complexity, and slows performance.

As programmers write increasingly complicated code, the work required to clean it up when things go wrong becomes more difficult. This small example has only three failure cases and the code to remedy these error conditions makes the code twice as complex as it would otherwise be. Furthermore, code which only executes in unlikely failure scenarios doesn't get executed very frequently, and is therefore likely to have bugs in it. A rule in software is: “If the code isn't executed, it probably doesn't work.” And if your error handling code doesn't work, your problems will accumulate faster.

Scale this dilemma into millions of lines of interdependent code written by different people and the complexities compound beyond our ability to fix them. To date, there is no non-trivial codebase written in C or C++ which is able to solve all of these error conditions, and every codebase I saw at Microsoft had bugs which occurred when the computer ran out of memory.9

MySQL, an otherwise highly reliable database which powers the popular websites of Yahoo! and the Associated Press, still has several memory leaks (and buffer overruns). Firefox's bug list contains several hundred, though most are obscure now.10

Let's take a look at why a memory leak can't happen when running in a GC language:

byte[] p = new byte[100]; // Variable "p" points to 100 bytes.
p = null;                 // p now points to null.
                          // The system can deduce that no variables
                          // are referencing the memory, and therefore
                          // free it.

You don't have to call “delete” because the system can infer what memory is in use.

After the second line of code executes, “p” no longer references the 100 bytes. However, a GC system is smart, and it can take inventory of what memory is in use, and therefore discover that because these 100 bytes are not being referenced, they can be freed. Likewise, if the Chinese laundry knew you had lost your ticket, they would know to throw away your clothes.
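The "inventory" described above can be sketched as a toy mark-and-sweep pass. This is only an illustration under stated assumptions, not how any particular runtime is implemented: `ToyHeap`, `ToyObject`, `roots`, and `collect` are hypothetical names, objects are identified by an index rather than a real address, and "freeing" just flips a flag.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical toy object: it only records which other objects it references.
struct ToyObject {
    std::vector<std::size_t> references;  // indices of objects this one points to
    bool live = true;                      // becomes false once the collector frees it
};

// Hypothetical toy heap. "roots" stands in for program variables such as "p".
struct ToyHeap {
    std::vector<ToyObject> objects;
    std::unordered_set<std::size_t> roots;

    // Allocate a new object and return its index (its "address" in this toy).
    std::size_t allocate() {
        objects.emplace_back();
        return objects.size() - 1;
    }

    // Mark phase: visit everything reachable from the roots.
    // Sweep phase: free every live object that was not visited.
    // Returns the number of objects freed by this pass.
    std::size_t collect() {
        std::vector<bool> marked(objects.size(), false);
        std::vector<std::size_t> pending(roots.begin(), roots.end());
        while (!pending.empty()) {
            std::size_t id = pending.back();
            pending.pop_back();
            if (marked[id] || !objects[id].live) continue;
            marked[id] = true;
            for (std::size_t ref : objects[id].references) pending.push_back(ref);
        }
        std::size_t freed = 0;
        for (std::size_t id = 0; id < objects.size(); ++id) {
            if (objects[id].live && !marked[id]) {
                objects[id].live = false;       // nothing references it: reclaim it
                objects[id].references.clear();
                ++freed;
            }
        }
        return freed;
    }
};
```

While an object's index is in `roots`, `collect()` leaves it alone. Erasing the index from `roots` plays the role of `p = NULL`, and the next `collect()` reclaims the object without the program ever calling “delete”.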

It is this simple innovation that changes programming. Memory is such a critical part of computers that we need to have the system, not the programmer, keep track of it.11

9 Often near the end of a development cycle, after fixing our feature bugs, we would focus on some of the out-of-memory bugs. While we never fixed them all, we'd make it better and feel good about it.

It is true that when you run out of memory, it is hard to do anything for the user, but not causing a crash or a further memory leak is the goal.

10 Here is a link to all active MySQL bugs containing “leak”: http://tinyurl.com/2v95vu. Here is a link to all active Firefox bugs containing “memory leak”: http://tinyurl.com/2tt5fw.

11 GC makes it easy for programmers to freely pass around objects that more than one piece of code is using at the same time, and the memory will be cleaned up only when every piece of code is finished with it. C and C++ do not enable this and many other common scenarios.

To write code which allows two pieces of software to share memory and to return it to the operating system only when both are finished is complicated and onerous. The simplest way to implement this feature in C/C++ is to do reference counting: have a count of the number of users of a piece of memory. COM, and the Linux and Windows kernels have reference counting. When the last user is finished, the count turns to zero and that last user is responsible for returning the memory to the OS. Unfortunately, this feature requires complicated nonstandard code (to handle multiple processors) and places additional burdens on the programmer because he now needs to keep track of what things are referencing each

It is also quite interesting that GC enables a bunch of infrastructure that Microsoft's OLE/COM component system tried to enable, but COM did it in a very complicated way because it built on top of C and C++, rather than adding the features directly into the language:

COM Feature Name      .Net & Java Feature Name
Reference Counting    Garbage Collection
BSTR                  Unicode strings
Type Libraries        Metadata + bytecodes / IL
IUnknown              Everything is an Object
IDispatch             Reflection
DCOM                  Remoting and Web Services

COM contains a lot of the same infrastructure that GC systems have, which suggests a deep similarity of some kind. Doing these features outside the language, however, made writing COM more tedious, difficult, and error-prone. “.Net” completely supersedes COM, and in much simpler packaging, so you will not hear Microsoft talk about COM again, but it will live on for many years in nearly every Microsoft codebase, and many external ones.
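The reference counting that COM relies on, and that footnote 11 describes, can be sketched in C++ roughly as follows. This is a minimal illustration, not COM's actual interface: `SharedBuffer`, `add_ref`, `release`, and `use_count` are hypothetical names, and the sketch assumes C++11's `std::atomic` to keep the count correct across multiple processors.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical shared buffer whose lifetime is managed by a manual reference count.
class SharedBuffer {
public:
    // Create the buffer with a count of one: the caller is its first user.
    static SharedBuffer* create(std::size_t size) { return new SharedBuffer(size); }

    // Every additional user must announce itself before keeping the pointer.
    void add_ref() { count_.fetch_add(1, std::memory_order_relaxed); }

    // Every user must announce when it is finished. The last user to do so
    // frees the memory. Returns true if this call freed the buffer.
    bool release() {
        int previous = count_.fetch_sub(1, std::memory_order_acq_rel);
        assert(previous > 0);  // releasing more often than acquiring is a bug
        if (previous == 1) {
            delete this;
            return true;
        }
        return false;
    }

    int use_count() const { return count_.load(std::memory_order_acquire); }
    char* data() { return data_; }

private:
    explicit SharedBuffer(std::size_t size) : data_(new char[size]), count_(1) {}
    ~SharedBuffer() { delete[] data_; }  // private: only release() may destroy
    SharedBuffer(const SharedBuffer&) = delete;
    SharedBuffer& operator=(const SharedBuffer&) = delete;

    char* data_;
    std::atomic<int> count_;
};
```

The burden the text describes is visible even in this small sketch: nothing forces a caller to pair each `add_ref()` with exactly one `release()`. A forgotten `release()` leaks the buffer, and an extra one frees it while another user still holds the pointer.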
