
Lisp and Garbage Collection

In document After The Software Wars (Page 135-141)

Please don't assume Lisp is only useful for Animation and Graphics, AI, Bioinformatics, B2B and E-Commerce, Data Mining, EDA/Semiconductor applications, Expert Systems, Finance, Intelligent Agents, Knowledge Management, Mechanical CAD, Modeling and Simulation, Natural Language, Optimization, Research, Risk Analysis, Scheduling, Telecom, and Web Authoring just because these are the only things they happened to list.

—Kent Pitman, hacker

The future has already arrived; it's just not evenly distributed yet.

—William Gibson, science-fiction writer

Memory, handed out in contiguous chunks called buffers or arrays, is the scratch space for your processor. Your computer needs to read bitmaps and e-mails from the disk or network into memory in order to display them or let you edit them. (Your operating system can actually cheat: it pretends to give out lots of memory and writes the infrequently used portions to disk. This is called virtual memory. If you come back to an application after not using it for several hours, you may find your hard drive is busy for several seconds because it is loading all of your data and code back into real memory.)

Memory contains the current state of all the applications on your computer, and is therefore an extremely precious resource. Garbage collection (GC) is a system that transparently manages the computer's memory and automatically reclaims the unused “garbage” once the program is done using it. I will spend the rest of the chapter explaining how this technology changes programming, so sit tight if my explanation doesn't make sense yet.
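To make “reclaims the unused garbage” concrete, here is a toy model, in Common Lisp, of how a tracing collector decides what counts as garbage. It is a simplified sketch, not how any particular Lisp implements GC: the “heap” is assumed to be a hash table mapping hypothetical object names to the objects they point to, and anything that cannot be reached from the roots (the data the program can still get at) is garbage.

```lisp
;;; Toy model of the "mark" phase of a tracing garbage collector.
;;; Assumption: the heap is a hash table mapping each (hypothetical)
;;; object name to the list of objects it points to.

(defun reachable-objects (heap roots)
  "Return a hash table whose keys are all objects reachable from ROOTS."
  (let ((marked (make-hash-table :test #'eq))
        (stack (copy-list roots)))
    (loop while stack
          do (let ((obj (pop stack)))
               (unless (gethash obj marked)
                 ;; Mark the object, then follow every pointer it holds.
                 (setf (gethash obj marked) t)
                 (dolist (child (gethash obj heap))
                   (push child stack)))))
    marked))

(defun garbage-objects (heap roots)
  "Return, sorted by name, the objects in HEAP that no root can reach."
  (let ((marked (reachable-objects heap roots))
        (garbage '()))
    (maphash (lambda (obj refs)
               (declare (ignore refs))
               (unless (gethash obj marked)
                 (push obj garbage)))
             heap)
    (sort garbage #'string< :key #'symbol-name)))

;; Usage: the program still refers to WINDOW, but has dropped OLD-EMAIL.
(let ((heap (make-hash-table :test #'eq)))
  (setf (gethash 'window heap) '(buffer)
        (gethash 'buffer heap) '()
        (gethash 'old-email heap) '(attachment)
        (gethash 'attachment heap) '())
  (garbage-objects heap '(window)))   ; => (ATTACHMENT OLD-EMAIL)
```

Because reachability, rather than a count of references, decides what is garbage, a group of objects that point only to each other is reclaimed as well. The programmer never writes a single “free” call.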

John McCarthy invented GC in 1959 while creating Lisp, a language that predates C by more than a decade, but which never became accepted into the mainstream:

(defun finder (obj vec start end) (let ((range (- end start)))

This binary search function, written in Lisp, implements a simple algorithm for quickly finding a value in a sorted array. It runs in O(log₂ n) time because each step halves the portion of the array left to search, similar to how we look up words in a dictionary: an array of a million entries needs only about 20 comparisons. There are many faster and more complicated algorithms, and search is a very interesting and large topic in computer science, but 99% of the time, ye olde binary search is good enough.
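For readers who want something runnable, here is an independent, iterative binary search sketched in Common Lisp. It is not a reconstruction of the `finder` function above; it assumes a vector of numbers sorted in ascending order, and the name `binary-search` is hypothetical. Each pass through the loop discards half of the remaining range, which is where the log₂ n running time comes from.

```lisp
;;; Iterative binary search over a vector of numbers sorted in ascending order.
(defun binary-search (obj vec)
  "Return OBJ if it occurs in the sorted vector VEC, otherwise NIL."
  (let ((low 0)
        (high (1- (length vec))))
    ;; Invariant: if OBJ is in VEC, it lies between LOW and HIGH inclusive.
    (loop while (<= low high)
          do (let* ((mid (floor (+ low high) 2))
                    (probe (aref vec mid)))
               (cond ((< obj probe) (setf high (1- mid)))  ; discard upper half
                     ((> obj probe) (setf low (1+ mid)))   ; discard lower half
                     (t (return-from binary-search obj)))))
    nil))

(binary-search 7 #(1 3 5 7 9 11))   ; => 7
(binary-search 4 #(1 3 5 7 9 11))   ; => NIL
```

Returning the matching index instead of the element would be an equally reasonable design; the halving logic stays the same.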

C was based on BCPL and other languages before it, and C++ was in turn based on C; none of these had garbage collection. Lisp is a language built by mathematicians rather than operating systems hackers. Lisp pioneered GC, but it was also clean and powerful, and had a number of innovations that even C# and Java don't have today.3

Wikipedia's web page doesn't explain why Lisp never became accepted for mainstream applications, but perhaps the biggest reason is performance.4 So instead, people turned to other languages that were more primitive, but compiled. The most expensive mistake in the history of computing is that the industry adopted the non-GC language C, rather than Lisp.

3 Perhaps the next most important innovation of Lisp over C is functional programming. Functional programming is a focus on writing code which has no side effects: the behavior of a function depends only on its parameters, and the only result of a function is its return value. Nowadays, in object-oriented programming, people tend to create classes with lots of mutable state, so that the behavior of a function depends on so many things that it is very hard to write correct code, prove it is correct, support multiple processors manipulating that object at the same time, etc. Functional programming is a philosophy, and Lisp made it natural.
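A small illustration of the difference, sketched in Common Lisp with hypothetical names: `record-price` reads and modifies a global variable, so calling it twice with the same argument gives different answers, while `cart-total` depends only on its argument and changes nothing.

```lisp
;;; Hypothetical example: the same job done with and without side effects.

(defvar *cart-total* 0
  "Global mutable state, read and changed by RECORD-PRICE.")

(defun record-price (price)
  "Impure: the result depends on, and modifies, *CART-TOTAL*."
  (incf *cart-total* price))

(defun cart-total (prices)
  "Pure: the result depends only on PRICES, and nothing else is modified."
  (reduce #'+ prices :initial-value 0))

(record-price 10)      ; => 10
(record-price 10)      ; => 20, same input, different answer
(cart-total '(10 10))  ; => 20, every time
```

The pure version is the one that is easy to test and safe to call from several threads at once, because there is no shared state to race on.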

4 Most Lisp implementations ran 10x slower than C because they were interpreted rather than compiled to machine code. It is possible to compile Lisp, but unfortunately, almost no one bothered. If someone complained about Lisp performance, the standard answer was that they were considering putting the interpreter into hardware, i.e. a Lisp computer. Companies such as Symbolics did build Lisp machines, but they never caught on, partly because dedicated hardware sucked Lisp into the expensive silicon race.
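To show that compiling Lisp is an ordinary operation today, here is a short sketch using the standard `compile` and `compiled-function-p` functions from ANSI Common Lisp. The function `sum-of-squares` is a hypothetical example; implementations such as SBCL already compile every `defun` to native code by default, so the explicit `compile` call is there mainly for illustration.

```lisp
;;; Common Lisp can be compiled: COMPILE is part of the ANSI standard.
(defun sum-of-squares (n)
  "Return 1^2 + 2^2 + ... + N^2."
  (loop for i from 1 to n
        sum (* i i)))

;; Ask the implementation to compile the function. In implementations that
;; compile every DEFUN by default (SBCL, for example) this changes nothing.
(compile 'sum-of-squares)

(compiled-function-p #'sum-of-squares)  ; => T
(sum-of-squares 3)                      ; => 14
```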

While Lisp had many innovations, its most important was garbage collection. Garbage collection requires significant infrastructure on the part of the system and is a threshold test for an intelligent programming language.5

Because so few of the most important codebases today have adopted GC, I must explain how it improves software so my geek brethren start using it.

The six factors of software quality are: reliability, portability, efficiency, maintainability, functionality, and usability; I will discuss how GC affects all of these factors. The most important factor is reliability, the sine qua non of software.

5 Some argue that static type checking (declaring all types in the code so the compiler can flag mismatch errors) is an alternative way of making software more reliable, but while it can catch certain classes of bugs, it doesn't prevent memory leaks or buffer overruns.

Likewise, there are “smart pointers” which can emulate some of the features of garbage collection, but they are not a standard part of most languages and don't provide many of the benefits.

Apple has added garbage collection to Objective-C 2.0, but because it is optional, it doesn't provide many of the benefits of a fully garbage-collected language, like enabling reflection or preventing buffer overruns.
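As a small illustration of what “preventing buffer overruns” looks like in practice, here is a hypothetical helper, `checked-ref`, sketched in Common Lisp. One hedge: the ANSI standard leaves the consequences of an out-of-range `aref` undefined, but implementations such as SBCL check array bounds in safe code (the `(safety 3)` declaration below requests this), so a bad index signals an error instead of silently reading or overwriting neighbouring memory.

```lisp
;;; Hypothetical helper showing bounds checking in safe Common Lisp code.
(defun checked-ref (vec index)
  "Return element INDEX of VEC, or :OUT-OF-BOUNDS if INDEX is not valid."
  (declare (optimize (safety 3)))   ; ask for full run-time checking
  (handler-case (aref vec index)
    ;; The out-of-range access is caught here, not left to corrupt memory.
    (error () :out-of-bounds)))

(checked-ref (vector 1 2 3) 1)    ; => 2
(checked-ref (vector 1 2 3) 10)   ; => :OUT-OF-BOUNDS
```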

Reliability

Therefore everyone who hears these words of mine and puts them into practice is like a wise man who built his house on the rock. The rain came down, the streams rose, and the winds blew and beat against that house; yet it did not fall, because it had its foundation on the rock. But everyone who hears these words of mine and does not put them into practice is like a foolish man who built his house on sand. The rain came down, the streams rose, and the winds blew and beat against that house, and it fell with a great crash.

—Matthew 7:24-27

If the software of Star Wars was no more reliable than today's...

Reliability, according to Wikipedia, is “the ability of a device or system to perform a required function under stated conditions for a specified period of time.” In software, the specified period of time is forever, because unlike metal and other things containing atoms, it doesn't wear out.

For example, reliability means that the computer does the same things for the same inputs every time. If you create a function to add two numbers, it should always work for all numbers. If your computer gets hacked and doesn't respond to your input, it isn't reliable.
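“It should always work for all numbers” is a stronger promise than many languages make: in C, overflowing a signed `int` is undefined behavior. Common Lisp integers instead grow into arbitrary-precision bignums. A minimal sketch, with a hypothetical `add` function:

```lisp
;;; Hypothetical ADD: Common Lisp integers grow into bignums instead of wrapping.
(defun add (a b)
  "Return the exact sum of integers A and B, however large."
  (+ a b))

(add most-positive-fixnum 1)       ; a bignum one larger than the largest fixnum
(add (expt 2 64) (expt 2 64))      ; => 36893488147419103232, i.e. 2^65
```

The trade-off is that bignum arithmetic is slower than fixed-width machine arithmetic, which is why Lisp compilers accept type declarations when a programmer knows the values stay small.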

Even performance is related to reliability: if your computer is wasting CPU cycles and is no longer responsive, it isn't reliable.

Today, few would say that computers are completely reliable.

Maybe your cable box crashes: “Comcast: On Demand, Sometimes.”

Maybe your laptop doesn't recognize the wireless network in one particular Internet cafe. I have trained my family to know that whenever their computer is misbehaving, they should reboot the system before calling me. This fixes any errors where the memory is in an unspecified state, which is the most common computer problem.

Reliability is important because software is built up as a series of layers. At one level is the code displaying a file open dialog box, and far below is a device driver reading the file you selected as a series of scattered data blocks on the hard drive. Every layer of software needs to be reliable because it is either being used by other software or by a human. Both humans and software need a reliable foundation.

A tiny bug in its software caused the crash of one of the European Space Agency's Ariane 5 rockets, costing $370 million.6

6 The rocket's software was written in Ada, an old language, but with many of the features of garbage collection. Code which converted a 64-bit floating-point number to a 16-bit signed integer received a value too big to fit into 16 bits, and so the conversion code threw an exception. The code to handle this exception was disabled, and therefore the computer crashed. When this computer crashed, it started sending confusing diagnostic information to the flight control computer, causing it to fly in a crazy way and break apart, triggering the internal self-destruct mechanisms.

Many blame management, but this was a simple design bug (you should be very careful when throwing away data). This was compounded because they were using a specialized embedded system with a non-mainstream programming language which allowed them to disable certain exceptions. This bug could have been caught in testing, but they didn't use accurate trajectory information in the simulations. Perhaps clumsy tools made it hard to modify test cases, and so they never got updated.
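The footnote's lesson is that the overflow was detected but nothing handled it. Here is a hypothetical sketch in Common Lisp of a narrowing conversion that checks its range and a caller that handles the resulting error. The names `to-int16`, `to-int16-saturating`, and `int16-overflow` are illustrative only, and clamping to the nearest limit is just one possible recovery; whether it is the right one depends on the system.

```lisp
;;; Hypothetical sketch: narrowing a double-precision value to a 16-bit
;;; signed integer with an explicit range check.

(define-condition int16-overflow (error)
  ((value :initarg :value :reader overflow-value))
  (:report (lambda (c stream)
             (format stream "~A does not fit in a 16-bit signed integer"
                     (overflow-value c)))))

(defun to-int16 (x)
  "Round X to an integer; signal INT16-OVERFLOW if outside -32768..32767."
  (let ((n (round x)))                  ; ROUND's first value is the integer
    (if (<= -32768 n 32767)
        n
        (error 'int16-overflow :value x))))

(defun to-int16-saturating (x)
  "Like TO-INT16, but handle an overflow by clamping instead of crashing."
  (handler-case (to-int16 x)
    (int16-overflow ()
      (if (plusp x) 32767 -32768))))

(to-int16 1234.4d0)              ; => 1234
(to-int16-saturating 65000.0d0)  ; => 32767 instead of an unhandled error
```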

When your computer crashes, you can reboot it; when your rocket crashes, there is nothing to reboot. The Mars rover Spirit had a file system bug (in software shared with its twin, Opportunity) which made it unresponsive, nearly ending its mission less than three weeks after it landed!7

While it isn't usually the case that a software bug will cause a rocket to crash, it is typically the case that all of the software layers depending on that buggy code will also fail. Software reliability is even trickier than that because an error in one place can lead to failures far away; this is known in engineering as “cascading failures.” If an application gets confused and writes invalid data to the disk, other code which reads that info on startup will crash because it wasn't expecting invalid data. Now, your application is crashing on startup. In software, perhaps more than in any other type of intellectual property, a bug anywhere can cause problems everywhere, which is why reliability is the ultimate challenge for the software engineer.
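One common defense against the startup cascade described above is to treat data read back from disk as untrusted. A hypothetical sketch in Common Lisp: `parse-window-width` and `*default-window-width*` are illustrative names, and the 100 to 10000 pixel range is an assumed sanity limit. The standard `parse-integer` signals a `parse-error` on malformed text, which is caught and replaced with a safe default.

```lisp
;;; Hypothetical sketch: validate a saved setting instead of trusting it, so
;;; one bad write does not turn into a crash on every later startup.

(defparameter *default-window-width* 800
  "Fallback used when the saved setting is missing or corrupt.")

(defun parse-window-width (text)
  "Parse TEXT as a window width in pixels, or return the default if invalid."
  (let ((n (and (stringp text)
                (handler-case (parse-integer text)   ; strict: no junk allowed
                  (parse-error () nil)))))
    (if (and n (<= 100 n 10000))                     ; assumed sane range
        n
        *default-window-width*)))

(parse-window-width "1024")   ; => 1024
(parse-window-width "#$%!")   ; => 800, the corrupt value is ignored
```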

Perfect reliability is extremely hard to achieve because software has to deal with the complexities of the real world. Ultimately, a key to reliable software is not to let complexity get out of hand. Languages cannot remove the complexity of the world we choose to model inside a computer. However, they can remove many classes of reliability issues. I'm going to talk about two of the most common and expensive reliability challenges of computers: memory leaks and buffer overruns, and how garbage collection prevents these from happening.
