3.4 - Representing Objects - Thorsten Ball-Writing an interpreter in Go (2017).pdf

} }

As you can see, eval is recursive. When the condition astNode is infixExpression is true, eval calls itself twice to evaluate the left and the right operands of the infix expression. This in turn may lead to the evaluation of another infix expression, an integer literal, a boolean literal or an identifier… We’ve already seen recursion at work when building and testing the AST. The same concepts apply here, except that we’re evaluating the tree and not building it.

Looking at this snippet of pseudocode, you can probably imagine how easy it is to extend this function. That works to our advantage. We’re going to build up our own Eval function piece by piece, adding new branches and capabilities as we go along and extend our interpreter.

But the most interesting lines of this snippet are the return statements. What do they return?

Here are two lines that bind the return value of a call to eval to names:

leftEvaluated = eval(astNode.Left)
rightEvaluated = eval(astNode.Right)

What does eval return here? Of which type are the return values? The answer to these questions is the same as the one for “what kind of internal object system will our interpreter have?”
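To make the recursion concrete, here is a minimal Go sketch of a tree-walking eval. The node types and the eval function are hypothetical stand-ins, not the book's ast package or its Eval, and eval returns a plain int64 only as a placeholder: which type it should really return is exactly the question raised above.

```go
package main

import "fmt"

// Node is a hypothetical, stripped-down AST node interface.
type Node interface{}

// IntegerLiteral is a leaf node holding a literal value.
type IntegerLiteral struct {
	Value int64
}

// InfixExpression holds an operator and its two operand nodes.
type InfixExpression struct {
	Left     Node
	Operator string
	Right    Node
}

// eval walks the tree recursively. For an infix expression it calls
// itself twice, once per operand, before combining the results.
// The int64 return type is a placeholder, not a real value system.
func eval(node Node) int64 {
	switch n := node.(type) {
	case *IntegerLiteral:
		return n.Value
	case *InfixExpression:
		leftEvaluated := eval(n.Left)
		rightEvaluated := eval(n.Right)
		switch n.Operator {
		case "+":
			return leftEvaluated + rightEvaluated
		case "*":
			return leftEvaluated * rightEvaluated
		}
	}
	panic(fmt.Sprintf("unsupported node: %T", node))
}

func main() {
	// (1 + 2) * 3
	tree := &InfixExpression{
		Left: &InfixExpression{
			Left:     &IntegerLiteral{Value: 1},
			Operator: "+",
			Right:    &IntegerLiteral{Value: 2},
		},
		Operator: "*",
		Right:    &IntegerLiteral{Value: 3},
	}
	fmt.Println(eval(tree)) // prints 9
}
```

Adding support for another node type means adding another case to the type switch, which is the kind of piece-by-piece extension described above.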

3.4 - Representing Objects

Wait, what? You never said Monkey was object-oriented! Yes, I never did, and it’s not. Why do we need “an object system” then? Call it a “value system” or “object representation” then.

The point is, we need to define what our “eval” function returns. We need a system that can represent, in memory, the values our AST represents or the values we generate when evaluating the AST.

Let’s say we’re evaluating the following Monkey code:

let a = 5;

// [...]

a + a;

As you can see, we’re binding the integer literal 5 to the name a. Then things happen. It doesn’t matter what. What matters is that when we come across the a + a expression later, we need to access the value a is bound to. In order to evaluate a + a we need to get to the 5. In the AST it’s represented as an *ast.IntegerLiteral, but how are we going to keep track of and represent the 5 while we’re evaluating the rest of the AST?

There are a lot of different choices when building an internal representation of values in an interpreted language. And there is a lot of wisdom about this topic spread throughout the codebases of the world’s interpreters and compilers. Each interpreter has its own way to represent values, always slightly differing from the solution that came before, adjusted for the requirements of the interpreted language.

Some interpreters use native types of the host language (integers, booleans, etc.) to represent values in the interpreted language, without wrapping them in anything. In others, values/objects are represented only as pointers, and some mix native types and pointers.
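As a rough illustration of the first approach, a Go interpreter could let its evaluator return native Go values (int64, bool) held in Go's empty interface, with no wrapper type of its own. This is a hypothetical sketch, not the approach Monkey takes; the Value alias and the add helper are made up for the example.

```go
package main

import "fmt"

// Value is just Go's empty interface: evaluated values are plain
// int64 or bool, not wrapped in any interpreter-defined struct.
type Value = any

// add implements "+" on two evaluated values by checking their
// dynamic Go types at runtime.
func add(left, right Value) (Value, error) {
	l, lok := left.(int64)
	r, rok := right.(int64)
	if !lok || !rok {
		return nil, fmt.Errorf("type mismatch: %T + %T", left, right)
	}
	return l + r, nil
}

func main() {
	var a Value = int64(5)
	sum, err := add(a, a)
	fmt.Println(sum, err) // prints: 10 <nil>

	_, err = add(a, true)
	fmt.Println(err) // prints: type mismatch: int64 + bool
}
```

Strictly speaking, Go may still allocate when it stores an int64 in an interface, so “not wrapped” here only means that the interpreter defines no wrapper type of its own.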

Why the variety? For one, the host languages differ. How you represent a string of your interpreted language depends on how a string can be represented in the language the interpreter is implemented in. An interpreter written in Ruby can’t represent values the same way an interpreter written in C can.

And not only do the host languages differ, but the languages being interpreted do too. Some interpreted languages may only need representations of primitive data types, like integers, characters or bytes. But in others you’ll have lists, dictionaries, functions or compound data types.

These differences lead to very different requirements with regard to value representation.

Besides the host language and the interpreted language, the biggest influence on the design and implementation of value representations are the resulting execution speed and the memory consumption while evaluating programs. If you want to build a fast interpreter you can’t get away with a slow and bloated object system. And if you’re going to write your own garbage collector, you need to think about how it’ll keep track of the values in the system. But, on the other hand, if you don’t care about performance, then it does make sense to keep things simple and easy to understand until further requirements arise.

The point is this: there are a lot of different ways to represent values of the interpreted language in the host language. The best (and maybe the only) way to learn about these different representations is to actually read through the source code of some popular interpreters. I heartily recommend the Wren source code, which includes two types of value representation, enabled/disabled by using a compiler flag.
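One of the two representations in Wren relies on a technique known as NaN tagging (or NaN boxing): every value fits in a single 64-bit word, ordinary doubles are stored as themselves, and other values hide inside bit patterns of quiet NaNs that arithmetic never produces in practice. The Go sketch below shows the core trick for numbers, null and booleans only. The constant values and names are our own choices, loosely modeled on the idea rather than copied from Wren, and heap-allocated objects (which such a scheme would store as pointers in the remaining NaN bits) are left out.

```go
package main

import (
	"fmt"
	"math"
)

// Value is a NaN-tagged 64-bit word.
type Value uint64

// qnan has all exponent bits set plus the two highest mantissa bits,
// marking the quiet-NaN space reserved for non-number values.
const qnan uint64 = 0x7ffc000000000000

// Small tags stored in the low bits of a reserved quiet NaN.
const (
	tagNull  = 1
	tagFalse = 2
	tagTrue  = 3
)

var (
	nullVal  = Value(qnan | tagNull)
	falseVal = Value(qnan | tagFalse)
	trueVal  = Value(qnan | tagTrue)
)

// numberVal stores a float64 as its raw IEEE 754 bits.
func numberVal(f float64) Value { return Value(math.Float64bits(f)) }

// isNumber reports whether v lies outside the reserved quiet-NaN space.
func isNumber(v Value) bool { return uint64(v)&qnan != qnan }

// asNumber reinterprets the bits of v as a float64.
func asNumber(v Value) float64 { return math.Float64frombits(uint64(v)) }

// boolVal maps a Go bool onto one of the two tagged singletons.
func boolVal(b bool) Value {
	if b {
		return trueVal
	}
	return falseVal
}

func main() {
	n := numberVal(3.5)
	fmt.Println(isNumber(n), asNumber(n)) // prints: true 3.5
	fmt.Println(isNumber(nullVal))        // prints: false
	fmt.Println(boolVal(true) == trueVal) // prints: true
}
```

The appeal is that values need no allocation and no separate type field; the cost is bit-twiddling on every access and a hard dependency on the host's float layout.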

Besides the representation of values inside the host language there is also the matter of how to expose these values and their representation to the user of the interpreted language. What does the “public API” of these values look like?

Java, for example, offers both “primitive data types” (int, byte, short, long, float, double, boolean, char) and reference types to the user. The primitive data types do not have a huge representation inside the Java implementation, they closely map to their native counterparts.

Reference types on the other hand are references to compound data structures defined in the host language.

In Ruby the user doesn’t have access to “primitive data types”, nothing like a native value type exists because everything is an object and thus wrapped inside an internal representation.

Internally Ruby doesn’t distinguish between a byte and an instance of the class Pizza: both are the same value type, wrapping different values.

There are myriad ways to expose data to users of programming languages. Which one to choose depends on the language design and also, again, on performance requirements. If you don’t care about performance, anything goes. But if you do, you need to make some smart decisions to achieve your goals.

Foundation of our Object System

Carefree as we still are about the performance of our Monkey interpreter, we choose the easy way: we’re going to represent every value we encounter when evaluating Monkey source code as an Object, an interface of our design. Every value will be wrapped inside a struct, which fulfills this Object interface.

In a new object package we define the Object interface and the ObjectType type:

// object/object.go
package object

type ObjectType string

type Object interface {
	Type() ObjectType
	Inspect() string
}

That’s pretty simple and looks a lot like what we did in the token package with the Token and TokenType types. Except that instead of being a struct like Token, the Object type is an interface.

The reason is that every value needs a different internal representation, and it’s easier to define two different struct types than to try to fit booleans and integers into the same struct field.
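For comparison, here is roughly what the alternative might look like: a single hypothetical Value struct that has to make room for every data type, plus a kind tag saying which field is meaningful. The names are illustrative only; nothing like this exists in Monkey.

```go
package main

import "fmt"

// Kind tags which payload field of Value is meaningful.
type Kind int

const (
	KindNull Kind = iota
	KindInteger
	KindBoolean
)

// Value crams every data type into one struct. Only the field
// matching Kind is meaningful; the others sit unused.
type Value struct {
	Kind    Kind
	Integer int64
	Boolean bool
}

// Inspect has to switch on the tag by hand, instead of each data
// type providing its own method as with the Object interface.
func (v Value) Inspect() string {
	switch v.Kind {
	case KindInteger:
		return fmt.Sprintf("%d", v.Integer)
	case KindBoolean:
		return fmt.Sprintf("%t", v.Boolean)
	default:
		return "null"
	}
}

func main() {
	values := []Value{
		{Kind: KindInteger, Integer: 5},
		{Kind: KindBoolean, Boolean: true},
		{Kind: KindNull},
	}
	for _, v := range values {
		fmt.Println(v.Inspect())
	}
	// prints 5, true and null on separate lines
}
```

Every new data type would add another field and another case to every such switch, which is the bookkeeping the interface-based design avoids.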

At the moment we only have three data types in our Monkey interpreter: null, booleans and integers. Let’s start with implementing the integer representation and build up our object system.

Integers

The object.Integer type is as small as you’d expect it to be:

// object/object.go

import "fmt"

type Integer struct {
	Value int64
}

func (i *Integer) Inspect() string { return fmt.Sprintf("%d", i.Value) }

Whenever we encounter an integer literal in the source code we first turn it into an ast.IntegerLiteral and then, when evaluating that AST node, we turn it into an object.Integer, saving the value inside our struct and passing around a reference to this struct.
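A minimal sketch of that conversion step follows. IntegerLiteral and evalIntegerLiteral are hypothetical simplifications (the real ast.IntegerLiteral also carries its token), and only the Integer type from this section is included so the example compiles on its own.

```go
package main

import "fmt"

// IntegerLiteral is a hypothetical stand-in for ast.IntegerLiteral.
type IntegerLiteral struct {
	Value int64
}

// Integer mirrors object.Integer from this section.
type Integer struct {
	Value int64
}

func (i *Integer) Inspect() string { return fmt.Sprintf("%d", i.Value) }

// evalIntegerLiteral copies the literal's value into a new Integer
// and returns a pointer, which the evaluator can then pass around.
func evalIntegerLiteral(node *IntegerLiteral) *Integer {
	return &Integer{Value: node.Value}
}

func main() {
	obj := evalIntegerLiteral(&IntegerLiteral{Value: 5})
	fmt.Println(obj.Inspect()) // prints 5
}
```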

In order for object.Integer to fulfill the object.Object interface, it still needs a Type() method that returns its ObjectType. Just like we did with token.TokenType, we define constants for each ObjectType:

// object/object.go
import "fmt"

type ObjectType string

const (
	INTEGER_OBJ = "INTEGER"
)

As I said, this is pretty much what we did in the token package. And with that in place we can add the Type() method to *object.Integer:

// object/object.go

func (i *Integer) Type() ObjectType { return INTEGER_OBJ }
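One way to have the compiler confirm that *Integer now satisfies Object is a blank-identifier assignment, a common Go idiom that this section does not itself use. The sketch repeats the section's definitions so it compiles on its own. Because the methods have pointer receivers, *Integer implements Object but a plain Integer value does not.

```go
package main

import "fmt"

type ObjectType string

const INTEGER_OBJ = "INTEGER"

type Object interface {
	Type() ObjectType
	Inspect() string
}

type Integer struct {
	Value int64
}

func (i *Integer) Type() ObjectType { return INTEGER_OBJ }
func (i *Integer) Inspect() string  { return fmt.Sprintf("%d", i.Value) }

// Compile-time check: the build fails here if *Integer ever stops
// implementing Object. Writing Integer{} instead would not compile.
var _ Object = (*Integer)(nil)

func main() {
	var obj Object = &Integer{Value: 42}
	fmt.Println(obj.Type(), obj.Inspect()) // prints: INTEGER 42
}
```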

And we’re done with Integer! Onto another data type: booleans.

Booleans

If you were expecting big things of this section, I’m sorry to disappoint. object.Boolean is as tiny as it gets:

// object/object.go

const (
	// [...]
	BOOLEAN_OBJ = "BOOLEAN"
)

type Boolean struct {
	Value bool
}

func (b *Boolean) Type() ObjectType { return BOOLEAN_OBJ }

func (b *Boolean) Inspect() string { return fmt.Sprintf("%t", b.Value) }

Just a struct that wraps a single value, a bool.
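Because each data type is its own struct, code that receives an Object can recover the concrete type, and with it the wrapped Go value, using a type switch. The sketch repeats Integer and Boolean so it compiles alone; the describe helper is hypothetical.

```go
package main

import "fmt"

type ObjectType string

const (
	INTEGER_OBJ = "INTEGER"
	BOOLEAN_OBJ = "BOOLEAN"
)

type Object interface {
	Type() ObjectType
	Inspect() string
}

type Integer struct{ Value int64 }

func (i *Integer) Type() ObjectType { return INTEGER_OBJ }
func (i *Integer) Inspect() string  { return fmt.Sprintf("%d", i.Value) }

type Boolean struct{ Value bool }

func (b *Boolean) Type() ObjectType { return BOOLEAN_OBJ }
func (b *Boolean) Inspect() string  { return fmt.Sprintf("%t", b.Value) }

// describe unwraps an Object by switching on its concrete type.
func describe(obj Object) string {
	switch o := obj.(type) {
	case *Integer:
		return fmt.Sprintf("integer with Go value %d", o.Value)
	case *Boolean:
		return fmt.Sprintf("boolean with Go value %t", o.Value)
	default:
		return "unknown object of type " + string(obj.Type())
	}
}

func main() {
	fmt.Println(describe(&Integer{Value: 5}))     // prints: integer with Go value 5
	fmt.Println(describe(&Boolean{Value: false})) // prints: boolean with Go value false
}
```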

We’re close to finishing the foundation of our object system. The last thing we need to do now, before we can start with our Eval function, is to represent a value that isn’t there.

Null

Tony Hoare introduced null references to the ALGOL W language in 1965 and called this his “billion-dollar mistake”. Since their introduction, countless systems have crashed because of references to “null”, a value that represents the absence of a value. Null (or “nil”, as it is called in some languages) doesn’t have the best reputation, to say the least.

I debated with myself whether Monkey should have null. On the one hand, yes, the language would be safer to use if it didn’t allow null or null references. But on the other hand, we’re not trying to reinvent the wheel, but to learn something. And I found that having null at my disposal led me to think twice whenever there was a chance to use it. Kinda like having something explosive in your car leads you to drive slower and more carefully. It really made me appreciate the choices that go into the design of a programming language. That’s something I consider worthwhile.

So let’s implement the Null type and keep a close eye and a steady hand when using it later on.

// object/object.go

const (
	// [...]
	NULL_OBJ = "NULL"
)

type Null struct{}

func (n *Null) Type() ObjectType { return NULL_OBJ }
func (n *Null) Inspect() string  { return "null" }

object.Null is a struct just like object.Boolean and object.Integer, except that it doesn’t wrap any value. It represents the absence of any value.
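Since Null carries no data, the main question to ask of an Object is simply whether it is null. Here is a sketch of such a check with a hypothetical isNull helper. It tests the concrete type rather than comparing pointers, because the Go specification leaves it unspecified whether pointers to distinct zero-size variables (such as two &Null{} values) are equal. It also shows that Go's own nil interface value is not the same thing as Monkey's null.

```go
package main

import "fmt"

type ObjectType string

const NULL_OBJ = "NULL"

type Object interface {
	Type() ObjectType
	Inspect() string
}

type Null struct{}

func (n *Null) Type() ObjectType { return NULL_OBJ }
func (n *Null) Inspect() string  { return "null" }

// isNull reports whether obj holds a *Null. A comma-ok type
// assertion on a nil interface simply yields false.
func isNull(obj Object) bool {
	_, ok := obj.(*Null)
	return ok
}

func main() {
	var result Object = &Null{}
	fmt.Println(result.Type(), result.Inspect(), isNull(result)) // prints: NULL null true
	fmt.Println(isNull(nil))                                     // prints: false
}
```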

With object.Null added, our object system is now capable of representing boolean, integer and null values. That’s more than enough to get started with Eval.

In document Thorsten Ball-Writing an interpreter in Go (2017).pdf (Page 108-112)