Rationale - Pawn. The Language. embedded scripting language. June ITB CompuPhase

appendix c

The first issue in the presentation of a new computer language should be: why a new language at all?

Indeed, I did look at several existing languages before I designed my own. Many little languages were aimed at scripting the command shell (TCL, Perl, Python). Other languages were not designed as extension languages, and put the burden of embedding solely on the host application.

As I initially attempted to use Java as an extension language (rather than build my own, as I have done now), the differences between pawn and Java are illustrative of the almost reciprocal design goals of both languages. For example, Java promotes distributed computing where “packages” reside on diverse machines, whereas pawn is designed so that the compiled applets can be easily stored in a compound file together with other data. Java is furthermore designed to be architecture neutral and application independent; inversely, pawn is designed to be tightly coupled with an application. Native functions are a taboo to some extent in Java (at least, they are considered “impure”), whereas native functions are “the reason to be” for pawn. From the viewpoint of pawn, the intended use of Java is upside down: native functions are seen as an auxiliary library that the application (in Java) uses; in pawn, native functions are part of “the application” and the pawn program itself is a set of auxiliary functions that the application uses.

A language for scripting applications and devices: pawn is targeted as an extension language, meant to write application-specific macros or subprograms with. pawn is not the appropriate language for implementing business applications or operating systems in. pawn is designed to be easily integrated with, and embedded in, other systems/applications. It is also designed to run in resource-constrained environments, such as on small microcontrollers.

As an extension language, pawn programs typically manipulate objects of the host application. In an animation system, pawn scripts deal with sprites, events and time intervals; in a communication application, pawn scripts handle packets and connections. I assume that the host application or the device makes (a subset of) its resources and functionality available via functions, handles, magic cookies... in a similar way that a contemporary operating system provides an interface to processes written in C/C++, e.g., the Win32 API (“handles everywhere”) or GNU/Linux’ “glibc”. To that end, pawn has a simple and efficient interface to the “native” functions of the host application. A pawn script manipulates data objects in the host application through function calls, but it cannot access the data of the host application directly.
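To make the handle idea concrete, here is a minimal sketch in C of how a host might expose sprites to scripts. Everything in it is an assumption made for illustration: the `cell` typedef, the `Sprite` structure, the simplified `native_sprite_*` signatures and the convention that handle 0 means failure. It is not the toolkit's actual native function interface.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t cell;                 /* hypothetical script-visible value */

/* Host-side data; a script never sees this structure, only a handle. */
typedef struct { int x, y; int in_use; } Sprite;

#define MAX_SPRITES 8
static Sprite sprites[MAX_SPRITES];

/* Translate a handle into a host object, rejecting invalid handles. */
static Sprite *sprite_from_handle(cell handle)
{
    if (handle < 1 || handle > MAX_SPRITES || !sprites[handle - 1].in_use)
        return NULL;
    return &sprites[handle - 1];
}

/* Native (hypothetical): create a sprite, return its handle, 0 on failure. */
cell native_sprite_create(const cell *params)
{
    (void)params;                     /* this native takes no arguments */
    for (int i = 0; i < MAX_SPRITES; i++) {
        if (!sprites[i].in_use) {
            sprites[i] = (Sprite){ 0, 0, 1 };
            return i + 1;             /* handles start at 1 */
        }
    }
    return 0;
}

/* Native (hypothetical): move sprite params[0] to (params[1], params[2]).
 * Returns 1 on success and 0 for a bad handle. */
cell native_sprite_move(const cell *params)
{
    Sprite *s = sprite_from_handle(params[0]);
    if (s == NULL)
        return 0;                     /* reported to the script, no crash */
    s->x = (int)params[1];
    s->y = (int)params[2];
    return 1;
}
```

Because the script only ever holds a small integer, a wrong or stale handle results in a failed call rather than in a corrupted host structure.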

The first and foremost criteria for the pawn language were execution speed and reliability. Reliability in the sense that a pawn program should not be able to crash the application or tool in which it is embedded, at least not easily. Although this limits the capabilities of the language significantly, the advantages are twofold:

⋄ the application vendor can rest assured that its application will not crash due to user additions or macros,

⋄ the user is free to experiment with the language with no (or little) risk of damaging the application files.

Speed is essential: pawn programs would probably run in an abstract machine, and abstract machines are notoriously slow. I had to make a language that has low overhead and a language for which a fast abstract machine can be written. Speed should also be reliable, in the sense that a pawn script should not slow down over time or have an occasional performance hiccup. Consequently, pawn excludes any required “background process”, such as garbage collection, and the core of the abstract machine does not implicitly allocate any system or application resources while it runs. That is, pawn does not allocate memory or open files, not without the help of a native function that the script calls explicitly.

As Dennis Ritchie said, by intent the C language confines itself to facilities that can be mapped relatively efficiently and directly to machine instructions. The same is true for pawn, and this is also a partial explanation of why pawn looks so much like C. Even though pawn runs on an abstract machine, the goal is to keep that abstract machine small and quick. pawn is used in tiny embedded systems with RAM sizes of 32 kiB or less, as well as in high-performance games that need every processor cycle for their graphics engine and game-play. In both environments, heavy-weight scripting support is difficult to swallow.

A brief analysis showed that the instruction decoding logic for an abstract machine would quickly become the bottleneck in the performance of the abstract machine. The quickest way to dispatch instructions would be to use the opcode as an index in a jump table. Therefore all opcodes should have the same size (excluding operands), and the opcode should fully specify the instruction (including the addressing methods, size of the operands, etc.). That meant that for each operation on a variable, the abstract machine needed a separate opcode for every combination of variable type, storage class and access method (direct, or dereferenced). For even three types (int, char and unsigned int), two storage classes (global and local) and three access methods (direct, indirect or indexed), a total of 18 opcodes (3*2*3) are needed to simply fetch the value of a variable.
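The dispatch scheme described above can be sketched in C with a table of handler functions indexed by opcode. This is only an illustration under stated assumptions: the four opcodes, the `VM` structure and `vm_run` are invented for the example and do not reflect the instruction set of the real abstract machine. The sketch also omits stack overflow and underflow checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t cell;

/* Hypothetical opcodes; each fully specifies its instruction. */
enum { OP_HALT, OP_PUSH_CONST, OP_ADD, OP_MUL, OP_COUNT };

typedef struct {
    const cell *code;   /* instruction stream: opcode, then its operands */
    size_t ip;          /* index of the next cell to fetch */
    cell stack[16];
    int sp;             /* number of values on the stack */
    int running;
} VM;

typedef void (*Handler)(VM *vm);

static void op_halt(VM *vm)       { vm->running = 0; }
static void op_push_const(VM *vm) { vm->stack[vm->sp++] = vm->code[vm->ip++]; }
static void op_add(VM *vm) { cell b = vm->stack[--vm->sp]; vm->stack[vm->sp - 1] += b; }
static void op_mul(VM *vm) { cell b = vm->stack[--vm->sp]; vm->stack[vm->sp - 1] *= b; }

/* The jump table: the opcode is the index, so decoding is one array lookup. */
static const Handler dispatch[OP_COUNT] = {
    [OP_HALT]       = op_halt,
    [OP_PUSH_CONST] = op_push_const,
    [OP_ADD]        = op_add,
    [OP_MUL]        = op_mul
};

/* Run until OP_HALT; return the top of the stack (0 if it is empty). */
cell vm_run(const cell *code)
{
    VM vm = { .code = code, .ip = 0, .sp = 0, .running = 1 };
    while (vm.running) {
        cell op = vm.code[vm.ip++];
        assert(op >= 0 && op < OP_COUNT);   /* reject unknown opcodes */
        dispatch[op](&vm);
    }
    return vm.sp > 0 ? vm.stack[vm.sp - 1] : 0;
}
```

Running the program `{ OP_PUSH_CONST, 3, OP_PUSH_CONST, 2, OP_MUL, OP_PUSH_CONST, 3, OP_MUL, OP_HALT }` through `vm_run` computes 3*2*3 and returns 18. Note that every entry in the table is a handler that takes up code space, which is why the number of opcodes matters.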

To get an abstract machine that is both small and quick, the number of opcodes should be kept to a minimum: each “virtual instruction” needs to be handled by the abstract machine, and therefore takes code space. With 18 opcodes to load a variable in a register, 18 more to store a register into a variable, another 18 to get the address of a variable, etc., the abstract machine that I envisioned was quickly growing out of its desired proportions. The languages Bob and REXX inspired me to design a typeless language. This saved me a lot of opcodes. At the same time, the language could no longer be called a “subset of C”. I was changing the language. Why, then, not go a step further in changing the language? This is where a few more design guidelines came into play:

⋄ give the programmer a general purpose tool, not a special purpose solution

⋄ avoid error prone language constructs; promote error checking

⋄ be pragmatic

A general purpose tool: pawn is targeted as an extension language, without specifying exactly what it will extend. Typically, the application or the tool that uses pawn for its extension language will provide many optimized routines or commands to operate on its native objects, be it text, database records or animated sprites. The extension language exists to permit the user to do what the application developer forgot, or decided not to include. Rather than providing a comprehensive library of functions to sort data, match regular expressions, or draw cubic Bézier splines, pawn should supply a (general purpose) means to use, extend and combine the specific (“native”) functions that an application provides.

pawn lacks a comprehensive standard library. By intent, pawn also lacks features like pointers, dynamic memory allocation, direct access to the operating system or to the hardware, that are needed to remain competitive in the field of general purpose application or system programming. You cannot build linked lists or dynamic tree data structures in pawn, and neither can you access any memory beyond the boundaries of the abstract machine. That is not to say that a pawn program can never use dynamic, sorted symbol tables, or change a parameter in the operating system; it can do that, but it needs to do so by calling a “native” function that an application provides to the abstract machine.

In other words, if an application chooses to implement the well known peek and poke functions (from BASIC) in the abstract machine, a pawn program can access any byte in memory, insofar as the operating system permits this. Likewise, an application can provide native functions that insert, delete or search symbols in a table and allow several operations on them. The proposed core functions getproperty and setproperty are an example of native functions that build a linked list in the background.
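As an illustration of native functions that maintain a linked list on behalf of a script, the following C sketch keeps named values in a host-side list. The names `host_setproperty` and `host_getproperty`, their parameters and the `fallback` convention are hypothetical and deliberately simplified; they are not the signatures of the proposed getproperty and setproperty core functions.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef int32_t cell;

/* One node of the host-side list; scripts never see these pointers. */
typedef struct Property {
    struct Property *next;
    char *name;
    cell value;
} Property;

static Property *property_list = NULL;

static Property *find_property(const char *name)
{
    for (Property *p = property_list; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Add or update a property; returns 1 on success, 0 if out of memory. */
int host_setproperty(const char *name, cell value)
{
    Property *p = find_property(name);
    if (p == NULL) {
        size_t len = strlen(name) + 1;
        p = malloc(sizeof *p);
        if (p == NULL)
            return 0;
        p->name = malloc(len);
        if (p->name == NULL) {
            free(p);
            return 0;
        }
        memcpy(p->name, name, len);
        p->next = property_list;      /* insert at the head of the list */
        property_list = p;
    }
    p->value = value;
    return 1;
}

/* Look up a property; returns fallback when the name is unknown. */
cell host_getproperty(const char *name, cell fallback)
{
    const Property *p = find_property(name);
    return p != NULL ? p->value : fallback;
}
```

All allocation happens in the host, inside functions that the script calls explicitly; the script itself only passes a name and a value.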

Promote error checking: As you may have noticed, one of the foremost design criteria of the C language, “trust the programmer”, is absent from my list of design criteria. Users of script languages may not be experienced programmers; and even if they are, pawn will probably not be their primary language. Most pawn programmers will keep learning the language as they go, and will even after years not have become experts. Enough reason, hence, to replace error prone elements from the C language (pointers) with safer, albeit less general, constructs (references).† References are copied from C++. They are nothing else than pointers in disguise, but they are restricted in various, mostly useful, ways. Turn to a C++ book to find more justification for references.

I find it sad that many, even modern, programming languages have so little built-in, or easy to use, support for confirming that programs do as the programmer intended. I am not referring to theoretical correctness (which is too costly to achieve for anything bigger than toy programs), but practical, easy to use, verification mechanisms as a help to the programmer. pawn provides both compile time and execution time assertions to use for preconditions, postconditions and invariants.
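The two kinds of assertion can be illustrated with their C counterparts. This is a C analogue chosen for the sketch, not pawn syntax; the `clamp` function and the 32-bit `cell` type are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t cell;

/* Compile time assertion: the build fails if the assumption does not hold. */
_Static_assert(sizeof(cell) == 4, "this sketch assumes a 32-bit cell");

/* Limit value to the range low..high (hypothetical helper). */
cell clamp(cell value, cell low, cell high)
{
    assert(low <= high);                        /* precondition */

    cell result = value;
    if (result < low)
        result = low;
    else if (result > high)
        result = high;

    assert(result >= low && result <= high);    /* postcondition */
    return result;
}
```

The compile time check costs nothing when the program runs, while the execution time checks document and enforce what the function expects and what it promises.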

† You should see this remark in the context of my earlier assertion that many “Pawn” programmers will be novice programmers. In my (teaching) experience, novice programmers make many pointer errors, as opposed to experienced C/C++ programmers.

The typing mechanism that most programming languages use is also an automatic “catcher” of a whole class of bugs. By virtue of being a typeless language, pawn lacked these error checking abilities. This was clearly a weakness, and I created the “tag” mechanism as an equivalent for verifying function parameter passing, array indexing and other operations.

The quality of the tools (the compiler and the abstract machine) also has a great impact on the robustness of code, whatever the language. Although this is only very loosely related to the design of the language, I set out to build the tools such that they promote error checking. The warning system of pawn goes a step beyond simply reporting where the parser fails to interpret the data according to the language grammar. On several occasions, the compiler runs checks that are completely unrelated to generating code and that are implemented specifically to catch possible errors. Likewise, the “debugger hook” is designed right into the abstract machine; it is not an add-on implemented as an after-thought.

Be pragmatic: The object-oriented programming paradigm has not entirely lived up to its promise, in my opinion. On the one hand, OOP solves many tasks in an easier or cleaner way, due to the added abstraction layer. On the other hand, contemporary object-oriented languages leave you struggling with the language. The struggle should be with implementing the functionality for a specific task, not with the language used for the implementation. Object-oriented languages are attractive mainly because of the comprehensive class libraries that they come with, but leaning on a standard library goes against one of the design goals for pawn. Object-oriented programming is not a solution for a non-expert programmer with little patience for artificial complexity. The criterion “be pragmatic” is a reminder to seek solutions, not elegance.

• Practical design criteria

The fact that pawn looks so much like C cannot be a coincidence, and it isn’t. pawn started as a C dialect and stayed that way, because C has a proven track record. The changes from C were mostly born out of necessity after rubbing out the features of C that I did not want in a scripting language: no pointers and no “typing” system.

pawn, being a typeless language, needed a different means to declare variables. It also drops the C requirement that all variables should be declared at the top of a compound statement. pawn is a little more like C++ in this respect.

C language functions can pass “output values” via pointer arguments. The standard function scanf, for example, stores the values or strings that it reads from the console into its arguments. You can design a function in C so that it optionally returns a value through a pointer argument; if the caller of the function does not care for the return value, it passes NULL as the pointer value. The standard function strtol is an example of a function that does this. This technique frequently saves you from declaring and passing dummy variables. pawn replaces pointers with references, but references cannot be NULL. Thus, pawn needed a different technique to “drop” the values that a function returns via references. Its solution is the use of an “argument placeholder” that is written as an underscore character (“_”); Prolog programmers will recognize it as a similar feature in that language. The argument placeholder reserves a temporary anonymous data object (a “cell” or an array of cells) that is automatically destroyed after the function call.
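The C idiom that the argument placeholder replaces looks as follows. The sketch uses the standard strtol function; the wrappers `parse_number` and `parse_number_only` are hypothetical names introduced for the example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Parse a decimal number. The second output is optional: when consumed is
 * NULL the caller is not interested in how many characters were read. */
long parse_number(const char *text, size_t *consumed)
{
    char *end;
    long value = strtol(text, &end, 10);
    if (consumed != NULL)
        *consumed = (size_t)(end - text);
    return value;
}

/* strtol itself accepts NULL for its end pointer, so no dummy variable
 * is needed when only the value matters. */
long parse_number_only(const char *text)
{
    return strtol(text, NULL, 10);
}
```

In C, the callee has to test for NULL before storing the output. With a placeholder that reserves a temporary object, the callee can always store the value and the caller simply never looks at it.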

The temporary cell for the argument placeholder should still have a value, because the function may see a reference parameter as input/output. Therefore, a function must specify for each passed-by-reference argument what value it will have upon entry when the caller passes the placeholder instead of an actual argument. By extension, I also added default values for arguments that are “passed-by-value”. The feature to optionally remove all arguments with default values from the right was copied from C++.

When speaking of BCPL and B, Dennis Ritchie said that C was invented in part to provide a plausible way of dealing with character strings when one begins with a word-oriented language. pawn provides two options for working with strings, packed and unpacked strings. In an unpacked string, every character fits in a cell. The overhead for a typical 32-bit implementation is large: one character would take four bytes. Packed strings store up to four characters in one cell, at the cost of being significantly more difficult to handle if you could only access full cells. Modern BCPL implementations provide two array indexing methods: one to get a word from an array and one to get a character from an array. pawn copies this concept, although the syntax differs from that of BCPL. The packed string feature also led to the alternative array indexing syntax.
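The difference between the two string layouts, and the need for character-level indexing, can be sketched in C. The sketch assumes 32-bit cells and 8-bit characters, and it assumes that the first character of a packed string sits in the most significant byte of a cell; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ucell;
enum { CHARS_PER_CELL = 4 };

/* Unpacked: one character per cell. Returns the number of cells used,
 * including the cell that holds the terminating zero. */
size_t store_unpacked(ucell *dest, const char *src)
{
    size_t i = 0;
    do {
        dest[i] = (unsigned char)src[i];
    } while (src[i++] != '\0');
    return i;
}

/* Packed: four characters per cell, first character in the high byte
 * (an assumption of this sketch). Returns the number of cells used. */
size_t store_packed(ucell *dest, const char *src)
{
    size_t i = 0;
    for (;;) {
        size_t shift = 8 * (CHARS_PER_CELL - 1 - i % CHARS_PER_CELL);
        if (i % CHARS_PER_CELL == 0)
            dest[i / CHARS_PER_CELL] = 0;       /* start a fresh cell */
        dest[i / CHARS_PER_CELL] |= (ucell)(unsigned char)src[i] << shift;
        if (src[i] == '\0')
            return i / CHARS_PER_CELL + 1;
        i++;
    }
}

/* Character indexing into a packed string: select the cell, then the byte. */
unsigned char packed_char_at(const ucell *str, size_t index)
{
    size_t shift = 8 * (CHARS_PER_CELL - 1 - index % CHARS_PER_CELL);
    return (unsigned char)((str[index / CHARS_PER_CELL] >> shift) & 0xFF);
}
```

Storing the string "hello" takes six cells unpacked and two cells packed. Without a character-level accessor such as `packed_char_at`, every access to a packed string would need this shifting and masking written out by hand.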

Unicode applications often have to deal with two character sets: 8-bit for legacy file formats and standardized transfer formats (like many of the Internet protocols) and the 16-bit Unicode character set (or the 31-bit UCS-4 character set). Although the pawn compiler has an option that makes characters 16-bit (so only two characters fit in a 32-bit cell), it is usually more convenient to store single-byte character strings in packed strings and multi-byte strings in unpacked strings. This turns a weakness in pawn (the need to distinguish packed strings from unpacked strings) into a strength: pawn can make that distinction quite easily. And instead of needing two implementations for every function that deals with strings (an ASCII version and a Unicode version; look at the Win32 API, or even the standard C library), pawn enables functions to handle both packed and unpacked strings with ease. (Support for Unicode string literals: see page 138.)

Notwithstanding the above mentioned changes, plus those in the chapter “Pitfalls: differences from C” (page 133), I have tried to keep pawn close to C. A final point, which is unrelated to language design, but important nonetheless, is the license: pawn is distributed under a liberal license allowing you to use and/or adapt the code with a minimum of restrictions; see appendix D.

License

appendix d

The software toolkit “pawn” (the compiler, the abstract machine and the support routines) is copyright © 1997–2011 by ITB CompuPhase, and distributed under the “Apache License” version 2.0, which is reproduced below, plus an exception clause regarding static linking. See the file notices for contributions and their respective licenses.

EXCEPTION TO THE APACHE 2.0 LICENSE

As a special exception to the Apache License 2.0 (and referring to the definitions in Section 1 of this license), you may link, statically or dynamically, the “Work” to other modules to produce an executable file containing portions of the “Work”, and distribute that executable file in “Object” form under the terms of your choice, without any of the additional requirements listed in Section 4 of the Apache License 2.0. This exception applies only to redistributions in “Object” form (not “Source” form) and only if no modifications have been made to the “Work”.

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions

“License” shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.

“Licensor” shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.

“Legal Entity” shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, “control” means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.

“You” (or “Your”) shall mean an individual or Legal Entity exercising permissions granted by this License.

“Source” form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.

“Object” form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.

“Work” shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).

“Derivative Works” shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the
