Static type inference for dynamically typed languages

The authors of the techniques described in this section commonly present them as applicable to dynamically typed languages. However, this is not really the case. Most of these techniques rely on static type inference and thus impose a statically typed semantics on the languages to which they are applied. Type inference is used to find type errors in dynamically typed languages and to apply optimisations. We summarise the presented techniques and compare them with each other and with our work. An important issue we discuss is the set of restrictions that are placed on these languages to enable type inference.

MetaML. A recurring claim in the type systems presented in this section is that type inference is tricky for dynamically typed languages because such languages often have unrestricted metaprogramming features. Type inference can nevertheless be implemented in metaprogramming languages, as can be seen in languages such as MetaML [92, 104]. Metaprogramming is restricted in this language. For example, the object bracket and escape notation is used to generate code rather than strings. In addition, an important restriction is placed on code fragments, namely that these should always be lambda abstractions. Thus the generated code snippets can be safely typed.

Python and Ruby. A completely different approach to statically type check languages with metaprogramming features is presented in RPython [10], a statically typed subset of the Python language. All metaprogramming features (including eval and metaclasses) may be used during the initialisation of the Python classes. In languages such as Python or RPython, even determining which file is imported when an import statement is executed can be undecidable.

In RPython, metaprogramming features cannot be used while the program is running. RPython also rejects programs where types cannot be statically resolved. It can therefore be compared with a statically typed version of Python. A similar attempt to give a statically typed semantics to a dynamically typed language is Diamondback Ruby (DRuby) [41]. DRuby also accepts type annotations, which help the type inference. Given the dynamic nature of Ruby, DRuby can only give warnings about potential type errors. It does not catch all type errors and sometimes reports type errors for programs that run correctly. DRuby's static type system is elaborate and supports features such as union and intersection types, subtyping, object types, parametric polymorphism and mixins. The type inference algorithm is also flow aware.
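To make the phrase "types cannot be statically resolved" concrete, the following minimal sketch shows a Python function that runs without error but binds one variable to values of two different types. The helper `conflicting_names` is a hypothetical, deliberately naive check written for this illustration; it only looks at literal assignments and is not RPython's or DRuby's inference algorithm.

```python
import ast

# A function that is valid Python, yet `result` has no single static type.
SOURCE = '''
def describe(flag):
    if flag:
        result = 42
    else:
        result = "forty-two"
    return result
'''

def conflicting_names(source):
    """Return names that are assigned literals of more than one type.

    Hypothetical helper: it inspects only `name = <literal>` statements.
    """
    seen = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    type_name = type(node.value.value).__name__
                    seen.setdefault(target.id, set()).add(type_name)
    return {name: sorted(types) for name, types in seen.items() if len(types) > 1}

print(conflicting_names(SOURCE))  # {'result': ['int', 'str']}
```

A system that insists on one resolved type per variable would reject `describe`, whereas a system with union types could assign `result` the union of the two types.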

Unlike RPython, DRuby does not support metaprogramming features such as eval. Furr et al. also developed PRuby [40], an extension to DRuby. PRuby tries to address some of the shortcomings of DRuby related to metaprogramming. Determining the type of the result from functions such as eval is undecidable if eval accepts arbitrary strings. However, by profiling a running Ruby program, a sample of strings which are passed to eval and similar functions can be gathered. PRuby then transforms the program into one that does not make use of these features. The resulting transformed program is statically checked using DRuby. DRuby is also used in a statically typed implementation of Ruby on Rails (RoR) [9]. This works by transforming RoR applications into plain Ruby. The transformation avoids the use of metaprogramming features and the resulting application is then type checked using DRuby.
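The profile-then-transform idea can be sketched in Python rather than Ruby. Everything named here (`profiled_eval`, `observed`, `KNOWN`, `run_transformed`) is hypothetical and written only for illustration. The replacement table is written by hand, whereas the text above describes PRuby as performing the transformation itself, and raising an error on an unprofiled string is an assumption of this sketch.

```python
from collections import Counter

observed = Counter()  # hypothetical store for the strings seen during profiling

def profiled_eval(source, env):
    """Stand-in for eval that records every string it is asked to evaluate."""
    observed[source] += 1
    return eval(source, {}, env)

def run(op, x, y):
    # Dynamic version: the expression is assembled as a string at run time.
    return profiled_eval(f"x {op} y", {"x": x, "y": y})

# Profiling run: gather a sample of the strings passed to eval.
run("+", 1, 2)
run("*", 3, 4)

# Transformed version: eval is replaced by ordinary code for each observed
# string, so a static checker sees plain functions instead of eval.
KNOWN = {
    "x + y": lambda x, y: x + y,
    "x * y": lambda x, y: x * y,
}

def run_transformed(op, x, y):
    key = f"x {op} y"
    if key not in KNOWN:
        # Assumption of this sketch: strings outside the profile are an error.
        raise RuntimeError(f"unprofiled eval string: {key!r}")
    return KNOWN[key](x, y)
```

The transformed program is only as complete as the profile: an operator that never occurred during profiling, such as `-` here, is not covered.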

JavaScript. Features of JavaScript that make type inference difficult include the use of prototypes instead of classes, first-class functions and weak, dynamic typing. Different type systems have been proposed for JavaScript [11, 106]. Anderson [11] proposes a structural type system [80] for JS0, a subset of the JavaScript language. This subset excludes prototypes and first-class functions. This type inference algorithm allows the dynamic addition of attributes to JavaScript objects. However sophisticated, this type system cannot be applied to Python, as the class and object creation mechanism is much more dynamic than in JavaScript. Also, no consideration is made of control flow and state, and therefore simple sorting functions from the Python standard library cannot be adequately type checked [49]. Thiemann [106] proposes a type system where a type is described by its base type and its features (such as members). Although a type inference mechanism is not proposed, an implementation is available. More recently, a semantics for the JavaScript language has been formalised [48], although no type inference mechanism has been proposed.

Recency types [53] deal with ad hoc object initialisation patterns, i.e., objects can be created at one point and members assigned dynamically. In order to deal with this, only "contexts" of linear instruction sequences that are placed between special labels are considered. In these contexts, an object is instantiated and its fields are assigned. The labels that delineate the contexts are referred to as MASKk expressions. They are placed automatically by the system, although not enough detail is given on how their placement is determined. The concept of a recency type is similar to present types in preemptive type checking. Present types are more sophisticated as these can change throughout linear interprocedural flows of execution, although preemptive type checking does not support objects yet.

Similarly, Guha et al. introduce a type system where control flow and state are taken into consideration [49]. This enables typing of programs that make use of idioms [49] such as heap-sensitive reasoning, dynamic dispatch and type tests. The type system is modelled for a simple semantics for JavaScript [48]. Similar to our approach, the type system supports joins and ordering. The type environments are labelled; however, the labels are simply program points and not abstractions of call stacks as in our approach. There is also no distinction between present and future use types. Another interesting approach for type checking JavaScript involves introducing dependent types [23]. In this approach an SMT solver is employed to check the type derivations, which are derived for all the values present in the program.
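The ad hoc object initialisation pattern that recency types address can be illustrated in Python instead of JavaScript. In this minimal sketch, the class `Point` and the helper `fields` are hypothetical names introduced only for the example. The object starts with no fields and gains them over a linear sequence of statements, so any type fixed at the point of creation would miss `x` and `y`.

```python
class Point:
    """Deliberately empty: fields are added after construction."""

def fields(obj):
    """Return the names of the instance attributes currently set on obj."""
    return sorted(vars(obj))

p = Point()
snapshots = [fields(p)]      # immediately after creation: no fields

p.x = 1.0
snapshots.append(fields(p))  # after the first assignment

p.y = 2.0
snapshots.append(fields(p))  # after the second assignment

print(snapshots)  # [[], ['x'], ['x', 'y']]
```

Each snapshot corresponds to a different structural description of the same object, which is why an analysis has to follow the instruction sequence rather than assign a single type at the allocation site.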

Scheme. Felleisen and Tobin-Hochstadt [109] propose the notion of occurrence typing for implementing a statically typed version of Scheme. A translation of the simple example in Figure 1.1 is statically rejected by this system. Bigloo [91] is another statically typed subset of the Scheme language that supports optional type annotations. These type annotations are written as assertion-style contracts which are constraints on procedures. These constraints are used to generate more efficient code when the compiler can prove they are correct. Similar to gradual typing, these are turned into runtime checks when the compiler cannot prove them correct.

Smalltalk. Strongtalk [20] is a statically typed subset of Smalltalk with features such as polymorphic signatures, protocol-based inheritance, generics and parametric polymorphism. The language also supports the typecase construct, where runtime type checks are presumably carried out. This work does not, however, define a formal type system or describe how omitted type annotations are treated.
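The idea behind occurrence typing, mentioned above for Scheme, is that a type test on a variable refines the type of that variable in each branch. A minimal sketch in Python follows, with type hints standing in for the Scheme annotations. The function `size` is a hypothetical example and is not the program from Figure 1.1.

```python
from typing import Union

def size(value: Union[int, str]) -> int:
    """Return a length-like measure for either kind of argument."""
    if isinstance(value, str):
        # Within this branch the test has succeeded, so a checker that
        # narrows on the test can treat `value` as str.
        return len(value)
    # The test failed, so only the int alternative of the union remains.
    return value.bit_length()

print(size("abc"), size(5))
```

Without the refinement, neither `len(value)` nor `value.bit_length()` would be well typed for the full union, since each operation is defined for only one of the two alternatives.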

Erlang. Marlow and Wadler [66] propose a type system which supports recursive types and subtyping. Programs are not accepted if matching or case expressions are not exhaustive [78]. Therefore, only a subset of the language is supported.

SELF. Agesen [6] proposes a type inference mechanism for SELF. This inference mechanism works by generating constraints and unifying these constraints to obtain the desired type information.
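As a rough illustration of what generating and unifying constraints means, the sketch below implements generic textbook-style unification over a tiny type language. It is not Agesen's algorithm. The representation is an assumption made for this example: base types are strings, function types are tuples of the form `("fun", argument, result)`, and `Var`, `resolve`, `occurs` and `unify` are hypothetical names.

```python
class Var:
    """A type variable; `name` is used only for printing."""
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return self.name

def resolve(t, subst):
    """Follow bindings until a non-variable or an unbound variable is reached."""
    while isinstance(t, Var) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    """Check whether variable v appears inside type t (prevents cyclic types)."""
    t = resolve(t, subst)
    if t is v:
        return True
    return isinstance(t, tuple) and any(occurs(v, arg, subst) for arg in t[1:])

def unify(a, b, subst):
    """Extend subst so that a and b become equal, or raise TypeError."""
    a, b = resolve(a, subst), resolve(b, subst)
    if a is b or a == b:
        return subst
    if isinstance(a, Var):
        if occurs(a, b, subst):
            raise TypeError(f"cyclic type: {a} occurs in {b}")
        subst[a] = b
        return subst
    if isinstance(b, Var):
        return unify(b, a, subst)
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            unify(x, y, subst)
        return subst
    raise TypeError(f"cannot unify {a} with {b}")

# Constraints for a call f(x): f must be a function from X to R,
# and f is also known to map int to bool.
F, X, R = Var("F"), Var("X"), Var("R")
constraints = [(F, ("fun", X, R)), (F, ("fun", "int", "bool"))]

subst = {}
for left, right in constraints:
    unify(left, right, subst)

print(resolve(X, subst), resolve(R, subst))  # int bool
```

Solving the two constraints forces `X` to `int` and `R` to `bool`. A pair such as `("int", "str")` has no solution and is reported as a type error.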
