SimpleScala Language - Automated Black Box Generation of Structured Inputs for Use in Software

This section describes the SimpleScala language. Ultimately, I want to generate well-typed programs in this language with CLP.

7.2.1 General Design and Syntax

SimpleScala was designed to give the same look and feel as Scala, at least from the perspective of a student who is first learning Scala and functional programming. The abstract syntax of SimpleScala is presented in Figure 7.1. This language contains the following notable features:

• Relatable base types, namely String (string in the formalism), Boolean (boolean in the formalism), Int (integer in the formalism), and Unit (unitType in the formalism).

• Named functions, introduced with the def keyword.

• Variables, introduced with the val keyword.

• Conditionals, introduced with the if and else keywords.

• Tuples, introduced with comma-separated expressions in parentheses. Tuple components can be accessed using Scala’s ._n notation, for some positive number n. Tuple access is a builtin operation, as opposed to a method on a tuple object (SimpleScala lacks objects in any object-oriented sense).

• Higher-order functions, introduced with =>. Unlike Scala, SimpleScala higher-order functions require that the function parameter is always annotated (i.e., there is no type inference). Additionally, SimpleScala higher-order functions always take exactly one parameter; the unitType type and unit value are provided as dummy types and values for functions which wish to take no values, and tuples / currying are used for functions which wish to take more than one value.

• Algebraic data types, using Scala’s notation to create instances of these types and pattern match on them (using the match keyword). While these look and feel very similar to Scala’s approach of using case classes for similar purposes, the underlying theory and implementation are radically different. As such, they are defined with the algebraic keyword, which serves as an indicator to students that these behave in subtly, but fundamentally, different ways than do Scala’s case classes.

• The ability to define parametric polymorphism (type variables on code) using square brackets on a def. Unlike with Scala, even if a computation does not use type variables, empty square brackets are required.

• The ability to define generics (type variables on data) using square brackets in an algebraic definition. Even if no type variables are used in a given algebraic definition, empty square brackets must be provided, in contrast to non-generic case class definitions in Scala.

• Programs (prog) consist of a series of user-defined type definitions ($\overrightarrow{\mathit{tdef}}$), followed by a series of user-defined functions ($\overrightarrow{\mathit{def}}$), followed by a single expression (e) which serves as a program entry point.

7.2.2 Type Domains

The typing rules for this language utilize a number of formal definitions which are provided in Figure 7.2. A brief description of each one of these definitions follows:

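• fdefs: Records named functions defined with the def keyword in a convenient format. The parser ensures that this mapping is unique (i.e., no two functions share the same name). Specifically, fdefs maps function names to 3-tuples of:

1. The type variables introduced (i.e., those in scope) for the function

2. The input type of the function

3. The return type of the function

Because the functions defined do not change throughout typechecking or program execution, fdefs is treated as a global constant.

\[
\begin{array}{l}
x \in \mathit{Variable} \quad \mathit{str} \in \mathit{String} \quad b \in \mathit{Boolean} \quad i \in \mathbb{Z} \quad n \in \mathbb{N} \\
\mathit{fn} \in \mathit{FunctionName} \quad \mathit{cn} \in \mathit{ConstructorName} \quad \mathit{un} \in \mathit{UserDefinedTypeName} \quad T \in \mathit{TypeVariable}
\end{array}
\]

\[
\begin{array}{rcl}
\tau \in \mathit{Type} & ::= & \texttt{string} \mid \texttt{boolean} \mid \texttt{integer} \mid \texttt{unitType} \mid \tau_1 \to \tau_2 \mid (\vec{\tau}) \mid \mathit{un}[\vec{\tau}] \mid T \\
e \in \mathit{Exp} & ::= & x \mid \mathit{str} \mid b \mid i \mid \texttt{unit} \mid e_1 \oplus e_2 \mid (x : \tau) \Rightarrow e \mid e_1(e_2) \mid \mathit{fn}[\vec{\tau}](e) \\
 & \mid & \texttt{if}\ (e_1)\ e_2\ \texttt{else}\ e_3 \mid \{\overrightarrow{\mathit{val}}\ e\} \mid (\vec{e}) \mid e.\_n \\
 & \mid & \mathit{cn}[\vec{\tau}](e) \mid e\ \texttt{match}\ \{\overrightarrow{\mathit{case}}\} \\
\mathit{val} \in \mathit{Val} & ::= & \texttt{val}\ x = e \\
\mathit{case} \in \mathit{Case} & ::= & \texttt{case}\ \mathit{cn}(x) \Rightarrow e \mid \texttt{case}\ (\vec{x}) \Rightarrow e \\
\oplus \in \mathit{Binop} & ::= & + \mid - \mid \times \mid \div \mid \land \mid \lor \mid < \mid \le \\
\mathit{tdef} \in \mathit{UserDefinedTypeDef} & ::= & \texttt{algebraic}\ \mathit{un}[\overrightarrow{T}] = \overrightarrow{\mathit{cdef}} \\
\mathit{cdef} \in \mathit{ConstructorDefinition} & ::= & \mathit{cn}(\tau) \\
\mathit{def} \in \mathit{Def} & ::= & \texttt{def}\ \mathit{fn}[\overrightarrow{T}](x : \tau_1) : \tau_2 = e \\
\mathit{prog} \in \mathit{Program} & ::= & \overrightarrow{\mathit{tdef}}\ \overrightarrow{\mathit{def}}\ e
\end{array}
\]

Figure 7.1: SimpleScala syntax.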

• tdefs: Records user-defined types defined with the algebraic keyword in a convenient format. The parser ensures that this mapping is unique (i.e., no two user-defined types share the same name). Specifically, tdefs maps user-defined type names to pairs of:

1. The type variables introduced by the user-defined type

2. A mapping of each constructor of the user-defined type to the type each constructor expects

Because the user-defined type definitions do not change throughout typechecking or program execution, tdefs is treated as a global constant.

• cdefs: Records a backwards mapping of constructor names to the name of the user-defined type the constructor creates. The parser ensures that this mapping is unique (i.e., each constructor name is unique, even across types). For the same reason as with tdefs, cdefs is treated as a global constant.

• tscope: Records the type variables which are currently in scope. If we are currently typechecking the program’s entry point (i.e., e in prog), then tscope = ∅. If we are currently typechecking the expression inside a function defined with def with name fn, then tscope = first(fdefs(fn)), where first gets the first element in a tuple. The program entry point and each def can be typechecked independently of each other, and tscope will never change throughout typechecking the individual unit (i.e., the program entry point or a def). As such, tscope is treated as a global variable.

• Γ: Maps the variables in scope to their recorded types. Because variables can be brought into scope at a number of points, Γ must be threaded through the typing rules. Additionally, because SimpleScala follows lexical scoping, Γ only needs to be passed down, not up. That is, with the exception of shadowing, it is not possible for a variable introduced in one scope to influence a variable in another scope, so there is no need to return which variables were in scope for a given expression. This treatment of the type environment is standard.

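\[
\begin{array}{rcl}
\mathit{fdefs} \in \mathit{NamedFunctionDefs} & = & \mathit{FunctionName} \to (\overrightarrow{\mathit{TypeVariable}} \times \mathit{Type} \times \mathit{Type}) \\
\mathit{tdefs} \in \mathit{TypeDefs} & = & \mathit{UserDefinedTypeName} \to (\overrightarrow{\mathit{TypeVariable}} \times (\mathit{ConstructorName} \to \mathit{Type})) \\
\mathit{cdefs} \in \mathit{ConstructorDefs} & = & \mathit{ConstructorName} \to \mathit{UserDefinedTypeName} \\
\mathit{tscope} \in \mathit{TypeVarsInScope} & = & \mathit{TypeVariable} \\
\Gamma \in \mathit{TypeEnv} & = & \mathit{Variable} \to \mathit{Type}
\end{array}
\]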

Figure 7.2: Various definitions used in the typing rules.
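As a rough illustration of the domains in Figure 7.2, the following Python sketch encodes them as plain dictionaries for a small hypothetical program. The List type, its constructors Cons and Nil, the function length, the tagged-tuple encoding of types, and the helper names build_cdefs and tscope_for are all assumptions made for this example; none of them is taken from the SimpleScala implementation. In particular, the text assigns the uniqueness checks to the parser, whereas this sketch performs the constructor check while building cdefs.

```python
# Assumed encoding (illustration only): base types are strings, and all
# other types are tagged tuples.
A = ("var", "A")                     # type variable A
LIST_A = ("user", "List", (A,))      # user-defined type List[A]

# tdefs: user-defined type name -> (type variables, constructor -> expected type)
tdefs = {
    "List": (
        ("A",),
        {
            "Cons": ("tuple", (A, LIST_A)),  # Cons expects (A, List[A])
            "Nil": "unitType",               # Nil expects unitType
        },
    ),
}

# fdefs: function name -> (type variables, input type, return type)
fdefs = {
    "length": (("A",), LIST_A, "integer"),
}


def build_cdefs(tdefs):
    """Derive the backwards mapping: constructor name -> user-defined type name."""
    cdefs = {}
    for type_name, (_type_vars, constructors) in tdefs.items():
        for constructor_name in constructors:
            # Constructor names must be unique, even across types.
            if constructor_name in cdefs:
                raise ValueError(f"duplicate constructor: {constructor_name}")
            cdefs[constructor_name] = type_name
    return cdefs


def tscope_for(fdefs, function_name=None):
    """Type variables in scope: empty at the entry point, else first(fdefs(fn))."""
    if function_name is None:
        return frozenset()
    return frozenset(fdefs[function_name][0])


cdefs = build_cdefs(tdefs)
```

Here cdefs maps both Cons and Nil back to List, and tscope_for(fdefs, "length") contains only A, mirroring tscope = first(fdefs(fn)).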

7.2.3 Typing Rules and Helper Functions

The actual typing rules are shown in Figures 7.3 and 7.4, with Figure 7.3 showing all rules except for pattern matching, and Figure 7.4 showing only the rules for pattern matching. The rules have been split up only for reasons of space.

A variety of helper functions are employed in these rules. Most of these functions are intuitive and relatively standard, so their full definitions have been provided in Appendix E as opposed to being directly in this chapter. A brief description of each helper function used is provided below for convenience:

• keys: Returns a set of keys in the given map.

• typeOk: Returns true if all type variables used in the given type are in scope. If no type variables are used in the given type, it simply returns true.

• typeOkList: Like typeOk, but it operates over a list of types as opposed to just a single type.

• typeReplace: This is responsible for replacing type variables with actual types at the appropriate times (i.e., when a named function is called, or when an instance of a user-defined type is created). For example, consider the call typeReplace([A], [integer], (boolean, A)), where the notation [A] indicates a list holding type variable A, and so on. This will replace each instance of type variable A in type (boolean, A) with integer, resulting in the new type (boolean, integer).

• blockEnv: Produces a new type environment where the given list of variable/value definitions (defined with the val keyword) are added to the provided type environment.

• tupleTypes: Given a list of expressions and a type environment, returns a list of types, where each type in the returned list corresponds to an expression in the input list at the same position of the list. For example, if given [1, true, unit] as expressions under any type environment, this would return [integer, boolean, unitType].

• tupleAccess: Given a list of types and a positive number n, yields the nth element of the list in a 1-indexed manner. If there is no such element in the input list (e.g., when only two types are provided, but n = 5), this function cannot be applied, and attempting to apply it will trigger handling for ill-typedness.

• tupGamma: Produces a new type environment where the given list of variables are in scope, each associated with a type from the given list of types. For example, consider the call tupGamma([x, y], [boolean, integer], Γ). This will produce a new type environment wherein x maps to boolean, and y maps to integer. The name of the function reflects the fact that this is used only for pattern matching on tuples (hence the “tup” part).

• casesOk: Specific to pattern matching involving user-defined types, this makes sure that there is exactly one case for each possible constructor of a type, and that all possible constructors are accounted for. If a case has been duplicated, is missing, or does not match up with the appropriate user-defined type in play, then this returns false. Otherwise, casesOk returns true.

• casesTypes: Determines the type of the branch of each case for pattern matching on user-defined types, yielding a list of types, one for each case. This needs the actual cases, the current type environment, the generic type variables in scope for whatever user-defined type is in play, the types we want to replace the type variables with, and a mapping of constructor names to the types that each one of the constructors expects.

• asSingleton: Given a list of types, gets the first element of the list, but only if each element of the list is identical. If there are two non-identical elements in the given list, asSingleton cannot be applied. In context, asSingleton is used to ensure that the body of each branch in a pattern match on a user-defined type returns the same type. If two branches differ in type, then asSingleton triggers handling for ill-typedness.
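The scoping and substitution helpers described above (typeOk, typeOkList, and typeReplace) can be sketched in Python as follows. This is only an illustration under assumptions, not the definitions from Appendix E: the tagged-tuple encoding of types is invented for the example, and leaving unmapped type variables unchanged in type_replace is a choice made here rather than something the text specifies.

```python
# Assumed encoding (illustration only): base types are strings; other
# types are tagged tuples:
#   ("var", name)              type variable
#   ("fun", t1, t2)            function type t1 -> t2
#   ("tuple", (t1, ..., tn))   tuple type
#   ("user", un, (t1, ...))    user-defined type un[t1, ...]


def type_ok(tau, tscope):
    """Return True if every type variable used in tau is in tscope."""
    if isinstance(tau, str):  # base type: uses no type variables
        return True
    tag = tau[0]
    if tag == "var":
        return tau[1] in tscope
    if tag == "fun":
        return type_ok(tau[1], tscope) and type_ok(tau[2], tscope)
    if tag == "tuple":
        return type_ok_list(tau[1], tscope)
    if tag == "user":
        return type_ok_list(tau[2], tscope)
    raise ValueError(f"unknown type: {tau!r}")


def type_ok_list(taus, tscope):
    """Like type_ok, but over a list of types."""
    return all(type_ok(tau, tscope) for tau in taus)


def type_replace(type_vars, replacements, tau):
    """Replace each listed type variable in tau with the type at the same position."""
    if len(type_vars) != len(replacements):
        raise ValueError("type variable and replacement lists differ in length")
    mapping = dict(zip(type_vars, replacements))

    def go(t):
        if isinstance(t, str):
            return t
        tag = t[0]
        if tag == "var":
            # Assumption: variables without a replacement are left as-is.
            return mapping.get(t[1], t)
        if tag == "fun":
            return ("fun", go(t[1]), go(t[2]))
        if tag == "tuple":
            return ("tuple", tuple(go(x) for x in t[1]))
        if tag == "user":
            return ("user", t[1], tuple(go(x) for x in t[2]))
        raise ValueError(f"unknown type: {t!r}")

    return go(tau)


# The example from the text: typeReplace([A], [integer], (boolean, A)).
A = ("var", "A")
result = type_replace(["A"], ["integer"], ("tuple", ("boolean", A)))
```

Under this encoding, result is ("tuple", ("boolean", "integer")), which corresponds to the type (boolean, integer) from the typeReplace example.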

Overall, the typing rules should be straightforward, albeit dense.
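To make the tuple and pattern-matching helpers described above more concrete, the following Python sketch gives possible versions of tupleAccess, tupGamma, asSingleton, and casesOk. It is a simplification under assumptions: the IllTyped exception stands in for "handling for ill-typedness", cases_ok takes only the constructor names of the cases rather than full case syntax, and the behavior on a length mismatch in tup_gamma or an empty list in as_singleton is a guess, since the text does not describe those situations.

```python
class IllTyped(Exception):
    """Raised where the text says a helper cannot be applied (hypothetical name)."""


def tuple_access(types, n):
    """Return the nth type of the list, 1-indexed."""
    if not 1 <= n <= len(types):
        raise IllTyped(f"no element {n} in a tuple of size {len(types)}")
    return types[n - 1]


def tup_gamma(variables, types, gamma):
    """Return a new environment binding each variable to the type at its position."""
    if len(variables) != len(types):
        # Assumption: a length mismatch is treated as ill-typed.
        raise IllTyped("pattern and tuple type differ in length")
    new_gamma = dict(gamma)  # copy: the caller's environment is not modified
    new_gamma.update(zip(variables, types))
    return new_gamma


def as_singleton(types):
    """Return the first type, but only if all types in the list are identical."""
    if not types or any(t != types[0] for t in types):
        # Assumption: an empty list is also rejected.
        raise IllTyped("branches do not share a single type")
    return types[0]


def cases_ok(case_constructors, constructors):
    """True if there is exactly one case per constructor of the matched type.

    case_constructors: constructor names used by the cases, in source order.
    constructors: mapping of the type's constructor names to expected types.
    """
    no_duplicates = len(case_constructors) == len(set(case_constructors))
    same_constructors = set(case_constructors) == set(constructors)
    return no_duplicates and same_constructors


# The example from the text: tupGamma([x, y], [boolean, integer], Γ).
gamma = {"z": "string"}
extended = tup_gamma(["x", "y"], ["boolean", "integer"], gamma)
```

Because tup_gamma returns a copy, gamma itself is unchanged after the call, which matches the observation that Γ only needs to be passed down, not up.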
