Evaluators With CLP
This section discusses how we use CLP to generate test suites for language tokenizers, parsers, and evaluators. The discussion is separated into two parts:
1. A discussion of how valid inputs can be generated (Section 6.2.1). The solutions we test should accept valid inputs and produce correct token lists or ASTs, depending on the component being tested (tokenizers or parsers, respectively).
2. A discussion of how invalid inputs can be generated (Section 6.2.2). The solutions we test should reject such inputs, and produce well-defined error messages under such inputs.
6.2.1 Generating Valid Inputs
Throughout this subsection, we use a running example based on the grammar shown in Figure 6.1, with the corresponding tokens 0 (zero), 1 (one), - (minus), ( (left parenthesis), and ) (right parenthesis). While this grammar is admittedly simple, it serves to illustrate all the applicable core concepts, and it forms a subset of the grammar used in the student assignment (Section 6.3). Of special note is that this grammar describes concrete syntax, in contrast to the abstract syntax descriptions used in prior case studies. This use of concrete syntax is unique to this chapter, and it reflects the fact that this chapter focuses on the testing of parsing-related components.
e ∈ Expression ::= ae
ae ∈ AdditiveExpression ::= pe | pe - ae
pe ∈ PrimitiveExpression ::= 0 | 1 | ( e ) | - pe
Figure 6.1: Small grammar used for running CLP example. Portions shown in bold represent tokens.
An executable CLP-based tokenizer applicable to tokenizing the grammar in Figure 6.1 is presented in Figure 6.2, along with a query which will generate valid inputs and corresponding outputs for the tokenizer. The charToToken helper procedure maps characters to their token representations. The tokenize procedure in Figure 6.2 consists of two rules. The first rule (line 9) states that if there are no characters to tokenize, then there are no tokens produced. The second rule (lines 10-12) states that if the character input begins with a single character, then the tokens produced begin with a single token, where the token is derived from the charToToken helper procedure. Furthermore, the second rule recursively calls tokenize to process the rest of the input. While this illustrative example has been kept as simple as possible, it is straightforward to add CLP code to allow for whitespace, integers with multiple digits, and multiple-character tokens, all of which appear in the actual assignment used in our evaluation.

 1  % charToToken: Character, Token
 2  charToToken('0', token_zero).
 3  charToToken('1', token_one).
 4  charToToken('-', token_minus).
 5  charToToken('(', token_lparen).
 6  charToToken(')', token_rparen).
 7
 8  % tokenize: Characters, Tokens
 9  tokenize([], []).
10  tokenize([SingleChar | Chars], [SingleToken | Tokens]) :-
11      charToToken(SingleChar, SingleToken),
12      tokenize(Chars, Tokens).
13
14  ?- length(Characters, 5),
15     tokenize(Characters, Tokens).
Figure 6.2: CLP-based tokenizer for the language defined in Figure 6.1. tokenize takes a list of characters to tokenize along with the tokens produced from the characters, respectively. Line 14 gives a query which will generate all character lists (held in the variable Characters) of length 5 which can be correctly tokenized, returning the corresponding tokens in the variable Tokens.
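As an illustration of the kind of extension mentioned above, the following is a minimal sketch of how whitespace could be skipped. It is not the code used in the assignment: the name tokenizeWs is hypothetical, and only the space character is treated as whitespace. The charToToken facts are repeated from Figure 6.2 so that the sketch is self-contained.

```prolog
% charToToken: Character, Token (repeated from Figure 6.2)
charToToken('0', token_zero).
charToToken('1', token_one).
charToToken('-', token_minus).
charToToken('(', token_lparen).
charToToken(')', token_rparen).

% tokenizeWs: Characters, Tokens
% Hypothetical variant of tokenize which additionally skips spaces.
tokenizeWs([], []).
tokenizeWs([' ' | Chars], Tokens) :-
    % A space produces no token; continue with the remaining input.
    tokenizeWs(Chars, Tokens).
tokenizeWs([SingleChar | Chars], [SingleToken | Tokens]) :-
    charToToken(SingleChar, SingleToken),
    tokenizeWs(Chars, Tokens).

% Example query: all character lists of length 3 (possibly containing
% spaces) which can be tokenized, along with their tokens.
% ?- length(Characters, 3), tokenizeWs(Characters, Tokens).
```

Because a space is not mapped by charToToken, the third rule never applies to it, so each input still has exactly one tokenization. As with Figure 6.2, bounding the input with length keeps the generator finite.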
The parser for these tokens can similarly be implemented in CLP, using a standard recursive-descent style. Such a parser is shown in Figure 6.3, along with a query which generates all valid ASTs corresponding to some input tokens.
Unsurprisingly, an evaluator of ASTs can be similarly defined, as is shown in Figure 6.4. In contrast to the tokenizer and the parser, the evaluator code in Figure 6.4 cannot be immediately used as a generator. In particular, there are two problems with this code:
1. We must bound the size of the AST produced in order to use evaluate as a generator. Such bounding was easily performed in the tokenizer and parser with the help of length, but length is not directly applicable to ASTs. This problem, along with solutions, is discussed further in Chapter 7, particularly Section 7.3.
 1  % parseExpression: Tokens, Exp
 2  parseExpression(Tokens, Exp) :-
 3      parseExpression(Tokens, [], Exp).
 4
 5  % parseExpression: InputTokens, OutputTokens, Exp
 6  parseExpression(Input, Output, Exp) :-
 7      parseAdditiveExpression(Input, Output, Exp).
 8
 9  % parseAdditiveExpression: InputTokens, OutputTokens, Exp
10  parseAdditiveExpression(Input, Output, Exp) :-
11      parsePrimaryExpression(Input, Output, Exp).
12  parseAdditiveExpression(
13          Input1, Output, exp_subtract(Left, Right)) :-
14      parsePrimaryExpression(Input1, [token_minus | Input2], Left),
15      parseAdditiveExpression(Input2, Output, Right).
16
17  % parsePrimaryExpression: InputTokens, OutputTokens, Exp
18  parsePrimaryExpression([token_zero | Rest], Rest, exp_zero).
19  parsePrimaryExpression([token_one | Rest], Rest, exp_one).
20  parsePrimaryExpression([token_lparen | Input], Output, Exp) :-
21      parseExpression(Input, [token_rparen | Output], Exp).
22  parsePrimaryExpression(
23          [token_minus | Input], Output, exp_unary_minus(Exp)) :-
24      parsePrimaryExpression(Input, Output, Exp).
25
26  ?- length(Tokens, 3),
27     parseExpression(Tokens, Exp).

Figure 6.3: CLP-based parser for the language defined in Figure 6.1. parseExpression of arity 3 (lines 6-7), parseAdditiveExpression, and parsePrimaryExpression all take an input list of tokens, an output list of remaining tokens, and the expression produced from the input list of tokens. parseExpression of arity 2 (lines 2-3) calls parseExpression of arity 3 (line 3), and stipulates that there must be no tokens remaining (i.e., the tokens remaining is an empty list, []). Line 26 gives a query which will generate all expressions (held in variable Exp) which can be represented with 3 tokens, along with the actual input tokens (held in variable Tokens).
2. The is operator (used on lines 7 and 10 of Figure 6.4) requires that the input expression (on the right-hand side) be fully instantiated. This is only true if the input expression (the first parameter to evaluate) is entirely known (i.e., ground), which is not true if evaluate is used as an expression generator. Fixing this demands that arithmetic constraint solvers be used (discussed in Chapter 2, Section 2.2.4, as well as Chapter 3, Section 3.3.5).

 1  % evaluate: Exp, Int
 2  evaluate(exp_zero, 0).
 3  evaluate(exp_one, 1).
 4  evaluate(exp_subtract(LeftExp, RightExp), Result) :-
 5      evaluate(LeftExp, LeftInt),
 6      evaluate(RightExp, RightInt),
 7      Result is LeftInt - RightInt.
 8  evaluate(exp_unary_minus(Exp), Result) :-
 9      evaluate(Exp, ExpInt),
10      Result is -ExpInt.

Figure 6.4: CLP-based evaluator for the language defined in Figure 6.1. evaluate takes an input expression as its first parameter, and returns the integer result of the expression in the second parameter.
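To make the two problems concrete, the following is a minimal sketch of what a direct fix might look like. It is an illustration under stated assumptions rather than the approach taken in this chapter or in Chapter 7: the name evaluateBounded and its explicit depth parameter are hypothetical, and the arithmetic constraints assume a Prolog system which provides library(clpfd), such as SWI-Prolog.

```prolog
:- use_module(library(clpfd)).

% evaluateBounded: MaxDepth, Exp, Int
% Hypothetical variant of evaluate from Figure 6.4.
% MaxDepth limits AST nesting, so enumeration of Exp terminates.
% The constraint #= replaces is, so the operands need not be known
% at the point where the arithmetic relation is stated.
evaluateBounded(_, exp_zero, 0).
evaluateBounded(_, exp_one, 1).
evaluateBounded(Depth, exp_subtract(LeftExp, RightExp), Result) :-
    Depth > 0,
    Depth1 is Depth - 1,
    Result #= LeftInt - RightInt,
    evaluateBounded(Depth1, LeftExp, LeftInt),
    evaluateBounded(Depth1, RightExp, RightInt).
evaluateBounded(Depth, exp_unary_minus(Exp), Result) :-
    Depth > 0,
    Depth1 is Depth - 1,
    Result #= -ExpInt,
    evaluateBounded(Depth1, Exp, ExpInt).

% Example query: expressions of nesting depth at most 1 evaluating to 1.
% ?- evaluateBounded(1, Exp, 1).
```

For the example query, the only answers are exp_one and exp_subtract(exp_one, exp_zero). The depth parameter must be a known integer when the procedure is called, since it is decremented with is.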
Rather than attack the aforementioned problems directly, we exploit the fact that the parser already produces ASTs when it is used as a generator. Given that these ASTs are fully known (i.e., they are ground), they serve as suitable inputs for the evaluate procedure defined in Figure 6.4. With this in mind, evaluate is used to determine the expected evaluation result of an AST, though parseExpression (defined in Figure 6.3) is used to actually produce the ASTs.
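The combination described above can be sketched as follows. The parser and evaluator are repeated from Figures 6.3 and 6.4 so that the sketch is self-contained; the evaluatorTest procedure is a hypothetical name introduced here for illustration, not part of the reference solution.

```prolog
% Parser from Figure 6.3.
parseExpression(Tokens, Exp) :-
    parseExpression(Tokens, [], Exp).
parseExpression(Input, Output, Exp) :-
    parseAdditiveExpression(Input, Output, Exp).

parseAdditiveExpression(Input, Output, Exp) :-
    parsePrimaryExpression(Input, Output, Exp).
parseAdditiveExpression(Input1, Output, exp_subtract(Left, Right)) :-
    parsePrimaryExpression(Input1, [token_minus | Input2], Left),
    parseAdditiveExpression(Input2, Output, Right).

parsePrimaryExpression([token_zero | Rest], Rest, exp_zero).
parsePrimaryExpression([token_one | Rest], Rest, exp_one).
parsePrimaryExpression([token_lparen | Input], Output, Exp) :-
    parseExpression(Input, [token_rparen | Output], Exp).
parsePrimaryExpression([token_minus | Input], Output, exp_unary_minus(Exp)) :-
    parsePrimaryExpression(Input, Output, Exp).

% Evaluator from Figure 6.4.
evaluate(exp_zero, 0).
evaluate(exp_one, 1).
evaluate(exp_subtract(LeftExp, RightExp), Result) :-
    evaluate(LeftExp, LeftInt),
    evaluate(RightExp, RightInt),
    Result is LeftInt - RightInt.
evaluate(exp_unary_minus(Exp), Result) :-
    evaluate(Exp, ExpInt),
    Result is -ExpInt.

% evaluatorTest: NumTokens, Exp, Expected
% Hypothetical helper: the parser generates a ground AST from a
% bounded token list, and evaluate then computes its expected value.
evaluatorTest(NumTokens, Exp, Expected) :-
    length(Tokens, NumTokens),
    parseExpression(Tokens, Exp),
    evaluate(Exp, Expected).

% Example query: every AST expressible in 3 tokens, with its value.
% ?- evaluatorTest(3, Exp, Expected).
```

Because parseExpression binds Exp to a ground term before evaluate is called, the right-hand side of each is is fully instantiated, and the length bound on the token list bounds the size of the AST.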
6.2.2 Generating Invalid Inputs
The generation of invalid inputs poses a challenge. One possibility is to selectively negate premises to produce invalid inputs by construction, as was done for the generation of almost well-typed programs in Chapter 4. However, this approach was deemed to be overly complex given the task at hand. As such, we instead employed mutation-based fuzzing, as used in LangFuzz [13]. The basic idea with mutation-based fuzzing is to first generate valid inputs (as with the technique shown in Section 6.2.1), and then selectively mutate them by introducing arbitrary edits. These edits can yield invalid inputs which are still intuitively close to being valid, as only the particular edits are invalid. Details regarding exactly what these edits look like for this test suite follow.
For generating invalid inputs for the tokenizer, we insert character sequences which will never yield valid tokens into an otherwise tokenizable stream of characters. Specifically, we insert $, = (ensuring it does not follow either > or <), =>, and =<. The character $ was chosen arbitrarily as a representative of an unconditionally invalid character, and the remaining sequences were chosen because they intuitively seem more likely to trigger faults in a buggy tokenizer.

For generating invalid inputs for the parser, we first produce a valid list of tokens which can be parsed to form a valid expression. We then insert an arbitrary valid token into the list: either an integer 0 or 2 (arbitrarily chosen), or any one of the finite list of remaining valid tokens. Because this process may still yield a parsable list of tokens (as when negating a subexpression), we run the CLP-based parser on the newly generated input to ensure that it fails, thus ensuring the input is invalid. While these approaches to generating invalid inputs for the tokenizer and parser are simplistic, we have nonetheless found them to be effective at finding faults in student code.
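The parser mutation described above can be sketched against the small grammar of Figure 6.1. This is an illustration only: the names validToken, insertAnywhere, and invalidTokens are hypothetical, the token set is restricted to the five tokens of the running example, and append/3 is assumed to be available (it is provided by library(lists) in common Prolog systems). The parser is repeated from Figure 6.3 so that the sketch is self-contained.

```prolog
% Parser from Figure 6.3.
parseExpression(Tokens, Exp) :-
    parseExpression(Tokens, [], Exp).
parseExpression(Input, Output, Exp) :-
    parseAdditiveExpression(Input, Output, Exp).

parseAdditiveExpression(Input, Output, Exp) :-
    parsePrimaryExpression(Input, Output, Exp).
parseAdditiveExpression(Input1, Output, exp_subtract(Left, Right)) :-
    parsePrimaryExpression(Input1, [token_minus | Input2], Left),
    parseAdditiveExpression(Input2, Output, Right).

parsePrimaryExpression([token_zero | Rest], Rest, exp_zero).
parsePrimaryExpression([token_one | Rest], Rest, exp_one).
parsePrimaryExpression([token_lparen | Input], Output, Exp) :-
    parseExpression(Input, [token_rparen | Output], Exp).
parsePrimaryExpression([token_minus | Input], Output, exp_unary_minus(Exp)) :-
    parsePrimaryExpression(Input, Output, Exp).

% validToken: Token (hypothetical; the tokens of the running example)
validToken(token_zero).
validToken(token_one).
validToken(token_minus).
validToken(token_lparen).
validToken(token_rparen).

% insertAnywhere: Element, List, Mutated (hypothetical)
% Mutated is List with Element inserted at some position.
insertAnywhere(Element, List, Mutated) :-
    append(Prefix, Suffix, List),
    append(Prefix, [Element | Suffix], Mutated).

% invalidTokens: NumTokens, Mutated (hypothetical)
% Generates a parsable token list of length NumTokens, inserts one
% extra valid token, and keeps the result only if the parser rejects it.
invalidTokens(NumTokens, Mutated) :-
    length(Tokens, NumTokens),
    parseExpression(Tokens, _),
    validToken(Extra),
    insertAnywhere(Extra, Tokens, Mutated),
    \+ parseExpression(Mutated, _).

% Example query: invalid token lists derived from 1-token expressions.
% ?- invalidTokens(1, Mutated).
```

The final negated call is what filters out mutations that happen to remain valid. For instance, inserting token_minus before token_zero yields a parsable unary negation and is discarded, whereas inserting it after token_zero is kept. The negation is safe here because Mutated is ground by the time it is checked. The same answer may be produced more than once when different insertions coincide.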
As for the evaluator, relatively few ASTs act as invalid inputs. By construction, the AST definition in both the Java and CLP reference solutions does not allow for the construction of ASTs with nonsensical structure. With this in mind, the only significant edge case which can be safely deemed “invalid” is division by zero, which is supposed to be specially handled by student solutions. We observed that a significant number of the generated valid parser outputs (ASTs produced as described in Section 6.2.1) would attempt to perform division by zero without any outside intervention. As such, we re-used these ASTs as inputs to the evaluator, along with a record of what the AST should evaluate to (be it a number or a trigger for division by zero).
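Recording the expected outcome of an AST, including division by zero, might look like the following sketch. It rests on several assumptions: the running grammar of Figure 6.1 has no division, so the exp_divide constructor is hypothetical, as are the expectedOutcome procedure and the value(Int) and division_by_zero outcome terms. The sketch relies on integer division by zero raising the ISO evaluation_error(zero_divisor) exception, which is then caught with catch/3.

```prolog
% Evaluator from Figure 6.4.
evaluate(exp_zero, 0).
evaluate(exp_one, 1).
evaluate(exp_subtract(LeftExp, RightExp), Result) :-
    evaluate(LeftExp, LeftInt),
    evaluate(RightExp, RightInt),
    Result is LeftInt - RightInt.
evaluate(exp_unary_minus(Exp), Result) :-
    evaluate(Exp, ExpInt),
    Result is -ExpInt.
% Assumption: exp_divide does not appear in Figure 6.1 or Figure 6.4.
evaluate(exp_divide(LeftExp, RightExp), Result) :-
    evaluate(LeftExp, LeftInt),
    evaluate(RightExp, RightInt),
    Result is LeftInt // RightInt.

% expectedOutcome: Exp, Outcome (hypothetical)
% Outcome is value(Int) for a normal result, or the atom
% division_by_zero if evaluation divides by zero.
expectedOutcome(Exp, Outcome) :-
    catch(( evaluate(Exp, Int), Outcome = value(Int) ),
          error(evaluation_error(zero_divisor), _),
          Outcome = division_by_zero).

% Example query: dividing 1 by (1 - 1).
% ?- expectedOutcome(exp_divide(exp_one,
%                               exp_subtract(exp_one, exp_one)),
%                    Outcome).
```

Each generated AST can then be paired with its outcome term, so that a single test format covers both ordinary results and the division-by-zero case.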