Attributes are objects that are associated with all tokens found on the input stream. Typically, attributes represent the text of the input token, but they may include any information that you require. The type of an attribute is specified via the Attrib type name, which you must provide. You must also provide a function, zzcr_attr(), that tells the parser how to convert the token type and text of a token into an Attrib. [In early versions of ANTLR, attributes were also used to pass information to and from rules or subrules. Rule arguments and return values are a more sophisticated mechanism and, hence, in this section, we will pretend that attributes are only used to communicate with the scanner.]
Attribute Definition and Creation
The attributes associated with input tokens must be a function of the text and the token type associated with that lexical object. These values are passed to zzcr_attr() which computes the attribute to be associated with that token. The user must define a function or macro that has the following form:
void zzcr_attr(attr, type, text)
Attrib *attr; /* pointer to attribute associated with this lexeme */
int type;     /* the token type of the token */
char *text;   /* text associated with lexeme */
{
    /* *attr = f(text,token); */
}
Consider the following Attrib and zzcr_attr() definition.
typedef union {
    int ival;
    float fval;
} Attrib;
void zzcr_attr(Attrib *attr, int type, char *text)
{
    switch ( type ) {
        case INT : attr->ival = atoi(text); break;
        case FLOAT : attr->fval = atof(text); break;
    }
}
The typedef specifies that attributes are integer or floating point values. When the regular expression for a floating point number (which has been identified as FLOAT) is matched on the input, zzcr_attr() converts the string of characters representing that number to a C float.
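The conversion described above can be tried outside ANTLR with a small stand-alone sketch. The token type codes below are hypothetical stand-ins (ANTLR normally generates them from the grammar), and attr_for() is a helper invented here for illustration; neither is part of PCCTS.

```c
#include <stdlib.h>

/* Hypothetical token type codes; in a real grammar ANTLR generates these. */
enum { INT = 1, FLOAT = 2 };

typedef union {
    int   ival;
    float fval;
} Attrib;

/* Compute the attribute from the token type and the token text. */
void zzcr_attr(Attrib *attr, int type, char *text)
{
    switch (type) {
    case INT:   attr->ival = atoi(text);        break;
    case FLOAT: attr->fval = (float)atof(text); break;
    default:    attr->ival = 0;                 break; /* choice made for this sketch only */
    }
}

/* Hypothetical helper (not part of PCCTS): build an attribute by value. */
Attrib attr_for(int type, char *text)
{
    Attrib a;
    zzcr_attr(&a, type, text);
    return a;
}
```

Calling attr_for(INT, text) on the text "42" fills the ival field, while attr_for(FLOAT, text) fills fval; because Attrib is a union, only the field that matches the token type is meaningful afterwards.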
You can specify the C definitions or #include statements needed to define Attrib (and zzcr_attr() if it is a macro) using the ANTLR #header directive. The action associated with #header is placed in every C file generated from the grammar files. Any C file created by the user that includes antlr.h must once again define Attrib before the #include of antlr.h. A convenient way to handle this is to use the -gh ANTLR command line option to have ANTLR generate the stdpccts.h file and then simply include stdpccts.h.
Attribute References
Attributes are referenced in user actions as $label where label is the label of a token referenced anywhere before the position of the action. For example,
#header <<
typedef int Attrib;
#define zzcr_attr(attr, type, text) *attr = atoi(text);
>>
#token "[\ \t\n]+" <<zzskip();>> /* ignore whitespace */
add : a:"[0-9]+" "\+" b:"[0-9]+"
      <<printf("addition is %d\n", $a+$b);>>
    ;
If Attrib is defined to be a structure or union, then $label.field is used to access the various fields. For example, using the union example above,
#header <<
typedef union { ... } Attrib;
>>
void zzcr_attr(...) { ... };
#token "[\ \t\n]+" <<zzskip();>> /* ignore whitespace */
add : a:INT "\+" b:FLOAT
      <<printf("addition is %f\n", $a.ival+$b.fval);>>
    ;
For backward compatibility, ANTLR still supports the notation $i and $i.j, where i and j are positive integers. The integers uniquely identify an element within the currently active block and within the current alternative of that block. With the invocation of each new block, a new set of attributes becomes active and the previously active set becomes temporarily inactive. The $i and $i.j style attributes are scoped exactly like local stack-based variables in C. Attributes are stored and accessed in stack fashion: with the recognition of each element in a rule, a new attribute is pushed on the stack. Consider the following simple rule:
a: B | C ;
Rule a has 2 alternatives. The $i refers to the ith rule element in the current block and within the same alternative. So, in rule a, both B and C are $1.
Subrules are like code blocks in C—a new scope exists within the subrule. The subrules themselves are counted as a single element in the enclosing alternative. For example,
b : A ( B C <<action1>> | D E <<action2>> ) F <<action3>>
  | G <<action4>>
  ;
Table 17 on page 155 describes the attributes that are visible to each action.
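The scoping of rule b can be modeled with a toy data structure. The sketch below is an illustration only and is not how PCCTS implements its attribute stack: AttrScopes and every function in it are hypothetical names, and it reads $i.j as "element j of block level i", which is an interpretation of the table entries rather than something stated in the text. It has no bounds checking.

```c
#include <string.h>

#define MAX_LEVELS 8   /* hypothetical limits for this sketch */
#define MAX_ELEMS  8

typedef const char *Attrib;

/* One row of attributes per active block; row 0 is the rule's own block. */
typedef struct {
    Attrib elem[MAX_LEVELS][MAX_ELEMS];
    int    count[MAX_LEVELS];
    int    level;
} AttrScopes;

void scopes_init(AttrScopes *s) { memset(s, 0, sizeof *s); }

/* Recognizing an element pushes its attribute in the active block. */
void push_attr(AttrScopes *s, Attrib a)
{
    s->elem[s->level][s->count[s->level]++] = a;
}

/* Entering a subrule activates a fresh, empty set of attributes. */
void enter_block(AttrScopes *s)
{
    s->level++;
    s->count[s->level] = 0;
}

/* Leaving a subrule reactivates the enclosing set; the subrule itself
   then occupies a single element position in the enclosing alternative. */
void exit_block(AttrScopes *s)
{
    s->level--;
    push_attr(s, "(subrule)");
}

/* $i : the ith element of the currently active block (1-based). */
Attrib dollar(const AttrScopes *s, int i)
{
    return s->elem[s->level][i - 1];
}

/* $i.j : element j of block level i (both 1-based), under this sketch's reading. */
Attrib dollar_ij(const AttrScopes *s, int i, int j)
{
    return s->elem[i - 1][j - 1];
}
```

Walking the first alternative of rule b through this model (push A, enter the subrule, push B and C) makes dollar(&s, 1) yield B and dollar_ij(&s, 1, 1) yield A at the point of action1. After exit_block() and a push of F, dollar(&s, 1) is A again and F sits at position 3, because the subrule occupies position 2.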
Attribute Destruction
You may elect to "destroy" all attributes created with zzcr_attr(). A macro called zzd_attr() is executed once for every attribute when the attribute goes out of scope. Deletions are done collectively at the end of every block. zzd_attr() is passed the address of the attribute to destroy. This is useful when memory allocated by zzcr_attr() needs to be free()ed; make sure to set the freed pointers to NULL. For example, sometimes zzcr_attr() needs to make copies of some lexical objects temporarily. Rather than explicitly inserting code into the grammar to free these copies, zzd_attr() can be used to do it implicitly. This concept is similar to the constructors and destructors of C++. Consider the case when attributes are character strings and copies of the lexical text buffer are made which later need to be deallocated. This can be accomplished with code similar to the following.
#header <<
typedef char *Attrib;
#define zzd_attr(attr) {free(*(attr));}
>>
<<
void zzcr_attr(Attrib *attr, int type, char *text)
{
    if ( type == StringLiteral ) {
        *attr = malloc( strlen(text)+1 );
        strcpy(*attr, text);
    }
}
>>
TABLE 17. Visibility and Scoping of Attributes
Action     Visible Attributes
action1    B as $1 (or $2.1), C as $2 (or $2.2), A as $1.1
action2    D as $1, E as $2, A as $1.1
action3    A as $1, F as $3
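The pairing of attribute creation and destruction can be exercised outside ANTLR with a stand-alone sketch. The token type codes are hypothetical stand-ins for the ones ANTLR would generate. As assumptions of this sketch rather than part of the original example, zzcr_attr() sets the attribute to NULL for tokens that own no memory, and zzd_attr() sets the pointer to NULL after freeing it, following the advice to NULL the pointers.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical token type codes; in a real grammar ANTLR generates these. */
enum { StringLiteral = 1, OtherToken = 2 };

typedef char *Attrib;

/* Create: string literals get a private heap copy of the lexical text. */
void zzcr_attr(Attrib *attr, int type, char *text)
{
    *attr = NULL;                       /* tokens of other types own no memory */
    if (type == StringLiteral) {
        *attr = malloc(strlen(text) + 1);
        if (*attr != NULL)
            strcpy(*attr, text);
    }
}

/* Destroy: release the copy and NULL the pointer, so that destroying the
   same attribute again is harmless (free(NULL) is a no-op). */
#define zzd_attr(attr) do { free(*(attr)); *(attr) = NULL; } while (0)
```

With these definitions, an attribute created for a StringLiteral token holds its own copy of the text, independent of the scanner's buffer, and zzd_attr(&a) can be applied to every attribute regardless of token type.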
Standard Attribute Definitions
Some typical attribute types are defined in the PCCTS include directory. These standard attribute types are contained in the following include files:
• charbuf.h. Attributes are fixed-size text buffers, each 32 characters in length. If a string longer than 31 characters (31 + 1 ‘\0’ terminator) is matched for a token, it is truncated to 31 characters. You can change this buffer length from the default 32 by redefining ZZTEXTSIZE before the point where charbuf.h is included. The text for an attribute must be referenced as $i.text.
• int.h. Attributes are int values derived from tokens using the atoi() function.
• charptr.h, charptr.c. Attributes are pointers to dynamically allocated variable-length strings. Although generally both more efficient and more flexible than charbuf.h, these attribute handlers use malloc() and free(), which are not the fastest library functions. The file charptr.c must be used with #include, or linked with the C code ANTLR generates for any grammar using