UNIT III SYNTAX ANALYSIS 3.1 NEED AND ROLE OF THE PARSER
5. The initial state of the parser is the one constructed from the set of items containing [S'
4.13 SYMBOL TABLES
Symbol tables are data structures that are used by compilers to hold information about source-program constructs. The information is collected incrementally by the analysis phases of a compiler and used by the synthesis phases to generate the target code. Entries in the symbol table contain information about an identifier such as its character string (or lexeme), its type, its position in storage, and any other relevant information.
[Figure: the lexical analyzer, syntax analyzer, semantic analyzer, intermediate code generator, code optimizer, and code generator each interact with the symbol table.]
Figure 4.14: Interaction between the symbol table and the various phases of the compiler
The symbol table, which stores information about the entire source program, is used by all phases of the compiler.
CS6660 Compiler Design Unit IV 4.25
An essential function of a compiler is to record the variable names used in the source program and collect information about various attributes of each name.
These attributes may provide information about the storage allocated for a name, its type, and its scope.
In the case of procedure names, such things as the number and types of the arguments, the method of passing each argument (for example, by value or by reference), and the type returned are maintained in the symbol table.
The symbol table is a data structure containing a record for each variable name, with fields for the attributes of the name. The data structure should be designed to allow the compiler to find the record for each name quickly and to store or retrieve data from that record quickly.
A symbol table can be implemented in one of the following ways:
O Linear (sorted or unsorted) list
O Binary Search Tree
O Hash table
Of these, hash tables are the most common implementation: the source-code symbol itself is the key for the hash function, and the value returned is the information about the symbol.
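As a minimal sketch of the hash-table approach, Python's built-in dict (itself a hash table) can map each lexeme to a record of attributes. The function names insert and lookup, the attribute names type and scope, and the sample identifiers are illustrative assumptions, not part of these notes.

```python
# Minimal sketch: a symbol table as a hash table keyed by the lexeme.
# Python's dict is a hash table; the attribute fields are hypothetical.
symbol_table = {}

def insert(lexeme, **attributes):
    """Record the attributes of a name under its lexeme."""
    symbol_table[lexeme] = dict(attributes)

def lookup(lexeme):
    """Return the attribute record for a name, or None if it is absent."""
    return symbol_table.get(lexeme)

insert("count", type="int", scope="local")
insert("total", type="float", scope="global")

print(lookup("count"))    # the record stored for count
print(lookup("missing"))  # None, because the name was never inserted
```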
A symbol table may serve the following purposes, depending upon the language at hand:
O To store the names of all entities in a structured form in one place.
O To verify if a variable has been declared.
O To implement type checking, by verifying assignments and expressions.
O To determine the scope of a name (scope resolution).
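Two of these purposes, checking that a variable has been declared and checking the type of an assignment, can be sketched as follows. The tiny declare/assign interface and the SemanticError class are hypothetical and serve only as an illustration.

```python
# Sketch: using a symbol table to check declarations and assignment types.
# The declare / check_assignment interface is a hypothetical illustration.
class SemanticError(Exception):
    pass

declared = {}  # lexeme -> declared type name

def declare(name, type_name):
    if name in declared:
        raise SemanticError(f"'{name}' is already declared")
    declared[name] = type_name

def check_assignment(name, value_type):
    # Verify that the variable has been declared.
    if name not in declared:
        raise SemanticError(f"'{name}' is used before declaration")
    # Type checking: the value's type must match the declared type.
    if declared[name] != value_type:
        raise SemanticError(
            f"cannot assign {value_type} to '{name}' of type {declared[name]}")

declare("x", "int")
check_assignment("x", "int")        # accepted
try:
    check_assignment("y", "int")    # y was never declared
except SemanticError as err:
    print(err)
```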
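Symbol-Table Entries
It is useful for the symbol table to be able to grow dynamically at compile time; if its size is fixed when the compiler is written, it must be chosen large enough to handle any source program.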
Each entry in the symbol table is for the declaration of a name.
The format of entries does not have to be uniform, because the information saved about a name depends on how the name is used.
Each entry can be implemented as a record consisting of a sequence of consecutive words of memory.
To keep symbol-table records uniform, it may be convenient for some of the information about a name to be kept outside the table entry, with only a pointer to this information stored in the record.
The following information about identifiers is stored in the symbol table:
O The name.
O The data type.
O The block level.
O Its scope (local, global).
O Pointer / address
O Its offset from base pointer
O Function name, parameters, and variables.
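The fields listed above, together with the idea of keeping variable-size information outside the record, might be modelled as below. This is only a sketch: the class names, field names, and the sample offset values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProcedureInfo:
    """Variable-size details kept outside the table entry (hypothetical)."""
    parameter_types: list
    passing_modes: list        # e.g. "value" or "reference"
    return_type: str

@dataclass
class SymbolEntry:
    """A uniform record; field names are assumptions for illustration."""
    name: str
    data_type: str
    block_level: int
    scope: str                 # "local" or "global"
    offset: int                # offset from the base pointer
    extra: Optional[ProcedureInfo] = None  # reference to out-of-record info

var_entry = SymbolEntry("i", "int", block_level=1, scope="local", offset=-4)
proc_entry = SymbolEntry(
    "max", "procedure", block_level=0, scope="global", offset=0,
    extra=ProcedureInfo(["int", "int"], ["value", "value"], "int"))

print(var_entry)
print(proc_entry.extra.return_type)
```

Every record has the same fields, and only procedure entries carry a reference to the extra information.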
Characters in a Name
There is a distinction between the token id for an identifier or name, the lexeme consisting of the character string forming the name, and the attributes of the name.
Strings of characters may be unwieldy to work with, so compilers often use some fixed-length representation of the name rather than the lexeme.
The lexeme is needed when a symbol-table entry is set up for the first time, and when we look up a lexeme found in the input to determine whether it is a name that has already appeared.
A common representation of a name is a pointer to a symbol-table entry for it.
If there is a modest upper bound on the length of a name, then the characters in the name can be stored in the symbol-table entry, as in Figure 4.15.
Figure 4.15: Symbol-table names in fixed-size space within a record
If there is no limit on the length of a name, or if the limit is rarely reached, the indirect scheme of Figure 4.16 can be used.
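A minimal sketch of the indirect scheme follows, assuming each record stores a (start, length) pair that points into one shared character array. The NamePool class and its method names are hypothetical.

```python
# Sketch of the indirect scheme: lexemes live in one shared character
# array, and each record stores only a (start, length) pair into it.
# The class and method names are hypothetical.
class NamePool:
    def __init__(self):
        self.chars = []          # all lexemes, stored back to back

    def add(self, lexeme):
        """Append a lexeme and return its fixed-size (start, length) handle."""
        start = len(self.chars)
        self.chars.extend(lexeme)
        return (start, len(lexeme))

    def text(self, handle):
        """Recover the lexeme that a handle refers to."""
        start, length = handle
        return "".join(self.chars[start:start + length])

pool = NamePool()
records = [{"name": pool.add(lexeme), "type": "int"}
           for lexeme in ("i", "position", "a_very_long_identifier_name")]

print(records[1]["name"])              # (1, 8)
print(pool.text(records[1]["name"]))   # position
```

Each record stays the same size however long the name is, because only the handle is stored in it.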
Figure 4.16: Symbol-table names in a separate array
Storage Allocation Information
Information about the storage locations that will be bound to names at run time is kept in the symbol table.
Storage can be allocated statically or dynamically.
Storage is allocated for code, data, stack, and heap.
COMMON blocks in Fortran are loaded separately.
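One way to picture storage information in the symbol table is to give each declared name a relative offset as its declaration is processed. The sketch below assumes hypothetical type widths and ignores alignment; it is not a description of any particular compiler.

```python
# Sketch: recording storage information in the symbol table by giving each
# name a relative offset. The type widths below are assumptions, and
# alignment padding is deliberately ignored.
WIDTH = {"char": 1, "int": 4, "float": 4, "double": 8}

def assign_offsets(declarations):
    """Map each (name, type) declaration to an offset within its data area."""
    table = {}
    offset = 0
    for name, type_name in declarations:
        table[name] = {"type": type_name, "offset": offset}
        offset += WIDTH[type_name]     # the next name starts after this one
    return table, offset               # offset is now the total size

table, size = assign_offsets([("i", "int"), ("x", "double"), ("c", "char")])
print(table["x"]["offset"], size)      # 4 13
```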
The compiler plans out the activation record for each procedure.
The List Data Structure for Symbol Tables
The simplest and easiest-to-implement data structure for a symbol table is a linear list of records, as shown in Figure 4.17.
We use a single array, or equivalently several arrays, to store names and their associated information.
If the symbol table contains n names, inserting a new name requires checking that it is not already present, which costs time proportional to n. To find the data about a name we search, on average, n/2 names, so the cost of an inquiry is also proportional to n.
[Figure: records (id1, info1), ..., (idn, infon) stored one after another, followed by the available space.]
Figure 4.17: A linear list of records.
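The linear-list organization can be sketched as below. The class and method names are hypothetical; both insert and lookup scan the list, which is where the cost proportional to n comes from.

```python
# Sketch of a symbol table kept as an unsorted linear list of records.
# Both operations scan the list, so each costs time proportional to n.
class LinearSymbolTable:
    def __init__(self):
        self.records = []              # (name, info) pairs in insertion order

    def _position(self, name):
        """Return the index of the record for name, or -1 if absent."""
        for index, (entry_name, _info) in enumerate(self.records):
            if entry_name == name:
                return index
        return -1

    def insert(self, name, info):
        # Checking that the name is not already present costs a full scan.
        if self._position(name) != -1:
            raise KeyError(f"'{name}' is already in the table")
        self.records.append((name, info))   # new record goes at the end

    def lookup(self, name):
        index = self._position(name)
        return self.records[index][1] if index != -1 else None

names = LinearSymbolTable()
names.insert("id1", "info1")
names.insert("id2", "info2")
print(names.lookup("id2"))   # info2
print(names.lookup("id9"))   # None
```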
Hash Tables for Symbol Tables
Variations of the searching technique known as hashing have been implemented in many compilers.
Open hashing is one of the simplest variants of this technique.
Even this scheme gives us the capability of performing e inquiries on n names in time proportional to n(n + e)/m, for any constant m of our choosing: with m lists the average list holds about n/m records, so each of the n insertions and e inquiries scans about n/m records.
This method is generally more efficient than linear lists and is the method of choice for symbol tables in most situations.
The basic hashing scheme is illustrated in Figure 4.18. There are two parts to the data structure:
1. A hash table consisting of a fixed array of m pointers to table entries.
2. Table entries organized into m separate linked lists, called buckets (some buckets may be empty). Each record in the symbol table appears on exactly one of these lists.
Figure 4.18: A hash table of size 210.
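The two-part structure described above can be sketched as follows. The hash function used here (sum of character codes modulo m) and the small table size are assumptions chosen for readability, and the class names are hypothetical.

```python
# Sketch of open hashing: a fixed array of m list heads plus chained records.
class Record:
    def __init__(self, name, info, next_record):
        self.name = name
        self.info = info
        self.next = next_record        # link to the next record in the bucket

class HashSymbolTable:
    def __init__(self, m):
        self.m = m
        self.heads = [None] * m        # part 1: fixed array of m pointers

    def _bucket(self, name):
        # Hypothetical hash function: sum of character codes modulo m.
        return sum(ord(ch) for ch in name) % self.m

    def lookup(self, name):
        record = self.heads[self._bucket(name)]
        while record is not None:      # walk only this bucket's list
            if record.name == name:
                return record
            record = record.next
        return None

    def insert(self, name, info):
        existing = self.lookup(name)
        if existing is not None:
            return existing            # each name appears on exactly one list
        index = self._bucket(name)
        # Part 2: link the new record at the front of its bucket.
        self.heads[index] = Record(name, info, self.heads[index])
        return self.heads[index]

table = HashSymbolTable(m=8)
for name in ("ab", "ba", "x"):         # "ab" and "ba" share a bucket
    table.insert(name, {"type": "int"})
print(table.lookup("ba").info)
```

A lookup touches only the records in one bucket, so with a reasonable hash function the expected work per operation is about n/m comparisons rather than n.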
Representing Scope Information
A simple approach is to maintain a separate symbol table for each scope. In effect, the symbol table for a procedure or scope is the compile-time equivalent of an activation record.
A linked list is well suited to representing scope information: new entries are added at the front, so the most recent declaration of a name is found first.
Figure 4.19: The most recent entry for a is near the front.
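The one-table-per-scope approach can be sketched by linking each scope's table to the table of its enclosing scope. The Scope class and its method names are hypothetical; resolution walks from the innermost scope outward, so the most recent declaration of a name is found first.

```python
# Sketch: one table per scope, linked from innermost to outermost scope.
class Scope:
    def __init__(self, parent=None):
        self.symbols = {}
        self.parent = parent           # link to the enclosing scope

    def declare(self, name, info):
        self.symbols[name] = info

    def resolve(self, name):
        scope = self
        while scope is not None:       # the innermost declaration wins
            if name in scope.symbols:
                return scope.symbols[name]
            scope = scope.parent
        return None

global_scope = Scope()
global_scope.declare("a", "int")
global_scope.declare("b", "float")

inner = Scope(parent=global_scope)     # entering a procedure or block
inner.declare("a", "char")             # hides the outer declaration of a

print(inner.resolve("a"))              # char
print(inner.resolve("b"))              # float
print(global_scope.resolve("a"))       # int
```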