The vpo optimizer, already ported to and tested on ten different machines, was modified to collect measurements as specified in the previous sections. It typically took three or four hours for an experienced person to make the machine-dependent modifications to the compiler on each machine. For the resulting compiled programs, the overhead of collecting data to produce the proposed measurements is small. For instance, on the VAX-11 the C benchmarks whetstone and dhrystone were executed with and without data collection. With the data collection code inserted, the two benchmarks required only 6% and 13% more execution time, respectively, than without the data collection instructions. The number of counters needed for whetstone was fifty-one. Had execution classes not been used, with a counter placed in each basic block instead, whetstone would have required eighty-four counters and 9% more execution time.
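To make the counter savings concrete, the following Python sketch (an illustration only; the actual counting code is emitted by vpo as machine code) assumes the basic blocks have already been grouped into execution classes, that is, sets of blocks that always execute the same number of times. One counter per class is incremented at run time, and each block's count is recovered from its class afterward. The block names, block sizes, class assignment, and simulated control flow are all hypothetical.

```python
from collections import Counter

# Hypothetical program: five basic blocks, each mapped to an execution
# class. Blocks in one class are assumed to always execute the same number
# of times, so a single counter serves the whole class.
BLOCK_CLASS = {"B0": 0, "B1": 1, "B2": 1, "B3": 2, "B4": 0}
INSTS_IN_BLOCK = {"B0": 4, "B1": 6, "B2": 3, "B3": 5, "B4": 2}


def run_instrumented(iterations, taken_every):
    """Simulate one run, incrementing one counter per execution class."""
    class_count = Counter()
    class_count[0] += 1                  # entry block B0 (and exit block B4)
    for i in range(iterations):
        class_count[1] += 1              # loop body blocks B1 and B2
        if i % taken_every == 0:
            class_count[2] += 1          # conditionally executed block B3
    return class_count


def block_counts(class_count):
    """Recover every block's execution count from its class counter."""
    return {b: class_count[c] for b, c in BLOCK_CLASS.items()}


def dynamic_instructions(class_count):
    """Dynamic instruction count = sum of count(block) * size(block)."""
    counts = block_counts(class_count)
    return sum(counts[b] * INSTS_IN_BLOCK[b] for b in counts)


counters_needed = len(set(BLOCK_CLASS.values()))   # 3 rather than 5
counts = run_instrumented(iterations=10, taken_every=3)
print(counters_needed, dynamic_instructions(counts))
```

In this made-up program three counters replace five, and the script prints `3 116`; the per-block counts needed for the reports are still fully recoverable.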
The generation of reports from the measurements is also mostly machine-independent. Most of the code in a 500-line C program that produces several different reports remained unchanged when the program was implemented on the different machines. These reports gather data on the following:
1. instruction type distribution
2. addressing mode distribution
3. memory reference size distribution
4. register usage
5. condition code usage
6. conditional branches taken
7. data type distribution
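One way to produce such a report is to combine the run-time block counts with static information recorded when each block was compiled. The Python sketch below shows this for the instruction type distribution, weighting each static instruction by the number of times its block executed. The per-block instruction types and execution counts are hypothetical; the other listed reports could plausibly follow the same pattern with a different per-instruction attribute, such as addressing mode or memory reference size.

```python
from collections import Counter

# Hypothetical static information recorded at compile time: the instruction
# types appearing in each basic block.
BLOCK_INSTS = {
    "B0": ["mov", "mov", "add", "call"],
    "B1": ["mov", "cmp", "branch"],
    "B2": ["add", "jump"],
}

# Hypothetical execution counts produced by the instrumented run.
BLOCK_COUNTS = {"B0": 1, "B1": 10, "B2": 9}


def instruction_type_distribution(block_insts, block_counts):
    """Weight each block's static instructions by its execution count and
    return the percentage of dynamic instructions of each type."""
    totals = Counter()
    for block, insts in block_insts.items():
        for inst in insts:
            totals[inst] += block_counts[block]
    executed = sum(totals.values())
    return {inst: 100.0 * n / executed for inst, n in totals.items()}


dist = instruction_type_distribution(BLOCK_INSTS, BLOCK_COUNTS)
for inst, pct in sorted(dist.items()):
    print(f"{inst:8s} {pct:5.1f}%")
```

Here 52 instructions are executed in total, of which 12 are `mov`; only the weighting step would need to know which attribute is being tallied.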
ARCHITECTURAL STUDY
Past architectural studies have suffered from many limitations. Some used a small set of benchmark programs because data was difficult to collect. For instance, in the CFA architecture evaluations [FSB77], twelve assembly language programs were used to evaluate and rank nine different architectures. Most of these programs were fewer than 200 static machine instructions in length.
Many studies that compare architectures do not account for differences in how the machine instructions are produced. Each test program in the CFA architectural evaluations was hand-coded in the assembly language of the machine to test a specific feature of an architecture [FuB77]. Thus, the quality of the test programs depended upon the skill of the programmer and the programmer's knowledge of the machine.
Johnson’s Portable C Compiler (pcc) [Joh79] was retargeted to each machine in Patterson’s study [PaP82]. On this basis, Patterson claimed that the different compilers in his study used the same compiler technology. The quality of the code produced by each pcc compiler, however, depends on the skill of the compiler writer in constructing the code generation tables and the peephole optimization patterns.
The methods used in several past architectural studies made it difficult to capture certain kinds of information and to perform various experiments. Many studies collected data from machine instructions without modifying the compiler, using simulation [BSG77], trace software [Lun77, PeS77, Wie82], or hardware monitors [ClL82]. These methods cannot capture certain types of measurements, such as the number of registers used only as temporaries. Furthermore, without the ability to modify the compiler and obtain information showing how frequently proposed architectural features are used, determining the usefulness of those features is difficult.
Ease has eliminated problems that have limited some past architectural studies. Using ease to collect data, one can use a number of realistic programs and collect the data in a timely fashion. For example, on the VAX-11/8600, measurements were collected from the execution of almost 100 million instructions in less than ten minutes. Properties of vpo, the optimizer used in ease, eliminate several further problems. Since the code that performs instruction selection and most optimizations is constructed automatically, the quality of the code vpo generates for each architecture depends less on the skill of the implementors than it does for compilers built with other techniques [DaW89]. For example, the implementor of a pcc compiler constructs tables that recognize the different instances of trees from which an instruction can be generated. For vpo, however, instructions only have to be represented correctly in the machine description. Retargeting the compiler to a new machine only requires expanding the intermediate language statements to RTLs and describing the architecture. Ad hoc case analysis is unnecessary.
Thus, the programs compiled for each of the architectures receive the same degree of optimization.
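The idea that an instruction only needs a correct machine description can be illustrated with a toy recognizer. In the Python sketch below, which is hypothetical and far simpler than vpo's actual machine descriptions, an RTL is accepted as a legal instruction only if it fully matches one of the described patterns, and the match supplies the operands for an assembly template. The RTL syntax, patterns, and mnemonics are assumptions made for illustration.

```python
import re

# Hypothetical toy machine description. Each entry pairs a pattern that a
# legal RTL must match with an assembly template filled from the match.
MACHINE_DESCRIPTION = [
    (re.compile(r"r\[(\d+)\]=r\[(\d+)\]\+r\[(\d+)\];"), "add r{1},r{2},r{0}"),
    (re.compile(r"r\[(\d+)\]=r\[(\d+)\]\+(-?\d+);"), "addi r{1},{2},r{0}"),
    (re.compile(r"r\[(\d+)\]=M\[r\[(\d+)\]\];"), "load (r{1}),r{0}"),
]


def recognize(rtl):
    """Return the assembly for a legal RTL, or None if the described
    machine has no single instruction that performs it."""
    for pattern, template in MACHINE_DESCRIPTION:
        m = pattern.fullmatch(rtl)
        if m:
            return template.format(*m.groups())
    return None


for rtl in ["r[1]=r[2]+r[3];", "r[5]=r[2]+4;", "r[1]=M[r[2]+4];"]:
    print(rtl, "->", recognize(rtl))
```

Because this toy machine has no register-plus-displacement addressing mode, an optimizer that tried to combine `r[5]=r[2]+4;` and `r[1]=M[r[5]];` into `r[1]=M[r[2]+4];` would find that `recognize` returns `None` and would keep the original two RTLs. No per-case table of tree shapes is written by hand; only the description changes from machine to machine.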
Because an effort was made to separate the machine-independent code from the machine-dependent code to facilitate retargeting vpo, it is relatively easy to change the compiler to implement proposed architectural changes, such as reducing the number of available registers, changing the calling sequence, or eliminating an instruction or addressing mode. Even adding a new instruction or additional registers can usually be done easily, since one RTL can be translated to one or more machine language instructions.
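As an illustration of the last point, the following Python sketch assumes a hypothetical experiment that proposes a register-plus-displacement load the host machine lacks. The proposed instruction is kept as a single RTL, so it can be counted as one instruction of the proposed architecture, but it is emitted as two existing instructions so that the program still runs on the host. The RTL syntax, mnemonics, and the reserved scratch register `r[31]` are assumptions, not details taken from vpo.

```python
import re

# Hypothetical RTL forms: a register add the host supports, and a proposed
# register+displacement load that the host lacks.
ADD = re.compile(r"r\[(\d+)\]=r\[(\d+)\]\+r\[(\d+)\];")
DISP_LOAD = re.compile(r"r\[(\d+)\]=M\[r\[(\d+)\]\+(-?\d+)\];")
SCRATCH = 31  # assumed to be reserved for expanding proposed instructions


def emit(rtl):
    """Translate one RTL into one or more host assembly instructions."""
    m = ADD.fullmatch(rtl)
    if m:
        dst, a, b = m.groups()
        return [f"add r{a},r{b},r{dst}"]
    m = DISP_LOAD.fullmatch(rtl)
    if m:
        dst, base, disp = m.groups()
        # Compute the address into the scratch register, then load through it.
        return [f"addi r{base},{disp},r{SCRATCH}",
                f"load (r{SCRATCH}),r{dst}"]
    raise ValueError(f"no translation for {rtl!r}")


def measure(rtls):
    """Count instructions as the proposed machine would execute them
    (one per RTL) and as the host actually executes them."""
    proposed = len(rtls)
    host = sum(len(emit(rtl)) for rtl in rtls)
    return proposed, host


program = ["r[1]=M[r[2]+8];", "r[3]=r[1]+r[4];"]
for rtl in program:
    print(rtl, "->", emit(rtl))
print(measure(program))
```

For the two RTLs above, `measure` returns `(2, 3)`: two instructions on the proposed machine and three on the host. Keeping the one-to-many expansion inside the emitter means the rest of the compiler sees only the proposed instruction.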
The following sections describe a study of several architectures in which measurements were collected from the execution of the same set of test programs on each machine. First, the types of measurements extracted by ease from each architecture are given. Next, the characteristics of each architecture in the study are described. The set of test programs used in the study is then specified. Finally, the measurements obtained from the architectures are analyzed.