The m2c tool can act as a command-line wrapper to access the OpenCL compiler installed natively on a system, by adding option --amd to the command line. For this option to be available, Multi2Sim must have detected an installation of the APP SDK when it was built, as well as a correct installation of the AMD Catalyst driver. If this software is not installed, m2c is still built correctly, but option --amd is not available.
When m2c --amd is followed by a list of .cl source files, each file is individually compiled, producing a kernel binary with the same name and the .bin extension. The target binary format is compatible with an AMD GPU when loaded with clCreateProgramWithBinary, as well as suitable for simulation on Multi2Sim.
Compilation method
A vendor-specific OpenCL compiler is provided as part of the OpenCL runtime libraries, and can usually be accessed only when an OpenCL host program invokes function clCreateProgramWithSource at runtime, followed by a call to clBuildProgram. However, the OpenCL interface can also be used afterwards to retrieve the kernel binary generated internally. For each .cl OpenCL C source file passed in the command line, m2c follows these steps to produce a kernel binary:
• The content of the source file is read by m2c and stored in an internal buffer.
• The OpenCL platform and context are initialized, and an OpenCL device is chosen for compilation.
• The buffer storing the kernel source is passed to a call to clCreateProgramWithSource. It is subsequently compiled for the selected target device with a call to clBuildProgram.
• The resulting kernel binary is retrieved with a call to clGetProgramInfo. A compilation log is obtained with a call to clGetProgramBuildInfo, showing any compilation error messages.
Command-line options
The following command-line options in m2c are related to the AMD native compiler wrapper:
• Option --amd is used to activate m2c's functionality as the AMD native compiler wrapper. It must be present whenever any of the following options is used. The --amd option is incompatible with any other option presented outside of this section.
• Option --amd-list provides a list of all target devices supported by the AMD driver. These devices do not necessarily match the OpenCL-compatible CPUs/GPUs installed on the system; they are just possible targets for the compiler. Device identifiers shown in this list can be used as arguments to option --amd-device.
The following is an example of a device list, where the Southern Islands device is referred to as Tahiti, the Evergreen device as Cypress, and the x86 device as Intel(R) Xeon(R).
$ m2c --amd --amd-list

 ID    Name, Vendor
 ----  ----------------------------------------------------
  0    Cypress, Advanced Micro Devices, Inc.
  1    ATI RV770, Advanced Micro Devices, Inc.
  2    ATI RV710, Advanced Micro Devices, Inc.
  3    ATI RV730, Advanced Micro Devices, Inc.
  4    Juniper, Advanced Micro Devices, Inc.
  5    Redwood, Advanced Micro Devices, Inc.
  6    Cedar, Advanced Micro Devices, Inc.
  7    WinterPark, Advanced Micro Devices, Inc.
  8    BeaverCreek, Advanced Micro Devices, Inc.
  9    Loveland, Advanced Micro Devices, Inc.
 10    Cayman, Advanced Micro Devices, Inc.
 11    Barts, Advanced Micro Devices, Inc.
 12    Turks, Advanced Micro Devices, Inc.
 13    Caicos, Advanced Micro Devices, Inc.
 14    Tahiti, Advanced Micro Devices, Inc.
 15    Pitcairn, Advanced Micro Devices, Inc.
 16    Capeverde, Advanced Micro Devices, Inc.
 17    Devastator, Advanced Micro Devices, Inc.
 18    Scrapper, Advanced Micro Devices, Inc.
 19    Intel(R) Xeon(R) CPU W3565 @3.20GHz, GenuineIntel
 ----------------------------------------------------------
 20 devices available
• Option --amd-device <device>[,<device2>...]. Target device(s) for the final binary. Each device in the list can be either a numeric identifier, as presented with --amd-list, or a word contained uniquely in that device name. For example, options --amd-device 19 and --amd-device Intel are equivalent, given the device list above. The following command compiles a vector addition kernel and creates a Southern Islands binary:
$ m2c --amd --amd-device Tahiti vector-add.cl
Device 14 selected: Tahiti, Advanced Micro Devices
Compiling 'vector-add.cl'...
    vector-add.bin: kernel binary created
When only one device is specified, the .bin output file is exactly the one produced internally by the AMD compiler. When multiple devices are specified, the resulting binaries are packed into a fat ELF binary by m2c (see below). Notice that no space character should be included between device names in a list.
• Option --amd-dump-all. Dump additional information generated during the compilation of the kernel sources. This information is obtained in additional binary and text files, placed in two directories prefixed with the same name as the <kernel>.cl source file.
The first directory is named <kernel>_amd_files, and contains intermediate compilation files automatically generated by the AMD compiler (internally, this is done by adding flag -save-temps as one of the arguments of clBuildProgram).
A second directory named <kernel>_m2s_files contains a post-processing of the final ELF binary done by Multi2Sim. Specifically, each ELF section is dumped into a separate file. The same is done for those binary file portions pointed to by ELF symbols.
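The device matching rule described for option --amd-device can be sketched in Python as follows. This is only an illustration of the rule as stated above, not the code used by m2c: the function names, the case-sensitive substring match, and the shortened device list are assumptions made for the example.

```python
def resolve_device(spec, device_names):
    """Return the index of the device selected by `spec`.

    `spec` is either a numeric identifier or a word that must be contained
    in exactly one device name (assumed: case-sensitive substring match).
    """
    if spec.isdigit():
        index = int(spec)
        if index >= len(device_names):
            raise ValueError(f"invalid device ID: {spec}")
        return index
    matches = [i for i, name in enumerate(device_names) if spec in name]
    if len(matches) != 1:
        raise ValueError(f"'{spec}' matches {len(matches)} devices, expected 1")
    return matches[0]


def resolve_device_list(arg, device_names):
    """Split a comma-separated device argument (no spaces) and resolve each entry."""
    return [resolve_device(spec, device_names) for spec in arg.split(",")]


# Hypothetical, shortened device list; indices do not match the full list above.
DEVICES = ["Cypress", "ATI RV770", "Tahiti", "Intel(R) Xeon(R) CPU W3565 @3.20GHz"]

print(resolve_device_list("Tahiti,3", DEVICES))
```

In this shortened list, the word Intel and the identifier 3 select the same device, while an ambiguous word such as C (contained in both Cypress and CPU) is rejected.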
Fat binaries
Fat binaries are a Multi2Sim-specific file format based on ELF that packs multiple AMD binaries into one single file. Fat binaries are generated when more than one device is specified with option --amd-device as a list of devices separated by commas. For each device, m2c internally creates the associated AMD binary, and packs it into an ELF section of the fat binary.
Additionally, a symbol table in the fat binary includes one symbol per embedded AMD binary. Each ELF symbol has the following properties:
• Field name is set to the name of the specific target device.
• Field value is set to a unique identifier of the target device. This value is equal to the e_machine field of the ELF header of the embedded AMD binary. The value is useful for quick identification of the target device without having to extract the embedded binary.
• Field index points to the ELF section that contains the associated embedded AMD binary.
Fat binaries are optionally used for convenience by the Multi2Sim OpenCL runtime. When an OpenCL host program running on Multi2Sim loads a fat binary, the binary is compatible with any of the devices for which there is an associated embedded AMD binary. The fat binary content can be explored in Linux using the command-line tool readelf -a <fatbinary>.bin.
Chapter 14
Tools
14.1 The INI file format
An INI file is a plain text file used to store configuration information and statistics reports for programs.
Multi2Sim uses this format for all of its input files, such as context configuration or cache hierarchy configuration files. This format is also used in Multi2Sim for output files, such as detailed simulation statistics reports. This is an example of a text file following the INI file format:
; This is a comment
[ Button Accept ]
Height = 20
Width = 40
Caption = 'OK'

[ Cancel ]
State = Disabled
Each line of an INI file can be a comment, a section name, or a variable-value pair. A comment is a line starting with a semicolon; a section name is given as a string set off by square brackets (e.g., [ Button Accept ]); and a variable-value pair is represented by separating the variable name and its value with an = sign. Section and variable names are case-sensitive in Multi2Sim.
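The three kinds of lines can be told apart with very little code. The following Python sketch is not the parser used by Multi2Sim; it assumes that blank lines are ignored, that spaces around section names, variable names, and values are not significant, and that every variable must belong to a section. The name parse_ini is hypothetical.

```python
def parse_ini(text):
    """Parse INI text into {section: {variable: value}}, all as strings."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            # "[ Button Accept ]" -> "Button Accept"
            current = line[1:-1].strip()
            sections.setdefault(current, {})
        elif "=" in line:
            if current is None:
                raise ValueError(f"variable outside of any section: {line!r}")
            name, value = line.split("=", 1)
            sections[current][name.strip()] = value.strip()
        else:
            raise ValueError(f"unrecognized line: {line!r}")
    return sections


EXAMPLE = """\
; This is a comment
[ Button Accept ]
Height = 20
Width = 40
Caption = 'OK'

[ Cancel ]
State = Disabled
"""

config = parse_ini(EXAMPLE)
print(config["Button Accept"]["Height"])
print(config["Cancel"]["State"])
```

Names are stored exactly as written, so lookups are case-sensitive, in line with the behavior described above.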
The user can specify the values for an integer variable in decimal, hexadecimal, and octal formats. The latter two formats use the 0x and 0 prefixes, respectively. Integer variables can also include suffixes K, M, and G to multiply the number by 10^3, 10^6, and 10^9, respectively. Lower-case suffixes k, m, and g multiply the number by 2^10, 2^20, and 2^30, respectively.
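As an illustration of these rules, the Python sketch below converts such a string into an integer. It is an assumption-laden example rather than Multi2Sim's own conversion routine: the name parse_integer is hypothetical, only non-negative values are handled, and combining a prefix with a suffix (e.g., 0x10k) is assumed to be allowed.

```python
DECIMAL_SUFFIXES = {"K": 10**3, "M": 10**6, "G": 10**9}
BINARY_SUFFIXES = {"k": 2**10, "m": 2**20, "g": 2**30}


def parse_integer(text):
    """Convert an INI integer string with optional prefix and suffix to int."""
    text = text.strip()
    multiplier = 1
    # A trailing K/M/G or k/m/g selects the multiplier. None of these
    # letters is a hexadecimal digit, so the check is unambiguous.
    if text and text[-1] in DECIMAL_SUFFIXES:
        multiplier = DECIMAL_SUFFIXES[text[-1]]
        text = text[:-1]
    elif text and text[-1] in BINARY_SUFFIXES:
        multiplier = BINARY_SUFFIXES[text[-1]]
        text = text[:-1]
    # Prefix 0x selects hexadecimal; a leading 0 followed by more digits
    # selects octal; anything else is decimal.
    if text.lower().startswith("0x"):
        value = int(text, 16)
    elif len(text) > 1 and text.startswith("0"):
        value = int(text, 8)
    else:
        value = int(text, 10)
    return value * multiplier


print(parse_integer("4K"), parse_integer("4k"))
print(parse_integer("0x10"), parse_integer("010"))
```

The first line of output shows the difference between the upper-case and lower-case suffixes (4000 versus 4096); the second shows the same digits read as hexadecimal and octal (16 and 8).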
The inifile.py tool
The inifile.py tool can be found in the tools/inifile directory within the Multi2Sim distribution package. It is a Python script aimed at automatically analyzing and modifying INI files, avoiding the need to edit them manually. The command-line syntax of the program can be obtained by executing it without arguments. To illustrate its functionality with an example, let us run a simulation of the test-args and test-sort benchmarks on a 2-threaded processor model, using the files provided in the samples directory. Run the following command under the samples/x86 directory:
m2s --x86-sim detailed --ctx-config ctx-config-args-sort --x86-config x86-config-args-sort \
    --x86-report x86-report
This command uses the ctx-config-args-sort context configuration file, which allocates benchmark test-args in context 0, and benchmark test-sort in context 1. Likewise, it uses the x86-config-args-sort file to set up 2 threads, and dumps a detailed pipeline statistics report into file x86-report. The ctx-config-args-sort, x86-config-args-sort, and x86-report files all follow the INI file format. After running this simulation, let us analyze the obtained results with the inifile.py script.
Reading INI files
As shown in Section 2.21, the pipeline statistics report includes one section per core, thread, and complete processor. Type the following commands:
inifile.py x86-report read c0t0 Commit.Total
inifile.py x86-report read c0t1 Commit.Total
inifile.py x86-report read c0 Commit.Total
These commands return the number of committed instructions in thread 0, thread 1, and core 0. Since threads 0 and 1 are contained in core 0, the third output value is equal to the sum of the first two values.
Writing on an INI file
To show how to modify the contents of an INI file, the following example changes the context configuration file using inifile.py:
inifile.py ctx-config-args-sort remove "Context 0" StdOut
m2s --x86-sim detailed --ctx-config ctx-config-args-sort --x86-config x86-config-args-sort \
    --x86-report x86-report
inifile.py ctx-config-args-sort write "Context 0" StdOut context-0.out
The first command removes the parameter StdOut in the [Context 0] section. Then, the second command reruns the simulation with the new context file. Since the redirection of the standard output has been removed, the test-args benchmark dumps its output to screen. Finally, the third command restores the original contents of the context file, by adding the StdOut parameter again.
Using scripts to edit INI files
Every time the inifile.py tool is called, it analyzes the complete structure of the INI file before performing the requested action on it. For large INI files, this can entail some costly work, which becomes redundant when several actions are performed on the same file. In this case, it is possible to parse the INI file only the first time, by using an inifile.py script, as follows:
script=$(mktemp)
echo "read c0 Commit.Total" >> $script
echo "read c0t0 Commit.Total" >> $script
echo "read c0t1 Commit.Total" >> $script
inifile.py x86-report run $script
rm -f $script
The code above creates a temporary file (command mktemp), whose name is stored in variable script. Then, three read actions are stored into the script to retrieve the number of committed instructions in core 0, thread 0, and thread 1. Finally, the inifile.py tool is executed with the run command, using the script file name as last argument. The result obtained by each read command is presented in a new line on screen.
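The benefit of a script is that the file is parsed once and every action is then served from the in-memory structure. The Python sketch below illustrates this idea with the standard configparser module. It is not the implementation of inifile.py: the function name run_read_actions is hypothetical, only read actions are supported, and the report contents are made-up values chosen for the example.

```python
import configparser
import shlex


def run_read_actions(ini_text, script_text):
    """Parse `ini_text` once, then answer each 'read <section> <variable>' line."""
    parser = configparser.ConfigParser(comment_prefixes=(";",), interpolation=None)
    parser.optionxform = str  # keep variable names case-sensitive
    parser.read_string(ini_text)
    # Index sections by stripped name, so "[ c0 ]" and "[c0]" are equivalent.
    sections = {name.strip(): parser[name] for name in parser.sections()}

    results = []
    for line in script_text.splitlines():
        fields = shlex.split(line)  # honors quotes, e.g. "Context 0"
        if not fields:
            continue
        action, section, variable = fields
        if action != "read":
            raise ValueError(f"unsupported action: {action}")
        results.append(sections[section][variable])
    return results


# Made-up report contents, for illustration only.
REPORT = """\
[ c0t0 ]
Commit.Total = 120
[ c0t1 ]
Commit.Total = 80
[ c0 ]
Commit.Total = 200
"""

SCRIPT = """\
read c0 Commit.Total
read c0t0 Commit.Total
read c0t1 Commit.Total
"""

for value in run_read_actions(REPORT, SCRIPT):
    print(value)
```

As with the run command, each read action produces one line of output, in the order in which the actions appear in the script.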