Chapter 4 IMPLEMENTATION
4.7 XPASS
4.7.4 Response Function Data Library
The original method for storing the response function data library used in the passive version of XPASS was developed without knowing how large the data libraries would grow. As the response library grew, the structure of the data library quickly became unmanageable. A new database format was therefore developed for the AI response function data library, with the specific goal of producing a data library that is easy to manage and provides very fast data access.
The previous method stored the response libraries in a structure of nested directories. Although this method was relatively fast for accessing data, as only a few directory levels need to be traversed before reaching the desired data, it led to an exponential growth in the number of directories and files in the response library. This led to extremely long transfer times when copying the data library to a new computer, as the I/O operations on each file became the limiting factor: transferring ~50 GB of data stored in the directory structure took about 14 hours and had to be done overnight. In addition, the number of files within the data library sometimes exceeded the number of files (inodes) that the operating system (Linux) had remaining. The only options were to install an additional HDD large enough to hold the data library (not possible for most laptops), or to reinstall the OS and adjust the number of inodes available to it. These issues make XPASS unsuitable for distribution, as it is not clear whether it is possible for optical medical to store the database library in this structure. In addition, the data files are stored as compressed binary files. This is disadvantageous because the data are not human readable, and there is a performance penalty because the data must be decompressed at runtime.
The new response database format stores an entire response function within a single ASCII text file. These files are human readable, and the entire response library can be stored within a single directory in only a few files (one file for each response function). The database follows a semi-rigid format, which makes indexing of the data possible: the position within the file where the desired data begin can be calculated rather than searched for. Through this indexing, data can be accessed very quickly even as the database file grows extremely large. Without indexing, obtaining data from a large text file is time consuming, because each character must be read in and checked to find the appropriate data. During development, this search approach was tested on a database file ~15 GB in size, and it took nearly 20 minutes to locate the data at the very end of the file. With the indexing scheme, calculating rather than searching for the start of the data reduces the time to obtain the data at the end of the ~15 GB file to nearly nothing. The database format is divided into two sections, a header section and a response data section. The format of these sections is described below.
The header is divided into a header line, a comment block, a variables block, and a data structure block. The header line, which is the first line of the database file, contains two integer values, nc and nv, followed by a list (nv values long) of variable flags given by “n” or “s”. The first value, nc, gives the number of lines in the comment block. The second value, nv, gives the number of variables, besides source energy, that the response function spans; energy is not included because the response function data always span source and destination energies without exception. For example, if the AI source sub-model spans electron energy, electron beam radius, target material, target thickness, and beam divergence angle, the value of nv would be 4 (electron energy is excluded). Finally, the header line ends with a list of flags, either “n” or “s”, one for each variable (so nv flags), which describe how the values in the variable block should be interpreted. If a flag is set to “n”, the variable values are treated as numeric values; if the flag is set to “s”, they are treated as strings. Treating values as strings is required for variables such as target material, which do not take numerical values. However, if all data were treated as plain text, then when XPASS needs data for the value 2.65, it would not find a match if the data happened to be stored as 2.650. The “n” flag tells the database parser that the variable is numeric, so that the text entry 2.650 is treated as the numeric value 2.65. A newline character terminates the header line.
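The following is a minimal sketch of how a parser might read the header line and apply the flags. It splits the line into nc, nv, and the nv flags. When the flag is “n”, it compares a stored variable value with a requested one numerically. The function names and the use of Python's `Decimal` for the numeric comparison are illustrative assumptions, not the XPASS implementation.

```python
from decimal import Decimal


def parse_header_line(line):
    """Split a header line 'nc nv f1 ... f_nv' into (nc, nv, flags)."""
    tokens = line.split()
    nc, nv = int(tokens[0]), int(tokens[1])
    flags = tokens[2:]
    if len(flags) != nv:
        raise ValueError(f"expected {nv} flags, found {len(flags)}")
    if any(flag not in ("n", "s") for flag in flags):
        raise ValueError("every flag must be 'n' or 's'")
    return nc, nv, flags


def values_match(stored, requested, flag):
    """Compare a stored variable value with a requested one.

    With flag 'n' the comparison is numeric, so '2.650' matches '2.65';
    with flag 's' the text must match exactly.
    """
    if flag == "n":
        return Decimal(stored) == Decimal(requested)
    return stored == requested


print(parse_header_line("10 2 s n"))        # (10, 2, ['s', 'n'])
print(values_match("2.650", "2.65", "n"))   # True
print(values_match("2.650", "2.65", "s"))   # False
```

`Decimal` is used here instead of `float` so that decimal text is compared exactly. Either choice would satisfy the 2.650 versus 2.65 case.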
The next section within the database file is the comment block. The comment block must be exactly nc lines long, and its content can take any form. The size of the comment block (in number of lines) can always be adjusted by modifying the value of nc.
The variable block follows the comment block. The variable block is nv lines long. Each line contains a whitespace-delimited list of the sampled values of the corresponding variable that are present within the response data. For example, if the target material variable of the AI source sub-model spans W, Pb, Al, and Fe, then the line corresponding to the target material variable would read,
W Pb Al Fe
where the order of the values matches the order in which they appear within the data block (in this example, W would appear before Pb and so forth). The order in which the variables appear within the variable block indicates the order in which they are present within the data block. For example, if the AI source sub-model only spanned target material and target thickness, then the variable block
W Pb Al Fe
0.5 1.0 1.5 2.0 2.5
indicates that the data for all target thicknesses (0.5-2.5 mm) of W will appear first, then the data for all target thicknesses of Pb, and so forth. The variable block
0.5 1.0 1.5 2.0 2.5
W Pb Al Fe
indicates that the data for all target materials (W, Pb, Al, Fe) at a target thickness of 0.5 mm will appear first, then the data for all target materials at 1.0 mm, and so forth. See the end of this section for a detailed discussion of the indexing of the data within the response data block. The flags on the header line tell how to interpret the values of each respective variable in the variable block. For example, with 10 lines of comments, the header line for the variable block
W Pb Al Fe
0.5 1.0 1.5 2.0 2.5
would be
10 2 s n
and the header line for the variable block
0.5 1.0 1.5 2.0 2.5
W Pb Al Fe
would be
10 2 n s
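The variable block can be read in the same spirit. The hypothetical sketch below converts each variable-block line according to its flag. It then finds the position of a requested value within one variable's samples. That position is the per-variable index used by the indexing scheme at the end of this section. The function names are assumptions made for illustration.

```python
from decimal import Decimal


def parse_variable_block(lines, flags):
    """Convert the nv variable-block lines into per-variable value lists.

    Values flagged 'n' become Decimal so that '2.50' and '2.5' compare
    equal; values flagged 's' are kept as plain strings.
    """
    if len(lines) != len(flags):
        raise ValueError("need exactly one variable-block line per flag")
    variables = []
    for line, flag in zip(lines, flags):
        tokens = line.split()
        variables.append([Decimal(t) for t in tokens] if flag == "n" else tokens)
    return variables


def sample_index(values, requested, flag):
    """Position of a requested value within one variable's sample list."""
    key = Decimal(requested) if flag == "n" else requested
    return values.index(key)  # raises ValueError if the value was not sampled


block = parse_variable_block(["W Pb Al Fe", "0.5 1.0 1.5 2.0 2.5"], ["s", "n"])
print(sample_index(block[0], "Al", "s"))    # 2
print(sample_index(block[1], "2.50", "n"))  # 4
```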
After the variable block, the energy and time binning structures are defined in the data structure block. The data structure block begins with a single line containing three integer values, nSE, nDE, and nTme. These give the number of bins in the source energy, destination energy, and time binning structures, respectively. The next two lines give the upper bin values of the source and destination energy group structures. If nTme > 0, a third line gives the upper time bin values. After the group structures, a blank line terminates the header section and marks the start of the response data section.
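A hypothetical sketch of reading the data structure block follows. It assumes the lines have already been read from the file and that the list starts at the nSE nDE nTme line. The sketch checks the bin counts against the declared values and confirms the blank delimiter line. The function name and return layout are illustrative only.

```python
def parse_data_structure_block(lines):
    """Parse the data structure block that closes the header section.

    Returns the bin counts and upper bin values, plus the number of lines
    consumed (including the blank line that ends the header).
    """
    n_se, n_de, n_tme = (int(t) for t in lines[0].split())
    source = [float(t) for t in lines[1].split()]
    destination = [float(t) for t in lines[2].split()]
    used = 3
    time = []
    if n_tme > 0:                          # time bins only when nTme > 0
        time = [float(t) for t in lines[3].split()]
        used = 4
    if (len(source), len(destination), len(time)) != (n_se, n_de, n_tme):
        raise ValueError("bin counts disagree with the nSE nDE nTme line")
    if lines[used].strip():
        raise ValueError("expected a blank line ending the header section")
    block = {"nSE": n_se, "nDE": n_de, "nTme": n_tme,
             "source": source, "destination": destination, "time": time}
    return block, used + 1


block, used = parse_data_structure_block(["3 2 0", "0.1 0.2 0.3", "0.15 0.3", ""])
print(block["nSE"], block["nDE"], block["time"], used)  # 3 2 [] 4
```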
The response data section begins after the blank line separating it from the header section. It stores the response function matrix values as well as the relative error of each matrix entry. Each matrix begins with a data description line, which lists the values of each variable to which that response matrix applies. The format of the data within the response data section, as well as the method for indexing the data, is given below.
The response function data are stored as a table that represents the matrix into which the data will be placed. Each matrix element is stored in scientific notation to a precision of p, and each relative error is stored in fixed-point notation to a precision of e. Each matrix entry therefore occupies c = p + e + 10 characters, including whitespace, on a single line. For example, if p = 4 and e = 4, an entry would be stored as “3.2400E+01 0.0030 ” (c = 18), which includes the trailing whitespace after the relative error value. Each row of the data matrix represents an energy bin within the destination group structure, and each column represents an energy bin within the source energy group structure. Therefore, a single line of the matrix has c*nSE + 1 characters, including the newline character, and the entire matrix has (c*nSE + 1)*nDE characters. If the response function is time dependent, the matrices for each time bin appear in sequence, so the matrices of a time-dependent response function contain (c*nSE + 1)*nDE*nTme characters. To simplify the indexing scheme, the data description line is padded with whitespace to c*nSE + 1 characters (including the newline character) to match the line width of the data. Therefore, the total number of characters M associated with a response function, including the data and the data description, is given by

$$
M =
\begin{cases}
(c\,n_{SE} + 1)\,(n_{DE} + 1), & \text{time independent} \\
(c\,n_{SE} + 1)\,(n_{DE}\,n_{Tme} + 1), & \text{time dependent}
\end{cases}
\tag{4.54}
$$
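The character counts above can be checked with a short sketch. It assumes non-negative response values, exponents between -99 and +99 (so Python writes exactly two exponent digits), and relative errors below 10. The per-entry count c = p + e + 10 relies on all three. For time-dependent data, the sketch also assumes one padded description line per response function, followed by nTme matrices. The function names are hypothetical.

```python
def format_entry(value, rel_err, p, e):
    """One matrix entry: value in scientific notation with precision p,
    relative error in fixed point with precision e, each followed by a space."""
    return f"{value:.{p}E} {rel_err:.{e}f} "


def chars_per_entry(p, e):
    # 'd.' + p digits + 'E+xx' -> p + 6; space -> 1; '0.' + e digits -> e + 2; space -> 1
    return p + e + 10


def chars_per_response_function(p, e, n_se, n_de, n_tme=0):
    """Total characters M in one response function, description line included."""
    line = chars_per_entry(p, e) * n_se + 1   # +1 for the newline
    matrix_lines = n_de * max(n_tme, 1)       # one nDE-line matrix per time bin
    return line * (matrix_lines + 1)          # +1 for the padded description line


entry = format_entry(32.4, 0.003, 4, 4)
print(repr(entry), len(entry))                    # '3.2400E+01 0.0030 ' 18
print(chars_per_response_function(4, 4, 10, 5))  # 1086
```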
An indexing scheme was developed to quickly calculate which response function within the response data section holds the desired data. Before this method is described, the order of the response data within the data section is illustrated with an example. Suppose the AI source sub-model spans three variables (target material, target thickness, and beam divergence angle), and the phase space encompassed by those variables is coarsely sampled. The variable block of the database file may then be given as below,
W Al Pb
0.5 1.0 1.5
2 5
where the target thickness is given in millimeters and the divergence angle in degrees. With the variable block ordered as shown above, the response data will appear within the response data section in the order
(W,0.5,2), (W,0.5,5); (W,1.0,2), (W,1.0,5); (W,1.5,2), (W,1.5,5);
(Al,0.5,2), (Al,0.5,5); (Al,1.0,2), (Al,1.0,5); (Al,1.5,2), (Al,1.5,5);
(Pb,0.5,2), (Pb,0.5,5); (Pb,1.0,2), (Pb,1.0,5); (Pb,1.5,2), (Pb,1.5,5).
Therefore, if the data for (Pb,0.5,5) are desired, the data from response function number 13 must be used, where the first response function, (W,0.5,2), is counted as response function 0.
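The storage order above is the order of a Cartesian product in which the last variable varies fastest. The short Python sketch below reproduces it with `itertools.product` for the example variable block, treating all values as strings for simplicity. It confirms that (Pb,0.5,5) is response function 13.

```python
from itertools import product

# Variable block of the example: target material, thickness (mm), divergence (deg)
variables = [["W", "Al", "Pb"], ["0.5", "1.0", "1.5"], ["2", "5"]]

# product() advances the last variable fastest, matching the storage order
order = list(product(*variables))

for r, combo in enumerate(order[:3]):
    print(r, combo)                      # 0 ('W', '0.5', '2'), then (W,0.5,5), (W,1.0,2)
print(len(order))                        # 18
print(order.index(("Pb", "0.5", "5")))   # 13
```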
Suppose the response function spans nv variables, indexed over [1, nv], and the vth variable contains Nv sampled values, indexed over [0, Nv - 1]. Let i = (i_1, ..., i_nv) be an nv-element vector holding the index of the desired value for each variable (e.g., i = (2, 0, 1) for (Pb,0.5,5)). The desired data can then be found in the Rth response function, where

$$
R = \sum_{v=1}^{n_v} i_v \prod_{u=v+1}^{n_v} N_u
\tag{4.55}
$$

and the empty product for v = nv is taken as 1.
Using equation 4.55 with nv = 3, i = (2, 0, 1), N1 = 3, N2 = 3, and N3 = 2, the value of R is calculated to be R = 2*(3*2) + 0*2 + 1 = 13, which matches the value of R given above.
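Equation 4.55 is a mixed-radix number in which the last variable is the least significant digit. It can therefore be evaluated in Horner form without forming the products explicitly. A minimal sketch follows. The function name and the range check are illustrative additions.

```python
def response_index(idx, sizes):
    """Evaluate equation 4.55: R = sum_v i_v * prod_{u>v} N_u.

    idx[v] is the sample index i_v and sizes[v] is N_v; the last variable
    varies fastest in the response data section.
    """
    if len(idx) != len(sizes):
        raise ValueError("idx and sizes must both have nv entries")
    r = 0
    for i_v, n_v in zip(idx, sizes):
        if not 0 <= i_v < n_v:
            raise IndexError("sample index outside [0, N_v - 1]")
        r = r * n_v + i_v  # Horner step: shift previous digits by N_v, add i_v
    return r


print(response_index((2, 0, 1), (3, 3, 2)))  # 13
```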
To index the Rth response function, the location of the first character where its data begin must be calculated. Since each response function contains M characters, the desired response function data begin at character R*M relative to the start of the response data section. To obtain the location relative to the start of the database file, the total number of characters hc in the header section is added to R*M. This count includes newline characters as well as the blank line delimiter between the header and data sections. The value of hc must be determined by counting the characters in the header section when the database file is opened, because there are no restrictions on the number of characters within the various parts of the header. Written out explicitly, the location of the character where the data described by i begin, relative to the beginning of the database file (counting the first character of the file as 0), is given by

$$
L(\mathbf{i}) = h_c + R\,M = h_c + M \sum_{v=1}^{n_v} i_v \prod_{u=v+1}^{n_v} N_u
\tag{4.56}
$$
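The sketch below illustrates why the calculated offset makes access time independent of file size. It computes hc by counting header lines using nc, nv, and nTme, because the comment block may itself contain blank lines. It then seeks directly to hc + R*M. The sketch assumes the database is opened in binary mode, so that offsets are byte offsets and one ASCII character is one byte. In text mode, Python only guarantees seeking to positions returned by `tell()`. The in-memory database is a made-up two-record example, and the function names are hypothetical.

```python
import io


def header_length(f):
    """h_c: bytes in the header section, blank delimiter line included.

    Lines are counted rather than searched for a blank line, because the
    comment block may itself contain blank lines.
    """
    f.seek(0)
    nc, nv = (int(t) for t in f.readline().split()[:2])
    for _ in range(nc + nv):                 # comment block, then variable block
        f.readline()
    n_tme = int(f.readline().split()[2])     # the 'nSE nDE nTme' line
    for _ in range(3 if n_tme > 0 else 2):   # source, destination, [time] bins
        f.readline()
    if f.readline().strip():
        raise ValueError("expected a blank line ending the header section")
    return f.tell()


def read_response_function(f, r, m):
    """Equation 4.56: jump to character h_c + R*M and read M characters.

    h_c is recomputed here for brevity; a real reader would cache it
    once when the database file is opened.
    """
    f.seek(header_length(f) + r * m)
    return f.read(m).decode("ascii")


# Made-up database: nv = 1 (material), nSE = 2, nDE = 1, nTme = 0, p = e = 4
width = 2 * 18 + 1                           # c*nSE + 1 characters per line
header = "1 1 s\nhypothetical example\nW Pb\n2 1 0\n1.0 2.0\n1.0\n\n"


def record(desc, a, b):
    # Padded description line followed by a single matrix row
    return desc.ljust(width - 1) + "\n" + f"{a:.4E} 0.0100 {b:.4E} 0.0200 \n"


db = io.BytesIO((header + record("W", 1.0, 2.0) + record("Pb", 3.0, 4.0)).encode("ascii"))
print(read_response_function(db, 1, 2 * width))   # M = (c*nSE + 1)(nDE + 1)
```

With a file on disk, the same functions apply to a handle from `open(path, "rb")`. The seek itself costs the same whether the target record is first or last in the file.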