1.2 Assumptions
2.2.2 The structure of a FITS file
A FITS file is composed of a sequence of Header Data Units (HDUs), which can be followed by a set of special records. Each HDU is composed of one header and the data that follows it. The header is a sequence of 80-byte ASCII card images containing keyword = value statements, with 36 card images in each logical record. There are three classes of keywords: required keywords, reserved keywords, and keywords defined by the user. The data that follows (also called data records) is binary and is structured as the header specifies. The size of each logical record is 23040 bits, equivalent to 2880 bytes, and each HDU consists of one or more logical records. The last record of the header is padded with ASCII blanks so that it fills the 23040-bit length. The first HDU of a FITS file is called the Primary HDU. The HDUs that follow the Primary HDU are called extensions. When a FITS file contains one or more extensions, the Primary HDU most likely does not contain any data. A FITS file without extensions is called a Basic FITS: a file containing only the primary header followed by a single primary data array.
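The record layout described above can be illustrated with a minimal Python sketch. It assumes a simplified card format (keyword in columns 1 to 8, "= " in columns 9 and 10, value right-justified to column 30) and uses the END card that terminates a FITS header. The helper names `make_card`, `build_header` and `read_cards` are hypothetical and are not part of any FITS library.

```python
BLOCK_SIZE = 2880                           # bytes per logical record (23040 bits)
CARD_SIZE = 80                              # bytes per header card image
CARDS_PER_BLOCK = BLOCK_SIZE // CARD_SIZE   # 36 cards fit in one record


def make_card(keyword, value):
    """Format a simplified 'KEYWORD = value' card, blank-padded to 80 bytes."""
    text = f"{keyword:<8}= {str(value):>20}"
    return text.ljust(CARD_SIZE).encode("ascii")


def build_header(cards):
    """Join the cards, append the END card, and pad with ASCII blanks
    so that the header occupies a whole number of 2880-byte records."""
    raw = b"".join(cards) + b"END".ljust(CARD_SIZE)
    padding = (-len(raw)) % BLOCK_SIZE
    return raw + b" " * padding


def read_cards(header):
    """Split a header into 80-byte cards, stopping at the END card."""
    cards = []
    for offset in range(0, len(header), CARD_SIZE):
        card = header[offset:offset + CARD_SIZE].decode("ascii")
        if card[:8].strip() == "END":
            break
        cards.append(card.rstrip())
    return cards


header = build_header([
    make_card("SIMPLE", "T"),
    make_card("BITPIX", 8),
    make_card("NAXIS", 0),
])
print(len(header), len(read_cards(header)))
```

Here three keyword cards plus END occupy 320 bytes, so the remaining 2560 bytes of the single header record are blank padding.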
The Primary HDU is the first HDU of a FITS file. It is composed of one header (the Primary Header) and the data that follows. If the Primary HDU is alone in the FITS file (there are no extensions), the file is called a Basic FITS. Except for FITS images, a Primary HDU does not normally contain any data; if it does, the data must be a matrix of values in binary format, called the Primary Array.
The Extensions have the same overall organization as all other HDUs (one header and the data that follows) and they come after the Primary HDU, respecting the structure of the FITS file. The extensions brought new functionality to FITS files:
• Transfer of new types of data structures: Images, ASCII Tables and Binary Tables
• Transfer of collections of related data structures
• Transport of data that do not fit conveniently into an array format
• Transport of auxiliary information
The Tables are used to store collected astronomical data, organized in rows and columns. In FITS files there are two types of tables: ASCII Tables and Binary Tables. As the name says, ASCII tables store the data values in an ASCII representation. The data appear as a character array, in which the rows represent the lines of a table and the columns represent the characters that make up the tabulated items. Each member of the array is one character or digit. Each character string or ASCII representation of a number is in FORTRAN-77 format. Binary tables, in turn, store the data in a binary representation. They are more efficient and more compact (about half the size for the same information content), support more features, and eliminate the time spent converting values to ASCII. Their display, however, is not as direct as for ASCII tables. The data types that can be stored in FITS binary tables are:
• L: Logical value: 1 byte
• B: Unsigned byte: 1 byte
• I: 16-bit integer: 2 bytes
• J: 32-bit integer: 4 bytes
• K: 64-bit integer: 8 bytes
• A: Character: 1 byte
• E: Single precision floating point: 4 bytes
• D: Double precision floating point: 8 bytes
• C: Single precision complex: 8 bytes
• M: Double precision complex: 16 bytes
• P: Array Descriptor (32-bit): 8 bytes
• Q: Array Descriptor (64-bit): 16 bytes
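As an illustration of how these sizes determine the width of a table row, the following Python sketch maps each type code from the list above to its size and sums the field widths of a row. It assumes column formats of the simple form "rT" (an optional repeat count r followed by a type code T); array descriptors with an element-type suffix are not handled. The names `TYPE_SIZES`, `field_width` and `row_width` are hypothetical.

```python
import re

# Size in bytes of one element of each type code listed above.
TYPE_SIZES = {
    "L": 1, "B": 1, "I": 2, "J": 4, "K": 8, "A": 1,
    "E": 4, "D": 8, "C": 8, "M": 16, "P": 8, "Q": 16,
}

# Optional repeat count followed by a single type code, e.g. "20A" or "E".
_FORMAT = re.compile(r"(\d*)([LBIJKAEDCMPQ])")


def field_width(tform):
    """Return the number of bytes occupied by one field of format 'rT'."""
    match = _FORMAT.fullmatch(tform.strip())
    if match is None:
        raise ValueError(f"unsupported column format: {tform!r}")
    repeat = int(match.group(1)) if match.group(1) else 1
    return repeat * TYPE_SIZES[match.group(2)]


def row_width(tforms):
    """Return the total number of bytes in one table row."""
    return sum(field_width(tform) for tform in tforms)


# One 32-bit integer, a 20-character string, one float and two doubles.
print(row_width(["1J", "20A", "E", "2D"]))  # 4 + 20 + 4 + 16 = 44
```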
Chapter 3
Contribution to MonetDB
In this chapter we present our contribution to MonetDB: the development of a vault module that provides a set of functionalities for FITS files. We first give a short overview of the vault concept and then describe the architecture of a vault. Finally, we list the procedures and functions that were developed to make the integration between FITS files and MonetDB possible.
3.1 Overview of the vaults
A vault can be defined as a safety deposit box or as a repository for valuable information. By analogy, in computer science terms a data vault can be seen as a folder that contains only images. Objects inside the same data vault have one important factor in common: their metadata. It is this shared metadata that makes it possible to distinguish between different kinds of vaults, and even to create a completely new data vault based on similar parameters of the objects.
What distinguishes the objects in the same vault is the data that they carry. For example, if the object is an image, we know that it will have pixels, a height and a width. However, the values assigned to each of these attributes differ from image to image. Knowing that, we can create a vault based on a directory of files. We just need to understand their metadata (using appropriate tools that allow us to access it), what metadata they have in common, and what their data model is. Once we comprehend the data model, we can decide how it will be represented in the relational database system.
We apply the term vault to our case study by creating a vault from a directory of FITS files that share the same metadata but each contain their own information.
The system needs to understand the external formats that contain the scientific data (FITS). Once this is understood, a distinction can be made between loading data and attaching data. The idea of loading high volumes of data up front is abandoned, as it is time consuming and, for the most part, not what the scientist requires. Instead, files are attached to the database automatically, which provides their metadata to the scientists and gives them the opportunity to decide what is relevant. The result is a selective load that takes less time.
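The difference between attaching and loading can be sketched in a few lines of Python. This is only an illustration of the idea, not the MonetDB vault module: the class `FitsVault` and its methods are hypothetical, the header parser is deliberately simplified (it keeps raw value strings and treats everything after a "/" as a comment), and "loading" here just means reading the file bytes into memory.

```python
import os

BLOCK_SIZE = 2880   # bytes per logical record
CARD_SIZE = 80      # bytes per header card image


def read_primary_header(path):
    """Read only the primary header records and return keyword -> raw value."""
    metadata = {}
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if len(block) < BLOCK_SIZE:
                raise ValueError(f"truncated header in {path}")
            for offset in range(0, BLOCK_SIZE, CARD_SIZE):
                card = block[offset:offset + CARD_SIZE].decode("ascii")
                keyword = card[:8].strip()
                if keyword == "END":
                    return metadata
                if card[8:10] == "= ":
                    # Simplification: drop anything after a '/' comment marker.
                    metadata[keyword] = card[10:].split("/")[0].strip()


class FitsVault:
    """Hypothetical vault: attach records metadata, load reads the data."""

    def __init__(self):
        self.catalog = {}   # path -> header metadata (cheap, always present)
        self.loaded = {}    # path -> file contents (expensive, on demand)

    def attach(self, path):
        self.catalog[path] = read_primary_header(path)

    def attach_directory(self, directory):
        for name in sorted(os.listdir(directory)):
            if name.lower().endswith((".fits", ".fit")):
                self.attach(os.path.join(directory, name))

    def select(self, predicate):
        """Return the attached paths whose metadata satisfies the predicate."""
        return [p for p, meta in self.catalog.items() if predicate(meta)]

    def load(self, path):
        """Load a file only when it is actually requested."""
        if path not in self.loaded:
            with open(path, "rb") as f:
                self.loaded[path] = f.read()
        return self.loaded[path]
```

A scientist would first call `attach_directory`, inspect the catalog or filter it with `select`, and call `load` only for the files judged relevant, so that the cost of reading the data is paid for those files alone.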