#pragma pack(pop) /* restore original alignment */
Figure 7.4: Packed struct using Microsoft or Borland
The gcc compiler also allows one to pack a structure. This tells the compiler to use the minimum possible space for the structure. Figure 7.3 shows how S could be rewritten this way. This form of S would use the minimum bytes possible, 14 bytes.
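As a rough sketch of the gcc approach, the packed and default layouts can be compared side by side. The members below are an assumption (a short, an int and a double, which add up to 14 bytes on common x86 targets), and the names S_default, S_packed and check_S_packed are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for S with default alignment (members assumed). */
struct S_default {
    short int x;
    int       y;
    double    z;
};

/* Same members, but gcc is asked to insert no padding at all. */
struct S_packed {
    short int x;
    int       y;
    double    z;
} __attribute__((packed));

/* Check that the packed form is exactly the sum of its member sizes. */
void check_S_packed(void)
{
    assert(sizeof(struct S_packed) ==
           sizeof(short int) + sizeof(int) + sizeof(double));
    assert(offsetof(struct S_packed, y) == sizeof(short int));
    assert(sizeof(struct S_default) >= sizeof(struct S_packed));
}
```

With gcc on x86 the packed form is 14 bytes, while the default form is typically 16 because two pad bytes follow x.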
Microsoft’s and Borland’s compilers both support the same method of specifying alignment using a #pragma directive.
#pragma pack(1)
The directive above tells the compiler to pack elements of structures on byte boundaries (i.e., with no extra padding). The 1 can be replaced with 2, 4, 8 or 16 to specify alignment on word, double word, quad word and paragraph boundaries, respectively. The directive stays in effect until overridden by another directive. This can cause problems since these directives are often used in header files. If such a header file is included before other header files that define structures, those structures may be laid out differently than they would be by default. This can lead to a very hard-to-find error: different modules of a program might lay out the elements of the same structure in different places!
There is a way to avoid this problem. Microsoft and Borland support a way to save the current alignment state and restore it later. Figure 7.4 shows how this would be done.
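The following is a minimal sketch of the save and restore idea, not a copy of the figure. The structure names and the check function are hypothetical. gcc also accepts these pragmas for compatibility, so the sketch is not limited to the Microsoft and Borland compilers.

```c
#include <assert.h>
#include <stddef.h>

#pragma pack(push)   /* save the current alignment state */
#pragma pack(1)      /* pack on byte boundaries from here on */

struct packed_pair {     /* hypothetical structure inside the packed region */
    char tag;
    int  value;
};

#pragma pack(pop)    /* restore the saved alignment state */

struct default_pair {    /* same members, declared after the pop */
    char tag;
    int  value;
};

void check_pack_scope(void)
{
    /* Inside the push/pop region there is no padding after tag. */
    assert(offsetof(struct packed_pair, value) == sizeof(char));
    assert(sizeof(struct packed_pair) == sizeof(char) + sizeof(int));

    /* After the pop, the default layout applies again, so this
       structure is never smaller than the packed one. */
    assert(sizeof(struct default_pair) >= sizeof(struct packed_pair));
}
```

Because the pop restores whatever state was active before the push, structures declared in headers included later are unaffected by the pack(1) directive.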
7.1.3 Bit Fields
Bit fields allow one to specify members of a struct that only use a specified number of bits. The number of bits does not have to be a multiple of eight. A bit field member is defined like an unsigned int or int member with a colon and bit size appended to it. Figure 7.5 shows an example. This defines a 32-bit variable that is decomposed into the following parts:
struct S {
  unsigned f1 : 3;   /* 3-bit field */
  unsigned f2 : 10;  /* 10-bit field */
  unsigned f3 : 11;  /* 11-bit field */
  unsigned f4 : 8;   /* 8-bit field */
};
Figure 7.5: Bit Field Example
Byte \ Bit |   7    6    5   |   4    3    2    1    0
-----------+-----------------+--------------------------
    0      |            Operation Code (08h)
    1      | Logical Unit #  |        msb of LBA
    2      |      middle of Logical Block Address
    3      |        lsb of Logical Block Address
    4      |               Transfer Length
    5      |                   Control
Figure 7.6: SCSI Read Command Format
| 8 bits | 11 bits | 10 bits | 3 bits |
|   f4   |   f3    |   f2    |   f1   |
The first bit field is assigned to the least significant bits of its double word.[2] However, the format is not so simple if one looks at how the bits are actually stored in memory. The difficulty occurs when bit fields span byte boundaries, because the bytes on a little endian processor will be reversed in memory. For example, the S struct bit fields will look like this in memory:
| 5 bits | 3 bits || 3 bits | 5 bits || 8 bits || 8 bits |
|  f2l   |   f1   ||  f3l   |  f2m   ||  f3m   ||   f4   |
The f2l label refers to the last five bits (i.e., the five least significant bits) of the f2 bit field. The f2m label refers to the five most significant bits of f2. The double vertical lines show the byte boundaries. If one reverses all the bytes, the pieces of the f2 and f3 fields will be reunited in the correct place.
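The layout just described can be checked with a short sketch. It assumes a little endian machine and a compiler that assigns the first bit field to the least significant bits (as gcc does on x86). The helper names S_expected and S_bytes are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct S {
    unsigned f1 : 3;   /* 3-bit field  */
    unsigned f2 : 10;  /* 10-bit field */
    unsigned f3 : 11;  /* 11-bit field */
    unsigned f4 : 8;   /* 8-bit field  */
};

/* Build the 32-bit value by hand, with f1 in the low bits and f4 on top. */
uint32_t S_expected(unsigned f1, unsigned f2, unsigned f3, unsigned f4)
{
    return  (uint32_t)(f1 & 0x7u)
         | ((uint32_t)(f2 & 0x3FFu) << 3)
         | ((uint32_t)(f3 & 0x7FFu) << 13)
         | ((uint32_t)(f4 & 0xFFu)  << 24);
}

/* Copy the first four bytes of s, in memory order, into out. */
void S_bytes(const struct S *s, unsigned char out[4])
{
    assert(sizeof *s >= 4);
    memcpy(out, s, 4);
}
```

Under those assumptions, out[0] holds f2l above f1, out[1] holds f3l above f2m, out[2] holds f3m and out[3] holds f4. Reading the four bytes back from last to first gives the same value that S_expected computes.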
The physical memory layout is not usually important unless the data is being transferred in or out of the program (which is actually quite common with bit fields). It is common for hardware device interfaces to use odd numbers of bits, which bit fields can be useful for representing.
[2] Actually, the ANSI/ISO C standard gives the compiler some flexibility in exactly how the bits are laid out. However, common C compilers (gcc, Microsoft and Borland) will lay the fields out like this.
 1  #define MS_OR_BORLAND (defined(__BORLANDC__) \
 2                          || defined(_MSC_VER))
 3
 4  #if MS_OR_BORLAND
 5  #  pragma pack(push)
 6  #  pragma pack(1)
 7  #endif
 8
 9  struct SCSI_read_cmd {
10    unsigned opcode : 8;
11    unsigned lba_msb : 5;
12    unsigned logical_unit : 3;
13    unsigned lba_mid : 8;    /* middle bits */
14    unsigned lba_lsb : 8;
15    unsigned transfer_length : 8;
16    unsigned control : 8;
17  }
18  #if defined(__GNUC__)
19     __attribute__((packed))
20  #endif
21  ;
22
23  #if MS_OR_BORLAND
24  #  pragma pack(pop)
25  #endif
Figure 7.7: SCSI Read Command Format Structure
One example is SCSI.[3] A direct read command for a SCSI device is specified by sending a six-byte message to the device in the format specified in Figure 7.6. The difficulty in representing this using bit fields is the logical block address, which spans three different bytes of the command. From Figure 7.6, one sees that the data is stored in big endian format. Figure 7.7 shows a definition that attempts to work with all compilers. The first two lines define a macro that is true if the code is compiled with the Microsoft or Borland compilers. The potentially confusing parts are lines 11 to 14. First, one might wonder why the lba_mid and lba_lsb fields are defined separately and not as a single 16-bit field. The reason is that the data is in big endian order. A 16-bit field would be stored in little endian order by the compiler.
Next, the lba_msb and logical_unit fields appear to be reversed; however, this is not the case. They have to be put in this order. Figure 7.8 shows how the fields are mapped as a 48-bit entity. (The byte boundaries are again denoted by the double lines.) When this is stored in memory in little endian order, the bits are arranged in the desired format (Figure 7.6).

[3] Small Computer Systems Interface, an industry standard for hard disks, etc.

| 8 bits  ||     8 bits      || 8 bits  || 8 bits  ||    3 bits    | 5 bits  || 8 bits |
| control || transfer_length || lba_lsb || lba_mid || logical_unit | lba_msb || opcode |
Figure 7.8: Mapping of SCSI_read_cmd fields

struct SCSI_read_cmd {
  unsigned char opcode;
  unsigned char lba_msb : 5;
  unsigned char logical_unit : 3;
  unsigned char lba_mid;    /* middle bits */
  unsigned char lba_lsb;
  unsigned char transfer_length;
  unsigned char control;
}
#if defined(__GNUC__)
   __attribute__((packed))
#endif
;
Figure 7.9: Alternate SCSI Read Command Format Structure
To complicate matters more, the definition for SCSI_read_cmd does not quite work correctly for Microsoft C. If the sizeof(struct SCSI_read_cmd) expression is evaluated, Microsoft C will return 8, not 6! This is because the Microsoft compiler uses the type of the bit field in determining how to map the bits. Since all the bit fields are defined as unsigned types, the compiler pads two bytes at the end of the structure to make it an integral number of double words. This can be remedied by making all the fields unsigned short instead. Now, the Microsoft compiler does not need to add any pad bytes since six bytes is an integral number of two-byte words.[4] The other compilers also work correctly with this change. Figure 7.9 shows yet another definition that works for all three compilers. It avoids all but two of the bit fields by using unsigned char.
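Since the size of the structure depends on the compiler, one possible safeguard is to make the build fail when the size is not six bytes. The sketch below repeats the Figure 7.9 style definition and adds such a check. It assumes a C11 compiler (for static_assert) that accepts unsigned char bit fields, which is a common extension rather than standard C, and the helper SCSI_read_cmd_bytes is a made-up name.

```c
#include <assert.h>
#include <string.h>

/* Same fields as the unsigned char version of the structure. */
struct SCSI_read_cmd {
    unsigned char opcode;
    unsigned char lba_msb : 5;
    unsigned char logical_unit : 3;
    unsigned char lba_mid;          /* middle bits */
    unsigned char lba_lsb;
    unsigned char transfer_length;
    unsigned char control;
}
#if defined(__GNUC__)
    __attribute__((packed))
#endif
;

/* Refuse to compile if the compiler padded the structure. */
static_assert(sizeof(struct SCSI_read_cmd) == 6,
              "SCSI_read_cmd must be exactly six bytes");

/* Copy the command into a raw six-byte buffer, in memory order. */
void SCSI_read_cmd_bytes(const struct SCSI_read_cmd *cmd,
                         unsigned char out[6])
{
    memcpy(out, cmd, 6);
}
```

On a little endian target where the first bit field takes the low bits, out[1] ends up with the logical unit in bits 7 to 5 and the msb of the LBA in bits 4 to 0, as in Figure 7.6.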
[4] Mixing different types of bit fields leads to very confusing behavior! The reader is invited to experiment.

The reader should not be discouraged if he found the previous discussion confusing. It is confusing! The author often finds it less confusing to avoid bit fields altogether and use bit operations to examine and modify the bits manually.
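As an illustration of the manual approach, the six-byte read command of Figure 7.6 can be assembled with shifts and masks alone. This is only a sketch: the function names and parameters are hypothetical, and the 21-bit limit on the logical block address follows from the 5 + 8 + 8 bits available in the format.

```c
#include <assert.h>
#include <stdint.h>

/* Fill a six-byte SCSI read command using only bit operations. */
void scsi_read6_build(uint8_t cmd[6], unsigned logical_unit,
                      uint32_t lba, unsigned transfer_length,
                      unsigned control)
{
    assert(lba <= 0x1FFFFFu);                         /* LBA has 21 bits */
    cmd[0] = 0x08;                                    /* operation code  */
    cmd[1] = (uint8_t)(((logical_unit & 0x07u) << 5)  /* bits 7-5: unit  */
                     | ((lba >> 16) & 0x1Fu));        /* bits 4-0: msb   */
    cmd[2] = (uint8_t)((lba >> 8) & 0xFFu);           /* middle of LBA   */
    cmd[3] = (uint8_t)(lba & 0xFFu);                  /* lsb of LBA      */
    cmd[4] = (uint8_t)(transfer_length & 0xFFu);
    cmd[5] = (uint8_t)(control & 0xFFu);
}

/* Recover the logical block address from a command buffer. */
uint32_t scsi_read6_lba(const uint8_t cmd[6])
{
    return ((uint32_t)(cmd[1] & 0x1Fu) << 16)
         | ((uint32_t)cmd[2] << 8)
         |  (uint32_t)cmd[3];
}

/* Recover the logical unit number from a command buffer. */
unsigned scsi_read6_logical_unit(const uint8_t cmd[6])
{
    return (cmd[1] >> 5) & 0x07u;
}
```

Because every byte is written explicitly, the result does not depend on the compiler's bit field layout, its padding rules, or the endianness of the processor.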