Part II: Core CORBA
Chapter 4. The OMG Interface Definition Language
4.6 Basic IDL Types
IDL provides a number of built-in basic types, and they are shown in Table 4.1.
Table 4.1. IDL basic types.
Type            Range                                   Size
short           -2^15 to 2^15 - 1                       ≥ 16 bits
long            -2^31 to 2^31 - 1                       ≥ 32 bits
unsigned short  0 to 2^16 - 1                           ≥ 16 bits
unsigned long   0 to 2^32 - 1                           ≥ 32 bits
float           IEEE single-precision                   ≥ 32 bits
double          IEEE double-precision                   ≥ 64 bits
char            ISO Latin-1                             ≥ 8 bits
string          ISO Latin-1, except ASCII NUL           Variable-length
boolean         TRUE or FALSE                           Unspecified
octet           0–255                                   ≥ 8 bits
any             Run-time identifiable arbitrary type    Variable-length
The CORBA specification requires that language mappings preserve the size of these types as shown. The value ranges shown in Table 4.1 need not be maintained by all language mappings, but CORBA requires implementations to document any deviations from the specified ranges. (The C++ mapping preserves all value ranges.)
These requirements may sound confusing. For example, when you look at the size requirements, you will find that IDL specifies only a lower bound instead of an exact size. The reason is that some CPU architectures do not have, for example, an 8-bit character type or a 16-bit integer type; on such CPUs, these types are mapped to a type larger than 8 or 16 bits. Similarly, some language mappings cannot preserve the full range of all types; for example, Java does not have unsigned integers and maps both unsigned long and long to Java int. To avoid restricting the possible target environments and languages, the CORBA specification leaves the size and range requirements for IDL basic types loose.
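The lower-bound nature of the size requirements can be illustrated with a short C++ sketch. The namespace idl_sketch and its typedefs are hypothetical names chosen for this example and are not taken from any ORB's headers; the sketch assumes only a C++11 compiler. It checks at compile time that each stand-in type is at least as wide as the minimum that IDL requires, without insisting on an exact width.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the types a C++ mapping might choose.
// The "least" typedefs exist on every conforming platform, even one
// without an exact 16-bit or 32-bit integer type.
namespace idl_sketch {
    typedef std::int_least16_t  Short;   // IDL short:          >= 16 bits
    typedef std::int_least32_t  Long;    // IDL long:           >= 32 bits
    typedef std::uint_least16_t UShort;  // IDL unsigned short: >= 16 bits
    typedef std::uint_least32_t ULong;   // IDL unsigned long:  >= 32 bits
    typedef unsigned char       Octet;   // IDL octet:          >= 8 bits
}

// Width in bits of type T on the platform compiling this file.
template <typename T>
constexpr std::size_t bits_of() { return sizeof(T) * CHAR_BIT; }

// Each check states a lower bound, not an exact size.
static_assert(bits_of<idl_sketch::Short>()  >= 16, "short too narrow");
static_assert(bits_of<idl_sketch::Long>()   >= 32, "long too narrow");
static_assert(bits_of<idl_sketch::UShort>() >= 16, "unsigned short too narrow");
static_assert(bits_of<idl_sketch::ULong>()  >= 32, "unsigned long too narrow");
static_assert(bits_of<idl_sketch::Octet>()  >= 8,  "octet too narrow");
```

On a CPU whose smallest addressable unit is wider than 8 bits, the same checks would still pass, because the stand-in types are allowed to be larger than the stated minimum.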
All the basic types (except octet) are subject to changes in representation as they are transmitted between clients and servers. For example, a long value undergoes byte swapping when sent from a big-endian to a little-endian machine. Similarly, characters undergo translation in representation if they are sent from an EBCDIC to an ASCII implementation. What happens if a character does not have a precise match in the target character set is implementation-dependent. For example, the EBCDIC character ¬ does not have an ASCII equivalent. An ORB might translate EBCDIC ¬ into ASCII ~, or it might raise a DATA_CONVERSION exception (see Section 4.10) to indicate that translation is impossible. Characters may also change in size (not all architectures use 8-bit characters). However, these changes are transparent to the programmer and do exactly what is required.
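To make the byte-swapping case concrete, the following C++ sketch reverses the byte order of a 32-bit value, which is the kind of transformation applied to a long when the two machines disagree on endianness. The function names swap_bytes32 and demo_swap are hypothetical and chosen for this example; an ORB performs the equivalent work internally during marshaling, and application code never needs to call anything like this.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the order of the four bytes in a 32-bit value.
// An octet, being a single byte, never needs this treatment.
inline std::uint32_t swap_bytes32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

// Small usage example: swapping twice restores the original value.
inline void demo_swap() {
    const std::uint32_t original = 0x11223344u;
    const std::uint32_t swapped  = swap_bytes32(original);
    assert(swapped == 0x44332211u);
    assert(swap_bytes32(swapped) == original);
}
```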
Table 4.1 does not include a pointer type. There are a number of good reasons for this. Pointer types are used much less in object-oriented programming than in non-OO languages. Some implementation languages (such as COBOL and Java) do not support pointers. Pointers would complicate the implementation of marshaling for ORB vendors and would incur additional run-time costs.
As you will see in Section 4.8.2, the lack of pointers is no great hardship. IDL uses object references to achieve what in a non-OO environment would normally be done with a pointer. In effect, object references are pointers. However, object references can denote only objects; they cannot point to data. IDL supports recursive data types, such as trees, without introducing a data pointer type (see Section 4.7.8).
CORBA recently extended IDL to support additional numeric and character types. Because many ORBs do not yet provide these types, we cover them separately in Section 4.21.
4.6.1 Integer Types
IDL does not have a type int, so there are no guessing games as to its range. An IDL short is mapped to at least a 2-byte type, and IDL long is mapped to at least a 4-byte type.
Some languages (notably Java) do not support unsigned types. Because of this, unsigned short and unsigned long map to Java short and int, respectively. This means that a Java programmer must ensure that large unsigned IDL values are treated correctly when represented as Java signed values.
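The hazard can be sketched in C++ (used here to stay consistent with the rest of the book, even though the problem arises in Java). The helper names as_signed_bits and unsigned_value_of are hypothetical and chosen for this example. The first function shows what a large unsigned long looks like when its 32 bits are read as a signed value; the second recovers the intended magnitude by widening to 64 bits and masking, which is the usual remedy in a language without unsigned types.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret the 32 bits of an unsigned value as a signed value.
// memcpy copies the bit pattern unchanged.
inline std::int32_t as_signed_bits(std::uint32_t u) {
    std::int32_t s;
    std::memcpy(&s, &u, sizeof s);
    return s;
}

// Recover the intended unsigned magnitude from the signed representation
// by widening to 64 bits and keeping only the low 32 bits.
inline std::int64_t unsigned_value_of(std::int32_t s) {
    return static_cast<std::int64_t>(s) & 0xFFFFFFFFLL;
}

// Small usage example: 4,000,000,000 exceeds the signed 32-bit range.
inline void demo_unsigned() {
    const std::int32_t seen = as_signed_bits(4000000000u);
    assert(seen < 0);                                  // looks negative
    assert(unsigned_value_of(seen) == 4000000000LL);   // magnitude recovered
}
```

Values that fit in the signed range pass through both functions unchanged, so only the upper half of the unsigned range needs this care.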
4.6.2 Floating-Point Types
The IDL types float and double follow the IEEE specification for single- and double-precision floating-point representation [7]. If an implementation cannot support IEEE format floating-point values, it must document how it deviates from the IEEE specification.
4.6.3 Characters
IDL characters support the ISO Latin-1 character set [8], which is a superset of ASCII. The bottom 128 character positions (0–127) are identical to ASCII. The top 128 character positions (128–255) are taken up by characters such as Å, ß, and Ç. This arrangement allows most European languages to be used with an 8-bit character set. Recently, IDL was extended to support wide characters and strings. This permits use of arbitrary wide character sets, such as Unicode.
4.6.4 Strings
IDL strings support the ISO Latin-1 character set with the exception of ASCII NUL (0). Disallowing NUL inside IDL strings is a concession to C and C++; the notion of NUL-terminated strings is so deeply ingrained in C and C++ that allowing embedded NUL characters would make the use of IDL strings impossibly difficult in these languages. IDL strings can be bounded or unbounded. An unbounded string has the IDL type string and can grow to any length. A bounded string type specifies an upper limit on the length of the string. For example, string<10> is a string type that permits only strings of up to ten characters.
The bound of a string does not include any terminating NUL character, so the string "Hello" will fit into a string of type string<5>. (Many programming languages do not represent strings as NUL-terminated arrays, so the concept of NUL termination does not apply to IDL.)
Most C and C++ ORB implementations ignore bounded strings and treat them as if they were unbounded. This limitation arises because C and C++ do not support bounded strings natively, and emulating bounded string support would result in awkward language mappings. As a C++ programmer, you are made responsible for enforcing the bound at run time.
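Enforcing a bound by hand can be as simple as a length check before the string is handed to the ORB. The following is a minimal sketch, assuming the application keeps its strings in std::string; the helpers fits_bound and require_bound are hypothetical names for this example and are not part of the C++ mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// True if s would fit into an IDL string<bound>. The bound counts
// characters only; no terminating NUL is included.
inline bool fits_bound(const std::string& s, std::size_t bound) {
    return s.size() <= bound;
}

// Return s unchanged if it respects the bound; otherwise throw.
inline std::string require_bound(const std::string& s, std::size_t bound) {
    if (!fits_bound(s, bound))
        throw std::length_error("string exceeds IDL bound");
    return s;
}

// Small usage example: "Hello" has five characters and fits string<5>.
inline void demo_bound() {
    assert(fits_bound("Hello", 5));
    assert(!fits_bound("Hello!", 5));
}
```

Throwing an exception is only one possible policy; an application might equally truncate the string or report the error in some other way.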
4.6.5 Booleans
Boolean values can have only the values TRUE and FALSE. IDL makes no requirement as to how these values are represented in particular languages, nor does it specify the size of a Boolean value.
4.6.6 Octets
The IDL type octet is an 8-bit type that is guaranteed not to undergo any changes in representation as it is transmitted between address spaces. This guarantee permits exchange of binary data so that it is not tampered with in transit. All other IDL types are subject to changes in representation during transmission.
4.6.7 Type any
Type any is a universal container type. A value of type any can hold a value of any other IDL type, such as long or string, or even another value of type any. Type any can also hold object references or user-defined complex types, such as arrays or structures.
Type any is useful when you do not know at compile time what IDL types you will eventually need to transmit between client and server. Type any is IDL's equivalent of what in C++ is typically achieved with a void * or a stdarg variable argument list. However, type any is substantially safer because it is self-describing (you can find out at run time what type of value is contained in an any). Manipulation of values of type any is type-safe; attempts to, for example, extract a float as a string return an error indication. As a result, careless misinterpretation of a value as the wrong type is much less likely than it is with the completely type-unsafe mechanism of using a void *. We look at type any and its C++ mapping in detail in Chapter 15.
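The difference between a self-describing container and a void * can be sketched with a drastically simplified stand-in. The class MiniAny below is hypothetical: it is not the C++ mapping for type any, it handles only two types, and its member names are invented for this example. It exists only to show the idea that the container records what it holds, so that an extraction as the wrong type fails with an error indication instead of silently misinterpreting the data.

```cpp
#include <cassert>
#include <string>

// Hypothetical two-type container that remembers what it holds.
class MiniAny {
public:
    enum Kind { Empty, Float, String };

    MiniAny() : kind_(Empty), f_(0.0f) {}
    explicit MiniAny(float f) : kind_(Float), f_(f) {}
    explicit MiniAny(const std::string& s) : kind_(String), f_(0.0f), s_(s) {}

    // Run-time query: which type of value is stored?
    Kind kind() const { return kind_; }

    // Each extraction succeeds only if the stored type matches;
    // on a mismatch it returns false and leaves out untouched.
    bool extract(float& out) const {
        if (kind_ != Float) return false;
        out = f_;
        return true;
    }
    bool extract(std::string& out) const {
        if (kind_ != String) return false;
        out = s_;
        return true;
    }

private:
    Kind        kind_;
    float       f_;
    std::string s_;
};

// Small usage example: a float cannot be extracted as a string.
inline void demo_any() {
    MiniAny a(3.5f);
    std::string s;
    float f = 0.0f;
    assert(!a.extract(s));   // wrong type: error indication
    assert(a.extract(f));    // right type: succeeds
}
```

With a void *, the equivalent of the failed extraction would compile and run, and the caller would simply read garbage.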