Additionally, the compiler may use this information to perform useful optimizations.
9.3 Data Types in Programming Languages
9.4.1 Parametric Polymorphism
Can we make a function which will behave differently for different types of arguments based on an explicitly given type? We can do it to some extent, even in C89. However, we will need some rather heavy macro machinery in order to achieve a smooth result.
First, we have to know what the # operator does in a macro context. Inside the definition of a function-like macro, # applied to a parameter turns the corresponding argument into a string literal; this is called stringizing. Listing 9-54 shows an example.
Listing 9-54. macro_str.c
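#define mystr hello
#define str(x) #x          /* # turns its argument into a string literal */
#define xstr(x) str(x)     /* extra level: the argument is macro-expanded first */
puts( str(mystr) );        /* will be replaced with `puts("mystr")` */
puts( xstr(mystr) );       /* will be replaced with `puts("hello")` */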
The ## operator is even more interesting: it pastes two tokens together, which allows us to form identifiers dynamically. Listing 9-55 shows an example.
Listing 9-55. macro_concat.c
#define x1 "Hello"
#define x2 " World"
#define str(i) x##i
puts( str(1) ); /* str(1) -> x1 -> "Hello" */
puts( str(2) ); /* str(2) -> x2 -> " World" */
Some higher-level language features boil down to the compiler analyzing the program and then calling one function or another, using one data structure or another, and so on. In C we can imitate this by relying on the preprocessor.
Listing 9-56 shows an example.
bool pair_##T##_any(struct pair(T) pair, bool (*predicate)(T)) {\
    return predicate(pair.fst) || predicate(pair.snd); \
}
printf("%d\n", any(int)(obj, is_positive) );
return 0;
}
First, we include the stdbool.h header to get access to the bool type, as discussed in section 9.1.3.
• pair(T) is a macro which, when invoked as pair(int), is replaced by the identifier pair_int.
• DEFINE_PAIR is a macro which, when invoked as DEFINE_PAIR(int), is replaced by the code shown in Listing 9-57.
Notice the backslashes at the end of each line: they escape the newline character, so the macro definition can span multiple lines. The last line of the macro does not end with a backslash.
This code defines a new structure type called struct pair_int, which contains two integers as fields. Had we instantiated this macro with an argument other than int, we would have obtained a pair of elements of a different type.
Then a function is defined whose name is specific to each macro instantiation, since the type argument T is pasted into it. In our case it is pair_int_any, whose purpose is to check whether either of the two elements in the pair satisfies a condition. It accepts the pair itself as the first argument and the condition as the second. The condition is a pointer to a function accepting T and returning bool, that is, a predicate, as the parameter name suggests.
pair_int_any applies the condition function to the first element and then to the second. Because || short-circuits, the second call happens only if the first one returns false.
Each use of DEFINE_PAIR defines the structure holding two elements of a given type, together with the functions that work with it. These definitions may appear only once per type (defining the same structure or function twice is an error), yet we need them for every type we work with, so DEFINE_PAIR must be instantiated exactly once for each such type.
Listing 9-57. macro_define_pair.c
struct pair_int {
    int fst;
    int snd;
};
bool pair_int_any(struct pair_int pair, bool (*predicate)(int)) {
    return predicate(pair.fst) || predicate(pair.snd);
}
• Then the macro any(T) is defined as pair_##T##_any. Its sole purpose is to form a valid function name for a given type. It lets us call pair_int_any in a rather elegant way, any(int)(obj, is_positive), as if any(int) were a function call returning a pointer to a function.
So, syntactically, we came very close to the concept of parametric polymorphism: we provide an additional argument (int) which determines the type of the other argument (struct pair_int). Of course, this is not as good as type arguments in functional languages or even generic type parameters in C# or Scala, but it is something.
9.4.2 Inclusion
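Inclusion polymorphism is fairly easy to achieve in C for pointer types. The idea relies on a guarantee of the C standard: there is no padding at the beginning of a structure, so the address of every struct instance is the same as the address of its first member.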
Take a look at the example shown in Listing 9-58.
Listing 9-58. c_inclusion.c
#include <stdio.h>
struct parent {
    const char* field_parent;
};
struct child {
    struct parent base;
    const char* field_child;
};
void parent_print( struct parent* this ) {
    printf( "%s\n", this->field_parent );
}
int main( int argc, char** argv ) {
    struct child c;
    c.base.field_parent = "parent";
    c.field_child = "child";
    parent_print( (struct parent*) &c );
    return 0;
}
The function parent_print accepts an argument of type struct parent*. As the definition of struct child shows, its first field has type struct parent. So, for every valid pointer of type struct child*, there exists a pointer to an instance of struct parent (its base field) with the same address. Thus it is safe to pass a pointer to a child where a pointer to the parent is expected.
The type system, however, is not aware of this, so you have to convert the struct child* pointer to struct parent* explicitly, as in the call parent_print( (struct parent*) &c );. In this case we could cast to void* instead of struct parent*, because any object pointer can be converted to void*, and void* is implicitly converted back to any object pointer type (see section 9.1.5).
9.4.3 Overloading
Automated overloading was not possible in C until C11. Before that, programmers included argument type names in function names to provide different “overloadings” of some base name; the standard library itself has abs, labs, and fabs. C11 introduced a special construct that selects an expression based on the type of its argument: the generic selection, written with the _Generic keyword. It has a wide range of usages.
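A _Generic expression accepts a controlling expression E, followed by a comma-separated list of association clauses. Each clause has the form type-name: expression, and at most one clause may use default instead of a type name. The type of E is compared against every type in the association list, and the expression to the right of the colon in the matching clause becomes the result; if no type matches, the default clause is used. E itself is never evaluated: only its type matters.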
In the example shown in Listing 9-59, we are going to define a macro print_fmt, which chooses an appropriate printf format specifier based on the argument type, and a macro print, which forms a valid call to printf and then outputs a newline.
print_fmt matches the type of the expression x against two types: int and double. If the type of x is not in this list, the default association is chosen, which yields the %x specifier. This is less generic than it looks: %x expects an unsigned int, so using it with, say, a long double argument is undefined behavior. Without the default association, on the other hand, the program would simply fail to compile if print_fmt were given an expression of type long double. So in this case it would probably be wise to omit the default association, forcing compilation to abort when we don’t really know what to do.
Listing 9-59. c_overload_11.c
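#include <stdio.h>
#define print_fmt(x) (_Generic( (x), int: "%d", double: "%f", default: "%x" ))
#define print(x) do { printf( print_fmt(x), x ); puts(""); } while (0) /* do/while: safe inside if/else */
int main(void) { print(42); print(42.42); return 0; } /* prints 42, then 42.420000 */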
We can use _Generic to write a macro that will wrap a function call and select one of differently named functions based on an argument type.
9.4.4 Coercions
C has several coercions built into the language itself: essentially, the implicit pointer conversions to void* and back, and the integer conversions described in section 9.1.4. To our knowledge, there is no way to add user-defined coercions or anything even remotely similar to Scala’s implicit functions or C++ implicit conversions.
As you can see, C supports all four kinds of polymorphism in some form.
9.5 Summary
In this chapter we have made an extensive study of the C type system: arrays, pointers, and constant types. We learned to make simple function pointers, saw the caveats of sizeof, revisited strings, and began adopting better coding practices. Then we learned about structures, unions, and enumerations. At the end we talked briefly about type systems in mainstream programming languages and about polymorphism, and provided some advanced code samples demonstrating how to achieve similar results in plain C. In the next chapter we are going to take a closer look at ways of organizing code into a project and at the language properties that matter in this context.