Chapter 4 Programming Techniques and Caveats
4.2 Managing Module Data
When you first start programming, you learn about the scope of data. Typically (in C and most other lexically scoped languages), a variable declared within a function or block remains in scope until the end of the function or block, but thereafter is undefined. Variables may also have global scope and remain defined throughout the program. Of course, in terms of simple C programming, variables in Apache follow these rules.
4.2.1 Configuration Vectors
Apache modules are based on callbacks. C does not provide a mechanism to share data across two or more separate callback functions, other than global scope, which is not appropriate in a multithreaded environment. Apache provides an alternative means of managing data: the configuration vector (ap_conf_vector_t). The primary purpose of such vectors is, as the name suggests, to hold configuration data, but they also serve a more general purpose.
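To make the idea concrete, here is a minimal stand-alone model of a configuration vector in plain C. It is not the Apache implementation: the names toy_module, toy_vector, toy_set_config, and toy_get_config are hypothetical, and the fixed slot count is an assumption made for illustration. The sketch shows only the essential idea, that each module owns one opaque pointer slot in the vector, so separate callbacks can share data without resorting to global variables.

```c
#include <stddef.h>

#define TOY_MAX_MODULES 8   /* assumed limit, for illustration only */

/* Hypothetical stand-in for a module descriptor: each module has a unique index. */
typedef struct {
    const char *name;
    int index;
} toy_module;

/* Hypothetical stand-in for a configuration vector: one opaque slot per module. */
typedef struct {
    void *slots[TOY_MAX_MODULES];
} toy_vector;

/* Store a module's private data in that module's own slot. */
static void toy_set_config(toy_vector *v, const toy_module *m, void *data)
{
    v->slots[m->index] = data;
}

/* Fetch a module's private data; NULL if the module has stored nothing yet. */
static void *toy_get_config(const toy_vector *v, const toy_module *m)
{
    return v->slots[m->index];
}
```

Because the slot holds a void pointer, each module is free to define whatever struct it needs and store a pointer to it; other modules using the same vector never see or disturb that data.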
4.2.2 Lifetime Scopes
The Apache architecture naturally defines a different kind of scope for data: the core objects of process, server, connection, and request. Most data are naturally associated with one of these objects (or some subobject such as a filter). The Apache configuration vectors together with APR pools provide a natural framework for tying module data to an appropriate object. This deals nicely with two problems:
1. Using an appropriate configuration vector deals with the scoping issue, making data available wherever they are required.
2. Using an appropriate pool deals with the lifetime of resources, ensuring that they are properly cleaned up after use.
These techniques give us three simple and useful associations: variables and data can be associated with the server, the connection, or the request objects.
4.2.2.1 Configuration Data
Configuration data (Chapter 9) are set at server start-up, but can be accessed later by looking them up on the configuration vectors from request_rec or server_rec:
    svr_cfg* my_svr_cfg =
        ap_get_module_config(server->module_config, &my_module);
    dir_cfg* my_dir_cfg =
        ap_get_module_config(request->per_dir_config, &my_module);
When the server is running, configuration data should be treated as strictly read-only. Any changes will affect not only the current request, but also any other requests running concurrently or later in the same process.
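One way to make the read-only rule harder to break by accident is to hold configuration through a pointer-to-const at request time. The following is a stand-alone sketch, not module code: the fields of svr_cfg and the functions get_svr_cfg and effective_limit are hypothetical, and get_svr_cfg merely stands in for a lookup on the configuration vector.

```c
#include <stddef.h>

/* Hypothetical server configuration; the fields are assumptions for illustration. */
typedef struct {
    int max_items;
    const char *label;
} svr_cfg;

/* Stand-in for a lookup on the configuration vector. It returns a
 * pointer-to-const, so callers cannot modify the shared configuration
 * without an explicit cast. */
static const svr_cfg *get_svr_cfg(const void *slot)
{
    return (const svr_cfg *)slot;
}

/* A request-time function reads the configuration and copies the value it
 * needs to adjust into a local variable instead of writing to the shared
 * struct. */
static int effective_limit(const svr_cfg *cfg, int requested)
{
    int limit = cfg->max_items;      /* local copy, safe to change */
    if (requested > 0 && requested < limit) {
        limit = requested;
    }
    return limit;
}
```

In a real module the same effect can be had by assigning the result of ap_get_module_config to a const-qualified pointer. Any per-request adjustment then lives in a local variable or in request-scoped data, and the compiler flags accidental writes to the shared struct.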
4.2.2.2 Request Data
Apart from the configuration, the most common nontrivial case we have to deal with is where data need to be created in the course of processing a request, but scoped over more than one hook. Apache provides a pool and a configuration vector that are explicitly intended to enable modules to give variables the scope and lifetime of a request:
    static int my_early_hook(request_rec* r) {
        req_cfg* my_req;
        ...
        my_req = apr_palloc(r->pool, sizeof(req_cfg));
        ap_set_module_config(r->request_config, &my_module, my_req);
        /* Set the data fields of my_req as required */
    }
    static int my_later_hook(request_rec* r) {
        req_cfg* my_req = ap_get_module_config(r->request_config, &my_module);
        /* Now we have all the data and we can do what we want with it */
    }
And if we have a hook where the req_cfg may or may not be already set:
    static int my_other_hook(request_rec* r) {
        req_cfg* my_req;
        ...
        my_req = ap_get_module_config(r->request_config, &my_module);
        if (my_req == NULL) {
            /* It hasn't been set yet */
            my_req = apr_palloc(r->pool, sizeof(req_cfg));
            ap_set_module_config(r->request_config, &my_module, my_req);
            /* Set the data fields of my_req as required */
        }
        /* Now we have my_req, whether or not it was already set */
    }
The lesson here is to get into the habit of using the request configuration vector whenever we have data that need to be scoped over more than one hook. The configuration struct itself is, of course, completely defined by the module, and it contains exactly what the module needs it to contain. If the module is complex and has multiple different hooks, each of which needs to set variables for later use, the different data should be combined in the configuration vector, for example by giving each function its own substructure.
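The substructure idea, combined with the "get it or create it" pattern from the last listing, might look like the following stand-alone sketch. The names auth_state, filter_state, and get_or_create_req_cfg, and all field names, are hypothetical. The void pointer slot stands in for the module's entry in r->request_config, and calloc stands in for allocation from r->pool purely so that the sketch compiles without the Apache headers.

```c
#include <stdlib.h>

/* Hypothetical per-hook substructures; the field names are assumptions. */
typedef struct { int authenticated; } auth_state;
typedef struct { long bytes_seen; } filter_state;

/* One request-scoped struct for the whole module, one member per hook. */
typedef struct {
    auth_state   auth;
    filter_state filter;
} req_cfg;

/* Return the module's request data, creating it on first use.
 * In a real module the lookup would be ap_get_module_config() on
 * r->request_config and the memory would come from r->pool. */
static req_cfg *get_or_create_req_cfg(void **slot)
{
    req_cfg *cfg = *slot;
    if (cfg == NULL) {
        cfg = calloc(1, sizeof(*cfg));   /* zero-filled, as apr_pcalloc would be */
        *slot = cfg;                     /* stays NULL if allocation failed */
    }
    return cfg;
}
```

Every hook can call the same helper and then touch only its own member, so no hook depends on another having run first.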
Note the standard use of the request pool to allocate the request configuration struct. The struct, therefore, will be freed at the end of the request, which is exactly what we want. Any data members that involve dynamic resource allocation should similarly use the request pool or register a cleanup on it, as discussed in Chapter 3 and illustrated in examples throughout this book. The request pool and request configuration solve the problem of resource management in request processing.
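The mechanics of registering a cleanup can be pictured with a small stand-alone model. This is not APR: toy_pool, toy_cleanup_register, toy_pool_destroy, and release_flag are hypothetical names, and the model tracks only cleanup callbacks, not memory. In a real module the corresponding call is apr_pool_cleanup_register on r->pool.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a pool: it only tracks cleanup callbacks. */
typedef struct toy_cleanup {
    void (*fn)(void *);
    void *data;
    struct toy_cleanup *next;
} toy_cleanup;

typedef struct {
    toy_cleanup *cleanups;
} toy_pool;

/* Register a callback to run when the pool is destroyed.
 * Returns 0 on success, -1 if memory could not be allocated. */
static int toy_cleanup_register(toy_pool *p, void *data, void (*fn)(void *))
{
    toy_cleanup *c = malloc(sizeof(*c));
    if (c == NULL) {
        return -1;
    }
    c->fn = fn;
    c->data = data;
    c->next = p->cleanups;   /* push on the front: last registered runs first */
    p->cleanups = c;
    return 0;
}

/* Destroy the pool: run every cleanup, most recently registered first. */
static void toy_pool_destroy(toy_pool *p)
{
    while (p->cleanups != NULL) {
        toy_cleanup *c = p->cleanups;
        p->cleanups = c->next;
        c->fn(c->data);
        free(c);
    }
}

/* Example cleanup: mark a resource as released. */
static void release_flag(void *data)
{
    *(int *)data = 0;
}
```

The point of the model is that the code acquiring a resource registers its release at the same moment, and nothing later in request processing has to remember to free it.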
4.2.2.3 Connection Data
The connection is the other transient core object in Apache. It, too, presents a pool and a configuration vector for management of connection data. Use of the connection configuration and pool is exactly analogous to their use with the request.
4.2.2.4 Persistent Data
A more complex case arises where a module needs to manage persistent but nonconstant data. Such data may be held on the server_rec object (separate from any configuration data fields), or even given global scope. In either case, thread-safety becomes an issue, and we need to use a mutex for any critical operations. We usually also need to define a pool for our module, as we should normally only use the process pool at server startup. The mutex and the pool will have the same scope and lifetime as the variable data. We’ll discuss this in detail below.
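As a first picture of the mutex requirement, the sketch below keeps a lock inside the same struct as the data it protects, so the two share scope and lifetime. It is a stand-alone illustration using POSIX threads rather than the APR mutex API, and shared_counter and its functions are hypothetical names.

```c
#include <pthread.h>

/* Hypothetical persistent, nonconstant module data: a hit counter. */
typedef struct {
    pthread_mutex_t lock;   /* same scope and lifetime as the data it guards */
    unsigned long hits;
} shared_counter;

/* Initialise the counter and its lock; returns 0 on success. */
static int shared_counter_init(shared_counter *c)
{
    c->hits = 0;
    return pthread_mutex_init(&c->lock, NULL);
}

/* Critical operation: update the shared value while holding the lock. */
static unsigned long shared_counter_increment(shared_counter *c)
{
    unsigned long value;
    pthread_mutex_lock(&c->lock);
    value = ++c->hits;
    pthread_mutex_unlock(&c->lock);
    return value;
}

/* Release the lock when the data itself goes away. */
static void shared_counter_destroy(shared_counter *c)
{
    pthread_mutex_destroy(&c->lock);
}
```

In a module, the destroy step would typically be tied to the module's own pool as a cleanup, so the mutex is released at the same point as the data.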