
4.2 Measuring Policy Overhead

4.2.2 Computational Overhead

Motivation

For each transaction that modifies an artifact, we require that the new state configuration be calculated and then compared against the old state configuration(s) to ensure that all specified constraints are met. We anticipate that in many situations where the conditions defining business-level policy states share common variables, we will not have to incur the cost of checking every state condition independently. For example, consider an invoice workflow with two states, paid and unpaid, defined by the conditions “paid = true” and “paid = false”, respectively. Observe that we need to check only one of these conditions to determine membership in both states.
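To make the invoice example concrete, the minimal Python sketch below contrasts evaluating each state condition independently with evaluating the shared variable once. The function names and the dictionary row representation are hypothetical, and the sketch assumes `paid` is a non-null Boolean, so that the two conditions are exact complements (with SQL NULLs, neither state would hold).

```python
# Hypothetical invoice row: {"paid": bool}; assumes "paid" is never NULL/None.

def naive_configuration(invoice):
    """Evaluate every state condition independently: two checks."""
    checks = 0
    config = set()
    checks += 1
    if invoice["paid"]:          # condition of state "paid": paid = true
        config.add("paid")
    checks += 1
    if not invoice["paid"]:      # condition of state "unpaid": paid = false
        config.add("unpaid")
    return config, checks


def shared_configuration(invoice):
    """Evaluate the shared variable once and derive both states from it."""
    checks = 1
    is_paid = bool(invoice["paid"])
    # "paid = false" is the complement of "paid = true" for a non-null Boolean,
    # so one test decides membership in both states.
    return ({"paid"} if is_paid else {"unpaid"}), checks
```

Both functions yield the same configuration; only the number of checks differs (two versus one).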

In business situations where the policy model contains a large number of states, many of them are likely to share the same variables, so checking whether an object belongs to several states may be accomplished much more quickly than testing each condition independently. Similarly, while the state configuration is being computed, we can simultaneously check whether a particular constraint is violated and prune the space of possible constraint violations dynamically. These optimizations aside (they are discussed further in Section 4.3), in our tests we took a pessimistic stance by assuming that there are no avenues of optimization available for the database engine/query compiler to exploit.
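The pruning idea can be sketched as follows, under clearly hypothetical assumptions: states and constraints are plain Python callables, each constraint is listed with the set of states it depends on, and a constraint is checked as soon as all of those states have been decided. A violation ends the computation early, so the remaining state conditions are never evaluated. The policy, the constraint names, and `check_update` are invented for illustration and are not meant to reproduce the optimizations of Section 4.3.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Row = Dict[str, object]
Config = Set[str]

# Hypothetical invoice policy: each state is defined by one condition on the updated row.
STATES: List[Tuple[str, Callable[[Row], bool]]] = [
    ("paid", lambda r: bool(r["paid"])),
    ("unpaid", lambda r: not r["paid"]),
    ("large", lambda r: r["amount"] >= 250),
    ("overnight", lambda r: r["shipinstruct"] == "overnight"),
]

# Hypothetical constraints: (name, states the check depends on, check(old_config, new_config)).
CONSTRAINTS: List[Tuple[str, Set[str], Callable[[Config, Config], bool]]] = [
    ("paid invoices stay paid", {"paid"},
     lambda old, new: not ("paid" in old and "paid" not in new)),
    ("large orders ship overnight", {"large", "overnight"},
     lambda old, new: "large" not in new or "overnight" in new),
]


def check_update(old_config: Config, row: Row) -> Tuple[Optional[str], int]:
    """Compute the new state configuration, checking each constraint as soon as
    every state it depends on is decided. Returns (first violated constraint or
    None, number of state conditions evaluated)."""
    decided: Config = set()
    new_config: Config = set()
    pending = list(CONSTRAINTS)
    evaluated = 0
    for name, condition in STATES:
        evaluated += 1
        decided.add(name)
        if condition(row):
            new_config.add(name)
        still_pending = []
        for cname, needs, ok in pending:
            if not needs <= decided:
                still_pending.append((cname, needs, ok))  # not decidable yet
            elif not ok(old_config, new_config):
                return cname, evaluated  # prune: skip the remaining conditions
        pending = still_pending
    return None, evaluated
```

For example, moving a paid invoice back to unpaid is reported after evaluating only the first of the four conditions.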

Test Setup

We conducted tests to measure the computational overhead of dealing with varying numbers of constraints in a policy model. We noted earlier that database systems provide many mechanisms that can be used to monitor the implications of an update, such as check constraints, triggers, and constraints on materialized views. Our tests were designed to explore the costs of repeatedly calculating the state configuration, and our objective in this section is to provide a reference for comparing the practical computational costs of two of these possible techniques as the number of assertions specified in first-order temporal logic varies.

Figure 4.2: The time a SQL Server trigger takes to complete simple in-memory if-then style checks grows linearly with the number of checks, with an extremely high overhead per check. These checks quickly become the bottleneck in trigger processing. A C++ procedure, however, behaves very predictably: performing exactly the same in-memory checks has no discernible effect on the total cost of the transaction.

Once again we used the number of states (one condition per state) as a proxy for the amount and degree of workflow complexity: 0 states (no computational overhead, to measure baseline costs), 128 states, 512 states, and 1024 states. Under this test setup, adding a state essentially means incurring the cost of one additional synthetically designed check to determine whether an object belongs to that state. These checks were designed to simulate traditional business-level string and arithmetic comparisons (such as “status = paid”, “shipcode = M”, “amount ≥ 250”, and “shipinstruct = overnight”) that a system would be expected to perform to determine an object’s membership in each state. The checks were performed sequentially, and the result for one state provided no information about whether the object would (or would not) be in any other state. Thus, there was no scope for optimization.
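The synthetic workload can be pictured with the Python sketch below. The template list, the row layout, and the function names are illustrative assumptions rather than the code used in the experiments; the point is that each added state costs exactly one more comparison. Because the four templates repeat, a smarter evaluator could share work between identical conditions; this one deliberately re-tests every condition to stay within the pessimistic setting described above.

```python
import operator

# Hypothetical comparison templates modelled on the checks named in the text.
TEMPLATES = [
    ("status", operator.eq, "paid"),
    ("shipcode", operator.eq, "M"),
    ("amount", operator.ge, 250),
    ("shipinstruct", operator.eq, "overnight"),
]

BENCHMARK_SIZES = (0, 128, 512, 1024)  # numbers of states used in the tests


def make_states(n):
    """Build n synthetic states, one condition each, cycling through the templates."""
    return [(f"s{i}",) + TEMPLATES[i % len(TEMPLATES)] for i in range(n)]


def state_configuration(row, states):
    """Test every condition in sequence with nothing cached or shared between
    states; return the states the row belongs to and the number of checks made."""
    config = set()
    checks = 0
    for name, attr, op, const in states:
        checks += 1
        if op(row[attr], const):
            config.add(name)
    return config, checks
```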

Results

Our hope was that we would quickly be able to demonstrate what seemed an obvious fact about calculating the state configuration history: determining whether an object belongs to a particular state through a simple check (an SQL IF-condition), when all the data needed to do so is already in memory, should add no meaningful transactional cost. After all, simple one-line statements (“status = paid” or “amount ≥ 5”) on memory-resident data, even when repeated thousands of times, should not be comparable in cost to disk writes, which are several orders of magnitude slower than memory/CPU operations and dominate database transaction costs.

Our results (depicted in Figure 4.2) show that, contrary to expectations, SQL-based triggers, as implemented in Microsoft SQL Server 2008, are extremely inefficient at performing even the most basic operations on primitive data types. We observed a linear relationship between execution time and the number of synthetically designed checks (essentially SQL IF-statements comparing attributes against static values). At 1024 states, Microsoft SQL Server took upwards of ten seconds to complete one invocation of such a sequence of checks.

Our attempts to identify the cause of this significant slowdown were hampered by the lack of instrumentation SQL Server provides for analyzing trigger execution. Note that at 1024 checks the source code of a typical trigger was around 100 KB, and we were able to verify that SQL Server was not recompiling the trigger at every invocation. We were also able to confirm that the delay was purely in the execution of the compiled code (i.e., not waiting on disk or other resources). We conclude that, for reasons we could not determine, the compiled code generated for trigger processing was far less efficient than expected. These results were not unique to Microsoft SQL Server: we observed similarly poor execution performance of SQL code when using IBM’s DB2 database server.
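A back-of-the-envelope reading of these numbers, assuming the linear model suggested by Figure 4.2 and a baseline time $T_0$ that is small next to ten seconds (an assumption, since the baseline is not stated here), gives the approximate cost and source size per check:

```latex
T(n) \approx T_0 + c\,n, \qquad
c \approx \frac{T(1024) - T_0}{1024} \approx \frac{10\ \mathrm{s}}{1024} \approx 9.8\ \mathrm{ms\ per\ check},
\qquad
\frac{100\ \mathrm{KB}}{1024\ \text{checks}} \approx 100\ \text{bytes per check}.
```

Since ten seconds is itself a lower bound ("upwards of"), the 9.8 ms figure is a rough lower estimate. A per-comparison cost measured in milliseconds is the time scale the argument above associates with disk writes, not with in-memory comparisons.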

Fortunately, most database systems (including Microsoft SQL Server) allow the execution of externally written procedures, that is, running a memory-resident program outside the database engine and exchanging the relevant parameters with it for analysis. Programs in a language such as C++ can be called upon to do any form of work for which the database/SQL execution engine is not adequate.¹ Such techniques are commonly used in scientific applications, where external programs such as MATLAB and R perform data analysis and store the results back in a database. When this approach is adopted (moving the if-conditions into an external C++ program), testing 1024 state conditions takes no discernibly longer than performing no tests at all.
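One way to picture the parameter exchange is sketched below: the trigger passes the updated row's attribute values to an external routine, which evaluates the conditions in memory and returns the whole state configuration as a single bitmask. The bitmask encoding and the function names are assumptions made for illustration, and the sketch is written in Python rather than the C++ used in the experiments.

```python
# Hypothetical interface between a trigger and an external evaluator:
# attribute values go out, one integer bitmask of state memberships comes back.

def pack_configuration(row, conditions):
    """Evaluate each condition in memory; bit i is set iff the row is in state i."""
    mask = 0
    for i, cond in enumerate(conditions):
        if cond(row):
            mask |= 1 << i
    return mask


def unpack_configuration(mask, names):
    """Decode the bitmask back into state names on the database side."""
    return {name for i, name in enumerate(names) if (mask >> i) & 1}
```

With 1024 states the mask needs 1024 bits (128 bytes), which no built-in fixed-width integer type holds, so a C++ routine would need something like `std::bitset<1024>` and a binary parameter or column on the SQL side.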

We conclude that although methods for automatically generating SQL-level triggers have been well examined in prior literature, implementing them efficiently (i.e., in a manner that does not drastically slow down the system) is anything but straightforward. Because SQL execution behavior and the handling of trigger code differ vastly between database engines, an LTL-constraint-to-SQL-trigger generator will need to be optimized for each individual database product. That said, this customization will require little effort but considerable familiarity with the performance quirks of the given database engine.
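A generator of the kind discussed here might keep the state conditions as data and emit engine-specific code from them. The minimal, hypothetical sketch below emits only the check sequence as T-SQL IF-statements, the shape of trigger code used in the tests. It assumes the trigger has already declared one `@<column>` variable per attribute and one BIT flag `@state_<i>` per state, and it does not attempt any translation from LTL.

```python
# Hypothetical emitter: conditions are (state_index, column, sql_operator, constant).

def sql_literal(value):
    """Render a Python constant as a T-SQL literal (strings quoted, quotes doubled)."""
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def emit_state_checks(conditions):
    """Emit one IF-statement per state that sets that state's BIT flag."""
    lines = []
    for i, column, op, const in conditions:
        lines.append(f"IF @{column} {op} {sql_literal(const)} SET @state_{i} = 1;")
    return "\n".join(lines)
```

Because the conditions stay in a neutral list, another emitter (for DB2, or for an external C++ routine) could be added per engine without touching the policy model, which is one way to accommodate the per-product tuning described above.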

In document Business Policy Modeling and Enforcement in Relational Database Systems (Page 110-113)