In the formulation presented in the previous section (Section 4.2) we do not consider the complexity of each invocation. Our overview of execution, i.e. a transformation trace, does not equate to the work completed by each rule; it only records that a sequence of events has occurred. Each event, however, may be more complex than another. Here we are not discussing computational complexity, i.e. time and memory consumption, but the fact that each rule has distinct features that can be used to weigh it. In the case of AtoC, we might surmise that the rule only sets the name and type of a column element, whereas EtoT has to iterate through attributes to transform them and impose constraints, keys and indices. One can argue that the tasks carried out by these two rules have different levels of complexity, so is it right that the value for EtoT_y outweighs that of AtoC_{y_i} (see Table 4.1) by so much, and that it will continue to do so when it appears so many times?
\[ \forall h \in H,\ v \in V:\quad \mathit{confidence}_G(v, h) = \mathit{prominence}_G(v, H) \times W_{\ell_V}(v) \tag{4.2} \]
Equation (4.2) amends Equation (4.1) to take into account a complexity value for a label, in our case a rule. This modification provides us with a weighing mechanism to stop values becoming disproportionately large. A key decision here is what we are trying to analyse: the transformation itself, the generated code, or both? The answer influences the measure of complexity we use. Below we explain two types of complexity. The method presented here is not dependent on any particular complexity metric.
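The amendment in Equation (4.2) can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: the function name, the dictionaries, and all numeric values are assumptions made for the example (the prominence values are not taken from Table 4.1), and prominence is simplified to one precomputed number per vertex.

```python
from typing import Dict


def weighted_confidence(prominence: Dict[str, float],
                        label_of: Dict[str, str],
                        weight: Dict[str, float]) -> Dict[str, float]:
    """Scale each vertex's prominence by the weight of the rule labelling it."""
    return {v: p * weight[label_of[v]] for v, p in prominence.items()}


# Hypothetical prominence per vertex (illustrative values only).
prominence = {"EtoT_y": 0.6, "AtoC_y0": 0.2, "AtoC_y1": 0.2}
# Label function: which rule produced each vertex.
label_of = {"EtoT_y": "EtoT", "AtoC_y0": "AtoC", "AtoC_y1": "AtoC"}
# Hypothetical weights per rule; how they are derived is left open here.
weight = {"EtoT": 0.5, "AtoC": 1.0}

confidence = weighted_confidence(prominence, label_of, weight)
```

With these assumed numbers the weight halves the value for EtoT_y while leaving the AtoC values unchanged, which is the damping effect the amendment is meant to provide.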
Rule Complexity
The main components of an M2M transformation are the rules that can be utilised by the transformer. Thus the complexity is directly related to the work that these rules do in changing the representation of one model to another. The guard and bind phases contain the bulk of a rule’s complexity in M2M transformation languages like the ATLAS Transformation Language (ATL) and ETL, or, in the case of SiTra, the check and set_properties methods (Akehurst et al. 2006). These are the functions that decide whether the rule is relevant and take care of setting attributes and relationships. The initialisation phase has a negligible impact as it purely creates objects. The complexity of the checking phase must be considered because a faulty check may attempt to transform more or fewer objects than it should.
This problem has two sides. On the one hand, transforming too few objects might generate an incomplete final model, as input model elements are discounted for transformation. On the other, transforming too many would not only increase resource usage but may also cause runtime errors due to incomplete value checking. These additional objects are of particular importance when using a duck-typed language. If a rule identifies an attribute as an entity, a TypeError would occur when looking for its constraints, as attributes do not have them. The same applies to the binding phase: as code becomes more complex, there is a higher probability of mistakes.
Complexity comes from many places when considering transformation rules.
For instance, we can attempt to measure the complexity of the code itself.
McCabe (1976) introduces cyclomatic complexity, a measure derived from the control-flow graph of a program. It is a quantitative measure of the number of linearly independent paths through the source code of an application. The measure was intended to limit the complexity of modules: if the value was above ten, the module should be broken down further or a reason provided as to why it was an accepted risk. Here we can assume that the more independent paths there are, the higher the risk of error. This is particularly so for conditional branches, where an incorrect condition could lead to undefined behaviour. With this in mind, we want to be more cautious of functions with higher complexity values, as there is more potential for error. Assume $M_0, M_1, \ldots, M_k$ represent the complexity of rules $r_0, r_1, \ldots, r_k$ respectively. We consider the relative complexity of a rule, $M_{\ell_V(v)} / \sum_{i \in \Sigma_V} M_i$. We use $1 - M_{\ell_V(v)} / \sum_{i \in \Sigma_V} M_i$ as a coefficient for the prominence function to reduce the confidence in more complex rules: a higher ratio should yield a smaller confidence, as the rule is more complex. Formally we define $W_{\ell_V}$ as:
\[ W_{\ell_V}(v) = 1 - \frac{M_{\ell_V(v)}}{\sum_{i \in \Sigma_V} M_i} \tag{4.3} \]
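As a rough illustration of Equation (4.3), the sketch below computes the weight of each rule from a table of complexity values. The function name and the cyclomatic complexity figures for EtoT and AtoC are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from typing import Dict


def complexity_weights(complexity: Dict[str, float]) -> Dict[str, float]:
    """Return 1 - M_rule / sum(M_i) for every rule, as in Equation (4.3)."""
    total = sum(complexity.values())
    if total <= 0:
        raise ValueError("the complexity values must sum to a positive number")
    return {rule: 1 - m / total for rule, m in complexity.items()}


# Hypothetical cyclomatic complexity per rule (not measured values).
cyclomatic = {"EtoT": 8, "AtoC": 2}

weights = complexity_weights(cyclomatic)
# EtoT: 1 - 8/10, roughly 0.2; AtoC: 1 - 2/10, roughly 0.8
```

Under these assumed values the more complex rule, EtoT, receives the smaller coefficient, so its contribution to the confidence is reduced more strongly than that of AtoC.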
Another metric could be the number of test cases that have been carried out for a given rule. It is only natural for developers to write more test cases for larger and more complex modules. Therefore it is reasonable to assume that the number of tests available for a rule is indicative of its complexity.
This assumption also relates directly to users, as validation provides confidence that a product will do what is expected of it. In the case of EtoT and AtoC, if EtoT had 200 test cases and AtoC had ten, then we might accept a higher score for EtoT in our example. Unlike with McCabe’s cyclomatic complexity, we want a higher ratio to yield a higher value. Here we assume that $M_0, M_1, \ldots, M_k$ represent the number of tests available to rules $r_0, r_1, \ldots, r_k$ respectively. This time, however, we do not subtract the fraction from one:
\[ W_{\ell_V}(v) = \frac{M_{\ell_V(v)}}{\sum_{i \in \Sigma_V} M_i} \]
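The test-based weight can be sketched in the same way, using the figures from the example above (200 tests for EtoT, ten for AtoC). The function name is an assumption made for illustration.

```python
from typing import Dict


def weights_from_test_counts(tests: Dict[str, int]) -> Dict[str, float]:
    """Return M_rule / sum(M_i), where M_i is the number of tests for rule i."""
    total = sum(tests.values())
    if total <= 0:
        raise ValueError("at least one test is needed to compute the weights")
    return {rule: count / total for rule, count in tests.items()}


# Test counts from the running example in the text.
test_counts = {"EtoT": 200, "AtoC": 10}

test_weights = weights_from_test_counts(test_counts)
# EtoT: 200/210, roughly 0.95; AtoC: 10/210, roughly 0.05
```

Because the fraction is not subtracted from one, the heavily tested rule keeps most of its prominence, while the lightly tested rule is scaled down.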
Deferred Complexity
Often M2M transformations result in a model that will be used to generate code.
For example, the result of transforming our object-orientated view to a relational one might be used to generate SQL to create the database or to create data access objects for querying.
Another, more general, example might be the generation of imperative code from a declarative language, e.g. the translation of business logic into code that will complete the tasks required.
Other options include annotating templates to indicate what their output will eventually be used to do. For instance, will this generated code write to memory? Will this modify any existing data? Alternatively, will this interact with mission critical devices, like a heart monitor? In these instances, we need to know what is present in the output. Here we can look at the structural or semantic meanings of the outputted code.

                    Probability
Severity     High   Moderate   Low
High         2      1          1
Moderate     3      2          1
Low          3      3          2

(a) Risk Class of a task given the probability of its failure and severity if it were to occur.

Detectability
(b) Risk Priority given the Risk Class and how detectable a failure would be.

Table 4.2: GAMP5 matrices to determine the Risk Priority of a task.
[Figure 4.3: Transformation of a class with three attributes. Nodes: EtoT_z, AtoC_z0, AtoC_z1, AtoC_z2; edges labelled "invokes" and "recalls".]

A key issue with this type of metric is how one would quantify it. One approach would be to use Risk Priority Numbers (RPN) to form the basis of the weighing mechanism. Risk prioritisation comes from Failure Mode, Effects and Criticality Analysis (FMECA) and involves looking at the probability of failure, the severity of its effects if it were to happen, and how detectable the failure would be (Handbook 1982). Let us say we had code that was being generated to interact with a heart monitor. The probability of failure might be slim as we are using an API to acquire data from sensors. However, its severity might be high: if it incorrectly reported a value, we would not know the real status of the patient. It could also be difficult to detect if there is no practitioner available to verify it at the time, so this might be deemed very complex. RPN, in the case of FMECA, uses values from one to ten for each factor and simply multiplies them together to get the risk priority of the item in question.
The risk priority can then serve as the complexity value in the weighing equation, Equation (4.3).
This data can be attached to rules and templates using annotations. RPN is also widely used in clinical trials and is a vital part of Good Automated Manufacturing Practice (GAMP5) (GAMP 2008). However, in clinical trials, the value comes from a set of matrices, as shown in Table 4.2. Given the probability of a failure in a task and the severity of that failure if it were to occur, a Risk Class is determined. This Risk Class, together with how detectable the failure is, then produces a Risk Priority.
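The FMECA-style calculation described above can be sketched as follows. This is only an illustration under stated assumptions: the template names and the individual ratings are hypothetical, the detectability rating is taken to increase as a failure becomes harder to detect, and using the resulting RPN as the complexity value in a weight of the form 1 - M / ΣM is one possible reading of the text.

```python
from typing import Dict


def risk_priority_number(probability: int, severity: int, detectability: int) -> int:
    """Multiply three ratings, each on a scale of one to ten, into an RPN."""
    ratings = {"probability": probability,
               "severity": severity,
               "detectability": detectability}
    for name, value in ratings.items():
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {value}")
    return probability * severity * detectability


# Hypothetical annotations for two generated artefacts (assumed ratings).
rpn: Dict[str, int] = {
    # Unlikely to fail, severe if it does, and hard to detect.
    "heart_monitor_reader": risk_priority_number(2, 9, 8),
    # Low ratings on all three factors.
    "report_formatter": risk_priority_number(3, 2, 2),
}

# Use the RPN as the complexity value M, so riskier output gets a lower weight.
total = sum(rpn.values())
rpn_weights = {name: 1 - value / total for name, value in rpn.items()}
```

With these assumed ratings the heart monitor template has an RPN of 144 against 12 for the formatter, so its weight is much smaller and the confidence in the output it generates is reduced accordingly.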