Lemma 5.1. At any time a datamart is always owned by a user. With respect to the functions we used in the main rule in OLAP-ASM, we can formalise this requirement by
∀dm.datamart(dm)6=⊥ ⇒ ∃usr.owner(dm) =usr∧registered(usr)6=⊥ (5.1)
Proof. The lemma follows from how a datamart is created and maintained: 1. For the initial state, ∀dm.datamart(dm) = ⊥, the statement holds.
2. When datamart(dm) is set to 1 in the “open” rule in OLAP-ASM, the same rule sets owner(dm) =issuer(op). Furthermore, we have the guardregistered(usr) = 1 in the main rule of OLAP-ASM which ensures that the “open” rule is only called under this condition. Hence, condition (5.1) holds.
3. When registered(usr) is set to ⊥ in the“quit” rule in OLAP-ASM, the same rule sets datamart(dm) =⊥for all thedmwithowner(dm) = issuer(op), which removes all the datamarts belonging to the user. Thus, condition (5.1) still holds.
4. Ownership of a datamart only changes in three places: In the “open” rule, the owner is set to be a registered user. In the “close” and “quit” rules where ownership of a datamart dm is deleted,datamart(dm) is also set to ⊥. Thus condition (5.1) can not get violated this way either.
Furthermore, we have system properties regarding the effect of operations such as ‘quit’ and ‘close’. The former closes all the datamarts belonging to the issuer of the operation, the latter deletes a single datamart. We omit formalising these properties using transition constraints. Again, the main rule in OLAP-ASM is consistent with these properties, too. Finally, considering the consistency of the main rule in OLAP-ASM we have system properties regarding the data content of the data marts, the data warehouse as such and the underlying operational databases. The chosen rules for building datamarts render their realisation as views over the data warehouse explicit.
5.2
The Refinement-based Design Framework
The ASM method assumes that we first set up a ground model. In particular, we have assumed separate ASMs for the database, the data warehouse and the OLAP level. Each of these ASMs uses separate controlled functions to model states of the system by logi- cal structures and rulesexpressing transitions between these states. The ASMs are then linked together via queries that are expressed by these transitions. The ground model above captures only the basic requirements, further refinements are required for con- sidering design concerns such as systems optimisation, implementation, and new OLAP requirements. Our design method is developed based on the 3-tier model and the ASM method. The former provides the logical structure of the system, the latter provides the step-by-step refinement approach. In the following, we group refinements under three categories: requirements capture, system optimisation, and system implementation. For each category we present a set of refinement rules, in a rather abstract manner, that are ultimately decomposed and formalised into a set of concrete rules in a later development stage, for example, the formal rules for view integration in Chapter 6.
5.2.1 Requirements Capture
Like most software systems, our design method for data warehouse design begins with the requirements from the user end, i.e. the OLAP system. We build data warehouse schemas based on what OLAP needs, for example, the set of analysis queries or reports. As this is not a one-off process due to the dynamic nature of business analysis, it is not uncommon that we may need to deal with new OLAP requirements regularly after the data warehouse has been implemented. In particular, our data warehouse design starts from a data mart for a single business process, more data marts are to be added later on. The new requirements may require changes in the data warehouse schemata, such as adding new schemata. This type of changes can result in inconsistency in the data warehouse if schema integration is not considered. We tackle this problem with the schema integration technique, i.e. we integrate the set of new data schema from the new requirements with the existing data warehouse schema to resolve the inconsistency, and at the same time we maintain a data warehouse with little redundancy. Schema integration will be dealt with in Chapter 6. New requirements can sometimes be presented in the form of other than purely new queries, such as modified OLAP queries, which we will not deal with here. Our focus here is on the incremental design of data warehouse.
Using the data warehouse/OLAP ASM ground model as a basis, we handle new OLAP requirement, such as adding new OLAP queries, as follows: in the OLAP ASM, we define the new OLAP functions and rules. In the data warehouse ASM, we define new functions and rules, and modify the existing functions and rules to support the new OLAP queries. In the database ASM, we add and change functions and rules to support the changes of the data warehouse. In fact, we are propagating the changes gradually from the OLAP tier down to the data warehouse tier and then to the operational database tier.
The refinements for requirement capturing are classified under conservative extension or incremental refinement in the ASM method. That means, the existing functions will be preserved when new features are added in the refinement. We build a refinement process for systematically capturing new requirements in data warehouse and OLAP system design as follows.
1. Add a new rule to the OLAP ASM: this is used to model an additional OLAP func-
tion by adding a new rule name and the definition of the rule for the new function to the OLAP ASM.
Discussion: it is presumed that the newly added functionality is not present in the
OLAP ASM before. The new rule will work under a condition such that it is not defined for the old machine. In such case, adding a new rule preserves the existing functionality of the old machine. When a new rule or function is defined, we need to decide if it should be export or import for reference.
2. Add a new controlled function to the OLAP ASM: this is used to model a view that
is needed for the support of any new OLAP function, provided the existing view definitions are not yet sufficient.
3. Add new controlled function(s) to the DW ASM: this is used to model the schema
that is needed in supporting the new OLAP function in data warehouse tier.
4. Integrate controlled functions on the DW ASM: this is used whenever the schema is
5.2. THE REFINEMENT-BASED DESIGN FRAMEWORK Jane Qiong Zhao
changed accordingly.
Discussion: the integration process aims to preserve the information by the notion
of schema equivalence and dominance when two schemas are integrated. This step relates to a set of schema transformation rules which is discussed in Chapter 6.
5. Add controlled functions to the DB ASM: this is used to add new schema to support
DW ASM, provided the existing data is not sufficient.
6. Integrate controlled functions on the DB ASM: this is used whenever the schema is
extended. As a consequence, the extraction rules on the DW ASM must be changed accordingly.
7. Change the rules on DW ASM: this is used to adapt the rules defined for extracting
data for the data warehouse relations to the changes in the data warehouse schema. This is again an impact from schema integration.
8. Change the rules on DB ASM: this is used to adapt the rules are used in data
extraction upon data warehouse refresh request. Any changes, either new addition or updates, to data extraction rules should be reflected in the related rules in DB ASM.
9. Change the functions/rules on OLAP ASM: this is used to change the functions or
rules that are affected in this refinement process, such as rules that make reference to the schemas which are changed during the integration, or rules that process the newly added OLAP functions.
5.2.2 Optimisation
Some refinements are used to optimise the performance of the system. These refinement rules are applied to reorganise the specification independently from the user requirements. Refinements for system optimisation can be classified under procedural or data refinement in the ASM method. To be tailored for data warehouse and OLAP system design, some typical optimisation steps are considered:
1) To materialise the OLAP views. That is, to compute the OLAP queries in advance and store them as views in the data warehouse. When the queries are called, they can be answered by the stored views instantly from the data warehouse without waiting for computation of the queries. This will speed up the system performance particularly as business analysis is usually data intensive, but it also results in the issue of view maintenance, which we will discuss further later on in Chapter 7.
2) To update the data warehouse incrementally. That is, not to recompute the queries from scratch, as is the case in our ground model but only propagate the changes to the data warehouse.
The tailored refinement process for systematically incorporating the above two opti- misation steps is specified as follows:
10. Incorporate view materialisation :
(a) Add new controlled function in DW ASM : this is used to add the OLAP views
Discussion: for an effective approach in view selection we can adopt some selection process or algorithm such as the one we used in the case study later in Chapter 7.
(b) Integrate materialised views in DW ASM: this is used to reduce redundancy that
may have occurred after more views are materialised. A set of transformation rules can be applied, which will be discussed in detail in Chapter 6.
(c) Add new rules in DW ASM: this is used to define the transition rules to maintain
the materialised views up to date with the data warehouse changes. These rules are called after each refreshing of the data warehouse.
(d) Change the rules in DW ASM : these rules are for opening a datamart for the
OLAP ASM. After view materialisation or view integration, these rules need to be adapted too.
11. Incorporate incremental updates :
(a) Add monitored functions in DB ASM : this is used to define relations to store
updates of the source relations, called delta files.
(b) Add controlled functions in DW ASM : this is used to define relations to store
computed changes for data warehouse relations.
(c) Add rules in DW ASM: this is used to define the rules for computing the changes
from source relations and propagating changes into data warehouse relations.
(d) Replace rules in DB ASM : this is used to replace the refresh rules with the
rules for incremental updates.
5.2.3 Implementation
The final group of the refinements in our discussion is the system implementation refine- ments. Refinements for system implementation can be classified under procedural or data refinement in the ASM method. The refinements introduced in the following are mainly designed for realising high-level design decisions such as data distribution and incremen- tal maintenance. This group of refinements are classified under the procedural or data refinement in the ASM method.
12. Apply implementation refinements: these refinement rules apply to the ASMs on
all three levels and consist of realising design decisions for moving the ASMs closer to their implementation while preserving the semantics of runs. It is not our focus here to discuss how to move specification to codes, which is thoroughly studied, for example in [123, 78, 26], and particularly for data intensive applications in [97].
13. Distribution design: as it is common that an enterprise is geographically distributed,
data warehouse design methods should take distribution design into consideration. More detailed discussion on distribution design is shown in Chapter 7. Our distribu- tion design concerns the database instead of communication design. We use nodes to describe the locations where a local data warehouse resides.
(a) Replicate the data warehouse and the OLAP ASMs: for each node in the network