In this chapter we investigated different approaches for defining a similarity function to compare OLAP sessions, based on the requirements deduced from a user study conducted with practitioners and researchers [Aligon et al., 2013]. We considered and compared two functions for OLAP query similarity and four functions for OLAP session similarity; the latter were obtained by extending popular approaches for string comparison. Overall, our experimental results show that the alignment-based approach (an extension of the Smith-Waterman algorithm, coupled with a three-component query similarity function) is the one that best matches the users’ judgements. It is also the one that clearly gives the best results on a synthetic benchmark in terms of sensitivity and of the capability to correctly rank different templates of session similarity. Finally, from the point of view of efficiency, the time required for comparing two sessions is perfectly compatible with complex applications. As to future work, we propose to exploit the result of the similarity comparison between the current user session and the past ones to recommend the next query to formulate.
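To make the alignment idea concrete, the following is a minimal sketch of a Smith-Waterman-style local alignment between two sessions, each seen as a sequence of queries. It is not the function evaluated in this chapter: the names `local_alignment_score` and `jaccard`, the `threshold` and `gap_penalty` parameters and their default values, and the use of a plain Jaccard index in place of the three-component query similarity are all assumptions made for illustration.

```python
from typing import Callable, Sequence, TypeVar

Q = TypeVar("Q")


def local_alignment_score(
    s1: Sequence[Q],
    s2: Sequence[Q],
    sim: Callable[[Q, Q], float],
    threshold: float = 0.5,    # assumed value: pairs below it are penalized
    gap_penalty: float = 0.5,  # assumed value: cost of skipping one query
) -> float:
    """Return the best local alignment score between two sessions.

    Matching queries a and b contributes sim(a, b) - threshold, so only
    sufficiently similar pairs increase the score. Scores never drop
    below zero, which is what makes the alignment local.
    """
    prev = [0.0] * (len(s2) + 1)  # previous row of the scoring matrix
    best = 0.0
    for a in s1:
        curr = [0.0]  # first column is always zero
        for j, b in enumerate(s2, start=1):
            score = max(
                0.0,                                 # restart the alignment
                prev[j - 1] + sim(a, b) - threshold, # align a with b
                prev[j] - gap_penalty,               # gap in s2
                curr[j - 1] - gap_penalty,           # gap in s1
            )
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def jaccard(a: frozenset, b: frozenset) -> float:
    """Toy query similarity (an assumption): Jaccard index on query elements."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Under these assumptions, two identical sessions of n queries score n times (1 - threshold), while sessions whose queries share no element score zero. An actual session similarity would normalize this raw score and plug in the query similarity function discussed in the chapter.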
Agile Data Warehouse Design
In this chapter, we support BI ANYTIME by proposing a new methodology, 4WD, that combines agile principles with DW peculiarities to accelerate DW development. We demonstrate the effectiveness of our methodology with a case study on a pay-TV project.
6.1 Introduction
DW systems are characterized by a long and expensive development process that hardly meets the ambitious requirements of today’s market. This is one of the main causes behind the low penetration of DW systems in small-medium firms, and even behind the failure of whole projects [Ramamurthy et al., 2008].
As a matter of fact, DW projects often leave both customers and developers dissatisfied. The main reasons for low customer satisfaction are the long delay in delivering a working system and the large number of missing or inadequate (functional and non-functional) requirements. As to developers, they complain that, mainly due to uncertain requirements, it is overly difficult to accurately predict the resources to be allocated to DW projects, which leads to gross errors in estimating design times and costs.
In the light of the above, we believe that the methodological issues related to DW design deserve some further investigation aimed at improving the development process from different points of view, such as efficiency and predictability.
The available literature on DW design mainly focuses on traditional, linear approaches such as the waterfall approach, and it appears to be only loosely related to the sophisticated design methodologies that have been emerging in the software engineering community. Though some works about agile data warehousing have appeared [Hughes, 2008], there is also evidence that applying an agile approach tout court to DW design carries several risks, such as that of inappropriately narrowing the DW scope [Beyer and Richardson, 2010].
In this chapter, we analyze the potential advantages arising from the application of modern software engineering methodologies to a DW project and we propose 4WD, a design methodology that aims at coupling the main principles emerging from these methodologies to the peculiarities of DW projects [Golfarelli et al.,2011c]. The chapter outline is as follows:
• In Section 6.3, we explain the motivation for 4WD in more detail, starting from the problems of existing methodologies and arriving at the goals of a better, more innovative DW development approach.
• In Section 6.4, we list the main features of 4WD, explaining how these characteristics may address the aforementioned goals.
• In Section 6.5, we propose a case study on a pay-TV project to validate our methodology.
6.2 Related Works
DW design has been investigated by the research community since the late nineties. A classic waterfall approach was first proposed in [Golfarelli and Rizzi, 1998]; a distinguishing feature was the inclusion of a conceptual design phase aimed at better formalizing the data schema. A sequential approach to design is also followed in [Luján-Mora and Trujillo, 2003], where an object-oriented method based on UML is proposed to cover analysis, design, implementation, and testing. Another UML-based method is presented in [Prat et al., 2006]; here, the use of the Common Warehouse Metamodel (CWM) is suggested to promote a more standard approach to conceptual design. All these methodologies follow a linear approach that hardly adapts to changes and is unsuitable when requirements are uncertain.
To overcome these issues, iterative solutions have been proposed in the literature. Iterative approaches are typically adopted by methodologies like RAD and Agile. The work in [Hughes, 2008] breaks with strictly sequential approaches by applying two Agile development techniques, namely Scrum and eXtreme Programming, to the specific challenges of DW projects. To better meet user needs, the work suggests adopting a user-story decomposition step based on a set of architectural categories for the back-end and front-end portions of a DW. However, it does not discuss in depth how this decomposition affects modeling and design.
A different approach to tackle the DW design complexity is the MDA methodology
Figure 6.1: Cause-effect relationships in customer and developer dissatisfaction
from its implementation. Strong relevance is given to the development of the DW repository; the three main perspectives of MDA (CIM, PIM, and PSM) are defined using extensions of UML and CWM, and the inter-model transformations are described using the Query/View/Transformation (QVT) language. In practice, strictly applying this methodology may be hard due to users' poor aptitude for reading formal models and their reluctance to invest resources in low-value activities.
A pragmatic comparison between DW design methodologies is offered in [Sen and Sinha, 2005], where 15 different solutions proposed by BI software vendors are examined. The authors emphasize the lack of software-independent approaches, and point out that all the proposed solutions can hardly deal with changes and market evolution, which creates a robustness problem.