

7.3.2 Optimization Cost Model

The primary objective of the cost model is to measure the effectiveness of the materialization process by computing the ratio between the performance cost incurred by executing service calls and the materialization coverage achieved during the process.

With a cost model in place, the materialization process can be optimized by adjusting the execution variables that affect its performance (e.g., the priority of services or queries in the queue), thereby increasing the cost effectiveness of the materialization.

For a given materialization, participating services are differentiated by the ratio between the accumulated cost and the materialization coverage achieved by each service. A service that achieves more result coverage at lower cost is given priority in the service queue during the materialization process.
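The prioritization rule above can be illustrated with a minimal sketch. The names `ServiceStats`, `cost_effectiveness` and `prioritize`, the sample numbers, and the choice to place services without any coverage at the back of the queue are assumptions made for illustration; they are not taken from the original text.

```python
from dataclasses import dataclass


@dataclass
class ServiceStats:
    name: str
    accumulated_cost: float  # total cost of the calls made so far (hypothetical unit)
    coverage: float          # share of the wanted materialization obtained, in [0, 1]


def cost_effectiveness(stats: ServiceStats) -> float:
    """Cost paid per unit of coverage; lower values are better."""
    if stats.coverage <= 0:
        # Assumption: a service with no coverage yet is ranked last.
        return float("inf")
    return stats.accumulated_cost / stats.coverage


def prioritize(services):
    """Return the services ordered from most to least cost effective."""
    return sorted(services, key=cost_effectiveness)


queue = prioritize([
    ServiceStats("s1", accumulated_cost=4.0, coverage=0.20),
    ServiceStats("s2", accumulated_cost=3.0, coverage=0.30),
    ServiceStats("s3", accumulated_cost=1.0, coverage=0.0),
])
print([s.name for s in queue])
```

Here `s2` is placed first because it pays less per unit of coverage than `s1`, even though neither service has the lowest absolute cost.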

7.3. Optimization Strategy 165

Concretely, in the context of the materialization process, the objective of the cost model is to quantitatively characterize parts of the call2service coupling algorithm.

A) The cost model for service differentiation is based on the service materialization call cost per tuple. The cost formula is derived from the materialization property metrics of the service interface and is expressed per materialization call c as

\[ \mathit{cost}(c) = W_{QPS} \times \mathit{QueryPageSize} + W_{WS} \times \mathit{WithDuplicates} + W_{CPD} \times \mathit{CallsPerDay} + W_{RPQ} \times \mathit{ResponsePerQuery}, \]

where W_{QPS}, W_{WS}, W_{CPD} and W_{RPQ} are the weights associated with each metric, with 0 ≤ W ≤ 1.

Table 7.1 exemplifies the differentiation between the services {IMDB1, GM1, GM2} mapped to the MovieByTitle access pattern, considering the service materialization property metrics obtained after each service has executed a materialization call. The cost formula assumes equal significance of each property in the calculation (W = 1). The example also shows the newly established order of services on the basis of their cost per tuple.

| SI | WithDuplicates | QueryPageSize | ResponsePerQuery | cost(ci) | OrderedByCost SI |
|---|---|---|---|---|---|
| IMDB(ci) | 1 - 9/10 | 1 - 10/10 | 1/4500 | 0.1002 | IMDB(ci) |
| GM1(ci) | 1 - 5/10 | 1 - 5/10 | 1/9000 | 1.0001 | GM2(ci) |
| GM2(ci) | 1 - 1/20 | 1 - 20/20 | 1/6400 | 0.9501 | GM1(ci) |

Table 7.1: Services materialization metric differentiation.
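As a worked check of the service cost formula, the following sketch recomputes the costs of Table 7.1 with all weights set to 1 and sorts the services by ascending cost. The function name `call_cost` and the dictionary layout are assumptions made for illustration. CallsPerDay is left out because the table lists no value for it. The computed costs agree with the table to about three decimal places.

```python
def call_cost(metrics, weights=None):
    """Weighted sum of the materialization property metrics of one call.

    A metric without an explicit weight is given the weight 1.
    """
    weights = weights or {}
    return sum(weights.get(name, 1.0) * value for name, value in metrics.items())


# Metric values copied from Table 7.1.
table_7_1 = {
    "IMDB": {"WithDuplicates": 1 - 9 / 10,
             "QueryPageSize": 1 - 10 / 10,
             "ResponsePerQuery": 1 / 4500},
    "GM1": {"WithDuplicates": 1 - 5 / 10,
            "QueryPageSize": 1 - 5 / 10,
            "ResponsePerQuery": 1 / 9000},
    "GM2": {"WithDuplicates": 1 - 1 / 20,
            "QueryPageSize": 1 - 20 / 20,
            "ResponsePerQuery": 1 / 6400},
}

costs = {si: call_cost(metrics) for si, metrics in table_7_1.items()}
ordered = sorted(costs, key=costs.get)  # cheapest service first

for si in ordered:
    print(f"{si}: {costs[si]:.4f}")
```

The resulting order (IMDB, GM2, GM1) matches the OrderedByCost column of the table.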

B) The cost model for query differentiation is based on domain value statistics scores. The score for a domain value d_n is computed as

\[ \mathit{score}(d_n) = W_F \times \sum_{i=1}^{n} \mathit{freq}(d_n) + W_{HR} \times \sum_{i=1}^{n} \mathit{hRate}(d_n), \]

where W_F and W_{HR} are the weights associated with each score. During the query generation phase, queries are ordered by the scores of their values.

Table 7.2 exemplifies the computation of Year scores for the MovieByTitle materialization and the consequent ordering of the expressed queries. The score formula assumes equal significance of each statistic in the calculation (W = 1).


| YEAR | frequency | harvestRate | score(dn) | OrderedByScore Q |
|---|---|---|---|---|
| 1999 | 1 - 1/78 | 1 - 1/7 | 1.8442 | qp<2003> |
| 2003 | 1 - 1/56 | 1 - 1/12 | 1.8987 | qp<2008> |
| 2008 | 1 - 1/123 | 1 - 1/10 | 1.8918 | qp<1999> |

Table 7.2: Output domain value metrics differentiation.
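The score formula and the resulting query order can be checked with a short sketch. The function names `score` and `rank` are assumptions, and the statistics are passed in already aggregated per value, as they appear in Table 7.2. The second call uses a hypothetical weight setting (W_HR = 0) that is not in the original text, only to show how the weights can change the order.

```python
def score(freq, h_rate, w_f=1.0, w_hr=1.0):
    """Weighted sum of the frequency and harvest rate statistics of a value."""
    return w_f * freq + w_hr * h_rate


# (frequency, harvestRate) per Year value, copied from Table 7.2.
table_7_2 = {
    1999: (1 - 1 / 78, 1 - 1 / 7),
    2003: (1 - 1 / 56, 1 - 1 / 12),
    2008: (1 - 1 / 123, 1 - 1 / 10),
}


def rank(w_f=1.0, w_hr=1.0):
    """Return the Year values ordered by descending score."""
    scores = {year: score(f, h, w_f, w_hr) for year, (f, h) in table_7_2.items()}
    return sorted(scores, key=scores.get, reverse=True)


print(rank())           # equal weights, as assumed for Table 7.2
print(rank(w_hr=0.0))   # hypothetical setting: frequency only
```

With equal weights the order is 2003, 2008, 1999, as in the OrderedByScore column. With the harvest rate switched off, the order becomes 2008, 1999, 2003, because only the frequency statistic then drives the ranking.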

7.3.2.1 Optimized Materialization Algorithm

In this section we provide a generic optimization algorithm (Algorithm 8) based on the materialization process defined in Section 4. The algorithm implements the optimization approach by applying the augmentation derived from the cost model to the service interface selection strategy and the query queue selection strategy of the call2service coupling. The algorithm is shown for the SPMS materialization scenario with the SSQ execution model.

The algorithm assumes an initial preconditioning phase in which (i) an initial dictionary initDict is provided, via the Dictionary input value strategy, for each input domain attribute of the participating service; (ii) a starting query sequence is initiated by creating queries from the initial dictionaries; and (iii) a sequence of service interfaces is instantiated in the natural order of the services, since their execution cost is initially unknown.

By loading the initial set of queries, the initial phase of the SPMS materialization ensures the feasibility of the materialization. The algorithm assumes the presence of domain-matching attributes in the input and output domains, which enables sourcing of the input domain values via the Reseeding input value strategy.
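The preconditioning phase described above might be sketched as follows. The function name `preconditioning`, the representation of a query as a dictionary of attribute values, and the use of a Cartesian product to combine several input attributes are assumptions made for illustration; the original text does not specify how multi-attribute queries are formed.

```python
from collections import deque
from itertools import product


def preconditioning(init_dicts, services):
    """Build the starting query queue and the initial service sequence.

    init_dicts maps each input domain attribute to its initial dictionary
    (a list of values). services is the list of mapped service interfaces.
    """
    if not init_dicts:
        raise ValueError("no initial dictionary provided")
    attributes = sorted(init_dicts)
    # Assumption: one query per combination of initial attribute values.
    queries = deque(
        dict(zip(attributes, combo))
        for combo in product(*(init_dicts[a] for a in attributes))
    )
    if not queries:
        raise ValueError("materialization is not feasible without initial queries")
    # Natural order: no execution cost is known yet.
    return queries, list(services)


queries, service_sequence = preconditioning(
    {"Title": ["Alien", "Heat"]},
    ["IMDB1", "GM1", "GM2"],
)
print(list(queries))
print(service_sequence)
```

An empty dictionary for any attribute leaves the queue empty, which the sketch treats as an infeasible materialization.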


Algorithm 8 Optimized Materialization Algorithm

var {SI} // sequence of si mapped to feasible_ap
var best_si // si with the best materialization properties, i.e., si with the least execution cost incurred
var best_mcall // mcall containing the query expressed by value(s) with the highest domain stats scores
var {C} // materialization call sequence of legal_ap
var Rm // wanted materialization
var VI // input domain values set
var feasible_ap ← Preconditioning() // ap with available queries in the query queue
{C} ← legal_ap.getMaterializationSequence()
{SI} ← legal_ap.getMappedServices()
// Generic Optimization procedure
while ¬Cov_m || {C} ≠ ∅ do
    best_si ← {SI}.getHeadElement()
    best_mcall ← {C}.getHeadElement()
    best_mcall.executeQuery(best_si) // call2service coupling
    Rm ← best_mcall.Result
    // service interface selection strategy augmentation
    best_si.cost(best_mcall) = best_mcall.getWithDuplicates() + best_mcall.getQueryResponseTime() + best_mcall.getQueryPageSize()
    {SI}.put(best_si)
    {SI}.reorderByCost()
    // query queue de-queuing strategy augmentation
    for each v in best_mcall.Result do
        score(v) = freq(v) + hRate(v)
        VI ← Reseeding(v)
        {C}.createNewQueries(VI)
        {C}.reorderByQueryQueue()
    end for
end while
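To make the control flow of Algorithm 8 concrete, here is a runnable sketch under several simplifying assumptions that are not part of the original algorithm. Services are plain functions returning (tuple, reseed value) pairs. The cost of a call is the duplicate share plus the inverse result size. The score of a value is its frequency only. The loop stops as soon as coverage is reached or the queue is exhausted, so that the sketch always terminates. The names `optimized_materialization`, `catalog`, `full_service` and `partial_service` are hypothetical.

```python
from collections import Counter, deque


def optimized_materialization(services, seed_values, target_size):
    si_order = list(services)                 # {SI}: natural order at the start
    si_cost = {name: 0.0 for name in services}
    queue = deque(seed_values)                # {C}: pending query values
    seen = set(seed_values)
    freq = Counter()                          # freq(v) over all results so far
    materialized = set()                      # Rm
    while len(materialized) < target_size and queue:
        best_si = si_order[0]
        value = queue.popleft()
        result = services[best_si](value)     # call2service coupling
        new = {t for t, _ in result} - materialized
        materialized |= new
        # Service interface selection augmentation (simplified cost).
        if result:
            si_cost[best_si] = (1 - len(new) / len(result)) + 1 / len(result)
        else:
            si_cost[best_si] = 2.0
        si_order.sort(key=si_cost.get)
        # Query queue augmentation: reseed with unseen values, then reorder.
        for _, v in result:
            freq[v] += 1
            if v not in seen:
                seen.add(v)
                queue.append(v)
        queue = deque(sorted(queue, key=lambda v: freq[v], reverse=True))
    return materialized, si_order


# Hypothetical data: year -> (title, year of a related movie).
catalog = {
    1999: [("Matrix", 2003), ("Fight Club", 1999)],
    2003: [("Matrix Reloaded", 2003), ("Oldboy", 2008)],
    2008: [("Dark Knight", 2008), ("Wall-E", 1999)],
}


def full_service(year):
    return catalog.get(year, [])


def partial_service(year):
    return catalog.get(year, [])[:1]          # returns only the first tuple


result, order = optimized_materialization(
    {"partial": partial_service, "full": full_service}, [1999], target_size=5)
print(sorted(result), order)
```

In this run the partial service is tried first because of the natural order, receives the higher cost after its first call, and is then overtaken by the full service for the remaining calls.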

Algorithm 9 Greedy Materialization Algorithm

var {SI} // sequence of si mapped to feasible_ap
var best_si, win_si // best_si for the sampling phase; win_si is the winner of the sampling phase, used in the progressive phase
var best_mcall // mcall containing the query expressed by value(s) with the highest domain stats scores
var {C} // materialization call sequence of legal_ap
var Rm // wanted materialization
var VI // input domain values set
var feasible_ap ← Preconditioning() // ap with available queries in the query queue
var {C} ← legal_ap.getMaterializationSequence() // materialization call sequence of feasible_ap
var {SI} ← legal_ap.getMappedServices() // sequence of si mapped to feasible_ap
var {C_sample} ← {C}.getSubset(n) // allocates a subset of C
// sampling phase
for each c in {C_sample} do
    GenericOptimizationProcedure()
    {SI}.reorderByCost()
    win_si ← {SI}.getHeadElement()
end for
// progressive materialization phase
while ¬Cov_T || {C} ≠ ∅ do
    best_mcall ← {C}.getHeadElement()
    best_mcall.executeQuery(win_si) // call2service coupling
    Rm ← best_mcall.Result
    for each v in best_mcall.Result do
        score(v) = freq(v) + hRate(v)
        VI ← Reseeding(v)
    end for
    {C}.createNewQueries(VI)
    {C}.reorderByQueryQueue()
end while
// scraping materialization phase
while ¬Cov_T || {C} ≠ ∅ do
    GenericOptimizationProcedure()
end while
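The two main phases of Algorithm 9 can be sketched in the same simplified setting. The assumptions here are illustrative and not taken from the original: the sampling phase distributes the sample calls round-robin over the services instead of running the full generic procedure, the call cost is the duplicate share plus the inverse result size, query reordering by score is omitted, and the scraping phase is left out. The names `greedy_materialization`, `catalog`, `full_service` and `partial_service` are hypothetical.

```python
from collections import deque


def greedy_materialization(services, seed_values, sample_size, target_size):
    queue = deque(seed_values)       # {C}: pending query values
    seen = set(seed_values)
    materialized = set()             # Rm
    cost = {}                        # last observed cost per service

    def run_call(name, value):
        result = services[name](value)
        new = {t for t, _ in result} - materialized
        materialized.update(new)
        if result:
            cost[name] = (1 - len(new) / len(result)) + 1 / len(result)
        else:
            cost[name] = 2.0
        for _, v in result:          # reseeding with unseen output values
            if v not in seen:
                seen.add(v)
                queue.append(v)

    # Sampling phase: spend a few calls to estimate the cost of each service.
    names = list(services)
    for i in range(sample_size):
        if not queue:
            break
        run_call(names[i % len(names)], queue.popleft())
    win_si = min(names, key=lambda n: cost.get(n, float("inf")))

    # Progressive phase: all remaining calls go to the winning service.
    while len(materialized) < target_size and queue:
        run_call(win_si, queue.popleft())
    return materialized, win_si


# Hypothetical data: year -> (title, year of a related movie).
catalog = {
    1999: [("Matrix", 2003), ("Fight Club", 1999)],
    2003: [("Matrix Reloaded", 2003), ("Oldboy", 2008)],
    2008: [("Dark Knight", 2008), ("Wall-E", 1999)],
}


def full_service(year):
    return catalog.get(year, [])


def partial_service(year):
    return catalog.get(year, [])[:1]


tuples, winner = greedy_materialization(
    {"partial": partial_service, "full": full_service},
    [1999], sample_size=2, target_size=5)
print(sorted(tuples), winner)
```

Unlike the optimized algorithm, the service order is fixed once the sampling phase ends, which saves the per-call reordering at the price of not reacting to later changes in service behavior.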