
To analyse the proposed algorithm we used a worst-case analysis, as discussed in the CLRS book [21, chap. 2.2]. Although an average-case analysis [21, chap. 2.2] or Spielman and Teng's smoothed analysis [107] would provide more accurate and realistic (and lower complexity) approximations, it is currently difficult, if not impossible, to assess the probability of having more or fewer modules, or which ones.

The average-case and smoothed analyses of our algorithm require some knowledge (probabilistic, statistical, etc.) of the number of modules available and of the requests from the applications.

We would need to be able to answer questions like: how many modules are present, how many are repeated, how many requests are made, how many can be satisfied given the modules in the system, etc. Remember that these modules are not only the sensors, but also the modules for correlating data.

As we do not have the information needed for a realistic assessment, we opted for a worst-case analysis, which is more general although, in our case, it leads to poor complexity results.

In this setting, the worst-case scenario is when all the modules currently in the model are requested to produce. This is the limiting factor. Even if requests from applications increase, if there is no way to produce the information the impact is small, as we cut off invalid branches.

Taking the number of modules (M) in the system as the input, without considering how they correlate, we have that the limiting part of our algorithm is CHECKPOSSIBLESOURCES(...) (algorithm 5).

As the first parts of the algorithm depend on the later ones, we start the analysis from the last and work towards the first.

Table operations

A table is used in GETINFOPOSS(...) (algorithm 3) as a cache for already visited nodes. We use a table with indexes. The indexing uses re-hashing and open addressing, where the number of elements in the hash is always smaller than the cardinality of the possible set of keys, with α as the table load factor [21, chap. 11]. There are no deletions on the table (apart from a complete clear). In the worst case, an insertion causes a re-hashing, meaning a re-insertion of the current entries plus the new one (a total of $M$ entries). In the worst case every one of these insertions causes a collision, and thus we have $\sum_{n=1}^{M} n$ insertions, i.e. $M\frac{M+1}{2}$. As such, in the worst case an insertion is $O(M^2)$. Note, however, that the average case is $O\left(\frac{1}{1-\alpha}\right)$ with double hashing [21, p. 274]. Looking up an entry is $O(M)$ in the worst case, as every lookup implies a collision. Again, the average case is $O\left(\frac{1}{1-\alpha}\right)$.

$$\mathrm{ADDTOTABLE}(\ldots) = O(M^2) \qquad (4.1)$$

$$\mathrm{GETFROMTABLE}(\ldots) = O(M) \qquad (4.2)$$
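The two table bounds can be checked with a small sketch. The function names below are hypothetical and are not part of the original implementation; the code only evaluates the counting argument (the n-th re-inserted entry colliding with the n − 1 entries already placed) and the average-case bound for open addressing cited from [21], which assumes uniform hashing.

```python
def worst_case_rehash_probes(m: int) -> int:
    """Total probes when re-inserting m entries and every insertion collides
    with all entries already placed: the n-th entry costs n probes."""
    return sum(range(1, m + 1))


def expected_probes(alpha: float) -> float:
    """Average-case bound 1 / (1 - alpha) on the number of probes for an
    insertion under open addressing with load factor alpha (assumes
    uniform hashing)."""
    if not 0 <= alpha < 1:
        raise ValueError("load factor must satisfy 0 <= alpha < 1")
    return 1.0 / (1.0 - alpha)


# The sum has the closed form M(M+1)/2, which is O(M^2).
for m in (1, 4, 22, 100):
    assert worst_case_rehash_probes(m) == m * (m + 1) // 2
```

For a table at half load, `expected_probes(0.5)` evaluates to 2 probes on average, which illustrates how far the worst-case bound is from typical behaviour.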

aggregateNodes

AGGREGATENODES(...) (algorithm 6) is $O(M)$, as at most it involves aggregating all modules.

$$\mathrm{AGGREGATENODES}(\ldots) = O(M) \qquad (4.3)$$

combinePossibilities

COMBINEPOSSIBILITIES(...) is used in CHECKPOSSIBLESOURCES(...) (algorithm 5). To analyse it, we define the Worst Case Number of Descendants (WCND) and the Worst Case Possibilities (WCP) for a node. In the case of a module node, as it needs all of its inputs, the worst case for a combination is $\mathrm{WCP}^{\mathrm{WCND}}$. An information node can have any combination of the modules. In that case we have $\sum_{k=1}^{\mathrm{WCND}} \binom{\mathrm{WCND}}{k} \cdot \mathrm{WCP}^k$. The binomial series has $\sum_{k=0}^{\infty} \binom{\alpha}{k} x^k = (1 + x)^\alpha$, so we can say

$$\sum_{k=1}^{\mathrm{WCND}} \binom{\mathrm{WCND}}{k} \cdot \mathrm{WCP}^k < (1 + \mathrm{WCP})^{\mathrm{WCND}} \qquad (4.4)$$
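The bound in equation 4.4 can be checked numerically for small values. This is only a sketch: the function names are hypothetical, and it relies on the finite binomial theorem, under which the sum starting at k = 1 equals the right-hand side minus one.

```python
from math import comb


def info_node_combinations(wcnd: int, wcp: int) -> int:
    """Left-hand side of (4.4): choose k of the WCND descendants (k >= 1),
    each chosen descendant contributing WCP possibilities."""
    return sum(comb(wcnd, k) * wcp**k for k in range(1, wcnd + 1))


def binomial_bound(wcnd: int, wcp: int) -> int:
    """Right-hand side of (4.4)."""
    return (1 + wcp) ** wcnd


# The k = 0 term of the binomial expansion is 1, so the sum from k = 1
# is exactly one less than the bound.
for wcnd in range(1, 8):
    for wcp in range(1, 5):
        assert info_node_combinations(wcnd, wcp) == binomial_bound(wcnd, wcp) - 1
```

With WCP = 1 the bound becomes 2 raised to the number of descendants, which is where the exponential term in the analysis comes from.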

The worst case for the number of descendants of an information node is having all $M$ modules producing the information, as illustrated in figure 4.2.a. In this case the WCP is 1, as each module provides just one possibility. From equation 4.4 we get $2^M$.


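[Figure 4.2 – Worst case scenarios. Panel 4.2.a, worst case number of sons: M0 with a single information node produced by M modules (m1 ... mM). Panel 4.2.b, worst case possibilities: M0 with an information node produced by k modules (m1 ... mk), each needing one input I_kb that is produced by j modules (m_1k ... m_jk), with k × j = M.]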

For the worst case number of possibilities we take as an example a two-level diagram, illustrated in figure 4.2.b. $M_0$ has only one information node ($I_a$), which in turn has $k$ modules for producing it. Each of these modules needs one input ($I_{kb}$ for module $k$) that is produced by $j$ modules. We have that $k \cdot j = M$. The modules for producing $I_{kb}$ are leaves, and we can combine them for production in $((1 + 1)^j - 1)$ ways³. At $I_a$ we have $((1 + (2^j - 1))^k - 1)$ ways. This leads to a complexity of $O(2^M)$.

If we had more levels, equally distributed, it would lead to the same $O(2^M)$, as the product of the number of elements in each level would need to be $M$. We also analyse the case of two levels with $k$ elements in the first level. This level has $k - 1$ leaves, and the remaining module uses one information node with $j$ module nodes available to produce it. This would lead to $O(2^M)$, as we would converge to $2^{k-1} \times 2^j$, which is bounded by $2^M$ as $k + j = M$ in this case. Thus:

$$\mathrm{COMBINEPOSSIBILITIES}(\ldots) = O(2^M) \qquad (4.5)$$
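The two-level count can be confirmed by brute force for small k and j. The sketch below is an illustration under assumptions, not the original algorithm: it models each first-level module as either unused or used with a non-empty subset of its j producers (encoded as a bitmask), and counts the selections that use at least one module. The function name is hypothetical.

```python
from itertools import product


def count_two_level_possibilities(k: int, j: int) -> int:
    """Enumerate the ways of producing the top information node when it has
    k modules, each needing one input produced by j leaf modules."""
    # Per first-level module: None (not used) or a non-empty subset of its
    # j producers, encoded as a bitmask in 1 .. 2^j - 1.
    options = [None] + list(range(1, 2**j))
    return sum(
        1
        for choice in product(options, repeat=k)
        if any(c is not None for c in choice)  # at least one module is used
    )


# Matches ((1 + (2^j - 1))^k - 1) = 2^(k*j) - 1 = 2^M - 1.
for k in range(1, 4):
    for j in range(1, 4):
        assert count_two_level_possibilities(k, j) == 2 ** (k * j) - 1
```

The enumeration itself is exponential in k · j, so it is only practical for very small diagrams; it serves to confirm the closed form rather than to replace it.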

checkPossibleSources

CHECKPOSSIBLESOURCES(...) (algorithm 5) has as its worst-case running time

COMBINEPOSSIBILITIES(...) × ADDTOTABLE(...) + AGGREGATENODES(...)

and from the results above (equations 4.1, 4.3 and 4.5) we have

$$\mathrm{CHECKPOSSIBLESOURCES}(\ldots) = O(2^M) \times O(M^2) + O(M) = O(M^2 \cdot 2^M) \qquad (4.6)$$
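As a brief sketch of why the additive $O(M)$ term disappears in equation 4.6, the bound can be written out with explicit constants. The constants $c_1$, $c_2$ and $c_3$ are illustrative stand-ins for the hidden constants of equations 4.5, 4.1 and 4.3 respectively; they do not appear in the original analysis.

```latex
\begin{align*}
c_1 2^M \cdot c_2 M^2 + c_3 M
  &\le c_1 c_2\, M^2 2^M + c_3\, M^2 2^M
  && \text{since } M \le M^2 2^M \text{ for } M \ge 1 \\
  &= (c_1 c_2 + c_3)\, M^2 2^M \\
  &= O\!\left(M^2 \cdot 2^M\right)
\end{align*}
```

The same absorption argument applies to the lower-order terms in the bounds derived for the remaining functions.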

³ The binomial starts at k = 0, whereas we combine from k = 1.

getInfoPoss

GETINFOPOSS(...) (algorithm 3) performs a table lookup, discovers modules and then, for the modules discovered, extracts their possibilities for production. At the end, it calls CHECKPOSSIBLESOURCES(...). Discovering modules is at most going through all $M$ modules. Getting all modules for production leads to $O(M)$, and the possible combinations of all of them to $O(2^M)$, as we saw when combining possibilities. From equations 4.2 and 4.6, we have:

$$\mathrm{GETINFOPOSS}(\ldots) = O(M) + O(M) + O(M) \times 2^M + O(M^2 \cdot 2^M) = O(M^2 \cdot 2^M) \qquad (4.7)$$

getModulePoss

GETMODULEPOSS(...) (algorithm 4) adds to the table and then iterates through the information needed, getting all possibilities. At the end it calls CHECKPOSSIBLESOURCES(...). The information needed is at most of size $M$. If it were more, then some information would not be produced and the cycle would terminate earlier. However, in this case GETINFOPOSS(...) would only provide one result for each call. As we discussed for COMBINEPOSSIBILITIES(...), if we needed only one piece of information, produced by every other module, we would have $O(2^M)$. Note that the other operations in GETMODULEPOSS(...) are $O(1)$, not dependent on $M$. Thus, from equations 4.1 and 4.6:

$$\mathrm{GETMODULEPOSS}(\ldots) = O(M^2) + O(1) \times 2^M + O(M^2 \cdot 2^M) = O(M^2 \cdot 2^M) \qquad (4.8)$$

checkPossibilities

CHECKPOSSIBILITIES(...) (algorithm 2) iterates through all the requests, testing in each cycle the possibilities of answering each one. Then it calls CHECKPOSSIBLESOURCES(...). The size of the set of requests is unrelated to $M$, as we use a cache for getting the information. The only problem would be requesting information that cannot be produced because of a lack of inputs on the leaf nodes. If so, the cache means that we would quickly discover the impossibility of production. The possibilities, as discussed above, are limited by the modules and thus are $O(2^M)$.

From equation 4.6 we have:

$$\mathrm{CHECKPOSSIBILITIES}(\ldots) = O(1) \times O(2^M) + O(M^2 \cdot 2^M) = O(M^2 \cdot 2^M) \qquad (4.9)$$

optimizeProduction

The main loop of the algorithm OPTIMIZEPRODUCTION(...) (algorithm 1) calls CHECKPOSSIBILITIES(...) and calculates a minimum from a set. The set has WCP elements and, as it is a non-ordered set, going through it to find the minimum leads to $O(2^M)$. From equation 4.9:

$$\mathrm{OPTIMIZEPRODUCTION}(\ldots) = O(M^2 \cdot 2^M) + O(2^M) = O(M^2 \cdot 2^M) \qquad (4.10)$$


Observations

It is easy to see that this is a very bad worst-case scenario; however, it is very unlikely (if at all possible) to happen. Nonetheless, it gives an understanding of how the algorithm works.

Worst-case space analysis leads us to a similarly pessimistic result of $O(M \cdot 2^M)$. In this case the cache table is the hogging factor, where we have $2^M$ possibilities (thus table entries), each having $M$ modules as data (which in a real scenario would not happen for every entry).

Note that the algorithm discards possibilities that do not meet the requirements. However, in the worst case analysis all possibilities meet the requirements.

As an anecdotal example of performance, for figure 4.1, which has 22 modules, 282 operation calls were performed to find 135 possibilities (the worst case with $M = 22$ is $22^2 \times 2^{22} = 2030043136$). This took an average of 0.4 ms on a laptop with an Intel 2.66 GHz CPU and 4 GB of memory, using an implementation in Java, where the possibilities table was implemented as a HashMap with chaining⁴.
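The worst-case figures quoted for M = 22 can be reproduced with a few lines of arithmetic. The function names are hypothetical; the values 22 and 282 are taken from the text, and the bounds are evaluated with all hidden constants taken as 1, which is an assumption made only for illustration.

```python
def worst_case_time_bound(m: int) -> int:
    """Evaluate M^2 * 2^M, the time bound of (4.10), with constants set to 1."""
    return m**2 * 2**m


def worst_case_space_bound(m: int) -> int:
    """Evaluate M * 2^M, the space bound, with constants set to 1."""
    return m * 2**m


M = 22               # modules in the example diagram
OBSERVED_CALLS = 282  # operation calls reported for that diagram

time_bound = worst_case_time_bound(M)
space_bound = worst_case_space_bound(M)

# Fraction of the worst-case bound actually observed in the anecdotal run.
observed_fraction = OBSERVED_CALLS / time_bound
```

Here `time_bound` evaluates to 2030043136, matching the figure in the text, and `observed_fraction` is on the order of 10⁻⁷, which quantifies how pessimistic the worst case is for this example.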
