

5.5 Evaluation Design

5.5.2 Evaluation Metrics

Mettler et al. (2014) noted that, because no guidelines are available for design experiments, Design Science researchers need to develop quality criteria for their own experiments. These quality criteria help other researchers to understand the validity and reliability of the design experiments.

The evaluation metrics selected for the evaluation strategy for CoPro were derived from the QoC metrics in Section 4.3.2. Only the objective QoC metrics were selected, in order to match the goals set out in Section 5.3. This left four QoC metrics, namely freshness, reliability, granularity and confidence interval, as listed in Table 5.2.

Table 5.2: Remaining objective QoC metrics for low level context

| Quality of Context Metric | Measures | Source |
| --- | --- | --- |
| Freshness | Indicates validity of context in terms of timeliness | Time period, Age |
| Reliability | Indicates the extent to which context can be considered credible | Freshness, Up-to-dateness |
| Granularity | Indicates the precision of the context | Max_granlevel, Cur_granlevel |
| Confidence Interval | Indicates the confidence in the context produced | Context provider |

The freshness, granularity and confidence interval QoC metrics remained the same for the evaluation. Reliability, however, used up-to-dateness as an input source, which is a subjective QoC metric. The algorithm for the reliability metric was therefore changed to provide a more objective view of reliability. Zheng et al. (2012) suggested that reliability may be affected by the freshness of the context as well as the trustworthiness of the environment. Taking this suggestion and the lack of overall QoC metrics into consideration, additional QoC metrics were implemented, as shown in Table 5.3.

Table 5.3: Additional objective QoC metrics for low/high level context

| Quality of Context Metric | Measures | Source |
| --- | --- | --- |
| Accuracy | Indicates the extent to which data is correct and reliable | Error of sensor, context values, t-distribution table |
| Trustworthiness | Indicates the trustworthiness of the context provider | Completeness, Context object |
| Completeness | Indicates the extent to which the available context information is present | Available context providers, Weightings of providers |

Accuracy was defined as the extent to which context information was correct and reliable. However, it was difficult to identify the true value for the evaluated context information. A statistical estimation method identified by Kim and Lee (2006) was therefore used to determine the accuracy of the context information. This method involved estimating a confidence interval for the sensor values produced. If a sensor value was within the confidence interval, that sensor value was regarded as accurate. The method was chosen because it was compatible with the underlying implementation of CoPro: as discussed in Section 4.3.2, a buffer was already used to capture the sensor values, which were needed for the statistical estimation method.

The estimation method was used to determine the accuracy of sensors that provided continuous context information such as the light sensor. In order to determine the accuracy, a root mean square error (RMSE) was calculated, which was then used to calculate the upper and lower limit of the confidence interval. The RMSE was calculated using Equation 7 (Kim & Lee 2006).

$$V = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2} \qquad (7)$$

N represented the total number of sensor values in the buffer, $x_i$ the current sensor value and $\bar{x}$ the mean of the sensor values in the buffer.

The next step was to calculate the confidence interval for the true value of a sensor, against which the current sensor value was compared. This confidence interval was calculated using Equation 8 (Kim & Lee 2006).

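$$\bar{x} - t_{\alpha/2}\,\frac{V}{\sqrt{N}} < \mu < \bar{x} + t_{\alpha/2}\,\frac{V}{\sqrt{N}} \qquad (8)$$

V represented the calculated RMSE value, $\bar{x}$ the mean of the N sensor values in the buffer and $\mu$ the true value of the sensor. $t_{\alpha/2}$ indicated the value of the t-distribution, with $v = N - 1$ representing the degrees of freedom. By using $v$, $t_{\alpha/2}$ was then obtained from the t-distribution table (Appendix A). $t_{0.025}$ was used as it represented a confidence of 95% on the t-distribution table. This 95% value indicated that, at 95% confidence, the true value was within the estimated interval (Walpole, Myers, Myers & Ye 2002).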
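The accuracy check described above can be sketched in Python. This is a minimal sketch rather than CoPro's implementation: it assumes the RMSE is taken about the buffer mean with a divisor of N, and that the interval has the textbook form of mean ± t·V/√N. The function names and the short excerpt of t-table values are illustrative only.

```python
import math

# Two-sided 95% critical values t_{0.025} for v = N - 1 degrees of freedom.
# Excerpt of a standard t-distribution table (assumption: buffers of at most
# 11 values; a full table would be needed for larger buffers).
T_025 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def rmse_about_mean(buffer):
    """Root mean square deviation of the buffered values from their mean."""
    n = len(buffer)
    mean = sum(buffer) / n
    return math.sqrt(sum((x - mean) ** 2 for x in buffer) / n)


def confidence_interval(buffer):
    """Assumed interval form: mean +/- t_{0.025} * V / sqrt(N)."""
    n = len(buffer)
    if n < 2:
        raise ValueError("need at least two buffered values")
    mean = sum(buffer) / n
    half_width = T_025[n - 1] * rmse_about_mean(buffer) / math.sqrt(n)
    return mean - half_width, mean + half_width


def is_accurate(value, buffer):
    """A sensor value is regarded as accurate if it lies inside the interval."""
    lower, upper = confidence_interval(buffer)
    return lower <= value <= upper
```

For a hypothetical light-sensor buffer of `[100.0, 102.0, 98.0, 101.0, 99.0]`, a reading of 100.5 falls inside the estimated interval and would be regarded as accurate, whereas a reading of 110.0 would not.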

Trustworthiness was defined as the trustworthiness of the context provider that produced the context information. A trustworthiness rating was assigned to each context object to indicate the reliability of the provider of that context object. To remain consistent with the other QoC metrics, the rating ranged between 0 and 1. For example, the network context object was assigned a trustworthiness rating of 0.9, as its context provider supplied the network object's value as soon as it was detected. The closer the rating was to 1, the more the context provider could be trusted to produce good-quality context information. Trustworthiness was also recalculated for context information that had a completeness metric, such as the network context: the mean of the trustworthiness and completeness was used as the new value for trustworthiness.
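As a small worked illustration of this recalculation, the sketch below averages a provider's rating with a completeness value when one exists. The function name and the completeness value of 0.5 in the usage note are hypothetical; only the rating of 0.9 comes from the text.

```python
def adjusted_trustworthiness(rating, completeness=None):
    """Return the rating, or its mean with completeness when one is available."""
    if not 0.0 <= rating <= 1.0:
        raise ValueError("rating must lie between 0 and 1")
    if completeness is None:
        # Context objects without a completeness metric keep their rating.
        return rating
    return (rating + completeness) / 2.0
```

With the network rating of 0.9 and an assumed completeness of 0.5 (for instance, only one of two attributes available), the recalculated trustworthiness would be about 0.7.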

Completeness was defined as the extent to which the available context information was present for a particular context object (Manzoor et al. 2008). The completeness QoC metric was only used to evaluate those context objects that had more than one context attribute and was calculated using Equation 9 (Manzoor et al. 2010).

$$C(O) = \frac{\sum_{j=1}^{m} W_j(O)}{\sum_{i=1}^{n} W_i(O)} \qquad (9)$$

The total number of attributes of the context object O that had been allocated a value was symbolized by m. The weight of the jth attribute of O that had been allocated a value was represented by Wj(O) (Manzoor et al. 2008). The numerator was calculated by summing each jth attribute multiplied by its assigned weighting. Similarly, the total number of attributes of context object O was symbolized by n, and Wi(O) represented the weight of the ith attribute of O (Manzoor et al. 2008). The denominator was calculated by summing each ith attribute multiplied by its assigned weighting. The completeness QoC metric was then calculated by dividing the numerator by the denominator.

If n = m, the completeness would be equal to 1, which indicated that all the attributes of context object O had been assigned a value. For example, the network context object had a total of two context attributes, namely mobile data and Wi-Fi. The completeness of the network object would therefore be:

C(O) = (m * mobile weight + w * wi-fi weight)/(1 * mobile weight + 1 * wi-fi weight);

The availability of mobile data and Wi-Fi in the above equation was represented by m and w respectively (here m denotes the availability of mobile data, not the attribute count m of Equation 9). In the denominator, m and w were set to 1 to denote all attributes. The weights assigned to mobile data and Wi-Fi indicated each attribute's significance relative to the total attributes. Both attributes were assigned a weighting of 0.5 for the network context object, as both mobile data and Wi-Fi enable internet connectivity and were therefore considered equally important.
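The network example can be worked through with a short Python sketch of the weighted completeness ratio. The function and dictionary names are illustrative assumptions; the two attributes and their weightings of 0.5 are taken from the text.

```python
def completeness(weights, available):
    """Weighted share of a context object's attributes that have a value.

    weights:   attribute name -> weighting
    available: attribute name -> True if the attribute currently has a value
    """
    total = sum(weights.values())
    if total == 0:
        raise ValueError("weights must not sum to zero")
    present = sum(w for name, w in weights.items() if available.get(name, False))
    return present / total


# Weightings of the network context object as described in the text.
NETWORK_WEIGHTS = {"mobile_data": 0.5, "wifi": 0.5}
```

With both attributes available the completeness is 1; with only Wi-Fi available it drops to 0.5, since each attribute carries half of the total weight.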

Based on the new QoC metrics and the suggestion by Zheng et al. (2012), the reliability algorithm was reworked and was calculated for each streaming context value, such as the light values, using Equation 10.

R(O) = (Trustworthiness + Freshness + Accuracy)/3; (10)

For context objects that were not streaming context values, such as the network connectivity values, reliability was calculated using Equation 11.

R(O) = (Trustworthiness + Freshness + 1)/3; (11)

As the network connectivity was detected as soon as there was any change in value, the accuracy term of Equation 10 was replaced by 1 in Equation 11, which represented 100% confidence in the value reported.
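Equations 10 and 11 can be combined into one small helper, sketched below. The function name is hypothetical, and the sketch assumes that accuracy, like the other metrics, is expressed as a number between 0 and 1.

```python
def reliability(trustworthiness, freshness, accuracy=None):
    """Mean of three terms: Equation 10 when accuracy is given, otherwise
    Equation 11, where the third term is fixed at 1."""
    third_term = 1.0 if accuracy is None else accuracy
    return (trustworthiness + freshness + third_term) / 3.0
```

A streaming value such as a light reading would be scored with `reliability(t, f, a)`, while a non-streaming value such as network connectivity would be scored with `reliability(t, f)`.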

The above QoC metrics were used to evaluate the low/high-level context values. The QoC metrics that were used to evaluate the inferred context are shown in Table 5.4.

Table 5.4: Objective QoC metrics for inferred context

| Quality of Context Metric | Measures | Source |
| --- | --- | --- |
| Freshness | Indicates validity of context in terms of timeliness | Time period, Age |
| Completeness | Indicates the extent to which the available context information is present | Available context values, Weightings of context values |
| Certainty | Indicates the confidence in the context information produced | Freshness, Completeness |

The freshness of inferred context values was calculated in the same manner as the freshness of the low/high-level context values. The fact that the freshness QoC metric could be applied to all contexts further highlights the significance of the change made to the time period in Section 4.3.2. This change involved using the actual time between context readings instead of minDelay. It was also supported by the fact that the inferred context was produced by context rules rather than by a sensor, which is what produced the minDelay value.

Completeness was calculated in the same manner as for the low/high-level context values. Each attribute of each inferred context was assigned a weighting value, which was used to calculate the inferred context's completeness.

Certainty was defined as the confidence in the context information produced. Certainty was calculated using Equation 12 (Kim & Lee 2006).

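$$\mathrm{Certainty}(O) = C(O) \times \frac{\text{number of answered requests}}{\text{number of requests}} \qquad (12)$$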

Certainty was calculated using the number of inferred context requests and the number of those inferred context requests that were answered. Certainty also used the freshness metric as well as the completeness of the context object. If the inferred context produced was considered fresh (i.e. F(O) > 0) at the time that certainty was calculated, the request for inferred context was considered as answered. The ratio of number of answered requests or replies to number of requests was multiplied by the completeness of the context object to calculate the certainty of that inferred context object.
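The certainty calculation described above can be sketched as follows. This follows the prose description only, not necessarily the exact form of Equation 12: the function name, the representation of requests as a list of freshness values, and the return value of 0 when no requests have been made are all assumptions.

```python
def certainty(freshness_at_requests, completeness):
    """Ratio of answered requests to requests, scaled by completeness.

    freshness_at_requests: the freshness F(O) observed at each request; a
    request counts as answered when the inferred context was still fresh,
    that is, when F(O) > 0.
    """
    requests = len(freshness_at_requests)
    if requests == 0:
        return 0.0  # assumption: no requests yet, so no evidence of certainty
    answered = sum(1 for f in freshness_at_requests if f > 0)
    return (answered / requests) * completeness
```

For example, if three of four requests found the inferred context fresh and its completeness was 0.5, the certainty would be 0.75 × 0.5 = 0.375.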

All the QoC metrics discussed in this section were used as the evaluation metrics for evaluating CoPro. These QoC metrics were used to help determine the feasibility of the proposed model. Mettler et al. (2014) suggested that derived QoC metrics may represent a valuable contribution to DSR.

In document A model for context awareness for mobile applications using multiple-input sources (Page 105-110)