Aim 4: Cloud Resource Optimization - Designing Human-Centered Collective Intelligence

In this Aim, I seek to investigate some best practices for ensuring high-availability, scalability, disaster recovery, and high performance. Over the past couple of years, we have witnessed several cloud outages across multiple high-profile cloud providers who are otherwise considered to be widely reliable in terms of cloud service uptime [54]. Unfortunately, when a cloud provider experiences an outage it can have a major financial and functional impact on mission-critical applications hosted on the provider’s infrastructure. Collective intelligence systems can be integrated into mission critical healthcare systems which require maximum uptime and high performance, hence the imminent need to delve into this facet.

1) Scenario Analysis

To help investigate this facet of the proposed framework for human-centered collective intelligence systems, I evaluated the implementation of this proposed framework component in an mHealth persuasive systems for supporting behavior change in a Smoking Cessation intervention [49, 50]. The text-message-based motivational system was deployed “live” with a centralized cloud backend hosted in the Microsoft Azure cloud and kept operational for over a three-year period without a reliability incident.

In addition, I implemented a prototype solution for delivering web-based personalized movie recommendations and video streaming content. While this was not a mission critical application, the web site was deployed to both the Google App Engine and the Windows Azure PaaS platform to validate the proposed framework. An SLA Monitoring Agent was also deployed to both clouds, taking into account the reference architecture described earlier. The active cloud was defined as the Windows Azure cloud whereas Google’s App Engine represented (Cloud B) the passive cloud platform.

The Movie Recommendation Application (App) features static image content which is stored on the cloud storage service and other Software-as-a-Service solutions as well as Facebook [107]. Other data elements served on the web site are dynamic in nature and can be supplied to the end-user through a database call invoked by the web application based on the user’s interaction with the application.

Fig. 15. SMI Problem Classifications [48]

With the two types of SMI classifications presented in Figure 15 in mind, I embarked on the implementation of two case studies that cover the spectrum of SMI analysis. From a Social Profile Analysis standpoint, I examined what goes into building a movie recommendations cloud service that pulls social insights from the movie preferences of a given user’s Facebook friends as a form of input signal for ranking personalized movie recommendations. This solution was deployed as a web application service hosted on the Microsoft Azure cloud platform. In addition to this case study, I implemented a sentiment analysis solution for evaluating Social Media Content, that is, Twitter messages (or tweets) by utilizing the Amazon AWS cloud platform for Elastic MapReduce (EMR) including Hadoop.

2) Architectural Design

In creating a generalized reference architecture for improving high availability in cloud-provider-down conditions, I offer a cross-cloud hybrid Platform-as-a-Service (PaaS) solution illustrated in Figure 16. On the other hand, Figure 17 depicts a reference model for selecting various cloud resource options for a given SMI analytics task.

AUTOMATIC CLOUD PROVIDER FAILOVER LOGICAL REFERENCE ARCHITECTURE: CONTEXT VIEW Ca pa bi lit ie s

HIGH AVAILABILITY DISASTER RECOVERY PERFORMANCE DEGRADATION

CLOUD A CLOUD B Global DNS Host Web App (Active) _{Web App (Passive)} SLA Monitor Agent SLA Monitor Agent DATABASE (Standby) STORAGE (Standby) DATABASE (Active) STORAGE (Active) IP1; URL1 IP2; URL2 End Users DATA REPLICATION (LOG SHIPPING) VPN VPN CUSTOM ARRAY- REPLICATION Monitor Service

Fig. 16. Reference Model for Cross Cloud Failover [58]

3) Research Challenges

Some of the prevalent challenges in implementing cost-effective cloud solutions that support automatic scaling and require the least infrastructure management overhead include:

- Challenge 1: Portability and fault-tolerance PaaS cloud design

- Challenge 2: Selecting the most effective cloud resources within a given cloud provider’s portfolio to support collective intelligence

4) Data Collection

I executed a simulation for automatic cloud failover to investigate the viability of using the proposed approach for Challenge 1. I collected a few metrics during the experiment to confirm feasibility. For challenge

5) Methods

To test the hypothesis that the proposed reference architecture associated with

Challenge 1 is feasible for most mission-critical PaaS hosted applications, I created a

prototypical web application as described in the case study. The executed simulations include:

• Cloud service outages, and

• Severe performance degradation introduced in the target application

For Challenge 2, I also designed a reference architecture by surveying various SMI projects through qualitative analysis and addressing common cross-cutting concerns in using cloud resources for collectively intelligent data mining tasks.

6) Algorithms

I implemented a custom fault recovery algorithm that is inspired by that of Saha and Mukhopadhyay’s proposed approach [70]. The two status conditions under which a failover will occur include situations where performance degradation in a given Cloud

Provider is detected by the SLA Monitoring Agent or the active Cloud Provider is down.

Inputs:

- n: number of cloud providers (for example, n=2) - k: critical application dependent cloud services - r: retry interval in milliseconds (for example, r=500) - s: number of retries

- p: performance degradation threshold (milliseconds)

𝑳𝒐𝒈𝒊𝒄 𝒇𝒐𝒓 𝒂 𝒈𝒊𝒗𝒆𝒏 𝒂𝒄𝒕𝒊𝒗𝒆 𝒄𝒍𝒐𝒖𝒅 𝒊, 𝟎 <= 𝒊 <= 𝒏 − 𝟏

if 𝐶𝐿𝑂𝑈𝐷_𝐹𝐴𝐼𝐿𝑂𝑉𝐸𝑅M = FALSE; // the active cloud

for all cloud services 𝑘 executing in cloud i perform a 𝑆𝑇𝐴𝑇𝑈𝑆Q check;

set 𝑆𝑇𝐴𝑇𝑈𝑆_𝑇𝐼𝑀𝐸Q to elapse time if 𝑆𝑇𝐴𝑇𝑈𝑆_𝑇𝐼𝑀𝐸Q > p

or 𝑆𝑇𝐴𝑇𝑈𝑆_𝑇𝐼𝑀𝐸Q == TIMEOUT

set 𝐹𝐴𝑈𝐿𝑇_𝐹𝐿𝐴𝐺Q = TRUE; // save the fault on k for j = 1 to s perform a 𝑆𝑇𝐴𝑇𝑈𝑆Q check; if 𝑆𝑇𝐴𝑇𝑈𝑆_𝑇𝐼𝑀𝐸Q > p or 𝑆𝑇𝐴𝑇𝑈𝑆_𝑇𝐼𝑀𝐸_Q== TIMEOUT; if j == s // retry complete set 𝐶𝐿𝑂𝑈𝐷_𝐹𝐴𝐼𝐿𝑂𝑉𝐸𝑅M = TRUE; initialize FAILOVER; else

sleep for r milliseconds; // retry to confirm j = j + 1;

end if

else if a recent 𝐹𝐴𝑈𝐿𝑇_𝐹𝐿𝐴𝐺Q= 𝑇𝑅𝑈𝐸 is found set 𝑆𝑃𝑂𝑅𝐴𝐷𝐼𝐶_𝐹𝐿𝐴𝐺Q = TRUE;

set 𝐶𝐿𝑂𝑈𝐷_𝐹𝐴𝐼𝐿𝑂𝑉𝐸𝑅M = TRUE; initialize FAILOVER;

exit for; // note sporadic issues else

exit for; // false alarm save fault on k and exit end if

end for end if

7) Limitations

The model is predicated on the assumption that it is very rare for two major cloud providers to experience a down condition concurrently. As a result, the scope of the investigation is focused on utilizing two major public cloud PaaS providers. Though, in extreme cases, the model can be extended to accommodate more than 2 cloud providers.

IV. DESIGNINGZOEI–HUMAN-CENTEREDCI

In document Designing Human-Centered Collective Intelligence (Page 53-60)