ASSESSMENT AND MANAGEMENT

Analyses that merely relabel traditional security risks of Internet-based systems as cloud-specific tend to be useless, if not plainly misleading. Therefore, for many scholars and professionals, the quest for a cloud-specific definition of risk assessment remains one of the most challenging issues in cloud computing studies.

Some recent works have focused on specific aspects of cloud risk assessment and management that have been overlooked in the past. Keller and König [34] highlighted the poor knowledge of the underlying network structure of a cloud provider as one of the most severe obstacles to the accurate identification of cloud-specific risks. An introductory analysis of a case study has been presented by Brender and Markov [35].

A detailed Microsoft essay [36], instead, presented a useful analysis of cloud risk management within the formal framework of ISO 31000 [10], thus applying a well-known international standard rather than “reinventing the wheel” with yet another risk management process. Risk assessment is typically divided into three phases, namely risk identification, risk analysis, and risk evaluation, followed by the management phase of risk treatment. The risk management process has a cyclic structure derived from the traditional Deming cycle of plan-do-check-act (PDCA), described in ISO 31000 as: “Design of framework for managing risk, implementing risk management, monitoring and review of the framework,” and “Continual improvement of the framework.” The general ISO risk process is then used to evaluate a cloud-service option. Examples of relevant issues to consider when doing the risk analysis are [36]:

• Evaluate changes in the risk landscape resulting from the adoption of a cloud solution

• Identify the context and its boundaries

These are general issues that lead to change management and risk assessment. Other more specific problems to tackle are:

• Verify which national and international policies and regulatory environments have impacts on the choices (e.g., privacy laws, record-keeping regulations)

• Consider which corporate policies and governance guidelines apply (e.g., social responsibility, sustainability stance)

• Evaluate the corporation’s risk appetite (e.g., define the risk acceptance criteria, the likelihood and impact definition criteria, and measurement scales)
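The last item (acceptance criteria, likelihood and impact definitions, measurement scales) can be made concrete with a minimal sketch. The 1-5 ordinal scales, the multiplicative score, the threshold value, and the example risks below are all illustrative assumptions, not taken from [36] or ISO 31000; each organization defines its own.

```python
from dataclasses import dataclass

# Hypothetical 1-5 ordinal measurement scales (assumption, not from the source).
LIKELIHOOD = {"rare": 1, "unlikely": 2, "possible": 3, "likely": 4, "almost_certain": 5}
IMPACT = {"negligible": 1, "minor": 2, "moderate": 3, "major": 4, "severe": 5}

# Hypothetical risk appetite: scores at or below this value are accepted as-is.
ACCEPTANCE_THRESHOLD = 6


@dataclass(frozen=True)
class Risk:
    name: str
    likelihood: str
    impact: str

    @property
    def score(self) -> int:
        # Simple likelihood x impact product; other scoring rules are possible.
        return LIKELIHOOD[self.likelihood] * IMPACT[self.impact]


def evaluate(risk: Risk, threshold: int = ACCEPTANCE_THRESHOLD) -> str:
    """Return 'accept' if the score is within the risk appetite, else 'treat'."""
    return "accept" if risk.score <= threshold else "treat"


if __name__ == "__main__":
    risks = [
        Risk("provider network outage", "unlikely", "major"),
        Risk("record-keeping non-compliance", "rare", "severe"),
    ]
    for r in risks:
        print(f"{r.name}: score={r.score} -> {evaluate(r)}")
```

Changing the threshold or the scales changes which risks are treated and which are accepted, which is why these criteria need to be fixed before a cloud-service option is evaluated.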

8.5.1 Security SLA for Cloud Services

Transferring risks to third parties has been a cornerstone of risk management for a long time. It gave birth to the insurance business and to the actuarial studies used to evaluate insurance risks. In general, risks are transferred from one party to another by stipulating a formal contract between the two. The first pays a fee (or a premium) to the second and in exchange obtains a risk-free or risk-reduced service, while the second accepts responsibility for the risky prospect, for which it receives compensation. Cloud solutions, and in general almost all service/infrastructure externalizations, have the same characteristics. Firms often externalize services to cloud providers not just so they can pay less (if that is the case), but also to be relieved of the risks associated with managing a complex technological infrastructure or service, such as system failures, network outages, or power blackouts.

Other relevant sources of risk come from human resource management, technological change, and obsolescence. In general, cloud-based services mitigate all of those risks, because it is the cloud provider who faces the negative consequences (or most of them). The cloud customer just pays the regular fee. Unfortunately, this is just the theory (or the “marketing truth”). In practice, as usual, things can easily go wrong, because who is in charge of what, and to what extent risks are transferred from a client to a cloud provider, depends strictly on the contractual terms specified in an SLA. This is why SLAs play such a key role in discussions about cloud risks: often it all depends on them.

When cloud-specific risks are considered, two issues regarding SLAs are the subjects of ongoing research and discussion among experts: cloud SLA and security SLA. The first, cloud SLA, is meant to identify a cloud-specific SLA, which should offer explicit guarantees for typical requirements of cloud users and should be tailored to their specific needs. The second, security SLA, identifies an SLA specifically defined to guarantee a given security level. Eventually, the two types of SLA should converge in a common cloud-oriented SLA definition with both specific cloud-based provisions and a focus on security, this being perceived as the main source of risks for cloud systems. However, for the sake of clarity and also to reflect the current state of the art, we keep the two cloud-oriented SLAs separate in the present chapter.

8.5.1.1 Cloud SLA

A practical demonstration of the limitations of cloud SLAs has been given by Baset [20]. There, the SLAs of some cloud providers have been analyzed with respect to some relevant parameters of the quality of a cloud offer. Unsurprisingly, none has exhibited sufficient performance guarantees and all have placed the burden of detecting SLA violations on customers. The components of a typical cloud provider’s SLA are [20]:

• Service guarantee: The metrics used to measure the provision of the service over a time period (e.g., availability, response time)

• Service guarantee time period: The duration over which a service guarantee should be met (e.g., a billing month, the time elapsed since the last claim, 1 hour)

• Service guarantee granularity: The resource scale to specify a service guarantee (e.g., per service, per data center, per instance, per transaction)

• Service guarantee exclusion: Instances excluded from evaluation (e.g., abuse of the system by a customer, downtime due to scheduled maintenance)

• Service credit: The amount credited to the customer for guarantee violations (e.g., complete or partial refund of the customer fee)

• Service violation measurement and reporting: How violations of a service guarantee are measured and reported, and by whom

Just considering this list of general components, it should be evident how “the devil of cloud computing” is also likely to be found in the details. Just trading a partial, instead of a complete, refund for a lower service fee changes the risk profile of the SLA. More likely, though, given how rarely a customer can negotiate the contractual terms, excluding system abuses from the guarantees (as several current SLAs prescribe) leaves the user unprotected from most security risks. Alternatively, setting a short period of time for reporting SLA violations places on the customer the heavy (and expensive) burden of an efficient monitoring and reporting procedure. In short, many things in a typical SLA can eventually turn against the interests of a cloud customer.

The study [20] considered some of the most well-known cloud providers. All of them guarantee service availability, although the definition of when a service should be considered “available” is typically full of contractual subtleties and the source of severe misalignments between the interests of a cloud customer and those of a provider. No security or other strictly cloud-specific characteristics are considered explicitly; that is, security, like other aspects, could be said to be indirectly guaranteed because of its impact on service availability, but this is clearly a largely insufficient form of protection from security risks. Other components of a typical SLA may differ considerably between cloud providers. The service guarantee granularity varies from per data center to per instance; it is often not specified whether scheduled maintenance is excluded from measures of service availability; the duration of scheduled maintenance, too, is not always specified (unsurprisingly, since, when considered, it could appreciably change the guaranteed system availability rate); the service guarantee time period may vary from a whole year to a billing or a calendar month. The service credit is where SLAs show the greatest variability, making it extremely difficult to compare one commercial offer with another and to grasp how effective the risk transfer from the user to the provider is. The refund is often a fraction of the customer bill if a certain service availability threshold is not met (e.g., 10% of customer bill if availability is less than 99.95% of the time, 5% of customer bill for every 30 minutes of downtime up to 100%). By contrast, all cloud providers agree in their SLAs that the burden of detecting a violation should lie exclusively with the user and that users have a relatively short time to file a claim (e.g., one billing month, 30 business days from the last claim).
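To see how much these contractual details matter, the two refund rules quoted above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the 30-day month, the example downtime figures, the choice to subtract excluded maintenance from downtime only, and the rounding down to full 30-minute steps are all hypothetical and are not taken from any provider's SLA.

```python
def availability(period_minutes: float, downtime_minutes: float,
                 excluded_minutes: float = 0.0) -> float:
    """Availability as a fraction of the period.

    Assumption: excluded downtime (e.g., scheduled maintenance) is simply
    not counted as downtime; real SLAs may define this differently.
    """
    counted = max(downtime_minutes - excluded_minutes, 0.0)
    return 1.0 - counted / period_minutes


def threshold_credit(bill: float, avail: float,
                     threshold: float = 0.9995, rate: float = 0.10) -> float:
    """Flat rule: `rate` of the bill if availability is below `threshold`."""
    return bill * rate if avail < threshold else 0.0


def per_downtime_credit(bill: float, downtime_minutes: float,
                        step_minutes: float = 30.0,
                        rate_per_step: float = 0.05) -> float:
    """Stepped rule: `rate_per_step` of the bill per full step, capped at 100%."""
    steps = int(downtime_minutes // step_minutes)
    return bill * min(steps * rate_per_step, 1.0)


if __name__ == "__main__":
    month = 30 * 24 * 60          # 43,200 minutes in a 30-day billing month
    bill = 1000.0
    downtime, maintenance = 40.0, 30.0

    raw = availability(month, downtime)
    net = availability(month, downtime, excluded_minutes=maintenance)
    print(f"maintenance counted:  {raw:.6f} credit={threshold_credit(bill, raw):.2f}")
    print(f"maintenance excluded: {net:.6f} credit={threshold_credit(bill, net):.2f}")
    print(f"stepped rule credit:  {per_downtime_credit(bill, downtime):.2f}")
```

With these example figures, the same 40 minutes of downtime falls below the 99.95% threshold when maintenance is counted and stays above it when maintenance is excluded, so the exclusion clause alone decides whether any credit is due.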

In summary, the main limitation of current SLAs is their narrow focus on just service availability or request completion rate [20]. By contrast, a cloud SLA effectively covering all key aspects of a cloud service should include guarantees for disaster recovery, privacy, security, and auditability, at least. It should also prescribe that the burden of detecting SLA violations is shared between the customer and the provider and, perhaps most important of all requirements, an SLA should be negotiable so that it can be tailored to the user’s needs and characteristics. Dimension Data [37] produced another survey comparing public cloud SLAs.

8.5.1.2 Cloud Audit and Assessment

The challenge posed by assessing and auditing a cloud system has been investigated by Kaliski and Pauley [38] and Djemame et al. [39]. In Toward Risk Assessment as a Service in Cloud Environments [38], the authors correctly note that security and privacy assessments and audits are standard practices in evaluating the risks and exposures of an in-house system. Oddly, the same is not true for cloud systems, whose core features (multitenancy, on-demand service, and location independence) make external assessments and audits highly impractical and difficult. As already noted in surveys, no cloud provider currently lets users perform independent assessments and audits of its infrastructure. However, as with risk in SLA considerations, the specific novelties that cloud systems introduce for audit and assessment are still to be fully analyzed. In short, how should a cloud-oriented assessment and audit be performed? A first attempt to answer this question is presented by focusing on the core features of a cloud system [38]:

• On-demand self-service: Human interaction is minimized in cloud system operations; therefore, a key control point of traditional audit and assessment processes is missing.

• Broad network access: Data location independence complicates the verification of legal compliance; the attack surface depends on the sheer heterogeneity of accessing devices and end points.

• Resource pooling: The set of resources deployed for a given application is not defined a priori; virtualization introduces correlation between services sharing physical resources; the activities of different tenants may interfere with one another.

• Rapid elasticity: The possibility for a cloud user to scale his/her resource pool out and in introduces a degree of dynamicity that makes audit and assessment more difficult.

• Measured service: The pay-per-usage paradigm typical of cloud computing means that the metering capability is itself one of the most critical resources/processes to audit and assess.

From the previous items, a general consideration about audit and assessment emerges: the complexity of cloud computing is strictly bound to its dynamicity. Traditional audit and assessment procedures are designed for (mostly) static systems or systems whose resources are assigned and configured statically. The different nature of cloud systems introduces remarkable limitations to current audit and assessment procedures and, consequently, increases the risk level.

8.5.1.3 Cloud Security SLA

Growing concerns about the security of cloud services are the reason for the interest among scholars and practitioners in the definition of guidelines for the inclusion of security requirements in SLAs. As already discussed, the standard practice among cloud providers is not to include any security requirements, but it is likely that the pressure in this direction from standardization bodies, government agencies, and analysts will produce a change in the near future. The challenge is to develop guidelines for security monitoring and enforcement that are effective in improving the security level for users, make it possible to assess and manage the risks connected with cloud services, and are tailored to the specific needs of different users.

In Towards a Security SLA–Based Cloud Monitoring Service [21], the authors presented a detailed survey of open-source and commercial cloud monitoring tools that are SLA-based or security-oriented. The situation that emerges is that many tools exist, but none has become a standard adopted by cloud providers and none, probably, has yet reached the maturity and completeness needed to be incorporated in the standard procedures of cloud providers. Aside from conceptual resistance, many technical problems still have to be solved: mapping high-level security properties to low-level monitoring parameters, efficiently managing the complexity introduced by virtualization, and resource elasticity are among the most challenging issues.

ENISA has published one of the most comprehensive guidelines for specifying a security SLA in cloud contracts [40]. The parameters to include in a security SLA covered in the guide and accompanied by a discussion and examples are:

• Service availability

• Incident response

• Service elasticity and load tolerance

• Data life-cycle management

• Technical compliance and vulnerability management

• Change management

• Data isolation

• Log management and forensics

As a general comment, the effort made by ENISA to be proactive in suggesting practical solutions must be noted. That said, the guide rests on some assumptions yet to be accepted in actual SLAs; hence, the document sometimes reads more like a wish list for future, innovative cloud SLAs than a guide for today’s contracts. Two assumptions in particular, already mentioned in this chapter, do not reflect today’s practices: the burden of monitoring and detecting SLA violations must be shared between the cloud customer and the provider, and cloud SLAs must be strongly tailored to customers’ risk profiles. Neither of these, as we have seen, is currently standard in cloud SLAs. For example, from the ENISA guide: “Parameters should be selected according to the use-case […] Parameters should also be selected based on an analysis of an organization’s principal areas of risk and the impact that the IT service will have on these.”

Interestingly, ENISA does not introduce a set of strictly security-dependent parameters, but a more general list of requirements for the effective management of security (e.g., change management and log management are more general than security management alone, but clearly key to its effectiveness and fundamental for risk analysis). Also, the incident response parameter shows an additional characteristic not discussed before but worth mentioning: “Incident response is horizontal to all other parameters since incidents and reporting thresholds are defined in terms of the other parameters included in the SLA. For example, an incident can be raised when availability falls below 99.99% for 90% of users for 1 month, when elasticity tests fail or when a vulnerability of a given severity is detected.” This is perhaps one of the best examples supporting the need for a cloud provider to be held co-responsible for SLA violations, because in this case it is not just a matter of measuring the performance of some operational parameters on a per-user basis, but of having a global measure of how the whole cloud system is performing across all customers. An example of the need for strong customization of a security SLA is given in the discussion of risk for service elasticity and load tolerance. This parameter may also serve the need to absorb DoS or DDoS attacks. However, the economic impact (losses) of DoS/DDoS is notoriously variable, being strictly dependent on the industrial sector and business characteristics of the victim. As the ENISA guide observes: “Services with highly volatile demand will have more stringent requirements for this parameter. Highly static applications (e.g., running a set of low-traffic web servers with no demand variation) may not need to include this requirement, although it may still be required to ensure resilience against DoS/DDoS attacks.” Therefore, the important message to take from the ENISA guide, in addition to the suggested parameters, is that there is no effective security SLA without sharing the burden of detecting SLA violations between the customer and the provider, and without offering the customer the possibility of tailoring the SLA to his/her specific risk profile.
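The incident rule quoted from the ENISA guide can be sketched as a simple check, which also makes the point about co-responsibility visible: the input is a measurement across all users, something a single customer cannot collect alone. The function names, the data layout, the reading of "for 90% of users" as "at least 90% of users", and the severity limit of 7.0 are assumptions made for illustration only; the guide does not prescribe an implementation.

```python
from typing import Mapping


def availability_incident(per_user_availability: Mapping[str, float],
                          availability_threshold: float = 0.9999,
                          user_fraction: float = 0.90) -> bool:
    """True if at least `user_fraction` of users saw availability below the
    threshold over the measurement period (assumed here to be one month)."""
    if not per_user_availability:
        return False
    affected = sum(1 for a in per_user_availability.values()
                   if a < availability_threshold)
    return affected / len(per_user_availability) >= user_fraction


def should_raise_incident(per_user_availability: Mapping[str, float],
                          elasticity_tests_passed: bool,
                          max_vulnerability_severity: float,
                          severity_limit: float = 7.0) -> bool:
    """Combine the three triggers named in the quoted passage.

    `severity_limit` is a hypothetical score standing in for the
    "given severity" that the contract would have to define.
    """
    return (availability_incident(per_user_availability)
            or not elasticity_tests_passed
            or max_vulnerability_severity >= severity_limit)


if __name__ == "__main__":
    # Hypothetical monthly availability figures for ten users.
    monthly = {f"user{i}": 0.9990 for i in range(10)}
    print(should_raise_incident(monthly, elasticity_tests_passed=True,
                                max_vulnerability_severity=2.0))
```

Each trigger is expressed in terms of another SLA parameter, which is what the guide means by incident response being horizontal to the others.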

The CUMULUS European project’s deliverable Security-Aware SLA Specification Language and Cloud Security Dependency Model [41] represents a step forward in the definition of a security SLA, and once again it demonstrates both the difficulty of that goal and the necessity of a coordinated approach. In particular, the focus in CUMULUS is the formal definition of an XML-based language specification for security-oriented SLA parameters, a definition needed if cloud SLAs are to be processed automatically (even partially) and be comparable, features not present in current cloud providers’ SLAs. The CUMULUS work is extremely detailed and comprehensive, and should be considered a reference guide for future security SLA specifications. Even so, it cannot escape the subtleties of security risk analysis. For instance, it cites vulnerability levels as an important parameter to consider, one not even mentioned in today’s SLAs but commonly evaluated by organizations through vulnerability scans, monitoring tools, and penetration tests. This is a reasonable observation, even more general than the specific scope of an SLA. The authors explain the rule of thumb for a risk evaluation based on a vulnerability assessment: vulnerabilities should not just be counted but weighted. This means that assessing the risk posed by vulnerabilities does not end with a technical assessment; it implies an evaluation phase with respect to the vulnerabilities’ features, their impact (direct and indirect) on business and on operations, and a rule for setting priorities. Defining a vulnerability level is both a technical and a managerial task, and it always implies a decision that cannot be just “fix ’em all,” except in a few trivial situations. Over the years, doing that has become too expensive, ridden with side effects, and a waste of scarce resources. Vulnerabilities must be ranked (i.e., weighted), a risk analysis must be performed based on a risk acceptance criterion, and a threshold must be set. Down to a certain position in the ranking, it makes sense to patch/update; below that, it is better to accept the risk and use the resources (money, people, tools) for other tasks. This is the inescapable logic behind the quest for a reliable, effective, and prudent ranking criterion for vulnerabilities. The CUMULUS authors are well aware of this and suggest the standard solution: the CVSS (Common Vulnerability Scoring System) algorithm, which is based on the U.S. NVD (National Vulnerability Database) for unambiguously identifying a vulnerability and setting a risk score both numerical (on a scale from 0 to 10) and qualitative (low/medium/high risk) [42–44]. But here the subtleties of risk analysis applied to vulnerabilities come into play: standards often decay due to obsolescence, and their suitability must be questioned and continuously investigated, otherwise risk analysis too becomes ineffective. This is what is happening to CVSS version 2: its validity as a scoring system has been severely criticized and for many it is no longer considered an effective solution for managing the risk posed by vulnerabilities