Germane to the study of computing is the manipulation, processing, storage, and transmission of data. Identifying the importance of data, common data functions, and data-intensive implementations as they relate to cloud computing is a key underpinning. The data functions can be categorized in two tiers: an underlying operational tier and a higher-level informational tier. The distinction between the two tiers is important because of the functions that the two data types provide, and it becomes clearer when considered in terms of the primary users. Data at the operational tier is more likely to be used by the Cloud Provider, Cloud Auditor, Cloud Broker, and Cloud Carrier. In some cases, the Cloud Consumer may need to use this type of data as well. Operational data functions support the manipulation, extraction, and presentation of meaningful results to end users. For the informational data type, the Cloud Consumer is considered the chief user; however, other actors in the cloud computing environment may use this data as well.
6.1.1 Operational Data Functions
The following is a list of typical data services functions that are associated with data in the cloud.
Analytics Services - Reporting and Business Intelligence Services
Change Control/Tracking - Track User Versions of Files, View/Restore of prior versions
Common Functions - Data Delete, eDiscovery, Data Fusion, Data Visualization, Data filtering/reduction
Data Integrity Services - Data Replication for Disaster Recovery and Business Continuity, Data Recovery objectives (i.e., time and point), Data authenticity, Media Sanitization
Data Maintenance - Backup/Restore, Retention/Hold
Data Portability - File Portability, Archive Portability, Meta Data Portability, Database Portability, Document and Record Portability
Data Security - Identity and Privilege Management, Data sensitivity and protection, User Access/Role Controls, Forensic Analysis tools
Data Storage and Archive - Data Archive and Restore, Application storage, Internet “Drive” secondary storage, “Scale out” storage, Compression, Encryption, Latency, Throughput, Long Term and temporary retention and preservation, Database/Data Warehouse/Business Intelligence, Video Library, Disk-Archive management
Data Translation - Data Locality
Data Transport - Data Presentation – Streaming and feeds, Cloud Data Exchange / Synchronization, Common file sharing (e.g., Wikis etc.), Bulk data transfers, Geographic Placement
File Management - Create/Modify/Delete files, Distribute files
Policy Management - Common standard Management Framework and interface, Quota Management, Archive Policy Management, Exception Management, Data locality policy administration, Policy compliance assessment (FISMA, DoD, etc.), Privacy Policy compliance review, Support for Multiple Data Policies (GAAP, HIPAA, etc.)
Reporting Services - Power Utilization tracking and optimization, Administrative Reporting, Notification requests and management (e.g., notify when a reference document is updated), Power Consumption tracking, Provider SLA reporting, including performance data not accessible to general users, Activity Review, Quota management
Search - File Name and Content Search, Advanced Search (owner, creation date, modification date, accessed by)
Others - Database Operations Services, Published reference files, Forms, Training (student materials, videos, testing), Data interoperability
With this list, the above operational data functions can now be mapped to distinct sections of the RA. Security and privacy for the operational data functions are cross-cutting issues for all of the tabulated items as well:
Service Layer Cloud Service Orchestration Resource Abstraction Physical Resource
SaaS PaaS IaaS
Analytics Services x x x
Change Control/Tracking x x
Common Functions x
Data Integrity Services x x
Data Maintenance x
Data Portability x
Data Security x x x
Data Storage and Archive x x
Data Translation x x
Data Transport x x x x
File Management x x
Policy Management x
Reporting Services (administrative, SLA, data movement, etc.)
Search x x
VM Instance Management x
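To make one of the operational data functions listed in this section more concrete, the Advanced Search function (search by owner, creation date, modification date, accessed by) can be sketched as a filter over file metadata. This is only a minimal illustration under stated assumptions: the `FileRecord` type, the `advanced_search` function, and the sample catalog are hypothetical and are not defined by the Reference Architecture.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Iterable, List, Optional, Set


@dataclass
class FileRecord:
    """Hypothetical metadata record for a file held in cloud storage."""
    name: str
    owner: str
    created: date
    modified: date
    accessed_by: Set[str] = field(default_factory=set)


def advanced_search(
    records: Iterable[FileRecord],
    owner: Optional[str] = None,
    created_after: Optional[date] = None,
    modified_after: Optional[date] = None,
    accessed_by: Optional[str] = None,
) -> List[FileRecord]:
    """Return the records that match every criterion that was supplied."""
    results = []
    for rec in records:
        if owner is not None and rec.owner != owner:
            continue
        if created_after is not None and rec.created <= created_after:
            continue
        if modified_after is not None and rec.modified <= modified_after:
            continue
        if accessed_by is not None and accessed_by not in rec.accessed_by:
            continue
        results.append(rec)
    return results


# Illustrative data only; not taken from any real system.
catalog = [
    FileRecord("budget.xlsx", "alice", date(2011, 1, 10), date(2011, 3, 2), {"alice", "bob"}),
    FileRecord("policy.doc", "bob", date(2011, 2, 5), date(2011, 2, 6), {"bob"}),
    FileRecord("audit.pdf", "alice", date(2011, 4, 1), date(2011, 4, 1), {"alice"}),
]

hits = advanced_search(catalog, owner="alice", accessed_by="bob")
print([rec.name for rec in hits])
```

Criteria that are left as `None` are ignored, so the same function covers both a plain listing and a narrowly scoped query. A production search service would also need to apply the access controls listed under Data Security before returning results.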
6.1.2 Informational Data and Data Services
Besides the operational data functions identified above, informational data and their associated services play important roles in the cloud computing landscape. Data services are not new computing concepts. With cloud computing, however, the aggregation or mash-up of multiple data sources, located in data centers across the globe, into a correlated, purposeful data set needs to be identified in the Cloud Computing Reference Architecture.
Data services can be defined as a set of computing services that expose informational data in a way that adheres to the cloud computing reference architecture, either stand-alone or within a system of systems. There are many prominent examples that, through an Application Programming Interface (API), provide end users with human-readable, meaningful results. These services are useful to end users because their standardized formats and methodologies allow them to work together seamlessly.
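The idea of a data service that correlates several sources and returns the result in a standardized format can be sketched as follows. This is a minimal illustration, assuming two hypothetical in-memory sources keyed by region and JSON as the standardized format; the source names, field names, and values are invented for the example and do not come from the Reference Architecture.

```python
import json
from typing import Dict, List

# Hypothetical informational data held in two separate data centers.
POPULATION_SOURCE: Dict[str, int] = {"north": 1200, "south": 800}
CLINIC_SOURCE: Dict[str, int] = {"north": 3, "south": 4, "east": 1}


def correlate(population: Dict[str, int], clinics: Dict[str, int]) -> List[dict]:
    """Join the two sources on region, keeping only regions present in both."""
    rows = []
    for region in sorted(population.keys() & clinics.keys()):
        rows.append({
            "region": region,
            "population": population[region],
            "clinics": clinics[region],
            "people_per_clinic": population[region] / clinics[region],
        })
    return rows


def data_service_response(rows: List[dict]) -> str:
    """Serialize the correlated data set in a standardized (JSON) format."""
    return json.dumps({"records": rows}, indent=2, sort_keys=True)


dataset = correlate(POPULATION_SOURCE, CLINIC_SOURCE)
print(data_service_response(dataset))
```

Regions that appear in only one source (here, `east`) are dropped rather than guessed at, which keeps the resulting data set strictly correlated. A real service would fetch the sources over the network and expose the response through its API.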
Data services that are derived from informational data can, depending on their usage, be categorized as part of Software as a Service (SaaS) or Platform as a Service (PaaS). In SaaS or PaaS, software applications or standard Web interfaces are needed to leverage the data and their associated metadata and to extract the intended information from disparate data sets. The NIST Cloud Computing Standards Roadmap document defines data functions within the SaaS and PaaS environments.
SaaS
The variety of SaaS applications determines what can be consumed by the SaaS consumer, and the degree of functional standardization varies. SaaS applications are mostly consumed using a Web browser; some are consumed as a Web service by other application clients, such as stand-alone desktop applications and mobile applications.
For example, standard metadata format and APIs are needed to describe and generate eDiscovery metadata for emails, document management systems, financial account systems, etc., that will help government consumers to leverage commercial off-the-shelf (COTS) and government off-the-shelf (GOTS) software products to meet eDiscovery requirements. This is especially important when email messaging systems, content management systems, or Enterprise Resource Planning (ERP) and financial systems are migrated to a SaaS model.
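A common metadata format of this kind might look like the following sketch, in which records from an email system and a document management system are normalized into one structure and exported as JSON. The `EDiscoveryRecord` type, its field names, and the input dictionaries are all hypothetical; no existing eDiscovery standard or product API is implied.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable


@dataclass
class EDiscoveryRecord:
    """Hypothetical common metadata record; field names are illustrative."""
    source_system: str  # e.g. "email" or "document_management"
    item_id: str
    custodian: str
    created_utc: str    # ISO 8601 timestamp
    subject: str


def from_email(msg: dict) -> EDiscoveryRecord:
    """Map an (assumed) email export record onto the common format."""
    return EDiscoveryRecord(
        "email", msg["message_id"], msg["from"], msg["sent"], msg["subject"]
    )


def from_document(doc: dict) -> EDiscoveryRecord:
    """Map an (assumed) document management record onto the common format."""
    return EDiscoveryRecord(
        "document_management", doc["doc_id"], doc["author"], doc["created"], doc["title"]
    )


def export(records: Iterable[EDiscoveryRecord]) -> str:
    """Serialize records from any source system in one uniform JSON layout."""
    return json.dumps([asdict(r) for r in records], indent=2)


# Illustrative inputs only.
email = {"message_id": "m-001", "from": "alice@example.gov",
         "sent": "2011-03-02T14:05:00Z", "subject": "Contract draft"}
document = {"doc_id": "d-417", "author": "bob@example.gov",
            "created": "2011-02-06T09:30:00Z", "title": "Contract v2"}

records = [from_email(email), from_document(document)]
print(export(records))
```

Because every source system is mapped onto the same record type, a downstream eDiscovery tool would only need to understand one layout regardless of which SaaS application produced the data.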
PaaS
PaaS functional interfaces encompass the runtime environment, with supporting libraries and system components, that developers use to develop and deploy SaaS applications. Standards-based APIs are often part of a PaaS offering from the outset, so that the PaaS provider can attract existing development to a cloud-based hosting environment.