

3.4.1 Infrastructure as a Service Storage

Cloud services can provide a low-level storage abstraction (IaaS storage) of four types: block storage, object storage, file storage, and database services. Each type corresponds to a different data abstraction. Block storage services provide storage at the block level, on top of which users can configure file systems, databases, or their own applications. Object storage services store data as named objects grouped into containers. File storage is usually provided through file systems like GFS and HDFS (see Section 3.2), while database services are usually provided by building easy-to-use interfaces on top of existing database technologies (see Section 2.2.2). This section discusses the data model and architecture of Amazon S3 [20]. Other IaaS storage solutions, such as Backblaze’s B2 Cloud Storage [125] and DigitalOcean Spaces [120], provide similar abstractions and functionalities.

3.4.1.1 Amazon S3

Amazon S3 (Simple Storage Service) is Amazon's object storage cloud service, offering high scalability, high reliability, and a low failure rate. The following subsections describe the storage metaphors it uses, how access control is regulated, and its architecture.

Buckets and Objects

Data in Amazon S3 is stored as objects, which are organised in buckets. An object consists of data⁸¹ and metadata, and it is identified by its name. A bucket is a named container that can store a potentially unlimited number of objects and is regulated by a set of admin-defined rules. The name of an object must be unique within a bucket. Amazon S3 is accessible to developers via SOAP, a REST API, or one of the provided AWS SDKs.⁸² Objects are accessible through URL addresses with the following scheme:

http(s)://<bucket name>.s3.amazonaws.com/<object name>

As of today, Amazon S3 stores trillions of objects and handles millions of requests per second [148].
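The bucket and object metaphors described above can be illustrated with a minimal in-memory sketch. The class and function names (`S3Object`, `Bucket`, `object_url`) are hypothetical and are not part of any AWS SDK; the sketch only models name uniqueness within a bucket and the URL scheme quoted above.

```python
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass
class S3Object:
    """Toy object: raw data plus key-value metadata, identified by its name."""
    name: str
    data: bytes
    metadata: dict = field(default_factory=dict)


@dataclass
class Bucket:
    """Toy bucket: a named container in which object names are unique."""
    name: str
    objects: dict = field(default_factory=dict)

    def put(self, obj: S3Object) -> None:
        # Storing under an existing name replaces the previous object,
        # which keeps names unique within the bucket.
        self.objects[obj.name] = obj

    def get(self, name: str) -> S3Object:
        return self.objects[name]


def object_url(bucket_name: str, object_name: str, secure: bool = True) -> str:
    """Build a URL following the scheme quoted in the text."""
    scheme = "https" if secure else "http"
    # Percent-encode the object name; "/" is left as-is by default.
    return f"{scheme}://{bucket_name}.s3.amazonaws.com/{quote(object_name)}"
```

For instance, `object_url("photos", "photo.gif")` yields `https://photos.s3.amazonaws.com/photo.gif`, where `photos` is the example bucket name used in Figure 3.13.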

Versioning

Object versioning is optional and must be enabled at the bucket level. Each version is assigned a version id, which changes whenever the data is updated, while the name of the object remains unchanged. Amazon S3 supports only linear versioning (i.e., no branching), and deleting an object creates a special version, the delete marker (see Figure 3.13). A version is accessed by specifying the bucket name, the object name, and the version id:

http(s)://<bucket name>.s3.amazonaws.com/<object name>?versionId=<version id>
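The linear versioning behaviour can be sketched with a toy model. Everything here (`VersionedBucket`, the `v1`, `v2`, ... id format, and the use of `KeyError` as a stand-in for an HTTP 404 response) is an assumption made for illustration, not S3's actual implementation.

```python
import itertools
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode


@dataclass
class Version:
    version_id: str
    data: Optional[bytes]  # None means this version is a delete marker


class VersionedBucket:
    """Toy bucket with linear (non-branching) version histories."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._history = {}             # object name -> versions, oldest first
        self._ids = itertools.count(1)

    def _append(self, object_name: str, data: Optional[bytes]) -> str:
        version = Version(f"v{next(self._ids)}", data)
        self._history.setdefault(object_name, []).append(version)
        return version.version_id

    def put(self, object_name: str, data: bytes) -> str:
        return self._append(object_name, data)

    def delete(self, object_name: str) -> str:
        # Deletion does not erase anything: it appends a delete marker.
        return self._append(object_name, None)

    def get(self, object_name: str, version_id: Optional[str] = None) -> bytes:
        versions = self._history.get(object_name, [])
        if version_id is None:
            matches = versions[-1:]    # latest version, if any
        else:
            matches = [v for v in versions if v.version_id == version_id]
        if not matches or matches[0].data is None:
            raise KeyError(object_name)  # stand-in for HTTP 404
        return matches[0].data


def version_url(bucket_name: str, object_name: str, version_id: str) -> str:
    """Build a versioned URL following the scheme quoted in the text."""
    query = urlencode({"versionId": version_id})
    return f"https://{bucket_name}.s3.amazonaws.com/{object_name}?{query}"
```

In this sketch, a `get` without a version id fails once a delete marker is the latest version, while earlier versions remain retrievable by their ids.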

Metadata and Tags

An object’s metadata is a set of key-value pairs (attributes). Default metadata attributes include the data length in bytes, the date of last modification, and the content type of the data. Custom metadata attributes can also be set on an object. When versioning is enabled, metadata is attached to a particular version of the object, so a change in the metadata results in a new version of the object.

In addition, an object can be associated with one or more mutable tags (up to ten), which are key-value pairs that can be used to group together objects stored within the same bucket. While metadata attributes are used to describe objects, tags can be used to enforce control over groups of objects, as described below.
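The distinction between descriptive metadata and grouping tags can be sketched as follows. `TaggedObject`, `select_by_tag`, and the metadata key names are hypothetical; only the limit of ten tags comes from the text.

```python
from datetime import datetime, timezone

MAX_TAGS = 10  # tag limit stated in the text


class TaggedObject:
    """Toy object carrying descriptive metadata and mutable tags."""

    def __init__(self, name, data, content_type, custom_metadata=None):
        self.name = name
        self.data = data
        # Default attributes mirror those listed in the text;
        # the key names themselves are illustrative.
        self.metadata = {
            "content-length": len(data),
            "last-modified": datetime.now(timezone.utc),
            "content-type": content_type,
        }
        self.metadata.update(custom_metadata or {})
        self.tags = {}

    def set_tag(self, key, value):
        # Updating an existing tag is always allowed; adding a new
        # one is refused once the limit is reached.
        if key not in self.tags and len(self.tags) >= MAX_TAGS:
            raise ValueError(f"at most {MAX_TAGS} tags per object")
        self.tags[key] = value


def select_by_tag(objects, key, value):
    """Return the names of the objects whose tag `key` equals `value`."""
    return [obj.name for obj in objects if obj.tags.get(key) == value]
```

Selecting by tag in this way is what allows a policy to target a group of objects within a bucket without relying on their names.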

⁸² Amazon S3 integrates the BitTorrent protocol too, but only to retrieve data from S3 to a BitTorrent network.

3.4. Cloud Storage

[Diagram: bucket "photos". Two PUT requests for photo.gif create versions vID = 111111 and vID = 121212; a DELETE request adds a delete marker (vID = 4857693). The GET issued after each step returns version 111111, then version 121212, then HTTP 404.]

Figure 3.13: Amazon S3 versioning diagram, derived from [149]. The vID is the version id of an object’s version.

Access Control and Data Protection

Amazon S3 enforces access control both at the resource (i.e., objects and buckets) level, via bucket policies and ACLs, and at the user level, through the AWS IAM (Identity and Access Management) service [150, 151].

Furthermore, Amazon S3 protects data both in transit and at rest via encryption [152]. Data encryption is enforced at a per-object granularity using either a client master-key or an Amazon-generated master-key.
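As an illustration of resource-level access control, the sketch below builds a bucket policy document that grants public read access to objects under a given prefix. The bucket name, prefix, and `Sid` value are hypothetical. The JSON layout follows the AWS policy language as commonly documented and should be checked against the current AWS documentation before use.

```python
import json


def public_read_policy(bucket_name: str, prefix: str = "") -> str:
    """Return a JSON bucket policy allowing anyone to read objects
    whose names start with `prefix` (illustrative sketch only)."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadForPrefix",  # hypothetical statement id
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/{prefix}*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(public_read_policy("photos", "public/"))
```

A policy of this kind acts on the resource itself, whereas user-level permissions would instead be attached to identities managed through IAM.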

Policies and Notifications

Amazon S3 enables users to define policies for the buckets they administer. A policy can apply to specific objects, or to sets of objects that match given tags or whose names start with a given prefix. The following are examples of policies:

• Define the life cycle of an object, so that after an interval of time from its creation the object is automatically migrated to cheaper and slower-access storage services (Amazon S3 Standard IA⁸³ and Amazon Glacier [153]).

• Define a policy such that all new objects are replicated to another bucket.

• Define a policy such that all objects matching a given tag are replicated to a set of selected buckets.
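The way a life-cycle policy selects objects and picks a storage tier can be sketched with a toy rule. `LifecycleRule`, the storage class labels, and the 30-day and 90-day thresholds in the usage note are illustrative assumptions, not values taken from the text or from AWS.

```python
from dataclasses import dataclass, field


@dataclass
class LifecycleRule:
    """Toy life-cycle rule: a prefix/tag filter plus age-based transitions."""
    prefix: str = ""
    tags: dict = field(default_factory=dict)
    # Each transition is (minimum age in days, target storage class).
    transitions: list = field(default_factory=list)

    def matches(self, object_name: str, object_tags: dict) -> bool:
        # The rule applies when the name starts with the prefix and
        # every tag required by the rule is present on the object.
        has_prefix = object_name.startswith(self.prefix)
        has_tags = all(object_tags.get(k) == v for k, v in self.tags.items())
        return has_prefix and has_tags

    def storage_class(self, age_days: int, default: str = "STANDARD") -> str:
        # Walk transitions by increasing age; the last threshold
        # already reached determines the storage class.
        current = default
        for min_age, target in sorted(self.transitions):
            if age_days >= min_age:
                current = target
        return current
```

With `LifecycleRule(prefix="logs/", transitions=[(30, "STANDARD_IA"), (90, "GLACIER")])`, an object named `logs/a.txt` would stay in the default class for its first 30 days, move to the infrequent-access class afterwards, and reach the archival class after 90 days.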

Furthermore, it is possible to enable notifications on a bucket, so that whenever an event occurs (e.g., an object is added, an object is deleted, etc.) other AWS related services can be notified and act upon it.

Architecture and Performance Enhancements

Amazon S3 is implemented as a distributed storage system with per-bucket replication, and its storage is distributed across multiple data centres around the world. The internal details of its architecture, however, are not publicly known, since Amazon does not disclose them.

Amazon S3 also provides a set of mechanisms to optimise the performance of an application using it. Some of these mechanisms are the following:

• Multipart Uploads. Objects can be divided into chunks that are uploaded in parallel and re-assembled once they are all stored on Amazon S3.

• Range-based downloads. Objects can be downloaded in smaller chunks, enabling parallel downloads.

• Cross Region Replication. Objects can be replicated across multiple continents so that latency is reduced on object retrieval.
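The chunking that underlies multipart uploads and range-based downloads can be sketched as follows. The helper names are hypothetical and no network requests are made; the sketch only shows how an object is split into inclusive byte ranges, how each range maps to a standard HTTP `Range` header, and how parts are re-assembled by part number.

```python
def byte_ranges(total_size: int, chunk_size: int) -> list:
    """Split [0, total_size) into inclusive (start, end) byte ranges."""
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size > 0")
    return [
        (start, min(start + chunk_size, total_size) - 1)
        for start in range(0, total_size, chunk_size)
    ]


def range_header(start: int, end: int) -> dict:
    """HTTP Range header for one chunk (both bounds are inclusive)."""
    return {"Range": f"bytes={start}-{end}"}


def split_and_reassemble(data: bytes, chunk_size: int) -> bytes:
    """Cut data into numbered parts, then join them by part number.

    In a real transfer the parts could complete in any order, which is
    why they are keyed by part number rather than by arrival order.
    """
    ranges = byte_ranges(len(data), chunk_size)
    parts = {
        number: data[start:end + 1]
        for number, (start, end) in enumerate(ranges, start=1)
    }
    return b"".join(parts[number] for number in sorted(parts))
```

Each range can then be fetched or uploaded by a separate worker, which is what allows the transfers to proceed in parallel.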