

5.1.5 The Metadata Model

The data model enables users to store and access data (i.e., via atoms) irrespective of its location, aggregate content together (i.e., via compounds), and version content over time (i.e., via assets). The metadata model extends the data model by allowing the content referenced by a version to be associated with arbitrary metadata. Section 2.1.3 defined metadata as “data about data” and explained why metadata is important in data storage systems. Chapter 3 explored how state-of-the-art storage systems manage metadata.

In the SOS, metadata is represented through the metadata manifest and is extrinsic, since it is not stored within the data itself. The metadata manifest consists of a list of properties (see Manifest 5.6). A property is defined as the triplet (key, type, value). The property key can be any arbitrary string, the type defines the nature of the property, and the value is the actual state of the property associated with the given key and of the specified type. The prototype presented in Chapter 6 supports the types listed in Table 5.1. However, additional property types can be added, such as arrays, objects, or a timestamp type. In fact, the type attribute allows a property to be extended beyond the standard JSON capabilities. Properties of type string, long, double, and boolean are literal values, while GUID properties enable metadata to link content to other entities of the SOS, such as the version of another asset or a role (see Section 5.1.6 below).

Type       Examples
String     “text/html”, “red”
Long       -1, 0, 1, 255
Double     -3.14, 2.71828
Boolean    true, false
GUID       SHA256 16 5a63f...

Table 5.1: Metadata types supported by the current prototype of the SOS.
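To make the (key, type, value) triplet concrete, the following minimal sketch validates a property against the five types of Table 5.1. The lowercase type tags, the `Property` and `matches` names, and the 64-bit range for `long` are illustrative assumptions, not the prototype's actual API.

```python
from dataclasses import dataclass
from typing import Union

# Type tags mirroring Table 5.1; the lowercase tag names are an assumption.
SUPPORTED_TYPES = {"string", "long", "double", "boolean", "guid"}

Value = Union[str, int, float, bool]


def matches(type_tag: str, value: Value) -> bool:
    """Return True if value is a valid literal for the given type tag."""
    # bool is a subclass of int in Python, so booleans are checked explicitly.
    if type_tag == "boolean":
        return isinstance(value, bool)
    if type_tag == "long":
        return (isinstance(value, int) and not isinstance(value, bool)
                and -2**63 <= value < 2**63)  # assumed 64-bit signed range
    if type_tag == "double":
        return isinstance(value, float)
    # "string" and "guid" are both carried as text; a GUID is assumed to be
    # the textual form of a hash.
    return isinstance(value, str)


@dataclass(frozen=True)
class Property:
    """A (key, type, value) triplet, as stored in a metadata manifest."""
    key: str
    type: str
    value: Value

    def __post_init__(self) -> None:
        if self.type not in SUPPORTED_TYPES:
            raise ValueError(f"unsupported property type: {self.type}")
        if not matches(self.type, self.value):
            raise TypeError(f"{self.value!r} is not a valid {self.type}")
```

For example, `Property("size", "long", 855819)` is accepted, while `Property("size", "long", True)` raises `TypeError`. Supporting a new type such as a timestamp would, in this sketch, only require a new tag and a new branch in `matches`.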

{
    "type" : "Metadata",
    "guid" : <hash(properties)>,
    "properties" : [
        {
            "key" : <property name>,
            "type" : "string", "long", "double", "boolean", or "guid",
            "value" : <property value>
        }
    ]
}

Manifest 5.6: Metadata manifest in JSON format.
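The manifest's GUID is defined as a hash of its properties. The sketch below shows one way to make that hash deterministic; the choice of SHA-256, the compact JSON encoding, and ordering properties by key are all assumptions made for illustration, since the exact serialization used by the prototype is not specified here.

```python
import hashlib
import json


def metadata_manifest(properties: list) -> dict:
    """Build a metadata manifest whose GUID is a hash of its properties."""
    # Assumption: properties are ordered by key so that the same set of
    # properties always yields the same GUID, whatever the input order.
    ordered = sorted(properties, key=lambda p: p["key"])
    # Assumption: a compact JSON encoding with sorted object keys is used
    # as the canonical byte representation, hashed with SHA-256.
    canonical = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    guid = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {"type": "Metadata", "guid": guid, "properties": ordered}
```

Because the GUID depends only on the content of the properties, two nodes that extract identical metadata for the same data would, under these assumptions, produce the same metadata manifest GUID.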

Metadata is linked to the relevant content through the version manifest, which references it through an additional, optional metadata field (see Manifest 5.7, and Figure 5.14 for an illustration of the relationship). Because the relationship is stored in the version manifest rather than in the metadata manifest, a single metadata manifest can be referenced by multiple versions, so metadata is de-duplicated across them. The metadata field, if present, is used to generate the GUID of the version:

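$$\mathit{GUID} = \mathrm{hash}\big(\texttt{Version} + \texttt{I} + \langle \mathit{invariant} \rangle + \texttt{C} + \langle \mathit{content} \rangle + \texttt{P} + [\langle \mathit{previous} \rangle + \ldots] + \texttt{M} + [\langle \mathit{metadata} \rangle + \ldots]\big)$$

{
    "type" : "Version",
    "guid" : <hash(invariant, content, previous, metadata)>,
    "invariant" : <invariant GUID>,
    "content" : <content GUID>,
    "previous" : [ <previous GUID> ],
    "metadata" : <metadata GUID>
}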

Manifest 5.7: Version manifest linking content and metadata.
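The version GUID formula concatenates tagged fields before hashing. The following sketch follows that structure under explicit assumptions: SHA-256 as the hash, plain string concatenation, and the `P` and `M` tags emitted only when the corresponding field is present. The `version_guid` helper is hypothetical.

```python
import hashlib
from typing import Optional, Sequence


def version_guid(invariant: str, content: str,
                 previous: Sequence[str] = (),
                 metadata: Optional[str] = None) -> str:
    """Hash the tagged fields of a version manifest into its GUID."""
    parts = ["Version", "I", invariant, "C", content]
    # Assumption: tags for optional fields are only added when present.
    if previous:
        parts += ["P", *previous]
    if metadata is not None:
        parts += ["M", metadata]
    return hashlib.sha256("".join(parts).encode("utf-8")).hexdigest()


# Two versions of asset 10295 can reference the same metadata manifest
# (33745) while still having distinct GUIDs.
first = version_guid("10295", "f1544", metadata="33745")
second = version_guid("10295", "f1544", previous=[first], metadata="33745")
```

Since the metadata GUID is part of the hashed input, attaching, removing, or changing the metadata of a version produces a new version GUID, while the metadata manifest itself is stored once and shared.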

The list of metadata properties is generated by a metadata engine that processes the content of the version (more on this in Sections 5.2.2.4 and 6.1). Figure 5.15 shows an example of a PNG image of a fish, versioned by the asset 10295 and described by the metadata manifest identified by the GUID 33745. The metadata describes the fish atom in terms of its type (PNG), its size (855819 bytes), and its most prevalent colour (orange).

[Figure: a version manifest (Type: Version; Invariant; GUID; Content; metadata) whose metadata field references a metadata manifest (Type: Metadata; GUID; Properties: a list of { Key, Type, Value } entries).]

Figure 5.14: Diagram showing the relationship between a version manifest and the associated metadata manifest.
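As a rough illustration of what a metadata engine does, the sketch below derives two of the properties shown in Figure 5.15 (content type and size) from raw bytes. The `extract_metadata` and `sniff_content_type` names are hypothetical, content-type detection by file signature is a stand-in technique chosen for illustration, and colour extraction is omitted because it would require decoding the image.

```python
# The 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def sniff_content_type(data: bytes) -> str:
    """Guess a MIME type from well-known leading bytes (a tiny subset)."""
    if data.startswith(PNG_SIGNATURE):
        return "image/png"
    if data.startswith(b"%PDF-"):
        return "application/pdf"
    return "application/octet-stream"


def extract_metadata(data: bytes) -> list:
    """Return (key, type, value) properties describing the given content."""
    return [
        {"key": "content-type", "type": "string",
         "value": sniff_content_type(data)},
        {"key": "size", "type": "long", "value": len(data)},
    ]
```

A real engine would likely be pluggable, with one extractor per content type, so that richer properties such as the dominant colour of an image can be added without changing the manifest format.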

A node supporting the SOS model can provide search over the content of the SOS by building two data structures from the stored metadata: an inverted index over the metadata manifests, mapping property values to the manifests that contain them, and a reverse map from each metadata manifest to the versions that reference it. With these two data structures, a search for content whose colour is ‘orange’ finds the metadata manifest 33745 through the inverted index and then the version 54320 through the reverse map. These data structures are local to each node, but content remains searchable across the SOS as long as nodes expose this functionality through an API.
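The two-step lookup described above can be sketched with two dictionaries. The `MetadataSearch` class is hypothetical, and matching string values case-insensitively (so that ‘orange’ finds the stored value “Orange”) is an assumption; the example data is taken from Figure 5.15.

```python
from collections import defaultdict


def normalise(value):
    # Assumption: string values are matched case-insensitively.
    return value.casefold() if isinstance(value, str) else value


class MetadataSearch:
    """Node-local inverted index plus version-metadata reverse map."""

    def __init__(self) -> None:
        # (property key, normalised value) -> GUIDs of metadata manifests
        self.inverted = defaultdict(set)
        # metadata GUID -> GUIDs of the versions that reference it
        self.reverse = defaultdict(set)

    def add_metadata(self, manifest: dict) -> None:
        for prop in manifest["properties"]:
            key = (prop["key"], normalise(prop["value"]))
            self.inverted[key].add(manifest["guid"])

    def add_version(self, manifest: dict) -> None:
        metadata_guid = manifest.get("metadata")
        if metadata_guid is not None:  # the metadata field is optional
            self.reverse[metadata_guid].add(manifest["guid"])

    def search(self, key: str, value) -> set:
        versions = set()
        # Step 1: property -> metadata manifests; step 2: metadata -> versions.
        for metadata_guid in self.inverted.get((key, normalise(value)), set()):
            versions |= self.reverse.get(metadata_guid, set())
        return versions


index = MetadataSearch()
index.add_metadata({"type": "Metadata", "guid": "33745", "properties": [
    {"key": "content-type", "type": "string", "value": "image/png"},
    {"key": "size", "type": "long", "value": 855819},
    {"key": "color", "type": "string", "value": "Orange"},
]})
index.add_version({"type": "Version", "guid": "54320", "invariant": "10295",
                   "content": "f1544", "metadata": "33745"})
```

Here `index.search("color", "orange")` returns `{"54320"}`. Both maps must be updated whenever a metadata or version manifest is stored, which is the maintenance cost the next paragraphs refer to.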

[Figure: version manifest (Invariant: 10295; GUID: 54320; content: f1544; metadata: 33745) linked to metadata manifest 33745 with properties content-type = “image/png” (String), size = 855819 (Long), and color = “Orange” (String), and to atom f1544 with its list of locations.]

Figure 5.15: Example of metadata being linked to an atom through the version manifest.

This design of the metadata model has its advantages as well as its disadvantages. For instance, because the metadata is stored in a manifest separate from the version manifest, the content of a version can be described and handled without having the actual content. Moreover, the metadata can be replicated independently of the data it describes. On the other hand, retrieving data from its metadata comes at a cost, since a node has to maintain the additional index data structures.

Alternative Metadata Design

An alternative design for modelling the metadata consists of storing each property in its own manifest and using a compound-like manifest to group all such properties together. The advantages of this design are better de-duplication of the metadata and the ability to perform metadata search without an inverted index. On the other hand, this design would add another level of indirection when retrieving the metadata for a given version. Such an alternative design could be explored in future work.
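The alternative design can be sketched as follows, assuming each property manifest is identified by a SHA-256 hash of its compact JSON triplet and each group by a hash of its sorted property references. The `PropertyStore` class and both hashing choices are hypothetical. The sketch shows the de-duplication directly; for search, the property's GUID is recomputed from the (key, type, value) triplet rather than looked up in an index, while finding the groups that reference it is done here with a linear scan purely for illustration.

```python
import hashlib
import json


def property_guid(key: str, type_: str, value) -> str:
    """GUID of a single-property manifest (assumed: SHA-256 of compact JSON)."""
    encoded = json.dumps([key, type_, value], separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


class PropertyStore:
    """Hypothetical store for the one-manifest-per-property design."""

    def __init__(self) -> None:
        self.properties = {}  # property GUID -> (key, type, value)
        self.groups = {}      # group GUID -> sorted list of property GUIDs

    def add_group(self, triplets) -> str:
        refs = []
        for key, type_, value in triplets:
            guid = property_guid(key, type_, value)
            # Identical properties share one manifest: this is the de-duplication.
            self.properties.setdefault(guid, (key, type_, value))
            refs.append(guid)
        refs.sort()
        # The compound-like group is identified by the hash of its references.
        group_guid = hashlib.sha256("".join(refs).encode("utf-8")).hexdigest()
        self.groups[group_guid] = refs
        return group_guid

    def groups_with(self, key: str, type_: str, value) -> list:
        # The searched property's GUID is recomputed directly, with no index.
        target = property_guid(key, type_, value)
        if target not in self.properties:
            return []
        # Finding the referencing groups is a linear scan, for illustration.
        return [g for g, refs in self.groups.items() if target in refs]
```

Two images sharing the property content-type = “image/png” would store that property once. Retrieving the full metadata of a version, however, now requires resolving the group and then each property manifest, which is the extra level of indirection mentioned above.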