
2.4 NoSQL Database Categories

2.4.1 Key-Value Stores

Key-value stores are the simplest NoSQL data stores and can be considered the mother of all NoSQL databases. Although the individual products differ, they have much in common: a HashMap¹¹ is the simplest data structure that can hold a set of key-value pairs, and all of these systems store data as maps [Tiw11]. They were inspired by Amazon's Dynamo storage model. The APIs that databases in this category offer provide "match query" operations. A match query extracts the value associated with a given key.

• Get(key) - extracts the value given a key

• Put(key, value) - creates or updates the value given its key

• Delete(key) - removes the key and its associated value
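The three operations above can be sketched with an ordinary in-memory map. This is only an illustration of the interface: the class name KeyValueStore and the key format "session:42" are assumptions made for this example and do not correspond to the client API of any particular product.

```python
from typing import Dict, Optional


class KeyValueStore:
    """In-memory stand-in for a key-value store (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        # Match query: return the value stored under exactly this key, or None.
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        # Create the entry if the key is new, otherwise overwrite the old value.
        self._data[key] = value

    def delete(self, key: str) -> None:
        # Remove the key and its value; deleting a missing key is a no-op.
        self._data.pop(key, None)


store = KeyValueStore()
store.put("session:42", b"alice")
store.put("session:42", b"bob")      # same key, so the value is updated
current = store.get("session:42")    # value after the update
store.delete("session:42")
missing = store.get("session:42")    # None once the key is deleted
```

Because every operation addresses exactly one key, no operation needs to inspect other entries, which is one reason such an interface lends itself to partitioning across machines.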

They are easy to use from an API perspective: the client can get the value for a key, put a value for a key, or delete a key from the data store. This simplicity is why they are easy to scale and generally perform very well. Aggregates can all be stored in a single bucket, which is a namespace for keys, but storing many different kinds of objects in one bucket increases the chance of key conflicts. An alternative approach is to append the name of the object to the key, for example 288790b8a421_userProfile, so that each object can be accessed as needed. The value can be a blob, text, JSON, XML, and so on. For most (though not all) data stores the value is opaque: it is the application that must understand what is stored. These query characteristics make key-value stores likely candidates for storing session data (with the session ID as the key), shopping cart data, and user profiles [SF12].
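A minimal sketch of the key-naming approach and of opaque values, assuming a single bucket modelled as a Python dictionary. The helper names make_key, put_json, and get_json, the object name shoppingCart, and the choice of JSON as the encoding are assumptions made for this example only.

```python
import json
from typing import Dict

# One bucket, i.e. one namespace for keys. The store only ever sees bytes.
bucket: Dict[str, bytes] = {}


def make_key(object_id: str, object_name: str) -> str:
    # Append the object name to the ID, as in 288790b8a421_userProfile.
    return f"{object_id}_{object_name}"


def put_json(key: str, obj: dict) -> None:
    # The application, not the store, decides that the value is JSON.
    bucket[key] = json.dumps(obj).encode("utf-8")


def get_json(key: str) -> dict:
    # The application decodes the opaque bytes back into an object.
    return json.loads(bucket[key].decode("utf-8"))


session_id = "288790b8a421"
put_json(make_key(session_id, "userProfile"), {"name": "Alice"})
put_json(make_key(session_id, "shoppingCart"), {"items": ["book"]})

profile_key = make_key(session_id, "userProfile")
profile = get_json(profile_key)
```

Both objects share the same session ID but live under different keys, so they do not overwrite each other even though they sit in one bucket.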

Some key-value stores, such as Riak, get around the "opaque" nature of the value by providing the ability to search inside it. Riak [Ria] - a Riak cluster is masterless, automatically redistributes data when you scale, and keeps data available when physical machines fail. Motivated by Amazon-like use cases, it stores data as key-value pairs, has a simple operational model, and comes with an HTTP API and many client libraries. Any data can be stored in any desired format, as all objects are stored on disk as binaries. It also provides Riak Search, a distributed full-text search engine that lets you query the data much as you would query Apache Lucene or Solr indexes.

Solr¹² is a popular, fast, open source enterprise search platform from the Apache Lucene project. Lucene is a simple search library that can be easily integrated into an application. Its core facility manages indexes: documents are parsed, indexed, and stored in a storage scheme, which could be a filesystem, memory, or any other store.

Redis - is an open source, BSD-licensed, advanced key-value store. It is often referred to as a data structure server, since keys can contain strings, hashes, lists, sets, and sorted sets. It is an in-memory system, so durability is optional [Red].

Memcached¹³ - is a distributed memory object caching system which demonstrated that in-memory indexes can be highly scalable, distributing and replicating objects over multiple nodes. It dedicates blocks of memory on multiple servers to cache data from your data store. It is free and open source, high-performance, and generic in nature, but intended for speeding up dynamic Web applications by alleviating database load. It is used by Facebook, Twitter, Wikipedia, YouTube, and many others. It is an in-memory key-value store for small chunks of arbitrary data (strings, objects) resulting from database calls, API calls, or page rendering.
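The way such a cache relieves the database can be sketched as follows. This is not the memcached client API: each dictionary stands in for the memory of one cache server, and the functions node_for, slow_database_lookup, and cached_lookup are hypothetical. Picking a node by hashing the key modulo the number of nodes is a simplifying assumption; real clients may use other schemes such as consistent hashing.

```python
import hashlib
from typing import Dict, List

# Each dict stands in for the memory of one cache server (assumed setup).
cache_nodes: List[Dict[str, str]] = [{}, {}, {}]
database_calls = 0


def node_for(key: str) -> Dict[str, str]:
    # Hash the key on the client side to pick one node deterministically.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return cache_nodes[int(digest, 16) % len(cache_nodes)]


def slow_database_lookup(key: str) -> str:
    # Placeholder for an expensive database call.
    global database_calls
    database_calls += 1
    return f"row-for-{key}"


def cached_lookup(key: str) -> str:
    node = node_for(key)
    if key in node:
        # Cache hit: the database is not touched.
        return node[key]
    # Cache miss: ask the database once, then remember the result.
    value = slow_database_lookup(key)
    node[key] = value
    return value


first = cached_lookup("user:1")    # miss, goes to the database
second = cached_lookup("user:1")   # hit, served from memory
```

Only the first lookup reaches the database; the repeated request is answered from the node that the key hashes to.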

MemcacheDB - is a persistent variant of memcached: a distributed key-value storage system that is API-compatible with memcached. MemcacheDB uses BerkeleyDB as its storage backend, so many features, including transactions and replication, are supported [Mem]. BerkeleyDB¹⁴ is a high-performance embeddable database providing SQL, Java Object, and key-value storage. It offers advanced features including transactional data storage, highly concurrent access, replication for high availability, and fault tolerance in a self-contained, small-footprint software library. It was acquired by Oracle, the well-established relational database vendor.

Oracle has also developed its own NoSQL solution, Oracle NoSQL. It is also a distributed key-value database. It offers high availability, rapid fail-over in the event of a node failure, and optimal load balancing of queries. At the core of Oracle NoSQL is BerkeleyDB.

¹² Apache Solr: http://lucene.apache.org/solr/
¹³ Memcached: http://memcached.org/

Oracle Coherence¹⁵ - is an in-memory cache layer on top of a database that takes a key-value approach.

The limited support for key-range processing in plain key-value stores is overcome by the ordered key-value model, which significantly improves aggregation capabilities. Some of the systems mentioned above (BerkeleyDB, MemcacheDB) are ordered key-value stores, which support "range query" searches because they keep the keys sorted in ascending order. Preserving an order while storing keys makes it possible to run a range query efficiently, extracting the values associated with a contiguous range of keys [Sho].
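To illustrate why keeping keys sorted makes range queries cheap, the following sketch maintains a sorted key list and locates the range boundaries by binary search. The class OrderedKeyValueStore and the key format are assumptions made for this example; it does not reflect how BerkeleyDB or MemcacheDB are implemented internally.

```python
import bisect
from typing import Dict, List, Tuple


class OrderedKeyValueStore:
    """Toy ordered key-value store (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._keys: List[str] = []          # always kept in ascending order
        self._values: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        if key not in self._values:
            # Insert the new key at its sorted position.
            bisect.insort(self._keys, key)
        self._values[key] = value

    def range_query(self, low: str, high: str) -> List[Tuple[str, str]]:
        # Return all pairs with low <= key < high.
        # Binary search finds both boundaries without scanning every key.
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_left(self._keys, high)
        return [(k, self._values[k]) for k in self._keys[start:end]]


store = OrderedKeyValueStore()
for key, value in [
    ("user:003", "carol"),
    ("user:001", "alice"),
    ("order:17", "book"),
    ("user:002", "bob"),
]:
    store.put(key, value)

users = store.range_query("user:001", "user:003")
```

With an unordered map, the same query would have to examine every key; here only the matching slice of the sorted list is read.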

Almost all of the other categories of NoSQL systems were built, whether physically or conceptually, upon key-value store principles. You should therefore expect their applications to be more specialized than, but not completely distinct from, those of key-value stores themselves [BBBI]. The main key-value stores offered as a service by the big cloud vendors are Google AppEngine Datastore¹⁶, Amazon SimpleDB, Amazon DynamoDB, and Microsoft Azure Table Storage, which we will introduce in more detail later on.