Discussion and Lessons Learned - Extending a methodology for migration of the database layer to

At the heart of NoSQL data modeling is the principle of denormalization. Data modeling, and consequently schema design, is driven by application-specific query patterns, so there is no automatic way of moving from an RDBMS to NoSQL when the relational schema is complex and the application's query patterns are unknown. This is an important finding of our work and the reason why a plug-in mechanism for mapping a concrete relational database to a NoSQL store is not feasible.

NoSQL databases do not ensure data integrity: they lack referential integrity constraints and cascading updates and deletes, the proven mechanisms by which an RDBMS preserves the integrity of data. This makes NoSQL stores unsuitable for applications with strict data integrity requirements. It should be noted, however, that the MyISAM engine of MySQL likewise does not ensure data integrity or provide transaction support. In a traditional RDBMS, the data can be retrieved with any query tool. NoSQL databases also have query tools, but there the application is the owner of the data and serves it through services.

When considering the adoption of NoSQL databases, there are many barriers that might prevent companies from doing so, and they have to be weighed carefully against the use case. One of them is security when running a NoSQL store in the Cloud. Most NoSQL stores were not designed with security in mind, and exposing their instances in the Cloud might be too risky. Although some stores offer more security features than others, these features are not yet mature. Community and commercial support is also a factor.

In this final chapter we summarize the results and contributions of our work and provide recommendations for future work.

7.1 Recommendations

Many successful NoSQL adoptions are examples of polyglot persistence. You should not think of NoSQL databases as a magical hammer for all problems. They are meant for specific use cases and make trade-offs to achieve their objectives. Design your data model such that operations are idempotent. In an eventually consistent and fully distributed system, idempotent operations help considerably: they tolerate partial failures, because an operation can be retried safely without changing the final state of the system. This allows you to work with eventual consistency without causing data duplication or other anomalies [Pat]. To achieve good performance, building a caching layer on top of the NoSQL back-end store, in case it does not have one built in, has proven to be a successful approach and is applied by social Web applications, especially gaming sites. Using the Oracle NoSQL key-value store for a gaming application and an RDBMS for ad-hoc analytics is another example of polyglot persistence [Oraa].
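The idempotency recommendation above can be illustrated with a minimal sketch. The `KeyValueStore` class is a hypothetical in-memory stand-in, not the API of any real NoSQL client, and the key scheme `purchase:<order_id>` is an assumption made for illustration. The point is that the write is keyed by a client-supplied identifier, so a retried request overwrites the same record instead of appending a duplicate.

```python
class KeyValueStore:
    """Toy in-memory stand-in for a key-value store (hypothetical, not a real client)."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        # Writing the same value under the same key twice leaves the state unchanged.
        self.data[key] = value


def record_purchase(store, order_id, user_id, amount):
    """Idempotent write: the key is derived from the order id, so a retry
    overwrites the same record instead of creating a second one."""
    store.put(f"purchase:{order_id}", {"user": user_id, "amount": amount})


def total_spent(store, user_id):
    """Sum the purchases of one user by scanning the toy store."""
    return sum(
        record["amount"]
        for key, record in store.data.items()
        if key.startswith("purchase:") and record["user"] == user_id
    )


store = KeyValueStore()
# Simulate a client that retries after a timeout: the same operation arrives twice.
record_purchase(store, "o-1001", "alice", 30)
record_purchase(store, "o-1001", "alice", 30)
record_purchase(store, "o-1002", "alice", 12)
print(total_spent(store, "alice"))  # prints 42, not 72
```

A non-idempotent alternative, such as incrementing a running total on every request, would count the retried purchase twice.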

If the database offers no indexing support, you can use queues to separate writes to the database from the maintenance of indexes. This shortens the response time, because the write does not have to wait for the index to be created or updated. For NoSQL stores with limited search functionality, integration with a search engine such as Apache Solr or Lucene is good practice, which can be especially useful for media and content-based applications. In MongoDB, deploying sharding takes time and resources, and if your system has already reached or exceeded its capacity, you will have a difficult time deploying sharding without impacting your application [Monb]. The recommendation is therefore to plan for sharding during data modeling: decide which collections you want to shard and which shard keys to use.
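The queue-based separation of writes and index maintenance described above might look like the following sketch. It uses only the Python standard library, with plain dictionaries standing in for the data store and the secondary index. All names (`documents`, `tag_index`, `write_document`, `index_worker`) are hypothetical, and a production system would more likely use a durable message queue than an in-process one.

```python
import queue
import threading

documents = {}               # primary data: doc_id -> document
tag_index = {}               # secondary index: tag -> set of doc_ids
index_queue = queue.Queue()  # pending index updates


def write_document(doc_id, doc):
    """Store the document and return immediately; indexing happens later."""
    documents[doc_id] = doc
    index_queue.put(doc_id)


def index_worker():
    """Background worker that applies queued updates to the secondary index."""
    while True:
        doc_id = index_queue.get()
        if doc_id is None:           # sentinel value: stop the worker
            index_queue.task_done()
            break
        for tag in documents[doc_id]["tags"]:
            tag_index.setdefault(tag, set()).add(doc_id)
        index_queue.task_done()


worker = threading.Thread(target=index_worker, daemon=True)
worker.start()

write_document("d1", {"title": "Intro", "tags": ["nosql", "cloud"]})
write_document("d2", {"title": "Sharding", "tags": ["nosql"]})

index_queue.join()      # wait until all queued index updates are applied
index_queue.put(None)   # shut the worker down
worker.join()

print(sorted(tag_index["nosql"]))  # prints ['d1', 'd2']
```

The trade-off is that the index is only eventually consistent with the data: a lookup issued right after a write may not yet see the new document.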

When using MongoDB as a Service from MongoHQ and your application is written in Java, MongoHQ support advised us to use the MongoDB driver for Java rather than the REST APIs, because even for basic interactions the driver provides faster and better results than going over HTTP. MongoDB also recommends using replica sets instead of a master/slave setup to achieve replication in production environments. In NoSQL stores, how you decide to split the data is very important. You should pay close attention to this decision from the beginning in order to avoid a negative impact on performance and operations. A bad partitioning key can result in "hot spots", i.e. certain machines being responsible for serving the largest share of the data and requests. One way to avoid these hot spots is consistent hashing, which distributes data evenly across nodes.
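As an illustration of the consistent hashing mentioned above, the following is a minimal sketch of a hash ring. The use of virtual nodes (`vnodes`) to smooth the distribution, the MD5 hash, and the node and key names are all assumptions made for this example; actual NoSQL stores implement their own variants.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Hash ring with virtual nodes; node names are arbitrary strings."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes      # virtual points per physical node
        self._points = []         # sorted list of (hash, node) pairs
        self._hashes = []         # hashes only, kept in the same order
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        # MD5 is used only to spread keys over the ring, not for security.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):
            self._points.append((self._hash(f"{node}#{i}"), node))
        self._points.sort()
        self._hashes = [h for h, _ in self._points]

    def get_node(self, key):
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._points)
        return self._points[idx][1]


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {key: ring.get_node(key) for key in keys}

ring.add_node("node-d")
after = {key: ring.get_node(key) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
# Only keys taken over by the new node change owner; all others stay put.
print(all(after[key] == "node-d" for key in moved))  # prints True
```

The property shown at the end is what distinguishes this scheme from plain `hash(key) % number_of_nodes` partitioning, where adding a node would reassign most keys.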

If you want to keep the query transformation effort minimal, you can use a NoSQL store that supports an SQL-like query language (a subset of SQL), such as SimpleDB or Cassandra (with its CQL), and make the necessary changes at the data access layer. If your application uses range queries, choose a database that supports range partitioning, so that you avoid crossing partition or server boundaries, which would decrease performance.
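The benefit of range partitioning for range queries can be sketched as follows. The partition names and boundary values are hypothetical. With ordered boundaries, a range query only has to contact the partitions whose key ranges overlap the requested interval, whereas with hash partitioning the matching keys would be scattered over all partitions.

```python
import bisect

# Split points between partitions (hypothetical values). Key ranges:
# p0: keys < "g", p1: "g" <= keys < "n", p2: "n" <= keys < "t", p3: keys >= "t"
BOUNDARIES = ["g", "n", "t"]
PARTITIONS = ["p0", "p1", "p2", "p3"]


def partition_for(key):
    """Return the partition responsible for a single key."""
    return PARTITIONS[bisect.bisect_right(BOUNDARIES, key)]


def partitions_for_range(start, end):
    """Return the partitions a range query start <= key <= end must contact."""
    first = bisect.bisect_right(BOUNDARIES, start)
    last = bisect.bisect_right(BOUNDARIES, end)
    return PARTITIONS[first:last + 1]


print(partition_for("alice"))            # prints p0
print(partitions_for_range("ha", "mz"))  # prints ['p1']
print(partitions_for_range("a", "z"))    # prints ['p0', 'p1', 'p2', 'p3']
```

A query whose range falls within one partition is answered by a single server, which is the situation the recommendation above aims for.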

Use MapReduce in a controlled manner and outside your peak production hours, because it might affect your performance. Successful cases of NoSQL use in the Cloud show that the business logic layer was migrated to the Cloud as well. If you place the application and the database layer in the same AWS availability zone, you reduce the communication latency between the application and the database.
