Chapter 3: Modern enterprise databases explained The last couple articles presented an introduction to Application Performance
Management (APM) and identified the challenges in effectively implementing an APM strategy. Now, we are going to provide a general introduction to the various databases within your environment. Before we begin, let’s set the stage for the particular types of databases we are going to cover. In this article, we’ll provide an overview of what SQL and NoSQL are, and the various types of databases available for customers today.
This section is not meant for the more advanced users. If you’re already familiar with these basic concepts, feel free to skip this section and go straight to the top-6 metrics discussed in the next article. Otherwise, feel free to read the following for a good general introduction to the various database types.
Structured query language
SQL stands for Structured Query Language and is used for database access and administration. International corporations Oracle, Microsoft, and IBM, offer a broad range of cloud-based and local applications for data warehousing and enterprise resource planning.
Data warehousing happens whenever data is stored in a database in any context, for any purpose. When you log into your email, your computer sends your username and password to the email server. The email server then takes that information and queries the database for your username, meaning it searches the database for the entry (or row) with your username. A row has multiple data cells (or data entries). It is typical that the row for your username would have your password, last login time and other pertinent information corresponding specifically to your email account. Other rows in the database are for other email accounts. When the database receives a query from the server, it returns the information in those cells to the server. The server checks the password you entered with the text in the password data cell of the row with the username you entered.
Now let’s discuss the differences and advantages of different types of SQL databases and why a company would select one over another to meet its business needs.
MySQL databases
Your PHP application may be utilizing a backend database, a caching layer, or possibly When choosing a system to meet your businesses needs, you must consider benchmark performance. A benchmark is a standard that every implementation of a system follows.
Because MySQL (owned by Oracle) was created with an eye towards speed, every implementation of a MySQL databases is fast and lean. So, MySQL has a benchmark performance standard for speed and efficiency.
If your system needs to perform over a thousand database transactions per hour, such as a high volume online retailer, you would likely select MySQL for its reliable speed. If you want each of your customers to click “purchase” and have the transaction complete and next page load in a few seconds, you will need a database that is created for speed and optimized for quickly handling dozens of transactions per second, simultaneously.
Also, MySQL can run faster on smaller computer systems. A computer with a 500 GB hard drive and 4GB of RAM might be able to run large server applications with no issues, but if you’re working with older systems with fewer capabilities, or if you’re working with embedded systems (mobile devices), you need to choose a database system that is optimized for fast performance on systems with limited resources.
Fun fact: Many well versed MySQL developers do not know where the “My” originates.
One of the developers of MySQL was a Swedish man named Michael “Monty” Widenius.
He named MySQL after his daughter My. He named two other database products (MariaDB and MaxDB) after his other children.
PostgreSQL databases and ACID compliance standards
PostgreSQL is not owned by any corporation. It is open source and non-profit, much like OpenGL or Linux Ubuntu.
A benchmark standard for PostgreSQL (or Postgres) databases is compliance with Atomicity, Consistency, Isolation, and Durability also referred to as ACID standards.
Atomicity is a standard for being able to complete multiple database transactions, with each one’s success being contingent upon on another. Imagine a customer purchasing a book on Amazon. When he clicks submit, the order is processed, and two events happen: His credit card is charged (one database transaction), and a book is reserved to be shipped to him (a second database transaction). Both of those things are contingent upon the other. If his credit card is declined or fails, no book will be shipped to him.
If the software systems find that that book is out of stock or for some other reason cannot be shipped to him, his credit card will not be charged. Either both events occur, or neither event occurs. Atomicity is the standard that measures how capable your software system is of handing separate but co-dependent database transactions.
Consistency is a standard that measure compliance with rules of how data can be changed. Let’s go back to your email account database entry. The last_login_time data cell has a timestamp that includes the date, time, and timezone of a transaction. If you run a database transaction that wants to put the word ‘elephant’ in the last_login_time data cell, the database should throw an exception (meaning to return an error) and explicitly disallow that change to be made.
Isolation is the standard that forces each database transaction to be made independent of one another. Let’s go back to the customer purchasing a book on Amazon. If the book is a bestseller and has hundreds of copies sold daily, it is very likely that several customers could purchase a copy in the same minute and second. Consider a situation where Amazon has five physical copies of the book left, and ten individual customers attempt to purchase a copy at the same time. The database transactions will be handled in the order they were received. If the first five database transactions reserved a copy of the book, and the next five database transactions also tried to reserve copies of the book at the same time, it’s easy to see how massive logistical errors can arise: The second group of 5 customers will have their credit cards charged, and will be told a book is shipping to them, when in reality none exists. They will have to call customer service to rectify this issue - which is a massive overhead operation cost to the business.
Isolation means database transactions happen independently of one another. One database transaction must be closed and have completed all its changes before another one can begin. Sometimes, when a customer clicks “purchase” on Amazon, because of Isolation compliance standards, it can take 10-30 seconds before the next page loads.
That is not because of slow internet connection; it is due to ACID compliance standards to prevent logistical problems.
Durability is a standard that measures if the changes will last even if problems occur on the system. If Amazon’s database is located on a physical hard drive in Topeka, Kansas. Consider the example best-selling book sales again. As soon as the first five customers click submit, there are a power outage in Topeka and Amazon’s server and
13 Chapter 3: Modern enterprise databases explained the database system crashes. Let’s say Amazon has an efficient recovery system and
the entire software system is up and running in less than 2 minutes after the crash. Will the changes from the first five database transactions still be there if the system crashes unexpectedly? If the system is ACID compliance, they will be. Postgres got its name from the “post ingress” orientation of how transactions are handled.
MS SQL server databases
MS SQL Server Databases is a Microsoft developed database system that is widely renowned for its reliability and high levels of data integrity and durability and other ACID compliance features. On implementation of MS SQL Server Databases is called Microsoft Jet, a file sharing system. No, it is not a p2p file sharing system used for stealing music. Much like Google Docs, multiple users can access the same file and read/write (according to that user’s permission settings). What if two users open the same file at the same time? One person makes changes, and the other person makes and entirely different set of changes unrelated to the first. The first person saves his changes, and the next person saves his changes five minutes later. You might think the first person’s changes will be overwritten, but not necessarily. Microsoft Jet has two fixes for this problem. The first is edit locking. Only allowing one person to have the file open for editing at one time. Before another person can open the file for editing, the first person must save their changes and close it out. Another solution that some files sharing systems use is called change merging. It will use advanced algorithms to take all of the changes from both users and merge them and create a new version of the file that contains all changes from all users. This is very common in version control system for developers who write computer code.
Further distinctions between MS SQL Server Databases and other competitive systems include the methods of implementation. Just as C# differs from its open source
competitor Java, MS SQL Server is very similar to MySQL server except Microsoft has added its distinctive touches to the coding practices and standards. These changes are for better or for worse depending on whom you ask.
DB2DB2 is a direct competitor to MS SQL server, offered by IBM. IBM is more focused on compatibility with other IBM services. Flexibility and compatibility with other components of your software solutions is one of the most important criteria. Thinking intuitively, using IBM cloud services mixed in with Microsoft database software tools might not make for a smooth operation of your total software solution.
Three main criteria should be:
– reliability and functionality – speed and efficiency
– compatibility with your other software solutions
MySQL, the most popular choice, boasts limber speed and unmatched efficiency.
Postgres offers structure and time-tested reliability. Microsoft’s SQL Server and IBM’s DB2 are designed to work in conjunction with other software solutions from their respective companies.
NoSQL databases
Enterprises with enormous computing operations around the world are facing increasingly heavy demands for database processing power and speed. “Big Data” is impacting organizations at every turn, and traditional SQL relational databases are beginning to crack. The solution for many organizations lies in NoSQL.
While relational databases use tables for data storage and retrieval, NoSQL accesses structured and unstructured data with high fault-tolerance, which means it is designed to continue operating even if failures occur in the system.
NoSQL was created to overcome problems relational databases have in scaling up to handle large amounts of read-and-write operations. SQL scales “vertically,” with all of the data on one server. NoSQL works “horizontally,” pulling data from large stores across multiple servers. This is why it can access significantly more data than SQL. To do this, there are four NoSQL database types:
Types of NoSQL databases
Column store
Unlike the row-based system of relational databases, column databases keep certain values together in the same column, providing high-speed access for retrieval requests of the same data attributes. This is a good system for when you have aggregate queries on a field.
This database method is ideal when you have lots of data, disk access, and high-availability. These criteria are why Facebook implemented Cassandra and Google came out with Bigtable. Another benefit is that column databases are happy running on multiple server clusters. Databases that can run on a single machine probably don’t need the power of a column store database. Better alternatives in this situation are document store and key-value database implementations.
Situations that are well-suited for column store databases include:
– When you need to write to the database every time
– When data is stored over many data centers in different locations – Apps that employ dynamic fields
– When your program may eventually need to process immense amounts of data
Use case scenarios that could potentially need this amount of processing power include:
– Search operations online
– Stock market and trading analysis – Security analytics
– Social media
Apache Accumulo and open-source Druid are examples of this type of NoSQL database.
Chapter 3: Modern enterprise databases explained
Document store
Data is stored as documents. These are not documents like from a word processor;
rather, they are key-value pairs of strings, sequences of alpha or numeric characters.
When flexibility is paramount, document stores shine. This is especially true when you need to store attributes along with the data. With this database type, you can also use embedded documents. This feature is beneficial when denormalizing, which is storing data in the same document if that data is frequently accessed.
Document stores are perhaps the most widely implemented NoSQL database type because of their high performance, ease of deployment and flexibility. You’ll see them used in many different scenarios including:
– Keeping track of various metadata types.
– Situations that can benefit from denormalization.
– Support for web entities with large volumes of disk access actions.
– Apps that use JSON.
– Data with variable attributes, such as parts, inventory or products.
Examples of document stores are Couchbase, Lotus Notes, and Apache CouchDB.
Cloud-based vendors include Microsoft Azure and Cloudant.
Graph databases
These databases are superior at working with interconnected data. Each database utilizes connections between nodes. Nodes are items you are keeping track of —employees, accounts, sales and more. Edges are the interconnections between nodes and represent the relationships between them. Graph databases fly across these connections rapidly, but scalability is limited as all of the data must be located on one machine.
Examples, where graph databases are used, might be freeway systems interconnecting states and towns, workers meeting with other staff in a company or modeling of
molecular structures. These different elements have a relationship or connection of two instances of entities.
Graph databases are ideal for problem domains such as:
– Business processes.
– Social media.
– Information technology infrastructure.
– Product or service recommendation engines.
– Permissions and access control.
Each of these examples shows that graph databases are well-suited for modeling explicit relations. Sometimes graph databases are built as an add-on to other database systems such as column-store. Usually, this construct arises when there is a need for large-scale processing.
Key/value databases
Every key in this type of database is paired to a value. A traditional relational database uses key values to pull data. For example, a human resources manager might search employees based on the position, and the “position” field serves as the key.
In key-value NoSQL, there are no fields - if there are modifications made, the entire value must be changed, exclusive of the key. This method scales well, but complex operations may be difficult.
Key-value setups have simple queries. There are some databases in this category that are more sophisticated, but, in general, key-value databases do not have the query strengths of graph databases, document store, and column store.
These databases are used for a myriad of use, cases including:
– Memory caches for relational databases.
– User data stores for mobile apps.
– Online shopping carts.
– Image, audio and other large file storage.
Examples include Aerospike, MemcacheDB, and MUMPS. Cloud-based providers include Microsoft Azure and Amazon Web Services SimpleDB.
Conclusion
As the fast-moving world of database technology continues to evolve, NoSQL will become increasingly widespread. Databases are growing exponentially, and enterprises need creative solutions to processing challenges that SQL finds daunting. Just look at Facebook as an example. With over 1 billion members, their data demands are far and above anything that SQL can handle.
In comparison, NoSQL can handle the volume and variety of data that these companies manages every day. Craigslist, for example, switched from MySQL to MongoDB and experienced much faster innovation and product development. On the other hand, Uber chose MySQL, and then had to modify it considerably to emulate a NoSQL database.
SQL or NoSQL
However, the rapid growth of NoSQL does not mean that it is right for your company or database challenges. The banking industry, for instance, continues to use relational databases that process millions of customer transactions every day. SQL is much better suited for this scenario. Likewise, NoSQL is not very good at non-real-time data warehousing. An old-school SQL application is better in this case.
Carefully weighing your future growth, database demands and a need for flexibility and speed will indicate whether you should implement NoSQL. There are many advantages, but the drawbacks must be considered and steps taken to minimize their impact.
In the next article, we’ll look at the top-6 performance metrics to measure in enterprise databases and how to interpret them.