Microsoft SQL Azure Database is a cloud-based relational database service that is built on SQL Server technologies and runs in Microsoft data centers on hardware that is owned, hosted, and maintained by Microsoft.
SQL Azure is probably the most fully-featured relational database available in the cloud. It is based on the SQL Server standalone database but the way data is managed and stored in SQL Azure is
significantly different.
Similar to an instance of SQL Server, SQL Azure Database exposes a tabular data stream (TDS) interface for Transact-SQL-based database access. This allows your database applications to use SQL Azure Database in the same way that they use SQL Server. Because SQL Azure Database is a service, administration in SQL Azure Database is slightly different.
Unlike administration for an on-premise instance of SQL Server, SQL Azure Database abstracts the logical administration from the physical administration. Users continue to administer databases, logins, users, and roles, but Microsoft administers the physical hardware such as hard drives, servers, and storage. This approach helps SQL Azure Database provide a large-scale multitenant database service that offers enterprise-class availability, scalability, security, and self-healing [11].
7.1 Subscriptions
To use SQL Azure, Windows Azure platform account must be used. This account allows access to all the Windows Azure-related services, such as Windows Azure, Windows Azure AppFabric, and SQL Azure. The Windows Azure platform account is used to set up and manage subscriptions and to bill for consumption of any of the Windows Azure services including SQL Azure, and running SQL Azure does not require Windows Azure. Whit the Windows Azure platform account, the Windows Azure Platform Management portal can be used to create SQL Azure servers, databases, and its associated
administrator accounts [11].
Each subscription allows one instance of SQL Server to be defined, which will initially include only a master database. For each server firewall settings has to be configured, to determine which connections will be allowed access.
7.2 Databases
Each SQL Azure server always includes a master database. Up to 149 additional databases can be created for each SQL Azure server. Microsoft is offering two editions of SQL Azure databases: Web and Business, and when you create a database using the Windows Azure Platform Management portal, the maximum size you specify determines the edition you create. A Web Edition database can have a maximum size of 1 GB or 5GB. A Business Edition database can have maximum size of up to 150 GB of data, in 10GB increments up to 50GB, and then 50 GB increments [11][12]. If the size of the database reaches the limit it is not possible to insert data, update data, or create new database
29 objects. However, read and delete data, truncate tables, drop tables and indexes, and rebuild indexes are still possible.
SQL Azure data access model does not support cross-database queries in the current version a connection is made to a single database. If data from another database is needed, new connection must be created [11].
7.3 Security and Access to a SQL Azure Database
Most security issues for SQL Azure databases are managed by Microsoft within the SQL Azure data center, with very little setup required by the users. A user must have a valid login and password in order to connect to the SQL Azure database. Because SQL Azure supports only standard security, each login must be explicitly created.
In addition, the firewall can be configured on each SQL Azure server to only allow traffic from specified IP addresses to access the SQL Azure server. This helps to greatly reduce any chance of a denial-of-service (DoS) attack. All communications between clients and SQL Azure must be SSL encrypted, and clients should always connect with Encrypt = True to ensure that there is no risk of man-in-the-middle attacks. DoS attacks are further reduced by a service called DoSGuard that actively tracks failed logins from IP addresses and if it notices too many failed logins from the same IP address within a period of time, the IP address is blocked from accessing any resources in the service [11].
The security model within a database is identical to that in SQL Server. Users are created and mapped to login names. Users can be assigned to roles, and users can be granted permissions. Data in each database is protected from users in other databases because the connections from the client application are established directly to the connecting user’s database.
7.4 SQL Azure architecture
Each SQL Azure database is associated with its own subscription. From the subscriber’s perspective, SQL Azure provides logical databases for application data storage. In reality, each subscriber’s data is replicated across three SQL Server databases that are distributed across three physical servers in a single data center. Many subscribers may share the same physical database, but the data is presented to each subscriber through a logical database that abstracts the physical storage architecture and uses automatic load balancing and connection routing to access the data. The logical database that the subscriber creates and uses for database storage is referred to as a SQL Azure database [11].
7.5 Logical Databases on a SQL Azure Server
SQL Azure subscribers access the actual databases, which are stored on multiple machines in the data center, through the logical server. The SQL Azure Gateway service acts as a proxy, forwarding the Tabular Data Stream (TDS) requests to the logical server. It also acts as a security boundary providing
30 login validation, enforcing the firewall and protecting the instances of SQL Server behind the gateway against denial-of-service attacks. The Gateway is composed of multiple computers, each of which accepts connections from clients, validates the connection information and then passes on the TDS to the appropriate physical server, based on the database name specified in the connection. Figure 8 shows the physical architecture represented by the single logical server.
Figure 7 Figure 8 A logical server and its databases distributed across machines in the data center [11]
The machines with the SQL Server instances are called data nodes. Each data node contains a single SQL Server instance, and each instance has a single user database, divided into partitions. Each
partition contains one SQL Azure client database, either a primary or secondary replica. Each database hosted in the SQL Azure data center has three replicas: one primary replica and two secondary
replicas. All reads and writes go through the primary replica, and any changes are replicated to the secondary replicas asynchronously. The replicas are the central means of providing high availability for your SQL Azure databases.
The other SQL Azure databases partitions existing within the same SQL Server instances in the data center are completely invisible and unavailable between different subscribers [11].
For SQL Azure databases every commit needs to be a quorum commit. That is, the primary replica and at least one of the secondary replicas must confirm that the log records have been written before the transaction is considered to be committed.
Each data node machine hosts a set of processes referred to as the fabric. The fabric processes perform the following tasks:
Failure detection: notes when a primary or secondary replica becomes unavailable so that the Reconfiguration Agent can be triggered
Reconfiguration Agent: manages the re-establishment of primary or secondary replicas after a node failure
31
PM (Partition Manager) Location Resolution: allows messages to be sent to the Partition Manager
Engine Throttling: ensures that one logical server does not use a disproportionate amount of the node’s resources, or exceed its physical limits
Ring Topology: manages the machines in a cluster as a logical ring, so that each machine has two neighbors that can detect when the machine goes down
The machines in the data center are all commodity machines with components that are of low-to- medium quality and low-to-medium performance capacity. The low cost and the easily available configuration make it easy to quickly replace machines in case of a failure condition. In addition, Windows Azure machines use the same commodity hardware, so that all machines in the data center, whether used for SQL Azure or for Windows Azure, are interchangeable
In Figure 7, the logical server contains three databases: DB1, DB2, and DB3. The primary replica for DB1 is on Machine 6 and the secondary replicas are on Machine 4 and Machine 5. For DB3, the primary replica is on Machine 4, and the secondary replicas are on Machine 5 and on another machine not shown in this figure. For DB4, the primary replica is on Machine 5, and the secondary replicas are on Machine 6 and on another machine not shown in this figure. Note that this diagram is a simplification. Most production Microsoft SQL Azure data centers have hundreds of machines with hundreds of actual instances of SQL Server to host the SQL Azure replicas, so it is extremely unlikely that if multiple SQL Azure databases have their primary replicas on the same machine, their secondary replicas will also share a machine [11].
The physical distribution of databases that all are part of one logical instance of SQL Server means that each connection is tied to a single database, not a single instance of SQL Server.
7.6 Network Topology
Four distinct layers of abstraction work together to provide the logical database for the subscriber’s application to use: the client layer, the services layer, the platform layer, and the infrastructure layer.
Figure 8 illustrates the relationship between these four layers.
The client layer resides closest to the application, and it is used by the application to communicate directly with SQL Azure. The client layer can reside on-premises in a data center, or it can be hosted in Windows Azure. Every protocol that can generate TDS over the wire is supported. Because SQL Azure provides the TDS interface same as SQL Server, known and familiar tools and libraries can be used to build client applications for data that is in the cloud.
The infrastructure layer represents the IT administration of the physical hardware and operating systems that support the services layer.
32 Figure 8 Four layers of abstraction provide the SQL Azure logical database for a client application to use [11]
33
7.7 High Availability with SQL Azure
The goal for Microsoft SQL Azure is to maintain 99.9 percent availability for the subscribers’
databases. As it was stated earlier this goal is achieved by the use of commodity hardware that can be quickly and easily replaced in the case of machine or drive failure and the management of the replicas, one primary and two secondary, for each SQL Azure database [12].
7.8 Failure Detection
Management in the data centers needs to detect not only a complete failure of a machine, but also conditions where machines are slowly degenerating and communication with them is affected. The concept of quorum commit, discussed earlier, addresses these conditions. First, a transaction is not considered to be committed unless the primary replica and at least one secondary replica can confirm that the transaction log records were successfully written to disk. Second, if both a primary replica and a secondary replica must report success, small failures that might not prevent a
transaction from committing but that might point to a growing problem can be detected [11].
7.9 Reconfiguration
The process of replacing failed replicas is called reconfiguration. Reconfiguration can be required due to failed hardware or to an operating system crash, or to a problem with the instance of SQL Server running on the node in the data center. Reconfiguration can also be necessary when an upgrade is performed, whether for the operating system, for SQL Server, or for SQL Azure.
All nodes are monitored by six peers, each on a different rack than the failed machine. The peers are referred to as neighbors. A failure is reported by one of the neighbors of the failed node, and the process of reconfiguration is carried out for each database that has a replica on the failed node.
Because each machine holds replicas of hundreds of SQL Azure databases (some primary replicas and some secondary replicas), if a node fails, the reconfiguration operations are performed
hundreds of times. There is no prioritization in handling the hundreds of failures when a node fails;
the Partition Manager randomly selects a failed replica to handle, and when it is done with that one, it chooses another, until all of the replica failures have been dealt with.
If a node goes down because of a reboot, that is considered a clean failure, because the neighbors receive a clear exception message.
Another possibility is that a machine stops responding for an unknown reason, and an ambiguous failure is detected. In this case, an arbitrator process determines whether the node is really down.
Although this discussion centers on the failure a single replica, it is really the failure of a node that is detected and dealt with. A node contains an entire SQL Server instance with multiple partitions containing replicas from up to 650 different databases. Some of the replicas will be primary and some will be secondary. When a node fails, the processes described earlier are performed for each affected database. That is, for some of the databases, the primary replica fails, and the arbitrator chooses a new primary replica from the existing secondary replicas, and for other databases, a
34 secondary replica fails, and a new secondary replica is created.
The majority of the replicas of any SQL Azure database must confirm the commit. At this time, user databases maintain three replicas, so a quorum commit would require two of the replicas to acknowledge the transaction. A metadata store, which is part of the Gateway components in the data centers, maintains five replicas and so needs three confirmations to satisfy a quorum commit.
The master cluster, which maintains seven replicas, needs four of them to confirm a transaction.
However, for the master cluster, even if all seven replicas fail, the information is recoverable, because mechanisms are in place to rebuild the master cluster automatically in case of such a massive failure [11].
7.10 Availability Guarantees
As mentioned earlier, the goal for Microsoft SQL Azure is to maintain 99.9 percent availability.
Because of the way that database replicas are distributed across multiple servers and the efficient algorithms for promoting secondary replicas to primary, up to 15 percent of the machines in the data center can be down and the availability can still be guaranteed [11].
7.11 Scalability with SQL Azure
As said earlier one of the biggest benefits of hosting your databases in the cloud is the built-in scalability. With SQL Azure as with the most cloud database platforms you add more databases only when and if you need them, and if the need is only temporary, you can then drop the unneeded databases. There are two components within SQL Azure that allow this scalability by continuously monitoring the load on each node. One component is Engine Throttling, which ensures that the server doesn’t get overloaded. The other component is the Load Balancer, which ensures that a server isn’t continuously in the throttled state. In this section, we’ll look at these two components and discuss how engine throttling applies when predefined limits are reached and how load balancing works as the number of hosted database increases. The third technique to achieve greater scalability and performance are the Federations [31] used in SQL Azure. One or more tables within a database are split by row and portioned across multiple databases (Federation members).
This type of horizontal partitioning is often referred to as ‘sharding’. The primary scenarios in which this is useful are where you need to achieve scale, performance, or to manage capacity [11].
7.12 Throttling
Because of the multitenant use of each SQL Server in the data center, it is possible that one subscriber’s application could render the entire instance of SQL Server ineffective by imposing heavy loads. For example, under full recovery mode, inserting lots of large rows, especially ones containing large objects, can fill up the transaction log and eventually the drive that the transaction log resides on. In addition each instance of SQL Server in the data center shares the machine with
35 other critical system processes that cannot be starved – most relevantly the fabric process that monitors the health of the system.
To keep a data center server’s resources from being overloaded and jeopardizing the health of the entire machine, the load on each machine is monitored by the Engine Throttling component. In addition, each database replica is monitored to make sure that statistics such as log size, logs write duration, CPU usage, the actual physical database size limit, and the SQL Azure user database size are all below target limits. If the limits are exceeded, the result can be that a SQL Azure database rejects reads or writes for 10 seconds at a time. Occasionally, violation of resource limits may result in the SQL Azure database permanently rejecting reads and writes (depending on the resource type in question) [11].
7.13 Load Balancer
At this time, although there are availability guarantees with SQL Azure, there are no performance guarantees. Part of the reason for this is the multitenant problem: many subscribers with their own SQL Azure databases share the same instance of SQL Server and the same computer, and it is impossible to predict the workload that each subscriber’s connections will be requesting. SQL Azure provides load balancing services that evaluate the load on each machine in the data center. When a new SQL Azure database is added to the cluster, the Load Balancer determines the locations of the new primary and secondary replicas based on the current load on the machines.
If one machine gets loaded too heavily, the Load Balancer can move a primary replica to a machine that is less loaded [11].
7.14 SQL Azure Management
Because your SQL Azure databases are hosted within larger SQL Server instances on machines in the data centers, the management work that needs to be done is very limited. However, some maintenance tasks are still necessary.
All physical aspects of dealing with your databases are handled in the data center by Microsoft.
Also all the upgrades are handled in the data center one replica at a time. The user has responsibility to troubleshoot poorly performing queries and concurrency problems, such as blocking.
Just like in SQL Server, some of the main tools available for troubleshooting are the dynamic management views (DMVs) [11].
7.15 Pricing in SQL Azure
Billing in SQL Azure is per database, based on usage and database edition, this allows organization
36 to start with a small investment and add space as the business grows. SQL Azure provides two different database editions, Business Edition and Web Edition. SQL Azure edition features apply to the individual database. They can be mixed and match different database editions within the same
36 to start with a small investment and add space as the business grows. SQL Azure provides two different database editions, Business Edition and Web Edition. SQL Azure edition features apply to the individual database. They can be mixed and match different database editions within the same