• No results found

Best Practices: SQL Server 2012 on Violin

N/A
N/A
Protected

Academic year: 2021

Share "Best Practices: SQL Server 2012 on Violin"

Copied!
17
0
0

Loading.... (view fulltext now)

Full text

(1)

Abstract

This technical report describes best practice recommendations from Violin Memory for deploying a Microsoft SQL Server 2012 database on a Windows Server 2008 R2 operating system for use with Violin Memory 6000 Series flash-based storage arrays.

Best Practices: SQL Server 2012 on Violin

Deployment Best Practices for Microsoft SQL Server 2012 on Windows Server 2008

With the Violin Memory 6000 Series Array

(2)

Table of Contents

1 Introduction ...3

1.1 Purpose and Scope ...3

1.2 Intended Audience ...3

1.3 Additional Resources ...3

2 Installing the Violin Memory Array ...3

2.1 Installation Checklist ...3

2.2 LUN Settings ...4

3 Configuring Host Hardware and BIOS ...5

3.1 Hyper Threading ...5

3.2 HBA Card Placement ...5

4 Configuring Windows Server ...6

4.1 Lock Pages in Memory ...6

4.2 Sector Alignment (Disk Alignment) ...7

4.3 Disk Allocation Units (Data Block Size Selection) ...7

4.4 Storport Driver Setting (For PCIe Direct Attached) ...8

4.5 HBA Driver Settings (For Fibre Channel or iSCSI) ...8

4.6 Microsoft SQL Server Tools ...9

4.7 Log File Storage Performance ...9

4.8 Turning off Indexing for all Disks ...10

5 SQL Server Recommendations ...10

5.1 Sizing the Required Database Space ...10

5.2 SLC vs. MLC ...10

5.3 Number of Files ...11

5.4 CPU Affinity ...11

5.5 Trace Flags and Startup Parameters ...11

5.6 Soft NUMA ...12

5.7 Configuration Parameters ...12

5.8 Maintenance ...14

6 Utilizing New Features in SQL Server 2012 ...14

6.1 Columnstore Index ...14

6.2 HADRON and Database Mirroring ...14

7 Performance Testing with SQLIO ...15

7.1 General Testing Procedures ...15

(3)

1 Introduction

This technical report describes best practice recommendations from Violin Memory for deploying a Microsoft SQL Server database on a Windows Server operating system for use with Violin Memory 6000 Series and 3000 Series arrays.

1.1 Purpose and Scope

The purpose of this document is to present key best practices that Violin Memory Systems engineers have found to be of benefit when configuring and using our flash memory arrays. The suggestions encompass both OS settings for Windows Server 2008 and 2008 R2, and SQL Server DBMS settings for SQL Server 2005, 2008, 2008 R2 and 2012. In addition, several key features of SQL2012 are highlighted, as VMS feels these can significantly increase the reliability and utilization of a system running SQL Server and using VMS arrays.

Many of the best practices suggestions have not changed from previous versions of SQL. Unless otherwise noted, these best practices can be applied to SQL versions 2012, 2008, 2008 R2 and 2005.

1.2 Intended Audience

This document is intended for SQL Server and Windows Server administrators who have prior knowledge of and experience with SQL Server 2012 database installation and configuration in an enterprise-scale environment that utilizes shared storage devices.

1.3 Additional Resources

Best Practices for Tuning Microsoft SQL Server on the HP ProLiant DL980:

http://bizsupport2.austin.hp.com/bc/docs/support/SupportManual/c02861709/c02861709.pdf Configure SQL Server to Use Soft-NUMA (SQL Server):

http://msdn.microsoft.com/en-us/library/ms345357.aspx Understanding Non-uniform Memory Access:

http://msdn.microsoft.com/en-us/library/ms178144.aspx How to Map TCP/IP Ports to NUMA Nodes:

http://msdn.microsoft.com/en-us/library/ms345346(v=sql.100).aspx

2 Installing the Violin Memory Array

This section provides an installation checklist and recommendations for optimal installation of the Violin array. The items in this list are presented in more detail in the sections that follow.

2.1 Installation Checklist

This list is derived from issues commonly encountered when installing a Violin array for the first time. To avoid potential issues when installing an array, make sure that:

1. LUNs are created at 4k on the Violin array.

2. NTFS Volumes and Mount Points are created at 4k. 3. Indexing is turned OFF on each Volume or Mount Point.

(4)

6. All LUNs are exported to the proper HBA ports.

7. The correct number of HBA ports has been installed. Too few and/or too slow of HBA ports can cause the transport layer to be the bottleneck of the I/O subsystem (~750 MB/s per FC8 port).

8. HBA cards placed in proper slots. Make sure that the HBA cards are placed in PCIe x8 slots. 9. HBA cards are spread evenly between IOHs (in a multi-HBA and/or multi-IOH architecture).

10. HBA cards in risers, or in otherwise extended PCIe bus slots, are not bound by the aggregation of going through one slot (all of the extended slots funnel down into one x16 or x8 slot). In some cases, this will cause the x8 slot in the extended bus not to be allowed its full x8 throughput.

11. Review core affinity if using virtualization or on a hard NUMA system (4 sockets or more).

12. Verify that Windows has not moved the Page File onto the array. If the amount of RAM on the server is near the same size as the C: drive, Windows might move it to a larger volume.

13. Verify proper switch zoning or quality of service settings at the switching layer.

14. Create enough MPIO paths when in a shared fiber channel architecture. Three dual-port cards are required to establish enough paths to fully drive a 6000 series array.

2.2 LUN Settings

When creating LUNs to be used by Windows Server 2008 or 2008 R2, the LUNs should be created with a 4k Block Size. This, along with a NTFS allocation unit size of 4k when creating the Windows volumes and mount points, will allow the array to flow data through the system in alignment.

Violin, Windows and SQL Server will operate with any size or number of LUNs. NTFS requires at least two LUNs in order to max out the IOPs for one Windows Server instance.

Do not use special characters in the name of LUNs. Use “DBLog1” instead of “DB_Log1”.

Refer to the “Verify 4k Alignment Settings” steps to validate the LUN settings, as seen in Windows, after the LUNs have been exported to the host server.

(5)

3 Configuring Host Hardware and BIOS

This section covers best practices for configuring the host hardware and BIOS, including recommendations for hyper threading and HBA card placement.

3.1 Hyper Threading

Hyper threading is implemented differently by each chip manufacturer, and its effect on each system depends upon the workload profile. In general, systems with many small units of work will more likely receive benefits from hyper threading than systems with lower numbers of larger workloads.

By design, hyper threading is built to optimize CPU efficiency by allowing multiple workloads to be processed in parallel. Hyper threading is implemented differently by all of the processor vendors. Specifically, x86 hyper threading does not guarantee time slices. As such, if multiple processes exist that are dependent upon one another it is conceivable that one unit of work could ‘get stuck’ on one logical core while waiting for a related process to get time on another core. This is potentially applicable to databases, which perform high levels of concurrent transactions that require inter-user locking (write heavy OLTP) or applications specifically bound by a single thread (transaction log writes). Conversely, SPARC’s implementation guarantees time slices between threads. This can either unlock intertwined processes or throttle threads with large or steady workloads (transaction log writes).

All systems should be tested in both ON and OFF states to determine the best results for the individual system. Special consideration must also be paid to scale and how each system will perform under peak conditions. Violin Memory recommends testing in the ON state to start.

3.2 HBA Card Placement

Due to the high throughput nature of the Violin arrays, HBA cards will most likely require a PCIe x8 slot. Refer to the server manual to see which slots have which capability. Take care to distinguish between the physical size of the PCIe slot (often called “physical width” or “mechanical width”) and the actual throughput of the slot (often called “electrical width”). Some slots will have x8 connectors but only allow for x4 data throughput (this is normally for adding extra power to a 2-form factor card). Some server architectures have riser or add-on PCIe extender boards. In some cases all of the extended slots will flow through one connection slot into the motherboard. This means that all of the extended slots must share the one extender connection’s capacity. So, an x8 slot on the extended bus may not be capable of the full x8 if other slots are in use. Further, in larger systems (usually 4 sockets or more) there will be two or more IOHs (I/O coordinators). Each IOH is responsible for the I/O throughput between NUMA nodes and the PCIe bus. If the destination system has more than one IOHs (refer to the hardware manual), the HBA cards must be placed to spread the I/O load evenly over the IOHs. Networking (NIC) cards must also be spread evenly between IOHs.

Table 1. HBA Card Interface Throughput

PCI Express 1.0 - 250MB/s per lane PCI Express 2.0 - 500MB/s per lane Fibre Channel

x4 – 1GB (1000MB) x8 – 2GB (2000MB) x4 – 2GB (2000MB) x8 – 4GB (4000MB) 1GFC – 200 MB/s 2GFC – 400 MB/s 4GFC – 800 MB/s 8GFC – 1600 MB/s 16GFC – 3200 MB/s

(6)

Requirements:

• Single port 4GFC = 800MB = PCIe Gen 1 x4 or higher

• Dual port 4GFC = 1600MB = PCIe Gen 1 x8 or PCIe Gen 2 x4 or higher • Single port 8GFC = 1600MB = PCIe Gen 1 x8 or PCIe Gen 2 x4 or higher • Dual port 8GFC = 3200MB = PCIe Gen 2 x8 or higher

4 Configuring Windows Server

This section presents best practice recommendations for deploying Microsoft Windows Server for use with SQL Server and Violin arrays. Deployment may vary depending on hardware selection; check with your hardware provider for specific best practice recommendations.

4.1 Lock Pages in Memory

Locking pages in memory is not required on 64-bit operating systems, unless SQL Server is using Windows Large Pages Feature (run flag –T834, see below).

The Windows policy Lock Pages in Memory option is disabled by default. This privilege must be enabled to configure

Address Windowing Extensions (AWE). This policy determines which accounts can use a process to keep data in physical memory, preventing the system from paging the data to virtual memory on disk. On 32-bit operating systems, setting this privilege when not using AWE can significantly impair system performance.

Use the Windows Group Policy tool (gpedit.msc) to enable this policy for the account used by SQL Server. Only system administrators can change this policy. To enable Lock Pages in Memory, follow these steps:

1. On the Start menu, click Run. In the Open box, type gpedit.msc. The Group Policy dialog box opens. 2. On the Group Policy console, expand Computer Configuration, and then expand Windows Settings.

3. Expand Security Settings, and then expand Local Policies.

4. Select the User Rights Assignment folder. The policies will be displayed in the details pane.

5. In the pane, double-click Lock pages in memory.

6. In the Local Security Policy Setting dialog box, click Add.

7. In the Select Users or Groups dialog box, add an account with privileges to run sqlservr.exe.

For architectures that utilize Violin arrays as shared SAN storage, HBA tuning will be required. Achieving the best I/O throughput results requires maximizing the usage of the I/O pipe similar to optimizing network usage. There are two main areas to address: activating and configuring multipathing (MPIO), and HBA queue depth.

4.1.1 Queue Depth

Violin Memory arrays respond so quickly that higher-than-default settings should be used for the HBA queues. Common defaults for HBA queue depths are 32 or 64. Violin recommends increasing those values to 128 for data warehouses and 256 for OLTP servers. Adjust as needed. All paths heading to the same LUNs should have the same queue depth values. It should be noted that each I/O is considered one unit in the HBA queue depth. IO’s can vary in size from the SQL Server minimum of 8k during OLTP loads up to 1MB during backups. Normal OLTP loads will range in the 8k to 32k range. Normal reporting loads will more likely be in the 32k to 64k range. Enterprise edition of SQL Server can go up to 128k for the read ahead process. Index rebuilds will use up to 128k batch size IO’s. Backups will use 1MB reads and writes as possible. Due to this variance in I/O size, that each I/O is considered one unit in the HBA queue depth and that there could be many paths due to MPIO, it is recommended that testing be done to right-size the queue depth for each installation. It is likely that the queue depth will need to be raised from the defaults.

(7)

4.1.2 MPIO Policy

With many paths between the host server and the storage array comes the issue of how to decide upon which path to place each IO. The default setting is Round Robin. This will evenly distribute the IO’s over all the paths and is likely to yield the best throughput. IO’s will vary in size, amount and timing so the simplest way to even the resulting bytes-transferred-per-path is to allocate them in round robin fashion.

There are times in which the policy of Least Queue Depth will result in greater throughput. These tend to be on systems with a very static I/O size such that out-of-order placement will not stick a small I/O behind a group of large IO’s, etc. If the I/O batch size of the destination system is highly static then it is worth testing this setting. Do not mistake the results of a homogeneous benchmarking tool as being similar to what will be seen in production. The more production-like the testing the more likely the results are to carry forward into usage.

For Direct Connected, no switch, Fibre Channel, use “Point-To-Point”.

4.1.3 Selecting the Optimum Number of LUNs, Volumes and Mount Points

In the past, system architects were required to consider the number of LUNs, and LUN placement, when laying out a system for maximum efficiency. Due to the random access nature of flash and the randomized placement of pre-raided data onto the Violin array, the number of LUNs and their placement is no longer a performance concern. Systems will receive the maximum performance of the array with just one LUN.

As of Windows Server 2008 R2, the addition of NTFS requires that a minimum of two LUNs be deployed for SQL Server architectures which can be delivered from one LUN per Violin array, if there are multiple arrays, or two LUNs from a single Violin array. Otherwise, Violin performance is not affected by the number of LUNs, Volumes or Mount Points.

4.2 Sector Alignment (Disk Alignment)

Disk alignment is important because misaligned disk can cause a significant reduction in performance. In Windows 2008 and Windows 2008 R2 disk is aligned automatically, in Windows 2003 and older versoins, you must re-align the disk manually if they are not already, and you must do it before data resides on the disk because re-partitioning a disk (re-aligning) will remove all the data. If you have upgraded from Windows 2003 to Windows 2008, you will need to validate that the disks are aligned properly since upgrading will not result in automatic realignment.

For more detailed instructions on detecting and repartitioning disks, visit the MSDN library at: http://msdn.microsoft.com/en-us/library/dd758814(v=sql.100).aspx

4.3 Disk Allocation Units (Data Block Size Selection)

It has long been common practice to allocate disks in 64k sectors. Generally, the more data processed per disk access, the better the overall throughput. This is due to the overhead associated with accessing data on traditional hard drives. While improving throughput, large block allocations are of little help for high transaction or random read workloads, and often cause significantly more data to be processed than needed. With a Violin array, that overhead is no longer variable, nor long enough to require large batch requests for optimization.

As such, when creating NTFS partitions, Violin recommends that customers set the Disk Allocation size to 4k for optimal performance. Similarly, a block size of 4096 must be selected when creating the corresponding LUN on the Violin array. In addition to the decrease in disk I/O latency that flash-based memory delivers, choosing a 4k block size allows the application to better utilize the I/O pipe by eliminating the overhead associated with processing unneeded data in large batch requests.

(8)

4.4 Storport Driver Setting (For PCIe Direct Attached)

To support 4k sector drives or LUNs, you must install the Windows Server 2008 R2 hotfix described at http://support. microsoft.com/kb/982018. Please note that most of this fix is part of Windows Server 2008 R2 SP1, so this step is not necessary if SP1 is already installed.

It is also necessary to change the Violin Storport driver (PCIe direct) Registry entry to present a 4K sector drive, as follows:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\ViolinArray\Parameters\ Device]

“Support4KSector”=dword:00000001

Note: This registry entry must be entered before any LUNs are created. If done after, the LUNs will be inaccessible to Windows.

To find out more about Microsoft statements and 4k sector drives, see http://support.microsoft.com/kb/2510009. For more information about MS SQL Server 2008 and 4k sector drives, visit:

http://blogs.msdn.com/b/psssql/archive/2011/01/13/sql-server-new-drives-use-4k-sector-size.aspx

4.5 HBA Driver Settings (For Fibre Channel or iSCSI)

For architectures that utilize Violin arrays as SAN storage, HBA tuning is required. There are two main areas to address: multipathing and HBA Queue Depth. Achieving the best I/O throughput results requires maximizing the I/O pipe similar to optimizing network usage. i.e. an application wants to optimize the number of requests outstanding. That number can best be achieved by having as many paths available as possible and queues as deep as possible. Target numbers are 256 for combined sum of all HBA queues and setting the number of paths equal to the number of LUNs times the number of HBA’s • In the driver setting change the Queue Depth (default usually 32) to a more appropriate setting for your environment,

start at 128 and move up or down.

• Total Queue depth (sum) for all HBA’s is 256 per fabric.

• “LUNS number of paths”: As many paths as possible, ie, 1 path per HBA X #LUNS • For Direct Attached use “Point-To-Point” connection.

• The default MPIO policy on the Violin San Array Multipath Disk Device Properties is Round Robin, in some cases,

using Least Queue Depth produces better performance. For a given workload, Violin recommends testing to see

which gives the best results.

(9)

4.6 Microsoft SQL Server Tools

Windows Server 2008 R2 includes a tool called the Best Practice Analyzer in the Server Manager Console. The Best Practice Analyzer runs a series of tests against Active Directory roles (such as the Hyper-V role, the DNS role, and the Remote Desktop Services role) to assess whether the role has been installed and configured properly and to compare the installation with tested best practices.

Some of the results from the Best Practice Analyzer could tell an administrator they need to add more memory to a server, to move a role to a separate server to improve role optimization, or to shift a database to a different drive on the server to distribute disk performance demands on the system.

4.7 Log File Storage Performance

The larger the system, the greater the number of transactions and need for a cached log file for the database. The amount of data in the cached log file is proportional to the database.

With traditional storage systems, best practices dictated that the SQL database log be stored on the fastest storage possible. In an environment using Violin arrays, no special optimization is required if the SQL log is placed on the array, as the array provides the fastest possible storage.

If for some reason the log cannot be placed on the Violin array (e.g., auditing requirements, company policy, etc.) and must be placed on slower storage, the following steps outline how to enable the log at the OS level, and guarantee the OS will not flag the IO’s to bypass the cache:

1. Right-click My Computer and select Manage.

2. Click Device Manager and expand Disk Drives.

3. Locate the log disk(s) and right-click each device entry. If you are using Secure Path or some other multi-path disk software, the correct disk may be a virtual device.

4. For each entry, right-click, select Properties, and click the Policies tab. 5. Verify the Optimize for Performance radio button is selected.

(10)

4.8 Turning off Indexing for all Disks

Microsoft’s built-in indexing will hamper performance. To disable it, go to Disk Properties and uncheck Allow files on this drive to have contents indexed in addition to file properties.

5 SQL Server Recommendations

As with all best practice recommendations, customers should test the parameters with their application in a test environment before releasing it to production.

5.1 Sizing the Required Database Space

The old formula of sizing the required database space consisted of the actual size of data, index and logs, factoring in growth, size of each disk, latency of each disk, combined performance of a tray of disks, type of raid overhead and the desired performance. With flash-based arrays, these are not required as the low latency and high IOPs mitigates the performance limitations of traditional arrays. Sizing comes down to adding the actual size of data, index and log space, and growth.

5.2 SLC vs. MLC

Violin provides both Single Level Cell (SLC) and Multi-Level Cell (MLC) flash memory arrays. These two approaches differ in terms of density, read/write/erase speed, and longevity. SLC arrays have a lower capacity but faster write/erase speeds and better longevity, while MLC are slower and but provide a higher data density.

As a general rule, Violin recommends that customers place the database logs and TEMPDB files on SLC arrays, and placing the data files being place on an MLC device. Note that even a Violin MLC array offers many times the write performance of a traditional rotating disk array, so placing the files on the MLC array will yield great performance benefits compared to a magnetic storage environment. But separating them between the two types of flash will offer the greatest benefits and will make best use of the two technologies.

(11)

5.3 Number of Files

With a traditional storage environment, file placement was critical to mitigate unbalanced I/O loads and other problems that could create higher latencies and possibly overload one or more of the arrays. With Violin Memory Arrays, these issues are usually not present, so the # of files and placement is less critical.

However, there are considerations with SQL Server and the Windows NTFS file system that dictate having multiple files. The number of files and file placement really depends on the client’s environment, but as a general rule the best practice recommendations for a new environment are:

1. One data file per physical core (not hyper-thread core) with data access distributed evenly.

2. The best number of TEMPDB files vary with the application. Violin suggests starting with a number equal to ¼ to ½ the number of cores (between 4 and 8), and grow the number up to the total number of logical processors. The usage by SQL of the TEMPDB can be monitored with Perfmon.

3. All files should be pre-created to max size and created to disable automatic growth. This includes LOG and TEMPDB files. SQL backup, for example, will create an I/O stream for every file, adding parallelism to the backup/restore that is negated by having few files, no matter how fast the storage.

5.4 CPU Affinity

CPU/core affinity should be reviewed if any of these cases are true:

1. There are processes or applications other than SQL Server residing on the same server as SQL Server. 2. SQL Server is running inside of a virtual machine

3. There is more than one SQL Server running on the host server.

4. The server is of hard NUMA architecture. (4+ sockets or 2+ IOH’s. Refer to the host server’s manual to determine if it contains more than one IOH.)

If any of these situations apply then the system architect should review the affinity settings of each process/application.

5.5 Trace Flags and Startup Parameters

There are a handful of Trace Flags and Startup Parameters that can be applied that can improve performance but flags and parameters must be used with care because they can hurt performance as well as help depending on client application load (OLTP vs. DSS). There is many more trace flags that can be used for debugging purposes but the ones below are for performance. Also, be aware that these flags are global and must be done in the startup string and a SQL Server reboot but happen for them to take effect.

The Trace Flags are:

652 Disables read ahead for the server. Only use for pure OLTP applications, small random reads.

661 Disables the ghost record removal process. A ghost record is the result of a delete operation. When SQL Server deletes a record, the deleted record is kept as a ghost record. Later, the deleted record is purged by the ghost record removal process. When this process is disabled, the deleted record is not purged. Therefore, the space that the deleted record consumes is not freed. This behavior affects space consumption and the performance of scan operations. SCOPE: Global. If you turn off this trace flag, the ghost record removal process works correctly.

834 Causes SQL Server to use Windows large-page allocations for the memory that is allocated for the buffer pool. Good for DSS applications, and in general reduces TLB thrashing, as there are fewer pages in memory to track.

(12)

Startup Parameters are:

-E Increases the number of extents that are allocated for each file in a file group. This is good for DSS applications. -K[n] Increases the size of buckets in the Ring Buffer. [n] Is in megabytes.

To configure startup options:

1. In SQL Server Configuration Manager, click SQL Server Services.

2. In the right pane, right-click SQL Server (<instance_name>), and then click Properties.

3. On the Advanced tab, in the Startup Parameters box, type the parameters separated by semicolons (;).

4. Click OK.

5. Restart the Database Engine. Example:

<…>;-T 834;-E

5.6 Soft NUMA

With the proliferation of Non-Uniform Memory Architecture (NUMA) systems, SQL and Windows Server have been enhanced to recognize they are running on a NUMA enabled system, and take steps to maximize local memory accesses.

By default, Windows Server will interrogate the underlying hardware to determine the NUMA topology, if present. SQL Server (since SQL Server 2005) will in turn interrogate Windows Server, which will pass on the NUMA topology information. SQL will attempt to keep a thread’s code and data on the same NUMA node, minimizing intra-node memory access, with the associated higher latencies.

SQL Server has taken NUMA handling a step further, with two additional features regarding NUMA that allow more control and greater benefit from this feature:

1. Soft NUMA – This allows artificial NUMA nodes to be created in the Windows Registry. The Soft NUMA nodes must

be a subset of the hardware NUMA node, typically composed of all the cores in a given socket. When combined with connection affinity, the workload can be finely tuned to evenly divide the work through all cores in the system, or segment work between reporting and OLTP to different processors.

2. Connection Affinity – This feature gives the application the ability to connect to a specific NUMA or Soft NUMA node,

which will force the thread code and data stack to the same NUMA node, minimizing intra-node memory accesses.

5.7 Configuration Parameters

Configuration Parameters are Server and Database settings the determine SQL Server behavior. There are many parameters that have specific uses but there are a few parameters that have been identified as ones that should be considered and have specific benefits depending on the application type OTLP or DSS. These parameters should be tested before putting them into production.

5.7.1 Max Degree of Parallelism

Max degree of parallelism, also referred to as MaxDop, controls the number of processes a session can start, ie parallelism, for given task, such as Index Creation, Query, etc. The default value for this is 0 which causes SQL Server will use all available processors. This parameter is used along with the “cost threshold for parallelism” parameter. The optimizer will cost the query and if the cost is greater than or equal to the “cost threshold for parallelism”, the optimizer will then parallelize the query to a maximum number of threads determined by the MaxDop. This is a dynamic setting no restart required.

(13)

In OTLP type applications thus can consume CPU availability, causing long CPU waits. The correct setting can only be determined by experimentation, but Violin recommends it not be left at the default of 0. Start by setting MaxDop to 1, which

removes parallelism, so each task is limited to 1 CPU and thread. This can be used as a starting point, and then increased with testing, to find the best setting for a specific workload.

sp_configure ‘max degree of parallelism’,1 go

reconfigure go

5.7.2 Max Worker Threads

Thread pooling helps optimize performance when large numbers of clients are connected to the server. Usually, a separate operating system thread is created for each query request. However, with hundreds of connections to the server, using one thread per query request can consume large amounts of system resources. The max worker threads option enables SQL

Server to create a pool of worker threads to service a larger number of query request, which improves performance. Setting this to 0 allows SQL to set the default on SQL Server startup, which is good for most systems. The actual formula takes into

account the number or processors, workload, etc. Microsoft’s recommendations for this setting can be found at: http://msdn.microsoft.com/en-us/library/ms190219.aspx

sp_configure ‘Max Worker Threads’,n <go in increments of 1024> go

Restart SQL Server.

5.7.3 Lightweight Pooling

Lightweight pooling shifts some of the task-scheduling overhead from the OS kernel to a lighter overhead SQL function. This is also good for OLTP applications but has been known to cause issues with Distributed Transaction Coordinator (DTC). This is a dynamic option. This is useful in some specific, highly tuned OLTP applications with high transaction loads, but in general should only be used if the specific application has been well tested with it and it has shown to be of benefit.

sp_configure ‘LightWeight Pooling’,1 go

reconfigure go

5.7.4 High Priority

This option increases the OS processing priority for SQL Server process(s). This is only good if you have multiple applications running on the same server. It is dynamic.

sp_configure ‘High Priority’,1 go

reconfigure go

(14)

5.8 Maintenance

Every client should have a maintenance strategy of rebuilding indexes and statistics for a better performing system. Traditionally index rebuilds take a long time, with the speed of Violin Memory Arrays the strategy needs to be reviewed, because downtime and unavailability with be greatly reduced.

6 Utilizing New Features in SQL Server 2012

There are many new features in SQL Server 2012. A few are worth noting as they either reduce query or I/O volume when properly applied, or significantly improve the data integrity.

6.1 Columnstore Index

New in SQL Server 2012 is a feature called Columnstore Index. From SQL’s point of view, this is just another non-clustered index, and can be used as such for satisfying queries. The one exception with the Columnstore is that it is are read-only, unlike a normal non-clustered indices. This means it must be rebuilt periodically if the data they index is updated often. Instead of storing the key value with pointers to a given row as is normal (giving the normal data row orientation

precedence). Columnstore stores all the columns in a table together, usually in a compressed format. It appears as a non-clustered index on a table. The Query Optimizer automatically decides whether to use the Columnstore index or another index, based on the specific query.

Creation of Columnstore indexes takes a significant amount of memory, and SQL will limit the MAXDOP to 1 for tables with

less than a million rows.

The advantage of Columnstore is it can significantly reduce the disk I/O required to retrieve the data for a query. For example, if a row has 50 columns and a query only requests 5 (10 percent), then the I/O is reduced by 90 percent. The disadvantage is that the data must be recombined to be presented as the output of the query. In addition, queries that only select a few rows will be more efficient by accessing the row data than the Columstore data. For more information about Columnstore, visit: http://msdn.microsoft.com/en-us/library/gg492088.aspx

and

http://social.technet.microsoft.com/wiki/contents/articles/3540.sql-server-columnstore-index-faq-en-us.aspx

6.2 HADRON and Database Mirroring

Violin recommends our customers use a highly available architecture to ensure data integrity, and our products have many fault-tolerant and redundancy features to facilitate this. Many software technologies exist to facilitate this from various companies, including Database Mirroring in SQL Server (Introduced in SQL Server 2005) and Windows Clustering, a feature of the Windows Server OS. SQL Data base mirroring allows for data transactions to be updated into a mirror copy of the database, either synchronously or asynchronously. The mirror copy acts as a standby to go active if the primary server or database goes down. Optionally, there can be a Witness Server that monitors the state of the two servers, and facilitates the switchover. AlwaysON (HADRON), introduced in SQL 2012, is the logical extension of Database Mirroring. Many of the shortcomings of Database Mirroring have been addressed in HADRON:

Always On Availability Groups – Since most applications encompass multiple databases which must maintain logical

integrity, Availability Groups provides a way of failing over and managing multiple databases. Availability Groups also expands the secondary (mirror) databases from one to up to four databases, including two synchronous secondaries. Further, the secondaries can now be accessed readonly, allowing for them to be used in reporting, backup, effectively combining the processing power of the primary and secondary servers.

(15)

AlwaysOn Failover Cluster Instances – Integrating with Windows Server 2008 R2 Failover Clustering (required), this

supports multi-site clustering across subnet which can enable cross-datacenter failover of SQL Server instances. This makes failover predictable and fast, allowing for easier downtime for planned maintenance.

Improved Analytical Tools – Helps control when a failover should happen and eliminates false failovers. It also features

more real-time updates.

Improved Deployment and Management – Automates deployment with PowerShell cmdlets which provide scripting

options. Also, the AlwaysOn Dashboard provides visual GUI of the configuration and health of the SQL Mirroring cluster. Since the secondary is now available for read, and thus backup, Violin recommends SQL Server 2012 Hadron for our customers. Effectively, backups can now be done offline (on the secondary server) without affecting processing and performance on the primary. Coupled with the extreme performance of Violin memory arrays, this can help increase the uptime percentage for about any site.

For more information about cluster and database mirroring, visit: http://msdn.microsoft.com/en-us-library/ff877884.aspx

7 Performance Testing with SQLIO

The values in parenthesis are suggested starting values. The testing methodology should be to start low and work upwards until the transport or array capacity has been attainted. This will be achieved when the throughput stays the same but the latency goes up.

For the best tuning results, change the outstanding IO’s in increments of four (4, 8, 12, 16, etc). To allow for a better understanding of what is happening, only change one input at a time between tests. For most applications the number of threads, the number of outstanding IO’s and the batch size are all commutative. This means that a test with 16 threads and 8 outstanding IO’s will perform usually the same as a test with 8 threads and 16 outstanding IO’s. Thusly, establish a number of threads equal the number of cores of the host, a batch size representing a common workload and use the number of outstanding IO’s in order to find the peak throughput of the array. Start with 4 outstanding IO’s and work upwards until a peak has been found.

7.1 General Testing Procedures

1. The test file has been created on the volume residing upon the Violin array. 2. The test file is of sufficient size (20GB or larger is recommended).

3. Threads equal to the number of logical cores on the server, minimum of 8. (-t16)

4. Outstanding IO’s set to 4, then move up in increments of 4 until transport or array maximum is found (-o4). 5. Batch size set to 8k for transactional, 64k for reporting servers. (-b8)

6. Reads and writes, individually. (-kR or -kW) 7. Random IO. (-frandom)

8. Host caching turned off. (-BH)

9. Short read test durations. SQLIO will stabilize after a few seconds. (-s120, with -kR) 10. Long write test durations and pre & post write tests. (-s600, with -kW)

11. Content Indexing turned off on the test volume.

(16)

7.2 Write Cliff Avoidance Test

1. Create a file to fill the whole array (use all space)

2. Delete the file (mark all space as needing garbage collection)

3. Write a file of 75-80% the total size of the array (pre-load actual usage of space) 4. Run an all-write test on a large file (test garbage collection)

5. Re-run read tests (benchmark reports) for post-write-cliff numbers

6. Set affinity of the cmd.exe process to one node/socket if on a NUMA architecture (4+ sockets).

7. With the command window open, launch Task Manager and right click on the cmd.exe process. Choose “Set Affinity…”. Uncheck “All Processors”. Re-check all cores for one node/socket.

Do not use SQLIOsim to test SAN performance. SQLIOsim is a SAN stability testing application and as such its latency numbers are not valid indicators of SAN performance. SQLIO is the SAN performance-measuring tool.

(17)

About Violin Memory

Violin Memory is pioneering a new class of high-performance flash-based storage systems that are designed to bring storage performance in-line with high-speed applications, servers and networks. Violin Flash Memory Arrays are specifically designed at each level of the system architecture starting with memory and optimized through the array to leverage the inherent capabilities of flash memory and meet the sustained high-performance requirements of business critical applications, virtualized environments and Big Data solutions in enterprise data centers. Specifically designed for sustained performance with high reliability, Violin’s Flash Memory Arrays can scale to hundreds of terabytes and millions of IOPS with low, predictable latency. Founded in 2005, Violin Memory is headquartered in Mountain View, California.

References

Related documents

Version Database Server Web and Licence Servers Windows Server 2012 R2 Recommended Recommended Windows Server 2012. Windows Server 2008 R2 Windows

•Microsoft SQL Server 2008 Service Pack 3 (Windows only) •Microsoft SQL Server 2008 R2 Service Pack 2 (Windows only) •Microsoft SQL Server 2012 Service Pack 2 (Windows only)

• If 6PE is requested, the equipment must support &#34;Connecting IPv6 Islands over IPv4 MPLS Using IPv6 Provider Edge Routers (6PE)” [RFC4798] • If mobile IPv6 is requested,

Colloquium Lecture, Technical University Darmstadt, Germany, December 1985. Lecture: The Fixed Point Property for Products of Ordered Sets.. Series of 6 two-hour lectures in the

¾ Fifteen years of experience in the coordination, management, design, implementation and marketing of programs and projects for a university small business development center, and

Module 7: Implementing Security in Microsoft SQL Server 2012 This module describes how to use security enhancements in SQL Server

The database servers run Windows Server 2008 R2 Enterprise and Microsoft SQL Server 2008 Enterprise data management software, and SQL Server Reporting Services is used

The wholesale cloud service provider’s business model and platform architecture help to facilitate the customers’ ability to buy into cloud services provided by service