SQL Server Business Intelligence on HP ProLiant DL785 Server
By Ajay Goyal
Scalability Experts, Inc.
www.scalabilityexperts.com
Mike Fitzner Hewlett Packard www.hp.com
Recommendations presented in this document should be thoroughly tested before being implemented in a production environment. While recommendations are listed in an itemized format, they should not be considered standalone. All recommendations have dependencies and implementation of any recommendation in isolation can be detrimental to performance.
Information in this document is subject to change without notice. Complying with all applicable copyright laws is the responsibility of the user. No part of this document may be reproduced or
transmitted in any form or by any means, electronic or mechanical, for any purpose, without the express written permission of Scalability Experts Inc. If, however, your only means of access is electronic,
permission to print one copy is hereby granted.
Scalability Experts Inc. may have patents, patent applications, trademarks, copyright, or other intellectual property rights covering subject matter in this document; except as expressly provided in any written license agreement from Scalability Experts Inc., the furnishing of this document does not give you any license to these patents, trademarks, copyright, or intellectual property.
© Copyright 2008 by Scalability Experts Inc. All rights reserved.
Microsoft BackOffice, MS-DOS, Windows, Windows NT, Windows 2008, and SQL Server 2008 are either registered trademarks or trademarks of Microsoft Corporation in the USA and/or other countries.
Other product and company names mentioned herein may be trademarks of their respective owners.
Contents
Introduction
Executive Summary
Software and Data Specifications
Fact Table Sizes
Server Specifications
System Diagram
SSAS Configuration settings used
SQL Server Configuration settings used
Cube Processing
Best Practices for Cube Processing
Upgrade to SQL Server 2008 Analysis Services
Optimizing data retrieval from Relational Source
Optimizing Aggregation and indexing performance
Conclusion
Introduction
SQL Server 2008 has enabled us to create Business Intelligence applications on large, multi-terabyte data warehouses. Some of the new features and architecture improvements are specifically targeted at scaling SQL Server deployments on newer, larger servers, offering performance not seen in previous versions of SQL Server.
HP ProLiant DL785 G5 servers can scale up to 8 sockets of quad-core AMD Opteron 8300 series processors with HyperTransport (HT), 512 GB of memory, and 11 PCIe I/O slots, or an optional I/O backplane with 7 PCIe and 2 HTx I/O slots. This makes it an ideal server for data warehousing and consolidation.
This whitepaper discusses the best practices learned during our tests in configuring SQL Server 2008 Analysis Services on the HP ProLiant DL785. These best practices help maximize the utilization of available server resources, improving the performance of cube processing and user queries.
Our tests proved that SQL Server 2008 scales very well on HP ProLiant DL785 servers and that it is possible to run highly optimized, fast data warehousing solutions on the HP ProLiant line of servers.
Executive Summary
SQL Server 2008 has seen improvements at the engine level to better utilize large servers and improve scalability. With consolidation being considered by many organizations, the HP ProLiant DL785 server proved to be an ideal platform for hosting the SQL Server 2008 relational engine and Analysis Services on the same server. This whitepaper discusses our findings and challenges from sharing the server between the two services, and how we configured both hardware and software to get the best performance on a 1+ TB data warehouse.
The test database was a real data structure containing web traffic data, used to identify download trends, regions, and other information for optimizing website response times and auditing. It was 1.26 TB in size, while the SQL Server 2008 Analysis Services cube was 443 GB. The cube consisted of four measure groups, and we tested two scenarios: one with 15 partitions per measure group, and another with 150 partitions per measure group.
With SQL Server 2008 and Analysis Services running on the same server, we processed 11 billion rows in about 3 hours 12 minutes, a throughput of about 1.15 million rows per second. At this rate, 1 billion rows can be loaded in about 17 minutes.
For our testing, we started with SQL Server 2005 with default settings and compared it with SQL Server 2008 with similar settings. We could easily see the improved engine at work, with a 48% improvement in processing times.
Here is the summary of the performance improvements achieved during our tests.
Test Item                                        Result
Upgrade to SQL Server 2008                       48% improvement over SSAS 2005
Partition your cube                              19% improvement with more, smaller partitions (increased from 15 to 150 partitions)
PreAllocate memory for SSAS                      12% improvement in SSAS 2008 compared to not pre-allocating memory
AggregationMemoryLimitMin and
AggregationMemoryLimitMax                        25% improvement over the default values
Throughput achieved                              1.15 million rows per second
Software and data Specifications
Operating System Microsoft Windows Server 2008 Enterprise Edition Database SQL Server 2008 SP1
SQL Server 2008 Analysis Services SP1 Data size 1.26 Terabyte
Cube Size 443 GB
Measure Groups 4 (with 15 and 150 partitions each) Max Memory Set SQL Server 2008 – 60 GB
Analysis Services – 54 GB Processor Affinity SQL Server 2008 – All Processors
Analysis Services – All Processors
Fact Table Sizes
Fact Table           Rows (Billions)   Table Size (GB)
FactDownloadEvent    5.0               621
FactInstallEvent     3.6               450
FactDetectionEvent   2.4               222
FactSiteErrorEvent   0.2               8
Total                11.2              1,301
Server Specifications
Server      HP ProLiant DL785 G5
Processor   8 x quad-core AMD Opteron 8300 series, 2.9 GHz
Memory      128 GB
Storage     48 disks on SAS controllers – 15K RPM, RAID 10
System Diagram
The system diagram shows an HP ProLiant DL785 G5 with 8 x quad-core AMD Opteron 2.9 GHz processors and 128 GB of memory, running Windows Server 2008 Enterprise Edition with SQL Server 2008 Enterprise Edition and SQL Server 2008 Analysis Services. Storage is attached through 8 x P800 SAS controllers to MSA 70 enclosures of 24 disks each (15K RPM, RAID 10), delivering about 9 Gb/s of sequential throughput.
SSAS Configuration settings used
Parameter                                 Optimized Value   Original Value
CoordinatorExecutionMode                  -8                -4
Memory\LowMemoryLimit                     70                75
OLAP\Process\AggregationMemoryLimitMin    1                 10
OLAP\Process\AggregationMemoryLimitMax    5                 80
OLAP\Process\DatabaseConnectionPoolMax    64                50
ThreadPool\Process\MaxThreads             320               64
PreAllocate                               20                0
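For reference, the optimized values above map to entries in the msmdsrv.ini file roughly as follows (a sketch only; the actual file contains many more elements, and the exact nesting may vary between builds):

```xml
<ConfigurationSettings>
  <CoordinatorExecutionMode>-8</CoordinatorExecutionMode>
  <Memory>
    <LowMemoryLimit>70</LowMemoryLimit>
    <PreAllocate>20</PreAllocate>
  </Memory>
  <OLAP>
    <Process>
      <AggregationMemoryLimitMin>1</AggregationMemoryLimitMin>
      <AggregationMemoryLimitMax>5</AggregationMemoryLimitMax>
      <DatabaseConnectionPoolMax>64</DatabaseConnectionPoolMax>
    </Process>
  </OLAP>
  <ThreadPool>
    <Process>
      <MaxThreads>320</MaxThreads>
    </Process>
  </ThreadPool>
</ConfigurationSettings>
```

Back up msmdsrv.ini before editing, and restart the Analysis Services service for the changes to take effect.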
SQL Server Configuration settings used
Parameter          Optimized Value   Original Value
MaxDOP             8                 0
MaxMem             64 GB             All
MaxWorkerThreads   960               0
CPU Affinity       All               All
Note: Setting CPU affinity did not affect the performance of cube processing.
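For reference, these SQL Server settings can be applied with sp_configure; the script below is a sketch mirroring the table above (note that 'max server memory' is specified in MB, and most of these are advanced options):

```sql
-- Enable advanced options so the settings below are visible
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

EXEC sp_configure 'max degree of parallelism', 8;   -- MaxDOP
EXEC sp_configure 'max server memory (MB)', 65536;  -- 64 GB
EXEC sp_configure 'max worker threads', 960;
RECONFIGURE;
```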
Cube Processing
For this whitepaper we focus on improving cube processing times. Best practices for cube processing involve optimizing the cube design, the Analysis Services settings, and the hardware configuration. Since many companies already have existing cubes, we wanted to minimize changes to the test cube while optimizing the hardware and Analysis Services configuration to maximize cube processing performance on the HP ProLiant DL785 server.
Cube processing involves three main stages:
1. Reading data from the relational source
2. Building aggregations and indexes
3. Writing data to the files
The cube was configured with MOLAP storage to optimize query performance. Because we wanted to keep cube changes to a minimum, we tested two versions of the cube, altering only the number of partitions:
                               Cube v1   Cube v2
Measure Groups                 4         4
Partitions per measure group   15        150
Total partitions in cube       60        600
Partition size (largest)       28 GB     2.87 GB
Best Practices for Cube Processing
These are some of the best practices learned from the tests.
Upgrade to SQL Server 2008 Analysis Services
When processing the cubes on both SQL Server 2005 and SQL Server 2008, we saw an
immediate improvement in processing times without any changes to the cubes. We configured both versions similarly.
                  Processing Time (One Measure Group)   Improvement
SQL Server 2005   2:37:07
SQL Server 2008   1:22:26                               48%
The SQL Server 2008 Analysis Services processing engine has been significantly improved for parallelism and memory allocation.
Optimizing data retrieval from Relational Source
Performance of cube processing directly depends on how fast Analysis Services can retrieve data from the relational source. When the relational source and Analysis Services are on different servers, network connectivity and traffic play an important role in determining the throughput of data retrieval. An advantage of consolidation is that, since both services run on the same operating system, Analysis Services can use shared memory access to connect to SQL Server, providing the best available throughput, limited only by physical I/O and CPU availability.
o Optimize I/O response time
Optimize I/O performance by spreading the SQL Server database and cube files across multiple spindles. We had the cube spread across a LUN with 20 disks configured as RAID 1+0.
While RAID 5 provides better space utilization, RAID 1+0 is the better option because cube processing writes large amounts of data. RAID 1+0 ensures better response times than RAID 5, while also providing better fault tolerance.
o Optimize source SQL queries
Dimension processing can generate a large number of queries to the source relational tables, based on the number of attributes in each dimension. Dimension tables are not updated as frequently as fact tables, so we can afford more indexes on them without affecting inserts or updates. To optimize these queries, use the Database Engine Tuning Advisor on a Profiler trace of a dimension query. Target long-running queries for large data warehouses and add every suggested index to the table. This can improve performance substantially if a dimension has many millions of members.
Remember not to use this approach for fact tables, as it will adversely affect ETL bulk-load performance on the fact tables.
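As an illustration only (the table and column names below are hypothetical, not from the test schema), a covering index of the kind the Database Engine Tuning Advisor might suggest for a dimension attribute query could look like:

```sql
-- Hypothetical dimension table. SSAS issues SELECT DISTINCT-style
-- queries per attribute during dimension processing; a covering
-- index lets these be answered without scanning the whole table.
CREATE NONCLUSTERED INDEX IX_DimCustomer_City_Region
ON dbo.DimCustomer (City, Region)
INCLUDE (CountryName);
```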
Optimizing Aggregation and indexing performance
On a cube with multiple partitions, Analysis services starts aggregating and indexing data sooner if it has enough data on hand for some partitions. The number of partitions Analysis Services processes in parallel depends on the number of logical processors available to the operating system. It is also possible to override this and manually specify the maximum number of tasks Analysis services should perform in parallel.
The default settings present when Analysis Services is installed are sufficient for small to medium-sized data warehouses, and when Analysis Services is the only service running on the server. In a consolidated environment like the one used in this test, and with large data warehouses, Analysis Services may create more threads or use memory in a manner that negatively affects overall performance. To prevent this, some of these settings need to be tuned for maximum performance.
This section discusses some of the settings that showed substantial improvement in cube processing times.
o Use Partitions
Partitions are required to introduce parallelism on a server like HP ProLiant DL785.
Having multiple partitions in each measure group ensures that Analysis Services can process multiple partitions in parallel. Analysis Services creates multiple threads based on the number of partitions and processes each partition separately, independent of the others. As it processes multiple partitions, the work may be at different stages: aggregating on one partition while creating indexes on another.
Keep partitions to fewer than a few thousand
Having too many partitions increases the cost of metadata operations. Aim for fewer than 2,000 partitions per measure group1. At the same time, avoid designing partitions that contain fewer than two million rows. On a large data warehouse, fact tables can have 1+ billion rows, and partitions that are too small can result in too many partitions being created.
Process partitions in parallel
To get the most out of the hardware, process multiple partitions in parallel. When processing the cube, avoid using the default "Let the server decide" setting for parallelism. Since the server is shared between SQL Server and Analysis Services, the default setting creates too many threads, resulting in processor contention.
1 Analysis Services Query Performance Top 10 Best Practices - http://technet.microsoft.com/en-us/library/cc966527.aspx
The setting recommended in other literature is twice the number of processor cores in the server. For example, if the server has four dual-core processors installed, set the number of parallel tasks to 16. This works well on a server dedicated to Analysis Services.
However, on a consolidated server the processors are shared between SQL Server and Analysis Services, and the recommended value can create too many threads, causing excessive context switching and degrading performance.
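When scripting the processing, the cap can be set explicitly via the MaxParallel attribute of the XMLA Parallel element. A minimal sketch (the database and cube IDs below are placeholders, not the test schema):

```xml
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Parallel MaxParallel="32">
    <Process>
      <Object>
        <DatabaseID>MyWarehouse</DatabaseID>
        <CubeID>MyCube</CubeID>
      </Object>
      <Type>ProcessFull</Type>
    </Process>
  </Parallel>
</Batch>
```

The same value can be set interactively in the Process dialog under Change Settings, Maximum parallel tasks.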
Our test results are shown below.
Max Parallel Tasks      Processing Time   Improvement
32                      3:13:41           10%
64                      3:35:38           57%
Let the server decide   8:19:48
Figure 1: Setting correct MaxParallel ensures reduced context switching, improving performance
With MaxParallel set to "Let the server decide," we see that context switching drops after some time. This is because SQL Server has completed sending data to Analysis Services; with Analysis Services then the only active service, context switching drops.
Have enough physical memory installed to prevent temp file usage
If the server does not have enough physical memory installed for processing aggregations, SSAS has to use temp files for processing, which increases the processing time.
Partition Sizing
The general recommendation is to target 20 million rows or 250 MB per partition.
With a large data warehouse, this results in too many partitions. Experiment with partition sizes between 500 MB and 1 GB; for even larger data warehouses, aim for up to 3 GB per partition. Spread partitions across multiple spindles.
When using SAN for storage, ensure each LUN is spread across several disks. This ensures maximum IOPS (I/Os per second) available for performing queries.
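The sizing guidance above can be sketched as a quick calculation (a rough illustration added here, not part of the original tests): given a fact table's row count and total size, pick a partition count that keeps each partition within both the target row count and the target size.

```python
def partition_count(total_rows, total_gb, target_rows=20_000_000, target_gb=1.0):
    """Suggest a partition count so that each partition stays within
    both the target row count and the target size in GB, whichever
    constraint requires more partitions."""
    by_rows = total_rows / target_rows
    by_size = total_gb / target_gb
    return max(1, int(max(by_rows, by_size)))

# FactDownloadEvent from the tests: 5.0 billion rows, 621 GB.
# Targeting ~3 GB partitions for a very large warehouse:
print(partition_count(5_000_000_000, 621, target_gb=3.0))  # → 250
```

By this rough rule, the original 15-partition configuration (28 GB per partition) falls far outside the guidance, while the 150-partition configuration (1–3 GB per partition) is close to it.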
Aggregation considerations for multiple partitions
A large data warehouse typically contains historical records spanning a long period of time. Measure groups partitioned on time are good candidates for having different aggregation designs for different partitions of the same measure group. For example, as new data is loaded into the newest partitions, those are the most heavily queried; you can define more aggregations on these new partitions while decreasing the number of aggregations on older partitions. This saves disk space and keeps the size of the cube smaller.
Use Query Binding and partition slicing
Create a view on the fact table and use query binding in each cube partition with a NOLOCK hint. Ensure that the query defined for each partition is optimized for the best execution plan. As a rule, it is a best practice to partition the underlying source table as well, to gain optimal performance while retrieving data for cube processing. In addition, always specify the partition slice in the properties of each partition; Analysis Services may not always infer the slice correctly, resulting in expensive scans of partitions that do not contain the required data.
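A minimal sketch of query binding (the view definition and date boundaries below are illustrative; only the FactDownloadEvent table name comes from the test schema, and its columns are assumed):

```sql
-- Hypothetical view over the fact table; each cube partition binds
-- to a query like this, restricted to that partition's date slice.
CREATE VIEW dbo.vFactDownloadEvent_2008Q1
AS
SELECT EventDateKey, SiteKey, DownloadCount
FROM dbo.FactDownloadEvent WITH (NOLOCK)
WHERE EventDateKey >= 20080101 AND EventDateKey < 20080401;
```

Each partition's query binding then selects from its own view, and the matching slice is set on the partition's Slice property.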
By increasing the number of partitions from 15 to 150, we saw a 19% improvement in processing time. This is because Analysis Services could now process even more partitions in parallel. As shown in Figure 2, the increased number of partitions allowed better memory management and parallelization of partition processing. Be sure to set the PreAllocate value to ensure that Analysis Services has the memory required to handle multiple partitions; see the section Preallocate memory for more information.
At 150 partitions, our partition sizes ranged from 1 to 3 GB, with 1 million to 33 million records each, depending on the measure group.
Figure 2: More partitions enable parallelism
o Preallocate memory
The PreAllocate setting is found in the msmdsrv.ini file. It is used to reserve memory for Analysis Services when the service starts. A value of 20 indicates that 20% of system memory will be pre-allocated to Analysis Services and will not be available to other processes. Analysis Services allocates and reserves this memory at service startup.
For servers where Analysis Services runs alongside other services, using PreAllocate can provide a more stable memory configuration. It has the most impact on Windows Server 2003, where processing time was reduced by up to 50% in testing. Microsoft has published that Windows Server 2008 has better memory management, making the use of PreAllocate optional.
Our tests have shown that even with improved memory management in Windows Server 2008, using Preallocate is beneficial to processing times.
PreAllocate Value   Processing Time   Improvement
20                  3:13:48           12%
0                   3:39:24
Figure 3: Preallocate prevents frequent memory allocations, reducing overheads and improving performance
For more information on the PreAllocate setting, refer to the technical note "Running Microsoft SQL Server 2008 Analysis Services on Windows Server 2008 vs. Windows Server 2003 and Memory Preallocation: Lessons Learned."
Figure 4: More time to read data can be due to stress on SQL Server (time is for a single measure group)
In the above chart, we see that having more partitions resulted in more time taken to read the data. This may be because the source server is stressed and takes longer to return results. However, since Analysis Services is now processing more aggregations and indexes in parallel, the overall processing time is reduced.
This can be seen in the chart below, where Analysis Services has more data to work on in parallel from the beginning, resulting in a shorter overall processing time.
Figure 5: More partitions enable SSAS to start working on partitions quickly
We should also note that we limited overall design changes, choosing to test hardware and configuration improvements without re-designing the application. That said, we did re-partition the cube from 15 to 150 partitions, but we did not change the partitioning scheme of the relational database. This is likely a contributing factor to the longer data-read times.
OLAP\Process\AggregationMemoryLimitMax
AggregationMemoryLimitMax is a server property that controls the maximum amount of memory that can be used for SSAS aggregation processing. The value is interpreted as a percentage of total memory if it is less than 100, or as an absolute value in bytes if it is greater than 100. The default value is 80, which indicates that a maximum of 80% of total physical memory can be used for SSAS aggregation processing.
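The dual interpretation of the value can be expressed as a small helper (an illustrative sketch of the rule just described, not an SSAS API; here values of 100 or more are treated as bytes):

```python
def aggregation_memory_limit_bytes(value, total_physical_bytes):
    """Interpret an AggregationMemoryLimit* setting: values below 100
    are a percentage of total physical memory; larger values are an
    absolute number of bytes."""
    if value < 100:
        return total_physical_bytes * value // 100
    return int(value)

# Default of 80 on the 128 GB test server:
print(aggregation_memory_limit_bytes(80, 128 * 2**30))  # → 109951162777 (~102.4 GB)
```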
We initially started with AggregationMemoryLimitMax=80. With PreAllocate set to 20, we saw that aggregations quickly consumed excessive memory, slowing down other tasks in the processing.
After reducing AggregationMemoryLimitMax to 5, we saw heavy temp file activity, as Analysis Services no longer had enough memory to hold aggregations in memory. Even with the temp file activity, processing time dropped substantially because other processes were no longer starved of memory; the result was a more evenly distributed workload and a 25% improvement in processing time. This is shown in the next two charts.
Figure 6: More consistent load resulting in faster processing times
Figure 7: Aggregation values very high at the beginning, slowing other tasks
Monitor the Proc Aggregations\Temp file rows written/sec performance counter to determine temp file usage during processing. The AggregationMemoryLimitMax value needs to be set based on the number of records in each partition. Setting this value too low may cause the temp file space to grow too large; in that case, increase the value incrementally until Proc Aggregations\Temp file rows written/sec shows a more reasonable value. If you do increase AggregationMemoryLimitMax, also increase PreAllocate to ensure that other tasks are not starved of memory.
When reducing AggregationMemoryLimitMax, care should be taken with the size of each partition. With 15 partitions per measure group, each of our partitions was about 28 GB in size, and processing failed because Analysis Services did not have enough memory to process such large partitions. Follow the best practices discussed above on partition sizing.
Conclusion
The tests showed that Microsoft SQL Server 2008 Enterprise Edition running on the HP ProLiant DL785 G5 provides a scalable, high-performance solution for deploying large-scale BI solutions. SQL Server 2008 scales effectively across the multiple processors installed in the server, providing high throughput during cube processing. Consolidating SQL Server 2008 and Analysis Services also requires managing memory allocation to prevent internal contention. Cube processing, a very resource-intensive operation, is also a critical process for many organizations that have very short processing windows or process cubes multiple times a day. Cube processing performance also proves to be a good stress test for the server.
The HP ProLiant DL785 also proved itself an ideal platform for consolidating both SQL Server and Analysis Services on the same server. The HP storage we used provided high throughput rates, delivering good disk I/O performance.
While the best practices discussed above are not exhaustive, they do provide substantial performance improvements, while keeping changes to the cube design at the minimum.
About HP
HP, the world’s largest technology company, simplifies the technology experience for consumers and businesses with a portfolio that spans printing, personal computing, software, services and IT
infrastructure. More information about HP (NYSE: HPQ) is available at http://www.hp.com/.
About Microsoft
Founded in 1975, Microsoft (Nasdaq “MSFT”) is the worldwide leader in software, services and solutions that help people and businesses realize their full potential.
About Scalability Experts
Scalability Experts is a leader in data management and business intelligence for complex data platforms.
With recognized expertise and proven methodologies, Scalability Experts delivers solutions-based services, consulting and training that help mid-to-large-scale enterprises drive business performance with improved decision making and operational efficiencies.