
Over the past several years SAS has invested heavily in Web technologies to deliver SAS Intelligence to a larger audience. Products such as SAS/IntrNet and various solutions have helped deliver application results to the Web. This effort will continue as SAS rolls out more capabilities in its base software and solutions over the next year with the release of SAS 9.1.

SAS solutions are starting to be deployed in multi-tier environments (server, middle tier, client, etc.). This architecture offers significant scalability and flexibility advantages, but it can make system tuning more difficult. Finding performance issues requires looking at all the different layers (SAS back-end server, application server middle tier, client and even the network itself).

SAS products like AppDev Studio are provided to help build Web-based applications that leverage this multi-tier architecture.

Tuning for this environment is relatively new to SAS and much more difficult because there is now more than one system to monitor and gather information from. Many publicly available books on multi-tier application environments were released during the dot-com wars of the late 1990s. More white papers and publications will also be coming from SAS in the near future as SAS 9.1 begins to roll out late this year.

12. Summary

We hope this paper gives you some clues as to how to tackle a new implementation of SAS on Sun or how to improve your existing environment. This paper doesn’t have all the answers, but it draws on more than five years of joint SAS and Sun work, based largely on experience gathered in the SAS Customer Technology Center in Cary, NC. If this paper is helpful, please let us know. If it isn’t, let us know that too. We are always looking for new ways to help out joint customers.

Appendix A: Server Sizing / Configuration Questions:

These questions are designed to generate dialog that will help determine the size and configuration of the server needed for running SAS. Sizing discussions go much more smoothly if customers have the answers to these questions before the conference calls.

I] Determine the size of the data files that will be analyzed:

• What is the data store for your data? (Data base or SAS data files?) ________________________________
• What is the total size of the data? (Disk space in MB, GB, TB?) ________________________________
• Will analysis be done on one large file -or- multiple files? ________________________________
• What are the sizes of the files being used? ________________________________
  (If unknown, how many rows/records? How many columns/variables?) ________________________________
  (What is the record length of the file?) ________________________________
• What is the expected growth of the data involved? ________________________________

II] SAS usage:

• What SAS products are installed -or- being considered? ________________________________
• What types of processing will they be doing with SAS? ________________________________
  (HOLAP?, Query & Reporting?, Decision Support?, ….)
• How many steps are in a typical SAS job? ________________________________
• Will data manipulation be followed by data analysis &/or reporting? ________________________________
• Will data be extracted from one large file or will smaller files be combined together to produce reports? ________________________________
• Where will the data for the applications reside? ________________________________
• How much historical data needs to be stored? ________________________________
• What is the total storage requirement for the system? ________________________________
  (Database size; SAS work area; User work area; etc…)

III] Determine the Server utilization:

• Will other applications be running on the server? ________________________________
• How many SAS users are there? ________________________________
• How many concurrent SAS sessions? (# of SAS users * # of sessions) ________________________________
• How will the client machines access the Server? ________________________________
  (via client/server SAS sessions -or- via an X-emulation session) ________________________________
• What is the expected growth of the user community? ________________________________

IV] Determine the Server configuration:

• What Hardware platform do you have or are you considering? ________________________________
  (if so, what is the existing configuration?)
• Model? ________________________________
• Number of Processors? ________________________________
• Processor Speed? ________________________________
• Amount of physical RAM and swap file size? ________________________________
• Network Media and speed? (i.e. Ethernet, Fiber, Token Ring) ________________________________
• I/O subsystem layout? (# of Disks, # of controllers…) ________________________________
• Are there performance problems with the server? ________________________________
• How are the file systems created on the disks? ________________________________

If customer is looking into a specific solution, complete appropriate area below:

Data Warehousing:

Provide a brief overview of the current DW system:

• Overall data warehouse architecture for database creation and query? ________________________________
• Describe Data content and Data preparation for SAS/DW ________________________________
• Will data be loaded into Oracle OLAP data structure, SAS database or flat files for access by the users? ________________________________
• Are the Databases read only access or update access? ________________________________
• Will SAS users access shared data on the DW server or on a Data Mart server? ________________________________
• Current system availability and contention? (Server & Storage Array) ________________________________
• Provide run times for the longest running data prep. jobs? ________________________________
• Timeliness of data and data accuracy? ________________________________

Data Mining:

• How many Enterprise Miner projects will be running at any given time? ________________________________
• On the average, how many per day? ________________________________
• Is there a peak workload period? (Week, Month, Quarter?) ________________________________
• How many analysts will be executing EM activities? ________________________________
• What volume of data will the EM projects be accessing? ________________________________
  (Number of records, size of each record, total file size?) ________________________________
• How many variables will be used for analysis? ________________________________
• How many unique values for each variable used? ________________________________
• Will you be running data preparation/cleansing jobs during data mining executions? ________________________________
• Is there a need to store both active & inactive EM projects? How many of each? ________________________________
• What is the expected response time of the users? ________________________________

Web-based Applications:

Provide a brief overview of the internet architecture, applications and the SAS/Internet products that will be used to deliver the application.

• Report Distribution, Application Distribution, Thin-Client, HTML, SQL? ________________________________
• Will the users select and generate reports dynamically? ________________________________
• Will they be doing Graphs or Static report viewing? ________________________________
• What data store will the dynamic reports be run against? ________________________________
• What is the number of reports involved? Size of reports? ________________________________
• How many users are involved? ________________________________
• Are there peak processing times? ________________________________
• Will you be using SAS/Share? ________________________________
• Will the same server host both the Web processing and the ________________________________

Appendix B: SAS FULLSTIMER OPTION

This option is key to gathering performance data from the SAS application perspective. When enabled, the SAS log will contain detailed timing and memory usage on a PROC by PROC (as well as data step) basis. For applications run in batch or background mode, the timing totals will be printed in summarized form. An example for the PHREG proc:

NOTE: PROCEDURE PHREG used:
      time:                         memory:
        real        3:06.380          page faults     0
        user cpu    2:34.728          page reclaims   0
        system cpu  27.287 seconds    usage           30.99 M
      block I/O operations:         context switches:
        input       1                 voluntary       521
        output      75                involuntary     2036

This option can either be invoked from within a SAS program:

options fullstimer;

or it can be added on the command line during program invocation:

$ <SAS_INSTALL_DIR>/sas -fullstimer myprog.sas

The items of most interest are the time values and the amount of memory used. This information is obtained from the library call getrusage(3C). The man page (man getrusage) can provide all the gory details of what the fields exactly mean. A brief explanation of real, user and system times:

• Real time represents wall clock time.

• User time is the CPU time spent executing user or application code.

• System time is the CPU time spent in the kernel performing system functions on behalf of the user application (for example system I/O).

Differences in time between real time and user + system can be attributed to any one or all of:

• CPU contention - time spent waiting for a CPU time slice
• Paging or swapping
• I/O contention
• Network
• Waiting for a lock
• Other running processes

Applications which have very close real and user times are typically CPU intensive. In these cases, performance can be increased by using faster CPUs. Many times, optimizing the application can realize performance gains as well.

If you are seeing a large differential between the wall clock time and user + system, there is potential for performance improvement. The critical next step would be to identify which factors are the root cause or causes for the difference. These factors are often a combination of a hardware configuration limitation and/or application inefficiency.
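As a rough illustration of the comparison described above, the following Python sketch converts the FULLSTIMER time values from the PHREG example into seconds and computes how much of the real time is not accounted for by user + system CPU time. The function names (`to_seconds`, `unaccounted_time`) are illustrative and not part of SAS, and only the two time formats shown in the example log are handled.

```python
def to_seconds(value):
    """Convert a FULLSTIMER time such as '3:06.380' or '27.287 seconds' to seconds."""
    text = value.replace("seconds", "").strip()
    seconds = 0.0
    for part in text.split(":"):      # handles ss, mm:ss and hh:mm:ss
        seconds = seconds * 60 + float(part)
    return seconds


def unaccounted_time(real, user_cpu, system_cpu):
    """Return (seconds not spent on CPU, CPU share of real time) for one step."""
    real_s = to_seconds(real)
    cpu_s = to_seconds(user_cpu) + to_seconds(system_cpu)
    return real_s - cpu_s, cpu_s / real_s


# Values copied from the PHREG example log.
wait_s, cpu_fraction = unaccounted_time("3:06.380", "2:34.728", "27.287 seconds")
print(f"time not spent on CPU: {wait_s:.3f} s")
print(f"CPU share of real time: {cpu_fraction:.1%}")
```

For the PHREG step only about 4.4 of the 186 seconds are not CPU time, so this step looks CPU bound rather than I/O bound. A step with a much lower CPU share would be a candidate for the follow-up investigation described above.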

FULLSTIMER will also report the voluntary and involuntary context switches. These fields are documented in the ru_nvcsw and ru_nivcsw section of the getrusage(3C) man page. Voluntary switches usually represent wait states on a resource so high numbers in this field are not necessarily bad. However, a high number of

Appendix C: System Monitoring and Data Gathering Tools

How to determine the version of SAS you are running:

$ <sasinstall dir>/sas -nodms

NOTE: Copyright (c) 1999-2001 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software Release 8.2 (TS2M0)
      Licensed to SAS INSTITUTE EMPLOYEE USE SOFTWARE, Site 0000800045.
NOTE: This session is executing on the SunOS 5.8 platform.

Press <Ctrl-D> to exit.

To determine the version of Solaris:

$ cat /etc/release

Solaris 2.6 s297s_smccServer_37cshwp SPARC
Copyright 1996 Sun Microsystems, Inc. All Rights Reserved.
Manufactured in the USA 18 July 1997

Or

$ uname -a

SunOS rayserv1 5.8 Generic_108528-15 sun4u sparc SUNW,Sun-Fire-280R $

To list the swap configuration:

$ /usr/sbin/swap -l

swapfile             dev    swaplo  blocks    free
/dev/dsk/c1t10d0s1   32,73      16 1048784 1002016

Blocks are 512 bytes, so there is ~512 MB (roughly 500 MB) of swap configured above.
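The conversion can be checked with a few lines of Python. This is a minimal sketch that assumes the five-column `swap -l` layout shown above, where sizes are reported in 512-byte blocks; the function names are illustrative.

```python
BLOCK_BYTES = 512  # swap -l reports sizes in 512-byte blocks


def blocks_to_mb(blocks):
    """Convert a count of 512-byte blocks to megabytes (1 MB = 1024 * 1024 bytes)."""
    return blocks * BLOCK_BYTES / (1024 * 1024)


def parse_swap_line(line):
    """Split one data line of `swap -l` output into (swapfile, total_mb, free_mb)."""
    swapfile, _dev, _swaplo, blocks, free = line.split()
    return swapfile, blocks_to_mb(int(blocks)), blocks_to_mb(int(free))


# Data line copied from the example output above.
name, total_mb, free_mb = parse_swap_line("/dev/dsk/c1t10d0s1 32,73 16 1048784 1002016")
print(f"{name}: {total_mb:.0f} MB configured, {free_mb:.0f} MB free")
```

For the device above this works out to about 512 MB configured, of which about 489 MB is free.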

There is no easy way for users to determine the storage platform and layout from command line options, so this information must be provided by the systems administrator. The command:

$ df -k

will show the mount points; this can be used to verify that WORK or data areas are physically located on the system and are not NFS storage areas.

The hardware configuration can be determined in several ways, either with a combination of prtconf(1M) and dmesg(1M) or with prtdiag.

$ /usr/sbin/prtconf -v | more

System Configuration: Sun Microsystems sun4u
Memory size: 1536 Megabytes
System Peripherals (Software Nodes):

SUNW,Ultra-Enterprise

truss (included in Sun Solaris) - A tool often used to trace a process and display all its system calls. We have used truss countless times to debug generic problems. For example, a process may fail because the user doesn't have write permission in a temp directory; in this case, the open system call would be shown, along with the errno(3C) value it failed with. In Solaris 7 and higher, the features of truss(1) have been expanded to include library calls, so you can follow the calling sequence through libc, libthread, etc.

prstat (included in Sun Solaris) - The prstat utility iteratively examines all active processes on the system and reports statistics based on the selected output mode and sort order. The user can monitor memory utilization, CPU, numbers of threads and other useful process statistics.

vmstat (included in Sun Solaris) – This system utility allows the user to monitor various system statistics, including overall CPU utilization, paging and I/O activity.

mpstat (included in Sun Solaris) – This system utility allows the user to monitor CPU activity on a CPU by CPU basis. The example output below was captured on a 2 CPU Sun 280R server running at SAS. Note that this machine was inactive in the first report (cpu idle = idl = 93%).

rayserv1% mpstat 10

CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    2   0  330  661  558 466   22   20   12   0   777   5   1  2  93
  1    2   0   23  311  283 484   19   20   12   0  1107   3   1  2  93
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0  329  641  537 213   34   19   11   0   114  59   0  0  40
  1    0   0    4  328  271 275   31   18   13   0   163  40   0  0  59
rayserv1%
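When mpstat output is captured to a file during a test run, the per-CPU rows can be summarized per sample. The sketch below is one way to do that in Python, assuming the column layout shown above; the helper names are illustrative, and the sample text is the example output with whitespace collapsed.

```python
MPSTAT_OUTPUT = """\
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 2 0 330 661 558 466 22 20 12 0 777 5 1 2 93
1 2 0 23 311 283 484 19 20 12 0 1107 3 1 2 93
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 0 0 329 641 537 213 34 19 11 0 114 59 0 0 40
1 0 0 4 328 271 275 31 18 13 0 163 40 0 0 59
"""


def mpstat_samples(text):
    """Group mpstat rows into samples; each sample is a list of per-CPU dicts."""
    samples, columns = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CPU":        # a header line starts a new sample
            columns = fields
            samples.append([])
        elif samples:
            samples[-1].append(dict(zip(columns, map(int, fields))))
    return samples


def average(sample, column):
    """Average one column (for example 'idl') across the CPUs in a sample."""
    return sum(row[column] for row in sample) / len(sample)


for number, sample in enumerate(mpstat_samples(MPSTAT_OUTPUT), start=1):
    print(f"sample {number}: usr={average(sample, 'usr'):.1f}% "
          f"idl={average(sample, 'idl'):.1f}%")
```

Note that the first report from mpstat summarizes activity since boot, so it is usually the later samples that describe the workload being measured. Here the second sample shows about half of the total CPU capacity in user time.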

iostat (included in Sun Solaris) – A system utility that can be used to monitor I/O activity at a disk and virtual disk level. It is useful in identifying I/O hot spots. The example below was run on a Sun server with several I/O controllers, disks and virtual disks. Note that it shows overall CPU utilization as well:

rayserv1% iostat -xczn 10
     cpu
 us sy wt id
  4  1  2 93
                    extended device statistics
   r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   0.0    0.5    0.0    6.1  0.0  0.0    0.0    6.2   0   0 d0
   3.0   16.1   40.7  105.7  0.0  0.1    0.0    6.6   0   2 d1
   0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t6d0
   0.0    0.2    0.0    1.6  0.0  0.0    0.0    5.8   0   0 c3t1d0
   0.0    0.2    0.0    1.6  0.0  0.0    0.0    5.5   0   0 c3t2d0
   0.0    0.1    0.0    1.4  0.0  0.0    0.0    6.0   0   0 c3t3d0
   0.0    0.2    0.0    1.5  0.0  0.0    0.0    7.9   0   0 c3t6d0
   0.7    3.2    6.8   17.3  0.0  0.0    0.0    6.0   0   1 c4t0d0
   0.8    3.2    6.8   17.6  0.0  0.0    0.0    6.0   0   1 c4t1d0
   0.7    3.2    6.7   17.6  0.0  0.0    0.0    6.0   0   1 c4t2d0
   0.8    3.2    6.8   17.4  0.0  0.0    0.0    6.0   0   1 c5t0d0
   0.8    3.2    6.8   17.6  0.0  0.0    0.0    6.0   0   1 c5t1d0
   0.8    3.3    6.9   18.1  0.0  0.0    0.0    6.1   0   1 c5t2d0
   0.1    0.7    2.7    9.3  0.0  0.0    1.2   21.1   0   0 c1t0d0
   0.1    0.2    2.1    2.3  0.0  0.0    3.2    8.4   0   0 c1t1d0
     cpu
 us sy wt id
 50  0  0 50
                    extended device statistics
   r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   0.0    0.1    0.0    0.2  0.0  0.0    0.0    6.2   0   0 c1t0d0
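To show how such output might be screened for hot spots, here is a small Python sketch that parses a few of the device rows above and flags devices that exceed a busy or service-time threshold. The thresholds (60% busy, 30 ms active service time) are assumptions chosen for illustration, not SAS or Sun recommendations, and the function names are hypothetical.

```python
IOSTAT_ROWS = """\
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.5 0.0 6.1 0.0 0.0 0.0 6.2 0 0 d0
3.0 16.1 40.7 105.7 0.0 0.1 0.0 6.6 0 2 d1
0.7 3.2 6.8 17.3 0.0 0.0 0.0 6.0 0 1 c4t0d0
0.1 0.7 2.7 9.3 0.0 0.0 1.2 21.1 0 0 c1t0d0
0.1 0.2 2.1 2.3 0.0 0.0 3.2 8.4 0 0 c1t1d0
"""


def parse_iostat(text):
    """Turn `iostat -x` style rows into a list of dicts keyed by column name."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    devices = []
    for fields in rows:
        stats = {name: float(value) for name, value in zip(header[:-1], fields[:-1])}
        stats["device"] = fields[-1]
        devices.append(stats)
    return devices


def hot_spots(devices, busy_pct=60.0, service_ms=30.0):
    """Return devices whose %b or asvc_t meets the (assumed) thresholds."""
    return [d["device"] for d in devices
            if d["%b"] >= busy_pct or d["asvc_t"] >= service_ms]


devices = parse_iostat(IOSTAT_ROWS)
busiest = max(devices, key=lambda d: d["kr/s"] + d["kw/s"])
print("highest throughput:", busiest["device"])
print("hot spots:", hot_spots(devices) or "none")
```

On this lightly loaded system no device crosses either threshold; the virtual disk d1 simply carries the most traffic.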

iostat-S (available upon request) – This utility is an enhancement of basic iostat. The most important difference is that iostat-S shows TOTAL reads and writes for the entire system. It also shows logical reads and allows the user to determine if the Solaris Buffer Cache is helping answer disk read requests.

rayserv1% iostat-S
           Agg.   Agg.   Read  Write Logical   Serv Busy   CPU(2)       Disks
    Time  rIO/s  wIO/s   KB/s   KB/s   rKB/s  mSecs    % user/sys bsy/>50/tot Mx
13:50:33    0.1    6.1    0.9   46.7     0.1    5.3    0     50/0      1/0/15 ssd0*
13:50:43    0.0    0.2    0.0    1.0     0.1    0.9    0     50/0      1/0/15 ssd0
13:50:53    0.0    0.2    0.0    1.6     0.0    1.2    0     50/0      2/0/15 sd46
13:51:03    0.0    1.6    0.0   12.8     0.6    0.8    0     50/0      1/0/15 ssd0
rayserv1%

memtool (downloadable) – This set of memory utilities gives the user various ways to look at memory utilization at a process or overall system level. One of the most commonly used tools in this toolkit is prtmem (examples earlier in the paper), which shows overall memory usage and the amount of free RAM that the Solaris buffer cache is taking advantage of.

ftp://playground.sun.com/pub/memtool/ (anonymous FTP).

top (downloadable from www.sunfreeware.com for Sun Solaris 8) – This is an older but popular tool similar to prstat (which is now bundled in the Solaris Operating System). Top gives similar output, but also shows overall memory and CPU utilization on the same report to give the user a wider view of the system activity.

Appendix D: Sun Fire 6800 I/O Scalability Tests

In January 2003 we borrowed some time on a Sun Fire 6800 with 24 processors running at 1050 MHz, 96 GB of RAM and 16 T3 StorEdge devices. The goal of our tests was to run a pre-release version of SAS 9.1 on a large Sun system with the Sun Solaris 9 12/02 Operating System. We were particularly curious about overall I/O throughput and how the Solaris Buffer Cache would affect a system under heavy load. The following graphs are results from one of our tests. We wanted to show how we use some of the strategies in this paper to run tests that help us enhance future versions of SAS and learn more about how to tune SAS running on Sun. The data below was gathered via the SAS FULLSTIMER option and the Sun iostat-S command.

Test overview:

• sort a large dataset on a standalone file system (84 variables, 47 million observations)
• run 1 to 16 simultaneous versions of this sort
• add a new storage device and controller as you increase the number of simultaneous runs (increase I/O capability as you increase the workload)
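The general shape of such a test can be sketched in Python as follows. This is not the harness used for the tests described here; it is a minimal illustration of starting N copies of a job at the same time and recording each copy's elapsed time. The command shown is a harmless placeholder; in a real run it would be the SAS invocation for the sort program, with each copy pointed at its own storage device as described above.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor

# Placeholder job (an assumption for illustration); a real test would invoke SAS here.
PLACEHOLDER_JOB = [sys.executable, "-c", "pass"]


def timed_run(command):
    """Run one job to completion and return (elapsed_seconds, return_code)."""
    start = time.monotonic()
    result = subprocess.run(command)
    return time.monotonic() - start, result.returncode


def run_concurrently(command, copies):
    """Start `copies` instances of `command` at the same time and time each one."""
    with ThreadPoolExecutor(max_workers=copies) as pool:
        return list(pool.map(timed_run, [command] * copies))


for copies in (1, 2, 4):
    results = run_concurrently(PLACEHOLDER_JOB, copies)
    average_s = sum(elapsed for elapsed, _ in results) / copies
    print(f"{copies} simultaneous job(s): average run time {average_s:.2f} s")
```

If the average run time stays roughly flat as the number of simultaneous jobs grows, the added I/O capability is keeping pace with the added workload.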

Conclusions:

• Achieved excellent scalability and total I/O throughput for the system (this data will be very useful for future system sizing)

• Provided SAS and Sun R&D with some useful data on where to look for internal bottlenecks as the system reaches the 14 I/O controller & 14 Job run level

[Graph: Average bigsort.sas Run Time]
