CIT 668: System Architecture
Topics
1. What is performance testing?
2. Performance-testing activities
3. UNIX monitoring tools
What is performance testing?
Performance testing is a type of testing intended to determine the responsiveness, throughput, reliability, and/or scalability of a system under a given workload.
- http://perftestingguide.codeplex.com/
Performance testing goals:
– Assess production readiness
– Evaluate against performance criteria
– Compare performance characteristics of multiple systems or system configurations
– Find the source of performance problems
– Support system tuning
Performance Testing Activities
Testing Types
Performance testing: determining performance, scalability, or stability characteristics of a system; a superset of the other testing types.
Load testing: determining performance characteristics of a system when subjected to the workload expected during production.
Stress testing: determining performance characteristics of a system when subjected to workloads beyond those expected during production, to determine under what conditions the system will fail.
Baselines
A baseline is a set of data used for comparison.
In performance testing, baselines are used to evaluate the effectiveness of subsequent
performance-improving changes to the system. Once the system has been changed, a new
Benchmarking
Benchmarking is the process of measuring system performance using standard tests and comparing it against a well-known system.
SPEC CPU2006 (SPECint, SPECfp)
SPEC power2008 (power usage)
SPEC sfs2008 (NFS, CIFS)
SPEC virt2010 (virtualization)
SPEC web2005 (PHP or JSP)
BogoMips
Dhrystone
Whetstone
Weighted TeraFLOPS
Experimenter Effect
Monitoring the system affects performance.
Monitoring tools use system resources.
If you monitor the system consistently, the monitoring overhead is the same in every measurement, so it won’t distort comparisons between them.
Identify Bottlenecks
Identify which aspect of performance
Latency: delay until initial access.
Throughput: rate of transfer/processing.
Identify which system component
CPU
Memory
Disk
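Both aspects can be computed from the same timing data. The sketch below is only an illustration: it assumes each request was recorded as a hypothetical (start, first_byte, end, nbytes) tuple, with times in seconds, and the sample numbers are made up.

```python
from statistics import mean

def summarize(samples):
    """samples: list of (start, first_byte, end, nbytes) tuples, times in seconds."""
    # Latency: delay from issuing the request until the first byte arrives.
    latencies = [s[1] - s[0] for s in samples]
    # Throughput: total work completed divided by the wall-clock window.
    window = max(s[2] for s in samples) - min(s[0] for s in samples)
    total_bytes = sum(s[3] for s in samples)
    return {
        "mean_latency_s": mean(latencies),
        "requests_per_s": len(samples) / window,
        "bytes_per_s": total_bytes / window,
    }

# Made-up measurements for illustration only.
samples = [(0.0, 0.1, 0.5, 1000), (0.5, 0.7, 1.0, 3000), (1.0, 1.3, 2.0, 4000)]
stats = summarize(samples)
```

A system can do well on one measure and badly on the other, which is why the two are reported separately.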
Performance Problem Solutions
1. Get more of needed resource.
Ex: Upgrade processor, use striped disk array.
2. Reduce system requirements.
Ex: Kill processes, move services to other hosts.
3. Eliminate inefficiency and waste.
Ex: Produce a static home page every 15 minutes instead of regenerating each access.
4. Ration resource usage.
Ex: Set process priorities with renice.
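The static home page example in item 3 can be sketched as a time-based cache. This is a variant under stated assumptions, not the slide's exact method: it regenerates lazily on the first request after the 15 minutes expire rather than on a timer, and TimedCache and render_home_page are hypothetical names introduced for illustration.

```python
import time

class TimedCache:
    """Regenerate a value at most once per ttl seconds; otherwise serve the stored copy."""

    def __init__(self, generate, ttl=15 * 60, clock=time.monotonic):
        self.generate = generate   # expensive function, e.g. one that renders the page
        self.ttl = ttl
        self.clock = clock         # injectable so the behavior can be checked without waiting
        self._value = None
        self._expires = None

    def get(self):
        now = self.clock()
        if self._expires is None or now >= self._expires:
            self._value = self.generate()
            self._expires = now + self.ttl
        return self._value

calls = []

def render_home_page():            # hypothetical stand-in for the expensive page build
    calls.append(1)
    return "<html>home v%d</html>" % len(calls)

fake_now = [0.0]                   # fake clock so the example runs instantly
cache = TimedCache(render_home_page, ttl=900, clock=lambda: fake_now[0])
first = cache.get()
second = cache.get()               # served from the cache, no regeneration
fake_now[0] = 901.0
third = cache.get()                # TTL expired, page is regenerated
```

However many requests arrive, the expensive work runs at most once per 15-minute window.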
Performance Testing Services
• Gomez
• Keynote
• Pingdom
• SiteUptime
• Alertra
Activities
Activity: Identify test environment
  Input: Production system architecture; Test system architecture; Available tools
  Output: Comparison of test and production environments; Environment concerns; Are other tools needed?
Activity: ID acceptance criteria
  Input: Client expectations; Risks to be mitigated
  Output: Success criteria; Performance goals and requirements
Activity: Plan and design tests
  Input: Available system features and components; Use cases; Success criteria
  Output: Test data to implement tests; Use models to be simulated; Resources required
Activity: Configure test environment
  Input: Tools
  Output: Configured load generation and resource monitoring tools
Activities
Activity: Implement test design
  Input: Configured tools; Prepared environment; Available tools
  Output: Validated, executable tests; Validated resource monitoring; Validated data collection
Activity: Execute tests
  Input: Test execution plan; Configured tools; Executable tests
  Output: Test results
Activity: Analyze results, report, and retest
  Input: Test results; Acceptance criteria; Risks, concerns, and issues
  Output: Results analysis; Recommendations; Reports
Web Load Tools
• ab (Apache Bench)
• httperf
• autobench (httperf multihost wrapper)
• JMeter
• openload
• SIEGE
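What these tools do can be shown with a toy load generator. The sketch below is not how any of the listed tools is implemented. It assumes a throwaway local HTTP server (the Hello handler is hypothetical) and sends a fixed number of requests from a small thread pool, recording per-request time and overall request rate.

```python
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Hello(BaseHTTPRequestHandler):
    """Trivial target so the example does not depend on any external site."""
    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):    # silence per-request logging
        pass

# Ignore any proxy settings in the environment; the target is local.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

def fetch(url):
    start = time.perf_counter()
    with opener.open(url, timeout=5) as resp:
        resp.read()
        status = resp.status
    return status, time.perf_counter() - start

def run_load(url, requests=20, concurrency=4):
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(fetch, [url] * requests))
    elapsed = time.perf_counter() - started
    return results, requests / elapsed

server = ThreadingHTTPServer(("127.0.0.1", 0), Hello)   # port 0: OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port
results, req_per_s = run_load(url)
server.shutdown()
server.server_close()
```

Real tools add ramp-up schedules, think time, and multiple load-generating hosts, but the core loop is the same: issue requests concurrently and time them.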
Metric Collection and Notification Tools
• Ganglia
• Cacti
• Nagios
• Zabbix
• Hyperic HQ
• Munin
• ZenOSS
• OpenNMS
• GroundWork
• Monit
Monitoring Processes
uptime
Provides aggregate data about system load.
ps
Shows running processes with CPU, mem usage.
top
Updated list of running processes + summaries.
vmstat
Uptime
Uptime provides the following data
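How long system has been running.
Number of users logged in.
Average number of runnable processes in the last 1, 5, and 15 minutes.
Rule of thumb: want a load average under 3; on multiprocessor systems, compare the load average against the number of CPUs.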
Uptime example
> uptime
17:40 up 126 days, 8:03, 6 users, load average: 1.40, 1.03, 0.55
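For scripted checks, the fields can be pulled out of this line. The following is a minimal sketch that assumes the output format shown above; parse_uptime is a hypothetical helper, and other systems may format the line slightly differently.

```python
import re

def parse_uptime(line):
    """Extract the user count and the 1-, 5-, 15-minute load averages."""
    users = int(re.search(r"(\d+) users?", line).group(1))
    loads = re.search(r"load averages?: ([\d.]+),? ([\d.]+),? ([\d.]+)", line)
    return users, tuple(float(x) for x in loads.groups())

line = "17:40 up 126 days, 8:03, 6 users, load average: 1.40, 1.03, 0.55"
users, (one, five, fifteen) = parse_uptime(line)
rising = one > five > fifteen   # load has been increasing over the last 15 minutes
```

On a live UNIX host, Python's os.getloadavg() returns the same three load averages without parsing any text.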
vmstat
• Number of runnable (r) and blocked (b) processes.
• Memory (virtual, free, buffered, cached)
• Blocks/second transferred in (bi) and out (bo)
• Interrupts/sec (in) and context switches/sec (cs)
• CPU usage by user, system, idle, and waiting.
> vmstat 5 4
procs ---memory--- ---swap-- ---io---- --system-- ----cpu----
 r  b   swpd  free   buff cache si so bi bo   in   cs us sy  id wa
 0  0 395716 45176 211284 88480  0  0  1  2    1    2  9  3  88  0
 0  0 395716 45168 211300 88480  0  0  0 50 1035 1677  0  0 100  0
 0  0 395716 45168 211300 88480  0  0  0  0 1040 1670  0  0  99  0
 0  0 395716 45168 211300 88480  0  0  0  0 1033 1660  0  0 100  0
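To work with these numbers in a script, the samples can be keyed by column name. The sketch below assumes the layout shown above, where the column-name row starts with "r b"; parse_vmstat is a hypothetical helper, and the sample text repeats the first two rows of the output.

```python
def parse_vmstat(text):
    """Turn vmstat output into a list of {column: int} dicts, one per sample."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    # The column-name row is the one that starts with "r b".
    header_at = next(i for i, fields in enumerate(lines) if fields[:2] == ["r", "b"])
    names = lines[header_at]
    return [dict(zip(names, map(int, fields))) for fields in lines[header_at + 1:]]

sample = """\
procs ---memory--- ---swap-- ---io---- --system-- ----cpu----
 r b swpd free buff cache si so bi bo in cs us sy id wa
 0 0 395716 45176 211284 88480 0 0 1 2 1 2 9 3 88 0
 0 0 395716 45168 211300 88480 0 0 0 50 1035 1677 0 0 100 0
"""
rows = parse_vmstat(sample)
```

The first sample line reports averages since boot, so analysis usually starts from the second row.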
Identifying CPU Shortages
1. Short-term CPU spikes are normal.
2. Consistently high number of runnable processes (r) in vmstat.
3. Consistently high total CPU usage (sy+us).
4. High system time compared to user time, together with a high context-switch rate (cs), indicates the system is thrashing between processes instead of doing user work.
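Rules 1 through 3 can be sketched as a check over a series of vmstat samples. The thresholds here (90% busy, 80% of samples, one CPU) are illustrative assumptions rather than values from the slides, and cpu_shortage is a hypothetical helper.

```python
def cpu_shortage(rows, ncpus=1, busy_pct=90, min_fraction=0.8):
    """rows: dicts with vmstat's r, us, sy columns. Thresholds are illustrative guesses."""
    if not rows:
        return False
    # A sample looks CPU-bound when more processes are runnable than there are
    # CPUs and user + system time uses nearly all of the CPU.
    bound = [r["r"] > ncpus and r["us"] + r["sy"] >= busy_pct for r in rows]
    # Require most samples to agree, so that short spikes are ignored.
    return sum(bound) / len(rows) >= min_fraction

idle = [{"r": 0, "us": 0, "sy": 0}] * 4
spike = idle[:3] + [{"r": 5, "us": 95, "sy": 4}]     # one busy sample out of four
saturated = [{"r": 5, "us": 80, "sy": 15}] * 5       # busy in every sample
```

Here cpu_shortage(spike) is false because only one sample in four is busy, while cpu_shortage(saturated) is true.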
Changing Process Priorities
Nice values
Positive values lower priorities.
Negative values increase priorities.
If you know a process will be a CPU hog,
nice -n 5 command_name
If you detect a CPU hog after it’s started,
Managing Processes with kill
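TERM (default)
Requests process termination. Processes can catch or ignore the signal. (Ctrl-c sends the related INT signal.)
KILL (9)
Terminates process execution. Processes cannot catch or ignore it. Processes in an uninterruptible I/O wait will not die until the I/O completes.
STOP
Suspends process execution until the process receives CONT. Cannot be caught or ignored. (Ctrl-z sends the catchable TSTP signal.) Useful for moving a CPU hog out of the way temporarily.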
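The same signals can be sent from a script. The sketch below assumes a POSIX system and uses a sleeping child process as a stand-in for a CPU hog: it suspends the child, resumes it, and then asks it to terminate.

```python
import signal
import subprocess
import sys
import time

# Child that would otherwise run for a minute (stand-in for a CPU hog).
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

child.send_signal(signal.SIGSTOP)   # suspend: cannot be caught or ignored
time.sleep(0.2)
still_alive = child.poll() is None  # stopped, not terminated

child.send_signal(signal.SIGCONT)   # resume execution
child.send_signal(signal.SIGTERM)   # polite termination request (kill's default)
try:
    child.wait(timeout=5)
except subprocess.TimeoutExpired:
    child.kill()                    # SIGKILL as a last resort
    child.wait()

exit_signal = -child.returncode     # Popen reports death by signal as a negative code
```

Trying TERM first and falling back to KILL gives a well-behaved process the chance to clean up before it exits.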
Imposing Limits on Processes
CPU time                ulimit -t secs
Maximum file size       ulimit -f KB
Maximum data segment    ulimit -d KB
Maximum stack size      ulimit -s KB
Maximum physical mem    ulimit -m KB
Maximum core size       ulimit -c KB
Maximum number procs    ulimit -u n
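The limits set by ulimit are also available to programs. The sketch below uses Python's resource module on a UNIX system to start a child with a lowered soft CPU-time limit, the equivalent of ulimit -t. The helper run_limited and the 5-second value are assumptions for illustration.

```python
import resource
import subprocess
import sys

def run_limited(argv, cpu_secs):
    """Run argv with its soft CPU-time limit lowered, like `ulimit -t cpu_secs`."""
    def lower_limit():
        # Runs in the child between fork and exec; the hard limit is left alone.
        _, hard = resource.getrlimit(resource.RLIMIT_CPU)
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_secs, hard))
    return subprocess.run(argv, preexec_fn=lower_limit,
                          capture_output=True, text=True)

# The child just reports the soft limit it inherited.
report = "import resource; print(resource.getrlimit(resource.RLIMIT_CPU)[0])"
result = run_limited([sys.executable, "-c", report], cpu_secs=5)
```

The other rows of the table correspond to RLIMIT_FSIZE, RLIMIT_DATA, RLIMIT_STACK, RLIMIT_RSS, RLIMIT_CORE, and RLIMIT_NPROC. The resource module expresses sizes in bytes rather than KB.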
Monitoring Memory
Use free to see how memory is used.
System will use most free memory for caching.
System will swap out inactive processes.
Don’t worry until free < 5% of total memory.
Use vmstat to detect paging activity.
Page out (so) rate greater than 0 consistently.
High page in (si) rate, as system uses the paging facility to load programs into memory.
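The 5% guideline can be sketched as a check on Linux, where /proc/meminfo exposes the same counters that free reports. The sample values are made up, and counting buffers and cache as available memory is an interpretation of the caching remark above, not something the slide states.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key:   value kB' lines into {key: kB}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info

def memory_pressure(info, threshold=0.05):
    # Cache and buffers are reclaimable, so count them as available memory
    # (an assumption of this sketch).
    available = info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)
    return available / info["MemTotal"] < threshold

# Made-up sample in /proc/meminfo format.
sample = """\
MemTotal:        1000000 kB
MemFree:           20000 kB
Buffers:           30000 kB
Cached:           150000 kB
"""
info = parse_meminfo(sample)
low = memory_pressure({"MemTotal": 1000000, "MemFree": 10000})
```

In the sample only 2% of memory is strictly free, yet 20% is available once cache and buffers are counted, so memory_pressure(info) is false.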
Managing Memory
1. Improving paging capacity.
Add new swapfiles with swapon.
Add new swap partitions.
2. Improving paging performance.
Use swap partitions instead of swap files.
Distribute swap resources across disks.
3. Migrate memory hogs to another host.
4. Add more memory.
Monitoring Disk I/O
Use iostat to get per disk statistics.
Transactions per second (tps). Blocks read/written per second.
Managing disk performance problems.
Distribute heavily used data across disks/controllers.
Get more or faster disks.
iostat
> iostat 2
Linux 2.6.15-23-386 (zim) 03/26/2007
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           8.55    0.18    3.22    0.09    0.00   87.96
Device:   tps  Blk_read/s  Blk_wrtn/s  Blk_read   Blk_wrtn
hde      0.69        8.18        9.43  89783416  103565744
hdh      0.15        1.33        3.37  14590831   36969599
hdc      0.00        0.00        0.00      9548          0
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.17    0.00    0.17    0.00    0.00   99.67
Device:   tps  Blk_read/s  Blk_wrtn/s  Blk_read   Blk_wrtn
hde      0.33        0.00       21.33         0        128
hdh      0.00        0.00        0.00         0          0
hdc      0.00        0.00        0.00         0          0
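To find the busiest disk from a script, the per-device lines can be grouped by report. The sketch below assumes the older iostat layout shown above, with six fields per device line; parse_iostat_devices is a hypothetical helper, and the sample repeats part of the output.

```python
def parse_iostat_devices(text):
    """Return one {device: (tps, blk_read_per_s, blk_wrtn_per_s)} dict per report."""
    reports, current = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Device:":
            current = {}              # a new report starts at each Device: header
            reports.append(current)
        elif fields[0].startswith("avg-cpu"):
            current = None            # CPU summary lines are skipped
        elif current is not None and len(fields) == 6:
            current[fields[0]] = tuple(float(x) for x in fields[1:4])
    return reports

sample = """\
Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
hde     0.69        8.18        9.43  89783416 103565744
hdh     0.15        1.33        3.37  14590831  36969599
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           0.17   0.00     0.17     0.00    0.00  99.67
Device:  tps  Blk_read/s  Blk_wrtn/s  Blk_read  Blk_wrtn
hde     0.33        0.00       21.33         0       128
hdh     0.00        0.00        0.00         0         0
"""
reports = parse_iostat_devices(sample)
latest = reports[-1]
busiest = max(latest, key=lambda dev: latest[dev][0])   # highest tps in the last report
```

As with vmstat, the first report covers the time since boot; later reports cover each sampling interval.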
Managing Disk Capacity
Detecting disk resource usage.
List all partition usage with df -h
Identify high usage directories with du
Summary data: du -s
Highest usage directories: du -k / | sort -rn
Use find to detect disk hogs.
Use find -size to search for big files.
Use -atime +X to identify files that haven’t been accessed in more than X days.
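The same searches can be scripted. The sketch below walks a directory tree and filters by size and last-access time, roughly what find -size and find -atime do. The helpers find_files and largest are hypothetical, and the demo uses a temporary directory with made-up files.

```python
import os
import tempfile
import time

def find_files(root, min_bytes=0, unused_days=None, now=None):
    """Yield (path, size) for files at least min_bytes big and, optionally,
    not accessed for more than unused_days days."""
    now = time.time() if now is None else now
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue                      # file vanished or is unreadable
            if st.st_size < min_bytes:
                continue
            if unused_days is not None and now - st.st_atime <= unused_days * 86400:
                continue
            yield path, st.st_size

def largest(root, count=10):
    """The count biggest files under root, largest first."""
    return sorted(find_files(root), key=lambda item: item[1], reverse=True)[:count]

with tempfile.TemporaryDirectory() as root:
    big = os.path.join(root, "big.log")
    small = os.path.join(root, "small.txt")
    with open(big, "wb") as f:
        f.write(b"x" * 5000)
    with open(small, "wb") as f:
        f.write(b"x" * 10)
    old = time.time() - 100 * 86400
    os.utime(small, (old, old))               # pretend small.txt was last read 100 days ago
    top = [os.path.basename(p) for p, _ in largest(root)]
    stale = [os.path.basename(p) for p, _ in find_files(root, unused_days=30)]
```

Access times are only meaningful if the filesystem records them; filesystems mounted with noatime will make every file look unused.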
Managing Disk Shortages
1. Add more disks.
2. Move files to remote fileservers.
3. Eliminate unnecessary files.
4. Compress large infrequently used files.
5. Impose disk quotas on users.
Soft limit: can be violated temporarily.
Hard limit: cannot be violated.
Monitoring Network Connections
List listening network ports
lsof -i
List firewall rules (which ports are accessible)
iptables -L
List network connections and listening ports
IPTraf
Managing Network Capacity
1. Move applications onto separate servers.
2. Add more NICs and bond them.
3. Upgrade from 1Gbps to 10Gbps Ethernet if supported by server hardware.
Key Points
Performance testing terms
– Load testing and stress testing
– Latency and throughput
– Baselines and benchmarks
Performance testing activities
1. Identify test environment
2. Identify performance criteria
3. Plan and design tests
4. Configure test environment
5. Implement test design
6. Execute tests
7. Analyze results, report, and retest
References
1. Mark Burgess, Principles of System and Network Administration, Wiley, 2000.
2. Aeleen Frisch, Essential System Administration, 3rd edition, O’Reilly, 2002.
3. Mike Loukides and Gian-Paolo D. Musumeci, System Performance Tuning, 2nd edition, O’Reilly, 2003.
4. Evi Nemeth et al, UNIX System Administration Handbook, 3rd edition, Prentice Hall, 2001.
5. patterns & practices, Performance Testing Guidance for Web Applications,