Is the System Wait Bound? - Expert Oracle9i Database Administration SAM R. ALAPATI

If none of the previous steps indicated any problems, chances are that your system is suffering from a serious contention for some resource such as library

cache latches. Check to see if there is contention for critical database resources such as locks and latches, for example. Contention for these resources manifests itself in the form of wait events. The wait event analysis earlier in this chapter gave you a detailed explanation of various critical wait events. You can use the output of the Statspack utility to see what the top wait events in your database are. A quick- and-dirty way to find out what waits are causing the system slowdown is to run the query shown in Listing 19-30.

Listing 19-30. Current Wait Events in the Instance SQL> select event,count(*) from v$session_wait

2* group by event;

EVENT COUNT(*)

---

PL/SQL lock timer 1

SQL*Net message from client 569

SQL*Net message to client 2

SQL*Net more data from client 2

buffer busy waits 1

db file scattered read 2

db file sequential read 7

latch free 1 8 rows selected.

SQL>

An even more efficient method of viewing current waits in the system is to use OEM’s Performance Manager. For example, if the Latch Analysis Chart shows much more than 1 percent latch contention in the system, you may have some type of latch issue. Figure 19-7 shows how to use OEM’s Performance Manager to examine the latch wait levels in your database.

You can also use the Performance Manager to view the Wait Analysis Overview tool, which shows you summary information on various wait events in the database. You can view a chart showing the number of sessions currently waiting for a specific event. You can also look at charts that show the top five wait events that have fin- ished waiting for the sample period. OEM’s wait analysis screen will also show you the top waits by time waited, as shown in Figure 19-8.

Eliminating the Contention

Once you identify wait events due to contention in the system, you need to remove the bottleneck. Of course, this is easier said than done in the short run. You may be able to fix some contention problems right away, whereas you may need more time with others. Problems such as high DB file scattered read events, which are due to full table scans, may indicate, as you have seen, that the I/O workload of the system needs to be reduced. However, if the reduction in I/O requires creating new indexes and rewriting SQL statements, obviously you can’t fix the problem right away. You can’t add disks and rearrange objects to reduce hot spots right away either. Similarly, most latch contention requires changes at the application level. Just make sure you don’t perform a whole bunch of changes at once—you’ll never be able to find out what fixed the problem (or in some cases, what made it worse!).

Figure 19-7. Using the Performance Manager to examine latch waits

The trick, as usual, is to go after the problems you can fix in the short run. Problems that you can fix by changing the memory allocation to the shared pool or the buffer cache you can easily handle almost immediately by dynamically adjusting the cache values. You can also take care of any changes that concern the redo logs right away. If you notice one or two users causing a CPU bottleneck, it may be a smart idea to kill those sessions so the database as whole will perform better. As you know, prevention is much better than a cure, so consider using the Oracle Database Resource Manager tool (Chapter 11 shows you in detail how to use the Database Resource Manager) to create resource groups and prevent a single user or group from monopolizing the CPU usage.

If intense latch contention is slowing your database down, you probably should be setting the cursor_sharing parameter’s value to force or similar to ame- liorate the situation.

Most other changes, though, may require more time-consuming solutions. Some changes may even require major changes in the code or the addition or modification of important indexes. However, even if the problem isn’t fixed immediately, you have learned your craft, and you’re on the right path to improving instance performance.

Summary

In an operational database, the scope for SQL tuning is extremely limited in most cases, unless the poor SQL code is bringing the entire system to a standstill. Most of the time, you have to work your way around less-than-optimal SQL code. As a DBA, you can impact instance performance considerably, however, and it’s important to know how to evaluate system performance.

This chapter provided you with detailed analyses of database hit ratios and Oracle wait events. The issue of whether you should use the traditional hit ratios or the more sophisticated wait ratio analysis isn’t hard to resolve. You need to look at both the hit ratios and wait ratios to assess performance problems. In some cases, the problem may be elusive, even when you employ both of these performance indicators. A better methodology is sometimes to simply focus on the top resource-consuming SQL statements in your system and see if you can tune them. Remember that in order to reduce the response time of your users, you’ll need to bring down the wait time involved. If high wait times are causing response times to go down, you need to focus intensely on the type of waits and what’s causing them. Just make sure you don’t mistake the symptoms of high wait times for the maladies themselves.

The chapter showed you how to handle a database hang and offered a systematic methodology for you to follow when your database response times are slow.

In document Expert Oracle9i Database Administration SAM R. ALAPATI (Page 85-88)