Experts rely on hit ratios or wait statistics, or sometimes both, but there are situa- tions where both the hit ratios and the wait statistics can completely fail you. Imagine a situation where all the hit ratios are in the 99 percent range. Also, imagine that the wait statistics don’t show any significant waiting for resources or any contention for latches. Does this mean that your system is running optimally?
Well, your system is doing what you asked it to do extremely well, but there’s no guarantee that your SQL code is processing things efficiently. If a query is per- forming an inordinate number of logical reads, the hit ratios are going to look wonderful. The wait events also won’t show you a whole lot, because they don’t capture the time spent while you were actually using the CPU. However, you’ll be burning a lot of CPU time, because the query is making too many logical reads.
This example shows why it’s important not to rely on just the hit ratios or the wait statistics, but also to look at the major consumers of resources on your instance with an intense focus—for example, why is this query doing a billion logical reads? Check the top session list (sorted according to different criteria) on your instance and see if there’s justification for them to be in that list.
Above all, try not to confuse the symptoms of poor performance with the causes of poor performance. If your latch rate is high, there are initialization parameters that you might want to adjust right away—after all, isn’t Oracle a highly configurable database? Well, you may succeed sometimes by relying solely on adjusting the initialization parameters, but it may be time to pause and question why exactly the latch rate is so high. More than likely, the high latch rate is due to application coding issues, rather than a specific parameter setting. Simi- larly, you may notice that your system is CPU bound, but the reason may not be slow or too few CPU resources. Your application may again be the likely culprit, because it is doing too many unnecessary I/Os, even if they’re mostly from the database buffer cache and not disk.
When you are examining wait ratios, please understand that your goal is not make all the wait events go away, because that will never happen. Learn to ignore the unimportant and routine, unavoidable wait events such as the control file parallel wait event. As you saw in the previous section, wait events such as the SQL*Net Message from Client event reflect waits outside the database, so don’t attribute these waits to a poorly performing database. Focus on the total wait time rather than the number of wait events that show up in your performance tables and Statspack reports. Also, if the wait events make up only a small portion of response time, there’s no point in fretting about the waits. That is, if the response time is 20 minutes and the waits are 20 seconds, what’s the point of wasting your time investigating the wait events any further? As Einstein might say, the signifi- cance of wait events is relative—relative to the total response time and relative to the total CPU execution time.
In addition to the hit ratios and the wait event statistics, you need to monitor other areas of the database for possible performance implications. Among other things, you should try to achieve the following goals to maximize the performance of your database:
• Minimize row chaining and migration. • Check for blocking locks in your system.
• Make sure that the redo log space requests are kept low by tuning your database writer process. If the number of requests is high, it means the database is slowing down due to an inadequate number/size of redo logs. • Monitor checkpoint intervals to optimize their frequency.
Recently, there has been a surge in publications that expound the virtues of a wait event analysis–based performance approach (also called the wait interface approach). If you use this method for performance tuning, how important are the traditional hit ratios? Well, you can always use the buffer hit ratios and the other ratios for a general idea about how the system is using Oracle’s memory and other resources, but an analysis of wait events is still a better bet in terms of improving performance. If you take care of the wait issues, you’ll have taken care of the tradi- tional hit ratios as well anyway. For example, if you want to fix a problem that is the result of a high number of free buffer waits, you may need to increase the buffer cache. Similarly, if latch-free wait events are troublesome, one of the solu- tions is to check if you need to add more memory to the shared pool. You may fix a problem due to a high level of waits caused by the direct path reads by increasing the value of the pga_aggregate_target parameter.