Chapter 5. Using WebSphere Studio Application Monitor
5.2 Problem determination
5.2.1 Scenario 1 - table lock
This section presents Scenario 1 of five scenarios covering Application Monitor problem determination.
Problem description
This scenario illustrates a performance problem introduced by a transaction that obtains and holds an exclusive lock on a DB2 table that other transactions need to access. This allows us to show many of the capabilities of Application Monitor as you might use them with your applications.
If such a problem occurred in your environment, you might be alerted to it in one of several ways:
Your phone might ring with a complaint about response times.
You might have an alert set in Application Monitor to notify you of long response times or other unusual conditions.
You might notice an unusual drop in transaction rates, from the Application Monitor Application Overview screen.
You might notice an unusual number of in-flight transactions in Application Monitor, with long residence times.
To create this problem, we ran a simple application that does nothing more than get an exclusive lock on a DB2 table used by Trade 2 transactions, then goes to sleep. The output from this application is shown in Figure 5-3 on page 79.
Figure 5-3 Application output
We have Trade 2 stock purchase transactions in flight, now being delayed because they can’t access the table they need. To the end user, the situation looks like the browser page shown in Figure 5-4 on page 80.
Figure 5-4 Trade output
Our prior quote operation succeeded, but now having clicked Buy to purchase the shares, we get no response and eventually time out.
Let’s see how things look in Application Monitor by reviewing Figure 5-5 on page 81.
Figure 5-5 shows the In-Flight Request Search page. Although there are various ways to navigate in Application Monitor, the quickest way to display this page is via the Top Navigation pull-down for Problem Determination at the top of every Application Monitor page.
Here we see all currently executing transactions for the servers we are
monitoring. This is just another way of saying in-flight requests. They are listed in descending order of response time, or Total Resident Time, so you are quickly able to see which requests on which servers are taking the longest.
In this case, we see that our locking request has taken almost 41 seconds so far, and that quite a few Trade2 requests are taking more than 20 seconds. If we press the OK button a few times, we’ll see the Total Resident Time increasing, and more transactions in-flight as the bottleneck builds.
One way to drill down further is to select Server Activity Display, by placing your cursor over the tools icon in between the Server Name and Client Request fields as shown in Figure 5-6 on page 83.
Figure 5-6 Server activity
Selecting Server Activity Display allows you to see more information about requests on a specific server, as shown in Figure 5-7 on page 84.
Figure 5-7 Server activity selected
We see that our LockingServlet request shows very long Resident Time, but no CPU time. We need more information to determine what this request is doing, but before we leave this page, note that you can also see Last Known Class Name, Last Known Method, and Total Thread Count for the server. This information could be useful for other tuning efforts.
To get more detail about a specific request, we click the Thread ID of the LockingServlet request. We could have done this directly from the In-Flight Request page, but then we would have missed all the excitement of the Server
Activity Display! See Figure 5-8 for an example of the resulting Request Detail page.
Figure 5-8 Request detail
From this Request Detail page, we see that the Last SQL this request generated was an exclusive lock on table TRADEHOLDINGBEAN. We have further options on the left menu, including Stack Trace and Method Trace. Let’s look at each one in turn, starting by clicking Stack Trace as shown in Figure 5-9 on page 86.
Here we see the JVM stack for all methods that have not yet executed. Note that the last outstanding method to execute is “sleep.” Combined with our table lock, this is the reason that other requests are being delayed.
If we want more proof (or just want to show what you can get from a method trace), click Method Trace to display the Method Trace page as shown in Figure 5-10 on page 88.
Figure 5-10 Method trace
Important: Method Trace information requires L3 (Tracing Mode).
Method Trace provides detailed method-level information, including entry and exit points for the thread. Note the Nesting Level column and the adjacent ENTRY and EXIT indicators. These will help you, or your applications team, understand this request better. Also note the Elapsed Time and CPU Time, as well as the delta columns for each. For our contrived locking application there is no measurable time spent before obtaining the lock and going to sleep, but you will probably see more interesting data in your applications. Finally, note that the last known SQL statements are also displayed on this page for each method entry and exit point.
To further illustrate that our locking application is causing the delay to our buy request, let’s look at its Request Detail page, shown in Figure 5-11 on page 90.
Tip: Method Trace data can also be viewed after the fact, using Application Monitor’s Performance Analysis and Reporting function. From this saved data, various reports including Method Trace can be exported as a CSV file and viewed in a spreadsheet, or e-mailed as a PDF. You may find these ways useful for sharing this kind of information with other parties involved in your performance and tuning efforts.
Figure 5-11 Request detail page
Remember, we could get here by selecting In-Flight Request Search from the Problem Determination Top Navigation pull-down from any Application Monitor page, then clicking the Thread ID of the request we want to see.
We see that the last SQL for our buy request is an insert into the
TRADEHOLDINGBEAN table. Of course, this is the table that is locked. If we click Stack Trace in the menu on the left of this page, we see the page shown in Figure 5-12 on page 91.
Figure 5-12 Stack trace reports
Here we see, as we might have expected, that our last method in the stack is SQLExecute.
OK, you’ve probably had just about enough of our locking transaction by now, so let’s bring this scenario to a close. In real life, this would be a case where the locking transaction clearly does not play well with others, and needs to be fixed.
You would take the information you have obtained from Application Monitor and ask the application developers to modify their request, perhaps with guidance from your friendly DBA.
In the world of test systems, we have another option. From the Request Detail page for the locking transaction, we can click Cancel Request, as shown in Figure 5-13.
Figure 5-13 Request reports
Note that cancelling a request should be considered as a last resort only. It’s not an elegant move. Depending on the state of the transaction and the resources it is using at the time it is cancelled, you could bring down your server. If your only other choice is to bring the server down because everyone is hung up by a rogue request tying up resources, you may want to cancel the request.
In our example, with the locking transaction cancelled and the lock on table TRADEHOLDINGBEAN freed, our other transactions, including our buy transaction, are free to continue and the system soon returns to normal, as evidenced by an In-Flight Request page minus the many delayed requests we saw before, as shown in Figure 5-14 and Figure 5-15.
Figure 5-14 In-flight report 1 of 2