Practices for Lesson 6: Overview
Practice 6-1: Investigating Server Problems
Scenario
Soon after the latest version of an application is deployed, users report they are unable to access the application. You investigate.
Overview
In this practice, you configure server overload conditions, monitor threads, create a thread dump, and analyze a thread dump.
Assumptions
You completed “Practice 3-1: Harvesting Diagnostic Metrics.” All instances of WebLogic Server are running.
Tasks
1. Modify a server’s overload configuration so that the server fails if “too many” threads are stuck.
Note: The default is to never put a server into the FAILED state based on stuck threads.
a. Access the administration console and lock the configuration.
b. Locate and select server1.
c. Select the Configuration > Overload tabs.
d. Update the Stuck Thread Count to 5.
e. Also update the Max Stuck Thread Time to 30.
Note: The default is 600 seconds (10 minutes).
f. Save and activate the changes.
Note: Notice that the server needs to be restarted for the changes to take effect.
g. Shut down and restart server1. Wait for server1 to be running before continuing.
2. Run the setup script to deploy the latest version of the application.
a. Access host01. Open a Terminal window and run the setup script in the current practice directory.
$> cd /practices/tshoot/practice06-01
$> ./setup.sh
Note:
− The setup script undeploys the current version of the application with its deployment plan.
− The script also deletes the deployment plan file. If you run the script more than once, a message displays that the file cannot be removed (because it is no longer there).
− The script deploys a new version of the application. This version of the application:
  − Introduces some threading issues
  − No longer has a deployment plan
  − No longer contains an application-scoped diagnostic module
− The setup script also disables the system diagnostic module, by targeting it to nothing.
Copyright © 2014, Oracle and/or its affiliates. All rights reserved.
Practices for Lesson 6: Troubleshooting Servers
− Please ignore any messages about:
  − An insecure protocol being used to connect to the admin server
  − WLContext.close() being called in a different thread
3. Run a Grinder script to simulate users accessing the application.
a. In the same Terminal window, from the current practice directory, run the script to call the Grinder.
$> ./rungrinder.sh
b. Let the Grinder run for about a minute before continuing.
Note: Do not wait longer than two minutes or you may have to run the Grinder again.
c. Do not close the Terminal window.
d. Minimize the VNC Viewer so that you can use it again later.
4. Check the status of server1 and its threads.
a. Access the admin console. Navigate to the Servers table. You should see server1 is in the FAILED state.
Note: If it is not, wait a little while and refresh the web browser.
b. Click the name of server1. Click the Monitoring > Threads tabs.
c. In the Self-Tuning Thread Pool Threads table, you should see some threads that are
“stuck” (the Stuck column is true). Also, you should see threads that are “hoggers” (the
Hogger column is true). You may have to scroll to see these threads.
Note: “Hogger” threads are suspicious, but have not yet been busy long enough to be
“stuck.” After they are busy long enough to be “stuck,” the “hogger” flag is not reset (it remains true).
d. Click the Dump Thread Stacks button.
e. Scroll down and view some of the threads that are marked as hogger or stuck. You should also see some threads that are “blocked.”
f. Notice at the top of each stack trace, it always seems to show the same method:
stcurr.DataAccess.getConnection()
Note: Because all the threads that are blocked and stuck seem to have this method in
common, that is where developers should start looking to resolve the problem.
Tip: If you do not see any stuck threads in the thread stack, you waited too long to
press the Dump Thread Stacks button. If that is the case, you need to kill the Grinder processes and run the Grinder script again.
− Use the admin console to shut down and restart server1.
− Open a new Terminal window, navigate to the current practice directory and run the script killgrinder.sh.
$> cd /practices/tshoot/practice06-01
$> ./killgrinder.sh
− Return to the window in which you ran the Grinder script and run it again.
− Try this task again: access the admin console, select server1, click the Monitoring > Threads tabs, look for stuck threads, click the Dump Thread Stacks button, and look through the thread dump.
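Note: The observation that the blocked and stuck threads share one method can also be checked from the command line once a dump is available as text. The following is a minimal sketch, assuming the dump has been written to the server log or saved to another plain-text file. The function name count_frames is made up for illustration; only the standard grep options -c and -F are used.

```shell
# Hypothetical helper: count the lines in a file that mention a given method.
# -F treats the method name as a fixed string (so "." is a literal dot),
# -c prints the number of matching lines instead of the lines themselves.
count_frames() {
  local method="$1" file="$2"
  grep -cF -- "$method" "$file"
}

# Example call (the log path is the one used in this practice):
# count_frames 'stcurr.DataAccess.getConnection' \
#   /u01/domains/tshoot/wlsadmin/servers/server1/logs/server1.log
```

A count that is close to the number of blocked and stuck threads supports the conclusion that this method is the common point of contention.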
5. Thread dumps go to the server log file, but you might also want to save them to their own file. Save a thread dump to a file by using the HotSpot JVM utility called jstack.
a. Return to the VNC Viewer for host01.
Oracle University and In Motion Servicios S.A. use only
b. Open a new Terminal window.
c. Navigate to where the jstack utility resides, the bin directory under the JDK.
$> cd /u01/app/jdk/bin
d. Find the process ID of server1 by using the ps (process status) command:
$> ps -u oracle -o pid,args | grep weblogic.Server
Note:
− The -u option means to show only processes owned by the users listed (oracle).
− The -o option is the format desired (the pid followed by the command args).
− Using the pipe followed by grep weblogic.Server means to send the output of the ps command to grep and only show those results that contain the string “weblogic.Server” (notice that Server starts with a capital “S”).
e. Look through the output. You should have three items. One is the admin server, one is server1, and the last one is the grep command itself. You can tell which WebLogic Server is which by the option within the arguments that lists the server’s name:
-Dweblogic.Name=server1
f. Note the number at the start of the server1 item. That is the PID of server1.
g. In the same Terminal window, run the jstack utility with that PID. Normally it prints the thread dump to the Terminal window. Redirect it to a file to keep it.
$> ./jstack nnnnn > /home/oracle/server1_threads.txt
Note: Replace nnnnn with the PID of server1.
h. View the thread dump file by using the gedit editor.
$> gedit /home/oracle/server1_threads.txt
i. Try searching in the editor (Search > Find or Ctrl + F) for these strings:
stuck
blocked
Note: If you do not see any threads that are stuck or blocked, it may be that too much time has passed since the Grinder script accessed the “bad” application.
j. Exit the editor.
k. Close the Terminal window.
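Note: The manual steps in this task (reading the ps output for the server1 PID, then looking through the saved dump) can be sketched as two small shell helpers. This is an illustration under stated assumptions, not part of the practice scripts: the function names wls_pid and top_frames are hypothetical, and top_frames assumes jstack-style text in which a thread header line starts with a double quote and each stack frame is an indented line beginning with "at".

```shell
# Hypothetical helper: print the PID of the WebLogic Server process whose
# -Dweblogic.Name argument equals $1. Reads "pid args" lines on stdin,
# for example from: ps -u oracle -o pid,args
wls_pid() {
  awk -v name="$1" '
    /weblogic\.Server/ {
      # The grep process also contains "weblogic.Server" but has no
      # -Dweblogic.Name argument, so it is never selected.
      for (i = 2; i <= NF; i++)
        if ($i == "-Dweblogic.Name=" name) { print $1; exit }
    }
  '
}

# Hypothetical helper: count how often each method is the top frame of a
# thread in a jstack-style dump (files given as arguments, or stdin).
top_frames() {
  awk '
    /^"/ { want = 1; next }          # thread header: next frame is the top one
    want && /^[ \t]+at / {
      sub(/^[ \t]+at /, "")          # drop the leading "at "
      count[$0]++
      want = 0
    }
    END { for (f in count) printf "%d %s\n", count[f], f }
  ' "$@" | sort -rn
}

# Usage sketch (paths are the ones used in this task):
# pid=$(ps -u oracle -o pid,args | wls_pid server1)
# /u01/app/jdk/bin/jstack "$pid" > /home/oracle/server1_threads.txt
# top_frames /home/oracle/server1_threads.txt | head -n 5
```

If the application still has the threading problem when the dump is taken, the most frequent top frame is likely to be the method that the blocked threads have in common.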
6. Look at a thread dump with a tool. You will use an open source tool called Samurai that looks through a server log file and displays thread dumps found in the log.
a. Still on host01, open a new Terminal window, and navigate to where the Samurai JAR file resides:
$> cd /install/samurai
b. Set up the PATH and CLASSPATH by running the setWLSEnv.sh script:
$> source /u01/app/fmw/wlserver/server/bin/setWLSEnv.sh
c. Run the Samurai JAR file to open the GUI tool:
$> java -jar samurai.jar
d. You can drag a server log file into Samurai by using the File Browser, navigating to the log file location, and dragging it from the File Browser into the Samurai window. Or you can use the Samurai File > Open menu options to navigate to and select the log file. The screenshot below is using the File menu. Remember, the log file is here:
/u01/domains/tshoot/wlsadmin/servers/server1/logs/server1.log
e. Samurai reads the log file and puts any thread dumps into the Thread Dumps tab. Select that tab. The information displayed starts out in Samurai’s Table View. Scroll to the bottom to see the legend.
f. Each column in the table represents a thread dump. You are interested in one with Blocked threads (red blocks). The thread dump of interest will be the last column (or the only column). Select one of the Blocked threads by clicking its red block.
Note: You may not have as many thread dumps (columns) in your table as shown in the screenshot. In fact, you may have only one.
g. This takes you to the Sequence View, to the part of this thread that has the problem. You may notice that even though a thread is “blocked” and waiting, it does not have to be stuck, as in this example.
Note: The “ACTIVE” and “BLOCKED” notations have been highlighted in yellow in the
screenshot.
h. To return to the Table view, scroll up and click the Table link, or click the Table View icon near the bottom of the screen.
Note: The icon has been highlighted in yellow in the screenshot.
i. Try selecting some other red blocks (blocked threads). Can you find any that are stuck?
Tip: Scroll down and click one of the lower numbered ones. Or select the Thread Dump view, and then use the Edit > Find options. In the Find window that opens at the bottom of the screen, enter STUCK, select Match Case, and click the Next button.
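Note: As a command-line complement to browsing in Samurai, the threads in a dump can be tallied by the tag in their names. The following is a minimal sketch, assuming that thread header lines contain a quoted name such as "[STUCK] ExecuteThread: ..." (the same uppercase tags that appear in Samurai). The function name thread_tags is hypothetical.

```shell
# Hypothetical helper: tally WebLogic execute threads by the tag in their name.
# Reads the files given as arguments, or stdin when no file is given.
thread_tags() {
  cat -- "$@" |
    grep -o '"\[[A-Z]*\] ExecuteThread' |          # keep only the quoted name prefix
    sed 's/^"\[\(.*\)\] ExecuteThread$/\1/' |      # reduce it to the bare tag
    sort | uniq -c |                               # count each tag
    awk '{ print $2, $1 }'                         # print "TAG count"
}

# Example call on the file saved with jstack earlier in this practice:
# thread_tags /home/oracle/server1_threads.txt
```

A nonzero STUCK count answers the question in the last step without clicking through each red block.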
7. Clean up.
a. Close the Samurai window.
b. Close the Terminal window where you started Samurai.
c. Stop the Grinder processes, if they are still running. Open a new Terminal window, and navigate to the current practice directory. Run the killgrinder.sh script.
$> cd /practices/tshoot/practice06-01
$> ./killgrinder.sh
Killed all Grinder client processes.
Note: If the Grinder client processes have already finished running, the script prints
out:
No Grinder client processes found.
d. Minimize the host01 VNC Viewer.
8. Change server1’s overload configuration. Shut down and restart server1.
Note: Although there were thread issues with the new version of the contacts application,
the overload configuration numbers you entered earlier might be too low, so you will update them before shutting down and restarting the server.
a. Access the admin console.
b. Navigate to the Servers table. You should see that server1 is still in the FAILED state. Even if its health may now be OK (the stuck threads are gone), the server was set to fail if too many threads got stuck. After the server fails, it stays FAILED until it is shut down and restarted.
c. Lock the configuration.
d. Select server1.
e. Select the Configuration > Overload tabs.
f. Update the Stuck Thread Count to 30.
g. Update the Max Stuck Thread Time to 300.
Note: Both of these are more realistic values.
h. Save and activate the changes.
Note: Notice that the server needs to be restarted for the changes to take effect.
i. Shut down and restart server1.
Note: It needs to be shut down and started again because it is in the FAILED state,
anyway.
j. Wait for it to return to the RUNNING state before continuing.
9. Return to a previous, working version of the application.
a. Return to the host01 VNC Viewer.
b. In a Terminal window, navigate to the current practice directory and run the
deploygood.sh script.
$> cd /practices/tshoot/practice06-01
$> ./deploygood.sh
Note: This script uses the same deploy_app.py WLST script as the setup.sh
script does. However, after it undeploys the bad version of the application, it copies the original version over the bad version in the domain’s apps directory and then deploys the original.
c. Close the Terminal window.
d. Exit the VNC Viewer.