October 9th, 2013

How to find which thread in a Java application (Tomcat) is eating up all your CPU

So here's my idea....
1- Use top to see all the java threads sorted by CPU utilization (the -H flag, or capital H in interactive mode, displays individual threads).  In threads mode, top shows the LWP id (the thread id) of each thread in the PID column, not the PID of the owning process.

Example (from top data):

[root@host ~]# top -H -n 3 -b |grep tomcat | grep java | sort -rn -k 9 | head -1
6638 tomcat    20   0 10.5g 2.9g  12m S  98.9 39.2   0:03.16 java                                            

2- Use ps -L -utomcat and grep for the LWP id to get the PID of the tomcat process that owns the thread.

Example: I'm grepping for the LWP id.   The first number is the PID and the second is the LWP id:

[root@host ~]#  ps -L -utomcat |grep java | grep 6638
27628  6638 ?        00:00:03 java

3- So now I have the PID of the java process and the LWP id of the bad thread.  I can take a thread dump of the java process.  The thread dump records each thread's LWP id as the NID (nid=0x...), in hex.  So once the hex NID is converted to decimal, it can be matched against the LWP id from top and ps.
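
The hex/decimal conversion can be done straight from the shell.  This is only a minimal sketch, assuming a POSIX printf; the function names lwp_to_nid and nid_to_lwp are made up for illustration and are not part of any tool:

```shell
# Decimal LWP id (as shown by top/ps) -> hex nid (as shown in the thread dump).
lwp_to_nid() {
  printf '0x%x\n' "$1"
}

# Hex nid from the thread dump -> decimal LWP id.
nid_to_lwp() {
  printf '%d\n' "$1"
}

lwp_to_nid 6638     # prints 0x19ee
nid_to_lwp 0x19ee   # prints 6638
```

So the thread with LWP id 6638 from the top output should appear in the thread dump as nid=0x19ee.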

Send SIGQUIT to the process with kill -s SIGQUIT $tomcat_pid to force a thread dump (for tomcat, the dump is written to catalina.out).  The process keeps running after the dump.

[root@host ~]# kill -s SIGQUIT 27628

Then convert all the NIDs to LWP ids with a quick perl script that converts the hex values to decimal (which I saved as /tmp/convert-nid-to-lwp.pl).

[root@host ~]# cat /usr/local/tomcat/default/logs/catalina.out.thread.dump.2013-10-09--13-54-49 | /tmp/convert-nid-to-lwp.pl > /usr/local/tomcat/default/logs/catalina.out.thread.dump.2013-10-09--13-54-49-nlwp
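
The perl script itself isn't shown here, so the following is only a hypothetical stand-in written in plain shell, not the original script.  It assumes the dump marks each thread with nid=0x followed by hex digits, and it appends the decimal value as lwp=... (a label made up for this sketch) instead of replacing the NID.  The sample line fed to it is invented for illustration:

```shell
# Hypothetical stand-in for /tmp/convert-nid-to-lwp.pl (original not shown).
# Reads a thread dump on stdin and appends the decimal LWP id after the
# first hex nid found on each line; other lines pass through unchanged.
convert_nid_to_lwp() {
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      *nid=0x*)
        hex=${line#*nid=0x}             # text after the first "nid=0x"
        hex=${hex%%[!0-9a-fA-F]*}       # keep only the leading hex digits
        if [ -n "$hex" ]; then
          before=${line%%nid=0x*}       # text before the first "nid=0x"
          after=${line#*nid=0x"$hex"}   # text after the hex digits
          line="${before}nid=0x${hex} lwp=$(printf '%d' "0x$hex")${after}"
        fi
        ;;
    esac
    printf '%s\n' "$line"
  done
}

# Made-up thread header line, just to show the effect.
printf '%s\n' '"http-bio-8080-exec-1" daemon prio=10 tid=0x00007f3c4c0a1000 nid=0x19ee runnable' |
  convert_nid_to_lwp
```

Keeping the original nid next to the decimal value means the converted file can still be searched either way.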

4- So now I have a thread dump with all the threads tagged by decimal LWP id.  I find my LWP id (6638 in this example) in the thread dump and I've got a stack trace of the thread that's eating up all the CPU.
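
The lookup can also be done against the raw dump, without converting it first.  A rough sketch, assuming each thread entry starts with a header line containing the hex nid followed by a space and ends at the next blank line; show_thread_stack is a made-up helper name:

```shell
# Print the stack of one thread from a raw (unconverted) thread dump.
# $1 = decimal LWP id from top/ps, $2 = thread dump file.
show_thread_stack() {
  nid=$(printf 'nid=0x%x' "$1")
  awk -v nid="$nid" '
    index($0, nid " ") { found = 1 }   # header line of the wanted thread
    found && NF == 0   { exit }        # blank line ends that thread entry
    found              { print }
  ' "$2"
}
```

With the numbers from this example, that would be called as show_thread_stack 6638 followed by the path of the saved thread dump.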