Measuring Java JVM thread activity in Linux with task state sampling

2020-11-28


There are plenty of JVM profilers out there, but before attaching a profiler, you can get a high-level overview by just sampling each thread's comm and stat fields under Linux /proc to see which JVM threads are most active and whether they're burning CPU or are blocked on some I/O.

When creating threads in Java, you can name them with Thread t = new Thread("MyThread") or thread.setName("Worker-123"). The name shows up in the comm field in Linux /proc, truncated to 15 characters (which is why some thread names in the outputs below look cut off). Some developers make good use of this observability feature.
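To make the mechanism concrete, here is a minimal sketch of my own (hypothetical class and thread names, assuming JDK 11+ on Linux - not code from any real application): it names a thread and then reads each task's comm and state directly from /proc/self/task, the same fields that ps and pSnapper sample below.

import java.nio.file.*;

public class TaskStateSample {
    public static void main(String[] args) throws Exception {
        // Name a thread via the constructor (setName() works the same way)
        // and keep it busy so it shows up in state R (Running).
        Thread worker = new Thread(() -> {
            while (true) { Thread.onSpinWait(); }
        }, "MyWorker-1");
        worker.setDaemon(true);
        worker.start();
        Thread.sleep(100);   // give the thread a moment to start

        // Every JVM thread is a Linux task under /proc/<pid>/task/<tid>;
        // its stat file looks like "<tid> (<comm>) <state> ...", with the
        // comm (thread name) truncated to 15 characters.
        try (DirectoryStream<Path> tasks =
                 Files.newDirectoryStream(Paths.get("/proc/self/task"))) {
            for (Path task : tasks) {
                String stat = Files.readString(task.resolve("stat"));
                int close = stat.lastIndexOf(')');   // comm itself may contain spaces
                String comm = stat.substring(stat.indexOf('(') + 1, close);
                char state = stat.charAt(close + 2); // R, S, D, ...
                System.out.printf("%-8s %c %s%n", task.getFileName(), state, comm);
            }
        }
    }
}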

Here’s an example from when I was generating a synthetic dataset for some Oracle SQL tuning experiments with SwingBench.

You can just run ps with the -L option (to show individual threads) on the JVM process and print out the task state (R, S, D, etc.) and its comm field, showing the thread name. I also aggregate the list with sort | uniq -c:

$ ps -Lo s,comm -p 18657 | sort | uniq -c | sort -nbr
     25 S java
     10 S C2 CompilerThre
      5 S C1 CompilerThre
      1 S VM Thread
      1 S VM Periodic Tas
      1 S UCP Clock
      1 S Timer-0
      1 S Thread-2
      1 S Thread-0
      1 S Signal Dispatch
      1 S Service Thread
      1 S Reference Handl
      1 S oracle.ucp.acto
      1 S oracle.jdbc.dri
      1 S LOGON_28_Thread
      1 S InterruptTimer
      1 S Finalizer
      1 S COMMAND
      1 S Attach Listener

The above output is from JDK 8, so it has a bunch of unnamed system threads just called java. On JDK 11, almost all threads have a name set according to their role:

$ ps -Lo s,comm -p 14458 | sort | uniq -c | sort -nbr
      4 S UCP-worker-thre
      2 S java
      1 S VM Thread
      1 S VM Periodic Tas
      1 S UCP Clock
      1 S Timer-0
      1 S Thread-1
      1 S Thread-0
      1 S Sweeper thread
      1 S Signal Dispatch
      1 S Service Thread
      1 S Reference Handl
      1 S oracle.ucp.acto
      1 S oracle.jdbc.dri
      1 S InterruptTimer
      1 S GC Thread#9
      1 S GC Thread#8
      1 S GC Thread#7
...
      1 S G1 Young RemSet
      1 S G1 Refine#0
      1 S G1 Main Marker
      1 S G1 Conc#5
      1 S G1 Conc#4
      1 S G1 Conc#3
      1 S G1 Conc#2
      1 S G1 Conc#1
      1 S G1 Conc#0
      1 S Finalizer
      1 S Common-Cleaner

On Linux you can use my pSnapper tool to sample the JVM process activity multiple times and report a better summary, grouping together threads whose names differ only by a number:

$ psn -p 14458

Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/stat for 5 seconds... finished.


=== Active Threads ===========================================

 samples | avg_threads | comm              | state            
--------------------------------------------------------------
    1002 |       10.02 | (ADDRESSES_*_Th)  | Running (ON CPU) 
     454 |        4.54 | (ADDRESSES_*_Thr) | Running (ON CPU) 
      40 |        0.40 | (GC Thread#*)     | Running (ON CPU) 
       2 |        0.02 | (UCP Clock)       | Running (ON CPU) 
       1 |        0.01 | (VM Thread)       | Running (ON CPU) 

In the output above you see a bunch of ADDRESSES_* thread names. These were set by the application developer, so I can tell what kind of activity the application is mostly doing right now - with simple OS-level tools. Additionally, since JDK 11 names the garbage collection threads too, I can assess the intensity of the current garbage collection from how much CPU these GC threads are using.

Apparently I have around 14.6 threads on CPU on average, used for synthetic address generation - the two ADDRESSES_* lines are really the same thread pool, just cut at different points by the 15-character comm limit depending on how many digits the thread number has. The garbage collection activity used about 0.4 CPUs in total.

You can also make pSnapper list the exact comm names, without collapsing the digits:

$ psn -p 14458 -g comm2,state

Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/stat for 5 seconds... finished.


=== Active Threads ===========================================

 samples | avg_threads | comm2             | state            
--------------------------------------------------------------
      98 |        0.98 | (ADDRESSES_12_Th) | Running (ON CPU) 
      98 |        0.98 | (ADDRESSES_21_Th) | Running (ON CPU) 
      96 |        0.96 | (ADDRESSES_9_Thr) | Running (ON CPU) 
      95 |        0.95 | (ADDRESSES_2_Thr) | Running (ON CPU) 
      95 |        0.95 | (ADDRESSES_4_Thr) | Running (ON CPU) 
      94 |        0.94 | (ADDRESSES_31_Th) | Running (ON CPU) 
      94 |        0.94 | (ADDRESSES_3_Thr) | Running (ON CPU) 
      94 |        0.94 | (ADDRESSES_5_Thr) | Running (ON CPU) 
      93 |        0.93 | (ADDRESSES_23_Th) | Running (ON CPU) 
      92 |        0.92 | (ADDRESSES_17_Th) | Running (ON CPU) 
      91 |        0.91 | (ADDRESSES_22_Th) | Running (ON CPU) 
      91 |        0.91 | (ADDRESSES_25_Th) | Running (ON CPU) 
      91 |        0.91 | (ADDRESSES_26_Th) | Running (ON CPU) 
      91 |        0.91 | (ADDRESSES_27_Th) | Running (ON CPU) 
      91 |        0.91 | (ADDRESSES_28_Th) | Running (ON CPU) 
      87 |        0.87 | (ADDRESSES_29_Th) | Running (ON CPU) 
       2 |        0.02 | (GC Thread#10)    | Running (ON CPU) 
       2 |        0.02 | (GC Thread#11)    | Running (ON CPU) 
       2 |        0.02 | (GC Thread#8)     | Running (ON CPU) 
       2 |        0.02 | (GC Thread#9)     | Running (ON CPU) 
       1 |        0.01 | (GC Thread#2)     | Running (ON CPU) 
       1 |        0.01 | (GC Thread#3)     | Running (ON CPU) 
       1 |        0.01 | (GC Thread#4)     | Running (ON CPU) 
       1 |        0.01 | (GC Thread#5)     | Running (ON CPU) 
       1 |        0.01 | (GC Thread#6)     | Running (ON CPU) 
       1 |        0.01 | (GC Thread#7)     | Running (ON CPU) 

This way we get a better idea of the concurrency and the scale of utilization - do we have just one GC thread burning CPU, or many threads each using only a little?

If you want to see the executed binary name too, you can use the cmdline field in pSnapper (or the exe field in the always-on 0x.tools xcapture tool):

$ psn -p 14458 -G cmdline

Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/stat, cmdline for 5 seconds... finished.


=== Active Threads ===========================================================================================

 samples | avg_threads | comm              | state            | cmdline                                       
--------------------------------------------------------------------------------------------------------------
     792 |        7.92 | (ADDRESSES_*_Th)  | Running (ON CPU) | /usr/lib/jvm/adoptopenjdk-11-hotspot/bin/java 
     347 |        3.47 | (ADDRESSES_*_Thr) | Running (ON CPU) | /usr/lib/jvm/adoptopenjdk-11-hotspot/bin/java 
     206 |        2.06 | (CUSTOMERS_*_Th)  | Running (ON CPU) | /usr/lib/jvm/adoptopenjdk-11-hotspot/bin/java 
      93 |        0.93 | (CUSTOMERS_*_Thr) | Running (ON CPU) | /usr/lib/jvm/adoptopenjdk-11-hotspot/bin/java 
      35 |        0.35 | (GC Thread#*)     | Running (ON CPU) | /usr/lib/jvm/adoptopenjdk-11-hotspot/bin/java 
       6 |        0.06 | (C* CompilerThre) | Running (ON CPU) | /usr/lib/jvm/adoptopenjdk-11-hotspot/bin/java 
       1 |        0.01 | (Sweeper thread)  | Running (ON CPU) | /usr/lib/jvm/adoptopenjdk-11-hotspot/bin/java 

You can also add the syscall and wchan fields to the existing default ones, to print out which system calls and kernel wait locations (if any) the individual threads were in during sampling. Note that I'm using sudo, as modern Linux kernel versions block access (or mask data) when querying some /proc fields as an unprivileged user:

$ sudo psn -p 14458 -G syscall,wchan

Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/stat, syscall, wchan for 5 seconds... finished.


=== Active Threads =============================================================================

 samples | avg_threads | comm              | state            | syscall   | wchan               
------------------------------------------------------------------------------------------------
    1319 |       13.19 | (CUSTOMERS_*_Th)  | Running (ON CPU) | [running] | 0                   
      90 |        0.90 | (CUSTOMERS_*_Thr) | Running (ON CPU) | [running] | 0                   
      76 |        0.76 | (ADDRESSES_*_Th)  | Running (ON CPU) | [running] | 0                   
      41 |        0.41 | (GC Thread#*)     | Running (ON CPU) | [running] | 0                   
      12 |        0.12 | (CUSTOMERS_*_Th)  | Running (ON CPU) | read      | wait_woken          
       4 |        0.04 | (GC Thread#*)     | Running (ON CPU) | futex     | futex_wait_queue_me 
       2 |        0.02 | (CUSTOMERS_*_Th)  | Running (ON CPU) | futex     | futex_wait_queue_me 
       1 |        0.01 | (ADDRESSES_*_Th)  | Running (ON CPU) | [running] | wait_woken          
       1 |        0.01 | (ADDRESSES_*_Th)  | Running (ON CPU) | read      | wait_woken          
       1 |        0.01 | (CUSTOMERS_*_Th)  | Running (ON CPU) | [running] | wait_woken          
       1 |        0.01 | (CUSTOMERS_*_Thr) | Running (ON CPU) | read      | wait_woken          

pSnapper only prints threads in the R and D states by default, as these are the only states that contribute to the Linux system load average. For the full picture of all task states, you can use the -a option:

$ sudo psn -p 14458 -G syscall,wchan -a

Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/wchan, syscall, stat for 5 seconds... finished.


=== Active Threads ==================================================================================

 samples | avg_threads | comm              | state                 | syscall   | wchan               
-----------------------------------------------------------------------------------------------------
    2263 |       22.63 | (GC Thread#*)     | Sleep (Interruptible) | futex     | futex_wait_queue_me 
    1411 |       14.11 | (CUSTOMERS_*_Th)  | Running (ON CPU)      | [running] | 0                   
     600 |        6.00 | (G* Conc#*)       | Sleep (Interruptible) | futex     | futex_wait_queue_me 
     400 |        4.00 | (UCP-worker-thre) | Sleep (Interruptible) | futex     | futex_wait_queue_me 
     200 |        2.00 | (C* CompilerThre) | Sleep (Interruptible) | futex     | futex_wait_queue_me 
     200 |        2.00 | (java)            | Sleep (Interruptible) | futex     | futex_wait_queue_me 
     100 |        1.00 | (Common-Cleaner)  | Sleep (Interruptible) | futex     | futex_wait_queue_me 
     100 |        1.00 | (Finalizer)       | Sleep (Interruptible) | futex     | futex_wait_queue_me 
     100 |        1.00 | (G* Main Marker)  | Sleep (Interruptible) | futex     | futex_wait_queue_me 
     100 |        1.00 | (G* Refine#*)     | Sleep (Interruptible) | futex     | futex_wait_queue_me 
     100 |        1.00 | (G* Young RemSet) | Sleep (Interruptible) | futex     | futex_wait_queue_me 
     100 |        1.00 | (InterruptTimer)  | Sleep (Interruptible) | futex     | futex_wait_queue_me 
     100 |        1.00 | (Reference Handl) | Sleep (Interruptible) | futex     | futex_wait_queue_me 
     100 |        1.00 | (Service Thread)  | Sleep (Interruptible) | futex     | futex_wait_queue_me 
     100 |        1.00 | (Signal Dispatch) | Sleep (Interruptible) | futex     | futex_wait_queue_me 
     100 |        1.00 | (Sweeper thread)  | Sleep (Interruptible) | futex     | futex_wait_queue_me 
     100 |        1.00 | (Thread-*)        | Sleep (Interruptible) | futex     | futex_wait_queue_me 
     100 |        1.00 | (Thread-*)        | Sleep (Interruptible) | read      | pipe_wait           
     100 |        1.00 | (Timer-*)         | Sleep (Interruptible) | futex     | futex_wait_queue_me 
     100 |        1.00 | (UCP Clock)       | Sleep (Interruptible) | futex     | futex_wait_queue_me 
     100 |        1.00 | (oracle.jdbc.dri) | Sleep (Interruptible) | futex     | futex_wait_queue_me 
     100 |        1.00 | (oracle.ucp.acto) | Sleep (Interruptible) | futex     | futex_wait_queue_me 
      99 |        0.99 | (VM Thread)       | Sleep (Interruptible) | futex     | futex_wait_queue_me 
      98 |        0.98 | (VM Periodic Tas) | Sleep (Interruptible) | futex     | futex_wait_queue_me 
      95 |        0.95 | (CUSTOMERS_*_Thr) | Running (ON CPU)      | [running] | 0                   
      48 |        0.48 | (CUSTOMERS_*_Th)  | Sleep (Interruptible) | read      | wait_woken          
      31 |        0.31 | (GC Thread#*)     | Running (ON CPU)      | [running] | 0                   
      15 |        0.15 | (CUSTOMERS_*_Th)  | Sleep (Interruptible) | futex     | futex_wait_queue_me 
      11 |        0.11 | (CUSTOMERS_*_Th)  | Running (ON CPU)      | read      | wait_woken          
       5 |        0.05 | (CUSTOMERS_*_Th)  | Sleep (Interruptible) | [running] | 0                   
       5 |        0.05 | (CUSTOMERS_*_Th)  | Sleep (Interruptible) | read      | 0                   
       4 |        0.04 | (CUSTOMERS_*_Th)  | Running (ON CPU)      | [running] | wait_woken          
       3 |        0.03 | (CUSTOMERS_*_Thr) | Sleep (Interruptible) | read      | wait_woken          
       2 |        0.02 | (GC Thread#*)     | Running (ON CPU)      | [running] | futex_wait_queue_me 
       2 |        0.02 | (GC Thread#*)     | Running (ON CPU)      | futex     | futex_wait_queue_me 
       1 |        0.01 | (CUSTOMERS_*_Th)  | Running (ON CPU)      | [running] | futex_wait_queue_me 
       1 |        0.01 | (CUSTOMERS_*_Thr) | Running (ON CPU)      | [running] | wait_woken          
       1 |        0.01 | (CUSTOMERS_*_Thr) | Sleep (Interruptible) | futex     | futex_wait_queue_me 
       1 |        0.01 | (GC Thread#*)     | Sleep (Interruptible) | [running] | 0                   
       1 |        0.01 | (GC Thread#*)     | Sleep (Interruptible) | futex     | 0                   
       1 |        0.01 | (VM Periodic Tas) | Running (ON CPU)      | futex     | futex_wait_queue_me 
       1 |        0.01 | (VM Periodic Tas) | Sleep (Interruptible) | [running] | futex_wait_queue_me 
       1 |        0.01 | (VM Thread)       | Sleep (Interruptible) | [running] | 0                   

Most of the threads are just idle, sleeping while waiting for a wakeup via the Linux futex (fast userspace mutex) mechanism.

If you want to see a summary of all JVMs' activity in your system (broken down by JVM instance), you can just use -p java to sample all java processes and include pid in the grouping list. I'm using -g instead of -G here as I want to pick my own list of fields, not just append to the default (comm, state) ones:

$ psn -p java -g pid,comm,state

Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/stat for 5 seconds... finished.


=== Active Threads ===================================================

 samples | avg_threads |   pid | comm              | state            
----------------------------------------------------------------------
     543 |        5.43 | 28578 | (ORDER_ITEMS_*)   | Running (ON CPU) 
     174 |        1.74 | 28578 | (CARD_DETAILS_*)  | Running (ON CPU) 
     149 |        1.49 | 28578 | (ORDERS_*_Threa)  | Running (ON CPU) 
      58 |        0.58 | 28578 | (ORDERS_*_Thread) | Running (ON CPU) 
      19 |        0.19 | 28578 | (GC Thread#*)     | Running (ON CPU) 
      11 |        0.11 |  9452 | (java)            | Running (ON CPU) 
       1 |        0.01 |  7375 | (java)            | Running (ON CPU) 

When there’s disk/filesystem I/O involved, you can also list the corresponding file names, assuming that the system calls are simple enough (and operate on just one file descriptor at a time). The following example is from a dd command, not Java, but you get the point:

$ psn -p 18286 -G syscall,filename

Linux Process Snapper v0.14 by Tanel Poder [https://0x.tools]
Sampling /proc/stat, syscall for 5 seconds... finished.


=== Active Threads ==================================================================================

 samples | avg_threads | comm | state                  | syscall   | filename                        
-----------------------------------------------------------------------------------------------------
      79 |        0.79 | (dd) | Disk (Uninterruptible) | write     | /backup/tanel/test (stdout)
       7 |        0.07 | (dd) | Disk (Uninterruptible) | [running] |                                 
       5 |        0.05 | (dd) | Running (ON CPU)       | write     | /backup/tanel/test (stdout)     
       4 |        0.04 | (dd) | Disk (Uninterruptible) | read      | /reco/fio/mmapfile.0.0 (stdin)  
       3 |        0.03 | (dd) | Running (ON CPU)       | [running] |                                 
       2 |        0.02 | (dd) | Running (ON CPU)       | read      | /reco/fio/mmapfile.0.0 (stdin)  

You can read more about Linux Process Snapper (pSnapper) here.

While pSnapper is useful for troubleshooting currently ongoing Linux performance problems, xcapture from my open source 0x.tools toolset does continuous sampling and logging of task states from /proc - worth looking into for always-on profiling of production systems in general.

Here’s one example of reporting any GC thread activity from the /proc samples taken by xcapture:

$ cat 2020-11-28.1[89].csv | grep '(GC' | awk -F, '{ printf("%-22s %s\n", $1, $5) }' | sort | uniq -c 
     22 2020-11-28 18:52:14.070 R
     12 2020-11-28 18:55:13.320 R
     15 2020-11-28 18:56:39.545 R
     23 2020-11-28 18:56:55.072 R
      5 2020-11-28 18:57:10.588 R
     12 2020-11-28 18:57:18.876 R
     23 2020-11-28 18:57:58.216 R
     23 2020-11-28 18:58:15.823 R
     23 2020-11-28 18:58:43.769 R
     23 2020-11-28 18:59:10.711 R
     22 2020-11-28 18:59:57.276 R
     23 2020-11-28 19:00:52.124 R
     23 2020-11-28 19:02:03.519 R
     21 2020-11-28 19:02:06.629 R
     23 2020-11-28 19:02:09.737 R
     20 2020-11-28 19:02:15.940 R
     23 2020-11-28 19:02:22.145 R
     22 2020-11-28 19:03:21.169 R
     12 2020-11-28 19:03:33.576 R
      7 2020-11-28 19:03:56.356 R
      2 2020-11-28 19:04:29.428 R
     23 2020-11-28 19:05:07.727 R
     20 2020-11-28 19:05:47.035 R
     23 2020-11-28 19:07:52.146 R
     20 2020-11-28 19:07:57.320 R
     15 2020-11-28 19:10:04.538 R
     18 2020-11-28 19:11:10.884 R
     21 2020-11-28 19:11:16.083 R
     12 2020-11-28 19:12:12.107 R

xcapture samples all threads in the system at 1 Hz (by default), so it won’t capture every microsecond-scale event where a thread was briefly on CPU (or briefly waited for I/O), and some background garbage collection CPU activity is not necessarily a problem anyway. Nevertheless, you can assess the “intensity” of GC activity and plot it over time using just task state sampling at the OS level - without monitoring agents or attaching a profiler to the JVM.

So, hopefully you now have one more tool in your performance toolset. And with 0x.tools, you can have this stuff always on for every process in your system, providing a good generic starting point for performance diagnosis drilldown whenever some intermittent hiccups happen.

If you need to do proper performance diagnosis requiring deeper visibility into your application, a JVM profiler is a good next step. My favorite tool for this is async-profiler, or sometimes even just the jstack thread dumper that ships with the JDK.

It is also possible to use Linux perf for sampling JVM process activity, but there’s some extra complexity around creating .map symbol files on the fly, and I haven’t gotten that far in my toolset yet. I’m currently using perf at a 1 Hz sampling rate to get an “always on” history of on-CPU activity. It works for regular compiled programs, but it doesn’t show Java method names yet.

If you haven’t seen 0x.tools for always-on profiling of production systems yet, check it out at https://0x.tools.

Thanks for reading!

