FYI: I will be speaking about similar topics at P99Conf in Oct 2023 (online, free event)
There are plenty of JVM profilers out there, but before attaching a profiler, you can get a high-level overview by just sampling the Linux /proc/PID/comm and /proc/PID/stat fields to see which of the JVM’s threads are most active and whether they’re burning CPU or are blocked by some I/O.
When creating threads in Java, you can name them using the following syntax: Thread t = new Thread("MyThread") or thread.setName("Worker-123"). This name will show up in the comm field in Linux /proc (note that Linux truncates comm to 15 characters, so longer thread names look cut off in the outputs below). Some developers make good use of this observability feature.
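As a minimal sketch (the class name and the CPU-burning workload are made up for illustration), here’s how a named thread ends up with a matching comm value that OS-level tools can see:
// NamedThreadDemo.java - a minimal sketch: the Java-level thread name becomes
// the Linux comm value, visible in /proc/<pid>/task/<tid>/comm and in ps/psn
// output (truncated to 15 characters by the kernel).
public class NamedThreadDemo {
    public static void main(String[] args) throws Exception {
        Runnable work = () -> {
            long x = 0;
            for (long i = 0; i < 5_000_000_000L; i++) {   // burn some CPU for a while
                x += i;
            }
            System.out.println("done: " + x);
        };
        Thread t = new Thread(work, "Worker-123");   // name set via the constructor
        // (the name can also be set or changed later with t.setName("Worker-123"))
        t.start();
        t.join();
    }
}
While this runs, ps -Lo s,comm -p <pid> should show a Worker-123 line in the R state.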
Here’s an example from when I was generating a synthetic dataset for some Oracle SQL tuning experiments with SwingBench.
You can just run ps with the -L option (to show individual threads) on the JVM process and print out the task status (R, D, S, etc.) and its comm field, showing the thread name. I also aggregate the list with sort | uniq -c:
$ ps -Lo s,comm -p 18657 | sort | uniq -c | sort -nbr
25 S java
10 S C2 CompilerThre
5 S C1 CompilerThre
1 S VM Thread
1 S VM Periodic Tas
1 S UCP Clock
1 S Timer-0
1 S Thread-2
1 S Thread-0
1 S Signal Dispatch
1 S Service Thread
1 S Reference Handl
1 S oracle.ucp.acto
1 S oracle.jdbc.dri
1 S LOGON_28_Thread
1 S InterruptTimer
1 S Finalizer
1 S COMMAND
1 S Attach Listener
The above output is from JDK 8, so it has a bunch of unnamed system threads just called java. On JDK 11, almost all threads have a name set according to their role:
$ ps -Lo s,comm -p 14458 | sort | uniq -c | sort -nbr
4 S UCP-worker-thre
2 S java
1 S VM Thread
1 S VM Periodic Tas
1 S UCP Clock
1 S Timer-0
1 S Thread-1
1 S Thread-0
1 S Sweeper thread
1 S Signal Dispatch
1 S Service Thread
1 S Reference Handl
1 S oracle.ucp.acto
1 S oracle.jdbc.dri
1 S InterruptTimer
1 S GC Thread#9
1 S GC Thread#8
1 S GC Thread#7
...
1 S G1 Young RemSet
1 S G1 Refine#0
1 S G1 Main Marker
1 S G1 Conc#5
1 S G1 Conc#4
1 S G1 Conc#3
1 S G1 Conc#2
1 S G1 Conc#1
1 S G1 Conc#0
1 S Finalizer
1 S Common-Cleaner
On Linux you can use my pSnapper tool to sample the JVM process activity multiple times and report a better summary, grouped by similar threads with just a numeric difference in their names:
$ psn -p 14458
Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/stat for 5 seconds... finished.
=== Active Threads ===========================================
samples | avg_threads | comm | state
--------------------------------------------------------------
1002 | 10.02 | (ADDRESSES_*_Th) | Running (ON CPU)
454 | 4.54 | (ADDRESSES_*_Thr) | Running (ON CPU)
40 | 0.40 | (GC Thread#*) | Running (ON CPU)
2 | 0.02 | (UCP Clock) | Running (ON CPU)
1 | 0.01 | (VM Thread) | Running (ON CPU)
In the output above you see a bunch of ADDRESSES_* thread names; these were named by the application developer, so I know what kind of activity the application is mostly doing right now, with simple OS-level tools. (The two ADDRESSES_* lines are the same thread pool; names with one-digit and two-digit thread numbers just get truncated at different points within the 15-character comm field.) Additionally, since JDK 11 assigns names to garbage collection threads too, I can assess the intensity of current garbage collection from how much CPU these GC threads are using.
Apparently I have around 14.6 threads on CPU (10.02 + 4.54), used for synthetic address generation. The garbage collection activity used 0.4 CPUs in total.
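The thread pool in this test was named by the application code. As a hedged sketch of how you might produce such numbered thread names yourself (the ADDRESSES prefix just mirrors the output above and is not SwingBench’s actual code), a custom ThreadFactory works well:
// NamedPoolDemo.java - a minimal sketch: give every pool thread a numbered,
// descriptive name so that OS-level tools can group them, like psn does above.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class NamedPoolDemo {
    public static void main(String[] args) {
        AtomicInteger seq = new AtomicInteger();
        ThreadFactory namedThreads =
            r -> new Thread(r, "ADDRESSES_" + seq.incrementAndGet() + "_Thread");

        ExecutorService pool = Executors.newFixedThreadPool(16, namedThreads);
        for (int i = 0; i < 16; i++) {
            pool.submit(() -> {
                long x = 0;
                for (long j = 0; j < 2_000_000_000L; j++) x += j;   // burn CPU
                return x;
            });
        }
        pool.shutdown();
    }
}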
You can make pSnapper list exact comm names without replacing digits too:
$ psn -p 14458 -g comm2,state
Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/stat for 5 seconds... finished.
=== Active Threads ===========================================
samples | avg_threads | comm2 | state
--------------------------------------------------------------
98 | 0.98 | (ADDRESSES_12_Th) | Running (ON CPU)
98 | 0.98 | (ADDRESSES_21_Th) | Running (ON CPU)
96 | 0.96 | (ADDRESSES_9_Thr) | Running (ON CPU)
95 | 0.95 | (ADDRESSES_2_Thr) | Running (ON CPU)
95 | 0.95 | (ADDRESSES_4_Thr) | Running (ON CPU)
94 | 0.94 | (ADDRESSES_31_Th) | Running (ON CPU)
94 | 0.94 | (ADDRESSES_3_Thr) | Running (ON CPU)
94 | 0.94 | (ADDRESSES_5_Thr) | Running (ON CPU)
93 | 0.93 | (ADDRESSES_23_Th) | Running (ON CPU)
92 | 0.92 | (ADDRESSES_17_Th) | Running (ON CPU)
91 | 0.91 | (ADDRESSES_22_Th) | Running (ON CPU)
91 | 0.91 | (ADDRESSES_25_Th) | Running (ON CPU)
91 | 0.91 | (ADDRESSES_26_Th) | Running (ON CPU)
91 | 0.91 | (ADDRESSES_27_Th) | Running (ON CPU)
91 | 0.91 | (ADDRESSES_28_Th) | Running (ON CPU)
87 | 0.87 | (ADDRESSES_29_Th) | Running (ON CPU)
2 | 0.02 | (GC Thread#10) | Running (ON CPU)
2 | 0.02 | (GC Thread#11) | Running (ON CPU)
2 | 0.02 | (GC Thread#8) | Running (ON CPU)
2 | 0.02 | (GC Thread#9) | Running (ON CPU)
1 | 0.01 | (GC Thread#2) | Running (ON CPU)
1 | 0.01 | (GC Thread#3) | Running (ON CPU)
1 | 0.01 | (GC Thread#4) | Running (ON CPU)
1 | 0.01 | (GC Thread#5) | Running (ON CPU)
1 | 0.01 | (GC Thread#6) | Running (ON CPU)
1 | 0.01 | (GC Thread#7) | Running (ON CPU)
This way we get a better idea of concurrency and scale of utilization - do we have just one GC thread burning CPU or do we have many threads using only a little.
If you want to see the executed binary name too, you can use the cmdline field in pSnapper (or the exe field in the always-on 0x.tools xcapture tool):
$ psn -p 14458 -G cmdline
Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/stat, cmdline for 5 seconds... finished.
=== Active Threads ===========================================================================================
samples | avg_threads | comm | state | cmdline
--------------------------------------------------------------------------------------------------------------
792 | 7.92 | (ADDRESSES_*_Th) | Running (ON CPU) | /usr/lib/jvm/adoptopenjdk-11-hotspot/bin/java
347 | 3.47 | (ADDRESSES_*_Thr) | Running (ON CPU) | /usr/lib/jvm/adoptopenjdk-11-hotspot/bin/java
206 | 2.06 | (CUSTOMERS_*_Th) | Running (ON CPU) | /usr/lib/jvm/adoptopenjdk-11-hotspot/bin/java
93 | 0.93 | (CUSTOMERS_*_Thr) | Running (ON CPU) | /usr/lib/jvm/adoptopenjdk-11-hotspot/bin/java
35 | 0.35 | (GC Thread#*) | Running (ON CPU) | /usr/lib/jvm/adoptopenjdk-11-hotspot/bin/java
6 | 0.06 | (C* CompilerThre) | Running (ON CPU) | /usr/lib/jvm/adoptopenjdk-11-hotspot/bin/java
1 | 0.01 | (Sweeper thread) | Running (ON CPU) | /usr/lib/jvm/adoptopenjdk-11-hotspot/bin/java
You can also add the syscall,wchan fields to the existing default ones, to print out which system calls and kernel wait locations (if any) the individual threads were in during sampling. Note that I’m using sudo, as modern Linux kernel versions block access (or mask data) when querying some /proc fields as an unprivileged user:
$ sudo psn -p 14458 -G syscall,wchan
Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/stat, syscall, wchan for 5 seconds... finished.
=== Active Threads =============================================================================
samples | avg_threads | comm | state | syscall | wchan
------------------------------------------------------------------------------------------------
1319 | 13.19 | (CUSTOMERS_*_Th) | Running (ON CPU) | [running] | 0
90 | 0.90 | (CUSTOMERS_*_Thr) | Running (ON CPU) | [running] | 0
76 | 0.76 | (ADDRESSES_*_Th) | Running (ON CPU) | [running] | 0
41 | 0.41 | (GC Thread#*) | Running (ON CPU) | [running] | 0
12 | 0.12 | (CUSTOMERS_*_Th) | Running (ON CPU) | read | wait_woken
4 | 0.04 | (GC Thread#*) | Running (ON CPU) | futex | futex_wait_queue_me
2 | 0.02 | (CUSTOMERS_*_Th) | Running (ON CPU) | futex | futex_wait_queue_me
1 | 0.01 | (ADDRESSES_*_Th) | Running (ON CPU) | [running] | wait_woken
1 | 0.01 | (ADDRESSES_*_Th) | Running (ON CPU) | read | wait_woken
1 | 0.01 | (CUSTOMERS_*_Th) | Running (ON CPU) | [running] | wait_woken
1 | 0.01 | (CUSTOMERS_*_Thr) | Running (ON CPU) | read | wait_woken
pSnapper only prints threads in states R and D by default, as these are the only states that contribute to the Linux system load average. For the full picture of all task states, you can use the -a option:
$ sudo psn -p 14458 -G syscall,wchan -a
Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/wchan, syscall, stat for 5 seconds... finished.
=== Active Threads ==================================================================================
samples | avg_threads | comm | state | syscall | wchan
-----------------------------------------------------------------------------------------------------
2263 | 22.63 | (GC Thread#*) | Sleep (Interruptible) | futex | futex_wait_queue_me
1411 | 14.11 | (CUSTOMERS_*_Th) | Running (ON CPU) | [running] | 0
600 | 6.00 | (G* Conc#*) | Sleep (Interruptible) | futex | futex_wait_queue_me
400 | 4.00 | (UCP-worker-thre) | Sleep (Interruptible) | futex | futex_wait_queue_me
200 | 2.00 | (C* CompilerThre) | Sleep (Interruptible) | futex | futex_wait_queue_me
200 | 2.00 | (java) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (Common-Cleaner) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (Finalizer) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (G* Main Marker) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (G* Refine#*) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (G* Young RemSet) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (InterruptTimer) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (Reference Handl) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (Service Thread) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (Signal Dispatch) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (Sweeper thread) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (Thread-*) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (Thread-*) | Sleep (Interruptible) | read | pipe_wait
100 | 1.00 | (Timer-*) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (UCP Clock) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (oracle.jdbc.dri) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (oracle.ucp.acto) | Sleep (Interruptible) | futex | futex_wait_queue_me
99 | 0.99 | (VM Thread) | Sleep (Interruptible) | futex | futex_wait_queue_me
98 | 0.98 | (VM Periodic Tas) | Sleep (Interruptible) | futex | futex_wait_queue_me
95 | 0.95 | (CUSTOMERS_*_Thr) | Running (ON CPU) | [running] | 0
48 | 0.48 | (CUSTOMERS_*_Th) | Sleep (Interruptible) | read | wait_woken
31 | 0.31 | (GC Thread#*) | Running (ON CPU) | [running] | 0
15 | 0.15 | (CUSTOMERS_*_Th) | Sleep (Interruptible) | futex | futex_wait_queue_me
11 | 0.11 | (CUSTOMERS_*_Th) | Running (ON CPU) | read | wait_woken
5 | 0.05 | (CUSTOMERS_*_Th) | Sleep (Interruptible) | [running] | 0
5 | 0.05 | (CUSTOMERS_*_Th) | Sleep (Interruptible) | read | 0
4 | 0.04 | (CUSTOMERS_*_Th) | Running (ON CPU) | [running] | wait_woken
3 | 0.03 | (CUSTOMERS_*_Thr) | Sleep (Interruptible) | read | wait_woken
2 | 0.02 | (GC Thread#*) | Running (ON CPU) | [running] | futex_wait_queue_me
2 | 0.02 | (GC Thread#*) | Running (ON CPU) | futex | futex_wait_queue_me
1 | 0.01 | (CUSTOMERS_*_Th) | Running (ON CPU) | [running] | futex_wait_queue_me
1 | 0.01 | (CUSTOMERS_*_Thr) | Running (ON CPU) | [running] | wait_woken
1 | 0.01 | (CUSTOMERS_*_Thr) | Sleep (Interruptible) | futex | futex_wait_queue_me
1 | 0.01 | (GC Thread#*) | Sleep (Interruptible) | [running] | 0
1 | 0.01 | (GC Thread#*) | Sleep (Interruptible) | futex | 0
1 | 0.01 | (VM Periodic Tas) | Running (ON CPU) | futex | futex_wait_queue_me
1 | 0.01 | (VM Periodic Tas) | Sleep (Interruptible) | [running] | futex_wait_queue_me
1 | 0.01 | (VM Thread) | Sleep (Interruptible) | [running] | 0
Most of the threads are just idle in Sleep mode, waiting for wakeup using the Linux futex (fast userspace mutex) mechanism.
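To connect this back to the Java side: parked or waiting Java threads end up in exactly this kind of futex sleep. Here’s a minimal sketch (the thread name and sleep duration are made up) that you could run and then observe with psn:
// ParkedThreadDemo.java - a minimal sketch: a parked Java thread typically shows
// up in "psn -a" output as Sleep (Interruptible) | futex | futex_wait_queue_me.
import java.util.concurrent.locks.LockSupport;

public class ParkedThreadDemo {
    public static void main(String[] args) throws Exception {
        Thread idle = new Thread(() -> {
            // park indefinitely; on Linux this wait ends up in the futex syscall
            LockSupport.park();
        }, "Idle-Worker-1");
        idle.start();

        // keep the process alive for a minute so you can sample it, e.g.:
        //   sudo psn -p <pid> -G syscall,wchan -a
        Thread.sleep(60_000);
    }
}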
If you want to see a summary of all JVMs’ activity in your system (broken down by JVM instance), you can just use -p java to sample all java processes, but include pid in the grouping list. I’m using -g instead of -G here, as I wanted to pick my own list of fields rather than just append to the default ones (comm, state):
$ psn -p java -g pid,comm,state
Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/stat for 5 seconds... finished.
=== Active Threads ===================================================
samples | avg_threads | pid | comm | state
----------------------------------------------------------------------
543 | 5.43 | 28578 | (ORDER_ITEMS_*) | Running (ON CPU)
174 | 1.74 | 28578 | (CARD_DETAILS_*) | Running (ON CPU)
149 | 1.49 | 28578 | (ORDERS_*_Threa) | Running (ON CPU)
58 | 0.58 | 28578 | (ORDERS_*_Thread) | Running (ON CPU)
19 | 0.19 | 28578 | (GC Thread#*) | Running (ON CPU)
11 | 0.11 | 9452 | (java) | Running (ON CPU)
1 | 0.01 | 7375 | (java) | Running (ON CPU)
When there’s disk/filesystem I/O involved, you can also list the corresponding file names, assuming that the system calls are simple enough (and operate on just one file descriptor at a time). The following example is from a dd command, not Java, but you get the point:
$ psn -p 18286 -G syscall,filename
Linux Process Snapper v0.14 by Tanel Poder [https://0x.tools]
Sampling /proc/stat, syscall for 5 seconds... finished.
=== Active Threads ==================================================================================
samples | avg_threads | comm | state | syscall | filename
-----------------------------------------------------------------------------------------------------
79 | 0.79 | (dd) | Disk (Uninterruptible) | write | /backup/tanel/test (stdout)
7 | 0.07 | (dd) | Disk (Uninterruptible) | [running] |
5 | 0.05 | (dd) | Running (ON CPU) | write | /backup/tanel/test (stdout)
4 | 0.04 | (dd) | Disk (Uninterruptible) | read | /reco/fio/mmapfile.0.0 (stdin)
3 | 0.03 | (dd) | Running (ON CPU) | [running] |
2 | 0.02 | (dd) | Running (ON CPU) | read | /reco/fio/mmapfile.0.0 (stdin)
You can read more about Linux Process Snapper (pSnapper) at https://0x.tools.
While pSnapper is useful for troubleshooting currently ongoing Linux performance problems, you can use xcapture from my open source 0x.tools toolset for continuous sampling and logging of task states from /proc, and look into always-on profiling for production systems in general. Here’s one example of reporting any GC thread activity from the /proc samples taken by xcapture:
$ cat 2020-11-28.1[89].csv | grep '(GC' | awk -F, '{ printf("%-22s %s\n", $1, $5) }' | sort | uniq -c
22 2020-11-28 18:52:14.070 R
12 2020-11-28 18:55:13.320 R
15 2020-11-28 18:56:39.545 R
23 2020-11-28 18:56:55.072 R
5 2020-11-28 18:57:10.588 R
12 2020-11-28 18:57:18.876 R
23 2020-11-28 18:57:58.216 R
23 2020-11-28 18:58:15.823 R
23 2020-11-28 18:58:43.769 R
23 2020-11-28 18:59:10.711 R
22 2020-11-28 18:59:57.276 R
23 2020-11-28 19:00:52.124 R
23 2020-11-28 19:02:03.519 R
21 2020-11-28 19:02:06.629 R
23 2020-11-28 19:02:09.737 R
20 2020-11-28 19:02:15.940 R
23 2020-11-28 19:02:22.145 R
22 2020-11-28 19:03:21.169 R
12 2020-11-28 19:03:33.576 R
7 2020-11-28 19:03:56.356 R
2 2020-11-28 19:04:29.428 R
23 2020-11-28 19:05:07.727 R
20 2020-11-28 19:05:47.035 R
23 2020-11-28 19:07:52.146 R
20 2020-11-28 19:07:57.320 R
15 2020-11-28 19:10:04.538 R
18 2020-11-28 19:11:10.884 R
21 2020-11-28 19:11:16.083 R
12 2020-11-28 19:12:12.107 R
xcapture samples all threads in the system at 1 Hz (by default). It doesn’t capture every microsecond when a thread may have briefly been on CPU (or waited for I/O), and some background garbage collection CPU activity is not necessarily a problem. Nevertheless, you can assess the “intensity” of GC activity and plot it over time using just task state sampling at the OS level, without needing monitoring agents or attaching to the JVM with a profiler.
So, hopefully you now have one more tool in your performance toolset. And with 0x.tools, you can have this stuff always on for every process in your system, providing a good generic starting point for performance diagnosis drilldown whenever some intermittent hiccups happen.
If you need to do proper performance diagnosis requiring deeper visibility into your application, a JVM profiler is a good next step. My favorite tool for this is async-profiler, or sometimes even just the jstack thread dumper bundled with the JDK.
It is also possible to use Linux perf for sampling JVM process activity, but there’s some extra complexity around creating .map files on the fly; I haven’t gotten that far yet in my toolset. I’m currently using perf at a 1 Hz sampling rate for getting an “always on” history of on-CPU activity. It works for regular compiled programs, but it doesn’t show Java method names yet.
If you haven’t seen 0x.tools for always-on profiling of production systems yet, check it out at https://0x.tools.
Thanks for reading!