FYI: I will be speaking about similar topics at P99Conf in Oct 2023 (online, free event)
There are plenty of JVM profilers out there, but before attaching with a profiler, you can get a high-level overview by just sampling the Linux /proc/PID/comm and /proc/PID/stat fields to see which JVMs threads are most active and whether they’re burning CPU or are blocked by some I/O.
When creating threads in Java, you can name them using the following syntax: Thread t = new Thread("MyThread") or thread.setName("Worker-123"). This name will show up in the comm fields in Linux /proc. Some developers make good use of this observability feature.
Here’s an example from when I was generating a synthetic dataset for some Oracle SQL tuning experiments with SwingBench.
You can just run ps with -L option (to show individual threads) on the JVM process and print out task status (R,D,S,etc) and it’s comm field, showing the thread name. I also aggregate the list with sort | uniq -c:
$ ps -Lo s,comm -p 18657 | sort | uniq -c | sort -nbr
25 S java
10 S C2 CompilerThre
5 S C1 CompilerThre
1 S VM Thread
1 S VM Periodic Tas
1 S UCP Clock
1 S Timer-0
1 S Thread-2
1 S Thread-0
1 S Signal Dispatch
1 S Service Thread
1 S Reference Handl
1 S oracle.ucp.acto
1 S oracle.jdbc.dri
1 S LOGON_28_Thread
1 S InterruptTimer
1 S Finalizer
1 S COMMAND
1 S Attach Listener
The above output is from JDK 8, so it has a bunch of unnamed system threads just called java. On JDK 11, almost all threads have a name set according to their role:
$ ps -Lo s,comm -p 14458 | sort | uniq -c | sort -nbr
4 S UCP-worker-thre
2 S java
1 S VM Thread
1 S VM Periodic Tas
1 S UCP Clock
1 S Timer-0
1 S Thread-1
1 S Thread-0
1 S Sweeper thread
1 S Signal Dispatch
1 S Service Thread
1 S Reference Handl
1 S oracle.ucp.acto
1 S oracle.jdbc.dri
1 S InterruptTimer
1 S GC Thread#9
1 S GC Thread#8
1 S GC Thread#7
...
1 S G1 Young RemSet
1 S G1 Refine#0
1 S G1 Main Marker
1 S G1 Conc#5
1 S G1 Conc#4
1 S G1 Conc#3
1 S G1 Conc#2
1 S G1 Conc#1
1 S G1 Conc#0
1 S Finalizer
1 S Common-Cleaner
On Linux you can use my pSnapper tool to sample the JVM process activity multiple times and report a better summary, grouped by similar threads with just a numeric difference in their names:
$ psn -p 14458
Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/stat for 5 seconds... finished.
=== Active Threads ===========================================
samples | avg_threads | comm | state
--------------------------------------------------------------
1002 | 10.02 | (ADDRESSES_*_Th) | Running (ON CPU)
454 | 4.54 | (ADDRESSES_*_Thr) | Running (ON CPU)
40 | 0.40 | (GC Thread#*) | Running (ON CPU)
2 | 0.02 | (UCP Clock) | Running (ON CPU)
1 | 0.01 | (VM Thread) | Running (ON CPU)
In the output above you see a bunch of ADDRESSES_* thread names, these are named by the application developer, so I know what kind of activity the application is mostly doing right now - with simple OS-level tools. Additionally, since JDK 11 assigns names to garbage collection threads too, I can assess the intensity of current garbage collection from how much CPU these GC threads are using.
Apparently I have around 10.45 threads on CPU, used for synthetic address generation. The garbage collection activity used 0.4 CPUs in total.
You can make pSnapper list exact comm names without replacing digits too:
$ psn -p 14458 -g comm2,state
Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/stat for 5 seconds... finished.
=== Active Threads ===========================================
samples | avg_threads | comm2 | state
--------------------------------------------------------------
98 | 0.98 | (ADDRESSES_12_Th) | Running (ON CPU)
98 | 0.98 | (ADDRESSES_21_Th) | Running (ON CPU)
96 | 0.96 | (ADDRESSES_9_Thr) | Running (ON CPU)
95 | 0.95 | (ADDRESSES_2_Thr) | Running (ON CPU)
95 | 0.95 | (ADDRESSES_4_Thr) | Running (ON CPU)
94 | 0.94 | (ADDRESSES_31_Th) | Running (ON CPU)
94 | 0.94 | (ADDRESSES_3_Thr) | Running (ON CPU)
94 | 0.94 | (ADDRESSES_5_Thr) | Running (ON CPU)
93 | 0.93 | (ADDRESSES_23_Th) | Running (ON CPU)
92 | 0.92 | (ADDRESSES_17_Th) | Running (ON CPU)
91 | 0.91 | (ADDRESSES_22_Th) | Running (ON CPU)
91 | 0.91 | (ADDRESSES_25_Th) | Running (ON CPU)
91 | 0.91 | (ADDRESSES_26_Th) | Running (ON CPU)
91 | 0.91 | (ADDRESSES_27_Th) | Running (ON CPU)
91 | 0.91 | (ADDRESSES_28_Th) | Running (ON CPU)
87 | 0.87 | (ADDRESSES_29_Th) | Running (ON CPU)
2 | 0.02 | (GC Thread#10) | Running (ON CPU)
2 | 0.02 | (GC Thread#11) | Running (ON CPU)
2 | 0.02 | (GC Thread#8) | Running (ON CPU)
2 | 0.02 | (GC Thread#9) | Running (ON CPU)
1 | 0.01 | (GC Thread#2) | Running (ON CPU)
1 | 0.01 | (GC Thread#3) | Running (ON CPU)
1 | 0.01 | (GC Thread#4) | Running (ON CPU)
1 | 0.01 | (GC Thread#5) | Running (ON CPU)
1 | 0.01 | (GC Thread#6) | Running (ON CPU)
1 | 0.01 | (GC Thread#7) | Running (ON CPU)
This way we get a better idea of concurrency and scale of utilization - do we have just one GC thread burning CPU or do we have many threads using only a little.
If you want to see the executed binary name too, you can use cmdline field in pSnapper (or exe field in the always-on 0x.tools xcapture tool:
$ psn -p 14458 -G cmdline
Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/stat, cmdline for 5 seconds... finished.
=== Active Threads ===========================================================================================
samples | avg_threads | comm | state | cmdline
--------------------------------------------------------------------------------------------------------------
792 | 7.92 | (ADDRESSES_*_Th) | Running (ON CPU) | /usr/lib/jvm/adoptopenjdk-11-hotspot/bin/java
347 | 3.47 | (ADDRESSES_*_Thr) | Running (ON CPU) | /usr/lib/jvm/adoptopenjdk-11-hotspot/bin/java
206 | 2.06 | (CUSTOMERS_*_Th) | Running (ON CPU) | /usr/lib/jvm/adoptopenjdk-11-hotspot/bin/java
93 | 0.93 | (CUSTOMERS_*_Thr) | Running (ON CPU) | /usr/lib/jvm/adoptopenjdk-11-hotspot/bin/java
35 | 0.35 | (GC Thread#*) | Running (ON CPU) | /usr/lib/jvm/adoptopenjdk-11-hotspot/bin/java
6 | 0.06 | (C* CompilerThre) | Running (ON CPU) | /usr/lib/jvm/adoptopenjdk-11-hotspot/bin/java
1 | 0.01 | (Sweeper thread) | Running (ON CPU) | /usr/lib/jvm/adoptopenjdk-11-hotspot/bin/java
You can also add syscall,wchan fields to the existing default ones, to print out in which system calls and kernel wait locations (if any) the individual threads were during sampling. Note that I’m using sudo as modern Linux kernel versions block access (or mask data) when querying some /proc fields as an unprivileged user:
$ sudo psn -p 14458 -G syscall,wchan
Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/stat, syscall, wchan for 5 seconds... finished.
=== Active Threads =============================================================================
samples | avg_threads | comm | state | syscall | wchan
------------------------------------------------------------------------------------------------
1319 | 13.19 | (CUSTOMERS_*_Th) | Running (ON CPU) | [running] | 0
90 | 0.90 | (CUSTOMERS_*_Thr) | Running (ON CPU) | [running] | 0
76 | 0.76 | (ADDRESSES_*_Th) | Running (ON CPU) | [running] | 0
41 | 0.41 | (GC Thread#*) | Running (ON CPU) | [running] | 0
12 | 0.12 | (CUSTOMERS_*_Th) | Running (ON CPU) | read | wait_woken
4 | 0.04 | (GC Thread#*) | Running (ON CPU) | futex | futex_wait_queue_me
2 | 0.02 | (CUSTOMERS_*_Th) | Running (ON CPU) | futex | futex_wait_queue_me
1 | 0.01 | (ADDRESSES_*_Th) | Running (ON CPU) | [running] | wait_woken
1 | 0.01 | (ADDRESSES_*_Th) | Running (ON CPU) | read | wait_woken
1 | 0.01 | (CUSTOMERS_*_Th) | Running (ON CPU) | [running] | wait_woken
1 | 0.01 | (CUSTOMERS_*_Thr) | Running (ON CPU) | read | wait_woken
pSnapper only prints threads in states R,D by default as these are the only states that contribute to linux system load average. For the full picture of all task states, you can use the -a option:
$ sudo psn -p 14458 -G syscall,wchan -a
Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/wchan, syscall, stat for 5 seconds... finished.
=== Active Threads ==================================================================================
samples | avg_threads | comm | state | syscall | wchan
-----------------------------------------------------------------------------------------------------
2263 | 22.63 | (GC Thread#*) | Sleep (Interruptible) | futex | futex_wait_queue_me
1411 | 14.11 | (CUSTOMERS_*_Th) | Running (ON CPU) | [running] | 0
600 | 6.00 | (G* Conc#*) | Sleep (Interruptible) | futex | futex_wait_queue_me
400 | 4.00 | (UCP-worker-thre) | Sleep (Interruptible) | futex | futex_wait_queue_me
200 | 2.00 | (C* CompilerThre) | Sleep (Interruptible) | futex | futex_wait_queue_me
200 | 2.00 | (java) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (Common-Cleaner) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (Finalizer) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (G* Main Marker) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (G* Refine#*) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (G* Young RemSet) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (InterruptTimer) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (Reference Handl) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (Service Thread) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (Signal Dispatch) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (Sweeper thread) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (Thread-*) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (Thread-*) | Sleep (Interruptible) | read | pipe_wait
100 | 1.00 | (Timer-*) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (UCP Clock) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (oracle.jdbc.dri) | Sleep (Interruptible) | futex | futex_wait_queue_me
100 | 1.00 | (oracle.ucp.acto) | Sleep (Interruptible) | futex | futex_wait_queue_me
99 | 0.99 | (VM Thread) | Sleep (Interruptible) | futex | futex_wait_queue_me
98 | 0.98 | (VM Periodic Tas) | Sleep (Interruptible) | futex | futex_wait_queue_me
95 | 0.95 | (CUSTOMERS_*_Thr) | Running (ON CPU) | [running] | 0
48 | 0.48 | (CUSTOMERS_*_Th) | Sleep (Interruptible) | read | wait_woken
31 | 0.31 | (GC Thread#*) | Running (ON CPU) | [running] | 0
15 | 0.15 | (CUSTOMERS_*_Th) | Sleep (Interruptible) | futex | futex_wait_queue_me
11 | 0.11 | (CUSTOMERS_*_Th) | Running (ON CPU) | read | wait_woken
5 | 0.05 | (CUSTOMERS_*_Th) | Sleep (Interruptible) | [running] | 0
5 | 0.05 | (CUSTOMERS_*_Th) | Sleep (Interruptible) | read | 0
4 | 0.04 | (CUSTOMERS_*_Th) | Running (ON CPU) | [running] | wait_woken
3 | 0.03 | (CUSTOMERS_*_Thr) | Sleep (Interruptible) | read | wait_woken
2 | 0.02 | (GC Thread#*) | Running (ON CPU) | [running] | futex_wait_queue_me
2 | 0.02 | (GC Thread#*) | Running (ON CPU) | futex | futex_wait_queue_me
1 | 0.01 | (CUSTOMERS_*_Th) | Running (ON CPU) | [running] | futex_wait_queue_me
1 | 0.01 | (CUSTOMERS_*_Thr) | Running (ON CPU) | [running] | wait_woken
1 | 0.01 | (CUSTOMERS_*_Thr) | Sleep (Interruptible) | futex | futex_wait_queue_me
1 | 0.01 | (GC Thread#*) | Sleep (Interruptible) | [running] | 0
1 | 0.01 | (GC Thread#*) | Sleep (Interruptible) | futex | 0
1 | 0.01 | (VM Periodic Tas) | Running (ON CPU) | futex | futex_wait_queue_me
1 | 0.01 | (VM Periodic Tas) | Sleep (Interruptible) | [running] | futex_wait_queue_me
1 | 0.01 | (VM Thread) | Sleep (Interruptible) | [running] | 0
Most of the threads are just idle in Sleep mode, waiting for wakeup using the Linux futex (fast-userspace mutex) mechanism.
If you want to see a summary of all JVMs’ activity in your system (but broken down by JVM instance), you can just use -p java to sample all java processes, but include pid into the grouping list. I’m using -g instead of -G here as I wanted to pick my own list of fields, not just append it to the end of the default (comm, state) fields:
$ psn -p java -g pid,comm,state
Linux Process Snapper v0.18 by Tanel Poder [https://0x.tools]
Sampling /proc/stat for 5 seconds... finished.
=== Active Threads ===================================================
samples | avg_threads | pid | comm | state
----------------------------------------------------------------------
543 | 5.43 | 28578 | (ORDER_ITEMS_*) | Running (ON CPU)
174 | 1.74 | 28578 | (CARD_DETAILS_*) | Running (ON CPU)
149 | 1.49 | 28578 | (ORDERS_*_Threa) | Running (ON CPU)
58 | 0.58 | 28578 | (ORDERS_*_Thread) | Running (ON CPU)
19 | 0.19 | 28578 | (GC Thread#*) | Running (ON CPU)
11 | 0.11 | 9452 | (java) | Running (ON CPU)
1 | 0.01 | 7375 | (java) | Running (ON CPU)
When there’s disk/filesystem I/O involved, you can also list the corresponding filenames, assuming that the system calls are simple enough (and operate on just one file descriptor at a time). The following example is from dd command, not Java, but you get the point:
$ psn -p 18286 -G syscall,filename
Linux Process Snapper v0.14 by Tanel Poder [https://0x.tools]
Sampling /proc/stat, syscall for 5 seconds... finished.
=== Active Threads ==================================================================================
samples | avg_threads | comm | state | syscall | filename
-----------------------------------------------------------------------------------------------------
79 | 0.79 | (dd) | Disk (Uninterruptible) | write | /backup/tanel/test (stdout)
7 | 0.07 | (dd) | Disk (Uninterruptible) | [running] |
5 | 0.05 | (dd) | Running (ON CPU) | write | /backup/tanel/test (stdout)
4 | 0.04 | (dd) | Disk (Uninterruptible) | read | /reco/fio/mmapfile.0.0 (stdin)
3 | 0.03 | (dd) | Running (ON CPU) | [running] |
2 | 0.02 | (dd) | Running (ON CPU) | read | /reco/fio/mmapfile.0.0 (stdin)
You can read more about Linux Process Snapper (pSnapper) here.
While pSnapper is useful for troubleshooting currently ongoing Linux performance problems, you can use
xcapturefrom my open source 0x.tools toolset for continuous sampling and logging of task states from/procand look into always-on profiling for production systems in general.
Here’s one example of reporting any GC thread activity from the /proc samples taken by xcapture:
$ cat 2020-11-28.1[89].csv | grep '(GC' | awk -F, '{ printf("%-22s %s\n", $1, $5) }' | sort | uniq -c
22 2020-11-28 18:52:14.070 R
12 2020-11-28 18:55:13.320 R
15 2020-11-28 18:56:39.545 R
23 2020-11-28 18:56:55.072 R
5 2020-11-28 18:57:10.588 R
12 2020-11-28 18:57:18.876 R
23 2020-11-28 18:57:58.216 R
23 2020-11-28 18:58:15.823 R
23 2020-11-28 18:58:43.769 R
23 2020-11-28 18:59:10.711 R
22 2020-11-28 18:59:57.276 R
23 2020-11-28 19:00:52.124 R
23 2020-11-28 19:02:03.519 R
21 2020-11-28 19:02:06.629 R
23 2020-11-28 19:02:09.737 R
20 2020-11-28 19:02:15.940 R
23 2020-11-28 19:02:22.145 R
22 2020-11-28 19:03:21.169 R
12 2020-11-28 19:03:33.576 R
7 2020-11-28 19:03:56.356 R
2 2020-11-28 19:04:29.428 R
23 2020-11-28 19:05:07.727 R
20 2020-11-28 19:05:47.035 R
23 2020-11-28 19:07:52.146 R
20 2020-11-28 19:07:57.320 R
15 2020-11-28 19:10:04.538 R
18 2020-11-28 19:11:10.884 R
21 2020-11-28 19:11:16.083 R
12 2020-11-28 19:12:12.107 R
xcapture samples all threads in the system at 1 Hz (by default). It doesn’t capture every microsecond when a thread may have briefly been on CPU (or has waited for I/O) and some background garbage collection CPU activity is not necessarily a problem. Nevertheless, you can assess the “intensity” of GC activity and plot it over time just using task state sampling at OS level - without needing monitoring agents or attaching to the JVM with a profiler.
So, hopefully you now have one more tool in your performance toolset. And with 0x.tools, you can have this stuff always on for every process in your system, providing a good generic starting point for performance diagnosis drilldown whenever some intermittent hiccups happen.
If you need to do proper performance diagnosis requiring deeper visibility into your application, a JVM profiler is a good next step. My favorite tool for this is async-profiler or sometimes even just the jstack thread dumper bundled with the JVM.
It is also possible to use Linux perf for sampling JVM process activity, but there’s some extra complexity around creating .map files on the fly, I haven’t gotten that far yet in my toolset. I’m currently using perf at 1 Hz sampling rate for getting an “always on” history of on-CPU activity. It works for regular compiled programs, but it doesn’t show Java method names, yet.
If you haven’t seen 0x.tools for always-on profiling of production systems yet, check it out here:
Thanks for reading!
