Linux Process Snapper (pSnapper, psn) is part of my open source 0x.tools suite of tools for Linux performance troubleshooting and always-on profiling of production systems.
pSnapper is a Linux /proc
profiler that works by sampling Linux task states and other metrics from /proc/PID/task/TID
pseudofiles. pSnapper is designed to be a low-tech, passive profiler: it does not attach to your program and slow it down, nor does it alter your program's execution path or signal handling (as strace may inadvertently do).
As pSnapper is just a Python script reading /proc files, it does not require software installation, nor does it install any kernel modules. pSnapper does not even require root access in most cases. The exception is if you want to sample some “private” /proc files (like syscall and kernel stack) of processes running under other users.
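The core sampling approach is easy to see in a few lines of Python. Below is a minimal, hypothetical sketch of the same idea (not pSnapper's actual code): it lists a process's threads under /proc/PID/task and parses the comm and state fields out of each thread's stat pseudofile.

```python
import os

def sample_task_states(pid):
    """Sample (tid, comm, state) for every thread of a process from /proc."""
    task_dir = f"/proc/{pid}/task"
    samples = []
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                line = f.read()
        except OSError:
            continue  # thread exited between listdir() and open()
        # Field 2 (comm) may contain spaces, so locate it by its parentheses;
        # field 3 (state: R, S, D, ...) follows the closing paren.
        comm = line[line.index("(") + 1 : line.rindex(")")]
        state = line[line.rindex(")") + 2]
        samples.append((tid, comm, state))
    return samples

# Sampling our own process: the sampling thread shows up as Running ('R')
print(sample_task_states(os.getpid()))
```

Repeating such a read in a loop and aggregating the (comm, state) tuples gives exactly the kind of "samples / avg_threads" profile shown in the outputs below.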
The current release, v0.15, is still in beta. I have many more features to add and some known issues to fix, and the output & command line options may change.
You can download it from my GitHub repo (make sure to “Star” the repo too :-)
MySQL XFS fsync() metadata syncing bottleneck & inode contention:
$ sudo psn -p "mysqld|kwork" -G syscall,wchan

Linux Process Snapper v0.14 by Tanel Poder [https://0x.tools]
Sampling /proc/syscall, stat, wchan for 5 seconds... finished.


=== Active Threads ========================================================================================

 samples | avg_threads | comm          | state                  | syscall   | wchan
-----------------------------------------------------------------------------------------------------------
      25 |        3.12 | (mysqld)      | Disk (Uninterruptible) | fsync     | _xfs_log_force_lsn
      16 |        2.00 | (mysqld)      | Running (ON CPU)       | [running] | 0
      14 |        1.75 | (mysqld)      | Disk (Uninterruptible) | pwrite64  | call_rwsem_down_write_failed
       8 |        1.00 | (mysqld)      | Disk (Uninterruptible) | fsync     | submit_bio_wait
       4 |        0.50 | (mysqld)      | Disk (Uninterruptible) | pread64   | io_schedule
       4 |        0.50 | (mysqld)      | Disk (Uninterruptible) | pwrite64  | io_schedule
       3 |        0.38 | (mysqld)      | Disk (Uninterruptible) | pread64   | 0
       3 |        0.38 | (mysqld)      | Running (ON CPU)       | [running] | io_schedule
       3 |        0.38 | (mysqld)      | Running (ON CPU)       | pread64   | 0
       2 |        0.25 | (mysqld)      | Disk (Uninterruptible) | [running] | 0
       1 |        0.12 | (kworker/*:*) | Running (ON CPU)       | read      | worker_thread
       1 |        0.12 | (mysqld)      | Disk (Uninterruptible) | fsync     | io_schedule
       1 |        0.12 | (mysqld)      | Disk (Uninterruptible) | futex     | call_rwsem_down_write_failed
       1 |        0.12 | (mysqld)      | Disk (Uninterruptible) | poll      | 0
       1 |        0.12 | (mysqld)      | Disk (Uninterruptible) | pwrite64  | _xfs_log_force_lsn
       1 |        0.12 | (mysqld)      | Running (ON CPU)       | fsync     | submit_bio_wait
       1 |        0.12 | (mysqld)      | Running (ON CPU)       | futex     | futex_wait_queue_me
Here’s the basic syntax:
$ psn -h                              # get help
$ psn --list                          # show all possible reporting columns
$ psn                                 # run pSnapper for 5 seconds, report task comm,state profile (default)
$ psn -d 10                           # same as above, but for 10 seconds
$ psn -p 1234                         # run pSnapper for PID 1234
$ psn -p 1234 -r                      # run pSnapper for PID 1234 and its children
$ psn -p 1234 -a                      # as above, but show all thread states (including idle, interruptible sleep)
$ psn -p "post|kwork"                 # run pSnapper on processes matching regex "post|kwork"
$ sudo psn -g comm,state,syscall      # customize sampled /proc columns (syscall is "private", so need sudo if sampling all procs)
$ psn -g cmdline,state,syscall,wchan  # choose exact sampled /proc columns
$ psn -G syscall,wchan                # append sampled columns in addition to the default "comm,state"
$ psn -G kstack                       # show kernel stack backtraces in addition to the default columns
Use the -a option to show all thread states (I, S, etc). If you get permission errors, either run psn with default columns (so it doesn't try to access syscall, stack or other “private” proc files) or run psn with sudo. I'll improve this & add warnings in a future version.

Here's a video introducing pSnapper and 0x.tools. If you want to learn more about the /proc filesystem and system calls in general, you can watch the two hacking session videos listed below too.
If you want to read about an example of using the /proc filesystem for troubleshooting a Linux process hang, read this blog entry.
pSnapper 0.14 can also show you the file names of accessed files, so when combined with syscall sampling, you can easily check whether your program (like tar, dd or some other batch process) is bottlenecked by its input speed, its output speed, or mostly by CPU usage within the program itself.

Here's an example of dd reading from a fast disk and writing to a slow disk (most of the time is spent in the write() syscall to the slow backup device):
$ psn -p 18286 -G syscall,filename

Linux Process Snapper v0.14 by Tanel Poder [https://0x.tools]
Sampling /proc/stat, syscall for 5 seconds... finished.


=== Active Threads ==================================================================================

 samples | avg_threads | comm | state                  | syscall   | filename
-----------------------------------------------------------------------------------------------------
      79 |        0.79 | (dd) | Disk (Uninterruptible) | write     | /backup/tanel/test (stdout)
       7 |        0.07 | (dd) | Disk (Uninterruptible) | [running] |
       5 |        0.05 | (dd) | Running (ON CPU)       | write     | /backup/tanel/test (stdout)
       4 |        0.04 | (dd) | Disk (Uninterruptible) | read      | /reco/fio/mmapfile.0.0 (stdin)
       3 |        0.03 | (dd) | Running (ON CPU)       | [running] |
       2 |        0.02 | (dd) | Running (ON CPU)       | read      | /reco/fio/mmapfile.0.0 (stdin)
In this case the copy direction is reversed: reading from the slow backup device, writing to the fast SSD storage. Most of the time is spent in read() syscalls, reading from the /backup/tanel/test file:
$ psn -p 18859 -G syscall,filename

Linux Process Snapper v0.14 by Tanel Poder [https://0x.tools]
Sampling /proc/stat, syscall for 5 seconds... finished.


=== Active Threads ==============================================================================

 samples | avg_threads | comm | state                  | syscall   | filename
-------------------------------------------------------------------------------------------------
      60 |        0.60 | (dd) | Disk (Uninterruptible) | read      | /backup/tanel/test (stdin)
      17 |        0.17 | (dd) | Disk (Uninterruptible) | [running] |
      11 |        0.11 | (dd) | Running (ON CPU)       | read      | /backup/tanel/test (stdin)
       6 |        0.06 | (dd) | Disk (Uninterruptible) | write     | /reco/fio/test (stdout)
       6 |        0.06 | (dd) | Running (ON CPU)       | [running] |
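Under the hood, mapping a file descriptor to a file name is just a readlink() on the /proc/PID/fd/N symlinks. A minimal sketch of that idea (a hypothetical helper, not pSnapper's actual code):

```python
import os

def open_files(pid):
    """Map a process's open file descriptors to file names via /proc symlinks."""
    fd_dir = f"/proc/{pid}/fd"
    files = {}
    for fd in os.listdir(fd_dir):
        try:
            # each entry is a symlink like /proc/1234/fd/1 -> /backup/tanel/test
            files[int(fd)] = os.readlink(f"{fd_dir}/{fd}")
        except OSError:
            continue  # fd was closed between listdir() and readlink()
    return files

# fd 0 is stdin, fd 1 is stdout - which is how psn labels them in the output
print(open_files(os.getpid()))
```

Combined with the fd argument from /proc/PID/syscall, this is enough to attribute an in-flight read() or write() to a concrete file.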
In my lab environment I noticed that some disks were doing pretty heavy I/O, and accessing files & directories on the filesystem was slow. That was weird, as I knew that no serious workload was running on my machine at the time.
iotop also showed no significant I/O done by any of the applications. Note that cp uses buffered reads and writes, so a new file copy may be just in the filesystem cache as dirty pages and may not have reached the disks yet. So I ran sync, and indeed it got hung! The iostat output is below:
$ iostat -xmd 5

Linux 4.14.35-1844.3.2.el7uek.x86_64 (linux01.localdomain)  04/06/2019  _x86_64_  (32 CPU)

Device:  rrqm/s  wrqm/s    r/s    w/s   rMB/s   wMB/s avgrq-sz avgqu-sz  await r_await w_await  svctm  %util
nvme1n1    0.00    0.00   0.20   2.80    0.00    0.02    15.07     0.00   3.47    2.00    3.57   0.00   0.00
nvme0n1    0.00    2.40   4.20  15.60    0.07    0.11    17.79     0.02   3.62    0.86    4.36   0.15   0.30
sdc      773.60  812.00 169.00 128.20   57.11   58.73   798.25     8.78  30.59   40.33   17.75   2.69  80.06
sdd      774.80    0.00 172.60   0.60   59.24    0.00   700.48     1.01   7.46    7.26   66.00   0.68  11.76
sde      775.00  823.40 174.20 130.60   59.76   59.68   802.54     9.74  35.03   46.97   19.11   2.65  80.84
sdb      773.60    0.00 173.60   0.60   59.20    0.00   696.02     0.84   6.30    6.20   35.33   0.61  10.56
md10       0.00    0.00   0.00   0.00    0.00    0.00     0.00     0.00   0.00    0.00    0.00   0.00   0.00
The sd[b-e] disks had up to 180 MB/s of read/write I/O each. Interestingly, only 2 disks out of the 4 had write I/Os against them. These 4 disks actually belong to the software RAID device md10, but that RAID device has absolutely no I/O done against it. How is this possible? Where do these extra I/Os come from?
Instead of guessing or hoping to catch some usual suspects, let’s get systematic - pSnapper to the rescue!
Since the sync
command was hung and there was I/O going on, I decided to run pSnapper, searching for active tasks with sync or kworker in their name (comm). The kworker threads do all kinds of kernel stuff, including asynchronous I/O. I’m also using sudo
here as I want to sample the private /proc/PID/syscall
file of root-owned threads. I’m using a new -G
command line option (v0.11) to append sampled fields to the existing default ones (comm,state).
$ sudo psn -G syscall,wchan -r -p "sync|kworker"

Linux Process Snapper v0.11 by Tanel Poder [https://0x.tools]
Sampling /proc/stat, syscall, wchan for 5 seconds... finished.


=== Active Threads =====================================================================================

 samples | avg_threads | comm            | state                  | syscall    | wchan
--------------------------------------------------------------------------------------------------------
     100 |        1.00 | (sync)          | Disk (Uninterruptible) | sync       | wb_wait_for_completion
      98 |        0.98 | (kworker/u66:0) | Disk (Uninterruptible) | read       | wait_barrier
      82 |        0.82 | (md10_resync)   | Disk (Uninterruptible) | read       | raise_barrier
      15 |        0.15 | (md10_resync)   | Disk (Uninterruptible) | read       | md_do_sync
       3 |        0.03 | (kworker/29:2)  | Disk (Uninterruptible) | read       | rpm_resume
       3 |        0.03 | (md10_resync)   | Disk (Uninterruptible) | read       | raid10_sync_request
       2 |        0.02 | (kworker/1:0)   | Disk (Uninterruptible) | read       | hub_event
       2 |        0.02 | (kworker/29:2)  | Disk (Uninterruptible) | read       | msleep
       1 |        0.01 | (kworker/20:1H) | Running (ON CPU)       | read       | worker_thread
       1 |        0.01 | (kworker/30:0)  | Running (ON CPU)       | [userland] | 0
       1 |        0.01 | (kworker/6:0)   | Running (ON CPU)       | [userland] | 0
       1 |        0.01 | (kworker/u66:0) | Running (ON CPU)       | [userland] | 0
       1 |        0.01 | (kworker/u66:0) | Running (ON CPU)       | read       | wait_barrier
From the first line of this output table “(sync)” we see that this process is 100% waiting for some wb_wait_for_completion
in the kernel while executing the sync
syscall. A quick source code search shows that wb_
means WriteBack in the file system layer. But the sync
command is not actually doing any I/O directly itself nor using any noticeable CPU. The other tasks that pop up near the top are more interesting though - the md10_resync
kernel thread is apparently trying to do disk reads and is waiting for various kernel events like raise_barrier
, md_do_sync
and raid10_sync_request
.
Note that I got lucky here with thread naming: I was looking for my program called sync, and md10_resync also happened to have “sync” in its name, so my regular expression matched it. Had I not gotten lucky and seen the troublemakers above, I would have run pSnapper without the -p argument next, to see all threads' activity regardless of their name.
The events above already provide enough hints about what these threads are doing, so I don't even have to search the source code. Some sort of RAID10 syncing, perhaps? Indeed, when checking dmesg output, I see RAID (mirror) syncing going on at that time:
$ dmesg -H

[Apr 6 13:07] md/raid10:md10: not clean -- starting background reconstruction
[  +0.000003] md/raid10:md10: active with 4 out of 4 devices
[  +0.079751] md10: detected capacity change from 0 to 8001301774336
[  +0.000233] md: resync of RAID array md10
And that explains it. The RAID manager realized that the disks that were supposed to be mirrored were not in sync (I had just created that software RAID array) and started syncing them in the background. That's why there was no I/O shown against the /dev/md10 device in iostat: it wasn't userland applications generating the I/O against the RAID device, but kernel threads accessing the underlying disk devices directly. That also explains why these threads didn't show up in iotop. Iotop reads /proc/PID/io as its input, but most kernel threads don't record any activity there (as they issue their I/Os at a much lower level than userspace application system calls). Here's an example from the md10_raid10 kernel thread:
$ sudo cat /proc/`pgrep md10_raid10`/io

rchar: 0
wchar: 0
syscr: 0
syscw: 0
read_bytes: 0
write_bytes: 0
cancelled_write_bytes: 0
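Parsing /proc/PID/io into per-process counters is trivial; a minimal sketch (a hypothetical helper, not iotop's or pSnapper's actual code):

```python
import os

def read_proc_io(pid):
    """Parse /proc/PID/io counters into a dict (readable for your own
    processes; other users' processes need root)."""
    counters = {}
    with open(f"/proc/{pid}/io") as f:
        for line in f:
            key, value = line.split(":")
            counters[key] = int(value)
    return counters

# For most kernel threads every counter here stays at 0, which is
# exactly why iotop-style tools miss their I/O
print(read_proc_io(os.getpid()))
```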
So when troubleshooting unexpected I/O caused by kernel threads, you would need to either use some in-kernel tracing facility, or sample the /proc filesystem with pSnapper to get a quick overview first.
When there's no memory pressure, the kswapd threads are idle:

$ sudo psn -G syscall,wchan -p kswapd -a

=== Active Threads ============================================================

 samples | avg_threads | comm        | state                 | syscall | wchan
-------------------------------------------------------------------------------
     100 |        1.00 | (kswapd0:0) | Sleep (Interruptible) | read    | kswapd
     100 |        1.00 | (kswapd1:0) | Sleep (Interruptible) | read    | kswapd
Under memory pressure, the kswapd threads use CPU (likely page scanning), do I/O (paging) and also free reclaimable objects (looks like from the XFS filesystem inode cache this time; they can also free reclaimable kernel SLAB pages etc.):

$ sudo psn -G syscall,wchan -p kswapd -a

=== Active Threads ===============================================================================

 samples | avg_threads | comm        | state                  | syscall   | wchan
--------------------------------------------------------------------------------------------------
      77 |        0.77 | (kswapd0:0) | Sleep (Interruptible)  | read      | kswapd
      75 |        0.75 | (kswapd1:0) | Sleep (Interruptible)  | read      | kswapd
      24 |        0.24 | (kswapd1:0) | Running (ON CPU)       | [running] | 0
      22 |        0.22 | (kswapd0:0) | Running (ON CPU)       | [running] | 0
       1 |        0.01 | (kswapd0:0) | Disk (Uninterruptible) | read      | io_schedule
       1 |        0.01 | (kswapd1:0) | Disk (Uninterruptible) | read      | xfs_reclaim_inodes_ag
Scroll right to see the full picture:
$ sudo psn -G kstack
Linux Process Snapper v0.13 by Tanel Poder [https://0x.tools]
Sampling /proc/stat, stack for 5 seconds... finished.
=== Active Threads =====================================================================================================================================================================================================================================================================================================================================================================================================================================
samples | avg_threads | comm | state | kstack
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
55 | 2.75 | (mysqld) | Disk (Uninterruptible) | SyS_fsync()->do_fsync()->vfs_fsync_range()->xfs_file_fsync()->_xfs_log_force_lsn()
41 | 2.05 | (mysqld) | Running (ON CPU) | -
12 | 0.60 | (sysbench) | Running (ON CPU) | -
8 | 0.40 | (ksoftirqd/*) | Running (ON CPU) | ret_from_fork()->kthread()->smpboot_thread_fn()
7 | 0.35 | (mysqld) | Disk (Uninterruptible) | SyS_fsync()->do_fsync()->vfs_fsync_range()->xfs_file_fsync()->xfs_blkdev_issue_flush()->blkdev_issue_flush()->submit_bio_wait()
6 | 0.30 | (mysqld) | Disk (Uninterruptible) | SyS_fsync()->do_fsync()->vfs_fsync_range()->xfs_file_fsync()->file_write_and_wait_range()->__filemap_fdatawait_range()->wait_on_page_bit_common()->io_schedule()
6 | 0.30 | (mysqld) | Running (ON CPU) | do_futex()->futex_wait()->futex_wait_queue_me()
4 | 0.20 | (mysqld) | Running (ON CPU) | retint_user()->prepare_exit_to_usermode()->exit_to_usermode_loop()
3 | 0.15 | (mysqld) | Disk (Uninterruptible) | SyS_pwrite64()->vfs_write()->__vfs_write()->xfs_file_write_iter()->xfs_file_buffered_aio_write()->xfs_file_aio_write_checks()->file_update_time()->xfs_vn_update_time()->xfs_ilock()->call_rwsem_down_write_failed()
3 | 0.15 | (mysqld) | Disk (Uninterruptible) | SyS_pwrite64()->vfs_write()->__vfs_write()->xfs_file_write_iter()->xfs_file_buffered_aio_write()->xfs_ilock()->call_rwsem_down_write_failed()
3 | 0.15 | (mysqld) | Running (ON CPU) | SyS_fsync()->do_fsync()->vfs_fsync_range()->xfs_file_fsync()->_xfs_log_force_lsn()
2 | 0.10 | (collectl) | Running (ON CPU) | -
2 | 0.10 | (mysqld) | Disk (Uninterruptible) | -
1 | 0.05 | (kworker/*:*) | Disk (Uninterruptible) | ret_from_fork()->kthread()->worker_thread()->process_one_work()->pm_runtime_work()->rpm_suspend()->rpm_callback()->__rpm_callback()->usb_runtime_suspend()->usb_suspend_both()->generic_suspend()->usb_port_suspend()->set_port_feature()->usb_control_msg()->usb_start_wait_urb()->usb_submit_urb()->usb_submit_urb.part.6()->usb_hcd_submit_urb()->xhci_hub_control()->msleep()
1 | 0.05 | (kworker/*:*) | Disk (Uninterruptible) | ret_from_fork()->kthread()->worker_thread()->process_one_work()->xfs_end_io()->xfs_iomap_write_unwritten()->xfs_ilock()->call_rwsem_down_write_failed()
1 | 0.05 | (mysqld) | Disk (Uninterruptible) | SyS_io_submit()->do_io_submit()->aio_write()->xfs_file_write_iter()->xfs_file_buffered_aio_write()->iomap_file_buffered_write()->iomap_apply()->xfs_file_iomap_begin()->xfs_ilock()->call_rwsem_down_write_failed()
1 | 0.05 | (mysqld) | Disk (Uninterruptible) | SyS_io_submit()->do_io_submit()->aio_write()->xfs_file_write_iter()->xfs_file_buffered_aio_write()->xfs_file_aio_write_checks()->file_update_time()->xfs_vn_update_time()->xfs_ilock()->call_rwsem_down_write_failed()
1 | 0.05 | (mysqld) | Running (ON CPU) | SyS_fsync()->do_fsync()->vfs_fsync_range()->xfs_file_fsync()->file_write_and_wait_range()->__filemap_fdatawait_range()->wait_on_page_bit_common()->io_schedule()
1 | 0.05 | (mysqld) | Running (ON CPU) | xfs_trim_extent_eof()
1 | 0.05 | (ora_vkrm_linprd) | Running (ON CPU) | -
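The kstack column comes from /proc/PID/stack, which lists one kernel frame per line (e.g. `[<0>] xfs_file_fsync+0x1d/0x230`). Turning that raw text into the collapsed func1()->func2() notation above can be sketched as follows (a hypothetical helper, not pSnapper's actual code; reading other users' kernel stacks requires root):

```python
def collapse_stack(text):
    """Collapse raw /proc/PID/stack output into func1()->func2() notation."""
    frames = []
    for line in text.splitlines():
        if "] " not in line:
            continue
        # strip the "[<address>] " prefix and the "+offset/size" suffix
        func = line.split("] ", 1)[1].split("+", 1)[0]
        frames.append(func + "()")
    # /proc lists the innermost frame first; reverse for caller->callee order
    return "->".join(reversed(frames))

def kernel_stack(pid, tid):
    """Read and collapse one thread's kernel stack (usually needs root)."""
    with open(f"/proc/{pid}/task/{tid}/stack") as f:
        return collapse_stack(f.read())

example = "[<0>] _xfs_log_force_lsn+0x2b/0x2f0\n[<0>] xfs_file_fsync+0x107/0x1e0\n"
print(collapse_stack(example))  # xfs_file_fsync()->_xfs_log_force_lsn()
```

Aggregating these collapsed strings across samples is what produces the "samples | avg_threads | ... | kstack" profile above.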
Report cmdline,state summary of all “active” threads over 5 seconds:
$ psn

Process Snapper sampling cmdline, stat for 5 seconds... finished sampling


=== Active Threads ===============================================

 samples | avg_threads | cmdline         | state
------------------------------------------------------------------
     316 |        9.58 | fio             | Disk (Uninterruptible)
     212 |        6.42 | fio             | Running (ON CPU)
      33 |        1.00 | python          | Running (ON CPU)
      30 |        0.91 |                 | Running (ON CPU)
       3 |        0.09 |                 | Disk (Uninterruptible)
       2 |        0.06 | /usr/bin/perl   | Running (ON CPU)
       1 |        0.03 | ora_vktm_LINPRD | Running (ON CPU)
       1 |        0.03 | top             | Running (ON CPU)
Report all thread states (including sleeping) for processes matching regex ^fio. Group by system call names too:
$ psn -p ^fio -g cmdline,state,syscall -a

Process Snapper sampling cmdline, syscall, stat for 5 seconds... finished sampling


=== Active Threads ====================================================

 samples | avg_threads | cmdline | state                  | syscall
-----------------------------------------------------------------------
     759 |        7.59 | fio     | Disk (Uninterruptible) | pread64
     701 |        7.01 | fio     | Disk (Uninterruptible) | pwrite64
     100 |        1.00 | fio     | Sleep (Interruptible)  | futex
      99 |        0.99 | fio     | Sleep (Interruptible)  | nanosleep
      45 |        0.45 | fio     | Disk (Uninterruptible) | [userland]
      41 |        0.41 | fio     | Running (ON CPU)       | [userland]
      37 |        0.37 | fio     | Running (ON CPU)       | pread64
      17 |        0.17 | fio     | Running (ON CPU)       | pwrite64
       1 |        0.01 | fio     | Running (ON CPU)       | nanosleep
Same as above, but include kernel wait event wchan
in the profile too:
$ psn -p ^fio -g cmdline,state,syscall,wchan -a

Process Snapper sampling cmdline, syscall, wchan, stat for 5 seconds... finished sampling


=== Active Threads ==========================================================================

 samples | avg_threads | cmdline | state                  | syscall    | wchan
---------------------------------------------------------------------------------------------
     705 |        7.05 | fio     | Disk (Uninterruptible) | pread64    | io_schedule
     661 |        6.61 | fio     | Disk (Uninterruptible) | pwrite64   | io_schedule
     100 |        1.00 | fio     | Sleep (Interruptible)  | futex      | futex_wait_queue_me
     100 |        1.00 | fio     | Sleep (Interruptible)  | nanosleep  | hrtimer_nanosleep
      42 |        0.42 | fio     | Running (ON CPU)       | [userland] | 0
      31 |        0.31 | fio     | Disk (Uninterruptible) | [userland] | 0
      30 |        0.30 | fio     | Disk (Uninterruptible) | [userland] | io_schedule
      25 |        0.25 | fio     | Running (ON CPU)       | pread64    | 0
      25 |        0.25 | fio     | Running (ON CPU)       | pread64    | io_schedule
      24 |        0.24 | fio     | Disk (Uninterruptible) | pread64    | 0
      20 |        0.20 | fio     | Running (ON CPU)       | [userland] | io_schedule
      14 |        0.14 | fio     | Running (ON CPU)       | pwrite64   | 0
      12 |        0.12 | fio     | Running (ON CPU)       | pwrite64   | io_schedule
      11 |        0.11 | fio     | Disk (Uninterruptible) | pwrite64   | 0
MySQL daemon profile (including Linux idle states that may not be idle in the RDBMS):
$ sudo psn -p mysql -g cmdline,state,syscall,wchan -a

Process Snapper sampling cmdline, syscall, wchan, stat for 5 seconds... finished sampling


=== Active Threads ==========================================================================================

 samples | avg_threads | cmdline          | state                  | syscall         | wchan
-------------------------------------------------------------------------------------------------------------
    1728 |       17.28 | /usr/sbin/mysqld | Sleep (Interruptible)  | futex           | futex_wait_queue_me
    1000 |       10.00 | /usr/sbin/mysqld | Sleep (Interruptible)  | io_getevents    | read_events
     200 |        2.00 | /usr/sbin/mysqld | Sleep (Interruptible)  | nanosleep       | hrtimer_nanosleep
     120 |        1.20 | /usr/sbin/mysqld | Disk (Uninterruptible) | pread64         | io_schedule
     100 |        1.00 | /usr/sbin/mysqld | Sleep (Interruptible)  | poll            | poll_schedule_timeout
     100 |        1.00 | /usr/sbin/mysqld | Sleep (Interruptible)  | rt_sigtimedwait | do_sigtimedwait
      98 |        0.98 | /usr/sbin/mysqld | Running (ON CPU)       | [userland]      | 0
      12 |        0.12 | /usr/sbin/mysqld | Running (ON CPU)       | pread64         | 0
      10 |        0.10 | /usr/sbin/mysqld | Running (ON CPU)       | [userland]      | poll_schedule_timeout
       8 |        0.08 | /usr/sbin/mysqld | Running (ON CPU)       | pread64         | io_schedule
       7 |        0.07 | /usr/sbin/mysqld | Sleep (Interruptible)  | [userland]      | 0
       6 |        0.06 | /usr/sbin/mysqld | Disk (Uninterruptible) | [userland]      | io_schedule
       4 |        0.04 | /usr/sbin/mysqld | Running (ON CPU)       | poll            | 0
       3 |        0.03 | /usr/sbin/mysqld | Disk (Uninterruptible) | [userland]      | 0
       1 |        0.01 | /usr/sbin/mysqld | Disk (Uninterruptible) | pread64         | 0
       1 |        0.01 | /usr/sbin/mysqld | Running (ON CPU)       | [userland]      | io_schedule
       1 |        0.01 | /usr/sbin/mysqld | Sleep (Interruptible)  | [userland]      | poll_schedule_timeout
       1 |        0.01 | /usr/sbin/mysqld | Sleep (Interruptible)  | poll            | 0
Sample only processes containing “post”. Use Postgres built-in cmdline instrumentation to see “inside” DB activity:
$ psn -p post -g cmdline

Process Snapper sampling cmdline, stat for 5 seconds... finished sampling


=== Active Threads =============================================================

 samples | avg_threads | cmdline
--------------------------------------------------------------------------------
     106 |        1.06 | postgres: postgres pgbench [local] COMMIT
      40 |        0.40 | postgres: postgres pgbench [local] UPDATE
      17 |        0.17 | postgres: postgres pgbench [local] idle in transaction
       4 |        0.04 | postgres: postgres pgbench [local] SELECT
       4 |        0.04 | postgres: postgres pgbench [local] idle
       2 |        0.02 | postgres: postgres pgbench [local] UPDATE waiting
       2 |        0.02 | postgres: postgres pgbench [local] idle in tran
       1 |        0.01 | postgres: background writer
       1 |        0.01 | postgres: postgres pgbench [local] INSERT
Profile Postgres active thread states only (sampling /proc/PID/syscall
of a different user requires root access):
$ sudo psn -p post -g state,syscall

Process Snapper sampling syscall, stat for 5 seconds... finished sampling


=== Active Threads ==========================================

 samples | avg_threads | state                  | syscall
-------------------------------------------------------------
      93 |        0.93 | Disk (Uninterruptible) | fdatasync
      64 |        0.64 | Running (ON CPU)       | [userland]
      22 |        0.22 | Running (ON CPU)       | epoll_wait
       7 |        0.07 | Running (ON CPU)       | futex
       2 |        0.02 | Disk (Uninterruptible) | [userland]
       2 |        0.02 | Running (ON CPU)       | fdatasync
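The syscall column itself costs just one more pseudofile read per thread: the first field of /proc/PID/task/TID/syscall is the number of the syscall the thread is currently in, or the string "running" if it is on CPU in userspace, or "-1" if it is blocked outside a system call. A minimal sketch (a hypothetical helper; the number-to-name mapping that pSnapper does is omitted here):

```python
import os

def current_syscall_nr(pid, tid):
    """Return the raw syscall number a thread is currently executing,
    'running' if it is on CPU in userspace, or '-1' if it is blocked
    outside a system call. Needs root for other users' processes."""
    with open(f"/proc/{pid}/task/{tid}/syscall") as f:
        return f.read().split()[0]

me = os.getpid()
# Reading this pseudofile is itself a read() syscall, so our own thread
# typically reports the architecture's read syscall number here
print(current_syscall_nr(me, me))
```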
Full details of Postgres processes (including idle thread states) during pgbench
startup vacuum:
$ sudo psn -p post -g cmdline,state,syscall,wchan -a

Process Snapper sampling cmdline, syscall, wchan, stat for 5 seconds... finished sampling


=== Active Threads ==============================================================================================================

 samples | avg_threads | cmdline                                    | state                  | syscall    | wchan
---------------------------------------------------------------------------------------------------------------------------------
     100 |        1.00 | /usr/pgsql-11/bin/postmaster               | Sleep (Interruptible)  | select     | poll_schedule_timeout
     100 |        1.00 | postgres: autovacuum launcher              | Sleep (Interruptible)  | epoll_wait | ep_poll
     100 |        1.00 | postgres: background writer                | Sleep (Interruptible)  | epoll_wait | ep_poll
     100 |        1.00 | postgres: checkpointer                     | Sleep (Interruptible)  | epoll_wait | ep_poll
     100 |        1.00 | postgres: logger                           | Sleep (Interruptible)  | epoll_wait | ep_poll
     100 |        1.00 | postgres: logical replication launcher     | Sleep (Interruptible)  | epoll_wait | ep_poll
     100 |        1.00 | postgres: stats collector                  | Sleep (Interruptible)  | epoll_wait | ep_poll
      99 |        0.99 | postgres: walwriter                        | Sleep (Interruptible)  | epoll_wait | ep_poll
      51 |        0.51 | postgres: postgres pgbench [local] VACUUM  | Disk (Uninterruptible) | fdatasync  | submit_bio_wait
      23 |        0.23 | postgres: postgres pgbench [local] VACUUM  | Running (ON CPU)       | [userland] | 0
      15 |        0.15 | postgres: postgres pgbench [local] VACUUM  | Disk (Uninterruptible) | fdatasync  | io_schedule
       3 |        0.03 | postgres: postgres pgbench [local] VACUUM  | Running (ON CPU)       | fdatasync  | 0
       2 |        0.02 | postgres: postgres pgbench [local] VACUUM  | Disk (Uninterruptible) | fdatasync  | 0
       2 |        0.02 | postgres: postgres pgbench [local] VACUUM  | Running (ON CPU)       | fdatasync  | io_schedule
       1 |        0.01 | postgres: postgres pgbench [local] VACUUM  | Disk (Uninterruptible) | [userland] | 0
       1 |        0.01 | postgres: postgres pgbench [local] VACUUM  | Disk (Uninterruptible) | [userland] | io_schedule
       1 |        0.01 | postgres: postgres pgbench [local] VACUUM  | Disk (Uninterruptible) | [userland] | submit_bio_wait
       1 |        0.01 | postgres: postgres pgbench [local] VACUUM  | Running (ON CPU)       | fdatasync  | submit_bio_wait
       1 |        0.01 | postgres: walwriter                        | Sleep (Interruptible)  | futex      | futex_wait_queue_me
Full details of Postgres processes (including idle thread states) during pgbench
run:
$ sudo psn -p post -g cmdline,state,syscall,state -a

Process Snapper sampling cmdline, syscall, stat for 5 seconds... finished sampling


=== Active Threads ============================================================================================================================

 samples | avg_threads | cmdline                                                | state                  | syscall    | state
-----------------------------------------------------------------------------------------------------------------------------------------------
     523 |        5.23 | postgres: postgres pgbench [local] COMMIT              | Sleep (Interruptible)  | futex      | Sleep (Interruptible)
     100 |        1.00 | /usr/pgsql-11/bin/postmaster                           | Sleep (Interruptible)  | select     | Sleep (Interruptible)
     100 |        1.00 | postgres: autovacuum launcher                          | Sleep (Interruptible)  | epoll_wait | Sleep (Interruptible)
     100 |        1.00 | postgres: logger                                       | Sleep (Interruptible)  | epoll_wait | Sleep (Interruptible)
     100 |        1.00 | postgres: logical replication launcher                 | Sleep (Interruptible)  | epoll_wait | Sleep (Interruptible)
     100 |        1.00 | postgres: postgres pgbench [local] COMMIT              | Disk (Uninterruptible) | fdatasync  | Disk (Uninterruptible)
     100 |        1.00 | postgres: stats collector                              | Sleep (Interruptible)  | epoll_wait | Sleep (Interruptible)
      99 |        0.99 | postgres: background writer                            | Sleep (Interruptible)  | epoll_wait | Sleep (Interruptible)
      71 |        0.71 | postgres: checkpointer                                 | Sleep (Interruptible)  | select     | Sleep (Interruptible)
      57 |        0.57 | postgres: walwriter                                    | Sleep (Interruptible)  | epoll_wait | Sleep (Interruptible)
      40 |        0.40 | postgres: walwriter                                    | Sleep (Interruptible)  | futex      | Sleep (Interruptible)
      30 |        0.30 | postgres: postgres pgbench [local] idle                | Sleep (Interruptible)  | epoll_wait | Sleep (Interruptible)
      23 |        0.23 | postgres: postgres pgbench [local] idle in transaction | Running (ON CPU)       | [userland] | Running (ON CPU)
      20 |        0.20 | postgres: postgres pgbench [local] UPDATE              | Running (ON CPU)       | [userland] | Running (ON CPU)
      20 |        0.20 | postgres: postgres pgbench [local] UPDATE waiting      | Sleep (Interruptible)  | epoll_wait | Sleep (Interruptible)
      18 |        0.18 | postgres: checkpointer                                 | Sleep (Interruptible)  | epoll_wait | Sleep (Interruptible)
      12 |        0.12 | postgres: postgres pgbench [local] idle in transaction | Sleep (Interruptible)  | [userland] | Sleep (Interruptible)
       9 |        0.09 | postgres: checkpointer                                 | Disk (Uninterruptible) | fsync      | Disk (Uninterruptible)
       9 |        0.09 | postgres: postgres pgbench [local] UPDATE              | Running (ON CPU)       | epoll_wait | Running (ON CPU)
       9 |        0.09 | postgres: postgres pgbench [local] idle in transaction | Sleep (Interruptible)  | epoll_wait | Sleep (Interruptible)
       8 |        0.08 | postgres: postgres pgbench [local] COMMIT              | Sleep (Interruptible)  | epoll_wait | Sleep (Interruptible)
       5 |        0.05 | postgres: postgres pgbench [local] SELECT              | Running (ON CPU)       | [userland] | Running (ON CPU)
       5 |        0.05 | postgres: postgres pgbench [local] idle in transaction | Running (ON CPU)       | epoll_wait | Running (ON CPU)
       5 |        0.05 | postgres: postgres pgbench [local] idle in transaction | Sleep (Interruptible)  | futex      | Sleep (Interruptible)
       4 |        0.04 | postgres: postgres pgbench [local] COMMIT              | Sleep (Interruptible)  | [userland] | Sleep (Interruptible)
       4 |        0.04 | postgres: postgres pgbench [local] UPDATE              | Sleep (Interruptible)  | [userland] | Sleep (Interruptible)
       3 |        0.03 | postgres: postgres pgbench [local] COMMIT              | Running (ON CPU)       | futex      | Running (ON CPU)
       3 |        0.03 | postgres: postgres pgbench [local] SELECT              | Running (ON CPU)       | epoll_wait | Running (ON CPU)
       3 |        0.03 | postgres: postgres pgbench [local] idle                | Running (ON CPU)       | epoll_wait | Running (ON CPU)
       3 |        0.03 | postgres: postgres pgbench [local] idle                | Sleep (Interruptible)  | [userland] | Sleep (Interruptible)
       2 |        0.02 | postgres: checkpointer                                 | Running (ON CPU)       | [userland] | Running (ON CPU)
       2 |        0.02 | postgres: postgres pgbench [local] INSERT              | Sleep (Interruptible)  | [userland] | Sleep (Interruptible)
       2 |        0.02 | postgres: postgres pgbench [local] INSERT              | Sleep (Interruptible)  | futex      | Sleep (Interruptible)
       2 |        0.02 | postgres: walwriter                                    | Sleep (Interruptible)  | [userland] | Sleep (Interruptible)
       1 |        0.01 | postgres: background writer                            | Running (ON CPU)       | [userland] | Running (ON CPU)
       1 |        0.01 | postgres: postgres pgbench [local] COMMIT              | Running (ON CPU)       | [userland] | Running (ON CPU)
       1 |        0.01 | postgres: postgres pgbench [local] COMMIT              | Running (ON CPU)       | fdatasync  | Running (ON CPU)
       1 |        0.01 | postgres: postgres pgbench [local] INSERT              | Sleep (Interruptible)  | epoll_wait | Sleep (Interruptible)
       1 |        0.01 | postgres: postgres pgbench [local] SELECT              | Sleep (Interruptible)  | epoll_wait | Sleep (Interruptible)
       1 |        0.01 | postgres: postgres pgbench [local] UPDATE waiting      | Running (ON CPU)       | epoll_wait | Running (ON CPU)
       1 |        0.01 | postgres: postgres pgbench [local] idle                | Running (ON CPU)       | [userland] | Running (ON CPU)
       1 |        0.01 | postgres: postgres pgbench [local] idle in tran        | Running (ON CPU)       | [userland] | Running (ON CPU)
       1 |        0.01 | postgres: walwriter                                    | Running (ON CPU)       | futex      | Running (ON CPU)

samples: 100 (expected: 100)
total processes: 16, threads: 16
runtime: 5.01, measure time: 0.76
That’s it!