SchedLat: a Low Tech Script for Measuring Process CPU Scheduling Latency on Linux

As you may know, I like to use low tech scripts when possible, ideally such ones that don’t even require you to be root. I prefer simple and “boring” tools as a starting point simply because of the production reality at my customers in traditional enterprise IT. Systems where RHEL6 (and clones) seem to be the most common distros (with Linux kernel 2.6.32) and it’s not too unusual to see RHEL5 occasionally either.

Of course I love diving deep using “fancy” dynamic tracing tools (perf, ftrace, BPF), but often it’s not just possible to run these in production systems at will (need root access or a newer kernel and there are safety issues if you get unlucky, too).

So, here’s a little Python tool that just reads the /proc/PID/schedstat file of a supplied PID every second and prints out the schduler-reported CPU latency numbers for that process.

%CPU shows the percent of time the task spent running on CPU
%LAT shows the percent of time the task spent trying to get onto CPU (in runqueue)
%SLP shows the delta between CPU & LAT (not on CPU, not in runqueue, thus sleeping/waiting)

An example with some commentary is below:

[tanel@linux01 psnapper]$ ./schedlat 29801
SchedLat by Tanel Poder (https://tanelpoder.com)

PID=29801 COMM=oracle_29801_li

TIMESTAMP              %CPU   %LAT   %SLP
2020-02-26 23:17:35   100.0    0.0    0.0   <<-- no CPU shortage, process 100% on CPU
2020-02-26 23:17:36   100.0    0.0    0.0
2020-02-26 23:17:37   100.0    0.0    0.0
2020-02-26 23:17:38   100.0    0.0   -0.0   <<-- I calculate %SLP as 100-(%CPU+%LAT) and when Linux
2020-02-26 23:17:39    98.0    0.0    2.0        reports more than "100%" of CPU+LAT, then the derived
2020-02-26 23:17:40     0.0    0.0  100.0        "remaining time" SLP% may show a negative value
2020-02-26 23:17:41     0.0    0.0  100.0
2020-02-26 23:17:42     0.0    0.0  100.0   <<-- no CPU shortage, process sleeping
2020-02-26 23:17:43     0.4    0.0   99.6
2020-02-26 23:17:44    33.5    0.2   66.3   <<-- no CPU shortage, process doing synchronous I/Os 
2020-02-26 23:17:45    55.5    0.2   44.2        in a loop (thus taken off CPU frequently by scheduler)
2020-02-26 23:17:46    53.9    0.2   45.9
2020-02-26 23:17:47    54.5    0.2   45.3
2020-02-26 23:17:48    59.1    0.2   40.7
2020-02-26 23:17:49     4.4    0.0   95.6
2020-02-26 23:17:50    58.5    0.1   41.4
2020-02-26 23:17:51    95.7    0.0    4.3
2020-02-26 23:17:52     0.3    0.0   99.7 
2020-02-26 23:17:53     0.1    0.0   99.9
2020-02-26 23:17:54     0.1    0.0   99.9
2020-02-26 23:17:55     0.3    1.1   98.6
2020-02-26 23:17:56     0.1    6.0   93.9
2020-02-26 23:17:57     0.1   15.0   84.9
2020-02-26 23:17:58     0.1   13.8   86.1
2020-02-26 23:17:59     9.6   61.4   29.0   <<-- CPU shortage + process doing synchronous I/Os in a loop
2020-02-26 23:18:00    14.6   83.9    1.5   <<-- and spending more time in CPU runqueue after every I/O
2020-02-26 23:18:01    31.4   59.7    8.9
2020-02-26 23:18:02    13.0   13.9   73.1
2020-02-26 23:18:03     0.3    5.3   94.4

Of course you might want to see the actual latency numbers in microseconds or some sort of a histogram, this tool will not provide such detail. However, if your question is - “how much time does my RDBMS transaction log writer spend trying to get onto CPU, instead of actually being on CPU (and issuing critical I/Os)”, then schedlat can give you a quick answer. Should you see Oracle’s Log Writer process spending 20% of its time in the scheduler runqueue instead of doing work, it’s time to either increase the priority of LGWR and/or find a way to reduce the system CPU utilization, before troubleshooting your transaction latency issues any further.

You can use SchedLat as a standalone tool or clone the whole 0x.Tools repo for other goodies too:

This will work on RHEL5 clones (like CentOS 5) too, but I’ll need to make the Python code compatible with Python 2.4 (RHEL5 default) in a future update. It’s a simple enough script so you can build a similar tool in shell or just use manual sampling and a calculator :-)

Enjoy & any feedback is welcome.