As running Oracle databases as (partially) managed services in the cloud has become quite popular, I thought I’d start a little series about troubleshooting Oracle (performance) stuff when you don’t have OS access. This will help with such cloud services, but also in cases where your team just doesn’t have convenient OS access for separation-of-privileges reasons.
The first example is my schedlat.sql script, which uses X$KSO_SCHED_DELAY_HISTORY to list sampled process scheduling latency details with a 15-minute history.
For example, I noticed that the single block read latency, essentially the db file sequential read wait, was over 6 ms on a local SSD-based storage system where I normally see read latencies under 500 µs. Note that these “read latencies” measure the Oracle wait event durations, which essentially measure system call durations (plus a tiny bit of overhead), not the actual storage hardware response times.
SQL> @sysm lat
Display SYSTEM metrics from V$METRIC

SECONDS_AGO    SECONDS METRIC_NAME                                        VALUE METRIC_UNIT     METRICGROUP
----------- ---------- --------------------------------------------- ---------- --------------- ------------------------------
         41         60 Average Synchronous Single-Block Read Latency       6.23 Milliseconds    System Metrics Long Duration
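If you don’t have the sysm script at hand, a plain query against V$METRIC shows the same metric. This is just a minimal sketch; the script itself adds things like the metric group name and interval columns:

-- Hedged sketch: just the single-block read latency metric from V$METRIC
SELECT end_time, metric_name, value, metric_unit
FROM   v$metric
WHERE  metric_name = 'Average Synchronous Single-Block Read Latency';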
Normally I’d just log in to the OS and run an OS-level command like iostat -x or blktrace to get lower-level block device response times, I/O volumes and queue lengths, as there may be other processes on the server causing I/O too. But if you don’t have OS access, you’re out of luck there.
CPU oversubscription, where more processes want to run than there are CPUs available, causes most Oracle wait events to take longer. Every time a synchronous I/O completes or some sleep ends at the OS kernel level, your processes spend time queuing to get back onto CPU before they can complete their database-level waits.
There are OS-level tools for measuring such CPU scheduling latency, and starting from Oracle 12c there’s an easy way to get an idea of its impact from within the database too. You do need SYS/X$ access though.
SQL> @schedlat
Listing recent non-zero scheduling delays from X$KSO_SCHED_DELAY_HISTORY

HISTORY_BEGIN_TIME  HISTORY_END_TIME
------------------- -------------------
2020-02-09 20:09:53 2020-02-09 20:25:17

Any noticed scheduling delays during the in-memory history window are listed below:

SAMPLE_START_TIME   SAMPLE_END_TIME     SCHED_DELAY_MICRO
------------------- ------------------- -----------------
2020-02-09 20:25:10 2020-02-09 20:25:10              8990
2020-02-09 20:25:07 2020-02-09 20:25:07              3000
2020-02-09 20:24:03 2020-02-09 20:24:03              3001
2020-02-09 20:23:59 2020-02-09 20:23:59             17993
2020-02-09 20:23:58 2020-02-09 20:23:58              2973
2020-02-09 20:23:55 2020-02-09 20:23:55              3000
2020-02-09 20:23:53 2020-02-09 20:23:53             13999
2020-02-09 20:23:52 2020-02-09 20:23:52              3997
2020-02-09 20:23:50 2020-02-09 20:23:51              2999
2020-02-09 20:23:49 2020-02-09 20:23:49             24028

10 rows selected.
This script only shows you samples from the last 15 minutes where a non-zero scheduling latency was detected. We can see cases of having to wait up to ~14 ms, ~18 ms and ~24 ms in the CPU runqueue when wanting to get back onto CPU. If there had always been a CPU available for our processes to run on, no rows would have been displayed.
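If you don’t have the script, a query along these lines should return similar output. This is just a sketch, assuming the X$ table exposes columns matching the headings above; the real schedlat.sql adds the history window and some formatting:

-- Hedged sketch: column names assumed from the script output above
SELECT sample_start_time, sample_end_time, sched_delay_micro
FROM   x$kso_sched_delay_history
WHERE  sched_delay_micro > 0
ORDER BY sample_start_time;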
How does it work?
Like with wait events, Oracle doesn’t actually go to the OS to instrument the scheduling latency at the OS kernel level. Instead, the PSP0 background process runs a simple check every second: it asks the OS to put it to sleep for exactly 1000 microseconds and measures how much time actually passes between issuing that system call and its return (by the time the call returns, the process has already gotten back onto the CPU). It probably allows for some microseconds of “OS jitter”, but if the process gets back onto CPU fast enough, X$KSO_SCHED_DELAY_HISTORY shows SCHED_DELAY_MICRO as zero for that second.
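You can demonstrate the same idea from a regular session with a quick PL/SQL block. This is only an illustration of the principle, not what PSP0 does internally, and the sleep here is much longer than PSP0’s 1000 microseconds (assumes 18c+ for DBMS_SESSION.SLEEP and serveroutput on):

-- Illustration only: ask for a short sleep, then see how much time really passed.
-- On an idle system the overshoot is small; under CPU pressure it grows.
DECLARE
  t0        TIMESTAMP WITH TIME ZONE;
  requested NUMBER := 0.01;                     -- request a 10 ms sleep
  actual_s  NUMBER;
BEGIN
  t0 := SYSTIMESTAMP;
  DBMS_SESSION.SLEEP(requested);                -- DBMS_LOCK.SLEEP on pre-18c versions
  actual_s := EXTRACT(SECOND FROM (SYSTIMESTAMP - t0));
  DBMS_OUTPUT.PUT_LINE('requested ' || requested * 1000 || ' ms, got ' ||
                       ROUND(actual_s * 1000, 3) || ' ms');
END;
/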
Sample strace output of PSP0:
select(0, 0x7ffcd4a8e0e0, 0x7ffcd4a8e0e0, 0x7ffcd4a8e0e0, {0, 1000}) = 0 (Timeout)
select(0, 0x7ffcd4a8e0e0, 0x7ffcd4a8e0e0, 0x7ffcd4a8e0e0, {0, 1000}) = 0 (Timeout)
select(0, 0x7ffcd4a8e0e0, 0x7ffcd4a8e0e0, 0x7ffcd4a8e0e0, {0, 1000}) = 0 (Timeout)
select(0, 0x7ffcd4a8e0e0, 0x7ffcd4a8e0e0, 0x7ffcd4a8e0e0, {0, 1000}) = 0 (Timeout)
The sleep test is repeated up to 4 times, I guess to get a max or an average, and the whole check runs every second in PSP0. Some relevant parameters are below:
SQL> @pd sched%delay
Show all parameters and session values from x$ksppi/x$ksppcv...
NAME                                      VALUE    DESCRIPTION
----------------------------------------- -------- -----------------------------------------------------------
_sched_delay_sample_interval_ms           1000     scheduling delay sampling interval in ms
_sched_delay_max_samples                  4        scheduling delay maximum number of samples
_sched_delay_sample_collection_thresh_ms  200      scheduling delay sample collection duration threshold ms
_sched_delay_measurement_sleep_us         1000     scheduling delay mesurement sleep us
_sched_delay_os_tick_granularity_us       16000    os tick granularity used by scheduling delay calculations
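As the banner above says, the pd script just queries x$ksppi and x$ksppcv; if you don’t have it installed, something like this (run as SYS) lists the same parameters:

-- Plain-SQL version of the parameter listing (run as SYS)
SELECT i.ksppinm  name,
       v.ksppstvl value,
       i.ksppdesc description
FROM   x$ksppi  i,
       x$ksppcv v
WHERE  i.indx = v.indx
AND    i.ksppinm LIKE '%sched%delay%';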
While this doesn’t measure every process’s scheduling latency and certainly doesn’t accurately estimate the latency of higher-priority processes (like VKTM, LMSn and LGWR, if configured), it’s a convenient way to see whether regular Oracle processes in the system are experiencing some scheduling latency. Typically you would also see high CPU utilization and system load in the various system metric views, like V$OSSTAT. Note that as this loop just checks how fast a process gets back onto CPU after an OS-level sleep ends, other factors like swapping due to memory shortage would also inflate these measurements (and inflate wait events too).
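For example, a quick look at the CPU count versus OS load from inside the database (a minimal sketch; the exact statistics exposed in V$OSSTAT vary by platform):

-- CPU count vs. current OS load; BUSY_TIME/IDLE_TIME are cumulative centiseconds
SELECT stat_name, value
FROM   v$osstat
WHERE  stat_name IN ('NUM_CPUS', 'LOAD', 'BUSY_TIME', 'IDLE_TIME');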
This approach isn’t perfect, but it’s better than nothing when trying to assess if wait events take longer due to CPU oversubscription. This is also relevant when latch or mutex contention pops up as a symptom of CPU overload.