As running Oracle databases as (partially) managed services in the cloud has become quite popular, I thought to start a little series about troubleshooting Oracle (performance) stuff when not having OS access. This will help with such cloud services, but also in cases where your team just doesn’t have convenient OS access due to some separation of privileges reasons.
The first example is my schedlat.sql script that uses X$KSO_SCHED_DELAY_HISTORY to list sampled process scheduling latency details with a 15 minute history.
For example, I noticed that the single block read latency, essentially the db file sequential read wait was over 6ms on a local SSD-based storage system where I normally see read latencies under 500us. Note that these “read latencies” measure the Oracle wait event durations, which essentially measure system call durations (plus a tiny bit of overhead), not the actual storage hardware response times.
SQL> @sysm lat
Display SYSTEM metrics from V$METRIC
SECONDS_AGO    SECONDS METRIC_NAME                                        VALUE METRIC_UNIT     METRICGROUP
----------- ---------- --------------------------------------------- ---------- --------------- ------------------------------
         41         60 Average Synchronous Single-Block Read Latency       6.23 Milliseconds    System Metrics Long Duration
Normally I’d just log in to the OS and run some OS level command like iostat -x or blktrace to get lower level block device response times, I/O volumes and queue lengths as there may be other processes in the server causing IO too. But if you don’t have OS access, you’re out of luck there.
CPU oversubscription, where for every CPU you have more than one process trying to be on CPU, causes most Oracle wait events to take longer. Every time a synchronous I/O completes or some sleep at OS kernel level ends, your processes spend time queuing for getting back onto CPU, to complete their database-level waits.
There are OS level tools for measuring such CPU scheduling latency and starting from Oracle 12c there’s an easy way to get an idea of its impact from within the database too. You do need SYS/X$ access though.
SQL> @schedlat Listing recent non-zero scheduling delays from X$KSO_SCHED_DELAY_HISTORY HISTORY_BEGIN_TIME HISTORY_END_TIME ------------------- ------------------- 2020-02-09 20:09:53 2020-02-09 20:25:17 Any noticed scheduling delays during the in-memory history window are listed below: SAMPLE_START_TIME SAMPLE_END_TIME SCHED_DELAY_MICRO ------------------- ------------------- ----------------- 2020-02-09 20:25:10 2020-02-09 20:25:10 8990 2020-02-09 20:25:07 2020-02-09 20:25:07 3000 2020-02-09 20:24:03 2020-02-09 20:24:03 3001 2020-02-09 20:23:59 2020-02-09 20:23:59 17993 2020-02-09 20:23:58 2020-02-09 20:23:58 2973 2020-02-09 20:23:55 2020-02-09 20:23:55 3000 2020-02-09 20:23:53 2020-02-09 20:23:53 13999 2020-02-09 20:23:52 2020-02-09 20:23:52 3997 2020-02-09 20:23:50 2020-02-09 20:23:51 2999 2020-02-09 20:23:49 2020-02-09 20:23:49 24028 10 rows selected.
This script only shows you samples from last 15 minutes where a non-zero scheduling latency was detected. We see that there were cases of having to wait for up to ~14ms, ~18ms and ~24ms in the CPU runqueue, when wanting to get back onto CPU. If there had always been CPUs available for our processes to run, no rows would have been displayed.
How does it work?
Like with wait events, Oracle doesn’t actually go to the OS to instrument the scheduling latency at the OS kernel level. Instead the PSP0 background process runs a simple check every second. PSP0 asks the OS to put itself to sleep for exactly 1000 microseconds and measures how much time actually passes from the moment of issuing that system call and the end of it (the process would have gotten back onto the CPU by the time of ending the system call). It probably allows for some microseconds of “OS jitter”, but if the process gets back onto CPU fast enough, the X$KSO_SCHED_DELAY_HISTORY shows SCHED_DELAY_MICRO as zero for that second.
Sample strace output of PSP0:
select(0, 0x7ffcd4a8e0e0, 0x7ffcd4a8e0e0, 0x7ffcd4a8e0e0, {0, 1000}) = 0 (Timeout)
select(0, 0x7ffcd4a8e0e0, 0x7ffcd4a8e0e0, 0x7ffcd4a8e0e0, {0, 1000}) = 0 (Timeout)
select(0, 0x7ffcd4a8e0e0, 0x7ffcd4a8e0e0, 0x7ffcd4a8e0e0, {0, 1000}) = 0 (Timeout)
select(0, 0x7ffcd4a8e0e0, 0x7ffcd4a8e0e0, 0x7ffcd4a8e0e0, {0, 1000}) = 0 (Timeout)
The sampling is repeated up to 4 times, I guess to get a max or average, and this whole thing is repeated every second by PSP0. Some relevant parameters are below:
SQL> @pd sched%delay
Show all parameters and session values from x$ksppi/x$ksppcv...
NAME                                              VALUE    DESCRIPTION
------------------------------------------------- -------- ---------------------------------------------------------------------------------------------------
_sched_delay_sample_interval_ms                   1000     scheduling delay sampling interval in ms
_sched_delay_max_samples                          4        scheduling delay maximum number of samples
_sched_delay_sample_collection_thresh_ms          200      scheduling delay sample collection duration threshold ms
_sched_delay_measurement_sleep_us                 1000     scheduling delay mesurement sleep us
_sched_delay_os_tick_granularity_us               16000    os tick granularity used by scheduling delay calculations
While this doesn’t measure every process’es scheduling latency and certainly doesn’t accurately estimate latency of higher priority processes (like VKTM, LMSn and LGWR, if configured), it’s a convenient way to see if there seems to be some scheduling latency for regular Oracle processes in the system. Typically you would also see high CPU utilization and system load from the various system metric views, like V$OSSTAT. Note that as this loop just checks how fast a process gets back onto CPU after the OS level sleep ends, other factors like swapping due to memory shortage would also inflate these measurements (and inflates wait events too).
This approach isn’t perfect, but it’s better than nothing when trying to assess if wait events take longer due to CPU oversubscription. This is also relevant when latch or mutex contention pops up as a symptom of CPU overload.
