Here’s an example of a quick’n’dirty way of profiling stack traces on your command line. This example is from Solaris (but the script should work on Linux too, and on other Unixes with minor modifications).
I created a problem case below, based on a case I once troubleshot at a client site. Note that they had set optimizer_mode = FIRST_ROWS in their database, and the optimizer came up with a very inefficient execution plan for the select from the DBA_LOCK_INTERNAL view below:
SQL> ALTER SESSION SET optimizer_mode = FIRST_ROWS;

Session altered.

SQL> SET TIMING ON
SQL> SELECT * FROM dba_lock_internal;
...
... the output data stripped
...

927 rows selected.

Elapsed: 00:23:27.14
It took over 23 minutes to return 927 rows from DBA_LOCK_INTERNAL!
So I ran Snapper to see where the time was being spent:
SQL> @snapper4 all 5 1 222
Sampling SID 222 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v4.00 BETA - by Tanel Poder ( ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)

--------------------------------------------------------------------------------------------------------------------------------------
    SID, USERNAME  , TYPE, STATISTIC                 ,     DELTA, HDELTA/SEC,  %TIME, GRAPH       , NUM_WAITS, WAITS/SEC, AVERAGES
--------------------------------------------------------------------------------------------------------------------------------------
    222, SYS       , STAT, non-idle wait count       ,         1,        .19,       ,             ,          ,          , ~ per execution
    222, SYS       , TIME, DB CPU                    ,   6028084,      1.17s, 117.2%, [@@@@@@@@@@],          ,          ,
    222, SYS       , TIME, sql execute elapsed time  ,   6032677,      1.17s, 117.3%, [##########],          ,          ,
    222, SYS       , TIME, DB time                   ,   6032677,      1.17s, 117.3%, [##########],          ,          , ~ unaccounted time
    222, SYS       , WAIT, library cache: mutex X    ,         6,     1.17us,    .0%, [          ],         1,       .19, 6us average wait

--  End of Stats snap 1, end=2013-02-14 21:30:45, seconds=5.1

---------------------------------------------------------------------------------------------
  Active% | SQL_ID          | SQL_CHILD | EVENT                               | WAIT_CLASS
---------------------------------------------------------------------------------------------
     100% | c884zcqpv9y5h   | 0         | ON CPU                              | ON CPU

--  End of ASH snap 1, end=2013-02-14 21:30:45, seconds=5, samples_taken=50
The query is apparently 100% on CPU, with no significant waits. Normally I would expect some other metrics to pop up in the snapper output in such high-CPU cases, like thousands of session logical reads per second, sorts (rows) showing millions of rows sorted per second, or a parse count (hard) figure in the hundreds or thousands per second. These are the “usual suspects”.
But this time none of these additional metrics were incremented by the session. So it’s time to systematically drill down by other means.
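For reference, those “usual suspect” counters come straight from V$SESSTAT (snapper just computes the deltas and rates for you). Here is a minimal manual sketch of checking them, assuming SID 222 as in this example; you would run it twice and compare the values to get per-second rates:

-- Manual check of a few "usual suspect" counters for SID 222 (snapper does the
-- delta/rate math for you; this only shows where the numbers come from):
SELECT sn.name, st.value
FROM   v$sesstat st, v$statname sn
WHERE  st.statistic# = sn.statistic#
AND    st.sid = 222
AND    sn.name IN ('session logical reads', 'sorts (rows)', 'parse count (hard)');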
Now, had it been an Oracle 11g+ database (with Diagnostics & Tuning Pack licenses), I could have just queried the ASH PLAN_LINE columns (a sketch of that follows the report below) or run a SQL Monitoring report to see in which execution plan line most of the time was spent (scroll to the bottom):
SQL> @xp 222

eXplain with Profile: Running DBMS_SQLTUNE.REPORT_SQL_MONITOR for SID 222...

SQL Monitoring Report

SQL Text
------------------------------
SELECT * FROM dba_lock_internal

Global Information
------------------------------
 Status              :  EXECUTING
 Instance ID         :  1
 Session             :  SYS (222:569)
 SQL ID              :  c884zcqpv9y5h
 SQL Execution ID    :  16777216
 Execution Started   :  02/14/2013 21:30:16
 First Refresh Time  :  02/14/2013 21:30:22
 Last Refresh Time   :  02/14/2013 21:30:54
 Duration            :  38s
 Module/Action       :  sqlplus@mac02.local (TNS V1-V3)/-
 Service             :  lin112
 Program             :  sqlplus@mac02.local (TNS V1-V3)

Global Stats
==============================================
| Elapsed |   Cpu   | Concurrency |  Other   |
| Time(s) | Time(s) |  Waits(s)   | Waits(s) |
==============================================
|      42 |      42 |        0.00 |     0.06 |
==============================================

SQL Plan Monitoring Details (Plan Hash Value=1240353084)
============================================================================================================================================================
| Id | Operation | Name | Rows (Estim) | Cost | Time Active(s) | Start Active | Execs | Rows (Actual) | Mem | Activity (%) | Activity Detail (# samples) |
============================================================================================================================================================
| 0 | SELECT STATEMENT | | | | 33 | +6 | 1 | 421 | | | |
| 1 | VIEW | DBA_LOCK_INTERNAL | 4 | | 33 | +6 | 1 | 421 | | | |
| 2 | UNION-ALL | | | | 33 | +6 | 1 | 421 | | | |
| 3 | NESTED LOOPS | | 1 | | 1 | +6 | 1 | 25 | | | |
| 4 | NESTED LOOPS | | 1 | | 1 | +6 | 1 | 25 | | | |
| 5 | FIXED TABLE FULL | X$KSUSE | 1 | | 1 | +6 | 1 | 248 | | | |
| 6 | VIEW | GV$_LOCK | 1 | | 1 | +6 | 248 | 25 | | | |
| 7 | UNION-ALL | | | | 1 | +6 | 248 | 6461 | | | |
| 8 | FILTER | | | | 1 | +6 | 248 | 6461 | | | |
| 9 | VIEW | GV$_LOCK1 | 2 | | 1 | +6 | 248 | 6461 | | | |
| 10 | UNION-ALL | | | | 1 | +6 | 248 | 6461 | | | |
| 11 | FIXED TABLE FULL | X$KDNSSF | 1 | | | | 248 | | | | |
| 12 | FIXED TABLE FULL | X$KSQEQ | 1 | | 1 | +6 | 248 | 6461 | | | |
| 13 | FIXED TABLE FULL | X$KTADM | 1 | | 1 | +0 | 248 | 0 | | 2.50 | Cpu (1) |
| 14 | FIXED TABLE FULL | X$KTATRFIL | 1 | | | | 248 | | | | |
| 15 | FIXED TABLE FULL | X$KTATRFSL | 1 | | | | 248 | | | | |
| 16 | FIXED TABLE FULL | X$KTATL | 1 | | | | 248 | | | | |
| 17 | FIXED TABLE FULL | X$KTSTUSC | 1 | | | | 248 | | | | |
| 18 | FIXED TABLE FULL | X$KTSTUSS | 1 | | | | 248 | | | | |
| 19 | FIXED TABLE FULL | X$KTSTUSG | 1 | | | | 248 | | | | |
| 20 | FIXED TABLE FULL | X$KTCXB | 1 | | | | 248 | | | | |
| 21 | FIXED TABLE FULL | X$KSQRS | 1 | | 1 | +6 | 25 | 25 | | | |
| 22 | NESTED LOOPS | | 1 | | | | 1 | | | | |
| 23 | NESTED LOOPS | | 1 | | | | 1 | | | | |
| 24 | MERGE JOIN CARTESIAN | | 5 | | | | 1 | | | | |
| 25 | NESTED LOOPS | | 1 | | | | 1 | | | | |
| 26 | FIXED TABLE FULL | X$KSUPRLAT | 1 | | | | 1 | | | | |
| 27 | FIXED TABLE FIXED INDEX | X$KSUPR (ind:1) | 1 | | | | | | | | |
| 28 | BUFFER SORT | | 100 | | | | | | | | |
| 29 | FIXED TABLE FULL | X$KSLWT | 100 | | | | | | | | |
| 30 | FIXED TABLE FIXED INDEX | X$KSUSE (ind:1) | 1 | | | | | | | | |
| 31 | FIXED TABLE FIXED INDEX | X$KSLED (ind:2) | 1 | | | | | | | | |
| 32 | NESTED LOOPS | | 1 | | | | 1 | | | | |
| 33 | NESTED LOOPS | | 1 | | | | 1 | | | | |
| 34 | MERGE JOIN CARTESIAN | | 1 | | | | 1 | | | | |
| 35 | FIXED TABLE FULL | X$KSUPR | 1 | | | | 1 | | | | |
| 36 | BUFFER SORT | | 100 | | | | | | | | |
| 37 | FIXED TABLE FULL | X$KSLWT | 100 | | | | | | | | |
| 38 | FIXED TABLE FIXED INDEX | X$KSLED (ind:2) | 1 | | | | | | | | |
| 39 | FIXED TABLE FIXED INDEX | X$KSUSE (ind:1) | 1 | | | | | | | | |
| -> 40 | NESTED LOOPS | | 1 | | 37 | +6 | 1 | 396 | | | |
| -> 41 | MERGE JOIN CARTESIAN | | 1 | | 37 | +6 | 1 | 17406 | | | |
| 42 | NESTED LOOPS | | 1 | | 35 | +6 | 1 | 15 | | | |
| 43 | NESTED LOOPS | | 1 | | 35 | +6 | 1 | 15 | | | |
| -> 44 | FIXED TABLE FULL | X$KSUSE | 1 | | 37 | +6 | 1 | 15 | | | |
| -> 45 | FIXED TABLE FIXED INDEX | X$KSLWT (ind:1) | 1 | | 37 | +6 | 15 | 15 | | | |
| -> 46 | FIXED TABLE FIXED INDEX | X$KSLED (ind:2) | 1 | | 37 | +6 | 15 | 15 | | | |
| -> 47 | BUFFER SORT | | 100 | | 37 | +6 | 15 | 17406 | 243K | | |
| 48 | FIXED TABLE FULL | X$KGLOB | 100 | | 1 | +6 | 1 | 1208 | | | |
| -> 49 | VIEW | DBA_KGLLOCK | 2 | | 41 | +2 | 17407 | 396 | | 2.50 | Cpu (1) |
| -> 50 | UNION-ALL | | | | 37 | +6 | 17407 | 15M | | 2.50 | Cpu (1) |
| -> 51 | FIXED TABLE FULL | X$KGLLK | 100 | | 42 | +1 | 17407 | 15M | | 37.50 | Cpu (15) |
| -> 52 | FIXED TABLE FULL | X$KGLPN | 100 | | 40 | +3 | 17407 | 116K | | 55.00 | Cpu (22) |
============================================================================================================================================================
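As an aside, the other option mentioned above (querying the ASH PLAN_LINE columns directly) is roughly what the “Activity” column of this report is based on. A minimal sketch, assuming a Diagnostics Pack license; c884zcqpv9y5h is the SQL_ID from the snapper output, and the 10-minute window is just an arbitrary choice for this example:

-- Count ASH samples per execution plan line for this SQL_ID over the last 10 minutes:
SELECT sql_plan_line_id, sql_plan_operation, COUNT(*) AS samples
FROM   v$active_session_history
WHERE  sql_id = 'c884zcqpv9y5h'
AND    sample_time > SYSDATE - 10/1440
GROUP  BY sql_plan_line_id, sql_plan_operation
ORDER  BY samples DESC;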
From the “Activity” columns at the right end of the SQL Monitoring report above, you can see that most of this monitored query’s runtime (37.50% + 55.00%) was spent in execution plan lines 51 and 52, executing the FIXED TABLE FULL rowsource operations on the X$KGLLK and X$KGLPN tables (library cache locks and library cache pins). Well, querying an X$ table shouldn’t usually take tens of minutes! That’s why you also need to look at the “Execs” column (it’s called “Starts” in extended DBMS_XPLAN output): Oracle hasn’t accessed these X$ tables just once, but already 17407 times – all within the current execution of the SQL. So this is clearly an execution plan inefficiency – the parent NESTED LOOPS operation (line 40) has simply called the execution plan branch under it (lines 49–52) that many times, as its first child branch (line 41, which happens to be a cartesian join) has returned 17406 rows to it. That’s how nested loops work – unfortunately, this time we ended up full scanning the memory structures behind these X$ tables during every nested loop iteration. And that takes a lot of CPU (not to mention latching or mutexes, depending on the Oracle version).
So we need to adjust the execution plan (by whatever means – for example by gathering stats on the X$ tables using dbms_stats.gather_fixed_objects_stats, or by not allowing optimizer_mode = first_rows for this query), so that it would not iterate through these X$ tables so many times (and that MERGE JOIN CARTESIAN isn’t such a good idea either, although there may be exceptions).
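Here is a minimal sketch of those two workarounds; note that gathering fixed object stats affects the whole instance (so test it first), while the optimizer_mode change can be scoped to the session or the statement:

-- Option 1: give the optimizer representative stats for the X$ tables
-- (instance-wide effect, needs a privileged user, test before doing it in production):
EXEC DBMS_STATS.GATHER_FIXED_OBJECTS_STATS

-- Option 2: simply avoid FIRST_ROWS optimization for this query:
ALTER SESSION SET optimizer_mode = ALL_ROWS;
SELECT /*+ ALL_ROWS */ * FROM dba_lock_internal;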
But anyway, we could come to this conclusion because in Oracle 11g, ASH provides row-source-level detail in its samples. My original problem, however, happened on Oracle 10.1 or 9.2 (I don’t remember exactly). I didn’t want to just kill the query and re-run it with statistics_level = all; I wanted to troubleshoot it right away. So I used pstack and my os_explain script, which I have blogged about in the past. But this time, as the session wasn’t completely hung, I wanted to take multiple stack samples of the Oracle process and then aggregate them into a sort of profile of the top functions taking the time. And I didn’t want to report just a single top function; I wanted to know who called that function (the parent), who the grandparents were, etc.
It’s actually very easy to get a quick semi-hierarchical aggregated stack profile with just a one-liner :) See below – I’m running 100 pstack commands in a row against the Oracle (10.2) process and then aggregating the output based on the indentation of the function names in the os_explain output (the italic text is added as comments by me):
solaris01$ for i in {1..100} ; do pstack 6430 | os_explain -a ; done | sort -r | uniq -c

 100 ????????
 100  main                                 <<-- 100 samples out of 100 taken (100%) had main() as a grandparent function
 100   opimai_real
 100    sou2o
 100     opidrv
 100      opiodr
 100       opiino
 100        opitsk
 100         ttcpip
 100          opiodr
 100           opifch
 100            SELECT FETCH:
 100             VIEW: Fetch
 100              * NESTED LOOP JOIN: Fetch
 100               UNION-ALL: Fetch
 100                VIEW: Fetch
 100                 * FIXED TABLE: Fetch
 100                  UNION-ALL: Fetch      <<-- 100% of execution during sampling was under this UNION-ALL in the plan
  86                   kqlftp               <<-- under the parent UNION-ALL, 86% was spent in kqlftp function (kqlf functions are used for reading X$ tables)
   9                   kqlftl
   3                   UNION-ALL: RowProcedure
   1                   FIXED TABLE: GCol
  86                    kglic               <<-- one level deeper 86% of total time was spent in kglic function, which is used for library cache scanning
   8                    kglLockIterator
   1                    kglFreeFreeList
   1                    VIEW: RowProcedure
   1                     memcpy
   1                     ksfflt
   1                     kqlfnn
   3                     kqlfgl
  85                      kglic0
   1                      kghfrf
   1                      kghalf
   1                      expeal
   7                       ksfglt
  23                       ksfflt
   3                       kqlfnn
   5                       kgligp
   1                       evaiand
  18                        yield
   3                        sskgslgf
   4                        kslgetl
   4                        kslfre
When reading such stack profiles, you don’t have to decode what every single function name means all the way down to the deepest levels. The point of stack profiling is to show in which “branch” of the Oracle function call tree most of the time was spent – and in our case we can already say that 86% of all the CPU time was spent in the kglic() function, which was called by a UNION-ALL rowsource, which in turn was a few layers down from a NESTED LOOPS rowsource. So we know – by measuring – where Oracle spends all this CPU time. We can focus on figuring out what the kglic function does instead of having to figure out what all of these functions mean.
So while the stack profile doesn’t tell us how many times a function was called, it does tell us roughly how much time was spent inside (or under) each function in the call tree, and that takes us a big step closer to the root cause of this problem.
The real reason I’m writing this post is that I want to show that basic stack profiling (or tracefile or heap dump profiling) is not too complex. When troubleshooting an urgent issue, you don’t have time to install and try out complex development tools and GUI profilers – processing the tracefile with a simple shell, awk or perl/python script can give you the quick’n’dirty answer. I came up with this ad-hoc profiling approach when troubleshooting a customer problem – it took just a minute to get the answer I wanted. Simplicity wins! Well, at least it won in this case.
Note that on Linux you need to set an extra environment variable to get the sort command to behave (the output below is slightly different from the one above, as it’s taken from Oracle 11.2). Note also that on Linux the pstack command is just a shell script wrapper around gdb and is thus pretty slow. You probably don’t want to examine processes just for fun on Linux production servers, but once some process starts behaving badly (and people are ready to kill it anyway), this is a decent approach for gathering evidence before pulling the trigger.
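By the way, if you don’t happen to have the pstack wrapper (or os_explain) at hand on a Linux box, raw gdb can still give you a crude, flat profile of the most common function names. A minimal sketch in the spirit of the classic “poor man’s profiler”, assuming 17163 is the target PID and that gdb prints its usual “#N 0xADDR in func ()” frame format (so the function name is the 4th field):

# Take 100 stack samples with gdb and count the most frequent function names.
# This is a flat profile only - no call-tree context like os_explain gives you.
for i in {1..100} ; do
  gdb --batch -p 17163 -ex "bt" 2>/dev/null
done | awk '/^#/ { print $4 }' | sort | uniq -c | sort -rn | head -20

The os_explain-based run below keeps the execution plan context, which is usually what you want for Oracle processes: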
[root@oel6 tmp]# export LC_ALL=C ; for i in {1..100} ; do pstack 17163 | ./os_explain -a ; done | sort -r | uniq -c
 100 main
 100  ssthrdmain
 100   opimai_real
 100    sou2o
 100     opidrv
 100      opiodr
 100       opiino
 100        opitsk
 100         ttcpip
 100          opiodr
 100           opifch
 100            SELECT FETCH:
 100             VIEW: Fetch
 100              * NESTED LOOP JOIN: Fetch
 100               UNION-ALL: Fetch
 100                NESTED LOOP JOIN: Fetch
 100                 SORT: Fetch
 100                  NESTED LOOP JOIN: RowProc
 100                   VIEW: Fetch
 100                    * FIXED TABLE: Fetch
 100                     UNION-ALL: Fetch
  45                      kqlftp
  53                      kqlftl
   1                      UNION-ALL: RowProcedure
  98                       kglic0
   1                       VIEW: RowProcedure
  17                        kglic_cbk
  10                        kglReleaseHandleReference
  10                        kglReleaseBucketMutex
   1                        kglGetSessionUOL
  29                        kglGetHandleReference
   4                        kglGetBucketMutex
   1                        expeal
   1                         kqlpgCallback
   5                         kgligp
   7                         kgligl
  13                         kglReleaseMutex
   3                         kglGetSessionUOL
  16                         kglGetMutex
   1                         evaiand
   2                          kqlfgl
   8                          kgxRelease
   8                          kgxExclusive
   3                          kglMutexNotHeld
   1                          kglMutexHeld
   1                          kglIsMutexHeld
   1                          expepr
   1                           ??
   1                            P2D3
So, basic stack profiling can be done with just a few shell commands chained together. It’s not fully hierarchical stack profiling like ProcMon on Windows or the Sample Process feature in Mac OS X’s Activity Monitor, but then again, it’s just one line of code :-)