This post is written mostly for fun, not for practical everyday usability (enjoy!).
Introduction
There was a question on Twitter about alternative options for retrieving the SQL execution plan line ID that Oracle is currently executing. Normally Oracle’s Active Session History shows this info in its PLAN_LINE* columns as I have explained in my asqlmon.sql post.
does anyone knows if I can select the equivalent from ASH.SQL_PLAN_LINE_ID for current session? (something like v$session or x$..., if really needed)
— Martin Berger 🇺🇦 (@martinberx) April 7, 2019
But it looks like a way without ASH (or Diagnostics/Tuning pack licenses) is needed. Here’s a totally hipster way (read: advanced) for dumping the current plan line info from process private memory. For some background reading you may want to check out my old blog entries, scripts and videos:
- Understanding Oracle execution plans with os_explain article
- Tracing Oracle SQL plan execution with DTrace article
- qer_trace.sh DTrace script
- SQL Execution hacking session video
This background material says that when a parent plan operator (plan line) calls its child operator, it will pass a pointer to the operator metadata inside the SQL child cursor into the function call. That metadata contains the plan line ID of the child operator and can be read from memory. After all, every plan line is just a qerNNFetch()
function call, where NN is the execution plan operator type. You can see all execution plan operators using my @xplto.sql script.
So here’s an example. My application’s SQL query seems to be stuck or very slow and I want to dump the execution plan line ID it is currently executing. I’ll first list the execution plan for my application SID 849 (output edited for brevity):
SQL> @xsid 849
-------------------------------------------------------------------------------
| Id | Operation | Name |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TEMP TABLE TRANSFORMATION | |
| 2 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6689_1564080 |
| 3 | UNION-ALL | |
| 4 | HASH GROUP BY | |
|* 5 | HASH JOIN | |
| 6 | TABLE ACCESS FULL | DATE_DIM |
|* 7 | HASH JOIN | |
| 8 | TABLE ACCESS FULL | CUSTOMER |
| 9 | PARTITION RANGE ALL | |
| 10 | TABLE ACCESS FULL | STORE_SALES |
| 11 | HASH GROUP BY | |
|* 12 | HASH JOIN | |
| 13 | TABLE ACCESS FULL | DATE_DIM |
|* 14 | HASH JOIN | |
| 15 | TABLE ACCESS FULL | CUSTOMER |
| 16 | PARTITION RANGE ALL | |
| 17 | TABLE ACCESS FULL | WEB_SALES |
| 18 | SORT ORDER BY | |
|* 19 | HASH JOIN | |
|* 20 | HASH JOIN | |
|* 21 | HASH JOIN | |
|* 22 | VIEW | |
| 23 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6689_1564080 |
|* 24 | VIEW | |
| 25 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6689_1564080 |
|* 26 | VIEW | |
| 27 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6689_1564080 |
|* 28 | VIEW | |
| 29 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6689_1564080 |
-------------------------------------------------------------------------------
Dumping call stacks
The above script uses v$session and v$sql_plan to show which SQL statement and plan my session is executing, but it doesn’t know on which exact plan line the execution currently is. So I connect to that stuck session with ORADEBUG and make it dump its call stack:
SQL> @odsid 849
Oracle pid: 135, Unix process pid: 31432, image: oracle@linux01.localdomain
SQL> ORADEBUG DUMP CALLSTACK 2
Statement processed.
The Oracle’s CALLSTACK dump dumps less data than ERRORSTACK. In fact, errorstack uses callstack for dumping the kernel stack under the hood. I’m using the Oracle’s stack dumping mechanism here as Oracle uses its built-in stack unwind code that tries to extract function call arguments (if available in the stack) too.
Here’s the lengthy output. I pasted it for completeness, but will separately paste sections of interest in my analysis below. The top functions (everything above __sighandler()
are the debug functions that were doing the actual stack unwinding and dumping so we can ignore them. And since we are interested in execution plan operators, you can scroll down straight to the qer*
functions below and then keep reading:
*** 2019-04-08T11:31:21.079018-04:00 Received ORADEBUG command (#27) 'DUMP CALLSTACK 2' from process '31462' ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- ksedst1()+110 call kgdsdst() 7FFCDF1C5D10 000000003 7FFCDF1C0050 ? 7FFCDF1C0168 ? 7FFCDF1C5830 ? 000000083 ? ksdxdmp()+1439 call ksedst1() 000000001 000000001 7FFCDF1C0050 ? 7FFCDF1C0168 ? 7FFCDF1C5830 ? 000000083 ? ksdxfdmp()+149 call ksdxdmp() 0141BF760 7FFCDF1C62D8 000000002 000000000 7FFCDF1C5830 ? 000000083 ? ksdxcb()+896 call ksdxfdmp() 7FFCDF1C6B90 000000010 000000003 7FFCDF1C6A48 7FFCDF1C6AE8 000000083 ? sspuser()+217 call ksdxcb() 000000001 000000010 ? 000000003 ? 7FFCDF1C6A48 ? 7FFCDF1C6AE8 ? 000000083 ? __sighandler() call sspuser() 000000001 ? 000000010 ? 000000003 ? 7FFCDF1C6A48 ? 7FFCDF1C6AE8 ? 000000083 ? skgghash3()+192 signal __sighandler() 7F261F019080 ? 00000000A ? <-- received the SIGUSR2 0A7C45E2F ? signal that Oradebug uses 3D0FD3EA42E2AFDC ? 0A173AC3C ? 9A62D9A34660CFE8 ? qesnhLoadRowsetsFor call skgghash3() 7F261F019080 ? 00000000A ? GBY()+1239 0A7C45E2F ? 3D0FD3EA42E2AFDC ? 0A173AC3C ? 9A62D9A34660CFE8 ? qerghRowPRowsetsFas call qesnhLoadRowsetsFor 7F2600000030 7F2625CF5548 tAggsCacheAware()+3 GBY() 0A7C45E2F ? 1622 3D0FD3EA42E2AFDC ? 0A173AC3C ? 9A62D9A34660CFE8 ? qerhnProbeRowsetFin call qerghRowPRowsetsFas 7FFCDF1C9C88 000007FFF ishInnerKu()+375 tAggsCacheAware() 7F2625CF5548 3D0FD3EA42E2AFDC ? 000000040 9A62D9A34660CFE8 ? qerhnProbeRowsetInn call qerhnProbeRowsetFin 7FFCDF1C98F0 000007FFF erEncoding()+1551 ishInnerKu() 7F2625CF5548 ? 3D0FD3EA42E2AFDC ? 000000040 ? 9A62D9A34660CFE8 ? qerhnProbeRowsetFin call qerhnProbeRowsetInn 7F26259947C8 7F2625994FC8 ishInnerKu()+375 erEncoding() 7F2625CF5548 ? 3D0FD3EA42E2AFDC ? 000000040 ? 9A62D9A34660CFE8 ? qerhnProbeRowsetInn call qerhnProbeRowsetFin 7FFCDF1C9700 000007FFF erEncoding()+1551 ishInnerKu() 7F2625CF5548 ? 3D0FD3EA42E2AFDC ? 000000040 ? 9A62D9A34660CFE8 ? kdstf00001010000100 call qerhnProbeRowsetInn 7F2625998800 7F2625999000 0km()+1511 erEncoding() 7F2625CF5548 ? 3D0FD3EA42E2AFDC ? 000000040 ? 9A62D9A34660CFE8 ? kdsttgr()+1683 call kdstf00001010000100 00000002C ? 7F2625999000 ? 0km() 0040701C0 7FFCDF1C9700 000007FFF 7FFCDF1C8970 qertbFetch()+1375 call kdsttgr() 7FFCDF1C8870 ? 00000000A 0040701C0 ? 7FFCDF1C9700 ? 000007FFF ? 7FFCDF1C8970 ? qergiFetch()+569 call qertbFetch() 7FFCDF1C8870 ? 00000000A ? <-- qerNNFetch function called 0040701C0 7FFCDF1C9700 ? 000007FFF 7FFCDF1C8970 ? rwsfcd()+140 call qergiFetch() 3FBA2CE80 12BED7140 0040701C0 7FFCDF1C9700 000007FFF 7FFCDF1C8970 ? qerhnFetch()+1372 call rwsfcd() 3FBA2C898 ? 7F262BED72E0 ? 0040701C0 ? 7FFCDF1C9700 ? 000007FFF ? 7FFCDF1C8970 ? rwsfcd()+140 call qerhnFetch() 3FBA2C7A8 7F2625CF49F0 3FBA2C7A8 7F2625CF49F0 0040701C0 7FFCDF1C8970 ? qerhnFetch()+6529 call rwsfcd() 3FBA2C668 ? 7F2625CF4F40 ? 0040701C0 ? 7F2625CF49F0 ? 0040701C0 ? 7FFCDF1C8970 ? qerghFetch()+664 call qerhnFetch() 304000005 7F2600000001 3FBA2C470 7F2625CF5018 004154370 7FFCDF1C8970 ? rwsfcd()+140 call qerghFetch() 3FBA2C1E8 ? 7F2625CF9B08 ? 0040422A0 7FFCDF1C9E68 000007FFF 7FFCDF1C8970 ? qeruaFetch()+376 call rwsfcd() 3FBA2BE60 ? 7F2625CF9F28 ? 0040422A0 ? 7FFCDF1C9E68 ? 000007FFF ? 7FFCDF1C8970 ? rwsfcd()+140 call qeruaFetch() 3FBA28B18 3FBA28B18 7F2625C78348 0122ACA00 7FFCDF1CA0A0 7FFCDF1C8970 ? qerltFetch()+547 call rwsfcd() 3FBA28668 ? 7F2625C784A0 ? 7F262C036980 ? 0122ACA00 ? 7FFCDF1CA0A0 ? 7FFCDF1C8970 ? insdlexe()+367 call qerltFetch() 3EB5FB498 7F2625C785C0 00361BA70 7FFCDF1CA1A0 000000001 7FFCDF1C8970 ? insExecStmtExecIniE call insdlexe() 3F9EBA018 ? 3E97EB0A8 ? ngine()+76 00361BA70 ? 7FFCDF1CA1A0 ? 000000001 ? 7FFCDF1C8970 ? insexe()+2269 call insExecStmtExecIniE 3F9EBA018 ? 3E97EB0A8 ? ngine() 00361BA70 ? 7FFCDF1CA1A0 ? 000000001 ? 7FFCDF1C8970 ? qes3tExecSQL()+859 call insexe() 3F9EBA018 ? 3E97EB0A8 ? 00361BA70 ? 7FFCDF1CA1A0 ? 000000001 ? 7FFCDF1C8970 ? qerleStart()+770 call qes3tExecSQL() 3E8F50FE8 000000001 000000000 3EA5AD5E0 000000001 ? 7FFCDF1C8970 ? selexe0()+990 call qerleStart() 3EA5AD5E0 7F262516EEE0 000000001 3EA5AD5E0 ? 000000001 ? 7FFCDF1C8970 ? opiexe()+6556 call selexe0() 3F9EBA018 ? 7FFCDF1CBE50 000000001 ? 000000001 7F262BFEF9A0 7FFCDF1C8970 ? kpoal8()+2419 call opiexe() 000000049 7FFCDF1CBE50 ? 7FFCDF1CC5B0 000000001 ? 7F262BFEF9A0 ? 7FFCDF1C8970 ? opiodr()+1244 call kpoal8() 00000005E 000000026 7FFCDF1D0040 000000001 ? 7F262BFEF9A0 ? 7FFCDF1C8970 ? ttcpip()+1239 call opiodr() 00000005E 000000026 7FFCDF1D0040 ? 000000000 7F262BFEF9A0 ? 7FFCDF1C8970 ? opitsk()+1943 call ttcpip() 7F262C010370 ? 0000005E0 ? 7FFCDF1D0040 000000000 ? 7FFCDF1CFA90 7FFCDF1D029C opiino()+957 call opitsk() 000000000 000000000 7FFCDF1D0040 ? 000000000 ? 7FFCDF1CFA90 ? 7FFCDF1D029C ? opiodr()+1244 call opiino() 00000003C 000000004 7FFCDF1D1C08 000000000 ? 7FFCDF1CFA90 ? 7FFCDF1D029C ? opidrv()+1091 call opiodr() 00000003C 000000004 7FFCDF1D1C08 ? 000000000 7FFCDF1CFA90 ? 7FFCDF1D029C ? sou2o()+191 call opidrv() 00000003C 000000004 7FFCDF1D1C08 000000000 ? 7FFCDF1CFA90 ? 7FFCDF1D029C ? opimai_real()+455 call sou2o() 7FFCDF1D1B31 00000003C 000000004 7FFCDF1D1C08 7FFCDF1CFA90 ? 7FFCDF1D029C ? ssthrdmain()+423 call opimai_real() 000000000 7FFCDF1D2308 000000004 ? 7FFCDF1D1C08 ? 7FFCDF1CFA90 ? 7FFCDF1D029C ? main()+275 call ssthrdmain() 000000000 000000002 7FFCDF1D2308 000000001 000000000 7FFCDF1D029C ? __libc_start_main() call main() 000000002 7FFCDF1D2558 +245 7FFCDF1D2308 ? 000000001 ? 000000000 ? 7FFCDF1D029C ? _start()+41 call __libc_start_main() 000D6B3C0 000000002 7FFCDF1D2558 7F2627A2E495 ? 000000000 ? 7FFCDF1D029C ?
Ok, I’ll just paste the first (topmost) qer*
function calls of interest here, so that there’d be less scrolling around:
qergiFetch()+569 call qertbFetch() 7FFCDF1C8870 ? 00000000A ? 0040701C0 7FFCDF1C9700 ? 000007FFF 7FFCDF1C8970 ? rwsfcd()+140 call qergiFetch() 3FBA2CE80 12BED7140 0040701C0 7FFCDF1C9700 000007FFF 7FFCDF1C8970 ? qerhnFetch()+1372 call rwsfcd() 3FBA2C898 ? 7F262BED72E0 ? 0040701C0 ? 7FFCDF1C9700 ? 000007FFF ? 7FFCDF1C8970 ? rwsfcd()+140 call qerhnFetch() 3FBA2C7A8 7F2625CF49F0 3FBA2C7A8 7F2625CF49F0 0040701C0 7FFCDF1C8970 ?
The “tb” in qertbFetch means “table”. “gi” means partition granule iterator. “hn” is a newer hash join function. Since the plan has many table scans and hash joins, how would we know which exact one this call stack was executing? This is where the qerNNFetch function call arguments come in. The first argument (highlighted above) is a pointer to the execution plan operator in the child cursor. A plan operator’s memory structure contains the human-readable execution plan line ID too.
When we look into the qertbFetch() argument 7FFCDF1C8870 ?
, it has a question mark after it. This indicates a “dubious value” so might not be trustworthy.
Whenever Oracle’s stack unwinder isn’t sure that it extracted the correct argument info, it will report the value as dubious. Depending on the CPU architecture and compiler function calling conventions, not all arguments are saved in the stack, but passed via CPU registers only. Thus it’s not possible to read the function call arguments after the call was made.
Indeed when looking into the memory address 7FFCDF1C8870
- this is process private memory, not a pointer to a child cursor in SGA. So, let’s just ignore that dubious argument and see if we have more luck with the next function call qergiFetch() and its argument 3FBA2CE80
.
Examining the child cursor
This address seems to be within the SGA as per @ksminfo.sql or when using my @fcha.sql to Find Chunk Address. Note that @fcha.sql uses X$KSMSP under the hood, so should likely not be used in busy production systems.
SQL> @fcha 3FBA2CE80 Find in which heap (UGA, PGA or Shared Pool) the memory address 3FBA2CE80 resides... WARNING!!! This script will query X$KSMSP, which will cause heavy shared pool latch contention in systems under load and with large shared pool. This may even completely hang your instance until the query has finished! You probably do not want to run this in production! Press ENTER to continue, CTRL+C to cancel... LOC KSMCHPTR KSMCHIDX KSMCHDUR KSMCHCOM KSMCHSIZ KSMCHCLS KSMCHTYP KSMCHPAR --- ---------------- ---------- ---------- ---------------- ---------- -------- ---------- ---------------- SGA 00000003FBA2C628 1 4 SQLA^208866ee 4096 freeabl 0 00000003F4F37AE0
Indeed we found a SQLA - SQL Area type chunk from that address (SQL Area in this context is the child cursor’s execution plan).
Since 11g, Oracle keeps the KGL object name’s hash value in chunk comment, so we can see what cursor the hash value 208866ee corresponds to:
SQL> SELECT sql_text FROM v$sql WHERE hash_value = TO_NUMBER('208866ee', 'xxxxxxxx'); SQL_TEXT ------------------------------------------------------------------------------------------ WITH year_total AS ( SELECT c_customer_id customer_id ,c_first_name customer_first_name ... WITH year_total AS ( SELECT c_customer_id customer_id ,c_first_name customer_first_name ...
Ok, yep, looks like the pointer 3FBA2CE80 points to my SQL statement’s child cursor. Now let’s read what’s behind that pointer:
SQL> @peek 3FBA2CE80 [3FBA2CE80, 3FBA2CEC0) = 00260008 01000009 FBA2C898 00000003 00000000 00000000 FBA2D730 00000003 FBA2CE38 00000003 000001E0 00000A18 1F340009 00000045 ...
The highlighted bytes show the human readable plan line ID that this operator belongs to. Let’s look into the execution plan again, to see what ID 9 is:
------------------------------------------------------------------------------- | Id | Operation | Name | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | 1 | TEMP TABLE TRANSFORMATION | | | 2 | LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6689_1564080 | | 3 | UNION-ALL | | | 4 | HASH GROUP BY | | |* 5 | HASH JOIN | | | 6 | TABLE ACCESS FULL | DATE_DIM | |* 7 | HASH JOIN | | | 8 | TABLE ACCESS FULL | CUSTOMER | | 9 | PARTITION RANGE ALL | | | 10 | TABLE ACCESS FULL | STORE_SALES | ...
Indeed, ID 9 is the PARTITION RANGE ALL operator, that is implemented by the qergiFetch()
function that we saw in the call stack earlier. I will ignore the rwsfcd()
function above as it is not a stand-alone execution plan line operator - it is used for implementing filter
predicates anywhere in the plan. I’ll look into the qerhnFetch()
call with the argument 3FBA2C7A8:
SQL> @peek 3FBA2C7A8 [3FBA2C7A8, 3FBA2C7E8) = 002E0008 01000007 FBA2C668 00000003 00000000 00000000 FBA2D968 00000003 FBA2C6E0 00000003 000001F0 00000000 04000007 00000086 ...
When you look into the execution plan excerpt above, it sure looks like ID 7 means HASH JOIN (qerhnFetch is a hash join function).
Summary
Hopefully this post and the background reading explain some internal mechanics of Oracle’s SQL Plan execution. I wrote it mainly for some internals hacking fun. If you ever need to use this in practice, you would probably want to run multiple CALLSTACK commands in a row to get a better sample set of where the execution is stuck or looping. With just one stack sample you may get unlucky and end up troubleshooting the wrong thing. Also, I’d stay away from production with ORADEBUG & undocumented dumps if simpler & safer tools (like ASH, SQL Monitoring or GATHER_PLAN_STATISTICS) are available to you.