When you troubleshoot an Oracle (performance) problem, you usually want to find out what the problem sessions are doing (wrong) and then why they are doing it.
The “what” question is usually answered using some wait-interface-based method – like ASH or SQL*Trace, both of which add plenty of extra detail to the plain wait event data.
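For example, if you are licensed for the Diagnostics Pack, a quick session-level top-wait summary can be pulled straight from ASH. A minimal sketch (the 5-minute window and the :sid bind are arbitrary placeholders):

```sql
-- Top wait events (or CPU) for one session over the last 5 minutes of ASH samples.
SELECT NVL(event, 'ON CPU') AS event, COUNT(*) AS samples
FROM   v$active_session_history
WHERE  session_id = :sid
AND    sample_time > SYSDATE - INTERVAL '5' MINUTE
GROUP  BY NVL(event, 'ON CPU')
ORDER  BY samples DESC;
```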
My normal session troubleshooting data collection sequence consists of only three steps:
- Interpret session-level top wait and top SQL data (ASH, SQL*Trace or custom tools sampling session-level V$ views – see the sketch after this list)
- Interpret session-level performance counter data (V$SESSTAT)
- Sample the problem process stack and list the top stack branches where execution spent the most time
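To make step #1 concrete: if you don’t have ASH or tracing at hand, even a dumb poll of V$SESSION gets you the “what”. A minimal sketch of such a custom sampler (run it repeatedly against the problem session):

```sql
-- What is session :sid doing right now? Run this in a loop to get a crude sample.
SELECT sid, status, sql_id, event, state, seconds_in_wait
FROM   v$session
WHERE  sid = :sid;
```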
Looking into #1 alone is enough in 90% of problem cases. But for advanced troubleshooting of complex and exotic problems I regularly have to dive into #2 – the V$SESSTAT performance counters (that’s why I wrote Snapper, for example).
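Reading those counters yourself is straightforward – a minimal sketch below (in practice you snapshot the values twice and diff them, which is exactly what Snapper automates for you):

```sql
-- Current non-zero performance counter values for session :sid.
SELECT sn.name, ss.value
FROM   v$sesstat  ss,
       v$statname sn
WHERE  sn.statistic# = ss.statistic#
AND    ss.sid        = :sid
AND    ss.value     != 0
ORDER  BY ss.value DESC;
```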
Maybe only once in every 40-50 troubleshooting cases do I dive down into #3 – process stack sampling. While process stack sampling doesn’t lie – it helps especially in those cases where Oracle’s own instrumentation doesn’t have a clue what a session/process is doing – it’s like a microscope: it gives you a very detailed view of one part of the problem, but not the big picture. That’s why I never start from #3 when troubleshooting (and if you do, you are doing it wrong – with the exception of a completely hung database ;). I always start from #1 and proceed to #2 only when needed.
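If you do end up at #3, one way to take a stack sample from within the database is oradebug (requires SYSDBA; OS-level tools like pstack are the alternative – the OS PID below is just a placeholder):

```sql
-- Attach to the target server process by its OS PID and dump a short stack trace.
SQL> oradebug setospid 12345
SQL> oradebug short_stack
```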
Snapper lists both #1 and #2 data for a process by the way.
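For reference, a typical Snapper run against a single session looks something like this (a 5-second, single-snapshot example – check the Snapper header for the exact syntax of your version; the SID is from the case below):

```sql
SQL> @snapper all 5 1 1612
```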
Anyway, the reason I’m writing this post is that a few days ago I received a question about a problem which nicely illustrates a case where step #1 shows that there’s an issue and step #2 explains why it’s happening.
There was an execution plan which ran slightly slower on a new server than on the old one – and it read 36 blocks per execution as opposed to only 5:
```
select ... from TABLE_X where ...

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.02       0.02          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.07         36          5          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.02       0.09         36          5          0           1
```
The “disk” column above shows 36 blocks being read in via physical IO – while only 5 buffers had actually been accessed after that (logical IOs are shown in the “query” column).
Now that’s weird – logical IOs are the ones which cause physical IOs, so the LIO count should normally be equal to or greater than the physical IO count.
What’s stranger, the execution plan is an index unique scan:
```
Rows     Row Source Operation
-------  ---------------------------------------------------
      1  TABLE ACCESS BY INDEX ROWID TABLE_X (cr=5 pr=36 pw=0 time=0 us cost=4 size=26 card=1)
      1   INDEX UNIQUE SCAN TABLE_X_IDX (cr=4 pr=35 pw=0 time=0 us cost=3 size=0 card=1)(object id 51063)
```
… but if you look at the “pr=” column, which means “blocks read via physical read”, you see that 35 of the 36 blocks were read in by the INDEX UNIQUE SCAN row source (pr=35). These counters are cumulative, so pr=36 in the parent TABLE ACCESS BY INDEX ROWID row source means that only one block was read via physical IO at the table access level. So why does the index access read 35 blocks?
Is it index height? Can’t be – the maximum index height is 24 levels (including the leaf level), and such an index would be ridiculous anyway :)
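If you want to rule index height out properly, just check the index statistics (BLEVEL is the number of branch levels, so height = BLEVEL + 1; the index name is taken from the row source output above):

```sql
-- Check the real height of the index used by the unique scan.
SELECT index_name, blevel, leaf_blocks
FROM   dba_indexes
WHERE  index_name = 'TABLE_X_IDX';
```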
Let’s look into the wait profile:
```
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  Disk file operations I/O                        4        0.00          0.00
  db file sequential read                         4        0.01          0.04
  db file scattered read                          1        0.03          0.03
  SQL*Net message from client                     2        0.15          0.15
```
Wow, an index unique scan, which is supposed to traverse the index tree with precise single block reads, has done 4 single block reads and one multiblock read!
Now this is where the wait interface and ASH-style data would not help us any further. We know what’s happening – a multiblock read happening where “it shouldn’t”. We still don’t know why it’s happening; we would have to guess from here. And this is where V$SESSTAT comes in – there are over 600 performance counters maintained for each session, and many of them show valuable information about what kind of operations (and how many of them) the RDBMS kernel has done. So I asked the client to run Snapper on a test session and here’s the output (note that the output below is from a different test run where the queries were executed many times in a loop):
```
-------------------------------------------------------------------------------------------------------------------------------------
    SID, TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH
-------------------------------------------------------------------------------------------------------------------------------------
   1612, STAT, opened cursors cumulative                                 ,          6617,      1.32k,
   1612, STAT, user calls                                                ,         13234,      2.65k,
   1612, STAT, pinned cursors current                                    ,            -1,        -.2,
   1612, STAT, session logical reads                                     ,          5293,      1.06k,
   1612, STAT, CPU used when call started                                ,             7,        1.4,
   1612, STAT, CPU used by this session                                  ,             7,        1.4,
   1612, STAT, DB time                                                   ,           306,       61.2,
   1612, STAT, user I/O wait time                                        ,           223,       44.6,
   1612, STAT, non-idle wait time                                        ,           224,       44.8,
   1612, STAT, non-idle wait count                                       ,          6775,      1.36k,
   1612, STAT, process last non-idle time                                ,             5,          1,
   1612, STAT, physical read total IO requests                           ,           158,       31.6,
   1612, STAT, physical read total multi block requests                  ,            31,        6.2,
   1612, STAT, physical read total bytes                                 ,       8265728,      1.65M,
   1612, STAT, cell physical IO interconnect bytes                       ,       8265728,      1.65M,
   1612, STAT, consistent gets                                           ,          5293,      1.06k,
   1612, STAT, consistent gets from cache                                ,          5293,      1.06k,
   1612, STAT, consistent gets - examination                             ,          5293,      1.06k,
   1612, STAT, physical reads                                            ,          1009,      201.8,
   1612, STAT, physical reads cache                                      ,          1009,      201.8,
   1612, STAT, physical read IO requests                                 ,           158,       31.6,
   1612, STAT, physical read bytes                                       ,       8265728,      1.65M,
   1612, STAT, free buffer requested                                     ,          1008,      201.6,
   1612, STAT, physical reads cache prefetch                             ,           851,      170.2,
   1612, STAT, physical reads prefetch warmup                            ,           851,      170.2,
   1612, STAT, shared hash latch upgrades - no wait                      ,           188,       37.6,
   1612, STAT, calls to get snapshot scn: kcmgss                         ,          1319,      263.8,
   1612, STAT, file io service time                                      ,            25,          5,
   1612, STAT, file io wait time                                         ,       2219641,    443.93k,
   1612, STAT, min active SCN optimization applied on CR                 ,             1,         .2,
   1612, STAT, index fetch by key                                        ,          1292,      258.4,
   1612, STAT, session cursor cache hits                                 ,          6440,      1.29k,
   1612, STAT, buffer is not pinned count                                ,          1277,      255.4,
   1612, STAT, parse time elapsed                                        ,             6,        1.2,
   1612, STAT, parse count (total)                                       ,          6380,      1.28k,
   1612, STAT, execute count                                             ,          6378,      1.28k,
   1612, STAT, bytes sent via SQL*Net to client                          ,       2195633,    439.13k,
   1612, STAT, bytes received via SQL*Net from client                    ,       2721339,    544.27k,
   1612, STAT, SQL*Net roundtrips to/from client                         ,          6374,      1.27k,
   1612, TIME, parse time elapsed                                        ,        114215,    22.84ms,     2.3%, |@         |
   1612, TIME, PL/SQL execution elapsed time                             ,         67004,     13.4ms,     1.3%, |@         |
   1612, TIME, DB CPU                                                    ,        630000,      126ms,    12.6%, |@@        |
   1612, TIME, sql execute elapsed time                                  ,       2595410,   519.08ms,    51.9%, |@@@@@@    |
   1612, TIME, DB time                                                   ,       3190919,   638.18ms,    63.8%, |@@@@@@@   |
   1612, WAIT, db file sequential read                                   ,        978601,   195.72ms,    19.6%, |@@        |
   1612, WAIT, db file scattered read                                    ,       1213550,   242.71ms,    24.3%, |@@@       |
   1612, WAIT, SQL*Net message to client                                 ,          9212,     1.84ms,      .2%, |          |
-------------------------------------------------------------------------------------------------------------------------------------
```
Note the physical read and prefetch related lines in the output above:
- The physical read total multi block requests statistic confirms that indeed there were some multiblock reads done by the monitored session.
- The physical reads cache prefetch metric indicates that there was some data block prefetching happening! This explains why we had multiblock reads done where only single block reads were really needed.
- And the physical reads prefetch warmup metric explains why we did any prefetching at all – it’s for buffer cache warmup: after a restart or a significant buffer cache size increase, Oracle decides to populate the empty buffer cache with data as fast as possible. Oracle assumes that a random multiblock read doesn’t take that much more time than a random single block read anyway, so why not read a bunch of additional blocks from around the single block that the index access wanted to read. The read operation will be slightly slower, but it may help avoid multiple future single block reads, as the extra blocks are already in cache. This should affect things only when you restart the instance or significantly increase the buffer cache size. So on a steady-state production system you shouldn’t see it (and it’s for a good cause anyway), but if you do performance tests on some test database, change something, bounce the instance and immediately run a test – this may affect your test results. Btw, this behavior is controlled by the hidden parameter _db_cache_pre_warm (see the sketch after this list), but I’m mentioning it just for educational purposes – I have never fixed a problem by changing this parameter, so you should touch it only in your sandbox hacking database :)
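For completeness, this is how you would check the parameter’s value in your sandbox – the usual hidden-parameter lookup via the X$ tables (needs to be run as SYS, and the underlying X$ structures are version-dependent):

```sql
-- Look up the current value of the hidden _db_cache_pre_warm parameter.
SELECT n.ksppinm  AS parameter,
       v.ksppstvl AS value
FROM   x$ksppi  n,
       x$ksppcv v
WHERE  n.indx = v.indx
AND    n.ksppinm = '_db_cache_pre_warm';
```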
And now back to the original topic – this post wasn’t really about this specific prewarm feature, but rather yet another example of how the V$SESSTAT metrics tell you much more than the usual wait interface and top-SQL based approach. And you don’t need to enable any tracing or undocumented events to get these detailed metrics out of Oracle – just query the session-level V$ views at the right time. As I said in the beginning, I don’t jump to the microscopic view unless the higher-level views aren’t helpful enough, so I recommend you follow the 3-step “Advanced Oracle Troubleshooting” method when trying to diagnose what the heck some session is doing and why ;^)