In the previous post about in-memory parallel execution I described in which cases the in-mem PX can kick in for your parallel queries.
A few years ago (around the Oracle 11.2.0.2 and Exadata X2 release time) I was helping a customer with their migration to Exadata X2. Many of the queries ran way slower on Exadata than on their old HP Superdome. The Exadata system was configured according to Oracle's "best practices", which included setting parallel_degree_policy = AUTO.
As there were thousands of reports (and most of them had performance issues) and we couldn't extract SQL Monitoring reports (due to another issue), I just looked into ASH data for a general overview. A SQL Monitoring report takes its execution plan line time breakdown from ASH anyway.
First I ran a simple ASH query which counted the ASH samples (seconds spent) and grouped the results by rowsource type. I was using a custom script then, but you could achieve the same by running ashtop.sql, for example:
SQL> @ash/ashtop "sql_plan_operation||' '||sql_plan_options" session_type='FOREGROUND' sysdate-1/24 sysdate
I used a similar, but hardcoded, script back then:
SQL> @ash_custom_report

PLAN_LINE                                            COUNT(*)        PCT
-------------------------------------------------- ---------- ----------
TABLE ACCESS STORAGE FULL                              305073       47.6
                                                        99330       15.5
LOAD AS SELECT                                          86802       13.6
HASH JOIN                                               37086        5.8
TABLE ACCESS BY INDEX ROWID                             20341        3.2
REMOTE                                                  13981        2.2
HASH JOIN OUTER                                          8914        1.4
MAT_VIEW ACCESS STORAGE FULL                             7807        1.2
TABLE ACCESS STORAGE FULL FIRST ROWS                     6348          1
INDEX RANGE SCAN                                         4906         .8
HASH JOIN BUFFERED                                       4537         .7
PX RECEIVE                                               4201         .7
INSERT STATEMENT                                         3601         .6
PX SEND HASH                                             3118         .5
PX SEND BROADCAST                                        3079         .5
SORT AGGREGATE                                           3074         .5
BUFFER SORT                                              2266         .4
SELECT STATEMENT                                         2259         .4
TABLE ACCESS STORAGE SAMPLE                              2136         .3
INDEX UNIQUE SCAN                                        2090         .3
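For reference, the aggregation behind this kind of breakdown is roughly the following – a minimal sketch against GV$ACTIVE_SESSION_HISTORY for the last hour (ashtop.sql and my custom script do more filtering and formatting than this):

-- Sketch: DB time (ASH samples) by execution plan row source, last hour
SELECT
    sql_plan_operation || ' ' || sql_plan_options            AS plan_line
  , COUNT(*)                                                 AS samples
  , ROUND(RATIO_TO_REPORT(COUNT(*)) OVER () * 100, 1)        AS pct
FROM
    gv$active_session_history
WHERE
    session_type = 'FOREGROUND'
AND sample_time > SYSDATE - 1/24
GROUP BY
    sql_plan_operation || ' ' || sql_plan_options
ORDER BY
    COUNT(*) DESC;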
The above output shows that indeed over 47% of the Database Time in ASH history was spent in TABLE ACCESS STORAGE FULL operations (regardless of which SQL_ID). The blank 15.5% line is activity not attributed to any SQL plan line, like parsing, PL/SQL, background process activity, logins/connection management etc. But only about 3.2 + 0.8 + 0.3 = 4.3% of the total DB Time was spent in index-related row sources. So the execution plans of our reports were doing exactly what they are supposed to do on an Exadata box: full table scans.
However, why were they so slow? Shouldn’t Exadata be doing full table scans really fast?! The answer is obviously yes, but only when the smart scans actually kick in!
As I wanted a high level overview, I added a few more columns to my script (rowsource_events.sql), namely IS_PARALLEL, which shows whether the active session was a serial session or a PX slave, and the wait event (what that session was actually doing).
As the earlier output had shown that close to 50% of DB time was spent on table scans, I drilled down to the TABLE ACCESS STORAGE FULL data retrieval operations only, excluding all the other data processing steps like joins, loads and aggregations:
SQL> @ash/rowsource_events TABLE% STORAGE%FULL

PLAN_LINE                                IS_PARAL SESSION WAIT_CLASS      EVENT                                      COUNT(*)        PCT
---------------------------------------- -------- ------- --------------- ---------------------------------------- ---------- ----------
TABLE ACCESS STORAGE FULL                PARALLEL WAITING User I/O        cell multiblock physical read                139756         47
TABLE ACCESS STORAGE FULL                PARALLEL ON CPU                                                                64899       21.8
TABLE ACCESS STORAGE FULL                SERIAL   WAITING User I/O        cell multiblock physical read                 24133        8.1
TABLE ACCESS STORAGE FULL                PARALLEL WAITING User I/O        cell single block physical read               16430        5.5
TABLE ACCESS STORAGE FULL                PARALLEL WAITING User I/O        read by other session                         12141        4.1
TABLE ACCESS STORAGE FULL                PARALLEL WAITING Cluster         gc buffer busy acquire                        10771        3.6
TABLE ACCESS STORAGE FULL                PARALLEL WAITING User I/O        cell smart table scan                          7573        2.5
TABLE ACCESS STORAGE FULL                SERIAL   WAITING Cluster         gc cr multi block request                      7158        2.4
TABLE ACCESS STORAGE FULL                SERIAL   ON CPU                                                                  6872        2.3
TABLE ACCESS STORAGE FULL                PARALLEL WAITING Cluster         gc cr multi block request                      2610         .9
TABLE ACCESS STORAGE FULL                PARALLEL WAITING User I/O        cell list of blocks physical read              1763         .6
TABLE ACCESS STORAGE FULL                SERIAL   WAITING User I/O        cell single block physical read                1744         .6
TABLE ACCESS STORAGE FULL                SERIAL   WAITING User I/O        cell list of blocks physical read               667         .2
TABLE ACCESS STORAGE FULL                SERIAL   WAITING User I/O        cell smart table scan                           143          0
TABLE ACCESS STORAGE FULL                SERIAL   WAITING Cluster         gc cr disk read                                 122          0
TABLE ACCESS STORAGE FULL                PARALLEL WAITING Cluster         gc current grant busy                            97          0
TABLE ACCESS STORAGE FULL                PARALLEL WAITING Cluster         gc current block 3-way                           85          0
TABLE ACCESS STORAGE FULL                PARALLEL WAITING Cluster         gc cr grant 2-way                                68          0
TABLE ACCESS STORAGE FULL                SERIAL   WAITING User I/O        direct path read                                 66          0
TABLE ACCESS STORAGE FULL                SERIAL   WAITING Cluster         gc current grant 2-way                           52          0
Indeed, over 60% of all my full table scan samples were waiting on buffered read related wait events! (Both buffered IO waits and RAC global cache waits, which you wouldn't get with direct path reads, well, unless some read consistency cloning & rollback operations are needed.) So, Smart Scans were definitely not kicking in for some of those full table scans (despite the support analyst's outdated claim that PX slaves "never" use buffered IOs).
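For reference, the breakdown above can be approximated with a plain ASH query like the one below. This is just a sketch, not the actual rowsource_events.sql – the IS_PARALLEL classification here simply checks whether QC_SESSION_ID is populated in the ASH sample, which is one way to separate PX slave/QC samples from serial ones:

-- Sketch: full table scan samples broken down by serial/PX and wait event, last hour
SELECT
    sql_plan_operation || ' ' || sql_plan_options            AS plan_line
  , CASE WHEN qc_session_id IS NOT NULL
         THEN 'PARALLEL' ELSE 'SERIAL' END                   AS is_parallel
  , session_state
  , wait_class
  , event
  , COUNT(*)                                                 AS samples
  , ROUND(RATIO_TO_REPORT(COUNT(*)) OVER () * 100, 1)        AS pct
FROM
    gv$active_session_history
WHERE
    session_type = 'FOREGROUND'
AND sql_plan_operation LIKE 'TABLE%'
AND sql_plan_options   LIKE 'STORAGE%FULL'
AND sample_time > SYSDATE - 1/24
GROUP BY
    sql_plan_operation || ' ' || sql_plan_options
  , CASE WHEN qc_session_id IS NOT NULL THEN 'PARALLEL' ELSE 'SERIAL' END
  , session_state
  , wait_class
  , event
ORDER BY
    COUNT(*) DESC;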
Also, when you are not offloading the IO, low-level data processing, filtering etc. to the storage cells, you will need to use your DB node CPUs for all that work (while the cell CPUs sit almost idle if nothing gets offloaded). So, when enabling smart scans, both the DB IO wait time and the DB CPU usage per unit of work done should go down.
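If you want a quick sanity check of whether smart scans are happening at all at the system level, comparing the offload related statistics is one option. A sketch below, using the 11.2-era Exadata statistic names:

-- Sketch: how much of the physical IO has been eligible for / returned by smart scans
SELECT name, ROUND(value/1024/1024/1024, 1) AS gb
FROM   v$sysstat
WHERE  name IN (
           'physical read total bytes'
         , 'cell physical IO bytes eligible for predicate offload'
         , 'cell physical IO interconnect bytes returned by smart scan'
       );

Roughly speaking, if the "eligible for predicate offload" figure barely moves while your full table scans are running, those scans are going through the buffer cache instead of being offloaded.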
Anyway – after looking deeper into the issue I noticed that they had parallel_degree_policy set to AUTO, so in-memory PX had also kicked in as a side effect. Once we set it to LIMITED, the problem went away: instead of lots of buffered IO wait events we saw far fewer waits, mostly on cell smart table scan events, and higher CPU utilization (yes, higher), as a big bottleneck had been removed and we got much more work done every second.
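The change itself is a one-liner (shown here as a sketch – on RAC with an spfile you may want to add the appropriate SCOPE and SID clauses):

SQL> ALTER SYSTEM SET parallel_degree_policy = LIMITED;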
This was on Oracle 11.2.0.2 and I didn't have a chance to look any deeper, but I suspected that the problem came from the fact that this database had many different workloads in it, both smart scanning ones and regular, index-based accesses – so possibly the "in-mem PX automatic memory distributor", or whatever controls what gets cached and what not, wasn't accounting for all the non-PX-scan activity. Apparently this should be improved in Oracle 11.2.0.3, but I haven't tested it myself – I feel safer manually controlling the CACHE/KEEP settings.
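For completeness, that kind of manual control happens at the segment level. A sketch with hypothetical table names (the KEEP pool also needs DB_KEEP_CACHE_SIZE to be sized for it):

-- Hypothetical table names, for illustration only
-- Pin a small, frequently hit segment into the KEEP pool
ALTER TABLE dim_customer STORAGE (BUFFER_POOL KEEP);
-- Leave a large scan target in the default pool and don't favour caching its full scan blocks
ALTER TABLE fact_sales STORAGE (BUFFER_POOL DEFAULT);
ALTER TABLE fact_sales NOCACHE;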