In the previous post about in-memory parallel execution I described in which cases the in-mem PX can kick in for your parallel queries.
A few years ago (around the Oracle 11.2.0.2 and Exadata X2 release time) I was helping a customer with their migration to Exadata X2. Many of the queries ran way slower on Exadata than on their old HP Superdome. The Exadata system was configured according to Oracle's "best practices", which included setting parallel_degree_policy = AUTO.
As there were thousands of reports (and most of them had performance issues) and we couldn't extract SQL Monitoring reports (due to another issue), I just looked into ASH data for a general overview. A SQL Monitoring report takes its execution plan line time breakdown from ASH anyway.
First I ran a simple ASH query which counted the ASH samples (seconds spent) and grouped the results by rowsource type. I was using a custom script then, but you could achieve the same by running ashtop.sql, for example:
SQL> @ash/ashtop "sql_plan_operation||' '||sql_plan_options" session_type='FOREGROUND' sysdate-1/24 sysdate
I used a similar, but hardcoded, script back then:
SQL> @ash_custom_report

PLAN_LINE                                            COUNT(*)        PCT
-------------------------------------------------- ---------- ----------
TABLE ACCESS STORAGE FULL                              305073       47.6
                                                        99330       15.5
LOAD AS SELECT                                          86802       13.6
HASH JOIN                                               37086        5.8
TABLE ACCESS BY INDEX ROWID                             20341        3.2
REMOTE                                                  13981        2.2
HASH JOIN OUTER                                          8914        1.4
MAT_VIEW ACCESS STORAGE FULL                             7807        1.2
TABLE ACCESS STORAGE FULL FIRST ROWS                     6348          1
INDEX RANGE SCAN                                         4906         .8
HASH JOIN BUFFERED                                       4537         .7
PX RECEIVE                                               4201         .7
INSERT STATEMENT                                         3601         .6
PX SEND HASH                                             3118         .5
PX SEND BROADCAST                                        3079         .5
SORT AGGREGATE                                           3074         .5
BUFFER SORT                                              2266         .4
SELECT STATEMENT                                         2259         .4
TABLE ACCESS STORAGE SAMPLE                              2136         .3
INDEX UNIQUE SCAN                                        2090         .3
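For reference, the aggregation behind this kind of breakdown is roughly the following – a minimal sketch against GV$ACTIVE_SESSION_HISTORY for the last hour (ashtop.sql and my custom script do more filtering and formatting than this):

-- Sketch: DB time (ASH samples) by execution plan row source, last hour
SELECT
    sql_plan_operation || ' ' || sql_plan_options            AS plan_line
  , COUNT(*)                                                 AS samples
  , ROUND(RATIO_TO_REPORT(COUNT(*)) OVER () * 100, 1)        AS pct
FROM
    gv$active_session_history
WHERE
    session_type = 'FOREGROUND'
AND sample_time > SYSDATE - 1/24
GROUP BY
    sql_plan_operation || ' ' || sql_plan_options
ORDER BY
    COUNT(*) DESC;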
The above output shows that indeed over 47% of the Database Time in ASH history was spent in TABLE ACCESS STORAGE FULL operations (regardless of which SQL_ID). The blank 15.5% line is activity not attributed to any SQL plan line, like parsing, PL/SQL, background process activity, logins/connection management etc. But only about 3.2 + 0.8 + 0.3 = 4.3% of the total DB Time was spent in index-related row sources. So the execution plans of our reports were doing exactly what they are supposed to do on an Exadata box: full table scans.
However, why were they so slow? Shouldn’t Exadata be doing full table scans really fast?! The answer is obviously yes, but only when the smart scans actually kick in!
As I wanted a high level overview, I added a few more columns to my script (rowsource_events.sql), namely IS_PARALLEL, which shows whether the active session was a serial session or a PX slave, and the wait event (what that session was actually doing).
As the earlier output had shown that close to 50% of DB time was spent on table scans, I drilled down to the TABLE ACCESS STORAGE FULL data retrieval operations only, excluding all the other data processing steps like joins, loads and aggregations:
SQL> @ash/rowsource_events TABLE% STORAGE%FULL

PLAN_LINE                                IS_PARAL SESSION WAIT_CLASS      EVENT                                      COUNT(*)        PCT
---------------------------------------- -------- ------- --------------- ---------------------------------------- ---------- ----------
TABLE ACCESS STORAGE FULL                PARALLEL WAITING User I/O        cell multiblock physical read                139756         47
TABLE ACCESS STORAGE FULL                PARALLEL ON CPU                                                                64899       21.8
TABLE ACCESS STORAGE FULL                SERIAL   WAITING User I/O        cell multiblock physical read                 24133        8.1
TABLE ACCESS STORAGE FULL                PARALLEL WAITING User I/O        cell single block physical read               16430        5.5
TABLE ACCESS STORAGE FULL                PARALLEL WAITING User I/O        read by other session                         12141        4.1
TABLE ACCESS STORAGE FULL                PARALLEL WAITING Cluster         gc buffer busy acquire                        10771        3.6
TABLE ACCESS STORAGE FULL                PARALLEL WAITING User I/O        cell smart table scan                          7573        2.5
TABLE ACCESS STORAGE FULL                SERIAL   WAITING Cluster         gc cr multi block request                      7158        2.4
TABLE ACCESS STORAGE FULL                SERIAL   ON CPU                                                                  6872        2.3
TABLE ACCESS STORAGE FULL                PARALLEL WAITING Cluster         gc cr multi block request                      2610         .9
TABLE ACCESS STORAGE FULL                PARALLEL WAITING User I/O        cell list of blocks physical read              1763         .6
TABLE ACCESS STORAGE FULL                SERIAL   WAITING User I/O        cell single block physical read                1744         .6
TABLE ACCESS STORAGE FULL                SERIAL   WAITING User I/O        cell list of blocks physical read               667         .2
TABLE ACCESS STORAGE FULL                SERIAL   WAITING User I/O        cell smart table scan                           143          0
TABLE ACCESS STORAGE FULL                SERIAL   WAITING Cluster         gc cr disk read                                 122          0
TABLE ACCESS STORAGE FULL                PARALLEL WAITING Cluster         gc current grant busy                            97          0
TABLE ACCESS STORAGE FULL                PARALLEL WAITING Cluster         gc current block 3-way                           85          0
TABLE ACCESS STORAGE FULL                PARALLEL WAITING Cluster         gc cr grant 2-way                                68          0
TABLE ACCESS STORAGE FULL                SERIAL   WAITING User I/O        direct path read                                 66          0
TABLE ACCESS STORAGE FULL                SERIAL   WAITING Cluster         gc current grant 2-way                           52          0
Indeed, over 60% of all my full table scan samples were waiting on buffered read related wait events! (Both buffered IO waits and RAC global cache waits, which you wouldn't get with direct path reads, well, unless some read consistency cloning & rollback operations are needed.) So, Smart Scans were definitely not kicking in for some of those full table scans (despite the support analyst's outdated claim that PX slaves "never" use buffered IOs).
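For reference, the breakdown above can be approximated with a plain ASH query like the one below. This is just a sketch, not the actual rowsource_events.sql – the IS_PARALLEL classification here simply checks whether QC_SESSION_ID is populated in the ASH sample, which is one way to separate PX slave/QC samples from serial ones:

-- Sketch: full table scan samples broken down by serial/PX and wait event, last hour
SELECT
    sql_plan_operation || ' ' || sql_plan_options            AS plan_line
  , CASE WHEN qc_session_id IS NOT NULL
         THEN 'PARALLEL' ELSE 'SERIAL' END                   AS is_parallel
  , session_state
  , wait_class
  , event
  , COUNT(*)                                                 AS samples
  , ROUND(RATIO_TO_REPORT(COUNT(*)) OVER () * 100, 1)        AS pct
FROM
    gv$active_session_history
WHERE
    session_type = 'FOREGROUND'
AND sql_plan_operation LIKE 'TABLE%'
AND sql_plan_options   LIKE 'STORAGE%FULL'
AND sample_time > SYSDATE - 1/24
GROUP BY
    sql_plan_operation || ' ' || sql_plan_options
  , CASE WHEN qc_session_id IS NOT NULL THEN 'PARALLEL' ELSE 'SERIAL' END
  , session_state
  , wait_class
  , event
ORDER BY
    COUNT(*) DESC;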
Also, when you are not offloading the IO, low-level data processing, filtering etc. to the storage cells, you will need to use your DB node CPUs for all that work (while the cell CPUs sit almost idle if nothing gets offloaded). So, when enabling smart scans, both the DB IO wait time and the DB CPU usage per unit of work done should go down.
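If you want a quick sanity check of whether smart scans are happening at all at the system level, comparing the offload related statistics is one option. A sketch below, using the 11.2-era Exadata statistic names:

-- Sketch: how much of the physical IO has been eligible for / returned by smart scans
SELECT name, ROUND(value/1024/1024/1024, 1) AS gb
FROM   v$sysstat
WHERE  name IN (
           'physical read total bytes'
         , 'cell physical IO bytes eligible for predicate offload'
         , 'cell physical IO interconnect bytes returned by smart scan'
       );

Roughly speaking, if the "eligible for predicate offload" figure barely moves while your full table scans are running, those scans are going through the buffer cache instead of being offloaded.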
Anyway – after looking deeper into the issue I noticed that they had parallel_degree_policy set to AUTO, so in-memory PX had also kicked in as a side effect. Once we set it to LIMITED, the problem went away: instead of lots of buffered IO wait events we saw far fewer waits, mostly on cell smart table scan events, and higher CPU utilization (yes, higher), as a big bottleneck had been removed and we got much more work done every second.
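The change itself is a one-liner (shown here as a sketch – on RAC with an spfile you may want to add the appropriate SCOPE and SID clauses):

SQL> ALTER SYSTEM SET parallel_degree_policy = LIMITED;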
This was on Oracle 11.2.0.2 and I didn't have a chance to look any deeper, but I suspected that the problem came from the fact that this database had many different workloads in it, both smart scanning ones and regular, index-based accesses – so possibly the "in-mem PX automatic memory distributor", or whatever controls what gets cached and what not, wasn't accounting for all the non-PX-scan activity. Apparently this should be improved in Oracle 11.2.0.3, but I haven't tested it myself – I feel safer manually controlling the CACHE/KEEP settings.
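For completeness, that kind of manual control happens at the segment level. A sketch with hypothetical table names (the KEEP pool also needs DB_KEEP_CACHE_SIZE to be sized for it):

-- Hypothetical table names, for illustration only
-- Pin a small, frequently hit segment into the KEEP pool
ALTER TABLE dim_customer STORAGE (BUFFER_POOL KEEP);
-- Leave a large scan target in the default pool and don't favour caching its full scan blocks
ALTER TABLE fact_sales STORAGE (BUFFER_POOL DEFAULT);
ALTER TABLE fact_sales NOCACHE;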