I promised to write about Oracle 12c new features quite a while ago (when 12c got officially released), but I was actually on (a long) vacation then and so many cool 12c-related white-papers and blog entries started popping up so I took it easy for a while. I plan to be focusing on the less known low-level internal details anyway as you see from this blog entry.
As far as I can remember, Oracle has been able to unnest regular subqueries since 8i and merge views since Oracle 8.0.
First, a little terminology:
Update: I have changed the terminology section below a bit, thanks to Jason Bucata’s correction. A scalar subquery sure can also be used in the WHERE clause (as you can see in the comments). So, I clarified below that this blog post is comparing the “Scalar Subqueries in SELECT projection list” to “Regular non-scalar Subqueries in WHERE clause”. I also wrote a Part 2 to explain the scalar subqueries in WHERE clause.
- A “regular” subquery in Oracle terminology is a query block which doesn’t return rows back to the select projection list, but is only used in the WHERE clause for determining whether to return any rows from the parent query block, based on your specified conditions, like
WHERE v IN (SELECT x FROM q)
orWHERE EXISTS (SELECT x FROM inner WHERE inner.v=outer.v)
etc.
The subquery can return zero to many rows for the parent query block for evaluation.
- A scalar subquery is a subquery which can only return a single value (single row, single column) back to the parent block. It can be used both in the WHERE clause of the parent query or right in the SELECT list of the parent query instead of some column name. (In this post I am discussing only the case of a scalar subquery in SELECT clause). Whatever the scalar subquery returns, will be put in the “fake” column in the query result-set. If it returns more than one row, you will get the
ORA-01427: single-row subquery returns more than one row
error, if it returns no rows for the given lookup, the result will be NULL. An example is below.
I crafted a very simple SQL with a scalar subquery for demo purposes (the tables are just copies of all_users and all_objects views):
SELECT u.username , (SELECT MAX(created) FROM test_objects o WHERE o.owner = u.username) FROM test_users u WHERE username LIKE 'S%'
Now, why would you want to write the query this way is a different story. I think it’s actually pretty rare when you need to use a scalar subquery, you usually can get away with an outer join. I have used scalar subqueries for populating some return values in cases where adding a yet another (outer) join to the query would complicate the query too much for my brain (as there are some limitations how you can arrange outer joins). I have only done this when I know that the query result-set (on which the scalar subquery is executed once for every row returned, unless the subquery caching kicks in!) will only return a handful of rows and the extra effort of running the scalar subquery once for each row is acceptable.
Nevertheless, non-experienced SQL developers (who come from the procedural coding world) write lots of scalar subqueries, even up to the point of having every single column populated by a scalar subquery! And this can be a serious problem as this breaks the query into separate non-mergeable chunks, which means that the CBO isn’t as free to move things around – resulting in suboptimal plans.
So, starting from Oracle 12c (and maybe even 11.2.0.4?), the CBO transformation engine can unnest some types of the scalar subqueries and convert these to outer joins internally.
The following example is executed on Oracle 12.1.0.1.0 with explicitly disabling the scalar subquery unnesting (using the NO_UNNEST hint):
SELECT /*+ GATHER_PLAN_STATISTICS NO_UNNEST(@ssq) */ u.username , (SELECT /*+ QB_NAME(ssq) */ MAX(created) FROM test_objects o WHERE o.owner = u.username) FROM test_users u WHERE username LIKE 'S%' / -------------------------------------------------------------------------------------------- | Id | Operation | Name | Starts | A-Rows | A-Time | Buffers | -------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 5 |00:00:00.01 | 4 | | 1 | SORT AGGREGATE | | 5 | 5 |00:00:00.01 | 1365 | |* 2 | TABLE ACCESS STORAGE FULL| TEST_OBJECTS | 5 | 12538 |00:00:00.01 | 1365 | |* 3 | TABLE ACCESS STORAGE FULL | TEST_USERS | 1 | 5 |00:00:00.01 | 4 | --------------------------------------------------------------------------------------------
This somewhat funky-looking plan above (where you have multiple children directly under the SELECT STATEMENT
rowsource operator) indicates that there’s some scalar subquerying going on in the SQL statement. From the Starts column we see that the outer TEST_USERS table has been scanned only once, but the TEST_OBJECTS table referenced inside the scalar subquery has been scanned 5 times, once for each row returned from the outer query (A-rows). So the scalar subquerying behaves somewhat like a FILTER loop, where it executes the subquery once for each row passed to it by the parent query.
So, let’s remove the NO_UNNEST subquery hint and hope for the 12c improvement to kick in:
SELECT /*+ GATHER_PLAN_STATISTICS */ u.username , (SELECT MAX(created) FROM test_objects o WHERE o.owner = u.username) FROM test_users u WHERE username LIKE 'S%' /
The resulting plan shows that indeed the transformation got done – the scalar subquery was unnested from being a loop under SELECT STATEMENT and the whole query got converted into an equivalent outer join followed by a GROUP BY:
------------------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Starts | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem | ------------------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 5 |00:00:00.01 | 276 | | | | | 1 | HASH GROUP BY | | 1 | 5 |00:00:00.01 | 276 | 1079K| 1079K| 874K (0)| |* 2 | HASH JOIN OUTER | | 1 | 12541 |00:00:00.03 | 276 | 1421K| 1421K| 1145K (0)| |* 3 | TABLE ACCESS STORAGE FULL| TEST_USERS | 1 | 5 |00:00:00.01 | 3 | 1025K| 1025K| | |* 4 | TABLE ACCESS STORAGE FULL| TEST_OBJECTS | 1 | 12538 |00:00:00.01 | 273 | 1025K| 1025K| | ------------------------------------------------------------------------------------------------------------------------
Both tables have been scanned through only once and the total number of buffer gets has dropped from 1365 to 276 (but the new group by & hash join steps do likely use some extra CPU plus workarea memory). Nevertheless, in a real-life case, flattening and de-compartmentalizing a big query will open up more join order and data access optimization opportunities for the CBO, resulting in better plans.
Whenever the scalar subquery transformation is done, the following entry shows up in the optimizer trace:
***************************** Cost-Based Subquery Unnesting ***************************** SU: Unnesting query blocks in query block SEL$1 (#1) that are valid to unnest. Subquery Unnesting on query block SEL$1 (#1) SU: Performing unnesting that does not require costing. SU: Considering subquery unnest on query block SEL$1 (#1). SU: Unnesting scalar subquery query block SEL$2 (#2) Registered qb: SEL$683B0107 0x637de5a0 (SUBQ INTO VIEW FOR COMPLEX UNNEST SEL$2)
And possibly later in the tracefile, this:
SU: Unnesting scalar subquery query block SEL$2 (#2)
When searching for the final query form in the CBO trace, we see it’s just an outer join with a group by. The group by clause has U.ROWID in it too, possibly because I didn’t have a primary key on the users table (you’re welcome to rerun a modified test case and report the results)
Final query after transformations:******* UNPARSED QUERY IS ******* SELECT "U"."USERNAME" "USERNAME", MAX("O"."CREATED") "(SELECTMAX(CREATED)FROMTEST_OB" FROM "TANEL"."TEST_OBJECTS" "O", "TANEL"."TEST_USERS" "U" WHERE "U"."USERNAME" LIKE 'S%' AND "O"."OWNER"(+)="U"."USERNAME" AND "O"."OWNER"(+) LIKE 'S%' GROUP BY "O"."OWNER", "U".ROWID, "U"."USERNAME"
Note that my query uses MAX(created) for fetching something from the scalar subquery. When I use a COUNT(*) instead of MAX(created), the scalar subquery unnesting does not kick in, with a following message in the CBO tracefile:
SU: bypassed: Scalar subquery has null-mutating select item.
It looks like the transformation would not be (guaranteed to be) valid in this case, so this query with COUNT(*) didn’t get converted to the outer join + group by combo.
Also, when your scalar subquery can possibly return more than one row for a lookup value (which would return an error during execution time anyway), the transformation does not kick in either, with the following message in CBO tracefile:
SU: bypassed: Scalar subquery may return more than one row.
This transformation is controlled by the _optimizer_unnest_scalar_sq
parameter.
A small test case is here https://github.com/tanelpoder/tpt-oracle/blob/master/12c/scalar_subquery_unnesting.sql.
And that’s it, expect more cool stuff soon! ;-)