Oracle Memory Troubleshooting, Part 4: Drilling down into PGA memory usage with V$PROCESS_MEMORY_DETAIL

Tanel Poder

2014-03-26

If you haven’t read them – here are the previous articles in Oracle memory troubleshooting series: Part 1Part 2, Part 3.

Let’s say you have noticed that one of your Oracle processes is consuming a lot of private memory. The V$PROCESS has PGA_USED_MEM / PGA_ALLOC_MEM columns for this. Note that this view will tell you what Oracle thinks it’s using – how much of allocated/freed bytes it has kept track of. While this doesn’t usually tell you the true memory usage of a process, as other non-Oracle-heap allocation routines and the OS libraries may allocate (and leak) memory of their own, it’s a good starting point and usually enough.

Then, the V$PROCESS_MEMORY view would allow you to see a basic breakdown of that process’es memory usage – is it for SQL, PL/SQL, Java, unused (Freeable) or for “Other” reasons. You can use either the smem.sql or pmem.sql scripts for this (report v$process_memory for a SID or OS PID):

SQL> @smem 198
Display session 198 memory usage from v$process_memory....

       SID        PID    SERIAL# CATEGORY         ALLOCATED       USED MAX_ALLOCATED
---------- ---------- ---------- --------------- ---------- ---------- -------------
       198         43         17 Freeable           1572864          0
       198         43         17 Other              5481102                  5481102
       198         43         17 PL/SQL                2024        136          2024
       198         43         17 SQL              117805736  117717824     118834536

From the above output we see that this session has allocated over 100MB of private memory for “SQL” reasons. This normally means SQL workareas, so we can break this down further by querying V$SQL_WORKAREA_ACTIVE that shows us all currently in-use cursor workareas in the instance. I’m using a script wrka.sql for convenience – and listing only my SID-s workareas:

SQL> @wrka sid=198
Show Active workarea memory usage for where sid=198...

   INST_ID        SID  QCINST_ID      QCSID SQL_ID        OPERATION_TYPE                  PLAN_LINE POLICY                   ACTIVE_SEC ACTUAL_MEM_USED MAX_MEM_USED WORK_AREA_SIZE NUMBER_PASSES TEMPSEG_SIZE TABLESPACE
---------- ---------- ---------- ---------- ------------- ------------------------------ ---------- ------------------------ ---------- --------------- ------------ -------------- ------------- ------------ ------------------------------
         1        198                       ff8v9qhv21pm5 SORT (v2)                               1 AUTO                           14.6        64741376    104879104       97623040             0   2253389824 TEMP
         1        198                       ff8v9qhv21pm5 HASH-JOIN                               6 AUTO                           14.8         1370112      1370112        2387968             0
         1        198                       ff8v9qhv21pm5 BUFFER                                 25 AUTO                           14.8        11272192     11272192       11272192             0

The ACTUAL_MEM_USED column above shows the currently used memory by this workarea (that happens to be a SORT (v2) operation in that cursor’s execution plan line #1). It was only about 64MB at the time I got to query this view, but the MAX_MEM_USED shows it was about 100MB at its peak. This can happen due to multipass operations where the merge phase may use less memory than the sort phase or once the sorting completed and the rowsource was ready to start sending sorted rows back, not that much memory would have been needed anymore for just buffering the blocks read from TEMP (the sort_area_size vs sort_area_retained_size thing from past).

For completeness, I also have a script called wrkasum.sql that summarizes the workarea memory usage of all sessions in an instance (so if you’re not interested in a single session, but rather a summary of which operation types tend to consume most memory etc) you can use that:

SQL> @wrkasum
Top allocation reason by PGA memory usage

OPERATION_TYPE      POLICY      ACTUAL_PGA_MB ALLOWED_PGA_MB    TEMP_MB NUM_PASSES     NUM_QC NUM_SESSIONS 
------------------- ----------- ------------- -------------- ---------- ---------- ---------- ------------ 
SORT (v2)           AUTO                   58            100       1525          0          1            1            
BUFFER              AUTO                   11             11                     0          1            1            
HASH-JOIN           AUTO                    1              2                     0          1            1

You may want to modify the script to change the GROUP BY to SQL_ID you want to list the top workarea-memory consuming SQL statement across the whole instance (or any other column of interest – like QC_INST_ID/QCSID).

But what about the following example:

SQL> @pmem 27199
Display process memory usage for SPID 27199...

       SID SPID                            PID    SERIAL# CATEGORY         ALLOCATED       USED MAX_ALLOCATED     CON_ID
---------- ------------------------ ---------- ---------- --------------- ---------- ---------- ------------- ----------
      1516 27199                           120        198 Freeable            786432          0                        0
      1516 27199                           120        198 Other            842807461                842807461          0
      1516 27199                           120        198 PL/SQL              421064      77296        572344          0
      1516 27199                           120        198 SQL                2203848      50168       2348040          0

Most of the memory (over 800MB) is consumed by category “Other”?! Not that helpful, huh? V$SQL_WORKAREA_ACTIVE didn’t show anything either as it deals only with SQL workareas and not all the other possible reasons why an Oracle process might allocate memory.

So we need a way to drill down into the Other category and see which allocation reasons have taken the most of this memory. Historically this was only doable with a PGA/UGA memory heapdump and by aggregating the resulting dumpfile. You have to use oradebug to get the target process to dump its own private memory breakdown as it’s private memory and other processes can not just read it directly. I have written about it in Part 1 of the Oracle memory troubleshooting series.

_Update: an alternative to ORADEBUG is to use ALTER SESSION SET EVENTS ‘immediate trace name pga_detail_get level N’ where N is the Oracle PID of the process. _

However starting from Oracle 10.2 you can get similar detailed breakdown info by querying V$PROCESS_MEMORY_DETAIL, no need for post-processing tracefiles! However when you just query it, the view does not return any rows:

SQL> SELECT * FROM v$process_memory_detail;

no rows selected

Again this is for the abovementioned reasons – your current process can not just read the contents of some other process’es private memory – the OS ensures that. You will have to ask that target process to populate the V$PROCESS_MEMORY_DETAIL with its memory allocation breakdown. You can do this by using the ORADEBUG DUMP PGA_DETAIL_GET command:

SQL> ORADEBUG SETMYPID
Statement processed.
SQL> ORADEBUG DUMP PGA_DETAIL_GET 49
Statement processed.

The number 49 above is the Oracle PID (v$process.pid) of the target process I want to examine. The oradebug PGA_DETAIL_GET command will not immediately make the target process to report its usage – it will merely set a flag somewhere and the target process itself checks it when it is active. In other words, if the target process is idle or sleeping for a long time (due to some lock for example), then it won’t populate the V$ view with required data. In my test environment, the V$PROCESS_MEMORY_DETAIL got populated only after I ran another dummy command in the target session. This shouldn’t be an issue if you are examining a process that’s actively doing something (and not idle/sleeping for a long time).

The output below is from another dummy demo session that wasn’t using much of memory:

SQL> SELECT * FROM v$process_memory_detail ORDER BY pid, bytes DESC;

       PID    SERIAL# CATEGORY        NAME                       HEAP_NAME            BYTES ALLOCATION_COUNT HEAP_DES PARENT_H
---------- ---------- --------------- -------------------------- --------------- ---------- ---------------- -------- --------
        49          5 Other           permanent memory           pga heap            162004               19 11B602C0 00
        49          5 SQL             QERHJ Bit vector           QERHJ hash-joi      131168                8 F691EF4C F68F6F7C
        49          5 Other           kxsFrame4kPage             session heap         57736               14 F68E7134 11B64780
        49          5 SQL             free memory                QERHJ hash-joi       54272                5 F691EF4C F68F6F7C
        49          5 Other           free memory                pga heap             41924                8 11B602C0 00
        49          5 Other           miscellaneous                                   39980              123 00       00
        49          5 Other           Fixed Uga                  Fixed UGA heap       36584                1 F6AA44B0 11B602C0
        49          5 Other           permanent memory           top call heap        32804                2 11B64660 00
        49          5 Other           permanent memory           session heap         32224                2 F68E7134 11B64780
        49          5 Other           free memory                top call heap        31692                1 11B64660 00
        49          5 Other           kgh stack                  pga heap             17012                1 11B602C0 00
        49          5 Other           kxsFrame16kPage            session heap         16412                1 F68E7134 11B64780
        49          5 Other           dbgeInitProcessCtx:InvCtx  diag pga             15096                2 F75A8630 11B602C0
...

The BYTES column shows the sum of memory allocated from private memory heap HEAP_NAME for the reason shown in NAME column. If you want to know the average allocation (chunk) size in the heap, divide BYTES by ALLOCATION_COUNT.

For example, the top PGA memory user in that process is an allocation called “permanent memory”, 162004 bytes taken straight from the top-level “pga-heap”. It probably contains all kinds of low-level runtime allocations that the process needs for its own purposes. It may be possible to drill down into the subheaps inside that allocation with the Oracle memory top-5 subheap dumping I have written about before.

The 2nd biggest memory user is in category SQL – “QERHJ Bit vector” allocation, 131168 bytes allocated in 8 chunks of ~16kB each (on average). QERHJ should mean Query Execution Row-source Hash-Join and the hash join bit vector is a hash join optimization (somewhat like a bloom filter on hash buckets) – Jonathan Lewis has written about this in his CBO book.

I do have a couple of scripts which automate running the ORAEDBUG command, waiting for a second so that the target process would have a chance to publish its data in the V$PROCESS_MEMORY_DETAIL and then query it. Check out smem_detail.sql and pmem_detail.sql.

Now, let’s look into a real example from a problem case – a stress test environment on Oracle 12c:

SQL> @smem 1516
Display session 1516 memory usage from v$process_memory....

       SID        PID    SERIAL# CATEGORY         ALLOCATED       USED MAX_ALLOCATED     CON_ID
---------- ---------- ---------- --------------- ---------- ---------- ------------- ----------
      1516        120        198 Freeable            786432          0                        0
      1516        120        198 Other            844733773                844733773          0
      1516        120        198 PL/SQL              421064      77296        572344          0
      1516        120        198 SQL                 277536      45904       2348040          0

The Other memory usage of a session has grown to over 800MB!

Let’s drill down deeper. The script warns that it’s experimental and asks you to press enter to continue as it’s using ORADEBUG. I haven’t seen any problems with it, but use it at your own risk (and stay away from critical background processes on production systems)!

SQL> @smem_detail 1516

WARNING! About to run an undocumented ORADEBUG command
for getting heap details.
This script is EXPERIMENTAL, use at your own risk!

Press ENTER to continue, or CTRL+C to cancel

PL/SQL procedure successfully completed.

STATUS
----------
COMPLETE

If the status above is not COMPLETE then you need to wait
for the target process to do some work and re-run the
v$process_memory_detail query in this script manually
(or just take a heapdump level 29 to get heap breakdown
in a tracefile)

       SID CATEGORY        NAME                       HEAP_NAME            BYTES ALLOCATION_COUNT
---------- --------------- -------------------------- --------------- ---------- ----------------
      1516 Other           permanent memory           qmxlu subheap    779697376           203700
      1516 Other           free memory                qmxlu subheap     25960784           202133
      1516 Other           XVM Storage                XVM subheap of     5708032               51
      1516 Other           free memory                session heap       2722944              598
      1516 Other           permanent memory           pga heap            681992               36
      1516 Other           qmushtCreate               qmtmInit            590256                9
      1516 Other           free memory                top uga heap        449024              208
      1516 Other           qmtmltAlloc                qmtmInit            389680             1777
      1516 Other           permanent memory           kolarsCreateCt      316960               15
      1516 Other           free memory                pga heap            306416               17
      1516 Other           miscellaneous                                  297120              105
      1516 Other           permanent memory           qmxtgCreateBuf      279536               73
      1516 Other           free memory                koh dur heap d      239312              134
      1516 Other           kxsFrame4kPage             session heap        232512               56
      1516 Other           permanent memory           qmcxdDecodeIni      228672               21
      1516 Other           permanent memory           qmxtigcp:heap       215936              730
      1516 Other           permanent memory           session heap        189472               28
      1516 Other           free memory                lpxHeap subhea      182760               32
      1516 Other           kfioRqTracer               pga heap            131104                1
      1516 Other           free memory                top call heap       129312                4
      1516 PL/SQL          recursive addr reg file    koh-kghu sessi      110592               10
      1516 Other           free memory                callheap            109856                4
      1516 Other           koh-kghu session heap      session heap         88272               36
      1516 Other           Fixed Uga                  pga heap             72144                1
      1516 PL/SQL          PL/SQL STACK               PLS PGA hp           68256                4
...

Well, there you go – the power of measuring & profiling. Most of that big memory usage comes from something called qmxlu subheap. Now, while this name is cryptic and we don’t know what it means – we are already half-way there, we at least know what to focus on now. We can ignore all the other hundreds of cryptic memory allocations in the output and just try to figure out what “qmxlu subheap” is. A quick MOS search might just tell it and if there are known bugs related to this memory leak, you might just find what’s affecting you right away (as Oracle support analysts may have pasted some symptoms, patch info and workarounds into the bug note):

memory_mos_notes

Indeed, there are plenty of results in MOS and when browsing through them to find one matching our symptoms and environment the closest, I looked into this: ORA-4030 With High Allocation Of “qmxdpls_subheap” (Doc ID 1509914.1). It came up in the search as the support analyst had pasted a recursive subheap dump containing our symptom – “qmxlu subheap” there:

Summary of subheaps at depth 2
5277 MB total:
 5277 MB commented, 128 KB permanent
 174 KB free (110 KB in empty extents),
   2803 MB, 1542119496 heaps:   "               "
   1302 MB, 420677 heaps:   "qmxlu subheap  "
    408 MB, 10096248 chunks:  "qmxdplsArrayGetNI1        " 2 KB free held
    385 MB, 10096248 chunks:  "qmxdplsArrayNI0           " 2 KB free held

In this note, the reference bug had been closed as “not a bug” and hinted that it may be an application issue (an application “object” leak) instead of an internal memory leak that causes this memory usage growth.

Cause:

The cause of this problem has been identified in:

unpublished Bug:8918821 – MEMORY LEAK IN DBMS_XMLPARSER IN QMXDPLS_SUBHEAP

closed as “not a bug”. The problem is caused by the fact that the XML document is created with XMLDOM.CREATEELEMENT, but after creation XMLDOM.FREEDOCUMENT is not called. This causes the XML used heaps to remain allocated. Every new call to XMLDOM.CREATEELEMENT will then allocate a new heap, causing process memory to grow over time, and hence cause the ORA-4030 error to occur in the end.

Solution:

To implement a solution for this issue, use XMLDOM.FREEDOCUMENT to explicitly free any explicitly or implictly created XML document, so the memory associated with that document can be released for reuse.

And indeed, in our case it turned out that it was an application issue – the application did not free the XMLDOM documents after use, slowly accumulating more and more open document memory structures, using more memory and also more CPU time (as, judging by the ALLOCATION_COUNT figure in smem_detail output above, the internal array used for managing the open document structures had grown to 203700). Once the application object leak issue was fixed, the performance and memory usage problem went away.

Summary:

****V$PROCESS_MEMORY_DETAIL allows you to conveniently dig deeper into process PGA memory usage. The alternative is to use Oracle heapdumps. A few more useful comments about it are in an old Oracle-L post.

Normally my process memory troubleshooting & drilldown sequence goes like that (usually only steps 1-2 are enough, 3-4 are rarely needed):

  1. v$process / v$process_memory / top / ps
  2. v$sql_workarea_active
  3. v$process_memory_detail or heapdump_analyzer
  4. pmap -x at OS level

#1,2,3 above can show you “session” level memory usage (assuming that you are using dedicated servers with 1-1 relationship between a session and a process) and #4 can show you a different view into the real process memory usage from the OS perspective.

Even though you may see cryptic allocation reason names in the output, if reason X causes 95% of your problem, you’ll need to focus on finding out what X means and don’t need to waste time on anything else. If there’s an Oracle bug involved, a MOS search by top memory consumer names would likely point you to the relevant bug right away.

Oracle troubleshooting is fun!


  1. Updated video course material to be announced soon:
    Advanced Oracle SQL Tuning training. Advanced Oracle Troubleshooting training, Linux Performance & Troubleshooting training.
    Check the current versions out here!
  2. Get randomly timed updates by email or follow Social/RSS