Oracle Troubleshooting with Snapper – detecting who’s causing excessive redo generation

Update: As this post was written many years ago, you should check out newer Snapper articles/videos:

/2013/02/18/snapper-v4-02-and-the-snapper-launch-party-video/

My friend asked today a question that how to identify why his Oracle 9.2 database has suddenly started generating loads more redo than usual.

So obviously I recommended him Snapper as first thing, it’s perfect for ad-hoc analysis like that! ( I know I sound biased but if you haven’t used Snapper yet, then now is the time! :)

So, I asked him to run Snapper on all sessions of the instance with 10 second interval and find the session with highest “redo size” delta figure.

However as a normal production database has loads of sessions in it and as each session can return tens of statistic rows, then you might need to do some post-processing of output data to get only the “redo size” numbers.

So I figured that why not do the filtering in Snapper itself, so I made the small additions and released Snapper v1.07.

Usage is simple:

SQL> @snapper out,gather=s,sinclude=%redo_size% 10 1 "select sid from v$session"

-- Session Snapper v1.07 by Tanel Poder (  )

---------------------------------------------------------------------------------------------------------------------------------------------
HEAD,     SID, SNAPSHOT START   ,  SECONDS, TYPE, STATISTIC                               ,         DELTA,  DELTA/SEC,     HDELTA, HDELTA/SEC
---------------------------------------------------------------------------------------------------------------------------------------------
DATA,     129, 20080530 09:07:20,       10, STAT, redo size                               ,     180595100,   18059510,     180.6M,     18.06M
DATA,     166, 20080530 09:07:20,       10, STAT, redo size                               ,        790912,      79091,    790.91k,     79.09k
DATA,     167, 20080530 09:07:20,       10, STAT, redo size                               ,        889796,      88980,     889.8k,     88.98k
--  End of snap 1

PL/SQL procedure successfully completed.

And in the result you already see that there is a single session which generates significantly more redo than others (btw this is an artificially generated example). In my friend’s case there also happened to be a single session , scheduled through DBMS_JOB, contributing to most of the redo generation. Once the troublemaker was identified, the problem could be fixed.

There are few things to note:

The filtering is applied on V$SESSTAT stats only, as it doesn’t make much sense filtering out timed events based just on their name (and not the actual amount of time they contributed to your response time).
As the troubleshooting goal was to identify the top redo generator, I wasn’t interested in Waits and DB Time, so I used gather=s option, which gathers only V$SESSTAT stuff.
There is a new “sinclude” keyword, which specifies the filter condition for V$SESSTAT stats and “linclude” for filtering V$LATCH stats (if latch stat calculation is enabled using gather=l). If you look into Snapper source code, you see that these filter conditions are just used in a LIKE predicate and if on 10g+ then there’s a REGEXP_LIKE used additionally, therefore you can specify more complex filtering expressions.
Note how instead of a single SID I pass “select sid from v$session” to measure all sessions in the instance. You can write any SQL for specifying SIDs to analyze, as long as that SQL returns a list of numbers, fits on one line and is put between doublequotes.

Another example, utilizing simple form of 10g+ regular expressions, where you can specify multiple different patterns for filtering:

SQL> @snapper "out,gather=s,sinclude=redo size|commit|rollback" 10 1 "select sid from v$session"

-- Session Snapper v1.07 by Tanel Poder (  )

---------------------------------------------------------------------------------------------------------------------------------------------
HEAD,     SID, SNAPSHOT START   ,  SECONDS, TYPE, STATISTIC                               ,         DELTA,  DELTA/SEC,     HDELTA, HDELTA/SEC
---------------------------------------------------------------------------------------------------------------------------------------------
DATA,     123, 20080530 21:43:07,       10, STAT, user rollbacks                          ,             1,          0,          1,         .1
DATA,     123, 20080530 21:43:07,       10, STAT, commit cleanouts                        ,            14,          1,         14,        1.4
DATA,     123, 20080530 21:43:07,       10, STAT, commit cleanouts successfully completed ,            14,          1,         14,        1.4
DATA,     123, 20080530 21:43:07,       10, STAT, redo size                               ,      33351344,    3335134,     33.35M,      3.34M
DATA,     123, 20080530 21:43:07,       10, STAT, rollback changes - undo records applied ,          9786,        979,      9.79k,      978.6
DATA,     123, 20080530 21:43:07,       10, STAT, transaction rollbacks                   ,             1,          0,          1,         .1
DATA,     123, 20080530 21:43:07,       10, STAT, commit batch/immediate requested        ,             1,          0,          1,         .1
DATA,     123, 20080530 21:43:07,       10, STAT, commit immediate requested              ,             1,          0,          1,         .1
DATA,     123, 20080530 21:43:07,       10, STAT, commit batch/immediate performed        ,             1,          0,          1,         .1
DATA,     123, 20080530 21:43:07,       10, STAT, commit immediate performed              ,             1,          0,          1,         .1
DATA,     161, 20080530 21:43:07,       10, STAT, redo size                               ,            68,          7,         68,        6.8
--  End of snap 1

PL/SQL procedure successfully completed.

Yet another example where I display few logical IO stats and cache buffers chains latches stats together:

SQL> @snapper out,gather=sl,sinclude=gets,linclude=cache 5 1 123

-- Session Snapper v1.07 by Tanel Poder (  )

---------------------------------------------------------------------------------------------------------------------------------------------
HEAD,     SID, SNAPSHOT START   ,  SECONDS, TYPE, STATISTIC                               ,         DELTA,  DELTA/SEC,     HDELTA, HDELTA/SEC
---------------------------------------------------------------------------------------------------------------------------------------------
DATA,     123, 20080530 21:37:33,        5,  LAT, cache buffers lru chain                 ,             1,          0,          1,         .2
DATA,     123, 20080530 21:37:33,        5,  LAT, cache buffers chains                    ,        488174,      97635,    488.17k,     97.63k
DATA,     123, 20080530 21:37:33,        5,  LAT, logical standby cache                   ,             1,          0,          1,         .2
DATA,     123, 20080530 21:37:33,        5,  LAT, change notification client cache latch  ,             1,          0,          1,         .2
DATA,     123, 20080530 21:37:33,        5,  LAT, row cache objects                       ,            19,          4,         19,        3.8
DATA,     123, 20080530 21:37:33,        5, STAT, consistent gets                         ,        487586,      97517,    487.59k,     97.52k
DATA,     123, 20080530 21:37:33,        5, STAT, consistent gets from cache              ,        487586,      97517,    487.59k,     97.52k
DATA,     123, 20080530 21:37:33,        5, STAT, consistent gets from cache (fastpath)   ,        487588,      97518,    487.59k,     97.52k
DATA,     123, 20080530 21:37:33,        5, STAT, no work - consistent read gets          ,        484876,      96975,    484.88k,     96.98k
--  End of snap 1

PL/SQL procedure successfully completed.

So this filtering can be used for easily identifying stuff like highest redo or LIO generator, the heaviest parser… or when used with “%sql*net%” filter, you could identify who’s using the network bandwidth the most:

SQL> @snapper "out,gather=s,sinclude=%sql*net%" 5 1 123

-- Session Snapper v1.07 by Tanel Poder (  )

---------------------------------------------------------------------------------------------------------------------------------------------
HEAD,     SID, SNAPSHOT START   ,  SECONDS, TYPE, STATISTIC                               ,         DELTA,  DELTA/SEC,     HDELTA, HDELTA/SEC
---------------------------------------------------------------------------------------------------------------------------------------------
DATA,     123, 20080530 22:07:47,        5, STAT, bytes sent via SQL*Net to client        ,       1509441,     301888,      1.51M,    301.89k
DATA,     123, 20080530 22:07:47,        5, STAT, bytes received via SQL*Net from client  ,           374,         75,        374,       74.8
DATA,     123, 20080530 22:07:47,        5, STAT, SQL*Net roundtrips to/from client       ,            34,          7,         34,        6.8
--  End of snap 1

PL/SQL procedure successfully completed.

I hope this helps – and happy Snapping! :)

Oracle Troubleshooting with Snapper – detecting who’s causing excessive redo generation

Tanel Poder

2008-05-30