KGL simulator, shared pool simulator and buffer cache simulator – what are these?

If you have queried v$sgastat on recent Oracle versions (by which I mean 9i and above) you probably have seen allocations for some sort of simulators in Oracle instance. Here’s an example:

SQL> select * from v$sgastat where lower(name) like '%sim%' order by name;

POOL         NAME                            BYTES
------------ -------------------------- ----------
shared pool  kglsim alloc latch area          1700
shared pool  kglsim alloc latches               68
shared pool  kglsim count of pinned he        9248
shared pool  kglsim free heap list             204
shared pool  kglsim free obj list              204
shared pool  kglsim hash table                4104
shared pool  kglsim hash table bkts        2097152
shared pool  kglsim heap                    635536
shared pool  kglsim latch area                1700
shared pool  kglsim latches                     68
shared pool  kglsim main lru count           87040
shared pool  kglsim main lru size           174080
shared pool  kglsim object batch            909440
shared pool  kglsim pin list arr               816
shared pool  kglsim recovery area             2112
shared pool  kglsim sga                      22092
shared pool  kglsim size of pinned mem       18496
shared pool  ksim client list                   84
shared pool  log_simultaneous_copies           480
shared pool  sim cache nbufs                   640
shared pool  sim cache sizes                   640
shared pool  sim kghx free lists                 4
shared pool  sim lru segments                  640
shared pool  sim segment hits                 1280
shared pool  sim segment num bufs              640
shared pool  sim state object                   48
shared pool  sim trace buf                    5140
shared pool  sim trace buf context             120
shared pool  sim_knlasg                       1200
shared pool  simulator hash buckets          16512
shared pool  simulator latch/bucket st        4608

31 rows selected.

See, a bunch of “kgl sim” and then just “sim” allocations.

… or sometimes you can see latch contention on following latches:

SQL> select name from v$latch where name like '%sim%';

NAME
-------------------------------------------------------
ksim membership request latch
simulator lru latch
simulator hash latch
sim partition latch
shared pool simulator
shared pool sim alloc

6 rows selected.

Again, there seems to be some “simulation” work going on in Oracle instance.

So what are these simulators about?

These simulators help Oracle to determine how much time and effort would have been saved if buffer cache or shared pool size was bigger than current. And this info is summarized and externalized in various advisor views such V$SHARED_POOL_ADVICE, V$MTTR_TARGET_ADVICE, V$DB_CACHE_ADVICE and so on. These simulation engines were introduced in Oracle 9i along the mentioned V$ views.

Also this stuff is used by MMAN for making SGA resizing decisions if SGA_TARGET (10g+) or MEMORY_TARGET (11g+) is enabled.

I decided to write an article about simulators as quite a few people have been asking me about it lately. So, I first googled around to see if someone has already written about it already and found one here. It turned out it was written by me, I just didn’t know about it anymore :)

So, I will write an edited version of my original explanation here. Note that I don’t know the exact algorithm of the simulator, just explaining the fundamentals here.

KGL simulator is the shared pool memory allocation tracking and simulation engine.

KGL=Kernel General Library cache manager, as the name says it deals with library objects such cursors, cached stored object definitions (PL/SQL stored procs, table definitions etc).

KGL simulator is used for estimating the benefit of caching if the cache was larger than currently. The general idea is that when flushing out a library cache object, it’s hash value (and few other bits of info) are still kept in the KGL simulator hash table. This stores a history of objects which were in memory in the past but got flushed out. That’s the key part: we keep a history of objects flushed out (this is done by selective sampling though, not at every object flush as it would have too much overhead).

When loading a library cache object into library cache (which means that no existing such object is in there already), Oracle goes and checks the KGL simulator hash table to see whether an object with matching hash value is in there. If a matching object is found, that means that the required object had been in cache in the past, but flushed out due space pressure. That’s the second important part: we know when we would have been able to (probably) just reuse an object if the shared pool had been bigger.

Using that information of how many library cache object (re)loads could have been been avoided if cache had been bigger (thanks to KGL simulator history) and also thanks to knowing how much time the (current) object reloads took, Oracle can predict how much response time would have been saved instancewide if shared pool was bigger. So, we don’t need to increase shared pool itself to measure the benefit, we simulate a bigger shared pool by just keeping a history of aged out library cache objects hash values in a special array – and that’s why there are various kglsim and simulator allocations seen in V$SGASTAT.

The simulation results are seen from v$shared_pool_advice and are used by other advisors such ADDM.

Apparently Oracle keeps track also about how much memory is pinned in the shared pool (thus can’t be flushed out) over time. This info is probably used also for estimating shared pool resize requirements as pinned memory chunks are not usable for anything else (as long as they are pinned) in the shared pool. These measurements are also done on sampling basis.

I did a little test with my lotshparses.sql script (which just runs a lot of hard parses inside a PL/SQL loop) and ran latchprofx to report which latches and why was my session using. Note the “sim” instead of usual % in the latchprofx syntax. I am interested in only these latches which have “sim” in their name in this experiment:

SQL> @latchprofx sid,name,func,hmode 146 sim 100000

-- LatchProfX 1.12 by Tanel Poder (  )

       SID NAME                                FUNC                                     HMODE              Held       Gets  Held %
---------- ----------------------------------- ---------------------------------------- ------------ ---------- ---------- -------
 146 shared pool simulator               kglsim_unpin_simhp                       exclusive           514         49     .51
 146 shared pool simulator               kglsim_upd_newhp                         exclusive           206        198     .21
 146 shared pool simulator               kglsim_scan_lru: scan                    exclusive            95         93     .10
 146 shared pool simulator               kglsim_chg_simhp_free                    exclusive            32         32     .03
 146 shared pool simulator               kglsim_chg_simhp_noob                    exclusive            32         32     .03
 146 shared pool sim alloc               kglsim_chk_heaplist: alloc               exclusive             4          3     .00
 146 shared pool sim alloc               kglsim_chk_objlist: alloc                exclusive             3          2     .00
 146 shared pool simulator               kglsim_chk_objlist: extend               exclusive             1          1     .00
 146 shared pool simulator               kglsim_scan_lru: rebalance               exclusive             1          1     .00

This simulation engine is pretty cool because its the first time someone has tried to measure the time what could be saved by increasing shared pool size. (Response) Time is the ultimate measure of performance, so this simulator measures the right thing. Do you remember what people used before that? Library cache hit ratio! A lot of people back then (and some still today) have been misleaded by measuring the wrong things like hit ratios.

Anyway, while this is a cool feature – it can cause trouble in big systems. This is because the extra CPU usage for simulation and also the potential concurrency issues like simulator latch contention (like the shared pool simulator latch above, these latches are used even on 11g where most library cache latches have been replaced with KGX mutexes for efficiency and avoiding false contention issues). Also, some people are sensitive about the potential large memory allocations the shared pool simulator can take from shared pool (this is controlled by _kglsim_max_percent parameter which defaults to max 5% of total shared pool size). As these allocations are dynamic, then when your application suddenly starts making lots of hard parses due newly released code, causing memory shortage, then the simulator can make things even worse as it can make relatively big allocations for itself too! And as far as I know the simulator arrays don’t shrink back.

Note that I’m talking about big systems here, in most other cases the simulation shouldn’t cause trouble (on recent db versions anyway).

In some databases I have disabled all of the advisors and simulators with setting statistics_level = basic. All features what you see from V$STATISTICS_LEVEL would be disabled when setting statistics_level to basic:

SQL> select statistics_name,activation_level from v$statistics_level;

STATISTICS_NAME                          ACTIVAT
---------------------------------------- -------
Buffer Cache Advice                      TYPICAL
MTTR Advice                              TYPICAL
Timed Statistics                         TYPICAL
Timed OS Statistics                      ALL
Segment Level Statistics                 TYPICAL
PGA Advice                               TYPICAL
Plan Execution Statistics                ALL
Shared Pool Advice                       TYPICAL
Modification Monitoring                  TYPICAL
Longops Statistics                       TYPICAL
Bind Data Capture                        TYPICAL
Ultrafast Latch Statistics               TYPICAL
Threshold-based Alerts                   TYPICAL
Global Cache Statistics                  TYPICAL
Active Session History                   TYPICAL
Undo Advisor, Alerts and Fast Ramp up    TYPICAL
Streams Pool Advice                      TYPICAL
Time Model Events                        TYPICAL
Plan Execution Sampling                  TYPICAL
Automated Maintenance Tasks              TYPICAL
SQL Monitoring                           TYPICAL
Adaptive Thresholds Enabled              TYPICAL
V$IOSTAT_* statistics                    TYPICAL
.
23 rows selected.

Sometimes you don’t want to disable absolutely everything, but only the troublemakers. The good news is that there’s an (undocumented) parameter for disabling every feature separately. As always, make sure that you know what you’re doing before changing these and get blessing from Oracle support in order to not get yourself in trouble.

So, for example the library cache memory simulation (shared pool advice) can be disabled by setting _library_cache_advice = false .

I covered shared pool and library cache above. Oracle has more cache advisors, the underlying principle in them is the same. They save some small metadata/info about flushed out objects in a simulator array and just check that array when someone is allocating memory for new object. If matching object’s metadata still happens to be in the simulator array, then Oracle knows that we could have avoided the time and effort loading (generating) that object back into memory if the corresponding cache had been bigger. Of course the algorithm must be more sophisticated than just checking for object’s existence, there are probably object’s flush timestamp and size stored in the simulator arrays as well.

While the library cache simulator array stores library cache objects hash values for looking up historical memory allocations metadata, then the buffer cache advisor stores just data block addresses (DBAs) of blocks previously in cache. When Oracle spots that a block was previously in cache while intiating physical IO for re-reading it back again, it will measure the time it took to complete the current physical IO and add this info into the simulator array. Again, that’s how Oracle can measure how much response time would have been saved if the block would still have been in the buffer cache.

The buffer cache advisor needs some special attention, as I have had most issues with that (on large systems again), mostly with buffer cache’s simulator lru latch contention in systems with lots of CPUs and physical IOs going on. The problem here is that even though you can have hundreds of thousands of cache buffers chains latches in a system, the number of simulator lru latches remains relatively low by default, causing contention.

Oracle provides a documented parameter for disabling the buffer cache simulation – you need to set db_cache_advice = off (or ready).

So, hopefully this sheds some light into how the advisors work and what the heck the simulators simulate in Oracle. Even though I provided a list of parameters which allow you to control this stuff – don’t change any of them unless you actually have a problem which they should fix (and you are sure that these parameters would fix your problem).

Update:

Jonathan Lewis has written a good article about buffer cache advisor details on his website – I recommend to read that one too if you’re interested in this stuff:

http://www.jlcomp.demon.co.uk/Cache_advice.html [archive.org]

KGL simulator, shared pool simulator and buffer cache simulator – what are these?

Tanel Poder

2009-09-14