When the Smart Flash Cache was introduced in Exadata, it cached reads only. So there were only read “optimization” statistics like cell flash cache read hits and physical read requests/bytes optimized in V$SESSTAT and V$SYSSTAT (the former accounted for the read IO requests that got their data from the flash cache and the latter accounted for the disk IOs avoided thanks to both the flash cache and storage indexes). So if you wanted to measure the benefit of the flash cache alone, you’d have to use the cell flash cache read hits metric.
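For reference, here’s a minimal sketch of how you could pull these instance-level counters straight out of V$SYSSTAT (the statistic names are the ones mentioned above, nothing else is assumed):

```sql
-- Read-era flash cache / optimization counters, instance-wide:
--   "cell flash cache read hits"       = read IO requests served from flash cache
--   "physical read ... optimized"      = disk IOs avoided by flash cache OR storage indexes
SELECT name, value
FROM   v$sysstat
WHERE  name IN ('cell flash cache read hits',
                'physical read requests optimized',
                'physical read total bytes optimized')
ORDER BY name;
```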
This all was fine until you enabled the Write-Back flash cache in a newer version of cellsrv. We still had only the “read hits” statistic in the V$ views! And when investigating it more closely, it turned out that both the read hits and write hits were accumulated in the same read hits statistic! (I can’t reproduce this on our patched 11.2.0.3 with the latest cellsrv anymore, but it was definitely the behavior earlier, as I demoed it in various places.)
Side-note: This is likely because it’s not so easy to just add more statistics to Oracle code within a single small patch. The statistic counters are referenced by other modules using macros with their direct numeric IDs (and memory offsets into the v$sesstat array), and those IDs & addresses would change if more statistics were added. So you can pretty much add new statistic counters only with new full patchsets, like 11.2.0.**4**. It’s the same with instance parameters, by the way; that’s why the “spare” statistics and spare parameters exist: they’re placeholders for temporary use, until the new parameter or statistic gets added permanently with a full patchset update.
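As a quick illustration of those placeholders, you can list them from V$STATNAME (a simple sketch; the exact number of spare slots varies by version):

```sql
-- The "spare statistic" placeholder slots reserved for backporting new counters.
SELECT statistic#, name, class
FROM   v$statname
WHERE  name LIKE 'spare statistic%'
ORDER BY statistic#;
```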
So, this is probably the reason why both the flash cache read and write hits initially got accumulated under the cell flash cache read hits statistic, but later on this seemed to get “fixed”, so that the read hits statistic only showed read hits and the flash write hits were not accounted anywhere. You can test this easily by measuring your DBWR’s v$sesstat metrics, with Snapper for example: if you get way more cell flash cache read hits than physical read total IO requests, then you’re probably accumulating both read and write hits in the same metric.
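If you don’t have Snapper at hand, a plain V$SESSTAT query does the same comparison; a minimal sketch (assuming your DBWR program name matches ‘%DBW%’):

```sql
-- Compare flash cache read hits vs. total read IO requests for the DBWR session(s).
-- If "cell flash cache read hits" is much larger than "physical read total IO requests",
-- the counter is most likely accumulating write hits too (older cellsrv behavior).
SELECT s.sid, sn.name, st.value
FROM   v$session  s
JOIN   v$sesstat  st ON st.sid        = s.sid
JOIN   v$statname sn ON sn.statistic# = st.statistic#
WHERE  s.program LIKE '%DBW%'
AND    sn.name IN ('cell flash cache read hits',
                   'physical read total IO requests')
ORDER BY s.sid, sn.name;
```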
Let’s look into a few different database versions:
```
SQL> @i

USERNAME             INST_NAME    HOST_NAME                   SID  SERIAL# VERSION    STARTED
-------------------- ------------ ------------------------- ----- -------- ---------- --------
SYS                  db12c1       enkdb03.enkitec.com        1497    20671 12.1.0.1.0 20131127

SQL> @sys cell%flash

NAME                                                                                  VALUE
---------------------------------------------------------------- --------------------------
cell flash cache read hits                                                           1874361
```
In the 12.1.0.1 database above, we still have only the read hits metric. But in the Oracle 11.2.0.4 output below, we finally have the flash cache IOs broken down by reads and writes, plus a couple of special metrics indicating whether a block being written already existed in the flash cache (cell overwrites in flash cache) and whether the block range written was only partially cached in flash when the DB issued the write (cell partial writes in flash cache):
```
SQL> @i

USERNAME             INST_NAME    HOST_NAME                   SID  SERIAL# VERSION    STARTED
-------------------- ------------ ------------------------- ----- -------- ---------- --------
SYS                  dbm012       enkdb02.enkitec.com         199      607 11.2.0.4.0 20131201

SQL> @sys cell%flash

NAME                                                                                  VALUE
---------------------------------------------------------------- --------------------------
cell writes to flash cache                                                            711439
cell overwrites in flash cache                                                        696661
cell partial writes in flash cache                                                         9
cell flash cache read hits                                                            699240
```
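The @sys script above is just a helper; you can get roughly the same output by querying V$SYSSTAT directly, for example:

```sql
-- System-wide flash cache hit counters (the write breakdown shows up in 11.2.0.4+).
SELECT name, value
FROM   v$sysstat
WHERE  name LIKE 'cell%flash%'
ORDER BY name;
```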
This probably means that the upcoming Oracle 12.1.0.2 will have the flash cache write hit metrics in it too. So in the newer versions there’s no need to get creative when estimating the write-back flash cache hits in our performance scripts (the Exadata Snapper currently tries to derive this value from other metrics, relying on the bug where both read and write hits accumulated under the same metric, so I will need to update it based on the DB version we are running on).
So, when I look into one of the DBWR processes in an 11.2.0.4 DB on Exadata, I see the breakdown of flash read vs. write hits:
```
SQL> @i

USERNAME             INST_NAME    HOST_NAME                   SID  SERIAL# VERSION    STARTED
-------------------- ------------ ------------------------- ----- -------- ---------- --------
SYS                  dbm012       enkdb02.enkitec.com         199      607 11.2.0.4.0 20131201

SQL> @exadata/cellver
Show Exadata cell versions from V$CELL_CONFIG....

CELL_PATH            CELL_NAME            CELLSRV_VERSION      FLASH_CACHE_MODE      CPU_COUNT
-------------------- -------------------- -------------------- -------------------- ----------
192.168.12.3         enkcel01             11.2.3.2.1           WriteBack                    16
192.168.12.4         enkcel02             11.2.3.2.1           WriteBack                    16
192.168.12.5         enkcel03             11.2.3.2.1           WriteBack                    16

SQL> @ses2 "select sid from v$session where program like '%DBW0%'" flash

       SID NAME                                                                  VALUE
---------- ---------------------------------------------------------------- ----------
       296 cell writes to flash cache                                            50522
       296 cell overwrites in flash cache                                        43998
       296 cell flash cache read hits                                               36

SQL> @ses2 "select sid from v$session where program like '%DBW0%'" optimized

       SID NAME                                                                  VALUE
---------- ---------------------------------------------------------------- ----------
       296 physical read requests optimized                                         36
       296 physical read total bytes optimized                                  491520
       296 physical write requests optimized                                     25565
       296 physical write total bytes optimized                              279920640
```
If you are wondering why the cell writes to flash cache metric is roughly 2x bigger than physical write requests optimized, it’s because of the ASM double mirroring (normal redundancy) we use. The physical writes metrics are counted at the database-scope IO layer (KSFD), but the ASM mirroring is done at a lower layer in the Oracle process codepath (KFIO). So when the DBWR issues a 1 MB write, the v$sesstat metrics record a 1 MB IO for it, but the ASM layer below actually does 2-3x more IO due to double- or triple-mirroring. As the cell writes to flash cache metric is sent back from all the storage cells involved in the actual (ASM-mirrored) write IOs, we will see around 2-3x more storage flash write hits than physical writes issued at the database level (depending on which mirroring level you use). Another way of saying this is that the “physical writes” metrics are measured at a higher level, “above” the ASM mirroring, and the “flash hits” metrics are measured at a lower level, “below” the ASM mirroring in the IO stack.
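To sanity-check this on your own system, you can compute the ratio between the two counters for the DBWR session; it should land near your ASM redundancy level (about 2 for normal, about 3 for high redundancy). A rough sketch:

```sql
-- Ratio of cell-level flash write hits (counted below ASM mirroring) to
-- database-level optimized write requests (counted above ASM mirroring), per DBWR session.
SELECT s.sid,
       MAX(CASE WHEN sn.name = 'cell writes to flash cache'        THEN st.value END) cell_flash_writes,
       MAX(CASE WHEN sn.name = 'physical write requests optimized' THEN st.value END) db_writes_optimized,
       ROUND(
         MAX(CASE WHEN sn.name = 'cell writes to flash cache'        THEN st.value END) /
         NULLIF(MAX(CASE WHEN sn.name = 'physical write requests optimized' THEN st.value END), 0)
       , 2) mirror_ratio
FROM   v$session  s
JOIN   v$sesstat  st ON st.sid        = s.sid
JOIN   v$statname sn ON sn.statistic# = st.statistic#
WHERE  s.program LIKE '%DBW0%'
AND    sn.name IN ('cell writes to flash cache', 'physical write requests optimized')
GROUP BY s.sid;
```

With the numbers from the DBWR output above, 50522 / 25565 ≈ 1.98, which lines up with the normal (double) redundancy of the disk group.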