Find which Oracle SQL_ID hits the SQL*Net break/reset to client event

Oracle’s SQL*Net break/reset to client wait event shows up whenever Oracle (SQL, PL/SQL call, etc) returns an ORA- error back to the client instead of the normal expected results (and ORA-0 under the hood):

SQL> @oerr 0

ORA-0000: normal, successful completion

I explained this behavior a bit more in my 2008 post about Oracle’s SQL*Net break/reset to client post.

Despite the wait event name, this break/reset wait doesn’t mean that you have a network problem, but more likely some sort of an application problem (like trying to insert duplicate values into a unique column, etc). Or it might be some database resource issue too, like running out of the PGA hard limit (pga_aggregate_limit) and suddenly some larger queries start erroring out with ORA-4036:

SQL> @oerr 1

ORA-00001: unique constraint (.) violated on table . columns ()

SQL> @oerr 4036

ORA-04036: PGA memory used by the instance or PDB exceeds PGA_AGGREGATE_LIMIT.

If your network latency is bad, then this aggravates the problem further, as the break/reset requires an extra network roundtrip and your application has to still get its work done, so it might end up retrying, etc - even more roundtrips!

Systemwide monitoring tools like AWR show you that “someone in the system is experiencing unhandled exceptions or ORA- errors”, but can’t show you who (which user, session) was hitting this problem. If the exact database session info is not available, then it would be nice to know at least which SQL_ID it was, that would help the application developers to narrow the issue down somewhat. AWR doesn’t have that info either (as the underlying V$ views it uses do not collect this info).

The next step is to look into V$SESSION or Active Session History if you have licenses for that. However, there’s a problem. The ORA- error is not returned (and break/reset wait event set) immediately when a problem happens during a database call. When an error happens internally, first the database call has to be rolled back and the call state object (and various allocations) cleaned out. And then, during the end of call processing, the error is sent out to the client and the corresponding wait event is set.

Guess what, the DB call cleanup operation also resets SQL_ID in V$SESSION to NULL! In other words, by the time we can observe the “SQL*Net break/reset to client” wait event, the failing SQL_ID info is already gone! You can query ASH (using ashtop.sql for example) and try to find any samples of this wait event in your history, you’ll see that the SQL_ID is NULL. Other columns, like USERNAME, MODULE, ACTION would still be in place (they are not automatically reset on some error), but they don’t help to narrow down the issue much.

Luckily, V$SESSION has PREV_SQL_ID! ASH doesn’t have this column, but we can sample V$SESSION using a script or PL/SQL loop, or even manually if needed. Therefore, I’m introducing an updated version of my break.sql script that’s been in the TPT repo for a couple of years (pull the latest version though):

The latest version uses both V$SESSION and V$SESSION_WAIT_HISTORY (not the same thing as V$ACTIVE_SESSION_HISTORY). I pasted a separate link to swh.sql above too, this lists the last 10 wait events for any session. Note that these are the exact last 10 waits, no matter how quickly they completed (unlike ASH that does just periodic sampling of whatever happens to be active at its sampling time).

In the first session, I ran a random failing query and then left the session idle:

SQL> select * from blah;
select * from blah
              *
ERROR at line 1:
ORA-00942: table or view does not exist

SQL> @swh sid=7384

-- Display last 10 COMPLETED waits from v$session_wait_history v0.1 BETA by Tanel Poder ( https://tanelpoder.com )

 INST_ID     SID USERNAME   CURR_PREV_SQL MACHINE    WAITS_AGO EVENT                            WAIT_TIME_US CPU_BEFORE_WT_US
-------- ------- ---------- ------------- --------- ---------- ------------------------------- ------------- ----------------
       1    7384 SYS                      airm2              1 SQL*Net message to client                   3                7
       1         SYS        aydg62t60vx6p airm2              2 SQL*Net break/reset to client           4,864                1
       1         SYS                      airm2              3 SQL*Net break/reset to client               4               30
       1         SYS                      airm2              4 Disk file operations I/O                   51              664
       1         SYS                      airm2              5 SQL*Net message from client         1,378,489                7
       1         SYS                      airm2              6 SQL*Net message to client                   2               58
       1         SYS                      airm2              7 SQL*Net message from client             4,898            1,918
       1         SYS                      airm2              8 SQL*Net message to client                   0               32
       1         SYS                      airm2              9 Disk file operations I/O                   27              109
       1         SYS                      airm2             10 SQL*Net message from client             6,869               30

I actually join this data to V$SESSION to show the corresponding session’s SQL_ID (1st row, NULL) and PREV_SQL_ID (second row, aydg62t60vx6p).

But if you don’t know which session to look at and you want to monitor all sessions for any errors, then use break.sql. It doesn’t require any arguments, but its output might be a bit confusing at first. I might change the layout or even break it into two separate scripts. Note that break.sql doesn’t return anything if it doesn’t find any sessions currently waiting (or having recently waited) for this wait event.

The output from the same setup is here:

SQL> @break

TYPE  INST_ID    SID  WAITS_AGO EVENT                          BREAK_OP  PREV_SQL_ID      WAIT_US     RUN_US USERNAME 
---- -------- ------ ---------- ------------------------------ --------- ------------- ---------- ---------- ---------
CUR         1   7141          0 On CPU / runqueue                        3wdb8ng2y8p8a                       SOE
SWH         1   7141          8 SQL*Net break/reset to client  reset(0)                      7458          1
SWH         1   7141          9 SQL*Net break/reset to client  break(1)                         4        488

CUR         1   7384          0 SQL*Net message from client              aydg62t60vx6p                       SYS
SWH         1   7384          2 SQL*Net break/reset to client  reset(0)                      4864          1
SWH         1   7384          3 SQL*Net break/reset to client  break(1)                         4         30

In the first half of the output you see session 7141 currently being on CPU (“CUR” and WAITS_AGO=0) show that this is the current session state or wait taken from V$SESSION. Right below it, you see some break/reset waits but they happened 8 and 9 “wait events” before we reached the current state (On CPU). So, the PREV_SQL_ID in this case might not be related to an ORA- error at all, some other DB call probably came in after this error (WAITS_AGO 1..7).

In the bottom section that PREV_SQL_ID might be relevant the break happened just 2-3 waits before reaching our current state (session idle).

Ok, let’s run some concurrent workloads with lots of sessions and see if this thing would actually work in a production scenario. I ran two separate instances of SwingBench Order Entry workload, each with 100 sessions, against the same schema. This SwingBench workload is not designed to be executed by multiple independent “runners” against a single schema (the separate SwingBench instances are not aware of each other). And this resulted in occasional “ORA-00001: unique constraint violation” errors:

$ ./charbench -c ../configs/SOE_Server_Side_V2.xml 
Swingbench 
Author  :    Dominic Giles 
Version :    2.7.0.1313  

Results will be written to results.xml 
Hit Return to Terminate Run... 

Time    Users TPM TPS

The following error has occcured. Further occurences will be supressed but the error count will be incremented and recorded in the results file 
java.sql.SQLIntegrityConstraintViolationException: ORA-00001: unique constraint (SOE.ORDER_PK) violated
ORA-06512: at "SOE.ORDERENTRY", line 531
ORA-06512: at "SOE.ORDERENTRY", line 438
ORA-06512: at line 1
 
9:12:07 AM      50      176183  4199

Let’s run break.sql:

SQL> @break

TYPE  INST_ID    SID  WAITS_AGO EVENT                          BREAK_OP  PREV_SQL_ID      WAIT_US     RUN_US USERNAME
---- -------- ------ ---------- ------------------------------ --------- ------------- ---------- ---------- --------
CUR         1    248          0 log file sync                            a6hdpzrqqhc7d                       SOE
SWH         1    248          3 SQL*Net break/reset to client  reset(0)                      1796          1
SWH         1    248          4 SQL*Net break/reset to client  break(1)                         3        235

CUR         1    491          0 SQL*Net message from client              csasr8ct2051v                       SOE
SWH         1    491          7 SQL*Net break/reset to client  reset(0)                       929          1
SWH         1    491          8 SQL*Net break/reset to client  break(1)                         2        264

CUR         1   2908          0 SQL*Net break/reset to client  reset(0)  a6hdpzrqqhc7d                       SOE
SWH         1   2908          1 SQL*Net break/reset to client  break(1)                         5        294

CUR         1   5331          0 SQL*Net message from client              a6hdpzrqqhc7d                       SOE
SWH         1   5331          2 SQL*Net break/reset to client  reset(0)                      1515          1
SWH         1   5331          3 SQL*Net break/reset to client  break(1)                         3        262

CUR         1   6056          0 log file sync                            a6hdpzrqqhc7d                       SOE
SWH         1   6056          5 SQL*Net break/reset to client  reset(0)                      2742          2
SWH         1   6056          6 SQL*Net break/reset to client  break(1)                         2        284

Yay, we got many sessions! If such errors & breaks do not happen frequently in your system, then you might have to run this query multiple times in a row, until you get lucky.

One of these output sections is different from others - we managed to catch a session that was currently handing a SQL*Net break/reset. Therefore, whatever is the PREV_SQL_ID for this CUR session, is very likely the SQL_ID that hit the currently handled error.

SQL> @sqlid a6hdpzrqqhc7d %
Show SQL text, child cursors and execution stats for SQLID a6hdpzrqqhc7d child %

HASH_VALUE PLAN_HASH_VALUE   CH# SQL_TEXT
---------- --------------- ----- ------------------------------------------------------------------------------------------------------------------------------------------------------
3983028461               0     0 insert into orders(ORDER_ID, ORDER_DATE, CUSTOMER_ID, WAREHOUSE_ID, DELIVERY_TYPE, COST_OF_DELIVERY, WAIT_TILL_ALL_AVAILABLE) values (:1 , :2 , :3 ,
                                 :4 , :5 , :6 , :7 )



  CH#  PLAN_HASH     PARSES   H_PARSES EXECUTIONS    FETCHES ROWS_PROCESSED ROWS_PER_FETCH CPU_SEC_EXEC ELA_SEC_EXEC LIOS_PER_EXEC PIOS_PER_EXEC TOTAL_CPU_SEC TOTAL_ELA_SEC TOTAL_IOWAIT_SEC TOTAL_LIOS TOTAL_PIOS      SORTS USERS_EXECUTING LAST_ACTIVE_TIME    PARENT_HANDLE    OBJECT_HANDLE
----- ---------- ---------- ---------- ---------- ---------- -------------- -------------- ------------ ------------ ------------- ------------- ------------- ------------- ---------------- ---------- ---------- ---------- --------------- ------------------- ---------------- ----------------
    0          0      83614          1      83609          0          58874                        .000         .001            30             3        38.136        84.308        55.597688    2510250     258054          0               1 2025-05-14 09:17:24 0000000C1765F888 0000000C1765A558

Ok, it’s an insert statement. It definitely can fail for many reasons… knowing what type of operation (and which username, session type, etc) it is, we can narrow down the list of “usual suspects” to look for. This doesn’t sound very systematic though, however it’s all the info we have:

This session by user SOE hit some error
The SQL_ID that was active when we hit the error was the INSERT statement with SQL_ID a6hdpzrqqhc7d.

Sadly, Oracle 19c does not report the actual error number (inserts can fail for other reasons than ORA-1 too). You would need an ON SERVERERROR trigger, tracing or processtate/errorstack dumping with debug events to catch more context right when the error happens. Oracle 23c brings good improvements for dealing with this problem. I’ll write about this more someday myself, but for now you can read this post by Oracle.

If you have noticed that only a single or a few sessions tend to encounter these errors & resets, you can zoom in with swh.sql or a custom query to list all (last 10) wait events in order for the sessions of interest. Here we get a useful additional clue, we were waiting for “enq: TX - row lock contention” just before hitting an error. The V$SESSION_WAIT_HISTORY keeps track of the wait event arguments (P1,P2,P3) as well, so you could enhance the script to list the arguments and you’d likely see that we tried to get this enqueue in “mode 4”, indicating an “inserted key’s uniqueness uncertainty” reason for this wait:

SQL> @swh sid=974

-- Display last 10 COMPLETED waits from v$session_wait_history v0.1 BETA by Tanel Poder ( https://tanelpoder.com )

 INST_ID   SID USERNAME   CURR_PREV_SQL MACHINE    WAITS_AGO EVENT                           WAIT_TIME_US CPU_BEFORE_WT_US
-------- ----- ---------- ------------- --------- ---------- ------------------------------ ------------- ----------------
       1   974 SOE                      mac23              1 SQL*Net message to client                  1              119
       1       SOE        djj5txv2dzwb6 mac23              2 SQL*Net message from client           14,279                1
       1       SOE                      mac23              3 SQL*Net message to client                  0                5
       1       SOE                      mac23              4 log file sync                             53               59
       1       SOE                      mac23              5 SQL*Net message from client            1,473                9
       1       SOE                      mac23              6 SQL*Net message to client                  0               30
       1       SOE                      mac23              7 SQL*Net break/reset to client          1,591                1
       1       SOE                      mac23              8 SQL*Net break/reset to client              2              154
       1       SOE                      mac23              9 enq: TX - row lock contention          5,908              172
       1       SOE                      mac23             10 SQL*Net message from client            6,249               18

I’ll stop now, I’ve got places to be! :-)