Monday, June 13, 2016

SQL Parsing in serializable transaction throws ORA-08177: can't serialize access for this transaction

Setting serializable isolation level in Oracle 12c (12.1.0.2.0), SQL Parsing can throw:
  ORA-08177: can't serialize access for this transaction

run test (see Blog appended TestCode):
 
SQL (742) > exec ora_08177_test(10, 'ksun_ora_08177_collect_listagg');

Run=1
ERROR at line 1:
ORA-08177: can't serialize access for this transaction
ORA-06512: at "ORA_08177_TEST", line 20
ORA-06512: at line 1

Note: Each time, open a new session, run the test. After first run, subsequent runs in the same session do not hit the error.

According to Oracle documentation: Database Concepts - Data Concurrency and Consistency

The database generates an error when a serializable transaction tries to update or delete data changed by a different transaction that committed after the serializable transaction began:
 ORA-08177: Cannot serialize access for this transaction


Here there exists only one single session, so it does not satisfy the requirement of above documentation.

Reasoning


Refer to the recursive session discussed in Blog: Oracle 12c single session "library cache lock (cycle)" deadlock .

run query:
 
select distinct s.sid, holding_user_session, holding_session
  from v$libcache_locks c, v$session s
 where lnnvl(holding_user_session = holding_session)
   and c.holding_user_session = s.saddr 
   and s.sid = 742;
   
SID HOLDING_USER_SES HOLDING_SESSION 
--- ---------------- ----------------
742 070001010ECF8680 070001010ECFC880    

(742 is the SID of test session)

crosscheck the dump generated by 8177 trace event on AIX (we set this event for the test):

----- Call Stack Trace -----
ktrexc()+3536        call     ksesecl0()           FFFFFFFFFFC9FB0 ? 122F5F490 ?
rpidrv()+3572        call     rpiswu2()            70001010ECFC880 ? 1221BF1E0 ?
kksLoadChild()+3544  call     rpiswu2()            70001010ECF8680 ? 000000090 ?
...
========== FRAME [19] (ktrexc()+3536 -> ksesecl0()) ==========
defined by frame pointers 0xffffffffffca080  and 0xffffffffffc9ee0
CALL TYPE: call   ERROR SIGNALED: yes   COMPONENT: (null)
...
PROCESS STATE
-------------
Process global information:
     process: 0x70001010e8c4078, call: 0x7000100eb0e7bb0, xact: 0x7000101037e9880, 
     curses: 0x70001010ecfc880, usrses: 0x70001010ecf8680
...
KSUSETRN = 0X07000101037E9880
...
(trans) flg = 0x00000004, flg2 = 0x00400000, flg3 = 0x00000040, 
  prx = 0x7000101037e9880, ros = 2147483647, crtses=0x70001010ecf8680

we can establish two equations:

holding_user_session = 070001010ECF8680 = usrses: 0x70001010ecf8680
holding_session      = 070001010ECFC880 = curses: 0x70001010ecfc880

which tellingly indicate the co-existence of two different sessions, usrses(user session) and curses (cursor session), and existence of transaction (0X07000101037E9880) created by user session (crtses=0x70001010ecf8680).

Looking Call Stack Trace, user session calls:
   kksLoadChild()
and then switches to cursor session, which triggers rpi (recursive procedure interface):
 rpidrv()
and finally ERROR SIGNALED at:
  ktrexc()
where ktr means kernel transaction read consistency.

Since user session already started a transaction, any attempt to modify the data by the cursor session resulted in:
 ORA-08177: can't serialize access for this transaction

Run other two tests:

SQL (742) > exec ora_08177_test(10, 'ksun_ora_08177_collect');
SQL (742) > exec ora_08177_test(10, 'ksun_ora_08177_listagg');

there are no errors.

Hence, the error is probably related to certain constellation of collect and listagg.

All tests are performed on AIX, Solaris and Linux with Oracle 12c (12.1.0.2.0).
It is not reproducible in Oracle 11.2.0.4.0.

By the way, the first run of query:

SQL (742)> select collect (y) from ksun_ora_08177;
SYSTPNSRcj9+JXnjgVAAmVdl5Ow==(1, 2, 3, 0, 1, 2, 3, 0, 1)

creates a new TYPE:

TYPE     "SYSTPNSRcj9+JXnjgVAAmVdl5Ow==" AS TABLE OF NUMBER

which is due to "COLLECT" function.

It can be verified by:

select s.*, o.last_ddl_time, o.created, o.*
  from dba_source s, dba_objects o
 where s.name = o. object_name
   and o.object_type = 'TYPE'
   and name = 'SYSTPNSRcj9+JXnjgVAAmVdl5Ow==';

and also documented in Database SQL Language Reference:

COLLECT is an aggregate function that takes as its argument a column of any type
and creates a nested table of the input type out of the rows selected.


Another example of TYPE creation was observed in Blog: Oracle 12c single session "library cache lock (cycle)" deadlock.

Addendum


It seems that fixing such a bug requires a lot of design change in the code, for example, distinguishing user DML and sys DML on underlying objects (e.g. undo$ or user$ tables). For the time being, fixing the bug is not feasible. Therefore be cautious when using such features.

TestCode


--------------------
set numformat 99
column y_coll format a10

alter system set events '8177 trace name errorstack level 10';

drop table ksun_ora_08177;

create table ksun_ora_08177 as select mod(level, 2) x, mod(level, 4) y from dual connect by level <= 9; 

select * from ksun_ora_08177;

create or replace type ksun_tab force is table of number;
/

--------------------
create or replace force view ksun_ora_08177_collect_listagg(x, cnt, y_coll) as
with sq as (
  select x, count(*) cnt, cast (collect (distinct y) as ksun_tab) y
  from ksun_ora_08177
  group by x)
select x, cnt,
       (select listagg (column_value, ',') within group (order by column_value)
          from table (y)) y_coll
from sq 
where x is not null;

--------------------

create or replace force view ksun_ora_08177_collect(x, cnt, y_coll) as
select x, count(*) cnt, cast (collect (distinct y) as ksun_tab) y_coll
from ksun_ora_08177
where x is not null
group by x;

--------------------
create or replace force view ksun_ora_08177_listagg(x, cnt, y_coll) as
select x, count(*) cnt, listagg (y, ',') within group (order by y) y_coll
from ksun_ora_08177
where x is not null
group by x;

--------------------
create or replace procedure ora_08177_test(p_cnt number, p_view varchar2) as
 l_cur       integer;
 l_qry       varchar2(300) := 'select * from '||p_view;
 l_idx           pls_integer;
 l_col_desc_tab  dbms_sql.desc_tab;
begin
 l_qry := 'select /* '||p_view ||' */ * from '||p_view;
 for i in 1..p_cnt loop
  dbms_output.put_line('Run='||i);
  execute immediate 'alter system flush shared_pool';

  begin
    commit;
    set transaction isolation level serializable;
    l_cur := dbms_sql.open_cursor;
    dbms_sql.parse(l_cur, l_qry, dbms_sql.native);
    dbms_sql.describe_columns (l_cur, l_idx, l_col_desc_tab);
    dbms_sql.close_cursor(l_cur);
    exception when others then
      dbms_sql.close_cursor(l_cur);
      raise;
  end;
 end loop;
end;
/

-- Note: Each time, open a new session, run the test. After first run, subsequent runs in the same session do not hit the error.
-- exec ora_08177_test(10, 'ksun_ora_08177_collect_listagg');
-- exec ora_08177_test(10, 'ksun_ora_08177_collect');
-- exec ora_08177_test(10, 'ksun_ora_08177_listagg');