Thursday, March 27, 2014

dbms_aq.dequeue - latch: row cache objects on AIX

dbms_aq.dequeue uses an ADT object_type to pick payload from queue, and this can cause a heavy
   latch: row cache objects (latch#: 280 and child#: 8)

which is used to protect Row Cache "dc_users" (cache# 10 and 7 in v$rowcache) in a CPU overloaded AIX System with Oracle 11.2.0.3.0.

At first, let's look at the special behaviour of dbms_aq.dequeue (see appended TestCase):
 exec qstat;
 exec deq(1, 5);
 exec qstat;


The output looks like:
 SQL> exec qstat;
  LOCKED_TOTAL=9,   PINNED_TOTAL=9
  SQL_TEXT=select , EXECUTIONS=19, ROWS_PROCESSED=3
  SQL_TEXT=insert , EXECUTIONS=3,  ROWS_PROCESSED=3
 SQL> exec deq(1, 5);
 SQL> exec qstat;
  LOCKED_TOTAL=10,  PINNED_TOTAL=10
  SQL_TEXT=select , EXECUTIONS=22, ROWS_PROCESSED=3
  SQL_TEXT=insert , EXECUTIONS=3,  ROWS_PROCESSED=3


It shows that the dequeue SELECT executed 3 times, and the LOCKED_TOTAL on the payload object: qtab_obj increased 1. By a 10046 event trace, we  can see that two selects were run immediately after dequeue, and the third one was after 5 seconds, and finally throws: "ORA-25228: timeout".

By enqueuing one message, and then dequeue it:
 exec qstat;
 exec enq(1);
 exec qstat;
 exec deq(1, 5);
 exec qstat;


The output is:
 SQL> exec qstat;
  LOCKED_TOTAL=12,  PINNED_TOTAL=12
  SQL_TEXT=select , EXECUTIONS=23, ROWS_PROCESSED=4
  SQL_TEXT=insert , EXECUTIONS=4,  ROWS_PROCESSED=4
 SQL> exec enq(1);
 SQL> exec qstat;
  LOCKED_TOTAL=13,  PINNED_TOTAL=13
  SQL_TEXT=select , EXECUTIONS=23, ROWS_PROCESSED=4
  SQL_TEXT=insert , EXECUTIONS=5,  ROWS_PROCESSED=5
 SQL> exec deq(1, 5);
 SQL> exec qstat;
  LOCKED_TOTAL=14,  PINNED_TOTAL=14
  SQL_TEXT=select , EXECUTIONS=24, ROWS_PROCESSED=5
  SQL_TEXT=insert , EXECUTIONS=5,  ROWS_PROCESSED=5


We can see 1 enqueue INSERT, 1 dequeue SELECT, and LOCKED_TOTAL increased 1 for enqueue, and 1 for dequeue.

Based on this observation, probably we can reconstruct one dbms_aq.dequeue pseudo code:
 declare
  l_rec qtab%rowtype;
   cursor c is select  /*+ INDEX(TAB AQ$_QTAB_I) */ ... for update skip locked;
 begin
  open c;

  -- 1st fetch
  fetch c into l_rec;
  if c%found then return; end if;

  -- 2nd fetch
   fetch c into l_rec;
  if c%found then return; end if;

  -- thread_wait with timeout 5 seconds.

  -- Within 5 seconds, if there is message available, waked up by a thread_post.
  thread_wait(5);

  fetch c into l_rec;
  if c%found then return; end if;

  finally close c;
          throw ORA-25228;
 end;


By the way, "skip locked" option seems originally designed for AQ, and this select can not guarantee the consistency due to "skip locked" rows which are not deterministic to the result set.

Now we run a CPU intensive test on AIX Power7 System with Entitled Capacity 4, SMT=4 and Oracle 11.2.0.3.0:

 exec loop_job_cpu_burning(48);
 exec loop_job_enq(48, 0.01);
 exec loop_job_deq(48, 1);


Then repeatedly run:
 select event, v.* from v$active_session_history v where p2 = 280 order by sample_time desc;
 select event, v.* from v$session v where p2 = 280;
 select * from v$latch_children where latch# = 280 and child# in (8);
 select * from v$latchholder;


we will observe:
  latch: row cache objects

To make this event more noticeable, launch some dequeue jobs with NO_WAIT:
  exec loop_job_deq(16, 0);

then there will be even more latch misses and sleeps (also some other events: "latch: cache buffers chains", "buffer busy waits", "redo copy").

This also indicates that if dequeue is faster than enqueue, a higher dequeue wait will reduce CPU load because each empty dequeue also triggers three dequeue select, and hence locks on payload object_type.

By running,
 select locked_total, pinned_total, locks, pins, v.*
 from v$db_object_cache v
 order by v.locked_total desc;


we can notice huge LOCKED_TOTAL on payload object_type: "QTAB_OBJ". This is probably the root cause of "latch: row cache objects" caused by dbms_aq.dequeue.

In fact, it was first discovered by a 10046 Event Trace on a dequeue session and a system library_cache dump with:
  alter session set events 'immediate trace name library_cache level 10';

Once the test is finished, remove all the jobs by:
  exec clean_job;

UNIX Process Monitoring


By AIX truss or trace on a dequeue session process, we can observe continuous
  thread_wait, thread_post
to sysnchonize IPC mechanism, but hardly see real work (for example: kread).

Running ps on a dequeue session process a few times (for example, PID 31785184):
 ps -flp 31785184
it shows that both Column C (CPU utilization) and PRI (priority) are changed each time. And IBM AIX document states:

for the sched_other policy (AIX default), CPU utilization is used in determining process scheduling priority. Large values indicate a CPU intensive process and result in lower process priority, whereas small values indicate an I/O intensive process and result in a more favorable priority.

One reasonable explanation from UNIX level is that the process holds the single:
  latch: row cache objects (latch#: 280 and child#: 8)
consumes a certain CPU (spin_gets), and was evaluated to a lower priority, yield CPU to others, and is not able to release this single latch. Whereas the other processes which are desperately requesting this latch are starving the CPU, but can't get this latch. And finally it results in a single latch contention, and all processes are serialized by this single Oracle resource.

We often see this latch taking on average several ms (3-40 ms), and occasionally more than one hour (that contradicts the common believe that latch is a short activity).

As we know that AIX Schedo Dispatcher uses a timeslice to choose the next processes.The default value for timeslice is a single clock tick, that is, 10 milliseconds.

One possible guess for the huge LOCKED_TOTAL on payload object_type: "QTAB_OBJ" is that AIX run-time linking problem in which each dequeue action requires a lock on "QTAB_OBJ" object file (or shared library (SO)).

Bug 14382262  latch free wait event in"kokc descriptor allocation latch"


Oracle seems aware of this huge LOCKED_TOTAL and published a patch to fix it. As tested, the patch stopped the increasing of LOCKED_TOTAL, but "latch: row cache objects" gets even worse. So all the tests in this Blog is without this patch.

latchstat tool


Imitated from vmstat, a small script: latchstat is created to to monitor latch activities, for example:
  exec latchstat(4, 1, 280, 8);

(code is appended at the end of Blog).

---------------------- setup ---------------------- 
create or replace type qtab_obj is object (id integer, pr_id integer);
/      

begin sys.dbms_aqadm.drop_queue_table(queue_table => 'QTAB', force=> TRUE); end;
/

begin
  sys.dbms_aqadm.create_queue_table
  (
    queue_table           => 'QTAB'
   ,queue_payload_type    => 'QTAB_OBJ'
   ,compatible            => '8.1'
   ,sort_list             => 'PRIORITY,ENQ_TIME'
   ,multiple_consumers    =>  false                
   ,message_grouping      =>  0
   ,comment               =>  'Test Queue Table'
   ,secure                =>  false
   );
end;
/

begin
  sys.dbms_aqadm.stop_queue (queue_name => 'QTAB_QUEUE');
  sys.dbms_aqadm.drop_queue (queue_name => 'QTAB_QUEUE');
end;
/

begin
  sys.dbms_aqadm.create_queue
  (
    queue_name     => 'QTAB_QUEUE'
   ,queue_table    => 'QTAB'
   ,queue_type     =>  sys.dbms_aqadm.normal_queue
   ,max_retries    =>  100
   ,retry_delay    =>  2
   ,retention_time =>  604800
   ,comment        => 'Test Queue'
   );
end;
/

begin sys.dbms_aqadm.start_queue(queue_name => 'QTAB_QUEUE', enqueue => true, dequeue => true); end;
/


---------------------- queuing ----------------------   
create or replace procedure cpu_work(p_cnt number) as
  l_x number;
begin
  for i in 1..p_cnt loop
    l_x := l_x + sqrt(i);
  end loop;
end;
/
create or replace procedure enq(p_cnt number, p_sleep_second number := 0.01) as
  l_enqueue_options     dbms_aq.enqueue_options_t;
  l_message_properties  dbms_aq.message_properties_t;
  l_message_id          raw(16);
  l_message             qtab_obj;
begin
  cpu_work(1000);
  for i in 1..p_cnt loop
    l_message := qtab_obj(i, 2*i);
    if p_sleep_second > 0 then dbms_lock.sleep(p_sleep_second); end if;
    dbms_aq.enqueue(queue_name            => 'QTAB_QUEUE',
                    enqueue_options       => l_enqueue_options,
                    message_properties    => l_message_properties,
                    payload               => l_message,
                    msgid                 => l_message_id);
    commit;
  end loop;
  cpu_work(1000);
end;
/
         
create or replace procedure deq(p_cnt number, p_wait_second binary_integer := 1) as
  l_dequeue_options     dbms_aq.dequeue_options_t;
  l_message_properties  dbms_aq.message_properties_t;
  l_message_id          raw(16);
  l_message             qtab_obj := qtab_obj(null, null);
begin
  l_dequeue_options.wait       := p_wait_second;
  cpu_work(1000);  
  for i in 1..p_cnt loop
   begin 
     dbms_aq.dequeue(queue_name            => 'QTAB_QUEUE',
                     dequeue_options       => l_dequeue_options,
                     message_properties    => l_message_properties,
                     payload               => l_message,
                     msgid                 => l_message_id);
     commit;
    exception when others then null;
   end;
  end loop;
  cpu_work(1000);
end;         
/         
create or replace procedure loop_job_enq(p_job_cnt number, p_sleep_second number := 0.01)
as
   l_job_id pls_integer;
begin
    for i in 1.. p_job_cnt loop
      dbms_job.submit(l_job_id, 'begin while true loop enq(100, '|| p_sleep_second ||'); end loop; end;');
    end loop;
    commit;
end;   
/
create or replace procedure loop_job_deq(p_job_cnt number, p_wait_second binary_integer := 1)
as
   l_job_id pls_integer;
begin
    for i in 1.. p_job_cnt loop
      dbms_job.submit(l_job_id, 'begin while true loop deq(100, '|| p_wait_second ||'); end loop; end;');
    end loop;
    commit;
end;   
/
create or replace procedure loop_job_cpu_burning (p_job_cnt number)
as
   l_job_id pls_integer;
begin
    for i in 1.. p_job_cnt loop
      dbms_job.submit(l_job_id, 'begin while true loop null; end loop; end;');
    end loop;
    commit;
end;   
/
create or replace procedure qstat as
  l_txt varchar2(100);
begin
  select 'QTAB_TOTAL= '||count(*)||', '||'READY='||count(decode(state, 0, 1))||', At: '||systimestamp
  into l_txt from QTAB;
  dbms_output.put_line(l_txt);

  select 'LOCKED_TOTAL='||locked_total||',   '||'PINNED_TOTAL='||pinned_total into l_txt
  from v$db_object_cache v where name in ('QTAB_OBJ');
  dbms_output.put_line(l_txt);
 
  -- 313buw98w5y1a  insert into "K"."QTAB"
  -- 3yxj9h80vc0jw  select  /*+ INDEX(TAB AQ$_QTAB_I) */ ... for update skip locked 
   
  for c in (select substr(sql_text, 1, 7) sql_text, executions, rows_processed
            from v$sql v where sql_id in ('313buw98w5y1a','3yxj9h80vc0jw') and executions > 0)
  loop
    dbms_output.put_line('SQL_TEXT='||c.sql_text||', '||'EXECUTIONS='||c.executions
                                    ||', '||'ROWS_PROCESSED='||c.rows_processed);
  end loop;
end;
/
create or replace procedure clean_job as
begin
  for c in (select d.job, d.sid, (select serial# from v$session where sid = d.sid) ser
            from dba_jobs_running d) loop
    begin
      execute immediate 'alter system kill session '''|| c.sid || ',' || c.ser  ||''' immediate';
      dbms_job.remove(c.job);
    exception when others then null;
    end;
    commit;
  end loop;
 
  for c in (select job from dba_jobs) loop
    begin
      dbms_job.remove(c.job);
    exception when others then null;
    end;
    commit;
  end loop;
end;
/

---------------------- latchstat---------------------- 
set echo on
drop type t_latch_rec_obj force;
create or replace type t_latch_rec_obj as object(
      snap                    number
     ,tim                     timestamp
     ,name                    varchar2(30)
     ,v1                      number
     ,v2                      number
     ,v3                      number
     ,v4                      number
     ,v5                      number
     ,v6                      number
     ,v7                      number
     ,v8                      number
     ,v9                      number
     ,v10                     number
    );
/
   
create or replace type t_lacth_rec_tab as table of t_latch_rec_obj;
/

create or replace procedure latchstat(cnt number, interval number, latch_num number, child_num number := null) as
 l_col_len_s       pls_integer := 8;
 l_col_len_l       pls_integer := 16;
  l_rec_tab         t_lacth_rec_tab := t_lacth_rec_tab();
  l_rec_tab_child   t_lacth_rec_tab := t_lacth_rec_tab();
  l_latch_name      varchar2(20);
  l_children_cnt    pls_integer;
  l_top_5_children  varchar2(100);
begin
 
 select max(name), count(*)
 into  l_latch_name, l_children_cnt
 from v$latch_children v where latch# in (latch_num); 
  
 select listagg(child# ||'('|| sleeps ||')', ' / ') within group (order by wait_time desc) child_sleeps
 into   l_top_5_children
 from (select * from v$latch_children v where latch# in (latch_num) order by wait_time desc)
 where rownum <= 5;

  dbms_output.put_line('  Latch: '                 || l_latch_name
                    || ', Children#: '             || l_children_cnt
                    || ', Top 5 Children(Sleeps): '|| l_top_5_children
                    || ', At: '                    || systimestamp);
                 
 dbms_output.put_line(null);                 
 
  dbms_output.put_line(lpad('P_gets', l_col_len_s)
                 ||lpad('misses', l_col_len_s)
                 ||lpad('sleeps', l_col_len_s)
                 ||lpad('spin_gets', l_col_len_l)
                 ||lpad('wait_time(ms)', l_col_len_l)
                 ||lpad('avg_wait_time', l_col_len_l)
                 ||lpad(' | ', l_col_len_s)
                 ||lpad('C_gets', l_col_len_s)
                 ||lpad('misses', l_col_len_s)
                 ||lpad('sleeps', l_col_len_s)
                 ||lpad('spin_gets', l_col_len_l)
                 ||lpad('wait_time(ms)', l_col_len_l)
                 ||lpad('avg_wait_time', l_col_len_l)
                 );
                              
  for i in 1..cnt loop
    select t_latch_rec_obj(i, systimestamp, l.name, l.gets, l.misses, l.sleeps, l.spin_gets, l.wait_time, k.gets, k.misses, k.sleeps, k.spin_gets, k.wait_time)
    bulk collect into l_rec_tab
    from v$latch l, v$latch_children k
    where l.latch#  = latch_num
     and l.latch#  = k.latch#(+)
      and child_num = k.child#(+);
   
    dbms_lock.sleep(interval);
   
    -- only select one Row --
    for c in
      (select l.name, v1.snap, v1.tim snap_start, systimestamp snap_end
             ,(l.gets - v1.v1) l_get_d, (l.misses - v1.v2) l_misses_d, (l.sleeps - v1.v3) l_sleeps_d
             ,(l.spin_gets - v1.v4) l_spin_gets_d, round((l.wait_time - v1.v5)/1000) l_wait_time_d
             ,(case when (l.gets - v1.v1) > 0 then round((l.wait_time - v1.v5)/(l.gets - v1.v1), 2)
                    else null
               end) l_avg_wait_time_d
             ,(k.gets - v1.v6) k_get_d, (k.misses - v1.v7) k_misses_d, (k.sleeps - v1.v8) k_sleeps_d
        ,(k.spin_gets - v1.v9) k_spin_gets_d, round((k.wait_time - v1.v10)/1000) k_wait_time_d
        ,(case when (k.gets - v1.v6) > 0 then round((k.wait_time - v1.v10)/(k.gets - v1.v6), 2)
               else null
          end) k_avg_wait_time_d
       from   table(l_rec_tab) v1, v$latch l, v$latch_children k
       where  l.latch#  = latch_num
         and  l.latch#  = k.latch#(+)
         and  child_num = k.child#(+)
      )
    loop
      -- Parent -- 
      dbms_output.put(  lpad(c.l_get_d, l_col_len_s)
                      ||lpad(c.l_misses_d, l_col_len_s)
                      ||lpad(c.l_sleeps_d, l_col_len_s)
                      ||lpad(c.l_spin_gets_d, l_col_len_l)
                      ||lpad(c.l_wait_time_d, l_col_len_l)
                      ||lpad(c.l_avg_wait_time_d, l_col_len_l));
         
      -- Child --  
   dbms_output.put(  lpad(' | ', l_col_len_s)
                   ||lpad(c.k_get_d, l_col_len_s)
                   ||lpad(c.k_misses_d, l_col_len_s)
                   ||lpad(c.k_sleeps_d, l_col_len_s)
                   ||lpad(c.k_spin_gets_d, l_col_len_l)
                   ||lpad(c.k_wait_time_d, l_col_len_l)
                   ||lpad(c.k_avg_wait_time_d, l_col_len_l));
         
    dbms_output.put_line(null);       
    end loop;
   
  end loop;
end;
/


/*
 ----- Usage latchstat(cnt, interval, latch_num, child_num) ------
 ----- Example (latch_num=280: "row cache objects", child_num=8: "dc_users" ) ------

  exec latchstat(4, 1, 280);
  exec latchstat(4, 1, 280, 8);
*/

set serveroutput on
set echo off