Tuesday, June 4, 2013

dbms_alert lock hash collision

dbms_alert package calls dbms_lock to synchronize alert senders and receivers based on lockid generated by the name in dbms_alert_info table by a function call:

   dbms_utility.get_hash_value(name, 2000000000, 2048)

which can have a high hash collision probability, and hence occasionally false UL lock contention.

For example, both 'warning_1828' and 'alarm_1828' are hashed to the same value: 2000000389.

 select dbms_utility.get_hash_value(upper('warning_1828'), 2000000000, 2048) lockid_1,
        dbms_utility.get_hash_value(upper('alarm_1828'),   2000000000, 2048) lockid_2 
 from dual;



Here is a testcase.

At first create a helper function:

 create or replace procedure dbms_alert_lock_collision(p_name varchar2) as
   l_message varchar2(1800);
   l_status  pls_integer;
   l_name    varchar2(30);
 begin
   dbms_alert.register(p_name);
   dbms_alert.waitany(l_name, l_message, l_status);
   dbms_output.put_line('Received Name='||l_name||', Message='||l_message||', Status:='||to_char(l_status));
 end;


then open 4 Sqlplus sessions, perform the test steps as follows:

(1). Session-1 at T1
         exec dbms_alert_lock_collision('warning_1828');

(2). Session-2  at T2
         exec dbms_alert_lock_collision('alarm_1828');

(3). Session-3 at T3
         exec dbms_alert.signal('warning_1828', 'warning_1828_msg');

 Firing signal wakes up the waiter session, which changes its state from "pipe get" (wait_class Idle)
 to "enq: UL - contention" (wait_class Application).

(4). Session-4 at T4
         exec dbms_alert.signal('alarm_1828', 'alarm_1828_msg');

(5). Session-3 at T5, wait Session-4 returns (this can take up to 32 seconds), then:
          commit;

 even Session-3 committed, Session-1 is still blocked by 'UL'lock. You can see it by:

  select * from v$lock where type = 'UL';

  select * from v$wait_chains;


(6). Session-4 at T6

         commit;

 Session-4's commit de-locks both Session-1 and Session-2.


If Session-3 at T5 commits before Session-4 at T4 returns, the commit of Session-3 at T5 can probably de-lock Session-1. However the behavior is dependent on the UL lock time sequence.

When signal is sent by the signalling session, but before commit, dbms_alert.waitany requests a UL lock with a timeout: 1, 2, 4, 8, 16, 32 seconds for the first 6 times, and afterwards always with timeout: 32 seconds.

If Session-4 at T4 waits less than 32 seconds, it can be blocked by Session-1 at T1. In this case, Session-3 holds S_MODE(4) lock and blocks Session-1's SX_MODE(3) lock Request, and Session-1's SX_MODE(3) lock Request blocks Session-4's S_MODE(4) lock Request (Session-4 is queued after Session-1). When Session-1 renews its lock request after maximum 32 seconds, Session-4 got its S_MODE(4) lock. And then both Session-3 and Session-4 hold S_MODE(4) lock, whereas both Session-1 and Session-2 are waiting SX_MODE(3) lock. Therefore when Session-3 at T5 commits (and hence release its S_MODE(4) lock), Session-4 is still blocking Session-1 and Session-2.

Following query can help to detect the hash collision names:

  select lockid, count(*) cnt, listagg(name, '; ') within group (order by name)
    from (
      select name, dbms_utility.get_hash_value(upper(name), 2000000000, 2048) lockid
        from (select distinct name from sys.dbms_alert_info) v
    )
  group by lockid
  having count(*) > 1
  order by cnt desc; 


If there are more than 100 waiting sessions like Session-1 and Session-2, the system can be disturbed by the heavy UNDO activities. Here the query to monitor them:

 select n.name, s.sid, s.value
   from v$statname n, v$sesstat s
  where s.statistic# = n.statistic#
    and name in ('data blocks consistent reads - undo records applied'
                ,'rollback changes - undo records applied')


By the way, in 14 May 2013, a first rank result on twin prime conjecture was reported: "Bounded gaps between primes".