Tuesday, November 5, 2013

ORA-04030 incident file and alert.log

In the last two Blogs, we talked about how to measure PL/SQL package PGA memory usage and different ways to reclaim unused memory:
      dbms_session package_memory_utilization
      Oracle PL/SQL Collection Memory Reclaim

In this Blog we will discuss its diagnostics.

ORA-04030 incident file


ORA-04030 is threw when PGA memory allocation is over certain limit.

16 GB in Rel 11.2.0.3.0


Quite often ORA-04030 is threw when PGA consumption reaches 16 GB, and the generated incident file contains following text:

   Dump of Real-Free Memory Allocator Heap [0x1108c1090]
   mag=0xfefe0001 flg=0x5000003 fds=0x0 blksz=65536
   blkdstbl=0x1108c10a0, iniblk=252928 maxblk=262144 numsegs=255
   In-use num=252073 siz=3639541760, Freeable num=0 siz=0, Free num=0 siz=0


The 16GB limit seems deduced by:
      262144(maxblk) * 65536(blksz) = 16GB
where blksz is controlled by DB hidden parameters (MOS [1506315.1], [1328489.1]):
  _use_realfree_heap=TRUE
  _realfree_heap_pagesize_hint = 65536


and both can be queried by:

col name format a30
col cur_val format a20


select i.ksppinm name, v.ksppstvl cur_val, v.ksppstdf default_val, v.ksppstvf
from x$ksppi i, x$ksppcv v
where i.indx = v.indx
  and i.ksppinm in ('_realfree_heap_pagesize_hint', '_use_realfree_heap');


For blksz=65536, in Blog: PGA, SGA memory usage watching from UNIX, we have also observed that majority of page (segment) is allocated in a unit of 64KB on both AIX and Solaris for high PGA sessions.

In the above example, the real allocated memory is:
     252073 (In-use num) * 65536(blksz) = 16'519'856'128 Bytes
but only siz=3,639,541,760 is reported. This is probably due to the 32 Bit Integer Overflow (maximum 4GB, as observed in Blog: dbms_session package_memory_utilization) in the process dumping.

Adding 3*4GB Overflow, the effective memory allocated should be:
   3*(4*1024*1024*1024) + 3639541760 = 16,524,443,648 (16GB)

16GB is upper limit. Sometimes a session exits by ORA-04030 with only 11GB memory as follows:
   Dump of Real-Free Memory Allocator Heap [0x1008b1080]
   mag=0xbefe0001 flg=0x5000003 fds=0x6 blksz=65536
   blkdstbl=0xffffffff7d700010, iniblk=183298 maxblk=262144 numsegs=226
   In-use num=178538 siz=3115057154, Freeable num=0 siz=0, Free num=1 siz=54726560     

  
   178538*65536 = 11,700,666,368 (11GB)
   2*(4*1024*1024*1024) + 3115057154 = 11,704,991,746 (11GB)


In such case, it could probably indicate memory capped on the UNIX layer or virtual space limitation.

Usually ORA-04030 only triggers single session exit. Occasionally ORA-04030 can cause entire DB crash when PSP (PSP0) background process is affected. This is probably due to either certain threshold on the number of ORA-04030 errors, or memory shortage which PSP and other Oracle mandatory background processes ask for.

32 GB in Rel 11.2.0.4.0


It seems that Oracle increased the limit to 32 GB in Rel 11.2.0.4.0, as ORA-04030 dumps showed:
  Dump of Real-Free Memory Allocator Heap [0xfffffd7ffdbad000]
  mag=0xfefe0001 flg=0x5000003 fds=0x6 blksz=65536
  blkdstbl=0xfffffd7ffdbad010, iniblk=523264 maxblk=524288 numsegs=321
  In-use num=522322 siz=4176347136, Freeable num=0 siz=0, Free num=0 siz=0

where maxblk is doubled from 262144 to 524288.

4 GB and max_map_count


However if "In-use num" in the incident file shows a number far less than maxblk, it could be due to some UNIX settings. For example, Linux vm.max_map_count, limits the VM memory map table for the process.

If setting vm.max_map_count=65536, the session throws ORA-04030 when approaching to 4GB (max_map_count * blksz), and "In-use num" (65007) shows a number close to 65536 in the generated incident file:

    Dump of Real-Free Memory Allocator Heap [0x7fcf69a3c000]
    mag=0xfefe0001 flg=0x5000003 fds=0x7 blksz=65536
    blkdstbl=0x7fcf69a3c010, iniblk=69632 maxblk=262144 numsegs=115
    In-use num=65007 siz=4264296448, Freeable num=0 siz=0, Free num=1 siz=46268416


In such case, "Process Map Dump" Section in incident dump file contains about 65007 (close to 65536) entries like:
         7f7cd86c3000-7f7cd86d3000 rw-p 00000000 00:05 1030     /dev/zero

which represents a data (bss) memory area (struct vm_area_struct) of 65536 Bytes (= 140174183706624 - 140174183641088 in decimal).
It is the information contained in /proc/[pid]/maps. pmap formats this information in a bit more readable manner.


alert.log


When DB crashed, one can get a quick evidence from alert.log, which contains something like:
    PMON (ospid: 21951): terminating the instance due to error 490
  Instance terminated by PMON, pid = 23951

      (ORA-00490 means: PSP process terminated with error, and its Cause: The process spawner died)
or

  DBW0 (ospid: 1310): terminating the instance due to error 4030
  Instance terminated by DBW0, pid = 1510


whereas a normal instance shutdown is marked by a message like:
    Instance terminated by USER, pid = 1407
in alert.log.


When OS system is under pressure of space allocation, we can also see some warning in alert.log and related trace files.

alert.log


  Errors in file /orabin/app/oracle/admin/testdb/diag/rdbms/testdb/testdb/trace/testdb_psp0_28835930.trc:
 ORA-27300: OS system dependent operation:fork failed with status: 12
 ORA-27301: OS failure message: Not enough space
 ORA-27302: failure occurred at: skgpspawn3
 Process W000 died, see its trace file
 Process startup failed, error stack:


testdb_psp0_28835930.trc


 Process startup failed, error stack:
 ORA-27300: OS system dependent operation:fork failed with status: 12
 ORA-27301: OS failure message: Not enough space
 ORA-27302: failure occurred at: skgpspawn3


testdb_w000_30212136.trc


  MMNL absent for 78398 secs; Foregrounds taking over