Friday, February 3, 2017

Oracle JVM Java OutOfMemoryError and lazy GC

Java Stored Procedure running in Oracle RDBMS embedded JVM can throw OutOfMemoryError in connection with lazy GC.
All tests are done in Oracle 12.1.0.2.0.


1. Test Setup



create or replace java source named "OracleJavaOOM512" as
public class OracleJavaOOM512 {
  // final static byte[] SB = new byte[1024*1024*500];  // workaround
  public static void createBuffer(int bufferSize) {
    // OracleRuntime.setMaxRunspaceSize(1024*1024*1024) // set 1GB, no help
    // System.gc();
    byte[] SB = new byte[bufferSize];
  }
}
/

alter JAVA CLASS "OracleJavaOOM512" COMPILE;

create or replace procedure createBuffer512(p_buffer_size in number) is language java
name 'OracleJavaOOM512.createBuffer(int)';
/

create or replace procedure createBuffer512_loop 
  (p_cnt number, p_start number, p_delta number, p_sleep_seconds number := 0) as
  l_size number := p_start;
begin
  for i in 1..p_cnt loop
    dbms_lock.sleep(p_sleep_seconds);
    dbms_output.put_line('Step -- '||i||' --, Buffer Size (MB) = '||l_size);
    createBuffer512(1024*1024*l_size);
    l_size := l_size + p_delta;
  end loop;
end;
/


2. Tests


Run first test to allocate 511MB:

Sqlplus > exec createBuffer512(1024*1024*511);
    PL/SQL procedure successfully completed

succeeded without Error.

Run second test for 512MB:

Sqlplus > exec createBuffer512(1024*1024*512);
    ORA-29532: Java call terminated by uncaught Java exception: java.lang.OutOfMemoryError
    ORA-06512: at "S.CREATEBUFFER512", line 1

It hits the general Java error: ORA-29532 java.lang.OutOfMemoryError, which probably indicates that the JVM is limited by 512MB for one single object instance, in this test, it is "new byte[bufferSize]".

Calling OracleRuntime.setMaxRunspaceSize(1024*1024*1024) to set MaxRunspaceSize to 1GB does not help.

If interested, a 29532 event trace can also be made:

Sqlplus > alter session set max_dump_file_size = UNLIMITED;
Sqlplus > alter session set tracefile_identifier='OOM512_29532'; 
Sqlplus > alter session set events '29532 trace name errorstack level 3'; 
Sqlplus > exec createBuffer512(1024*1024*512);
    ORA-29532: Java call terminated by uncaught Java exception: java.lang.OutOfMemoryError

Open a new Oracle session, run third test case, which gradually allocates memory from 1MB to 511MB, each time increases 1MB per call.

Sqlplus > exec createBuffer512_loop(511, 1, 1);

    Step -- 1 --, Buffer Size (MB) = 1
    Step -- 2 --, Buffer Size (MB) = 2
    ...
    Step -- 264 --, Buffer Size (MB) = 264
    Step -- 265 --, Buffer Size (MB) = 265
    BEGIN createBuffer512_loop(511, 1, 1); END;
    
    *
    ERROR at line 1:
    ORA-29532: Java call terminated by uncaught Java exception: java.lang.OutOfMemoryError
    ORA-04030: out of process memory when trying to allocate 282220624 bytes (joxu pga heap,f:OldSpace)
    ORA-06512: at "S.CREATEBUFFER512", line 1
    ORA-06512: at "S.CREATEBUFFER512_LOOP", line 8
    ORA-06512: at line 1

It throws OutOfMemoryError when allocating 266MB (the number can be varied, but before reaching 511), and generates trace file and incident file. Note that in this test, besides the above general Java error: ORA-29532, it throws also ORA-04030.


3. Oracle JVM Garbage Collection


Oracle 12c Document: Database Java Developer's Guide Chapter 1 Introduction to Java in Oracle Database, Section Memory Spaces Management depicts a 3-Layer Java memory Management:
  1.  New Space in Call space
  2.  Old Space in Call space
  3.  Session Memory in Session space
and 3 corresponding garbage collection algorithms:
  1.  Generational scavenging for short-lived objects
  2.  Mark and lazy sweep collection for objects that exist for the life of a single call
  3.  Copying collector for long-lived objects, that is, objects that live across calls within a session
It also mentions that Oracle JVM Garbage collection uses Oracle Database scheduling facilities.

Book: Oracle Database Programming using Java and Web Services (Kuassi Mensah) Section 2.2 Java Memory Management (Page 41-51) reveals internal GC mechanisms (probably written for Oracle 10g).

Garbage Collection Techniques
  The OracleJVM memory manager uses a set of GC techniques for its various memory structures (listed in Table 2.1), including the generational GC, mark-sweep GC, and copy GC.

These 3 GC techniques in Oracle 10g look similar to GC algorithms in 12c.

Mark-sweep GC
  A mark-sweep GC consists of two phases: (1) the mark phase (garbage detection) and the (2) sweep phase (Garbage Collection). The sweep phase places the “garbaged” memory on the freelists. The allocation is serviced out of the freelists. Lazy sweeping is a variation technique that marks objects normally, but, in the sweep phase, leaves it up to the allocation phase to determine whether the marked objects are live (no need to reclaim) or not (can be reclaimed), rather than sweeping the object memory at that time.

Java Memory Areas
  Oldspace is an object memory used for holding long-lived or large objects (i.e., larger than 1-K Bytes) for the duration of a call. It is cleaned up using Marksweep GC (described earlier) for large objects; it uses a variation of “lazy sweeping” for small objects.

Looking up Table 2.1 "Summary of OracleJVM Memory Structure" (Page 50), since our test allocation is more than 1-K Bytes, it should be allocated into Old-space using Buddy memory allocation, and garbage collected by Mark-Sweep.


4. Reasoning


The error message in Sqlplus Window indicates that memory problem is caused by:
    joxu pga heap,f:OldSpace
which also appears in trace file.

An excerpt of incident file looks like:

=======================================
TOP 10 MEMORY USES FOR THIS PROCESS
---------------------------------------
84%   91 MB,  17 chunks: "f:OldSpace                "  JAVA
         joxu pga heap   ds=fffffd7ffbe8bcc8  dsprt=fffffd7ffc0a9078
 2% 2432 KB,  92 chunks: "free memory               "  
         pga heap        ds=fffffd7ffc345640  dsprt=0
 2% 1767 KB, 437 chunks: "free memory               "  
         session heap    ds=fffffd7ffc02d728  dsprt=fffffd7ffc358350
 1% 1064 KB,   2 chunks: "permanent memory          "  JAVA
         joxu pga heap   ds=fffffd7ffbe8bcc8  dsprt=fffffd7ffc0a9078
...
=======================================
PRIVATE MEMORY SUMMARY FOR THIS PROCESS
---------------------------------------
******************************************************
PRIVATE HEAP SUMMARY DUMP
108 MB total:
   104 MB commented, 519 KB permanent
  4125 KB free (0 KB in empty extents),
      97 MB,   1 heap:    "joxp heap      "            2315 KB free held
    8894 KB,   1 heap:    "session heap   "            679 KB free held
------------------------------------------------------
Summary of subheaps at depth 1
104 MB total:
   101 MB commented, 952 KB permanent
  1814 KB free (25 KB in empty extents),
      94 MB,   1 heap:    "joxu pga heap  "           
------------------------------------------------------
Summary of subheaps at depth 2
99 MB total:
    97 MB commented, 1408 KB permanent
   272 KB free (0 KB in empty extents),
      91 MB,  17 chunks:  "f:OldSpace                "

=========================================
REAL-FREE ALLOCATOR DUMP FOR THIS PROCESS
-----------------------------------------
 
Dump of Real-Free Memory Allocator Heap [0xfffffd7ffbfed000]
mag=0xfefe0001 flg=0x5000003 fds=0x0 blksz=65536
blkdstbl=0xfffffd7ffbfed010, iniblk=523264 maxblk=524288 numsegs=255
In-use num=133 siz=112984064, Freeable num=224 siz=2914910208, Free num=15 siz=21889024

...
-------------------------
Top 10 processes:
-------------------------
(percentage is of 27 GB total allocated memory)
99% pid 66: 105 MB used of 27 GB allocated (27 GB freeable) <= CURRENT PROC
 0% pid 11: 7792 KB used of 8294 KB allocated 
...
======================================================
ESTIMATED MEMORY USES FOR ALL PROCESSES
------------------------------------------------------
(from 1 snapshot out of 67 processes)
99%     27 GB,   66 processes: "unsnapshotted               "   
 1%    172 MB,    1 process  : "free memory                 "   
           pga heap             95 chunks
 0%     93 MB,    1 process  : "f:OldSpace                  "  JAVA
...

******************* Dumping process map ****************
    Start addr     -      End addr       Size    PgSz       Shmid    Perms  Object name
------------------ -  ----------------- -------  ----     ---------- ------------------------------
...
        0x80000000 -        0x1af000000 4964352K    4K     0x40000039 rwxs-  [ anon ] osm 
...
0xfffffd792326d000 - 0xfffffd79333bd000 263488K    4K     0xffffffff rw---  [ anon ] 
0xfffffd793355d000 - 0xfffffd794365d000 263168K    4K     0xffffffff rw---  [ anon ] 
0xfffffd794374d000 - 0xfffffd795374d000 262144K    4K     0xffffffff rw---  [ anon ] 
0xfffffd795377d000 - 0xfffffd796293d000 247552K    4K     0xffffffff rw---  [ anon ] 
0xfffffd7962aad000 - 0xfffffd7971b6d000 246528K    4K     0xffffffff rw---  [ anon ]
...

There are about 30GB segments marked as:
   rw--- [ anon ]
which are session's PGA.

Above output shows:
   105 MB used of 27 GB allocated (27 GB freeable) <= CURRENT PROC
among 27 GB allocated, almost all of which is freeable.

However due to Oracle JVM lazy GC, they are not immediately released, and eventually leads to:
   ORA-04030: out of process memory when trying to allocate 283285584 bytes (joxu pga heap,f:OldSpace)

(By the way, biggest chunk 4964352K is SGA, created with Optimized Shared Memory (OSM) available since Oracle 12c when sga_max_size is not configured, or is set greater than the computed value.
See ISM, DISM in Oracle 11g Blog: PGA, SGA memory usage watching from UNIX)

Concerning the freeable memory, V$PROCESS_MEMORY documentation said:

    For the "Freeable" category, Column ALLOCATED is the amount of free PGA memory eligible to be released to the operating system.

Note that OutOfMemoryError process still holds the "Freeable" memory and does not immediately put them back to freelists since the process is still alive. It takes time for GC to reclaim them gradually. If the process (Oracle session) terminated (disconnected), it is immediately reusable by other processes.

You can query V$PROCESS_MEMORY or V$PROCESS to monitor it.

Crosschecking with Solaris pmap output:

oracle@test$ pmap -x 1122
1122:  testdb (LOCAL=NO)
         Address     Kbytes        RSS       Anon     Locked Mode   Mapped File
...         
0000000080000000    4964352    4964352          -    4964352 rwxs-    [ osm shmid=0x40000039 ]
...
FFFFFD785FC1D000     306816     306812     306812          - rw---    [ anon ]
FFFFFD787296D000     305792     305788     305788          - rw---    [ anon ]
FFFFFD78854BD000     297472     297452     297452          - rw---    [ anon ]
FFFFFD7897F1D000     296448     296412     296412          - rw---    [ anon ]
FFFFFD78AA76D000     295424     295420     295420          - rw---    [ anon ]
FFFFFD78BCECD000     294336     294332     294332          - rw---    [ anon ]
...
---------------- ---------- ---------- ---------- ----------
        total Kb   31990020   31933340   26615772    4982784

(you can even use Solaris MDB Formatting Dcmds to display content at one above virtual address space)

Incident file and pmap output show that the majority of memory are allocated as "anon" pages, but not removed from the mappings, which means that there are more mmap than munmap.

We can see this un-matching mmap and munmap with:

oracle@test$ truss -cp -t mmap,munmap 1122

syscall               seconds   calls  errors
mmap                     .056    1014
munmap                  6.447      68

-- Linux trace mmap and munmap: strace -e trace=memory -p pid

The output shows that munmap is 1716 (=(6.447/68)/(0.056/1014)) times expensive. That is probably one reason why the GC strategy is lazy delayed.

or watch the details by:

oracle@test$ truss -p -t mmap,munmap 1122  

...
mmap(0xFFFFFD7C6E8ED000, 202375168, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANON, 4294967295, 0) = 0xFFFFFD7C6E8ED000
mmap(0xFFFFFD7C625ED000, 203423744, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANON, 4294967295, 0) = 0xFFFFFD7C625ED000
mmap(0xFFFFFD7C561FD000, 204537856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANON, 4294967295, 0) = 0xFFFFFD7C561FD000
...
mmap(0xFFFFFD7A43ACD000, 243924992, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANON, 4294967295, 0) = 0xFFFFFD7A43ACD000
mmap(0xFFFFFD7A34F7D000, 244973568, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANON, 4294967295, 0) = 0xFFFFFD7A34F7D000
mmap(0xFFFFFD7A2632D000, 246022144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANON, 4294967295, 0) = 0xFFFFFD7A2632D000
mmap(0xFFFFFD7A175DD000, 247136256, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANON, 4294967295, 0) = 0xFFFFFD7A175DD000
munmap(0xFFFFFD7A2632D000, 246022144)           = 0
munmap(0xFFFFFD7A34F7D000, 244973568)           = 0
munmap(0xFFFFFD7A43ACD000, 243924992)           = 0
...

It shows that not all mapped chunks are un-mapped, and unmapping occurs in reverse order (last mapping is first unmapped).

With Solaris dtrace, we can count number of mmap and munmap, as well as sum of their amount (in MB). The large difference between mmap and munmap is probably pointing to the unreleased "Freeable" memory.

oracle@test$ dtrace -n 'syscall::mmap:entry,syscall::munmap:entry/pid==1122 && arg0 != 0x0/
{
 @CNT[probefunc] = count();
 @SUM[probefunc] = sum(arg1/1024/1024);
}
END
{
 printf("\n%10s %10s  %10s\n",  "NAME", "COUNT", "SUM(MB)");
 printa(  "%10s %10@d %10@d\n",         @CNT,   @SUM);
}'

------------ output -------------
      NAME      COUNT     SUM(MB)
    munmap         44       3987
      mmap        404      32400

There are 360 (404-44) more mmap, 32400 MB allocated, but only 3987 MB de-allocated. which results in 28413 (32400-3987) MB unreleased allocation.

With Section 7 appended Dtrace script, we can trace the details of each mmap and munmap. For the OutOfMemoryError execution:

Sqlplus > exec createBuffer512_loop(511, 1, 1);

    Step -- 1 --, Buffer Size (MB) = 1
    Step -- 2 --, Buffer Size (MB) = 2
    ...
    Step -- 262 --, Buffer Size (MB) = 262
    Step -- 263 --, Buffer Size (MB) = 263
    Step -- 264 --, Buffer Size (MB) = 264
    Step -- 265 --, Buffer Size (MB) = 265
    BEGIN createBuffer512_loop(511, 1, 1); END;
    
    *
    ERROR at line 1:
    ORA-29532: Java call terminated by uncaught Java exception: java.lang.OutOfMemoryError
    ORA-04030: out of process memory when trying to allocate 282220624 bytes (joxu pga heap,f:OldSpace)
    ORA-06512: at "S.CREATEBUFFER512", line 1
    ORA-06512: at "S.CREATEBUFFER512_LOOP", line 8
    ORA-06512: at line 1
Dtrace output looks like:
    
  => mmap  2017 Feb  5 14:16:01: mmap(addr=0xFFFF80FFBA5CF000, len=1114112, prot=0x3, flags=0x112, fildes=4294967295, off=0)
  <= mmap  2017 Feb  5 14:16:01: 0xFFFF80FFBA5CF000
**++** increase =   1114112(  1(MB)), mmap# = 1, mmap_MB = 1, munmap# = 0, munmap_MB = 0, sum = 1

  => mmap  2017 Feb  5 14:16:01: mmap(addr=0xFFFF80FFBA1AF000, len=1114112, prot=0x3, flags=0x112, fildes=4294967295, off=0)
  <= mmap  2017 Feb  5 14:16:01: 0xFFFF80FFBA1AF000
**++** increase =   1114112(  1(MB)), mmap# = 2, mmap_MB = 2, munmap# = 0, munmap_MB = 0, sum = 2

...

  => mmap  2017 Feb  5 14:19:43: mmap(addr=0xFFFF80FAA317F000, len=253493248, prot=0x3, flags=0x112, fildes=4294967295, off=0)
  <= mmap  2017 Feb  5 14:19:43: 0xFFFF80FAA317F000
**++** increase = 253493248(241(MB)), mmap# = 359, mmap_MB = 28726, munmap# = 44, munmap_MB = 8671, sum = 20055

  => munmap 2017 Feb  5 14:19:54: munmap(addr=0xFFFF80FAB24AF000, len=252444672)
  <= munmap 2017 Feb  5 14:19:54: 0.
**--** decrease = 252444672(240(MB)), mmap# = 359, mmap_MB = 28726, munmap# = 45, munmap_MB = 8911, sum = 19815

...

  => mmap  2017 Feb  5 14:20:04: mmap(addr=0xFFFF80FAE901F000, len=276955136, prot=0x3, flags=0x112, fildes=4294967295, off=0)
  <= mmap  2017 Feb  5 14:20:04: 0xFFFF80FAE901F000
**++** increase = 276955136(264(MB)), mmap# = 391, mmap_MB = 31308, munmap# = 58, munmap_MB = 11940, sum = 19368

  => mmap  2017 Feb  5 14:20:04: mmap(addr=0xFFFF80FAD864F000, len=278003712, prot=0x3, flags=0x112, fildes=4294967295, off=0)
  <= mmap  2017 Feb  5 14:20:04: 0xFFFF80FAD864F000
**++** increase = 278003712(265(MB)), mmap# = 393, mmap_MB = 31573, munmap# = 58, munmap_MB = 11940, sum = 19633

  => mmap  2017 Feb  5 14:20:06: mmap(addr=0xFFFF80FAB6FAF000, len=280100864, prot=0x3, flags=0x112, fildes=4294967295, off=0)
  <= mmap  2017 Feb  5 14:20:06: 0xFFFF80FAB6FAF000
**++** increase = 280100864(267(MB)), mmap# = 397, mmap_MB = 32106, munmap# = 58, munmap_MB = 11940, sum = 20166

  => mmap  2017 Feb  5 14:20:09: mmap(addr=0xFFFF80FA924DF000, len=281214976, prot=0x3, flags=0x112, fildes=4294967295, off=0)
  <= mmap  2017 Feb  5 14:20:09: 0xFFFF80FA924DF000
**++** increase = 281214976(268(MB)), mmap# = 399, mmap_MB = 32374, munmap# = 58, munmap_MB = 11940, sum = 20434

...

  :END
      NAME      COUNT     SUM(MB)
    munmap         93      15793
      mmap        405      32375
The error message indicates that OutOfMemoryError is raised when trying to allocate 282,220,624 bytes (joxu pga heap,f:OldSpace), where 282,220,624 bytes is about 269 MB. If we look the last line of Dtrace output, the last successful allocation is 268 MB, the next allocation of one more MB (p_delta of createBuffer512_loop call is 1 MB) hits the error.

Above createBuffer512_loop(511, 1, 1) is a loop of allocation with each step increasing 1 MB. It seems that each step requires a new memory chunk to be allocated since the previous step allocated memory is 1 MB smaller, which can not be reused and hence assigned to Category "Freeable". If such steps continue, in step 256, it will reach 32 GB (256*(1+256)/2=32,768 MB). But as shown above, RDBMS also makes un-allocation (munmap) to release certain allocated memory, that probably why it results in error in Step 265.

One worst speculation could also be that RDBMS only sums up allocated memory (ignore un-allocated), when reaching limit of 32 GB, it raises OutOfMemoryError.

The last line in Dtrace output shows: sum = 20,434 MB, which computed by 32,374 (mmap_MB) - 11,940 (munmap_MB). It corresponds to the UNIX command output:

  top.RES, prstat.RSS, "ps aux".RSS, "pmap -x".Anon total
In most tests, it is about 15% more than v$process_memory Category "Freeable" ("Freeable" is almost total of session PGA memory), for example:

select round(allocated/1e6) allocated_mb, v.* from v$process_memory v
where pid in (22) 
order by allocated desc;

  ALLOCATED_MB PID  SERIAL#  CATEGORY       ALLOCATED       Used  MAX_ALLOCATED
  ------------ ---- -------  --------  --------------  ---------  -------------
        17,180   22      53  Freeable  17,179,869,184          0  
           398   22      53  Other        397,914,636             3,533,074,452
             4   22      53  JAVA           4,167,792  4,160,648    548,833,672
             3   22      53  PL/SQL         2,622,904  2,533,008      2,924,192
             0   22      53  SQL              224,248     17,432      7,021,552
Note that v$process_memory.MAX_ALLOCATED is not filled (missed) for Category "Freeable", we will give a further look of this field in its historical view: dba_hist_process_mem_summary.

Probably We can say that this OutOfMemoryError is indeed a PGA Freeable memory problem, instead of OraleJVM Java.

For mmap prot / flags arguments, we can find the mapping in truss command output and sys/mman.h.
   
---------- truss output and mmap prot/flags  ----------
mmap(0xFFFF80FFA95D2000, 23461888, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANON, -1, 0) = 0xFFFF80FFA95D2000

  prot=0x3    (PROT_READ|PROT_WRITE)
  flags=0x112 (MAP_PRIVATE|MAP_FIXED|MAP_ANON)

mmap(0xFFFF80FFB99E2000, 2818048, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_NORESERVE|MAP_ANON, -1, 12460032) = 0xFFFF80FFB99E2000

  prot=0x0    (PROT_NONE)
  flags=0x152 (MAP_PRIVATE|MAP_FIXED|MAP_NORESERVE|MAP_ANON)
  
---------- /usr/include/sys/mman.h ----------
#define PROT_READ       0x1             /* pages can be read */
#define PROT_WRITE      0x2             /* pages can be written */
#define PROT_EXEC       0x4             /* pages can be executed */

#define PROT_NONE       0x0             /* pages cannot be accessed */

/* sharing types:  must choose either SHARED or PRIVATE */
#define MAP_SHARED      1               /* share changes */
#define MAP_PRIVATE     2               /* changes are private */
#define MAP_TYPE        0xf             /* mask for share type */

/* other flags to mmap (or-ed in to MAP_SHARED or MAP_PRIVATE) */
#define MAP_FIXED       0x10            /* user assigns address */
#define MAP_NORESERVE   0x40            /* don't reserve needed swap area */
#define MAP_ANON        0x100           /* map anonymous pages directly */
#define MAP_ANONYMOUS   MAP_ANON        /* (source compatibility) */
Other dtrace commands can help us understand largeobj (oldSpace) allocation and GC activity.

oracle@test$ dtrace -n 'pid$target::eoa_new_largeobj:entry {@[pid, ustack(40, 0)] = count(); }' -p 1122

oracle@test$ dtrace -n 'pid$target::eoa_oldspace_gc_scan_xt:entry {@[pid, ustack(40, 0)] = count(); }' -p 1122

Two tests below have no problem since the first step already allocated the max needed memory, which is sufficient for the usage of all later steps (no more need to any allocation):

Sqlplus > exec createBuffer512_loop(511, 511, -1);

Sqlplus > exec createBuffer512_loop(511, 511, 0);

Both above tests propose us a quick workaround by introducing a class (static) variable and setting it to a enough big value as follows:
    final static byte[] SB = new byte[1024*1024*500];
Note that java class variable in Java Stored Procedure is stateful object, which survives across Java calls in the same RDBMS session (like PL/SQL Package variable), and can be reset by dbms_java.endsession (similar to dbms_session.reset_package), which clears any Java session state remaining from previous execution of Java in the current RDBMS session (introduced in 11g release 1, see Java Developer's Guide - 4.5 Two-Tier Duration for Java Session State ). Therefore, if JAVA VM memory is under pressure, dbms_java.endsession can be called after certain threshold to release the memory, a trade-off between Memory (JavaGC) and Performance.

Oracle Plsql starts OracleJVM by the call stack:

  kkxmjexe
  jox_invoke_java_
  joe_run_vm
JVM initializes Java session, loads classes, and finally clears session with:

  joevm_init_session_imcache       -- imcache in-memory cache
  joevm_in_clinit                  -- multi calls, one call for one class init
  joevm_clear_session_imcache
If we call dbms_java.endsession after the excution of java stored procedure (createBuffer512 in our example, see later test code), Java session state is cleaned (hence Java memory freed). The next excution of createBuffer512 makes kkxmjexe to re-execute above three steps (initialize, load, clear). hence performance sacrifice. It should be called in a controlled manner, see later example: procedure createBuffer512_loop calling dbms_java.endsession.

Here the call stacks of init_session and clear_session, which are re-run if Java session state is reset by dbms_java.endsession:

---------- joevm_init_session_imcache Call Stack ----------
#0  0x000000000585d5e0 in joevm_init_session_imcache ()
#1  0x00000000056ea471 in eoa_startup_default_objmems ()
#2  0x00000000056e9dd9 in ioesub_init_call ()
#3  0x00000000057304d2 in seoa_note_stack_outside ()
#4  0x00000000056e9ae1 in ioe_init_call ()
#5  0x0000000004d7ce51 in jox03_init_call ()
#6  0x0000000004d02f67 in jox_invoke_java_ ()
#7  0x0000000009c6db05 in kkxmjexe ()
#8  0x0000000004bed088 in kgmexcb ()
#9  0x000000000275a64b in kkxmswu ()
#10 0x0000000004beabc3 in kgmexwi ()
#11 0x0000000004be997a in kgmexec ()
#12 0x0000000011889684 in pefjavacal ()
#13 0x0000000005628dd1 in pefcal ()
#14 0x00000000054cda7b in pevm_FCAL ()
#15 0x00000000054aeb4e in pfrinstr_FCAL ()
#16 0x00000000129e612c in pfrrun_no_tool ()
#17 0x00000000129e4a96 in pfrrun ()
#18 0x00000000129edac0 in plsql_run ()

---------- joevm_clear_session_imcache Call Stack ----------
#0  0x0000000005859340 in joevm_clear_session_imcache ()
#1  0x00000000056e8209 in eoa_vm_call_cleanup ()
#2  0x00000000056e7b64 in ioesub_end_call ()
#3  0x00000000057304d2 in seoa_note_stack_outside ()
#4  0x00000000056e8809 in ioe_end_call ()
#5  0x0000000004c6a4b5 in joxenc ()
#6  0x0000000004c69609 in joxdlc_internal ()
#7  0x0000000004c68e73 in jox_flush_java_session ()
#8  0x0000000004d2cb10 in jox_ioe_call_java_ ()
#9  0x000000000c44c587 in pcklfun ()
#10 0x00000000056cb29b in spefcpfa ()
#11 0x000000000568d267 in spefmccallstd ()
#12 0x000000000562f45b in peftrusted ()
#13 0x0000000004113278 in psdexsp ()
#14 0x00000000126edae2 in rpiswu2 ()
#15 0x000000000373cdea in kxe_push_env_internal_pp_ ()
#16 0x00000000037a4bb5 in kkx_push_env_for_ICD_for_new_session ()
#17 0x0000000004112c63 in psdextp ()
#18 0x0000000005629427 in pefccal ()
#19 0x0000000005628cdf in pefcal ()
#20 0x00000000054cda7b in pevm_FCAL ()
#21 0x00000000054aeb4e in pfrinstr_FCAL ()
#22 0x00000000129e612c in pfrrun_no_tool ()
#23 0x00000000129e4a96 in pfrrun ()
#24 0x00000000129edac0 in plsql_run ()


5. Possible Causes and Fixes


When OracleJVM hitting OutOfMemoryError, it often falls into two v$process_memory categorie: Java or Freeable. Here two possible fixes.


5.1. Category JAVA


If v$process_memory for the session (Oracle PID) shows that high memory usage is Category JAVA, or incident file (ORA-04030: out of process memory) has the top consumer JAVA:

  100%   32 GB, 292 chunks: "f:OldSpace                "  JAVA
           joxu pga heap   ds=ffff80ffbc1abc60  dsprt=ffff80ffbacb9078
we can call dbms_java.endsession to clear any Java session state, which makes createBuffer512_loop(511, 1, 1) successfully completed:

create or replace procedure createBuffer512_loop 
  (p_cnt    number, p_start number, p_delta number, p_sleep_seconds number := 0) as
  l_size    number := p_start;
  l_ret     varchar2(100);
  l_chuncks number := 1;    --10  -- control dbms_java.endsession call frequency
begin
  for i in 1..p_cnt loop
    dbms_lock.sleep(p_sleep_seconds);
    dbms_output.put_line('Step -- '||i||' --, Buffer Size (MB) = '||l_size);
    createBuffer512(1024*1024*l_size);
    l_size := l_size + p_delta;
    if mod(i, l_chuncks) = 0 then
      l_ret  := dbms_java.endsession;
      --l_ret  := dbms_java.endsession_and_related_state;   
      --dbms_session.free_unused_user_memory;              -- no help
      dbms_output.put_line('dbms_java.endsession return = '||l_ret);
    end if;
  end loop;
end;
/

Sqlplus > exec createBuffer512_loop(511, 1, 1);
    Step -- 1 --, Buffer Size (MB) = 1
    dbms_java.endsession return = java session ended
    ...
    Step -- 510 --, Buffer Size (MB) = 510
    dbms_java.endsession return = java session ended
    Step -- 511 --, Buffer Size (MB) = 511
    dbms_java.endsession return = java session ended
    
    PL/SQL procedure successfully completed.


5.2. Category Freeable


If v$process_memory for the session (Oracle PID) shows that high memory usage is Category Freeable, or incident file (ORA-04030: out of process memory) has the top consumer "unsnapshotted":

  100%     17 GB,   36 processes: "unsnapshotted               "   
    0%   1247 KB,    1 process  : "f:OldSpace                  "  JAVA
             joxu pga heap        14 chunks
We can try to allocate max reuired memory in the very first call of OracleJVM, for example, above test:

  exec createBuffer512_loop(511, 511, 0);
We can also try to make certain pause (sleeping) and and expect that RDBMS can have time to collect the freeable space (not Java GC):

  exec createBuffer512_loop(511, 1, 1, 0.4);
As tested, this does not always work since the freeable space is marked as "unsnapshotted", and only released when the session disconnected.


6. Oracle Performance Views


Query GC statistics and process Java memory by:

select s.program, s.sid, n.name p_name, t.value, round(t.value/1e6, 2) mb
  from v$session s, v$sesstat t, v$statname n 
 where 1=1
   and s.sid=t.sid 
   and n.statistic# = t.statistic# 
   and name = 'java call heap gc count'
   and s.sid in (920);

select round(allocated/1e6) mb, v.* from v$process_memory v
 where pid in (66) 
 order by allocated desc;

select h.end_interval_time, allocated_max, num_processes, p.* 
  from dba_hist_process_mem_summary p, dba_hist_snapshot h 
 where h.snap_id = p.snap_id 
   and category ='JAVA' 
 order by h.end_interval_time desc;

Populate v$process_memory_detail by:
(see dbms_session.get_package_memory_utilization and limitations)

Sqlplus > exec pga_sampling(920, 120, 1);

and then run query:

select round(bytes/1024/1024, 2) mb, v.* from  process_memory_detail_v v
where 1=1
  and name in ('f:OldSpace', 'free memory')
  and round(bytes/1024/1024, 2) > 10
order by timestamp;

With following query, we can displays historical information about dynamic PGA memory usage and map them to AWR Section "Process Memory Summary" column names. One problem with this view is that max_allocated_max (AWR "Hist Max Alloc (MB)") is always missed for Category "Freeable" (that could be a bug). As previously discussed, this "Freeable" is exactly the figures about unreleased memory (leaks).

select h.end_interval_time, m.category
        -- total allocated for all processes in that Category at the snap_id
      ,round(allocated_total/1024/1024)   "Alloc (MB)"   
        -- max allocated for one single process in that Category at the snap_id       
      ,round(allocated_max/1024/1024)     "Max Alloc (MB)"    
        -- max ever allocated for one single process in that Category during snap interval  
        -- max of v$process_memory.MAX_ALLOCATED. It is null for Category "Freeable" 
      ,round(max_allocated_max/1024/1024) "Hist Max Alloc (MB)" 
      ,m.* 
from dba_hist_process_mem_summary m, dba_hist_snapshot h 
where m.snap_id = h.snap_id 
order by m.snap_id desc, m.category;
Oracle performance views v$process, v$process_memory reports "freeable" memory, whereas view:
   v$sesstat, v$active_session_history, dba_hist_active_sess_history
do not record this real figure, probably due to "Freeable" memory (27GB) which is signaled by the 27 GB "unsnapshotted" line in the incident file:
   99% 27 GB, 66 processes: "unsnapshotted "


7. Linux Test


All above tests are performed on Solaris. Repeat the same test on a Linux with 24GB of total usable memory.

Sqlplus > exec createBuffer512_loop(511, 1, 1);

    ORA-03113: end-of-file on communication channel
    Process ID: 32251

-- Linux trace all memory mapping related system calls: mmap and munmap
--    strace -e trace=memory -p 32251

There are no trace file or incident file generated. Checking dmesg out, it looks like that process is killed by PSP0 (call oom-killer) when reaching system limit of 24GB, but before reaching 32GB PGA limit.

[08:07:36] ora_psp0_td invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
[08:07:36] ora_psp0_td cpuset=/ mems_allowed=0
[08:07:36] Pid: 23201, comm: ora_psp0_td Not tainted 2.6.32-642.6.2.el6.x86_64 #1
[08:07:36] Call Trace:
[08:07:36] [] ? dump_header+0x90/0x1b0
[08:07:36] [] ? security_real_capable_noaudit+0x3c/0x70
[08:07:36] [] ? oom_kill_process+0x82/0x2a0
[08:07:36] [] ? select_bad_process+0xe1/0x120
[08:07:36] [] ? out_of_memory+0x220/0x3c0
[08:07:36] [] ? __alloc_pages_nodemask+0x93c/0x950
[08:07:36] [] ? alloc_pages_current+0xaa/0x110
[08:07:36] [] ? __page_cache_alloc+0x87/0x90
[08:07:36] [] ? find_get_page+0x1e/0xa0
[08:07:36] [] ? filemap_fault+0x1a7/0x500
[08:07:36] [] ? __do_fault+0x54/0x530
[08:07:36] [] ? handle_pte_fault+0xf7/0xb20
[08:07:36] [] ? sem_lock+0x6c/0x130
[08:07:36] [] ? sys_semtimedop+0x338/0xae0
[08:07:36] [] ? handle_mm_fault+0x299/0x3d0
[08:07:36] [] ? __do_page_fault+0x146/0x500
[08:07:36] [] ? wait_consider_task+0x7e6/0xb20
[08:07:36] [] ? read_tsc+0x9/0x10
[08:07:36] [] ? ktime_get_ts+0xbf/0x100
[08:07:36] [] ? poll_select_copy_remaining+0xf8/0x150
[08:07:36] [] ? do_page_fault+0x3e/0xa0
[08:07:36] [] ? page_fault+0x25/0x30
...
[08:07:36] Out of memory: Kill process 32251 (oracle_32251_td) score 619 or sacrifice child
[08:07:36] Killed process 32251, UID 100, (oracle_32251_td) total-vm:20111724kB, anon-rss:15083248kB, file-rss:215188kB
[08:07:36] oracle_32251_td: page allocation failure. order:0, mode:0x280da
[08:07:36] Pid: 32251, comm: oracle_32251_td Not tainted 2.6.32-642.6.2.el6.x86_64 #1

Note Buddy memory request "order=0", and oom-killer reason: "score 619"


7. Dtrace for mmap and munmap



sudo dtrace -F -n '
BEGIN  {
   delta_size   = 0;
   mmap_mb      = 0;
   munmap_mb    = 0;
   mmap_count   = 0;
   munmap_count = 0;
} 
syscall::mmap:entry/pid == $1 && arg0 != 0x0/
{
   delta_size = arg1;    mmap_count++;
   mmap_mb    = mmap_mb + delta_size/1024/1024;
   printf("%Y: %s(addr=0x%X, len=%d, prot=0x%X, flags=0x%X, fildes=%d, off=%d)", 
           walltimestamp, probefunc, arg0, arg1, arg2, arg3, arg4, arg5);
   @CNT[probefunc] = count();
   @SUM[probefunc] = sum(arg1/1024/1024);
 }
syscall::mmap:return/pid == $1 && delta_size > 0/ 
{
   printf("%Y: 0x%X\n", walltimestamp, arg0);
   printf("**++** increase = %9d(%3d(MB)), mmap# = %d, mmap_MB = %d, munmap# = %d, munmap_MB = %d, sum = %d\n", 
           delta_size, delta_size/1024/1024, mmap_count, mmap_mb, munmap_count, munmap_mb, mmap_mb-munmap_mb);
   delta_size = 0;
}
syscall::munmap:entry/pid == $1/
{
   delta_size = arg1;    munmap_count++;
   munmap_mb   = munmap_mb + delta_size/1024/1024;
   printf("%Y: %s(addr=0x%X, len=%d)", walltimestamp, probefunc, arg0, arg1);
   @CNT[probefunc] = count();
   @SUM[probefunc] = sum(arg1/1024/1024);
}
syscall::munmap:return/pid == $1 && delta_size > 0/ 
{
   printf("%Y: %d.\n", walltimestamp, arg0);
   printf("**--** decrease = %9d(%3d(MB)), mmap# = %d, mmap_MB = %d, munmap# = %d, munmap_MB = %d, sum = %d\n", 
           delta_size, delta_size/1024/1024, mmap_count, mmap_mb, munmap_count, munmap_mb, mmap_mb-munmap_mb);
   delta_size = 0;
}
END
{
   printf("\n%10s %10s  %10s\n",  "NAME", "COUNT", "SUM(MB)");
   printa(  "%10s %10@d %10@d\n",         @CNT,   @SUM);
}' 1122


References


1. Database Java Developer's Guide Chapter 1 Introduction to Java in Oracle Database, Section Memory Spaces Management (Old Space)

2.Oracle Database Programming using Java and Web Services (Kuassi Mensah) Section 2.2 Java Memory Management (Page 41-49)

3. dbms_session.get_package_memory_utilization and limitations

4.Oracle Java OutOfMemoryError (11g Blog, no more reproducible in 12c)

5. PGA, SGA memory usage watching from UNIX