11g ADG 给了大家更多的灾备选择,包括之前美国飓风,给了ADG大放光彩的机会,但是ADG并没有想象中的那么完美,以下是最近的一个CASE,具体过程不详写了,目前这个case还在跟进中,记录之:

[oracle@dcb-srv-0174 trace]$ cat itemstd02_ora_22156.trc |grep "LibraryObjectLock" |awk '{print $3 "-- obj handle get"}' | uniq -c |awk  '{ if ($1>100) print $0}'
    107 Handle=0x2efee6a060-- obj handle get
    107 Handle=0x2eff322578-- obj handle get
    113 Handle=0x2eff54acf0-- obj handle get
    118 Handle=0x2f1f0b89c8-- obj handle get
    118 Handle=0x2f1fc813c0-- obj handle get
    125 Handle=0x2f3e772c58-- obj handle get
    127 Handle=0x2f1fccd750-- obj handle get
    128 Handle=0x2f7efaa2a0-- obj handle get
    129 Handle=0x2f7f831ac0-- obj handle get
    130 Handle=0x2f3e171640-- obj handle get
    130 Handle=0x2f7ee7a100-- obj handle get
    131 Handle=0x2f7ee7a268-- obj handle get
    132 Handle=0x2f7f855570-- obj handle get
    147 Handle=0x2f1a8f0bf8-- obj handle get
    147 Handle=0x2f3edd63d0-- obj handle get
    150 Handle=0x2f3edd6620-- obj handle get
    150 Handle=0x2f5ef05050-- obj handle get
    150 Handle=0x2f5f213530-- obj handle get
    154 Handle=0x2f3e81b988-- obj handle get
    159 Handle=0x2f3e926498-- obj handle get
    182 Handle=0x2eff03c3c0-- obj handle get
    190 Handle=0x2f1f5a2858-- obj handle get
    192 Handle=0x2f1f859808-- obj handle get
    195 Handle=0x2fbf7f46f8-- obj handle get
    195 Handle=0x2fbfe9eb80-- obj handle get
    197 Handle=0x2f5ec64840-- obj handle get
    200 Handle=0x2eff4e7698-- obj handle get
    200 Handle=0x2f5f54e510-- obj handle get
    202 Handle=0x2f1fee7520-- obj handle get
    203 Handle=0x2eff975f98-- obj handle get
    204 Handle=0x2eff0480e0-- obj handle get
    204 Handle=0x2efff29160-- obj handle get
    207 Handle=0x2f7eedaae0-- obj handle get
    208 Handle=0x2f3fd32ef0-- obj handle get
    208 Handle=0x2f3fd33510-- obj handle get
    209 Handle=0x2f3fd33ef0-- obj handle get
    209 Handle=0x2f3fd34140-- obj handle get
    211 Handle=0x2f1f9ed050-- obj handle get
    211 Handle=0x2f1fee7770-- obj handle get
    212 Handle=0x2f1fc7ee38-- obj handle get
    212 Handle=0x2f9fa5a770-- obj handle get
    213 Handle=0x2efff297e0-- obj handle get
    213 Handle=0x2f7fa54a28-- obj handle get
    218 Handle=0x2f9fce2748-- obj handle get
    231 Handle=0x2eff04a950-- obj handle get
    247 Handle=0x2ef65d8a88-- obj handle get
    264 Handle=0x2effd1c938-- obj handle get
    268 Handle=0x2f3ee0e1a8-- obj handle get
    268 Handle=0x2f3f6469a0-- obj handle get
    270 Handle=0x2f1f8a6120-- obj handle get
    270 Handle=0x2f1fcda868-- obj handle get
    290 Handle=0x2f7f85ade0-- obj handle get
    378 Handle=0x2f7f85b030-- obj handle get
    468 Handle=0x2f3f28ec08-- obj handle get
    697 Handle=0x2f3fd326e0-- obj handle get
    800 Handle=0x2efca0bdd0-- obj handle get
    847 Handle=0x2f9ff0e448-- obj handle get
[oracle@dcb-srv-0174 trace]$ cat temp.log |grep "library cache load lock" |wc -l && cat temp.log |grep  "cursor: pin S" |wc -l
619
158

..........


ADDR	00002B4642274E78
INDX	51844
INST_ID	1
KGLHDADR	0000002F3F28EC08
KGLHDPAR	0000002F3F28EC08
KGLHDCLT	65657
KGLNAOWN	SYS
KGLNAOBJ	TRG_SERVER_ERRORS
KGLFNOBJ	<CLOB>
KGLNADLK	
KGLNAHSH	102957177
KGLNAHSV	e230e86a06763591475191a006230079
KGLNATIM	27-10月-12
KGLNAPTM	
KGLHDNSP	3
KGLHDNSD	TRIGGER
KGLHDLMD	0
KGLHDPMD	0
KGLHDFLG	10241
KGLHDOBJ	0000002C9F3EA470
KGLHDLDC	1
KGLHDIVC	0
KGLHDEXC	0
KGLHDLKC	0
KGLHDKMK	0
KGLHDDMK	67
KGLHDAMK	0
KGLOBFLG	4
KGLOBSTA	1
KGLOBTYP	12
KGLOBTYD	TRIGGER
KGLOBHMK	0
KGLOBHS0	4712
KGLOBHS1	2480
KGLOBHS2	0
KGLOBHS3	0
KGLOBHS4	0
KGLOBHS5	0
KGLOBHS6	1032
KGLOBHS7	0
KGLOBHD0	0000002F3EE199D8
KGLOBHD1	0000002C9F3EABA0
KGLOBHD2	00
KGLOBHD3	00
KGLOBHD4	00
KGLOBHD5	00
KGLOBHD6	0000002C9F3EAC28
KGLOBHD7	00
KGLOBPC0	0
KGLOBPC6	0
KGLOBTP0	00
KGLOBT00	0
KGLOBT01	0
KGLOBT02	0
KGLOBT03	
KGLOBT04	0
KGLOBT05	0
KGLOBT35	0
KGLOBT06	0
KGLOBT07	0
KGLOBT08	0
KGLOBT09	0
KGLOBT10	0
KGLOBT11	0
KGLOBT12	0
KGLOBT13	0
KGLOBT14	0
KGLOBT15	0
KGLOBT16	0
KGLOBT17	0
KGLOBT18	0
KGLOBT19	0
KGLOBT20	0
KGLOBT21	0
KGLOBT22	0
KGLOBT23	2382
KGLOBT24	1795
KGLOBT25	0
KGLOBT26	0
KGLOBT28	0
KGLOBT29	0
KGLOBT30	0
KGLOBT31	0
KGLOBT27	0
KGLOBT32	0
KGLOBT33	0
KGLOBWAP	0
KGLOBWCC	0
KGLOBWCL	0
KGLOBWUI	0
KGLOBWDW	0
KGLOBT42	0
KGLOBT43	0
KGLOBT44	0
KGLOBT45	0
KGLOBT46	0
KGLOBT47	0
KGLOBT49	0
KGLOBT50	0
KGLOBT52	0
KGLOBT53	0
KGLOBTL0	0
KGLOBTL1	0
KGLOBTS0	
KGLOBTS1	
KGLOBTN0	
KGLOBTN1	
KGLOBTN2	
KGLOBTN3	
KGLOBTN4	
KGLOBTN5	
KGLOBTS2	
KGLOBTS3	
KGLOBTS5	
KGLOBTT0	
KGLOBCCE	
KGLOBCCEH	0
KGLOBCLA	
KGLOBCLC	0
KGLOBCCC	0
KGLOBTS4	
KGLOBCBCA	
KGLOBT48	0
KGLOBDSO	0
KGLOBDEX	0
KGLOBDPX	0
KGLOBDLD	0
KGLOBDIV	0
KGLOBDPS	0
KGLOBDDR	0
KGLOBDDW	0
KGLOBDBF	0
KGLOBDRO	0
KGLOBDCP	0
KGLOBDEL	0
KGLOBDFT	0
KGLOBDEF	0
KGLOBDUI	0
KGLOBDCL	0
KGLOBDAP	0
KGLOBDCC	0
KGLOBDPL	0
KGLOBDJV	0
KGLOBACS	0
KGLOBTS6	
KGLOBTS7	
KGLOBT54	0
KGLOBT55	0
KGLOBT56	0
KGLOBT57	0
KGLOBDCO	0
KGLOBDCI	0
KGLOBDRR	0
KGLOBDRB	0
KGLOBDWR	0
KGLOBDWB	0
KGLOBT58	0
KGLOBDOR	0
KGLHDMTX	0000002F3F28ED38
KGLHDMVL	0
KGLHDMSP	558
KGLHDMGT	667484
KGLHDDMTX	0000002F3F28ECB8
KGLHDDVL	0
KGLHDDSP	0
KGLHDDGT	0
KGLHDBID	65657
KGLHDBMTX	0000002E3546D210
KGLHDBVL	0
KGLHDBSP	3
KGLHDBGT	561277
KGLOBT59	0
KGLOBDCU	0
KGLOBPROP

问题SQL就不贴出来了,涉及到3张表的FTS,并发上来之后导致ADG 短暂的hang 死,当时的状态包括了很多类library cache lock(library cache load lock 与 cursor pin S) 的交互等待 。

对于这里涉及到的机制:

The connection between log file switches and the media recovery checkpoints is just a little messier than I’ve described, and 11g has introduced two new parameters, _defer_log_boundary_ckpt (defer media recovery checkpoint at log boundary) and _defer_log_count (number of log boundaries media recovery checkpoint lags behind), defining this behavior. Effectively these parameters allow Oracle to be lazy about copying buffers to disk on media recovery checkpoints until a few extra log files switches have taken place. This type of mechanism has been in place since 9i—but with no visible controls—presumably in the hope that a little extra delay at peak processing periods would allow incremental checkpointing to spread the write load that an immediate media recovery checkpoint would cause.

Mandatory checkpoint at every log boundary

Recovery slaves wait on log boundary checkpoint

No redo apply during log boundary checkpoint

Incremental recovery checkpoints keep log boundary checkpoint duration smaller Checkpoint performed prior to stopping MRP

Parameters “_defer_log_boundary_ckpt” and “_defer_log_count” define the behavior (not fully implemented in 11g)

Recovery until consistent required after crash of ADG

关于另外几个参数的说明:

_row_cache_cursors:参考 Steve Adams的Tuning the _row_cache_cursors

session_cached_cursor : 设置此参数用以缓解 library cache load lock.

_row_cr : SET FALSE 用以缓解 row CR 对于缓解dc_rollback_segments效果不明

_defer_log_count 针对 ADG target 端的checkpoint

_log_committime_block_cleanout 目前没有具体影响范围. (When it is set to TRUE then we will try to perform block cleanout when a transaction commits rather than applying delayed logging cleanout to the affected blocks.The commit causes us to walk the associated list of blocks state objects owned by the transaction and perform the relevant form of cleanout upon each block depending upon the parameter’s setting)

delayed logging behavior by JL:

When you commit, the commit cleanout updates some of the changed data blocks by stamping them with a couple of SCNs. But the commit still takes only 60 bytes of redo – the changes to the data blocks are not logged i.e. they are delayed.When the block is subsequently updated by another transaction, the ‘full’ cleanout of the ITL and its row locks takes place. However, if you examine the redo generated at this point, you will find that it has been engineered to look as if the new update started from a fully cleaned out block. In this respect, the logging of the commit cleanout has not been delayed, it has been totally bypassed.

Also by Steve : Read Consistency

_db_lost_write_checking and db_lost_write_protect can also change “_log_committime_block_cleanout”

Set “_log_committime_block_cleanout” to TRUE on primary if possible, redo overhead is less than 5%. (by paypal)

对于这个Case,目前采用TNS HA连接方式轮询多个可用的ADG,配合APP的自动failover.目前ADG 对于大并发垃圾SQL的支持是有一些问题,包括一些row cache latch 的竞争十分的激烈.可以考虑采用logical standby 或者ogg 来满足一些异步大SQL的查询。