11g ADG 给了大家更多的灾备选择,包括之前美国飓风,给了ADG大放光彩的机会,但是ADG并没有想象中的那么完美,以下是最近的一个CASE,具体过程不详写了,目前这个case还在跟进中,记录之:
[oracle@dcb-srv-0174 trace]$ cat itemstd02_ora_22156.trc |grep "LibraryObjectLock" |awk '{print $3 "-- obj handle get"}' | uniq -c |awk '{ if ($1>100) print $0}' 107 Handle=0x2efee6a060-- obj handle get 107 Handle=0x2eff322578-- obj handle get 113 Handle=0x2eff54acf0-- obj handle get 118 Handle=0x2f1f0b89c8-- obj handle get 118 Handle=0x2f1fc813c0-- obj handle get 125 Handle=0x2f3e772c58-- obj handle get 127 Handle=0x2f1fccd750-- obj handle get 128 Handle=0x2f7efaa2a0-- obj handle get 129 Handle=0x2f7f831ac0-- obj handle get 130 Handle=0x2f3e171640-- obj handle get 130 Handle=0x2f7ee7a100-- obj handle get 131 Handle=0x2f7ee7a268-- obj handle get 132 Handle=0x2f7f855570-- obj handle get 147 Handle=0x2f1a8f0bf8-- obj handle get 147 Handle=0x2f3edd63d0-- obj handle get 150 Handle=0x2f3edd6620-- obj handle get 150 Handle=0x2f5ef05050-- obj handle get 150 Handle=0x2f5f213530-- obj handle get 154 Handle=0x2f3e81b988-- obj handle get 159 Handle=0x2f3e926498-- obj handle get 182 Handle=0x2eff03c3c0-- obj handle get 190 Handle=0x2f1f5a2858-- obj handle get 192 Handle=0x2f1f859808-- obj handle get 195 Handle=0x2fbf7f46f8-- obj handle get 195 Handle=0x2fbfe9eb80-- obj handle get 197 Handle=0x2f5ec64840-- obj handle get 200 Handle=0x2eff4e7698-- obj handle get 200 Handle=0x2f5f54e510-- obj handle get 202 Handle=0x2f1fee7520-- obj handle get 203 Handle=0x2eff975f98-- obj handle get 204 Handle=0x2eff0480e0-- obj handle get 204 Handle=0x2efff29160-- obj handle get 207 Handle=0x2f7eedaae0-- obj handle get 208 Handle=0x2f3fd32ef0-- obj handle get 208 Handle=0x2f3fd33510-- obj handle get 209 Handle=0x2f3fd33ef0-- obj handle get 209 Handle=0x2f3fd34140-- obj handle get 211 Handle=0x2f1f9ed050-- obj handle get 211 Handle=0x2f1fee7770-- obj handle get 212 Handle=0x2f1fc7ee38-- obj handle get 212 Handle=0x2f9fa5a770-- obj handle get 213 Handle=0x2efff297e0-- obj handle get 213 Handle=0x2f7fa54a28-- obj handle get 218 Handle=0x2f9fce2748-- obj handle get 231 Handle=0x2eff04a950-- obj handle get 247 Handle=0x2ef65d8a88-- obj handle get 264 Handle=0x2effd1c938-- obj handle get 268 Handle=0x2f3ee0e1a8-- obj handle get 268 Handle=0x2f3f6469a0-- obj handle get 270 Handle=0x2f1f8a6120-- obj handle get 270 Handle=0x2f1fcda868-- obj handle get 290 Handle=0x2f7f85ade0-- obj handle get 378 Handle=0x2f7f85b030-- obj handle get 468 Handle=0x2f3f28ec08-- obj handle get 697 Handle=0x2f3fd326e0-- obj handle get 800 Handle=0x2efca0bdd0-- obj handle get 847 Handle=0x2f9ff0e448-- obj handle get
[oracle@dcb-srv-0174 trace]$ cat temp.log |grep "library cache load lock" |wc -l && cat temp.log |grep "cursor: pin S" |wc -l 619 158 .......... ADDR 00002B4642274E78 INDX 51844 INST_ID 1 KGLHDADR 0000002F3F28EC08 KGLHDPAR 0000002F3F28EC08 KGLHDCLT 65657 KGLNAOWN SYS KGLNAOBJ TRG_SERVER_ERRORS KGLFNOBJ <CLOB> KGLNADLK KGLNAHSH 102957177 KGLNAHSV e230e86a06763591475191a006230079 KGLNATIM 27-10月-12 KGLNAPTM KGLHDNSP 3 KGLHDNSD TRIGGER KGLHDLMD 0 KGLHDPMD 0 KGLHDFLG 10241 KGLHDOBJ 0000002C9F3EA470 KGLHDLDC 1 KGLHDIVC 0 KGLHDEXC 0 KGLHDLKC 0 KGLHDKMK 0 KGLHDDMK 67 KGLHDAMK 0 KGLOBFLG 4 KGLOBSTA 1 KGLOBTYP 12 KGLOBTYD TRIGGER KGLOBHMK 0 KGLOBHS0 4712 KGLOBHS1 2480 KGLOBHS2 0 KGLOBHS3 0 KGLOBHS4 0 KGLOBHS5 0 KGLOBHS6 1032 KGLOBHS7 0 KGLOBHD0 0000002F3EE199D8 KGLOBHD1 0000002C9F3EABA0 KGLOBHD2 00 KGLOBHD3 00 KGLOBHD4 00 KGLOBHD5 00 KGLOBHD6 0000002C9F3EAC28 KGLOBHD7 00 KGLOBPC0 0 KGLOBPC6 0 KGLOBTP0 00 KGLOBT00 0 KGLOBT01 0 KGLOBT02 0 KGLOBT03 KGLOBT04 0 KGLOBT05 0 KGLOBT35 0 KGLOBT06 0 KGLOBT07 0 KGLOBT08 0 KGLOBT09 0 KGLOBT10 0 KGLOBT11 0 KGLOBT12 0 KGLOBT13 0 KGLOBT14 0 KGLOBT15 0 KGLOBT16 0 KGLOBT17 0 KGLOBT18 0 KGLOBT19 0 KGLOBT20 0 KGLOBT21 0 KGLOBT22 0 KGLOBT23 2382 KGLOBT24 1795 KGLOBT25 0 KGLOBT26 0 KGLOBT28 0 KGLOBT29 0 KGLOBT30 0 KGLOBT31 0 KGLOBT27 0 KGLOBT32 0 KGLOBT33 0 KGLOBWAP 0 KGLOBWCC 0 KGLOBWCL 0 KGLOBWUI 0 KGLOBWDW 0 KGLOBT42 0 KGLOBT43 0 KGLOBT44 0 KGLOBT45 0 KGLOBT46 0 KGLOBT47 0 KGLOBT49 0 KGLOBT50 0 KGLOBT52 0 KGLOBT53 0 KGLOBTL0 0 KGLOBTL1 0 KGLOBTS0 KGLOBTS1 KGLOBTN0 KGLOBTN1 KGLOBTN2 KGLOBTN3 KGLOBTN4 KGLOBTN5 KGLOBTS2 KGLOBTS3 KGLOBTS5 KGLOBTT0 KGLOBCCE KGLOBCCEH 0 KGLOBCLA KGLOBCLC 0 KGLOBCCC 0 KGLOBTS4 KGLOBCBCA KGLOBT48 0 KGLOBDSO 0 KGLOBDEX 0 KGLOBDPX 0 KGLOBDLD 0 KGLOBDIV 0 KGLOBDPS 0 KGLOBDDR 0 KGLOBDDW 0 KGLOBDBF 0 KGLOBDRO 0 KGLOBDCP 0 KGLOBDEL 0 KGLOBDFT 0 KGLOBDEF 0 KGLOBDUI 0 KGLOBDCL 0 KGLOBDAP 0 KGLOBDCC 0 KGLOBDPL 0 KGLOBDJV 0 KGLOBACS 0 KGLOBTS6 KGLOBTS7 KGLOBT54 0 KGLOBT55 0 KGLOBT56 0 KGLOBT57 0 KGLOBDCO 0 KGLOBDCI 0 KGLOBDRR 0 KGLOBDRB 0 KGLOBDWR 0 KGLOBDWB 0 KGLOBT58 0 KGLOBDOR 0 KGLHDMTX 0000002F3F28ED38 KGLHDMVL 0 KGLHDMSP 558 KGLHDMGT 667484 KGLHDDMTX 0000002F3F28ECB8 KGLHDDVL 0 KGLHDDSP 0 KGLHDDGT 0 KGLHDBID 65657 KGLHDBMTX 0000002E3546D210 KGLHDBVL 0 KGLHDBSP 3 KGLHDBGT 561277 KGLOBT59 0 KGLOBDCU 0 KGLOBPROP
问题SQL就不贴出来了,涉及到3张表的FTS,并发上来之后导致ADG 短暂的hang 死,当时的状态包括了很多类library cache lock(library cache load lock 与 cursor pin S) 的交互等待 。
对于这里涉及到的机制:
The connection between log file switches and the media recovery checkpoints is just a little messier than I’ve described, and 11g has introduced two new parameters, _defer_log_boundary_ckpt (defer media recovery checkpoint at log boundary) and _defer_log_count (number of log boundaries media recovery checkpoint lags behind), defining this behavior. Effectively these parameters allow Oracle to be lazy about copying buffers to disk on media recovery checkpoints until a few extra log files switches have taken place. This type of mechanism has been in place since 9i—but with no visible controls—presumably in the hope that a little extra delay at peak processing periods would allow incremental checkpointing to spread the write load that an immediate media recovery checkpoint would cause.
Mandatory checkpoint at every log boundary
Recovery slaves wait on log boundary checkpoint
No redo apply during log boundary checkpoint
Incremental recovery checkpoints keep log boundary checkpoint duration smaller Checkpoint performed prior to stopping MRP
Parameters “_defer_log_boundary_ckpt” and “_defer_log_count” define the behavior (not fully implemented in 11g)
Recovery until consistent required after crash of ADG
关于另外几个参数的说明:
_row_cache_cursors:参考 Steve Adams的Tuning the _row_cache_cursors
session_cached_cursor : 设置此参数用以缓解 library cache load lock.
_row_cr : SET FALSE 用以缓解 row CR 对于缓解dc_rollback_segments效果不明
_defer_log_count 针对 ADG target 端的checkpoint
_log_committime_block_cleanout 目前没有具体影响范围. (When it is set to TRUE then we will try to perform block cleanout when a transaction commits rather than applying delayed logging cleanout to the affected blocks.The commit causes us to walk the associated list of blocks state objects owned by the transaction and perform the relevant form of cleanout upon each block depending upon the parameter’s setting)
delayed logging behavior by JL:
When you commit, the commit cleanout updates some of the changed data blocks by stamping them with a couple of SCNs. But the commit still takes only 60 bytes of redo – the changes to the data blocks are not logged i.e. they are delayed.When the block is subsequently updated by another transaction, the ‘full’ cleanout of the ITL and its row locks takes place. However, if you examine the redo generated at this point, you will find that it has been engineered to look as if the new update started from a fully cleaned out block. In this respect, the logging of the commit cleanout has not been delayed, it has been totally bypassed.
Also by Steve : Read Consistency
_db_lost_write_checking and db_lost_write_protect can also change “_log_committime_block_cleanout”
Set “_log_committime_block_cleanout” to TRUE on primary if possible, redo overhead is less than 5%. (by paypal)
对于这个Case,目前采用TNS HA连接方式轮询多个可用的ADG,配合APP的自动failover.目前ADG 对于大并发垃圾SQL的支持是有一些问题,包括一些row cache latch 的竞争十分的激烈.可以考虑采用logical standby 或者ogg 来满足一些异步大SQL的查询。