环境描述:双节点rac 数据库版本10.2.0.5 通过awr发现down机时间段产生大量的cursor: pin S wait on X等待,同时伴随大量的library latch等待,通过与相关人员沟通得知,此库在每月的月头都会有大量的报表业务,从而负载将会增长几十倍之多。通过下面
的awr可以看出负载确实很高:

DB Name DB Id Instance Inst num Release RAC Host
PORTALDB 597889515 portaldb1 1 10.2.0.5.0 YES ptdb01

Snap Id Snap Time Sessions Cursors/Session
Begin Snap: 4115 02-Nov-11 11:00:13 209 131.6
End Snap: 4116 02-Nov-11 12:03:09 289 16.2
Elapsed:   62.93 (mins)    
DB Time:   15,687.33 (mins)    

通过对比awr与其他时间段的awr发现DBTIME 增加了几十倍 说明确实存在大量的业务

通过TOP5时间 发现最多的等待为cursor: pin S wait on X 同时伴随library cache等待:

Top 5 Timed Events

Event Waits Time(s) Avg Wait(ms) % Total Call Time Wait Class
cursor: pin S wait on X 77,646,738 759,238 10 80.7 Concurrency
library cache lock 200,160 95,987 480 10.2 Concurrency
enq: TX – row lock contention 70,851 34,471 487 3.7 Application
CPU time   18,501   2.0  
library cache pin 5,680 14,825 2,610 1.6 Concurrency

SELECT COMPONENT ,OPER_TYPE,FINAL_SIZE Final,to_char(start_time,’dd-mon hh24:mi:ss’) Started FROM V$SGA_RESIZE_OPS查询
短时间内发现大量的shrink,grow,由此怀疑业务高峰期间大量的shrink,grow导致shard_pool的部分cursor被刷出,从而导致hard parse的增加进而导致library cache 等待的发生,频繁的shrink,grow也会导致cursor: pin S wait on X的发生。

通过沟通,夜里将关闭ASMM,并且通过当时awr报告计算出shared_pool,buffer_cache峰值,明天将继续观察负载情况。
另外在峰值时段发现大量append操作引发enq-TX等待也将与应用人员沟通

跟进今天的情况:

Snap Id Snap Time Sessions Cursors/Session
Begin Snap: 4139 03-Nov-11 11:00:18 118 81.5
End Snap: 4140 03-Nov-11 12:00:36 159 108.3
Elapsed:   60.31 (mins)    
DB Time:   428.83 (mins)    

从上述情况看到今天的负载小于昨天,下面是今天的等待事件:

Top 5 Timed Events

Event Waits Time(s) Avg Wait(ms) % Total Call Time Wait Class
CPU time   20,034   77.9  
db file sequential read 333,128 1,400 4 5.4 User I/O
direct path read temp 175,492 434 2 1.7 User I/O
db file scattered read 239,062 261 1 1.0 User I/O
gc current block 2-way 331,161 231 1 .9 Cluster

可以看到昨天的cursor等待消失

Wait Events

  • s – second
  • cs – centisecond – 100th of a second
  • ms – millisecond – 1000th of a second
  • us – microsecond – 1000000th of a second
  • ordered by wait time desc, waits desc (idle events last)
  • %Timeouts: value of 0 indicates value was < .5%. Value of null is truly 0

Event Waits %Time -outs Total Wait Time (s) Avg wait (ms) Waits /txn
db file sequential read 333,128   1,400 4 14.37
direct path read temp 175,492   434 2 7.57
db file scattered read 239,062   261 1 10.31
gc current block 2-way 331,161   231 1 14.28
gc cr multi block request 302,696   129 0 13.06
log file sequential read 6,144   106 17 0.27
log file parallel write 148,777   101 1 6.42
direct path write temp 186,873   88 0 8.06
gcs log flush sync 161,961 0 77 0 6.99
gc cr block 2-way 105,826   71 1 4.56
gc cr grant 2-way 127,157   63 0 5.48
log file sync 18,846 1 49 3 0.81
db file parallel read 13,520   29 2 0.58
CSS operation: query 1,230   22 18 0.05
control file sequential read 36,492   20 1 1.57
CSS initialization 410   18 44 0.02
gc current grant busy 26,892 0 17 1 1.16
gc current grant 2-way 42,689   13 0 1.84
gc buffer busy 6,259 0 9 1 0.27
control file parallel write 1,869   5 3 0.08
read by other session 846   5 6 0.04
cursor: pin S wait on X 434 100 4 10 0.02
latch: cache buffers chains 6,638   3 1 0.29
enq: TX – row lock contention 12 33 3 252 0.00
gc cr block busy 995   3 3 0.04
SQL*Net message to client 3,549,215   3 0 153.09
enq: WF – contention 14 29 2 145 0.00
DFS lock handle 2,376   1 1 0.10
direct path read 3,331   1 0 0.14
gc current block busy 199   1 6 0.01
row cache lock 4,347   1 0 0.19
cr request retry 1 100 1 977 0.00
enq: KO – fast object checkpoint 38 3 1 25 0.00
gc current multi block request 7,843   1 0 0.34
log buffer space 4   1 221 0.00
latch: KCL gc element parent latch 74   1 10 0.00
SQL*Net more data from client 7,756   1 0 0.33
gc current block congested 37   1 15 0.00
latch free 22   1 23 0.00
Log archive I/O 6,119   0 0 0.26
CGS wait for IPC msg 47,974 99 0 0 2.07
SQL*Net break/reset to client 2,210   0 0 0.10
enq: FB – contention 1,560   0 0 0.07
log file switch completion 20   0 18 0.00
CSS operation: action 807   0 0 0.03
gc cr block congested 11   0 25 0.00
gc cr grant congested 16   0 17 0.00
os thread startup 10   0 23 0.00
SQL*Net more data to client 15,643   0 0 0.67
kksfbc child completion 3 100 0 49 0.00
reliable message 306   0 0 0.01
direct path write 618   0 0 0.03
buffer busy waits 3,383   0 0 0.15
enq: PS – contention 203   0 0 0.01
LGWR wait for redo copy 8,131   0 0 0.35
ksxr poll remote instances 10,924 56 0 0 0.47
enq: CF – contention 157   0 1 0.01
library cache pin 301   0 0 0.01
PX Deq: reap credit 6,959 98 0 0 0.30
enq: TX – index contention 48   0 2 0.00
rdbms ipc reply 351   0 0 0.02
SQL*Net message from dblink 66   0 1 0.00
library cache lock 144   0 0 0.01
PX Deq Credit: send blkd 196 13 0 0 0.01
enq: HW – contention 104   0 0 0.00
enq: TM – contention 123   0 0 0.01
latch: cache buffers lru chain 112   0 0 0.00
gc current block unknown 49   0 0 0.00
IPC send completion sync 63 100 0 0 0.00
PX Deq: Signal ACK 203 39 0 0 0.01
latch: object queue header operation 29   0 1 0.00
latch: shared pool 42   0 0 0.00
gc current split 2   0 4 0.00
gc current grant congested 1   0 8 0.00
gc current retry 3   0 2 0.00
KJC: Wait for msg sends to complete 7   0 0 0.00
enq: TA – contention 11   0 0 0.00
latch: row cache objects 43   0 0 0.00
log file single write 12   0 0 0.00
ges inquiry response 6   0 0 0.00
latch: session allocation 3   0 1 0.00
enq: TD – KTF dump entries 6   0 0 0.00
latch: checkpoint queue latch 8   0 0 0.00
latch: library cache lock 7   0 0 0.00
enq: JS – job run lock – synchronize 2   0 1 0.00
enq: WL – contention 3   0 0 0.00
enq: TX – allocate ITL entry 3   0 0 0.00
enq: TT – contention 2   0 0 0.00
enq: MW – contention 2   0 0 0.00
enq: DR – contention 2   0 0 0.00
cursor: pin S 309   0 0 0.01
latch: library cache 3   0 0 0.00
SQL*Net more data to dblink 6   0 0 0.00
latch: gcs resource hash 3   0 0 0.00
latch: redo allocation 3   0 0 0.00
SQL*Net message to dblink 63   0 0 0.00
latch: ges resource hash list 1   0 0 0.00
lock escalate retry 7 86 0 0 0.00
SQL*Net more data from dblink 3   0 0 0.00
latch: messages 1   0 0 0.00
lock deadlock retry 1 100 0 0 0.00
cursor: mutex S 1   0 0 0.00
SQL*Net message from client 3,550,480   340,614 96 153.14
gcs remote message 2,022,558 19 17,483 9 87.24
virtual circuit status 121 100 3,540 29258 0.01
ges remote message 64,294 67 3,531 55 2.77
Streams AQ: qmn slave idle wait 128   3,531 27587 0.01
Streams AQ: qmn coordinator idle wait 265 48 3,531 13325 0.01
PX Idle Wait 1,476 93 3,530 2391 0.06
ASM background timer 806   3,528 4377 0.03
class slave wait 633 94 3,527 5573 0.03
Streams AQ: waiting for time management or cleanup tasks 12 100 3,464 288641 0.00
DIAG idle wait 15,062 4 2,928 194 0.65
i/o slave wait 580,651   1,423 2 25.05
PX Deq: Execute Reply 2,072 48 218 105 0.09
jobq slave wait 40 100 117 2930 0.00
PX Deq: Parse Reply 229 34 1 4 0.01
PX Deq: Execution Msg 431 8 1 2 0.02
PX Deq: Join ACK 242 38 0 1 0.01
KSV master wait 32   0 7 0.00
single-task message 3   0 32 0.00
PX Deq: Msg Fragment 95 9 0 0 0.00
PX Deq Credit: need buffer 44 11 0 0 0.00
Streams AQ: RAC qmn coordinator idle wait 265 100 0 0 0.01
PX Deq: Table Q Normal 8   0 0 0.00