前天处理了一启RAC ASM 内存泄露问题,系统为公司最后一套RAC one node 版本为11.2.0.3
这套核心RAC one node系统出现大量 log file switch buffer busy waits等待,并伴随大量cursor: pin S wait on X read by other session。
SID USERNAME MACHINE EVENT PARAM W WT SQL ST LT LOGON_TIME ------ ---------- ---------------------- ------------------------------ -------------------- ---- ---- ---------------------------- -- ------ ---------- 10917 PROD_DATA2 JDBC Thin Client buffer busy waits 14/907008/8 0 294 /bgx0f86ucnw8v A 355 7628 13549 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 172 /520mkxqpf15q8 A 294 20514 4456 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 175 6rsq14x8a7n5r/dsxmfbstazq7u A 525 94053 4521 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 175 6rsq14x8a7n5r/dsxmfbstazq7u A 508 94052 6831 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 168 1k8vrykzkvccg/d5dfm29f41s0d A 739 1400 3170 PROD_DATA2 JDBC Thin Client enq: SQ - contention:SQ-6:12: 1397817350/21363/0 0 391 4uj3xtfzq3a91/gky4m8njtfrcd A 392 536146 4611 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 181 fqc7yvkzxjzk7/cqb6f9fkfkcsq A 725 932 13448 PROD_DATA2 JDBC Thin Client buffer busy waits 56/645165/1 0 730 fqc7yvkzxjzk7/cqb6f9fkfkcsq A 730 736 940 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 181 fqc7yvkzxjzk7/cqb6f9fkfkcsq A 710 953 8026 PROD_DATA2 JDBC Thin Client buffer busy waits 20/279682/8 0 441 3f0thwr0rg23c/c4h5dg81ffxyr A 441 94043 9290 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 178 3f0thwr0rg23c/c4h5dg81ffxyr A 220 473 2750 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 199 b3dmt8325pmvf/bpk6g015uzqzt A 417 81687 3510 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 223 c5n6d774as72b/5ak0xdkuvc65n A 273 536146 2323 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 172 c5n6d774as72b/5ak0xdkuvc65n A 407 536140 3767 PROD_DATA2 JDBC Thin Client buffer busy waits 21/35200/8 0 725 5q60jbr6700u7/g2th2agwfkk1t A 725 1133 10205 PROD_DATA2 JDBC Thin Client buffer busy waits 21/35200/8 0 717 5q60jbr6700u7/g2th2agwfkk1t A 717 1186 10046 PROD_DATA2 JDBC Thin Client buffer busy waits 21/35200/8 0 670 5q60jbr6700u7/g2th2agwfkk1t A 670 1186 10044 PROD_DATA2 JDBC Thin Client buffer busy waits 21/35200/8 0 689 5q60jbr6700u7/g2th2agwfkk1t A 689 24971 6395 PROD_DATA2 JDBC Thin Client buffer busy waits 21/35201/8 0 693 5q60jbr6700u7/g2th2agwfkk1t A 693 1522 9535 PROD_DATA2 JDBC Thin Client buffer busy waits 21/35200/8 0 703 5q60jbr6700u7/g2th2agwfkk1t A 703 42595 6917 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 175 5q60jbr6700u7/g2th2agwfkk1t A 709 1221 2902 PROD_DATA2 JDBC Thin Client buffer busy waits 21/35200/8 0 700 5q60jbr6700u7/g2th2agwfkk1t A 700 2146 9554 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 168 5q60jbr6700u7/g2th2agwfkk1t A 726 2147 11662 PROD_DATA2 JDBC Thin Client enq: SQ - contention:SQ-6:73: 1397817350/21911/0 0 377 cg3urj768yv8g/apun8g5mb9byt A 378 18995 9378 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 168 cg3urj768yv8g/7g63srczfgfd6 A 627 19057 8698 PROD_DATA2 JDBC Thin Client enq: SQ - contention:SQ-6:73: 1397817350/21911/0 0 551 cg3urj768yv8g/5smgjxsscf91w A 551 18996 7575 PROD_DATA2 JDBC Thin Client enq: SQ - contention:SQ-6:0: 1397817350/21911/0 0 204 cg3urj768yv8g/793hdqw3bn04q A 204 19035 ... 13028 PROD_DATA2 JDBC Thin Client enq: SQ - contention:SQ-6:0: 1397817350/21911/0 0 623 cg3urj768yv8g/8s97vpbnuktp2 A 623 19061 ... 6659 PROD_DATA2 JDBC Thin Client enq: SQ - contention:SQ-6:0: 1397817350/21911/0 0 615 cg3urj768yv8g/8s97vpbnuktp2 A 616 19035 5187 PROD_DATA2 JDBC Thin Client enq: SQ - contention:SQ-6:56: 1397817350/21911/0 0 418 cg3urj768yv8g/793hdqw3bn04q A 418 19069 4933 PROD_DATA2 JDBC Thin Client enq: SQ - contention:SQ-6:0: 1397817350/21911/0 0 576 cg3urj768yv8g/9quv2rcpqjxwt A 576 18977 4346 PROD_DATA2 JDBC Thin Client enq: SQ - contention:SQ-6:0: 1397817350/21911/0 0 219 cg3urj768yv8g/8s97vpbnuktp2 A 546 943 .... 4082 PROD_DATA2 JDBC Thin Client enq: SQ - contention:SQ-6:14: 1397817350/21911/0 0 460 cg3urj768yv8g/520mkxqpf15q8 A 460 1522 7267 PROD_DATA2 JDBC Thin Client enq: SQ - contention:SQ-6:73: 1397817350/21911/0 0 415 cg3urj768yv8g/8s97vpbnuktp2 A 415 919 9532 PROD_DATA2 JDBC Thin Client DFS lock handle 1398145029/21916/0 0 353 2xw94b38cybhv/4001f9ha480pb A 354 358380 9716 PROD_DATA2 JDBC Thin Client DFS lock handle 1398145029/21916/0 0 264 2xw94b38cybhv/4001f9ha480pb A 265 2147 512 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 168 fhd852bb8agv6/gq6xnzt1spkzp A 654 533810 7001 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 178 4q0c1a3bsynyw/guwq1fpq9r1dy A 261 536140 11348 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 175 d3wp3qvbz5bna/1w0c0jhzmp1hh A 226 7299 11164 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 168 9fwybsrc3h1dd/70k37x7hwg0w5 A 701 1185 3096 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 220 9fwybsrc3h1dd/70k37x7hwg0w5 A 510 1522 8100 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 181 91r0dubcptymp/dsxmfbstazq7u A 649 95706 8192 PROD_DATA2 JDBC Thin Client buffer busy waits 24/274502/1 0 645 91r0dubcptymp/dsxmfbstazq7u A 645 94041 8783 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 175 91r0dubcptymp/dsxmfbstazq7u A 664 94039 6667 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 175 91r0dubcptymp/dsxmfbstazq7u A 685 1272 5445 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 175 91r0dubcptymp/dsxmfbstazq7u A 688 1185 38 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 199 91r0dubcptymp/dsxmfbstazq7u A 695 1263 9386 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 181 91r0dubcptymp/dsxmfbstazq7u A 685 95693 8868 PROD_DATA2 JDBC Thin Client buffer busy waits 53/118272/8 0 675 ghy358rghc4ty/3p3z1j33tk594 A 675 486608 6475 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 168 ghy358rghc4ty/3p3z1j33tk594 A 706 486608 6899 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 178 ftmka9gnhru7t/1w0c0jhzmp1hh A 738 442302 4003 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 175 8a5g4frpyags9/1w0c0jhzmp1hh A 401 435103 SID USERNAME MACHINE EVENT PARAM W WT SQL ST LT LOGON_TIME ------ ---------- ---------------------- ------------------------------ -------------------- ---- ---- ---------------------------- -- ------ ---------- 861 PROD_DATA2 dcb-srv-0343-vm02 buffer busy waits 60/1078781/1 0 504 fjnq76rpzzpdy/a6tzf0tjrwvqh A 505 20966 12361 PROD_DATA2 xen-4-211-vm05 buffer busy waits 60/1078781/1 0 402 fjnq76rpzzpdy/a6tzf0tjrwvqh A 403 20976 6213 TOAD JDBC Thin Client buffer busy waits 41/34704/571 0 542 5kjukxbrv5w3r/dbkpyhyfyvdpu A 542 1577749 7513 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 168 cvvzaj3syc76b/1w0c0jhzmp1hh A 703 502831 8015 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 175 1tg6ux7vc22zm/32bjxyxc0xwfs A 463 899 10817 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 205 1tg6ux7vc22zm/32bjxyxc0xwfs A 720 18977 12268 PROD_DATA2 JDBC Thin Client buffer busy waits 57/1302268/1 0 283 3szwvvzw5aq13/bpaj7frdt63mx A 284 82802 5959 PROD_DATA2 JDBC Thin Client log file switch (archiving nee 0/0/0 0 178 3szwvvzw5aq13/bpaj7frdt63mx A 583 75654 10975 PROD_DATA2 JDBC Thin Client read by other session 27/223916/1 0 2 2b47rbdt23jp8/a4td25ftgtqt8 A 40 51 10994 PROD_DATA2 JDBC Thin Client read by other session 27/223916/1 0 2 2b47rbdt23jp8/459f3z9u4fb3u A 34 34 10999 PROD_DATA2 JDBC Thin Client read by other session 27/223916/1 0 2 2b47rbdt23jp8/a4td25ftgtqt8 A 30 30 11015 PROD_DATA2 JDBC Thin Client read by other session 27/223916/1 0 2 2b47rbdt23jp8/a4td25ftgtqt8 A 8 9 4688 PROD_DATA2 JDBC Thin Client read by other session 20/1806690/1 0 2 4gzs73jtmhyd0/f2n7vrpx4v30v A 4 10 4626 PROD_DATA2 JDBC Thin Client db file sequential read 27/630098/1 0 0 11n7q3tu19vvp/adghpykwhquu2 A 1 23 10566 PROD_DATA2 JDBC Thin Client db file sequential read 57/780833/1 0 0 50731ypu4mu31/adghpykwhquu2 A 1 34 12527 PROD_DATA2 xen-21205-vm02 db file sequential read 28/168522/1 0 0 7szz51xufmuc4/520mkxqpf15q8 A 5 33 5489 PROD_DATA2 JDBC Thin Client read by other session 60/931838/1 0 1 1zbqmb9ukkhpd/459f3z9u4fb3u A 13 13 5063 PROD_DATA2 JDBC Thin Client db file sequential read 17/871948/1 0 2 g8jjuy5wt7h92/64yswjj6cdv5j A 12 16 SID USERNAME MACHINE EVENT PARAM W WT SQL ST LT LOGON_TIME ------ ---------- ---------------------- ------------------------------ -------------------- ---- ---- ---------------------------- -- ------ ---------- 10302 PROD_DATA2 JDBC Thin Client cursor: pin S wait on X 2044424941/278013233 0 44 42zra6dwxqwrd/cnm20xg9c333h A 44 44 07008/21474836480 11330 PROD_DATA2 JDBC Thin Client cursor: pin S wait on X 2044424941/278013233 0 15 42zra6dwxqwrd/523rw17fc75fb A 14 38 07008/21474836480 ..... 11698 PROD_DATA2 JDBC Thin Client cursor: pin S wait on X 2044424941/278013233 0 2 42zra6dwxqwrd/459f3z9u4fb3u A 1 2 07008/21474836480 6473 PROD_DATA2 JDBC Thin Client read by other session 6/4044/1 0 0 42zra6dwxqwrd/gzcbp0razp0vn A 44 46 10136 PROD_DATA2 JDBC Thin Client cursor: pin S wait on X 2044424941/278013233 0 25 42zra6dwxqwrd/523rw17fc75fb A 25 41 07008/21474836480 36320/21474836480 3669 PROD_DATA2 JDBC Thin Client cursor: pin S wait on X 2179751475/452904301 0 3 2vncw4q0ysrjm/g6btczrdb4fa1 A 3 47 36320/21474836480 4854 PROD_DATA2 JDBC Thin Client cursor: pin S wait on X 2179751475/452904301 0 3 2vncw4q0ysrjm/g6btczrdb4fa1 A 3 52 36320/21474836480 265 PROD_DATA2 JDBC Thin Client cursor: pin S wait on X 2179751475/452904301 0 3 2vncw4q0ysrjm/g6btczrdb4fa1 A 3 50 36320/21474836480 9355 PROD_DATA2 JDBC Thin Client cursor: pin S wait on X 2179751475/452904301 0 3 2vncw4q0ysrjm/g6btczrdb4fa1 A 3 52 ...... 36320/21474836480 4764 rows selected.
主要原因archive进程无法归档。ASM无法分配新的文件标识。
ASMCMD [+arch/item/archivelog/2012_10_19] > cp thread_1_seq_3.257.797108745 aaaaa.test; ASMCMD-8012: can not determine file type for file ORA-00569: Failed to acquire global enqueue. (DBD ERROR: OCIStmtExecute)
结合ASM trace log:
Wed Jun 05 07:39:52 2013 Dumping diagnostic data in directory=[cdmp_20130605073952], requested by (instance=2, osid=3061 (LMD0)), summary=[incident=412241]. Wed Jun 05 07:45:58 2013 Dumping diagnostic data in directory=[cdmp_20130605074557], requested by (instance=2, osid=3061 (LMD0)), summary=[incident=412242]. Wed Jun 05 07:52:44 2013 Dumping diagnostic data in directory=[cdmp_20130605075244], requested by (instance=2, osid=3061 (LMD0)), summary=[incident=412243]. Wed Jun 05 07:57:58 2013 Dumping diagnostic data in directory=[cdmp_20130605075758], requested by (instance=2, osid=3061 (LMD0)), summary=[incident=412244]. Wed Jun 05 08:03:59 2013 Dumping diagnostic data in directory=[cdmp_20130605080359], requested by (instance=2, osid=3061 (LMD0)), summary=[incident=412245]. Wed Jun 05 08:36:09 2013 SQL> /* ASMCMD */alter diskgroup /*ASMCMD*/ "data" add directory '+data/arch' SUCCESS: /* ASMCMD */alter diskgroup /*ASMCMD*/ "data" add directory '+data/arch' Wed Jun 05 08:42:40 2013 Dumping diagnostic data in directory=[cdmp_20130605084240], requested by (instance=2, osid=3061 (LMD0)), summary=[incident=463685]. Wed Jun 05 08:47:47 2013 Dumping diagnostic data in directory=[cdmp_20130605084746], requested by (instance=2, osid=3061 (LMD0)), summary=[incident=463686]. Wed Jun 05 08:52:18 2013 NOTE: ASM client item_1:item disconnected unexpectedly. NOTE: check client alert log. NOTE: Trace records dumped in trace file /u01/grid/11.2.0/diag/asm/+asm/+ASM1/trace/+ASM1_ora_25588.trc
早上07:39 ASM1 instance 出现问题 导致arch进程无法完成归档 进而导致后续的等待。
(DESCRIPTION=(ADDRESS=(PROTOCOL=beq)(PROGRAM=/u01/oracle/11.2.0/oracle/product/db_1/bin/oracle)(ARGV0=oracleitem_1)(ENVS='ORACLE_HOME=/u01/oracle/11.2.0/oracle/product/db_1,ORACLE_SID=item_1,LD_LIBRARY_PATH=')(ARGS='(DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))'))(CONNECT_DATA=(SID=item_1))) 2013-06-05 07:39:53.587: [ora.item.db][1607948608] {1:21407:34683} [check] InstConnection::connectInt: server not attached 2013-06-05 07:39:53.717: [ USRTHRD][1607948608] {1:21407:34683} InstConnection:~InstConnection: this 1fc95100 2013-06-05 07:39:53.718: [ora.item.db][1607948608] {1:21407:34683} [check] DbAgent::isDgbHandleFO } status = 16646 Wed Jun 05 07:39:52 2013 ARC3: Error 19504 Creating archive log file to '+ARCH' ARCH: Archival stopped, error occurred. Will continue retrying ORACLE Instance item_1 - Archival Error ORA-16038: log 7 sequence# 59248 cannot be archived ORA-19504: failed to create file "" ORA-00312: online log 7 thread 1: '+DATA/item/onlinelog/group_7.322.797746361'
For ASM2 instance:
Wed Jun 05 07:39:51 2013 Errors in file /u01/grid/11.2.0/diag/asm/+asm/+ASM2/trace/+ASM2_lmd0_3061.trc (incident=412241): ORA-04031: unable to allocate 3768 bytes of shared memory ("shared pool","unknown object","sga heap(1,0)","ges enqueues") Incident details in: /u01/grid/11.2.0/diag/asm/+asm/+ASM2/incident/incdir_412241/+ASM2_lmd0_3061_i412241.trc
GES resource已经无法在shared_pool 中初始化,对于这个问题还是比较少见的 针对ASM instance的shared_pool 4031,在11.2.0.3之后
oracle的建议值为memory_target =1536MB 针对这个参数oracle也没有放出相关的最佳时间文档,相信在下一个release的时候oracle会改变
memory_target的default value .目前默认为:
SQL> show parameter target NAME TYPE VALUE ------------------------------------ --------------------------------- ------------------------------ memory_max_target big integer 1084M memory_target big integer 1084M pga_aggregate_target big integer 0 sga_target big integer 0
由于ASM中的memory 同样要负责GES GCS GRD 等资源的分配,memory fragment是一个不可忽视的问题,只是对于这个case而言大家极少的关注到
ASM 端的4031问题,针对于ASM在11.2.0.3(11.2.0.3 changed some default behaviors)的表现,oracle也应该意识到了这个问题。可以参考如下
文档获取一些有用信息。1536M 应该是oracle确定的一个最小安全值 这个也是得到了研发的认可,没有设置此参数的同学需要注意。
另外针对这个RAC 即使架构为RAC ONE NODE oracle在一些resource control方面仍然是常规RAC的行为,而RAC node NODE 也仅仅是减少了DB端的
一些资源争用。
针对这个case 可以做如下工作:
1 针对11gR2 设置 ASM memory_target 到一个安全的值
2 针对长时间running的database machines 需要做定期重启工作。这次出现问题的机器连续跑了220多天。
3 监控asm log 即使ASM 在11g中的稳定性已经得到了很大的加强。