帮朋友check一次中国银联数据库的坏块问题(并不是严格意义上的坏块),大致情况为smon需要trascation recover —>特定的block 发现无法读取 —>transaction recover出现问题 —>smon terminated
Wed Aug 15 19:02:57 2012
SMON: Parallel transaction recovery slave got internal error
SMON: Downgrading transaction recovery to serial
Wed Aug 15 19:02:57 2012
Errors in file /oracle/admin/settora/bdump/settora_smon_4755904.trc:
ORA-00600: internal error code, arguments: [kdourp_inorder2], [43], [43], [7], [44], [], [], []
Doing block recovery for file 10 block 176769
Doing block recovery for file 26 block 3259716
Block recovery from logseq 108030, block 49455 to scn 41320651147
Wed Aug 15 19:02:58 2012
Recovery of Online Redo Log: Thread 1 Group 5 Seq 108030 Reading mem 0
Mem# 0: /orasvr/settora/redo05.log
Block recovery completed at rba 108030.49555.16, scn 9.2665945485
ORACLE Instance settora (pid = 8) – Error 600 encountered while recovering transaction (7, 19) on object 331910.
过一段时间数据库就会down掉,从日志信息里面可以看到:
Wed Aug 15 19:23:18 2012
SMON: Parallel transaction recovery slave got internal error
SMON: Downgrading transaction recovery to serial
Wed Aug 15 19:23:18 2012
Errors in file /oracle/admin/settora/bdump/settora_smon_4755904.trc:
ORA-00600: internal error code, arguments: [kdourp_inorder2], [43], [43], [7], [44], [], [], []
Wed Aug 15 19:23:20 2012
Errors in file /oracle/admin/settora/bdump/settora_pmon_4690184.trc:
ORA-00474: SMON process terminated with error
Wed Aug 15 19:23:20 2012
PMON: terminating instance due to error 474
Instance terminated by PMON, pid = 4690184
执行下面SQL
select /*+full(a)*/ from PRO_SETTLE3.SHOPACCOUNT a;
returns:ORA-00600: internal error code, arguments: [kdourp_inorder2], [43], [43], [7], [44], [], [], []
使用dbv工具来检测:
$ dbv file=/orasvr1/settle5/settle000.dbf blocksize=8192 DBVERIFY: Release 10.2.0.4.0 - Production on Thu Aug 16 09:40:49 2012 Copyright (c) 1982, 2007, Oracle. All rights reserved. DBVERIFY - Verification starting : FILE = /orasvr1/settle5/settle000.dbf kdrchk: row is marked as both Last and Next continue prow=0x1101aec98 flag=0x07 Block Checking: DBA = 42119809, Block Type = KTB-managed data block data header at 0x1101ae07c kdbchk: bad row tab 0, slot 51 Page 176769 failed with check code 6253 DBVERIFY - Verification complete Total Pages Examined : 386688 Total Pages Processed (Data) : 282817 Total Pages Failing (Data) : 1 Total Pages Processed (Index): 98760 Total Pages Failing (Index): 0 Total Pages Processed (Other): 2707 Total Pages Processed (Seg) : 0 Total Pages Failing (Seg) : 0 Total Pages Empty : 2404 Total Pages Marked Corrupt : 0 Total Pages Influx : 0 Highest block SCN : 2666197038 (9.2666197038)
Total Pages Failing (Data) : 1
Page 176769 被标识为corruption
Total Pages Examined : 512 Number of blocks looked at Total Pages Processed (Data) : 1 Number of TABLE blocks seen Total Pages Failing (Data) : 1 Number of TABLE blocks with internal inconsistencies Total Pages Processed (Index): 0 Number of INDEX blocks seen Total Pages Failing (Index): 0 Number of INDEX block with internal inconsistencies Total Pages Empty : 507 Number of unused blocks seen Total Pages Marked Corrupt : 2 Number of blocks with corrupt cache wrappers Total Pages Influx : 0 Number of pages we re-read as the page looked like it was being modified when it was first read.
在数据库中并没有关于坏块的记录存在。这并不是一个普通意义上的坏块。对于这种块使用corrupt skip是没有用的skip_corrupt_blocks只能使oracle跳过Oracle能够读出的块,而如果在操作系统层read调用就失败的,则不能跳过该块。
oracle 在read page 176769时发生了error 6253 (cannot read arguments from address file,其实这并不是真正的root cause,我在和huangyong讨论过后基本认定为一个内部的inconsistent block) oracle 不能 mark 为 corrupt block.
这里的思路为使用rowid来跳过这些有问题的block 大致方法如下
SQL> SELECT dbms_rowid.rowid_create(1,331910,10,176769,0) LOW_RID from DUAL;
LOW_RID
——————
AABRCGAAKAAArKBAAA
SQL> SELECT dbms_rowid.rowid_create(1,331910,10,176770,0) HI_RID from dual;
HI_RID
——————
AABRCGAAKAAArKCAAA
SQL> SELECT dbms_rowid.rowid_create(1,331910,26,3259716,0) LOW_RID from DUAL;
LOW_RID
——————
AABRCGAAaAAMb1EAAA
SQL> SELECT dbms_rowid.rowid_create(1,331910,26,3259717,0) HI_RID from DUAL;
HI_RID
——————
AABRCGAAaAAMb1FAAA
SQL> create table PRO_SETTLE3.SHOPACCOUNT_1 as select /*+ rowid(a) */ * from PRO_SETTLE3.SHOPACCOUNT a where rowid < 'AABRCGAAKAAArKBAAA'; Table created. SQL> insert into PRO_SETTLE3.SHOPACCOUNT_1 select /*+ rowid(a) */ * from PRO_SETTLE3.SHOPACCOUNT a where rowid >= ‘AABRCGAAKAAArKCAAA’ and rowid < 'AABRCGAAaAAMb1EAAA'; 49626 rows created. SQL> insert into PRO_SETTLE3.SHOPACCOUNT_1 select /*+ rowid(a) */ * from PRO_SETTLE3.SHOPACCOUNT a where rowid >= ‘AABRCGAAaAAMb1FAAA’;
208837 rows created.
SQL> select count(*) from PRO_SETTLE3.SHOPACCOUNT_1;
COUNT(*)
———-
262256
之后drop掉这张有问题的table restart database 恢复正常.
同样我们可以采用Kerry Osborne的脚本来实现.这里稍微修改了下 增加了选择tablespace 的功能temp:
eg:
[oracle@testdb ~]$ dd if=/dev/zero of=/data/oracle/oradata/wuxuan1/liu.dbf bs=8192 seek=1798 count=1 conv=notrunc 1+0 records in 1+0 records out 8192 bytes (8.2 kB) copied, 5.2e-05 seconds, 158 MB/s [oracle@testdb ~]$ sqlplus liu/liu SQL*Plus: Release 11.2.0.3.0 Production on Fri Aug 17 02:32:20 2012 Copyright (c) 1982, 2011, Oracle. All rights reserved. Error accessing PRODUCT_USER_PROFILE Warning: Product user profile information not loaded! You may need to run PUPBLD.SQL as SYSTEM Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production With the Partitioning, OLAP, Data Mining and Real Application Testing options SQL> select count(*) from temp; select count(*) from temp * ERROR at line 1: ORA-01578: ORACLE data block corrupted (file # 4, block # 1798) ORA-01110: data file 4: '/data/oracle/oradata/wuxuan1/liu.dbf' SQL> @temp WARNING: This script may issue a DROP TABLE command. Do not execute it unless you have read through it and are comfortable you know what it does. Ready? (hit ctl-C to quit) Enter value for owner_name: LIU old 8: v_owner_name varchar2(30) := upper('&owner_name'); new 8: v_owner_name varchar2(30) := upper('LIU'); Enter value for table_name: TEMP old 9: v_table_name varchar2(30) := upper('&table_name'); new 9: v_table_name varchar2(30) := upper('TEMP'); Enter value for tablespace: LIU old 11: v_tablespace varchar2(30) := upper('&tablespace'); new 11: v_tablespace varchar2(30) := upper('LIU'); Saved 13247 records in TEMP_SAVED. 201 bads records in TEMP_BAD. PL/SQL procedure successfully completed. SQL> SQL> select count(*) from temp_saved; COUNT(*) ---------- 13247 SQL> select * from v$database_block_corruption; FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO ---------- ---------- ---------- ------------------ --------- 4 258 1 0 ALL ZERO 4 1798 1 0 ALL ZERO
block 1798 已经被标识为media corruption
手动构造rowid
SQL> select * from temp_bad where rownum<5; OLD_ROWID OLD_FILE OLD_OBJECT OLD_BLOCK OLD_ROW ------------------------------ ---------- ---------- ---------- ---------- ERROR_MESSAGE -------------------------------------------------------------------------------- 4.1798.181 13646 4 1798 181 ORA-01578: ORACLE data block corrupted (file # 4, block # 1798) ORA-01110: data file 4: '/data/oracle/oradata/wuxuan1/liu.dbf' 4.1798.182 13646 4 1798 182 ORA-01578: ORACLE data block corrupted (file # 4, block # 1798) ORA-01110: data file 4: '/data/oracle/oradata/wuxuan1/liu.dbf' 4.1798.183 13646 4 1798 183 OLD_ROWID OLD_FILE OLD_OBJECT OLD_BLOCK OLD_ROW ------------------------------ ---------- ---------- ---------- ---------- ERROR_MESSAGE -------------------------------------------------------------------------------- ORA-01578: ORACLE data block corrupted (file # 4, block # 1798) ORA-01110: data file 4: '/data/oracle/oradata/wuxuan1/liu.dbf' 4.1798.184 13646 4 1798 184 ORA-01578: ORACLE data block corrupted (file # 4, block # 1798) ORA-01110: data file 4: '/data/oracle/oradata/wuxuan1/liu.dbf' SQL> select dbms_rowid.rowid_create(1,13646,4,1798,181) from dual; DBMS_ROWID.ROWID_C ------------------ AAADVOAAEAAAAcGAC1 SQL> select dbms_rowid.rowid_create(1,13646,4,1798,182) from dual; DBMS_ROWID.ROWID_C ------------------ AAADVOAAEAAAAcGAC2 SQL> select * from temp where rowid ='AAADVOAAEAAAAcGAC1'; select * from temp where rowid ='AAADVOAAEAAAAcGAC1' * ERROR at line 1: ORA-01578: ORACLE data block corrupted (file # 4, block # 1798) ORA-01110: data file 4: '/data/oracle/oradata/wuxuan1/liu.dbf' SQL> select * from temp where rowid ='AAADVOAAEAAAAcGAC2'; select * from temp where rowid ='AAADVOAAEAAAAcGAC2' * ERROR at line 1: ORA-01578: ORACLE data block corrupted (file # 4, block # 1798) ORA-01110: data file 4: '/data/oracle/oradata/wuxuan1/liu.dbf'
Reference:
Extracting Data from a Corrupt Table using ROWID Range Scans in Oracle8 and higher [ID 61685.1]
Extracting Data from a Corrupt Table using DBMS_REPAIR or Event 10231 [ID 33405.1]
Extract rows from a CORRUPT table creating ROWID from DBA_EXTENTS [ID 422547.1]
就一个逻辑坏块啊,bbed估计就搞定了,基于rowid,block内的数据就丢了。
不是逻辑坏块 bbed估计够呛