收到开发人员报告,oracle出现坏块,信息如下:

Hex dump of (file 7, block 1407500) in trace file /data/oracle/admin/orcl/bdump/orcl_p004_7383.trc
Corrupt block relative dba: 0x01d57a0c (file 7, block 1407500)
Fractured block found during crash/instance recovery
Data in bad block:
type: 6 format: 2 rdba: 0x01d57a0c
last change scn: 0x0000.e02e5f2d seq: 0x1 flg: 0x06
spare1: 0x0 spare2: 0x0 spare3: 0x0
consistency value in tail: 0x88150601
check value in block header: 0x462f
computed block checksum: 0xd738
Reread of rdba: 0x01d57a0c (file 7, block 1407500) found same corrupted data

至此,我们确定该坏块是 7月28号上午9:45分数据库server crash导致的。

解决思路:
首先确认该坏块所属对象

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 – 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> set pagesize 100
SQL> set linesize 144
SQL> col segment_name format a50

SQL> select
object_id||’-‘||segment_name||’/’||partition_name segment_name,
segment_type,
RELATIVE_FNO
from dba_extents a
,dba_objects b
where file_id = ‘7’
and 1407500 between block_id and block_id + blocks – 1
and a.SEGMENT_NAME = b.OBJECT_NAME
and (a.PARTITION_NAME = b.SUBOBJECT_NAME or a.PARTITION_NAME is null)
and a.owner = b.owner

SEGMENT_NAME SEGMENT_TYPE RELATIVE_FNO
—————— ————
159761-IDX_D_D_PRODUCT_ID/ INDEX 7

确认该坏块属于索引IDX_D_D_PRODUCT_ID。先重建索引暂时解决问题:

SQL> alter index IDX_D_D_PRODUCT_ID rebuild online tablespace pur_index ;
Index altered.


使用dbv检查:

[oracle@rac1 ~]$ dbv file=/data/oracle/oradata/orcl/purchase03.dbf blocksize=8192
DBVERIFY: Release 10.2.0.1.0 – Production on Thu Jul 29 10:53:48 2010
Copyright (c) 1982, 2005, Oracle. All rights reserved.
DBVERIFY – Verification starting : FILE = /data/oracle/oradata/orcl/purchase03.dbf
DBV-00200: Block, dba 30767628, already marked corrupted
DBVERIFY – Verification complete
Total Pages Examined : 4194302
Total Pages Processed (Data) : 3508146
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 605887
Total Pages Failing (Index): 0
Total Pages Processed (Other): 75876
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 4393
Total Pages Marked Corrupt : 1
Total Pages Influx : 0
Highest block SCN : 3781089739 (0.3781089739)

我们也可以使用如下过程确定坏块属于哪个block:
SQL> select dbms_utility.data_block_address_file(30767628) from dual;
DBMS_UTILITY.DATA_BLOCK_ADDRESS_FILE(30767628)

7
Elapsed: 00:00:00.23
SQL> select dbms_utility.data_block_address_block(30767628) from dual;
DBMS_UTILITY.DATA_BLOCK_ADDRESS_BLOCK(30767628)

1407500

其次要确认坏块类型是物理坏块还是逻辑坏块
for physical corruption check: backup validate datafile ‘filename’;
for logical corruption check: backup check logical validate datafile ‘filename’
then find out whether there is corruption from dynamic view V$DATABASE_BLOCK_CORRUPTION
先进行物理坏块检测
RMAN> backup validate datafile 7;
Starting backup at 29-JUL-10
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=1736 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00007 name=/data/oracle/oradata/orcl/purchase03.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:35
Finished backup at 29-JUL-10
sys@std01> select * from V$DATABASE_BLOCK_CORRUPTION;
FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
———- ———- —————— ———
7 1407500 1 0 CORRUPT

确认为物理坏块
最后也是最重要的一步,进行坏块恢复,


RMAN> BLOCKRECOVER CORRUPTION LIST;
Starting blockrecover at 29-JUL-10
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=455 devtype=DISK
channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00007
channel ORA_DISK_1: reading from backup piece /backup/rman/rac1_ORCL_20100729_eoljvn41_1_1.bak

channel ORA_DISK_1: restored block(s) from backup piece 1
piece handle=/backup/rman/rac1_ORCL_20100729_eoljvn41_1_1.bak tag=TAG20100729T050504
channel ORA_DISK_1: block restore complete, elapsed time: 00:57:56
failover to previous backup
channel ORA_DISK_1: restoring block(s)
channel ORA_DISK_1: specifying block(s) to restore from backup set
restoring blocks of datafile 00007
channel ORA_DISK_1: reading from backup piece /backup/rman/rac1_ORCL_20100728_eiljt2ju_1_1.bak

channel ORA_DISK_1: restored block(s) from backup piece 1
piece handle=/backup/rman/rac1_ORCL_20100728_eiljt2ju_1_1.bak tag=TAG20100728T050253
channel ORA_DISK_1: block restore complete, elapsed time: 00:58:36
starting media recovery
media recovery complete, elapsed time: 00:12:15
Finished blockrecover at 29-JUL-10


命令完成,分别使用DBV和backup validate命令进行验证


[oracle@rac1 ~]$ dbv file=/data/oracle/oradata/orcl/purchase03.dbf blocksize=8192
DBVERIFY: Release 10.2.0.1.0 – Production on Thu Jul 29 19:04:45 2010
Copyright (c) 1982, 2005, Oracle. All rights reserved.
DBVERIFY – Verification starting : FILE = /data/oracle/oradata/orcl/purchase03.dbf

DBVERIFY – Verification complete
Total Pages Examined : 4194302
Total Pages Processed (Data) : 3507791
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 606239
Total Pages Failing (Index): 0
Total Pages Processed (Other): 75879
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 4393
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Highest block SCN : 3785553739 (0.3785553739)

使用 backup validate datafile 进行验证:


验证前:
SQL> select * from V$DATABASE_BLOCK_CORRUPTION;
FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
———- ———- —————— ———
7 1407500 1 0 CORRUPT
[oracle@rac1 ~]$ rman target /
Recovery Manager: Release 10.2.0.1.0 – Production on Thu Jul 29 21:17:10 2010
Copyright (c) 1982, 2005, Oracle. All rights reserved.
connected to target database: ORCL (DBID=1185342296)
RMAN> backup validate datafile 7;
Starting backup at 29-JUL-10
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=455 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00007 name=/data/oracle/oradata/orcl/purchase03.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:02:35
Finished backup at 29-JUL-10
执行完上述命令,坏块从数据字典里消失:
SQL> select * from V$DATABASE_BLOCK_CORRUPTION;
no rows selected

问题得到解决。

—————————————————————————-
下面模拟一种特殊情况,CORRUPTION BLOCKS上面还不存在object 这时候用BLOCKRECOVER CORRUPTION LIST是恢复不了的,我们提供下面一种方法。
首先查找已经format的块,也就是所谓的存在object的块:

SQL>create tablespace block datafile ‘/u01/app/oracle/oradata/axx/block01.dbf’ size 100M;

tablespace created

SQL> select file_id from dba_data_files where tablespace_name=’BLOCK’;

FILE_ID
———-
6

SQL> select OWNER,SEGMENT_NAME from dba_extents a where file_id=6 and 500 between a.block_id and a.block_id+a.blocks;

no rows selected

SQL> create table tmp_1 (id int) tablespace block;

Table created.

找出BLOCK中已经使用的最大块:

SQL> select max(BLOCK_ID+blocks) from dba_extents a where tablespace_name=’BLOCK’;

MAX(BLOCK_ID+BLOCKS)
——————–
489

下面模拟损坏datafile 6 的 第 500个block;

[oracle@liu ~]$ dd of=/u01/app/oracle/oradata/axx/block01.dbf bs=8192 seek=500 conv=notrunc count=1 if=/dev/zero
1+0 records in
1+0 records out

RMAN> backup validate datafile 6;

Starting backup at 08-JAN-12
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=132 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00006 name=/u01/app/oracle/oradata/axx/block01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:02
Finished backup at 08-JAN-12

[oracle@liu ~]$ dbv file=/u01/app/oracle/oradata/axx/block01.dbf blocksize=8192;

DBVERIFY: Release 10.2.0.1.0 – Production on Sun Jan 8 20:17:19 2012

Copyright (c) 1982, 2005, Oracle. All rights reserved.

DBVERIFY – Verification starting : FILE = /u01/app/oracle/oradata/axx/block01.dbf
Page 500 is marked corrupt
Corrupt block relative dba: 0x018001f4 (file 6, block 500)
Completely zero block found during dbv:

DBVERIFY – Verification complete

Total Pages Examined : 12800
Total Pages Processed (Data) : 613
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing (Index): 0
Total Pages Processed (Other): 61
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 12125
Total Pages Marked Corrupt : 1
Total Pages Influx : 0
Highest block SCN : 1178159 (0.1178159)
[oracle@liu ~]$ sqlplus ‘/as sysdba’

SQL*Plus: Release 10.2.0.1.0 – Production on Sun Jan 8 20:17:30 2012

Copyright (c) 1982, 2005, Oracle. All rights reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 – Production
With the Partitioning, OLAP and Data Mining options

SQL> select * from V$DATABASE_BLOCK_CORRUPTION;

FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
———- ———- ———- —————— ———
6 500 1 0 ALL ZERO

SQL> alter system switch logfile;

System altered.

可以看到第500个block已经被标记为 Corrupt,下面我们通过创建object format这个block;

SQL> create table tmp_2 (id int) tablespace block;

Table created.

创建一个monitor trigger 达到第500个block后自动终止操作,由于这次只是模拟,datafile很小,在实际环境中可能出现Corrupt block在datafile的尾部接近max block,这时候可以
通过创建一张temp table allocate extent达到接近这个max block 然后再创建填充数据的table.


SQL> create or replace trigger trigger_monitor after insert on tmp_2 for each row
2 declare
3 check_block_id number;
4 begin
5 select dbms_rowid.rowid_block_number(:new.rowid) into check_block_id from dual;
6 if check_block_id=500 then
7 raise_application_error(-20201,’curroption block has been format’);
8 end if;
9 end;
10
11 /

Trigger created

SQL>
SQL>
SQL> begin
2 for i in 1..1000000 loop
3 insert into tmp_2 values (i);
4 commit;
5 end loop;
6 end;
7 /

begin
for i in 1..1000000 loop
insert into tmp_2 values (i);
commit;
end loop;
end;

ORA-20201: curroption block has been format
ORA-06512: at “LIU.TRIGGER_MONITOR”, line 6
ORA-04088: error during execution of trigger ‘LIU.TRIGGER_MONITOR’
ORA-06512: at line 4

第500个block已经被填充,下面我们来验证结果:


[oracle@liu ~]$ rman target /

Recovery Manager: Release 10.2.0.1.0 – Production on Sun Jan 8 20:06:25 2012

Copyright (c) 1982, 2005, Oracle. All rights reserved.

connected to target database: AXX (DBID=665578413)

RMAN> backup validate datafile 6;

Starting backup at 08-JAN-12
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=133 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00006 name=/u01/app/oracle/oradata/axx/block01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
Finished backup at 08-JAN-12

RMAN> exit

Recovery Manager complete.
[oracle@liu ~]$ sqlplus ‘/as sysdba’

SQL*Plus: Release 10.2.0.1.0 – Production on Sun Jan 8 20:07:11 2012

Copyright (c) 1982, 2005, Oracle. All rights reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 – Production
With the Partitioning, OLAP and Data Mining options

SQL> select * from V$DATABASE_BLOCK_CORRUPTION;

no rows selected

SQL> !
[oracle@liu ~]$ dbv file=/u01/app/oracle/oradata/axx/block01.dbf blocksize=8192;

DBVERIFY: Release 10.2.0.1.0 – Production on Sun Jan 8 20:09:07 2012

Copyright (c) 1982, 2005, Oracle. All rights reserved.

DBVERIFY – Verification starting : FILE = /u01/app/oracle/oradata/axx/block01.dbf

DBVERIFY – Verification complete

Total Pages Examined : 12800
Total Pages Processed (Data) : 614
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing (Index): 0
Total Pages Processed (Other): 61
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 12125
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Highest block SCN : 1178159 (0.1178159)

OK 已经恢复了,如果这种情况下再采用rman 的blockrecover 是恢复不了这个block的 我们来测试下:
制造坏块,同时确保max block 小于500;

SQL> select max(BLOCK_ID+blocks) from dba_extents a where tablespace_name=’BLOCK’;

MAX(BLOCK_ID+BLOCKS)
——————–
489

RMAN> backup database ;

Starting backup at 08-JAN-12
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=132 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00001 name=/u01/app/oracle/oradata/axx/system01.dbf
input datafile fno=00003 name=/u01/app/oracle/oradata/axx/sysaux01.dbf
input datafile fno=00005 name=/u01/app/oracle/oradata/axx/tbs_stream01.dbf
input datafile fno=00006 name=/u01/app/oracle/oradata/axx/block01.dbf
input datafile fno=00002 name=/u01/app/oracle/oradata/axx/undotbs01.dbf
input datafile fno=00004 name=/u01/app/oracle/oradata/axx/users01.dbf
channel ORA_DISK_1: starting piece 1 at 08-JAN-12
channel ORA_DISK_1: finished piece 1 at 08-JAN-12
piece handle=/u01/app/oracle/product/10.2.0/db_1/dbs/07n09aa7_1_1 tag=TAG20120108T202335 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:01:05
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
including current control file in backupset
including current SPFILE in backupset
channel ORA_DISK_1: starting piece 1 at 08-JAN-12
channel ORA_DISK_1: finished piece 1 at 08-JAN-12
piece handle=/u01/app/oracle/product/10.2.0/db_1/dbs/08n09ac8_1_1 tag=TAG20120108T202335 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
Finished backup at 08-JAN-12

RMAN> exit

[oracle@liu ~]$ dd of=/u01/app/oracle/oradata/axx/block01.dbf bs=8192 seek=500 conv=notrunc count=1 if=/dev/zero
1+0 records in
1+0 records out

[oracle@liu ~]$ rman target /

Recovery Manager: Release 10.2.0.1.0 – Production on Sun Jan 8 20:23:01 2012

Copyright (c) 1982, 2005, Oracle. All rights reserved.

connected to target database: AXX (DBID=665578413)

RMAN> backup validate datafile 6;

Starting backup at 08-JAN-12
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=131 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00006 name=/u01/app/oracle/oradata/axx/block01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 08-JAN-12

SQL> select * from V$DATABASE_BLOCK_CORRUPTION;

FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
———- ———- ———- —————— ———
6 500 1 0 ALL ZERO

RMAN> blockrecover corruption list;

Starting blockrecover at 08-JAN-12
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=134 devtype=DISK

starting media recovery
media recovery complete, elapsed time: 00:00:00

Finished blockrecover at 08-JAN-12

SQL> select * from V$DATABASE_BLOCK_CORRUPTION;

FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
———- ———- ———- —————— ———
6 500 1 0 ALL ZERO

[oracle@liu ~]$ dbv file=/u01/app/oracle/oradata/axx/block01.dbf blocksize=8192;

DBVERIFY: Release 10.2.0.1.0 – Production on Sun Jan 8 20:26:51 2012

Copyright (c) 1982, 2005, Oracle. All rights reserved.

DBVERIFY – Verification starting : FILE = /u01/app/oracle/oradata/axx/block01.dbf
Page 500 is marked corrupt
Corrupt block relative dba: 0x018001f4 (file 6, block 500)
Completely zero block found during dbv:

DBVERIFY – Verification complete

Total Pages Examined : 12800
Total Pages Processed (Data) : 613
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing (Index): 0
Total Pages Processed (Other): 61
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 12125
Total Pages Marked Corrupt : 1
Total Pages Influx : 0
Highest block SCN : 1189579 (0.1189579)

可以看到坏块依然存在