今天报表数据库的备库出现了问题,由于监控脚本出现了问题,主机空间耗尽 而没有及时发出邮件,导致归档无法进行,DB停滞在一个时间点,同时在主库 由于只保留了2天的归档 导致这部分归档没有传输至备库,等我们发现问题时,主库已经删除了归档,备库接近8天gap无法恢复。由于主库超过3个T ,重新用备份恢复一个standby 时间上是不允许的,并且主库上也没有空间存放这么大的备份集,于是采取了增量备份的方法,跳过了这个GAP。下面记录这次恢复过程

查看备库日志发现:

Tue Mar 13 02:45:00 2012
ARC1: Closing local archive destination LOG_ARCHIVE_DEST_1: ‘/data/oracle/oradata/track1/arch/1_30021_754336076.dbf’ (error 19502) (track1)
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance track1 – Archival Error
ORA-16038: log 7 sequence# 30021 cannot be archived
ORA-19502: write error on file “”, block number (block size=)
ORA-00312: online log 7 thread 1: ‘/data/oracle/oradata/track1/standbyredo7.log’
Tue Mar 13 02:45:00 2012
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance track1 – Archival Error

Tue Mar 13 04:06:56 2012
Non critical error ORA-00001 caught while writing to trace file “/data/oracle/diag/rdbms/trackdg/track1/trace/track1_rfs_9177.trc”
Error message: Linux-x86_64 Error: 28: No space left on device
Additional information: 1
Writing to the above trace file is disabled for now on…
Creating archive destination file : /data/oracle/oradata/track1/arch/1_30038_754336076.dbf (808922 blocks)
Non critical error ORA-00001 caught while writing to trace file “/data/oracle/diag/rdbms/trackdg/track1/trace/track1_rfs_9175.trc”
Error message: Linux-x86_64 Error: 28: No space left on device
Additional information: 1
Writing to the above trace file is disabled for now on…

Fetching gap sequence in thread 1, gap sequence 30028-30127
Mon Mar 19 20:35:42 2012
Fetching gap sequence in thread 1, gap sequence 30028-30127
Mon Mar 19 20:35:52 2012
Fetching gap sequence in thread 1, gap sequence 30028-30127
Mon Mar 19 20:36:02 2012
Fetching gap sequence in thread 1, gap sequence 30028-30127
Mon Mar 19 20:36:12 2012
Fetching gap sequence in thread 1, gap sequence 30028-30127

发现从13号的2点 主机space 已经被耗尽. 13-17号的归档全部丢失(gap sequence 30028-30127)近100个归档

下面采取增量备份的方法恢复这个数据库:

在备库查找当前的SCN号

SQL> select current_scn from v$database;

CURRENT_SCN
———————–
16895677901

在主库用这个SCN 通过增量备份 备份出新的backupset :

RMAN> run {
2> allocate channel c1 device type disk;
3> allocate channel c2 device type disk;
4> allocate channel c3 device type disk;
5> allocate channel c4 device type disk;
6> BACKUP INCREMENTAL FROM SCN 16895677901 DATABASE FORMAT ‘/data/track_standby_%U’ tag ‘track_standby’;
7> release channel c1;
8> release channel c2;
9> release channel c3;
10> release channel c4;
11> }

将这个备份集 SCP 到备库的/data2 目录下 由于主备采用了convert 转换了文件的路径 需要通过set newname 来恢复备库 大致脚本如下:


[oracle@db59 track1]$ rman target /

Recovery Manager: Release 11.2.0.2.0 – Production on Mon Mar 19 23:44:58 2012

Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.

connected to target database: TRACK1 (DBID=3812187020, not open)

RMAN> CATALOG START WITH ‘/data2/track_standby’;

searching for all files that match the pattern /data2/track_standby

List of Files Unknown to the Database
=====================================
File Name: /data2/track_standby_76n694la_1_1

.. …

….

File Name: /data2/track_standby_9bn69dqo_1_1 (一共71个)

Do you really want to catalog the above files (enter YES or NO)? YES
cataloging files…
cataloging done

List of Cataloged Files
=======================
File Name: /data2/track_standby_76n694la_1_1

File Name: /data2/track_standby_9bn69dqo_1_1

RMAN> run {
allocate channel c1 device type disk;
allocate channel c2 device type disk;
allocate channel c3 device type disk;
allocate channel c4 device type disk;
recover database noredo;
release channel c1;
release channel c2;
release channel c3;
release channel c4;
}

[oracle@db59 ~]$ tailalert

Incremental restore complete of datafile 280 /data/oracle/oradata/track1/lg_track3_100.dbf
checkpoint is 36031098956
Incremental restore complete of datafile 294 /data/oracle/oradata/track1/lg_track4_004.dbf
checkpoint is 36031098956
Mon Mar 19 17:54:45 2012
Incremental restore complete of datafile 28 /data/oracle/oradata/track1/lg_track18.dbf
checkpoint is 36025918455
last deallocation scn is 19965755207
Incremental restore complete of datafile 286 /data2/oracle/oradata/track1/lg_track119.dbf
checkpoint is 36025918455
Incremental restore complete of datafile 245 /data/oracle/oradata/track1/lg_track3_68.dbf
checkpoint is 36025918455
Incremental restore complete of datafile 241 /data/oracle/oradata/track1/lg_track3_64.dbf

..

完成后,由于主库在期间添加过数据文件,使用rman copy 将 file 296,297 copy 到本地,scp至备库的相应位置,主库生成standby controlfile 在备库通过rename file 重新定义file路径:


alter database rename file ‘/data/oracle/oradata/track1/system.dbf’ to ‘/data/oracle/oradata/track1/system.dbf’;
..
alter database rename file ‘/storage/disk01/lg_track3_92.dbf’ to ‘/data/oracle/oradata/track1/lg_track3_92.dbf’;
alter database rename file ‘/storage/disk02/lg_track3_93.dbf’ to ‘/data/oracle/oradata/track1/lg_track3_93.dbf’;
alter database rename file ‘/storage/disk03/lg_track3_94.dbf’ to ‘/data/oracle/oradata/track1/lg_track3_94.dbf’;
alter database rename file ‘/storage/disk04/lg_track3_95.dbf’ to ‘/data/oracle/oradata/track1/lg_track3_95.dbf’;
alter database rename file ‘/storage/disk05/lg_track3_96.dbf’ to ‘/data/oracle/oradata/track1/lg_track3_96.dbf’;
alter database rename file ‘/storage/disk06/lg_track3_97.dbf’ to ‘/data/oracle/oradata/track1/lg_track3_97.dbf’;
alter database rename file ‘/storage/disk07/lg_track3_98.dbf’ to ‘/data/oracle/oradata/track1/lg_track3_98.dbf’;
alter database rename file ‘/storage/disk00/lg_track3_99.dbf’ to ‘/data/oracle/oradata/track1/lg_track3_99.dbf’;
alter database rename file ‘/storage/disk01/lg_track3_100.dbf’ to ‘/data/oracle/oradata/track1/lg_track3_100.dbf’;
..
….

alter database rename file ‘/storage/disk06/lg_track4_006.dbf’ to ‘/data2/oracle/oradata/track1/lg_track4_006.dbf’;
alter database rename file ‘/storage/disk02/lg_track4_007.dbf’ to ‘/data2/oracle/oradata/track1/lg_track4_007.dbf’;

添加standby logfile


SQL> alter system set standby_file_management=MANUAL;

System altered.

ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 group 6(‘/data2/oracle/oradata/track1/standbyredo6.log’) SIZE 500M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 group 7(‘/data2/oracle/oradata/track1/standbyredo7.log’) SIZE 500M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 group 8(‘/data2/oracle/oradata/track1/standbyredo8.log’) SIZE 500M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 group 9(‘/data2/oracle/oradata/track1/standbyredo9.log’) SIZE 500M;
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 group 10(‘/data2/oracle/oradata/track1/standbyredo10.log’) SIZE 500M;

开启日志apply


SQL> alter system set standby_file_management=auto;

System altered.

SQL> alter database recover managed standby database disconnect from session using current logfile;

观察日志 经过将近4个小时 终于追上了主库


Recovery of Online Redo Log: Thread 1 Group 12 Seq 31152 Reading mem 0
Mem# 0: /data2/oracle/oradata/track1/standbyredo12.log
Tue Mar 20 00:28:09 2012
RFS[3]: Selected log 11 for thread 1 sequence 31153 dbid -482780276 branch 754336076
Tue Mar 20 00:28:13 2012
Archived Log entry 101 added for thread 1 sequence 31152 ID 0xe339e88c dest 1:
Tue Mar 20 00:28:16 2012
Media Recovery Waiting for thread 1 sequence 31153 (in transit)
Recovery of Online Redo Log: Thread 1 Group 11 Seq 31153 Reading mem 0
Mem# 0: /data2/oracle/oradata/track1/standbyredo11.log
Tue Mar 20 00:32:40 2012
RFS[3]: Selected log 12 for thread 1 sequence 31154 dbid -482780276 branch 754336076
Tue Mar 20 00:32:42 2012
Media Recovery Waiting for thread 1 sequence 31154 (in transit)
Recovery of Online Redo Log: Thread 1 Group 12 Seq 31154 Reading mem 0

NAME VALUE UNIT TIME_COMPUTED DATUM_TIME
—————————— ——————– —————————— —————————— ——————————
transport lag +00 00:00:00 day(2) to second(0) interval 03/20/2012 00:37:51 03/20/2012 00:37:51
apply lag +00 00:00:18 day(2) to second(0) interval 03/20/2012 00:37:51 03/20/2012 00:37:51
apply finish time +00 00:00:00.078 day(2) to second(3) interval 03/20/2012 00:37:51
estimated startup time 17 second 03/20/2012 00:37:51

SQL> select current_scn from v$database;

CURRENT_SCN
—————–
36050243871

SQL> select * from v$archive_gap;

no rows selected

在这种比较极端的情况下选择这种恢复的方法,如果重做备库 将花费大量的时间与空间,使用增量备份,极大的缩短了恢复的时间