VMCD.ORG » Some exadata disk tips

A summary of the I/O errors that Exadata has been reporting frequently lately.

database node alert log

ORA-27603: Cell storage I/O error, I/O failed on disk o/192.168.10.5/DATA_DM01_CD_08_dm01cel03 at offset 17331625984 for data length 253952
ORA-27626: Exadata error: 201 (Generic I/O error)
WARNING: Read Failed. group:1 disk:32 AU:4132 offset:761856 size:253952
path:o/192.168.10.5/DATA_DM01_CD_08_dm01cel03
         incarnation:0x802360d9 asynchronous result:'I/O error'
         subsys:OSS iop:0x2b8c42c03640 bufp:0x2b8c42fc4e00 osderr:0xc9 osderr1:0x0
         Exadata error:'Generic I/O error'
         IO elapsed time: 18021514 usec Time waited on I/O: 18013517 usec
WARNING: failed to read mirror side 1 of virtual extent 2039 logical extent 0 of file 274 in group [1.540250240] from disk DATA_DM01_CD_08_DM01CEL03  allocation unit 4132 reason error; if possible, will try another mirror side
NOTE: successfully read mirror side 2 of virtual extent 2039 logical extent 1 of file 274 in group [1.540250240] from disk DATA_DM01_CD_05_DM01CEL02 allocation unit 4133
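The AU number and offset in the "Read Failed" line map directly onto the byte offset reported by ORA-27603. A minimal Python sketch, assuming the 4 MB AU size (4194304 bytes) that the lsdg output later in this post shows for DATA_DM01 (group 1 here, judging by the DATA_DM01 disk names): offset on the grid disk = AU × AU size + offset within the AU. The regex and the function name `disk_offset` are illustrative only, not an Oracle tool.

```python
import re

# Assumption: AU size taken from the lsdg output for DATA_DM01 (4194304 bytes = 4 MB).
AU_SIZE = 4 * 1024 * 1024

# Matches the "WARNING: Read Failed." line printed by both the DB and ASM alert logs.
READ_FAILED = re.compile(
    r"Read Failed\. group:(?P<group>\d+) disk:(?P<disk>\d+) "
    r"AU:(?P<au>\d+) offset:(?P<offset>\d+) size:(?P<size>\d+)"
)


def disk_offset(line: str, au_size: int = AU_SIZE) -> dict:
    """Parse a Read Failed line and add the absolute byte offset on the grid disk."""
    m = READ_FAILED.search(line)
    if m is None:
        raise ValueError("not a 'Read Failed' line")
    fields = {k: int(v) for k, v in m.groupdict().items()}
    # Absolute offset = AU number * AU size + offset inside that AU.
    fields["disk_offset"] = fields["au"] * au_size + fields["offset"]
    return fields


if __name__ == "__main__":
    db = "WARNING: Read Failed. group:1 disk:32 AU:4132 offset:761856 size:253952"
    asm = "WARNING: Read Failed. group:1 disk:19 AU:272 offset:0 size:1048576"
    print(disk_offset(db)["disk_offset"])   # 17331625984, the ORA-27603 offset
    print(disk_offset(asm)["disk_offset"])  # 1140850688
```

The same arithmetic reproduces the ASM alert entry: AU 272 × 4194304 = 1140850688, which is exactly the offset in its ORA-27603 line.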

ASM alert

Wed Jun 19 08:45:30 2013
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_r000_76832.trc:
ORA-27603: Cell storage I/O error, I/O failed on disk o/192.168.10.4/DATA_DM01_CD_07_dm01cel02 at offset 1140850688 for data length 1048576
ORA-27626: Exadata error: 201 (Generic I/O error)
WARNING: Read Failed. group:1 disk:19 AU:272 offset:0 size:1048576

Sun Jul 28 23:05:07 2013
NOTE: repairing group 1 file 274 extent 2039
SUCCESS: extent 2039 of file 274 group 1 repaired - all online mirror sides found readable, no repair required

storage node (cell) system log

Jul 28 23:05:07 dm01cel03 kernel: sd 0:2:8:0: SCSI error: return code = 0x00070002
Jul 28 23:05:07 dm01cel03 kernel: end_request: I/O error, dev sdi, sector 33916368

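For I/O errors reported on both the DB side and the storage side, Oracle simply relies on ASM's default handling: it first reads the block from the secondary extent, then attempts to repair the primary extent. Going through the ASM alert log, this repair takes one of two forms: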

1. SUCCESS: extent 4753 of file 502 group 1 repaired by relocating to a different AU on the same disk or the disk is offline

ASM uses the mirrored copy, which allows the disk to reallocate data around any bad blocks in the physical media. In other words, a new physical AU-sized region was allocated.

2. SUCCESS: extent 2039 of file 274 group 1 repaired - all online mirror sides found readable, no repair required

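ASM initiated a repair pass over this AU-sized region; as the message says, all online mirror sides were readable when it checked, so no actual repair was required.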
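To see which of the two outcomes ASM chose for each extent, the SUCCESS lines can be classified by their wording. A minimal Python sketch; the function name `classify_repair` and the outcome labels are hypothetical, and only the two message texts quoted above are recognized.

```python
import re
from typing import Optional, Tuple

# Matches "SUCCESS: extent <n> of file <n> group <n> repaired..." and keeps the rest of the line.
REPAIR = re.compile(
    r"SUCCESS: extent (?P<extent>\d+) of file (?P<file>\d+) "
    r"group (?P<group>\d+) repaired(?P<detail>.*)$"
)


def classify_repair(line: str) -> Optional[Tuple[int, int, int, str]]:
    """Return (group, file, extent, outcome) for a repair SUCCESS line, else None."""
    m = REPAIR.search(line)
    if m is None:
        return None
    detail = m.group("detail")
    if "relocating to a different AU" in detail:
        outcome = "relocated"          # case 1: data moved to a new AU (or the disk is offline)
    elif "no repair required" in detail:
        outcome = "no_repair_needed"   # case 2: every online mirror side read back fine
    else:
        outcome = "other"
    return int(m.group("group")), int(m.group("file")), int(m.group("extent")), outcome


if __name__ == "__main__":
    print(classify_repair("SUCCESS: extent 2039 of file 274 group 1 repaired - "
                          "all online mirror sides found readable, no repair required"))
```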

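These errors show that the storage disk's remaining life is steadily shrinking. As physical bad blocks accumulate, once a disk reaches a critical threshold it should be replaced (and resynchronized with ASM fast disk sync).
Also, this kind of error is not easy to see on traditional storage. With the external redundancy we commonly use there, redundancy at the storage layer is generally safe enough on its own, so Exadata's storage hardware is not as impressive as the features its software provides. (Could we say traditional storage is far safer than Exadata's Sun storage? That may be a bit rash. Maybe...)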
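One rough way to watch a disk degrade is to count read failures per ASM disk in the alert log. A minimal Python sketch; the threshold of 3 and the name `failing_disks` are hypothetical and for illustration only, not Oracle-defined values, and the cell's own disk status remains the authority on replacement.

```python
import re
from collections import Counter

# Captures the group/disk pair from "WARNING: Read Failed. group:1 disk:32 ..." lines.
READ_FAILED = re.compile(r"Read Failed\. group:(\d+) disk:(\d+)")

# HYPOTHETICAL threshold for illustration; not an Oracle-defined value.
ALERT_THRESHOLD = 3


def failing_disks(lines, threshold=ALERT_THRESHOLD):
    """Return {(group, disk): count} for disks with at least `threshold` read failures."""
    counts = Counter()
    for line in lines:
        m = READ_FAILED.search(line)
        if m:
            counts[(int(m.group(1)), int(m.group(2)))] += 1
    return {disk: n for disk, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    sample = ["WARNING: Read Failed. group:1 disk:32 AU:4132 offset:761856 size:253952"] * 3
    sample.append("WARNING: Read Failed. group:1 disk:19 AU:272 offset:0 size:1048576")
    for (group, disk), n in sorted(failing_disks(sample).items()):
        print(f"group {group} disk {disk}: {n} read failures")
```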
For more on the ASM automatic repair behavior described above, see my earlier article.

While we're at it, a note on Req_mir_free_MB and Usable_file_MB in a normal redundancy environment:

[grid@dm01db01 trace]$ asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  4194304  15593472  6102196          5197824          452186              0             N  DATA_DM01/
MOUNTED  NORMAL  N         512   4096  4194304    894720   893432           298240          297596              0             Y  DBFS_DG/
MOUNTED  NORMAL  N         512   4096  4194304   3896064  1717684          1298688          209498              0             N  RECO_DM01/
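The lsdg columns above are tied together by two simple relationships: Req_mir_free_MB = Total_MB / 3, and, for normal redundancy, Usable_file_MB = (Free_MB - Req_mir_free_MB) / 2. The Python sketch below checks both against every row. The divisor 3 is read here as the number of equal-sized failure groups (presumably one per storage cell); that is an interpretation, not something lsdg prints.

```python
# Rows copied from the lsdg output above:
# (Name, Total_MB, Free_MB, Req_mir_free_MB, Usable_file_MB)
LSDG = [
    ("DATA_DM01", 15593472, 6102196, 5197824, 452186),
    ("DBFS_DG", 894720, 893432, 298240, 297596),
    ("RECO_DM01", 3896064, 1717684, 1298688, 209498),
]

# Assumption: three equal-sized failure groups (presumably one per storage cell).
FAILURE_GROUPS = 3


def req_mir_free_mb(total_mb: int, failure_groups: int = FAILURE_GROUPS) -> int:
    """Capacity of one failure group, held back so a lost group can be re-mirrored."""
    return total_mb // failure_groups


def usable_file_mb(free_mb: int, req_mb: int) -> int:
    """Normal redundancy writes two copies, so halve what remains after the reserve.
    Goes negative once file data spills into the reserve (all values here are even)."""
    return (free_mb - req_mb) // 2


if __name__ == "__main__":
    for name, total, free, req, usable in LSDG:
        ok = req_mir_free_mb(total) == req and usable_file_mb(free, req) == usable
        print(f"{name:10} Req_mir_free_MB={req_mir_free_mb(total)} "
              f"Usable_file_MB={usable_file_mb(free, req)} matches lsdg: {ok}")
```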

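Why does Total_MB/3 = Req_mir_free_MB? Req_mir_free_MB can be thought of as the equivalent of a hot spare. In normal redundancy here, the disk group's capacity is effectively split into three equal parts (presumably one failure group per storage cell), and Req_mir_free_MB reserves one part's worth of space so that it can stand in for any disk holding primary or secondary extents. The space in Req_mir_free_MB is not off-limits, though: once Usable_file_MB is used up, ASM keeps writing data into the Req_mir_free_MB space.
However, only Req_mir_free_MB/2 of it is actually writable, because normal redundancy has to write two copies of the data. Once Req_mir_free_MB is exhausted, there is effectively no hot spare left; if the primary and secondary extents then fail at the same time, data is lost. A case to illustrate: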

[grid@dm01db01 ~]$ asmcmd -p
ASMCMD [+] > lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  4194304  15593472  9918184          5197824         2360180              0             N  DATA_DM01/
MOUNTED  NORMAL  N         512   4096  4194304    894720   893432           298240          297596              0             Y  DBFS_DG/
MOUNTED  NORMAL  N         512   4096  4194304   3896064    28248          1298688         -635220              0             N  RECO_DM01/

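Usable_file_MB = -635220: the negative value means file data is already being written into the reserved space, and it counts against Req_mir_free_MB/2.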
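How much of the reserve RECO_DM01 has already eaten can be worked out from its lsdg row. A minimal Python sketch using the RECO_DM01 numbers above; the name `reserve_usage` is hypothetical, and treating Req_mir_free_MB/2 as the writable part of the reserve follows the reasoning in this post.

```python
def reserve_usage(free_mb: int, req_mb: int) -> dict:
    """Split a normal-redundancy disk group's reserve into consumed and remaining parts."""
    usable = (free_mb - req_mb) // 2   # Usable_file_MB as lsdg reports it
    writable_reserve = req_mb // 2     # half the reserve, since every extent is written twice
    consumed = max(0, -usable)         # reserve already taken by file data
    return {
        "usable_file_mb": usable,
        "writable_reserve_mb": writable_reserve,
        "consumed_mb": consumed,
        "remaining_mb": writable_reserve - consumed,
    }


if __name__ == "__main__":
    # RECO_DM01: Free_MB=28248, Req_mir_free_MB=1298688
    print(reserve_usage(28248, 1298688))
```

For RECO_DM01 this gives 635220 MB consumed out of 649344 MB, leaving 14124 MB, which is just Free_MB/2. With Free_MB (28248) far below Req_mir_free_MB (1298688), the disk group could no longer fully re-mirror the contents of a lost failure group.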

After the space was recovered:

ASMCMD [+] > lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  NORMAL  N         512   4096  4194304  15593472  9918184          5197824         2360180              0             N  DATA_DM01/
MOUNTED  NORMAL  N         512   4096  4194304    894720   893432           298240          297596              0             Y  DBFS_DG/
MOUNTED  NORMAL  N         512   4096  4194304   3896064  3860220          1298688         1280766              0             N  RECO_DM01/

In other words, the recovery increased Usable_file_MB by (1280766 + 635220) MB.
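The sum is simply the change in Usable_file_MB between the two RECO_DM01 snapshots. Since Req_mir_free_MB stays at 1298688, the gain equals half of the raw Free_MB that was released. A short Python check using the numbers above:

```python
def usable_file_mb(free_mb: int, req_mb: int) -> int:
    """Normal redundancy: half of the free space left after the Req_mir_free_MB reserve."""
    return (free_mb - req_mb) // 2


REQ_MB = 1298688                          # RECO_DM01 Req_mir_free_MB, same in both snapshots
before = usable_file_mb(28248, REQ_MB)    # -635220
after = usable_file_mb(3860220, REQ_MB)   # 1280766
gained = after - before                   # 1915986 = 1280766 + 635220

if __name__ == "__main__":
    print(before, after, gained)
    # Same result from raw free space: (3860220 - 28248) / 2
    print((3860220 - 28248) // 2)
```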