
[MVS] Oracle Exadata X8 Storage: Proactively Replacing a Hard Disk

Posted 10 hours ago

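Problem Description

The system log reported errors on a hard disk. ASM repeatedly attempted to repair it, which affected database stability, so the disk was replaced proactively.

Analysis

The system log on the storage server repeatedly reported errors on the hard disk, but each time ASM took over and remounted the disk, it returned to normal.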

 

Solution

Proactively replacing a hard disk on Oracle Exadata X8 storage

Environment: Oracle Exadata X8

Exadata system software: 19.2.13.0.0.200428

OS: Oracle Linux 7.7

 

 


 

1. Identify the disk.

Check the disk slot and the grid disks that the disk hosts:

# cellcli -e list diskmap

or: # cellcli -e list physicaldisk

or: # cellcli -e "list diskmap" | grep 'X:Y'
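A plain grep for a slot such as 8:1 also matches 8:10 and 8:11, so the filter step is worth tightening. The following is a minimal sketch, not part of the original procedure: the helper names filter_slot_lines and diskmap_for_slot are hypothetical, and it assumes only that the slot name appears somewhere in each relevant line of the list diskmap output.

```python
import re
import subprocess
from typing import List


def filter_slot_lines(diskmap_text: str, slot: str) -> List[str]:
    """Return the lines that mention exactly `slot` (e.g. "8:1", not "8:10")."""
    # Reject matches that are part of a longer number on either side.
    pattern = re.compile(r"(?<!\d)" + re.escape(slot) + r"(?!\d)")
    return [line for line in diskmap_text.splitlines() if pattern.search(line)]


def diskmap_for_slot(slot: str) -> List[str]:
    """Run `cellcli -e "list diskmap"` on the cell and keep the lines for `slot`."""
    result = subprocess.run(
        ["cellcli", "-e", "list diskmap"],
        capture_output=True,
        text=True,
        check=True,
    )
    return filter_slot_lines(result.stdout, slot)
```

Run on the storage server itself, diskmap_for_slot("8:1") would return only the lines for that one slot.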

 

2. Drop the disk.

2.1. On Oracle Exadata System Software release 21.2.0 or later, the following command drops the disk automatically while maintaining redundancy:

CellCLI> alter physicaldisk X:Y drop for replacement maintain redundancy

Note: wait for the command to complete before moving on to the next step.
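Which form of the drop command applies depends on whether the installed release is at least 21.2.0. A minimal sketch of that comparison follows. The function names parse_release and drop_command are hypothetical, and the sketch assumes the release is a full dotted numeric string such as the 19.2.13.0.0.200428 of this environment.

```python
def parse_release(release: str) -> tuple:
    """Turn "19.2.13.0.0.200428" into (19, 2, 13, 0, 0, 200428)."""
    return tuple(int(part) for part in release.strip().split("."))


def drop_command(release: str, disk: str) -> str:
    """Build the CellCLI drop command suited to the given software release."""
    base = f"alter physicaldisk {disk} drop for replacement"
    # Only the first three components matter for the 21.2.0 threshold.
    if parse_release(release)[:3] >= (21, 2, 0):
        return base + " maintain redundancy"
    # Older releases: issue this only after the ASM disk has been dropped
    # and the rebalance has finished.
    return base
```

For the environment described here, drop_command("19.2.13.0.0.200428", "X:Y") yields the plain form without maintain redundancy.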

 

2.2. On Oracle Exadata System Software releases earlier than 21.2.0, run the following commands in order:

1) Drop the ASM disk that corresponds to the grid disk:

SQL> ALTER DISKGROUP diskgroup_name DROP DISK asm_disk_name;

Note: the drop may fail with the following errors:

ORA-15032: not all alterations performed

ORA-15410: Disks in disk group OCRVT do not have equal size.

or

ORA-15032: not all alterations performed

ORA-15411: Failure groups in disk group OCRVT have different number of disks.

Workaround:

Dynamically set the following two hidden parameters to lift the failure group checks that block the drop:

alter system set "_asm_disable_failgroup_size_checking"=true;

alter system set "_asm_disable_dangerous_failgroup_checking"=true;

 

2) Wait for the disk rebalance to complete. Monitor it with: select * from v$asm_operation; (the rebalance is done when the query returns no rows)
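The wait can be scripted as a simple polling loop. The sketch below is an illustration only: wait_for_rebalance is a hypothetical helper, and fetch_operations stands for any caller-supplied function that runs the v$asm_operation query against the ASM instance and returns its rows. No particular database driver is assumed.

```python
import time
from typing import Callable, Sequence


def wait_for_rebalance(
    fetch_operations: Callable[[], Sequence],
    poll_seconds: float = 60.0,
    timeout_seconds: float = 24 * 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until fetch_operations() returns no rows; return the number of polls."""
    deadline = time.monotonic() + timeout_seconds
    polls = 0
    while True:
        polls += 1
        # An empty result means no rebalance (or other ASM operation) is active.
        if not fetch_operations():
            return polls
        if time.monotonic() >= deadline:
            raise TimeoutError("rebalance still running after timeout")
        sleep(poll_seconds)
```

The sleep parameter is injectable so the loop can be exercised without real delays.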

 

3) Drop the physical disk on the cell:

CellCLI> alter physicaldisk X:Y drop for replacement

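To avoid a session timeout, the command can be run in the background: # cellcli -e "alter physicaldisk X:Y drop for replacement" &

Notes:

  • Track progress in the cell alert log: /opt/oracle/cell/log/diag/asm/cell/<cell_name>/trace/alert.log
  • Check the overall dirty-block flush rate with the command below; whether the operation has finished is determined by the message in the alert log. CellCLI> list metriccurrent attributes name,metricvalue where name like 'FC_BY_DIRTY';
  • On Exadata 19.2 the command may hang. The workaround in the following note had little visible effect in this case: Exadata ALTER PHYSICALDISK N:N DROP FOR REPLACEMENT is hung (Doc ID 2574663.1)
  • Dropping a 14 TB HDD took about 5 hours.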

 

3. Ensure that the blue OK to Remove LED on the disk is lit before removing the disk.

After the drop succeeds, run # cellcli -e list physicaldisk to confirm that the disk has been dropped.

 

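4. Replace the hard disk with the new one.

After the disk is physically replaced, the system automatically adds it back to the ASM disk groups using the original configuration.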

 

5. Verify that the LUN, cell disk, and grid disks associated with the hard disk were created.

CellCLI> alter cell validate configuration     >>> the firmware is upgraded automatically; there must be no mdadm errors

CellCLI> list lun lun_name

CellCLI> list celldisk where lun=lun_name

CellCLI> list griddisk where celldisk=celldisk_name

CellCLI> LIST PHYSICALDISK  detail

 

6. After the disk replacement is complete, revert the ASM hidden parameters:

alter system set "_asm_disable_failgroup_size_checking"=false;

alter system set "_asm_disable_dangerous_failgroup_checking"=false;

 

7. Verify that the grid disk was added to the Oracle ASM disk groups.

The following query should return no rows.

SQL> SELECT path,header_status FROM v$asm_disk WHERE group_number=0;

The following query shows whether all the failure groups have the same number of disks:

SQL> SELECT group_number, failgroup, mode_status, count(*) FROM v$asm_disk GROUP BY group_number, failgroup, mode_status;
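Comparing the disk counts across failure groups by eye is error-prone, so the check can be sketched in code. This is an illustration under assumptions: the helper names disks_per_failgroup and unbalanced_groups are hypothetical, and rows is assumed to be a list of (group_number, failgroup) pairs already fetched from v$asm_disk by whatever client is in use.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def disks_per_failgroup(rows: Iterable[Tuple[int, str]]) -> Dict[int, Dict[str, int]]:
    """Count disks per failure group, keyed by disk group number."""
    grouped: Dict[int, Dict[str, int]] = {}
    for (group_number, failgroup), count in Counter(rows).items():
        grouped.setdefault(group_number, {})[failgroup] = count
    return grouped


def unbalanced_groups(rows: Iterable[Tuple[int, str]]) -> List[int]:
    """Return disk group numbers whose failure groups differ in disk count."""
    return sorted(
        group_number
        for group_number, failgroups in disks_per_failgroup(rows).items()
        if len(set(failgroups.values())) > 1
    )
```

An empty list from unbalanced_groups means every failure group within each disk group holds the same number of disks.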

 

References:

Official documentation: 3.3.5 Replacing a Hard Disk Proactively

https://docs.oracle.com/en/engineered-systems/exadata-database-machine/dbmmn/replacing-hard-disk-proactively.html

 

Oracle Exadata disk replacement procedure: Replacing a Hard Disk Proactively

https://blog.csdn.net/xxzhaobb/article/details/126389951

 

Exadata storage server: proactively replacing a disk

https://www.cnblogs.com/missyou-shiyh/p/18581627

 

MOS:

How to Replace a Hard Drive in an Exadata Storage Cell Server (Predictive Failure) (Doc ID 1390836)