Not applicable.
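At the customer site, UIS-Cell E0708 was deployed as a hyper-converged system. The front-end console showed a RAID controller status abnormal alarm for host CVK05, repeated more than 1,000 times. Logging in to the HDM interface showed the RAID controller itself as normal, but it also showed a disk failure alarm.
Suspecting a fault in either the RAID controller or a disk, we collected the following information: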
1. LSI RAID controller logs:
On CVK05, run the following commands, then export the four generated files whose names start with storcli.
/opt/MegaRAID/storcli/storcli64 /c0 show all > storcli.showall
/opt/MegaRAID/storcli/storcli64 /c0 show events > storcli.events
/opt/MegaRAID/storcli/storcli64 /c0 show alilog > storcli.alilog
/opt/MegaRAID/storcli/storcli64 /c0 show termlog > storcli.termlog
2. SDS log and event log
Log in to the HDM web page and download the SDS log and the event log.
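Analysis:
1. The device raised a disk alarm pointing to disk F08, i.e. PD12.
2. The lsi9361 information in the SDS log shows no errors from the RAID controller itself. The only alarms are again for the disk.
3. The RAID controller log also contains disk alarms. Its timestamps run 8 hours behind local time, so the entries below correspond to about 16:27 local time on November 29: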
11/29/19 8:27:37: C0:DM_DevSenseRetry dev 12 Sense Data: Len 12 RespCode 70 senseKey 5 asc 24 ascq 0
11/29/19 8:27:37: C0:DM_DevConfigParamCallback: Rdm x40c06c00 FAILED Pd 12 Data 43825e80 Len 10 Cdb 15 pageIndex 1
11/29/19 8:27:37: C0:DM_DevConfigParamCallback dev 12 Sense Data: Len 12 RespCode 70 senseKey 5 asc 26 ascq 0
11/29/19 8:27:37: C0:DM_DevConfigParamCallback: Rdm x40c06c00 FAILED Pd 12 Data 43825e80 Len 10 Cdb 15 pageIndex 1
11/29/19 8:27:37: C0:DM_DevConfigParamCallback dev 12 Sense Data: Len 12 RespCode 70 senseKey 5 asc 26 ascq 0
11/29/19 8:27:37: C0:DM_DevConfigParamCallback: Rdm x40c06c00 FAILED Pd 12 Data 43825e80 Len 10 Cdb 15 pageIndex 1
11/29/19 8:27:37: C0:DM_DevConfigParamCallback dev 12 Sense Data: Len 12 RespCode 70 senseKey 5 asc 26 ascq 0
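The entries above can be read against the SCSI sense-data definitions. Sense key 5h is ILLEGAL REQUEST, ASC/ASCQ 24h/00h is INVALID FIELD IN CDB, and 26h/00h is INVALID FIELD IN PARAMETER LIST. The Python sketch below makes two assumptions. First, the numbers in the log are hexadecimal; RespCode 70 matches the standard 70h fixed-format sense response code. Second, the 8-hour offset observed in this case applies. The regular expression and the name `decode_sense_line` are illustrative and are not part of storcli.

```python
import re
from datetime import datetime, timedelta

# Only the codes that appear in the excerpt above; values from the SCSI (SPC) tables.
SENSE_KEYS = {0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {
    (0x24, 0x00): "INVALID FIELD IN CDB",
    (0x26, 0x00): "INVALID FIELD IN PARAMETER LIST",
}

# Assumption: log time is 8 hours behind local time (8:27 in the log vs. about 16:27 on site).
LOCAL_OFFSET = timedelta(hours=8)

# Assumption: senseKey/asc/ascq are printed in hex without a 0x prefix.
SENSE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{1,2}:\d{2}:\d{2}): .*?dev (?P<dev>\d+) Sense Data: "
    r".*?senseKey (?P<key>[0-9A-Fa-f]+) asc (?P<asc>[0-9A-Fa-f]+) ascq (?P<ascq>[0-9A-Fa-f]+)"
)


def decode_sense_line(line):
    """Return a dict describing one 'Sense Data' log line, or None for other lines."""
    m = SENSE_RE.search(line)
    if m is None:
        return None
    log_time = datetime.strptime(m["ts"], "%m/%d/%y %H:%M:%S")
    key = int(m["key"], 16)
    asc = int(m["asc"], 16)
    ascq = int(m["ascq"], 16)
    return {
        "pd": int(m["dev"]),
        "local_time": log_time + LOCAL_OFFSET,
        "sense_key": SENSE_KEYS.get(key, f"0x{key:X}"),
        "asc_ascq": ASC_ASCQ.get((asc, ascq), f"0x{asc:02X}/0x{ascq:02X}"),
    }


sample = ("11/29/19 8:27:37: C0:DM_DevConfigParamCallback dev 12 Sense Data: "
          "Len 12 RespCode 70 senseKey 5 asc 26 ascq 0")
info = decode_sense_line(sample)
print(info["pd"], info["local_time"], info["sense_key"], info["asc_ascq"])
# 12 2019-11-29 16:27:37 ILLEGAL REQUEST INVALID FIELD IN PARAMETER LIST
```

Lines without sense data, such as the `FAILED Pd 12` entries, return `None`. Unknown codes are shown as raw hex rather than guessed.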
4. The RAID controller status is Needs Attention. A controller usually reports this status when it has a failed VD.
Status :
======
Controller Status = Needs Attention
Memory Correctable Errors = 0
Memory Uncorrectable Errors = 0
ECC Bucket Count = 0
Any Offline VD Cache Preserved = Yes
BBU Status = 0
Support PD Firmware Download = No
Lock Key Assigned = No
Failed to get lock key on bootup = No
Lock key has not been backed up = No
Bios was not detected during boot = No
Controller must be rebooted to complete security operation = No
A rollback operation is in progress = No
At least one PFK exists in NVRAM = Yes
SSC Policy is WB = No
Controller has booted into safe mode = No
Current Personality = RAID-Mode
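The Status section can also be read programmatically, for example when several hosts need checking. The sketch below is a hypothetical helper, not a storcli feature. It collects the `Key = Value` lines of the `show all` output into a dictionary. It then flags two fields that, by assumption, point to a problem in this case: a Controller Status other than Optimal, and preserved cache for an offline VD.

```python
import re

# "Key = Value" lines; keys may contain spaces, separator lines such as "======" do not match.
KV_RE = re.compile(r"^\s*(?P<key>[^=]+?)\s*=\s*(?P<value>.*?)\s*$")


def parse_status_section(text):
    """Collect every 'Key = Value' line of the storcli Status section into a dict."""
    fields = {}
    for line in text.splitlines():
        m = KV_RE.match(line)
        if m:
            fields[m["key"]] = m["value"]
    return fields


def attention_flags(fields):
    """Return the fields that (by assumption) indicate a problem, as 'Key = Value' strings."""
    flags = []
    if fields.get("Controller Status", "Optimal") != "Optimal":
        flags.append("Controller Status = " + fields["Controller Status"])
    if fields.get("Any Offline VD Cache Preserved") == "Yes":
        flags.append("Any Offline VD Cache Preserved = Yes")
    return flags


sample = """Status :
======
Controller Status = Needs Attention
Memory Correctable Errors = 0
Any Offline VD Cache Preserved = Yes
"""
print(attention_flags(parse_status_section(sample)))
# ['Controller Status = Needs Attention', 'Any Offline VD Cache Preserved = Yes']
```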
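5. The first RAID controller status alarm appeared at about 16:30 on November 29, and the disk alarm appeared at 16:27. The controller alarm followed the F08 disk alarm only a few minutes later, so we suspected that it was caused by the disk fault.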
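The timing argument can be written as a simple rule. The controller alarm counts as a likely consequence of the disk alarm if it follows the disk alarm within a short window. The 10-minute window and the name `likely_related` are assumptions for illustration. Time proximity only supports the hypothesis; replacing the disk is what confirmed it.

```python
from datetime import datetime, timedelta

# Hypothetical correlation window; the value is an assumption, not a product setting.
CORRELATION_WINDOW = timedelta(minutes=10)


def likely_related(cause_time, effect_time, window=CORRELATION_WINDOW):
    """True if effect_time is not earlier than cause_time and falls within the window."""
    gap = effect_time - cause_time
    return timedelta(0) <= gap <= window


disk_alarm = datetime(2019, 11, 29, 16, 27)        # F08 (PD12) disk alarm
controller_alarm = datetime(2019, 11, 29, 16, 30)  # first RAID controller status alarm
print(controller_alarm - disk_alarm, likely_related(disk_alarm, controller_alarm))
# 0:03:00 True
```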
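Currently, UIS-Cell simply raises an alarm whenever the Controller Status field is not Optimal. This alarm therefore appears whenever a disk under the RAID controller fails and a VD goes into the failed state. Based on the analysis above, the RAID controller status alarm in UIS-Cell was caused by the failure of disk F08.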
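The alarm rule described above can be modeled as a small state machine over polled Controller Status values. This is a hypothetical sketch, not UIS-Cell's implementation. It raises an alarm when the status leaves Optimal and clears it when the status returns to Optimal. It reports only transitions, so it does not reproduce the 1,000+ repeat count seen on site.

```python
def alarm_events(polled_statuses):
    """Return ('raise' | 'clear', poll_index) transitions for a sequence of Controller Status values.

    Hypothetical model: the alarm is active while the status is not 'Optimal'.
    """
    active = False
    events = []
    for i, status in enumerate(polled_statuses):
        abnormal = status.strip().lower() != "optimal"
        if abnormal and not active:
            events.append(("raise", i))
        elif not abnormal and active:
            events.append(("clear", i))
        active = abnormal
    return events


# Disk fails (Needs Attention), then the disk is replaced (back to Optimal).
polls = ["Optimal", "Needs Attention", "Needs Attention", "Optimal"]
print(alarm_events(polls))  # [('raise', 1), ('clear', 3)]
```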
After the failed disk was replaced, the RAID controller status alarm cleared automatically and the problem was resolved.