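Networking: not applicable. Software version: R0809P27
The devices are deployed as a stack. Each member device is fitted with two 300 W AC power supplies, and the two supplies are fed from two separate PDUs in a cross-connected design to provide power redundancy.
The hardware configuration is as follows: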
===============display device verbose===============
Chassis No. Slot No. Board Type Status Primary Local Primary SubSlots
-------------------------------------------------------------------------------
1 0 RT-MPU-100-X1 Normal Master Master 0
1 3 RT-SPE-S3 Normal N/A Master 0
1 5 RT-SFE Normal N/A N/A 6
1 6 RT-SFE Normal N/A N/A 6
2 0 RT-MPU-100-X1 Normal Standby Master 0
2 3 RT-SPE-S3 Normal N/A Master 0
2 5 RT-SFE Normal N/A N/A 6
2 6 RT-SFE Normal N/A N/A 6
Chassis 1:
Device Name: H3C MSR56-60
Slot 0: RT-MPU-100-X1
Subslot No. Board Type Status Max Ports
---------------------------------------------------------------
0 Fixed SubCard on Board Normal 2
Slot 3: RT-SPE-S3
Subslot No. Board Type Status Max Ports
---------------------------------------------------------------
0 Fixed SubCard on Board Normal 8
Slot 5: RT-SFE
Subslot No. Board Type Status Max Ports
---------------------------------------------------------------
0 Fixed SubCard on Board Normal 0
4 HMIM-8GEF Normal 8
6 HMIM-24GSW Normal 24
Slot 6: RT-SFE
Subslot No. Board Type Status Max Ports
---------------------------------------------------------------
0 Fixed SubCard on Board Normal 0
1 HMIM-8SAE Normal 8
3 HMIM-4GEF Normal 4
5 HMIM-24GSW Normal 24
Chassis 2:
Device Name: H3C MSR56-60
Slot 0: RT-MPU-100-X1
Subslot No. Board Type Status Max Ports
---------------------------------------------------------------
0 Fixed SubCard on Board Normal 2
Slot 3: RT-SPE-S3
Subslot No. Board Type Status Max Ports
---------------------------------------------------------------
0 Fixed SubCard on Board Normal 8
Slot 5: RT-SFE
Subslot No. Board Type Status Max Ports
---------------------------------------------------------------
0 Fixed SubCard on Board Normal 0
4 HMIM-8GEF Normal 8
6 HMIM-24GSW Normal 24
Slot 6: RT-SFE
Subslot No. Board Type Status Max Ports
---------------------------------------------------------------
0 Fixed SubCard on Board Normal 0
1 HMIM-8SAE Normal 8
3 HMIM-4GEF Normal 4
5 HMIM-24GSW Normal 24
Reviewing the device logs shows the following messages:
%Jun 26 19:15:46:203 2022 XXX POWER/3/PowerDriverLog: Power overload. Please add a power supply or uninstall an HMIM. Otherwise the device will reboot in 116 minutes.
%Jun 26 19:14:46:215 2022 XXX POWER/3/PowerDriverLog: -Chassis=2-Slot=0; Power overload. Please add a power supply or uninstall an HMIM. Otherwise the device will reboot in 117 minutes.
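On the day of the fault, a power and environment problem in the equipment room caused one PDU to fail and stop supplying power. Both devices then rebooted later the same day, interrupting services.
Both member devices report the same reboot reason: Last reboot reason : Power overload
The dual-supply redundancy did not take effect during the actual fault. In theory a single supply delivers at most 300 W, yet summing the power consumption of the installed boards from the per-board power table in the official "MSR 5600 Routers Hardware Description" gives a total below 300 W.
The mechanism that triggered the reboot therefore needs to be confirmed.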
===============display power-supply verbose===============
Chassis 1:
Index Status Type Description
----------------------------------------------------------
PWR1 normal AC PSR300-12A2
PWR2 normal AC PSR300-12A2
PWR3 absent
PWR4 absent
System power capacity: 450W
Remaining power for HMIM: 124W
Chassis 2:
Index Status Type Description
----------------------------------------------------------
PWR1 normal AC PSR300-12A2
PWR2 normal AC PSR300-12A2
PWR3 absent
PWR4 absent
System power capacity: 450W
Remaining power for HMIM: -26W
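After the devices rebooted and the failed PDU was repaired, the power status showed that, with a dual-supply maximum of 450 W, only 124 W remained for HMIM cards (Remaining power for HMIM: 124W). This shows that the system does not simply sum the board power figures published on the official website. It reserves a larger power budget, 326 W in total (450 W - 124 W).
With power reduced to a single 300 W supply, the system therefore does judge the power to be insufficient.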
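The arithmetic above can be written out as a minimal sketch. It uses only the figures quoted in this case (450 W system capacity, 124 W remaining for HMIM, 300 W for a single supply). The function names are hypothetical, and the code mirrors the reasoning in the text rather than the firmware's actual accounting, which this case does not document.

```python
def reserved_power_w(system_capacity_w: int, remaining_for_hmim_w: int) -> int:
    """Power the system has already budgeted, inferred from the CLI output."""
    return system_capacity_w - remaining_for_hmim_w


def single_supply_margin_w(single_supply_w: int, reserved_w: int) -> int:
    """Headroom left on one supply; a negative value means a shortfall."""
    return single_supply_w - reserved_w


# Figures quoted in this case (the 300 W single-supply limit is the
# theoretical maximum stated in the text).
reserved = reserved_power_w(system_capacity_w=450, remaining_for_hmim_w=124)
margin = single_supply_margin_w(single_supply_w=300, reserved_w=reserved)

print(f"Reserved budget: {reserved} W")        # Reserved budget: 326 W
print(f"Single-supply margin: {margin} W")     # Single-supply margin: -26 W
```

A negative margin corresponds to the condition the device reports as "Power overload".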
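Both on-site MSR56 devices run with a single MPU and a single SPE board, so the total calculated from the per-board power table counted each of these boards only once.
The system, however, reserves power for two MPUs and two SPE boards, because both board types are high-priority power consumers on the MSR56.
This reservation is designed as an early warning: it prevents the whole device from powering down for lack of power if an additional SPE board is inserted while the device runs on a single supply.
Once the SPE board power is counted twice, the total power budget of each member device exceeds 300 W.
In the on-site version, if the reserved power remains insufficient for 2 hours, the device reboots. The countdown is announced by the following log message: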
%Jun 26 19:15:46:203 2022 XXX POWER/3/PowerDriverLog: Power overload. Please add a power supply or uninstall an HMIM. Otherwise the device will reboot in 116 minutes.
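For monitoring purposes, the countdown in this message can be turned into an estimated reboot time. The following is a minimal sketch that parses the two log lines quoted in this case. The regular expression and the function name `estimate_reboot_time` are assumptions based only on those two samples, and other software versions may format the message differently.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Pattern inferred from the two sample lines in this case (an assumption).
OVERLOAD_RE = re.compile(
    r"%(?P<mon>[A-Za-z]{3})\s+(?P<day>\d{1,2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2}):\d{3}\s+(?P<year>\d{4})\s+"
    r".*?Power overload\..*?reboot in (?P<minutes>\d+) minutes"
)


def estimate_reboot_time(line: str) -> Optional[datetime]:
    """Return log timestamp + remaining minutes, or None if no match."""
    m = OVERLOAD_RE.search(line)
    if m is None:
        return None
    # %b parsing assumes an English/C locale for month abbreviations.
    stamp = f"{m.group('mon')} {m.group('day')} {m.group('time')} {m.group('year')}"
    logged_at = datetime.strptime(stamp, "%b %d %H:%M:%S %Y")
    return logged_at + timedelta(minutes=int(m.group("minutes")))


line = (
    "%Jun 26 19:15:46:203 2022 XXX POWER/3/PowerDriverLog: Power overload. "
    "Please add a power supply or uninstall an HMIM. "
    "Otherwise the device will reboot in 116 minutes."
)
print(estimate_reboot_time(line))  # 2022-06-26 21:11:46
```

Both log lines quoted in this case (116 minutes at 19:15:46 and 117 minutes at 19:14:46) resolve to the same estimated time, which is consistent with a fixed 2-hour countdown.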
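Because the reboot interrupts the network, later versions (for example R0821P18, a 2021 annual release) optimized this mechanism:
1. The reserved-power calculation is unchanged;
2. When reserved power is insufficient, the device only logs an alarm and no longer starts the 2-hour countdown to a reboot.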