Large-scale outage of downstream services after an IRF split on an S12500X-AF at a customer site


Network topology and description

/

Alarm information

/

Problem description

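Field staff reported that the network management system (NMS) detected log alarms from the superspine IRF device, an S12500X-AF, and that the financial cloud and big data services attached to it reported faults one after another. Power-cycling the entire superspine device did not restore service. In the end chassis 1 was powered off and isolated, and service recovered with chassis 3 running alone. The service interruption lasted about 0.5 hours. The fault symptoms were as follows:

At about 08:44 on July 4, the NMS received a large number of abnormal logs from the superspine, and the service side reported faults at the same time.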

This log alarm is sent once per minute

Query range: 2022-07-04 08:39 CST to 2022-07-04 08:44 CST

Time: 2022-07-04 08:44 CST

Module: DEV

Summary: BOARD_REMOVED

Hostname: DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U

Hit count: 25

Content: 2022-07-04T08:44:03+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DEV/3/BOARD_REMOVED: -DevIP=10.191.252.1; Board was removed from chassis 3 slot 0, type is LSXM1SUPH1.

 

This log alarm is sent once per minute

Query range: 2022-07-04 08:39 CST to 2022-07-04 08:44 CST

Time: 2022-07-04 08:44 CST

Module: DRVPLAT

Summary: DrvDebug

Hostname: DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U

Hit count: 6

Content: 2022-07-04T08:44:09+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DRVPLAT/4/DrvDebug: -DevIP=10.191.252.1;   WARNING: Heartbeat with chassis 1 slot 2 timed out.

Process analysis

The NMS-side logs show that chassis 3 split from the IRF fabric at 08:44:

Jul 4, 2022 @ 08:44:49.463   DEV Critical     BOARD_STATE_FAULT   2022-07-04T08:44:04+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DEV/2/BOARD_STATE_FAULT: -DevIP=10.191.252.1; Board state changed to Fault on chassis 3 slot 15, type is LSXM1SFH08D1.

Jul 4, 2022 @ 08:44:49.463   DEV Critical     BOARD_STATE_FAULT   2022-07-04T08:44:04+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DEV/2/BOARD_STATE_FAULT: -DevIP=10.191.252.1; Board state changed to Fault on chassis 3 slot 14, type is LSXM1SFH08D1.

Jul 4, 2022 @ 08:44:49.441   DEV Critical     BOARD_STATE_FAULT   2022-07-04T08:44:04+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DEV/2/BOARD_STATE_FAULT: -DevIP=10.191.252.1; Board state changed to Fault on chassis 3 slot 13, type is LSXM1SFH08D1.

Jul 4, 2022 @ 08:44:49.432   DEV Critical     BOARD_STATE_FAULT   2022-07-04T08:44:04+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DEV/2/BOARD_STATE_FAULT: -DevIP=10.191.252.1; Board state changed to Fault on chassis 3 slot 12, type is LSXM1SFH08D1.

Jul 4, 2022 @ 08:44:49.425   DEV Critical     BOARD_STATE_FAULT   2022-07-04T08:44:04+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DEV/2/BOARD_STATE_FAULT: -DevIP=10.191.252.1; Board state changed to Fault on chassis 3 slot 11, type is LSXM1SFH08D1.

Jul 4, 2022 @ 08:44:49.410   DEV Critical     BOARD_STATE_FAULT   2022-07-04T08:44:04+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DEV/2/BOARD_STATE_FAULT: -DevIP=10.191.252.1; Board state changed to Fault on chassis 3 slot 10, type is LSXM1SFH08D1.

Jul 4, 2022 @ 08:44:49.360   DEV Critical     BOARD_STATE_FAULT   2022-07-04T08:44:04+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DEV/2/BOARD_STATE_FAULT: -DevIP=10.191.252.1; Board state changed to Fault on chassis 3 slot 5, type is LSXM1CGQ18QGHB1.

Jul 4, 2022 @ 08:44:49.293   DEV Critical     BOARD_STATE_FAULT   2022-07-04T08:44:04+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DEV/2/BOARD_STATE_FAULT: -DevIP=10.191.252.1; Board state changed to Fault on chassis 3 slot 4, type is LSXM1TGS48QGHA1.

Jul 4, 2022 @ 08:44:49.273   DEV Critical     BOARD_STATE_FAULT   2022-07-04T08:44:04+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DEV/2/BOARD_STATE_FAULT: -DevIP=10.191.252.1; Board state changed to Fault on chassis 3 slot 3, type is LSXM1QGS24HB1.

Jul 4, 2022 @ 08:44:49.266   DEV Critical     BOARD_STATE_FAULT   2022-07-04T08:44:04+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DEV/2/BOARD_STATE_FAULT: -DevIP=10.191.252.1; Board state changed to Fault on chassis 3 slot 2, type is LSXM1CGQ18QGHB1.

Jul 4, 2022 @ 08:44:49.250   DEV Critical     BOARD_STATE_FAULT   2022-07-04T08:44:04+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DEV/2/BOARD_STATE_FAULT: -DevIP=10.191.252.1; Board state changed to Fault on chassis 3 slot 1, type is LSXM1SUPH1.

 

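The logs of the downstream spine also show that chassis 3 was placed in the MAD DOWN state (chassis 3 has the larger member ID), and that the interface on chassis 1 slot 2 went down as well: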

%Jul  4 08:45:37:740 2022 DCXN-CLOSS-APP-spine-SW01_IDC02-S1301-1728U LLDP/5/LLDP_NEIGHBOR_AGE_OUT: -Slot=2; Nearest bridge agent neighbor aged out on port HundredGigE2/0/5 (IfIndex 503), neighbor's chassis ID is 78aa-82cf-5e00, port ID is HundredGigE3/2/0/5.

 

%Jul  4 08:45:28:740 2022 DCXN-CLOSS-APP-spine-SW01_IDC02-S1301-1728U LLDP/5/LLDP_NEIGHBOR_AGE_OUT: -Slot=2; Nearest bridge agent neighbor aged out on port HundredGigE2/0/4 (IfIndex 498), neighbor's chassis ID is 78aa-82cf-5e00, port ID is HundredGigE1/2/0/5.

 

%Jul  4 08:44:03:942 2022 DCXN-CLOSS-APP-spine-SW01_IDC02-S1301-1728U BGP/5/BGP_STATE_CHANGED:

 BGP.: 10.191.2.13 state has changed from ESTABLISHED to IDLE for physical interface configuration changed.

 

%Jul  4 08:44:03:940 2022 DCXN-CLOSS-APP-spine-SW01_IDC02-S1301-1728U IFNET/5/LINK_UPDOWN: Line protocol state on the interface HundredGigE2/0/5 changed to down.

%Jul  4 08:44:03:938 2022 DCXN-CLOSS-APP-spine-SW01_IDC02-S1301-1728U IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE2/0/5 changed to down.

%Jul  4 08:43:29:305 2022 DCXN-CLOSS-APP-spine-SW01_IDC02-S1301-1728U IFNET/5/LINK_UPDOWN: Line protocol state on the interface HundredGigE2/0/4 changed to down.

 

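After the split, chassis 1 ran on its own. The logs show that at 08:44 heartbeat timeouts for chassis 1 slot 2 were logged repeatedly, and the card finally entered the Fault state: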

Jul 4, 2022 @ 08:44:51.305   DEV Critical     BOARD_STATE_FAULT   2022-07-04T08:44:47+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DEV/2/BOARD_STATE_FAULT: -DevIP=10.191.252.1; Board state changed to Fault on chassis 1 slot 2, type is LSXM1CGQ18QGHB1.

Jul 4, 2022 @ 08:44:51.278   STM Error        STM_LINK_DOWN 2022-07-04T08:44:47+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10STM/3/STM_LINK_DOWN: -DevIP=10.191.252.1; IRF port 1 went down.

Jul 4, 2022 @ 08:44:51.260   DRVPLAT Warning  DrvDebug        2022-07-04T08:44:39+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DRVPLAT/4/DrvDebug: -DevIP=10.191.252.1;   WARNING: Heartbeat with chassis 1 slot 2 timed out.

Jul 4, 2022 @ 08:44:51.258   DRVPLAT Warning  DrvDebug        2022-07-04T08:44:38+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DRVPLAT/4/DrvDebug: -DevIP=10.191.252.1-Chassis=1-Slot=15;   WARNING: Bcast IPC packets from chassis 1 slot 2 to chassis 1 slot 15 were blocked.

Jul 4, 2022 @ 08:44:51.256   DRVPLAT Warning  DrvDebug        2022-07-04T08:44:38+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DRVPLAT/4/DrvDebug: -DevIP=10.191.252.1-Chassis=1-Slot=15;   WARNING: Ucast IPC packets from chassis 1 slot 2 to chassis 1 slot 15 were blocked.

Jul 4, 2022 @ 08:44:51.254   DRVMNT Error        ERRORCODE   2022-07-04T08:44:38+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DRVMNT/3/ERRORCODE: -DevIP=10.191.252.1-Chassis=1-Slot=15; MdcId=1; ErrCode=0x6e06, GOLD: Ipc Block.

Jul 4, 2022 @ 08:44:51.186   DRVPLAT Warning  DrvDebug        2022-07-04T08:44:29+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DRVPLAT/4/DrvDebug: -DevIP=10.191.252.1;   WARNING: Heartbeat with chassis 1 slot 2 timed out.

Jul 4, 2022 @ 08:44:51.109   DRVPLAT Warning  DrvDebug        2022-07-04T08:44:19+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DRVPLAT/4/DrvDebug: -DevIP=10.191.252.1;   WARNING: Heartbeat with chassis 1 slot 2 timed out.

Jul 4, 2022 @ 08:44:50.612   DRVPLAT Warning  DrvDebug        2022-07-04T08:44:09+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DRVPLAT/4/DrvDebug: -DevIP=10.191.252.1;   WARNING: Heartbeat with chassis 1 slot 2 timed out.
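The NMS lists these records newest first and stamps each with its own receive time, while the device-side timestamp is embedded inside the message. A minimal sketch of how the records could be re-ordered by device time is shown below. The regular expression is an assumption derived only from the lines quoted in this case, and the function names `parse_record` and `device_timeline` are illustrative, not part of any H3C tool.

```python
import re
from datetime import datetime

# Pattern inferred from the records quoted in this case (an assumption):
# <ISO timestamp><hostname> %%10<MODULE>/<severity>/<MNEMONIC>: <text>
RECORD = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2})"
    r"(?P<host>\S+)\s+%%10(?P<module>\w+)/(?P<severity>\d)/(?P<mnemonic>\w+):"
    r"\s*(?P<text>.*)"
)


def parse_record(line):
    """Return the device-side fields of one NMS record, or None if absent."""
    match = RECORD.search(line)
    if match is None:
        return None
    record = match.groupdict()
    record["ts"] = datetime.fromisoformat(record["ts"])
    record["severity"] = int(record["severity"])
    return record


def device_timeline(lines):
    """Sort parsable records by the timestamp written by the device."""
    records = [r for r in (parse_record(l) for l in lines) if r is not None]
    return sorted(records, key=lambda r: r["ts"])


# Three records copied from the excerpt above, in NMS (newest-first) order.
SAMPLE = [
    "Jul 4, 2022 @ 08:44:51.305   DEV Critical     BOARD_STATE_FAULT   2022-07-04T08:44:47+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DEV/2/BOARD_STATE_FAULT: -DevIP=10.191.252.1; Board state changed to Fault on chassis 1 slot 2, type is LSXM1CGQ18QGHB1.",
    "Jul 4, 2022 @ 08:44:51.186   DRVPLAT Warning  DrvDebug        2022-07-04T08:44:29+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DRVPLAT/4/DrvDebug: -DevIP=10.191.252.1;   WARNING: Heartbeat with chassis 1 slot 2 timed out.",
    "Jul 4, 2022 @ 08:44:50.612   DRVPLAT Warning  DrvDebug        2022-07-04T08:44:09+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DRVPLAT/4/DrvDebug: -DevIP=10.191.252.1;   WARNING: Heartbeat with chassis 1 slot 2 timed out.",
]

for rec in device_timeline(SAMPLE):
    print(rec["ts"].time(), rec["module"], rec["severity"], rec["mnemonic"])
```

Sorting on the embedded timestamp makes the sequence easier to read: the heartbeat timeouts come first, and the Fault state change on chassis 1 slot 2 follows.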

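The analysis above shows that, at the time of the fault, the card in chassis 1 slot 2 suffered a hardware fault that broke IPC communication. Because the IRF cables were all concentrated on chassis 1 slot 2, the IRF fabric split. Chassis 3 has the larger member ID, and the MAD mechanism in the old software version running on site places all service ports of the chassis with the larger member ID (here chassis 3) in the MAD DOWN state, so traffic could only pass through chassis 1.

The site also confirmed that redundancy was lacking: on chassis 1, the only port connecting the superspine to the downstream spine was 1/2/0/5, and chassis 1 slot 2 had the hardware fault. A large amount of traffic therefore could not be forwarded, which matches the faults reported by the services on site.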
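The failure chain described above can be sketched as a small model. This is a simplified illustration under stated assumptions, not the Comware MAD implementation. It assumes that the fabric splits when no IRF link has both end cards healthy, and that after a split only the chassis with the smallest member ID keeps its service ports up, as the case describes for the old version. The uplink cards (chassis 1 slot 2 and chassis 3 slot 2) come from the spine LLDP logs. The chassis 3 end of the IRF links and the second card in slot 5 are hypothetical, as is the function name `simulate`.

```python
def simulate(irf_links, uplinks, failed_cards):
    """Model one card failure in a two-chassis IRF fabric.

    irf_links:    list of ((chassis, slot), (chassis, slot)) cable end points
    uplinks:      list of (chassis, slot) cards that carry links to the spine
    failed_cards: set of (chassis, slot) cards that have failed
    Returns (split, active_chassis, usable_uplinks).
    """
    # An IRF link stays up only if neither end card has failed.
    links_up = [link for link in irf_links if not (set(link) & failed_cards)]
    split = not links_up

    members = {chassis for link in irf_links for (chassis, _slot) in link}
    if split:
        # Assumed old MAD behaviour: the larger member ID goes MAD DOWN.
        active = {min(members)}
    else:
        active = members

    usable = [u for u in uplinks if u not in failed_cards and u[0] in active]
    return split, sorted(active), usable


# Uplink cards taken from the spine LLDP logs (1/2/0/5 and 3/2/0/5).
UPLINKS = [(1, 2), (3, 2)]

# As built: every IRF cable ends on chassis 1 slot 2 (chassis 3 end is hypothetical).
AS_BUILT_IRF = [((1, 2), (3, 2)), ((1, 2), (3, 2))]

# Hypothetical cross-card layout: a second IRF link on slot 5 of each chassis.
CROSS_CARD_IRF = [((1, 2), (3, 2)), ((1, 5), (3, 5))]

print(simulate(AS_BUILT_IRF, UPLINKS, {(1, 2)}))
print(simulate(CROSS_CARD_IRF, UPLINKS, {(1, 2)}))
```

In this model the as-built layout loses every usable uplink when chassis 1 slot 2 fails, while the cross-card layout avoids the split and keeps the chassis 3 uplink in service.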

 

To restore service urgently, at 08:55 field staff tried power-cycling both chassis 1 and chassis 3:

Jul 4, 2022 @ 08:55:30.751   DEV Critical     POWER_FAILED      2022-07-04T08:55:31+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DEV/2/POWER_FAILED: -DevIP=10.191.252.1; Chassis 1 power 4 failed.

Jul 4, 2022 @ 08:55:28.346   DEV Critical     POWER_FAILED      2022-07-04T08:55:28+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DEV/2/POWER_FAILED: -DevIP=10.191.252.1; Chassis 1 power 3 failed.

Jul 4, 2022 @ 08:55:25.941   DEV Critical     POWER_FAILED      2022-07-04T08:55:26+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DEV/2/POWER_FAILED: -DevIP=10.191.252.1; Chassis 1 power 2 failed.

Jul 4, 2022 @ 08:55:24.336   DEV Critical     POWER_FAILED      2022-07-04T08:55:24+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DEV/2/POWER_FAILED: -DevIP=10.191.252.1; Chassis 1 power 1 failed.

 

At 09:15, chassis 3 came up step by step and finally reached the Normal state, and service was temporarily restored:

Jul 4, 2022 @ 09:15:03.675   IFNET       Error        PHY_UPDOWN        2022-07-04T09:09:17+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10IFNET/3/PHY_UPDOWN: -DevIP=10.191.252.1; Physical state on the interface Ten-GigabitEthernet3/4/0/2 changed to up.

Jul 4, 2022 @ 09:15:03.570   IFNET       Notice     LINK_UPDOWN      2022-07-04T09:09:17+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10IFNET/5/LINK_UPDOWN: -DevIP=10.191.252.1; Line protocol state on the interface Ten-GigabitEthernet3/4/0/3 changed to up.

Jul 4, 2022 @ 09:15:03.568   IFNET       Error        PHY_UPDOWN        2022-07-04T09:09:17+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10IFNET/3/PHY_UPDOWN: -DevIP=10.191.252.1; Physical state on the interface Ten-GigabitEthernet3/4/0/3 changed to up.

Jul 4, 2022 @ 09:15:03.464   IFNET       Notice     LINK_UPDOWN      2022-07-04T09:09:17+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10IFNET/5/LINK_UPDOWN: -DevIP=10.191.252.1; Line protocol state on the interface Ten-GigabitEthernet3/4/0/40 changed to up.

Jul 4, 2022 @ 09:15:03.461   IFNET       Error        PHY_UPDOWN        2022-07-04T09:09:17+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10IFNET/3/PHY_UPDOWN: -DevIP=10.191.252.1; Physical state on the interface Ten-GigabitEthernet3/4/0/40 changed to up.

Jul 4, 2022 @ 09:15:03.056   IFNET       Notice     LINK_UPDOWN      2022-07-04T09:09:17+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10IFNET/5/LINK_UPDOWN: -DevIP=10.191.252.1; Line protocol state on the interface Ten-GigabitEthernet3/4/0/6 changed to up.

Jul 4, 2022 @ 09:15:03.053   IFNET       Notice     LINK_UPDOWN      2022-07-04T09:09:17+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10IFNET/5/LINK_UPDOWN: -DevIP=10.191.252.1; Line protocol state on the interface Route-Aggregation5 changed to up.

 

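Shortly afterward, however, chassis 1 also finished loading and returned to Normal. Because the card in chassis 1 slot 2 had a hardware fault, it still could not reach Normal even after the power cycle. Once chassis 1 had fully restarted, the MAD mechanism was triggered again and set all service ports of chassis 3 to MAD DOWN, so service was impaired again: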

Jul 4, 2022 @ 09:28:10.217   BFD  Notice     BFD_MAD_INTERFACE_CHANGE_STATE      2022-07-04T09:19:27+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10BFD/5/BFD_MAD_INTERFACE_CHANGE_STATE: -DevIP=10.191.252.1; BFD MAD function enabled on Vlan-interface4000 changed to the normal state.

Jul 4, 2022 @ 09:28:08.113   HA    Notice     HA_BATCHBACKUP_FINISHED       2022-07-04T09:19:25+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10HA/5/HA_BATCHBACKUP_FINISHED: -DevIP=10.191.252.1; Batch backup of standby board in chassis 1 slot 1 has finished.

Jul 4, 2022 @ 09:28:07.090   HA    Notice     HA_BATCHBACKUP_STARTED        2022-07-04T09:19:24+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10HA/5/HA_BATCHBACKUP_STARTED: -DevIP=10.191.252.1; Batch backup of standby board in chassis 1 slot 1 started.

Jul 4, 2022 @ 09:27:37.858   BFD  Notice     BFD_CHANGE_FSM         2022-07-04T09:18:55+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10BFD/5/BFD_CHANGE_FSM: -DevIP=10.191.252.1; Sess[10.191.1.21/10.191.1.22, LD/RD:8003/8030, Interface:RAGG151, SessType:Ctrl, LinkType:INET], Ver:1, Sta: INIT->UP, Diag: 0 (No Diagnostic)

Jul 4, 2022 @ 09:27:37.856   BFD  Notice     BFD_CHANGE_FSM         2022-07-04T09:18:55+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10BFD/5/BFD_CHANGE_FSM: -DevIP=10.191.252.1; Sess[10.191.1.21/10.191.1.22, LD/RD:8003/8030, Interface:RAGG151, SessType:Ctrl, LinkType:INET], Ver:1, Sta: DOWN->INIT, Diag: 0 (No Diagnostic)

Jul 4, 2022 @ 09:27:37.854   BGP Notice     BGP_STATE_CHANGED  2022-07-04T09:18:55+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10BGP/5/BGP_STATE_CHANGED: -DevIP=10.191.252.1;   BGP.DATA: 10.191.1.22 State is changed from OPENCONFIRM to ESTABLISHED.

Jul 4, 2022 @ 09:27:37.853   SYSLOG    Informational SYSLOG_RESTART   2022-07-04T08:52:59+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10SYSLOG/6/SYSLOG_RESTART: -DevIP=10.191.252.1; System restarted -- H3C Comware Software.

Jul 4, 2022 @ 09:27:28.457   IFNET       Notice     LINK_UPDOWN      2022-07-04T09:27:28+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10IFNET/5/LINK_UPDOWN: -DevIP=10.191.252.1; Line protocol state on the interface Vlan-interface4000 changed to up.

Jul 4, 2022 @ 09:27:28.455   IFNET       Error        PHY_UPDOWN        2022-07-04T09:27:28+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10IFNET/3/PHY_UPDOWN: -DevIP=10.191.252.1; Physical state on the interface Vlan-interface4000 changed to up.

Jul 4, 2022 @ 09:27:28.453   IFNET       Notice     LINK_UPDOWN      2022-07-04T09:27:28+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10IFNET/5/LINK_UPDOWN: -DevIP=10.191.252.1; Line protocol state on the interface Ten-GigabitEthernet3/4/0/52 changed to up.

 

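The downstream spine logs contain no up event for 2/0/4, which confirms that chassis 1 slot 2 still could not reach Normal after the power cycle. Interface 2/0/5 also went down again, which confirms that MAD DOWN was triggered again once chassis 1 had restarted: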

%Jul  4 09:27:28:706 2022 DCXN-CLOSS-APP-spine-SW01_IDC02-S1301-1728U IFNET/5/LINK_UPDOWN: Line protocol state on the interface HundredGigE2/0/5 changed to down.

%Jul  4 09:27:28:704 2022 DCXN-CLOSS-APP-spine-SW01_IDC02-S1301-1728U IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE2/0/5 changed to down.

 

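Forwarding was therefore still faulty. At 09:29, field staff powered off both chassis again and this time did not power chassis 1 back on: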

Jul 4, 2022 @ 09:29:04.901   DEV Critical     POWER_FAILED      2022-07-04T09:20:22+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DEV/2/POWER_FAILED: -DevIP=10.191.252.1; Chassis 1 power 4 failed.

Jul 4, 2022 @ 09:29:03.297   DEV Critical     POWER_FAILED      2022-07-04T09:20:20+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DEV/2/POWER_FAILED: -DevIP=10.191.252.1; Chassis 1 power 3 failed.

Jul 4, 2022 @ 09:29:02.494   DEV Critical     POWER_FAILED      2022-07-04T09:20:19+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DEV/2/POWER_FAILED: -DevIP=10.191.252.1; Chassis 1 power 2 failed.

Jul 4, 2022 @ 09:29:01.691   DEV Critical     POWER_FAILED      2022-07-04T09:20:19+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10DEV/2/POWER_FAILED: -DevIP=10.191.252.1; Chassis 1 power 1 failed.

 

At 09:47, chassis 3 finished booting, and all services finally recovered:

Jul 4, 2022 @ 09:47:37.393   IFNET       Error        PHY_UPDOWN        2022-07-04T09:47:36+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10IFNET/3/PHY_UPDOWN: -DevIP=10.191.252.1; Physical state on the interface Ten-GigabitEthernet3/4/0/3 changed to up.

Jul 4, 2022 @ 09:47:37.288   LLDP         Informational LLDP_CREATE_NEIGHBOR     2022-07-04T09:47:36+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10LLDP/6/LLDP_CREATE_NEIGHBOR: -DevIP=10.191.252.1-Chassis=3-Slot=4; Nearest bridge agent neighbor created on port Ten-GigabitEthernet3/4/0/40 (IfIndex 12588), neighbor's chassis ID is ac74-092c-e08d, port ID is Ten-GigabitEthernet1/0/28.

Jul 4, 2022 @ 09:47:37.287   SYSLOG    Informational SYSLOG_RESTART   2022-07-04T09:37:30+08:00DCXN-CLOSS-superspine-SW01_IDC03-S13021301-2132U %%10SYSLOG/6/SYSLOG_RESTART: -DevIP=10.191.252.1; System restarted -- H3C Comware Software.

 

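Finally, the isolated chassis 1 was powered on again. The card in chassis 1 slot 2 stayed in the Fault state throughout, which shows that chassis 1 slot 2 on the live network had failed completely: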

Slot   Type                State    Subslot  Soft Ver             Patch Ver

1/0    LSXM1SUPH1          Master   0        S12508X-AF-2713      None     

1/1    LSXM1SUPH1          Standby  0        S12508X-AF-2713      None     

1/2    NONE                Fault    0        NONE                 None     

1/3    LSXM1QGS24HB1       Normal   0        S12508X-AF-2713      None  

 

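In summary, a hardware fault that broke IPC communication on chassis 1 slot 2 caused the IRF split. The MAD mechanism then shut down the service ports of chassis 3, so only chassis 1 could forward traffic. Because chassis 1 connected to the downstream spine through a single port on chassis 1 slot 2, with no redundancy, service was impaired on a large scale.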

 

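Notes on the IPC hardware fault:

An IPC failure has many possible causes: a faulty IPC chip component, a chip component that fails to forward, or a CPU fault that causes abnormal packet sending and receiving. The recorded exception information can be viewed while the card is in the Normal state. This card can no longer reach Normal, so it must be returned for hardware analysis.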

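Solution

     Fix

The card in chassis 1 slot 2 has a hardware fault. Replace it with a spare.


Other optimization recommendations

1. Deploy the IRF links across multiple cards to add redundancy, so that a single point of failure cannot cause an IRF split.

2. Deploy the uplink and downlink ports across multiple cards to add redundancy, so that a single point of failure cannot impair forwarding for the whole device.

3. Upgrade to version R2719P01. The new version supports MAD health checks.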
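Recommendations 1 and 2 can be checked against a port inventory with a short script. The sketch below assumes interface names follow the chassis/slot/subslot/port numbering seen in the logs of this case (for example HundredGigE1/2/0/5). The role label and the function name `single_card_findings` are hypothetical, and the port list must come from the real device configuration.

```python
import re
from collections import defaultdict

# Interface name ending in chassis/slot/subslot/port, as seen in the logs.
IFNAME = re.compile(r"^[A-Za-z-]+(\d+)/(\d+)/(\d+)/(\d+)$")


def single_card_findings(ports_by_role):
    """Return (role, chassis, slots) wherever a role uses one card per chassis.

    ports_by_role maps a free-form role label (for example an IRF port or an
    uplink group) to the list of member interface names.
    """
    findings = []
    for role, ports in ports_by_role.items():
        slots_by_chassis = defaultdict(set)
        for name in ports:
            match = IFNAME.match(name)
            if match is None:
                raise ValueError(f"unrecognized interface name: {name}")
            chassis, slot = int(match.group(1)), int(match.group(2))
            slots_by_chassis[chassis].add(slot)
        for chassis, slots in sorted(slots_by_chassis.items()):
            if len(slots) < 2:  # all ports of this role sit on one card
                findings.append((role, chassis, sorted(slots)))
    return findings


# The two superspine ports facing this spine, taken from the LLDP logs.
inventory = {
    "uplink-to-spine": ["HundredGigE1/2/0/5", "HundredGigE3/2/0/5"],
}

for role, chassis, slots in single_card_findings(inventory):
    print(f"{role}: chassis {chassis} depends on a single card, slot {slots[0]}")
```

Each finding marks a chassis where the failure of one card would remove every link of that role, which is the condition that turned a single card fault into an outage in this case.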

The author revised this case on 2022-10-09.