
Case study: aggregation member ports cannot become Selected between an S12508F-AF and a CR16010H-FA at a customer site

Published 2022-04-29

Network topology and description

The network topology is as follows:


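Problem description

DCGW1 and DCGW2 each connect through a Layer 3 aggregate interface to a Layer 2 aggregate interface in the DRNI system below them. On the right-hand side, on the link between DCGW2 and S-EOR2 (RAGG11 to BAGG4), three member ports always stay in the Unselected state. After the three ports are removed from the aggregation group, both ends of each link come up normally.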

EOR-2    HGE3/0/34    HGE3/0/35    HGE3/0/36
RT2      HGE3/0/3     HGE4/0/4     HGE5/0/4

The down interfaces all show the following state:

Current state: UP

Line protocol state: DOWN(LAGG)

The link aggregation between DCGW1 (RT1) and S-EOR1 on the left-hand side works normally.

*Apr  9 16:21:03:670 2022 NFV-D-HDNNBO-00A-3303-0G13-E-RT-02 LAGG/7/Fsm: -MDC=1-Slot=5; HundredGigE5/0/4 FSM.PTX

 FAST_PERIODIC-->PERIODIC_TX, PeriodTimer_Expired

*Apr  9 16:21:03:670 2022 NFV-D-HDNNBO-00A-3303-0G13-E-RT-02 LAGG/7/Fsm: -MDC=1-Slot=5; HundredGigE5/0/4 FSM.PTX

 PERIODIC_TX-->FAST_PERIODIC, Short_Timeout

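The debugging output shows timer expiry, so the first suspicion was that the LACP short timeout was configured and the peers did not have enough time to process the exchanged LACPDUs.

Analysis

Router-side analysis: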

*Apr  9 16:21:03:501 2022 NFV-D-HDNNBO-00A-3303-0G13-E-RT-02 LAGG/7/Packet: -MDC=1-Slot=5; PACKET.HundredGigE5/0/4.send.

size=110, subtype=1, version=1

Actor: type=1, len=20, sys-pri=0x8000, sys-mac=4873-97f2-3800, key=0x9, pri=0x8000, port-index=0x18, state=0xf                    // 0xf: the state sent by the router is ABCD

Partner: type=2, len=20, sys-pri=0xa, sys-mac=0000-5e02-f001, key=0x9c44, pri=0x8000, port-index=0x8012, state=0x7

Collector: type=3, len=16, col-max-delay=0x0

Terminator: type=0, len=0
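The state fields in the LACPDU above are bitmasks, and the flag letters in the display output name their bits. The following is a minimal sketch of that mapping, assuming the IEEE 802.1AX bit order (bit 0 = A through bit 7 = H); the function name decode_lacp_state is made up for illustration and is not a device API.

```python
# Flag letters A..H correspond to bits 0..7 of the LACP state octet
# (assumption: IEEE 802.1AX actor/partner state bit order).
FLAG_NAMES = {
    "A": "LACP_Activity",
    "B": "LACP_Timeout",
    "C": "Aggregation",
    "D": "Synchronization",
    "E": "Collecting",
    "F": "Distributing",
    "G": "Defaulted",
    "H": "Expired",
}


def decode_lacp_state(state):
    """Return the flag letters set in an LACP state octet, lowest bit first."""
    letters = "ABCDEFGH"
    return "".join(letters[bit] for bit in range(8) if state & (1 << bit))


if __name__ == "__main__":
    # Values taken from the packet dump: Actor state=0xf, Partner state=0x7.
    for label, value in (("Actor", 0xF), ("Partner", 0x7)):
        flags = decode_lacp_state(value)
        names = ", ".join(FLAG_NAMES[f] for f in flags)
        print(f"{label}: 0x{value:x} -> {{{flags}}} ({names})")
```

Decoded this way, the Actor value 0xf carries the Synchronization bit (D) while the Partner value 0x7 does not, and neither carries Collecting (E) or Distributing (F), which a Selected port shows in the tables in this case.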

 

Aggregate Interface: Route-Aggregation11

Aggregation Mode: Dynamic

Loadsharing Type: Shar

System ID: 0x8000, 4873-97f2-3800

Local:

  Port                Status   Priority Index    Oper-Key               Flag

  HGE2/0/2            S        32768    15       9                      {ABCDEF}

  HGE2/0/3            S        32768    13       9                      {ABCDEF}

  HGE3/0/2            S        32768    32       9                      {ABCDEF}

  HGE3/0/3            U        32768    30       9                      {ABCD}

  HGE4/0/3            S        32768    22       9                      {ABCDEF}

  HGE4/0/4            U        32768    18       9                      {ABCD}

  HGE5/0/3            S        32768    25       9                      {ABCDEF}

  HGE5/0/4            U        32768    24       9                      {ABCD}

 


 

Switch-side analysis:

Seen from the switch, the state of the remote router is ABC:

Aggregate Interface: Bridge-Aggregation4

Creation Mode: Manual

Aggregation Mode: Dynamic

Loadsharing Type: Shar

Management VLANs: None

System ID: 0xa, 0000-5e02-f001

Local:

  Port                Status   Priority Index    Oper-Key               Flag

  HGE3/0/33(R)        S        32768    32783    40004                  {ABCDEF}

  HGE3/0/34           U        32768    32786    40004                  {ABCD}

  HGE3/0/35           U        32768    32810    40004                  {ABCD}

  HGE3/0/36           U        32768    32827    40004                  {ABCD}

Remote:

  Actor               Priority Index    Oper-Key SystemID               Flag  

  HGE3/0/33           32768    13       9        0x8000, 4873-97f2-3800 {ABCDEF}

  HGE3/0/34           32768    30       9        0x8000, 4873-97f2-3800 {ABC}

  HGE3/0/35           32768    18       9        0x8000, 4873-97f2-3800 {ABC}

  HGE3/0/36           32768    24       9        0x8000, 4873-97f2-3800 {ABC}
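When many member ports are involved, the Remote table can be scanned mechanically for ports whose partner flags lack a given bit. The sketch below assumes the column layout shown in the output above; the regular expression and the helper name ports_missing_flag are illustrative and would need adjusting for other software versions.

```python
import re

# One row of the "Remote:" table as laid out in this case:
# port, priority, index, oper-key, "sys-pri, sys-mac", {flags}
REMOTE_ROW = re.compile(
    r"^\s*(?P<port>[A-Za-z]+\d+/\d+/\d+)\s+"   # local port name in the Actor column
    r"\d+\s+\d+\s+\d+\s+"                      # priority, index, oper-key
    r"0x[0-9a-fA-F]+,\s*[0-9a-fA-F-]+\s+"      # system ID: priority, MAC
    r"\{(?P<flags>[A-H]*)\}\s*$"
)


def ports_missing_flag(text, flag="D"):
    """Return ports whose remote flag set does not contain `flag`."""
    missing = []
    for line in text.splitlines():
        match = REMOTE_ROW.match(line)
        if match and flag not in match.group("flags"):
            missing.append(match.group("port"))
    return missing


# Rows copied from the switch-side output in this case.
SAMPLE_REMOTE = """\
  Actor               Priority Index    Oper-Key SystemID               Flag
  HGE3/0/33           32768    13       9        0x8000, 4873-97f2-3800 {ABCDEF}
  HGE3/0/34           32768    30       9        0x8000, 4873-97f2-3800 {ABC}
  HGE3/0/35           32768    18       9        0x8000, 4873-97f2-3800 {ABC}
  HGE3/0/36           32768    24       9        0x8000, 4873-97f2-3800 {ABC}
"""

if __name__ == "__main__":
    print(ports_missing_flag(SAMPLE_REMOTE, "D"))
```

For the sample rows, the three ports reported are the same three that are Unselected in the Local table, which is the mismatch this section points out: the router sends ABCD, but the switch records only ABC.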

 

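The switch diagnostic information contains the BPDU protection logs shown below. Bridge-Aggregation4 is configured with stp edged-port, so it is an edge port, and the device's STP configuration shows that BPDU-Protection is enabled. This device is also the root bridge. When the network topology is stable, the root bridge does not receive TC BPDUs from other bridges; if it keeps receiving TC BPDUs, the topology is continuously unstable. In other words, after the edge port on the root bridge receives BPDUs, the port flaps continuously and the topology does not converge.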

 

%Apr  7 11:18:04:661 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 STP/4/STP_BPDU_PROTECTION: BPDU-Protection port Bridge-Aggregation4 received BPDUs.

%Apr  7 11:18:40:975 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 STP/4/STP_BPDU_PROTECTION: BPDU-Protection port Bridge-Aggregation4 received BPDUs.

 

The logs show the ports going up and down continuously:

%Apr  6 14:15:47:923 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE3/0/34 changed to down.

%Apr  6 14:15:48:082 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE3/0/36 changed to down.

%Apr  6 14:15:48:456 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE3/0/36 changed to up.

%Apr  6 14:15:48:834 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE3/0/36 changed to down.

%Apr  6 14:15:49:106 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE3/0/36 changed to up.

%Apr  6 14:15:49:559 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE3/0/36 changed to down.

%Apr  6 14:16:34:120 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE3/0/36 changed to up.

%Apr  6 14:16:36:167 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE3/0/36 changed to down.

%Apr  6 14:16:39:824 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE3/0/36 changed to up.

%Apr  6 14:16:43:050 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE3/0/36 changed to down.

%Apr  6 14:16:46:367 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE3/0/36 changed to up.

%Apr  6 14:16:50:136 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE3/0/36 changed to down.

%Apr  6 14:16:53:771 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE3/0/36 changed to up.

%Apr  6 14:16:57:041 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE3/0/36 changed to down.

%Apr  6 14:17:00:653 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE3/0/36 changed to up.

%Apr  6 14:17:04:190 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE3/0/36 changed to down.

%Apr  6 14:17:07:823 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE3/0/36 changed to up.

%Apr  6 14:17:10:940 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE3/0/36 changed to down.

%Apr  6 14:17:14:599 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE3/0/36 changed to up.

%Apr  6 14:17:17:538 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE3/0/34 changed to up.
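A long run of PHY_UPDOWN messages like the one above is easier to read as a per-interface tally. This is a rough sketch that assumes the exact message wording shown in these logs; count_transitions is a hypothetical helper name, and the sample contains only the first four log lines from this case.

```python
import re
from collections import Counter

# Matches the IFNET/3/PHY_UPDOWN message text as it appears in this case.
PHY_UPDOWN = re.compile(
    r"IFNET/3/PHY_UPDOWN: Physical state on the interface "
    r"(?P<ifname>\S+) changed to (?P<state>up|down)\."
)


def count_transitions(log_text):
    """Count (interface, new state) transitions found in the log text."""
    counts = Counter()
    for match in PHY_UPDOWN.finditer(log_text):
        counts[(match.group("ifname"), match.group("state"))] += 1
    return counts


# First four PHY_UPDOWN lines copied from the log above.
SAMPLE_LOG = """\
%Apr  6 14:15:47:923 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE3/0/34 changed to down.
%Apr  6 14:15:48:082 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE3/0/36 changed to down.
%Apr  6 14:15:48:456 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE3/0/36 changed to up.
%Apr  6 14:15:48:834 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE3/0/36 changed to down.
"""

if __name__ == "__main__":
    for (ifname, state), n in sorted(count_transitions(SAMPLE_LOG).items()):
        print(f"{ifname:22} {state:5} {n}")
```

Feeding the complete log file to the same function gives the number of flaps per member port, which can then be compared against the times of the BPDU-Protection messages.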

 

 

 

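Solution

1. On EOR-2 the ports were blocked at the low (driver) level. The cause is that STP is enabled on the aggregate interface: whenever the Selected state of an aggregation member changes, the member port is blocked or unblocked, and both the STP thread and the LAG thread operate on the block state of the port. The ports can be recovered by running shutdown and then undo shutdown on them. The times at which the aggregate interface flapped match the low-level block records.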

%Apr  7 11:18:04:971 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/3/PHY_UPDOWN: Physical state on the interface Bridge-Aggregation4 changed to down.

%Apr  7 11:18:04:973 2022 NFV-D-HDNNBO-00A-3303-0G10-S-EOR-02 IFNET/5/LINK_UPDOWN: Line protocol state on the interface Bridge-Aggregation4 changed to down.

 

171: 2022/4/7 11:18.40'131000 JPS set stg -1 block.

172: 2022/4/7 11:18.40'175000 JPS set stg 1 block.

173: 2022/4/7 11:18.42'292000 TRUNK set stg -1 block.

174: 2022/4/7 11:18.42'322000 TRUNK set stg 1 block.

175: 2022/4/7 11:18.42'322000 JPS set stg 1 block.

176: 2022/4/7 11:19.12'242000 TRUNK set stg 1 block.

177: 2022/4/7 11:19.12'242000 JPS set stg 1 block.

178: 2022/4/7 11:19.12'895000 TRUNK set stg -1 block.

19: 2022/4/7 11:18.3'576000 TRUNK set stg 1 block.

20: 2022/4/7 11:18.3'577000 TRUNK set stg 1 block.

21: 2022/4/7 11:18.3'858000 JPS set stg -1 block.

22: 2022/4/7 11:18.3'867000 TRUNK set stg 1 block.

23: 2022/4/7 11:18.3'867000 JPS set stg 1 block.

24: 2022/4/7 11:18.3'868000 JPS set stg -1 block.

2. The issue will be fixed in a later patch release, R2820H01.
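The low-level block records quoted under item 1 can be checked for moments where two different writers touch the same STG at almost the same time. The sketch below makes two assumptions that the source does not confirm: that a timestamp such as 11:18.42'322000 means 11:18:42 plus 322000 microseconds, and that the word before "set stg" identifies the writer. It only groups by that tag and does not assume which thread each tag belongs to. The names parse_records and contended_writes are illustrative.

```python
import re
from datetime import datetime, timedelta

# Example record: "174: 2022/4/7 11:18.42'322000 TRUNK set stg 1 block."
RECORD = re.compile(
    r"^\s*(?P<seq>\d+):\s+(?P<y>\d+)/(?P<mo>\d+)/(?P<d>\d+)\s+"
    r"(?P<h>\d+):(?P<mi>\d+)\.(?P<s>\d+)'(?P<us>\d+)\s+"
    r"(?P<owner>\w+) set stg (?P<stg>-?\d+) block\.\s*$"
)


def parse_records(text):
    """Parse records into (sequence, timestamp, owner tag, stg id) tuples."""
    records = []
    for line in text.splitlines():
        m = RECORD.match(line)
        if not m:
            continue
        # Assumed timestamp layout: hour:minute.second'microseconds
        ts = datetime(int(m["y"]), int(m["mo"]), int(m["d"]),
                      int(m["h"]), int(m["mi"]), int(m["s"]), int(m["us"]))
        records.append((int(m["seq"]), ts, m["owner"], int(m["stg"])))
    return records


def contended_writes(records, window=timedelta(milliseconds=1)):
    """Consecutive record pairs where different owners set the same STG within `window`."""
    pairs = []
    for prev, cur in zip(records, records[1:]):
        same_stg = prev[3] == cur[3]
        different_owner = prev[2] != cur[2]
        if same_stg and different_owner and abs(cur[1] - prev[1]) <= window:
            pairs.append((prev[0], cur[0]))
    return pairs


# Records 173 to 177 copied from the case.
SAMPLE_RECORDS = """\
173: 2022/4/7 11:18.42'292000 TRUNK set stg -1 block.
174: 2022/4/7 11:18.42'322000 TRUNK set stg 1 block.
175: 2022/4/7 11:18.42'322000 JPS set stg 1 block.
176: 2022/4/7 11:19.12'242000 TRUNK set stg 1 block.
177: 2022/4/7 11:19.12'242000 JPS set stg 1 block.
"""

if __name__ == "__main__":
    print(contended_writes(parse_records(SAMPLE_RECORDS)))
```

The one-millisecond window is an arbitrary choice for illustration; widening it reports looser coincidences between the two writers.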