
EBGP neighbor flapping on an S12508G-AF at a customer site

Published 2025-10-16

Problem Description

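An S12508G-AF switch establishes EBGP neighbor relationships from interface LoopBack4 (61.148.1.242) in VPN instance kanjia with two Internet RRs (202.106.194.86 and 202.106.194.87, plus the corresponding IPv6 addresses). One layer of intermediate devices sits between the switch and the RRs, and the S12508G-AF connects upstream to four backbone routers through four Layer 3 interfaces. At 09:21 in the morning, the log reported that interface HGE2/7/0/16 flapped because of low received optical power, the BFD session went down, and the BGP neighbor relationships with 202.106.194.86 and 202.106.194.87 went down. Key log excerpts follow: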

%May 15 09:21:50:686 2025 csw001&002.cn-beijing-4 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE2/7/0/16 changed to down.

%May 15 09:21:50:686 2025 csw001&002.cn-beijing-4 IFNET/5/LINK_UPDOWN: Line protocol state on the interface HundredGigE2/7/0/16 changed to down.

%May 15 09:21:51:794 2025 csw001&002.cn-beijing-4 BFD/5/BFD_CHANGE_FSM: Sess[61.148.1.242/202.106.194.87, LD/RD:36989/17259, Interface:N/A, SessType:Ctrl, LinkType:INET, Tx: 400, Rx: 400, detect: 5.], Ver:1, Sta: UP->DOWN, Diag: 1 (Control Detection Time Expired)

%May 15 09:21:51:795 2025 csw001&002.cn-beijing-4 BGP/5/BGP_STATE_CHANGED: BGP.kanjia: 202.106.194.87  state has changed from ESTABLISHED to IDLE for session down event received from BFD.

 

 

 

Analysis

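1. A detailed analysis of the device logs and logfile showed that BFD collaboration is configured both inside BGP and on the ECMP static routes used to reach the neighbors. The key configuration is as follows: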

bgp 65293

    #

ip vpn-instance kanjia

  router-id 61.148.1.242

  peer 202.106.194.86 as-number 4808

  peer 202.106.194.86 connect-interface LoopBack4

  peer 202.106.194.86 ebgp-max-hop 10

  peer 202.106.194.86 bfd multi-hop

  peer 202.106.194.86 password cipher $c$3$w/12/GCjeep1Gmp5jHqzdJeIuCRIZRO6Oh4p

  peer 202.106.194.87 as-number 4808

  peer 202.106.194.87 connect-interface LoopBack4

  peer 202.106.194.87 ebgp-max-hop 10

  peer 202.106.194.87 bfd multi-hop

  peer 202.106.194.87 password cipher $c$3$aYKd0Hl4VaFbAIJ0ocrzM8AFvuzkpGVgeEpc

#

ip route-static vpn-instance kanjia 202.106.194.86 32 HundredGigE1/6/0/16 61.51.112.237 bfd control-packet description to-internet-HundredGigE1/6/0/16

 ip route-static vpn-instance kanjia 202.106.194.86 32 HundredGigE1/7/0/16 124.65.60.25 bfd control-packet description to-internet-HundredGigE1/7/0/16

 ip route-static vpn-instance kanjia 202.106.194.86 32 HundredGigE2/6/0/16 61.148.4.45 bfd control-packet description to-internet-HundredGigE2/6/0/16

 ip route-static vpn-instance kanjia 202.106.194.86 32 HundredGigE2/7/0/16 114.243.132.205 bfd control-packet description to-internet-HundredGigE2/7/0/16

 ip route-static vpn-instance kanjia 202.106.194.87 32 HundredGigE1/6/0/16 61.51.112.237 bfd control-packet description to-internet-HundredGigE1/6/0/16

 ip route-static vpn-instance kanjia 202.106.194.87 32 HundredGigE1/7/0/16 124.65.60.25 bfd control-packet description to-internet-HundredGigE1/7/0/16

 ip route-static vpn-instance kanjia 202.106.194.87 32 HundredGigE2/6/0/16 61.148.4.45 bfd control-packet description to-internet-HundredGigE2/6/0/16

 ip route-static vpn-instance kanjia 202.106.194.87 32 HundredGigE2/7/0/16 114.243.132.205 bfd control-packet description to-internet-HundredGigE2/7/0/16

#

 

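2. BGP-BFD collaboration is mainly intended for deployments with multiple BGP neighbors: it quickly detects a failed link or neighbor so that traffic can switch quickly to another BGP neighbor. On site, however, the neighbor relationships are established from a loopback interface over multiple links that form ECMP. In this case the BFD session associated with BGP picks one of the 4 links to send its BFD control packets, and if the chosen link fails, the BFD session goes down. Take BGP neighbor 202.106.194.87 on the live network as an example: the switch has 4 equal-cost links to 202.106.194.87, and the BGP BFD control packets are sent over one of them. Judging from the fault symptoms, HundredGigE2/7/0/16 was the link selected at the time of the fault. When that port went down because of a physical link problem, the device detected it quickly and notified BGP, which brought the neighbor down: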

%May 15 09:21:50:686 2025 csw001&002.cn-beijing-4 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE2/7/0/16 changed to down.

%May 15 09:21:50:686 2025 csw001&002.cn-beijing-4 IFNET/5/LINK_UPDOWN: Line protocol state on the interface HundredGigE2/7/0/16 changed to down.

 

%May 15 09:21:51:794 2025 csw001&002.cn-beijing-4 BFD/5/BFD_CHANGE_FSM: Sess[61.148.1.242/202.106.194.87, LD/RD:36989/17259, Interface:N/A, SessType:Ctrl, LinkType:INET, Tx: 400, Rx: 400, detect: 5.], Ver:1, Sta: UP->DOWN, Diag: 1 (Control Detection Time Expired)

%May 15 09:21:51:795 2025 csw001&002.cn-beijing-4 BGP/5/BGP_STATE_CHANGED: BGP.kanjia: 202.106.194.87  state has changed from ESTABLISHED to IDLE for session down event received from BFD.

%May 15 09:21:51:795 2025 csw001&002.cn-beijing-4 BGP/5/BGP_STATE_CHANGED_REASON: BGP.kanjia: 202.106.194.87  state has changed from ESTABLISHED to IDLE. (Reason: a session down event was received from BFD, Error code: Send Notificationcode 6/0)

 

%May 15 09:22:02:870 2025 csw001&002.cn-beijing-4 BGP/5/BGP_STATE_CHANGED: BGP.kanjia: 202.106.194.86  state has changed from OPENCONFIRM to ESTABLISHED.

%May 15 09:22:02:870 2025 csw001&002.cn-beijing-4 BGP/5/BGP_STATE_CHANGED: BGP.kanjia: 2408:8000:1000:3000::86  state has changed from OPENCONFIRM to ESTABLISHED.

%May 15 09:22:03:315 2025 csw001&002.cn-beijing-4 BGP/5/BGP_STATE_CHANGED: BGP.kanjia: 202.106.194.87  state has changed from OPENCONFIRM to ESTABLISHED.

%May 15 09:22:11:976 2025 csw001&002.cn-beijing-4 IFNET/3/PHY_UPDOWN: Physical state on the interface HundredGigE2/7/0/16 changed to up.

%May 15 09:22:11:976 2025 csw001&002.cn-beijing-4 IFNET/5/LINK_UPDOWN: Line protocol state on the interface HundredGigE2/7/0/16 changed to up.
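The timing in the excerpt above can be made explicit with a little arithmetic. The sketch below is only an illustration: the timestamps are copied from the log lines, while the event labels and the helper names `parse_ts` and `timeline` are hypothetical and not part of any H3C tool.

```python
from datetime import datetime

# Timestamps copied from the log excerpt; the labels are our own shorthand.
EVENTS = [
    ("HundredGigE2/7/0/16 physical down", "May 15 09:21:50:686 2025"),
    ("BFD session to 202.106.194.87 down", "May 15 09:21:51:794 2025"),
    ("BGP peer 202.106.194.87 to IDLE", "May 15 09:21:51:795 2025"),
    ("BGP peer 202.106.194.87 re-established", "May 15 09:22:03:315 2025"),
    ("HundredGigE2/7/0/16 physical up", "May 15 09:22:11:976 2025"),
]


def parse_ts(text):
    """Parse a timestamp such as 'May 15 09:21:50:686 2025'."""
    # The milliseconds follow the last colon, so split them off first.
    head, ms_and_year = text.rsplit(":", 1)
    ms, year = ms_and_year.split()
    base = datetime.strptime(f"{head} {year}", "%b %d %H:%M:%S %Y")
    return base.replace(microsecond=int(ms) * 1000)


def timeline(events):
    """Return (label, seconds elapsed since the first event) pairs."""
    t0 = parse_ts(events[0][1])
    return [(label, (parse_ts(ts) - t0).total_seconds()) for label, ts in events]


if __name__ == "__main__":
    for label, offset in timeline(EVENTS):
        print(f"+{offset:7.3f}s  {label}")
```

Read this way, the BFD session expired about 1.1 s after the port went down, BGP dropped 1 ms later, and the peer was re-established roughly 12.6 s after the failure, which is several seconds before the port came back up at about 21.3 s. That ordering is consistent with the remaining links being able to carry the BGP session on their own.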

For the application scenario of BGP-BFD collaboration, refer to the description in the configuration guide:

https://www.h3c.com/cn/d_202409/2273206_30005_0.htm

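BGP and BFD collaboration configuration

1. Network requirements

·     OSPF is used as the IGP within AS 200 to provide connectivity inside the AS.

·     Two IBGP connections are established between Switch A and Switch C. When both paths between Switch A and Switch C are available, packets between Switch C and 1.1.1.0/24 are forwarded along the path Switch A<->Switch B<->Switch C. When the path Switch A<->Switch B<->Switch C fails, BFD detects the failure quickly and notifies BGP, so that the path Switch A<->Switch D<->Switch C takes effect quickly.

2. Network diagram

Figure 7-3 Network diagram for configuring BGP and BFD collaboration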

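3. On the live network, the customer expects the BGP neighbor relationships to stay up as long as the 4 links back each other up. BGP BFD, however, exists to detect a link problem quickly and bring the BGP neighbor down, which is not what the customer expects. We therefore recommend removing the BGP-BFD collaboration. With the on-site configuration, once the BFD collaboration is removed from BGP, the two sides still have redundant reachability over the 4 ECMP paths, and each static route is individually associated with BFD for fast convergence, which keeps the BGP neighbors and the routes they exchange stable.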
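The difference between the two designs can be sketched as a toy model. This is an illustration of the reasoning only, not a simulation of Comware behavior: the function name `bgp_peer_stays_up` and its parameters are hypothetical, and the model assumes that the BFD session tied to BGP uses exactly one link and that BGP survives as long as at least one static route to the peer remains.

```python
# Toy model of the reasoning above; it does not reproduce real device behaviour.
# Interface names are taken from the static routes in the configuration excerpt.
LINKS = [
    "HundredGigE1/6/0/16",
    "HundredGigE1/7/0/16",
    "HundredGigE2/6/0/16",
    "HundredGigE2/7/0/16",
]


def bgp_peer_stays_up(failed_link, bgp_bfd_enabled, bfd_pinned_link):
    """Return True if the modelled BGP peer survives a single link failure.

    Assumptions (hypothetical simplifications):
    - the multi-hop BFD session tied to BGP sends over exactly one link;
    - per-route BFD withdraws only the static route over the failed link;
    - BGP stays up as long as at least one route to the peer remains.
    """
    if bgp_bfd_enabled and failed_link == bfd_pinned_link:
        # The BFD control packets were on the failed link: BFD expires and
        # BGP is torn down even though the other three links are healthy.
        return False
    remaining = [link for link in LINKS if link != failed_link]
    return len(remaining) > 0


if __name__ == "__main__":
    pinned = "HundredGigE2/7/0/16"  # the link the analysis infers was selected
    for enabled in (True, False):
        result = bgp_peer_stays_up(pinned, enabled, pinned)
        print(f"BGP-BFD enabled={enabled}: peer stays up = {result}")
```

In this model the peer only drops when BGP-BFD collaboration is enabled and the failed link happens to be the one carrying the BFD control packets, which matches the fault described here. Real behavior also depends on timers and path selection that the sketch leaves out.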

Solution

1. Remove the BGP-BFD collaboration.