Prolonged packet loss during IRF stack switchover on S6800 switches in an EVPN network at a customer site


Network topology and description

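At this site, two IRF stacks are interconnected through port 48 on each member device. The links run over a carrier WDM line that connects two data centers, and port 48 on the stack members is bundled into a Layer 3 dynamic aggregation. A VXLAN tunnel for data center interconnect runs between the two stacks and is created automatically by the EVPN established between them. The underlay between the two stacks uses OSPF, and a BGP EVPN peer relationship is established on top of it, which brings up Tunnel 0 to carry traffic between the data centers. BFD and GR are configured for the OSPF process to reduce packet loss during a stack switchover. On the downlink side, each stack connects to a data center aggregation device (A and B respectively) through a Layer 2 aggregate interface, and an AC is configured on that Layer 2 aggregate interface to encapsulate and decapsulate traffic between the data centers. Devices A and B each use a Layer 3 aggregate interface to connect to the DCI devices, and the two Layer 3 aggregate interfaces are assigned addresses in the same subnet.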


Problem description

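With this topology, A and B communicated normally. On site, the stack switchover was tested by rebooting devices 1, 2, 3, and 4 in turn (each reboot targeted the current stack master) while A and B pinged each other using their same-subnet addresses. Rebooting devices 1, 2, and 3 behaved normally, with a switchover time of about 5 to 6 seconds. When device 4 was rebooted, however, the following was observed: right after the reboot, the ping between A and B lost packets for about 5 seconds and then recovered, but about 10 seconds later packets were lost again for more than 20 seconds before recovering.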

Analysis

1. Log analysis

%Oct 20 22:51:38:815 2021 XX-B01-N08-DCI-ZDS-6800 IFNET/3/PHY_UPDOWN: Physical state on the interface Tunnel0 changed to down.

%Oct 20 22:51:38:816 2021 XX-B01-N08-DCI-ZDS-6800 IFNET/5/LINK_UPDOWN: Line protocol state on the interface Tunnel0 changed to down.  // packet loss begins

%Oct 20 22:51:46:874 2021 XX-B01-N08-DCI-ZDS-6800 IFNET/3/PHY_UPDOWN: Physical state on the interface Tunnel0 changed to up.

%Oct 20 22:51:46:875 2021 XX-B01-N08-DCI-ZDS-6800 IFNET/5/LINK_UPDOWN: Line protocol state on the interface Tunnel0 changed to up.  // recovered

%Oct 20 22:51:52:570 2021 XX-B01-N08-DCI-ZDS-6800 IFNET/3/PHY_UPDOWN: Physical state on the interface Tunnel0 changed to down.

%Oct 20 22:51:52:570 2021 XX-B01-N08-DCI-ZDS-6800 IFNET/5/LINK_UPDOWN: Line protocol state on the interface Tunnel0 changed to down.  // packet loss resumes

%Oct 20 22:52:16:057 2021 XX-B01-N08-DCI-ZDS-6800 IFNET/3/PHY_UPDOWN: Physical state on the interface Tunnel0 changed to up.

%Oct 20 22:52:16:059 2021 XX-B01-N08-DCI-ZDS-6800 IFNET/5/LINK_UPDOWN: Line protocol state on the interface Tunnel0 changed to up.  // recovered again

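The logs show that the packet loss was caused by Tunnel 0 going down and up twice during the switchover. The second outage lasted more than 20 seconds, which matches the duration of the packet loss. When devices 1, 2, and 3 were rebooted, Tunnel 0 went down and up only once.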
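The two Tunnel0 outages can be measured directly from the log timestamps. The following is a minimal Python sketch, not a tool used in the original case: the function name `outages` and the regular expression are illustrative assumptions, and the sample input is the four PHY_UPDOWN lines quoted above.

```python
import re
from datetime import datetime

MONTHS = {name: idx for idx, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Matches lines such as:
# %Oct 20 22:51:38:815 2021 <host> IFNET/3/PHY_UPDOWN: Physical state on the interface Tunnel0 changed to down.
PATTERN = re.compile(
    r"%(\w{3}) (\d{1,2}) (\d{2}):(\d{2}):(\d{2}):(\d{3}) (\d{4}) \S+ "
    r"IFNET/3/PHY_UPDOWN: Physical state on the interface (\S+) changed to (up|down)\."
)

def outages(log_text, interface="Tunnel0"):
    """Return the length in seconds of each down-to-up interval of one interface."""
    durations = []
    down_at = None
    for m in PATTERN.finditer(log_text):
        mon, day, hh, mm, ss, ms, year, ifname, state = m.groups()
        if ifname != interface:
            continue
        ts = datetime(int(year), MONTHS[mon], int(day),
                      int(hh), int(mm), int(ss), int(ms) * 1000)
        if state == "down":
            down_at = ts
        elif down_at is not None:
            durations.append((ts - down_at).total_seconds())
            down_at = None
    return durations

LOG = """
%Oct 20 22:51:38:815 2021 XX-B01-N08-DCI-ZDS-6800 IFNET/3/PHY_UPDOWN: Physical state on the interface Tunnel0 changed to down.
%Oct 20 22:51:46:874 2021 XX-B01-N08-DCI-ZDS-6800 IFNET/3/PHY_UPDOWN: Physical state on the interface Tunnel0 changed to up.
%Oct 20 22:51:52:570 2021 XX-B01-N08-DCI-ZDS-6800 IFNET/3/PHY_UPDOWN: Physical state on the interface Tunnel0 changed to down.
%Oct 20 22:52:16:057 2021 XX-B01-N08-DCI-ZDS-6800 IFNET/3/PHY_UPDOWN: Physical state on the interface Tunnel0 changed to up.
"""

print(outages(LOG))
```

For these four lines the sketch reports a first outage of about 8 seconds and a second of about 23.5 seconds, consistent with the "more than 20 seconds" of packet loss seen on site.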

2. Analysis of the abnormal packet loss

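According to the feedback from the site, devices 1 and 2 form one stack and devices 3 and 4 form the other, and device 4 was rebooted after device 3 had finished rebooting. The logs on stack 1-2 show that at 22:51:46 the OSPF neighbor returned to Full and Tunnel 0 came up. About 5 seconds later (22:51:52), the OSPF neighbor changed to ExStart again, which interrupted the underlay network and brought Tunnel 0 down. Another 2 seconds later (22:51:54), the OSPF neighbor recovered and the BFD session changed from Down to Up, but connectivity was restored only after the BGP peer was re-established at 22:52:11 and Tunnel 0 came up again. The logs on device 3 for the same period show that a BFD session down message was logged at the same time the OSPF neighbor went down.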

%Oct 20 22:51:38:696 2021 XX-B01-N08-DCI-ZDS-6800 BFD/5/BFD_CHANGE_FSM: Sess[10.130.254.1/10.130.254.6, LD/RD:2006/2004, Interface:RAGG100, SessType:Ctrl, LinkType:INET], Ver:1, Sta: UP->DOWN, Diag: 1 (Control Detection Time Expired)

%Oct 20 22:51:38:698 2021 XX-B01-N08-DCI-ZDS-6800 OSPF/6/OSPF_LAST_NBR_DOWN: OSPF 1 Last neighbor down event: Router ID: 10.130.253.101 Local address: 10.130.254.1 Remote address: 10.130.254.6 Reason: BFD session down.

The initial suspicion was that BFD detection failed and the BFD session went down, which in turn brought the OSPF neighbor down.

#
interface Route-Aggregation100
 ospf 1 area 0.0.0.0
 ospf bfd enable
 link-aggregation mode dynamic
 bfd min-transmit-interval 1000
 bfd min-receive-interval 1000
 bfd detect-multiplier 3
#

 

  ===============display bfd session verbose=============== 

 Total Session Num: 2     Up Session Num: 1     Init Mode: Active

 

       Local Discr: 2006                 Remote Discr: 2006

         Source IP: 1.1.1.1       Destination IP: 1.1.1.2

     Session State: Up                      Interface: Route-Aggregation100

      Min Tx Inter: 1000ms               Act Tx Inter: 1000ms

      Min Rx Inter: 1000ms               Detect Inter: 3000ms

          Rx Count: 785                      Tx Count: 782                      // The BFD Rx and Tx packet counts differ, which may be what caused the BFD session to go down

      Connect Type: Direct             Running Up for: 00:11:18

         Hold Time: 2436ms                  Auth mode: None

       Detect Mode: Async                        Slot: 1

          Protocol: OSPF           

           Version: 1              

         Diag Info: No Diagnostic  
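The Detect Inter value of 3000ms in the output above follows from the interface settings (1000 ms intervals, detect multiplier 3). The sketch below applies the asynchronous-mode detection time rule from RFC 5880; the function name is illustrative, and it assumes the peer is configured with the same values, which the case does not show.

```python
def bfd_detection_time_ms(local_min_rx_ms, remote_min_tx_ms, remote_detect_mult):
    """Asynchronous-mode detection time (RFC 5880, section 6.8.4).

    The local system multiplies the peer's detect multiplier by the
    negotiated receive interval, which is the larger of the local
    Required Min RX Interval and the peer's Desired Min TX Interval.
    """
    return remote_detect_mult * max(local_min_rx_ms, remote_min_tx_ms)

# Assumption: the peer uses the same 1000 ms / multiplier 3 settings
# as the Route-Aggregation100 interface shown in this case.
detect_ms = bfd_detection_time_ms(
    local_min_rx_ms=1000, remote_min_tx_ms=1000, remote_detect_mult=3)
print(detect_ms)  # 3000
```

In other words, the BFD session is declared down if no control packet arrives for 3 seconds, so any gap in BFD packet forwarding longer than that during a switchover is enough to trigger "Control Detection Time Expired".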

3. Reproduction and analysis

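BFD is configured on the devices, but the IRF link down report delay had not been set on the on-site devices. According to the manual, when features such as BFD and GR are in use, irf link-delay should be set to 0 to avoid unnecessary switchover interruptions.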

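Configuration restrictions and guidelines (from the manual)

If the timeout configured for a protocol (for example, CFD or OSPF) is shorter than the report delay, that protocol will time out. In this case, adjust the IRF link down report delay or the protocol timeout so that the IRF link down report delay is shorter than the protocol timeout. This prevents unnecessary protocol state changes.

In the following situations, set the IRF link down report delay to 0:

·     Fast master/subordinate switchover and fast IRF link switchover are required.

·     RRPP, BFD, or GR is used in the IRF environment.

·     Before shutting down an IRF physical port or rebooting an IRF member device, first set the IRF link down report delay to 0, and restore the previous value after the operation completes.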
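The rule quoted from the manual (the IRF link down report delay must be shorter than every protocol timeout) can be expressed as a simple check. This is an illustrative Python sketch only: the function name is made up, the 4000 ms delay is a hypothetical value because the case does not state the delay in effect on site, and the 40000 ms OSPF dead interval is an assumed example. Only the 3000 ms BFD detection time comes from the output in this case.

```python
def protocols_at_risk(irf_link_delay_ms, protocol_timeouts_ms):
    """Return the protocols whose timeout is not longer than the
    IRF link down report delay, i.e. those that may time out first."""
    return sorted(
        name for name, timeout in protocol_timeouts_ms.items()
        if timeout <= irf_link_delay_ms
    )

timeouts = {
    "BFD": 3000,                  # Detect Inter from display bfd session verbose
    "OSPF dead interval": 40000,  # assumed example value, not from the case
}

print(protocols_at_risk(4000, timeouts))  # hypothetical 4000 ms delay -> ['BFD']
print(protocols_at_risk(0, timeouts))     # delay set to 0 -> []
```

With any non-zero delay at or above the BFD detection time, BFD is flagged; with the delay set to 0, nothing is flagged, which matches the manual's recommendation for IRF environments that use BFD or GR.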

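After this was identified, further testing on site was no longer possible, so the environment was rebuilt in the lab to reproduce the problem. The results are as follows: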

 

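1) Problem reproduced without irf link-delay 0 configured

After several master/subordinate switchovers, while the master of stack 1-2 was rebooting, stack 3-4 printed the following:

After Tunnel0 recovered, it went down and up again.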

<QSH-NET06-DCI-ZDS-6800>%Nov 16 15:09:30:369 2021 QSH-NET06-DCI-ZDS-6800 LAGG/6/LAGG_INACTIVE_CONFIGURATION: Member port FGE1/0/49 of aggregation group RAGG100 changed to the inactive state, because the aggregation configuration of the port is incorrect.

%Nov 16 15:09:30:383 2021 QSH-NET06-DCI-ZDS-6800 IFNET/5/LINK_UPDOWN: Line protocol state on the interface FortyGigE1/0/49 changed to down.

%Nov 16 15:09:34:532 2021 QSH-NET06-DCI-ZDS-6800 IFNET/3/PHY_UPDOWN: Physical state on the interface FortyGigE1/0/49 changed to down.

%Nov 16 15:09:37:502 2021 QSH-NET06-DCI-ZDS-6800 BFD/5/BFD_CHANGE_FSM: Sess[10.130.254.6/10.130.254.1, LD/RD:2002/2002, Interface:RAGG100, SessType:Ctrl, LinkType:INET], Ver:1, Sta: UP->DOWN, Diag: 1 (Control Detection Time Expired)

%Nov 16 15:09:37:505 2021 QSH-NET06-DCI-ZDS-6800 OSPF/5/OSPF_NBR_CHG: OSPF 1 Neighbor 10.130.254.1(Route-Aggregation100) changed from FULL to DOWN.

%Nov 16 15:09:37:597 2021 QSH-NET06-DCI-ZDS-6800 IFNET/3/PHY_UPDOWN: Physical state on the interface Tunnel0 changed to down.

%Nov 16 15:09:37:598 2021 QSH-NET06-DCI-ZDS-6800 IFNET/5/LINK_UPDOWN: Line protocol state on the interface Tunnel0 changed to down.

%Nov 16 15:09:44:840 2021 QSH-NET06-DCI-ZDS-6800 OSPF/5/OSPF_NBR_CHG: OSPF 1 Neighbor 10.130.254.1(Route-Aggregation100) changed from LOADING to FULL.

%Nov 16 15:09:45:212 2021 QSH-NET06-DCI-ZDS-6800 BGP/5/BGP_STATE_CHANGED: BGP.: 10.130.253.1 state has changed from ESTABLISHED to IDLE for two connections exist and MD5 authentication is configured for the neighbor.

%Nov 16 15:09:45:531 2021 QSH-NET06-DCI-ZDS-6800 IFNET/3/PHY_UPDOWN: Physical state on the interface Tunnel0 changed to up.

%Nov 16 15:09:45:532 2021 QSH-NET06-DCI-ZDS-6800 IFNET/5/LINK_UPDOWN: Line protocol state on the interface Tunnel0 changed to up.

%Nov 16 15:09:48:946 2021 QSH-NET06-DCI-ZDS-6800 OSPF/5/OSPF_NBR_CHG: OSPF 1 Neighbor 10.130.254.1(Route-Aggregation100) changed from FULL to EXSTART.

%Nov 16 15:09:48:956 2021 QSH-NET06-DCI-ZDS-6800 OSPF/5/OSPF_NBR_CHG: OSPF 1 Neighbor 10.130.254.1(Route-Aggregation100) changed from LOADING to FULL.

%Nov 16 15:09:48:959 2021 QSH-NET06-DCI-ZDS-6800 BFD/5/BFD_CHANGE_FSM: Sess[10.130.254.6/10.130.254.1, LD/RD:2002/2004, Interface:RAGG100, SessType:Ctrl, LinkType:INET], Ver:1, Sta: DOWN->INIT, Diag: 0 (No Diagnostic)

%Nov 16 15:09:48:959 2021 QSH-NET06-DCI-ZDS-6800 BFD/5/BFD_CHANGE_FSM: Sess[10.130.254.6/10.130.254.1, LD/RD:2002/2004, Interface:RAGG100, SessType:Ctrl, LinkType:INET], Ver:1, Sta: INIT->UP, Diag: 0 (No Diagnostic)

%Nov 16 15:09:49:460 2021 QSH-NET06-DCI-ZDS-6800 IFNET/3/PHY_UPDOWN: Physical state on the interface Tunnel0 changed to down.

%Nov 16 15:09:49:460 2021 QSH-NET06-DCI-ZDS-6800 IFNET/5/LINK_UPDOWN: Line protocol state on the interface Tunnel0 changed to down.

%Nov 16 15:10:10:214 2021 QSH-NET06-DCI-ZDS-6800 BGP/5/BGP_STATE_CHANGED: BGP.: 10.130.253.1 state has changed from OPENCONFIRM to ESTABLISHED.

%Nov 16 15:10:14:386 2021 QSH-NET06-DCI-ZDS-6800 IFNET/3/PHY_UPDOWN: Physical state on the interface Tunnel0 changed to up.

%Nov 16 15:10:14:387 2021 QSH-NET06-DCI-ZDS-6800 IFNET/5/LINK_UPDOWN: Line protocol state on the interface Tunnel0 changed to up.

%Nov 16 15:11:32:002 2021 QSH-NET06-DCI-ZDS-6800 LLDP/5/LLDP_NEIGHBOR_AGE_OUT: Nearest bridge agent neighbor aged out on port FortyGigE1/0/49 (IfIndex 49), neighbor's chassis ID is 000f-0000-0002, port ID is FortyGigE1/0/49.

 

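2) irf link-delay 0 configured on stacks 1-2 and 3-4

Across multiple master/subordinate switchovers, no Tunnel0 down/up messages were printed, and the fault no longer occurred.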

%Nov 17 10:04:47:689 2021 QSH-NET06-DCI-ZDS-6800 LAGG/6/LAGG_INACTIVE_CONFIGURATION: Member port FGE1/0/49 of aggregation group RAGG100 changed to the inactive state, because the aggregation configuration of the port is incorrect.

%Nov 17 10:04:47:711 2021 QSH-NET06-DCI-ZDS-6800 IFNET/5/LINK_UPDOWN: Line protocol state on the interface FortyGigE1/0/49 changed to down.

%Nov 17 10:04:52:083 2021 QSH-NET06-DCI-ZDS-6800 IFNET/3/PHY_UPDOWN: Physical state on the interface FortyGigE1/0/49 changed to down.

%Nov 17 10:04:56:312 2021 QSH-NET06-DCI-ZDS-6800 BGP/5/BGP_STATE_CHANGED: BGP.: 10.130.253.1 state has changed from ESTABLISHED to IDLE for two connections exist and MD5 authentication is configured for the neighbor.

%Nov 17 10:05:02:231 2021 QSH-NET06-DCI-ZDS-6800 OSPF/5/OSPF_NBR_CHG: OSPF 1 Neighbor 10.130.254.1(Route-Aggregation100) changed from FULL to EXSTART.

%Nov 17 10:05:02:238 2021 QSH-NET06-DCI-ZDS-6800 OSPF/5/OSPF_NBR_CHG: OSPF 1 Neighbor 10.130.254.1(Route-Aggregation100) changed from LOADING to FULL.

%Nov 17 10:05:04:286 2021 QSH-NET06-DCI-ZDS-6800 BFD/5/BFD_CHANGE_FSM: Sess[10.130.254.6/10.130.254.1, LD/RD:2004/2002, Interface:RAGG100, SessType:Ctrl, LinkType:INET], Ver:1, Sta: DOWN->INIT, Diag: 0 (No Diagnostic)

%Nov 17 10:05:04:288 2021 QSH-NET06-DCI-ZDS-6800 BFD/5/BFD_CHANGE_FSM: Sess[10.130.254.6/10.130.254.1, LD/RD:2004/2002, Interface:RAGG100, SessType:Ctrl, LinkType:INET], Ver:1, Sta: INIT->UP, Diag: 0 (No Diagnostic)

%Nov 17 10:05:21:312 2021 QSH-NET06-DCI-ZDS-6800 BGP/5/BGP_STATE_CHANGED: BGP.: 10.130.253.1 state has changed from OPENCONFIRM to ESTABLISHED.

%Nov 17 10:06:40:704 2021 QSH-NET06-DCI-ZDS-6800 LLDP/5/LLDP_NEIGHBOR_AGE_OUT: -Slot=1; Nearest bridge agent neighbor aged out on port FortyGigE1/0/49 (IfIndex 49), neighbor's chassis ID is 000f-0000-0002, port ID is FortyGigE1/0/49.

%Nov 17 10:10:38:835 2021 QSH-NET06-DCI-ZDS-6800 IFNET/3/PHY_UPDOWN: Physical state on the interface FortyGigE1/0/49 changed to up.

%Nov 17 10:10:38:868 2021 QSH-NET06-DCI-ZDS-6800 LAGG/6/LAGG_ACTIVE: Member port FGE1/0/49 of aggregation group RAGG100 changed to the active state.

%Nov 17 10:10:38:888 2021 QSH-NET06-DCI-ZDS-6800 IFNET/5/LINK_UPDOWN: Line protocol state on the interface FortyGigE1/0/49 changed to up.

%Nov 17 10:10:39:966 2021 QSH-NET06-DCI-ZDS-6800 LLDP/6/LLDP_CREATE_NEIGHBOR: -Slot=1; Nearest bridge agent neighbor created on port FortyGigE1/0/49 (IfIndex 49), neighbor's chassis ID is 000f-0000-0002, port ID is FortyGigE1/0/49.

%Nov 17 10:14:56:812 2021 QSH-NET06-DCI-ZDS-6800 NTP/5/NTP_CLOCK_CHANGE: System clock changed from 10:14:56:271 11/17/2021 to 10:14:56:810 11/17/2021, the NTP server's IP address is 10.130.254.1.

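4. In summary, the on-site stacks did not have irf link-delay 0 configured, so the BFD process did not switch over promptly during the stack switchover, BFD packets were lost, and the session went down. Although the OSPF process switched over because GR was configured, the BFD session going down took the OSPF neighbor down with it. This in turn brought the BGP peer and Tunnel 0 down and broke connectivity. After link-delay is changed, the BFD session is invalidated immediately and re-initialized, then recovers as soon as it moves to the new master.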

Solution

Configuring irf link-delay 0 on the stacks at both ends resolves the problem.

The author revised this case on 2021-11-30.