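Two S6850 switches form a DRNI system and connect to a Huawei device through the DR interface BAGG401. On October 10 BAGG401 flapped briefly on site. The logs for that time show that the member ports were deselected because the LACP packets they received were abnormal. The peer device logged much the same kind of error, so each side blames the packets sent by the other. Because this was a one-off flap that has not recurred since, it cannot be debugged live or captured on the wire.
(1) The member ports did not go physically down; only the link aggregation protocol state went down. The two devices also logged different errors. The device-side logs give the following reasons for the protocol state going down:
DR1: this device reports a missing D (Synchronization) flag and a mismatched key: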
%@6141%Oct 10 07:32:03:818 2022 DR1 LAGG/6/LAGG_INACTIVE_OPERSTATE: Member port HGE1/0/27 of aggregation group BAGG401 changed to the inactive state, because the peer port did not have the Synchronization flag.
%@6142%Oct 10 07:32:03:818 2022 DR1 LAGG/6/LAGG_INACTIVE_PARTNER_KEY_WRONG: Member port HGE1/0/28 of aggregation group BAGG401 changed to the inactive state, because the operational key of the peer port was different from that of the reference port.
%@6143%Oct 10 07:32:03:822 2022 DR1 IFNET/5/LINK_UPDOWN: Line protocol state on the interface HundredGigE1/0/27 changed to down.
%@6144%Oct 10 07:32:03:822 2022 DR1 IFNET/5/LINK_UPDOWN: Line protocol state on the interface HundredGigE1/0/28 changed to down.
%@6148%Oct 10 07:32:03:855 2022 DR1 IFNET/3/PHY_UPDOWN: Physical state on the interface Bridge-Aggregation401 changed to down.
%@6149%Oct 10 07:32:03:855 2022 DR1 IFNET/5/LINK_UPDOWN: Line protocol state on the interface Bridge-Aggregation401 changed to down.
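Port 27 went down because the LACP packet sent by the peer did not carry the Synchronization flag (flag D). Port 28 went down because the operational key of the peer port differed from that of the reference port.
DR2: this device reports: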
%@13479%Oct 10 07:32:03:835 2022 DR2 LAGG/6/LAGG_INACTIVE_OTHER: Member port HGE1/0/27 of aggregation group BAGG401 changed to the inactive state, because other reason.
%@13480%Oct 10 07:32:03:835 2022 DR2 LAGG/6/LAGG_INACTIVE_OTHER: Member port HGE1/0/28 of aggregation group BAGG401 changed to the inactive state, because other reason.
%@13481%Oct 10 07:32:03:837 2022 DR2 IFNET/5/LINK_UPDOWN: Line protocol state on the interface HundredGigE1/0/27 changed to down.
%@13482%Oct 10 07:32:03:838 2022 DR2 IFNET/5/LINK_UPDOWN: Line protocol state on the interface HundredGigE1/0/28 changed to down.
%@13483%Oct 10 07:32:03:842 2022 DR2 DRNI/6/DRNI_IFEVENT_DR_NOSELECTED: Local DR interface Bridge-Aggregation401 in DR group 401 does not have Selected member ports because the aggregate interface went down. Please check the aggregate link status.
%@13484%Oct 10 07:32:03:842 2022 DR2 DRNI/6/DRNI_IFEVENT_DR_GLOBALDOWN: The state of DR group 401 changed to down.
%@13485%Oct 10 07:32:03:880 2022 DR2 IFNET/3/PHY_UPDOWN: Physical state on the interface Bridge-Aggregation401 changed to down.
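(2) Based on the information above, the LACP packets sent by the peer device appear to be at fault. However, the logs on the peer Huawei device in turn point to abnormal LACP packets sent by our devices.
The peer device logged the following: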
Oct 10 2022 07:32:03+08:00 HW %%01LACP/3/LAG_DOWN_REASON_PDU(l)[258]:The member of the LACP mode Eth-Trunk interface went down because the local device received changed LACP PDU from partner. (TrunkName=Eth-Trunk27, PortName=100GE1/0/3, Reason=PartnerSyncFalse, OldParam=b1Synchronization:1, NewParam=b1Synchronization:0)
Oct 10 2022 07:32:03+08:00 HW %%01LACP/3/LAG_DOWN_REASON_PDU(l)[259]:The member of the LACP mode Eth-Trunk interface went down because the local device received changed LACP PDU from partner. (TrunkName=Eth-Trunk27, PortName=100GE0/0/4, Reason=PartnerSyncFalse, OldParam=b1Synchronization:1, NewParam=b1Synchronization:0)
Oct 10 2022 07:32:03+08:00 HW %%01LACP/3/OPTICAL_FIBER_MISCONNECT(l)[260]:The member of the LACP mode Eth-Trunk interface received an abnormal LACPDU, which may be caused by optical fiber misconnection. (TrunkName=Eth-Trunk27, PortName=100GE0/0/3, LocalParam=ActorOperPortKey:6993, PDUParam=PartnerKey:1089)
(3) The current aggregation information on the local end is shown below. The local device MAC is a8c9-8a34-c4e1 and the peer MAC is b008-7565-4900.
Aggregate Interface: Bridge-Aggregation401
Creation Mode: Manual
Aggregation Mode: Dynamic
Loadsharing Type: Shar
Management VLANs: None
System ID: 0xa, a8c9-8a36-c4e1
Local:
  Port          Status  Priority  Index  Oper-Key  Flag
  HGE1/0/27(R)  S       32768     16392  40401     {ACDEF}
  HGE1/0/28     S       32768     16393  40401     {ACDEF}
Remote:
  Actor      Priority  Index  Oper-Key  SystemID                Flag
  HGE1/0/27  32768     2      6993      0x8000, b008-7565-4900  {ACDEF}
  HGE1/0/28  32768     40     6993      0x8000, b008-7565-4900  {ACDEF}
System ID: the device ID, made up of the system LACP priority and the system MAC address.
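The System ID field above can be split into its two parts with a few lines of Python. This is only an illustrative sketch: the helper name `parse_system_id` is made up here, and it assumes the field always has the `0x<priority>, <mac>` shape seen in the output above. It also shows that the hexadecimal priorities 0x8000 and 0xa correspond to the decimal values 32768 and 10 that appear in the probe output later in this case.

```python
def parse_system_id(text):
    """Split a System ID such as '0x8000, b008-7565-4900' into (priority, mac)."""
    # Hypothetical helper: assumes exactly one comma separating priority and MAC.
    prio_text, mac = (part.strip() for part in text.split(","))
    return int(prio_text, 16), mac.lower()


local = parse_system_id("0xa, a8c9-8a36-c4e1")
peer = parse_system_id("0x8000, b008-7565-4900")
print(local)  # (10, 'a8c9-8a36-c4e1')
print(peer)   # (32768, 'b008-7565-4900')
```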
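(4) Check the LACP packets received on the Selected ports. An abnormal packet was received at the time of the flap.
In probe view, the command display system internal link-aggregation lacp packet interface te x/0/x count 20 shows the packets the device received. One of them is an abnormal packet.
The abnormal packet decodes as follows:
SystemID: the peer field is 32768 and the local-end field is 32768, whereas our device uses 10.
SystemMAC: the peer field is b008-7565-4900, but the local-end field is 5825-7570-a3c0 instead of our a8c9-8a36-c4e1.
Details: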
[ZJHZ-IXP22-NET-PE-H3C-S6850-49-probe]display system internal link-aggregation lacp packet interface h 1/0/27 count 20
Data and Time: 10/10 07:32:03.841
Packet description:
Local: SystemID=32768 SystemMAC=b008-7565-4900 Key=6993 Index=2 Priority=32768 Flag=13
Remote: SystemID=10 SystemMAC=a8c9-8a36-c4e1 Key=40401 Index=16392 Priority=32768 Flag=5 // a normal LACP packet from the peer should look like this
Data and Time: 10/10 07:32:03.807
Packet description:
Local: SystemID=32768 SystemMAC=b008-7565-4900 Key=1089 Index=27 Priority=32768 Flag=61
Remote: SystemID=32768 SystemMAC=5825-7570-a3c0 Key=7745 Index=54 Priority=32768 Flag=61 // during the flap the peer device misdirected a packet: one meant for 5825-7570-a3c0 was sent to us
display system internal link-aggregation lacp packet interface te 1/0/18 count 20
The packets at the corresponding time show no problem.
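The Flag values in the probe output (61, 13, 5) can be read against the letter flags shown by the aggregation display. The sketch below assumes the probe view prints the LACP state octet in decimal, which is consistent with 61 decoding to ACDEF but is not confirmed by the source. The bit order follows the Actor/Partner state field of IEEE 802.1AX, and the function name `decode_flag` is made up for illustration.

```python
# Bit positions of the LACP state octet (IEEE 802.1AX), lowest bit first,
# mapped to the letters used in the "Flag" column of the aggregation display.
LACP_FLAGS = [
    ("A", "LACP_Activity"),
    ("B", "LACP_Timeout"),
    ("C", "Aggregation"),
    ("D", "Synchronization"),
    ("E", "Collecting"),
    ("F", "Distributing"),
    ("G", "Defaulted"),
    ("H", "Expired"),
]


def decode_flag(value):
    """Return the flag letters set in a state value (assumed to be decimal)."""
    return "".join(
        letter for bit, (letter, _name) in enumerate(LACP_FLAGS) if value & (1 << bit)
    )


for value in (61, 13, 5):
    print(value, decode_flag(value))
# 61 ACDEF
# 13 ACD
# 5 AC
```

Under that assumption, Flag=5 in a Remote field means the packet describes the partner without the Synchronization bit set.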
[ZJHZ-IXP22-NET-PE-H3C-S6850-50-probe]display system internal link-aggregation lacp packet interface h 1/0/27 count 20
Aggregate interface: Bridge-Aggregation401
Data and Time: 10/10 07:32:04.003
Packet description:
Local: SystemID=32768 SystemMAC=b008-7565-4900 Key=6993 Index=39 Priority=32768 Flag=61
Remote: SystemID=10 SystemMAC=a8c9-8a36-c4e1 Key=40401 Index=32776 Priority=32768 Flag=13
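However, abnormal packets were also received at other times, and the aggregation member port flapped at those times as well: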
Data and Time: 09/28 09:06:20.939
Packet description:
Local: SystemID=32768 SystemMAC=b008-7565-4900 Key=1089 Index=27 Priority=32768 Flag=61
Remote: SystemID=32768 SystemMAC=5825-7570-a3c0 Key=7745 Index=54 Priority=32768 Flag=61
%@13458%Sep 28 09:06:20:961 2022 ZJHZ-IXP22-NET-PE-H3C-S6850-50 LAGG/6/LAGG_INACTIVE_PARTNER_KEY_WRONG: Member port HGE1/0/27 of aggregation group BAGG401 changed to the inactive state, because the operational key of the peer port was different from that of the reference port.
%@13459%Sep 28 09:06:20:964 2022 ZJHZ-IXP22-NET-PE-H3C-S6850-50 IFNET/5/LINK_UPDOWN: Line protocol state on the interface HundredGigE1/0/27 changed to down.
Data and Time: 09/24 08:45:57.722
Packet description:
Local: SystemID=32768 SystemMAC=b008-7565-4900 Key=1089 Index=31 Priority=32768 Flag=61
Remote: SystemID=32768 SystemMAC=5825-7570-a3c0 Key=7745 Index=58 Priority=32768 Flag=61
%@13330%Sep 24 08:45:57:730 2022 ZJHZ-IXP22-NET-PE-H3C-S6850-50 LAGG/6/LAGG_INACTIVE_PARTNER_KEY_WRONG: Member port HGE1/0/27 of aggregation group BAGG401 changed to the inactive state, because the operational key of the peer port was different from that of the reference port.
%@13331%Sep 24 08:45:57:732 2022 ZJHZ-IXP22-NET-PE-H3C-S6850-50 IFNET/5/LINK_UPDOWN: Line protocol state on the interface HundredGigE1/0/27 changed to down.
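The check done by eye above, spotting received packets whose Remote fields do not describe our own system, can be sketched as a small script over saved probe output. This is an illustration only: `find_misdelivered` and the regular expression are hypothetical, and the parsing assumes every record has the same "Data and Time / Local / Remote" layout as the excerpts in this case.

```python
import re

# One record: a timestamp line followed by a Local line and a Remote line.
RECORD = re.compile(
    r"Data and Time: (?P<time>\S+ \S+).*?"
    r"Local: .*?SystemMAC=(?P<actor_mac>\S+) Key=(?P<actor_key>\d+).*?"
    r"Remote: .*?SystemMAC=(?P<partner_mac>\S+) Key=(?P<partner_key>\d+)",
    re.S,
)


def find_misdelivered(text, our_mac, our_key):
    """Return (time, partner_mac, partner_key) for records not addressed to us."""
    bad = []
    for m in RECORD.finditer(text):
        partner_key = int(m["partner_key"])
        if m["partner_mac"].lower() != our_mac or partner_key != our_key:
            bad.append((m["time"], m["partner_mac"], partner_key))
    return bad


sample = """\
Data and Time: 10/10 07:32:03.841
Packet description:
Local: SystemID=32768 SystemMAC=b008-7565-4900 Key=6993 Index=2 Priority=32768 Flag=13
Remote: SystemID=10 SystemMAC=a8c9-8a36-c4e1 Key=40401 Index=16392 Priority=32768 Flag=5
Data and Time: 10/10 07:32:03.807
Packet description:
Local: SystemID=32768 SystemMAC=b008-7565-4900 Key=1089 Index=27 Priority=32768 Flag=61
Remote: SystemID=32768 SystemMAC=5825-7570-a3c0 Key=7745 Index=54 Priority=32768 Flag=61
"""

for record in find_misdelivered(sample, "a8c9-8a36-c4e1", 40401):
    print(record)
# ('10/10 07:32:03.807', '5825-7570-a3c0', 7745)
```

Matching the returned timestamps against the LAGG_INACTIVE log entries gives the same correlation as the manual comparison for 10/10, 09/28 and 09/24.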
1. Investigate why the peer device sent the abnormal LACP packets.