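As shown in the topology diagram, every switch runs as a standalone device, and STP is enabled on all of them.
SW1 is the root bridge with priority 4096, and SW2 is the backup root bridge with priority 8192. The two devices are interconnected through the aggregation group Bridge-Aggregation1.
According to the logs, whenever the Bridge-Aggregation1 interface on LU_DS01 receives an inferior BPDU, the network management system raises an alarm that access switches have become unmanaged. There have been no recent topology or configuration changes.
The LU_DS01 log has recently shown dispute messages at irregular intervals: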
%Jan 11 17:49:21:245 2022 XJBT1_LU_DS_01 SHELL/6/SHELL_CMD: -Line=vty0-IPAddr=10.144.16.133-User=daiwenli; Command is dis lldp neighbor-information list
%Jan 11 17:49:29:289 2022 XJBT1_LU_DS_01 STP/4/STP_DISPUTE: Instance 0's port Bridge-Aggregation1 received an inferior BPDU from a designated port which is in forwarding or learning state.
%Jan 11 17:49:29:539 2022 XJBT1_LU_DS_01 STP/6/STP_NOTIFIED_TC: -Slot=4; Instance 0's port GigabitEthernet4/0/6 was notified a topology change.
%Jan 11 17:49:59:899 2022 XJBT1_LU_DS_01 STP/6/STP_NOTIFIED_TC: Instance 0's port Bridge-Aggregation1 was notified a topology change.
%Jan 11 17:51:33:865 2022 XJBT1_LU_DS_01 SSHS/6/SSHS_DISCONNECT: SSH user daiwenli (IP: 10.144.16.133) disconnected from the server.
%Jan 11 17:51:34:846 2022 XJBT1_LU_DS_01 SHELL/5/SHELL_LOGOUT: daiwenli logged out from 10.144.16.133.
%Jan 11 18:26:34:287 2022 XJBT1_LU_DS_01 STP/4/STP_DISPUTE: Instance 0's port Bridge-Aggregation1 received an inferior BPDU from a designated port which is in forwarding or learning state.
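Correspondingly, the LU_DS02 log shows that its received BPDU information timed out, so it elected itself root, and only then sent the BPDUs that triggered the dispute on LU_DS01: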
%Jan 11 17:48:30:725 2022 XJBT1_LU_DS_02 SHELL/6/SHELL_CMD: -Line=vty0-IPAddr=10.144.16.133-User=daiwenli; Command is dis link-aggregation verbose Bridge-Aggregation 1
%Jan 11 17:49:29:283 2022 XJBT1_LU_DS_02 STP/5/STP_BPDU_RECEIVE_EXPIRY: Instance 0's port Bridge-Aggregation1 received no BPDU within the rcvdInfoWhile interval. Information of the port aged out.
%Jan 11 17:49:29:485 2022 XJBT1_LU_DS_02 STP/6/STP_NOTIFIED_TC: Instance 0's port Bridge-Aggregation1 was notified a topology change.
%Jan 11 17:49:31:784 2022 XJBT1_LU_DS_02 STP/6/STP_NOTIFIED_TC: Instance 0's port Bridge-Aggregation1 was notified a topology change.
%Jan 11 17:49:44:142 2022 XJBT1_LU_DS_02 SHELL/6/SHELL_CMD: -Line=vty0-IPAddr=10.144.16.133-User=daiwenli; Command is dis logbuffer reverse
%Jan 11 17:50:00:434 2022 XJBT1_LU_DS_02 STP/6/STP_DETECTED_TC: -Slot=4; Instance 0's port GigabitEthernet4/0/5 detected a topology change.
%Jan 11 17:50:08:333 2022 XJBT1_LU_DS_02 SHELL/6/SHELL_CMD: -Line=vty0-IPAddr=10.144.16.133-User=daiwenli; Command is dis lldp neighbor-information list
%Jan 11 17:51:33:865 2022 XJBT1_LU_DS_02 SSHS/6/SSHS_DISCONNECT: SSH user daiwenli (IP: 10.144.16.133) disconnected from the server.
%Jan 11 17:51:34:882 2022 XJBT1_LU_DS_02 SHELL/5/SHELL_LOGOUT: daiwenli logged out from 10.144.16.133.
%Jan 11 18:26:34:283 2022 XJBT1_LU_DS_02 STP/5/STP_BPDU_RECEIVE_EXPIRY: Instance 0's port Bridge-Aggregation1 received no BPDU within the rcvdInfoWhile interval. Information of the port aged out.
%Jan 11 18:26:34:514 2022 XJBT1_LU_DS_02 STP/6/STP_NOTIFIED_TC: Instance 0's port Bridge-Aggregation1 was notified a topology change.
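The two devices are directly connected and the link has no quality problems. Taking both logs together, we suspected that a device was occasionally unable to process received STP BPDUs, or unable to send its own, which caused the backup device to log the timeout and the primary device to log the dispute.
We therefore checked the packets delivered to the CPU on both devices and found that the primary device was receiving a large volume of VRRP packets (above the threshold), and that its CPU was busy at times: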
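The pairing of the two logs can also be checked mechanically. The following is a minimal Python sketch, not part of the original case: the names `parse` and `correlate` and the 1-second window are illustrative assumptions, and it only makes sense if the clocks of the two devices are closely synchronized. It looks for a STP_DISPUTE on one device that follows a STP_BPDU_RECEIVE_EXPIRY on the other.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as:
# %Jan 11 17:49:29:289 2022 XJBT1_LU_DS_01 STP/4/STP_DISPUTE: ...
LOG_RE = re.compile(
    r"^%(\w{3} +\d+ \d{2}:\d{2}:\d{2}):(\d{3}) (\d{4}) (\S+) (\w+)/(\d)/(\w+):"
)

def parse(line):
    """Return (timestamp, hostname, mnemonic), or None if the line does not match."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    stamp, millis, year, host, _module, _severity, mnemonic = m.groups()
    ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    return ts + timedelta(milliseconds=int(millis)), host, mnemonic

def correlate(lines, window_ms=1000):
    """Pair each dispute with an expiry on a different host at most window_ms earlier."""
    events = [e for e in map(parse, lines) if e is not None]
    expiries = [e for e in events if e[2] == "STP_BPDU_RECEIVE_EXPIRY"]
    disputes = [e for e in events if e[2] == "STP_DISPUTE"]
    pairs = []
    for d_ts, d_host, _ in disputes:
        for e_ts, e_host, _ in expiries:
            delta_ms = (d_ts - e_ts) // timedelta(milliseconds=1)
            if e_host != d_host and 0 <= delta_ms <= window_ms:
                pairs.append((e_host, d_host, delta_ms))
    return pairs

# The four key log lines from this case (message text shortened after the mnemonic).
SAMPLE = [
    "%Jan 11 17:49:29:289 2022 XJBT1_LU_DS_01 STP/4/STP_DISPUTE: Instance 0's port Bridge-Aggregation1 ...",
    "%Jan 11 18:26:34:287 2022 XJBT1_LU_DS_01 STP/4/STP_DISPUTE: Instance 0's port Bridge-Aggregation1 ...",
    "%Jan 11 17:49:29:283 2022 XJBT1_LU_DS_02 STP/5/STP_BPDU_RECEIVE_EXPIRY: Instance 0's port Bridge-Aggregation1 ...",
    "%Jan 11 18:26:34:283 2022 XJBT1_LU_DS_02 STP/5/STP_BPDU_RECEIVE_EXPIRY: Instance 0's port Bridge-Aggregation1 ...",
]

if __name__ == "__main__":
    for expiry_host, dispute_host, delta in correlate(SAMPLE):
        print(f"{expiry_host} expiry -> {dispute_host} dispute after {delta} ms")
```

On the log lines above, both disputes on LU_DS01 follow an expiry on LU_DS02 within a few milliseconds (6 ms and 4 ms), which is consistent with the reading that LU_DS02 timed out first.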
ID  Type  RcvPps  Rcv_All    DisPkt_All  Pps  Dyn  Swi  Hash  ACLmax
4   VRRP  302     704490299  4819646     300  S    On   SMAC  8
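To put the numbers in this row in proportion, here is a small Python sketch. It is not from the original case, and it rests on an assumed reading of the columns: `RcvPps` as the current receive rate, `Pps` as the configured rate limit, and `Rcv_All` / `DisPkt_All` as cumulative received and discarded packet counts. The helper names are hypothetical.

```python
HEADER = "ID Type RcvPps Rcv_All DisPkt_All Pps Dyn Swi Hash ACLmax"
ROW = "4 VRRP 302 704490299 4819646 300 S On SMAC 8"

def parse_row(header, row):
    """Map each whitespace-separated column name to its value."""
    return dict(zip(header.split(), row.split()))

def summarize(entry):
    """Compare the receive rate with the limit and compute the discard percentage.

    Column meanings are assumptions, as stated in the text above.
    """
    rcv_pps = int(entry["RcvPps"])
    limit = int(entry["Pps"])
    received = int(entry["Rcv_All"])
    dropped = int(entry["DisPkt_All"])
    return {
        "type": entry["Type"],
        "over_limit": rcv_pps > limit,
        "drop_percent": round(100 * dropped / received, 3),
    }

if __name__ == "__main__":
    print(summarize(parse_row(HEADER, ROW)))
```

Under those assumptions, the VRRP receive rate (302 pps) is just above the 300 pps limit, and roughly 0.68% of all VRRP packets sent to the CPU so far have been discarded.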
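Debug output showed that the device was receiving VRRP packets that it had sent itself, and traffic statistics showed that the firewall was reflecting them back: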
[XJBT1_LU_DS_01-probe]dis qos policy interface inbound
Interface: GigabitEthernet3/0/11
Direction: Inbound
Policy: vrrp-deny
Classifier: vrrp-deny
Operator: AND
Rule(s) :
If-match acl 3600
Behavior: vrrp-deny
Accounting enable:
0 (Packets)
Interface: GigabitEthernet3/0/12
Direction: Inbound
Policy: vrrp-deny
Classifier: vrrp-deny
Operator: AND
Rule(s) :
If-match acl 3600
Behavior: vrrp-deny
Accounting enable:
23548 (Packets)
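When many interfaces carry the same accounting policy, the one with hits can be picked out with a short script. This is an illustrative Python sketch only: the function name `accounting_by_interface` is hypothetical, and the parsing assumes the output layout shown above (an `Interface:` line followed later by a `<n> (Packets)` counter line).

```python
import re

def accounting_by_interface(text):
    """Return {interface name: packet count} from the QoS policy output."""
    counts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Interface:"):
            current = line.split(":", 1)[1].strip()
            continue
        m = re.fullmatch(r"(\d+) \(Packets\)", line)
        if m and current is not None:
            counts[current] = counts.get(current, 0) + int(m.group(1))
    return counts

# Abbreviated copy of the output from this case.
SAMPLE_OUTPUT = """\
Interface: GigabitEthernet3/0/11
Direction: Inbound
Policy: vrrp-deny
Accounting enable:
0 (Packets)
Interface: GigabitEthernet3/0/12
Direction: Inbound
Policy: vrrp-deny
Accounting enable:
23548 (Packets)
"""

if __name__ == "__main__":
    counts = accounting_by_interface(SAMPLE_OUTPUT)
    hot = [name for name, n in counts.items() if n > 0]
    print(hot)
```

For the output in this case, only GigabitEthernet3/0/12 has a non-zero counter (23548 packets), so that is the port on which the matched VRRP packets are arriving.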
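A walkthrough of the software code confirmed that older versions have insufficient CPU scheduling optimization: when a large number of protocol packets are delivered to the CPU, other protocol packets may not be processed in time and can time out. This matches the symptoms seen on site.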
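1. Workaround: eliminate the VRRP packets reflected back by the FW.
2. Fix: upgrade to software version R7596P10, which optimizes CPU scheduling.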