New switch === two Huawei switches (VRRP) == (BAGG1000, BAGG1001) 6300 === servers, storage
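A new switch was connected on site and took over the root bridge role. The storage team suspected that this caused the heartbeat timeouts between the storage devices attached to the 6300. However, the storage devices are in the same network segment, the 6300 forwards their traffic at Layer 2, the traffic does not cross the uplink ports, and the downlink ports are all configured as edge ports, so in theory a root takeover should not affect forwarding on the edge ports.
The timestamps were inconsistent; after cross-checking, the fault occurred at about 03:10 on August 16, 2012.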
On our 6300 switch, the heartbeat packets between the storage devices enter and leave through the following port pairs: 1/0/1==1/0/6, 2/0/1==2/0/6, 1/0/21==1/0/26, 2/0/21==2/0/26
(1) STP flapping analysis: at the time of the fault, these ports did not flap.
However, these aggregation groups did record sent TCs:
===============display stp tc===============
-------------- STP slot 1 TC or TCN count -------------
MST ID   Port                       Receive   Send
0        Bridge-Aggregation1        0         154
0        Bridge-Aggregation2        0         116
0        Bridge-Aggregation3        0         114
0        Bridge-Aggregation4        0         118
0        Bridge-Aggregation5        0         126
0        Bridge-Aggregation6        0         124
0        Bridge-Aggregation21       0         154
0        Bridge-Aggregation22       0         128
0        Bridge-Aggregation23       0         138
0        Bridge-Aggregation24       0         118
0        Bridge-Aggregation25       0         126
0        Bridge-Aggregation26       0         124
0        Bridge-Aggregation1000     349       341
0        Bridge-Aggregation1001     1146      113
1        Bridge-Aggregation1        0         142
1        Bridge-Aggregation2        0         116
1        Bridge-Aggregation3        0         114
1        Bridge-Aggregation4        0         118
1        Bridge-Aggregation5        0         126
1        Bridge-Aggregation6        0         124
1        Bridge-Aggregation1000     13        308
1        Bridge-Aggregation1001     429       111
These aggregate ports all connect to servers and are configured as edge ports.
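To make the pattern in the TC counters easier to see, the rows can be parsed and split by direction. The following is only a minimal sketch: the helper names `parse_tc_counts` and `ports_receiving_tc` are hypothetical, and the sample holds just a few rows copied from the output above.

```python
import re

# A few rows copied from the "display stp tc" output above.
SAMPLE = """\
MST ID Port Receive Send
0 Bridge-Aggregation1 0 154
0 Bridge-Aggregation26 0 124
0 Bridge-Aggregation1000 349 341
0 Bridge-Aggregation1001 1146 113
1 Bridge-Aggregation1000 13 308
1 Bridge-Aggregation1001 429 111
"""

# instance, port name, received count, sent count
ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\d+)\s+(\d+)\s*$")


def parse_tc_counts(text):
    """Return {(instance, port): (received, sent)} for every data row."""
    counts = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # header and separator lines do not match and are skipped
            inst, port, rx, tx = m.groups()
            counts[(int(inst), port)] = (int(rx), int(tx))
    return counts


def ports_receiving_tc(counts):
    """Ports with a non-zero Receive counter in any instance."""
    return sorted({port for (_, port), (rx, _) in counts.items() if rx > 0})


counts = parse_tc_counts(SAMPLE)
print(ports_receiving_tc(counts))
```

On the sample rows, only Bridge-Aggregation1000 and Bridge-Aggregation1001 have received TCs; the server-facing aggregates have only sent them.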
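(2) The logs show that other ports did flap at the time of the fault, namely Bridge-Aggregation1000 and Bridge-Aggregation1001, which are the ports facing the Huawei switches.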
%Aug 16 03:07:36:707 2012 YSS-YPT-SW STP/6/STP_NOTIFIED_TC: Instance 0's port Bridge-Aggregation1001 was notified a topology change.
%Aug 16 03:07:36:843 2012 YSS-YPT-SW STP/6/STP_NOTIFIED_TC: Instance 0's port Bridge-Aggregation1000 was notified a topology change.
%Aug 16 03:07:37:412 2012 YSS-YPT-SW STP/6/STP_NOTIFIED_TC: Instance 0's port Bridge-Aggregation1000 was notified a topology change.
%Aug 16 03:07:37:858 2012 YSS-YPT-SW STP/6/STP_NOTIFIED_TC: Instance 0's port Bridge-Aggregation1000 was notified a topology change.
%Aug 16 03:07:39:412 2012 YSS-YPT-SW STP/6/STP_NOTIFIED_TC: Instance 0's port Bridge-Aggregation1000 was notified a topology change.
%Aug 16 03:07:41:411 2012 YSS-YPT-SW STP/6/STP_NOTIFIED_TC: Instance 0's port Bridge-Aggregation1000 was notified a topology change.
%Aug 16 03:08:05:208 2012 YSS-YPT-SW STP/6/STP_NOTIFIED_TC: Instance 0's port Bridge-Aggregation1000 was notified a topology change.
%Aug 16 03:08:05:480 2012 YSS-YPT-SW STP/6/STP_NOTIFIED_TC: Instance 0's port Bridge-Aggregation1000 was notified a topology change.
%Aug 16 03:17:43:571 2012 YSS-YPT-SW STP/6/STP_DETECTED_TC: Instance 0's port Bridge-Aggregation1001 detected a topology change.
%Aug 16 03:17:43:969 2012 YSS-YPT-SW STP/6/STP_NOTIFIED_TC: Instance 0's port Bridge-Aggregation1000 was notified a topology change.
%Aug 16 03:17:45:077 2012 YSS-YPT-SW STP/6/STP_NOTIFIED_TC: Instance 0's port Bridge-Aggregation1000 was notified a topology change.
%Aug 16 03:17:45:916 2012 YSS-YPT-SW STP/6/STP_NOTIFIED_TC: Instance 0's port Bridge-Aggregation1000 was notified a topology change.
%Aug 16 03:18:12:162 2012 YSS-YPT-SW STP/6/STP_NOTIFIED_TC: Instance 0's port Bridge-Aggregation1000 was notified a topology change.
%Aug 16 03:18:22:972 2012 YSS-YPT-SW STP/6/STP_DETECTED_TC: Instance 0's port Bridge-Aggregation1001 detected a topology change.
%Aug 16 03:18:23:258 2012 YSS-YPT-SW STP/6/STP_NOTIFIED_TC: Instance 0's port Bridge-Aggregation1000 was notified a topology change.
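When there are many such log lines, it can help to bucket them by minute and port to see when the topology changes cluster. The sketch below assumes the log format shown above; the function names are hypothetical, and `%b` parsing of the month name assumes an English or C locale.

```python
import re
from collections import Counter
from datetime import datetime

# Four lines copied from the log excerpt above.
SAMPLE_LOG = """\
%Aug 16 03:07:36:707 2012 YSS-YPT-SW STP/6/STP_NOTIFIED_TC: Instance 0's port Bridge-Aggregation1001 was notified a topology change.
%Aug 16 03:07:36:843 2012 YSS-YPT-SW STP/6/STP_NOTIFIED_TC: Instance 0's port Bridge-Aggregation1000 was notified a topology change.
%Aug 16 03:17:43:571 2012 YSS-YPT-SW STP/6/STP_DETECTED_TC: Instance 0's port Bridge-Aggregation1001 detected a topology change.
%Aug 16 03:18:23:258 2012 YSS-YPT-SW STP/6/STP_NOTIFIED_TC: Instance 0's port Bridge-Aggregation1000 was notified a topology change.
"""

LINE = re.compile(
    r"^%(\w{3} \d{1,2} \d{2}:\d{2}:\d{2}):(\d{3}) (\d{4}) \S+ "
    r"STP/\d/(STP_\w+_TC): Instance (\d+)'s port (\S+) "
)


def parse_tc_events(text):
    """Return a list of (timestamp, event_kind, instance, port) tuples."""
    events = []
    for line in text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # ignore anything that is not an STP TC log line
        stamp, millis, year, kind, inst, port = m.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        when = when.replace(microsecond=int(millis) * 1000)
        events.append((when, kind, int(inst), port))
    return events


def per_minute(events):
    """Count events per (HH:MM, port, event_kind)."""
    return Counter(
        (when.strftime("%H:%M"), port, kind) for when, kind, _, port in events
    )


events = parse_tc_events(SAMPLE_LOG)
for key, n in sorted(per_minute(events).items()):
    print(key, n)
```

Run against the full log, this kind of tally makes the two bursts (around 03:07 to 03:08 and 03:17 to 03:18) stand out.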
===============display stp history===============
--------------- STP slot 1 history trace ---------------
------------------- Instance 0 ---------------------
Port Bridge-Aggregation1001
Role change : ALTE->DESI
Time : 2012/08/16 03:21:26
Port priority : 4096.1047-801d-c920 0 32768.600b-03ad-0487 0
4096.1047-801d-c920 128.159 128.2041
Designated priority : 0.1047-8020-4d20 4001 32768.600b-03ad-0487 0
32768.600b-03ad-0487 128.2041 128.2041
Port Bridge-Aggregation1001
Role change : ROOT->ALTE
Time : 2012/08/16 03:18:23
Port priority : 0.1047-8020-4d20 4001 32768.600b-03ad-0487 0
4096.1047-801d-c920 128.159 128.2041
Designated priority : 0.1047-8020-4d20 4001 32768.600b-03ad-0487 0
32768.600b-03ad-0487 128.2041 128.2041
Port Bridge-Aggregation1000
Role change : DESI->ROOT
Time : 2012/08/16 03:18:23
Port priority : 0.1047-8020-4d20 4000 32768.600b-03ad-0487 0
0.1047-8024-3410 128.161 128.2040
Designated priority : 0.1047-8020-4d20 4001 32768.600b-03ad-0487 0
32768.600b-03ad-0487 128.2040 128.2040
Port Bridge-Aggregation1001
Role change : ALTE->ROOT
Time : 2012/08/16 03:18:22
Port priority : 0.1047-8020-4d20 4001 32768.600b-03ad-0487 0
4096.1047-801d-c920 128.159 128.2041
Designated priority : 0.1047-8020-4d20 4002 32768.600b-03ad-0487 0
32768.600b-03ad-0487 128.2041 128.2041
Port Bridge-Aggregation1000
Role change : ROOT->DESI
Time : 2012/08/16 03:18:22
Port priority : 0.1047-8024-3410 0 32768.600b-03ad-0487 0
0.1047-8024-3410 128.161 128.2040
Designated priority : 0.1047-8020-4d20 4002 32768.600b-03ad-0487 0
32768.600b-03ad-0487 128.2040 128.2040
Port Bridge-Aggregation1001
Role change : ROOT->ALTE
Time : 2012/08/16 03:17:43
Port priority : 0.1047-8020-4d20 4001 32768.600b-03ad-0487 0
4096.1047-801d-c920 128.159 128.2041
Designated priority : 0.1047-8020-4d20 4001 32768.600b-03ad-0487 0
32768.600b-03ad-0487 128.2041 128.2041
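The history trace is printed newest first, which makes the sequence of role changes hard to follow. A minimal sketch for listing the changes in chronological order is shown below. The function names are hypothetical, the sample keeps only the Port, Role change, and Time lines from the trace above, and the ordering of entries that share the same second assumes that the device prints the most recent entry first.

```python
import re
from datetime import datetime

# Port / Role change / Time lines copied from the history trace above.
SAMPLE_HISTORY = """\
Port Bridge-Aggregation1001
Role change : ALTE->DESI
Time : 2012/08/16 03:21:26
Port Bridge-Aggregation1001
Role change : ROOT->ALTE
Time : 2012/08/16 03:18:23
Port Bridge-Aggregation1000
Role change : DESI->ROOT
Time : 2012/08/16 03:18:23
Port Bridge-Aggregation1001
Role change : ALTE->ROOT
Time : 2012/08/16 03:18:22
Port Bridge-Aggregation1000
Role change : ROOT->DESI
Time : 2012/08/16 03:18:22
Port Bridge-Aggregation1001
Role change : ROOT->ALTE
Time : 2012/08/16 03:17:43
"""

ENTRY = re.compile(
    r"Port (\S+)\s*\n\s*Role change\s*:\s*(\w+)->(\w+)\s*\n"
    r"\s*Time\s*:\s*(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})"
)


def role_changes(text):
    """Return (time, port, old_role, new_role) tuples, oldest first."""
    entries = [
        (datetime.strptime(ts, "%Y/%m/%d %H:%M:%S"), port, old, new)
        for port, old, new, ts in ENTRY.findall(text)
    ]
    entries.reverse()                 # printed newest first (assumption)
    entries.sort(key=lambda e: e[0])  # stable sort keeps same-second order
    return entries


def root_port_moves(entries):
    """Count how many times some port became the root port."""
    return sum(1 for _, _, _, new in entries if new == "ROOT")


changes = role_changes(SAMPLE_HISTORY)
for when, port, old, new in changes:
    print(when.time(), port, f"{old}->{new}")
print("root port moves:", root_port_moves(changes))
```

In this excerpt the root port moves twice within two seconds (to Bridge-Aggregation1001 at 03:18:22, then to Bridge-Aggregation1000 at 03:18:23).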
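After discussion with the customer, the customer reported that new equipment was being brought online at the time of the fault, that root protection was not configured on the Huawei switches, and that a root takeover did occur on site.
R&D confirmation: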
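The edge-port setting was configured on the member ports rather than on the aggregate interfaces, so it did not take effect. The environment shows that the root bridge seen by the device kept changing. Our spanning tree computation works as follows: whenever the root bridge changes, the forwarding state of every port participating in STP is refreshed. Even if the roles of the other ports do not change, STP has to recompute, so every port is blocked again and then returns to its proper state. How long this takes depends on the size of the STP network: the larger the network, the longer it takes. In a small network the interruption is usually imperceptible, although port forwarding is in fact affected. When the root bridge keeps changing, as in this environment, services may well notice it.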
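The root takeover itself follows from the standard STP election rule: the bridge with the numerically lowest bridge ID wins, comparing priority first and MAC address second. The sketch below applies that rule to the bridge IDs that appear in the history trace above. Reading the first field of each priority vector as the root bridge ID is an assumption about this output format, and `parse_bridge_id` and `elect_root` are hypothetical helper names.

```python
def parse_bridge_id(text):
    """Turn '4096.1047-801d-c920' into (4096, 0x1047801dc920)."""
    priority, mac = text.split(".", 1)
    return int(priority), int(mac.replace("-", ""), 16)


def elect_root(bridge_ids):
    """Lowest (priority, MAC) tuple wins, as in the standard STP election."""
    return min(bridge_ids, key=parse_bridge_id)


# Bridge IDs seen in the history trace; 32768.600b-03ad-0487 is the local 6300.
BRIDGES = [
    "32768.600b-03ad-0487",
    "4096.1047-801d-c920",
    "0.1047-8024-3410",
    "0.1047-8020-4d20",
]

print(elect_root(BRIDGES))
# If the current winner disappears, the next-lowest ID takes over.
remaining = [b for b in BRIDGES if b != elect_root(BRIDGES)]
print(elect_root(remaining))
```

Two of these bridges advertise priority 0, so the MAC address alone decides between them, and each time one of them joins or leaves the topology the elected root changes.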
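(1) Review and document the customer's network topology;
(2) Configure root protection on the Huawei devices;
(3) Configure edge ports and BPDU protection on all access devices;
(4) Where appropriate, enable TC suppression on some of the devices.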