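At one site, the new-zone and old-zone data centers were being interconnected. GZ_NEW_AS (an S7506E stack) was connected to the downstream service zone (four groups, each a stack of two S75 switches), link aggregation was configured, and the service VLANs (VLAN 100-399) were permitted. After that, at around 15:40-15:50 on September 11, a service outage was found on the old-zone side. At around 16:00, after the aggregate interface between GZ_NEW_AS01 and the downstream service AC zone was shut down, old-zone services recovered. No related logs were found on our devices, but logs resembling MAC flapping were found on the HW devices OP_DS_01 and OP_DS_02, where the service gateway is located.
Take MAC address 0050-56b2-914d in service VLAN 210 as an example.
OM-AS01 connects to the old data center core. All MAC flaps on this device involve AGG7: MAC addresses flap between other interfaces and AGG7.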
slot 3 chip 0
MacAddress Vlan Agg Mod Port ->Agg Mod Port Cnt LatestTime Del
0 :50:56:97:69:d3 202 1 0 6 ->1 0 3 15 2019/9 /11 16:0 :17 1 Lagg7 -> Lagg4
0 :0 :5e:0 :1 :1 104 1 0 6 ->1 0 3 20 2019/9 /11 16:0 :17 1 Lagg7 -> Lagg4
0 :50:56:b2:c6:56 215 1 0 6 ->1 0 2 13 2019/9 /11 16:0 :17 1 Lagg7 -> Lagg3
0 :23:e9:44:3e:4b 218 1 0 6 ->1 0 3 19 2019/9 /11 16:0 :17 1 Lagg7 -> Lagg4
0 :50:56:b2:91:4d 210 1 0 6 ->1 0 0 24 2019/9 /11 16:0 :17 1 Lagg7 -> Lagg1
0 :23:e9:44:3e:46 204 1 0 6 ->1 0 3 65 2019/9 /11 16:0 :17 1 Lagg7 -> Lagg4
0 :50:56:b2:be:66 203 1 0 6 ->1 0 0 56 2019/9 /11 16:0 :18 1 Lagg7 ->
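Normally this MAC is learned on AGG1. It flapped to AGG7 (the interface toward the old data center core) and finally flapped back, as the current MAC table entry shows: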
0050-56b2-914d 210 Learned Bridge-Aggregation1 AGING
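The flap table prints MAC addresses with blanks in place of leading zeros: the VLAN 210 record 0 :50:56:b2:91:4d corresponds to 0050-56b2-914d in the MAC table entry above. When matching records between the two outputs, a small converter helps. This is a minimal sketch, assuming each blank is a suppressed leading zero; the function name to_h3c_mac is hypothetical.

```python
import re

# Assumption: in the flap table each octet is printed without its leading
# zero and padded with a blank, e.g. "0 :50:56:b2:91:4d".
_OCTET = re.compile(r"^[0-9a-fA-F]{1,2}$")


def to_h3c_mac(raw: str) -> str:
    """Convert a flap-table MAC such as '0 :50:56:b2:91:4d' to 'xxxx-xxxx-xxxx'."""
    octets = [part.strip() for part in raw.split(":")]
    if len(octets) != 6 or not all(_OCTET.match(o) for o in octets):
        raise ValueError(f"unexpected MAC format: {raw!r}")
    # Restore the suppressed leading zeros, then regroup into 4-digit blocks.
    digits = "".join(o.zfill(2) for o in octets).lower()
    return "-".join(digits[i:i + 4] for i in range(0, 12, 4))


if __name__ == "__main__":
    for raw in ("0 :50:56:b2:91:4d", "0 :0 :5e:0 :1 :1"):
        print(raw, "->", to_h3c_mac(raw))
```

The second sample is the 0 :0 :5e:0 :1 :1 entry from the flap table, which converts to 0000-5e00-0101.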
This device does have STP enabled, and its STP states are normal, so ordinarily no flapping should occur.
===============display stp brief===============
=====================================================
MSTID Port Role STP State Protection
0 Bridge-Aggregation1 DESI FORWARDING NONE
0 Bridge-Aggregation2 DESI FORWARDING NONE
0 Bridge-Aggregation3 DESI FORWARDING NONE
0 Bridge-Aggregation4 DESI FORWARDING NONE
0 Bridge-Aggregation5 DESI FORWARDING NONE
0 Bridge-Aggregation6 DESI FORWARDING NONE
0 Bridge-Aggregation7 DESI FORWARDING NONE
0 GigabitEthernet5/0/5 DESI FORWARDING NONE
0 Ten-GigabitEthernet7/0/1 DESI FORWARDING NONE
0 Ten-GigabitEthernet7/0/2 DESI FORWARDING NONE
Checking the configuration of AGG7 shows that it is configured as an edge port, so the loop cannot trigger blocking on it.
#
interface Bridge-Aggregation7
description TO-GZ1_OLD_SW1_BAGG6
port link-type trunk
undo port trunk permit vlan 1
port trunk permit vlan 100 to 399 500 to 598 656 679 697 to 699
link-aggregation mode dynamic
stp edged-port enable
#
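Settings like the edge port on AGG7 (and the stp disable applied to BAGG32 later in this case) are easy to miss by eye. The sketch below is a minimal configuration check that flags both, assuming the text follows the display this layout shown here. The function names are hypothetical, and treating "trunk plus edge port" as suspicious is a heuristic, not a vendor rule.

```python
EDGE_ON_TRUNK = "edge port on a trunk"
STP_DISABLED = "STP disabled"


def iter_interfaces(config: str):
    """Yield (interface_name, [config lines]) for each 'interface' block."""
    name, body = None, []
    for raw in config.splitlines():
        line = raw.strip()
        if line.startswith("interface "):
            if name is not None:
                yield name, body
            name, body = line.split(None, 1)[1], []
        elif line in ("#", "return"):
            # '#' and 'return' close the current block in this output layout.
            if name is not None:
                yield name, body
            name, body = None, []
        elif name is not None and line:
            body.append(line)
    if name is not None:
        yield name, body


def audit_stp(config: str) -> list[tuple[str, str]]:
    """Return (interface, finding) pairs for the two STP settings in this case."""
    findings = []
    for name, body in iter_interfaces(config):
        # Heuristic: a trunk toward another switch is usually not expected
        # to be an edge port, so the combination is worth reviewing.
        if "port link-type trunk" in body and "stp edged-port enable" in body:
            findings.append((name, EDGE_ON_TRUNK))
        if "stp disable" in body:
            findings.append((name, STP_DISABLED))
    return findings
```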
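Tracing the flap records on the devices along the path, the final flap point was GZ1_OP_NEW_AS01. The flaps there were between aggregation member ports, and between member ports and the aggregate interface, and the counts did not match: there were only one or two records each.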
slot 2 chip 0
MacAddress Vlan Agg Mod Port ->Agg Mod Port Cnt LatestTime Del
0 :50:56:80:43:92 105 0 20 2 ->1 0 0 1 2019/9 /11 23:28:43 1 XGE2/2/0/10(xe1) -> Lagg30
0 :23:e9:de:e8:a 217 0 20 2 ->1 0 0 2 2019/9 /11 23:28:43 1 XGE2/2/0/10(xe1) -> Lagg30
0 :0 :5e:0 :1 :1 104 0 20 2 ->1 0 0 2 2019/9 /11 23:28:43 1 XGE2/2/0/10(xe1) -> Lagg30
0 :0 :5e:0 :1 :1 216 0 20 2 ->1 0 0 2 2019/9 /11 23:28:43 1 XGE2/2/0/10(xe1) -> Lagg30
0 :23:e9:44:3e:46 204 0 20 2 ->1 0 0 1 2019/9 /11 23:28:43 1 XGE2/2/0/10(xe1) -> Lagg30
ec:d6:8a:19:c3:58 213 0 20 2 ->1 0 0 1 2019/9 /11 23:28:43 1 XGE2/2/0/10(xe1) -> Lagg30
0 :50:56:80:40:e5 105 0 20 2 ->1 0 0 1 2019/9 /11 23:28:43 1 XGE2/2/0/10(xe1) -> Lagg30
0 :0 :5e:0 :1 :1 202 0 20 2 ->1 0 0 1 2019/9 /11 23:28:43 1 XGE2/2/0/10(xe1) -> Lagg30
0 :50:56:80:19:f2 215 0 20 2 ->1 0 0 1 2019/9 /11 23:28:43 1 XGE2/2/0/10(xe1) -> Lagg30
0 :0 :5e:0 :1 :1 211 0 20 2 ->1 0 0 2 2019/9 /11 23:28:44 1 XGE2/2/0/10(xe1) -> Lagg30
5c:f3:fc:f :c2:be 104 0 20 2 ->1 0 0 3 2019/9 /11 23:28:44 1 XGE2/2/0/10(xe1) -> Lagg3
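The pattern described above (only member-port-to-aggregate moves, each with a low count) can be checked by grouping records by their from/to pair. This is a minimal sketch, assuming the column layout given by the header line and that Cnt is the number of moves recorded for that entry; the function names are hypothetical. The last record above ends in Lagg3, probably truncated, and the sketch keeps it exactly as printed.

```python
import re
from collections import defaultdict

# Assumed layout (from the header line):
# MacAddress Vlan Agg Mod Port ->Agg Mod Port Cnt LatestTime Del <from> -> <to>
# Blanks before ':' or '/' are padding, e.g. "0 :50" or "2019/9 /11".
_FOLD = re.compile(r"\s+(?=[:/])")


def parse_flap_line(line: str):
    """Return (vlan, count, from_port, to_port), or None for non-record lines."""
    tokens = _FOLD.sub("", line.strip()).split()
    if len(tokens) < 14 or tokens[13] != "->":
        return None
    if not (tokens[1].isdigit() and tokens[8].isdigit()):
        return None
    to_port = tokens[14] if len(tokens) > 14 else ""  # target may be cut off
    return int(tokens[1]), int(tokens[8]), tokens[12], to_port


def summarize(lines):
    """Map (from_port, to_port) -> [number of records, total Cnt]."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        rec = parse_flap_line(line)
        if rec is None:
            continue  # headers such as 'slot 2 chip 0'
        _, count, src, dst = rec
        totals[(src, dst)][0] += 1
        totals[(src, dst)][1] += count
    return dict(totals)
```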
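Combined with the operation records provided by the on-site staff, the root cause was finally found:
The on-site staff began configuring AGG32 on the new core, including permitting the VLANs: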
% Wrong parameter found at '^' position.
[GZ1_NEW_AS01]#
[GZ1_NEW_AS01]interface Bridge-Aggregation32
[GZ1_NEW_AS01-Bridge-Aggregation32] description TO-GZ1_OP_NEW_AS01_BAGG1
[GZ1_NEW_AS01-Bridge-Aggregation32]
[GZ1_NEW_AS01-Bridge-Aggregation32]#
[GZ1_NEW_AS01-Bridge-Aggregation32]interface Ten-GigabitEthernet1/2/0/11
[GZ1_NEW_AS01-Ten-GigabitEthernet1/2/0/11] description TO-GZ1_OP_NEW_AS01_XGE1/2/0/5
[GZ1_NEW_AS01-Ten-GigabitEthernet1/2/0/11] port link-aggregation group 32
[GZ1_NEW_AS01-Ten-GigabitEthernet1/2/0/11] undo shutdown
[GZ1_NEW_AS01-Ten-GigabitEthernet1/2/0/11]#
[GZ1_NEW_AS01-Ten-GigabitEthernet1/2/0/11]interface Ten-GigabitEthernet2/2/0/11
[GZ1_NEW_AS01-Ten-GigabitEthernet2/2/0/11] description TO-GZ1_OP_NEW_AS01_XGE1/2/0/6
[GZ1_NEW_AS01-Ten-GigabitEthernet2/2/0/11] port link-aggregation group 32
[GZ1_NEW_AS01-Ten-GigabitEthernet2/2/0/11] undo shutdown
[GZ1_NEW_AS01-Ten-GigabitEthernet2/2/0/11]#
[GZ1_NEW_AS01-Ten-GigabitEthernet2/2/0/11]interface Bridge-Aggregation32
[GZ1_NEW_AS01-Bridge-Aggregation32] stp disable
#Sep 11 15:32:26:777 2019 GZ1_NEW_AS01 IFNET/4/INTERFACE UPDOWN:
Trap 1.3.6.1.6.3.1.1.5.4<linkUp>: Interface 20709386 is Up, ifAdminStatus is 1, ifOperStatus is 1
#Sep 11 15:32:26:825 2019 GZ1_NEW_AS01 LAGG/1/AggPortRecoverActive:
Trap 1.3.6.1.4.1.2011.5.25.25.2.4<hwAggPortActiveNotification>: Aggregation Group 32: port member 20709386 becomes ACTIVE!
#Sep 11 15:32:26:887 2019 GZ1_NEW_AS01 IFNET/4/INTERFACE UPDOWN:
Trap 1.3.6.1.6.3.1.1.5.4<linkUp>: Interface 609419295 is Up, ifAdminStatus is 1, ifOperStatus is 1
[GZ1_NEW_AS01-Bridge-Aggregation32]
%Sep 11 15:32:26:908 2019 GZ1_NEW_AS01 IFNET/3/LINK_UPDOWN: Ten-GigabitEthernet1/2/0/11 link status is UP.
%Sep 11 15:32:26:939 2019 GZ1_NEW_AS01 LAGG/5/LAGG_ACTIVE: Member port Ten-GigabitEthernet1/2/0/11 of aggregation group BAGG32 becomes ACTIVE. //Only 1/2/0/11 is Selected here; 2/2/0/11 was down at the time
%Sep 11 15:32:26:959 2019 GZ1_NEW_AS01 IFNET/3/LINK_UPDOWN: Bridge-Aggregation32 link status is UP.
[GZ1_NEW_AS01-Bridge-Aggregation32] port link-type trunk
[GZ1_NEW_AS01-Bridge-Aggregation32] undo port trunk permit vlan 1
Please wait... Done.
Configuring Ten-GigabitEthernet1/2/0/11... Done.
Configuring Ten-GigabitEthernet2/2/0/11... Done.
[GZ1_NEW_AS01-Bridge-Aggregation32]port trunk permit vlan 100 to 399
Please wait...... Done.
Configuring Ten-GigabitEthernet1/2/0/11...... Done.
Configuring Ten-GigabitEthernet2/2/0/11...... Done.
[GZ1_NEW_AS01-Bridge-Aggregation32]port trunk permit vlan 500 to 598
Please wait... Done.
Configuring Ten-GigabitEthernet1/2/0/11... Done.
Configuring Ten-GigabitEthernet2/2/0/11... Done.
[GZ1_NEW_AS01-Bridge-Aggregation32] undo shutdown
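The on-site check of aggregation 32 showed an Unselected member port, because port 2/2/0/11 was down. On the new core side this aggregation is static.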
[GZ1_NEW_AS01]display link-aggregation verbose Bridge-Aggregation 32
Loadsharing Type: Shar -- Loadsharing, NonS -- Non-Loadsharing
Port Status: S -- Selected, U -- Unselected
Flags: A -- LACP_Activity, B -- LACP_Timeout, C -- Aggregation,
D -- Synchronization, E -- Collecting, F -- Distributing,
G -- Defaulted, H -- Expired
Aggregation Interface: Bridge-Aggregation32
Aggregation Mode: Static
Loadsharing Type: Shar
Port Status Priority Oper-Key
--------------------------------------------------------------------------------
XGE1/2/0/11 S 32768 4
XGE2/2/0/11 U 32768 4
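To compare both ends of a link quickly, the mode and per-port status can be pulled out of this output. This is a minimal sketch, assuming the layout shown here (an "Aggregation Mode:" line and Local rows of the form port, S/U, priority, Oper-Key); the function name is hypothetical. Remote rows are skipped because their second column is not S or U.

```python
import re

# Local port rows look like "XGE1/2/0/11 S 32768 4" (optionally followed by flags).
_PORT_ROW = re.compile(r"^(\S+)\s+([SU])\s+(\d+)\s+(\d+)")


def parse_lagg_verbose(text: str):
    """Return (aggregation_mode, {port: 'S' or 'U'})."""
    mode = None
    ports = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Aggregation Mode:"):
            mode = line.split(":", 1)[1].strip()
            continue
        m = _PORT_ROW.match(line)
        if m:
            ports[m.group(1)] = m.group(2)
    return mode, ports


def unselected(ports: dict[str, str]) -> list[str]:
    """Ports reported as Unselected (U)."""
    return [p for p, status in ports.items() if status == "U"]
```

Running it on both ends (BAGG32 here, BAGG30 on OP_NEW_AS01) would surface the Static/Dynamic mismatch alongside the Unselected ports.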
The on-site staff then went back to OP_NEW_AS01 to check whether the member ports of aggregation 30 were correct.
[GZ1_OP_NEW_AS01]interface g
[GZ1_OP_NEW_AS01]interface te
[GZ1_OP_NEW_AS01]interface Ten-GigabitEthernet 1/2/0/10
[GZ1_OP_NEW_AS01-Ten-GigabitEthernet1/2/0/10]dis this
#
interface Ten-GigabitEthernet1/2/0/10
port link-mode bridge
description TO-GZ1_NEW_ AS01_GE1/2/0/7
port link-type trunk
undo port trunk permit vlan 1
port trunk permit vlan 100 to 399 500 to 598 656 679 697 to 699
port link-aggregation group 30
#
return
[GZ1_OP_NEW_AS01-Ten-GigabitEthernet1/2/0/10]quit
[GZ1_OP_NEW_AS01]interface Ten-GigabitEthernet 2/2/0/10
[GZ1_OP_NEW_AS01-Ten-GigabitEthernet2/2/0/10]dis this
#
interface Ten-GigabitEthernet2/2/0/10
port link-mode bridge
description TO-GZ1_NEW_ AS01_GE2/2/0/7
port link-type trunk
undo port trunk permit vlan 1
port trunk permit vlan 100 to 399 500 to 598 656 679 697 to 699
port link-aggregation group 30
#
return
[GZ1_OP_NEW_AS01-Ten-GigabitEthernet2/2/0/10]un
[GZ1_OP_NEW_AS01-Ten-GigabitEthernet2/2/0/10]undo shu
[GZ1_OP_NEW_AS01-Ten-GigabitEthernet2/2/0/10]undo shutdown
Interface Ten-GigabitEthernet2/2/0/10 is not shut down
[GZ1_OP_NEW_AS01-Ten-GigabitEthernet2/2/0/10]
[GZ1_OP_NEW_AS01-Ten-GigabitEthernet2/2/0/10]
[GZ1_OP_NEW_AS01-Ten-GigabitEthernet2/2/0/10]
#Sep 11 23:26:28:432 2019 GZ1_OP_NEW_AS01 IFNET/4/INTERFACE UPDOWN:
Trap 1.3.6.1.6.3.1.1.5.4<linkUp>: Interface 171704329 is Up, ifAdminStatus is 1, ifOperStatus is 1
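%Sep 11 23:26:28:452 2019 GZ1_OP_NEW_AS01 IFNET/3/LINK_UPDOWN: Ten-GigabitEthernet2/2/0/10 link status is UP. //After undo shutdown on 2/2/0/10, the port comes up, and port 2/2/0/11 on the new core also becomes Selected
Continuing on OP_NEW_AS01, the staff checked aggregation group 30 and found 2/2/0/10 Unselected. The reason is simple: the new core side uses static aggregation, so the OP_NEW_AS01 side (dynamic mode) has only one port Selected by default. This matches the output below: the remote System ID is all zeros, and both local ports carry the G (Defaulted) flag.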
[GZ1_OP_NEW_AS01]display link-aggregation verbose Bridge-Aggregation 30
Loadsharing Type: Shar -- Loadsharing, NonS -- Non-Loadsharing
Port Status: S -- Selected, U -- Unselected
Flags: A -- LACP_Activity, B -- LACP_Timeout, C -- Aggregation,
D -- Synchronization, E -- Collecting, F -- Distributing,
G -- Defaulted, H -- Expired
Aggregation Interface: Bridge-Aggregation30
Aggregation Mode: Dynamic
Loadsharing Type: Shar
System ID: 0x8000, b0f9-634b-2220
Local:
Port Status Priority Oper-Key Flag
--------------------------------------------------------------------------------
XGE1/2/0/10 S 32768 1 {ACDEFG}
XGE2/2/0/10 U 32768 1 {ACG}
Remote:
Actor Partner Priority Oper-Key SystemID Flag
--------------------------------------------------------------------------------
XGE1/2/0/10 0 32768 0 0x8000, 0000-0000-0000 {DEF}
XGE2/2/0/10 0 32768 0 0x8000, 0000-0000-0000 {DEF}
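The flag legend printed at the top of this output can be applied mechanically to the Flag column. This is a minimal sketch using the letters from that legend. Reading an all-zero partner System ID as "no LACPDUs received from the peer" is an assumption, consistent with the static peer in this case. Both function names are hypothetical.

```python
# Flag letters as listed in the legend of 'display link-aggregation verbose'.
LACP_FLAGS = {
    "A": "LACP_Activity", "B": "LACP_Timeout", "C": "Aggregation",
    "D": "Synchronization", "E": "Collecting", "F": "Distributing",
    "G": "Defaulted", "H": "Expired",
}


def decode_flags(field: str) -> list[str]:
    """Decode a flag field such as '{ACDEFG}' into flag names."""
    letters = field.strip().strip("{}")
    unknown = set(letters) - set(LACP_FLAGS)
    if unknown:
        raise ValueError(f"unknown flag letters: {sorted(unknown)}")
    return [LACP_FLAGS[c] for c in letters]


def partner_unknown(system_id: str) -> bool:
    """True if the MAC part of a System ID like '0x8000, 0000-0000-0000' is all zeros."""
    mac = system_id.split(",")[-1].strip()
    return set(mac.replace("-", "")) == {"0"}
```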
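The following operations are where the loop began. On OP_NEW_AS01, the staff removed 1/2/0/10 from aggregation group 30, and 2/2/0/10 became the default Selected port of aggregation 30. At this point the loop already existed.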
[GZ1_OP_NEW_AS01]int Ten-GigabitEthernet1/2/0/10
[GZ1_OP_NEW_AS01-Ten-GigabitEthernet1/2/0/10]un
[GZ1_OP_NEW_AS01-Ten-GigabitEthernet1/2/0/10]dis this
#
interface Ten-GigabitEthernet1/2/0/10
port link-mode bridge
description TO-GZ1_NEW_ AS01_GE1/2/0/7
port link-type trunk
undo port trunk permit vlan 1
port trunk permit vlan 100 to 399 500 to 598 656 679 697 to 699
port link-aggregation group 30
#
return
[GZ1_OP_NEW_AS01-Ten-GigabitEthernet1/2/0/10]un
[GZ1_OP_NEW_AS01-Ten-GigabitEthernet1/2/0/10]undo port lin
[GZ1_OP_NEW_AS01-Ten-GigabitEthernet1/2/0/10]undo port link-a
[GZ1_OP_NEW_AS01-Ten-GigabitEthernet1/2/0/10]undo port link-aggregation gr
[GZ1_OP_NEW_AS01-Ten-GigabitEthernet1/2/0/10]undo port link-aggregation group ?
<cr>
[GZ1_OP_NEW_AS01-Ten-GigabitEthernet1/2/0/10]undo port link-aggregation group // 1/2/0/10 leaves aggregation 30 here
#Sep 11 23:27:27:374 2019 GZ1_OP_NEW_AS01 LAGG/1/AggPortRecoverActive:
Trap 1.3.6.1.4.1.2011.5.25.25.2.4<hwAggPortActiveNotification>: Aggregation Group 30: port member 171704329 becomes ACTIVE!
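%Sep 11 23:27:27:397 2019 GZ1_OP_NEW_AS01 LAGG/5/LAGG_ACTIVE: Member port Ten-GigabitEthernet2/2/0/10 of aggregation group BAGG30 becomes ACTIVE. // 2/2/0/10 immediately becomes the default Selected port
In this state, a broadcast packet from the new core that enters on 1/2/0/10 is flooded back out of aggregation group 30, because the two ports are now two different logical ports. On the new core both links still belong to BAGG32, where STP was disabled (the stp disable command configured above), so STP does not block these links on that side.
Later, the staff also removed 2/2/0/10 from aggregation group 30, so 1/2/0/10, 2/2/0/10 and the new core formed a loop again: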
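The mechanism can be illustrated with a very small flooding model. It rests on simplifying assumptions: no STP blocking on either side, only the forwarding links toward the new core are considered, and Unselected or down links are left out. Each physical port on OP_NEW_AS01 maps to its logical port (BAGG30, or the port itself once it is standalone). A switch floods to every logical port except the ingress one, so any link in a different logical port sends the frame straight back. The function name is hypothetical.

```python
def flood_targets_toward_core(ports: dict[str, str], ingress: str) -> list[str]:
    """Logical ports toward the new core that a flooded frame leaves by.

    ports   -- physical port on OP_NEW_AS01 -> logical port, forwarding links only
    ingress -- physical port the frame arrived on from the new core
    """
    ingress_port = ports[ingress]
    # Flooding excludes only the ingress *logical* port.
    return sorted({lp for lp in ports.values() if lp != ingress_port})


if __name__ == "__main__":
    # Before: 2/2/0/10 is Unselected, so only 1/2/0/10 forwards, inside BAGG30.
    before = {"XGE1/2/0/10": "BAGG30"}
    # After 1/2/0/10 leaves the group and 2/2/0/10 becomes Selected.
    step1 = {"XGE1/2/0/10": "XGE1/2/0/10", "XGE2/2/0/10": "BAGG30"}
    # After 2/2/0/10 also leaves the group.
    step2 = {"XGE1/2/0/10": "XGE1/2/0/10", "XGE2/2/0/10": "XGE2/2/0/10"}
    for name, state in (("before", before), ("step1", step1), ("step2", step2)):
        print(name, flood_targets_toward_core(state, "XGE1/2/0/10"))
```

In step1 and step2 the frame returns to the new core on BAGG32, so the new core would learn that frame's source MAC on BAGG32, which is consistent with the MAC flapping observed downstream.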
interface Ten-GigabitEthernet2/2/0/10
port link-mode bridge
description TO-GZ1_NEW_ AS01_GE2/2/0/7
port link-type trunk
undo port trunk permit vlan 1
port trunk permit vlan 100 to 399 500 to 598 656 679 697 to 699
port link-aggregation group 30
#
return
[GZ1_OP_NEW_AS01-Ten-GigabitEthernet2/2/0/10]un
[GZ1_OP_NEW_AS01-Ten-GigabitEthernet2/2/0/10]undo po
[GZ1_OP_NEW_AS01-Ten-GigabitEthernet2/2/0/10]undo port li
[GZ1_OP_NEW_AS01-Ten-GigabitEthernet2/2/0/10]undo port link-a
[GZ1_OP_NEW_AS01-Ten-GigabitEthernet2/2/0/10]undo port link-aggregation gr
[GZ1_OP_NEW_AS01-Ten-GigabitEthernet2/2/0/10]undo port link-aggregation group
[GZ1_OP_NEW_AS01-Ten-GigabitEthernet2/2/0/10]
#Sep 11 23:27:58:423 2019 GZ1_OP_NEW_AS01 IFNET/4/INTERFACE UPDOWN:
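This is why the flap records on OP_NEW_AS01 are between aggregation group 30 and its member ports, and also between the member ports themselves.
In summary, the service anomaly that occurred when the new core was connected to the old core was caused by an on-site misoperation that created a loop. The loop caused MAC flapping on the old core's downstream access devices, which ultimately affected services.
Recommendation: when configuring aggregate interfaces on a live network, do not casually remove member ports from an aggregation group; perform such operations only after making sure no loop can form.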
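The recommendation can be turned into a simple pre-change check. This is a minimal sketch under a simplified model: no STP blocking, and only links that are up are counted. The function name and parameters are hypothetical, and the inputs would have to be filled in by hand or from the devices' own output. The check asks whether removing a member would spread links that the peer still bundles into one aggregate (BAGG32 here) across more than one local logical port.

```python
def removal_splits_peer_bundle(members: dict[str, str], port: str,
                               peer_bundled: set[str]) -> bool:
    """Return True if removing `port` from its aggregation would leave links
    that the peer bundles into one aggregate on more than one local logical port.

    members      -- local physical port -> logical port, for links that are up
    port         -- local member port to be removed from its aggregation
    peer_bundled -- local ports whose far ends sit in the same peer aggregate
    """
    after = dict(members)
    after[port] = port  # once removed, the port forwards as a standalone port
    logical = {after[p] for p in peer_bundled if p in after}
    return len(logical) > 1


if __name__ == "__main__":
    members = {"XGE1/2/0/10": "BAGG30", "XGE2/2/0/10": "BAGG30"}
    peer = {"XGE1/2/0/10", "XGE2/2/0/10"}  # both far ends are in BAGG32
    print(removal_splits_peer_bundle(members, "XGE1/2/0/10", peer))
```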