Device: S7606
Version: 7178
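At around 20:00 on December 22, the S7606 in the Yuehua equipment room lost power. After it was powered back on, the RRPP network behaved abnormally, with a loop and MAC address flapping. Port 3/0/1 was then manually shut down to break the loop, after which the S7606 in the Changzhou Huayuan equipment room (connected to port 2/0/2 of the faulty device) was found to be unable to learn any MAC addresses in VLAN 3354. Remote troubleshooting later confirmed that the abnormal power loss and restart of the Yuehua S7606 had corrupted the chip-level broadcast block mask table on the slot 2 board, which caused the service failure. Service was restored on site by rebooting the board in slot 2.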
1)
[CQYC_YH_S7606-probe] bcm s 4 c 0 l2/show/ext/vlan=3354 ------- (on some 125x devices the command is bcm s 4 c 0 l2/show/vlan=3354; the on-site board uses an XGS TCAM chip, so ext is required)
mac=6c:59:40:c0:fc:55 vlan=3354 GPORT=0x0 Trunk=0 SHit Group=Learnt
mac=8c:f2:28:73:71:7d vlan=3354 GPORT=0x0 Trunk=0 SHit Group=Learnt
mac=b0:df:c1:de:e3:18 vlan=3354 GPORT=0x0 Trunk=0 SHit Group=Learnt
mac=d4:ee:07:19:a0:57 vlan=3354 GPORT=0x0 Trunk=0 SHit Group=Learnt
mac=c8:3a:35:00:45:e8 vlan=3354 GPORT=0x0 Trunk=0 SHit Group=Learnt
mac=d0:76:e7:18:64:38 vlan=3354 GPORT=0x0 Trunk=0 SHit Group=Learnt
mac=cc:08:fb:2f:45:ee vlan=3354 GPORT=0x0 Trunk=0 SHit Group=Learnt
mac=14:69:a2:20:a5:8c vlan=3354 GPORT=0x0 Trunk=0 SHit Group=Learnt
mac=54:04:a6:9c:78:49 vlan=3354 GPORT=0x0 Trunk=0 SHit Group=Learnt
[CQYC_CZHY_S7606-probe]bcm s 3 c 0 l2/show/ext/vlan=3354 ----- confirms that no MAC addresses were learned at the chip level
--- 0 mac address(es) found ---
2)
[CQYC_YH_S7606-acl-ethernetframe-4000]dis this
#
acl number 4000
rule 0 permit source-mac 006b-8e2a-485f ffff-ffff-ffff
#
[CQYC_YH_S7606-probe]dis qos policy interface g 4/0/23
Interface: GigabitEthernet4/0/23
Direction: Inbound
Policy: lt
Classifier: lt
Operator: AND
Rule(s) :
If-match service-vlan-id 3354
If-match acl 4000
Behavior: lt
Accounting enable:
105 (Packets)
Interface: GigabitEthernet4/0/23
Direction: Outbound
Policy: lt
Classifier: lt
Operator: AND
Rule(s) :
If-match service-vlan-id 3354
If-match acl 4000
Behavior: lt
Accounting enable:
0 (Packets)
[CQYC_YH_S7606-acl-ethernetframe-4000]dis qos policy interface t 2/0/2
Interface: Ten-GigabitEthernet2/0/2
Direction: Inbound
Policy: lt
Classifier: lt
Operator: AND
Rule(s) :
If-match service-vlan-id 3354
If-match acl 4000
Behavior: lt
Accounting enable:
0 (Packets)
Interface: Ten-GigabitEthernet2/0/2
Direction: Outbound
Policy: lt
Classifier: lt
Operator: AND
Rule(s) :
If-match service-vlan-id 3354
If-match acl 4000
Behavior: lt
Accounting enable:
0 (Packets)
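The accounting counters in step 2 can be compared mechanically. The following is a minimal Python sketch, not part of the original case: the function name parse_counters and the trimmed SAMPLE text are illustrative assumptions, and the parser only understands the "Interface / Direction / N (Packets)" layout shown above. In the pasted output, the inbound direction of GigabitEthernet4/0/23 counted 105 matching packets while the outbound direction of Ten-GigabitEthernet2/0/2 counted 0, which suggests the matched frames entered the chassis but did not leave toward the peer.

```python
import re

# Trimmed copy of the "dis qos policy interface" output shown above
# (only the lines the parser needs are kept).
SAMPLE = """\
Interface: GigabitEthernet4/0/23
Direction: Inbound
Accounting enable:
105 (Packets)
Interface: GigabitEthernet4/0/23
Direction: Outbound
Accounting enable:
0 (Packets)
Interface: Ten-GigabitEthernet2/0/2
Direction: Inbound
Accounting enable:
0 (Packets)
Interface: Ten-GigabitEthernet2/0/2
Direction: Outbound
Accounting enable:
0 (Packets)
"""


def parse_counters(output: str) -> dict:
    """Map (interface, direction) to the accounting packet count."""
    counters = {}
    iface = direction = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Interface:"):
            iface = line.split(":", 1)[1].strip()
        elif line.startswith("Direction:"):
            direction = line.split(":", 1)[1].strip()
        else:
            match = re.fullmatch(r"(\d+) \(Packets\)", line)
            if match and iface and direction:
                counters[(iface, direction)] = int(match.group(1))
    return counters


counters = parse_counters(SAMPLE)
ingress = counters[("GigabitEthernet4/0/23", "Inbound")]
egress = counters[("Ten-GigabitEthernet2/0/2", "Outbound")]
print(f"ingress={ingress} egress={egress}")
```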
3)
[CQYC_YH_S7606-probe]debug qacl packet pattern smac 006b-8e2a-485f ffff-ffff-ffff slot 2 c 0
[CQYC_YH_S7606-probe] debug qacl packet action statistics slot 2 c 0
[CQYC_YH_S7606-probe]debug qacl packet control slot 2 c 0 28 in
[CQYC_YH_S7606-probe]debug qacl packet control slot 2 c 0 29 in
[CQYC_YH_S7606-probe]debug qacl packet control slot 2 c 0 30 in
[CQYC_YH_S7606-probe]debug qacl packet control slot 2 c 0 31 in
[CQYC_YH_S7606-probe]debug qacl show packet pattern s 2 c 0 28 in
========
Acl-Type Statistics based PktPattern, Stage IFP, SinglePort, Installed, Active
Prio Mjr/Sub 512/3, Group 6 [6], Slice/Idx 4/8, Entry 99, Double: 2056/2568
Rule Match --------
Ports: 0x10000000; 0xfc004007
Source mac: 006B-8E2A-485F, FFFF-FFFF-FFFF
Actions --------
Account mode packets, green and non-green
Accounting: Hi 49, LO 0
[CQYC_YH_S7606-probe]debug qacl show packet pattern s 2 c 0 29 out
========
Acl-Type Statistics based PktPattern, Stage EFP, SinglePort, Installed, Active
Prio Mjr/Sub 256/3, Group 7 [7], Slice/Idx 1/1, Entry 96, Single: 257
Rule Match --------
Out Port: 29
Source mac: 006B-8E2A-485F, FFFF-FFFF-FFFF
Actions --------
Account mode packets, green and non-green
Accounting: Hi 0, LO 0
4)
[CQYC_YH_S7606-probe]bcm s 2 c 0 getreg/bcast_block_mask_64
BCAST_BLOCK_MASK_64.cpu0[0x11a00042]=0xe0000001: <BLK_BITMAP_1=0,
BLK_BITMAP_0=0xe0000001,BLK_BITMAP=0xe0000001>
BCAST_BLOCK_MASK_64.ge0[0x11a01042]=0xe0000001: <BLK_BITMAP_1=0,
BLK_BITMAP_0=0xe0000001,BLK_BITMAP=0xe0000001>
BCAST_BLOCK_MASK_64.xe0[0x11a02042]=0xe0000001: <BLK_BITMAP_1=0,
BLK_BITMAP_0=0xe0000001,BLK_BITMAP=0xe0000001>
BCAST_BLOCK_MASK_64.xe1[0x11a0e042]=0xe0000001: <BLK_BITMAP_1=0,
BLK_BITMAP_0=0xe0000001,BLK_BITMAP=0xe0000001>
BCAST_BLOCK_MASK_64.xe2[0x11a1a042]=0xe0000001: <BLK_BITMAP_1=0,
BLK_BITMAP_0=0xe0000001,BLK_BITMAP=0xe0000001>
BCAST_BLOCK_MASK_64.xe3[0x11a1b042]=0xe0000001: <BLK_BITMAP_1=0,
BLK_BITMAP_0=0xe0000001,BLK_BITMAP=0xe0000001>
BCAST_BLOCK_MASK_64.hg0[0x11a1c042]=0xfc004007: <BLK_BITMAP_1=0,
BLK_BITMAP_0=0xfc004007,BLK_BITMAP=0xfc004007>
BCAST_BLOCK_MASK_64.hg1[0x11a1d042]=0xfc004007: <BLK_BITMAP_1=0,
BLK_BITMAP_0=0xfc004007,BLK_BITMAP=0xfc004007>
BCAST_BLOCK_MASK_64.hg2[0x11a1e042]=0xc0000001: <BLK_BITMAP_1=0,
BLK_BITMAP_0=0xc0000001,BLK_BITMAP=0xc0000001>
BCAST_BLOCK_MASK_64.hg3[0x11a1f042]=0xc0000001: <BLK_BITMAP_1=0,
BLK_BITMAP_0=0xc0000001,BLK_BITMAP=0xc0000001>
[CQYC_YH_S7606-probe]bcm s 2 c 0 pbmp/0xfc004007
0x000000000000000000000000fc004007 ==> cpu,ge,xe,hg
[CQYC_YH_S7606-probe]bcm s 3 c 0 getreg/bcast_block_mask
BCAST_BLOCK_MASK_64.cpu0[0x11a00042]=0xd0000001: <BLK_BITMAP_1=0,
BLK_BITMAP_0=0xd0000001,BLK_BITMAP=0xd0000001>
BCAST_BLOCK_MASK_64.ge0[0x11a01042]=0xd0000001: <BLK_BITMAP_1=0,
BLK_BITMAP_0=0xd0000001,BLK_BITMAP=0xd0000001>
BCAST_BLOCK_MASK_64.xe0[0x11a02042]=0xd0000001: <BLK_BITMAP_1=0,
BLK_BITMAP_0=0xd0000001,BLK_BITMAP=0xd0000001>
BCAST_BLOCK_MASK_64.xe1[0x11a0e042]=0xd0000001: <BLK_BITMAP_1=0,
BLK_BITMAP_0=0xd0000001,BLK_BITMAP=0xd0000001>
BCAST_BLOCK_MASK_64.xe2[0x11a1a042]=0xd0000001: <BLK_BITMAP_1=0,
BLK_BITMAP_0=0xd0000001,BLK_BITMAP=0xd0000001>
BCAST_BLOCK_MASK_64.xe3[0x11a1b042]=0xd0000001: <BLK_BITMAP_1=0,
BLK_BITMAP_0=0xd0000001,BLK_BITMAP=0xd0000001>
BCAST_BLOCK_MASK_64.hg0[0x11a1c042]=0xf0000001: <BLK_BITMAP_1=0,
BLK_BITMAP_0=0xf0000001,BLK_BITMAP=0xf0000001>
BCAST_BLOCK_MASK_64.hg1[0x11a1d042]=0xf0000001: <BLK_BITMAP_1=0,
BLK_BITMAP_0=0xf0000001,BLK_BITMAP=0xf0000001>
BCAST_BLOCK_MASK_64.hg2[0x11a1e042]=0xc0000001: <BLK_BITMAP_1=0,
BLK_BITMAP_0=0xc0000001,BLK_BITMAP=0xc0000001>
BCAST_BLOCK_MASK_64.hg3[0x11a1f042]=0xc0000001: <BLK_BITMAP_1=0,
BLK_BITMAP_0=0xc0000001,BLK_BITMAP=0xc0000001>
[CQYC_YH_S7606-probe]bcm s 2 c 0 pbmp/0xf0000001
0x000000000000000000000000f0000001 ==> cpu,hg
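The hg0 and hg1 ports of slot 2 connect to the MPU in slot 1. Slot 1 currently holds the active MPU, which acts as the primary switching fabric. Broadcast packets traverse the primary fabric, so a broadcast packet entering on slot 4 is forwarded through the slot 1 MPU to slot 2. In the output above, hg0 and hg1 on slot 2 read 0xfc004007, while the same registers on slot 3, shown for comparison, read 0xf0000001. Because the broadcast block mask table on the hg ports linking slot 2 to slot 1 was wrong, broadcast packets could not get through. The traffic in VLAN 3354 was all broadcast traffic, so the peer device (the Changzhou Huayuan S7606) received nothing and could not learn any MAC addresses in VLAN 3354. RRPP protocol packets are multicast packets, so RRPP was disrupted as well, which produced the loop. The broadcast block mask table affects only broadcast and multicast packets; unicast traffic is forwarded normally, which is why only some VLANs were affected.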
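The difference between the two hg0/hg1 masks can be decoded bit by bit. The Python sketch below is illustrative only: the helper names are made up, and the bit-to-port mapping (cpu0=0, ge0=1, xe0=2, xe1=14, xe2=26, xe3=27, hg0 to hg3=28 to 31) is an assumption inferred from the register addresses in the output above, not from chip documentation. Under that assumption, the slot 2 mask blocks every front-panel port that the slot 3 mask leaves open.

```python
# Assumed bit-to-port mapping, inferred from the register addresses in the
# getreg output (for example 0x11a0e042 for xe1 suggests port 14).
PORT_NAMES = {
    0: "cpu0", 1: "ge0", 2: "xe0", 14: "xe1", 26: "xe2", 27: "xe3",
    28: "hg0", 29: "hg1", 30: "hg2", 31: "hg3",
}


def blocked_ports(mask: int) -> list:
    """Return the bit positions set in a block mask, lowest first."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def extra_blocked(suspect: int, reference: int) -> list:
    """Name the ports blocked in `suspect` but not in `reference`."""
    extra = suspect & ~reference
    return [PORT_NAMES.get(bit, f"port{bit}") for bit in blocked_ports(extra)]


slot2_hg0 = 0xFC004007  # BCAST_BLOCK_MASK_64.hg0 read from slot 2
slot3_hg0 = 0xF0000001  # BCAST_BLOCK_MASK_64.hg0 read from slot 3

print(extra_blocked(slot2_hg0, slot3_hg0))
```

If the mapping holds, the extra bits correspond to ge0 and xe0 through xe3, which is consistent with broadcast traffic arriving on hg0/hg1 being unable to leave through any front-panel port of slot 2.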
5)
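6) According to the device logs, an active/standby switchover was performed on the device just as the slot 2 board was about to reach the Normal state. This operation did not follow procedure, and the R&D lab reproduced the problem by performing the same operation.
(Standalone mode)
When the active MPU is rebooted, the entire system reboots if no standby MPU is present; if a standby MPU is present and running stably, an active/standby switchover occurs. When any board in the system is in an unstable state, do not use the reboot command to trigger a switchover, so as to avoid affecting the system and its boards. Use the display system stable state command to check whether the system is stable.
(IRF mode)
When the global active MPU is rebooted, the entire IRF reboots if no global standby MPU is present; if a global standby MPU is present and running stably, an active/standby switchover occurs. When any board in the system is in an unstable state, do not use the reboot command to trigger a switchover, so as to avoid affecting the IRF and its boards. Use the display system stable state command to check whether the IRF is stable.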
%@1434%Dec 22 19:47:45:292 2019 CQYC_YH_S7606 DEV/3/BOARD_REMOVED: Board was removed from slot 0, type is LSQM3MPUB3.
%@1435%Dec 22 19:47:45:295 2019 CQYC_YH_S7606 HA/5/HA_STANDBY_TO_MASTER: Standby board in slot 1 changed to master.
%@1436%Dec 22 19:47:46:751 2019 CQYC_YH_S7606 DEV/5/BOARD_STATE_NORMAL: Board state changed to Normal on slot 2, type is LSQ1TGX4SD.
%@1437%Dec 22 19:47:49:809 2019 CQYC_YH_S7606 DEV/5/BOARD_STATE_NORMAL: Board state changed to Normal on slot 5, type is LSQ1GP24TSD.
%@1438%Dec 22 19:47:51:257 2019 CQYC_YH_S7606 DEV/5/BOARD_STATE_NORMAL: Board state changed to Normal on slot 4, type is LSQ1GP48SD.
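The ordering of these events can be checked with a short script. The Python sketch below is illustrative: the regular expression and function names are assumptions written only to fit the five log lines above, not a general parser for this log format. It reports which boards reached the Normal state only after the standby MPU had already become master, and by how many milliseconds.

```python
import re
from datetime import datetime, timedelta

LOG = """\
%@1434%Dec 22 19:47:45:292 2019 CQYC_YH_S7606 DEV/3/BOARD_REMOVED: Board was removed from slot 0, type is LSQM3MPUB3.
%@1435%Dec 22 19:47:45:295 2019 CQYC_YH_S7606 HA/5/HA_STANDBY_TO_MASTER: Standby board in slot 1 changed to master.
%@1436%Dec 22 19:47:46:751 2019 CQYC_YH_S7606 DEV/5/BOARD_STATE_NORMAL: Board state changed to Normal on slot 2, type is LSQ1TGX4SD.
%@1437%Dec 22 19:47:49:809 2019 CQYC_YH_S7606 DEV/5/BOARD_STATE_NORMAL: Board state changed to Normal on slot 5, type is LSQ1GP24TSD.
%@1438%Dec 22 19:47:51:257 2019 CQYC_YH_S7606 DEV/5/BOARD_STATE_NORMAL: Board state changed to Normal on slot 4, type is LSQ1GP48SD.
"""

# Timestamp, milliseconds, year, hostname, MODULE/severity/MNEMONIC, message text.
LINE_RE = re.compile(
    r"^%@\d+%(\w{3} +\d+ \d\d:\d\d:\d\d):(\d{3}) (\d{4}) \S+ \w+/\d/(\w+): (.*)$"
)


def parse(line: str):
    """Return (timestamp, mnemonic, text) for one log line, or None."""
    m = LINE_RE.match(line)
    if not m:
        return None
    ts = datetime.strptime(f"{m[1]} {m[3]}", "%b %d %H:%M:%S %Y")
    ts = ts.replace(microsecond=int(m[2]) * 1000)
    return ts, m[4], m[5]


def boards_normal_after_switchover(log: str) -> dict:
    """Map slot number to milliseconds elapsed between switchover and Normal."""
    events = [e for e in map(parse, log.splitlines()) if e]
    switch = next(ts for ts, mnem, _ in events if mnem == "HA_STANDBY_TO_MASTER")
    late = {}
    for ts, mnem, text in events:
        if mnem == "BOARD_STATE_NORMAL" and ts > switch:
            slot = int(re.search(r"slot (\d+)", text).group(1))
            late[slot] = (ts - switch) // timedelta(milliseconds=1)
    return late


print(boards_normal_after_switchover(LOG))
```

For the excerpt above, slot 2 reaches Normal roughly 1.5 seconds after the switchover, which matches the observation that the switchover happened while that board was still coming up.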
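Because the on-site device lost power abnormally, the chip-level broadcast block mask table on the slot 2 board was corrupted. Broadcast packets forwarded through slot 1 could therefore not leave through the front-panel ports, that is, they could not be forwarded to the peer device. RRPP protocol packets are multicast packets and were affected in the same way, so RRPP broke down and a loop formed. The broadcast block mask table affects only broadcast and multicast traffic; unicast traffic is forwarded normally, so only some VLANs were affected.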
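1.
2.
3. When performing an active/standby switchover, always wait until every board is in a stable state before switching; otherwise some table entries may end up incorrect.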