None
Several S6800 switches have OSPF adjacencies with downstream servers. The adjacencies suddenly started flapping and then recovered on their own.
%Aug 27 10:41:52:326 2019 SH-LG-0203-J16-H6800QTH1-XGWW-01 OSPF/5/OSPF_NBR_CHG: OSPF 4 Neighbor 9.18.77.172(Vlan-interface203) changed from FULL to EXSTART.
%Aug 27 10:41:52:327 2019 SH-LG-0203-J16-H6800QTH1-XGWW-01 OSPF/5/OSPF_NBR_CHG: OSPF 4 Neighbor 9.18.77.164(Vlan-interface203) changed from FULL to EXSTART.
%Aug 27 10:41:52:328 2019 SH-LG-0203-J16-H6800QTH1-XGWW-01 OSPF/5/OSPF_NBR_CHG: OSPF 4 Neighbor 9.18.77.144(Vlan-interface203) changed from FULL to EXSTART.
%Aug 27 10:41:52:329 2019 SH-LG-0203-J16-H6800QTH1-XGWW-01 OSPF/5/OSPF_NBR_CHG: OSPF 4 Neighbor 9.18.77.181(Vlan-interface203) changed from FULL to EXSTART.
%Aug 27 10:41:55:809 2019 SH-LG-0203-J16-H6800QTH1-XGWW-01 OSPF/5/OSPF_NBR_CHG: OSPF 4 Neighbor 9.18.77.168(Vlan-interface203) changed from LOADING to FULL.
%Aug 27 10:41:55:835 2019 SH-LG-0203-J16-H6800QTH1-XGWW-01 OSPF/5/OSPF_NBR_CHG: OSPF 4 Neighbor 9.18.77.139(Vlan-interface203) changed from LOADING to FULL.
%Aug 27 10:41:55:835 2019 SH-LG-0203-J16-H6800QTH1-XGWW-01 OSPF/5/OSPF_NBR_CHG: OSPF 4 Neighbor 9.18.77.156(Vlan-interface203) changed from LOADING to FULL.
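1. The logs record the reason for the OSPF neighbors going down as a hello timeout (DeadInterval timer expired), so softcar was checked first. A large number of OSPF packets were being sent to the CPU, but no OSPF packets were dropped for exceeding the softcar rate limit.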
%Jul 27 21:48:51:104 2019 SH-LG-0203-K14-H6800QTH1-XGWW-01 OSPF/6/OSPF_LAST_NBR_DOWN: OSPF 4 Last neighbor down event: Router ID: 9.18.80.172 Local address: 9.18.77.129 Remote address: 9.18.77.172 Reason: DeadInterval timer expired.
%Jul 27 21:48:51:104 2019 SH-LG-0203-K14-H6800QTH1-XGWW-01 OSPF/6/OSPF_LAST_NBR_DOWN: OSPF 4 Last neighbor down event: Router ID: 9.18.80.172 Local address: 9.18.77.129 Remote address: 9.18.77.172 Reason: DeadInterval timer expired.
%Jul 27 21:48:51:104 2019 SH-LG-0203-K14-H6800QTH1-XGWW-01 OSPF/6/OSPF_LAST_NBR_DOWN: OSPF 4 Last neighbor down event: Router ID: 9.18.80.172 Local address: 9.18.77.129 Remote address: 9.18.77.172 Reason: DeadInterval timer expired.
11 IPV4_MC_OSPF_5 871 -690389413 0 2000 S On SMAC 8
12 IPV4_MC_OSPF_6 48 35222692 0 2000 S On SMAC 8
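The negative value in the fourth column for IPV4_MC_OSPF_5 (-690389413) is most likely a display artifact rather than a real count. A minimal sketch of how to read it, assuming the counter is a 32-bit value printed as a signed integer (the source does not state the counter width, and the helper name is hypothetical):

```python
def as_unsigned32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as unsigned (assumed counter width)."""
    return value & 0xFFFFFFFF


# Fourth-column value printed for IPV4_MC_OSPF_5 in the output above.
print(as_unsigned32(-690389413))  # 3604577883
# A non-negative value is returned unchanged.
print(as_unsigned32(35222692))    # 35222692
```

Under that assumption the counter has simply passed 2^31 and wrapped in the display, which fits an entry that receives a lot of traffic.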
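2. The show/c counters show packet loss on hardware CPU queue 35. The following command confirms that the system ACL sends OSPF packets to the CPU through CPU queue 35. Both the show/c and pw outputs below show queue 35 dropping packets.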
[SH-LG-0203-J16-H6800QTH1-XGWW-01-probe]debug qacl show slot 1 chip 0 verbose 0 sysidx 11
========
Acl-Type RX IPv4 Middle High, Stage IFP, Pipe 0, Global, Installed, Active
Prio Mjr/Sub 523/23, Group 16 [16], Slice/Idx 10/20, Entry 79, Double: 14356/15380
Rule Match --------
Ports:
0x0000000000000000000000000000000000000000000001fffffffffffffffffe
0x0000000000000000000000000000000000000200000001ffffffffffffffffff
Lookup: VLAN ID valid[y], STP forwarding, 0x1c, 0x1c
Dest IP: 224.0.0.5, 255.255.255.255
IP protocol: ospf
Vlan Class id: 0x0 Mask: 0x20
Actions --------
CAR cir 0x7d0, cbs 0x800, pir 0x7d0, pbs 0x800, mode srTCM color blind,Packets
Account mode packets, green and non-green
Copy_to_cpu : Yes
Change CPU pkt COS 35 //Sends this OSPF packet to the CPU through queue 35.
Permit
Red Deny
Red_Copy_to_cpu : No
Yel Deny
Yel_Copy_to_cpu : No
MatchedName:11, IPV4_MC_OSPF_5
Remark COS 5, pri 5
RateValue: 0
Accounting: Hi 1083, LO 0
MCQ_DROP_PKT(35).cpu0: 918,978 +16,507 98/s
MCQ_DROP_BYTE(35).cpu0: 105,162,904 +1,787,742 10,035/s
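The CAR parameters in the ACL entry are printed in hexadecimal, and the queue 35 drop counters come as a packet and byte pair. The following sketch decodes both. It assumes each counter line has the layout "total +delta rate/s", which is inferred from the output and not documented in the source; parse_counter is a hypothetical helper.

```python
import re

# Values copied from the ACL entry and the queue 35 drop counters above.
car = {"cir": "0x7d0", "cbs": "0x800"}
drop_pkt_line = "MCQ_DROP_PKT(35).cpu0: 918,978 +16,507 98/s"
drop_byte_line = "MCQ_DROP_BYTE(35).cpu0: 105,162,904 +1,787,742 10,035/s"


def parse_counter(line: str) -> tuple:
    """Return (total, delta, per_second); this field layout is an assumption."""
    m = re.search(r":\s*([\d,]+)\s+\+([\d,]+)\s+([\d,]+)/s", line)
    if m is None:
        raise ValueError(f"unrecognized counter line: {line!r}")
    return tuple(int(g.replace(",", "")) for g in m.groups())


cir, cbs = int(car["cir"], 16), int(car["cbs"], 16)
pkts = parse_counter(drop_pkt_line)
byts = parse_counter(drop_byte_line)
avg_drop_size = byts[0] / pkts[0]

print(f"cir={cir} cbs={cbs}")  # cir=2000 cbs=2048
print(f"average dropped packet size: {avg_drop_size:.1f} bytes")
```

The decoded cir of 2000 matches the 2000 limit shown for the OSPF entries in the softcar output, and the average dropped packet is a little over 100 bytes, so the drops are small packets.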
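3. The following four protocol entries share hardware queue 35 on the device, and queue 35 itself is also rate-limited to 2000 pps. The other entries also show received-packet counts, and the combined traffic of these entries momentarily exceeds 2000 pps.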
11 IPV4_MC_OSPF_5 871 -690389413 0 2000 S On SMAC 8
12 IPV4_MC_OSPF_6 48 35222692 0 2000 S On SMAC 8
13 IPV4_UC_OSPF 0 0 0 2000 S On SMAC 8
73 IPV4_UCOSPF_TTL 0 428919 0 2000 S On SMAC 8
===============bcm slot 1 chip 0 pw===============
Queue 35: PPS 2000. CurPkts 0. TotPkts -644864102. Disc rate 731, qlen 0.
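To see why the shared queue is the bottleneck, the four rows can be summed and compared against the queue limit. This is a rough sketch: it assumes column 3 is the current receive rate in pps and column 6 is the per-entry limit, since the source does not show the column headers.

```python
# The four softcar rows that share hardware queue 35, copied from above.
rows = """\
11 IPV4_MC_OSPF_5 871 -690389413 0 2000 S On SMAC 8
12 IPV4_MC_OSPF_6 48 35222692 0 2000 S On SMAC 8
13 IPV4_UC_OSPF 0 0 0 2000 S On SMAC 8
73 IPV4_UCOSPF_TTL 0 428919 0 2000 S On SMAC 8
"""

QUEUE_35_PPS = 2000  # from "Queue 35: PPS 2000" in the pw output

# Assumption: fields[2] is the current rate, fields[5] the per-entry limit.
entries = []
for line in rows.splitlines():
    fields = line.split()
    entries.append((fields[1], int(fields[2]), int(fields[5])))

combined = sum(rate for _, rate, _ in entries)
sum_of_limits = sum(limit for _, _, limit in entries)

print(f"combined rate at snapshot: {combined} pps")       # 919
print(f"sum of per-entry limits:   {sum_of_limits} pps")  # 8000
print(f"shared queue limit:        {QUEUE_35_PPS} pps")   # 2000
```

Each entry may pass up to 2000 pps on its own, so the four together are allowed up to 8000 pps into a queue that accepts only 2000. The snapshot rate is below the queue limit, which is consistent with the drops occurring during short bursts rather than continuously.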
%Aug 27 11:19:07:350 2019 SH-LG-0203-J16-H6800QTH1-XGWW-01 OSPF/5/OSPF_NBR_CHG: OSPF 4 Neighbor 9.18.77.171(Vlan-interface203) changed from FULL to DOWN.
%Aug 27 11:19:27:287 2019 SH-LG-0203-J16-H6800QTH1-XGWW-01 OSPF/5/OSPF_NBR_CHG: OSPF 4 Neighbor 9.18.77.171(Vlan-interface203) changed from LOADING to FULL.
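The time neighbor 9.18.77.171 spent out of the FULL state can be read directly from the two timestamps. A small sketch, assuming the last colon-separated field of the time is milliseconds (parse_ts is a hypothetical helper):

```python
from datetime import datetime

# Timestamps copied from the two log lines above.
down = "Aug 27 11:19:07:350 2019"
up = "Aug 27 11:19:27:287 2019"


def parse_ts(text: str) -> datetime:
    # %f accepts 1 to 6 digits and pads on the right, so "350" is read
    # as 0.350 s. %b expects English month abbreviations (C locale).
    return datetime.strptime(text, "%b %d %H:%M:%S:%f %Y")


outage = parse_ts(up) - parse_ts(down)
print(f"{outage.total_seconds():.3f} s")  # 19.937 s
```

The adjacency took roughly 20 seconds to return to FULL after going DOWN.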
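Investigation showed that the OSPF configuration on the LD servers contained a large number of network statements for service subnets, and OSPF hello packets on those service subnets were not filtered. As a result, a large volume of OSPF hello packets was sent to the S6800 and pushed the device over the hardware queue 35 threshold. The resulting drops included the hello packets actually needed to keep the adjacencies up, which caused OSPF flapping between the switch and the LD servers. A restriction has since been applied on the LD side.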
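Note: The switch rate-limits protocol packets at three points. First, the hardware CAR limit, which is applied to each protocol individually. Second, the hardware-to-CPU queue limit (the 6800 has only 48 hardware-to-CPU queues but more than 48 protocols, so several protocols can share one queue). Third, the software softcar limit, which is applied after packets have been delivered to the CPU.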
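The three limits in the note can be modelled as a pipeline. The sketch below is illustrative only: the Flow class, the admit function, and the burst rates are hypothetical, and spreading queue drops in proportion to demand is a simplification, not the switch's actual scheduling behaviour.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    protocol: str
    queue: int
    offered_pps: int


def admit(flows, car_limit, queue_limit, softcar_limit):
    """Apply the three limits in order for one second of traffic.

    Returns {protocol: (delivered_pps, first_dropping_stage or None)}.
    """
    # Stage 1: per-protocol hardware CAR.
    after_car = {f.protocol: min(f.offered_pps, car_limit[f.protocol]) for f in flows}
    # Stage 2: hardware CPU queue, shared by every protocol mapped to it.
    per_queue = {}
    for f in flows:
        per_queue[f.queue] = per_queue.get(f.queue, 0) + after_car[f.protocol]
    result = {}
    for f in flows:
        pps = after_car[f.protocol]
        stage = "car" if pps < f.offered_pps else None
        total = per_queue[f.queue]
        if total > queue_limit[f.queue]:
            pps = pps * queue_limit[f.queue] // total
            stage = stage or "queue"
        # Stage 3: software softcar, after the packet reaches the CPU.
        if pps > softcar_limit[f.protocol]:
            pps = softcar_limit[f.protocol]
            stage = stage or "softcar"
        result[f.protocol] = (pps, stage)
    return result


# Hypothetical burst: both entries stay under their own 2000 pps limits,
# yet together they exceed the 2000 pps limit of shared queue 35.
flows = [Flow("IPV4_MC_OSPF_5", 35, 1800), Flow("IPV4_MC_OSPF_6", 35, 900)]
limits = {"IPV4_MC_OSPF_5": 2000, "IPV4_MC_OSPF_6": 2000}
print(admit(flows, limits, {35: 2000}, limits))
```

With these made-up rates neither the CAR nor the softcar stage drops anything, and the only loss is at the shared queue, which mirrors the pattern in this case: clean softcar counters alongside drops on queue 35.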