Network topology:
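As shown in the figure, the network uses the SDN strong-control solution. An S5560X-EI and an S6800 each act as a VTEP in the VXLAN network, and the underlay between them is a direct link over interface VLAN 10. In the original topology diagram the VTEP on the left is an S6800, not an S5560X-EI: the S5560X-EI replaced that S6800 as the VTEP, with an identical configuration. After the replacement the overlay network stopped working, and step-by-step troubleshooting showed that the directly connected underlay between the two VTEPs was unreachable.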
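1. Once the fault is narrowed down to the direct link, locating it is relatively simple. First check whether each end has an ARP entry for its peer. The tables on site showed that the S6800 had an ARP entry for the S5560X-EI, but the S5560X-EI had no ARP entry for the S6800. The direct cause was therefore found: the S5560X-EI was missing the ARP entry.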
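2. Debugging on the S5560X-EI showed ARP requests being sent but no ARP replies coming back. Debug output collected on the S6800, however, showed that the S6800 received the peer's ARP requests and sent ARP replies. The debug output from the two sides is compared below.
Debug output on the S5560X-EI: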
<5560-2071>ping 10.32.1.17
Ping 10.32.1.17 (10.32.1.17): 56 data bytes, press CTRL_C to break
*Jan 6 05:04:46:685 2013 5560-2071 ARP/7/ARP_SEND: Sent an ARP message, operation: 1, sender MAC: ac74-0939-8715, sender IP: 10.32.1.16, target MAC: 0000-0000-0000, target IP: 10.32.1.17
*Jan 6 05:04:47:824 2013 5560-2071 ARP/7/ARP_SEND: Sent an ARP message, operation: 1, sender MAC: ac74-0939-8715, sender IP: 10.32.1.16, target MAC: 0000-0000-0000, target IP: 10.32.1.17
Request time out
*Jan 6 05:04:48:855 2013 5560-2071 ARP/7/ARP_SEND: Sent an ARP message, operation: 1, sender MAC: ac74-0939-8715, sender IP: 10.32.1.16, target MAC: 0000-0000-0000, target IP: 10.32.1.17
Request time out
Request time out
Request time out
Request time out
Debug output on the S6800:
<S6800>*Dec 22 10:33:54:593 2017 S6800 ARP/7/ARP_SEND: Sent an ARP message, operation: 1, sender MAC: 7e20-70a6-0202, sender IP: 10.32.1.17, target MAC: 0000-0000-0000, target IP: 10.32.1.16
*Dec 22 10:33:54:598 2017 S6800 ARP/7/ARP_RCV: Received an ARP message, operation: 2, sender MAC: 7e20-6ef9-0102, sender IP: 10.32.1.16, target MAC: 7e20-70a6-0202, target IP: 10.32.1.17
*Dec 22 10:34:09:221 2017 S6800 ARP/7/ARP_RCV: Received an ARP message, operation: 1, sender MAC: 7e20-6ef9-0102, sender IP: 10.32.1.16, target MAC: 0000-0000-0000, target IP: 10.32.1.17
*Dec 22 10:34:09:222 2017 S6800 ARP/7/ARP_SEND: Sent an ARP message, operation: 2, sender MAC: 7e20-70a6-0202, sender IP: 10.32.1.17, target MAC: 7e20-6ef9-0102, target IP: 10.32.1.16
*Dec 22 10:34:19:848 2017 S6800 ARP/7/ARP_RCV: Received an ARP message, operation: 1, sender MAC: 7e20-6ef9-0102, sender IP: 10.32.1.16, target MAC: 0000-0000-0000, target IP: 10.32.1.17
*Dec 22 10:34:19:848 2017 S6800 ARP/7/ARP_SEND: Sent an ARP message, operation: 2, sender MAC: 7e20-70a6-0202, sender IP: 10.32.1.17, target MAC: 7e20-6ef9-0102, target IP: 10.32.1.16
*Dec 22 10:34:27:681 2017 S6800 ARP/7/ARP_RCV: Received an ARP message, operation: 1, sender MAC: 7e20-6ef9-0102, sender IP: 10.32.1.16, target MAC: 0000-0000-0000, target IP: 10.32.1.17
*Dec 22 10:34:27:681 2017 S6800 ARP/7/ARP_SEND: Sent an ARP message, operation: 2, sender MAC: 7e20-70a6-0202, sender IP: 10.32.1.17, target MAC: 7e20-6ef9-0102, target IP: 10.32.1.16
*Dec 22 10:35:01:721 2017 S6800 ARP/7/ARP_RCV: Received an ARP message, operation: 1, sender MAC: 7e20-6ef9-0102, sender IP: 10.32.1.16, target MAC: 0000-0000-0000, target IP: 10.32.1.17
*Dec 22 10:35:01:721 2017 S6800 ARP/7/ARP_SEND: Sent an ARP message, operation: 2, sender MAC: 7e20-70a6-0202, sender IP: 10.32.1.17, target MAC: 7e20-6ef9-0102, target IP: 10.32.1.16
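Comparing the debug output shows that the S6800 did send ARP replies, but the S5560X-EI never received them. A packet capture also confirmed that the S6800's ARP packets left its driver, so the problem lies on the S5560X-EI side.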
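The comparison above can be automated when the debug logs are long. The following is a minimal sketch, assuming the log lines keep the `ARP/7/ARP_SEND` / `ARP/7/ARP_RCV` format quoted above; the function name `summarize_arp` and the variable names are illustrative and not part of any device tooling. The sample lines are copied from the debug output in this case.

```python
import re

# Matches the ARP debug lines quoted above (operation 1 = request, 2 = reply).
ARP_LINE = re.compile(
    r"ARP/7/ARP_(?P<direction>SEND|RCV): .*?operation: (?P<op>[12]), "
    r"sender MAC: [0-9a-f-]+, sender IP: (?P<sender_ip>[\d.]+), "
    r"target MAC: [0-9a-f-]+, target IP: (?P<target_ip>[\d.]+)"
)


def summarize_arp(log_text, local_ip, peer_ip):
    """Count ARP requests and replies exchanged between local_ip and peer_ip."""
    counts = {"request_sent": 0, "reply_sent": 0, "request_rcvd": 0, "reply_rcvd": 0}
    for line in log_text.splitlines():
        m = ARP_LINE.search(line)
        if m is None:
            continue  # ping output and other non-ARP lines
        sent = m["direction"] == "SEND"
        # A sent packet goes local -> peer, a received one peer -> local.
        expected = (local_ip, peer_ip) if sent else (peer_ip, local_ip)
        if (m["sender_ip"], m["target_ip"]) != expected:
            continue
        kind = "request" if m["op"] == "1" else "reply"
        counts[f"{kind}_{'sent' if sent else 'rcvd'}"] += 1
    return counts


S5560_LOG = """\
*Jan 6 05:04:46:685 2013 5560-2071 ARP/7/ARP_SEND: Sent an ARP message, operation: 1, sender MAC: ac74-0939-8715, sender IP: 10.32.1.16, target MAC: 0000-0000-0000, target IP: 10.32.1.17
Request time out
"""

S6800_LOG = """\
*Dec 22 10:34:09:221 2017 S6800 ARP/7/ARP_RCV: Received an ARP message, operation: 1, sender MAC: 7e20-6ef9-0102, sender IP: 10.32.1.16, target MAC: 0000-0000-0000, target IP: 10.32.1.17
*Dec 22 10:34:09:222 2017 S6800 ARP/7/ARP_SEND: Sent an ARP message, operation: 2, sender MAC: 7e20-70a6-0202, sender IP: 10.32.1.17, target MAC: 7e20-6ef9-0102, target IP: 10.32.1.16
"""

s5560 = summarize_arp(S5560_LOG, local_ip="10.32.1.16", peer_ip="10.32.1.17")
s6800 = summarize_arp(S6800_LOG, local_ip="10.32.1.17", peer_ip="10.32.1.16")

# Replies leave the S6800 but never reach the ARP module on the S5560X-EI.
reply_lost = s6800["reply_sent"] > 0 and s5560["reply_rcvd"] == 0
print(s5560, s6800, reply_lost)
```

If `reply_lost` is true, the reply is disappearing somewhere between the S6800 driver and the S5560X-EI CPU, which is what the next step investigates.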
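3. The first suspicion was that the S5560X-EI was dropping the ARP replies from the S6800, but show/c on the S5560X-EI showed no packet drops on the chip. Because this is a strong-control deployment, the next suspicion was that ARP packets destined for the CPU were also matching a flow entry and being sent to the controller. A packet capture on the controller confirmed this: the ARP replies from the S6800 were being sent up to the controller.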
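4. R&D confirmed that the S5560X-EI and the S6800 are implemented differently. On the S5560X-EI, even ARP packets destined for the CPU are sent to the controller, because the ACL installed to send packets to the controller has a higher priority than the one that sends packets to the CPU. The S6800 does the opposite.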
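The interconnect VLAN can be configured on the controller as an in-band management VLAN, so that all packets in that VLAN are handled only by the normal forwarding process.
When replacing an S6800 with an S5560X-EI in the future, remember to configure the underlay interconnect VLAN as an in-band management VLAN to avoid a failed cutover.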
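The priority difference and the effect of the in-band management VLAN can be illustrated with a toy model. This is only a sketch under stated assumptions: the rule names, priority numbers, and the `punt_target` function are hypothetical and do not reflect the actual ACL tables or chip pipeline of either switch. It assumes that the highest-priority matching rule wins and that packets in an in-band management VLAN bypass the send-to-controller rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str      # hypothetical rule name, for illustration only
    action: str    # "to_cpu" or "to_controller"
    priority: int  # illustrative value; higher wins in this model


def punt_target(rules, vlan, inband_vlans=frozenset()):
    """Return where an ARP reply addressed to the switch ends up."""
    if vlan in inband_vlans:
        # Assumption: in-band management VLANs follow normal processing only.
        return "to_cpu"
    return max(rules, key=lambda r: r.priority).action


# S5560X-EI: the send-to-controller ACL outranks the send-to-CPU ACL.
S5560X_EI = [
    Rule("arp-to-cpu", "to_cpu", priority=10),
    Rule("flow-to-controller", "to_controller", priority=20),
]

# S6800: the opposite ordering.
S6800 = [
    Rule("arp-to-cpu", "to_cpu", priority=20),
    Rule("flow-to-controller", "to_controller", priority=10),
]

UNDERLAY_VLAN = 10

print(punt_target(S5560X_EI, UNDERLAY_VLAN))
print(punt_target(S6800, UNDERLAY_VLAN))
print(punt_target(S5560X_EI, UNDERLAY_VLAN, inband_vlans=frozenset({UNDERLAY_VLAN})))
```

In this model the same configuration sends the ARP reply to the controller on the S5560X-EI and to the CPU on the S6800, and marking VLAN 10 as in-band management restores delivery to the CPU on the S5560X-EI.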