SR66/SR66-X Series Routers: Experience Case of Service Interruption Caused by an Unreachable VRRP Public Address
I. Networking:
II. Problem description:
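The customer's headquarters originally used one SR6604 as the IPsec gateway at the network egress, plus a Juniper device as the NAT egress. The customer has now added a second SR6604 router, intending to run VRRP between it and the original SR6604 and to move the NAT configuration onto the SR6604. The headquarters establishes IPsec connections with the branch outlets through the China Telecom interface, so that the outlets and the headquarters can reach each other.
After the change was completed, on-site tests showed all services working normally. About half an hour later, however, the IPsec service was found to be interrupted. From the public network, the customer could not ping the China Telecom VRRP virtual address 100.0.0.100, nor the China Telecom real address 100.0.0.1 of SR6604-1, whereas pings to the China Telecom real address of SR6604-2 and to the China Unicom real address of SR6604-1 succeeded. The customer logged in to the device through the China Unicom address and ran shutdown/undo shutdown on the China Telecom interface, after which service recovered. The problem occurred four times that day, and the customer finally rolled the service back to the Juniper device.
Afterwards, the diagnostic information and logfile information of both devices were collected. Analysis of the operation records and the logfile identified the cause.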
III. Process analysis:
In the logfile, the shutdown/undo shutdown records are as follows:
%@50230%Oct 18 23:20:02:415 2014 SR6604-2 SHELL/6/SHELL_CMD: -Task=vt0-IPAddr=10.1.3.100-User=admin; Command is shutdown
%@50244%Oct 18 23:20:04:832 2014 SR6604-2 SHELL/6/SHELL_CMD: -Task=vt0-IPAddr=10.1.3.100-User=admin; Command is undo shutdown
%@50696%Oct 18 23:48:55:099 2014 SR6604-2 SHELL/6/SHELL_CMD: -Task=vt0-IPAddr=10.1.3.100-User=admin; Command is shutdown
%@50709%Oct 18 23:48:59:021 2014 SR6604-2 SHELL/6/SHELL_CMD: -Task=vt0-IPAddr=10.1.3.100-User=admin; Command is undo shutdown
%@50967%Oct 19 00:10:34:302 2014 SR6604-2 SHELL/6/SHELL_CMD: -Task=vt0-IPAddr=10.1.3.100-User=admin; Command is shutdown
%@50972%Oct 19 00:10:37:161 2014 SR6604-2 SHELL/6/SHELL_CMD: -Task=vt0-IPAddr=10.1.3.100-User=admin; Command is undo shutdown
%@51465%Oct 19 00:54:29:030 2014 SR6604-2 SHELL/6/SHELL_CMD: -Task=vt0-IPAddr=115.234.19.178-User=admin; Command is shutdown
%@51470%Oct 19 00:54:30:850 2014 SR6604-2 SHELL/6/SHELL_CMD: -Task=vt0-IPAddr=115.234.19.178-User=admin; Command is undo shutdown
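Checking for anomalies around the times of the shutdown/undo shutdown operations turns up four records of the default route being deleted: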
%@50164%Oct 18 23:15:08:613 2014 SR6604-2 RM/3/RMLOG: The default route has been changed or deleted, protocol is Static, nexthop address is 100.0.0.101, output interface is GigabitEthernet3/1/7
%@50605%Oct 18 23:40:32:944 2014 SR6604-2 RM/3/RMLOG: The default route has been changed or deleted, protocol is Static, nexthop address is 100.0.0.101, output interface is GigabitEthernet3/1/7
%@50950%Oct 19 00:09:16:225 2014 SR6604-2 RM/3/RMLOG: The default route has been changed or deleted, protocol is Static, nexthop address is 100.0.0.101, output interface is GigabitEthernet3/1/7
%@51446%Oct 19 00:50:58:856 2014 SR6604-2 RM/3/RMLOG: The default route has been changed or deleted, protocol is Static, nexthop address is 100.0.0.101, output interface is GigabitEthernet3/1/7
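The correlation between the two sets of records can also be checked mechanically. The following is a minimal Python sketch, not a tool used in the original analysis: it parses the log lines quoted above and pairs each default-route deletion with the next manual shutdown. The names `LOG_EXCERPT`, `parse_line` and `correlate` are hypothetical, and the regular expression assumes the line layout seen in this excerpt only.

```python
import re
from datetime import datetime

# Log lines copied from the excerpts above (one "undo shutdown" kept to show filtering).
LOG_EXCERPT = """\
%@50164%Oct 18 23:15:08:613 2014 SR6604-2 RM/3/RMLOG: The default route has been changed or deleted, protocol is Static, nexthop address is 100.0.0.101, output interface is GigabitEthernet3/1/7
%@50230%Oct 18 23:20:02:415 2014 SR6604-2 SHELL/6/SHELL_CMD: -Task=vt0-IPAddr=10.1.3.100-User=admin; Command is shutdown
%@50244%Oct 18 23:20:04:832 2014 SR6604-2 SHELL/6/SHELL_CMD: -Task=vt0-IPAddr=10.1.3.100-User=admin; Command is undo shutdown
%@50605%Oct 18 23:40:32:944 2014 SR6604-2 RM/3/RMLOG: The default route has been changed or deleted, protocol is Static, nexthop address is 100.0.0.101, output interface is GigabitEthernet3/1/7
%@50696%Oct 18 23:48:55:099 2014 SR6604-2 SHELL/6/SHELL_CMD: -Task=vt0-IPAddr=10.1.3.100-User=admin; Command is shutdown
%@50950%Oct 19 00:09:16:225 2014 SR6604-2 RM/3/RMLOG: The default route has been changed or deleted, protocol is Static, nexthop address is 100.0.0.101, output interface is GigabitEthernet3/1/7
%@50967%Oct 19 00:10:34:302 2014 SR6604-2 SHELL/6/SHELL_CMD: -Task=vt0-IPAddr=10.1.3.100-User=admin; Command is shutdown
%@51446%Oct 19 00:50:58:856 2014 SR6604-2 RM/3/RMLOG: The default route has been changed or deleted, protocol is Static, nexthop address is 100.0.0.101, output interface is GigabitEthernet3/1/7
%@51465%Oct 19 00:54:29:030 2014 SR6604-2 SHELL/6/SHELL_CMD: -Task=vt0-IPAddr=115.234.19.178-User=admin; Command is shutdown
"""

# Assumed layout: %@<seq>%<Mon DD HH:MM:SS>:<ms> <year> <host> <module>: <message>
LINE_RE = re.compile(
    r"^%@\d+%(?P<stamp>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2}):(?P<ms>\d{3}) (?P<year>\d{4}) "
    r"(?P<host>\S+) (?P<module>[^:]+): (?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, module, message) for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(f"{m['stamp']} {m['year']}", "%b %d %H:%M:%S %Y")
    ts = ts.replace(microsecond=int(m["ms"]) * 1000)
    return ts, m["module"], m["msg"]


def correlate(log_text):
    """Pair each default-route deletion with the next 'shutdown' command that follows it."""
    events = sorted(e for e in map(parse_line, log_text.splitlines()) if e)
    pairs = []
    pending = None  # time of the latest deletion not yet followed by a shutdown
    for ts, module, msg in events:
        if module.startswith("RM/") and "default route has been changed or deleted" in msg:
            pending = ts
        elif msg.endswith("Command is shutdown") and pending is not None:
            pairs.append((pending, ts, (ts - pending).total_seconds()))
            pending = None
    return pairs


if __name__ == "__main__":
    for deleted_at, shut_at, gap in correlate(LOG_EXCERPT):
        print(f"route deleted {deleted_at}  ->  shutdown {shut_at}  (gap {gap:.3f} s)")
```

On this excerpt, every manual shutdown is preceded by a default-route deletion, by between about one and nine minutes, which is consistent with the deletions being the trigger of each outage rather than a side effect of the interface being bounced.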
The default route configuration is:
ip route-static 0.0.0.0 0.0.0.0 100.0.0.101 track 1
This default route is linked to track 1; the NQA entry behind it is configured as follows:
nqa entry dianxin 1
 type icmp-echo
  destination ip 100.0.0.101
  frequency 10000
  next-hop 100.0.0.101
  probe count 5
  probe timeout 1000
  reaction 1 checked-element probe-fail threshold-type consecutive 5 action-type trigger-only
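To get a feel for how sensitive this reaction is, the consecutive-failure rule can be modelled in a few lines of Python. This is only a simplified sketch under stated assumptions, not the device's actual implementation: it assumes every timed-out probe counts as one failure, that any successful probe resets the counter, and that failed probes run back to back, each waiting one full probe timeout. The exact semantics depend on the Comware version. The function names `first_trigger` and `min_outage_ms` are hypothetical.

```python
def first_trigger(probe_results, threshold=5):
    """Return the 1-based index of the probe at which the reaction fires, or None.

    probe_results is an iterable of booleans: True for a reply, False for a timeout.
    threshold mirrors 'threshold-type consecutive 5' in the entry above.
    """
    consecutive = 0
    for index, ok in enumerate(probe_results, start=1):
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= threshold:
            return index
    return None


def min_outage_ms(threshold=5, probe_timeout_ms=1000):
    """Shortest loss burst that can fire the reaction.

    Assumption: failed probes run back to back, each waiting one full timeout.
    """
    return threshold * probe_timeout_ms


if __name__ == "__main__":
    # Three replies, then a burst of five timeouts: the reaction fires on the 8th probe.
    print(first_trigger([True] * 3 + [False] * 5))
    # A single reply in the middle of the burst resets the counter: no trigger.
    print(first_trigger([False] * 4 + [True] + [False] * 4))
    print(min_outage_ms())
```

Under these assumptions, a loss burst of roughly 5 x 1000 ms towards 100.0.0.101 would be enough to turn the track entry negative and withdraw the default route, so even brief degradation on the China Telecom link could have this effect.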
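It follows that the default route was deleted because the NQA probe failed: for some reason (probably link quality) the probes to 100.0.0.101 failed, and the route was withdrawn. The next-best default route, through China Unicom, then took effect: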
ip route-static 0.0.0.0 0.0.0.0 200.0.0.201 track 2 preference 80
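At this point, when the China Telecom interface address of SR6604-1 or the VRRP virtual address is pinged from the public network, the reply packets are looked up in the routing table and forwarded out of the China Unicom interface, so the ping fails. Because the China Unicom default route is now in effect, the China Unicom address is reachable from the public network. On SR6604-2 the NQA probe did not fail, so its default route is still the China Telecom one and its China Telecom interface real address remains reachable.
After the customer rolled back, the link quality improved and the default route was no longer deleted, so the problem did not recur.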
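The route switchover described above can be illustrated with a small Python model. It is a sketch under simplifying assumptions, not router code: track states are reduced to a boolean (positive or not), only the two default routes quoted in this case are modelled, and the China Telecom route is given preference 60, the Comware default for static routes, because its configuration line sets no explicit preference. `StaticRoute` and `active_default` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaticRoute:
    next_hop: str
    track: int
    preference: int = 60  # assumed default preference for a static route


# The two default routes from the configuration quoted in this case.
ROUTES = [
    StaticRoute("100.0.0.101", track=1),                 # China Telecom
    StaticRoute("200.0.0.201", track=2, preference=80),  # China Unicom
]


def active_default(routes, track_state):
    """Pick the usable default route with the lowest preference value.

    track_state maps a track entry number to True (positive) or False (negative).
    A route whose track entry is not positive is treated as withdrawn.
    """
    usable = [r for r in routes if track_state.get(r.track, False)]
    return min(usable, key=lambda r: r.preference) if usable else None


if __name__ == "__main__":
    # Normal operation: both probes succeed, replies leave via China Telecom.
    print(active_default(ROUTES, {1: True, 2: True}).next_hop)
    # NQA towards 100.0.0.101 fails: replies to pings that arrived on the
    # China Telecom interface now leave via the China Unicom next hop.
    print(active_default(ROUTES, {1: False, 2: True}).next_hop)
```

The second case is the asymmetric situation in this incident: traffic still arrives on the China Telecom interface, but the only active default route points at the China Unicom next hop.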
IV. Solution:
Advise the customer to troubleshoot the China Telecom egress link or the switch.