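The site network is planned as shown above. Within Area 0, the production center and the intra-city center each have two C6509 core switches, interconnected in a square topology. The devices of each service zone in a center are dual-homed to that center's two cores. Two C7609 routers act as ABRs for service traffic to and from Area N.
Most devices in this network are Cisco. Recently the customer replaced one C6509 in the intra-city center, C6509-3, with an S12510-X, and the removed C6509-3 was attached downstream of C6509-4 in the intra-city center. The topology diagram shows the network after the replacement. Interconnect interface IP addresses and costs are as shown in the diagram.
Some time after the replacement, the site found that on C6509-1 in the production center (the device marked in red in the diagram) the metric to 10.130.225.32/30 (a subnet on ABR C7609-2 in Area 1, marked with a red box in the diagram) was sometimes 52 and sometimes 53.
Ignoring route selection, C6509-1 in the production center has four paths to 10.130.225.32/30. One is the red path in the diagram with cost 52, via the intra-city S12510-X and C7609-2. The other three are purple paths with cost 53, which go via (1) the production-center C7609-1 and C2811; (2) the production-center C6509-2, the intra-city C6509-4, and C7609-2; (3) the intra-city S12510-X, C6509-4, and C7609-2, as shown in the diagram.
According to the route selection rules, C6509-1 prefers the path to 10.130.225.32/30 with the smaller cost, that is, the red path with cost 52.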
DC-CO-CS-C6509-01#show ip rou 10.130.225.32
Routing entry for 10.130.225.32/30
Known via "ospf 100", distance 110, metric 52, type inter area
Last update from 10.0.255.246 on GigabitEthernet2/48, 00:00:04 ago
Routing Descriptor Blocks:
* 10.0.255.246, from 10.8.240.4, 00:00:04 ago, via GigabitEthernet2/48
Route metric is 52, traffic share count is 1
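The selection rule can be illustrated with a minimal Python sketch. It only models "lowest cost wins, ties become equal-cost paths" using the four path costs listed above; the path labels and the `best_paths` helper are names invented for this illustration, and the second call shows a hypothetical case in which the red path is temporarily unusable.

```python
# Path costs from C6509-1 to 10.130.225.32/30 as listed in the text.
# The path labels are informal names made up for this sketch.
PATHS = {
    "red (S12510-X, C7609-2)": 52,
    "purple 1 (C7609-1, C2811)": 53,
    "purple 2 (C6509-2, C6509-4, C7609-2)": 53,
    "purple 3 (S12510-X, C6509-4, C7609-2)": 53,
}


def best_paths(candidates):
    """Return (lowest cost, sorted labels of all paths with that cost)."""
    lowest = min(candidates.values())
    winners = sorted(label for label, cost in candidates.items() if cost == lowest)
    return lowest, winners


# All four candidates usable: only the cost-52 path is selected.
print(best_paths(PATHS))

# Hypothetical case: the red path is temporarily unusable,
# so the three cost-53 paths become equal-cost routes.
without_red = {k: v for k, v in PATHS.items() if not k.startswith("red")}
print(best_paths(without_red))
```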
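During the fault, the route from C6509-1 to 10.130.225.32/30 kept flapping: at one moment the red path with cost 52 was selected, at the next the route became three equal-cost paths with cost 53: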
DC-CO-CS-C6509-01#show ip rou 10.130.225.32
Routing entry for 10.130.225.32/30
Known via "ospf 100", distance 110, metric 52, type inter area
Last update from 10.0.255.246 on GigabitEthernet2/48, 00:00:04 ago
Routing Descriptor Blocks:
* 10.0.255.246, from 10.8.240.4, 00:00:04 ago, via GigabitEthernet2/48
Route metric is 52, traffic share count is 1
DC-CO-CS-C6509-01#show ip rou 10.130.225.32
Routing entry for 10.130.225.32/30
Known via "ospf 100", distance 110, metric 53, type inter area
Last update from 10.0.255.246 on GigabitEthernet2/48, 00:00:02 ago
Routing Descriptor Blocks:
* 10.0.255.254, from 10.8.240.4, 00:00:02 ago, via Vlan999
Route metric is 53, traffic share count is 1
10.0.255.246, from 10.8.240.4, 00:00:02 ago, via GigabitEthernet2/48
Route metric is 53, traffic share count is 1
10.0.255.138, from 10.0.240.4, 00:00:02 ago, via Vlan912
Route metric is 53, traffic share count is 1
DC-CO-CS-C6509-01#show ip rou 10.130.225.32
Routing entry for 10.130.225.32/30
Known via "ospf 100", distance 110, metric 52, type inter area
Last update from 10.0.255.246 on GigabitEthernet2/48, 00:00:05 ago
Routing Descriptor Blocks:
* 10.0.255.246, from 10.8.240.4, 00:00:04 ago, via GigabitEthernet2/48
Route metric is 52, traffic share count is 1
DC-CO-CS-C6509-01#show ip rou 10.130.225.32
Routing entry for 10.130.225.32/30
Known via "ospf 100", distance 110, metric 53, type inter area
Last update from 10.0.255.246 on GigabitEthernet2/48, 00:00:02 ago
Routing Descriptor Blocks:
* 10.0.255.254, from 10.8.240.4, 00:00:02 ago, via Vlan999
Route metric is 53, traffic share count is 1
10.0.255.246, from 10.8.240.4, 00:00:02 ago, via GigabitEthernet2/48
Route metric is 53, traffic share count is 1
10.0.255.138, from 10.0.240.4, 00:00:02 ago, via Vlan912
Route metric is 53, traffic share count is 1
…..
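1. Because purple path 3 in the diagram also passes through S12510-X and has cost 53, the first suspicion was link flapping on 10.8.255.168/30, the interconnect segment between S12510-X and C7609-2, causing the route to flap. During the fault the interface state on S12510-X was stable, with no up/down events and no error packets, and on S12510-X the next hop to 10.130.225.32/30 was always C7609-2 with cost 51.
10.130.225.32/30 51 Inter 10.8.255.170 10.8.240.4 0.0.0.0
Link flapping between S12510-X and C7609-2 was therefore ruled out.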
2. Before and after the fault, OSPF information was checked repeatedly on both the production-center C6509-1 and S12510-X.
Cisco:
show ip route 10.130.225.32
show ip ospf database
show ip ospf border-routers
show ip ospf neighbor
show ip ospf interface
S12510-X:
display ospf routing 10.130.225.32
display ospf lsdb
display ospf abr-asbr
display ospf peer
display ospf interface
C6509-1:
C6509-01#Show ip ospf database
OSPF Router with ID (10.0.240.1) (Process ID 100)
Router Link States (Area 0)
Link ID ADV Router Age Seq# Checksum Link count
10.0.240.1 10.0.240.1 1916 0x8000AD9E 0x00137B 7
10.0.240.2 10.0.240.2 623 0x8000C7A6 0x004CEF 6
…..
Net Link States (Area 0)
10.8.255.146 10.8.240.9 249 0x800024EC 0x007801
10.8.255.150 10.8.240.10 1444 0x80000C7A 0x00D03F
10.8.255.169 10.8.240.20 3600 0x800AECC0 0x005B1A
10.8.255.173 10.8.240.2 1070 0x800004D3 0x00494C
10.8.255.177 10.8.240.20 3600 0x800D97F2 0x0090FD
10.8.255.181 10.8.240.2 1070 0x8000106F 0x008F57
10.8.255.250 10.8.240.20 1673 0x8000163D 0x00CDBC
……
C6509-01#Show ip ospf database
OSPF Router with ID (10.0.240.1) (Process ID 100)
Router Link States (Area 0)
Link ID ADV Router Age Seq# Checksum Link count
10.0.240.1 10.0.240.1 1922 0x8000AD9E 0x00137B 7
10.0.240.2 10.0.240.2 630 0x8000C7A6 0x004CEF 6
……
Net Link States (Area 0)
Link ID ADV Router Age Seq# Checksum
10.8.255.146 10.8.240.9 255 0x800024EC 0x007801
10.8.255.150 10.8.240.10 1451 0x80000C7A 0x00D03F
10.8.255.169 10.8.240.20 10 0x800AECC1 0x00591B
10.8.255.173 10.8.240.2 1077 0x800004D3 0x00494C
10.8.255.177 10.8.240.20 11 0x800D97F3 0x008EFE
10.8.255.181 10.8.240.2 1077 0x8000106F 0x008F57
10.8.255.250 10.8.240.20 1680 0x8000163D 0x00CDBC
……
C6509-01#Show ip ospf database
OSPF Router with ID (10.0.240.1) (Process ID 100)
Router Link States (Area 0)
Link ID ADV Router Age Seq# Checksum Link count
10.0.240.1 10.0.240.1 1936 0x8000AD9E 0x00137B 7
10.0.240.2 10.0.240.2 643 0x8000C7A6 0x004CEF 6
……
Net Link States (Area 0)
10.8.255.146 10.8.240.9 269 0x800024EC 0x007801
10.8.255.150 10.8.240.10 1465 0x80000C7A 0x00D03F
10.8.255.169 10.8.240.20 3601 0x800AECC2 0x00571C
10.8.255.173 10.8.240.2 1091 0x800004D3 0x00494C
10.8.255.177 10.8.240.20 3601 0x800D97F4 0x008CFF
10.8.255.181 10.8.240.2 1091 0x8000106F 0x008F57
10.8.255.250 10.8.240.20 1694 0x8000163D 0x00CDBC
……
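Running show ip ospf database repeatedly on C6509-1 before and after the fault showed that the Age of the Network LSAs 10.8.255.169 and 10.8.255.177 kept switching between 3600 and small values, the ADV Router was always 10.8.240.20, and the Seq# field increased rapidly. (In a stable topology the sequence number should increase by 1 every half hour, at each LSA refresh. Compare other LSAs such as 10.8.255.146, whose Age grows steadily while Seq# stays unchanged.)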
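Comparing snapshots by eye is error-prone, so the check can be scripted. The following is a minimal Python sketch, assuming the five-column Net Link States layout shown above (Link ID, ADV Router, Age, Seq#, Checksum). The function names are made up for this illustration, and the two snapshots are a subset of the lines captured above.

```python
import re

# Matches a Net Link States row: Link ID, ADV Router, Age, Seq#, Checksum.
# Router LSA rows carry an extra "Link count" column and do not match.
LSA_ROW = re.compile(
    r"^\s*(\d+\.\d+\.\d+\.\d+)\s+(\d+\.\d+\.\d+\.\d+)\s+(\d+)"
    r"\s+0x([0-9A-Fa-f]+)\s+0x[0-9A-Fa-f]+\s*$"
)

SNAPSHOT_1 = """
10.8.255.146 10.8.240.9 249 0x800024EC 0x007801
10.8.255.169 10.8.240.20 3600 0x800AECC0 0x005B1A
10.8.255.177 10.8.240.20 3600 0x800D97F2 0x0090FD
"""

SNAPSHOT_2 = """
10.8.255.146 10.8.240.9 255 0x800024EC 0x007801
10.8.255.169 10.8.240.20 10 0x800AECC1 0x00591B
10.8.255.177 10.8.240.20 11 0x800D97F3 0x008EFE
"""


def parse_net_lsas(text):
    """Map (link_id, adv_router) -> (age, sequence number)."""
    lsas = {}
    for line in text.splitlines():
        match = LSA_ROW.match(line)
        if match:
            link_id, adv_router, age, seq = match.groups()
            lsas[(link_id, adv_router)] = (int(age), int(seq, 16))
    return lsas


def changed_sequences(before, after):
    """Return LSAs present in both snapshots whose sequence number changed."""
    common = before.keys() & after.keys()
    return sorted(key for key in common if before[key][1] != after[key][1])


suspects = changed_sequences(parse_net_lsas(SNAPSHOT_1), parse_net_lsas(SNAPSHOT_2))
print(suspects)
```

Snapshots taken only seconds apart should normally show no sequence change at all, so any LSA reported here deserves a closer look.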
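Network LSAs are originated by the DR. The display ospf interface output shows that S12510-X is the DR for the 10.8.255.168/30 and 10.8.255.176/30 segments, with interface addresses 10.8.255.169 and 10.8.255.177 respectively. The Router ID of S12510-X is 10.8.240.20.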
<S12510-X>display ospf interface
OSPF Process 100 with Router ID 10.8.240.20
Interfaces
Area: 0.0.0.0
IP Address Type State Cost Pri DR BDR
10.0.255.246 Broadcast DROther 1 0 10.0.255.245 0.0.0.0
10.8.255.250 Broadcast DR 50 1 10.8.255.250 10.8.255.249
10.8.240.20 PTP Loopback 0 1 0.0.0.0 0.0.0.0
10.8.255.145 Broadcast BDR 1 1 10.8.255.146 10.8.255.145
10.8.255.177 Broadcast DR 50 1 10.8.255.177 10.8.255.178
10.8.255.169 Broadcast DR 50 1 10.8.255.169 10.8.255.170
10.8.255.201 Broadcast DR 1 1 10.8.255.201 0.0.0.0
<S12510-X>display ospf lsdb
OSPF Process 100 with Router ID 10.8.240.20
Link State Database
Area: 0.0.0.0
Type LinkState ID AdvRouter Age Len Sequence Metric
Router 10.0.240.10 10.0.240.10 1217 60 8000D5AA 0
Router 10.0.240.4 10.0.240.4 735 60 80009195 0
……
Network 10.0.255.250 10.8.240.2 1784 32 800003F1 0
Network 10.0.255.245 10.0.240.1 1882 32 800001F7 0
Network 10.8.255.173 10.8.240.2 1035 32 800004D3 0
Network 10.8.255.169 10.8.240.20 3 32 800AECBD 0
Network 10.8.255.181 10.8.240.2 1035 32 8000106F 0
Network 10.8.255.177 10.8.240.20 3 32 800D97EF 0
……
<S12510-X>display ospf lsdb
OSPF Process 100 with Router ID 10.8.240.20
Link State Database
Area: 0.0.0.0
Type LinkState ID AdvRouter Age Len Sequence Metric
Router 10.0.240.10 10.0.240.10 1271 60 8000D5AA 0
Router 10.0.240.4 10.0.240.4 789 60 80009195 0
……
Network 10.0.255.250 10.8.240.2 1838 32 800003F1 0
Network 10.0.255.245 10.0.240.1 1936 32 800001F7 0
Network 10.8.255.173 10.8.240.2 1089 32 800004D3 0
Network 10.8.255.169 10.8.240.20 0 32 800AECC3 0
Network 10.8.255.181 10.8.240.2 1089 32 8000106F 0
Network 10.8.255.177 10.8.240.20 9 32 800D97F4 0
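Running display ospf lsdb repeatedly on S12510-X showed that the Age of the Network LSA 10.8.255.169 was always small, the AdvRouter was always 10.8.240.20, and the Sequence field increased rapidly.
This shows that Network LSA 10.8.255.169 is flapping: it is frequently aged out and re-originated, and the ADV Router is always 10.8.240.20. The suspicion was that some device in the network has an address identical to the DR address 10.8.255.169, causing the Network LSA to be wrongly flushed and then regenerated.
Network LSA 10.8.255.177 flaps as well, but it is unrelated to the observed route calculation error for 10.130.225.32/30, so this analysis uses 10.8.255.169 as the example.
What flaps in the network is a Network LSA, which the DR is responsible for originating and flushing. If another device in the same area has an address identical to the DR's address, the Network LSA may be wrongly flushed or re-originated. There are two kinds of conflict:
1) Conflict between a DR and a non-DR: two devices in the same area have the same IP address (say 192.168.0.1). One device (RTA) is the DR of its segment and can originate the Network LSA; the other (RTC) is not the DR of that segment. RTB is an intermediate device.
RTA (DR 192.168.0.1)----------RTB---------RTC (non-DR 192.168.0.1)
The Network LSA flapping proceeds roughly as follows:
Only the DR can originate a Network LSA. RTA originates the Network LSA with Link State ID 192.168.0.1 and floods it within the area. RTB receives the LSA, processes it, and advertises it to its neighbor RTC.
On receiving the LSA, RTC finds that the Link State ID matches one of its own interface IP addresses and treats the LSA as self-originated. Because the Advertising Router differs from its own Router ID, RTC raises the LS age of the received LSA to MaxAge, leaves everything else unchanged (including the Advertising Router and the sequence number), and refloods it, flushing it from the routing domain.
Definition of a self-originated LSA (RFC 2328, Section 13.4): 1) the LSA's Advertising Router equals the router's own Router ID, or 2) the LSA is a network-LSA whose Link State ID (for a Type 2 LSA, the DR's IP address) equals one of the router's own interface IP addresses.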
A self-originated
LSA is detected when either 1) the LSA's Advertising Router is
equal to the router's own Router ID or 2) the LSA is a network-
LSA and its Link State ID is equal to one of the router's own IP
interface addresses.
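The quoted rule translates directly into a predicate. This is a minimal Python sketch rather than any vendor's implementation; the function name, the parameters, and the Router IDs 1.1.1.1 and 3.3.3.3 used for RTA and RTC are assumptions made for the illustration.

```python
NETWORK_LSA = 2  # LS type of a network-LSA


def is_self_originated(lsa_type, link_state_id, adv_router,
                       own_router_id, own_interface_ips):
    """Apply the two conditions quoted above from RFC 2328."""
    if adv_router == own_router_id:
        return True  # condition 1: we are the Advertising Router
    # condition 2: network-LSA whose Link State ID is one of our interface IPs
    return lsa_type == NETWORK_LSA and link_state_id in own_interface_ips


# RTC (hypothetical Router ID 3.3.3.3) owns 192.168.0.1 and receives the
# network-LSA that RTA (hypothetical Router ID 1.1.1.1) originated.
print(is_self_originated(NETWORK_LSA, "192.168.0.1", "1.1.1.1",
                         "3.3.3.3", {"192.168.0.1"}))
```

Condition 2 is what lets a router with a duplicate interface address claim an LSA that a different router originated.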
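Handling of a received self-originated LSA
When an OSPF device receives a self-originated LSA that is newer than its current instance, it re-originates the LSA with the sequence number advanced one past the received one. There are exceptions, for example a Type 2 LSA whose Link State ID matches one of the device's own interface IP addresses but whose Advertising Router is not the device itself. In that case the device sets the LSA's age to 3600 (MaxAge) and floods it in the routing domain.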
Receiving self-originated LSAs
However, if the received self-originated LSA is newer than the
last instance that the router actually originated, the router
must take special action. The reception of such an LSA
indicates that there are LSAs in the routing domain that were
originated by the router before the last time it was restarted.
In most cases, the router must then advance the LSA's LS
sequence number one past the received LS sequence number, and
originate a new instance of the LSA.
It may be the case the router no longer wishes to originate the
received LSA. Possible examples include: 1) the LSA is a
summary-LSA or AS-external-LSA and the router no longer has an
(advertisable) route to the destination, 2) the LSA is a
network-LSA but the router is no longer Designated Router for
the network or 3) the LSA is a network-LSA whose Link State ID
is one of the router's own IP interface addresses but whose
Advertising Router is not equal to the router's own Router ID
(this latter case should be rare, and it indicates that the
router's Router ID has changed since originating the LSA). In
all these cases, instead of updating the LSA, the LSA should be
flushed from the routing domain by incrementing the received
LSA's LS age to MaxAge and reflooding.
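The decision described in the excerpt can be sketched as a small function. It is a simplified illustration in Python, not a full OSPF implementation: it assumes the LSA has already been classified as self-originated and newer than the local copy, and the function name, parameter names, and the example Router IDs are hypothetical.

```python
NETWORK_LSA = 2  # LS type of a network-LSA


def action_for_newer_self_originated(lsa_type, link_state_id, adv_router,
                                     own_router_id, own_interface_ips,
                                     still_wanted=True):
    """Return "flush" or "reoriginate" for a newer self-originated LSA.

    still_wanted=False stands for cases 1) and 2) of the excerpt: the
    router no longer wishes to originate this LSA.
    """
    foreign_network_lsa = (
        lsa_type == NETWORK_LSA
        and link_state_id in own_interface_ips
        and adv_router != own_router_id
    )  # case 3) of the excerpt
    if foreign_network_lsa or not still_wanted:
        return "flush"        # set LS age to MaxAge and reflood
    return "reoriginate"      # advance the sequence number, originate anew


# RTC owns 192.168.0.1 but did not originate the LSA: it flushes it.
print(action_for_newer_self_originated(
    NETWORK_LSA, "192.168.0.1", "1.1.1.1", "3.3.3.3", {"192.168.0.1"}))
# RTA receives its own LSA back as a newer (MaxAge) copy: it re-originates.
print(action_for_newer_self_originated(
    NETWORK_LSA, "192.168.0.1", "1.1.1.1", "1.1.1.1", {"192.168.0.1"}))
```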
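How to determine whether a received LSA is newer
All else being equal (same sequence number and checksum), the instance whose age is MaxAge is considered the newer LSA.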
Determining which LSA is newer
An LSA is identified by its LS type, Link State ID and
Advertising Router. For two instances of the same LSA, the LS
sequence number, LS age, and LS checksum fields are used to
determine which instance is more recent:
The LSA having the newer LS sequence number is more recent.
If both instances have the same LS sequence number, then:
If the two instances have different LS checksums, then the
instance having the larger LS checksum (when considered as a
16-bit unsigned integer) is considered more recent.
Else, if only one of the instances has its LS age field set
to MaxAge, the instance of age MaxAge is considered to be
more recent.
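The comparison can be written out as a short Python sketch. It follows the steps quoted above and adds the final tie-break of RFC 2328 Section 13.1 (ages differing by more than MaxAgeDiff), which the excerpt does not show. The tuple layout and function names are assumptions for this illustration; the sample values are taken from the C6509-1 database output earlier in this case.

```python
MAX_AGE = 3600      # seconds (RFC 2328 architectural constant)
MAX_AGE_DIFF = 900  # seconds (RFC 2328 architectural constant)


def signed32(seq):
    """Interpret a 32-bit LS sequence number as a signed integer."""
    return seq - (1 << 32) if seq & 0x80000000 else seq


def compare_lsa(a, b):
    """Compare two instances given as (seq, checksum, age) tuples.

    Returns 1 if a is more recent, -1 if b is more recent, 0 if identical.
    """
    seq_a, sum_a, age_a = a
    seq_b, sum_b, age_b = b
    if seq_a != seq_b:                              # newer sequence number wins
        return 1 if signed32(seq_a) > signed32(seq_b) else -1
    if sum_a != sum_b:                              # larger checksum wins
        return 1 if sum_a > sum_b else -1
    if (age_a == MAX_AGE) != (age_b == MAX_AGE):    # only one is at MaxAge
        return 1 if age_a == MAX_AGE else -1
    if abs(age_a - age_b) > MAX_AGE_DIFF:           # younger instance wins
        return 1 if age_a < age_b else -1
    return 0


stored = (0x800AECC0, 0x5B1A, 10)         # copy still held in a database
flushed = (0x800AECC0, 0x5B1A, 3600)      # same instance, aged to MaxAge
reoriginated = (0x800AECC1, 0x591B, 0)    # next instance from the DR

print(compare_lsa(flushed, stored))        # the MaxAge copy is newer
print(compare_lsa(reoriginated, flushed))  # the higher sequence number is newer
```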
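RTB receives the newer LSA from RTC, updates its database, and floods it on to RTA. RTA receives the aged LSA. Because the Advertising Router equals RTA's own Router ID, RTA treats this Network LSA as self-originated, and the LSA is newer than the copy in its database (with sequence number and checksum equal, the instance with LS age MaxAge is considered newer). RTA therefore originates a new instance of the Network LSA with the sequence number incremented by 1 and floods it in the area.
RTC ages out and flushes the LSA, and RTA re-originates it. This repeats, and the Network LSA flaps. If any route traverses that link, the route calculation goes wrong because the Network LSA is only intermittently present.
At this point, running display ospf lsdb repeatedly on RTC shows the Network LSA of the conflicting address 192.168.0.1 with Age always 3600, or occasionally missing altogether; AdvRouter is RTA; the Sequence field increases rapidly.
On the intermediate router RTB, running display ospf lsdb repeatedly shows the Age of the conflicting Network LSA switching between 3600 and small values; AdvRouter is RTA; the Sequence field increases rapidly.
On RTA, the DR of the conflicting segment, running display ospf lsdb repeatedly shows that the Age of the conflicting Network LSA does not grow naturally and always stays small; AdvRouter is RTA; the Sequence field increases rapidly.
In a DR versus non-DR conflict, the Link State ID of the flapping Network LSA gives the conflicting IP address and the AdvRouter identifies the originator. The device that conflicts with it, however, can only be found through the IP address plan or by searching the area; it is hard to find from the information OSPF itself carries.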
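The DR versus non-DR loop can be mimicked with a toy simulation. This Python sketch is deliberately simplified: it ignores timers, flooding delays, and acknowledgments, and only records what an observer such as RTB would see in each round. The function name and the event labels are invented for this illustration.

```python
MAX_AGE = 3600
INITIAL_SEQ = 0x80000001  # first sequence number used for a new LSA


def simulate_flap(rounds):
    """Return a list of (event, sequence number, age) seen on RTB."""
    events = []
    seq = INITIAL_SEQ
    for _ in range(rounds):
        # RTA (the DR) originates the network-LSA with a fresh age.
        events.append(("RTA originates", seq, 0))
        # RTC treats it as self-originated with a foreign Advertising
        # Router and refloods the same instance at MaxAge.
        events.append(("RTC floods MaxAge copy", seq, MAX_AGE))
        # RTA sees a newer copy of its own LSA and re-originates it
        # with the sequence number advanced by one.
        seq += 1
    return events


for event, seq, age in simulate_flap(3):
    print(f"{event:24} seq=0x{seq:08X} age={age}")
```

The age alternates between a small value and 3600 while the sequence number climbs by one per round, which is the same pattern the lsdb output shows on an intermediate router.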
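2) Conflict between two DRs: two devices in the same area have the same IP address (say 192.168.0.1), and each is the DR of its segment. Below, RTA and RTC are the two conflicting devices and RTB is an intermediate device.
RTA (DR 192.168.0.1)----------RTB---------RTC (DR 192.168.0.1)
The Network LSA flapping proceeds roughly as follows:
Only a DR can originate a Network LSA. RTA and RTC can both originate a Network LSA with Link State ID 192.168.0.1, with AdvRouter RTA and RTC respectively.
Take the Network LSA update sent by RTA as an example. RTB receives the LSA, processes it, and advertises it to its neighbor RTC.
On receiving the LSA, RTC finds that the Link State ID matches one of its own interface IP addresses and treats the LSA as self-originated. Because the Advertising Router differs from its own Router ID, RTC raises the LS age of the LSA to MaxAge, leaves everything else unchanged (including the AdvRouter and Sequence), and refloods it to flush the LSA.
RTB receives the newer LSA from RTC, updates its database, and floods it on to RTA. When RTA receives the aged LSA, the Advertising Router equals its own Router ID, so RTA treats the Network LSA as self-originated, and the received LSA is newer than its current instance (with sequence number and checksum equal, the instance with LS age MaxAge is considered newer). RTA therefore originates a new instance of the Network LSA with the sequence number incremented by 1 and floods it in the area.
Likewise, the Network LSA update sent by RTC reaches RTA, which ages it out and refloods it. When RTC receives the aged LSA, it originates a new instance of the Network LSA with the sequence number incremented by 1 and floods it in the area.
RTA and RTC each age out the Network LSA originated by the other, and each re-originates its own. This repeats, and the network flaps.
At this point, running display ospf lsdb repeatedly on the intermediate router RTB shows two Network LSAs with Link State ID 192.168.0.1, with AdvRouter RTA and RTC respectively; the Age of both LSAs always stays small, and the Sequence field increases fairly quickly.
In a DR versus DR conflict, the shared Link State ID and the two AdvRouter values of these Network LSAs identify which interface IP address on which devices is in conflict.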
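On site, the Network LSAs were flapping but the AdvRouter was always the same. The suspicion was therefore that an address on some device (a non-DR) conflicted with the S12510-X DR addresses 10.8.255.169 and 10.8.255.177. The location of the conflicting device was hard to determine, so Area 0 had to be checked device by device.
Because the problem appeared right after a simple device replacement and the address plan contained no conflict, the customer suspected our device. As a first step, the packets sent to the CPU were printed on S12510-X to confirm that the packets aging out this LSA were not sent by S12510-X, which would rule out an S12510-X fault.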
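3. Print the OSPF protocol packets sent to the CPU on S12510-X
[H3C]probe
[H3C-probe]display rxtx iptype 59 //filter: print OSPF packets only (IP protocol 0x59 = 89)
[H3C-probe]display rxtx -c 1000 -s 200 pkt slot 0 //interface board
[H3C-probe]display rxtx -c 1000 -s 200 pkt slot 17 //main control board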
Debug RxTx packet is on!
[H3C-probe] quit
[H3C] quit
<H3C>t m
The current terminal is enabled to display logs.
<H3C>t d
The current terminal is enabled to display debugging logs.
<S12510-X>*Dec 29 12:35:34:098 2015 S12510-X DRVPLAT/7/RxTxDebug: -MDC=1-Slot=0;
From board 0: received packet from chip0,port41,reason=0x0,cos=30,sMod=0,sPort=41,len=138,Matched=11,time=0,src_vp=-1
*Dec 29 12:35:34:099 2015 S12508-X DRVPLAT/7/RxTxDebug: -MDC=1-Slot=0;
-----------------------------------------------------
0000 01 00 5e 00 00 05 d8 24 bd 91 07 40 81 00 0f ff
0010 08 00 45 c0 00 78 db 12 00 00 01 59 f3 5f 0a 00
0020 ff f5 e0 00 00 05 02 01 00 30 0a 00 f0 01 00 00
0030 00 00 00 00 00 02 00 00 01 10 56 62 89 33 ff ff
0040 ff fc 00 0a 12 01 00 00 00 28 0a 00 ff f5 00 00
0050 00 00 0a 08 f0 14 f5 7a e8 83 4b 82 a9 b7 66 d9
0060 97 8e 2b f6 af ee 00 00 00 09 00 01 00 04 00 00
0070 00 01 00 02 00 14 56 62 89 33 c4 e6 46 29 bd de
0080 d5 a0 36 2b 47 97 20 e0 ad ab
-----------------------------------------------------
*Dec 29 12:35:34:401 2015 S12510-X DRVPLAT/7/RxTxDebug: -MDC=1-Slot=0;
From board 0: transmit packet from chip0,port41,Priority=7,len=98
*Dec 29 12:35:34:401 2015 S12510-X DRVPLAT/7/RxTxDebug: -MDC=1-Slot=0;
-----------------------------------------------------
0000 01 00 5e 00 00 05 70 f9 6d 17 d5 15 08 00 45 c0
0010 00 54 53 72 00 00 01 59 7b 23 0a 00 ff f6 e0 00
0020 00 05 02 01 00 30 0a 08 f0 14 00 00 00 00 00 00
0030 00 02 00 00 01 10 01 75 86 23 ff ff ff fc 00 0a
0040 02 00 00 00 00 28 0a 00 ff f5 00 00 00 00 0a 00
0050 f0 01 cc e8 3c 0d f5 e1 32 bf 30 f0 9c af 1a 3b
0060 a8 e6
-----------------------------------------------------
……
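The hex dumps can be decoded offline to see who sent each packet and which OSPF packet type it is. Below is a minimal Python sketch, assuming the dump layout shown above (an offset column followed by hex bytes) and at most one 802.1Q tag. The function names are made up for this illustration, and `SAMPLE` holds the first four lines of the first dump. Flushed LSAs travel in Link State Update packets (type 4), so those are the packets to look for; the sample decoded here happens to be a Hello.

```python
import ipaddress
import struct

OSPF_TYPES = {
    1: "Hello",
    2: "Database Description",
    3: "Link State Request",
    4: "Link State Update",
    5: "Link State Acknowledgment",
}

# First four lines of the first dump above (enough for the OSPF header).
SAMPLE = """\
0000 01 00 5e 00 00 05 d8 24 bd 91 07 40 81 00 0f ff
0010 08 00 45 c0 00 78 db 12 00 00 01 59 f3 5f 0a 00
0020 ff f5 e0 00 00 05 02 01 00 30 0a 00 f0 01 00 00
0030 00 00 00 00 00 02 00 00 01 10 56 62 89 33 ff ff
"""


def parse_dump(text):
    """Convert 'OFFSET b0 b1 ...' lines into a bytes object."""
    data = bytearray()
    for line in text.strip().splitlines():
        data += bytes(int(token, 16) for token in line.split()[1:])
    return bytes(data)


def decode_ospf(frame):
    """Decode Ethernet (optional single VLAN tag), IPv4 and OSPF headers."""
    pos = 12                                   # EtherType follows two MACs
    (ethertype,) = struct.unpack_from("!H", frame, pos)
    if ethertype == 0x8100:                    # skip one 802.1Q tag
        pos += 4
        (ethertype,) = struct.unpack_from("!H", frame, pos)
    if ethertype != 0x0800:
        raise ValueError("not an IPv4 frame")
    ip = pos + 2
    header_len = (frame[ip] & 0x0F) * 4        # IHL is in 32-bit words
    if frame[ip + 9] != 89:                    # IP protocol 89 = OSPF
        raise ValueError("not an OSPF packet")
    ospf = ip + header_len
    version, ptype, _length = struct.unpack_from("!BBH", frame, ospf)
    return {
        "src": str(ipaddress.IPv4Address(frame[ip + 12:ip + 16])),
        "dst": str(ipaddress.IPv4Address(frame[ip + 16:ip + 20])),
        "version": version,
        "type": OSPF_TYPES.get(ptype, "unknown"),
        "router_id": str(ipaddress.IPv4Address(frame[ospf + 4:ospf + 8])),
        "area": str(ipaddress.IPv4Address(frame[ospf + 8:ospf + 12])),
    }


info = decode_ospf(parse_dump(SAMPLE))
print(info)
```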
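Packet capture analysis:
The packets sent to the CPU show that S12510-X did not send any LSA aging out this segment's Network LSA, but it frequently received updates aging out this LSA from other neighbors.
Updates aging out this LSA were received from 10.8.255.249, 10.8.255.170, 10.8.255.178, 10.8.255.245, and 10.8.255.146.
However, because the Network LSA is flooded back and forth across the area, it is impossible to tell who originated the aged LSA. The IP addresses of the Area 0 devices on site still had to be checked.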
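4. The OSPF interface addresses in Area 0 were checked on site and no conflict was found. The address of the S12510-X interface facing C7609-2 was then changed from 10.8.255.169 to 192.168.0.1, in order to show by contradiction that an IP conflict was causing the LSA flapping and, in turn, the route calculation error on C6509-1.
Test result: after the address 10.8.255.169 was changed, the Network LSA 10.8.255.169 disappeared from the network and the Network LSA for the newly configured 192.168.0.1/30 was normal. When 10.8.255.169 was configured back and made DR again, the Network LSA 10.8.255.169 no longer flapped either. The LSA 10.8.255.177 kept flapping throughout.
That the problem did not recur after the address was changed back to 10.8.255.169 was unexpected, but the earlier symptoms still pointed to an IP conflict, so the only option was to check the Area 0 devices one by one.
5. The packets printed earlier on S12510-X showed several devices sending the aged 10.8.255.169 LSA, so each device sending it had to be checked, along with its neighbors. When the check reached 10.8.255.249, a device holding the addresses 10.8.255.169 and 10.8.255.177 was found, although the interfaces carrying them were shut down. This device, C6509-3, is the one that had been replaced by S12510-X and is now attached below the intra-city C6509-4. C6509-3 still retained the addresses 10.8.255.169 and 10.8.255.177; the interfaces carrying them had been shut down and OSPF had been disabled on them.
In theory, since the 10.8.255.169 interface on this device is shut down, it should not age out the 10.8.255.169 Network LSA it receives from S12510-X.
Yet in all of Area 0, C6509-3 is the only device that holds the addresses 10.8.255.169 and 10.8.255.177. Checking the Network LSAs on C6509-3 showed that the sequence number of LSA 10.8.255.177 was still increasing and its age was 3600.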
Router#show ip ospf database
OSPF Router with ID (10.8.240.1) (Process ID 1)
……
Net Link States (Area 1)
Link ID ADV Router Age Seq# Checksum
10.8.255.177 10.8.240.20 3600 0x80000013 0x009FB3
10.8.255.169 10.8.240.20 173694 0x8000004 0x00C365
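An on-site packet capture confirmed that C6509-3 was still sending MaxAge LSAs for the Network LSA 10.8.255.177.
The problem was thus located: this Cisco device kept flushing the Network LSAs 10.8.255.169 and 10.8.255.177 by sending them with MaxAge, which made the Network LSAs flap and the related routes miscalculate.
Two questions remained unexplained:
1. C6509-3 and S12510-X do have conflicting addresses, but the corresponding Layer 3 interfaces on C6509-3 are down and OSPF is disabled on them. Why does C6509-3 still flush the LSA?
2. Why did 10.8.255.169 on S12510-X stop flapping after the address on S12510-X was changed to another address, changed back, and the interface became DR again?
For question 1, a lab reproduction showed that a similar problem can be reproduced on Cisco IOS 12.2. An interface IP on the Cisco device conflicts with the S12510-X DR address and LSA flapping has already started; if the Layer 3 interface holding the conflicting IP is then shut down on the Cisco device, the Network LSA continues to flap with some probability. Disabling OSPF on the corresponding network does not stop the flapping, and clearing the OSPF process does not recover it either.
The probability depends mainly on when the port is shut down: in lab tests the problem reproduced when the port was shut down just as the Network LSA was about to change on the Cisco device. The exact mechanism is unclear, but it can be confirmed to be a bug in Cisco's internal implementation.
For question 2, why 10.8.255.169 on S12510-X stopped flapping after the address was changed to another address, changed back, and the interface became DR again:
On-site observation showed that after the S12510-X interface was changed back to 10.8.255.169, the 10.8.255.169 LSA still existed on C6509-3 and was also in the aged state, except that its age far exceeded 3600 (the maximum LSA age is 3600).
After S12510-X was changed back to its original address, C6509-3 still responded to the 10.8.255.169 Network LSA it received by sending an aged 10.8.255.169 LSA with age 3600, but the sequence number of that LSA no longer changed, as shown in the figure below. Other devices in the network compare the received LSA with the one already in their database, find that the newly received LSA has a lower sequence number than the stored one, and discard it. So no network-wide LSA flapping results.
A guess: during the interval while the address was changed, another abnormal code path was triggered on the Cisco device and corrupted the LSA age. The sequence number increments were added to the age instead, and the sequence number itself stopped increasing.
The root cause is a software bug in the Cisco device, but the problem itself is a typical IP conflict, and the troubleshooting approach and diagnostic methods are worth reusing.
When further S12510-X devices go into service, it is recommended to take the Cisco device offline and delete the relevant interface and OSPF configuration before bringing it back online. Do not let devices in the live network run with the same address at the same time.
Also note that a change to the OSPF router ID requires a restart to take effect.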