Service interruption caused by LDP sessions going down on a power utility customer's SR6608 router
I. Network topology:
II. Problem description:
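At around 2 p.m. on November 19, 2014 (14:34 in the device logs), all LDP sessions from the provincial dispatch access network router YA to dispatch network B at a power utility went down and later recovered on their own, interrupting services.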
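III. Analysis:
Analysis of the diagnostic information and log files collected from the provincial dispatch access network router YA (SR6608) indicates that the loss of the LDP sessions from YA to dispatch network B is most likely directly related to OSPF flapping. The collected information is analyzed as follows: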
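On the dispatch network router A (SR88) side, the OSPF neighbor went from Full to Down because hello packets timed out (the DeadInterval timer expired). In addition, several member links of the MP bundle on router A show similar error packets, which also points to a link fault:
[Dispatch network router A]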
%@5814270%Nov 19 14:34:36:841 2014 SCSD-SR8808-2 LDP/5/LDP_SESSION_DOWN: Session(10.51.160.191:0, public instance)'s state changed to down.
%@5814807%Nov 19 14:35:07:651 2014 SCSD-SR8808-2 OSPF/6/OSPF_LAST_NBR_DOWN: OSPF 1 Last neighbor down event: Router ID: 10.51.160.191 Local address: 51.3.2.253 Remote address: 51.3.2.254 Reason: DeadInterval timer expired.
%@5814808%Nov 19 14:35:07:651 2014 SCSD-SR8808-2 OSPF/5/OSPF_NBR_CHG: OSPF 1 Neighbor 51.3.2.254(Mp-group8/1/7) from Full to Down.
Serial8/1/9/5:0 current state: UP
Line protocol current state: UP
Description: SD2-YA-12M-5
The Maximum Transmit Unit is 1500, Hold timer is 10(sec)
Derived from Cpos8/1/9 e1 5, Unframed mode, Baudrate is 2048000 bps
Internet protocol processing : disabled
Link layer protocol is PPP
LCP opened, MP opened
CRC type is 16-bit
Last 300 seconds input: 493 packets/sec, 67172 bytes/sec
Last 300 seconds output: 22 packets/sec, 3509 bytes/sec
Input(total): 1295326873 packets, 175818040253 bytes
Input(Bad): 367 Abort, 2015 FCS-Error, 16181 FIFO-Abort, 0 Giant, 0 Runt //several MP member links show similar error packets
Output(total): 65180908 packets, 9692811391 bytes
Output(Bad): 0 Abort
Peak value of input: 69288 bytes/sec, at 2014-11-19 14:42:21
Peak value of output: 5859 bytes/sec, at 2014-11-19 08:22:54
Cpos8/1/9 : VC4 1 TUG-3 1 TUG-2 2 TU12 1 Alarm SLM-P recover! Start Time : 2014-11-19 14:34:23:183! //the log history contains a large number of physical-layer alarms
%@5814059%Nov 19 14:34:29:698 2014 SCSD-SR8808-2 CPET/4/CPET_LOG_WARN: -Slot=8;
Cpos8/1/9 : VC4 1 TUG-3 1 TUG-2 2 TU12 2 Alarm RDI-P recover! Start Time : 2014-11-19 14:34:27:589!
%@5814060%Nov 19 14:34:29:698 2014 SCSD-SR8808-2 CPET/4/CPET_LOG_WARN: -Slot=8;
Cpos8/1/9 : VC4 1 TUG-3 1 TUG-2 2 TU12 2 Alarm RFI-P recover! Start Time : 2014-11-19 14:34:27:589!
%@5814061%Nov 19 14:34:29:699 2014 SCSD-SR8808-2 CPET/4/CPET_LOG_WARN: -Slot=8;
Cpos8/1/9 : VC4 1 TUG-3 1 TUG-2 2 TU12 2 Alarm SLM-P recover! Start Time : 2014-11-19 14:34:23:197!
%@5814062%Nov 19 14:34:29:699 2014 SCSD-SR8808-2 CPET/4/CPET_LOG_WARN: -Slot=8;
Cpos8/1/9 : VC4 1 TUG-3 1 TUG-2 2 TU12 2 Alarm STU-P recover! Start Time : 2014-11-19 14:34:27:589!
%@5814063%Nov 19 14:34:29:699 2014 SCSD-SR8808-2 CPET/4/CPET_LOG_WARN: -Slot=8;
Cpos8/1/9 : VC4 1 TUG-3 1 TUG-2 3 TU12 2 Alarm AIS-P recover! Start Time : 2014-11-19 14:34:27:594!
%@5814064%Nov 19 14:34:29:699 2014 SCSD-SR8808-2 CPET/4/CPET_LOG_WARN: -Slot=8;
Cpos8/1/9 : VC4 1 TUG-3 1 TUG-2 3 TU12 2 Alarm RDI-P recover! Start Time : 2014-11-19 14:34:27:594!
%@5814065%Nov 19 14:34:29:700 2014 SCSD-SR8808-2 CPET/4/CPET_LOG_WARN: -Slot=8;
Cpos8/1/9 : VC4 1 TUG-3 1 TUG-2 3 TU12 2 Alarm RFI-P recover! Start Time : 2014-11-19 14:34:27:594!
%@5814066%Nov 19 14:34:29:700 2014 SCSD-SR8808-2 CPET/4/CPET_LOG_WARN: -Slot=8;
Cpos8/1/9 : VC4 1 TUG-3 1 TUG-2 3 TU12 2 Alarm SLM-P recover! Start Time : 2014-11-19 14:34:27:594!
%@5814067%Nov 19 14:34:29:700 2014 SCSD-SR8808-2 CPET/4/CPET_LOG_WARN: -Slot=8;
Cpos8/1/9 : VC4 1 TUG-3 1 TUG-2 3 TU12 2 Alarm STU-P recover! Start Time : 2014-11-19 14:34:27:594!
%@5814068%Nov 19 14:34:29:700 2014 SCSD-SR8808-2 CPET/4/CPET_LOG_WARN: -Slot=8;
Cpos8/1/9 : VC4 1 TUG-3 1 TUG-2 2 TU12 3 Alarm STU-P recover! Start Time : 2014-11-19 14:34:27:607!
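The timestamps in the router A logs above can be lined up to see how closely the protocol events follow the physical-layer alarms. The following is a minimal sketch, not part of the original analysis. The three timestamps are copied from the logs; the 40-second dead interval is an assumption (a common OSPF default), since the configured value does not appear in the collected information.

```python
from datetime import datetime, timedelta

# The logs write time as HH:MM:SS:mmm; the last field is milliseconds.
FMT = "%Y-%m-%d %H:%M:%S:%f"

# Timestamps copied from the dispatch network router A logs.
events = {
    "first CPOS alarm timestamp": "2014-11-19 14:34:23:183",
    "LDP session down": "2014-11-19 14:34:36:841",
    "OSPF neighbor down": "2014-11-19 14:35:07:651",
}
times = {name: datetime.strptime(ts, FMT) for name, ts in events.items()}


def seconds_between(earlier, later):
    """Elapsed seconds from one named event to another."""
    return (times[later] - times[earlier]).total_seconds()


# ASSUMPTION: 40 s dead interval; the real configured value is not in the logs.
ASSUMED_DEAD_INTERVAL = timedelta(seconds=40)

# The dead timer restarts on every hello, so it expires one dead interval
# after the last hello that actually arrived.
estimated_last_hello = times["OSPF neighbor down"] - ASSUMED_DEAD_INTERVAL

if __name__ == "__main__":
    for name in ("LDP session down", "OSPF neighbor down"):
        delta = seconds_between("first CPOS alarm timestamp", name)
        print(f"{name}: {delta:.3f} s after the first CPOS alarm timestamp")
    print("estimated last hello received:", estimated_last_hello.time())
```

If the assumed dead interval holds, the last hello would have arrived within the same few seconds in which the CPOS alarms are timestamped, which is consistent with the link fault being the trigger.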
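On the provincial dispatch access network router YA, the OSPF and LDP states changed correspondingly, and two member links of the PPP MP bundle also show a large number of error packets:
[Provincial dispatch access network router YA]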
%Nov 19 14:34:36:948 2014 SCYA-SR6608 LDP/5/LDP_SESSION_DOWN: Session(10.51.160.2:0, public instance)'s state changed to down.
%Nov 19 14:35:09:134 2014 SCYA-SR6608 OSPF/5/OSPF_NBR_CHG: OSPF 1 Neighbor 51.3.2.253(Mp-group2/2/2) from Full to Init.
Serial2/2/6 current state: UP
Line protocol current state: UP
Description: YA-SD2-12M-5
The Maximum Transmit Unit is 1500, Hold timer is 10(sec)
Physical layer is E1-F, baudrate is 2048000 bps
fe1 unframed
Internet protocol processing : disabled
Link layer protocol is PPP
LCP opened, MP opened
Output queue : (Urgent queuing : Size/Length/Discards) 0/100/0
Output queue : (Protocol queuing : Size/Length/Discards) 0/500/0
Output queue : (FIFO queuing : Size/Length/Discards) 0/75/0
Last clearing of counters: 11:29:15 beijing Fri 08/22/2014
Input error packet detect: Initial
Last 300 seconds input rate 3615.38 bytes/sec, 28923 bits/sec, 22.66 packets/sec
Last 300 seconds output rate 67073.62 bytes/sec, 536589 bits/sec, 492.83 packets/sec
Input: 78813214 packets, 12012254407 bytes, 0 no buffers
0 broadcasts, 0 multicasts
30349 errors, 0 runts, 0 giants //two member links of the PPP MP bundle show a large number of error packets
7139 CRC, 0 align errors, 0 overruns
0 dribbles, 0 aborts, 23210 frame errors
Output:4890671910 packets, 662691731210 bytes
0 errors, 0 underruns, 0 collisions
0 deferred
Serial2/2/7 current state: UP
Line protocol current state: UP
Description: YA-SD2-12M-6
The Maximum Transmit Unit is 1500, Hold timer is 10(sec)
Physical layer is E1-F, baudrate is 2048000 bps
fe1 unframed
Internet protocol processing : disabled
Link layer protocol is PPP
LCP opened, MP opened
Output queue : (Urgent queuing : Size/Length/Discards) 0/100/0
Output queue : (Protocol queuing : Size/Length/Discards) 0/500/0
Output queue : (FIFO queuing : Size/Length/Discards) 0/75/0
Last clearing of counters: 11:29:15 beijing Fri 08/22/2014
Input error packet detect: Initial
Last 300 seconds input rate 64382.47 bytes/sec, 515059 bits/sec, 675.21 packets/sec
Last 300 seconds output rate 67079.15 bytes/sec, 536633 bits/sec, 492.83 packets/sec
Input: 5565360083 packets, 526773064473 bytes, 0 no buffers
0 broadcasts, 0 multicasts
37500 errors, 0 runts, 0 giants
14375 CRC, 0 align errors, 0 overruns
0 dribbles, 0 aborts, 23125 frame errors
Output:4890558191 packets, 662677212205 bytes
0 errors, 0 underruns, 0 collisions
0 deferred
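The input counters of the two MP member links on YA can be cross-checked with a little arithmetic. The sketch below is illustrative only: the numbers are copied from the output above, while the class and method names are hypothetical. It verifies that CRC errors plus frame errors account for the total input errors and computes the error ratio for each link.

```python
from dataclasses import dataclass


@dataclass
class InputCounters:
    """Input counters copied by hand from the interface output."""
    name: str
    packets: int
    errors: int
    crc: int
    frame_errors: int

    def error_ratio(self):
        # Fraction of received packets counted as errored.
        return self.errors / self.packets

    def breakdown_is_consistent(self):
        # CRC and frame errors are the only non-zero error categories shown.
        return self.crc + self.frame_errors == self.errors


members = [
    InputCounters("Serial2/2/6", 78813214, 30349, 7139, 23210),
    InputCounters("Serial2/2/7", 5565360083, 37500, 14375, 23125),
]

if __name__ == "__main__":
    for m in members:
        print(f"{m.name}: error ratio {m.error_ratio():.2e}, "
              f"breakdown consistent: {m.breakdown_is_consistent()}")
```

Both interfaces report "Last clearing of counters: 11:29:15 beijing Fri 08/22/2014", so these ratios are averages over roughly three months. They show that the links have taken errors, but they cannot by themselves show how many of the errors fall inside the incident window.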
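Taken together, the available information indicates that the protocol flapping between YA and provincial dispatch network B was caused by poor line quality.
On dispatch network router A, OSPF went from Full to Down because hello packets timed out, which means router A could no longer receive the OSPF hello packets sent by its peer. On the provincial dispatch access network router YA, OSPF went from Full to Init, which means YA could still receive hello packets from its peer, but those hellos no longer carried YA's router ID (because dispatch network router A was not receiving YA's hellos). In other words, packets were being lost at least in the YA-to-A direction.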
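The difference between the two state changes can be illustrated with a heavily simplified model of the OSPF neighbor state machine. This is only a sketch of the two transitions discussed above, not the routers' implementation: the function name and event names are hypothetical, and adjacency formation (ExStart, Exchange, Loading) is omitted. The router ID 10.51.160.191 is YA's, taken from the router A log.

```python
def next_state(state, event, my_router_id=None, neighbors_in_hello=()):
    """Return the new neighbor state for a simplified set of OSPF events."""
    if event == "inactivity_timer":
        # No hello arrived within the dead interval: the neighbor is
        # declared Down regardless of the previous state.
        return "Down"
    if event == "hello_received":
        if my_router_id in neighbors_in_hello:
            # The peer lists us, so communication is bidirectional.
            # Simplification: adjacency formation steps are not modeled.
            return "2-Way" if state in ("Down", "Init") else state
        # The peer is alive but does not list us: one-way reception only.
        return "Init"
    raise ValueError(f"unknown event: {event}")


if __name__ == "__main__":
    # Router A: YA's hellos stop arriving, so the dead timer expires.
    print("router A:", next_state("Full", "inactivity_timer"))
    # YA: hellos from A still arrive, but A no longer lists YA's router ID.
    print("router YA:", next_state("Full", "hello_received",
                                   my_router_id="10.51.160.191",
                                   neighbors_in_hello=[]))
```

In this model the router A side ends in Down and the YA side ends in Init, matching the `from Full to Down` and `from Full to Init` log entries on the two devices.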
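IV. Solution:
The routers' protocols flapped repeatedly because of the abnormal link. This affected the customer's services and also consumed device resources needlessly. We therefore recommend checking the link quality to stop the flapping and fix the problem at its root.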