High CPU Usage on the Standby Main Control Board of an SR6608 at an Oil Company Customer
I. Network Topology:
II. Problem Description:
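Shortly after 8 p.m. on November 28, 2014, an oil company customer reported that the FIB task on the standby main control board of their SR6608 was using a large share of the CPU. Our branch-office engineer went to the site immediately and collected diagnostic information from the standby main control board in slot 0 of the faulty device, along with details of the FIB task on its CPU. This data showed that the SCAR task was also using a relatively large share of the CPU, holding steady at about 14%. The current runtime information of the FIB and SCAR tasks on the standby board was then collected several more times and sent to our R&D team for analysis.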
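III. Analysis:
The call stacks of the FIB and SCAR tasks on the standby main control board of the SR6608 show two things. The FIB task was receiving FIB entries synchronized from the active main control board and programming them into the forwarding table. The SCAR task is a timer task that re-applies the software rate-limiting settings each time it fires.
The diagnostic information also showed that the FIB and SCAR tasks on the slot 0 standby main control board were both logging errors continuously: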
===============================================================
===============error information on slot 0===============
===============================================================
===== Error report info (no: 0 idx: 42) =====
Error Report Slot: 0
Error Report Task: FIB (TID: 87)
Error Report Time: 2014-11-28 20:20:46
Error Report Tick: 0x418b(CPU Tick High) 0x3911e6aa(CPU Tick Low)
Error Module Id: 0x20000300
Error Code: 0x9d
Call Stack:
StackBase ReturnIns FuncEntry
0641de18 02e41c7c ffffffff UnKnown! MON_ReportError_Inner
0641de78 02e2499c ffffffff UnKnown! MON_ReportError
0641de88 02e04b34 ffffffff UnKnown! MEM_MallocType
0641dfe8 02e05268 ffffffff UnKnown! MEM_Malloc
0641dff8 014d0710 ffffffff UnKnown! DP_DRV_IPv4_HandleRouteChange
0641e048 02b69648 ffffffff UnKnown! FIB_DRV_HandleRouteChange
0641e178 02b69eb4 ffffffff UnKnown! FIB_DRV_HandleDrvForRTMSG
0641e1a8 02b6046c ffffffff UnKnown! FIB_MSG_Refresh
0641e298 02b6065c ffffffff UnKnown! FIB_MSG_RtMsgProc
0641e2a8 02b607d4 ffffffff UnKnown! FIB_Msg_PackMsgProc
0641e2d8 02b60ab4 ffffffff UnKnown! FIB_IPCQue_Read
0641e338 02b623bc ffffffff UnKnown! FIB_TaskMain
===== Error report info (no: 1 idx: 41) =====
Error Report Slot: 0
Error Report Task: SCAR (TID: 27)
Error Report Time: 2014-11-28 20:20:45
Error Report Tick: 0x418b(CPU Tick High) 0x35e2aef5(CPU Tick Low)
Error Module Id: 0x20000300
Error Code: 0x9d
Call Stack:
StackBase ReturnIns FuncEntry
06669c68 02e41c7c ffffffff UnKnown! MON_ReportError_Inner
06669cc8 02e2499c ffffffff UnKnown! MON_ReportError
06669cd8 02e04b34 ffffffff UnKnown! MEM_MallocType
06669e38 02e05268 ffffffff UnKnown! MEM_Malloc
06669e48 0069b0f8 ffffffff UnKnown! Drv_Qacl_Sal_Rx_Acl_Create_Rule_LoopTest
06669e68 006b8c30 ffffffff UnKnown! Drv_Qacl_Sal_Rx_Create_Parse_Rule
06669ed8 006b9c00 ffffffff UnKnown! Drv_Qacl_Sal_Rx_Create_EntryLst
06669f58 006bb174 ffffffff UnKnown! Drv_Qacl_Sal_Rx_IsUnderAttack
06669fc8 00632574 ffffffff UnKnown! DRV_QACL_RX_Is_Under_Attack
06669fd8 0070d69c ffffffff UnKnown! drv_rxtx_acl_is_under_attack
0666a038 00714334 ffffffff UnKnown! Drv_RxTx_Port_LimitRate_byACL
0666a088 00714774 ffffffff UnKnown! Drv_RxTx_Port_LimitRate
0666a0a8 0071484c ffffffff UnKnown! Drv_SoftCar_Monitor
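Decoding the call stacks in the error reports above shows the following:

- Whenever the FIB task tried to program FIB entries, its memory allocation failed, so the programming failed. The FIB task therefore kept re-programming the entries without pause, which drove up its CPU usage.
- The SCAR timer task's memory allocations also failed, so it could not create its rules. Because the timer tries to create the rules again every time it fires, its CPU usage was high as well.

The exception records in the diagnostic information show that the slot 0 standby main control board had suffered a series of memory-related faults, each with a different symptom. The first two records are excerpted below. In the first, an exception was raised while accessing a normal memory address. In the second, an exception was raised while freeing memory.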
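The CPU cost of a retry-on-failure loop can be illustrated with a toy model. The sketch below is hypothetical. It assumes only what the text states: the FIB task keeps retrying entries it failed to program, and the SCAR timer retries rule creation each time it fires. The real task logic, queue, and timer period are not known, and `fib_drain` and `scar_ticks` are illustrative names.

```python
from collections import deque


def fib_drain(entries, malloc_ok, budget):
    """Hypothetical model of the standby FIB task.

    Each queued route is retried until its allocation succeeds; a failed
    entry goes back to the tail of the queue. `budget` caps the number of
    attempts so the simulation ends even when every allocation fails.
    """
    queue = deque(entries)
    work = 0
    programmed = []
    while queue and work < budget:
        route = queue.popleft()
        work += 1
        if malloc_ok():
            programmed.append(route)
        else:
            queue.append(route)  # no give-up path: each failure becomes more work
    return work, programmed


def scar_ticks(ticks, malloc_ok):
    """Hypothetical model of the SCAR timer: on every tick, if the rules are
    not installed yet, rebuild them. A persistent allocation failure means
    the rebuild runs on every single tick."""
    installed = False
    rebuilds = 0
    for _ in range(ticks):
        if not installed:
            rebuilds += 1
            installed = malloc_ok()
    return rebuilds
```

With allocation always succeeding, `fib_drain` programs three routes in three attempts and `scar_ticks(100, ...)` builds the rules once. With allocation always failing, the FIB model spends its whole budget without programming a single route, and the SCAR model rebuilds on all 100 ticks. This is the pattern behind the sustained CPU usage of both tasks.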
==================================================================
===============display exception information slot 0===============
=================================================================
================= exception info (no: 0) =================
Exception Number: 0x300
Exception Name: DATA ACCESS EXCEPTION
Exception Instruction: 0x02E217D0
Exception Slot: 0
Exception VCpu: 0
Exception Task: TICK (TID: 4)
Exception Stack Base: 0x29e008
Exception Stack Top: 0x29df18
Exception Time: 2014-11-12 03:56:36
Exception Tick: 0x1bcd(CPU Tick High) 0xb215ad81(CPU Tick Low)
Register contents:
Reg: link[0], Val = 0x00000000 ; Reg: link[1], Val = 0x00000000 ;
Reg: link[2], Val = 0x00000000 ; Reg: link[3], Val = 0x00000000 ;
Reg: r0, Val = 0x91410064 ; Reg: r1, Val = 0x0029df18 ;
Reg: r2, Val = 0x00205b00 ; Reg: r3, Val = 0x056dcb44 ;
Reg: r4, Val = 0x052b0000 ; Reg: r5, Val = 0x00000001 ;
Reg: r6, Val = 0x3b7df8c0 ; Reg: r7, Val = 0x00000809 ;
Reg: r8, Val = 0x3b7df8bc ; Reg: r9, Val = 0x91410064 ;
Reg: r10, Val = 0xfffffc00 ; Reg: r11, Val = 0x04080000 ;
Reg: r12, Val = 0x0029df68 ; Reg: r13, Val = 0x002334a8 ;
Reg: r14, Val = 0x3eae5700 ; Reg: r15, Val = 0x3e990000 ;
Reg: r16, Val = 0x02e40000 ; Reg: r17, Val = 0x04920000 ;
Reg: r18, Val = 0x02e23f90 ; Reg: r19, Val = 0x052b0000 ;
Reg: r20, Val = 0x000004c0 ; Reg: r21, Val = 0x00000001 ;
Reg: r22, Val = 0x02e230d8 ; Reg: r23, Val = 0x00010004 ;
Reg: r24, Val = 0x02e1a19c ; Reg: r25, Val = 0x00000000 ;
Reg: r26, Val = 0x02e20000 ; Reg: r27, Val = 0x02e23f90 ;
Reg: r28, Val = 0x052b0000 ; Reg: r29, Val = 0x00000000 ;
Reg: r30, Val = 0x043a0000 ; Reg: r31, Val = 0x056dcb44 ;
Reg: cr, Val = 0x42004048 ; Reg: bar, Val = 0x02e24020 ;
Reg: xer, Val = 0x20000000 ; Reg: lr, Val = 0x02e2260c ;
Reg: ctr, Val = 0x02e2163c ; Reg: srr0, Val = 0x02e217d0 ;
Reg: srr1, Val = 0x00021200 ; Reg: dar, Val = 0x91410064 ;
Reg: dsisr, Val = 0x00800000 ; Reg: vector, Val = 0x00000300 ;
Dump stack(total 512Bytes,16Bytes/line):
0x0029df18: 00 29 df 58 02 e2 24 ec 3b 7e 1e 14 06 72 18 00
0x0029df28: 00 00 00 00 00 00 00 b3 00 ba 4f 24 00 00 00 00
0x0029df38: 00 02 92 00 06 72 18 00 00 00 00 00 00 00 00 00
0x0029df48: 00 29 df 58 00 00 00 00 02 e2 3f f0 00 00 00 00
0x0029df58: 00 29 df a8 02 e2 2c 6c 00 00 00 02 00 00 00 00
0x0029df68: 05 6d ab 64 02 e2 3f f0 3b 7e 1e 10 00 00 00 00
0x0029df78: 00 29 df 88 00 1c e6 34 22 00 40 42 00 be 95 7c
0x0029df88: 00 29 df 98 02 e2 40 20 05 9c 00 00 02 e2 3f 90
0x0029df98: 05 2b 00 00 05 6e 00 00 05 2b 00 00 02 e2 3f f0
0x0029dfa8: 00 29 df d8 02 e2 30 c0 00 02 92 00 00 01 00 04
0x0029dfb8: 02 e1 a1 9c 02 e2 2e f0 02 e2 49 60 04 e1 00 00
0x0029dfc8: 02 e0 d0 6c 00 00 80 00 05 2b 00 00 00 00 04 c0
0x0029dfd8: 00 29 e0 08 02 e2 31 b4 00 00 80 00 00 01 00 04
0x0029dfe8: 00 29 df f8 00 00 00 00 02 e2 00 00 05 2b 00 00
0x0029dff8: 00 00 00 04 02 e2 3a 8c 00 00 04 c0 02 e2 30 d8
0x0029e008: 00 29 e0 38 02 e1 a3 28 00 00 01 00 00 01 00 04
0x0029e018: 02 e1 a1 9c 40 00 40 22 06 71 fd bc 02 e1 a1 9c
0x0029e028: 00 29 40 60 00 01 00 04 06 71 fd 98 06 72 18 00
0x0029e038: 00 29 e0 58 00 1c aa 7c 06 71 fd bc 00 00 a0 00
0x0029e048: 00 29 40 60 00 00 00 00 06 71 fd 98 06 72 18 00
0x0029e058: 00 00 00 00 00 1c be 3c 00 00 00 00 00 00 00 00
0x0029e068: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0029e078: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0029e088: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0029e098: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0029e0a8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0029e0b8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0029e0c8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0029e0d8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0029e0e8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0029e0f8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0029e108: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Print layers of stack when exception:
Stack Top = 0x0029df18, Return Address = 0x02e217d0 drvAddTimerToHash
Stack Top = 0x0029df58, Return Address = 0x02e22c6c TIME_TickTrigReltimer
Stack Top = 0x0029dfa8, Return Address = 0x02e230c0 TimeEvent
Stack Top = 0x0029dfd8, Return Address = 0x02e231b4 TIME_TickTaskEntry
Stack Top = 0x0029e008, Return Address = 0x02e1a328 tskAllTaskEntry
================= exception info (no: 1) =================
Exception Number: 0x200
Exception Name: MACHINE CHECK EXCEPTION
Exception Instruction: 0x014CD898
Exception Slot: 0
Exception VCpu: 0
Exception Task: FIB (TID: 87)
Exception Stack Base: 0x641e368
Exception Stack Top: 0x641dfb8
Exception Time: 2014-11-05 02:04:30
Exception Tick: 0xae3(CPU Tick High) 0x82004ad7(CPU Tick Low)
Register contents:
Reg: link[0], Val = 0x00000000 ; Reg: link[1], Val = 0x80000000 ;
Reg: link[2], Val = 0x00000000 ; Reg: link[3], Val = 0x00000000 ;
Reg: r0, Val = 0xb697a234 ; Reg: r1, Val = 0x0641dfb8 ;
Reg: r2, Val = 0x00205b00 ; Reg: r3, Val = 0x00000001 ;
Reg: r4, Val = 0x00000000 ; Reg: r5, Val = 0x2ec85594 ;
Reg: r6, Val = 0x00000001 ; Reg: r7, Val = 0x0641e018 ;
Reg: r8, Val = 0x0000007f ; Reg: r9, Val = 0xe0010d82 ;
Reg: r10, Val = 0x00000000 ; Reg: r11, Val = 0x05680000 ;
Reg: r12, Val = 0x42004028 ; Reg: r13, Val = 0x002334a8 ;
Reg: r14, Val = 0x3eae5700 ; Reg: r15, Val = 0x0000000f ;
Reg: r16, Val = 0x2ea9d050 ; Reg: r17, Val = 0x00000000 ;
Reg: r18, Val = 0x029cb8c4 ; Reg: r19, Val = 0x0641e1d8 ;
Reg: r20, Val = 0x078b9468 ; Reg: r21, Val = 0x0840eab0 ;
Reg: r22, Val = 0x094c7410 ; Reg: r23, Val = 0x00000009 ;
Reg: r24, Val = 0x00000001 ; Reg: r25, Val = 0x2ea9d050 ;
Reg: r26, Val = 0x014d0000 ; Reg: r27, Val = 0x0641e180 ;
Reg: r28, Val = 0x014cd84c ; Reg: r29, Val = 0x052b0000 ;
Reg: r30, Val = 0x00000000 ; Reg: r31, Val = 0x3697a1cc ;
Reg: cr, Val = 0x42004028 ; Reg: bar, Val = 0x00000000 ;
Reg: xer, Val = 0x20000000 ; Reg: lr, Val = 0x014d0548 ;
Reg: ctr, Val = 0x014ce38c ; Reg: srr0, Val = 0x014cd898 ;
Reg: srr1, Val = 0x00029200 ; Reg: dar, Val = 0xfdfff501 ;
Reg: dsisr, Val = 0x00800000 ; Reg: vector, Val = 0x00000200 ;
Dump stack(total 512Bytes,16Bytes/line):
0x0641dfb8: 06 41 df c8 06 41 e0 18 00 00 00 00 32 97 a1 cc
0x0641dfc8: 06 41 df f8 01 4d 05 48 09 56 9d 24 00 00 0d ec
0x0641dfd8: 06 41 df e8 2e a9 ba 50 00 00 00 01 2e a9 ba 50
0x0641dfe8: 00 00 00 01 00 00 00 01 00 00 00 00 2e a9 ba 50
0x0641dff8: 06 41 e0 48 01 4d 09 1c 00 00 00 00 05 73 00 00
0x0641e008: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0641e018: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0641e028: 06 41 e0 48 2e a9 ba 50 00 00 00 01 06 41 e1 80
0x0641e038: 2e a9 d0 50 00 00 00 00 00 00 00 00 05 73 00 00
0x0641e048: 06 41 e1 78 02 b6 96 48 00 00 00 00 00 00 00 00
0x0641e058: 06 41 e1 78 02 b6 67 00 ff ff ff ff 00 00 00 00
0x0641e068: ff ff ff ff 00 03 ff ff ff ff ff ff ff ff ff ff
0x0641e078: ff ff ff ff ff ff ff ff 00 00 00 01 00 00 05 dc
0x0641e088: 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff
0x0641e098: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0641e0a8: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0641e0b8: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0641e0c8: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
0x0641e0d8: 06 41 e1 08 ff ff ff ff 00 00 00 00 ff ff ff ff
0x0641e0e8: 06 41 e0 f8 04 39 a2 2c ff ff ff ff 05 73 00 00
0x0641e0f8: 06 41 e1 08 04 39 a2 2c 00 00 00 00 05 6d 00 00
0x0641e108: 06 41 e1 18 04 39 a2 64 ff ff ff ff ff ff ff ff
0x0641e118: 06 41 e1 48 02 e0 38 a8 00 02 92 00 09 5c b5 40
0x0641e128: 06 41 e1 38 05 bd 03 24 00 00 00 c8 00 00 00 01
0x0641e138: 05 2b 00 00 06 41 e1 84 00 00 67 50 00 00 00 00
0x0641e148: 06 41 e1 68 02 df 2e 48 00 00 00 00 00 00 36 79
0x0641e158: 05 73 00 00 06 41 e1 d8 2e a9 d0 50 00 00 00 01
0x0641e168: 2e a9 ba 50 00 00 00 01 00 00 00 00 00 00 00 01
0x0641e178: 06 41 e1 a8 02 b6 9e b4 00 00 00 00 00 00 00 09
0x0641e188: 07 8b 94 34 05 bd 03 24 07 8b 94 5c 0a c7 3e ae
0x0641e198: 00 00 00 1c 00 00 00 00 ff ff ff ff 02 b6 8e 0c
0x0641e1a8: 06 41 e2 98 02 b6 04 6c 01 41 e1 a0 3b 7e 1f bc
Print layers of stack when exception:
Stack Top = 0x0641dfb8, Return Address = 0x014cd898 DP_FIB_FreeRTEntry
Stack Top = 0x0641dfc8, Return Address = 0x014d0548 DP_RT_ModifyRoute
Stack Top = 0x0641dff8, Return Address = 0x014d091c DP_DRV_IPv4_HandleRouteChange
Stack Top = 0x0641e048, Return Address = 0x02b69648 FIB_DRV_HandleRouteChange
Stack Top = 0x0641e178, Return Address = 0x02b69eb4 FIB_DRV_HandleDrvForRTMSG
Stack Top = 0x0641e1a8, Return Address = 0x02b6046c FIB_MSG_Refresh
Stack Top = 0x0641e298, Return Address = 0x02b6065c FIB_MSG_RtMsgProc
Stack Top = 0x0641e2a8, Return Address = 0x02b607d4 FIB_Msg_PackMsgProc
Stack Top = 0x0641e2d8, Return Address = 0x02b60ab4 FIB_IPCQue_Read
Stack Top = 0x0641e338, Return Address = 0x02b623bc FIB_TaskMain
Stack Top = 0x0641e368, Return Address = 0x02e1a328 tskAllTaskEntry
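You can condense exception records like the two above to a one-line-per-record summary. Below is a minimal Python sketch. It assumes the record layout shown in this case; `summarize_exceptions` is a hypothetical name, and the patterns are fitted to these samples. It extracts the exception number, name, task, and time, plus the innermost stack layer (the function executing when the exception hit).

```python
import re

# Only these four header fields are kept; Stack Base/Top, Tick, etc. are ignored.
FIELD_RE = re.compile(r"^Exception (Number|Name|Task|Time):\s*(.+?)\s*$")
# e.g. "Stack Top = 0x0029df18, Return Address = 0x02e217d0 drvAddTimerToHash"
LAYER_RE = re.compile(r"Return Address = 0x[0-9a-fA-F]+\s+(\w+)")


def summarize_exceptions(text):
    """Return one dict per 'exception info (no: N)' record."""
    records = []
    current = None
    for line in text.splitlines():
        if "exception info (no:" in line:
            current = {"layers": []}
            records.append(current)
        elif current is not None:
            if m := FIELD_RE.match(line.strip()):
                key, value = m.group(1).lower(), m.group(2)
                if key == "task":
                    value = value.split()[0]  # "FIB (TID: 87)" -> "FIB"
                current[key] = value
            elif m := LAYER_RE.search(line):
                current["layers"].append(m.group(1))  # innermost layer first
    for rec in records:
        rec["fault_func"] = rec["layers"][0] if rec["layers"] else None
    return records
```

On the two records above, the sketch produces:

- Record 0: a DATA ACCESS EXCEPTION (0x300) in drvAddTimerToHash, in the TICK task.
- Record 1: a MACHINE CHECK EXCEPTION (0x200) in DP_FIB_FreeRTEntry, in the FIB task.

These are two unrelated code paths, both failing on memory.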
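In summary, the fault was caused by a hardware defect in this particular standby main control board in slot 0 of the SR6608 router. The defect made the system running on the standby board hit exceptions when accessing memory, and the board must be replaced.
IV. Solution:
Remove the standby main control board from slot 0 of the SR6608 router, and replace it once a spare part has been obtained.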
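The conclusion rests on the spread of the symptoms rather than on any single record. The failures appear in unrelated tasks (FIB, SCAR, TICK) and take different forms (allocation failure, data access exception, machine check). The sketch below encodes that reasoning as a rough triage rule. The rule and its thresholds are assumptions made for illustration, not a vendor diagnostic criterion; `MemFault` and `looks_like_hardware` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemFault:
    task: str  # task that hit the fault, e.g. "FIB"
    kind: str  # e.g. "malloc failure", "DATA ACCESS EXCEPTION"


def looks_like_hardware(faults, min_tasks=2, min_kinds=2):
    """Rough triage heuristic (an assumption, not a vendor rule).

    A single software bug tends to fail the same way in the same code path,
    while defective memory hurts whatever code touches it. Flag the board
    when the faults span several tasks AND several failure kinds.
    """
    tasks = {f.task for f in faults}
    kinds = {f.kind for f in faults}
    return len(tasks) >= min_tasks and len(kinds) >= min_kinds


# The four faults recorded in this case.
case_faults = [
    MemFault("FIB", "malloc failure"),
    MemFault("SCAR", "malloc failure"),
    MemFault("TICK", "DATA ACCESS EXCEPTION"),
    MemFault("FIB", "MACHINE CHECK EXCEPTION"),
]
```

`looks_like_hardware(case_faults)` returns True. The same allocation failure repeated in one task returns False. Such a rule can only point toward a hardware suspicion; as in this case, replacing the board is what confirms it.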