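Only one WT main unit at the site fails to come online. The configuration of its uplink device interface is identical to that of the other WTs, and the WT can ping the AC. During the online process the state reaches C (Config) and shortly afterward falls back to Idle.
Debugging on the AC and the AP at the same time gives the following output:
Debug on the AC: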
*Aug 27 16:19:18:307 2020 WX3508H CWS/7/ERROR: Failed to match change state event request with SeqNum 3 from IP address 172.25.200.49:32077
*Aug 27 16:19:18:309 2020 WX3508H CWS/7/ERROR: Failed to match change state event request with SeqNum 3 from IP address 172.25.200.49:32077
Debug on the AP:
*Aug 27 16:17:51:969 2020 WT-9F CWC/7/RCV_PKT: Assembled configuration response with SeqNum 2 from AC at 172.25.200.254:5246.
*Aug 27 16:19:17:742 2020 WT-9F CWC/7/FSM: Enter Data Check state.
*Aug 27 16:19:17:744 2020 WT-9F CWC/7/SND_PKT: Sent change state event request with SeqNum 3 to AC 172.25.200.254:5246.
*Aug 27 16:19:17:745 2020 WT-9F CWC/7/SND_PKT: Verbose info for change state event request sent to AC 172.25.200.254:5246, length=24. 00 10 02 00 00 00 00 00 00 00 00 0B 03 00 0B 00 00 21 00 04 00 00 00 00 .
*Aug 27 16:19:17:745 2020 WT-9F CWC/7/TMR: [Port 5] MaxDiscovery Interval timer expired.
*Aug 27 16:19:17:746 2020 WT-9F CWC/7/TMR: [Port 17] MaxDiscovery Interval timer expired.
*Aug 27 16:19:17:746 2020 WT-9F CWC/7/TMR: [Port 23] MaxDiscovery Interval timer expired.
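The hex dump in the "Verbose info" line of the AP debug can be decoded to confirm what the AP actually sent. The following is a minimal sketch, assuming the dump starts at the CAPWAP header and follows the layout in RFC 5415 (an 8-byte control header after the CAPWAP header, then type-length-value message elements). The function name decode_capwap_control is hypothetical and not part of any H3C tool.

```python
import struct

# Hex dump copied from the "Verbose info" line of the AP debug output.
DUMP = "00 10 02 00 00 00 00 00 00 00 00 0B 03 00 0B 00 00 21 00 04 00 00 00 00"


def decode_capwap_control(hex_dump):
    """Decode a CAPWAP control message, assuming the RFC 5415 layout."""
    data = bytes.fromhex(hex_dump)
    # HLEN is the top 5 bits of the second byte, counted in 4-byte words.
    header_len = (data[1] >> 3) * 4
    # Control header: message type (4 bytes), sequence number (1),
    # message element length (2), flags (1).
    msg_type, seq_num, elem_len, flags = struct.unpack_from("!IBHB", data, header_len)
    elements = []
    offset = header_len + 8
    while offset + 4 <= len(data):
        # Each message element: type (2 bytes), length (2 bytes), value.
        elem_type, length = struct.unpack_from("!HH", data, offset)
        elements.append((elem_type, data[offset + 4:offset + 4 + length]))
        offset += 4 + length
    return {"msg_type": msg_type, "seq_num": seq_num, "elements": elements}


decoded = decode_capwap_control(DUMP)
print(decoded)
```

Under these assumptions the message type decodes to 11 (Change State Event Request in RFC 5415) with sequence number 3, and the single message element is type 33 (Result Code) with a value of zero, which matches the "change state event request with SeqNum 3" text in the debug output.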
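The debug output on the AC shows that the problem occurs while the AC and the AP exchange the change state event message (SeqNum 3). Combined with the AP debug, it can be seen that after the AP and the AC finished exchanging the configuration message (SeqNum 2, 16:17:51), the AP took 1 minute 26 seconds to send the change state event request (16:19:17). The conclusion is that processing on the WT is slow.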
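The 1 minute 26 second gap can be checked directly from the two AP debug timestamps. This is a small sketch; parse_ts is a hypothetical helper, and it assumes the log prefix format "*Mon DD HH:MM:SS:mmm YYYY" seen in the output above and an English locale for month names.

```python
from datetime import datetime

RECEIVED = "*Aug 27 16:17:51:969 2020 WT-9F CWC/7/RCV_PKT: Assembled configuration response with SeqNum 2"
SENT = "*Aug 27 16:19:17:744 2020 WT-9F CWC/7/SND_PKT: Sent change state event request with SeqNum 3"


def parse_ts(line):
    """Parse the leading timestamp of a debug line into a datetime."""
    month, day, clock, year = line.lstrip("*").split()[:4]
    # The last colon separates the seconds from the milliseconds.
    hms, _, millis = clock.rpartition(":")
    return datetime.strptime(f"{year} {month} {day} {hms} {millis}",
                             "%Y %b %d %H:%M:%S %f")


gap = parse_ts(SENT) - parse_ts(RECEIVED)
print(f"Gap between config response and change state request: {gap.total_seconds():.3f} s")
```

The same helper can be applied to any pair of debug lines to see where the CAPWAP state machine stalls.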
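During the remote session it was also found that running display wlan ap on the WT hangs, which suggests that the apmgr process is abnormal. Its state can be checked on the WT with dis process name apmgrlited. The output shows many threads in state D (uninterruptible sleep, referred to here as the dead state).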
TID LAST_CPU Stack PRI State HH:MM:SS:MESC Name
127 0 84K 120 D 0:33:32:640 apmgrlited
128 0 84K 120 D 0:35:5:990 apmgrlited
129 0 84K 120 D 0:0:0:100 apmgrlited
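When the thread list is long, the blocked threads can be picked out with a short script. The sketch below assumes the whitespace-separated column layout shown above; threads_in_state is a hypothetical helper name.

```python
THREADS = """\
TID LAST_CPU Stack PRI State HH:MM:SS:MESC Name
127 0 84K 120 D 0:33:32:640 apmgrlited
128 0 84K 120 D 0:35:5:990 apmgrlited
129 0 84K 120 D 0:0:0:100 apmgrlited
"""


def threads_in_state(table, state="D"):
    """Return the TIDs of all threads whose State column equals `state`."""
    lines = table.strip().splitlines()
    header = lines[0].split()
    tid_col = header.index("TID")
    state_col = header.index("State")
    tids = []
    for row in lines[1:]:
        fields = row.split()
        if fields[state_col] == state:
            tids.append(int(fields[tid_col]))
    return tids


blocked = threads_in_state(THREADS)
print(f"{len(blocked)} thread(s) in state D: {blocked}")
```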
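When threads are in this state, in 99% of cases it is because they are hung in the kernel, which means the kernel itself has a problem. This can be checked in probe view with view /proc/secondary_log_buf.
The content of the secondary log buffer is shown below. It reveals a PoE hardware fault.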
<0>[ 13.159000] 0:please shutdown wtu port
<0>[ 14.377000] 0:please shutdown wtu port
<0>[ 15.592000] 0:please shutdown wtu port
<0>[ 15.592000] 0:ERROR poe <{FILE}> 2382 DRV_POE_KeepLive vcpu(0): Failed to send POE command!
<0>[ 21.807000] 0:ERROR poe <{FILE}> 846 DRV_POE_UART_Parse vcpu(0): Time out or error happened when getting semaphore RtnS!
<0>[ 23.021000] 0:please shutdown wtu port
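The PoE driver errors can be extracted from a saved copy of the buffer with a regular expression. This is a sketch only: the pattern is derived from the two ERROR lines shown above and may not match other firmware versions, and poe_errors is a hypothetical helper name.

```python
import re

LOG = """\
<0>[ 13.159000] 0:please shutdown wtu port
<0>[ 14.377000] 0:please shutdown wtu port
<0>[ 15.592000] 0:please shutdown wtu port
<0>[ 15.592000] 0:ERROR poe <{FILE}> 2382 DRV_POE_KeepLive vcpu(0): Failed to send POE command!
<0>[ 21.807000] 0:ERROR poe <{FILE}> 846 DRV_POE_UART_Parse vcpu(0): Time out or error happened when getting semaphore RtnS!
<0>[ 23.021000] 0:please shutdown wtu port
"""

# Groups: kernel uptime in seconds, driver function name, error message.
POE_ERROR = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+\d+:ERROR poe\s+\S+\s+\d+\s+(\w+)\s+vcpu\(\d+\):\s*(.*)"
)


def poe_errors(log):
    """Return (uptime, function, message) for every PoE ERROR line."""
    found = []
    for line in log.splitlines():
        match = POE_ERROR.search(line)
        if match:
            found.append((float(match.group(1)), match.group(2), match.group(3)))
    return found


errors = poe_errors(LOG)
for uptime, func, message in errors:
    print(f"{uptime:>10.3f}s  {func}: {message}")
```

Any hit from the PoE driver in this buffer, together with threads stuck in state D, points toward the hardware rather than the CAPWAP configuration.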
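The root cause was therefore identified as follows: a PoE hardware fault made the kernel work abnormally, which in turn made apmgr work abnormally. As a result the interval between apmgr packets became very long, and the messages on the AC indicate that the packet had timed out.
The issue was resolved by replacing the unit with a spare.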