Live-network topology: BAS – S125 – OLT – terminals
Terminals attached to the device reported that multicast services were blocked. Configuration checks on both the S125 and the BAS showed everything correct, and the BAS confirmed that multicast traffic was being forwarded normally toward the S125, so traffic statistics were to be configured on the S125 to locate where packets were being dropped. However, when creating a new ACL on the S125, the command line hung with no response; repeated attempts to enter the ACL view all stalled, even though CPU and memory usage were within normal range at the time. While collecting device diagnostic information, as the output reached display interface, all downstream terminals stopped receiving multicast traffic. The device itself showed no multicast anomaly, ACL resources were sufficient, and CPU and memory remained normal.
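For context, the flow-statistics step that hung would typically be built from an ACL plus a QoS accounting policy. The following is a hedged sketch in common Comware syntax; the ACL number, multicast group address, and interface are placeholders, not the configuration actually used on site:

```
# Hypothetical sketch only: ACL-based traffic statistics on a Comware device.
# ACL 3000, group 224.1.1.1 and GigabitEthernet1/0/1 are placeholders.
acl number 3000
 rule 0 permit ip destination 224.1.1.1 0
quit
traffic classifier MCAST-STAT
 if-match acl 3000
quit
traffic behavior MCAST-STAT
 accounting packet
quit
qos policy MCAST-STAT
 classifier MCAST-STAT behavior MCAST-STAT
quit
interface GigabitEthernet1/0/1
 qos apply policy MCAST-STAT inbound
```

The counters would then be read with display qos policy interface. In this case the very first step, entering the ACL view, already hung.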
Investigation showed that process 94 stayed persistently high, and the associated process 338 was issuing a large number of driver ipmc multicast-stack calls, which caused the ACL operation to hang. Further investigation found that chassis 2 slot 6 had logged a parity error that had not been repaired, corrupting LEM table entries; since newly added multicast entries are programmed into the LEM, multicast forwarding failed as a result. Why the parity error was not repaired automatically still needs to be reproduced in a lab environment; the customer was advised to reboot chassis 2 slot 6 to clear the parity error.
[TZ-TT-DSW-1.MAN.H3C12510F-probe]monitor thread ch 2 s 6
201 processes; 219 threads
Thread states: 5 running, 214 sleeping, 0 stopped, 0 zombie
CPU states: 67.75% idle, 0.70% user, 29.07% kernel, 2.48% interrupt
Memory: 4012M total, 1839M available, page size 16K
JID TID LAST_CPU PRI State HH:MM:SS MAX CPU Name
94 94 3 123 R 88:28:04 129 23.91% [bcmDPC]
1 1 3 120 S 00:02:52 14 2.22% scmd
158 158 3 123 R 587h 1 1.29% [bLK0]
159 159 2 123 D 578h 24 1.29% [bLK1]
211 213 2 120 S 03:07:15 1 0.73% ifmgr
1971270214 2 120 R 00:00:00 1 0.55% diagd
152 152 2 100 D 87:21:06 1 0.36% [bRX3]
188 188 0 139 S 117h 0 0.36% [dbfd_2]
189 189 1 120 S 156h 0 0.36% [dbfd_rcv]
109 109 2 105 S 32:51:18 1 0.18% [dport_omcd]
[TZ-TT-DSW-1.MAN.H3C12510F-probe]bcm ch 2 s 6 c 0 mutex
Err=0 MutexUsed=3484 mutex_block_num=4 task_block_num=1
*****************************************************************
BlkPID BlkPName BlkPri OwnPID OwnPName OwnPri MutexCnt MutexName
71 NULL 0 113 bTM_v_get123 2 BCM Petra Field unit lock
91 NULL 0 94 bcmDPC 123 2 soc_sand_os_mutex_create
100 NULL 0 113 bTM_v_get123 2 dpp cache interlock
113 NULL 0 94 bcmDPC 123 2 soc_sand_os_mutex_create
116 NULL 0 94 bcmDPC 123 2 soc_sand_os_mutex_create
123 NULL 0 94 bcmDPC 123 2 soc_sand_os_mutex_create
147 NULL 0 113 bTM_v_get123 2 BCM Petra Field unit lock
338 NULL 0 94 bcmDPC 123 2 soc_sand_os_mutex_create
347 NULL 0 352 karp/1 115 2 _dpp_l3_unit_lock
352 NULL 0 94 bcmDPC 123 2 soc_sand_os_mutex_create
891 NULL 0 94 bcmDPC 123 2 soc_sand_os_mutex_create
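The mutex table above can be read as a blocking chain: each blocked task (BlkPID) is waiting on a mutex held by an owner (OwnPID), and every chain terminates at JID 94 (bcmDPC). As an illustration only (this is not H3C code), a short sketch that walks such a table to find the root blocker:

```python
# Illustrative only: model the probe's mutex table as blocked -> owner edges
# and follow each chain to the task that everyone ultimately waits on.
# PIDs are taken from the mutex output above.
BLOCKED_BY = {
    71: 113, 91: 94, 100: 113, 113: 94, 116: 94,
    123: 94, 147: 113, 338: 94, 347: 352, 352: 94, 891: 94,
}

def root_blocker(pid: int) -> int:
    """Follow blocked -> owner edges until an owner that is not itself blocked."""
    seen = set()
    while pid in BLOCKED_BY and pid not in seen:
        seen.add(pid)
        pid = BLOCKED_BY[pid]
    return pid

# Every chain in this table ends at 94 (bcmDPC), e.g. 347 -> 352 -> 94.
print(root_blocker(347))  # -> 94
print(root_blocker(71))   # -> 94
```

This is why mcsd (JID 338) appeared hung: it was transitively blocked behind bcmDPC, which itself was stuck in the LEM access path shown further below.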
[TZ-TT-DSW-1.MAN.H3C12510F-probe]follow process 338 c 2 s 6
Attaching to process 338 (mcsd)
Iteration 1 of 5
------------------------------
Thread (LWP 338):
Switch counts: 267524872
User stack:
User stack deliberately skipped. Reason: Thread state=D.
Kernel stack:
[<ffffffff804ad9f0>] schedule+0x710/0x1050
[<ffffffffc3e92ba0>] mutex_lock_replacer+0x120/0x1e0 [system]
[<ffffffffc3e93234>] drv_sal_mutex_take+0x5d4/0x7f0 [system]
[<ffffffffc4e7ea34>] soc_sand_os_mutex_take+0x24/0x40 [system]
[<ffffffffc4eaf4b0>] soc_sand_take_chip_descriptor_mutex+0x190/0x270 [system]
[<ffffffffc4ef2dc4>] soc_ppd_frwrd_ipv4_mc_route_get+0x214/0x380 [system]
[<ffffffffc6308ac4>] _bcm_ppd_frwrd_ipv4_mc_route_find+0x244/0x440 [system]
[<ffffffffc6309130>] bcm_petra_ipmc_find+0x470/0x4d0 [system]
[<ffffffffc63097e8>] _bcm_ppd_frwrd_ipv4_mc_route_remove+0x208/0x470 [system]
[<ffffffffc630b0e0>] bcm_petra_ipmc_remove+0x4a0/0x500 [system]
[<ffffffffc31303dc>] drv_ipmc_bcm_del_l3entry+0x1dc/0x400 [system]
[<ffffffffc313abe0>] drv_ipmc_del_l2_group+0x150/0x4c0 [system]
[<ffffffffc3a91584>] drv_ipv4mc_del_l2_group+0xc4/0x170 [system]
[<ffffffffc3126a70>] DRV_IPV4MC_HandleMRouteChange+0xd20/0xe00 [system]
[<ffffffffc78b19ac>] M3DRV4_L2EntryToDrv+0x4c/0x80 [system]
[<ffffffffc78b2924>] M3DRV4_IOCTL_IPMsgToDrv+0x234/0x6d0 [system]
[<ffffffffc78d07e0>] L2MIOCTL_ProcIPMsgToDrv+0xa0/0x1c0 [system]
[<ffffffffc78d18cc>] MFIB_L2MIOCTL_CallBack+0x2bc/0x9d0 [system]
[<ffffffffc76d3d9c>] CIOCTL_Doit+0xfc/0x2b0 [system]
[<ffffffff80202f44>] stack_done_ra+0x0/0x1c
[TZ-TT-DSW-1.MAN.H3C12510F-probe]dis parity-error c 2 s 6
Jun 09 2022 17:48:53:430147:unit 1:name=IHP_ParityErrInt, id=468, index=0, block=0, unit=1, recurring_action=0 | nof_occurences=0001, cnt_overflowf=0x0, memory address=0x005a84d6 memory=IHP_MEM_590000, mem_id=3943, array element=3, index=1238 | EM Soft Recovery
Error count 1. First logged at Jun 09 2022 17:48:53:430147.
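Parity protection on table memories stores one extra bit per entry; a flipped bit is detected on read but, with simple parity, cannot be corrected, which is why an unrepaired error leaves the affected LEM entry unusable. A minimal, purely illustrative sketch (not the ASIC's actual mechanism):

```python
# Illustrative only: even-parity check on a table entry, as a stand-in for
# the hardware's parity protection on LEM memory.
def parity_bit(word: int) -> int:
    """Even parity: 1 if the word has an odd number of set bits."""
    return bin(word).count("1") & 1

entry = 0x005A84D6            # memory address taken from the log above
stored = (entry, parity_bit(entry))

# A single-bit upset flips one bit in the stored word...
corrupted = entry ^ (1 << 7)

# ...and the read-side check detects the mismatch but cannot repair it.
print(parity_bit(corrupted) != stored[1])  # -> True
```

Detection without correction is exactly the "EM Soft Recovery" situation in the log: the error is flagged, and until the entry is rewritten (here, by rebooting the board) lookups against it misbehave.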
[TZ-TT-DSW-1.MAN.H3C12510F-probe]view /proc/94/stack ch 2 s 6
[<ffffffff804ad9f0>] schedule+0x710/0x1050
[<ffffffffc3e92ba0>] mutex_lock_replacer+0x120/0x1e0 [system]
[<ffffffffc3e93234>] drv_sal_mutex_take+0x5d4/0x7f0 [system]
[<ffffffffc3e98774>] sal_config_get+0x114/0x170 [system]
[<ffffffffc3e53d24>] soc_property_get_str+0x214/0x520 [system]
[<ffffffffc3e58f3c>] soc_property_get+0x1c/0x50 [system]
[<ffffffffc45d6fe0>] arad_pp_frwrd_mact_is_dma_supported+0xa0/0x170 [system]
[<ffffffffc45862e0>] arad_pp_lem_access_parse_only+0xd0/0x12a0 [system]
[<ffffffffc45876f0>] arad_pp_lem_access_parse+0x240/0x300 [system]
[<ffffffffe3d2b018>] _arad_pp_frwrd_lem_get_block_unsafe+0xa68/0x17b0 [ksplice_lpu_xlp_1617106360_system_new]
[<ffffffffe3d2bff8>] _arad_pp_frwrd_mact_get_block_unsafe+0x298/0x470 [ksplice_lpu_xlp_1617106360_system_new]
[<ffffffffe3d2c2e4>] arad_pp_frwrd_mact_get_block_unsafe+0x114/0x210 [ksplice_lpu_xlp_1617106360_system_new]
[<ffffffffe3ca259c>] soc_ppd_frwrd_mact_get_block+0x39c/0x400 [ksplice_lpu_xlp_1617106360_system_new]
[<ffffffffe3d30684>] h3c_arad_pp_
Rebooting the faulty board resolved the issue.