
S12510-F switch at a customer site: downstream multicast service traffic blocked and ACL view inaccessible

Published 2022-06-17

Problem Description

Live-network topology: BAS – S125 – OLT – terminals

Terminals attached below the switch reported that multicast services were blocked. Configuration checks on both the S125 and the BAS found everything correct, and the BAS showed multicast traffic being forwarded normally toward the S125, so flow statistics were attempted on the S125 to locate where packets were being dropped. When a new ACL was created on the S125, however, the command line stalled with no response; repeated attempts all hung and the ACL view could not be entered, even though CPU, memory, and other device statistics were within normal ranges at the time. While device diagnostics were being collected, all downstream terminals stopped receiving multicast traffic at the point where the diagnostic output reached the display interface section; the switch itself showed no multicast-related symptoms, ACL resources were sufficient, and CPU and memory remained normal.
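For reference, the flow statistics being attempted on the S125 would look roughly like the following minimal Comware sketch (with # annotations); the ACL number, multicast group address, and interface name are hypothetical placeholders, not values from the live network. It was the very first step here, entering the ACL view, that hung:

system-view
acl number 3000                              # hypothetical ACL number; this is the step that hung
 rule 0 permit ip destination 225.1.1.1 0    # example multicast group address only
 quit
traffic classifier MC_STAT operator and
 if-match acl 3000
 quit
traffic behavior MC_STAT
 accounting packet                           # count matching packets
 quit
qos policy MC_STAT
 classifier MC_STAT behavior MC_STAT
 quit
interface Ten-GigabitEthernet2/6/0/1         # hypothetical port facing the BAS
 qos apply policy MC_STAT inbound
 quit
display qos policy interface Ten-GigabitEthernet2/6/0/1 inbound

Comparing the match counters on the BAS-facing and OLT-facing ports would then show on which side the multicast stream is being lost.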

Analysis

Troubleshooting showed that thread 94 (bcmDPC) stayed persistently high on CPU, and that the associated thread 338 (mcsd), blocked behind it, was stuck in a long chain of driver ipmc multicast-stack calls; this is what left the ACL operations hung. Further investigation found an unrepaired parity error on chassis 2 slot 6, which corrupted LEM table entries; since newly added multicast entries are programmed into the LEM table, multicast forwarding broke as a result. Why the parity error was not repaired automatically still needs to be reproduced in a lab environment on the R&D side; in the meantime the customer was advised to reboot chassis 2 slot 6 to clear the parity error. The probe-view evidence follows: per-thread CPU usage (monitor thread), the mutex ownership chain showing thread 338 blocked on a mutex held by bcmDPC (bcm ... mutex), the kernel stack of the blocked mcsd thread (follow process 338), the logged parity error (dis parity-error), and the stack of bcmDPC itself (view /proc/94/stack).
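The same hotspot can usually be confirmed from user view before dropping into probe view; a minimal sketch, assuming standard Comware V7 syntax on an IRF member (chassis 2, slot 6 as in this case):

display process cpu chassis 2 slot 6    # per-process CPU usage; bcmDPC should appear near the top
monitor thread chassis 2 slot 6         # dynamic top-style view of the same per-thread data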

 

[TZ-TT-DSW-1.MAN.H3C12510F-probe]monitor thread ch 2 s 6

201 processes; 219 threads
Thread states: 5 running, 214 sleeping, 0 stopped, 0 zombie
CPU states: 67.75% idle, 0.70% user, 29.07% kernel, 2.48% interrupt
Memory: 4012M total, 1839M available, page size 16K
    JID      TID  LAST_CPU  PRI  State  HH:MM:SS   MAX    CPU    Name
     94       94      3     123    R    88:28:04   129  23.91%   [bcmDPC]
      1        1      3     120    S    00:02:52    14   2.22%   scmd
    158      158      3     123    R        587h     1   1.29%   [bLK0]
    159      159      2     123    D        578h    24   1.29%   [bLK1]
    211      213      2     120    S    03:07:15     1   0.73%   ifmgr
    197  1270214      2     120    R    00:00:00     1   0.55%   diagd
    152      152      2     100    D    87:21:06     1   0.36%   [bRX3]
    188      188      0     139    S        117h     0   0.36%   [dbfd_2]
    189      189      1     120    S        156h     0   0.36%   [dbfd_rcv]
    109      109      2     105    S    32:51:18     1   0.18%   [dport_omcd]

 

[TZ-TT-DSW-1.MAN.H3C12510F-probe]bcm ch 2 s 6 c 0 mutex

Err=0 MutexUsed=3484 mutex_block_num=4 task_block_num=1
*****************************************************************
BlkPID BlkPName BlkPri OwnPID OwnPName  OwnPri MutexCnt MutexName
71     NULL     0      113    bTM_v_get 123    2        BCM Petra Field unit lock
91     NULL     0      94     bcmDPC    123    2        soc_sand_os_mutex_create
100    NULL     0      113    bTM_v_get 123    2        dpp cache interlock
113    NULL     0      94     bcmDPC    123    2        soc_sand_os_mutex_create
116    NULL     0      94     bcmDPC    123    2        soc_sand_os_mutex_create
123    NULL     0      94     bcmDPC    123    2        soc_sand_os_mutex_create
147    NULL     0      113    bTM_v_get 123    2        BCM Petra Field unit lock
338    NULL     0      94     bcmDPC    123    2        soc_sand_os_mutex_create
347    NULL     0      352    karp/1    115    2        _dpp_l3_unit_lock
352    NULL     0      94     bcmDPC    123    2        soc_sand_os_mutex_create
891    NULL     0      94     bcmDPC    123    2        soc_sand_os_mutex_create

 

 

[TZ-TT-DSW-1.MAN.H3C12510F-probe]follow process 338 c 2 s 6

Attaching to process 338 (mcsd)
Iteration 1 of 5
------------------------------
Thread (LWP 338):
Switch counts: 267524872
User stack:
  User stack deliberately skipped. Reason: Thread state=D.
Kernel stack:
[<ffffffff804ad9f0>] schedule+0x710/0x1050
[<ffffffffc3e92ba0>] mutex_lock_replacer+0x120/0x1e0 [system]
[<ffffffffc3e93234>] drv_sal_mutex_take+0x5d4/0x7f0 [system]
[<ffffffffc4e7ea34>] soc_sand_os_mutex_take+0x24/0x40 [system]
[<ffffffffc4eaf4b0>] soc_sand_take_chip_descriptor_mutex+0x190/0x270 [system]
[<ffffffffc4ef2dc4>] soc_ppd_frwrd_ipv4_mc_route_get+0x214/0x380 [system]
[<ffffffffc6308ac4>] _bcm_ppd_frwrd_ipv4_mc_route_find+0x244/0x440 [system]
[<ffffffffc6309130>] bcm_petra_ipmc_find+0x470/0x4d0 [system]
[<ffffffffc63097e8>] _bcm_ppd_frwrd_ipv4_mc_route_remove+0x208/0x470 [system]
[<ffffffffc630b0e0>] bcm_petra_ipmc_remove+0x4a0/0x500 [system]
[<ffffffffc31303dc>] drv_ipmc_bcm_del_l3entry+0x1dc/0x400 [system]
[<ffffffffc313abe0>] drv_ipmc_del_l2_group+0x150/0x4c0 [system]
[<ffffffffc3a91584>] drv_ipv4mc_del_l2_group+0xc4/0x170 [system]
[<ffffffffc3126a70>] DRV_IPV4MC_HandleMRouteChange+0xd20/0xe00 [system]
[<ffffffffc78b19ac>] M3DRV4_L2EntryToDrv+0x4c/0x80 [system]
[<ffffffffc78b2924>] M3DRV4_IOCTL_IPMsgToDrv+0x234/0x6d0 [system]
[<ffffffffc78d07e0>] L2MIOCTL_ProcIPMsgToDrv+0xa0/0x1c0 [system]
[<ffffffffc78d18cc>] MFIB_L2MIOCTL_CallBack+0x2bc/0x9d0 [system]
[<ffffffffc76d3d9c>] CIOCTL_Doit+0xfc/0x2b0 [system]
[<ffffffff80202f44>] stack_done_ra+0x0/0x1c

 

[TZ-TT-DSW-1.MAN.H3C12510F-probe]dis parity-error c 2 s 6

Jun 09 2022 17:48:53:430147:unit 1:name=IHP_ParityErrInt, id=468, index=0, block=0, unit=1, recurring_action=0 | nof_occurences=0001, cnt_overflowf=0x0, memory address=0x005a84d6 memory=IHP_MEM_590000, mem_id=3943, array element=3, index=1238 | EM Soft Recovery
 Error count 1. First logged at Jun 09 2022 17:48:53:430147.

 

[TZ-TT-DSW-1.MAN.H3C12510F-probe]view /proc/94/stack ch 2 s 6

[<ffffffff804ad9f0>] schedule+0x710/0x1050
[<ffffffffc3e92ba0>] mutex_lock_replacer+0x120/0x1e0 [system]
[<ffffffffc3e93234>] drv_sal_mutex_take+0x5d4/0x7f0 [system]
[<ffffffffc3e98774>] sal_config_get+0x114/0x170 [system]
[<ffffffffc3e53d24>] soc_property_get_str+0x214/0x520 [system]
[<ffffffffc3e58f3c>] soc_property_get+0x1c/0x50 [system]
[<ffffffffc45d6fe0>] arad_pp_frwrd_mact_is_dma_supported+0xa0/0x170 [system]
[<ffffffffc45862e0>] arad_pp_lem_access_parse_only+0xd0/0x12a0 [system]
[<ffffffffc45876f0>] arad_pp_lem_access_parse+0x240/0x300 [system]
[<ffffffffe3d2b018>] _arad_pp_frwrd_lem_get_block_unsafe+0xa68/0x17b0 [ksplice_lpu_xlp_1617106360_system_new]
[<ffffffffe3d2bff8>] _arad_pp_frwrd_mact_get_block_unsafe+0x298/0x470 [ksplice_lpu_xlp_1617106360_system_new]
[<ffffffffe3d2c2e4>] arad_pp_frwrd_mact_get_block_unsafe+0x114/0x210 [ksplice_lpu_xlp_1617106360_system_new]
[<ffffffffe3ca259c>] soc_ppd_frwrd_mact_get_block+0x39c/0x400 [ksplice_lpu_xlp_1617106360_system_new]
[<ffffffffe3d30684>] h3c_arad_pp_

Solution

The problem was resolved after the faulty card (chassis 2 slot 6) was rebooted.
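A minimal sketch of the repair and the follow-up check, reusing the probe command from the analysis above (the slot numbering follows this case; schedule the reboot in a maintenance window, since traffic through the card is interrupted):

reboot chassis 2 slot 6    # reboot only the faulty LPU; confirm at the prompt

After the card registers again, re-running dis parity-error c 2 s 6 in probe view should show no unrepaired IHP parity errors; rebooting the card reinitializes the forwarding chip, so the LEM entries corrupted by the parity error are rebuilt.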