Faulty SFP on a Fibre Channel HBA Causes Abnormal Detection of a Fibre Channel Tape Library

Liu Jun (刘军), level 7

Environment and Description

R4900 G3 / QLA2692 / CentOS Linux 7.6

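Problem Description

A faulty SFP on the Fibre Channel HBA caused the Fibre Channel tape library to be detected abnormally. Unplugging and re-plugging the fibre cable was accompanied by server reboots (see 2.2.3).

Analysis

1. Hardware checks

1.1. Hardware system health log

No anomalies; no events were logged.

1.2. Dynamic monitoring log

The excerpt below covers one of the reboots; it shows no anomalies.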

 

0 1 2022-04-07 11:50:55 2022-04-07 03:50:55 PDIndex(Front:6)----Inserted: PD 11(e1/s6) Info: enclPd=08, scsiType=0, portMap=00, sasAddr=578aa82cb193e006,0000000000000000

0 1 2022-04-07 11:50:55 2022-04-07 03:50:55 PDIndex(Front:0)----Dedicated Hot Spare created on PD 0d(e8/s0) (ded,rev,ac=1)

0 1 2022-04-07 11:50:55 2022-04-07 03:50:55 Controller operating temperature within normal range, full operation restored---CtrlIndex(2)

0 1 2022-04-07 11:50:56 2022-04-07 03:50:56 Time established as 04/07/22  3:50:24; (94 seconds since power on)---CtrlIndex(2)

0 0 2022-04-07 11:51:18 2022-04-07 03:51:18 SensorType: OS Boot, SensorName: System, EventType: Discrete, Event: boot completed - boot device not specified Boot completed - boot device not specified

0 0 2022-04-07 11:51:18 2022-04-07 03:51:18 EventType: OEM, Event: ME Firmware Health Event---Event data:0xa0 0xe 0x2, Data2: 14, Data3: 2 ME Firmware Health Event---Event data:0xa0 0xe 0x2

0 0 2022-04-07 11:51:49 2022-04-07 03:51:49 EventType: System ACPI Power State, Event: LPC Reset occurred LPC Reset occurred

0 0 2022-04-07 11:51:50 2022-04-07 03:51:50 EventType: System Boot / Restart, Event: System Restart, Data2: 48 System restart---Unknown cause

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 Firmware initialization started (PCI ID 005d/1000/9361/1000)---CtrlIndex(2)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 Firmware version 4.680.00-8551---CtrlIndex(2)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 Battery Present---CtrlIndex(2)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 Package version 24.21.0-0146---CtrlIndex(2)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 Board Revision 11C---CtrlIndex(2)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 Battery charge complete---CtrlIndex(2)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 Battery temperature is normal---CtrlIndex(2)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 Enclosure (SES) discovered on PD 08(e1/s0)---CtrlIndex(2)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 Enclosure PD 08(e1/s0) communication restored---CtrlIndex(2)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 Enclosure PD 08(e1/s0) phy bad for slot 12---CtrlIndex(2)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 Enclosure PD 08(e1/s0) phy bad for slot 13---CtrlIndex(2)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 Enclosure PD 08(e1/s0) phy bad for slot 14---CtrlIndex(2)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 Enclosure PD 08(e1/s0) phy bad for slot 15---CtrlIndex(2)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 Enclosure PD 08(e1/s0) phy bad for slot 16---CtrlIndex(2)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 Enclosure PD 08(e1/s0) phy bad for slot 17---CtrlIndex(2)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 Enclosure PD 08(e1/s0) phy bad for slot 18---CtrlIndex(2)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 Enclosure PD 08(e1/s0) phy bad for slot 19---CtrlIndex(2)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 Inserted: PD 08(e8/s255)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 Inserted: PD 08(e1/s0) Info: enclPd=08, scsiType=d, portMap=00, sasAddr=578aa82cb193e07e,0000000000000000

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 PDIndex(Rear:9)----Inserted: PD 09(e8/s28)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 PDIndex(Rear:9)----Inserted: PD 09(e1/s28) Info: enclPd=08, scsiType=0, portMap=00, sasAddr=578aa82cb193e01c,0000000000000000

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 PDIndex(Rear:10)----Inserted: PD 0a(e8/s29)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 PDIndex(Rear:10)----Inserted: PD 0a(e1/s29) Info: enclPd=08, scsiType=0, portMap=00, sasAddr=578aa82cb193e01d,0000000000000000

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 PDIndex(Front:2)----Inserted: PD 0b(e8/s2)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 PDIndex(Front:2)----Inserted: PD 0b(e1/s2) Info: enclPd=08, scsiType=0, portMap=00, sasAddr=578aa82cb193e002,0000000000000000

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 PDIndex(Front:4)----Inserted: PD 0c(e8/s4)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 PDIndex(Front:4)----Inserted: PD 0c(e1/s4) Info: enclPd=08, scsiType=0, portMap=00, sasAddr=578aa82cb193e004,0000000000000000

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 PDIndex(Front:0)----Inserted: PD 0d(e8/s0)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 PDIndex(Front:0)----Inserted: PD 0d(e1/s0) Info: enclPd=08, scsiType=0, portMap=00, sasAddr=578aa82cb193e000,0000000000000000

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 PDIndex(Front:5)----Inserted: PD 0e(e8/s5)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 PDIndex(Front:5)----Inserted: PD 0e(e1/s5) Info: enclPd=08, scsiType=0, portMap=00, sasAddr=578aa82cb193e005,0000000000000000

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 PDIndex(Front:3)----Inserted: PD 0f(e8/s3)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 PDIndex(Front:3)----Inserted: PD 0f(e1/s3) Info: enclPd=08, scsiType=0, portMap=00, sasAddr=578aa82cb193e003,0000000000000000

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 PDIndex(Front:1)----Inserted: PD 10(e8/s1)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 PDIndex(Front:1)----Inserted: PD 10(e1/s1) Info: enclPd=08, scsiType=0, portMap=00, sasAddr=578aa82cb193e001,0000000000000000

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 PDIndex(Front:6)----Inserted: PD 11(e8/s6)

0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 PDIndex(Front:6)----Inserted: PD 11(e1/s6) Info: enclPd=08, scsiType=0, portMap=00, sasAddr=578aa82cb193e006,0000000000000000

0 1 2022-04-07 11:53:51 2022-04-07 03:53:51 PDIndex(Front:0)----Dedicated Hot Spare created on PD 0d(e8/s0) (ded,rev,ac=1)

0 1 2022-04-07 11:53:51 2022-04-07 03:53:51 Controller operating temperature within normal range, full operation restored---CtrlIndex(2)

0 1 2022-04-07 11:53:51 2022-04-07 03:53:51 Time established as 04/07/22  3:53:40; (94 seconds since power on)---CtrlIndex(2)

0 0 2022-04-07 11:54:35 2022-04-07 03:54:35 EventType: OEM, Event: ME Firmware Health Event---Event data:0xa0 0xe 0x2, Data2: 14, Data3: 2 ME Firmware Health Event---Event data:0xa0 0xe 0x2

0 0 2022-04-07 11:54:35 2022-04-07 03:54:35 SensorType: OS Boot, SensorName: System, EventType: Discrete, Event: boot completed - boot device not specified Boot completed - boot device not specified
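The restart events in a monitoring log like the one above can be picked out with a small script. The following is a minimal sketch, not a vendor tool. It assumes each record has the layout "two numeric fields, two timestamps, message", and it assumes the two timestamps are local time and UTC (they differ by exactly 8 hours in the excerpt). The names `parse_line` and `find_restarts` are hypothetical.

```python
import re
from datetime import datetime

# Assumed record layout: "<n> <n> <timestamp 1> <timestamp 2> <message>".
LINE_RE = re.compile(
    r"^(\d+) (\d+) "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(.*)$"
)
TIME_FMT = "%Y-%m-%d %H:%M:%S"


def parse_line(line):
    """Return a dict for one log record, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "first_time": datetime.strptime(m.group(3), TIME_FMT),
        "second_time": datetime.strptime(m.group(4), TIME_FMT),
        "message": m.group(5),
    }


def find_restarts(lines):
    """Keep only the records whose message reports a system restart."""
    events = [e for e in map(parse_line, lines) if e is not None]
    return [e for e in events if "System Restart" in e["message"]]


# Three records copied from the excerpt above.
sample = [
    "0 0 2022-04-07 11:51:49 2022-04-07 03:51:49 EventType: System ACPI Power State, Event: LPC Reset occurred LPC Reset occurred",
    "0 0 2022-04-07 11:51:50 2022-04-07 03:51:50 EventType: System Boot / Restart, Event: System Restart, Data2: 48 System restart---Unknown cause",
    "0 1 2022-04-07 11:53:50 2022-04-07 03:53:50 Firmware version 4.680.00-8551---CtrlIndex(2)",
]

restarts = find_restarts(sample)
for event in restarts:
    print(event["first_time"], event["message"])
```

Comparing the restart timestamps found this way with the kernel crash times from the OS side is one way to confirm that the hardware log only records the consequence (the restart), not its cause.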

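1.3. Low-level hardware logs

No anomalies.

1.4. QLA2692 HBA firmware

The current version shows no anomalies.

2. System checks

2.2.1. System driver version

The driver version shows no anomalies.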

 

filename:       /lib/modules/3.10.0-957.el7.x86_64/extra/qlgc-qla2xxx/qla2xxx.ko

firmware:       ql2700_fw.bin

firmware:       ql8300_fw.bin

firmware:       ql2600_fw.bin

firmware:       ql2500_fw.bin

firmware:       ql2400_fw.bin

firmware:       ql2322_fw.bin

firmware:       ql2300_fw.bin

firmware:       ql2200_fw.bin

firmware:       ql2100_fw.bin

version:        10.01.00.33.07.6-k

license:        GPL

description:    Cavium Fibre Channel HBA Driver

author:         QLogic Corporation
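Output in this "key: value" form can be checked programmatically. The sketch below is illustrative only: `parse_modinfo` is a hypothetical helper, and the sample text is a shortened copy of the listing above. It also flags that the module path sits under `extra/`, which usually indicates an out-of-tree driver build rather than the one shipped with the kernel.

```python
def parse_modinfo(text):
    """Collect 'key: value' pairs; repeated keys (such as firmware) become lists."""
    info = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        info.setdefault(key.strip(), []).append(value.strip())
    return info


# Shortened copy of the driver information listed above.
sample = """\
filename:       /lib/modules/3.10.0-957.el7.x86_64/extra/qlgc-qla2xxx/qla2xxx.ko
firmware:       ql2700_fw.bin
firmware:       ql2600_fw.bin
version:        10.01.00.33.07.6-k
license:        GPL
"""

info = parse_modinfo(sample)
# Assumption: a path containing /extra/ means the module was installed out of tree.
out_of_tree = "/extra/" in info["filename"][0]
print(info["version"][0], out_of_tree)
```

An out-of-tree module is consistent with the `qla2xxx(OE)` entry in the crash log in 2.2.3, where `O` marks an out-of-tree module and `E` an unsigned one.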

 

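2.2.2. HBA firmware version

The firmware version is "9.07.00"; no anomalies.

2.2.3. According to the system logs, each reboot triggered by unplugging and re-plugging the fibre cable is accompanied by a vmcore-dmesg file. An excerpt follows: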

[  927.020984] qla2xxx [0000:d8:00.1]-5090:16: LOOP INIT ERROR (2003).

[  927.022568] qla2xxx [0000:d8:00.1]-d011:16: -> fwdt0 running...

[  927.038189] qla2xxx [0000:d8:00.1]-d015:16: -> Firmware dump saved to buffer (16/ffffa67cc8c6a000) <f>

[  927.429050] qla2xxx [0000:d8:00.1]-00af:16: Performing ISP error recovery - ha=ffff9b5f72e06000.

[  927.436162] qla2xxx [0000:d8:00.1]-0075:16: ZIO mode 6 enabled; timer delay (200 us).

[  932.878261] qla2xxx [0000:d8:00.1]-5090:16: LOOP INIT ERROR (2003).

[  932.879940] qla2xxx [0000:d8:00.1]-d01f:16: -> Firmware already dumped (ffffa67cc8c6a000) -- ignoring request

[  934.935491] qla2xxx [0000:d8:00.1]-5090:16: LOOP INIT ERROR (200b).

[  934.937253] qla2xxx [0000:d8:00.1]-d01f:16: -> Firmware already dumped (ffffa67cc8c6a000) -- ignoring request

[  936.981735] qla2xxx [0000:d8:00.1]-5090:16: LOOP INIT ERROR (200b).

[  936.983557] qla2xxx [0000:d8:00.1]-d01f:16: -> Firmware already dumped (ffffa67cc8c6a000) -- ignoring request

[  937.006489] qla2xxx [0000:d8:00.1]-500a:16: LOOP UP detected (8 Gbps).

[  937.072687] qla2xxx [0000:d8:00.1]-1005:16: Cmd 0x5d aborted with timeout since ISP Abort is pending

[  937.072701] qla2xxx [0000:d8:00.1]-1005:16: Cmd 0x7c aborted with timeout since ISP Abort is pending

[  937.072741] BUG: unable to handle kernel NULL pointer dereference at 00000000000001a0

[  937.074647] IP: [<ffffffffc06c8195>] qla2x00_free_fcport+0x15/0x150 [qla2xxx]

[  937.076539] PGD 0 

[  937.078362] Oops: 0000 [#1] SMP 

[  937.080162] Modules linked in: binfmt_misc target_core_user target_core_mod uio ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter iTCO_wdt iTCO_vendor_support skx_edac coretemp intel_rapl iosf_mbi kvm_intel kvm vfat fat irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd ipmi_ssif pcspkr ses enclosure scsi_transport_sas joydev sg mei_me i2c_i801 lpc_ich mei wmi ipmi_si ipmi_devintf

[  937.089407]  ipmi_msghandler acpi_power_meter ip_tables xfs sd_mod crc_t10dif crct10dif_generic qla2xxx(OE) crct10dif_pclmul crct10dif_common crc32c_intel ast i2c_algo_bit drm_kms_helper bnx2x nvme_fc syscopyarea sysfillrect nvme_fabrics sysimgblt fb_sys_fops nvme_core ttm scsi_transport_fc i40e scsi_tgt drm ahci libahci mdio ptp pps_core megaraid_sas libcrc32c libata drm_panel_orientation_quirks nfit libnvdimm dm_mirror dm_region_hash dm_log dm_mod

[  937.094758] CPU: 5 PID: 9597 Comm: qla2xxx_16_dpc Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.el7.x86_64 #1

[  937.096597] Hardware name: N/A N/A/RS33M2C9S, BIOS 2.00.48 03/10/2021

[  937.097525] task: ffff9b66d2e46180 ti: ffff9b66c1dbc000 task.ti: ffff9b66c1dbc000

[  937.098460] RIP: 0010:[<ffffffffc06c8195>]  [<ffffffffc06c8195>] qla2x00_free_fcport+0x15/0x150 [qla2xxx]

[  937.099426] RSP: 0018:ffff9b66c1dbfd20  EFLAGS: 00010282

[  937.100376] RAX: 0000000000000100 RBX: 0000000000000000 RCX: ffffffffc0758ad0

[  937.101336] RDX: 0000000000000000 RSI: ffff9b66dd530740 RDI: 0000000000000000

[  937.102286] RBP: ffff9b66c1dbfd48 R08: 0000000000000100 R09: 0000000000000001

[  937.103226] R10: 0000000000000a99 R11: ffff9b66c1dbf7be R12: ffff9b5f72e06000

[  937.104159] R13: 0000000000000100 R14: 0000000000000100 R15: ffff9b66dd5307d0

[  937.105081] FS:  0000000000000000(0000) GS:ffff9b5edcb40000(0000) knlGS:0000000000000000

[  937.105913] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

[  937.106724] CR2: 00000000000001a0 CR3: 000000073ea76000 CR4: 00000000007607e0

[  937.107532] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000

[  937.108345] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

[  937.109126] PKRU: 00000000

[  937.109891] Call Trace:

[  937.110661]  [<ffffffffc06cc272>] qla2x00_loop_resync+0x722/0x1120 [qla2xxx]

[  937.111456]  [<ffffffffc06b957a>] qla2x00_do_dpc+0x9fa/0xbc0 [qla2xxx]

[  937.112257]  [<ffffffffc06b8b80>] ? qla24xx_process_purex_list+0xd0/0xd0 [qla2xxx]

[  937.113069]  [<ffffffff89ac1c31>] kthread+0xd1/0xe0

[  937.113884]  [<ffffffff89ac1b60>] ? insert_kthread_work+0x40/0x40

[  937.114712]  [<ffffffff8a174c1d>] ret_from_fork_nospec_begin+0x7/0x21

[  937.115557]  [<ffffffff89ac1b60>] ? insert_kthread_work+0x40/0x40

[  937.116386] Code: 84 00 00 00 00 00 e8 2b f2 3c c9 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 48 89 fb 48 83 ec 10 <48> 8b 97 a0 01 00 00 48 85 d2 74 77 48 8b 47 10 48 8b 8f a8 01 

[  937.118180] RIP  [<ffffffffc06c8195>] qla2x00_free_fcport+0x15/0x150 [qla2xxx]

[  937.119040]  RSP <ffff9b66c1dbfd20>

[  937.119879] CR2: 00000000000001a0
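The sequence of events in the excerpt above (repeated loop initialization errors, link up, then the kernel oops) can be summarized with a short script. This is a minimal sketch under stated assumptions: the `summarize` function is hypothetical, it only looks for the three message patterns that appear in this excerpt, and the sample contains six lines copied from it.

```python
import re

# Tolerates variable spacing inside the timestamp brackets.
DMESG_RE = re.compile(r"^\[\s*(\d+\.\d+)\s*\]\s*(.*)$")
LOOP_ERR_RE = re.compile(r"LOOP INIT ERROR \((\w+)\)")


def summarize(lines):
    """Collect loop-init errors, the first LOOP UP, and the first BUG line."""
    summary = {"loop_init_errors": [], "loop_up": None, "oops": None}
    for raw in lines:
        m = DMESG_RE.match(raw.strip())
        if m is None:
            continue
        ts, msg = float(m.group(1)), m.group(2)
        err = LOOP_ERR_RE.search(msg)
        if err:
            summary["loop_init_errors"].append((ts, err.group(1)))
        elif "LOOP UP detected" in msg and summary["loop_up"] is None:
            summary["loop_up"] = ts
        elif msg.startswith("BUG:") and summary["oops"] is None:
            summary["oops"] = ts
    return summary


sample = [
    "[  927.020984] qla2xxx [0000:d8:00.1]-5090:16: LOOP INIT ERROR (2003).",
    "[  932.878261] qla2xxx [0000:d8:00.1]-5090:16: LOOP INIT ERROR (2003).",
    "[  934.935491] qla2xxx [0000:d8:00.1]-5090:16: LOOP INIT ERROR (200b).",
    "[  936.981735] qla2xxx [0000:d8:00.1]-5090:16: LOOP INIT ERROR (200b).",
    "[  937.006489] qla2xxx [0000:d8:00.1]-500a:16: LOOP UP detected (8 Gbps).",
    "[  937.072741] BUG: unable to handle kernel NULL pointer dereference at 00000000000001a0",
]

result = summarize(sample)
print(result)
```

On this sample the summary holds four loop initialization errors within about ten seconds, followed by a LOOP UP and then the oops within the same second, which matches a link that repeatedly fails to initialize on port 0000:d8:00.1.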

 

 

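Per the log above, the oops is a NULL pointer dereference in qla2x00_free_fcport, reached from qla2x00_loop_resync in the qla2xxx DPC thread after repeated LOOP INIT ERROR messages, so the call trace still points to qla2xxx. Since the checks above found no anomalies in the HBA firmware or driver versions, the investigation continued with the HBA itself.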
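The fault address in the oops can be cross-checked against the "Code:" line of the log, where the byte in angle brackets marks the faulting instruction. The sketch below is a worked check, not a general x86 disassembler: it decodes only the single encoding that appears here (REX.W, opcode 8B, ModRM with a 32-bit displacement and no SIB byte). Both function names are hypothetical, and the sample is a shortened copy of the "Code:" line.

```python
def faulting_bytes(code_line):
    """Return the bytes from the '<xx>' marker onward in an oops 'Code:' line."""
    tokens = code_line.split("Code:", 1)[1].split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return [int(t.strip("<>"), 16) for t in tokens[start:]]


def decode_mov_disp32(b):
    """Decode only 'REX.W 8B /r' with mod=10 and no SIB: mov r64, [base+disp32]."""
    regs = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
    if len(b) < 7 or b[0] != 0x48 or b[1] != 0x8B:
        return None
    modrm = b[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:  # rm=100 would mean a SIB byte follows
        return None
    disp = int.from_bytes(bytes(b[3:7]), "little", signed=True)
    return regs[reg], regs[rm], disp


# Shortened copy of the "Code:" line from the excerpt above.
code_line = (
    "[  937.116386] Code: 55 48 89 e5 41 55 41 54 53 48 89 fb 48 83 ec 10 "
    "<48> 8b 97 a0 01 00 00 48 85 d2 74 77"
)

decoded = decode_mov_disp32(faulting_bytes(code_line))
dest, base, disp = decoded
rdi = 0x0  # RDI value from the register dump in the oops
print(f"mov {dest}, [{base}+{hex(disp)}] -> fault address {hex(rdi + disp)}")
```

The decoded instruction reads from offset 0x1a0 of the pointer in RDI. RDI carries the first function argument on x86-64 Linux, and the register dump shows it as zero, so the computed address equals the CR2 value 0x1a0 reported by the oops. In other words, qla2x00_free_fcport appears to have been called with a NULL argument, presumably the port structure it was asked to free.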

 

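2.2.4. Check the live state of the HBA with the HBA tool "QConvergeConsole"

2.2.4.1. Use the command "# qaucli -z" to check whether the HBA firmware is in effect and to view the related parameters

Checked remotely on site; no anomalies.

2.2.4.2. Use the command "# qaucli -pr fc -dm all general" to view the status of the HBA's SFP modules

The SFP on one port was found to have a transmit optical power below the critical threshold of 0.1259 mW. Reseating the SFP did not help. After the SFP was replaced, the port returned to normal.

Repeated plug and unplug tests afterwards did not reproduce the problem.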
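Optical power is often quoted in dBm rather than mW, so it helps to convert the threshold before comparing it with values from other tools. The sketch below shows the standard conversion, dBm = 10 * log10(P / 1 mW), and a simple threshold check. The function names are hypothetical, and the measured value of 0.05 mW is an invented example, since the case does not record the actual reading.

```python
import math

THRESHOLD_MW = 0.1259  # critical transmit power threshold quoted in this case


def mw_to_dbm(mw):
    """Convert optical power from milliwatts to dBm."""
    return 10.0 * math.log10(mw)


def dbm_to_mw(dbm):
    """Convert optical power from dBm to milliwatts."""
    return 10.0 ** (dbm / 10.0)


def tx_power_ok(measured_mw, threshold_mw=THRESHOLD_MW):
    """True when the measured transmit power is at or above the threshold."""
    return measured_mw >= threshold_mw


threshold_dbm = mw_to_dbm(THRESHOLD_MW)
example_mw = 0.05  # hypothetical reading, not taken from the case
print(f"threshold = {threshold_dbm:.2f} dBm")
print(f"{example_mw} mW = {mw_to_dbm(example_mw):.2f} dBm, ok = {tx_power_ok(example_mw)}")
```

The threshold of 0.1259 mW corresponds to roughly -9 dBm, so a port reporting transmit power below about -9 dBm would fall under the same critical limit.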

 

 

 

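Solution

### 3. Conclusion

3.1. Optical power degradation of the SFP in the host's Fibre Channel HBA caused this fault.

3.2. No anomalies were found in the host's other hardware.

3.3. No system, firmware, or driver anomaly was found to have caused this fault.

### 4. Recommendation

4.1. Replace the faulty SFP module on the HBA.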
