
SN8600B CP Auto Reboot

Published 2020-03-02

Network Topology and Description

Brocade


Problem Description

SN8000

firmwareshow -v:
Slot Name       Appl     Primary/Secondary Versions               Status
--------------------------------------------------------------------------
  1  CP0        FOS      v8.2.1c                                  STANDBY
                         v8.2.1c                                 
  2  CP1        FOS      v8.2.1c                                  ACTIVE *
                         v8.2.1c                                 
2020/02/12-13:35:39, [PLAT-1001], 7616, SLOT 2 | CHASSIS, INFO, LJ132_E10_E132, CP1 resetting other CP (double reset may occur).


Analysis

CP1 Log:


2020/02/12-13:35:33:960658, [HAMK-5008], 8728/0, SLOT 2 | CHASSIS, WARNING, LJ132_E10_E132, heartbeat missed for 10 seconds!!, htbt.c, line: 717, comp:swapper, ltime:2020/02/12-13:35:33:960450

 

2020/02/12-13:35:39:966126, [HAMK-5008], 8729/0, SLOT 2 | CHASSIS, WARNING, LJ132_E10_E132, HA HTBT down, take active!, hasm_dev.c, line: 512, comp:swapper, ltime:2020/02/12-13:35:39:960470

 

2020/02/12-13:35:39:992944, [PLAT-5057], 8730/0, SLOT 2 | CHASSIS, INFO, LJ132_E10_E132, CP1: Enabling CP to be Active cp_misc_err 0x8 cp_fence 0x1 cp_hk_fence 0x8 misc_wr_ctrl 0x60 i2c_sel 0xa387b9e0, gen6_ctrl.c, line: 307, comp:swapper, ltime:2020/02/12-13:35:39:992884

 

2020/02/12-13:35:39:993035, [PLAT-5057], 8731/0, SLOT 2 | CHASSIS, INFO, LJ132_E10_E132, CP1: RESET OCP cp_misc_err 0x8 cp_fence 0x0 cp_hk_fence 0xc misc_wr_ctrl 0x60 i2c_sel 0x29002, gen6_ctrl.c, line: 329, comp:swapper, ltime:2020/02/12-13:35:39:992950

 

2020/02/12-13:35:39:993062, [PLAT-1001], 8732/7616, SLOT 2 | CHASSIS, INFO, LJ132_E10_E132, CP1 resetting other CP (double reset may occur)., modular_ctrl.c, line: 118, comp:swapper, ltime:2020/02/12-13:35:39:993002

 

2020/02/12-13:35:39:993123, [PLAT-5057], 8733/0, SLOT 2 | CHASSIS, INFO, LJ132_E10_E132, CP1: I"m the Master now! cp_misc_err 0x8 cp_fence 0x0 cp_hk_fence 0xc misc_wr_ctrl 0x60 i2c_sel 0x29002, gen6_ctrl.c, line: 348, comp:swapper, ltime:2020/02/12-13:35:39:993059

 

2020/02/12-13:35:40:095159, [PLAT-5048], 8734/0, SLOT 2 | CHASSIS, INFO, LJ132_E10_E132, CP1: CP Attention irq cp_mask 0xdd cp_status 0x42 intr_bitmap 0x2 cp_misc_err 0x18 cp_change 0x10 cp_change_mask 0xfe, modular_cpctrl_, line: 347, comp:swapper, ltime:2020/02/12-13:35:40:094915

 

2020/02/12-13:35:40:095205, [PLAT-5049], 8735/0, SLOT 2 | CHASSIS, INFO, LJ132_E10_E132, CP1: OCP control event detected reason 0x35 cp_fence 0xf0 cp_hk_fence 0xc, modular_cpctrl_, line: 264, comp:swapper, ltime:2020/02/12-13:35:40:095071

 

2020/02/12-13:35:40:160322, [PLAT-5074], 8736/0, SLOT 2 | CHASSIS, INFO, LJ132_E10_E132, SPDC Host init success spdc_status 0x80, rdy0=0x3f, rdy1=0x80, fen=0x0, hk_fen=0xc, ocp=0x10 , modular_pdc.c, line: 573, comp:swapper, ltime:2020/02/12-13:35:40:160258

 

2020/02/12-13:35:40:160546, [PLAT-5057], 8737/0, SLOT 2 | CHASSIS, INFO, LJ132_E10_E132, CP1: CP ready to be Active cp_misc_err 0x18 cp_fence 0xf0 cp_hk_fence 0xc misc_wr_ctrl 0x60 i2c_sel 0x29002, gen6_ctrl.c, line: 434, comp:swapper, ltime:2020/02/12-13:35:40:160427

 

 

2020/02/12-13:35:55, [EM-1033], 7635, SLOT 2 | CHASSIS, ERROR, LJ132_E10_E132, CP in Slot 1 set to faulty because CP ERROR asserted.

 

2020/02/12-13:35:55, [EM-1047], 7636, SLOT 2 | CHASSIS, INFO, LJ132_E10_E132, CP in slot 1 not faulty, CP ERROR deasserted.


CP0 Log:

2020/02/12-13:33:06:662269, [PLAT-5048], 28709/0, SLOT 1 | CHASSIS, INFO, LJ132_E10_E132, CP0: CP Attention irq cp_mask 0xff cp_status 0x40 intr_bitmap 0x0 cp_misc_err 0x8 cp_change 0x11 cp_change_mask 0xff, modular_cpctrl_, line: 347, comp:insmod, ltime:2020/02/12-13:32:32:856221

 

2020/02/12-13:33:06:662302, [PLAT-5053], 28710/0, SLOT 1 | CHASSIS, INFO, LJ132_E10_E132, CP0: OCP State Change event cp_change 0xff cp_change_mask 0x10 cp_misc_err 0x8, modular_cpctrl_, line: 181, comp:insmod, ltime:2020/02/12-13:32:32:856310

 

2020/02/12-13:33:06:662322, [PLAT-8002], 28711/0, SLOT 1 | CHASSIS, INFO, LJ132_E10_E132, platform_hasm_register, gen6_sysha.c, line: 309, comp:insmod, ltime:2020/02/12-13:32:32:856791

 

2020/02/12-13:33:06:662350, [PLAT-5071], 28712/0, SLOT 1 | CHASSIS, INFO, LJ132_E10_E132, fpga enable reset co-proccessor value   (16)., allegiance.c, line: 630, comp:insmod, ltime:2020/02/12-13:32:32:874614

 

2020/02/12-13:33:06:662369, [PLAT-5044], 28713/0, SLOT 1 | CHASSIS, INFO, LJ132_E10_E132, CP0: Platform Module initialized. CPLD Rev 0x6 0x80 pciFence 0x1 hkFence 0x8 miscWrite 0x20 miscRead 0x44, allegiance.c, line: 747, comp:insmod, ltime:2020/02/12-13:32:32:877080

 

2020/02/12-13:33:06:662395, [CHS-5001], 28714/0, SLOT 1 | CHASSIS, INFO, LJ132_E10_E132, Initialized the Chassis module., chassis_ent.c, line: 344, comp:insmod, ltime:2020/02/12-13:32:34:262390

 

2020/02/12-13:33:06:662421, [KTRC-5005], 28715/0, SLOT 1 | CHASSIS, WARNING, LJ132_E10_E132, traced: kboard_get_tbuff mid=109h already initialize, ras_ktrace.c, line: 317, comp:insmod, ltime:2020/02/12-13:32:36:119630

 

2020/02/12-13:33:06:662442, [KTRC-5005], 28716/0, SLOT 1 | CHASSIS, WARNING, LJ132_E10_E132, traced: kboard_get_tbuff mid=10ah already initialize, ras_ktrace.c, line: 317, comp:insmod, ltime:2020/02/12-13:32:36:119697

 

2020/02/12-13:33:06:662473, [HAM-1004], 28717/25884, SLOT 1 | CHASSIS, INFO, LJ132_E10_E132, Processor rebooted - Reset., reboot.c, line: 117, comp:hamd, ltime:2020/02/12-13:32:56:427355

From the logs, CP1 (slot 2) detected a missed heartbeat, took over as active, and reset the other CP, which is why CP0 rebooted. What caused the heartbeat to be missed?
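The takeover sequence can be read directly off the CP1 log timestamps. The following is a minimal Python sketch, not a vendor tool: the regular expression is inferred only from the log lines shown above, and the names `LINE_RE`, `parse_timestamp`, and `parse_line` are hypothetical. It extracts the timestamp and message ID from three of the CP1 lines and prints how long after the first heartbeat warning each event occurred.

```python
import re
from datetime import datetime

# Three CP1 log lines copied verbatim from the case above.
LOG_LINES = [
    "2020/02/12-13:35:33:960658, [HAMK-5008], 8728/0, SLOT 2 | CHASSIS, WARNING, LJ132_E10_E132, heartbeat missed for 10 seconds!!, htbt.c, line: 717, comp:swapper, ltime:2020/02/12-13:35:33:960450",
    "2020/02/12-13:35:39:966126, [HAMK-5008], 8729/0, SLOT 2 | CHASSIS, WARNING, LJ132_E10_E132, HA HTBT down, take active!, hasm_dev.c, line: 512, comp:swapper, ltime:2020/02/12-13:35:39:960470",
    "2020/02/12-13:35:39:993062, [PLAT-1001], 8732/7616, SLOT 2 | CHASSIS, INFO, LJ132_E10_E132, CP1 resetting other CP (double reset may occur)., modular_ctrl.c, line: 118, comp:swapper, ltime:2020/02/12-13:35:39:993002",
]

# Field layout assumed from the lines above:
# timestamp, [message ID], sequence, slot | area, severity, switch name, text
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}(?::\d{6})?), "
    r"\[(?P<msg_id>[A-Z0-9]+-\d+)\], [^,]+, "
    r"(?P<slot>SLOT \d+) \| [^,]+, (?P<severity>[A-Z]+), [^,]+, (?P<text>.*)$"
)


def parse_timestamp(ts):
    """Parse a log timestamp with or without the microsecond field."""
    fmt = "%Y/%m/%d-%H:%M:%S:%f" if ts.count(":") == 3 else "%Y/%m/%d-%H:%M:%S"
    return datetime.strptime(ts, fmt)


def parse_line(line):
    """Return a dict of parsed fields, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    event["time"] = parse_timestamp(event["ts"])
    return event


events = [e for e in (parse_line(line) for line in LOG_LINES) if e is not None]
start = events[0]["time"]
for event in events:
    offset = (event["time"] - start).total_seconds()
    print(f"+{offset:.6f}s {event['slot']} {event['msg_id']} {event['text'][:45]}")
```

On these three lines, CP1 declares the heartbeat down and takes active about 6 seconds after the first warning, and resets the other CP a few tens of milliseconds later.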


The customer is running the following every five minutes or so:
3/0;portstats64show 3/0;3/1;portstats64show 3/1;3/2;portstats64show 3/2;3/3;portstats64show 3/3;3/4;portstats64show 3/4;3/5;portstats64show 3/5;3/6;portstats64show 3/6;3/7;portstats64show 3/7;3/8;portstats64show 3/8;3/9;portstats64show 3/9;3/10;portstats64show 3/10;3/11;portstats64show 3/11;3/12;portstats64show 3/12;3/13;portstats64show 3/13;3/14;portstats64show 3/14;3/15;portstats64show 3/15;3/16;portstats64show 3/16;3/17;portstats64show 3/17;3/18;portstats64show 3/18;3/19;po
Not only is this a nested CLI command, which has known issues (see defect), it is also a string that contains numerous invalid commands. It is recommended to stop using scripts to run commands on the switch and to use SNMP traps instead to gather the requested information. At a minimum, the command must be changed so that it is no longer nested and no longer contains the invalid parts.
For example, the first part of the above nested command is the following:
3/0;portstats64show 3/0;3/1;portstats64show 3/1;3/2;portstats64show 3/2
Because of the ";" separators, this runs the following commands one right after the other:
3/0 <--- invalid command
portstats64show 3/0
3/1 <--- invalid command
portstats64show 3/1
3/2 <--- invalid command
portstats64show 3/2
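The breakdown above can be reproduced with a short Python sketch. This is an illustration only, not part of the vendor's analysis: the allow-list `KNOWN_COMMANDS` and the helper names `split_nested` and `classify` are hypothetical, and the allow-list contains only the one command name that appears in this case. It splits the nested string on ";" the way the switch CLI does, flags each fragment whose first word is not a known command, and keeps the valid fragments as separate commands.

```python
# Hypothetical allow-list: only the command name that appears in this case.
KNOWN_COMMANDS = {"portstats64show"}

# First part of the customer's nested command, as quoted above.
NESTED = "3/0;portstats64show 3/0;3/1;portstats64show 3/1;3/2;portstats64show 3/2"


def split_nested(command_line):
    """Split a ';'-chained command line into individual fragments."""
    return [part.strip() for part in command_line.split(";") if part.strip()]


def classify(command_line, known=KNOWN_COMMANDS):
    """Return (fragment, is_valid) pairs; a fragment is valid when its
    first word is in the allow-list."""
    result = []
    for fragment in split_nested(command_line):
        name = fragment.split()[0]
        result.append((fragment, name in known))
    return result


classified = classify(NESTED)
for fragment, is_valid in classified:
    marker = "" if is_valid else "  <--- invalid command"
    print(f"{fragment}{marker}")

# The un-nested form: each valid command kept as its own entry,
# to be issued one at a time instead of as a single ';' chain.
valid_commands = [fragment for fragment, is_valid in classified if is_valid]
```

Half of the fragments in the quoted string are bare port identifiers such as `3/0`, which are not commands. If a script is kept at all, only the entries in `valid_commands` should be sent, one per invocation.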

The customer runs a script roughly every 5 minutes that contains invalid commands. This triggers a firmware bug, which ultimately causes the CP to reboot.







Solution

Upgrade the firmware to 8.2.1J, 8.2.1d, or 8.2.2a, or stop the problematic script.
