Brocade
SN8000
firmwareshow -v:
Slot Name     Appl     Primary/Secondary Versions               Status
--------------------------------------------------------------------------
  1  CP0      FOS      v8.2.1c                                  STANDBY
                       v8.2.1c
  2  CP1      FOS      v8.2.1c                                  ACTIVE *
                       v8.2.1c
2020/02/12-13:35:39, [PLAT-1001], 7616, SLOT 2 | CHASSIS, INFO, LJ132_E10_E132, CP1 resetting other CP (double reset may occur).
CP1 Log:
2020/02/12-13:35:33:960658, [HAMK-5008], 8728/0, SLOT 2 | CHASSIS, WARNING, LJ132_E10_E132, heartbeat missed for 10 seconds!!, htbt.c, line: 717, comp:swapper, ltime:2020/02/12-13:35:33:960450
2020/02/12-13:35:39:966126, [HAMK-5008], 8729/0, SLOT 2 | CHASSIS, WARNING, LJ132_E10_E132, HA HTBT down, take active!, hasm_dev.c, line: 512, comp:swapper, ltime:2020/02/12-13:35:39:960470
2020/02/12-13:35:39:992944, [PLAT-5057], 8730/0, SLOT 2 | CHASSIS, INFO, LJ132_E10_E132, CP1: Enabling CP to be Active cp_misc_err 0x8 cp_fence 0x1 cp_hk_fence 0x8 misc_wr_ctrl 0x60 i2c_sel 0xa387b9e0, gen6_ctrl.c, line: 307, comp:swapper, ltime:2020/02/12-13:35:39:992884
2020/02/12-13:35:39:993035, [PLAT-5057], 8731/0, SLOT 2 | CHASSIS, INFO, LJ132_E10_E132, CP1: RESET OCP cp_misc_err 0x8 cp_fence 0x0 cp_hk_fence 0xc misc_wr_ctrl 0x60 i2c_sel 0x29002, gen6_ctrl.c, line: 329, comp:swapper, ltime:2020/02/12-13:35:39:992950
2020/02/12-13:35:39:993062, [PLAT-1001], 8732/7616, SLOT 2 | CHASSIS, INFO, LJ132_E10_E132, CP1 resetting other CP (double reset may occur)., modular_ctrl.c, line: 118, comp:swapper, ltime:2020/02/12-13:35:39:993002
2020/02/12-13:35:39:993123, [PLAT-5057], 8733/0, SLOT 2 | CHASSIS, INFO, LJ132_E10_E132, CP1: I"m the Master now! cp_misc_err 0x8 cp_fence 0x0 cp_hk_fence 0xc misc_wr_ctrl 0x60 i2c_sel 0x29002, gen6_ctrl.c, line: 348, comp:swapper, ltime:2020/02/12-13:35:39:993059
2020/02/12-13:35:40:095159, [PLAT-5048], 8734/0, SLOT 2 | CHASSIS, INFO, LJ132_E10_E132, CP1: CP Attention irq cp_mask 0xdd cp_status 0x42 intr_bitmap 0x2 cp_misc_err 0x18 cp_change 0x10 cp_change_mask 0xfe, modular_cpctrl_, line: 347, comp:swapper, ltime:2020/02/12-13:35:40:094915
2020/02/12-13:35:40:095205, [PLAT-5049], 8735/0, SLOT 2 | CHASSIS, INFO, LJ132_E10_E132, CP1: OCP control event detected reason 0x35 cp_fence 0xf0 cp_hk_fence 0xc, modular_cpctrl_, line: 264, comp:swapper, ltime:2020/02/12-13:35:40:095071
2020/02/12-13:35:40:160322, [PLAT-5074], 8736/0, SLOT 2 | CHASSIS, INFO, LJ132_E10_E132, SPDC Host init success spdc_status 0x80, rdy0=0x3f, rdy1=0x80, fen=0x0, hk_fen=0xc, ocp=0x10 , modular_pdc.c, line: 573, comp:swapper, ltime:2020/02/12-13:35:40:160258
2020/02/12-13:35:40:160546, [PLAT-5057], 8737/0, SLOT 2 | CHASSIS, INFO, LJ132_E10_E132, CP1: CP ready to be Active cp_misc_err 0x18 cp_fence 0xf0 cp_hk_fence 0xc misc_wr_ctrl 0x60 i2c_sel 0x29002, gen6_ctrl.c, line: 434, comp:swapper, ltime:2020/02/12-13:35:40:160427
2020/02/12-13:35:55, [EM-1033], 7635, SLOT 2 | CHASSIS, ERROR, LJ132_E10_E132, CP in Slot 1 set to faulty because CP ERROR asserted.
2020/02/12-13:35:55, [EM-1047], 7636, SLOT 2 | CHASSIS, INFO, LJ132_E10_E132, CP in slot 1 not faulty, CP ERROR deasserted.
CP0 Log:
2020/02/12-13:33:06:662269, [PLAT-5048], 28709/0, SLOT 1 | CHASSIS, INFO, LJ132_E10_E132, CP0: CP Attention irq cp_mask 0xff cp_status 0x40 intr_bitmap 0x0 cp_misc_err 0x8 cp_change 0x11 cp_change_mask 0xff, modular_cpctrl_, line: 347, comp:insmod, ltime:2020/02/12-13:32:32:856221
2020/02/12-13:33:06:662302, [PLAT-5053], 28710/0, SLOT 1 | CHASSIS, INFO, LJ132_E10_E132, CP0: OCP State Change event cp_change 0xff cp_change_mask 0x10 cp_misc_err 0x8, modular_cpctrl_, line: 181, comp:insmod, ltime:2020/02/12-13:32:32:856310
2020/02/12-13:33:06:662322, [PLAT-8002], 28711/0, SLOT 1 | CHASSIS, INFO, LJ132_E10_E132, platform_hasm_register, gen6_sysha.c, line: 309, comp:insmod, ltime:2020/02/12-13:32:32:856791
2020/02/12-13:33:06:662350, [PLAT-5071], 28712/0, SLOT 1 | CHASSIS, INFO, LJ132_E10_E132, fpga enable reset co-proccessor value (16)., allegiance.c, line: 630, comp:insmod, ltime:2020/02/12-13:32:32:874614
2020/02/12-13:33:06:662369, [PLAT-5044], 28713/0, SLOT 1 | CHASSIS, INFO, LJ132_E10_E132, CP0: Platform Module initialized. CPLD Rev 0x6 0x80 pciFence 0x1 hkFence 0x8 miscWrite 0x20 miscRead 0x44, allegiance.c, line: 747, comp:insmod, ltime:2020/02/12-13:32:32:877080
2020/02/12-13:33:06:662395, [CHS-5001], 28714/0, SLOT 1 | CHASSIS, INFO, LJ132_E10_E132, Initialized the Chassis module., chassis_ent.c, line: 344, comp:insmod, ltime:2020/02/12-13:32:34:262390
2020/02/12-13:33:06:662421, [KTRC-5005], 28715/0, SLOT 1 | CHASSIS, WARNING, LJ132_E10_E132, traced: kboard_get_tbuff mid=109h already initialize, ras_ktrace.c, line: 317, comp:insmod, ltime:2020/02/12-13:32:36:119630
2020/02/12-13:33:06:662442, [KTRC-5005], 28716/0, SLOT 1 | CHASSIS, WARNING, LJ132_E10_E132, traced: kboard_get_tbuff mid=10ah already initialize, ras_ktrace.c, line: 317, comp:insmod, ltime:2020/02/12-13:32:36:119697
2020/02/12-13:33:06:662473, [HAM-1004], 28717/25884, SLOT 1 | CHASSIS, INFO, LJ132_E10_E132, Processor rebooted - Reset., reboot.c, line: 117, comp:hamd, ltime:2020/02/12-13:32:56:427355
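From the logs, CP1 detected that the heartbeat from CP0 had been missed for 10 seconds, took over as the active CP, and reset CP0, which is why the CP rebooted. What caused the heartbeat to be lost?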
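To make the failover sequence easier to follow, the log entries can be filtered down to the heartbeat and reset messages. The following is a minimal Python sketch, not a Brocade tool. It assumes the comma-separated field layout seen in the excerpts above (timestamp, message ID, sequence number, slot, severity, switch name, message). The names `LogEntry`, `parse_entry`, and `failover_events` are hypothetical, and the default message IDs are simply the ones that appear in this case, not a complete list.

```python
from typing import List, NamedTuple, Optional, Sequence


class LogEntry(NamedTuple):
    timestamp: str
    msg_id: str
    slot: str
    severity: str
    message: str


def parse_entry(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None for headers or unrecognized lines."""
    # Assumed layout: timestamp, [MSG-ID], sequence, slot, severity, switch, message
    parts = line.strip().split(", ", 6)
    if len(parts) < 7 or not parts[1].startswith("["):
        return None
    timestamp, msg_id, _seq, slot, severity, _switch, message = parts
    return LogEntry(timestamp, msg_id.strip("[]"), slot, severity, message)


def failover_events(
    lines: Sequence[str],
    ids: Sequence[str] = ("HAMK-5008", "PLAT-1001"),  # IDs taken from this case only
) -> List[LogEntry]:
    """Keep only the entries whose message ID is in `ids`, in original order."""
    entries = (parse_entry(line) for line in lines)
    return [e for e in entries if e is not None and e.msg_id in ids]


if __name__ == "__main__":
    sample = [
        "CP1 Log:",
        "2020/02/12-13:35:33:960658, [HAMK-5008], 8728/0, SLOT 2 | CHASSIS, "
        "WARNING, LJ132_E10_E132, heartbeat missed for 10 seconds!!, htbt.c, "
        "line: 717, comp:swapper, ltime:2020/02/12-13:35:33:960450",
    ]
    for event in failover_events(sample):
        print(event.timestamp, event.msg_id, event.slot, event.message)
```

Run against the CP1 log, this would list the missed-heartbeat warning, the "take active" decision, and the reset of the other CP in time order.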
The customer is running the following roughly every five minutes:
3/0;portstats64show 3/0;3/1;portstats64show 3/1;3/2;portstats64show
3/2;3/3;portstats64show 3/3;3/4;portstats64show 3/4;3/5;portstats64show 3/5;3/6;portstats64show
3/6;3/7;portstats64show 3/7;3/8;portstats64show 3/8;3/9;portstats64show
3/9;3/10;portstats64show 3/10;3/11;portstats64show 3/11;3/12;portstats64show
3/12;3/13;portstats64show 3/13;3/14;portstats64show 3/14;3/15;portstats64show
3/15;3/16;portstats64show 3/16;3/17;portstats64show 3/17;3/18;portstats64show
3/18;3/19;po
Not only is this a nested CLI command, which has known issues (see defect), it
is also a string that contains numerous invalid commands. It is recommended to
stop using scripts to run commands on the switch and to use SNMP traps instead
to gather the requested information. At a minimum, the command must be changed
so that it is no longer nested and all of the invalid parts are removed.
For example, the first part of the above nested command is the following:
3/0;portstats64show 3/0;3/1;portstats64show 3/1;3/2;portstats64show 3/2
Because of the ";" separators, this runs the following commands one right after
the other:
3/0 <--- invalid command
portstats64show 3/0
3/1 <--- invalid command
portstats64show 3/1
3/2 <--- invalid command
portstats64show 3/2
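The splitting shown above can be reproduced with a short Python sketch, which may help when auditing a script before it is run against a switch. This is illustrative only: the function names `split_nested_command` and `valid_commands` are hypothetical, and the rule that a bare slot/port token such as 3/0 is not a command is an assumption drawn from the example above, not a full FOS command validator.

```python
import re

# Assumption: a token consisting only of "slot/port" (e.g. "3/0") is not a command.
BARE_PORT = re.compile(r"^\d+/\d+$")


def split_nested_command(line):
    """Split a ';'-separated CLI string into (token, looks_valid) pairs."""
    results = []
    for token in line.split(";"):
        token = token.strip()
        if not token:
            continue  # skip empty pieces, e.g. from a trailing ';'
        results.append((token, not BARE_PORT.match(token)))
    return results


def valid_commands(line):
    """Return only the tokens that are not bare slot/port addresses."""
    return [token for token, ok in split_nested_command(line) if ok]


if __name__ == "__main__":
    nested = "3/0;portstats64show 3/0;3/1;portstats64show 3/1;3/2;portstats64show 3/2"
    for token, ok in split_nested_command(nested):
        marker = "" if ok else "<--- invalid command"
        print(f"{token:<24}{marker}")
```

`valid_commands` returns the portstats64show entries on their own, so each one could then be issued as a separate, non-nested command.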
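The customer runs a script every 5 minutes that contains invalid commands. This triggers a firmware bug, which ultimately causes the CP reboot.
Upgrade the firmware to 8.2.1J, 8.2.1d, or 8.2.2a, or stop the problematic script.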