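An MSR2600-X1 device at a customer site rebooted unexpectedly during operation. Diagnostic information collected after the reboot shows that the last reboot reason was memory exhaustion.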
H3C Comware Software, Version 7.1.064, Release 0809P33
Copyright (c) 2004-2020 New H3C Technologies Co., Ltd. All rights reserved.
H3C MSR2600 uptime is 0 weeks, 0 days, 1 hour, 16 minutes
Last reboot reason : Memory exhausted
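The device logs show that, before the reboot, there were a large number of login attempt logs, login failure logs, and logs indicating that the Telnet session limit had been reached. This suggests that the memory anomaly is related to the large number of users trying to log in to the device.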
%@6670762%Jul 8 11:41:02:244 2022 H3C PWDCTL/6/PWDCTL_ADD_BLACKLIST: admin was added to the blacklist for failed login attempts.
……
%@6671336%Jul 8 11:42:17:926 2022 H3C TELNETD/6/TELNETD_REACH_SESSION_LIMIT: Telnet client 1.1.1.1 failed to log in. The current number of Telnet sessions is 32. The maximum number allowed is (32).
%@6671337%Jul 8 11:42:17:966 2022 H3C TELNETD/6/TELNETD_REACH_SESSION_LIMIT: Telnet client 1.1.1.1 failed to log in. The current number of Telnet sessions is 32. The maximum number allowed is (32).
%@6671338%Jul 8 11:42:17:975 2022 H3C TELNETD/6/TELNETD_REACH_SESSION_LIMIT: Telnet client 1.1.1.1 failed to log in. The current number of Telnet sessions is 32. The maximum number allowed is (32).
……
%@6671339%Jan 1 08:00:00:000 2011 H3C SYSLOG/6/SYSLOG_RESTART: System restarted --
H3C Comware Software.
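Checking the files in the device flash shows that lauth.dat is about 390M (the size shown in the listing below is 396870). This file is mainly used to store authentication-related information and should not normally hold this much content. Its size indicates that the device has accumulated abnormal authentication information.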
===============dir /all /all-filesystems===============
Directory of flash: (YAFFS2)
……
7 -rw- 396870 Jul 08 2022 13:03:01 lauth.dat
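The password control blacklist in the diagnostic information contains a large number of entries. Each entry records the number of failed login attempts made by a client. With the password-control configuration in use, a client is added to the blacklist and locked after 3 failed login attempts. If a client stops trying after 2 failures, the current device implementation keeps the entry on the device for a long time, so memory usage keeps growing. The live-network device is exposed to the public Internet and receives login attempts from many clients, so the number of entries keeps increasing until memory is exhausted and the device reboots.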
===============display password-control blacklist===============
Blacklist items matched: 11002.
Username IP address Login failures Lock flag
admin 1.1.1.2 2 unlock
admin 1.1.1.3 2 unlock
admin 1.1.1.4 2 unlock
admin 1.1.1.5 1 unlock
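1. Configure an ACL for SSH/Telnet users to restrict the source IP addresses that are allowed to log in to the device, which avoids generating a large number of blacklist entries;
2. If the device has already generated a large number of blacklist entries, run reset password-control blacklist to clear them temporarily and release memory.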
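
The two measures above can be sketched as the following Comware 7 command sequence. This is a minimal illustration rather than a verified configuration for this release. The ACL number 2000 and the management subnet 192.0.2.0/24 are assumptions chosen for the example and must be replaced with the values that apply on the live network. The exact ACL syntax may differ between software versions.

```
# Assumption: 192.0.2.0/24 is the only subnet allowed to manage the device.
# Assumption: basic ACL 2000 is unused on the device.
system-view
acl basic 2000
 rule 0 permit source 192.0.2.0 0.0.0.255
 rule 5 deny
 quit
# Apply the ACL to Telnet and SSH logins.
telnet server acl 2000
ssh server acl 2000
quit
# In user view: clear the existing blacklist entries, then check the result.
reset password-control blacklist
display password-control blacklist
```

Clearing the blacklist only releases the memory already consumed. Without the ACL restriction, the entries are likely to accumulate again while the device remains reachable from the public Internet.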