
[Easy Fix] CX Series - 1. Private clients cannot access storage after a CX5036G cluster restart

Published 2023-12-26

Network Setup and Notes

Normal shutdown procedure

1. Stop all services first; once you have confirmed that no I/O is being issued, umount the file system on all nodes.

2. Run mmshutdown -a to shut down GPFS on all cluster nodes, then run mmgetstate -a to check that every node's state is down.

3. Once mmgetstate -a confirms that all nodes are down, the servers can be powered off; a command sketch follows.
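A minimal sketch of this sequence, run from one control node. It assumes the file system name synthesis01 and mount point /sdses seen in the df output later in this case; adjust to your environment.

/usr/lpp/mmfs/bin/mmumount synthesis01 -a    # or run "umount /sdses" on each node individually
/usr/lpp/mmfs/bin/mmshutdown -a              # stop GPFS on every cluster node
/usr/lpp/mmfs/bin/mmgetstate -a              # confirm every node reports "down"
shutdown -h now                              # then power off each server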

 

If the message "The /lib/modules/5.15.0-86-generic/extra/mmfslinux.ko kernel extension does not exist." appears, the possible causes are:

1) The cluster went through an abnormal shutdown, i.e. it was restarted after an unexpected power loss in the machine room rather than the normal shutdown procedure; this shows up as all, or some, private clients being unable to access storage.

2) Some clients received a minor kernel version upgrade, which left them unable to rebuild the GPL module automatically. A quick check for this condition is sketched below.
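One way to confirm the diagnosis on an affected client before rebuilding (a quick check, not an official diagnostic; the path comes from the error message itself):

uname -r                                          # the kernel the client is actually running
ls /lib/modules/$(uname -r)/extra/mmfslinux.ko    # "No such file or directory" confirms the missing GPL module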

Configuration Steps

1. On each private client that cannot access storage, as the root user, rebuild the GPL module and restart the GPFS service.

(base) sdses@client1:~$ su root

Password:

 

root@client1:/home/sdses# /usr/lpp/mmfs/bin/mmbuildgpl      << rebuild the GPL module

--------------------------------------------------------

mmbuildgpl: Building GPL (5.1.6.1) module begins at Sat Oct  7 15:37:58 CST 2023.

--------------------------------------------------------

Verifying Kernel Header...

  kernel version = 51500083 (515000083000000, 5.15.0-83-generic, 5.15.0-83)

  module include dir = /lib/modules/5.15.0-83-generic/build/include

  module build dir   = /lib/modules/5.15.0-83-generic/build

  kernel source dir  = /usr/src/linux-5.15.0-83-generic/include

  Found valid kernel header file under /lib/modules/5.15.0-83-generic/build/include

Getting Kernel Cipher mode...

   Will use skcipher routines

Verifying Compiler...

  make is present at /bin/make

  cpp is present at /bin/cpp

  gcc is present at /bin/gcc

  g++ is present at /bin/g++

  ld is present at /bin/ld

make World ...

make InstallImages ...

--------------------------------------------------------

mmbuildgpl: Building GPL module completed successfully at Sat Oct  7 15:38:16 CST 2023.

--------------------------------------------------------

root@client1:/home/sdses# /usr/lpp/mmfs/bin/mmstartup    << restart the GPFS service

Sat Oct  7 15:38:32 CST 2023: mmstartup: Starting GPFS ...

root@client1:/home/sdses# /usr/lpp/mmfs/bin/mmgetstate  << check the GPFS service state

 

 Node number  Node name  GPFS state

-------------------------------------

           6  client1    active

root@client1:/home/sdses# df -h     << the mounted file system is now visible

Filesystem      Size  Used Avail Use% Mounted on

udev            504G     0  504G    0% /dev

tmpfs           101G  3.6M  101G    1% /run

/dev/sda2       879G   94G  741G   12% /

tmpfs           504G   85M  504G    1% /dev/shm

tmpfs           5.0M     0  5.0M    0% /run/lock

tmpfs           504G     0  504G    0% /sys/fs/cgroup

/dev/loop0      128K  128K     0  100% /snap/bare/5

/dev/loop2       41M   41M     0  100% /snap/snapd/19993

/dev/loop1       62M   62M     0  100% /snap/core20/1611

/dev/loop3       64M   64M     0  100% /snap/core20/2015

/dev/loop4       74M   74M     0  100% /snap/core22/864

/dev/loop5      350M  350M     0  100% /snap/gnome-3-38-2004/143

/dev/loop6      347M  347M     0  100% /snap/gnome-3-38-2004/115

/dev/loop7       13M   13M     0  100% /snap/snap-store/959

/dev/loop11      92M   92M     0  100% /snap/gtk-common-themes/1535

/dev/loop12      41M   41M     0  100% /snap/snapd/20092

/dev/loop10      55M   55M     0  100% /snap/snap-store/558

/dev/loop9      486M  486M     0  100% /snap/gnome-42-2204/126

/dev/loop8       74M   74M     0  100% /snap/core22/858

/dev/sda1       511M  6.1M  505M    2% /boot/efi

tmpfs           101G   20K  101G    1% /run/user/125

/dev/loop13     497M  497M     0  100% /snap/gnome-42-2204/141

tmpfs           101G   36K  101G    1% /run/user/1000

synthesis01     128T  308G  128T    1% /sdses

tmpfs           101G     0  101G    0% /run/user/0

root@client1:/home/sdses# /usr/lpp/mmfs/bin/mmces service list

Enabled services: NFS

NFS is running

 

Following the same steps, run /usr/lpp/mmfs/bin/mmbuildgpl and then /usr/lpp/mmfs/bin/mmstartup on every other private client that cannot access storage, and confirm that the GPFS state is active and the storage mount is visible. A loop sketch follows.
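A sketch of driving step 1 on the remaining clients from one shell. It assumes passwordless root SSH and uses the client host names from the mmgetstate output later in this case; substitute your own list.

for h in client2 client3 client4; do
  ssh root@$h '/usr/lpp/mmfs/bin/mmbuildgpl && /usr/lpp/mmfs/bin/mmstartup'
done
/usr/lpp/mmfs/bin/mmgetstate -a    # verify every client now reports "active"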

 

 

2. On the storage control node, restart the GPFS service and rebuild the GPL module.

Last login: Mon Sep 18 18:24:22 2023

[root@ece1 ~]# mmhealth cluster show node       << check cluster node status

 

Component     Node                      Status            Reasons

------------------------------------------------------------------------------------------

NODE          ***.***        HEALTHY       -

NODE          ***.***        HEALTHY       -

NODE          ***.***        HEALTHY       ces_ips_all_unassigned

NODE          ***.***        HEALTHY       -

NODE          ***.***         FAILED        gpfs_down,quorum_down,gui_pmsensors_connection_failed

NODE          ***.***     FAILED        nfsd_down,gpfs_down,local_exported_fs_unavail

NODE          ***.***     FAILED        gpfs_down,quorum_down,unmounted_fs_check

NODE          ***.***     FAILED        nfsd_down,gpfs_down,local_exported_fs_unavail

NODE          ***.***     HEALTHY       -

 

[root@ece1 ~]# mmhealth cluster show      << check overall cluster status

 

Component            Total         Failed       Degraded        Healthy          Other

-----------------------------------------------------------------------------------------------------------------

NODE                     9              4              0              5              0

GPFS                     9              4              0              5              0

NETWORK                  9              0              0              9              0

FILESYSTEM               1              0              1              0              0

DISK                    16              0              0             16              0

CES                      2              2              0              0              0

CESIP                    1              1              0              0              0

FILESYSMGR               1              0              0              1              0

GUI                      1              0              1              0              0

NATIVE_RAID              4              0              0              4              0

PERFMON                  5              0              0              5              0

THRESHOLD                5              0              0              5              0

 

[root@ece1 ~]# mmstartup -a       << restart the GPFS service on all nodes

Sun Oct  8 15:08:30 CST 2023: mmstartup: Starting GPFS ...

***.***:  The GPFS subsystem is already active.

***.***:  The GPFS subsystem is already active.

***.***:  The GPFS subsystem is already active.

***.***:  The GPFS subsystem is already active.

***.***:  The GPFS subsystem is already active.

***.***:  mmremote: startSubsys: The /lib/modules/5.15.0-86-generic/extra/mmfslinux.ko kernel extension does not exist.  Use mmbuildgpl command to create the needed kernel extension for your kernel or copy the binaries from another node with the identical environment.      << the message offers two remedies: run mmbuildgpl to build the kernel extension for this kernel, or copy the binaries from another node with an identical environment (a copy sketch appears after this transcript)

***.***:  mmremote: startSubsys: Unable to verify kernel/module configuration.

***.***:  mmremote: startSubsys: The /lib/modules/5.15.0-83-generic/extra/mmfslinux.ko kernel extension does not exist.  Use mmbuildgpl command to create the needed kernel extension for your kernel or copy the binaries from another node with the identical environment.

mmdsh: ***.*** remote shell process had return code 1.

***.***:  mmremote: startSubsys: Unable to verify kernel/module configuration.

***.***:  mmremote: startSubsys: The /lib/modules/5.15.0-83-generic/extra/mmfslinux.ko kernel extension does not exist.  Use mmbuildgpl command to create the needed kernel extension for your kernel or copy the binaries from another node with the identical environment.

***.***:  mmremote: startSubsys: Unable to verify kernel/module configuration.

mmdsh: ***.*** remote shell process had return code 1.

mmdsh: ***.*** remote shell process had return code 1.

mmstartup: Command failed. Examine previous error messages to determine cause.
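As the error message notes, an alternative to rebuilding is to copy the already-built kernel extension binaries from a node with an identical environment (same GPFS version, same kernel). A hedged sketch, with the module directory taken from the error message and a hypothetical donor host donor1:

scp root@donor1:/lib/modules/5.15.0-83-generic/extra/*.ko /lib/modules/5.15.0-83-generic/extra/

This case proceeds with mmbuildgpl instead.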

[root@ece1 ~]# mmhealth cluster show

 

Component            Total         Failed       Degraded        Healthy          Other

-----------------------------------------------------------------------------------------------------------------

NODE                     9              4              0              5              0

GPFS                     9              4              0              5              0

NETWORK                  9              0              0              9              0

FILESYSTEM               1              0              1              0              0

DISK                    16              0              0             16              0

CES                      2              2              0              0              0

CESIP                    1              1              0              0              0

FILESYSMGR               1              0              0              1              0

GUI                      1              0              1              0              0

NATIVE_RAID              4              0              0              4              0

PERFMON                  5              0              0              5              0

THRESHOLD                5              0              0              5              0

 



[root@ece1 ~]# mmbuildgpl     << rebuild the GPL module

--------------------------------------------------------

mmbuildgpl: Building GPL (5.1.6.1) module begins at Sun Oct  8 15:17:09 CST 2023.

--------------------------------------------------------

Verifying Kernel Header...

  kernel version = 41800305 (418000305003001, 4.18.0-305.3.1.el8.x86_64, 4.18.0-305.3.1)

  module include dir = /lib/modules/4.18.0-305.3.1.el8.x86_64/build/include

  module build dir   = /lib/modules/4.18.0-305.3.1.el8.x86_64/build

  kernel source dir  = /usr/src/linux-4.18.0-305.3.1.el8.x86_64/include

  Found valid kernel header file under /usr/src/kernels/4.18.0-305.3.1.el8.x86_64/include

Getting Kernel Cipher mode...

   Will use skcipher routines

Verifying Compiler...

  make is present at /bin/make

  cpp is present at /bin/cpp

  gcc is present at /bin/gcc

  g++ is present at /bin/g++

  ld is present at /bin/ld

Verifying libelf devel package...

  Verifying  elfutils-libelf-devel is installed ...

    Command: /bin/rpm -q  elfutils-libelf-devel

    The required package  elfutils-libelf-devel is installed

Verifying Additional System Headers...

  Verifying kernel-headers is installed ...

    Command: /bin/rpm -q kernel-headers

    The required package kernel-headers is installed

make World ...

make InstallImages ...

--------------------------------------------------------

mmbuildgpl: Building GPL module completed successfully at Sun Oct  8 15:17:30 CST 2023.

--------------------------------------------------------

 

Continue checking the node and cluster status; startup is somewhat slow, so be patient. A polling sketch follows.
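An optional convenience for watching recovery instead of rerunning the command by hand (assumes the watch utility is installed):

watch -n 30 /usr/lpp/mmfs/bin/mmgetstate -a    # refresh node states every 30 s; Ctrl-C to exit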

[root@ece1 ~]# mmhealth cluster show

 

Component            Total         Failed       Degraded        Healthy          Other

-----------------------------------------------------------------------------------------------------------------

NODE                     9              3              0              6              0

GPFS                     9              3              0              6              0

NETWORK                  9              0              0              9              0

FILESYSTEM               1              0              1              0              0

DISK                    16              0              0             16              0

CES                      2              2              0              0              0

CESIP                    1              1              0              0              0

FILESYSMGR               1              0              0              1              0

GUI                      1              0              1              0              0

NATIVE_RAID              4              0              0              4              0

PERFMON                  5              0              0              5              0

THRESHOLD                5              0              0              5              0

 


[root@ece1 ~]# mmhealth cluster show node

 

Component     Node                      Status            Reasons

------------------------------------------------------------------------------------------

NODE          ***.***        HEALTHY       -

NODE          ***.***        HEALTHY       -

NODE          ***.***        HEALTHY       ces_ips_all_unassigned

NODE          ***.***        HEALTHY       -

NODE          ***.***         HEALTHY       gui_pmsensors_connection_failed,time_not_in_sync,gui_refresh_task_failed

NODE          ***.***     FAILED        nfsd_down,gpfs_down,local_exported_fs_unavail

NODE          ***.***     FAILED        gpfs_down,quorum_down,unmounted_fs_check

NODE          ***.***     FAILED        nfsd_down,gpfs_down,local_exported_fs_unavail

NODE          ***.***     HEALTHY       -

 

 

[root@ece1 ~]# mmgetstate -a

 

 Node number  Node name  GPFS state

-------------------------------------

           1  ece1       active

           2  ece2       active

           3  ece3       active

           4  ece4       active

           5  gui        active

           6  client1    active

           7  client2    down

           8  client3    down

           9  client4    active

[root@ece1 ~]# mmhealth cluster show node

 

Component     Node                      Status            Reasons

------------------------------------------------------------------------------------------

NODE          ***.***        HEALTHY       -

NODE          ***.***        HEALTHY       -

NODE          ***.***        HEALTHY       -

NODE          ***.***        HEALTHY       -

NODE          ***.***         HEALTHY       gui_pmsensors_connection_failed,time_not_in_sync,gui_refresh_task_failed

NODE          ***.***     TIPS          nfs_in_grace,numactl_not_installed

NODE          ***.***     FAILED        gpfs_down,quorum_down,unmounted_fs_check

NODE          ***.***     FAILED        nfsd_down,gpfs_down,local_exported_fs_unavail

NODE          ***.***     HEALTHY       -

 

[root@ece1 ~]# mmgetstate -a

 

 Node number  Node name  GPFS state

-------------------------------------

           1  ece1       active

           2  ece2       active

           3  ece3       active

           4  ece4       active

           5  gui        active

           6  client1    active

           7  client2    active

           8  client3    arbitrating

           9  client4    active

[root@ece1 ~]# mmgetstate -a

 

 Node number  Node name  GPFS state

-------------------------------------

           1  ece1       active

           2  ece2       active

           3  ece3       active

           4  ece4       active

           5  gui        active

           6  client1    active

           7  client2    active

           8  client3    active

           9  client4    active

 

3. Confirm node status and take the prescribed actions indicated by the TIPS hints.

[root@ece1 ~]# mmhealth cluster show node

 

Component     Node                      Status            Reasons

------------------------------------------------------------------------------------------

NODE          ***.***        HEALTHY       -

NODE          ***.***        HEALTHY       -

NODE          ***.***        HEALTHY       -

NODE          ***.***        HEALTHY       -

NODE          ***.***         HEALTHY       gui_pmsensors_connection_failed,time_not_in_sync,gui_refresh_task_failed

NODE          ***.***     TIPS          numactl_not_installed

NODE          ***.***     HEALTHY       -

NODE          ***.***     TIPS          ces_network_ips_down,numactl_not_installed

NODE          ***.***     HEALTHY       -


 

[root@ece1 ~]# mmhealth cluster show

 

Component            Total         Failed       Degraded        Healthy          Other

-----------------------------------------------------------------------------------------------------------------

NODE                     9              0              0              7              2

GPFS                     9              0              0              7              2

NETWORK                  9              0              0              9              0

FILESYSTEM               1              0              0              1              0

DISK                    16              0              0             16              0

CES                      2              0              2              0              0

CESIP                    1              0              0              1              0

FILESYSMGR               1              0              0              1              0

GUI                      1              0              1              0              0

NATIVE_RAID              4              0              0              4              0

PERFMON                  5              0              0              5              0

THRESHOLD                5              0              0              5              0

 

 

[root@ece1 ~]# mmhealth cluster show node

 

Component     Node                      Status            Reasons

------------------------------------------------------------------------------------------

NODE          ***.***        HEALTHY       -

NODE          ***.***        HEALTHY       -

NODE          ***.***        HEALTHY       -

NODE          ***.***        HEALTHY       -

NODE          ***.***         HEALTHY       gui_pmsensors_connection_failed,time_not_in_sync,gui_refresh_task_failed

NODE          ***.***     TIPS          numactl_not_installed

NODE          ***.***     HEALTHY       -

NODE          ***.***     TIPS          numactl_not_installed

NODE          ***.***     HEALTHY       -

 

 

root@client1:/home/sdses# apt-get install numactl

Reading package lists... Done

Building dependency tree

Reading state information... Done

The following packages were automatically installed and are no longer required:

  gir1.2-goa-1.0 nvidia-firmware-535-535.86.05

Use 'apt autoremove' to remove them.

The following NEW packages will be installed:

  numactl

0 upgraded, 1 newly installed, 0 to remove and 14 not upgraded.

Need to get 38.5 kB of archives.

After this operation, 150 kB of additional disk space will be used.

Get:1 ***.***/ubuntu focal/main amd64 numactl amd64 2.0.12-1 [38.5 kB]

Fetched 38.5 kB in 0s (211 kB/s)

Selecting previously unselected package numactl.

(Reading database ... 217894 files and directories currently installed.)

Preparing to unpack .../numactl_2.0.12-1_amd64.deb ...

Unpacking numactl (2.0.12-1) ...

Setting up numactl (2.0.12-1) ...

Processing triggers for man-db (2.9.1-1) ...

root@client1:/home/sdses#
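The same package needs to go onto the other node that reported numactl_not_installed before the cluster shows fully healthy below; a one-line sketch, assuming that node is the hypothetical Ubuntu client client3 and root SSH works:

ssh root@client3 'apt-get install -y numactl'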

 

[root@ece1 ~]# mmhealth cluster show node

 

Component     Node                      Status            Reasons

------------------------------------------------------------------------------------------

NODE          ***.***        HEALTHY       -

NODE          ***.***        HEALTHY       -

NODE          ***.***        HEALTHY       -

NODE          ***.***        HEALTHY       -

NODE          ***.***         HEALTHY       gui_pmsensors_connection_failed,time_not_in_sync,gui_refresh_task_failed

NODE          ***.***     HEALTHY       -

NODE          ***.***     HEALTHY       -

NODE          ***.***     HEALTHY       -

NODE          ***.***     HEALTHY       -

Key Configuration Points

See the configuration steps above.
