Normal shutdown procedure
1. Stop the business workloads first; once no more I/O is being issued, umount the file system on every node.
2. Run mmshutdown -a to stop GPFS on all cluster nodes, then use mmgetstate -a to check that every node is in the down state.
3. Once mmgetstate -a confirms that all nodes are down, the servers can be shut down (see the command sketch below).
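For reference, the sequence above corresponds roughly to the following commands (a minimal sketch; /sdses is the mount point seen later in this case, and the actual mount point and node list depend on your cluster):

# on every node, after all business I/O has stopped
umount /sdses                 # unmount the GPFS file system on this node
# from any one cluster node
mmshutdown -a                 # stop GPFS on all cluster nodes
mmgetstate -a                 # confirm every node reports "down"
# then power off each server
shutdown -h now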
The message "The /lib/modules/5.15.0-86-generic/extra/mmfslinux.ko kernel extension does not exist." typically has one of the following causes:
(1) The cluster was not shut down through the normal procedure but was rebooted after an unexpected power outage in the machine room; this shows up as all or some of the private clients being unable to access the storage.
(2) Some clients received a minor kernel version upgrade, and the GPL module could not be rebuilt automatically for the new kernel on those clients.
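A quick way to tell whether a client falls under case (2) is to check whether the GPL module exists for the kernel it is currently running (a hedged check derived from the path in the error message):

uname -r                                          # kernel the node is running now
ls /lib/modules/$(uname -r)/extra/mmfslinux.ko    # "No such file" means the GPL module was never built for this kernel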
1. Log in to each private client that cannot access the storage as the root user, rebuild the GPL module, and restart the GPFS service:
(base) sdses@client1:~$ su root
Password:
root@client1:/home/sdses# /usr/lpp/mmfs/bin/mmbuildgpl << rebuild the GPL module
--------------------------------------------------------
mmbuildgpl: Building GPL (5.1.6.1) module begins at Sat Oct  7 15:37:58 CST 2023.
--------------------------------------------------------
Verifying Kernel Header...
kernel version = 51500083 (515000083000000, 5.15.0-83-generic, 5.15.0-83)
module include dir = /lib/modules/5.15.0-83-generic/build/include
module build dir = /lib/modules/5.15.0-83-generic/build
kernel source dir = /usr/src/linux-5.15.0-83-generic/include
Found valid kernel header file under /lib/modules/5.15.0-83-generic/build/include
Getting Kernel Cipher mode...
Will use skcipher routines
Verifying Compiler...
make is present at /bin/make
cpp is present at /bin/cpp
gcc is present at /bin/gcc
g++ is present at /bin/g++
ld is present at /bin/ld
make World ...
make InstallImages ...
--------------------------------------------------------
mmbuildgpl: Building GPL module completed successfully at Sat Oct  7 15:38:16 CST 2023.
--------------------------------------------------------
root@client1:/home/sdses# /usr/lpp/mmfs/bin/mmstartup << restart the GPFS service
Sat Oct  7 15:38:32 CST 2023: mmstartup: Starting GPFS ...
root@client1:/home/sdses# /usr/lpp/mmfs/bin/mmgetstate << check the GPFS service state
Node number Node name GPFS state
-------------------------------------
6 client1 active
root@client1:/home/sdses# df -h << the GPFS mount point is visible again
Filesystem      Size  Used Avail Use% Mounted on
udev 504G 0 504G 0% /dev
tmpfs 101G 3.6M 101G 1% /run
/dev/sda2 879G 94G 741G 12% /
tmpfs 504G 85M 504G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 504G 0 504G 0% /sys/fs/cgroup
/dev/loop0 128K 128K 0 100% /snap/bare/5
/dev/loop2 41M 41M 0 100% /snap/snapd/19993
/dev/loop1 62M 62M 0 100% /snap/core20/1611
/dev/loop3 64M 64M 0 100% /snap/core20/2015
/dev/loop4 74M 74M 0 100% /snap/core22/864
/dev/loop5 350M 350M 0 100% /snap/gnome-3-38-2004/143
/dev/loop6 347M 347M 0 100% /snap/gnome-3-38-2004/115
/dev/loop7 13M 13M 0 100% /snap/snap-store/959
/dev/loop11 92M 92M 0 100% /snap/gtk-common-themes/1535
/dev/loop12 41M 41M 0 100% /snap/snapd/20092
/dev/loop10 55M 55M 0 100% /snap/snap-store/558
/dev/loop9 486M 486M 0 100% /snap/gnome-42-2204/126
/dev/loop8 74M 74M 0 100% /snap/core22/858
/dev/sda1 511M 6.1M 505M 2% /boot/efi
tmpfs 101G 20K 101G 1% /run/user/125
/dev/loop13 497M 497M 0 100% /snap/gnome-42-2204/141
tmpfs 101G 36K 101G 1% /run/user/1000
synthesis01 128T 308G 128T 1% /sdses
tmpfs 101G 0 101G 0% /run/user/0
root@client1:/home/sdses# /usr/lpp/mmfs/bin/mmces service list
Enabled services: NFS
NFS is running
Following the same steps, run /usr/lpp/mmfs/bin/mmbuildgpl and /usr/lpp/mmfs/bin/mmstartup on each of the other private clients that cannot access the storage, and confirm that their GPFS state is active and the storage mount is visible (a scripted variant is sketched below).
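If several clients are affected, the same two commands can be scripted from a node that has passwordless root SSH to them (a hedged sketch; the host names are only an example taken from the node names that appear in mmgetstate later in this case):

for h in client2 client3 client4; do
    ssh root@$h '/usr/lpp/mmfs/bin/mmbuildgpl && /usr/lpp/mmfs/bin/mmstartup'
done
/usr/lpp/mmfs/bin/mmgetstate -a    # every client should eventually report "active"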
2. Log in to the storage control node, restart the GPFS service, and rebuild the GPL module:
Last login: Mon Sep 18 18:24:22 2023
[root@ece1 ~]# mmhealth cluster show node << check the status of each cluster node
Component Node Status Reasons
------------------------------------------------------------------------------------------
NODE ***.*** HEALTHY -
NODE ***.*** HEALTHY -
NODE ***.*** HEALTHY ces_ips_all_unassigned
NODE ***.*** HEALTHY -
NODE ***.*** FAILED gpfs_down,quorum_down,gui_pmsensors_connection_failed
NODE ***.*** FAILED nfsd_down,gpfs_down,local_exported_fs_unavail
NODE ***.*** FAILED gpfs_down,quorum_down,unmounted_fs_check
NODE ***.*** FAILED nfsd_down,gpfs_down,local_exported_fs_unavail
NODE ***.*** HEALTHY -
[root@ece1 ~]# mmhealth cluster show << check the overall cluster status
Component Total Failed Degraded Healthy Other
-----------------------------------------------------------------------------------------------------------------
NODE 9 4 0 5 0
GPFS 9 4 0 5 0
NETWORK 9 0 0 9 0
FILESYSTEM 1 0 1 0 0
DISK 16 0 0 16 0
CES 2 2 0 0 0
CESIP 1 1 0 0 0
FILESYSMGR 1 0 0 1 0
GUI 1 0 1 0 0
NATIVE_RAID 4 0 0 4 0
PERFMON 5 0 0 5 0
THRESHOLD 5 0 0 5 0
[root@ece1 ~]# mmstartup -a << restart the GPFS service on all nodes
Sun Oct 8 15:08:30 CST 2023: mmstartup: Starting GPFS ...
***.***: The GPFS subsystem is already active.
***.***: The GPFS subsystem is already active.
***.***: The GPFS subsystem is already active.
***.***: The GPFS subsystem is already active.
***.***: The GPFS subsystem is already active.
***.***: mmremote: startSubsys: The /lib/modules/5.15.0-86-generic/extra/mmfslinux.ko kernel extension does not exist. Use mmbuildgpl command to create the needed kernel extension for your kernel or copy the binaries from another node with the identical environment. << GPFS asks you to run mmbuildgpl to build the kernel extension for this kernel, or to copy the binaries from another node with an identical environment.
***.***: mmremote: startSubsys: Unable to verify kernel/module configuration.
***.***: mmremote: startSubsys: The /lib/modules/5.15.0-83-generic/extra/mmfslinux.ko kernel extension does not exist. Use mmbuildgpl command to create the needed kernel extension for your kernel or copy the binaries from another node with the identical environment.
mmdsh: ***.*** remote shell process had return code 1.
***.***: mmremote: startSubsys: Unable to verify kernel/module configuration.
***.***: mmremote: startSubsys: The /lib/modules/5.15.0-83-generic/extra/mmfslinux.ko kernel extension does not exist. Use mmbuildgpl command to create the needed kernel extension for your kernel or copy the binaries from another node with the identical environment.
***.***: mmremote: startSubsys: Unable to verify kernel/module configuration.
mmdsh: ***.*** remote shell process had return code 1.
mmdsh: ***.*** remote shell process had return code 1.
mmstartup: Command failed. Examine previous error messages to determine cause.
[root@ece1 ~]# mmhealth cluster show
Component Total Failed Degraded Healthy Other
-----------------------------------------------------------------------------------------------------------------
NODE 9 4 0 5 0
GPFS 9 4 0 5 0
NETWORK 9 0 0 9 0
FILESYSTEM 1 0 1 0 0
DISK 16 0 0 16 0
CES 2 2 0 0 0
CESIP 1 1 0 0 0
FILESYSMGR 1 0 0 1 0
GUI 1 0 1 0 0
NATIVE_RAID 4 0 0 4 0
PERFMON 5 0 0 5 0
THRESHOLD 5 0 0 5 0
[root@ece1 ~]# mmbuildgpl << rebuild the GPL module
--------------------------------------------------------
mmbuildgpl: Building GPL (5.1.6.1) module begins at Sun Oct 8 15:17:09 CST 2023.
--------------------------------------------------------
Verifying Kernel Header...
kernel version = 41800305 (418000305003001, 4.18.0-305.3.1.el8.x86_64, 4.18.0-305.3.1)
module include dir = /lib/modules/4.18.0-305.3.1.el8.x86_64/build/include
module build dir = /lib/modules/4.18.0-305.3.1.el8.x86_64/build
kernel source dir = /usr/src/linux-4.18.0-305.3.1.el8.x86_64/include
Found valid kernel header file under /usr/src/kernels/4.18.0-305.3.1.el8.x86_64/include
Getting Kernel Cipher mode...
Will use skcipher routines
Verifying Compiler...
make is present at /bin/make
cpp is present at /bin/cpp
gcc is present at /bin/gcc
g++ is present at /bin/g++
ld is present at /bin/ld
Verifying libelf devel package...
Verifying elfutils-libelf-devel is installed ...
Command: /bin/rpm -q elfutils-libelf-devel
The required package elfutils-libelf-devel is installed
Verifying Additional System Headers...
Verifying kernel-headers is installed ...
Command: /bin/rpm -q kernel-headers
The required package kernel-headers is installed
make World ...
make InstallImages ...
--------------------------------------------------------
mmbuildgpl: Building GPL module completed successfully at Sun Oct 8 15:17:30 CST 2023.
--------------------------------------------------------
Keep checking the node and cluster status; the startup process takes a while, so be patient (see the polling sketch below).
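Instead of re-running the command by hand, the state can be polled periodically (a small convenience not part of the original session):

watch -n 30 '/usr/lpp/mmfs/bin/mmgetstate -a'    # refresh the per-node GPFS state every 30 seconds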
[root@ece1 ~]# mmhealth cluster show
Component Total Failed Degraded Healthy Other
-----------------------------------------------------------------------------------------------------------------
NODE 9 3 0 6 0
GPFS 9 3 0 6 0
NETWORK 9 0 0 9 0
FILESYSTEM 1 0 1 0 0
DISK 16 0 0 16 0
CES 2 2 0 0 0
CESIP 1 1 0 0 0
FILESYSMGR 1 0 0 1 0
GUI 1 0 1 0 0
NATIVE_RAID 4 0 0 4 0
PERFMON 5 0 0 5 0
THRESHOLD 5 0 0 5 0
[root@ece1 ~]# mmhealth cluster show node
Component Node Status Reasons
------------------------------------------------------------------------------------------
NODE ***.*** HEALTHY -
NODE ***.*** HEALTHY -
NODE ***.*** HEALTHY ces_ips_all_unassigned
NODE ***.*** HEALTHY -
NODE ***.*** HEALTHY gui_pmsensors_connection_failed,time_not_in_sync,gui_refresh_task_failed
NODE ***.*** FAILED nfsd_down,gpfs_down,local_exported_fs_unavail
NODE ***.*** FAILED gpfs_down,quorum_down,unmounted_fs_check
NODE ***.*** FAILED nfsd_down,gpfs_down,local_exported_fs_unavail
NODE ***.*** HEALTHY -
[root@ece1 ~]# mmgetstate -a
Node number Node name GPFS state
-------------------------------------
1 ece1 active
2 ece2 active
3 ece3 active
4 ece4 active
5 gui active
6 client1 active
7 client2 down
8 client3 down
9 client4 active
[root@ece1 ~]# mmhealth cluster show node
Component Node Status Reasons
------------------------------------------------------------------------------------------
NODE ***.*** HEALTHY -
NODE ***.*** HEALTHY -
NODE ***.*** HEALTHY -
NODE ***.*** HEALTHY -
NODE ***.*** HEALTHY gui_pmsensors_connection_failed,time_not_in_sync,gui_refresh_task_failed
NODE ***.*** TIPS nfs_in_grace,numactl_not_installed
NODE ***.*** FAILED gpfs_down,quorum_down,unmounted_fs_check
NODE ***.*** FAILED nfsd_down,gpfs_down,local_exported_fs_unavail
NODE ***.*** HEALTHY -
[root@ece1 ~]# mmgetstate -a
Node number Node name GPFS state
-------------------------------------
1 ece1 active
2 ece2 active
3 ece3 active
4 ece4 active
5 gui active
6 client1 active
7 client2 active
8 client3 arbitrating
9 client4 active
[root@ece1 ~]# mmgetstate -a
Node number Node name GPFS state
-------------------------------------
1 ece1 active
2 ece2 active
3 ece3 active
4 ece4 active
5 gui active
6 client1 active
7 client2 active
8 client3 active
9 client4 active
3. Confirm the node status and carry out the actions indicated by the TIPS entries:
[root@ece1 ~]# mmhealth cluster show node
Component Node Status Reasons
------------------------------------------------------------------------------------------
NODE ***.*** HEALTHY -
NODE ***.*** HEALTHY -
NODE ***.*** HEALTHY -
NODE ***.*** HEALTHY -
NODE ***.*** HEALTHY gui_pmsensors_connection_failed,time_not_in_sync,gui_refresh_task_failed
NODE ***.*** TIPS numactl_not_installed
NODE ***.*** HEALTHY -
NODE ***.*** TIPS ces_network_ips_down,numactl_not_installed
NODE ***.*** HEALTHY -
[root@ece1 ~]# mmhealth cluster show
Component Total Failed Degraded Healthy Other
-----------------------------------------------------------------------------------------------------------------
NODE 9 0 0 7 2
GPFS 9 0 0 7 2
NETWORK 9 0 0 9 0
FILESYSTEM 1 0 0 1 0
DISK 16 0 0 16 0
CES 2 0 2 0 0
CESIP 1 0 0 1 0
FILESYSMGR 1 0 0 1 0
GUI 1 0 1 0 0
NATIVE_RAID 4 0 0 4 0
PERFMON 5 0 0 5 0
THRESHOLD 5 0 0 5 0
[root@ece1 ~]# mmhealth cluster show node
Component Node Status Reasons
------------------------------------------------------------------------------------------
NODE ***.*** HEALTHY -
NODE ***.*** HEALTHY -
NODE ***.*** HEALTHY -
NODE ***.*** HEALTHY -
NODE ***.*** HEALTHY gui_pmsensors_connection_failed,time_not_in_sync,gui_refresh_task_failed
NODE ***.*** TIPS numactl_not_installed
NODE ***.*** HEALTHY -
NODE ***.*** TIPS numactl_not_installed
NODE ***.*** HEALTHY -
root@client1:/home/sdses# apt-get install numactl
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  gir1.2-goa-1.0 nvidia-firmware-535-535.86.05
Use 'apt autoremove' to remove them.
The following NEW packages will be installed:
  numactl
0 upgraded, 1 newly installed, 0 to remove and 14 not upgraded.
Need to get 38.5 kB of archives.
After this operation, 150 kB of additional disk space will be used.
Get:1 ***.***/ubuntu focal/main amd64 numactl amd64 2.0.12-1 [38.5 kB]
Fetched 38.5 kB in 0s (211 kB/s)
Selecting previously unselected package numactl.
(Reading database ... 217894 files and directories currently installed.)
Preparing to unpack .../numactl_2.0.12-1_amd64.deb ...
Unpacking numactl (2.0.12-1) ...
Setting up numactl (2.0.12-1) ...
Processing triggers for man-db (2.9.1-1) ...
root@client1:/home/sdses#
[root@ece1 ~]# mmhealth cluster show node
Component Node Status Reasons
------------------------------------------------------------------------------------------
NODE ***.*** HEALTHY -
NODE ***.*** HEALTHY -
NODE ***.*** HEALTHY -
NODE ***.*** HEALTHY -
NODE ***.*** HEALTHY gui_pmsensors_connection_failed,time_not_in_sync,gui_refresh_task_failed
NODE ***.*** HEALTHY -
NODE ***.*** HEALTHY -
NODE ***.*** HEALTHY -
NODE ***.*** HEALTHY -
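As an optional final check that every node sees the file system again (not shown in the original session), mmlsmount lists the nodes on which each GPFS file system is mounted:

/usr/lpp/mmfs/bin/mmlsmount all -L    # per file system, shows every node that currently has it mounted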