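In our test environment, the clockdiff command frequently reports "is down", which seriously disrupts OceanBase installation and operation.
The error looks like this: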
[root@rocky95 ~]# clockdiff -o 192.168.169.41
...........................clockdiff: 192.168.169.41 is down
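clockdiff measures the difference between the system clocks of two hosts.
Command options:
No option: ICMP timestamp messages are used by default.
-o: use the IP timestamp option (four-term form) carried in ICMP ECHO packets instead of ICMP timestamp messages; the target host must support it.
-o1: a variant of -o that uses the three-term IP timestamp form (more effective with some systems, e.g. older Solaris releases).
Output fields:
host=192.168.1.1: IP address of the target host; clockdiff measures the clock difference between the local machine and 192.168.1.1.
rtt=750(187)ms/0ms: average round-trip time (its deviation across samples) / minimum round-trip time.
delta=1ms/1ms: clock offset in ms, defined as target host time minus local time, as computed by two different measurement methods.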
Normal clockdiff output looks like this:
[root@rocky95 ~]# clockdiff -o 10.165.7.181
..................................................
host=10.165.7.181 rtt=50(4)ms/43ms delta=1ms/0ms Thu Nov 20 09:57:23 2025[root@rocky95 ~]#
[root@rocky95 ~]# clockdiff -o 127.0.0.1
..................................................
host=127.0.0.1 rtt=0(0)ms/0ms delta=0ms/0ms Thu Nov 20 10:20:41 2025
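The output fields described above can be extracted mechanically, for example to script the same check across several hosts. A minimal Python sketch, assuming the single-line `host=... rtt=... delta=...` format shown above; the function name `parse_clockdiff` and the dictionary keys are hypothetical, not part of clockdiff:

```python
import re
from typing import Optional

# Matches a clockdiff result line, e.g.
# "host=10.165.7.181 rtt=50(4)ms/43ms delta=1ms/0ms Thu Nov 20 09:57:23 2025"
CLOCKDIFF_RE = re.compile(
    r"host=(?P<host>\S+)\s+"
    r"rtt=(?P<rtt_avg>\d+)\((?P<rtt_dev>\d+)\)ms/(?P<rtt_min>\d+)ms\s+"
    r"delta=(?P<delta1>-?\d+)ms/(?P<delta2>-?\d+)ms"
)


def parse_clockdiff(output: str) -> Optional[dict]:
    """Return the fields of a clockdiff result, or None if there is no result line.

    A failed run such as "clockdiff: 192.168.169.41 is down" contains no
    host=... line, so it yields None.
    """
    match = CLOCKDIFF_RE.search(output)
    if match is None:
        return None
    # Keep the host as text; convert every millisecond field to int.
    return {
        name: value if name == "host" else int(value)
        for name, value in match.groupdict().items()
    }


if __name__ == "__main__":
    print(parse_clockdiff("host=10.165.7.181 rtt=50(4)ms/43ms delta=1ms/0ms Thu Nov 20 09:57:23 2025"))
    print(parse_clockdiff("...........................clockdiff: 192.168.169.41 is down"))
```

A `None` result maps directly to the "is down" case; any threshold on `delta1`/`delta2` would be your own choice, not something clockdiff defines.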
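After we set up our own NTP server, testing frequently failed with a "clockdiff check failed" error. The check turned out to fail because of the following clockdiff error, which seriously disrupted OB Server deployment and day-to-day operations.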
[root@rocky95 ~]# clockdiff -o 192.168.169.41
...........................clockdiff: 192.168.169.41 is down
Occasionally, the same command succeeds:
[root@rocky95 ~]# clockdiff -o 192.168.169.41
...................................................
host=192.168.169.41 rtt=0(0)ms/0ms delta=0ms/0ms Thu Nov 20 10:07:32 2025
[root@rocky95 ~]# nc -uvz 192.168.169.41 123
Ncat: Version 7.92 ( ***.***/ncat )
Ncat: Connected to 192.168.169.41:123.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.03 seconds.
[root@rocky95 ~]# nc -uvz 10.165.7.181 123
Ncat: Version 7.92 ( ***.***/ncat )
Ncat: Connected to 10.165.7.181:123.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.02 seconds.
[root@rocky95 ~]# tcpdump -i any -nn -vv udp port 123
[root@rocky95 ~]# chronyc tracking
Reference ID : C0A8A929 (192.168.169.41)
Stratum : 6
Ref time (UTC) : Thu Nov 20 08:06:52 2025
System time : 0.000007659 seconds fast of NTP time
Last offset : +0.000008557 seconds
RMS offset : 0.000027306 seconds
Frequency : 8.509 ppm fast
Residual freq : +0.000 ppm
Skew : 0.028 ppm
Root delay : 0.070013240 seconds
Root dispersion : 0.003589844 seconds
Update interval : 517.5 seconds
Leap status : Normal
[root@rocky95 ~]# chronyc sources
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^* 192.168.169.41 5 9 377 206 +25us[ +34us] +/- 39ms
[root@rocky95 ~]# ping -T tsandaddr 192.168.169.41 -c 2
PING 192.168.169.41 (192.168.169.41) 56(124) bytes of data.
64 bytes from 192.168.169.41: icmp_seq=1 ttl=64 time=0.382 ms
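TS: 192.168.169.53 29439298 absolute <== absolute timestamp: per RFC 791, milliseconds since midnight UT at the moment the node recorded it. In hours: 29439298 ms ÷ 1000 ÷ 3600 ≈ 8.18 h, i.e. about 08:10:39 UTC.
192.168.169.41 0 <== time difference between the nodes: the difference from the previous timestamp in the list, in ms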
192.168.169.41 0
192.168.169.53 0
64 bytes from 192.168.169.41: icmp_seq=2 ttl=64 time=0.390 ms
TS: 192.168.169.53 29440346 absolute
192.168.169.41 1
192.168.169.41 0
192.168.169.53 0
--- 192.168.169.41 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1048ms
rtt min/avg/max/mdev = 0.382/0.386/0.390/0.004 ms
--- 192.168.169.41 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1026ms
rtt min/avg/max/mdev = 0.356/0.459/0.563/0.103 ms
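Per RFC 791, a standard IP timestamp is a 32-bit count of milliseconds since midnight UT; a value with the high-order bit set is non-standard. The helper below (a hypothetical sketch, name chosen for illustration) turns the `absolute` value printed by `ping -T tsandaddr` into a UTC time of day:

```python
def ip_timestamp_to_utc(ts_ms: int) -> str:
    """Format a standard RFC 791 IP timestamp (ms since midnight UT) as HH:MM:SS.mmm UTC."""
    if ts_ms & 0x80000000:
        raise ValueError("high-order bit set: non-standard timestamp")
    if not 0 <= ts_ms < 24 * 3600 * 1000:
        raise ValueError("standard timestamps must be below 86,400,000 ms")
    seconds, millis = divmod(ts_ms, 1000)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d} UTC"


if __name__ == "__main__":
    for ts in (29439298, 29440346):  # the two 'absolute' values from the ping above
        print(ts, "->", ip_timestamp_to_utc(ts))
```

For 29439298 this gives 08:10:39.298 UTC, and the two echoes are stamped 1048 ms apart, which matches `time 1048ms` in the first statistics block.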
[root@rocky95 ~]# ping -T tsandaddr 10.165.7.181 -c 2
PING 10.165.7.181 (10.165.7.181) 56(124) bytes of data.
64 bytes from 10.165.7.181: icmp_seq=1 ttl=58 time=45.9 ms
TS: 192.168.169.53 9625625 absolute
192.168.169.254 26769471
10.12.172.1 -1842441
10.12.191.2 -24927031
Unrecorded hops: 11
64 bytes from 10.165.7.181: icmp_seq=2 ttl=58 time=45.8 ms
TS: 192.168.169.53 9626626 absolute
192.168.169.254 26769471
10.12.172.1 -1842441
10.12.191.2 -24927030
Unrecorded hops: 11
--- 10.165.7.181 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 45.833/45.859/45.885/0.026 ms
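In iputils `ping`, only the first entry of the `TS:` list is printed as `absolute`; each later value is the difference from the previous timestamp. A running sum therefore recovers every hop's absolute timestamp. A minimal sketch (the function name is hypothetical), applied to the first echo above:

```python
from itertools import accumulate


def absolute_hop_timestamps(first_abs_ms: int, deltas_ms: list[int]) -> list[int]:
    """Rebuild per-hop absolute timestamps from ping's 'absolute' value plus its deltas."""
    return list(accumulate([first_abs_ms, *deltas_ms]))


if __name__ == "__main__":
    hops = ["192.168.169.53", "192.168.169.254", "10.12.172.1", "10.12.191.2"]
    stamps = absolute_hop_timestamps(9625625, [26769471, -1842441, -24927031])
    for hop, ts in zip(hops, stamps):
        print(f"{hop:<16} {ts:>10} ms  ({ts - stamps[0]:+d} ms vs. source)")
```

The gateway 192.168.169.254 reports a time about 26,769 s (≈7.4 h) later than the source, while 10.12.191.2 lands within 1 ms of it. `Unrecorded hops: 11` means 11 further hops could not add a timestamp because the option had no space left.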
Tracing the failing run (default ICMP timestamp mode, no -o) with strace:
[root@rocky95 ~]# strace clockdiff 10.165.7.181
execve("/usr/bin/clockdiff", ["clockdiff", "10.165.7.181"], 0x7fff9703c998 /* 42 vars */) = 0
…………………………
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=1, tv_nsec=0}, NULL, 8) = 0 (Timeout)
sendto(3, "\r\0:\356\6\371\v\0\0\231\245\177\0\0\0\0\0\0\0\0", 20, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.165.7.181")}, 16) = 20
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=1, tv_nsec=0}, NULL, 8) = 0 (Timeout)
write(2, "clockdiff: ", 11clockdiff: ) = 11
write(2, "10.165.7.181 is down", 2010.165.7.181 is down) = 20
write(2, "\n", 1
) = 1
close(1) = 0
close(2) = 0
exit_group(1) = ?
+++ exited with 1 +++
For comparison, strace of a successful run against 192.168.169.41:
[root@rocky95 ~]# strace clockdiff 192.168.169.41
execve("/usr/bin/clockdiff", ["clockdiff", "192.168.169.41"], 0x7fff35277da8 /* 42 vars */) = 0
…………………………
sendto(3, "\r\0\350r;\3712\0\0\232\233\371\0\0\0\0\0\0\0\0", 20, 0, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("192.168.169.41")}, 16) = 20
ppoll([{fd=3, events=POLLIN|POLLHUP}], 1, {tv_sec=0, tv_nsec=0}, NULL, 8) = 1 ([{fd=3, revents=POLLIN}], left {tv_sec=0, tv_nsec=0})
recvfrom(3, "E\0\0(\203\365\0\0@\1#0\300\250\251)\300\250\2515\16\0\256K;\3712\0\0\232\233\371"..., 1024, 0, NULL, 0x7ffe6c68f1c8) = 40
write(1, ".", 1.) = 1
openat(AT_FDCWD, "/etc/localtime", O_RDONLY|O_CLOEXEC) = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=561, ...}) = 0
fstat(4, {st_mode=S_IFREG|0644, st_size=561, ...}) = 0
read(4, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 561
lseek(4, -342, SEEK_CUR) = 219
read(4, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 342
close(4) = 0
write(1, "\n", 1
) = 1
write(1, "host=192.168.169.41 rtt=0(0)ms/0"..., 73host=192.168.169.41 rtt=0(0)ms/0ms delta=0ms/0ms Thu Nov 20 10:48:52 2025) = 73
close(1) = 0
close(2) = 0
exit_group(0) = ?
+++ exited with 0 +++
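The 20-byte buffers passed to `sendto` in both traces start with `\r` (0x0d = 13), so they are ICMP Timestamp Request messages as defined in RFC 792; the Timestamp Reply (type 14) shows up as `\16` right after the 20-byte IP header in the `recvfrom` buffer. The sketch below decodes such a message and verifies its RFC 1071 checksum; the function names are hypothetical:

```python
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement sum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def decode_icmp_timestamp(msg: bytes) -> dict:
    """Decode a 20-byte ICMP Timestamp Request/Reply (RFC 792)."""
    if len(msg) != 20:
        raise ValueError("ICMP timestamp messages are 20 bytes long")
    icmp_type, code, csum, ident, seq, orig, recv, xmit = struct.unpack("!BBHHHIII", msg)
    return {
        "type": icmp_type,                            # 13 = request, 14 = reply
        "code": code,
        "checksum_ok": internet_checksum(msg) == 0,   # sum incl. checksum field must fold to 0
        "id": ident,
        "seq": seq,
        "originate_ms": orig,                         # ms since midnight UT
        "receive_ms": recv,
        "transmit_ms": xmit,
    }


if __name__ == "__main__":
    # Buffers copied from the two sendto() calls above; strace's C escapes
    # are also valid Python bytes escapes.
    failing = b"\r\0:\356\6\371\v\0\0\231\245\177\0\0\0\0\0\0\0\0"
    working = b"\r\0\350r;\3712\0\0\232\233\371\0\0\0\0\0\0\0\0"
    for name, buf in (("10.165.7.181", failing), ("192.168.169.41", working)):
        print(name, decode_icmp_timestamp(buf))
```

Both requests are well-formed (type 13, valid checksum), so the failing run sent a correct probe and simply got no reply. The successful probe's originate timestamp, 10132473 ms after midnight UT, is 02:48:52.473 UTC, which matches the `10:48:52` printed by clockdiff if the host runs in UTC+8 (an assumption; the time zone is not shown).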
[root@obocp4 ~]# sysctl -a |grep icmp
net.ipv4.icmp_echo_enable_probe = 0
net.ipv4.icmp_echo_ignore_all = 0
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_errors_use_inbound_ifaddr = 0
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.icmp_msgs_burst = 50
net.ipv4.icmp_msgs_per_sec = 1000
net.ipv4.icmp_ratelimit = 1000
net.ipv4.icmp_ratemask = 6168
net.ipv6.icmp.echo_ignore_all = 0
net.ipv6.icmp.echo_ignore_anycast = 0
net.ipv6.icmp.echo_ignore_multicast = 0
net.ipv6.icmp.ratelimit = 1000
net.ipv6.icmp.ratemask = 0-1,3-127
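The Linux kernel's ip-sysctl documentation describes `icmp_ratemask` as a mask of ICMP types whose rates are limited (bit n selects type n), and states that `icmp_msgs_per_sec` only controls messages whose type matches that mask. A short sketch that decodes the mask; the type table follows RFC 792 and the function name is hypothetical:

```python
# ICMP type numbers from RFC 792 (subset relevant here).
ICMP_TYPES = {
    0: "Echo Reply",
    3: "Destination Unreachable",
    4: "Source Quench",
    5: "Redirect",
    8: "Echo Request",
    11: "Time Exceeded",
    12: "Parameter Problem",
    13: "Timestamp",
    14: "Timestamp Reply",
}


def rate_limited_types(ratemask: int) -> list[int]:
    """ICMP types whose bit (1 << type) is set in net.ipv4.icmp_ratemask."""
    return [t for t in range(ratemask.bit_length()) if ratemask & (1 << t)]


if __name__ == "__main__":
    limited = rate_limited_types(6168)  # value from the sysctl output above
    for t in limited:
        print(t, ICMP_TYPES.get(t, "other"))
    for t in (0, 14):  # the replies clockdiff -o and plain clockdiff wait for
        print(ICMP_TYPES[t], "rate-limited:", t in limited)
```

The default 6168 (0x1818) selects types 3, 4, 11 and 12. Echo Reply (0) and Timestamp Reply (14) are not included, which, if the kernel behaves as documented, is consistent with the tuning below having no visible effect.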
Changing the ICMP rate-limit parameters did not help.
Original sysctl values:
[root@obocp4 ~]# sysctl -a |grep net.ipv4.icmp_msgs_burst
net.ipv4.icmp_msgs_burst = 50
[root@obocp4 ~]# sysctl -a |grep net.ipv4.icmp_msgs_per_sec
net.ipv4.icmp_msgs_per_sec = 1000
Modified sysctl values (applied, then restarted):
net.ipv4.icmp_msgs_burst=200
net.ipv4.icmp_msgs_per_sec=10000
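What these settings do:
1. Burst handling:
○ If the host has to send 200 ICMP messages at once, all of them go out immediately (the token bucket starts with 200 tokens).
○ Beyond 200, further messages are dropped until tokens are replenished at 10,000 per second.
2. Sustained-rate control:
○ Tokens are replenished at 10,000 per second, so the long-term average rate is 10,000 messages per second.
○ If traffic stays above 10,000 messages per second, the excess is dropped.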
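The two numbered points above describe a token bucket. A minimal Python model, assuming a simple continuous-refill bucket; the class and function names are hypothetical, and the kernel's actual accounting differs in detail:

```python
class TokenBucket:
    """Simplified token bucket; an illustration, not the kernel's implementation."""

    def __init__(self, burst: int, per_sec: int) -> None:
        self.capacity = float(burst)   # plays the role of net.ipv4.icmp_msgs_burst
        self.rate = float(per_sec)     # plays the role of net.ipv4.icmp_msgs_per_sec
        self.tokens = float(burst)     # the bucket starts full
        self.last = 0.0                # time of the previous call, in seconds

    def allow(self, now: float) -> bool:
        """Refill for the time elapsed since the last call, then try to spend one token."""
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def count_allowed(burst: int, per_sec: int, arrivals: list[float]) -> int:
    """How many messages (non-decreasing send times, in seconds) get through."""
    bucket = TokenBucket(burst, per_sec)
    return sum(bucket.allow(t) for t in arrivals)


if __name__ == "__main__":
    spike = [0.0] * 250  # 250 messages at the same instant
    print("old 50/1000:   ", count_allowed(50, 1000, spike))
    print("new 200/10000: ", count_allowed(200, 10000, spike))
    print("new, +1 ms:    ", count_allowed(200, 10000, spike + [0.001]))
```

With the old values a 250-message spike passes 50 messages, with the new values 200; one millisecond later the bucket has refilled 10 tokens, so the next message passes again.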
[root@rocky95 ~]# tcpdump -i any -vvv -w clockdi555.pcap
Analyzing the capture in Wireshark shows the last dozen or so packets stalling, followed by a timeout.

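Root cause: a clockdiff bug, fixed upstream in April 2023.
Bug summary: when LAN latency is low, clockdiff automatically shrinks its polling interval to under 1 ms, but the target system cannot reply within that window, so the probes time out.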
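The bug summary can be illustrated with a toy model. In the successful trace of 192.168.169.41, `ppoll` is called with a zero timeout (`tv_sec=0, tv_nsec=0`), while the ping above shows a LAN round trip of about 0.38 ms. The sketch below hypothetically assumes a poll timeout derived from a whole-millisecond RTT estimate; it is not clockdiff's source code, only a demonstration of how a 0 ms estimate leaves no time for a sub-millisecond reply and how a minimum timeout avoids that (all names and the `factor` parameter are hypothetical):

```python
def poll_timeout_ns(rtt_estimate_ms: int, factor: int = 2, floor_ms: int = 0) -> int:
    """HYPOTHETICAL: poll timeout scaled from an integer-millisecond RTT estimate.

    An illustrative model only, not clockdiff's actual formula.
    """
    return max(rtt_estimate_ms * factor, floor_ms) * 1_000_000


def reply_seen(reply_delay_ns: int, timeout_ns: int) -> bool:
    """The probe counts as answered only if the reply arrives before the deadline."""
    return reply_delay_ns <= timeout_ns


if __name__ == "__main__":
    lan_reply_ns = 382_000  # ~0.382 ms, the LAN round trip measured with ping above
    print("no floor:  ", reply_seen(lan_reply_ns, poll_timeout_ns(0)))
    print("1 ms floor:", reply_seen(lan_reply_ns, poll_timeout_ns(0, floor_ms=1)))
```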
The clockdiff bundled with Rocky Linux 9.5 is version 20210202; replacing it with a newer build downloaded from the Internet made the test pass:
[root@rocky95 soft]# ./clockdiff -V
clockdiff from iputils 20240117
libcap: yes, IDN: yes, NLS: no, error.h: yes, getrandom(): yes, __fpending(): yes
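Because iputils uses date-style version numbers and the fix is dated April 2023, the installed build can be checked with a short script. A minimal sketch, assuming (hypothetically) that any release dated after 2023-04-01 contains the fix; confirm against issue #326 and the iputils release notes before relying on it:

```python
import re

FIX_CUTOFF = 20230401  # HYPOTHETICAL cut-off derived from "fixed in April 2023"


def iputils_version(version_output: str) -> int:
    """Extract the YYYYMMDD iputils version from `clockdiff -V` output."""
    match = re.search(r"iputils[ -]s?(\d{8})", version_output)
    if match is None:
        raise ValueError("no iputils version found")
    return int(match.group(1))


def likely_affected(version_output: str) -> bool:
    """True if the build predates the assumed fix cut-off."""
    return iputils_version(version_output) < FIX_CUTOFF


if __name__ == "__main__":
    # Format assumed to match the 20240117 output shown above.
    print(likely_affected("clockdiff from iputils 20210202"))
    print(likely_affected("clockdiff from iputils 20240117"))
```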
For details on the bug, see:
[Clockdiff host is down #326](https://github.com/iputils/iputils/issues/326)
[clockdiff: xx.xx.xx.xx is down](***.***/knowledge-base/oceanbase-database-1000000000207674)
[Wireshark TS | Linux 系统对时问题 (Linux system time synchronization issue)](https://blog.csdn.net/weixin_47627078/article/details/136270996)
Summary of that article: a Cisco regional core gateway switch apparently did not recognize the Timestamp field in the IPv4 options, which caused packet loss.