Ceph FileSystem - MDS Management

Author: Anoyi

2019-08-21 18:16

Ceph FileSystem Status

ceph fs status
labfs - 11 clients
=====
+------+--------+--------+---------------+-------+-------+
| Rank | State  |  MDS   |    Activity   |  dns  |  inos |
+------+--------+--------+---------------+-------+-------+
|  0   | active | data-2 | Reqs:   11 /s | 1899k | 1899k |
|  1   | active | data-3 | Reqs:   18 /s |  810k |  810k |
|  2   | active | data-1 | Reqs:   35 /s | 1283k | 1276k |
|  3   | active | data-4 | Reqs:   10 /s | 1725k | 1720k |
+------+--------+--------+---------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata | 2660M | 39.7T |
|   cephfs_data   |   data   |  794G | 39.7T |
+-----------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
+-------------+

Configuring mds_cache_memory_limit

Viewing the current mds_cache_memory_limit

$ ceph daemon mds.data-1 config show | grep mds_cache
    "mds_cache_memory_limit": "1073741824",
    "mds_cache_mid": "0.700000",
    "mds_cache_reservation": "0.050000",
    "mds_cache_size": "0",
    "mds_cache_trim_decay_rate": "1.000000",
    "mds_cache_trim_threshold": "65536",

The default is 1 GiB (1073741824 bytes). To raise it to 5 GiB, edit /etc/ceph/ceph.conf and add:

[mds]
mds cache memory limit = 5368709120
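The limit is expressed in bytes, so the 5 GiB value is 5 × 1024³. A minimal shell sketch of that conversion follows; gib_to_bytes is a hypothetical helper name used only for illustration and is not part of Ceph.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): convert GiB to bytes for
# options such as mds_cache_memory_limit, which are given in bytes.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 1   # prints 1073741824, the default limit
gib_to_bytes 5   # prints 5368709120, the 5 GiB value
```

On releases with the centralized configuration database (Mimic and later), the same value can usually be applied without editing ceph.conf, for example with ceph config set mds mds_cache_memory_limit 5368709120.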

Tip: an inappropriate cache setting triggers the health warning: 1 MDSs report oversized cache


Configuring Multiple Active MDS Daemons

# fsmap e5: 1/1/1 up {0=a=up:active}, 2 up:standby

ceph fs set <fs_name> max_mds 2

# fsmap e8: 2/2/2 up {0=a=up:active,1=c=up:creating}, 1 up:standby
# fsmap e9: 2/2/2 up {0=a=up:active,1=c=up:active}, 1 up:standby

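Running Without a Standby MDS

If no standby MDS is wanted, set standby_count_wanted to 0, which turns off the health check for insufficient standby daemons: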

ceph fs set <fs> standby_count_wanted 0 

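Tip: even with multiple active MDS daemons, a standby is still needed to take over when one of the active daemons fails. For a highly available system, max_mds should therefore be at most one less than the total number of MDS daemons in the cluster.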
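The rule of thumb above (keep one daemon in reserve) can be sketched in shell as follows. suggest_max_mds is a hypothetical helper name, and the floor of 1 is an assumption made here so that a single-MDS cluster still gets one active rank.

```shell
#!/bin/sh
# Hypothetical helper: suggest a max_mds value that leaves one MDS
# daemon free to act as standby.
# Argument: total number of MDS daemons deployed.
suggest_max_mds() {
    total=$1
    if [ "$total" -le 1 ]; then
        echo 1                  # a file system needs at least one active rank
    else
        echo $(( total - 1 ))   # keep one daemon in reserve
    fi
}

suggest_max_mds 4   # prints 3
```

With the four daemons from the status output at the top of this post, the result could be applied with ceph fs set <fs_name> max_mds "$(suggest_max_mds 4)".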


Error When Running ceph fs status

[root@data-1 lab]# ceph fs status
Error EINVAL: Traceback (most recent call last):
  File "/usr/share/ceph/mgr/mgr_module.py", line 889, in _handle_command
    return self.handle_command(inbuf, cmd)
  File "/usr/share/ceph/mgr/status/module.py", line 251, in handle_command
    return self.handle_fs_status(cmd)
  File "/usr/share/ceph/mgr/status/module.py", line 111, in handle_fs_status
    mds_versions[metadata.get('ceph_version', "unknown")].append(info['name'])
AttributeError: 'NoneType' object has no attribute 'get'

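Solution

Inspect the MDS metadata with ceph mds metadata. If the entry for some node is incomplete (data-5 below has only a name), restarting the MDS service on that node restores it.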

[
    {
        "name": "data-5"
    },
    {
        "name": "data-4",
        "addr": "[v2:10.0.5.14:6800/1944047450,v1:10.0.5.14:6801/1944047450]",
        "arch": "x86_64",
        "ceph_release": "nautilus",
        "ceph_version": "ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)",
        "ceph_version_short": "14.2.2",
        "cpu": "Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz",
        "distro": "centos",
        "distro_description": "CentOS Linux 7 (Core)",
        "distro_version": "7",
        "hostname": "data-4",
        "kernel_description": "#1 SMP Tue Jun 18 16:35:19 UTC 2019",
        "kernel_version": "3.10.0-957.21.3.el7.x86_64",
        "mem_swap_kb": "0",
        "mem_total_kb": "16412812",
        "os": "Linux"
    }
]
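Spotting the incomplete entry by eye gets tedious with many daemons. The sketch below prints the name of every entry that lacks a ceph_version field. It avoids extra dependencies by using awk, and it assumes the pretty-printed, one-key-per-line layout shown above; find_incomplete_mds is a hypothetical function name.

```shell
#!/bin/sh
# Sketch: print the names of MDS daemons whose metadata entry lacks
# "ceph_version". Assumes the pretty-printed, one-key-per-line layout
# shown above; reads the JSON on standard input.
find_incomplete_mds() {
    awk '
        /^[ \t]*[{]/      { name = ""; has_version = 0 }   # a new entry starts
        /"name":/         { split($0, parts, "\""); name = parts[4] }
        /"ceph_version":/ { has_version = 1 }
        /^[ \t]*[}]/      { if (name != "" && !has_version) print name }
    '
}

# Intended use on a cluster node:
#   ceph mds metadata | find_incomplete_mds
```

If the output is compacted onto a single line, this line-based approach no longer applies and a proper JSON tool would be the better choice.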

Restart the MDS service on the affected node (data-1 in this example)

systemctl restart ceph-mds@data-1.service

If the restart fails, check the reason:

systemctl status ceph-mds@data-1.service

If the reason is START REQUEST REPEATED TOO QUICKLY, edit /etc/systemd/system/ceph-mds.target.wants/ceph-mds@data-1.service and comment out StartLimitInterval:

# StartLimitInterval=30min

After the change, run systemctl daemon-reload for it to take effect.
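The manual edit can also be scripted. The following is a minimal sketch, not a tested procedure for every distribution: comment_out_start_limit is a hypothetical helper that comments out any active StartLimitInterval= line in the file it is given and leaves everything else untouched.

```shell
#!/bin/sh
# Sketch: comment out any active StartLimitInterval= line in a unit file.
# The file path is passed as the only argument; lines that are already
# commented out are left alone.
comment_out_start_limit() {
    unit_file=$1
    tmp=$(mktemp) || return 1
    # Write back with cat so a symlinked unit file keeps its link.
    sed 's/^StartLimitInterval=/# &/' "$unit_file" > "$tmp" &&
        cat "$tmp" > "$unit_file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Intended use (as root):
#   comment_out_start_limit /etc/systemd/system/ceph-mds.target.wants/ceph-mds@data-1.service
#   systemctl daemon-reload
#   systemctl reset-failed ceph-mds@data-1.service   # clear the recorded start failures
#   systemctl restart ceph-mds@data-1.service
```

The systemctl reset-failed step is an addition to the procedure described here; it clears the failure state that systemd recorded for the unit before the next restart attempt.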
