Ceph Series (Chapter 0): Deploying a Ceph Cluster with cephadm
What is Ceph?
First of all, 'Ceph' is pronounced /sɛf/. The name comes from 'cephalopod,' which refers to marine animals such as octopuses and squids.
Ceph itself is an open-source distributed storage system that has attracted widespread attention and adoption thanks to its high scalability, reliability, and performance.
Ceph supports multiple storage interfaces, such as object storage, block storage, and file system storage, meeting the storage needs of a wide range of business scenarios.
This article provides a detailed guide to deploying a Ceph distributed storage cluster from scratch, in containers, on Rocky Linux 9.5. By following it, you will be able to master the installation, configuration, and management of Ceph.
As the replacement for ceph-ansible, cephadm uses containerization to provide a standardized approach to operating Ceph clusters, effectively reducing operational complexity.
Deployment Plan Table
Most readers are encountering Ceph for the first time. To make things clearer and more intuitive, I have created the following deployment plan table to help you deploy the Ceph cluster in a VMware virtual environment.
| Number | Operating System | Ceph Version | Role | IP | Configuration | Hostname |
|---|---|---|---|---|---|---|
| 001 | Rocky 9.5 (x86_64) | squid (latest 19.2.0) | bootstrap, mon, mgr, osd | 172.16.173.129 | core(s): 4, memory: 4G, disk: 500G*4 | ceph001.haoyang.cn |
| 002 | Rocky 9.5 (x86_64) | squid (latest 19.2.0) | mon, mgr, osd | 172.16.173.130 | core(s): 4, memory: 4G, disk: 500G*4 | ceph002.haoyang.cn |
| 003 | Rocky 9.5 (x86_64) | squid (latest 19.2.0) | mon, mgr, osd | 172.16.173.131 | core(s): 4, memory: 4G, disk: 500G*4 | ceph003.haoyang.cn |
Because you will deploy the Ceph cluster in VMware, you need to download the x86_64 ISO image from the Rocky Linux official website and install Rocky 9.5 as the base operating system. I used the minimal ISO, so you cannot copy and paste from your desktop into the VM console; it is better to log in to Rocky Linux remotely over SSH.
For more information about Ceph release versions, please visit the official website.
For more details about the roles, for example "What do mon/mgr/osd mean?" or "What do these abbreviations stand for?", please see another post of mine: Ceph Series (Chapter 1): Introducing Red Hat Ceph Storage Architecture.
As for the IP addresses, I simply set the network adapter to NAT mode and used the default subnet along with the automatically assigned addresses. It does not matter if you change the subnet to whatever you like, and you will probably end up with different addresses. As long as the three virtual machines can communicate with each other, that is enough.
Depending on the performance of your hardware, I recommend 4C/4G/500G*4 here. Don't worry about disk space, because of VMware's thin provisioning mechanism: when creating virtual disks, space is only allocated for the data actually written. The total capacity declared by the virtual disk is just a logical value, and the actual storage space is allocated dynamically as the data grows.

As for the hostname, anything simple and easy to understand is fine; you can name it however you like. I used my Chinese name in the hostnames here.
Prerequisites
Please note that, unless specified otherwise, the following prerequisites must be completed on all nodes.
Setting the specified hostname
Set an appropriate hostname on each node for resolution.
Take the first node, 001, as an example here.
```bash
hostnamectl hostname ceph001.haoyang.cn
```
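For the other two nodes, run the matching commands from the deployment plan table:

```bash
# On the second node (172.16.173.130)
hostnamectl hostname ceph002.haoyang.cn
# On the third node (172.16.173.131)
hostnamectl hostname ceph003.haoyang.cn
```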
Setting up name resolution within the cluster
```bash
cat > /etc/hosts <<-'EOF'
```
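A complete version of this heredoc, based on the deployment plan table, could look like the sketch below (shown with `>>` so the default localhost entries are kept; adjust it to your own addresses):

```bash
# Sketch based on the plan table; verify the IPs and hostnames for your environment
cat >> /etc/hosts <<-'EOF'
172.16.173.129 ceph001.haoyang.cn ceph001
172.16.173.130 ceph002.haoyang.cn ceph002
172.16.173.131 ceph003.haoyang.cn ceph003
EOF
```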
Configuring the dnf software repository
I provide two alternative configurations here; use either one of them.
The repository provided by Ceph’s official site.
```bash
cat > /etc/yum.repos.d/ceph.repo <<-'EOF'
```
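For the squid release on EL9, a complete version of the official repo file typically looks like the sketch below; verify the paths against download.ceph.com before using it:

```bash
# Sketch of the official Ceph squid repository for EL9; double-check paths on download.ceph.com
cat > /etc/yum.repos.d/ceph.repo <<-'EOF'
[ceph]
name=Ceph packages for $basearch
baseurl=https://download.ceph.com/rpm-squid/el9/$basearch
enabled=1
gpgcheck=1
gpgkey=https://download.ceph.com/keys/release.asc

[ceph-noarch]
name=Ceph noarch packages
baseurl=https://download.ceph.com/rpm-squid/el9/noarch
enabled=1
gpgcheck=1
gpgkey=https://download.ceph.com/keys/release.asc
EOF
```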
Using the Nanjing University mirror for repository acceleration.
```bash
cat > /etc/yum.repos.d/ceph.repo <<-'EOF'
```
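The mirror variant is the same file with the baseurl pointed at the Nanjing University mirror. The exact mirror path below is an assumption on my part, so confirm it against the mirror's directory listing first:

```bash
# Sketch only: the NJU mirror path is assumed; verify it on mirrors.nju.edu.cn before use
cat > /etc/yum.repos.d/ceph.repo <<-'EOF'
[ceph]
name=Ceph packages for $basearch
baseurl=https://mirrors.nju.edu.cn/ceph/rpm-squid/el9/$basearch
enabled=1
gpgcheck=1
gpgkey=https://download.ceph.com/keys/release.asc

[ceph-noarch]
name=Ceph noarch packages
baseurl=https://mirrors.nju.edu.cn/ceph/rpm-squid/el9/noarch
enabled=1
gpgcheck=1
gpgkey=https://download.ceph.com/keys/release.asc
EOF
```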
Updating and generating the cache for the dnf package manager
```bash
dnf makecache
```
Installing the necessary software packages
- Python 3
- Systemd
- Podman
- Chrony
- LVM2
```bash
dnf install podman chrony lvm2 systemd python3 bash-completion wget curl epel-release -y
```
The installation may update critical components like systemd, so please restart the server afterwards.
```bash
reboot
```
Enabling NTP synchronization
By default, it syncs from public network sources, but you can specify your own time source. Here, I use ntp.aliyun.com.
Edit the configuration file and add the line `pool ntp.aliyun.com iburst` at the very beginning.
```bash
vi /etc/chrony.conf
```
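If you prefer a non-interactive edit, the same change can be made with a one-liner (a sketch doing exactly the same insertion):

```bash
# Insert the pool line at the very top of /etc/chrony.conf without opening an editor
sed -i '1i pool ntp.aliyun.com iburst' /etc/chrony.conf
```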
After editing the /etc/chrony.conf file, please set chronyd.service to start automatically at boot and take effect immediately.
```bash
systemctl enable chronyd --now
```
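To confirm that chronyd is actually synchronizing against the new source, you can check its sources and the system clock status, for example:

```bash
# Show the NTP sources in use and the overall sync state
chronyc sources -v
timedatectl status
```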
Installing cephadm
Installing the cephadm tool is sufficient, but I also install the ceph-common package to execute various Ceph commands like ceph and rados directly on the host. Since Ceph is deployed in a containerized manner with cephadm, these commands are not available on the host by default. By installing ceph-common, you can avoid logging into the container each time, making it more efficient.
```bash
dnf install cephadm ceph-common -y
```
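As a quick sanity check, both tools should now respond on the host; for example:

```bash
# Confirm cephadm and the Ceph CLI from ceph-common are installed
cephadm version
ceph --version
```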
Up to this point, all prerequisites have been completed.
Deploying a new Ceph cluster
Cephadm bootstrap is the first step in initializing a Ceph cluster. It creates a small initial Ceph cluster by bootstrapping, which includes a monitor (mon) and a manager (mgr). This is the foundational step for the entire Ceph cluster deployment and management process.
The `cephadm bootstrap` command will perform the following actions:
- Create a monitor (mon) and a manager (mgr) daemon on the local host for the new cluster.
- Generate a new SSH key for the Ceph cluster and add it to the root user's `/root/.ssh/authorized_keys` file.
- Write a copy of the public key to the `/etc/ceph/ceph.pub` file.
- Write a minimal configuration file to `/etc/ceph/ceph.conf`, which is used for communication with the Ceph daemons.
- Write a copy of the client.admin administrator (privileged) key to the `/etc/ceph/ceph.client.admin.keyring` file.
- Add the `_admin` label to the bootstrap host. By default, any host with this label will also receive copies of the `/etc/ceph/ceph.conf` and `/etc/ceph/ceph.client.admin.keyring` files.
If the hostname is a Fully Qualified Domain Name (FQDN), you need to add the specific parameter `--allow-fqdn-hostname`.

P.S. An FQDN looks like host.example.com: the host name is host, the domain name is example.com, and the trailing root domain "." is omitted in daily use.
If you are doing a single-node deployment, you need to add the specific parameter `--single-host-defaults`.
```bash
cephadm bootstrap --mon-ip 172.16.173.129 --single-host-defaults --initial-dashboard-user admin --initial-dashboard-password Sunhaoyang --dashboard-password-noupdate --allow-fqdn-hostname
```
This deployment uses a multi-node setup.
```bash
cephadm bootstrap --mon-ip 172.16.173.129 --initial-dashboard-user admin --initial-dashboard-password Sunhaoyang --dashboard-password-noupdate --allow-fqdn-hostname
```
Beyond the parameters used here, the usage of all parameters can be checked with `cephadm bootstrap --help`.
- `--mon-ip` specifies the IP address of the monitor (mon) that will be created during the bootstrap process. This is the IP address of the host where the initial MON daemon will run.
- `--initial-dashboard-user` sets the username for the Ceph dashboard's initial administrative user. In this case, the username will be admin.
- `--initial-dashboard-password` specifies the password for the initial administrative user of the Ceph dashboard. The password will be set to Sunhaoyang.
- `--dashboard-password-noupdate` prevents the Ceph cluster from automatically updating the dashboard password after the bootstrap process. This ensures the password specified in `--initial-dashboard-password` remains unchanged.
- `--allow-fqdn-hostname` allows the use of a Fully Qualified Domain Name (FQDN) as the hostname for the Ceph cluster's initial node. This is useful when the hostname includes domain information, such as ceph.example.com.
Finally, the installation console output looks like this.
```
Ceph Dashboard is now available at:
```
You cannot use the prompted URL to visit the Ceph front-end website, because you have not set up name resolution in your local /etc/hosts file. However, you can visit it directly using the IP address and port, such as: https://172.16.173.129:8443/.
When you open the website, the browser may warn you that the connection is not secure. Accept the risk by clicking the Advanced button and then the Continue/Proceed button.
As shown in the image, the page has already prompted us to expand the cluster, and there is a yellow warning next to the dashboard icon in the top left corner. Next, let’s add some disks to the cluster.
Deploying OSD resources
In Ceph, OSD (Object Storage Daemon) is one of the essential components of the storage cluster. Its main responsibilities include storing data, handling data replication, recovery, backfilling, and rebalancing operations.
Key Concepts of Ceph OSD:
- Data Storage: OSDs are responsible for storing data objects. Each OSD usually corresponds to a physical storage device, such as a hard drive or SSD.
- Data Replication: To ensure high availability and durability, OSDs replicate data among themselves. Ceph uses the CRUSH algorithm to determine the placement of data.
- Data Recovery: When an OSD fails or goes offline, the cluster automatically recovers data from other OSDs and replicates it to new OSDs.
- Backfilling and Rebalancing: Backfilling refers to redistributing data after an OSD is restored or new OSDs are added to ensure data is evenly distributed. Rebalancing ensures load balancing across OSDs to prevent overloading certain OSDs.
- Monitoring and Management: OSDs use a heartbeat mechanism to report their status to the Ceph cluster, ensuring cluster health and consistency.
Having understood Ceph OSDs, let’s proceed to add some OSDs to the cluster to complete the expansion of the Ceph cluster.
Based on the deployment plan table, you have already added four disks to each node.
Excluding the disk that holds the operating system partitions, there are three remaining disks: nvme0n2, nvme0n3, and nvme0n4.
```
[root@ceph001 ~]# lsblk
```
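In addition to lsblk, once the cluster has been bootstrapped you can ask the orchestrator which devices it considers usable for OSDs, for example:

```bash
# Show devices detected by cephadm and whether they are available for OSDs
ceph orch device ls
```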
Then you can use these three remaining disks to add OSD daemons to the cluster.
```
[root@ceph001 ~]# ceph orch daemon add osd ceph001.haoyang.cn:/dev/nvme0n2
```
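The remaining two data disks on this node are added the same way:

```bash
# Add the other two data disks on ceph001 as OSDs
ceph orch daemon add osd ceph001.haoyang.cn:/dev/nvme0n3
ceph orch daemon add osd ceph001.haoyang.cn:/dev/nvme0n4
```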
Let's check the list of OSDs now.
```
[root@ceph001 ~]# ceph osd crush tree
```
Manually adding all the disks on each node one by one would be too tedious. Fortunately, we can use the parameter `--all-available-devices` to automatically detect and use all available storage devices in the system as OSDs. This simplifies the process of adding OSDs, eliminating the need to specify each device manually.
```
[root@ceph001 ~]# ceph orch apply osd --all-available-devices
```
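Note that `--all-available-devices` creates a managed OSD service, so cephadm will also consume eligible disks it discovers later, including on newly added hosts. If you ever want to turn that automation off, the service can be marked unmanaged; a minimal example:

```bash
# Stop automatically creating OSDs on newly discovered devices
ceph orch apply osd --all-available-devices --unmanaged=true
```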
Adding new hosts to the cluster
The new host must meet all the Prerequisites of this article before it can be added to the cluster.
Distribute the cluster’s SSH key to the authorized_keys file of the root user on all hosts to enable passwordless operations.
```
[root@ceph001 ~]# ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph001.haoyang.cn
```
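Since the key has to reach every host, repeat the command for the other two nodes in the plan table:

```bash
# Distribute the cluster SSH key to the remaining hosts
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph002.haoyang.cn
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph003.haoyang.cn
```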
Let's check the current status of the cluster's host list.
```
[root@ceph001 ~]# ceph orch host ls --detail
```
When adding a host to a Ceph cluster, it is typically necessary to specify both the hostname and the IP address. This is because:
- Hostname: Ceph uses hostnames to identify nodes in the cluster. These hostnames must be unique and resolvable throughout the cluster (usually configured via /etc/hosts or DNS).
- IP Address: The IP address is crucial for communication between Ceph nodes. Specifying the IP address ensures that Ceph knows how to communicate with the host, especially in environments with multiple network interfaces or complex network configurations.
After adding a host, the new host will automatically trigger the download of container images and the startup of containers, which might take some time to be ready. Additionally, since we previously configured automatic OSD addition, the disks on the new host will be automatically added to the cluster.
```
[root@ceph001 ~]# ceph orch host add ceph002.haoyang.cn 172.16.173.130
```
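The third node from the plan table is added in the same way:

```bash
# Add the third host with its hostname and IP from the plan table
ceph orch host add ceph003.haoyang.cn 172.16.173.131
```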
Downloading the container images and starting the containers on the new host will take some time. You can use the following command to check whether all services are running normally.
If everything is normal, all services will be in the “running” state.
```
[root@ceph001 ~]# ceph orch ps
```
Assigning new management privileges
For convenience in management, we will add ceph002.haoyang.cn as a management host.

Before assigning management privileges, let's take a look at the configuration files and keys on the ceph002.haoyang.cn host.
Based on the information, there are no keys or configuration files present.
```
[root@ceph002 ~]# ls /etc/ceph
```
Similarly, without the appropriate permissions, it is not possible to retrieve cluster information.
```
[root@ceph002 ~]# ceph -s
```
Let's assign the `_admin` label to ceph002.haoyang.cn.
```
[root@ceph001 ~]# ceph orch host label add ceph002.haoyang.cn _admin
```
Checking the current status of the cluster’s host list again.
```
[root@ceph001 ~]# ceph orch host ls --detail
```
Checking the keys and configuration files again on ceph002.haoyang.cn.
```
[root@ceph002 ~]# ls /etc/ceph
```
We can now confirm that ceph002.haoyang.cn has management privileges.
If the following command executes successfully, it indicates that it has successfully obtained the cluster information and the permissions are working correctly.
```
[root@ceph002 ~]# ceph -s
```
Checking the status of the Ceph cluster
Since we have added a new host and new OSDs to the cluster, the yellow status on the dashboard should have turned green.
Finally, let’s use the command to check the cluster status!
```
[root@ceph001 ~]# ceph -s
```
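If you want a few more checks beyond ceph -s, the following standard Ceph commands give a closer look at health details, deployed services, and capacity:

```bash
# Detailed health messages, orchestrator services, capacity usage, and the OSD tree
ceph health detail
ceph orch ls
ceph df
ceph osd tree
```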
With this, our Ceph cluster deployment has been successfully completed!