-
- In Ceph, the placement group (PG) is essential for a pool. The number of PGs should be aligned carefully with the pool size (i.e. the number of object replicas), the number of OSDs and the data size (see the sketch below for a commonly used rule of thumb).
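A rough sketch of that rule of thumb (total PGs ≈ 100 × number of OSDs / replica count, rounded up to the next power of two); the OSD count, replica count and pool name below are only illustrative:
$ num_osd=2; replica=2
$ python -c "import math; n=100.0*${num_osd}/${replica}; print(2**int(math.ceil(math.log(n,2))))"
128
$ ceph osd pool create mypool 128 128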
-
node organisation (initial)
- admin: dccn-l018
- mon: dccn-c005
- mds: dccn-c005
- osd: dccn-c035, dccn-c036
-
node organisation (additional)
- extra mon: dccn-c035
- extra mds: dccn-c036 (for CephFS)
-
filesystem on osd:
- data: multiple disks aggregated into one BTRFS mount
- journal: SSD
-
preparation on admin node (account: root)
-
add ceph repository
$ /opt/cluster/sbin/ceph/01_add_ceph_yum_repo.sh
-
make sure sudoers are configured properly to include the specific SUDO configuration of the ceph account.
-
create ceph account
$ /opt/cluster/sbin/ceph/02_create_manage_account.sh
-
install ntp packages
$ /opt/cluster/sbin/ceph/03_install_ntp.sh
-
-
preparation on other nodes (account: root)
-
create ceph account
$ /opt/cluster/sbin/ceph/02_create_manage_account.sh
-
install ntp packages
$ /opt/cluster/sbin/ceph/03_install_ntp.sh
-
-
distribute SSH key for password-less login from admin node to other nodes (account: ceph)
-
log on to the admin node via the ceph account
-
run
$ /opt/cluster/sbin/ceph/04_deploy_ceph_sshkey.sh dccn-c005 dccn-c035 dccn-c036
-
add the following contents into $HOME/.ssh/config
Host dccn-c005
    StrictHostKeyChecking no
    UserKnownHostsFile=/dev/null
    Hostname dccn-c005.dccn.nl
    User ceph
Host dccn-c035
    StrictHostKeyChecking no
    UserKnownHostsFile=/dev/null
    Hostname dccn-c035.dccn.nl
    User ceph
Host dccn-c036
    StrictHostKeyChecking no
    UserKnownHostsFile=/dev/null
    Hostname dccn-c036.dccn.nl
    User ceph
-
change mode of $HOME/.ssh/config
$ chmod 600 $HOME/.ssh/config
-
test if user ceph can log in without a password, e.g. with the loop shown below.
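A quick way to verify the password-less login to all nodes (hostnames as configured above):
$ for h in dccn-c005 dccn-c035 dccn-c036; do ssh $h hostname; done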
-
-
login to admin node (account: ceph)
-
create directory for the cluster profiles
$ mkdir $HOME/dccn-ceph-cluster
$ cd $HOME/dccn-ceph-cluster
-
create a new cluster profile; the first MON node needs to be given.
$ ceph-deploy new dccn-c005
-
append the following lines in ceph.conf (the first line goes under the existing [global] section)
osd pool default size = 2

[osd]
osd journal size = 60000
osd mkfs options btrfs = -f -d single
-
install ceph packages on all nodes
$ ceph-deploy install dccn-l018 dccn-c005 dccn-c035 dccn-c036
-
initialise the cluster
$ ceph-deploy mon create-initial
This step generates keyrings to be shared amongst nodes in the cluster. At this point, the MON service should be running on dccn-c005.
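If create-initial succeeds, ceph-deploy also gathers the keyrings (ceph.client.admin.keyring, ceph.mon.keyring and the bootstrap-osd/mds keyrings) into the cluster profile directory; a quick sanity check (exact file names may vary between releases):
$ ls -1 $HOME/dccn-ceph-cluster/*.keyring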
-
prepare OSD
Here we assume that on dccn-c035 and dccn-c036, /dev/sdb is the SSD, while /dev/sd[c-e] are HDDs. The following commands will create an OSD on dccn-c035 and dccn-c036 with data located on /dev/sdc and journal on /dev/sdb. The size of the journal will be 60 GB as specified in the ceph.conf file. Proper journal size calculation follows the recommendation in the Ceph documentation (see the sketch below).
$ ceph-deploy osd prepare --fs-type btrfs dccn-c035:/dev/sdc:/dev/sdb
$ ceph-deploy osd prepare --fs-type btrfs dccn-c036:/dev/sdc:/dev/sdb
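A rough sketch of that recommendation (journal size ≥ 2 × expected throughput × filestore max sync interval); the throughput and sync interval below are illustrative values chosen so that the result matches the osd journal size = 60000 (MB) set in ceph.conf above, not measurements from these nodes:
$ expected_throughput_mb_s=500        # e.g. SSD write bandwidth (illustrative)
$ filestore_max_sync_interval_s=60    # illustrative; the Ceph default is 5 s
$ echo $(( 2 * expected_throughput_mb_s * filestore_max_sync_interval_s ))
60000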
-
activate OSD
Firstly, log in to the OSD nodes and check the directories created during the preparation step above. Normally, they are under /var/lib/ceph/osd and called ceph-N, where N is a sequential number. If it is a clean installation, one should have /var/lib/ceph/osd/ceph-0 on dccn-c035 and /var/lib/ceph/osd/ceph-1 on dccn-c036.
$ ceph-deploy osd activate dccn-c035:/var/lib/ceph/osd/ceph-0
$ ceph-deploy osd activate dccn-c036:/var/lib/ceph/osd/ceph-1
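Once the admin keyring has been deployed (a later step below), or when run directly on the MON node, the two new OSDs should show up as up and in:
$ ceph osd tree
$ ceph osd stat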
-
adding more devices to BTRFS
This step has nothing to do with Ceph; it is handled entirely by BTRFS. BTRFS provides the capability of adding/removing storage devices on the fly to a mount point. We use this feature to add more disks into the Ceph cluster.
Example of adding /dev/sdd and /dev/sde on dccn-c035 (the node hosting /var/lib/ceph/osd/ceph-0)
-
login dccn-c035 as root
-
run
$ parted /dev/sdd mklabel gpt
$ parted /dev/sdd mkpart primary 0% 100%
$ btrfs device add -f /dev/sdd1 /var/lib/ceph/osd/ceph-0
-
repeat the above commands for /dev/sde
-
rebalance data in BTRFS and convert the metadata to raid1 so it is mirrored across devices (the single-disk default profile does not mirror across devices)
$ btrfs balance start -mconvert=raid1 /var/lib/ceph/osd/ceph-0
-
after that, one should see three disks being utilized by a single BTRFS file system
$ btrfs fi show
Label: none  uuid: 3bce839b-9683-49b6-ae5c-d0f009f8e8f4
        Total devices 3 FS bytes used 1.07MiB
        devid    1 size 1.09TiB used 40.00MiB path /dev/sdc1
        devid    2 size 1.09TiB used 1.03GiB path /dev/sdd1
        devid    3 size 1.09TiB used 1.00GiB path /dev/sde1

Btrfs v3.16.2
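The extra space should also become visible to Ceph once BTRFS has grown the filesystem; quick checks (run ceph df after the admin keyring has been deployed in the next step; output depends on the actual disks and is omitted here):
$ btrfs filesystem df /var/lib/ceph/osd/ceph-0
$ ceph df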
-
-
deploy the administrator keyring to all nodes
$ ceph-deploy admin dccn-l018 dccn-c005 dccn-c035 dccn-c036
-
on the management node, make sure the keyring of client.admin is readable for the ceph user
$ sudo chmod +r /etc/ceph/ceph.client.admin.keyring
-
health check
$ ceph health
HEALTH_OK
If the health is not OK, the reasons for not being OK will be shown. If the reason is not clear, you may want to start from scratch. In this case, follow the instructions below about Purge Ceph Installation to clean up everything and restart.
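For more detail than the one-line summary when the cluster is not healthy:
$ ceph health detail
-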
cluster status
$ ceph -s
-
monitor cluster status (similar to watch)
$ ceph -w
At the initial phase, while still figuring out the proper configuration of the Ceph cluster,
one may need to repeat the installation steps a few times. For every iteration, it is more
convenient to start from scratch. This prevents unexpected situations due to left-over
configuration files, keyrings or reference IDs. The ceph-deploy command offers
a quick way to erase everything. The following steps show how to do it. They should
be followed in sequential order.
- purge ceph installation
$ ceph-deploy purge dccn-l018 dccn-c005 dccn-c035 dccn-c036
This shuts down services and removes the RPM packages from the nodes.
- purge ceph data
$ ceph-deploy purgedata dccn-l018 dccn-c005 dccn-c035 dccn-c036
This will remove the entire /var/lib/ceph directory, so the data on the OSDs will be gone.
- forget all keys
$ ceph-deploy forgetkeys
This cleans up all the keyring files within the cluster profile folder.
According to the documentation, ceph-deploy offers a simple command to add a new MON to the cluster.
However, I encountered an issue with it, as some of the commands executed remotely to provision the MON service
get stuck. This causes a connection timeout; nevertheless, the deployment continues to add the MON to the cluster.
To clean up the mess left over by the timeout, I figured out that one should log in to the node on which
the new MON is deployed, kill the relevant processes (very likely the one running the ceph-mon command), and restart
the ceph daemon. After that, the service seems to be running fine.
TODO: need to try the manual step of adding new MON in the future.
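For reference, a sketch of the documented way of adding the extra MON node (dccn-c035 from the node organisation above) and of checking whether it has joined the quorum; given the issue described above, the first command may need the manual cleanup afterwards:
$ ceph-deploy mon add dccn-c035
$ ceph quorum_status --format json-pretty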
An MDS (metadata server) node is required for running CephFS, a filesystem interface to the Ceph cluster.
$ ceph-deploy mds create dccn-c005
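Creating the MDS alone is not enough to use CephFS; one still needs a data pool, a metadata pool and (on recent releases) an explicitly created filesystem. A minimal sketch, where the pool names, the PG count of 64 and the mount point are only illustrative, and /etc/ceph/admin.secret is assumed to contain the client.admin key:
$ ceph osd pool create cephfs_data 64
$ ceph osd pool create cephfs_metadata 64
$ ceph fs new cephfs cephfs_metadata cephfs_data
$ ceph mds stat
$ sudo mkdir -p /mnt/cephfs
$ sudo mount -t ceph dccn-c005.dccn.nl:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret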