It is important to continuously monitor the health of a Ceph deployment from the beginning, either by using the Ceph tools or by accessing the status through the Proxmox VE API.
The following Ceph commands can be used to see if the cluster is healthy (HEALTH_OK), if there are warnings (HEALTH_WARN), or even errors (HEALTH_ERR). If the cluster is in an unhealthy state, the status commands below will also give you an overview of the current events and actions to take.
# single time output
pve# ceph -s
# continuously output status changes (press CTRL+C to stop)
pve# ceph -w
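For scripted checks, for example from cron or an external monitoring system, the information that `ceph -s` prints is also available as JSON. The following Python sketch is a minimal illustration and not part of Proxmox VE or Ceph. It assumes that the `ceph` CLI is installed and configured on the node, and that the JSON output contains `health.status` and `health.checks`, as in Nautilus. The exit-code mapping (0, 1, 2, and 3 for unknown) follows the common Nagios convention and is an assumption, as are the helper names `parse_health` and `check_ceph_health`.

```python
import json
import subprocess

# Assumed mapping of Ceph health states to Nagios-style exit codes;
# neither Ceph nor Proxmox VE defines these values.
EXIT_CODES = {"HEALTH_OK": 0, "HEALTH_WARN": 1, "HEALTH_ERR": 2}


def parse_health(status_json: str) -> tuple[str, list[str]]:
    """Return the overall health state and the sorted names of active health checks."""
    status = json.loads(status_json)
    health = status["health"]
    # "checks" maps check names (e.g. OSD_DOWN) to details; it is empty when healthy
    return health["status"], sorted(health.get("checks", {}))


def check_ceph_health() -> int:
    """Query the cluster once and return an exit code for the health state."""
    # Same information as `ceph -s`, but machine-readable
    out = subprocess.run(
        ["ceph", "status", "--format", "json"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    state, checks = parse_health(out)
    print(state, *checks)
    return EXIT_CODES.get(state, 3)  # 3 = UNKNOWN for any unexpected state
```

On a node with a working Ceph CLI, a small wrapper could call `sys.exit(check_ceph_health())`. Keeping the JSON parsing in `parse_health` separate from the subprocess call makes it possible to test the logic against saved `ceph status` output without a running cluster.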
To get a more detailed view, every Ceph service has a log file under /var/log/ceph/. If more detail is required, the log level can be adjusted [25].
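When a health warning appears, the cluster log is often the quickest place to see when it started. The Python sketch below filters a log for warning and error entries. It assumes that the cluster log is at `/var/log/ceph/ceph.log` on a monitor node and that its entries carry severity tags such as `[WRN]` and `[ERR]`; verify both the path and the tag format on your installation. The helper names `problem_lines` and `scan_log` are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator

# Assumed severity tags in the cluster log, e.g. "... cluster [WRN] Health check failed ..."
SEVERITY = re.compile(r"\[(WRN|ERR)\]")


def problem_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines tagged as warnings or errors, without trailing newlines."""
    for line in lines:
        if SEVERITY.search(line):
            yield line.rstrip("\n")


def scan_log(path: str = "/var/log/ceph/ceph.log") -> list[str]:
    """Return all warning and error lines from the given log file."""
    # errors="replace" keeps the scan going if a line contains invalid UTF-8
    with Path(path).open(encoding="utf-8", errors="replace") as fh:
        return list(problem_lines(fh))
```

Because `problem_lines` works on any iterable of strings, it can also be fed from a rotated or decompressed log, or from the output of `ceph -w`, without changing the filtering logic.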
More information about troubleshooting a Ceph cluster can be found on the official Ceph website [26].
[25] Ceph log and debugging https://docs.ceph.com/en/nautilus/rados/troubleshooting/log-and-debug/
[26] Ceph troubleshooting https://docs.ceph.com/en/nautilus/rados/troubleshooting/