CEPH Debugging, Tips & Tricks (rook-ceph)

Start by setting the block pool name:

export CEPH_POOL_NAME="my-block-pool-name"
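
If you are unsure of the pool name, you can list the pools from the Rook toolbox first. This assumes the default rook-ceph namespace and the standard rook-ceph-tools deployment; adjust both for your cluster:

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
ceph osd pool ls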

General Tools

Just some general tips & tricks.

# Debugging & Status of pool & OSD
    rados df && ceph osd pool stats $CEPH_POOL_NAME && ceph status
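
If the status output shows warnings, a few follow-up commands (plain Ceph CLI, nothing rook-specific) help narrow down which OSDs or PGs are involved:

# Health detail, per-OSD utilisation and topology
    ceph health detail && ceph osd df tree

# PGs that are stuck (inactive / unclean / stale / undersized / degraded)
    ceph pg dump_stuck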

# Find objects created (or deleted) within a 5-second window
    val1=$(rados -p $CEPH_POOL_NAME ls | sort) && sleep 5 && val2=$(rados -p $CEPH_POOL_NAME ls | sort) && printf '%s\n%s\n' "$val1" "$val2" | sort | uniq -u | paste -sd,

# RADOS purge data [BE CAUTIOUS, THIS WILL DELETE EVERY OBJECT IN THE POOL]
    rados purge $CEPH_POOL_NAME --yes-i-really-really-mean-it
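
Before purging, it is worth double-checking what you are about to delete:

# Count the objects in the pool and review per-pool usage first
    rados -p $CEPH_POOL_NAME ls | wc -l && ceph df detail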

Snapshots & Images

The commands below show how to find the images and snapshots that have been created, and how to follow a volume through from beginning to end. Super useful if you are dealing with snapshots.
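
The csi-vol-* names used below are CSI-provisioned images. If you need to work back from a Kubernetes PVC to the underlying image, the PV usually carries the image name in its CSI volume attributes (field names may vary with your ceph-csi version, and <pvc-name> / <pv-name> below are placeholders):

# Resolve a PVC to its PV, then to the backing RBD image
    kubectl get pvc <pvc-name> -o jsonpath='{.spec.volumeName}'
    kubectl get pv <pv-name> -o jsonpath='{.spec.csi.volumeAttributes.imageName}'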

# List all images
    rbd ls --pool $CEPH_POOL_NAME

# Show detailed info for an image (e.g. a CSI-provisioned volume)
    rbd info --pool $CEPH_POOL_NAME csi-vol-713d9f6a-87f8-11ec-a353-9e0afa686c7c

# Show image & snapshot counts for the pool
    rbd pool stats --pool $CEPH_POOL_NAME

# List the snapshots of an image
    rbd snap list --pool $CEPH_POOL_NAME csi-vol-6ee6f0d5-8805-11ec-a353-9e0afa686c7c

# Show the allocated/changed extents of an image (offset, length, type)
    rbd diff --pool $CEPH_POOL_NAME csi-vol-2c3ba18b-82a5-11ec-bad4-c206a9ecc8f4
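
RBD clones keep a reference to their parent snapshot, so you can walk the chain in both directions: rbd info on a clone shows its parent in the parent: line, and rbd children lists the clones created from a snapshot. The image/snapshot names below are placeholders:

# List clones created from a snapshot
    rbd children --pool $CEPH_POOL_NAME <image-name>@<snapshot-name>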

Crushmap [download / upload]

#!/usr/bin/env bash
# crushmap.sh: download (decompile) or upload (compile) the CRUSH map
action=${1:-download}
if [ "$action" = "download" ]; then
   # Fetch the binary CRUSH map and decompile it to an editable text file
   ceph osd getcrushmap -o /tmp/ma-crush-map
   crushtool -d /tmp/ma-crush-map -o /tmp/ma-crush-map.txt
   cat /tmp/ma-crush-map.txt
elif [ "$action" = "upload" ]; then
   # Recompile the edited text file and inject it back into the cluster
   crushtool -c /tmp/ma-crush-map.txt -o /tmp/ma-crush-new-map
   ceph osd setcrushmap -i /tmp/ma-crush-new-map
else
   echo "Usage: ./crushmap.sh [download | upload]"
fi
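
Before uploading an edited map, it is worth a dry run to confirm it still compiles and that the rules still produce sane mappings. The rule number and replica count below are assumptions; match them to your pool:

# Recompile the edited map and dry-run the placement rules
crushtool -c /tmp/ma-crush-map.txt -o /tmp/ma-crush-new-map
crushtool -i /tmp/ma-crush-new-map --test --show-statistics --rule 0 --num-rep 3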

Performance Testing

## Prerequisites
  # NOTE: Before running any of the benchmarks in the sections below, drop all caches in the OSD pods using the command below:
    sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
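
In a rook-ceph cluster you can drop the caches across every OSD pod in one go instead of shelling into each one. This assumes the default rook-ceph namespace, the standard app=rook-ceph-osd label, and OSD containers privileged enough to write to /proc:

  # Drop caches in every OSD pod
    for pod in $(kubectl -n rook-ceph get pods -l app=rook-ceph-osd -o name); do
      kubectl -n rook-ceph exec "$pod" -- sh -c 'sync && echo 3 > /proc/sys/vm/drop_caches'
    done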

## Tests
  # Test raw write IO (shell into an OSD pod first); writes 1 GiB while bypassing the page cache. Adjust the output path for your environment.
    dd if=/dev/zero of=/tmp/ceph-dd-test bs=1G count=1 oflag=direct


## Run performance benchmarks on the pool (run from the toolbox)
  # Write performance
      rados bench -p $CEPH_POOL_NAME 10 write --no-cleanup
  # Sequential read
      rados bench -p $CEPH_POOL_NAME 10 seq
  # Random read
      rados bench -p $CEPH_POOL_NAME 10 rand
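
rados bench defaults to 16 concurrent operations and 4 MB objects; both can be tuned to better match your workload. The values below are only illustrative:

  # Write benchmark with explicit concurrency (-t) and object size in bytes (-b)
      rados bench -p $CEPH_POOL_NAME 10 write -t 32 -b 4096 --no-cleanup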


## Cleanup benchmark data (run from the toolbox)
    rados -p $CEPH_POOL_NAME cleanup