OSD cluster expansion/contraction
From Ceph wiki
Adding OSDs takes place in two stages. First, the cosd daemon needs to be set up and added to the cluster. Then the CRUSH data placement needs to be adjusted to put data there.
NOTE: This all assumes the cluster is currently up and running. There's no need to shut anything down during this process.
Add the OSD to the cluster
1. First, add a new [osdN] entry to your ceph.conf with the relevant options (host, osd path, etc.), where N is the osd id. For this example, let's say we have osd0..osd3 on hosts foo0..foo3, and are adding osd4 on host foo4, on /dev/sdb. You might add something like:
[osd4]
host = foo4
btrfs devs = /dev/sdb
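If you would rather point the OSD at a directory on a filesystem you mount yourself, instead of having cosd manage a btrfs device, an 'osd data' path can be given in place of 'btrfs devs'. This is only a sketch and the path is hypothetical; adjust it for your layout:
[osd4]
host = foo4
osd data = /data/osd4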
2. Log into foo4 (the host where the OSD will live). mkfs and mount the btrfs volume, if necessary:
# mkbtrfs /dev/whatever
# mount /dev/whatever /mnt/wherever
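Note that newer btrfs-progs releases install the mkfs tool as mkfs.btrfs rather than mkbtrfs; if mkbtrfs isn't available on your system, the equivalent commands are:
# mkfs.btrfs /dev/whatever
# mount /dev/whatever /mnt/wherever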
3. Get a copy of the monmap. This is needed when 'formatting' the individual osd.
# ceph mon getmap -o /tmp/monmap
4. Initialize the OSD data dir. The -i arg is the osd id, so for osd4 it's -i 4.
# cosd -c /path/to/ceph.conf -i 4 --mkfs --monmap /tmp/monmap
5. Make sure your 'max osd' is high enough; it should be the highest osd id plus 1. If you are adding osd4, you want a max of 5 (i.e., 5 osds).
# ceph osd dump -o - | grep max
max_osd 4
# ceph osd setmaxosd 5
09.10.13 21:22:32.193494 mon <- [osd,setmaxosd,5]
09.10.13 21:22:33.251192 mon1 -> 'set new max_osd = 5' (0)
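To double-check that the new maximum was accepted, you can repeat the dump; the output line below is illustrative, assuming the command above succeeded:
# ceph osd dump -o - | grep max
max_osd 5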
6. Start up cosd:
# /etc/init.d/ceph start osd
7. Check your work. Dumping the osd map should show your new OSD as 'up' with an IP address, etc.:
# ceph osd dump -o -
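If the dump is long, grepping it for the new osd id is a quick way to spot the relevant line (this assumes the dump identifies each osd by its id, e.g. osd4):
# ceph osd dump -o - | grep osd4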
Include the new OSD in the data placement
Adding the OSD to the system doesn't necessarily mean the system will put data on it. You need to check (and possibly update) the CRUSH placement map (part of the OSD map).
1. Grab the current CRUSH map and decode it:
# ceph osd getcrushmap -o /tmp/crush
# crushtool -d /tmp/crush
Generally speaking, if the map was automatically generated and the new osd appears in the 'devices' section at the top of the decoded CRUSH map, you're fine. (osd4 corresponds to device4 in this case.)
If it doesn't appear, you have two options: update the CRUSH map yourself (the syntax is fairly simple), or generate a new one.
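For reference, the devices section of a decoded map looks roughly like the fragment below (exact names and layout vary between versions; this is only an illustration). The thing to check is simply that device4 is listed:
# devices
device 0 device0
device 1 device1
device 2 device2
device 3 device3
device 4 device4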
Updating an existing CRUSH map
1. Grab the CRUSH map
# ceph osd getcrushmap -o /tmp/crush
2. Decode it
# crushtool -d /tmp/crush -o /tmp/crush.txt
3. Edit to your liking (a sketch of the kind of change involved is shown after these steps)
# vi /tmp/crush.txt
4. Reencode
# crushtool -c /tmp/crush.txt -o /tmp/crush.new
If you have errors, fix and repeat...
5. Inject the new CRUSH map into the cluster
# ceph osd setcrushmap -i /tmp/crush.new
read 349 bytes from cm
09.10.13 21:33:07.411293 mon <- [osd,setcrushmap]
09.10.13 21:33:08.418202 mon0 -> 'set crush map' (0)
6. Watch the data move around
# ceph -w
You should see the pg states change from active+clean to active, some degraded objects (they aren't where they're supposed to be), and finally active+clean when migration completes. (Control-c to exit.)
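As an illustration of the kind of edit step 3 above refers to, the fragment below sketches adding osd4 to a decompiled map: declare the device, then add it as a weighted item in whichever bucket should receive it. The bucket name, id, and weight here are hypothetical, and the bucket type must be one declared in your map's types section; copy the style of the buckets already present in your own map:
# devices
device 4 device4

# buckets (add the new device to an existing bucket, or define a new one)
host foo4 {
    id -5
    alg straw
    hash 0
    item device4 weight 1.000
}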
Building a generic CRUSH map
If you want to use a default placement strategy for N (say, 64) OSDs, osdmaptool can build a map for you.
1. Generate a new (throwaway) OSD map, and extract the automatically generated CRUSH map from it.
# osdmaptool --createsimple 64 --clobber /tmp/osdmap.junk --export-crush /tmp/crush.new
2. Take a look
# crushtool -d /tmp/crush.new
3. Inject it into the cluster
# ceph osd setcrushmap -i /tmp/crush.new
4. Watch the data move around
# ceph -w