Ubuntu, MDADM, RAID5, LVM and a Few Things In-Between
Ok, so I've been a bit slack while on leave and let Scotty
down by not posting up my speedometer. I swear I will, however, today is more
about server administration than an electronic project.
The reason I am writing about this is that it took me a
while and a lot of reading to get everything I needed to grow, repair, destroy
and create (in that order, oddly enough) a RAID5 software array with an overlaying
Logical Volume.
A while back I purchased a HP micro server and with the assistance
of a mate got it up and running with Ubuntu 12.04 LTS. The OS runs off a USB key inside
the server, while the RAID5 array sits in the bay drives. I needed the help
originally as I had no real idea about LVMs (or, for that matter, any real
working knowledge of RAID) - so to begin with I had four 2TB hard disks,
which gave me about 5.4TB of usable space on the array.
As with all things, I 'had' to tinker with my Micro Server.
The plan (as suggested by said friend), install two more 2TB hard disks in the
optical bay slot of the server and increase the RAID5. After all, the whole point
of the LVM is to allow the administrator to easily increase/modify disk space.
So I purchased two new hard drives, a power cable splitter,
an eSATA-to-SATA cable as well as a straight SATA cable (all 3Gb/s). Finally, I
purchased a Nexus twin hard drive mount. The Nexus is an awesome little bit of kit that
mounts two 3.5” hard drives on top of each other and pushes out vibration
dampeners to the side so that the drives mount into the 5.25” optical bay.
So now the fun part. I will quote the direct commands
here as this was all done at the CLI, so I can state that these all work in Ubuntu
12.04 LTS. Having said that, if you've stumbled upon this, read the whole bloody
thing first - for the sake of your own sanity. I will headline each actual task as
best I can in the order they're listed above.
Growing the Array
This should be simple – and it is. It must be done a
disk at a time but overall it’s very easy. I do not have hot-plug enabled (I
have no idea why I don’t, I just decided I didn't want it, so didn't enable it
in my BIOS). If you do have hot-plug enabled then there's no need to power down.
So I had to power down the server, physically connect one of the new
disks and power the server back up. All going well there should be a lovely new hard drive
in the system. You should be able to spot your drive
allocation by using:
sudo cat /proc/diskstats
If you don’t know, you should also find out the allocation
of your existing array:
sudo cat /proc/mdstat
So the result from mdstat is that my array is md0 (I knew
this already, but for the sake of anyone reading who doesn't, that's how it's done). Diskstats
told me that sde was my new disk.
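For reference, the mdstat output for a healthy four-disk array looks roughly like the below. This is illustrative only - your device names, block counts and chunk size will differ:
md0 : active raid5 sdd1[3] sdc1[2] sdb1[1] sda1[0]
      5860270080 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
The [UUUU] at the end is the bit to watch - one 'U' per healthy member, with an underscore replacing it if a disk drops out.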
First things first, let’s partition the new disk:
sudo fdisk /dev/sde
n – for a new partition
p – for a primary partition (followed by default boundaries)
t – to change partition type
1 – to select partition to change (you should only have one
partition in this example)
fd – as the partition type (Linux RAID auto detect)
p – to check that what you're about to write to the disk matches the
above
w – to write it to the disk!
Congratulations, you've now partitioned the disk. Let's add
it to the existing array:
sudo mdadm --add /dev/md0 /dev/sde1
So now your array begins to grow. It can be a long process;
an array the size of mine took around 24 to 30 hours (I can't tell you exactly,
as the speed fluctuates and I didn't hang around waiting for it). Later I'll
explain a couple of little tricks to speed this process up - some are common,
some are not.
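One note worth adding here, as a sketch only rather than gospel: if, after the --add, mdstat just shows the new disk sitting there marked as a spare (an '(S)' next to it) rather than a reshape starting, the grow has to be kicked off explicitly, and once the reshape finishes the extra space still has to be pushed up through LVM and the filesystem before you can actually use it:
sudo mdadm --grow /dev/md0 --raid-devices=5
sudo pvresize /dev/md0
sudo lvextend -l +100%FREE /dev/RAID5/Array
sudo resize2fs /dev/RAID5/Array
Here RAID5/Array is the Volume Group and Logical Volume naming used later in this post - substitute your own, and --raid-devices should match the new total number of disks.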
Now this is where my problems began: I didn't really read the
output of 'p' in fdisk and hadn't partitioned the disk properly. This wasn't a
major deal, I just now had sda1, sdb1, sdc1, sdd1 and sde. My OCD kicked in here - I wanted them all to be partitioned
the same, blah blah blah (it honestly doesn't matter, it really is just internal aesthetics, but oh well).
Removing a Device from the Array
Now I’m not actually shrinking my array here, I just wanted
to remove my disk and re-partition it before re-adding it to the array. Should be
simple, right?
It's slow, it's painful, and here is where it all came crashing
down. In saying that, the commands below are correct; it was just the idiot telling the computer what to do who should have double-checked his own work.
First I failed the disk out and removed it:
sudo mdadm --manage /dev/md0 --fail /dev/sde
sudo mdadm --manage /dev/md0 --remove /dev/sde
Now at this stage I rebooted, re-partitioned my disk (as previously done) and
re-added it:
sudo mdadm --manage /dev/md0 --add /dev/sde1
This was my undoing: the system re-allocated the device
names on reboot and I didn't check. I (somehow) partitioned an existing,
active RAID disk, destroying its superblock in the process. So now I had a
RAID5 with two failed disks. What do we know about RAID5, kids? Two failed disks
means an unrecoverable array - $#&*&!
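The lesson: after any reboot, confirm what the kernel thinks each device is before you touch it. Something along these lines would have saved me (treat it as a sketch, your device names will differ):
cat /proc/mdstat
sudo mdadm --detail /dev/md0
sudo mdadm --examine /dev/sde1
ls -l /dev/disk/by-id/
The --detail lists the active members of the array, --examine tells you whether a given partition already carries an mdadm superblock, and the by-id listing ties the kernel's sdX names back to the drives' serial numbers.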
After some googling and some luck (and through the lack of detail
here you can probably tell I’m not exactly sure how), I managed to get the RAID
to commence recovering. Recovery takes a lot longer than creating or growing -
it's a lot more work for the drives, as in this case the missing disk was
effectively from the middle of the array. Needless to say, the heads were
spinning feverishly.
The original estimate on the recovery was 4.2 days (around 6,000
minutes) and it came pretty close to this. When it finished I was disappointed to see it hadn't saved
everything; the damage was done and I had lost some data. Thankfully I keep a
back-up of my data, the most recent being a month earlier. By combining what I
could save from the array with my backup, I found I was missing
around 18GB out of 2.1TB of data.
This I can live with, as it is not mission-critical
stuff and is easily replaced. So now what to do? The array is repaired but clearly not in
a great state - I WANT A FRESH START!
Destroying a MDADM Array and LV
First things first, this is how to destroy a RAID5 array, so
be sure you want to do it – there is no coming back from this once it’s done!
In my case, my array is mounted on boot through fstab so
I had to disable it in fstab and reboot (I find it easier this way as there
aren't remnants hiding in a system process stopping the array from being shut down). So rebooting my server without fstab mounting the array and
binding my network share to it meant that I could deactivate everything.
First we need to deactivate the Logical Volume and Volume Group:
sudo lvdisplay – get the name of the Logical Volume (Array in this instance) and the Volume Group it belongs to (RAID5).
sudo lvchange -a n RAID5/Array – this deactivates the Logical Volume
sudo lvremove RAID5/Array – removes the Logical Volume
[Image: an example of lvdisplay output from a previous setup]
sudo vgdisplay – get the name of the Volume Group (RAID5 in
this instance).
sudo vgchange -a n RAID5 – this deactivates the Volume Group
sudo vgremove RAID5 – removes the Volume Group
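Strictly speaking you can also wipe the LVM label off the physical volume at this point, though the superblock zeroing and shred below will destroy it anyway:
sudo pvremove /dev/md0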
So now we need to stop the array:
sudo mdadm --stop /dev/md0
Zero the superblocks:
sudo mdadm --zero-superblock /dev/sda1
(repeat for all disks, one at a time, sda1 through sdx1)
Remove the array:
sudo mdadm --remove /dev/md0
Finally, I did this to make sure the partition table was
gone too:
sudo shred -n 1 /dev/sda (repeat for all disks, one at a time, sda through sdx)
It should be noted I'd let the above run for five minutes or
so and then cancel it (Ctrl-C). We only need to destroy the data at the front
of the disk so we can create our new array without mdadm complaining (it'll
destroy the rest of the data anyway, and even a single shred pass on a 2TB
disk will take a while, so there's no real need unless you want to be painfully security conscious).
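If you'd rather not babysit shred at all, a quicker equivalent (just my preference, not the only way) is to zero the first chunk of each disk directly:
sudo dd if=/dev/zero of=/dev/sda bs=1M count=100
Again, repeat for each disk that was in the array.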
Creating a RAID5 Software Array and LV
So now it's time to make our new array (thank Christ!). This
is not as slow as recovering, or even as slow as growing an existing array. All up, creating
the (now) 10TB array (9.3TB usable) took around six hours. Creating the LVM
over the top takes less than fifteen minutes.
So first, let’s partition our disks (properly this time –
take it from me, check your work):
sudo fdisk /dev/sda
n – for a new partition
p – for a primary partition (followed by default boundaries)
t – to change partition type
1 – to select partition to change (you should only have one partition in this example)
fd – as the partition type (Linux RAID auto detect)
p – to check that what you're about to write to the disk matches the above
w – to write it to the disk!
Complete the above for all disks to be used in the array, in
my case it was sda through sdf.
Now let’s create our new array:
sudo mdadm --create /dev/md0 --level=5 -n 6 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
This is the long part, as the system will now start building the
array. 'level' as referenced above refers to the RAID type, and '-n' is the number of devices (six hard disks). A spare drive can also be added to automate rebuilding when there's a failure, though this does not interest me (see the sketch just below).
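For the record, if a hot spare did interest you, it is declared at create time. A sketch only, holding the last disk back as the spare:
sudo mdadm --create /dev/md0 --level=5 -n 5 --spare-devices=1 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1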
Fast forward six hours and my array is complete (at this point I was
also quite drunk after six cans of Guinness). Let’s update the mdadm configuration
file and propagate our new array through the system before we do anything (kernel
included):
sudo mdadm --detail --scan
We need the UUID of our mdadm array (don’t confuse this with
the UUID obtained later) and a few other details that were just spat out at you
(see below for what you need). Right now though we need to edit the mdadm.conf file (it’s located at /etc/mdadm/mdadm.conf).
Below ‘#definitions of existing arrays’, add the following from the above
output (delete any existing entry here if you removed your old array too):
ARRAY /dev/md0 metadata=x.x name=xxxx:x UUID=xxx:xxx:xxx:xxx
Where the x’s are the data spat out from the above command
(you’re probably wise to copy and paste the output TBH).
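If you'd rather not hand-type it, something like the below appends the scan output straight into the file for you (just check it afterwards and remove any stale or duplicate ARRAY lines):
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf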
Propagate the changes through the system:
sudo update-initramfs -u
The above is especially important if you've removed a
previous array and are replacing it. Anyway, time to get our LVM on.
Let’s create the physical volume:
sudo pvcreate /dev/md0
Now the Volume Group:
sudo vgcreate RAID5 /dev/md0
RAID5 is the name I chose for my Volume Group, I kept it the
same as the one my friend helped me make originally so it was easier for me to
follow.
To create the Logical Volume we need to know the total size
of our Volume Group:
sudo vgdisplay RAID5 | grep "Total PE"
The result in my case was ‘2384496’. Now let’s make the LV:
sudo lvcreate -l 2384496 -n Array RAID5 /dev/md0
Where ‘Array’ is the name of my LV and ‘RAID5’ the name of
my previously created Volume Group.
Finally, we need a usable file system, in my case ext4:
sudo lvdisplay – get the full path of our LV
sudo mkfs.ext4 /dev/RAID5/Array
A short time later the process will complete and you’ll have
a software RAID5 with an ext4 Logical Volume.
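One optional tweak I'll mention as a sketch only: ext4 can be told about the RAID geometry at mkfs time, which can help throughput. The numbers below assume mdadm's default 512KB chunk, 4KB ext4 blocks and six disks (five of them data) - check your actual chunk size with mdadm --detail /dev/md0 and recalculate (stride = chunk size / block size, stripe-width = stride x data disks):
sudo mkfs.ext4 -E stride=128,stripe-width=640 /dev/RAID5/Array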
I added mine back to fstab so it would auto-mount. First you
need the UUID of the array:
sudo blkid
Here we're looking for the UUID of our new array,
/dev/mapper/RAID5-Array in my case (note the VG and LV names used there, that's how
you'll know you've got the right one).
Finally, in fstab itself add a line similar to the below
(you’ll have to be root to edit fstab BTW):
UUID=xxx-xxx-xxx-xxx-xxx /my/mount-point ext4 defaults 1 1
Where the UUID is the UUID from blkid, /my/mount-point is
where you are going to mount your new LVM (an example might be /opt/myarray),
ext4 is the file system in use and the rest you don’t need to worry about
unless you're an advanced user (in which case you're probably not reading this). The fields (UUID, mount point, ext4, options and so on) should be separated by tabs.
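Before rebooting, it's worth proving the entry works. Assuming the same example mount point as above:
sudo mkdir -p /opt/myarray
sudo mount -a
df -h /opt/myarray
If mount -a comes back silently and df shows the array's size, the fstab line is good.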
Reboot and you’re done.
Speeding up Growth, Recovery and Creation
If you've kept reading to this point, good for you! Here are some tips for getting better disk speeds when growing,
recovering or creating an array. Some of these can be altered while the process
is running; others, like bit-mapping, cannot and must be set up beforehand. There are some pros and cons and I'll touch on them a little where applicable.
To give some ballpark figures from my system (these will be
different based on the machine used):
Growing the array: 15-20 MB/s
Recovering the array: 5-7 MB/s, peaking briefly at times to 20MB/s
Creating the array: 100MB/s (this dropped to 60MB/s at the end)
To monitor the system's progress during any of the above:
cat /proc/mdstat
or
sudo mdadm --detail /dev/md0
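I like wrapping the first one in watch so it refreshes itself (the interval is just my preference):
watch -n 30 cat /proc/mdstat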
From here on in I did have some issues with 'sudo'. From
what I've been told it's a redirect issue: sudo elevates the echo, but the '>'
redirection is still performed by your unprivileged shell, so the commands below fail with a permission denied error. Logging in as root fixed this for me (sudo
su). Just something to keep in mind.
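The tidier workaround, if you'd rather not drop to a root shell, is to let tee do the writing instead of the shell redirect, for example:
echo 20000 | sudo tee /proc/sys/dev/raid/speed_limit_min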
Minimum and maximum disk speeds (can be changed during a
running process):
sudo echo 20000 > /proc/sys/dev/raid/speed_limit_min
sudo echo 200000 > /proc/sys/dev/raid/speed_limit_max
This sets the minimum speed to 20MB/s and the maximum to
200MB/s. From experience this doesn't mean the system will hold these speeds if
it can't, so adjust them according to what you're seeing from cat /proc/mdstat
(use some common sense basically).
Stripe buffer caching (can be changed during a running
process):
This is a bit like reading ahead; it accepts values from 16
to 32768. Note the value is a number of cache entries, not bytes - the memory used is
roughly the page size (4KB) multiplied by the number of disks and the value, so the maximum
works out to hundreds of MB on an array like mine. There were a lot of warnings about this when I was reading
about it: setting the value too high had caused swap storms and the
like - in short, too much memory (and/or virtual memory) ended up being used for the process.
I took into account the age of the posts (they were reasonably old) and monitored the usage closely. I set the stripe cache to the maximum of 32768 and saw RAM usage climb by around fifteen percent. I
have 8GB of RAM and a 3.7GB swap partition, so at that point I decided I was safe to leave it there. The default value is 256; obviously the reference to
md0 below relates directly to the fact that md0 is my array:
sudo echo 32768 > /sys/block/md0/md/stripe_cache_size
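To see the current value, or to put it back to the default afterwards:
cat /sys/block/md0/md/stripe_cache_size
echo 256 | sudo tee /sys/block/md0/md/stripe_cache_size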
Bit-mapping (can only be enabled prior to the process
commencing):
Now I have to admit I have no experience with this one, but
if it prevents you sitting there for 4.2 days then it's worth it. Bit-mapping reportedly improves recovery times after a crash as well as growth times; it does, however,
gradually degrade your array's performance if you do not disable it at the
completion of the process (it should also be noted this is only relevant for certain RAID configurations, of which RAID5 is one).
Enabled:
sudo mdadm --grow --bitmap=internal /dev/md0
Disabled:
sudo mdadm --grow --bitmap=none /dev/md0
Conclusion
Well that’s it folks. My system is now back up and running
as it was before I broke it (ya!) and I hope to never have to do this again.
Hopefully this saves someone from having as many browser windows open as I did
when trying to learn all this stuff.
Here’s a picture of my hacked up server with its two 3.5”
drives secreted in the optical bay slot – enjoy!