Ubuntu, MDADM, RAID5, LVM and a Few Things In-Between

Ok, so I've been a bit slack while on leave and let Scotty down by not posting up my speedometer. I swear I will; today, however, is more about server administration than an electronics project.

The reason I am writing about this is that it took me a while and a lot of reading to get everything I needed to grow, repair, destroy and create (in that order, oddly enough) a RAID5 software array with an overlaying Logical Volume.

A while back I purchased an HP micro server and, with the assistance of a mate, got it up and running with Ubuntu 12.04 LTS. The OS runs off a USB key inside the server, while the RAID5 array sits in the drive bays. I needed the help originally as I had no real idea about LVM (or, for that matter, any real working knowledge of RAID). To begin with I had four 2TB hard disks, which gave me about 5.4TB of usable space on the array.

As with all things, I 'had' to tinker with my Micro Server. The plan (as suggested by said friend) was to install two more 2TB hard disks in the optical bay slot of the server and grow the RAID5. After all, the whole point of LVM is to allow the administrator to easily increase/modify disk space.

So I purchased two new hard drives, a power cable splitter, an eSATA to SATA cable as well as a straight SATA cable (all 3Gb/s). Finally, I purchased a Nexus twin hard drive mount. The Nexus is an awesome little bit of kit that mounts two 3.5” hard drives on top of each other and pushes out vibration dampeners to the side so that the drives mount into the 5.25” optical bay.



So now the fun part. I will quote the actual commands here, as this was all done at the CLI, so I can state that they all work in Ubuntu 12.04 LTS. Having said that, if you've stumbled upon this, read the whole bloody thing first, for the sake of your own sanity. I will headline each task as best I can, in the order they're listed above.

Growing the Array

This should be simple – and it is. It must be done a disk at a time but overall it’s very easy. I do not have hot-plug enabled (I have no idea why I don’t, I just decided I didn't want it, so didn't enable it in my BIOS). If you do have hot-plug then no need to power down for you.

So I had to power down the server, physically connect one of the new disks and power the server back up. All going well there should be a lovely new hard drive in the system. You should be able to spot your drive allocation by using:
sudo cat /proc/diskstats

If you don’t know, you should also find out the allocation of your existing array:
sudo cat /proc/mdstat

So the result for mdstat is that my array is md0 (I knew this already, but I mention it for the sake of anyone reading who doesn't know how it's done). Diskstats told me that sde was my new disk.
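Since everything later in this post hinges on knowing exactly which disks belong to the array, here is a minimal sketch that pulls the member list out of mdstat. The function name md_members is made up for illustration, and it assumes the classic sdX naming seen on my box. It is also worth a look at ls -l /dev/disk/by-id/, which lists the same disks under names built from their model and serial number.

```shell
# List the member devices of one md array, one per line, sorted.
# $1 = array name (e.g. md0)
# $2 = mdstat-format file (defaults to /proc/mdstat)
md_members() {
    awk -v md="$1" '$1 == md && $2 == ":" {
        for (i = 3; i <= NF; i++)
            if ($i ~ /\[[0-9]+\]/) { sub(/\[.*/, "", $i); print $i }
    }' "${2:-/proc/mdstat}" | sort
}
# Usage: md_members md0
```

Any disk that shows up in /proc/diskstats but not in this list is a candidate for being the new one.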

First things first, let’s partition the new disk:
sudo fdisk /dev/sde

n – for a new partition
p – for a primary partition (followed by default boundaries)
t – to change partition type
1 – to select partition to change (you should only have one partition in this example)
fd – as the partition type (Linux RAID auto detect)
p – to check that what you're about to write to the disk matches the above
w – to write it to the disk!

Congratulations, you've now partitioned the disk. Let’s add it to the existing array:
sudo mdadm --add /dev/md0 /dev/sde1

On its own, --add only attaches the disk as a spare. The reshape starts once you tell mdadm the new number of devices (five, for my first new disk):
sudo mdadm --grow /dev/md0 --raid-devices=5

So now your array begins to grow. It can be a long process: an array the size of mine took around 24 to 30 hours (I can’t tell you exactly, as the speed fluctuates and I didn't hang around waiting for it). Later I’ll explain a couple of little tricks to speed this process up; some are common, some are not.
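This post doesn't cover it, but once a reshape has finished, the new space still has to be passed up through LVM and the file system before it is usable. A minimal sketch of that sequence, assuming the VG and LV names used later in this post (RAID5 and Array) and an ext4 file system. The function name grow_plan is made up, and it only prints the commands so you can check them before running anything:

```shell
# Print (do not run) the commands that hand new md capacity up to the file system.
# $1 = md device, $2 = volume group, $3 = logical volume (defaults are this post's names)
grow_plan() {
    md="${1:-/dev/md0}"; vg="${2:-RAID5}"; lv="${3:-Array}"
    echo "sudo pvresize $md"                          # let LVM see the bigger md device
    echo "sudo lvextend -l +100%FREE /dev/$vg/$lv"    # give all free extents to the LV
    echo "sudo resize2fs /dev/$vg/$lv"                # grow ext4 to fill the LV
}
# Usage: grow_plan /dev/md0 RAID5 Array
```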

Now this is where my problems began: I didn't really read the output of ‘p’ in fdisk and hadn't partitioned the disk properly. This wasn't a major deal; I just now had sda1, sdb1, sdc1, sdd1 and sde. My OCD kicked in here. I want them all to be partitioned the same blah blah blah (it honestly doesn't matter, it really is just internal aesthetics, but oh well).

Removing a Device from the Array

Now I’m not actually shrinking my array here, I just wanted to remove my disk and re-partition it before re-adding it to the array. Should be simple right? 

It’s slow, it’s painful and here is where it all came crashing down. In saying that, the commands below are correct; it was just the idiot telling the computer what to do that should have double checked his own work.

First I failed the disk out and removed it:
sudo mdadm --manage /dev/md0 --fail /dev/sde

sudo mdadm --manage /dev/md0 --remove /dev/sde

Now at this stage I rebooted, re-partitioned my disk (as previously done) and re-added it:
sudo mdadm --manage /dev/md0 --add /dev/sde1

This was my undoing as the system re-allocated the disk names on reboot and I didn't check – I (somehow) partitioned an existing, active RAID disk, destroying the superblock in the process. So now I have a RAID5 with two failed disks. What do we know about RAID5 kids – two failed disks means an unrecoverable array - $#&*&!
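The check I skipped can be sketched as a small guard that refuses to go near a disk that is still listed in mdstat. The function name disk_is_free is made up for illustration, and it assumes sdX style names (it will not understand nvme style names):

```shell
# Succeeds (exit 0) only if neither the disk nor any of its partitions
# appears as an array member in mdstat.
# $1 = disk name without /dev/ (e.g. sde)
# $2 = mdstat-format file (defaults to /proc/mdstat)
disk_is_free() {
    ! grep -Eq "(^|[[:space:]])${1}[0-9]*\[[0-9]+\]" "${2:-/proc/mdstat}"
}
# Usage: disk_is_free sde && sudo fdisk /dev/sde
```

Run it again after every reboot, because the sdX letters are handed out at boot and can move around.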

After some googling and some luck (and through the lack of detail here you can probably tell I’m not exactly sure how), I managed to get the RAID to commence recovering. Recovery takes a lot longer than creating or growing, as it’s a lot more work for the drives. In this case the missing disk was effectively from the middle of the array, so needless to say the heads were spinning feverishly.

The original estimate on the recovery was 4.2 days (7000 odd minutes) and it came pretty close to this. When it finished I was disappointed to see it hadn't saved everything; the damage was done and I had lost some data. Thankfully I keep a back-up of my data, the most recent being a month earlier. By combining what I could save from the array and comparing it to my backup, I found I was missing around 18GB out of 2.1TB of data.

This I can live with as it is not mission critical stuff and easily replaced. So now what to do, the array is repaired but clearly not in a great state – I WANT A FRESH START!

Destroying a MDADM Array and LV

First things first: this is how to destroy a RAID5 array, so be sure you want to do it. There is no coming back from this once it’s done!

In my case, my array is mounted on boot through fstab so I had to disable it in fstab and reboot (I find it easier this way as there aren't remnants hiding in a system process stopping the array from being shutdown). So rebooting my server without fstab mounting the array and binding my network share to it meant that I could deactivate everything.

First we need to deactivate the Logical Volume and Volume Group:
sudo lvdisplay – get the name of the Logical Volume (Array in this instance, inside the RAID5 Volume Group).
sudo lvchange -a n RAID5/Array – this deactivates the Logical Volume
sudo lvremove RAID5/Array – removes the Logical Volume

[Screenshot: example lvdisplay output from a previous setup]

sudo vgdisplay – get the name of the Volume Group (RAID5 in this instance).
sudo vgchange -a n RAID5 – this deactivates the Volume Group
sudo vgremove RAID5 – removes the Volume Group

So now we need to stop the array:
sudo mdadm --stop /dev/md0

Zero the superblocks:
sudo mdadm --zero-superblock /dev/sda1
(repeat for all disks, one at a time, sda1 through sdX1)

Remove the array:
sudo mdadm --remove /dev/md0

Finally, I did this to make sure the partition table was gone too:
sudo shred -n 1 /dev/sda  (repeat for all disks, one at a time, sda through sdX)

It should be noted I’d let the above run for five minutes or so and then cancel it (Ctrl-C). We only need to destroy the data at the front of the disk so we can create our new array without mdadm complaining. Creating the array will destroy the rest of the data anyway, and shredding even a single pass on a 2TB disk takes a while, so there’s no real need unless you want to be painfully security conscious.
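If cancelling shred by hand feels a bit loose, the same idea can be sketched with dd and a fixed size. This is a swapped-in alternative rather than what I actually ran. The function name zero_head is made up, and it assumes the superblocks have already been zeroed as above and that clearing the first 100MiB is plenty to take out the partition table:

```shell
# Overwrite the first N MiB of a target with zeros and leave the rest alone.
# $1 = target (a whole disk such as /dev/sdX, or a plain file for a dry run)
# $2 = number of MiB to zero (defaults to 100)
# For a real disk this needs a root shell.
zero_head() {
    dd if=/dev/zero of="$1" bs=1048576 count="${2:-100}" conv=notrunc 2>/dev/null
}
# Usage (as root): zero_head /dev/sdX 100
```

Triple check the device name before pressing enter; dd will not ask twice.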

Creating a RAID5 Software Array and LV

So now it’s time to make our new array (thank Christ!). This is not as slow as recovering or even growing an existing array. All up, creating the (now) 10TB array (about 9.1TiB usable as the tools report it) took around six hours. Creating the LVM over the top takes less than fifteen minutes.

So first, let’s partition our disks (properly this time – take it from me, check your work):
sudo fdisk /dev/sda

n – for a new partition
p – for a primary partition (followed by default boundaries)
t – to change partition type
1 – to select partition to change (you should only have one partition in this example)
fd – as the partition type (Linux RAID auto detect)
p – to check that what you're about to write to the disk matches the above
w – to write it to the disk!

Complete the above for all disks to be used in the array, in my case it was sda through sdf.

Now let’s create our new array:
sudo mdadm --create /dev/md0 --level=5 -n 6 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1

This is the long part, as now the system will start creating the array. Level, as referenced above, refers to the RAID type and ‘-n’ is the number of devices (six hard disks). A spare drive can also be added to assist in automating things when there's a failure, though this does not interest me.
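For a rough idea of what the array will give you: RAID5 spends one disk's worth of space on parity, so usable space is (number of disks minus one) times the disk size. The sketch below does that sum and converts it to the binary units most tools report in. The function name raid5_usable_tib is made up, and it assumes disks are sold in decimal terabytes (1TB = 10^12 bytes):

```shell
# Usable RAID5 capacity in TiB.
# $1 = number of disks, $2 = size of each disk in (decimal) TB
raid5_usable_tib() {
    awk -v n="$1" -v tb="$2" 'BEGIN { printf "%.2f\n", (n - 1) * tb * 1e12 / 2^40 }'
}
# Usage: raid5_usable_tib 6 2    (six 2TB disks)
```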

Fast forward six hours and my array is complete (at this point I was also quite drunk after six cans of Guinness). Let’s update the mdadm configuration file and propagate our new array through the system before we do anything (kernel included):
sudo mdadm --detail --scan

We need the UUID of our mdadm array (don’t confuse this with the UUID obtained later) and a few other details that were just spat out at you (see below for what you need). Right now though we need to edit the mdadm.conf file (it’s located at /etc/mdadm/mdadm.conf). Below ‘#definitions of existing arrays’, add the following from the above output (delete any existing entry here if you removed your old array too):

ARRAY /dev/md0 metadata=x.x name=xxxx:x UUID=xxx:xxx:xxx:xxx

Where the x’s are the data spat out from the above command (you’re probably wise to copy and paste the output TBH).
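If you would rather not hand-edit the file, the replace-the-old-entry step can be sketched as below. The function name update_array_entry is made up, and it assumes the old entry names the array exactly as /dev/md0 in its second field. Take a copy of mdadm.conf first.

```shell
# Replace any existing ARRAY entry for one md device with a new line.
# $1 = config file, $2 = md device (e.g. /dev/md0), $3 = the new ARRAY line
# For the real /etc/mdadm/mdadm.conf this needs a root shell.
update_array_entry() {
    tmp=$(mktemp) || return 1
    # keep every line except existing ARRAY entries for this device
    awk -v dev="$2" '!($1 == "ARRAY" && $2 == dev)' "$1" > "$tmp"
    printf '%s\n' "$3" >> "$tmp"
    cat "$tmp" > "$1"    # write back in place so the file keeps its owner and mode
    rm -f "$tmp"
}
# Usage (as root):
#   update_array_entry /etc/mdadm/mdadm.conf /dev/md0 "$(mdadm --detail --scan)"
```

The usage line only makes sense with a single array, since --detail --scan prints one line per array.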

Propagate the changes through the system:
sudo update-initramfs -u

The above is especially important if you've removed a previous array and are replacing it. Anyway time to get our LVM on.

Let’s create the physical volume:
sudo pvcreate /dev/md0
   
Now the Volume Group:
sudo vgcreate RAID5 /dev/md0

RAID5 is the name I chose for my Volume Group, I kept it the same as the one my friend helped me make originally so it was easier for me to follow.

To create the Logical Volume we need to know the total size of our Volume Group:
sudo vgdisplay RAID5 | grep "Total PE"

The result in my case was ‘2384496’. Now let’s make the LV:
sudo lvcreate -l 2384496 -n Array RAID5 /dev/md0

Where ‘Array’ is the name of my LV and ‘RAID5’ the name of my previously created Volume Group.
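To save copying the number by hand, the Total PE figure can be pulled straight out of the vgdisplay output. A minimal sketch (the function name total_pe is made up), assuming the "Total PE" line looks the way it does on my system:

```shell
# Read vgdisplay output on stdin and print the Total PE count.
total_pe() {
    awk '$1 == "Total" && $2 == "PE" { print $3 }'
}
# Usage: sudo vgdisplay RAID5 | total_pe
# Which turns the create step into:
#   sudo lvcreate -l "$(sudo vgdisplay RAID5 | total_pe)" -n Array RAID5
```

lvcreate also accepts -l 100%FREE, which asks for every free extent in the Volume Group without you needing the number at all.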

Finally, we need a usable file system, in my case ext4:
sudo lvdisplay – get the full path of our LV

sudo mkfs.ext4 /dev/RAID5/Array

A short time later the process will complete and you’ll have a software RAID5 with an ext4 Logical Volume.

I added mine back to fstab so it would auto-mount. First you need the UUID of the new file system:
sudo blkid

Here we’re looking for the UUID of our new Logical Volume, /dev/mapper/RAID5-Array in my case (note the VG and LV names used here; that’s how you’ll know you've got the right one).



Finally, in fstab itself add a line similar to the below (you’ll have to be root to edit fstab BTW):
UUID=xxx-xxx-xxx-xxx-xxx /my/mount-point ext4 defaults  0    2

Where the UUID is the UUID from blkid, /my/mount-point is where you are going to mount your new LVM (an example might be /opt/myarray), ext4 is the file system in use and the rest you don’t need to worry about unless you’re an advanced user (in which case you’re probably not reading this). The fields just need to be separated by tabs or spaces.
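The blkid-to-fstab step is easy to fat-finger, so here is a minimal sketch that builds the line for you. The function name fstab_line is made up. It hard-codes ext4 and defaults, and it ends the line with 0 and 2 (no dump, fsck after the root file system), which is my assumption about sensible values rather than anything this setup demands.

```shell
# Read blkid output on stdin and print an fstab line for one device.
# $1 = device exactly as blkid prints it, $2 = mount point
fstab_line() {
    awk -v dev="$1:" -v mnt="$2" '$1 == dev {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^UUID="/) {
                gsub(/^UUID="|"$/, "", $i)    # strip the UUID="..." wrapper
                printf "UUID=%s\t%s\text4\tdefaults\t0\t2\n", $i, mnt
            }
    }'
}
# Usage: sudo blkid | fstab_line /dev/mapper/RAID5-Array /opt/myarray
```

Check the printed line by eye before pasting it into /etc/fstab.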

Reboot and you’re done.

Speeding up Growth, Recovery and Creation

If you've kept reading to this point, good for you! Here are some tips for getting better disk speeds when growing, recovering or creating an array. Some of these can be altered while the process is running; however, things like bit-mapping cannot and must be done beforehand. There are some pros and cons and I’ll touch on them a little where applicable.

To give some ballpark figures from my system (these will be different based on the machine used):

Growing the array, 15 – 20 MB/s
Recovering the array, 5 – 7 MB/s peaking briefly at times to 20MB/s
Creating the array, 100MB/s (this dropped to 60MB/s at the end)

To monitor the system's progress during any of the above:
cat /proc/mdstat
or
sudo mdadm --detail /dev/md0
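If all you want is the one line that matters, the progress figures can be picked out of mdstat with a bit of awk. A minimal sketch (the function name md_progress is made up), assuming the progress line carries the usual finish= and speed= fields:

```shell
# Print "<operation> <percent> <finish=...> <speed=...>" for every array that
# is currently resyncing, recovering or reshaping.
# $1 = mdstat-format file (defaults to /proc/mdstat)
md_progress() {
    awk '/finish=/ {
        op = $2; pct = ""; fin = ""; spd = ""
        if (match($0, /[0-9]+\.[0-9]+%/)) pct = substr($0, RSTART, RLENGTH)
        if (match($0, /finish=[^ ]+/))    fin = substr($0, RSTART, RLENGTH)
        if (match($0, /speed=[^ ]+/))     spd = substr($0, RSTART, RLENGTH)
        print op, pct, fin, spd
    }' "${1:-/proc/mdstat}"
}
# Usage: md_progress
```

It prints nothing at all when no array is busy.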

From here on in I did have some issues with ‘sudo’: written the obvious way (sudo echo 20000 > file), these commands failed with a permission denied error. It is a redirect issue. Only echo runs as root, while the ‘>’ redirect is performed by your own unprivileged shell. Piping into ‘sudo tee’ as below avoids this, as does logging in as root first (sudo su). Just something to keep in mind.

Minimum and maximum disk speeds (can be changed during a running process):
echo 20000 | sudo tee /proc/sys/dev/raid/speed_limit_min

echo 200000 | sudo tee /proc/sys/dev/raid/speed_limit_max

This sets the minimum speed to 20MB/s and the maximum to 200MB/s (the values are in KB/s). From experience this doesn't mean the system will hold these speeds if it can’t, so adjust them according to what you’re seeing from cat /proc/mdstat (use some common sense basically).
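Both limits can be wrapped up in one small helper. A minimal sketch: the function name set_raid_speed and the SUDO variable are made up for illustration, and the directory is a parameter purely so you can try it against a scratch directory first.

```shell
# Write new md sync speed limits (in KB/s).
# $1 = minimum, $2 = maximum
# $3 = directory holding the two limit files (defaults to /proc/sys/dev/raid)
# SUDO defaults to "sudo"; set SUDO="" when already in a root shell.
set_raid_speed() {
    dir="${3:-/proc/sys/dev/raid}"
    echo "$1" | ${SUDO-sudo} tee "$dir/speed_limit_min" > /dev/null &&
    echo "$2" | ${SUDO-sudo} tee "$dir/speed_limit_max" > /dev/null
}
# Usage: set_raid_speed 20000 200000
```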


Stripe buffer caching (can be changed during a running process):
This is a bit like reading ahead. It accepts values from 16 to 32768, and the value is a number of cache entries rather than a size in KB. Each entry holds one memory page per disk, so the memory used is roughly page size × number of disks × the value (about 768MB for 32768 on a six-disk array with 4KB pages). There were a lot of warnings about this when I was reading about it; basically, setting the value too high had caused swap storms and the like, because too much memory (and/or virtual memory) had been used for the process.

I took into account the age of the posts (they were reasonably old) and monitored the usage closely. I set the stripe cache to the maximum of 32768 and noticed an increase around the fifteen percent mark in RAM usage. I have 8GB of RAM and a 3.7GB swap partition, so at this point I decided I was safe to leave it there. The default value is 256. The reference to md0 below is there because md0 is my array:
echo 32768 | sudo tee /sys/block/md0/md/stripe_cache_size
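Before picking a value it is worth doing the sum on what it will cost in RAM. The stripe cache holds one memory page per disk for every entry, so a minimal sketch of the arithmetic looks like this (the function name stripe_cache_mib is made up, and the 4096 byte page size is an assumption that holds on ordinary x86 machines):

```shell
# Approximate RAM used by the md stripe cache, in MiB.
# $1 = stripe_cache_size value, $2 = number of disks in the array
# $3 = page size in bytes (defaults to 4096)
stripe_cache_mib() {
    echo $(( $1 * $2 * ${3:-4096} / 1048576 ))
}
# Usage: stripe_cache_mib 32768 6
```

On a six-disk array the default of 256 comes to about 6MiB and the maximum of 32768 to about 768MiB, so check the result against your free memory before going large.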

Bit-mapping (can only be enabled prior to the process commencing):
Now I have to admit I have no experience with this one, but if it prevents you sitting there for 4.2 days then it’s worth it. Bit-mapping reportedly improves recovery times after a crash as well as growth times. It does, however, gradually degrade your array's performance if you do not disable it at the completion of the process (it should also be noted this is only relevant for certain RAID configurations, of which RAID5 is one).

Enabled:
sudo mdadm --grow --bitmap=internal /dev/md0

Disabled:
sudo mdadm --grow --bitmap=none /dev/md0
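Since I have no hands-on experience with bitmaps, treat this as a sketch too: a quick way to confirm whether one is currently switched on, assuming mdstat shows a "bitmap:" line under any array that has one. The function name has_bitmap is made up.

```shell
# Succeeds (exit 0) if the named array shows a "bitmap:" line in mdstat.
# $1 = array name (e.g. md0)
# $2 = mdstat-format file (defaults to /proc/mdstat)
has_bitmap() {
    awk -v md="$1" '
        $2 == ":" { inblk = ($1 == md) }          # a new "mdX : ..." block starts
        inblk && $1 == "bitmap:" { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mdstat}"
}
# Usage: has_bitmap md0 && echo "bitmap is on, remember to turn it off"
```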

Conclusion

Well that’s it folks. My system is now back up and running as it was before I broke it (ya!) and I hope to never have to do this again. Hopefully this saves someone from having as many browser windows open as I did when trying to learn all this stuff.

Here’s a picture of my hacked up server with its two 3.5” drives secreted in the optical bay slot – enjoy!

