Wednesday, May 11, 2011

notes from a dirty system installation

Normal system installations involve boot media such as a CD-ROM, a USB stick, or even a floppy. In our case it's PXE boot (netboot), which is a little more involved to set up initially because you need a network plan with DNS, DHCP, TFTP and probably HTTP, but it is definitely worth the effort if you have to manage a couple of hundred systems. Virtual machines have brought some new ways to do installations, and those are very easy, as you only need to provide the disk or CD-ROM images as files on the host system.
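For reference, the netboot side can be sketched with dnsmasq, which bundles DHCP and TFTP in one daemon. This is just a minimal illustration; the addresses and paths are made up, not the ones from our network:

```
# /etc/dnsmasq.conf -- minimal PXE sketch (example addresses and paths)
dhcp-range=192.168.1.100,192.168.1.200,12h
dhcp-boot=pxelinux.0
enable-tftp
tftp-root=/srv/tftp
```

Clients then DHCP an address, fetch pxelinux.0 over TFTP, and pull the rest (kernel, initrd, installer) from wherever the pxelinux config points, typically an HTTP server.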
But Sven asked me to transfer a system from a virtual machine to a physical box, for some reason that I won't mention here now. He would provide me with the (small) disk image of the virtual machine, and the physical box was something old that had already seen some use and was hooked up with the network.
I quickly realised that this was going to be an interesting exercise. I would need to write the image to disk, which meant that I had to boot into a ramdisk of sorts. The first problem that presented itself was that the box would only do PXE boot, and as the network was not under my control I would have to involve other system administrators.
The box had a previous installation on it (BackTrack, which is Debian based), and I figured I might as well try to do everything from within that installation.
After adding my ssh key to /root/.ssh/authorized_keys (and turning off the firewall) I could get out of the noise of the machine room and work from the peaceful quiet of my office. By inspecting /proc/partitions I found out the machine had 2 disks, and Sven agreed that we should set up a (software) RAID1 mirror set.
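The disk inventory came straight from /proc/partitions. A quick way to pick out the whole disks (as opposed to their partitions) is to keep only the names that don't end in a digit; here's a sketch run against a saved sample (the block counts are invented, the device names match the box):

```shell
# Hypothetical /proc/partitions snapshot from a two-disk box
cat > /tmp/partitions.sample <<'EOF'
major minor  #blocks  name

   3     0  120060864 hda
   3    64  125034840 hdb
   3    65  125033827 hdb1
EOF
# Print whole-disk names: numeric rows only, name not ending in a digit
awk '$1 ~ /^[0-9]+$/ && $4 !~ /[0-9]$/ {print $4}' /tmp/partitions.sample
```

On the real box you would of course point awk at /proc/partitions itself.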

Now the system was running from /dev/hda1, and I couldn't mess with that disk live. (You should try this some day if you feel in a particularly evil mood: run dd if=/dev/zero of=/dev/sda in the background while you continue to work. Observe how the system develops amnesia, dementia and finally something close to mad cow disease.) I decided to do something dirty: I would create a RAID1 set with just one disk. The mdadm program thinks this is a bad idea, so you have to --force the issue. The command line was mdadm --create --level=1 -n 1 --force /dev/md0 /dev/hdb1 or something. Of course I first repartitioned /dev/hdb to have just a single partition of type fd (Linux raid autodetect). Next step: losetup -f vmdisk.img to treat the disk image as a block device, and kpartx -a /dev/loop0 to get mapped devices for each of the partitions inside it. Now a simple dd if=/dev/mapper/loop0p1 of=/dev/md0 was all I needed to write the image.
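Put together, the sequence looked roughly like this. Every one of these commands overwrites disk state, so this sketch wraps them in a run() function that only echoes; drop the wrapper (or make it execute "$@") to do it for real. Device names are the ones from this box, and the mdadm spelling here is the canonical one, not necessarily my exact history:

```shell
#!/bin/sh
# Dry-run sketch of the degraded-RAID1 image transfer. run() only echoes;
# these commands are destructive when executed.
run() { echo "+ $*"; }

run mdadm --create /dev/md0 --level=1 --raid-devices=1 --force /dev/hdb1
run losetup -f vmdisk.img          # expose the VM image as /dev/loop0
run kpartx -a /dev/loop0           # map its partitions as /dev/mapper/loop0pN
run dd if=/dev/mapper/loop0p1 of=/dev/md0 bs=1M
```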
The next step was to boot into the newly written system (a Debian 5). This involved some grubbery: after mount /dev/md0 /mnt and chroot /mnt I could navigate the system as if it were already there. I had to edit /boot/grub/menu.lst to set the root device to (hd1,0) and root=/dev/md0, and I had to install grub on the first bootable disk, which was (hd0,0). After cloning /etc/network/interfaces from the present system and setting up ssh keys again to ensure access, I rebooted with fingers crossed.
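The relevant menu.lst stanza ended up looking something like this. The kernel and initrd filenames are from memory (Debian 5 shipped a 2.6.26 kernel) and may well be off; the root lines are the point:

```
# /boot/grub/menu.lst excerpt (legacy GRUB; kernel paths are assumptions)
title  Debian GNU/Linux 5 (cloned)
root   (hd1,0)
kernel /boot/vmlinuz-2.6.26-2-686 root=/dev/md0 ro
initrd /boot/initrd.img-2.6.26-2-686
```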
Call it luck, but it worked. I was now running the cloned system from a RAID1 root device with only one disk. I did a resize2fs /dev/md0 because the image I had originally written to it was tiny compared to the disk. Now it was /dev/hda's turn to be added to the RAID set. After repartitioning it to match its counterpart (the two disks weren't the same size), I added it with mdadm --manage --add /dev/md0 /dev/hda1, which unfortunately didn't work as expected: the new addition just became a spare.
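This step can be sketched like so, again with an echo-only run() wrapper since both commands rewrite on-disk state. The sfdisk pipe is one way to replicate a partition layout, safe only when the target disk is at least as large as the source; I'm showing the canonical mdadm form here:

```shell
#!/bin/sh
# Dry-run sketch (echo only); both steps rewrite disk state when run for real.
run() { echo "+ $*"; }

# Dump hdb's partition table onto hda -- assumes hda is not the smaller disk
run 'sfdisk -d /dev/hdb | sfdisk /dev/hda'
run mdadm --manage /dev/md0 --add /dev/hda1
```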

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 hda1[1](S) hdb1[0]
      120053632 blocks [1/1] [U]

Notice the (S), which indicates that hda1 is a spare. It won't be used until another disk fails, but as this set unfortunately has only a single active disk, a single disk failure means game over.
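If you want to check for that state mechanically, grepping for the (S) marker in /proc/mdstat does the job; here's a sketch run against a saved copy of the output above:

```shell
# Saved copy of the mdstat output shown above
cat > /tmp/mdstat.sample <<'EOF'
Personalities : [raid1]
md0 : active raid1 hda1[1](S) hdb1[0]
      120053632 blocks [1/1] [U]
EOF
# Any member tagged (S) is a spare, not part of the live mirror
grep -o '[a-z0-9]*\[[0-9]*\](S)' /tmp/mdstat.sample
```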
The final command to activate the spare was mdadm --grow --raid-devices=2 /dev/md0. This enlarges the RAID set to two devices, and the spare gets activated. Indeed, the system started to recover immediately:

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 hda1[2] hdb1[0]
      120053632 blocks [2/1] [U_]
      [>....................] recovery = 0.0% (84608/120053632) finish=70.8min speed=28202K/sec


It took a while, but eventually I had a mirrored two-disk RAID set!
