Thursday, November 1, 2007

LVM bad hair day

Logical Volume Management (LVM) for Linux is nice, but the snapshot feature kills your disk performance. That was my conclusion after messing about with snapshots as a cheap-n-easy chroot solution, a la the Debian schroot package.

I set up two snapshots and started writing a 5 GB file to each. Six hours later (!) the job was done; in the meantime, the box was as good as dead.

This spurred me on to investigate LVM snapshot performance, and boy, is it bad!
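
For the curious, the measurement is easy to script. The sketch below (Python, run as root) is a variation on the experiment: it times a 5 GB direct write to a logical volume before and after attaching a snapshot, so you can see the copy-on-write penalty for yourself. The volume group vg0, the volume data and the mount point /mnt/data are made-up names; adjust them to your own setup, and don't try this on a box you care about.

    #!/usr/bin/env python3
    # Rough benchmark: how much does an active LVM snapshot slow down writes
    # to the origin volume? Needs root; the names below are hypothetical.
    import subprocess
    import time

    VG, LV = "vg0", "data"             # volume group and logical volume (made up)
    TESTFILE = "/mnt/data/bigfile"     # the LV is assumed to be mounted here

    def timed_write(gigabytes: int = 5) -> float:
        """Write a large file with O_DIRECT and return the elapsed seconds."""
        start = time.monotonic()
        subprocess.run(
            ["dd", "if=/dev/zero", f"of={TESTFILE}", "bs=1M",
             f"count={gigabytes * 1024}", "oflag=direct"],
            check=True,
        )
        return time.monotonic() - start

    print(f"plain LV:      {timed_write():6.0f} s")

    # Attach a snapshot large enough to absorb the whole write. Every block we
    # now overwrite must first be copied into the snapshot's copy-on-write
    # area, which is where the extra seeks and writes come from.
    subprocess.run(["lvcreate", "--snapshot", "--size", "6G",
                    "--name", "snap0", f"/dev/{VG}/{LV}"], check=True)
    try:
        print(f"with snapshot: {timed_write():6.0f} s")
    finally:
        subprocess.run(["lvremove", "-f", f"/dev/{VG}/snap0"], check=True)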

Thursday, September 20, 2007

Windows versus Mac OS

I just set up a virtual machine with Windows XP on my Apple MacBook. I've been using Linux since the mid-90s (and more recently, Mac OS X), and although I have used Windows at work in the past, I never actually had to go through the chores of installing and managing a Windows box myself.
The installation was a snap, but the trouble began when Windows booted properly for the first time. That is roughly the point where so many ordinary consumers find themselves when they take their shiny new computer out of the box, plug it in, and turn it on.
I was immediately warned by a yellow balloon in the lower right corner that my system might be at risk. The system insisted that I turn on automatic updates and install some anti-virus software. A colleague pointed me to AVG, which was helpful, but before I could settle down to do what I actually needed Windows for (visiting a web site with Internet Explorer), I spent several hours boiler-plating my Windows virtual machine, all the while feeling like hunted prey, as if malware and viruses could strike at any moment until everything was completely locked down.

So this led to the following epiphany:

The first thing you do when you use a new Mac is get to work.
The first thing you do when you use Windows is get to worry.

Friday, August 17, 2007

Grid software maturity

Wow, it's been two months since my last post to this blog. Maybe this blogging thing is not for me. Anyway, blame it on the vacations.

Just a short post in the category of random thoughts. I'm working in grid computing, a software field that is trying to take cluster computing to a higher level. It dawned on me that grid software is rapidly evolving and that it's hard to tell end users what they should use now, because that may be the wrong advice in six months.

Then I figured that the evolution process was really a process of maturing. So in a sense, grid software is in puberty. It makes complete sense:
  • it is going through rapid, sudden changes;
  • some things are clearly getting better, but
  • some other things are (temporarily) getting much worse;
  • it is getting bigger, growing more features; and
  • it's getting hairy (sorry, couldn't resist that last one).
Douse richly with a sauce of anxiety, unease, insecurity and rebelliousness, and you get the typical adolescents: misunderstood, obnoxious little monsters that our beautiful little children have to turn into before they suddenly become sensible and respectable grown-ups.

So it's a phase that grid software has to pass through if it is going to come of age. But, as with children in puberty, you feel somewhat embarrassed introducing it to your friends and relatives.

Thursday, June 14, 2007

To err is human, to really foul up you need a computer

Humans are the weakest link in any automated system, whether it is the humans at the end of the chain, sometimes derogatorily referred to as users, or the humans at the other end who did the best they could to design and build the system. Incidentally, that's why I believe the future envisioned in The Matrix could become reality: when the machines take over, they will take humans out of the equation. Only they won't keep us around as organic batteries; we're not even good for that (biofuel, maybe). But let's get back on topic.
In modern (semi-)automated systems, it's funny to see how little direct inter-human communication remains. Many of the emails and paper letters I receive are computer-generated, and most of my daily chores require little or no human interaction. To my amusement (or should I be worried?), many of the humans that are still needed around have simply shut down their brains and left the thinking to the machines. The numbed look on the faces of the girls operating the supermarket checkout counters shows they know it: they are lingering in the temporary state of being affordable robots, moving groceries past a bar code reader.

In a way, the computer has become a very complex, distorting communication channel between the humans at the opposing ends of the chain. Any sizable software project is like a modern Tower of Babel: the software architects can't understand the network engineers, the database designer doesn't have a clue about the users' needs, and the programmers are left in the dark with a cryptic technical design document. The resulting system doesn't do what the users expected, covers the wrong 90% of the technical requirements, and oversteps both the deadline and the budget. The only one who thinks it's good is God.

The software industry has often been compared to other industries, usually to its demerit. But that's not fair; software is extremely complex. It just doesn't stand comparison to a car or a coat hanger. A large program has many thousands of lines of code, and by simple statistics even a very good programmer will make one mistake per thousand lines. How far-reaching a bug is can't easily be predicted.

Bug tracking systems are a great help (even though they are also semi-automated systems that humans communicate through). We are inclined to repair our faults as quickly as we can, in the hope of one day achieving the ultimate goal of a bug-free program. But what I'm interested in is closer to software forensics. More of a biologist's approach to bugs: instead of putting your heel on it to squash it, inspect it closely to see where it came from, what its habitat is, what it feeds on. A software bug is some form of programmer error; what was the cause? Programmer laziness? Fatigue? A misunderstanding of the requirements? An oversight? A subconscious TO DO item that was later forgotten? Hastiness? I have no hope of reducing the number of bugs in software until we start to dig into these questions and find the key factors that influence the number of programming mistakes.

So henceforth, let every bug tracker be outfitted with one more field: the blame field, indicating what was to blame for this bug. Otherwise, I have no hope.
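
To make the idea a little more concrete, here is a minimal sketch (in Python) of what such a record could look like. The class and field names are made up for illustration; only the blame categories come from the questions above.

    from dataclasses import dataclass
    from enum import Enum, auto

    class Blame(Enum):
        """The proposed blame field: what was behind this programmer error?"""
        LAZINESS = auto()
        FATIGUE = auto()
        MISUNDERSTOOD_REQUIREMENT = auto()
        OVERSIGHT = auto()
        FORGOTTEN_TODO = auto()
        HASTE = auto()

    @dataclass
    class BugReport:
        bug_id: int
        summary: str
        blame: Blame        # the one extra field this post argues for

    def blame_histogram(closed_bugs):
        """Tally closed bugs per blame category to see which factors dominate."""
        counts = {category: 0 for category in Blame}
        for bug in closed_bugs:
            counts[bug.blame] += 1
        return counts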

Thursday, June 7, 2007

Success: eToken PRO on Debian

Finally, after several weeks of kicking this stubborn bucket of bits, I've managed to produce a working .deb for the Aladdin eToken PRO (TM). The eToken looks like a small USB stick, but it is a smartcard in disguise. They're about $30 and we've purchased a hundred or so for putting grid certificates on.

Many thanks to Jan Just Keijser, who pioneered the effort of getting these (poorly supported) tokens to work under Linux. He built the software as an RPM, which works under Red Hat EL 4 (and derivatives), Fedora Core 6 and Suse 10. Unfortunately, we can't distribute the RPMs due to licensing constraints (the Aladdin drivers are proprietary, bleh).

Building debs has been an interesting exercise in finding the subtle differences between RHEL4 and Ubuntu; following the Debian Policy (as well as I could); building binary debs on the fly and doing it the proper way (with sources); setting up an apt repository for said debs; and, most of all, stumbling over the impossibilities of autotools and libtool (see yesterday's post).

As soon as I find a proper place to put up my debs I'll post the location. Unlike the RPMs, the deb is freely redistributable, as it contains only the free (libre) software. Provided you obtain the Aladdin drivers elsewhere, you can get your eToken to work on your Ubuntu laptop!

And oh, by the way: this has been tested on Ubuntu 6.06 and 7.04, and Debian 4 ("Etch").

Wednesday, June 6, 2007

On building systems of interdependent components

These ideas stem from my difficulties with libtool, after having spent some 48 hours pondering how to deal with the following situation.

We have to build a software system consisting of several interdependent components (no circular dependencies, that would just be weird). The idea is to build and package the components in some form for distribution and installation, such as rpms or debs. So there are at least two different machines to consider: the build system where the software is compiled, and the runtime system where the software is installed and run. In chronological order, here's what happens to the software:
(build time)
  1. compilation of source code into object code;
  2. linking of object code and dynamic shared libraries into runnable programs and libraries;
(run time)
  3. linking of programs with dynamic shared libraries;
  4. loading of the program into memory and executing it.

The problem is in the dynamic shared libraries (.so files), which play a role both in step 2 (build time) and step 3 (run time). Somehow, the dual nature of these objects makes them confusing.
Imagine you have three components, A, B and C, where B depends on A and C depends on B (and therefore indirectly on A). First, you compile and link A. That's the easy part. Then you want to build component B, and a problem starts to present itself. Component B depends on A, which is compiled but not yet installed. The GNU build system (autoconf, automake, make and libtool) is not attuned to the case where the shared objects of step 2 (needed at build time) are in a different location from the shared objects of step 3 (needed at run time). Usually, if you needed a certain package, you would first install it and then build against it.
So what speaks against installing component A prior to building B? It turns out there are two fundamental reasons.
The first reason is that installing a component requires root privileges. Remember, we are building in order to package the software as rpms, not to install it ad hoc. The final destination of A is in /usr, or /opt/someplace, and a typical user has no rights there.
The other reason is that there may be a huge difference between the system you build on and the system you build for. It just makes no sense to install a component for the wrong system. And I'm not even talking about cross-compiling or anything; just imagine that A is the C library and B is the compiler. It would be a really bad idea to replace the C library with a new, experimental version just so you could build the new compiler against it, while at the same time using that new, experimental C library to do the actual build, now wouldn't it?
But even in simpler cases, the horror of having to install, test and use multiple versions of a component on your build system should be enough to make you want to hide in a bomb shelter, armoured with regression test suites and stocked with a lifetime supply of Valium.

To get back to the original discussion: where do we put A? The answer is simple, but has complications. We have to put A in a non-privileged location that serves as a staging area for the actual run-time location. The packaging tools for rpms and debs know how to deal with that, so that liba.so ends up in

/opt/happyspot/lib/liba.so (runtime)

when it is staged in

/tmp/staging/opt/happyspot/lib/liba.so (build time).

So now we can tell B where to find A, and after building B it ends up in (or on) the stage as well.
The complications have only just begun. B knows that it should look for A in the staging area at build time, but once both are packaged and installed, that location is no longer valid. So we have to build B so that it can find A in the run-time location. But when we want to build C and point it at B, the run-time location is not yet valid. There we have it: the build-time nature of B is inherently different from the run-time nature of B.
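
To make this concrete, here is a minimal sketch of how B could be pointed at the staged copy of A while only the run-time location gets baked into its RPATH. It assumes a hypothetical autoconf-style component in a directory B, and reuses the staging and prefix paths from the example above; as the next paragraph explains, libtool does not always cooperate with this scheme.

    #!/usr/bin/env python3
    # Sketch: build component B against the staged copy of A.
    # Hypothetical layout: A has already been installed with
    # 'make install DESTDIR=/tmp/staging' and its final prefix is /opt/happyspot.
    import os
    import subprocess

    STAGING = "/tmp/staging"      # where 'make install DESTDIR=...' put A
    PREFIX = "/opt/happyspot"     # where A (and B) will live at run time

    env = os.environ.copy()
    # A's headers and libraries are found in the staging area at build time...
    env["CPPFLAGS"] = f"-I{STAGING}{PREFIX}/include"
    env["LDFLAGS"] = (
        f"-L{STAGING}{PREFIX}/lib "
        # ...but the RPATH baked into B should only mention the run-time location.
        f"-Wl,-rpath,{PREFIX}/lib"
    )

    subprocess.run(["./configure", f"--prefix={PREFIX}"], cwd="B", env=env, check=True)
    subprocess.run(["make"], cwd="B", env=env, check=True)
    # Stage B next to A, ready to be packaged as an rpm or deb.
    subprocess.run(["make", "install", f"DESTDIR={STAGING}"], cwd="B", env=env, check=True)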

Do we really need two different versions of B? I would think not. There is no real difference other than the resolving of dynamic linking, which happens at both build time and run time. But libtool is a pain: it insists on inserting the path of the staging area into the libraries and binaries in so-called RPATH entries, and the same path ends up in the libtool libraries (.la files). Even though both the staging path and the real path end up in the RPATH and the .la files, it is no good: the staging path has no place in the run-time files. It can lead to really ugly situations: someone could insert a Trojan horse by recreating the staging location and installing a malicious library with the same name as the right one. Another real-life example is having the staging area in /home/builduser, while the run-time system has /home automounted. Every time a library was loaded, the NFS server would be hit with a request to serve up a non-existent directory.
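
Lacking a cleaner fix, one pragmatic band-aid is to scrub the staging prefix out of the staged tree just before packaging. The sketch below reuses the same hypothetical paths and assumes the chrpath utility is installed; it rewrites the .la files and replaces the RPATH of the staged binaries and libraries with the run-time library directory.

    #!/usr/bin/env python3
    # Sketch: remove the staging prefix from a staged tree before packaging.
    # Hypothetical paths as above; requires the chrpath utility.
    import pathlib
    import subprocess

    STAGING = "/tmp/staging"
    PREFIX = "/opt/happyspot"
    stage_root = pathlib.Path(STAGING + PREFIX)

    # 1. Rewrite the libtool archives: drop the staging prefix wherever it
    #    appears (libdir=, dependency_libs=, -L flags and so on).
    for la in stage_root.rglob("*.la"):
        la.write_text(la.read_text().replace(STAGING, ""))

    # 2. Replace the RPATH of every staged shared object and executable with
    #    the run-time library directory. chrpath exits non-zero for files
    #    without an RPATH, so don't treat that as fatal.
    candidates = list(stage_root.rglob("*.so*")) + list((stage_root / "bin").glob("*"))
    for obj in candidates:
        if obj.is_file():
            subprocess.run(["chrpath", "-r", f"{PREFIX}/lib", str(obj)], check=False)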

While I don't have the ultimate solution for this problem, I would like to point out one more thing: having an RPATH in your executable sounds cool, but it has its drawbacks. After all, the system administrator may not like the location you picked for the software. Relocating the software defeats the purpose and effect of your RPATH (and the same issues as mentioned above arise all over again).