Monday, May 26, 2008

Involuntary Grid Computing

It dawned on me how much alike botnets and grid computing really are.
  • Botnet operators control hundreds of thousands of heterogeneous machines.
  • These machines have unpredictable uptimes. On this scale, there is always a sufficiently large pool of available machines.
  • The diversity of systems requires a middleware layer that is portable and 'works everywhere'.
  • The load on the machines should be kept 'under the radar' of the owner, otherwise the owner will become suspicious and reinstall the box.
  • The botnet operators rent out their capacity for various applications, such as DDoS attacks, spamming, scamming and hosting dubious content.
  • A bot is deployed to a potentially hostile environment. The more advanced nets use PKI techniques, such as SSL and X.509 certificates, to protect their transfers and to secure their control channel. This way they avoid being fingerprinted and counter-attacked.
I think botnets have matured enough to be called something else. Hence I coin the term 'involuntary grid computing'.

Friday, May 16, 2008

Ranting, raving madness of rpmbuild

We use rpmbuild to create nice, shining RPMs for a bundle of mostly open-source scientific software called the VL-e Proof-of-Concept. As our experience with building RPMs has grown, we have automated a couple of things and have time to spare for more fun stuff.

This rant is about a problem I just discovered, and that I can only file under 'madness'.

First, the prologue. It started on a gray, drizzly morning, ... no wait; that's no good. 'A dark and stormy night.' Yeah, that's it. It started on a dark and stormy night...

One (1) of the files in a certain RPM (let's call it exhibit X) was missing an MD5 checksum. This was odd, because all files in an RPM must have a checksum. With the help of rpm2cpio I could extract the files and lo and behold: the file was there. Even better: the file was in good shape.

This particular RPM is a binary repackaging from a tarball distributed by a bunch of raving lunatics who think it's fun to ship a 1.5GB tarball, mostly consisting of irrelevant, experimental data. And to have a source tree that is impossible to build correctly, because all of the algorithms are somewhat numerically unstable. What else can you expect from brain surgeons?

So the binary tarball is used to create an RPM. And this one file I am talking about is a shared library, nothing strange about it. But rpmbuild tries to be clever...it knows about a phenomenon called prelinking.

Now, a disclaimer goes here: I know nothing about prelinking. Or rather, I wish I didn't. What I do know is that all your system's libraries are 'prelinked', which means that they are changed to behave better, or something. Apparently the change is reversible and system-dependent, which is why the prelink script typically runs from a daily cron job on each machine.

Here comes the part where rpmbuild tries to be clever: it knows that RPM can verify the installed files by comparing their checksums to their original values in the RPM database. Prelinking defeats such a check, so RPM will try a just-in-time reversal of the change, save the result to a temporary file, and compare its checksum to the original value. If you don't believe me, look at /etc/rpm/macros.prelink.
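
To make the check concrete, here is a minimal Python sketch of the idea, not of RPM's actual code. The names md5_after_undo and verify are made up for illustration, and the undo command is passed in as a plain argument list that is assumed to write the restored file contents to standard output.

```python
import hashlib
import subprocess


def md5_after_undo(path, undo_argv):
    """Run a (hypothetical) undo command on `path` and hash what it prints.

    Returns the MD5 hex digest of the command's stdout, or None when the
    command exits with an error, so the caller can tell that no checksum
    could be computed.
    """
    result = subprocess.run(undo_argv + [path], capture_output=True)
    if result.returncode != 0:
        return None
    return hashlib.md5(result.stdout).hexdigest()


def verify(path, recorded_md5, undo_argv):
    """True when the undone content matches the checksum recorded earlier."""
    return md5_after_undo(path, undo_argv) == recorded_md5
```

Note that a failing undo command yields no checksum at all in this sketch, rather than a wrong one.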

The same reversal/md5sum thing is done on all libraries you package with rpmbuild, and on this one file, it went wrong.

prelink: libqt-mt.so.3.3.3: at least one of file's dependencies has changed since prelinking

So it decided to forgo the calculation of an md5sum altogether. That's a bug, alright?

The 'solution' I came up with is editing my SPEC file to redefine the 'prelink undo' calculation.

%define __prelink_undo_cmd /bin/cat prelink library

Compare this to the original definition, and read the manual page of prelink to see why it works like this.
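
A rough Python sketch of one reading of that macro follows. The convention it encodes is an assumption, not something stated above: the first word is taken to be the program to execute, the remaining words its argument vector starting at argv[0], and the word 'library' a placeholder for the file being checksummed. The function names expand_undo_cmd and run_undo are hypothetical.

```python
import shlex
import subprocess


def expand_undo_cmd(template, path):
    """Split the macro value and substitute `path` for the 'library' word.

    Assumed convention: word 0 is the executable, the remaining words are
    argv[0], argv[1], ... as the program will see them.
    """
    words = shlex.split(template)
    executable, argv = words[0], words[1:]
    argv = [path if word == "library" else word for word in argv]
    return executable, argv


def run_undo(template, path):
    """Execute the expanded command and return the bytes it writes to stdout."""
    executable, argv = expand_undo_cmd(template, path)
    result = subprocess.run(argv, executable=executable,
                            capture_output=True, check=True)
    return result.stdout
```

Under that assumption, the redefinition above expands to /bin/cat being run with 'prelink' as argv[0] and the library path as its only argument, so the bytes that get checksummed are simply the file as it sits on disk.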