Wednesday, June 6, 2007

On building systems of interdependent components

These ideas come forth from my difficulties with libtool, after I had spent some 48 hours pondering how to deal with the following situation.

We have to build a software system consisting of several interdependent components (no circular dependencies, that would just be weird). The idea is to build and package the components in some form for distribution and installation, such as rpms or debs. So there are at least two different machines to consider: the build system where the software is compiled, and the runtime system where the software is installed and run. In chronological order, here's what happens to the software:
(build time)
  1. compilation of source code into object code
  2. linking together of object code and dynamic shared libraries into runnable programs and libraries
(run time)
  3. linking of programs with dynamic shared libraries
  4. loading of the program into memory and executing it.

The problem lies in the dynamic shared libraries (.so files), which play a role both in step 2 (build time) and in step 3 (run time). Somehow, this dual nature makes them confusing.
Imagine you have three components, A, B and C, where B depends on A and C depends on B (and therefore indirectly on A). First, you compile and link A. That's the easy part. Then you want to build component B, and a problem starts to present itself: B depends on A, which is compiled but not yet installed. The GNU build system (autoconf, automake, make and libtool) is not attuned to the case where the shared objects of step 2 (needed at build time) are in a different location from the shared objects of step 3 (needed at run time). Usually, if you needed a certain package, you would first install it and then build against it.
So what is wrong with installing component A before building B? It turns out there are two fundamental reasons.
The first reason is that installing a component requires root privileges. Remember, we are building software to be packaged as rpms, not installing it ad hoc. The final destination of A is /usr or /opt/someplace, and a typical user has no write access there.
The other reason is that there may be a huge difference between the system you build on and the system you build for. It makes no sense to install a component on the wrong system. And I'm not even talking about cross-compiling; just imagine that A is the C library and B is the compiler. It would be a really bad idea to replace the C library with a new, experimental version just so you could build the new compiler against it, and at the same time use that new, experimental C library to do the actual build, now, wouldn't it?
But even in simpler cases, the horror of having to install, test and use multiple versions of a component on your build system should be enough to make you want to hide in a bomb shelter armoured with regression test suites and stocked with a lifetime supply of valium.

To get back to the original discussion: where do we put A? The answer is simple, but it has complications. We have to put A in a non-privileged location that serves as a staging area for the actual run-time location. The packaging tools for rpms and debs know how to deal with that, so that liba.so ends up in

/opt/happyspot/lib/liba.so (runtime)

when it is staged in

/tmp/staging/opt/happyspot/lib/liba.so (build time).
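The relation between the two locations is nothing more than a path prefix. A minimal sketch, assuming the staging root plays the role of the `DESTDIR` variable from the GNU coding standards (as in `make install DESTDIR=/tmp/staging`); the helper names `to_staging` and `to_runtime` are hypothetical:

```python
import posixpath

def to_staging(runtime_path, staging_root):
    """Map a run-time path (e.g. /opt/happyspot/lib/liba.so) into the staging area."""
    if not posixpath.isabs(runtime_path):
        raise ValueError("run-time paths must be absolute: " + runtime_path)
    # Drop the leading '/' so join() appends instead of discarding staging_root.
    return posixpath.join(staging_root, runtime_path.lstrip("/"))

def to_runtime(staged_path, staging_root):
    """Strip the staging root again, which is what packaging effectively does."""
    root = staging_root.rstrip("/")
    if staged_path != root and not staged_path.startswith(root + "/"):
        raise ValueError(staged_path + " is not inside " + staging_root)
    return "/" + staged_path[len(root):].lstrip("/")

print(to_staging("/opt/happyspot/lib/liba.so", "/tmp/staging"))
# /tmp/staging/opt/happyspot/lib/liba.so
print(to_runtime("/tmp/staging/opt/happyspot/lib/liba.so", "/tmp/staging"))
# /opt/happyspot/lib/liba.so
```

The prefix check in `to_runtime` compares against `root + "/"` so that a sibling directory such as /tmp/staging2 is not mistaken for part of the stage.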

So now we can tell B where to find A, and after it is built, B ends up in (or on) the stage as well.
But the complications have only just started. B knows that it should look for A in the staging area at build time, but once both are packaged and installed, that location is no longer valid. So we have to build B so that it can find A in the run-time location. But when we want to build C and point it to B, the run-time location is not yet valid. There we have it: the build-time nature of B is inherently different from the run-time nature of B.
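The mismatch can be made concrete with a toy model of a library search path. This is a deliberate simplification (no ldconfig cache, no LD_LIBRARY_PATH, no default directories); the `resolve` helper and the file sets are hypothetical:

```python
import posixpath

def resolve(lib, search_dirs, existing_files):
    """Return the first dir/lib that exists, scanning the search path in order."""
    for d in search_dirs:
        candidate = posixpath.join(d, lib)
        if candidate in existing_files:
            return candidate
    return None

# On the build machine, A and B exist only on the stage.
build_machine = {
    "/tmp/staging/opt/happyspot/lib/liba.so",
    "/tmp/staging/opt/happyspot/lib/libb.so",
}
# On the run-time machine, the installed packages put them in the real location.
runtime_machine = {
    "/opt/happyspot/lib/liba.so",
    "/opt/happyspot/lib/libb.so",
}

staging_dirs = ["/tmp/staging/opt/happyspot/lib"]
runtime_dirs = ["/opt/happyspot/lib"]

# Building C against B: only the staging location works.
print(resolve("libb.so", runtime_dirs, build_machine))    # None
print(resolve("libb.so", staging_dirs, build_machine))    # /tmp/staging/opt/happyspot/lib/libb.so
# Running B: only the run-time location works.
print(resolve("liba.so", staging_dirs, runtime_machine))  # None
print(resolve("liba.so", runtime_dirs, runtime_machine))  # /opt/happyspot/lib/liba.so
```

No single search path works on both machines, which is exactly the dual nature of B described above.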

Do we really need two different versions of B? I would think not. The only real difference is how dynamic linking is resolved, and that happens both at build time and at run time. But libtool is a pain: it insists on inserting the path of the staging area into the libraries and binaries as so-called RPATH entries, and the same path ends up in the libtool libraries (.la files). Even though both the staging path and the real path end up in the RPATH and the .la file, that is no good: the staging path has no place in the run-time files. It can lead to really ugly situations. Someone could plant a trojan horse by recreating the staging location and installing a malicious library with the same name as the real one. Another real-life example is having the staging area in /home/builduser while the run-time system has /home automounted. Every time a library was loaded, the NFS server would be hit with a request to serve up a non-existent directory.
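One conceivable workaround is to filter the staging entries out of the colon-separated RPATH before packaging. A minimal sketch that only operates on the RPATH string; actually rewriting the entry inside an ELF binary (or the paths in a .la file) needs a separate tool, and the function name is hypothetical:

```python
def strip_staging_from_rpath(rpath, staging_root):
    """Drop every RPATH entry that lies inside the staging area.

    rpath is the colon-separated list as stored in the binary,
    e.g. "/tmp/staging/opt/happyspot/lib:/opt/happyspot/lib".
    """
    root = staging_root.rstrip("/")
    kept = []
    for entry in rpath.split(":"):
        if entry == root or entry.startswith(root + "/"):
            continue  # build-time only; must not reach the run-time system
        if entry and entry not in kept:
            kept.append(entry)  # keep search order, drop empty and duplicate entries
    return ":".join(kept)

print(strip_staging_from_rpath(
    "/tmp/staging/opt/happyspot/lib:/opt/happyspot/lib", "/tmp/staging"))
# /opt/happyspot/lib
```

Order is preserved because the dynamic linker searches RPATH entries from left to right.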

While I don't have the ultimate solution for this problem, I would like to point out one more thing: having an RPATH in your executable sounds cool, but it has its drawbacks. After all, the system administrator may not like the location you picked for the software. Relocating the software defeats the purpose of your RPATH, since it still points to the old location (and the same issues arise as mentioned above).
