Saturday, March 24, 2012

building the command-line reliably from user provided input

The shell offers many opportunities for zen-like enlightenment, by throwing spitballs at every attempt to make sense of user-generated input. By user-generated I mean anything originating from an unreliable entity (a.k.a. a user), and therefore with no guarantee of being well-formed for its intended purpose. Users are notoriously good at producing such input. You ask them to type in a number, they say 'twelve'. You ask them for their surname, they type 'Mozes Kriebel'. The last case is problematic because most programmers are just like ordinary people, and ordinary people assume surnames are single words. Our habit is to reason about the common case instead of considering all possible alternatives, and therefore we live our lives believing such unbearable nonsense as 'all crocodiles are green', 'all people have ten fingers' and 'it never rains in the Sahara.' Each of us has his own set of ridiculous beliefs; I feel safe in claiming that no statement is so wrong that it lacks a true believer. But maybe that's just one of my beliefs.
Back to the programmer who made the wrong assumption that surnames don't contain spaces. Our friend is trying to process a dataset of names by calling a utility for each one and he's using shell scripts. His first attempt didn't go so well.
name=`get_next_surname`
process_surname $name
The process_surname utility expects just one argument: a surname. But when get_next_surname returned Mozes Kriebel, the shell's word splitting turned the unquoted $name into two arguments: "Mozes" and "Kriebel".
The programmer would soon overcome this problem by adding double quotes to prevent word splitting:
name=`get_next_surname`
process_surname "$name"
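To see the difference between the two attempts without the real utility, here is a minimal sketch. The count_args function is a hypothetical stand-in for process_surname that only reports how many arguments it received, and the default IFS is assumed.

```shell
#!/bin/sh
# count_args is a hypothetical stand-in for process_surname:
# it only reports how many arguments it received.
count_args() {
    echo "$#"
}

name='Mozes Kriebel'

# Unquoted: the shell splits the value on whitespace before the call.
unquoted_count=$(count_args $name)

# Quoted: the value is passed through as a single argument.
quoted_count=$(count_args "$name")

echo "unquoted: $unquoted_count argument(s)"
echo "quoted:   $quoted_count argument(s)"
```

With the surname above, the unquoted call reports 2 arguments and the quoted call reports 1.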
That will do for today's lesson. But wait, there is more!
Let's assume that the process_surname utility is actually richer in functionality, and may accept some options with arguments.
if [ "$name" != "$origname" ]; then
option="--nee $origname"
fi
process_surname $option "$name"
See what happened? If not, here's a short run-down. The first line checks whether the name differs from the original name (assume for the sake of the example that a name change could happen through marriage). Here, the double quotes are required as in the original script, to defend against shell script errors. If $name were unquoted, the test would read 'Mozes Kriebel != ...' which is incorrect syntax. The second line sets the --nee option, to pass to the processing function later on. This is also a quoted string.
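The effect of leaving out the quotes in the test can be observed directly. The sketch below is an illustration only: the exact error message differs per shell, and POSIX leaves the behaviour of [ with this many arguments unspecified, so the only assumption made here is that the shell reports an error status greater than 1, which is what bash and dash do in my experience.

```shell
#!/bin/sh
name='Mozes Kriebel'
origname='Jemig de Pemig'

# Unquoted: [ receives Mozes, Kriebel, !=, Jemig, de, Pemig,
# which is not a well-formed test expression.
status=0
[ $name != $origname ] 2>/dev/null || status=$?
echo "unquoted test exit status: $status"

# Quoted: a well-formed three-argument comparison.
if [ "$name" != "$origname" ]; then
    verdict='names differ'
else
    verdict='names are identical'
fi
echo "quoted test: $verdict"
```

An exit status of 1 would merely mean "false"; a status above 1 means the test itself could not be evaluated.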
The catch is in the final line. Observant readers spot the lack of quotes around $option. This is not a mistake! If $option were quoted, it would be passed as $1 to process_surname in its entirety, i.e. including the space following the '--nee' and the original surname after that. If this utility scans its arguments looking for an exact match of '--nee', it won't find it. So we need the shell's word splitting to separate '--nee' from what comes after it.
The problem is now clear. If $origname happens to be 'Jemig de Pemig', the same word splitting that separates '--nee' from the surname also breaks the surname into 'Jemig', 'de' and 'Pemig', and there seems to be no way to preserve the spaces when passing it as the argument to --nee.
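The dilemma shows up clearly when counting arguments. This sketch again uses the hypothetical count_args stand-in and assumes the default IFS; the call we actually want would deliver exactly 3 arguments (--nee, the original surname, and the current surname).

```shell
#!/bin/sh
# count_args is a hypothetical stand-in for process_surname.
count_args() {
    echo "$#"
}

name='Mozes Kriebel'
origname='Jemig de Pemig'
option="--nee $origname"

# Unquoted: --nee is separated, but so are the words of the surname.
split_count=$(count_args $option "$name")

# Quoted: the surname survives, but is glued to --nee in one argument.
glued_count=$(count_args "$option" "$name")

echo "unquoted \$option: $split_count arguments"
echo "quoted \$option:   $glued_count arguments"
```

The unquoted form yields 5 arguments and the quoted form yields 2; neither gives the 3 we are after.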
I won't dwell on my journey along the Path of Many Misconceptions About the Shell, but I will show you just about the simplest way to do this generally.
set --
if [ "$name" != "$origname" ]; then
set -- --nee "$origname"
fi
process_surname "$@" "$name"
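This is one of the few times I have found a use for setting the positional parameters. The magic bit is "$@", which expands to the positional parameters with each one kept as a separate word, as if every parameter were individually quoted. In a plain POSIX shell there is no other construct that does this (shells with arrays, such as bash, offer "${array[@]}" as an alternative). The set on line 3 makes $1 equal to "--nee" and $2 equal to "Jemig de Pemig"; if the names are identical, the set -- on line 1 has left no positional parameters and "$@" expands to nothing at all. When the names differ, the last line is then equivalent to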
process_surname "--nee" "Jemig de Pemig" "Mozes Kriebel"
which is exactly what we need.
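Here is a self-contained sketch for trying this out. Both process_surname (a stub that prints each argument in brackets) and process_with_options are hypothetical names made up for the illustration. Wrapping the logic in a function is my own addition: set -- inside a function only replaces that function's positional parameters, so the script's own arguments are left alone.

```shell
#!/bin/sh
# Hypothetical stub for process_surname: prints each argument in brackets.
process_surname() {
    for arg in "$@"; do
        printf '[%s]' "$arg"
    done
    printf '\n'
}

# Hypothetical wrapper: $1 is the current surname, $2 the original one.
process_with_options() {
    name=$1
    origname=$2
    set --                          # start with an empty option list
    if [ "$name" != "$origname" ]; then
        set -- --nee "$origname"    # two separate words, spaces preserved
    fi
    process_surname "$@" "$name"
}

changed=$(process_with_options 'Mozes Kriebel' 'Jemig de Pemig')
unchanged=$(process_with_options 'Mozes Kriebel' 'Mozes Kriebel')

echo "$changed"
echo "$unchanged"
```

The first call prints [--nee][Jemig de Pemig][Mozes Kriebel] and the second prints [Mozes Kriebel], showing that an empty "$@" contributes no argument at all.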
