Shell Scripting For Speed

Inspired by a forum thread

Choose which shell your scripts use. That's the "#!/bin/*sh" line at the top:

  • dash: very small and fast, but limited.
    You will have to use more external commands to accomplish things.
  • bash: bulkier, takes longer to load and is slightly slower (although you have to do some deliberate benchmarking to notice this at all). But you can accomplish much more with shell builtins: arrays, string manipulation... Often the call to awk or sed is not required anymore.
  • sh: AFAIK there's no shell executable by that name (anymore). It's the system's default shell, often a symlink to one of the above, depending on your setup. Make sure you code for portability (POSIX) when you use this one, which is practically the same as coding for dash.
  • This list isn't complete. There are many more shells: ksh, zsh, etc.

Once you have made that choice, it should influence your coding quite a bit. To compare dash with bash: you can accomplish a lot with bash's internal capabilities (arrays, to name only one), things that will require external commands with dash.
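To illustrate the difference, here is a minimal sketch that keeps a list of items first in a bash array and then, portably, in the positional parameters, which are the only array-like structure POSIX sh offers. The file names are made up for the example.

```shell
#!/bin/bash
# bash: a real array (not available in dash)
files=("notes.txt" "my report.txt" "todo.md")
files+=("extra.log")                 # append one element
count_bash=${#files[@]}

# POSIX sh / dash: the positional parameters are the only "array"
set -- "notes.txt" "my report.txt" "todo.md"
set -- "$@" "extra.log"              # append one element
count_posix=$#

echo "bash array: $count_bash items, last is ${files[3]}"
echo "positional: $count_posix items"
```

The positional parameters work, but there is only one such list per function or script, which is why more complex data handling in dash tends to end up in external tools.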

External commands

Take sleep as an example (check with which sleep or type sleep). Each call to an external command slows your script down:

  • the command needs to be read from the hard drive and started as a new process
  • it is often overkill for the task required (e.g. calling sed, awk or grep for simple string manipulation, something even dash can do)
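As a sketch of the "simple string manipulation" case: the POSIX parameter expansions below split a path without calling basename, dirname or sed, and they work in dash as well as bash. The path is hypothetical.

```shell
#!/bin/sh
path="/home/user/docs/report.txt"   # hypothetical path

# External commands: each of these starts at least one new process
#   base=$(basename "$path")
#   ext=$(echo "$path" | sed 's/.*\.//')

# Parameter expansion: handled by the shell itself
base=${path##*/}      # strip the longest match of */ from the front
dir=${path%/*}        # strip the shortest match of /* from the end
ext=${base##*.}       # strip everything up to the last dot
stem=${base%.*}       # strip the last dot and what follows

echo "$dir | $base | $stem | $ext"
```

This prints "/home/user/docs | report.txt | report | txt".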

Command substitution and Pipes

Either construct will open a subshell (a shell within a shell) and is therefore to be used sparingly.
Some discussion here and here.

Command substitution

Means that the output of a command (internal or external) is captured so that the shell receives it as a string:

echo "Today is $(date)"

Better:

printf 'Today is '; date
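When a command substitution only wraps a file-reading command such as head or cat, the read builtin can often take over. The following is a small sketch under that assumption; the scratch file exists only for the demonstration (and is, ironically, created with a command substitution).

```shell
#!/bin/sh
tmp=$(mktemp)                      # scratch file, setup for the demo only
printf 'first line\nsecond line\n' > "$tmp"

# With command substitution and an external command:
#   first=$(head -n 1 "$tmp")

# With the read builtin and a redirection: no subshell, no external command
IFS= read -r first < "$tmp"

echo "$first"
rm -f "$tmp"
```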

Pipes

cat file | grep word

Piping commands into each other makes each command in the pipeline run in its own subshell. Use sparingly!

Better:

grep word file
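The subshell is not only a matter of speed. A sketch of a side effect, as bash and dash behave by default: a loop at the end of a pipe cannot change variables of the surrounding script, while the same loop fed by a redirection can. (ksh and zsh run the last element of a pipeline in the current shell, so they behave differently here.)

```shell
#!/bin/sh
tmp=$(mktemp)                      # scratch file for the demo
printf 'a\nb\nc\n' > "$tmp"

# Pipe: the loop runs in a subshell, so its changes to "n" are lost
n=0
cat "$tmp" | while IFS= read -r line; do n=$((n + 1)); done
piped=$n

# Redirection: the loop runs in the current shell
n=0
while IFS= read -r line; do n=$((n + 1)); done < "$tmp"
redirected=$n

echo "after pipe: $piped, after redirection: $redirected"
rm -f "$tmp"
```

In bash and dash this reports 0 after the pipe and 3 after the redirection.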

Both

Overkill examples might look like this:

time="$(date | cut -d' ' -f5 | cut -d: -f1-2)"
echo "$time"
a=$(echo 'hello' | tr '[:lower:]' '[:upper:]')

Better:

date +%H:%M
a="hello"
a="${a^^}"

The latter (${a^^}) is a bashism that needs bash 4 or newer, and a good example of why bash is sometimes the better tool for the job.
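Bash can go one step further and drop the call to date as well. This sketch assumes bash 4.2 or newer, where the printf builtin understands the %(...)T time format; the variable name is just for illustration.

```shell
#!/bin/bash
# Assumes bash 4.2 or newer for %(...)T
# -v stores the result in a variable instead of printing it
# -1 as the argument means "the current time"
printf -v now '%(%H:%M)T' -1
echo "$now"
```

No subshell and no external command are involved, which matters mostly when the time is needed over and over, for example in a status bar script.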

Builtins

Be aware of what the internal commands (or builtins) of the shell in question are. Many commands (e.g. echo) are both internal and external. A hint: if you have a terminal open that runs the shell in question, and you can use help somecommand, then it's a builtin, and that's what the shell will default to. Or use type somecommand.
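A quick way to check this from within bash, as a sketch: type -t prints a single word describing how a name would be resolved, and type -a lists every match. The -t and -a options are bash features; in a plain POSIX shell, command -v is the closest equivalent.

```shell
#!/bin/bash
# type -t prints one word: alias, keyword, function, builtin or file
kind_echo=$(type -t echo)
kind_sleep=$(type -t sleep)
echo "echo: $kind_echo, sleep: $kind_sleep"

# type -a lists every match, in the order the shell would pick them;
# for echo that is usually the builtin followed by a file such as /bin/echo
type -a echo
```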

I Like to use Bash

Knowing its capabilities lets me make do with almost no piping to sed, awk etc. I reckon that in the end this is much faster. Just imagine you have one of those hideous multi-pipe oneliners inside a loop that parses a long file, or one that needs to be executed at short intervals: that's many, many subshells opened, external commands read, and subshells closed again in rapid succession. If I run such a command once, it may not matter. But if it's part of a recurring task or runs automatically at login, it will save resources if I can replace all of that with bash variables and string manipulation.
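A minimal sketch of what such a replacement can look like, with made-up "name:shell" input: the loop splits each line with read and IFS and filters with a pattern match, so no cut, grep or tr is started per line.

```shell
#!/bin/bash
# Hypothetical input: "name:shell" pairs, one per line
data=$'alice:/bin/bash\nbob:/bin/dash\ncarol:/bin/bash'

bash_users=()
while IFS=: read -r name shell; do
    # Pipe-heavy equivalent, once per line:
    #   name=$(echo "$line" | cut -d: -f1); echo "$line" | grep -q 'bash$'
    if [[ $shell == */bash ]]; then
        bash_users+=("${name^}")      # ${name^} capitalises the first letter (bash 4+)
    fi
done <<< "$data"

echo "${bash_users[*]}"
```

This prints "Alice Carol". The here-string (<<<) feeds the loop without a pipe, so the array filled inside the loop is still available afterwards.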

It's also difficult to add proper comments to "hideous multi-pipe oneliners".

I recommend that everyone who chooses bash read up on string manipulation. There are quite a few resources out there, and eventually you will end up on this page with the obligatory disclaimer that it's outdated. Maybe this one for a quick reference, or this one.

And you definitely need to check out Greg's Wiki.

But Then Again...

I realise the beauty of "Do One Thing and Do it Well". Sometimes a simple shell and a few external commands are the better, more elegant solution. E.g. something like find | shuf would be pretty cumbersome to code in pure bash!