Bash primer

The transcript from a short primer I did on Bash for Cogapp Tech Tuesday. I don't profess to be a Bash guru, so if you spot anything that's incorrect then please tweet at me.

What is Bash?

Bash is the Bourne Again SHell. It's an extension of the standard sh shell, and is available pretty ubiquitously on *NIX-based systems.

Why Bash?

There are lots of things that require some kind of simple processing, and Bash aims to provide a toolkit to let you do this. In theory it's a complete programming language, but in practice I'd use a real scripting language like Python.

What Bash excels at is integrating with the system, and talking to native utilities.

The secret sauce of a lot of simple Bash scripting is the pipe. The Unix philosophy is that commands should do a single thing well, and the pipe lets you send output from one command directly into another.

Bash is really about text processing, and it's very good at that, but the tools do get quite esoteric.

A note on --help and manpages

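Almost every command documents itself. Running a command with --help usually prints a short usage summary, and man <command> opens the full manual page. For Bash builtins such as set or declare, use help <builtin> instead.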

How to Bash?

We've said that Bash is available pretty globally. But where does it live? We can see all the available shells on a system with:

> cat /etc/shells
# /etc/shells: valid login shells
/bin/sh
/bin/dash
/bin/bash
/bin/rbash

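We can see our login shell with echo $SHELL. Note that this isn't necessarily the shell we're currently typing into; ps -p "$$" shows the process that is actually running.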

If we run a script we want to make sure it's run with Bash, rather than /bin/sh or another shell, since we might not have access to the commands we need. Fortunately, we can specify what is used to run the script using a hashbang. Let's make a basic script:

touch script.sh
chmod 775 script.sh

We could add in a conditional:

if [[ -e "/tmp" ]]
then
  echo "hello world"
fi

If we run this with /bin/sh it will give:

> /bin/sh script.sh 
script.sh: 3: script.sh: [[: not found

We need to make sure it will always run with /bin/bash, since Bash may not always be the executing shell. To do that, we make the contents of the file:

#!/bin/bash
if [[ -e "/tmp" ]]
then
  echo "hello world"
fi

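Now if we execute the script directly with ./script.sh it will be run with /bin/bash, no matter what the login shell is. (Explicitly running /bin/sh script.sh still uses sh, because the hashbang line is then just a comment.)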

The basics

Variables

So to start doing some Bash scripting we need some language constructs.

Variables can be assigned in Bash like this:

MYVARIABLE="hello world"
echo "$MYVARIABLE"

Double-quoting is important in Bash, as it stops the shell from splitting a value on whitespace or expanding wildcard characters in it. For example:

> touch cat rat mat
> ls
cat mat rat
> ls *at
cat mat rat
> ls "*at"
ls: cannot access '*at': No such file or directory
> ls [cmr]at
cat mat rat
> ls "[cmr]at"
ls: cannot access '[cmr]at': No such file or directory

See the GNU manual entry on shell expansion for more examples.

For now, just take it for granted that you should always double-quote things unless you have a reason not to.
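
To make that a bit more concrete, here's a minimal sketch of what happens when a value containing a space is left unquoted. The count_args helper and the filename are made up for illustration, and it assumes the default word-splitting settings.

```shell
#!/bin/bash

# Hypothetical helper: prints how many arguments it received.
count_args() {
  echo "$#"
}

FILENAME="my file.txt"  # made-up name containing a space

UNQUOTED_COUNT="$(count_args $FILENAME)"    # split on the space into two words
QUOTED_COUNT="$(count_args "$FILENAME")"    # kept together as one word

echo "unquoted: $UNQUOTED_COUNT, quoted: $QUOTED_COUNT"
```

The unquoted version hands the command two arguments (my and file.txt), which is rarely what you meant.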

If you want to interpolate variables within other strings, you can use curly braces:

MYVARIABLE="hello world"
echo "${MYVARIABLE}s"

Types

Bash doesn't really have 'types' by default. Variables are untyped strings, but do depend on context. There are informally these types:

  • String
  • Int
  • List/array

Ints are just Strings that contain only digits (with an optional leading minus sign), and they can be used with the arithmetic operator.

You can use the declare command to more strictly define the types of variables if required (see declare --help).
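
As a rough sketch of what declare changes, compare a plain variable with one declared as an integer. The variable names are just examples.

```shell
#!/bin/bash

TEXT=5
TEXT+=3          # plain variables are strings, so this appends: 53

declare -i COUNT=5
COUNT+=3         # declared as an integer, so this adds: 8

declare -r GREETING="hello"   # read-only; assigning to it again is an error

echo "$TEXT $COUNT $GREETING"
```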

Logical constructs

Bash has a few basic logical constructs:

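FOO="hello"

# if/else: branch on a condition
if [[ "$FOO" = "hello" ]]
then
  echo "FOO is hello"
else
  echo "FOO is not hello"
fi

# for: loop over each element of an array
ARR=(1 2 3)
for I in "${ARR[@]}"
do
  echo "$I"
done

# while: loop for as long as the arithmetic condition is true
I=0
while (( I < 10 ))
do
  echo "$I"
  (( I++ ))
done

# case: match a value against patterns, with * as the fallback
ARR=("foo" "bar" "baz")
for I in "${ARR[@]}"
do
  case "$I" in
    foo )
      echo "foo!"
      ;;
    baz )
      echo "baz!"
      ;;
    * )
      echo "dunno!"
      ;;
  esac
done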

Braces

You've seen a couple of types of braces. The single square brace [ ] is an alias for the test command. We can see the manpage with man [.

As with quoting, the double square brace [[ ]] is the safer choice. You can use [ ], but it has some expansion and splitting caveats.

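The double parentheses (( )) are used for arithmetic evaluation, and can be used with variables without using the dollar sign.

Inside conditional [[ ]] double square braces there are various kinds of test: string comparison (= and !=), pattern and regular expression matching (== with a wildcard pattern, =~ with a regex), integer comparison (-eq, -lt, -gt) and file tests (-e exists, -f is a regular file, -d is a directory).

You can do logical and/or with && and ||. You may see -a and -o used for this inside single [ ] braces sometimes, but this is outdated and should not be used in new code.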

See the Bash Conditional Expressions manual entry for all of the possible conditional operations.
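
Here's a small sketch that exercises a few of those [[ ]] tests in one go. The values of NAME and COUNT are invented, and RESULTS is only there to collect which tests passed.

```shell
#!/bin/bash

NAME="script.sh"  # made-up values for illustration
COUNT=3
RESULTS=()

[[ -d "/" ]] && RESULTS+=("dir")                      # file test: is a directory
[[ "$NAME" == *.sh ]] && RESULTS+=("glob")            # wildcard pattern match
[[ "$NAME" =~ ^[a-z]+\.sh$ ]] && RESULTS+=("regex")   # regular expression match
[[ "$COUNT" -lt 10 && -n "$NAME" ]] && RESULTS+=("and")
[[ -z "$NAME" || "$COUNT" -eq 3 ]] && RESULTS+=("or")

echo "${RESULTS[*]}"
```

Note that the pattern and the regex are deliberately left unquoted on the right-hand side; quoting them would make Bash compare them as literal strings.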

Adding a dollar

But sometimes we want to evaluate things and save the result of that evaluation for later. Well, if we combine braces with the dollar we can do that.

#!/bin/bash

FOO="$(echo "hello world")"
echo "$FOO"

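You can also see here that, due to the way command substitution works, we're free to nest double quotes inside the quoted $( ): the shell parses the contents of $( ) as a fresh context. The double-parenthesis version, $(( )), does the same job for arithmetic: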

#!/bin/bash

ONE=1
SUM=$(( ONE + 2 ))
echo "$SUM"

The good stuff

Special and positional parameters

So we have a script, and we want to pass arguments to it. How does this work?

When calling a script (or a function), arguments passed are automatically assigned to the positional parameters $1 onwards. $0 is reserved for the name of the script (inside a function it still holds the script's name, not the function's).

For example, we can see arguments passed to the script with:

#!/bin/bash

echo "$0"
echo "$1"
echo "$2"

This is nice, but what if we want to see all arguments without having to type them out?

We can use list expansion with the @ symbol to do this (you'll see this done with the * symbol too).

#!/bin/bash

echo "${@}"
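
The difference between @ and * only really shows up once they're quoted. A quick sketch, using two throwaway functions (show_args and join_args are names I've made up):

```shell
#!/bin/bash

# "$@" expands to one word per argument, preserving spaces inside each.
show_args() {
  for ARG in "$@"
  do
    echo "[$ARG]"
  done
}

# "$*" joins all the arguments into a single word.
join_args() {
  echo "[$*]"
}

AT_OUTPUT="$(show_args "hello world" "foo")"
STAR_OUTPUT="$(join_args "hello world" "foo")"

echo "$AT_OUTPUT"
echo "$STAR_OUTPUT"
```

The first prints [hello world] and [foo] on separate lines, the second prints [hello world foo]. Most of the time "$@" is the one you want.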

What if we want to see the number of arguments? We can use #:

#!/bin/bash

echo "There are ${#} arguments"
echo "${@}"

These aren't restricted to just these parameters—we can use them with anything:

#!/bin/bash

FOO="hello world"
echo "'$FOO' contains ${#FOO} characters"

#!/bin/bash

LIST=("hello" "world")
echo "The list contains ${#LIST[@]} elements"
echo "${LIST[@]}"

There are some other cool special parameters that use $ too.

$$ gives the current process ID. You could use this to see your current bash process with ps "$$", or your open file descriptors with ls "/proc/$$/fd". You will see this changes when you swap terminal windows because each terminal window starts its own bash process.

$? gives the exit code for the last executed command. This can be useful in scripts if you want to do tests for success/failure. For example, run ls then echo "$?" and you'll see 0, which is a successful exit. Running ls zxcv then echo "$?" gives 2, which man ls describes as "serious trouble". In contrast, cat zxc returns an exit code of 1.
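
In a script you'd typically grab $? straight away, before another command overwrites it. A minimal sketch, assuming the made-up path below doesn't exist on your machine:

```shell
#!/bin/bash

MISSING="/no/such/path-zxcv"  # made-up path, assumed not to exist

if ls "$MISSING" > /dev/null 2>&1
then
  echo "found it"
else
  STATUS="$?"   # still holds the exit code of the failed ls
  echo "ls failed with exit code $STATUS"
fi
```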

See more of these in the manual.

Exit codes

Now that we know about exit codes, how do we do one? Simple, just use exit <code>.

We can demonstrate this with our scripts. Other than 0 being the regular 'success' code, you can use whatever exit codes you want in the range 1 to 255. The most basic scheme is 0 for success and 1 for error, but as we saw with ls you can also have multiple error conditions with different exit codes if you want.
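
Here's a sketch of the multiple-error-codes idea. The check_path function and its code numbers are invented for this example. It uses return rather than exit so it can live inside a function; in a standalone script you'd write exit with the same numbers.

```shell
#!/bin/bash

# Hypothetical exit codes for this example.
EXIT_OK=0
EXIT_USAGE=1
EXIT_MISSING=2

check_path() {
  if [[ "$#" -ne 1 ]]
  then
    echo "usage: check_path <path>" 1>&2
    return "$EXIT_USAGE"
  fi

  if [[ ! -e "$1" ]]
  then
    echo "missing: $1" 1>&2
    return "$EXIT_MISSING"
  fi

  return "$EXIT_OK"
}

check_path "/" && echo "root exists"
check_path "/no/such/path-zxcv" || echo "check_path returned $?"
```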

Pipes

Pipes are cool. Most of you probably know about pipes already—they let us send the result of one command on to be processed by another, e.g. ps | grep bash.

The pipe passes the contents of stdout to the stdin of the next command, so we can do echo "foo" | cat, but not echo "foo" | cat | echo, because echo ignores stdin and only prints its arguments. We can use xargs to fill this void: it passes the stdout of the previous command as arguments to the next one, as in echo "foo" | cat | xargs echo.
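
A quick sketch of that difference, capturing the output of each pipeline so we can look at it. The variable names and the "got:" prefix are just for illustration.

```shell
#!/bin/bash

# echo never reads stdin, so the piped text is silently dropped.
DROPPED="$(echo "foo" | echo)"

# xargs reads stdin and appends it to the command as arguments.
PASSED="$(printf '%s\n' "foo" "bar" | xargs echo "got:")"

echo "dropped: '$DROPPED'"
echo "passed: '$PASSED'"
```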

See the manual for more on pipes.

Read

It's often useful to read in user input. You can do this with read:

#!/bin/bash

# -r stops read from treating backslashes as escape characters
read -r -p "Enter some text > " VAR
echo "You entered: $VAR"

You can also use this for confirmations:

#!/bin/bash

read -r -p "Do you want to do this? (y/n) > " VAR

if [[ "$VAR" =~ ^(y|yes)$ ]]
then
  echo "Execute order 66"
else
  echo "Surrender!"
fi
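
read can also split a line across several variables, with the last one soaking up whatever is left. In this sketch a here-string (<<<) stands in for text typed at the prompt, purely so the example runs without a keyboard; the variable names are arbitrary.

```shell
#!/bin/bash

# The here-string (<<<) stands in for text typed at the prompt.
read -r FIRST REST <<< "hello big world"

echo "first word: $FIRST"
echo "the rest: $REST"
```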

Input/output redirection

Along with process redirection we can also do input and output redirection. Most of you have probably seen output redirection before: echo "hello" > test.txt. This redirects the output of the echo to the file test.txt. If you run this a bunch of times test.txt will not change, because the single arrow overwrites the file. echo "hello" >> test.txt will append the contents, so you can do this as many times as you want.

However, this only redirects stdout. If we try cat zxc > foo.txt we'll see foo.txt is empty. This is because the output goes to stderr, and redirection only applies to stdout.

However, we can also use redirection on our stdout and stderr file descriptors, with this special syntax: 1>&2. You've probably seen this before with something like command --arg > /dev/null 2>&1. This redirects both stdout and stderr to /dev/null. In a more verbose form this is really command --arg 1>/dev/null 2>&1, which means redirect stdout to /dev/null, and redirect stderr to the same place as stdout. Ordering is important here: redirections are applied left to right, so the other way around (2>&1 > /dev/null) will not work. It points stderr at wherever stdout is going at that moment (usually the terminal), then redirects stdout to /dev/null afterwards, leaving stderr still going to the terminal.

So going back to our previous example, cat zxc > foo.txt 2>&1 will dump the error in there.
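
The ordering point is easier to believe when you see it. In this sketch, emit is a made-up function that writes one line to each stream, and the command substitution captures whatever still reaches stdout after the redirections.

```shell
#!/bin/bash

# Hypothetical function that writes one line to each stream.
emit() {
  echo "to stdout"
  echo "to stderr" 1>&2
}

# stdout goes to /dev/null first, then stderr follows it there.
ORDERED="$(emit > /dev/null 2>&1)"

# stderr is pointed at the current stdout first, then only stdout is discarded.
SWAPPED="$(emit 2>&1 > /dev/null)"

echo "ordered captured: '$ORDERED'"
echo "swapped captured: '$SWAPPED'"
```

The first capture is empty because both streams were discarded. The second contains "to stderr", because stderr was attached to the original stdout before stdout was sent away.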

But what if we want to redirect the text to the file and show it on stdout? Or what if we want to write to a protected file? Applying sudo doesn't work on output redirection: sudo cat zxc > foo.txt just runs cat as root, while the redirection to foo.txt is still performed by our own unprivileged shell.

Enter tee—like a T in plumbing, it redirects the flow of information in two directions. We use this with a pipe, so we can do cat zxc 2>&1 | sudo tee foo.txt. Again, where we do the output redirection matters—we have to redirect stderr for cat so it goes to stdin for tee, as the pipe only redirects stdout and not stderr.
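
A minimal sketch of tee doing both jobs at once, without the sudo part. LOG_FILE is a temporary file created just for this example.

```shell
#!/bin/bash

LOG_FILE="$(mktemp)"  # throwaway file for the example

# tee writes its stdin to the file and also passes it through to stdout.
SHOWN="$(echo "hello" | tee "$LOG_FILE")"
SAVED="$(cat "$LOG_FILE")"

echo "shown on stdout: $SHOWN"
echo "saved in file: $SAVED"

rm -f "$LOG_FILE"
```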

Output redirection is also useful for writing scripts. Remember exit? Well, if we exit with an error then any messages should also go to stderr. We can do this with output redirection:

echo "Some error message" 1>&2

Input redirection is less common, but can be useful. It's similar to process redirection, except it only works for files. It lets you redirect the contents of a file to the stdin of a command, which may be useful if you want to send stored data to a command. This is similar to what you can achieve with a pipe—for example, if we have a file containing this text:

cat
rat
dog
horse
bat

We could do cat file.txt | grep at, which sends the contents of the file from the stdout of cat to the stdin of grep. However, we can also do this with input redirection, without using a pipe: grep at < file.txt. This also has the advantage of not using pipes, which as we'll see in a bit may not do exactly what you expect in a script.
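
Here are the two forms side by side as a sketch. Rather than assuming file.txt already exists, it writes the same five words to a temporary file first.

```shell
#!/bin/bash

DATA_FILE="$(mktemp)"  # stands in for file.txt
printf '%s\n' "cat" "rat" "dog" "horse" "bat" > "$DATA_FILE"

PIPED="$(cat "$DATA_FILE" | grep "at")"       # via a pipe
REDIRECTED="$(grep "at" < "$DATA_FILE")"      # via input redirection

echo "$REDIRECTED"

rm -f "$DATA_FILE"
```

Both variables end up holding cat, rat and bat, one per line.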

Process redirection

There is another way of passing information to commands that expect a file-like input: process redirection (the Bash manual calls it process substitution). This creates something called a file descriptor (we saw these earlier when we looked at processes) and passes the output of the invoked command through that.

For example, cat "hello" doesn't work because cat expects to be passed a file or stdin (see man cat). However, we can do cat <(echo "hello"). Note that the arrow is part of the process redirection syntax.

Why does this work? Run echo <(echo "hello") and you'll see that instead of the string it returns the path to a file descriptor. Under the hood, this creates the file descriptor and passes it to the command.
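
Because each <( ) looks like a file to the command, you can hand over several at once. A small sketch, assuming a system that provides /dev/fd (Linux and macOS both do):

```shell
#!/bin/bash

# cat receives two file-like paths and reads them in order.
COMBINED="$(cat <(echo "hello") <(echo "world"))"

echo "$COMBINED"
```

This is handy with commands like diff that insist on two file arguments.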

Magic!

Safe scripting

We can use another special parameter $- to see what shell flags are set at the current time. You'll probably get something like this:

> echo $-
himBHs

What does that mean? Let's check help set:

-h  Remember the location of commands as they are looked up.
-m  Job control is enabled.
-B  the shell will perform brace expansion
-H  Enable ! style history substitution.  This flag is on
    by default when the shell is interactive.

But what about i and s? Turns out these are flags set by bash itself. From man bash:

-i  If the -i option is present, the shell is interactive.
-s  If the -s option is present, or if no arguments remain after option
    processing, then commands are read from the standard input. This
    option allows the positional parameters to be set when invoking an
    interactive shell or when reading input through a pipe.

However, there are some other set options that are useful for almost all shell scripts:

set -o errexit
set -o nounset
set -o pipefail

errexit is pretty straightforward. Say we have a script like this:

#!/bin/bash

echo "hello"
ls zxc
echo "world"

We know that ls zxc will be an error if zxc doesn't exist, so we might expect this would do something like:

hello
ls: cannot access 'zxc': No such file or directory

However, by default it will do:

hello
ls: cannot access 'zxc': No such file or directory
world

This is because errexit is off by default in the shell. And this makes sense: if it was on, running ls zxc would exit your terminal session. But when we run a script, this is useful. If we modify our script like this, it will do something more sensible and stop at the failed ls:

#!/bin/bash

set -o errexit

echo "hello"
ls zxc
echo "world"

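This makes sense for most scripts, since we wouldn't expect anything to go wrong. You can still ignore a failure if you want, for example by appending || true to the command, but it should be done explicitly rather than implicitly. (Output redirection alone only hides the error message; the non-zero exit code still stops the script.)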
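
Here's a sketch of the two explicit ways of tolerating a failure with errexit switched on. MISSING is a made-up path that I'm assuming doesn't exist.

```shell
#!/bin/bash

set -o errexit

MISSING="/no/such/path-zxc"  # made-up path, assumed not to exist

# Option 1: explicitly ignore the failure.
ls "$MISSING" 2> /dev/null || true

# Option 2: test the command, which errexit never treats as fatal.
if ls "$MISSING" > /dev/null 2>&1
then
  RESULT="found"
else
  RESULT="missing"
fi

echo "still running, result: $RESULT"
```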

nounset is another pretty obvious one. Imagine we have a script like this:

#!/bin/bash

MSG="${1}"

echo "$MSG"

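Okay, we've just reinvented echo. But what happens if we don't pass an argument? The unset $1 silently expands to an empty string, so the script prints a blank line and exits successfully.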

Now really, this isn't correct. Our contract with the script is that the caller passes an argument. This isn't super critical here, but if you aren't catering for the explicit absence of the argument you're going to be caught short when it's not supplied. If we set nounset it changes this behaviour to be more sensible:

#!/bin/bash

set -o nounset

MSG="${1}"

echo "$MSG"

Now if we don't supply an argument, we get:

> ./script.sh 
./script.sh: line 5: 1: unbound variable

Okay, but what if we want the option of passing an argument or having a default? Bash lets you do that too:

#!/bin/bash

set -o nounset

MSG="${1:-"You didn't set a message"}"

echo "$MSG"

nounset also applies to any regular variable. For example, if we removed the setting of MSG we'd get an error that it's not set when it's called in the echo.
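
The same default syntax works inside functions, since they get their own positional parameters. A small sketch, where greet and its default are names I've invented:

```shell
#!/bin/bash

set -o nounset

# Hypothetical function: the name is optional and falls back to a default.
greet() {
  local NAME="${1:-stranger}"
  echo "hello $NAME"
}

greet          # no argument, so the default is used
greet "Bob"
```

This prints hello stranger followed by hello Bob, and neither call trips nounset.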

Finally pipefail, which is a bit less obvious. Remember how I mentioned before that you need to be aware of how pipes work? If we do something like this:

#!/bin/bash

ls zxc | grep a
echo "done"

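This will run quite happily and print "done", even though there's an error in the ls command. As per the manual, the exit status of a pipeline is the exit status of its last command, so in this case the grep, and the failure of ls is ignored. (Here grep happens to exit with 1 as well, because it matched nothing; with a final command that succeeds on empty input, such as sort, the whole pipeline would report success.)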

What we want is for the exit status to be the one of the failed command. If we set pipefail, that's what we get:

#!/bin/bash

set -o pipefail

ls zxc | grep a
echo "done"

Wait a minute, nothing's changed! That's because although the pipeline now fails, the script doesn't exit because errexit is off. Let's add that too:

#!/bin/bash

set -o pipefail -o errexit

ls zxc | grep a
echo "done"

Now it exits at the failed pipe, and exits with a failure code. Success!?
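
If you want to see the effect of pipefail in isolation, here's a sketch using the true and false commands, which do nothing except succeed and fail respectively. The variable names are arbitrary.

```shell
#!/bin/bash

set +o pipefail  # start from the default behaviour

if false | true
then
  WITHOUT="success"
else
  WITHOUT="failure"
fi

set -o pipefail

if false | true
then
  WITH="success"
else
  WITH="failure"
fi

echo "without pipefail: $WITHOUT"
echo "with pipefail: $WITH"
```

Without pipefail only the final true counts, so the pipeline succeeds. With it, the earlier false makes the whole pipeline fail.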

N.B. errexit and nounset can also be set using their short codes, -e and -u. You might see it written in shorthand like this:

set -euo pipefail

pipefail doesn't have a short option, so will always be written in long form.