Bash (and shell scripting in general) is NOT straightforward. It’s easy to mess up if we’re not careful. Even if you come from a traditional programming background and just want to plumb a few lines of code, there are still going to be a few Bash behaviours that can really confuse you. To help with that, Aaron Maxwell created the unofficial bash strict mode.
In this 3-part series, we’ll go over some misleading Bash behaviours and how the strict mode can be helpful in each case (quirks included). Throughout this series, we’ll also be referring to the Bash reference manual. We’ll talk about the problems, solutions and quirks in 3 areas: errexit, pipefail, and nounset.
Look at this Bash script:
#!/usr/bin/env bash
cat /tmp/i_do_not_exist
echo "Hey"
Given the file does not exist, should it run all the way through or should it fail?
Turns out the script continues running just fine!
When tackling the problem, I tend to resort to the “Fail Fast” approach. From the Fail Fast – C2 wiki:
This is done upon encountering an error so serious that it is possible that the process state is corrupt or inconsistent, and immediate exit is the best way to ensure that no (more) damage is done.
Sounds reasonable. So why is “fail silently” the normal behaviour in a shell script? Well, in an interactive shell, you DO NOT want to exit when there’s an error (imagine your shell crashing when you cat a file that doesn’t exist). That behaviour was simply carried over to non-interactive shells.
How can we improve this behaviour? By setting the flag errexit:
set -o errexit
Or use the shorthand version (more commonly used):
set -e
What does this do? According to the manual, it tells the shell to:
Exit immediately if a pipeline (…) returns a non-zero status.
Going back to our example, we would do this instead:
#!/usr/bin/env bash
set -e
cat /tmp/i_do_not_exist
echo "Hey"
…which would then fail. Since the file doesn’t exist, cat returns a non-zero exit code. This behaviour is described in the following Bats unit test:
#!/usr/bin/env bats
load '../../../node_modules/bats-support/load'
load '../../../node_modules/bats-assert/load'
@test "runs fine even though file does not exist" {
  run "$BATS_TEST_DIRNAME/errexit.sh"
  [ "$status" -eq 0 ]
}
@test "fails since file does not exist AND errexit is turned on" {
  run "$BATS_TEST_DIRNAME/errexit2.sh"
  [ "$status" -ne 0 ]
  [ "$output" == "cat: /tmp/i_do_not_exist: No such file or directory" ]
}
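If you don’t have Bats at hand, the same comparison can be sketched with two child shells. This is only an illustration: it assumes /tmp/i_do_not_exist really is missing, and the variable names are made up for the example.

```shell
#!/usr/bin/env bash
# Run the same two commands in child shells, with and without errexit.
# Assumption: /tmp/i_do_not_exist does not exist on this machine.

# Without errexit: cat fails, echo still runs, and the child exits with 0.
without_status=0
without_output=$(bash -c 'cat /tmp/i_do_not_exist; echo "Hey"' 2>/dev/null) || without_status=$?

# With errexit: the child stops at cat and exits with cat's non-zero status.
with_status=0
with_output=$(bash -c 'set -e; cat /tmp/i_do_not_exist; echo "Hey"' 2>/dev/null) || with_status=$?

echo "without errexit: status=$without_status output='$without_output'"
echo "with errexit:    status=$with_status output='$with_output'"
```

The `|| var=$?` form records the child’s exit status without letting the failure abort the outer script, so the sketch behaves the same whether or not errexit is on where it runs.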
That works, and in my opinion it will definitely help you, but watch out for the quirks.
Not all commands return 0 on successful runs. The most prominent example is grep. From the manual:
Normally the exit status is 0 if a line is selected, 1 if no lines were selected, and 2 if an error occurred. However, if the -q or --quiet or --silent is used and a line is selected, the exit status is 0 even if an error occurred.
So, in the example below, echo will never be run: grep finds no match and exits with 1, which errexit treats as a failure.
#!/usr/bin/env bash
set -e
status_code=$(grep non_existant_word /dev/null)
echo "Hello world"
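To see this for yourself, here is a small sketch that records grep’s exit status in each case and then runs the script above in a child shell. The path /no/such/dir/file is a hypothetical one, assumed not to exist, and the variable names are just for illustration.

```shell
#!/usr/bin/env bash
# Sketch: observe grep's exit statuses, then the errexit consequence.
# The "|| var=$?" form records a failure without aborting the script.

match_status=0
grep -q needle <<< "needle" || match_status=$?        # a line is selected

no_match_status=0
grep -q needle <<< "haystack" || no_match_status=$?   # no line is selected

error_status=0
# /no/such/dir/file is a hypothetical path assumed not to exist.
grep needle /no/such/dir/file 2>/dev/null || error_status=$?

# The failing script from above, run in a child shell so this one survives.
script_status=0
script_output=$(bash -c 'set -e
status_code=$(grep non_existant_word /dev/null)
echo "Hello world"') || script_status=$?

echo "match=$match_status no_match=$no_match_status error=$error_status"
echo "script status=$script_status output='$script_output'"
```

With GNU grep this should report 0, 1 and 2 for the three cases, and the child script should exit with 1 without ever printing "Hello world". The exit status of a plain assignment is the status of its command substitution, which is why the assignment is enough to trip errexit.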
What can we do in this situation? Thankfully, there’s a bit in the Bash manual in the errexit section that can help (reformatted for clarity):
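The shell does not exit if the command that fails is:
1. part of the command list immediately following a while or until keyword
2. part of the test in an if statement
3. part of any command executed in a && or || list except the command following the final && or ||
4. any command in a pipeline but the last
5. a command whose return status is being inverted with !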
In our case, we can simply rewrite to comply with #2:
#!/usr/bin/env bash
set -e
if grep non_existant_word /dev/null; then
  echo "Hello world"
else
  echo "Does not exist"
fi
This behaviour can be verified by the following Bats test:
#!/usr/bin/env bats
load '../../../node_modules/bats-support/load'
load '../../../node_modules/bats-assert/load'
@test "fails since grep returns non 0" {
  run "$BATS_TEST_DIRNAME/grep_fail.sh"
  [ "$status" -ne 0 ]
}
@test "runs fine since grep is in a if statement" {
  run "$BATS_TEST_DIRNAME/grep_correct.sh"
  [ "$status" -eq 0 ]
  [ "$output" == "Does not exist" ]
}
Source: /posts/bash-strict-mode/grep.bats
Sometimes a command is allowed to fail. In that case, simply run an OR operation with true:
rm *.log || true
…because we don’t want to fail if there are no log files.
Why does this work? Recall #3 from the manual:
The shell does not exit if the command that fails is (…)
3. part of any command executed in a && or || list except the command following the final && or ||
As the command following the final || is true, there’s no way for the whole line to fail.
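A variation on the same rule is worth a quick sketch: instead of throwing the failure away with true, you can record it. Here `false` stands in for any failing command, and the child shell is only there so the example can capture the output.

```shell
#!/usr/bin/env bash
# Child shell with errexit on; `false` stands in for any failing command.
output=$(bash -c '
  set -e
  false || true          # failure is swallowed, the script continues
  status=0
  false || status=$?     # failure is recorded instead of discarded
  echo "still running, status=$status"
')
echo "$output"
```

Both lines rely on the same exception: the failing command is not the one following the final ||. The second form keeps the exit status around, in case you want to branch on it later.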
Another option would be to turn it off briefly:
set +e
command_allowed_to_fail
set -e
The + syntax means “remove” and - means “add” (counterintuitive, yes). Therefore, we’re simply disabling errexit while command_allowed_to_fail runs!
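Here is a minimal sketch of that toggle, again in a child shell so we can inspect the result. The grep call plays the role of command_allowed_to_fail, and the trailing `false` is only there to show that errexit is back on afterwards.

```shell
#!/usr/bin/env bash
child_status=0
output=$(bash -c '
  set -e
  set +e                           # errexit off
  grep -q needle <<< "haystack"    # allowed to fail: no match, exits with 1
  status=$?                        # keep the status before it is overwritten
  set -e                           # errexit back on
  echo "grep exited with $status"
  false                            # errexit is restored, so the script stops here
  echo "never printed"
') || child_status=$?

echo "$output"
echo "child exit status: $child_status"
```

Remember to read $? right after the command, since the next command (including set -e itself) overwrites it.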
This is not specific to errexit, but often you need to know where the command failed.
#!/usr/bin/env bash
set -e
function random_bytes {
  echo $(head -c "$1" /dev/random | base64)
}
random_bytes 10
random_bytes 50
random_bytes
random_bytes 5
But how can you tell which command failed (apart from looking at the very obvious mistake)?
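One common way to answer that question is a trap on ERR that prints the line number and the failing command. The sketch below is an assumption about how you might wire it up, not a fix for the script above: it writes a tiny throwaway script to a temporary file, uses `false` as the failing command, and captures what the trap reports.

```shell
#!/usr/bin/env bash
# Write a throwaway script that installs an ERR trap, then run it.
script=$(mktemp)
cat > "$script" <<'EOF'
set -e
# Runs whenever a command fails in a way that would trigger errexit.
trap 'echo "line $LINENO: \"$BASH_COMMAND\" exited with status $?" >&2' ERR
echo "step one"
false
echo "never printed"
EOF

child_status=0
report=$(bash "$script" 2>&1) || child_status=$?
rm -f "$script"

echo "$report"
echo "child exit status: $child_status"
```

The trap fires before errexit terminates the script, so the report names the failing command and its status. Note that an ERR trap is not inherited by functions, command substitutions or subshells unless you also set -E (errtrace).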
Once you get the gist of errexit, read the entry on BashFAQ and the linked resources.
Other parts in this blog series:
Part 1 – errexit
Part 2 – pipefail
Part 3 – nounset (coming soon!)
Stay tuned for parts 2 (pipefail) and 3 (nounset)!
May 7th, 2020
by Robert Golabek in Technology