Formal verification allows to find bugs in a program by systematically exploring all its feasible executions. I have used several formal verification tools during my PhD; since I feel there is a lack of technical blog posts about this topic on the Web, I’ll try and write some :)
This post deals with bounded model checking, specifically the CBMC tool and how to combine it with the Glucose solver. I believe this leads to a measurable increase in performance, at least when checking unsafe programs.
Background
Safe and unsafe programs
Assume you have a C program src.c
, which contains an assert(expr)
statement
somewhere.
We want to ensure that the assertion is never violated, i.e., that whenever
the execution of the program reaches that statement, expr
evaluates to a
non-zero value.
If this is the case, we say that the program is safe, otherwise it is unsafe.
Bounded Model Checking
Ok, but how do we do that? A popular technique is Bounded Model Checking (BMC), i.e., the exploration of all possible executions of the program, up to a certain “bound” k.
In practice, such bound is used to unwind all loops within the program,
yielding a “bounded” program that perfectly reproduces all execution of src.c
up to k iterations of its loops.
So, a program like this:
int main(void) {
int x = 0;
while (x >= 0) { /* BODY */ }
}
Becomes something like:
int main(void) {
int x = 0;
if (x >= 0) { /* BODY */ } // --+
if (x >= 0) { /* BODY */ } // |
if (x >= 0) { /* BODY */ } // +-- "k" times
// ... // |
if (x >= 0) { /* BODY */ } // --+
}
A BMC tool will then analyse the bounded program, looking for an assertion violation. Variables that are not given a specific value (e.g., those containing an input from the user) are treated as nondeterministic: this means that the tool will explore the behaviour of the program for any possible value of those variables.
Notice that these tools can only certify that a program is unsafe; if they don’t find any assertion violation for a given bound k, it may either mean that the program is safe, or that k was too shallow a bound. In contrast, other verification techniques, can actually prove the safety of a program. (Some of them are even based on BMC!)
But wait, doesn’t this mean that BMC is obsolete?
Not really. Despite its limitations, BMC has become popular because it is an intuitive and efficient approach to bug-finding, which can be (and has been) readily adopted outside the academia.
CBMC, Minisat, and Glucose
CBMC is a popular BMC tool for C programs. It performs symbolic model checking by transforming the program under verification (actually, its bounded version) into a Boolean formula. Then, it feeds the formula to a SAT solver. If the formula is satisfiable, then the program is unsafe.
+-----------+ bounded +---------+ +--------+
C | Loop | program | SAT | formula | SAT |
program ---> unwinding +-----------> encoder +---------> solver |
| (etc.) | | | | |
+-----------+ +---------+ +--------+
(Go to AsciiFlow to generate awesome ASCII diagrams!)
CBMC ships with Minisat as its default SAT solver. Minisat is a well-respected tool, but in recent years more efficient alternatives have been put forward. In this post we will focus on Glucose, which has won several competitions. However, I am pretty sure that other tools can be integrated with CBMC in a similar way (as long as they expose the same I/O interface as Minisat and Glucose).
Using Glucose as a CBMC backend
I performed this procedure on a WSL2 Ubuntu, i.e., a VM running on a Windows 10 host; of course, it should work also on s bare-metal Ubuntu install :)
Requirement
To compile Glucose, you will need a C/C++ compiler, make
, and zlib
(a compression library).
On Ubuntu, you can install all this stuff with the command
sudo apt install build-essential zlib1g-dev
Install CBMC
Go to CBMC’s GitHub repository
and get the latest .deb
release.
(The latest release was cbmc-5.20
at the time of writing.)
Download and build glucose
I am not aware of any binary distribution of Glucose. Luckily, it’s pretty easy to build from sources.
Just get the latest source tarball (v4.1), decompress it, and compile:
tar -zxvf glucose-syrup-4.1.tgz
cd glucose-syrup-4.1/simp
make rs
You should end up with a glucose_static
executable in the
glucose-syrup-4.1/simp
directory.
Make Glucose output the model
By default, Glucose will only tell you whether a formula is satisfiable or not. In the former case, we want it to also output a model, i.e., an assignment that satisfies the formula. Why we want that? Because then CBMC can use this model to produce a violation witness, i.e., a trace of the program that leads to an assertion violation. These witnesses, or counterexamples, are essential to debug the program!
Let us then create a tiny wrapper script, which we’ll call glucose_model.sh
:
#!/bin/sh
/path/to/glucose-syrup-4.1/simp/glucose_static -model $1
After saving it, let’s give ourselves permission to execute it with the command:
chmod +x glucose_model.sh
Putting it all together
We’re almost there! We can now invoke CBMC as follows:
cbmc --external-sat-solver="/path/to/glucose_model.sh" src.c --trace
CBMC will use Glucose to verify src.c
.
The --trace
flag tells CBMC to use the
model from Glucose (if any) to build a violation witness.
Thanks to Glucose, I have experienced a relevant speed-up in the verification of some (unsafe) programs. I will keep testing this setup in the future, but it does look promising!
Conclusions
I hope I gave a clear, if concise, overview of bounded model checking of C programs. This post just brushes the surface of the topic: CBMC is a complex piece of software which supports a very large subset of C, and thus has to deal with a lot of corner cases. Also, there is a plethora of verification methods other than BMC out there! I hope this post made you just a little bit curious about that :)