Formal verification allows to find bugs in a program by systematically exploring all its feasible executions. I have used several formal verification tools during my PhD; since I feel there is a lack of technical blog posts about this topic on the Web, I’ll try and write some :)

This post deals with bounded model checking, specifically the CBMC tool and how to combine it with the Glucose solver. I believe this leads to a measurable increase in performance, at least when checking unsafe programs.

Background

Safe and unsafe programs

Assume you have a C program src.c, which contains an assert(expr) statement somewhere. We want to ensure that the assertion is never violated, i.e., that whenever the execution of the program reaches that statement, expr evaluates to a non-zero value. If this is the case, we say that the program is safe, otherwise it is unsafe.

Bounded Model Checking

Ok, but how do we do that? A popular technique is Bounded Model Checking (BMC), i.e., the exploration of all possible executions of the program, up to a certain “bound” k.

In practice, such bound is used to unwind all loops within the program, yielding a “bounded” program that perfectly reproduces all execution of src.c up to k iterations of its loops.

So, a program like this:

int main(void) {
    int x = 0;
    while (x >= 0) { /* BODY */ }

}

Becomes something like:

int main(void) {
    int x = 0;
    if (x >= 0) { /* BODY */ }   // --+
    if (x >= 0) { /* BODY */ }   //   |
    if (x >= 0) { /* BODY */ }   //   +-- "k" times
    // ...                       //   |
    if (x >= 0) { /* BODY */ }   // --+

}

A BMC tool will then analyse the bounded program, looking for an assertion violation. Variables that are not given a specific value (e.g., those containing an input from the user) are treated as nondeterministic: this means that the tool will explore the behaviour of the program for any possible value of those variables.

Notice that these tools can only certify that a program is unsafe; if they don’t find any assertion violation for a given bound k, it may either mean that the program is safe, or that k was too shallow a bound. In contrast, other verification techniques, can actually prove the safety of a program. (Some of them are even based on BMC!)

But wait, doesn’t this mean that BMC is obsolete?

Not really. Despite its limitations, BMC has become popular because it is an intuitive and efficient approach to bug-finding, which can be (and has been) readily adopted outside the academia.

CBMC, Minisat, and Glucose

CBMC is a popular BMC tool for C programs. It performs symbolic model checking by transforming the program under verification (actually, its bounded version) into a Boolean formula. Then, it feeds the formula to a SAT solver. If the formula is satisfiable, then the program is unsafe.

           +-----------+  bounded  +---------+         +--------+
   C       |   Loop    |  program  |   SAT   | formula |  SAT   |
program ---> unwinding +-----------> encoder +---------> solver |
           |  (etc.)   |           |         |         |        |
           +-----------+           +---------+         +--------+

(Go to AsciiFlow to generate awesome ASCII diagrams!)

CBMC ships with Minisat as its default SAT solver. Minisat is a well-respected tool, but in recent years more efficient alternatives have been put forward. In this post we will focus on Glucose, which has won several competitions. However, I am pretty sure that other tools can be integrated with CBMC in a similar way (as long as they expose the same I/O interface as Minisat and Glucose).

Using Glucose as a CBMC backend

I performed this procedure on a WSL2 Ubuntu, i.e., a VM running on a Windows 10 host; of course, it should work also on s bare-metal Ubuntu install :)

Requirement

To compile Glucose, you will need a C/C++ compiler, make, and zlib (a compression library). On Ubuntu, you can install all this stuff with the command

sudo apt install build-essential zlib1g-dev

Install CBMC

Go to CBMC’s GitHub repository and get the latest .deb release. (The latest release was cbmc-5.20 at the time of writing.)

Download and build glucose

I am not aware of any binary distribution of Glucose. Luckily, it’s pretty easy to build from sources.

Just get the latest source tarball (v4.1), decompress it, and compile:

tar -zxvf glucose-syrup-4.1.tgz
cd glucose-syrup-4.1/simp
make rs

You should end up with a glucose_static executable in the glucose-syrup-4.1/simp directory.

Make Glucose output the model

By default, Glucose will only tell you whether a formula is satisfiable or not. In the former case, we want it to also output a model, i.e., an assignment that satisfies the formula. Why we want that? Because then CBMC can use this model to produce a violation witness, i.e., a trace of the program that leads to an assertion violation. These witnesses, or counterexamples, are essential to debug the program!

Let us then create a tiny wrapper script, which we’ll call glucose_model.sh:

#!/bin/sh

/path/to/glucose-syrup-4.1/simp/glucose_static -model $1

After saving it, let’s give ourselves permission to execute it with the command:

chmod +x glucose_model.sh

Putting it all together

We’re almost there! We can now invoke CBMC as follows:

cbmc --external-sat-solver="/path/to/glucose_model.sh" src.c --trace

CBMC will use Glucose to verify src.c. The --trace flag tells CBMC to use the model from Glucose (if any) to build a violation witness.

Thanks to Glucose, I have experienced a relevant speed-up in the verification of some (unsafe) programs. I will keep testing this setup in the future, but it does look promising!

Conclusions

I hope I gave a clear, if concise, overview of bounded model checking of C programs. This post just brushes the surface of the topic: CBMC is a complex piece of software which supports a very large subset of C, and thus has to deal with a lot of corner cases. Also, there is a plethora of verification methods other than BMC out there! I hope this post made you just a little bit curious about that :)