How does Pinaka work?
Pinaka: Symbolic Execution meets Incremental Solving
Symbolic execution as well as incremental solving has been quite well known concepts to the researchers and students working in the field of Formal Methods.
Today I am going to write about Pinaka, a symbolic execution engine that combines symbolic execution with incremental solving in a fairly simple fashion.
Incremental Solving
Many modern day constraint solvers, such as SAT and SMT solvers support incremental solving in some fashion.
Say, for some problem, we require solutions for the following set of formulas:
One of the ways to solve these is to create a solver instance for each of these formulas separately and ask the solver for a solution for each of these separately. Note however, that any assignment/model/interpretation that does not satisfy
It would result in a huge performance gain, if we use the same solver instance to solve all the three formulas. When we reuse the solver instance, the internal state of the solver including variable and clause activity scores, learned clauses, etc., are carried over and reused for the next query.
This way of solving a set of formulas with successive queries to the same solver will be denoted as partial incremental mode.
Now, suppose we have the following set of formulas:
However, many modern day solvers have another trick up their sleeve, called assumptions. Assumptions are a list of literals that can be logically thought of as added unit clauses. When you know that some constraint may no longer be required for successive queries, you can add that constraint being enforced by an assumption. Therefore, the first formula, can be rewritten as
When you think that assumptions are logically just unit clauses, then the above formula in some sense deletes
Of course, adding such
This way of using the same solver instance for successive queries will be denoted as full incremental mode.
Symbolic Execution
Let’s take a look at a motivating example for symbolic execution.
Look at the subroutine foo given below. Note that the assertion violation will happen only when variable y gets -10 as its value at line 1. Assuming an int to be of 32 bits, and if you were to test this subroutine by generating values of y uniformly randomly and running foo, then the probability of you finding the bug will be
1 void foo(int y) {
2 int x=10;
3 x=x+y;
4 if(x==0) {
5 x=x+1;
6 if(y<-10)
7 x=x-1;
8 }
9 else {
10 x=0;
11 }
12 assert(x==0);
13 }
The problem with testing, as mentioned above, is that a single run only exposes program behaviour for a single input value along a single path.
In symbolic execution, it is possible to explore program behaviour for a set of input values along a single path. We will not talk about multi-path symbolic execution in this post.
Symbolic execution uses a predicate at every program point along a path to represent the set of values program variables can take along the given path. This predicate is also often called a symbolic state.
For example, the symbolic state after executing lines 1-3 would be :
Here,
3 as it does not satisfy the predicate mentioned above. On the other hand
1-3 program variable x can be 12 and y can be 2. Similarly,
x taking the value 0 and y taking the value -10 at line 4 is a possibility.
Line 4 is a branch and gives rise to two symbolic states.
Symbolic state for the then branch is:
The symbolic state for the else branch is:
Pinaka being a single path execution engine, maintains a worklist of symbolic states that are yet to be explored further. Therefore, at a branch, it picks one state to be explored further and puts the other one on the worklist. Say, it picked the symbolic state for the then branch. Pinaka also employs eager feasibility checks. In other words, at every branch, it poses a query to a constraint solver whether the symbolic state is feasible or not. If there is no solution to the predicate, then the corresponding symbolic state is deemed to be infeasible, indicating that in an actual/concrete execution, it is not possible for the control to reach this point.
In our example, then branch at line 4 is feasible. Symbolic state for the then branch at line 6would be:
Symblic state for (implicit) else at line 8 would be:
Pinaka uses incremental solving aggressively. Note that the symbolic state for then branch at line 4 is of the form
6 differs by two additional constraint
Observe that this symbolic state given in
7 is unreachable irrespective of the input value of y.
In the DFS (depth first search) exploration strategy, Pinaka picks the last unexplored symbolic state from the list. This can easily be achieved when we operate the worklist of symbolic states pending further exploration in a LIFO (Last In First Out) manner. In this example, it would pick the symbolic state mentioned at (5) above, corresponding to the if condition being false at line 6.
In the partial-incremental mode, once a symbolic state is found to be infeasible, a new solver instance is created to explore the yet unexplored symbolic state, in this case, (5) above.
In the full incremental mode, the predicate/formula at line 6 would have looked like the following:
Notice that a new assumption is added for every branch condition as later we may want to backtrack from a then branch and explore an else branch. So in the full incremental mode, the same solver instance will be reused after the symbolic state at line 6 for the then branch is found to be infeasible. For the else branch, the formula would look like the following:
Once the symbolic execution has gone through the path depicted by lines 1-6, 8,11 we have to check for assertion violation. Essentially, we again want to know, if there is any set of values that program variables can take that can result in assertion violation. This can be done by asking if the following symbolic state is feasible:
Observe that we negated the condition inside the assertion as we want to know if the program variables can take a set of values that violates (negates) the condition given in the assertion. Note that (8) above is satisfiable with
foo can fail if the value of the parameter y is -10.
Number of paths in a program can grow exponentially in proportion to the number of branches in the program. This path explosion problem can pose quite a problem for single-path symbolic execution engines such as Pinaka. Eager infeasibility checks employed by Pinaka gives it tremendous performance boost as all the paths beyond an infeasible symbolic state is discarded. However, doing eager infeasibility checks would mean that every time a branch is encountered, a solver query has to be fired. Since querying a solver is expensive, increasing the number of queries can pose another challenge. That is when integration of incremental solving can provide tremendous benefit.
Termination
Loops and recursions are primary programming constructs in a programming language that causes non-termination in a program.
Pinaka performs loop-unrolling and function in-lining lazily and on-demand. This feature allows it to be used to check for termination in some cases.
Let’s take a look at the following program fragment:
1 while(x < 10) {
2 y = y + 1;
3 if(y < 5) {
4 x = x + 1
5 }
6 }
Note that for all the values of x and y, if they satisfy
1 continues to remain feasible. In another words, if there is any set of values for x and y which would make it possible for the loop to iterate at least one more time, Pinaka will also go around the loop symbolically executing the loop at least one more time. As a consequence, for a program which does not violate any assertions (safe programs), Pinaka will terminate only if the program itself is terminating. Of course, we are assuming that the solver queries themselves terminate and Pinaka does not run out of memory and time.
For programs, that may violate assertions, this guarantee does not hold. Since if Pinaka finds an assertion violation along some path, it will terminate even if there exist paths in the programs which are non-terminating.
Do it yourself
Those who want to try this example out and compare testing against symbolic execution, following is the C++ program for testing:
#include<random>
#include<limits>
#include<iostream>
#include<cassert>
void foo(int y) {
int x=10;
x=x+y;
if(x==0) {
x=x+1;
if(y<-10)
x=x-1;
}
else {
x=0;
}
assert(x==0);
}
int main()
{
std::random_device rseed;
std::mt19937 rng(rseed());
std::uniform_int_distribution<int> dist(std::numeric_limits<int>::min(),std::numeric_limits<int>::max());
unsigned i=0;
unsigned max=1000000;
//unsigned max=std::numeric_limits<unsigned>::max();
while(i<max)
{
std::cout<< "i : " << i << std::endl;
foo(dist(rng));
i++;
};
return 0;
}
Compile, run and observe using the following commands:
$ g++ -o footest filename.cpp
$ ./footest 2>&1 > foorun.log
$ tail foorun.log
I could not trigger the assertion violation when I tried. In contrast when you run Pinaka using the following command you will get assertion violation notification in a fraction of a second.
$ pinaka --function foo filename.c
References
- Pinaka binaries are available here. Pinaka performed reasonably well and quite fast in SVCOMP 2019 and SVCOMP 2020
- You may also refer to the following paper: Eti Chaudhary and Saurabh Joshi, “Pinaka: Symbolic Execution meets Incremental Solving”, TOOLympics, TACAS (3), LNCS Vol. 11529, pp. 234–238, 2019. Pre-print version is available on arXiv.