Optimizing Memory Safety Instrumentations with PICO

This website is a teaser for our TACO'21 article.

TL;DR

PICO optimizes programs that were instrumented by a memory safety instrumentation. It does so by computing Presburger formulas for memory accesses, combining these formulas to secure multiple accesses with a single check, and finding infrequently executed locations at which to place the checks. PICO is implemented as an optimization pass for the LLVM compiler infrastructure, operates at compile time, and is fully automated.

What is memory safety?

In languages such as C, using a pointer to access memory outside of its allocation bounds or accessing it after the allocation was freed is undefined behavior. In many cases, this will go unnoticed by the programmer. However, it is a threat to security, because such an unintended access can leak or alter information. A program is considered memory safe if no such memory access error can occur for any input.
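
As a tiny illustration (our own sketch, not taken from the article), the following function triggers an out-of-bounds write:

#include <string.h>

void greet(void) {
  char buf[4];
  // Out-of-bounds write: "hello" occupies 6 bytes (5 characters plus
  // the terminating '\0'), but buf holds only 4 -- undefined behavior.
  strcpy(buf, "hello");
}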

You might wonder if memory safety is still an issue, and, unfortunately, it is. The CWE Top 25 lists several memory safety errors (such as "Out-of-bounds Write/Read" and "Use After Free") in its ranking of the most dangerous software weaknesses. Taken together as a single category, they outrank all other errors by far.

How to ensure memory safety in C?

Fortunately, a variety of tools have been proposed to ensure memory safety in C. We will now explain how such a tool, implemented as an extension to a standard C compiler, transforms a C program to make it memory safe. We focus on spatial safety only, i.e., ensuring that all accesses stay within their allocation bounds. Consider the init function below:

void init(int *ar, int size) {
  for (int i = 0; i < size; i += 2) {
    if (i+1 < size) {
      ar[i+1] = 1;
    }
    ar[i] = 0;
  }
}

The function receives a pointer ar and a size, and fills the memory from ar up to ar + size - 1 with alternating zeros and ones. A memory safety instrumentation ensures that the memory accesses are safe. For this purpose, it places a call to a check function right before every access. In the instrumented init function below, we indicate this with calls to checkIB. The arguments to this function are the base and bound of the allocation, the memory location accessed, as well as the width of the access. Base and bound values are obtained through the loadBase and loadBound functions that the instrumentation provides.

void init(int *ar, int size) {
  intptr_t baseAr = loadBase(ar);
  intptr_t boundAr = loadBound(ar);

  for (int i = 0; i < size; i += 2) {
    if (i+1 < size) {
      checkIB(baseAr, boundAr, ar+i+1, sizeof(int));
      ar[i+1] = 1;
    }
    checkIB(baseAr, boundAr, ar+i, sizeof(int));
    ar[i] = 0;
  }
}
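
To give an idea of what such a check does, here is a minimal sketch of a possible checkIB implementation (our illustration, not the actual runtime of an instrumentation such as SoftBound, which, e.g., keeps base and bound in a disjoint metadata space queried by loadBase and loadBound):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// Sketch: abort unless the access [ptr, ptr + width) lies fully
// within the allocation [base, bound).
void checkIB(intptr_t base, intptr_t bound, void *ptr, size_t width) {
  intptr_t access = (intptr_t) ptr;
  if (access < base || access + (intptr_t) width > bound) {
    fprintf(stderr, "memory safety violation detected\n");
    abort();
  }
}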

As you can imagine, these additional checks come at a cost. Execution time and binary size are usually larger, and depending on the approach, memory consumption during execution increases as well. The typical execution time increase ranges from 1.7x to 11x on average for instrumentations that provide full spatial safety. This is where PICO comes into play. PICO optimizes an instrumented program for execution time and, in doing so, additionally reduces the binary size.

How does PICO optimize the program?

As a first step, let's see why the execution time of the instrumented program suffers. The safety checks sit right before the accesses and are therefore both inside the loop. This means that in addition to the size memory accesses, we execute just as many checks (if the loop is executed at all). PICO tries to place a safety check at a program location that is executed infrequently and, if possible, to secure multiple accesses with a single check. It does so by analyzing the program and computing Presburger formulas that describe the accessed memory regions. For our example, the code below is the result after optimizing it with PICO:

void init(int *ar, int size) {
  intptr_t baseAr = loadBase(ar);
  intptr_t boundAr = loadBound(ar);
  assert(size <= 0 ||
    (baseAr - (intptr_t) ar <= 0 &&
     boundAr - (intptr_t) ar >= sizeof(int) * size));

  for (int i = 0; i < size; i += 2) {
    if (i+1 < size) {
      ar[i+1] = 1;
    }
    ar[i] = 0;
  }
}

The checkIB calls are gone, and the new PICO check, shown as an assertion, is placed before the loop. The check ensures that if the loop is executed, all accesses are in bounds. The first part of the assertion therefore checks whether size <= 0 holds, as in this case no memory is accessed and hence all accesses are safe. If this does not hold, it ensures that ar is within its allocation bounds by making sure that baseAr is at most ar and that the distance from ar to boundAr is at least size times the access width.
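
For intuition, this condition can be derived from the accessed memory region (our informal rendering, assuming a 4-byte int): the loop writes ar[i] for every even i in [0, size) and ar[i+1] whenever i+1 < size, which together cover exactly the indices 0 to size - 1. In (roughly Presburger) set notation:

accessed bytes:  A = { ar + 4*k + b | 0 <= k < size, 0 <= b < 4 }
                   = [ar, ar + 4*size)
all safe  iff  size <= 0  or  (baseAr <= ar  and  ar + 4*size <= boundAr)

The last condition is exactly what the assertion tests, with ar + 4*size <= boundAr rewritten as boundAr - ar >= sizeof(int) * size.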

The PICO check consists of only a few comparisons and arithmetic operations, independent of the loop's iteration count, and is hence likely cheaper to execute than the per-iteration checks of the original safe program.

Results

PICO improves the execution time as well as the binary size, but comes at the cost of an increased compile time on the SPEC benchmarks that we evaluated. Below, we describe our findings on the execution time in more detail. The code size overhead is reduced by 24% on average. The compile time with the instrumentation enabled increases by 2.3x compared to the unsafe program, and optimizing with PICO increases it further to 4.1x on average.

Execution Time

The following graph shows the slowdown of the instrumented and optimized safe program compared to the unsafe program. The benchmarks listed on the horizontal axis are selected SPEC benchmarks. The vertical axis shows the slowdown factor, where 1x indicates the unsafe program optimized with Clang -O3. We compare three configurations: only the Metadata propagation (such as the loadBase/loadBound calls in the example) with no checks, the fully SoftBound-instrumented program, as well as the optimized version, displayed as PICO.

[Figure: Runtime Improvements]

The Metadata bar is a lower bound for PICO, as PICO relies on the bounds supplied by the memory safety instrumentation (as visible in the example). Apart from cases such as vpr and mesa, where the Metadata propagation accounts for more than 30% of the runtime overhead, it is rather cheap. On this benchmark set, SoftBound slows down the execution by 99% on average. We can see that PICO is able to optimize every benchmark, performing almost as well as the baseline Clang -O3 for lbm. PICO reduces the overhead to 64% on average.

Redundant Check Removal

In addition to what we have seen in the example above, PICO also tracks allocation sizes intra-procedurally. With this information, some checks can be proven to always hold: the access is safe independent of the program inputs, and we call the check for such an access redundant (see the sketch below the figure). Many related techniques focus on proving this property, hence we evaluated the impact of removing redundant checks with our tool on our benchmark set. The following figure compares the percentage of accesses proven safe at compile time (horizontal axis) to the execution time overhead reduction (vertical axis). Empty circles show the results if we only remove redundant checks, while filled ones show the result of PICO.

[Figure: Redundant Check Removal Comparison]

It turns out that, despite more than 50% of the accesses being provably in bounds for cases such as milc, sjeng, and art, the execution time improvements do not match this ratio (diagonal line). We can see that PICO outperforms removing redundant checks alone in all cases, often by far.
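
As a minimal illustration of a redundant check (our own sketch, not from the article): when the allocation size is known at compile time, the in-bounds condition can be discharged statically:

void zero8(void) {
  int buf[8]; // allocation size known at compile time: 8 * sizeof(int)
  for (int i = 0; i < 8; i++) {
    // An instrumented check for this access is always true, since
    // 0 <= i < 8 holds on every iteration; the check is redundant
    // and can be removed without leaving any run-time test behind.
    buf[i] = 0;
  }
}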

PICO operates at compile time and is fully automated. It is implemented as an LLVM optimization pass, which can be used in the regular Clang optimization pipeline. For more details on how PICO determines check locations, which checks are combined and how, as well as the full evaluation, see our open-access TACO article.

Presentations

Slides presented at HiPEAC 2022 in Budapest.
