Skip to content
kevinjalbert edited this page Mar 27, 2011 · 20 revisions

#Purpose The goal of ARC - Automatically Repair Concurrency is to provide a tool that attempts to repair concurrency bugs in source code. This tool is aimed at Java because of its maturity in concurrency and known concurrency idioms and bugs.

A genetic algorithm is used to mutate the buggy source code to a state where it executes without errors within a timely fashion. There are two main criteria in play in determining the fitness function of the genetic algorithms:

  • Correctness of the code -> No concurrency bugs
  • Performance of the code -> Executes as quick as possible

The mutations used can only effect the thread schedules, and as a side-effect the performance overhead/reduction of added/removed synchronization. Essentially the end purpose is to automatically repair concurrency by minimizing the bugs and maximizing the performance.

#Features

  • Adjustable number of generations to perform
  • Adjustable number of runs per generation to perform
  • Adjustable timeout period for testing phase
  • Toggle-able set of operators that can be used for mutation
  • Adjustable mutation rate
  • Adjustable crossover rate
  • Testing phase is concurrent
  • Testing is augmented through the noise making technique (random thread delays)

#Problem Definition The problem resides in the source code of a Java program that contains elements of concurrency. The following segment of example code illustrates a write to a unprotected shared variable.

obj.write ( var1 ); 

The execution of this segment might result in a data race during the execution of the program. This is caused if there are two or more threads trying to write to the value. To prevent this problem synchronization is used as a solution, this can be shown in the next segment of now protected writing to the shared variable.

synchronized ( lock ) { 
    obj.write ( var1 ); 
}

This is just one simple example of how a concurrency bug is fixed, the addition of synchronization where necessary. One problem with adding synchronization to source code is that it imposes additional overhead in the lock acquisition and thread contention. In addition, it may also be possible that adding synchronization might result in another concurrency bug called deadlock. If threads are blocking each other from proceeding, the execution halts. A fine balance between adding synchronization to ensure correctness while maintaining a sane level of performance is needed.

The problem is defined as:

Finding the best configuration of a concurrent program, that results in highly correct and performing.

The problem is of a search based type, thus genetic algorithms will be used in an attempt to mutate individual versions of the source code towards the best configuration. Genetic algorithms require three components, which are explained in the following sections:

  1. A set of mutation operators
  2. A fitness function to evaluate current mutation
  3. A terminating condition

The following Poster shows an overview of the problem and the general approach to finding a solution.

Mutation Operators

A mutation operators is a single function that mutates the source code with a single change. The change for this algorithm is related to the concurrency aspects of the source code. Three mutation operators have been found to fix data races and deadlocks, a limited set for now. These three mutation operators deal with the addition of synchronization in the source code, these are functional mutations (correcting the source code). In addition, another set of non-function (increasing performance of the source code) mutation operators, which removes the addition of synchronization. The mutations operators are broken down into the following categories.

NOTE: The mutation operators are illustrated as follows:

original source code
mutated source code

Functional

  • Synchronize an unprotected shared resource

One cause of a data race is that a shared resource is un-protected. By synchronizing around a shared resource data races may be fixed.

...
obj.write ( var1 ); 
...
synchronized ( lock ) { 
    obj.write ( var1 ); 
}
  • Expand synchronization regions to include unprotected source code

Data races can sometimes be caused if the synchronization region does not fully encapsulate access to the shared resources. Expanding the synchronization region may also fix the data race.

synchronized ( lock ) { 
    obj.write ( var1 ); 
}
obj.write ( var2 );
synchronized ( lock ) { 
    obj.write ( var1 ); 
    obj.write ( var2 );
}
  • Interchange nested lock objects

Common deadlocks occur due to the ordering of lock acquisition. By interchanging nested lock objects common deadlocks can be fixed

synchronized ( lock1 ) { 
    synchronized ( lock2 ) { 
        obj.write ( var1 ); 
    }  
}
synchronized ( lock2 ) { 
    synchronized ( lock1 ) { 
        obj.write ( var1 ); 
    }  
}

Non-Functional

  • Remove unnecessary synchronization regions

Synchronization regions create overhead due the time required in acquiring/releasing the lock and delays due to waiting for the lock. Removing unnecessary synchronization regions will improve performance.

synchronized ( lock ) { 
    obj.write ( var1 ); 
}
...
obj.write ( var1 ); 
...
  • Shrink synchronization region

Reducing the number of statements encapsulated in a synchronization region will allow the lock to be released quicker. The less time a thread holds the lock the less thread contention will exist, thus improving performance.

synchronized ( lock ) { 
    obj.write ( var1 ); 
    obj.write ( var2 );
}
synchronized ( lock ) { 
    obj.write ( var1 ); 
}
obj.write ( var2 );

Fitness Function

Throughout the evolution process of moving towards a possible solution for the source code, the concept of ranking each mutation is necessary. A fitness function is used to rank an instance of a mutation. For this problem there are two fitness functions, one which evaluates the functional behaviour of the mutation and a second which evaluates the non-functional behaviour.

Functional

The functional fitness function is used to rank the mutation in respect to the correctness of the source code:

[functional\ fitness(P) = \sum\limits_{i=0}^n\frac{interleavings\ without\ a\ bug}{interleavings\ tested}] [n = #\ of\ Test\ Cases]

To ensure a high-level of confidence, IBM's ConTest is used as a testing tool on the mutation program. The program is instrumented with random delays, which has the effect of forcing unusual thread schedules. By exploring more thread schedules it is possible to explore more of the paths which might lead to a bug.

Non-Functional

The non-functional fitness function is used to rank the mutation in respect to the performance of the source code:

[non-functional\ fitness(P) = \sum\limits_{i=0}^n\frac{average\ CPU\ time}{interleavings\ tested}] [n = #\ of\ Test\ Cases]

The correction of the program can lead to unnecessary amounts of synchronization. Performance of the program will degrade due to the overhead of lock acquisition and thread contention. By evaluating the non-functional fitness of the program it becomes possible to find the most efficient program mutation. Typically, the non-functional fitness will improve synchronization is reduced by applying the non-functional mutation operators.

Terminating Condition

There are two terminating conditions in place for this algorithm. One is when a successful state has been reached by maximizing the functional fitness (no more bugs detected) with a minimal non-functional fitness. The second is when a state has been reached without any improvements in sight (over successive generations of mutations the functional fitness is not improving), which then this state is minimized in respect to the non-functional fitness. The first condition results in the optimal state, while the second condition is a sub-optimal state.

#State Space

#Initial State

#Action

#Goal

#Cost

Clone this wiki locally