The Implementation of Linux Locksmith, a Dynamic Race Condition Detector
Even for experienced programmers, programming with threads adds a new set of challenges that do not appear in the sequential process paradigm. The main difference between debugging sequential programs and debugging concurrent programs is that most bugs in sequential programs are repeatable. In multi-threaded programs, bugs are much harder to reproduce, and therefore to diagnose, because of the pseudo-random variations in the way the operating system switches between individual threads. A common bug in multi-threaded programs is the race condition, in which shared data is accessed without protection, so that the outcome of the “race” depends almost entirely on the timing of context switches. A tool that aids in debugging multi-threaded programs developed on Linux would make the platform even more attractive, especially in server environments.
We have developed a dynamic race condition detector called Locksmith, which is based on a modified version of the Lockset algorithm used in Eraser, a dynamic race detector developed for Digital Equipment Corporation’s Digital Unix [1]. When a multi-threaded executable written with the standard pthreads package [2] has been instrumented by Locksmith, it issues error messages during execution that pinpoint the exact source code lines involved in a race condition. With this information, the cause of the bug can be identified quickly.
Locksmith identifies the source code lines involved in a race condition by tracking the memory reads and writes performed by each thread and noting the locks held at the time of each access. If a lock protects a region of memory, then every thread should hold that lock whenever it accesses that region. If a thread ever accesses the data without the correct lock, a potential race condition exists. However, the source code contains no specification of which locks the programmer intends to protect which data. The Lockset algorithm therefore infers, for each memory address range, the set of locks that could be correctly protecting it. If this set ever becomes empty, no single lock consistently protects the data, and Locksmith reports an error.
The pseudo-code for the simplest version of the Lockset algorithm is as follows. Let locks_held(t) be the set of locks held by thread t, and let possible_locks(m) be the set of candidate locks that may be protecting memory address m. Each possible_locks(m) is initialized to a special virgin state.
On each access to memory address m by thread t:
    if possible_locks(m) is virgin then
        possible_locks(m) := locks_held(t)
    else
        possible_locks(m) := possible_locks(m) ∩ locks_held(t)
    if possible_locks(m) = ∅ then
        report a possible race condition
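The pseudo-code above translates almost line for line into C. The sketch below is illustrative only: it assumes each lock is mapped to one bit, so a lockset fits in a 64-bit mask and intersection becomes a bitwise AND (Locksmith itself identifies locks by mutex address). The names `lockset_t`, `struct shadow` and `lockset_access` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: each lock is mapped to one bit, so a lockset is a 64-bit mask
 * and set intersection is a bitwise AND. */
typedef uint64_t lockset_t;

/* Hypothetical shadow record for one memory address m. */
struct shadow {
    bool virgin;        /* true until the first access to m */
    lockset_t possible; /* possible_locks(m) */
};

/* Apply the simple Lockset rule to one access of m by a thread whose
 * locks_held(t) is `held`. Returns true if a possible race is detected. */
bool lockset_access(struct shadow *m, lockset_t held)
{
    if (m->virgin) {
        m->possible = held;   /* first access: adopt the locks now held */
        m->virgin = false;
    } else {
        m->possible &= held;  /* intersection with locks_held(t) */
    }
    return m->possible == 0;  /* empty set: no common protecting lock */
}
```

Starting from a virgin record, an access holding locks 0 and 1 (mask `0x3`) sets the candidate set to {0, 1}; a later access holding only lock 1 (`0x2`) narrows it to {1}; an access holding only lock 0 (`0x1`) then empties the set, and `lockset_access` returns true.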
Several refinements to this simple algorithm prevent common false alarms. First, the simple algorithm reports unprotected accesses even to data that is used exclusively by a single thread. Second, programmers often safely initialize data without holding any locks before multi-threaded processing begins. Finally, unprotected accesses to read-only data would also cause false alarms.
Locksmith prevents these false alarms by tracking an additional state for each memory address: exclusive, read-only, or modified. Memory addresses begin in the exclusive state, and no errors are reported until at least two threads have accessed the data. When a memory address leaves the exclusive state, it enters the read-only state, where it remains until some thread writes to it. In the read-only state, possible_locks(m) is updated but no errors are reported. Once a write occurs, the memory address is placed in the modified state, and the simple Lockset algorithm described above is followed.
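The three-state refinement can be sketched as a small state machine over a per-address record. This is one reading of the description above, not Locksmith’s code: thread identifiers are plain unsigned integers (Locksmith uses the `pthread_t` from `pthread_self`), locksets are again assumed to be 64-bit masks, and `struct ls_shadow` and `locksmith_access` are hypothetical names. A zero-initialized record is in the exclusive state with no owner yet.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t lockset_t; /* assumption: bit i set means lock i is held */

enum ls_state { LS_EXCLUSIVE = 0, LS_READ_ONLY, LS_MODIFIED };

/* Hypothetical per-address record; all-zero means exclusive, no owner yet. */
struct ls_shadow {
    enum ls_state state;
    bool has_owner;
    unsigned owner;     /* first thread to access the address */
    lockset_t possible; /* possible_locks(m), meaningful after exclusive */
};

/* Returns true if this access should be reported as a possible race. */
bool locksmith_access(struct ls_shadow *m, unsigned tid, lockset_t held,
                      bool is_write)
{
    switch (m->state) {
    case LS_EXCLUSIVE:
        if (!m->has_owner) {     /* very first access */
            m->has_owner = true;
            m->owner = tid;
            return false;
        }
        if (tid == m->owner)
            return false;        /* still used by a single thread */
        m->state = LS_READ_ONLY; /* second thread: start tracking locks */
        m->possible = held;
        break;
    case LS_READ_ONLY:
    case LS_MODIFIED:
        m->possible &= held;     /* refine the candidate set */
        break;
    }
    if (is_write)
        m->state = LS_MODIFIED;
    /* Errors are reported only in the modified state. */
    return m->state == LS_MODIFIED && m->possible == 0;
}
```

Under this sketch, one thread can initialize a variable without locks and other threads can later read it unlocked without any report; an error appears only once the data is shared, has been written, and no common lock remains.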
The current implementation of Linux Locksmith has two parts. The first is the Locksmith runtime library, implemented entirely in C, which contains the data structures and stubs needed to implement the Lockset algorithm. Internally, threads are identified by the pthread_t value returned by pthread_self(), and locks are represented by their addresses, since all pthread mutex calls take the address of a pthread_mutex_t.
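One way a runtime could maintain locks_held(t) when locks are identified by their addresses is sketched below. The wrapper names `ls_mutex_lock`, `ls_mutex_unlock`, `ls_thread_holds` and `ls_held_count` are hypothetical stand-ins for the calls the parser would insert around lock operations, and keeping each thread’s held set in C11 `_Thread_local` storage is a swapped-in design choice; as described above, Locksmith keys its per-thread state by `pthread_t`.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define LS_MAX_HELD 16 /* assumption: small fixed bound on nested locks */

/* locks_held(t) for the calling thread, stored as mutex addresses. */
static _Thread_local const void *held_locks[LS_MAX_HELD];
static _Thread_local size_t held_count;

int ls_mutex_lock(pthread_mutex_t *mu)
{
    int rc = pthread_mutex_lock(mu);
    /* In this sketch, locks beyond LS_MAX_HELD are simply not tracked. */
    if (rc == 0 && held_count < LS_MAX_HELD)
        held_locks[held_count++] = mu; /* the address identifies the lock */
    return rc;
}

int ls_mutex_unlock(pthread_mutex_t *mu)
{
    for (size_t i = 0; i < held_count; i++) {
        if (held_locks[i] == mu) {
            held_locks[i] = held_locks[--held_count]; /* order is irrelevant */
            break;
        }
    }
    return pthread_mutex_unlock(mu);
}

bool ls_thread_holds(const pthread_mutex_t *mu)
{
    for (size_t i = 0; i < held_count; i++)
        if (held_locks[i] == mu)
            return true;
    return false;
}

size_t ls_held_count(void) { return held_count; }
```

Because the set lives in thread-local storage, each thread sees only its own held locks without any extra synchronization. A shadow record that stores an owning thread would still need `pthread_equal` to compare `pthread_t` values, since POSIX leaves that type opaque.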
All source-level debugging information is taken from the gcc -g output that is normally used by gdb. The second part is a program parser that determines where the code accesses memory and where it acquires and releases locks, and inserts calls to the Locksmith library at those points. The parser operates on the program’s assembly code: its output is assembly code that has been modified to call the library stubs on all memory accesses but is otherwise identical to the original. The only changes required to the source code are two added function calls: create_locksmith at the beginning of the program and destroy_locksmith at the end.
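At the C level, the effect of the instrumentation can be pictured as a hook call before each load and store. This is only an analogy, since the parser rewrites assembly rather than C. `create_locksmith` and `destroy_locksmith` come from the text, but their `void (void)` signatures are assumed here; `ls_on_access`, `ls_access_count` and `increment_shared` are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

static unsigned long access_events; /* accesses seen by the stub hook */

/* Assumed signatures; the real Locksmith interface may differ. */
void create_locksmith(void) { access_events = 0; }
void destroy_locksmith(void)
{
    fprintf(stderr, "locksmith: %lu accesses checked\n", access_events);
}

/* Hypothetical hook called before every memory access. A real runtime
 * would run the Lockset check here and report `file`:`line` on an error. */
void ls_on_access(const void *addr, bool is_write, const char *file, int line)
{
    (void)addr; (void)is_write; (void)file; (void)line;
    access_events++;
}

unsigned long ls_access_count(void) { return access_events; }

/* Original source: (*counter)++;
 * Instrumented picture: one read and one write, each preceded by a hook
 * call that carries its source location. */
void increment_shared(int *counter)
{
    ls_on_access(counter, false, __FILE__, __LINE__);
    int tmp = *counter;
    ls_on_access(counter, true, __FILE__, __LINE__);
    *counter = tmp + 1;
}
```

In a program using this sketch, create_locksmith() would be the first statement of main and destroy_locksmith() the last before it returns; everything in between is left to the instrumentation.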
We have used Linux Locksmith to correctly detect race conditions and report their locations in the source code. With some additional testing and packaging, we believe Locksmith would be an extremely useful tool for debugging multi-threaded applications on the Linux platform.
[1] S. Savage et al., “Eraser: A Dynamic Race Detector for Multithreaded Programs,” Proc. of the Sixteenth ACM Symposium on Operating Systems Principles, October 1997.
[2] B. Nichols et al., Pthreads Programming: A POSIX Standard for Better Multiprocessing, O’Reilly, September 1996.