 # Static Code Analysis

This post will tie together problem representation, the power of Lisp, and the fortuitous application of the solution of an AI exercise to a real world problem in software engineering.

Problem 3-4b in
Introduction to Artificial Intelligence is to find a path from Start to Finish that goes though each node only once. The graph was reproduced using OmniGraffle. Node t was renamed to node v for a reason which will be explained later. The immediate inclination might be to throw some code at the problem. How should the graph be represented? Thinking in one programming language might lead to creating a node structure that contained links to other nodes. A first cut in C might look something like this:

#define MAX_CONNECTIONS 6
typedef struct node
{
char *name;
struct node *next[MAX_CONNECTIONS];
} NODE;

extern NODE start, a, b, c, d, e, f, g, h, i, j, k;
extern NODE l, m, n, o, p, q, r, s, u, v, finish;
NODE start = {"start", {&a, &f, &l, &k, &s, NULL}};
NODE a = {"a", {&d, &h, &b, &f, NULL, NULL}};
NODE b = {"b", {&g, &m, &l, &a, NULL,NULL}};
...

Then we'd have to consider how to construct the path and whether path-walk information should be kept in a node or external to it.

If we're lucky, before we get too far into writing the code, we'll notice that there is only one way into
q. Since there's only one way in, visiting q means that u will be visited twice. So the problem has no solution. It's likely the intent of the problem was to stimulate thinking about problem description, problem representation, methods for mapping the "human" view of the problem into a representation suitable for a machine, and strategies for finding relationships between objects and reasoning about them.

On the other hand, maybe the graph is in error. After all, the very first paragraph of my college calculus textbook displayed a number line where 1.6 lay between 0 and 1. Suppose that
q is also connected to v, p, m, and n. Is there a solution?

Using C for this is just too painful. I want to get an answer without having to first erect scaffolding. Lisp is a much better choice. The nodes can be represented directly:

(defparameter start '(a f l k s))
(defparameter a '(f b h d))
(defparameter b '(a l m g))
(defparameter c '(d o i g))

...

The code maps directly to the problem. Minor liberties were taken. Nodes, such as
a, don't point back to start since the solution never visits the same node twice. Node t was renamed v, since t is a reserved symbol. The code is then trivial to write. It isn't meant to be pretty (the search termination test is particularly egregious), particularly efficient, or a general breadth-first search function with all the proper abstractions. Twenty-three lines of code to represent the problem and twelve lines to solve it.

One solution is (START K E S V U Q P M B L F A H G C D J O R I N FINISH). The complete set of solutions for the revised problem is:

(START K E S V U Q P M B L F A H G C D J O R I N FINISH)
(START K E S V U Q P M B L F A H D J O R I C G N FINISH)
(START K E S V U Q P L F A H D J O R I C G B M N FINISH)
(START K E P M B L F A H G C D J O R I N U Q V S FINISH)
(START K E P M B L F A H D J O R I C G N U Q V S FINISH)
(START K E P L F A H D J O R I C G B M N U Q V S FINISH)

This exercise turned out to have direct application to a real world problem. Suppose threaded systems
S1 and S2 have module M that uses non-nesting binary semaphores to protect accesses to shared resources. Furthermore, these semaphores have the characteristic that they can time out if the semaphore isn't acquired after a specified period. Eventually, changes to S1 lead to changing the semaphores in M to include nesting. So there were two systems, S1 with Mn and S2 with M. Later, both systems started exhibiting sluggishness in servicing requests. One rare clue to the problem was that semaphores in Mn were timing out. No such clue was available for M because M did not provide that particular bit of diagnostic information. On the other hand, the problem seemed to exhibit itself more frequently with M than Mn. One problem? Two? More?

Semaphores in
M and Mn could be timing out because of priority inversion. Or maybe there was a rare code path in M where a thread tried to acquire a semaphore more than once. That would explain the prevalence of the problem in S2 but would not explain the problem in S1.

This leads to at least two questions:
1. What change in S1 necessitated adding support for nesting to the semaphores in Mn?
2. Is there a code path in M where a semaphore tries to nest?
Using a representation similar to the one used with the AI problem makes finding these answers somewhat easy. Suppose we define three keywords: acquire, release, and demand. Then a function which acquires one or more semaphores and calls other functions might look like:

(defparameter func_1 '(func_2 acquire sem_1 acquire sem_2
func_10 func_4 release sem_2 release sem_1))

A function that requires a semaphore be held could be described by:

(defparameter func_4 '(demand sem_2))

The description of a benign function:

(defparameter func_7 '())

It's then easy to write code which walks these descriptions looking for pathological conditions. Absent a C/C++ to Lisp translator, the tedious part is creating the descriptions. But this is at least aided by the pathwalker complaining if a function definition isn't found and is far less error prone than manual examination of the code paths. Would I have used this method to analyze the code if I hadn't just done the AI problem? I hope so, but sometimes the obvious isn't.

After completing the descriptions for
S1, the addition of one function call to an existing function in M was the basis for changing the semaphores so that they nested. The descriptions for S2 are not yet complete.