

In document Linear Programming on the Cell/BE (Page 68-72)

3.4 Revised simplex method

3.4.4 Restructuring Vanderbei’s code

Vanderbei’s code is available at http://www.princeton.edu/~rvdb/LPbook/.

It is written in C, and we initially chose to continue the development in that language, since we felt that gaining more practice in pure C coding would be useful. While we still agree with that sentiment, we later regretted the choice, since it forced us to use a number of constructs that are more cumbersome in C than in C++ (such as passing struct pointers to functions rather than calling member methods on objects), and we were also bothered by the lack of templates.

3.4.4.1 Sparse vector and matrix representations

Vanderbei’s implementation uses the Compressed Column Storage format (as described in Section 2.3.2) for sparse matrices and a similar scheme for sparse vectors. Unfortunately, he did not have a structure or class that contained the arrays and variables for each sparse matrix or vector. For instance, the matrix A would be represented with the arrays a (values), ia (row indices), ka (column positions) and the variable nza (number of nonzeroes). We found this naming scheme very impractical (all variables must be passed as parameters to functions that are to manipulate sparse vectors and matrices), and it slowed down our process of understanding his code. Therefore, we introduced structures that combined these related arrays and variables, and we refactored the code to use these structures throughout. Our structure for sparse matrices looks like this:

    struct SparseMatrix {
        int rows;
        int cols;
        int numNonzeroes;
        int * rowIndices;
        int * colPos;
        TYPE * values;
    };

Note that TYPE is a preprocessor symbol which facilitates experimentation with different precisions; it should be defined as either float or double.
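As an illustration of how the fields of this structure work together, the following is a minimal sketch of a matrix-vector product over the Compressed Column Storage layout. The function name sparseMatVec is hypothetical, and the sketch assumes the common convention that colPos holds cols + 1 entries, so that column j occupies positions colPos[j] up to (but not including) colPos[j + 1]; the original code may lay this out differently.

```c
#ifndef TYPE
#define TYPE double   /* assumption: double precision unless overridden */
#endif

struct SparseMatrix {
    int rows;
    int cols;
    int numNonzeroes;
    int *rowIndices;   /* row index of each stored value */
    int *colPos;       /* assumed: cols + 1 entries, column j spans
                          colPos[j] .. colPos[j + 1] - 1 */
    TYPE *values;      /* the nonzero values, column by column */
};

/* Hypothetical helper: computes y = A * x, where x is a dense array of
   length A->cols and y is a dense array of length A->rows. */
void sparseMatVec(const struct SparseMatrix *A, const TYPE *x, TYPE *y)
{
    for (int i = 0; i < A->rows; ++i)
        y[i] = 0;

    /* The loop bounds come from the structure itself, not from globals. */
    for (int j = 0; j < A->cols; ++j)
        for (int k = A->colPos[j]; k < A->colPos[j + 1]; ++k)
            y[A->rowIndices[k]] += A->values[k] * x[j];
}
```

Because all arrays travel together in one struct, the function needs a single matrix parameter instead of four separate ones.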

Due to the vast amounts of vector manipulation (and also in order to track down some bugs we believed were related to reading/writing outside of the array bounds, but which turned out to be caused by wrong memory management), we made a more elaborate sparse vector structure, which uses the vector class from the C++ Standard Template Library. The at() function performs bounds checking on each access; this is inefficient, but highly helpful during development. The compiler will most likely inline the simple accessor functions and operators, so that the usage of high-level classes such as std::vector will not incur any performance penalty (if the bounds checking is turned off). The structure can be found in the file sparse.h.

Beware that in order to save time, Vanderbei preallocates the arrays for any sparse vector with r rows to have size r, but only the first k entries are used at any time (where k is the number of nonzeroes). Whenever the contents (and the number of nonzeroes) of the vector change, one can simply fill the arrays with as many entries as necessary, since each individual vector has a constant size throughout the program and the number of nonzeroes obviously will never exceed the full vector size. This, in combination with our lack of unit tests, caused a rather insidious bug: our copySparseVector() function only allocated as much space for the new vector as the current number of nonzeroes in the source vector, and when other parts of the code proceeded to add more nonzeroes to the new vector, data in other vectors would be corrupted. This also demonstrates why the use of std::vector is useful (at least during development), as it would have caught such “index out of bounds” errors.
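The preallocation rule and the copy bug can be made concrete with a small sketch in plain C. The struct layout and all signatures below are assumptions made for illustration (the actual structure lives in sparse.h and uses std::vector); only the name copySparseVector comes from the text. The essential point is that the copy allocates room for src->rows entries, not merely src->numNonzeroes.

```c
#include <stdlib.h>
#include <string.h>

#ifndef TYPE
#define TYPE double   /* assumption: double precision unless overridden */
#endif

/* Hypothetical C layout of a sparse vector with preallocated arrays. */
struct SparseVector {
    int rows;          /* full (dense) length, also the allocated capacity */
    int numNonzeroes;  /* entries currently in use, never more than rows */
    int *indices;
    TYPE *values;
};

/* Allocates room for `rows` entries. Returns 0 on success, -1 on failure. */
int initSparseVector(struct SparseVector *v, int rows)
{
    v->rows = rows;
    v->numNonzeroes = 0;
    v->indices = malloc((size_t)rows * sizeof *v->indices);
    v->values = malloc((size_t)rows * sizeof *v->values);
    if (rows > 0 && (v->indices == NULL || v->values == NULL)) {
        free(v->indices);
        free(v->values);
        v->indices = NULL;
        v->values = NULL;
        return -1;
    }
    return 0;
}

/* Copies src into an uninitialised dst. The capacity is src->rows; sizing
   it by src->numNonzeroes is exactly the bug described in the text. */
int copySparseVector(struct SparseVector *dst, const struct SparseVector *src)
{
    if (initSparseVector(dst, src->rows) != 0)
        return -1;
    dst->numNonzeroes = src->numNonzeroes;
    if (src->numNonzeroes > 0) {
        size_t n = (size_t)src->numNonzeroes;
        memcpy(dst->indices, src->indices, n * sizeof *src->indices);
        memcpy(dst->values, src->values, n * sizeof *src->values);
    }
    return 0;
}

/* Appends one nonzero. Returns -1 instead of writing past the capacity. */
int appendEntry(struct SparseVector *v, int index, TYPE value)
{
    if (v->numNonzeroes >= v->rows)
        return -1;
    v->indices[v->numNonzeroes] = index;
    v->values[v->numNonzeroes] = value;
    ++v->numNonzeroes;
    return 0;
}

void freeSparseVector(struct SparseVector *v)
{
    free(v->indices);
    free(v->values);
    v->indices = NULL;
    v->values = NULL;
}
```

With this sizing, a copy of a vector that holds one nonzero out of four rows can still accept three more entries, and the explicit check in appendEntry plays the role that at() plays for std::vector.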

Also, Vanderbei did not explicitly store the sizes of the vectors and matrices, as they could always be deduced from context (normally as having m or n rows). We feel that this practice obscures the relationship between a loop header and its body: if v is a sparse matrix with n columns and we want to write a loop that manipulates v, we prefer e.g. for (int j = 0; j < v.cols; ++j) to for (int j = 0; j < n; ++j). Therefore, we have included the size information in our structures and have tried to use it instead of m and n (this also makes the linear algebra functions slightly more general, and it would facilitate unit testing). Note that the preallocation described above is not done for matrices, since this would require too much space, and because the main part of the algorithm never changes the matrices directly (it uses permutation lists to keep track of how columns are swapped).

3.4.4.2 Overview of changed files

Here, we describe the files we have created ourselves and those of Vanderbei’s files we have modified in a nontrivial manner.

tree.c|h contains a binary search tree structure. It only supported one active tree at any time (through the use of static variables). Because it is used by some of the linear algebra operations in the iteration processes, we needed to create a struct for the internal tree information so that we could have several tree instances.

sparse.c|h contains our structs and supporting functions for Vanderbei’s sparse vectors and arrays.

print.c|h is a utility for making sure that outputs from different threads do not collide with each other (often, a line that is output from one thread gets cut in two by a line from another thread). It is implemented with mutexes (making sure that only one thread is allowed to print at a time), so excessive printing may hurt performance.

2phase.c was the core of Vanderbei’s original revised simplex solver, and iterationprocess.c is strongly based on this file. We have chosen the solver() function in this file as the “entry point” of our code, because the input parsing and processing has been completed at this point.

If the useAsynplex variable is true, we skip Vanderbei’s solver and instead launch the ASYNPLEX threads and wait for their completion.

columnselectionmanager.c|h contains the ASYNPLEX column selection manager. We had problems implementing it because we feel that [19] is unclear on how the statuses of the variables are supposed to change, in particular when new candidates arrive. Our current interpretation is that a new candidate should be accepted into the pool of attractive candidates unless its status is “selected” or “rejected” and it obtained that status at a basis that is more recent than the basis where the candidate was formed.

basischangemanager.c|h contains the ASYNPLEX basis change manager, whose functionality is so simple that the code probably speaks for itself.

communication.c|h is a simple communication layer strongly inspired by MPI. A message has a sender (string), a receiver (string), a tag (string) and a payload (generic memory buffer). The communication primitives are secured with mutexes. When a thread requests to receive a message, it may choose whether or not to specify a sender (passing NULL as the sender parameter indicates “any sender”) and whether or not to specify a tag (passing NULL as the tag parameter indicates “any tag”). If no matching message is available, an empty message is returned. The implementation is somewhat inefficient in that sequential search is used to locate matching messages. Also, we should have used std::queue instead of a vector (but as noted, the project started out in C, where the STL is not available). However, in ASYNPLEX, the message queue does not grow particularly long, so this is not a big problem in practice. Still, a real MPI implementation, for instance mpich, might have served us better.

invertprocessor.c|h is based on lueta.c|h. This process is continuously recomputing the inverse of the basis matrix, and is informed of basis changes by the iteration processes. The LU factored representation of the inverse is sent to the iteration processes upon completion of each inverse calculation.

iterationprocess.c|h is the only thread which may exist in several instances; therefore, we must use a struct to store all the internal data for each iteration process, and pass pointers to instances of the struct to the different functions. This code is based on 2phase.c and lueta.c.

genericvectors.c and the similarly-named files are our attempt at simulating C++ templates in C. The approach is to write the code with lots of macro symbols as placeholders for function and type names, and then #include the code repeatedly while #define-ing the symbols appropriately. This leads to rather unreadable code, and was one of the most important reasons that we eventually switched to C++.

timer.h is the timing utility described in Section 4.1.3.
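To give an impression of the template simulation mentioned for genericvectors.c, here is a single-file sketch of the same idea. Note that it swaps in a different mechanism: instead of repeatedly including a source file with redefined symbols, it uses one generator macro that is expanded once per element type, so that the example fits in one block. The names DEFINE_SUM and sum_<suffix> are hypothetical and do not come from the original code.

```c
#include <stddef.h>

/* Generates a function sum_<suffix>() that adds up n elements of type T.
   The ## operator pastes the suffix onto the function name. */
#define DEFINE_SUM(T, suffix)                      \
    T sum_##suffix(const T *data, size_t n)        \
    {                                              \
        T total = 0;                               \
        for (size_t i = 0; i < n; ++i)             \
            total += data[i];                      \
        return total;                              \
    }

/* One expansion per element type, much like one template instantiation. */
DEFINE_SUM(float, float)    /* defines sum_float()  */
DEFINE_SUM(double, double)  /* defines sum_double() */
DEFINE_SUM(int, int)        /* defines sum_int()    */
```

Even in this tiny case, the backslash continuations and pasted names hint at why larger bodies of such code become hard to read and debug.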

The functions in iterationprocess.c have been named in accordance with the pseudocode given for ASYNPLEX. Vanderbei’s original comments detail the mathematical operation that is performed by each function.
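The receive semantics described for communication.c|h (optional sender and tag, NULL as wildcard, an empty message when nothing matches, sequential search) can be sketched as follows. Everything here is an assumption made for illustration: the fixed-size mailbox, the field names, the valid flag used to mark an empty message, and the function names sendMessage and receiveMessage are not taken from the original code.

```c
#include <pthread.h>
#include <string.h>

#define MAX_MESSAGES 64   /* assumption: small fixed capacity */
#define NAME_LEN 32       /* assumption: short names and tags */

struct Message {
    char sender[NAME_LEN];
    char receiver[NAME_LEN];
    char tag[NAME_LEN];
    void *payload;        /* generic buffer, owned by the caller */
    size_t payloadSize;
    int valid;            /* 0 marks the "empty message" */
};

static struct Message mailbox[MAX_MESSAGES];
static int mailboxCount = 0;
static pthread_mutex_t mailboxLock = PTHREAD_MUTEX_INITIALIZER;

static void copyName(char *dst, const char *src)
{
    strncpy(dst, src, NAME_LEN - 1);
    dst[NAME_LEN - 1] = '\0';
}

/* Appends a message. Returns 0 on success, -1 if the mailbox is full. */
int sendMessage(const char *sender, const char *receiver, const char *tag,
                void *payload, size_t payloadSize)
{
    int result = -1;
    pthread_mutex_lock(&mailboxLock);
    if (mailboxCount < MAX_MESSAGES) {
        struct Message *m = &mailbox[mailboxCount++];
        copyName(m->sender, sender);
        copyName(m->receiver, receiver);
        copyName(m->tag, tag);
        m->payload = payload;
        m->payloadSize = payloadSize;
        m->valid = 1;
        result = 0;
    }
    pthread_mutex_unlock(&mailboxLock);
    return result;
}

/* Removes and returns the oldest message addressed to `receiver` that
   matches `sender` and `tag`; NULL means "any". If nothing matches, the
   returned message has valid == 0. */
struct Message receiveMessage(const char *receiver, const char *sender,
                              const char *tag)
{
    struct Message result;
    memset(&result, 0, sizeof result);
    pthread_mutex_lock(&mailboxLock);
    for (int i = 0; i < mailboxCount; ++i) {   /* sequential search */
        const struct Message *m = &mailbox[i];
        if (strcmp(m->receiver, receiver) != 0)
            continue;
        if (sender != NULL && strcmp(m->sender, sender) != 0)
            continue;
        if (tag != NULL && strcmp(m->tag, tag) != 0)
            continue;
        result = *m;
        memmove(&mailbox[i], &mailbox[i + 1],
                (size_t)(mailboxCount - i - 1) * sizeof mailbox[0]);
        --mailboxCount;
        break;
    }
    pthread_mutex_unlock(&mailboxLock);
    return result;
}
```

A receiver would then call, for example, receiveMessage("I1", NULL, "pivot") to take the oldest pivot message from any sender, and check the valid field before using the payload.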

3.4.4.3 Threading

pthreads is the de facto threading library for Unix and Linux, and since we have some prior experience with it, the choice was simple. There is no need for advanced threading features; beyond the functions for starting the threads and waiting for them to finish, we only employ the mutex (mutual exclusion) mechanism: a pthread_mutex_t variable can be declared and then initialised with pthread_mutex_init(). Any thread may then call pthread_mutex_lock() on the mutex in order to request a lock on it. The lock is granted if no other thread is holding the lock; otherwise, the thread is queued. When a thread releases the mutex with pthread_mutex_unlock(), an arbitrary thread among the queued threads (if any) is granted the mutex.
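The mutex usage described above can be sketched with a shared counter that several threads increment. The names worker, runWorkers and counterLock, the limit of 16 threads and the iteration count are all chosen for this illustration only; the pthread calls themselves are the standard POSIX ones.

```c
#include <pthread.h>

#define INCREMENTS 100000   /* assumption: iterations per thread */
#define MAX_THREADS 16      /* assumption: upper limit for this sketch */

static pthread_mutex_t counterLock;
static long counter = 0;

/* Each worker increments the shared counter under the mutex. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; ++i) {
        pthread_mutex_lock(&counterLock);    /* blocks while another thread holds it */
        ++counter;
        pthread_mutex_unlock(&counterLock);
    }
    return NULL;
}

/* Starts numThreads workers, waits for them, and returns the final
   counter value, or -1 if the arguments or thread creation fail. */
long runWorkers(int numThreads)
{
    pthread_t threads[MAX_THREADS];
    int started = 0;

    if (numThreads < 1 || numThreads > MAX_THREADS)
        return -1;
    counter = 0;
    if (pthread_mutex_init(&counterLock, NULL) != 0)
        return -1;

    for (; started < numThreads; ++started)
        if (pthread_create(&threads[started], NULL, worker, NULL) != 0)
            break;
    for (int i = 0; i < started; ++i)
        pthread_join(threads[i], NULL);

    pthread_mutex_destroy(&counterLock);
    return started == numThreads ? counter : -1;
}
```

Without the lock and unlock calls, the increments from different threads could overwrite each other and the final value would typically fall short of numThreads times INCREMENTS.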

As usual with threading, the hard part is not the underlying concepts, but all the problematic situations that can occur when the threads start interacting. We have had many small threading bugs that were not too hard to find, but we also had one that was a bit harder and quite interesting. Consider the following race condition: An iteration process, say, I0, has performed a pivot and sends messages about this to all other iteration processes. If the I0 thread gets preempted after sending only some of the messages, it could be that e.g. I1 receives the message and goes on to perform another pivot and tells everyone else about it. Then, I2 might receive the message from I1 before the message from I0, in which case it will fail an internal consistency check for the sequence of pivot operations. This situation can be prevented by either implementing a function that can send multiple messages at once without the risk of other messages getting interleaved with them, or letting the iteration processes keep a queue of premature pivot messages. We did the former, but that required internal support from the message system, and we are not sure if such functionality can be achieved with MPI.
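The first remedy, sending to all receivers without interleaving, can be sketched by holding a single lock for the whole broadcast. This is an assumption about how such a function might look, not the actual code: the Inbox type, the function sendToAll and the integer messages are hypothetical stand-ins for the real message system.

```c
#include <pthread.h>

#define NUM_RECEIVERS 3          /* assumption: three receiving processes */
#define QUEUE_CAPACITY 1024      /* assumption: fixed inbox size */
#define BROADCASTS_PER_THREAD 200

struct Inbox {
    int items[QUEUE_CAPACITY];
    int count;
};

static struct Inbox inboxes[NUM_RECEIVERS];
static pthread_mutex_t sendLock = PTHREAD_MUTEX_INITIALIZER;

/* Delivers `value` to every inbox while holding the lock once, so no
   other broadcast can slip in between two deliveries. Returns 0 on
   success, or -1 (delivering nothing) if any inbox is full. */
int sendToAll(int value)
{
    int result = 0;
    pthread_mutex_lock(&sendLock);
    for (int r = 0; r < NUM_RECEIVERS; ++r)
        if (inboxes[r].count >= QUEUE_CAPACITY)
            result = -1;
    if (result == 0)
        for (int r = 0; r < NUM_RECEIVERS; ++r)
            inboxes[r].items[inboxes[r].count++] = value;
    pthread_mutex_unlock(&sendLock);
    return result;
}

static void *broadcaster(void *arg)
{
    int base = *(const int *)arg;
    for (int i = 0; i < BROADCASTS_PER_THREAD; ++i)
        sendToAll(base + i);
    return NULL;
}

/* Runs two broadcasters concurrently. Returns 1 if every inbox saw the
   messages in the same order, 0 if not, -1 if a thread could not start. */
int inboxesAgreeAfterConcurrentBroadcasts(void)
{
    pthread_t threads[2];
    int bases[2] = {1000, 2000};
    int started = 0;

    for (int r = 0; r < NUM_RECEIVERS; ++r)
        inboxes[r].count = 0;
    for (; started < 2; ++started)
        if (pthread_create(&threads[started], NULL, broadcaster,
                           &bases[started]) != 0)
            break;
    for (int i = 0; i < started; ++i)
        pthread_join(threads[i], NULL);
    if (started < 2)
        return -1;

    for (int r = 1; r < NUM_RECEIVERS; ++r) {
        if (inboxes[r].count != inboxes[0].count)
            return 0;
        for (int i = 0; i < inboxes[0].count; ++i)
            if (inboxes[r].items[i] != inboxes[0].items[i])
                return 0;
    }
    return 1;
}
```

If sendToAll instead took and released the lock once per receiver, two concurrent broadcasts could arrive in different orders at different inboxes, which is the situation that made I2 fail its consistency check.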

3.4.5 Cell/BE implementation of ASYNPLEX
