Resilient Dynamic Programming
Irene Finocchi, Saverio Caminiti, and Emanuele Fusco
Dipartimento di Informatica, “Sapienza” Università di Roma via Salaria, 113-00198 Rome, Italy.
{finocchi, caminiti, fusco}@di.uniroma1.it
Kickoff AlgoDEEP — Bertinoro, Italia. April 16-17 2010
(task C.1.1)
Outline
1 Introduction
2 A resilient framework for dynamic programming
3 Testing and experimental validation
Memories and faults
Why should we care about memory faults in algorithm design?
Memory faults happen: a large cluster of computers with a few
gigabytes per node can experience one bit error every few minutes [Sah06].
Memory faults are harmful: undetected memory faults cause
data corruption to spread; (potentially safety critical, e.g., avionics).
Hardware solutionsmay be inadequate: fault-tolerant memory
chips does not guarantee complete fault coverage; (expensive – system halt upon detection of uncorrectable errors – interruptions of service) [JNW08].
From liars to data corruption
Algorithmic research related to memory errors has focused mainly on sorting and searching problems:
late 70’s: Rényi [Rén94] and Ulam [Ula77]: “twenty questions game” against a liar, handling noise in binary search.
Yao and Yao [YY85], and then [AU91, LM99, LMP97]: destructive faults in fault-tolerant sorting networks, comparison gates can destroy one of the input values. . . .
[FI04] sorting in the faulty RAM model.
Faulty memories: an adversarial model
Memory in a faulty-RAM of word-sizew is divided in three classes:
a large unreliable memory: anadaptiveadversary of unlimited
computational powercan modify up toδ memory words;
O(1) safememory words: the adversary can readbutnot
modify this memory;
O(1) privatememory words: the adversarycannot even read this memory.
Local dependency dynamic programming – edit distance
Letei,j be the edit distance between the prefix up to the i-th symbol of the input stringX and the prefix up to thej-th symbol of the input stringY.
ei,j :=
(
ei−1,j−1 ifi,j >0 andxj =yi
1+min{ei−1,j,ei,j−1,ei−1,j−1} ifi,j >0 andxj 6=yi
(e0,j =j,ei,0 =i.)
Correctness requirements
Correctness of sorting and searching required only on uncorrupted values.
In our setting, such a relaxed definition of correctness does not seem to be natural.
We seek algorithms that correctly compute the edit distance between the two input strings, in spite of memory faults.
Correctness requirements
Correctness of sorting and searching required only on uncorrupted values.
In our setting, such a relaxed definition of correctness does not seem to be natural.
We seek algorithms that correctly compute the edit distance between the two input strings, in spite of memory faults.
Tools
Majority.
Table decomposition. Fingerprinting.
Majority
A variable can be made resilient by making 2δ+1 copies.
As at mostδ of them can be altered by the adversary, the
majority value is the correct value.
The majority value can be read in time O(δ)and space
O(1) [BM91].
Table decomposition
The DP table is split in blocks of size δ×δ. The boundaries of each block are written reliably in the faulty memory. δ2 values result in roughly
5δ2 memory words.
Fingerprinting
A fingerprint for a column is computed as: ϕk =v1◦v2◦. . .◦vδ modp
wherep is a prime number uniformly chosen at randomly in interval
[nc−1,nc](wherec is an appropriate constant).
Using logical shifts and Horner’s rule, each fingerprint can be incrementally computed while generating the valuesvh:
for h=1 toδ do
ϕ= ((ϕ×2w) +vh) modp
end for
Fingerprinting
A fingerprint for a column is computed as: ϕk =v1◦v2◦. . .◦vδ modp
wherep is a prime number uniformly chosen at randomly in interval
[nc−1,nc](wherec is an appropriate constant).
Using logical shifts and Horner’s rule, each fingerprint can be incrementally computed while generating the valuesvh:
for h=1 toδ do
ϕ= ((ϕ×2w) +vh) modp
end for
Block computation
Bi,j Bi,j−1
Bi−1,j−1 Bi−1,j
ϕ1
The first column of a block is computed reading reliably all values it depends from.
Block computation
Bi,j Bi,j−1
Bi−1,j−1 Bi−1,j
ϕ1
While computing the first column, fingerprint ϕ1 is
also computed.
Block computation
Bi,j Bi,j−1
Bi−1,j−1 Bi−1,j
ϕ1
While computing the first column, fingerprint ϕ1 is
also computed.
Block computation
ϕk Bi−1,j−1 Bi−1,j Bi,j−1 ϕk + 1 Bi, j ϕkWhile computing column
k+1, we produce two
fingerprints,ϕk+1 andϕk.
Block computation
ϕk Bi−1,j−1 Bi−1,j Bi,j−1 ϕk + 1 Bi, j ϕk Fingerprintϕk is then compared withϕk (i.e., the fingerprint produced whilecomputing column k).
Block computation
ϕk Bi−1,j−1 Bi−1,j Bi,j−1 ϕk + 1 Bi, j ϕk Ifϕk 6=ϕk, the block is recomputed from scratch.As a result...
... we have: Theorem
The edit distance between two strings of length n and m, with n≥m, can be correctly computed, with high probability, in:
O(nm+αδ2) time; O(nm) space,
whenδ is polynomial in n.
Generalizing
Theorem
A d -dimensional local dependency dynamic programming table M of size nd can be correctly computed, with high probability, in:
O(nd+αδd) time; O(nd+nδ)space,
when the actual numberα≤δ of memory faults occurring during the computation is polynomial in n.
(Edit distance, longest common subsequence, sequence alignment,. . .)
faultyLib
We are developing a library to test program behavior in presence of memory faults.
Plugging in the library should be very easy:
existing C/C++ code should require minimal changes to be tested with our library.
Implementation of different (and meaningful) adversaries should be easy.
. . .
faultyLib: usage
FaultyUInt M[n+1u][m+1u]; // An n+1 X m+1 matrix of
// faulty unsigned int ...
for (unsigned int i = 1; i <= n; i++) { for (unsigned int j = 1; j <= m; j++) {
M[i][j] = min(1 + min(M[i-1][j], M[i][j-1]), M[i-1][j-1] + ((x[i-1]==y[j-1]) ? 0 : 1)); }
} ...
faultyLib: faulty types implementation
template <typename T>
class Faulty : public FaultyBase { ... private: T _val; T read() const { FaultyMM::getInstance()->faultBeforeRead(&_val, sizeof(T), context); return _val; } void write(T v) { _val = v; FaultyMM::getInstance()->faultAfterWrite(&_val, sizeof(T), context); } } ...
typedef Faulty<unsigned int> FaultyUInt;
faultyLib: overriding operators
...
//Assignment operator template <typename Targ>
Faulty & operator=(const Targ & v) { write((T)v); return *this; }
... //OR
template <typename Targ>
bool operator||(const Targ & v) const { return (read() || (T)v);
} }
...
faultyLib: adversaries implementation
class REDAdversary : public Adversary { ...
virtual void faultAfterWrite(void * location, size_t s, Context * cnt) {
if ((cnt != NULL) && (cnt->tag == EDMATRIX_TAG)) { MatrixContext * m = (MatrixContext *)cnt; unsigned int * i = (unsigned int *)location; if (m->getIndex(0) == 3) if (m->getIndex(1) == 7) *i = *i +3; } } ...
Thanks!
Thank you for your attention!
References
[AU91] S. Assaf and E. Upfal.
Fault tolerant sorting networks.
SIAM J. Discrete Math., 4(4):472–480, 1991.
[BM91] R. S. Boyer and J. S. Moore.
Mjrty: A fast majority vote algorithm. InAutomated Reasoning: Essays in Honor of Woody Bledsoe, pages 105–118, 1991.
[FI04] Irene Finocchi and Giuseppe F. Italiano.
Sorting and searching in the presence of memory faults (without redundancy). In László Babai, editor,STOC, pages 101–110. ACM, 2004.
[JNW08] B. L. Jacob, S. W. Ng, and D. T. Wang.
Memory Systems: Cache, DRAM, Disk.
Morgan Kaufmann, 2008.
[LM99] F. T. Leighton and Y. Ma.
Tight bounds on the size of fault-tolerant merging and sorting networks with destructive faults.
SIAM J. Comput., 29(1):258–273, 1999.
[LMP97] F. T. Leighton, Y. Ma, and C. G. Plaxton.
Breaking theθ(nlog2n)barrier for
sorting with faults.
J. Comput. Syst. Sci., 54(2):265–304, 1997.
[Rén94] A. Rény.
A diary on information theory. J. Wiley and Sons, 1994. Original publication:Napló az információelméletröl, Gondolat, Budapest, 1976.
[Sah06] G. K. Saha.
Software based fault tolerance: a survey.
Ubiquity, 7(25), 2006.
[Ula77] S. M. Ulam.
Adventures of a mathematician. Charles Scribner’s Sons, New York, 1977.
[YY85] A. C. Yao and F. F. Yao.
On fault-tolerant networks for sorting.
SIAM J. Comput., 14(1):120–128, 1985.