Pre-Final Quiz Sample Solutions
December 12, 2016
1 Counterintelligence
LIKELY RELATIONSHIP TO FINAL EXAM: Debugging algorithms by identifying key properties and producing small examples that push on their weaknesses is a key ability. We anticipate asking about these ideas on the exam. We do not anticipate asking about the Olympic Scheduling Problem on the exam.
Recall the Olympic Scheduling Problem. The key features of this problem were:
The input is a list of events with start time, finish time, and value. Assume that all are positive and that each event's finish time is after its start time.
The solution is the best set of non-conflicting events.
Two events conflict if each one starts before the other finishes (i.e., they overlap in time).
The best solution is the one with the single highest-valued event, breaking ties by comparing next highest-valued events (where both solutions are assumed to have as many 0-valued events as needed to break all ties).
Consider the following algorithm that attempts to solve the problem greedily by considering the events in order of finish time and adding any event that does not conflict with a higher-valued event:
ValueIncreasingSoln(E):
  sort E by increasing finish time    // in O(|E| lg |E|)
  result = new empty list of events   // the result so far
  current = no event                  // the current event under consideration
  for each e in E:
    if there is no current event:
      // just the first time through the loop
      current = e
    else if start(e) >= finish(current):
      // we've passed the range current conflicts with
      add current to result
      current = e
    else if value(e) > value(current):
      // we've hit a higher-valued conflicting event
      current = e
    else:
      // otherwise, e and current conflict, but current is higher-valued
      // we do nothing and ignore e
  // polish off the last current event
  if there is a current event:
    add current to result
  return result
This seems promising. Let's investigate.
1. Sketch the key points in a (brief!) proof that the optimal solution must include any event that conflicts only with lower-valued events.
SOLUTION: Imagine an instance with an event e_h that conflicts only with lower-valued events.
Consider any solution S to this instance that does not include e_h. Let S' be the result of deleting all events in S worth less than e_h and then adding e_h. S' contains no conflicts since S did not and e_h conflicts only with lower-valued events, none of which appear in S'.
Thus, S' is a solution and, since it ties with S up to e_h and beats S with e_h, it's a better solution than S. So, S is not optimal.
Key points: We could trade e_h for some number (zero or more) of lower-valued events in any solution that excluded e_h to get a solution (no conflicts) that is better.
2. Despite this promising result, the greedy algorithm is not correct. Give a small counterexample on which this greedy approach fails. Be sure to clearly indicate both what the greedy approach produces and what the optimal solution is.
SOLUTION: The algorithm above is clearly correct where there are no conflicts, and it's also clearly correct anytime it chooses to include an event that conflicts only with lower-valued events. Let's build an example bit-by-bit to push on the gaps in these properties:
  ---        1
    ---      2
So far, the algorithm above will drop the left event in favor of the right. Can we make dropping the left event the wrong decision? Let's add another conflicting event:
  ---        1
    ---      2
      ---    3
Now, the algorithm above will drop the middle event in favor of the rightmost event. That's the correct thing to do. However, now the optimal solution picks the leftmost event back up, which the greedy algorithm does not do.
This is our counterexample, then. The greedy algorithm gives just the third event as its solution, but the optimal solution is the first and third.
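To check the counterexample mechanically, here is a minimal Python sketch. It assumes each event is a `(start, finish, value)` tuple and picks concrete times `(0, 3)`, `(2, 5)`, `(4, 7)` for the three events; those times, and the helper names `conflicts` and `brute_force_best`, are illustrative choices rather than part of the quiz. The brute-force comparison relies on values being positive, so comparing descending value lists matches the tie-breaking rule.

```python
from itertools import combinations

def conflicts(a, b):
    """Two events conflict if each one starts before the other finishes."""
    return a[0] < b[1] and b[0] < a[1]

def value_increasing_soln(events):
    """Direct transcription of the greedy pseudocode."""
    result = []
    current = None
    for e in sorted(events, key=lambda ev: ev[1]):  # increasing finish time
        if current is None:
            current = e
        elif e[0] >= current[1]:
            # e starts after current finishes: commit current
            result.append(current)
            current = e
        elif e[2] > current[2]:
            # e conflicts with current and is worth more
            current = e
        # otherwise e conflicts with a higher-valued current: ignore e
    if current is not None:
        result.append(current)
    return result

def brute_force_best(events):
    """Try every conflict-free subset; compare by descending value lists."""
    best_key, best = [], []
    for size in range(len(events) + 1):
        for subset in combinations(events, size):
            if any(conflicts(a, b) for a, b in combinations(subset, 2)):
                continue
            key = sorted((e[2] for e in subset), reverse=True)
            if key > best_key:
                best_key, best = key, list(subset)
    return best

# Hypothetical times for the three events drawn above (values 1, 2, 3).
events = [(0, 3, 1), (2, 5, 2), (4, 7, 3)]
greedy = value_increasing_soln(events)
optimal = brute_force_best(events)
```

With these times, `greedy` contains only the value-3 event, while `optimal` contains the value-1 and value-3 events.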
2 Ice Cubes from Heaven
LIKELY RELATIONSHIP TO FINAL EXAM: Memoization and dynamic programming are important for the exam, but the open Collatz Conjecture related to the hailstone problem is not something we plan to ask about.
The next step in the hailstone sequence is defined for integers n ≥ 1 by:
$$
h(n) =
\begin{cases}
n/2 & \text{if } n \text{ is even} \\
3n + 1 & \text{otherwise}
\end{cases}
$$
We consider the sequence to end when n = 1.
For example, here are the sequences beginning at 1 through 5, each labeled with the number of steps taken to reach 1.
n    sequence                    # of steps
1    1                           0
2    2, 1                        1
3    3, 10, 5, 16, 8, 4, 2, 1    7
4    4, 2, 1                     2
5    5, 16, 8, 4, 2, 1           5
In this problem, we'll explore methods to find the number of steps in the sequences starting at each initial number 1 ≤ i ≤ n given an input n. Note: it is unknown whether the sequence does indeed reach 1 for every starting point.
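As a baseline before any memoization, the step count can be computed directly from the definition. This is a minimal sketch; the name `hailstone_steps` is an illustrative choice, and the loop assumes the sequence actually reaches 1 for the given start.

```python
def hailstone_steps(n):
    """Count applications of h needed to get from n to 1.

    Assumes the sequence starting at n does reach 1; otherwise this loops forever.
    """
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

# Step counts for the starting values 1 through 5.
table = {i: hailstone_steps(i) for i in range(1, 6)}
```

The resulting `table` agrees with the step counts listed for 1 through 5.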
1. Identify and explain a feature of this problem that makes memoization promising for its solution.
SOLUTION: Some sequences are suffixes of others (for example, the sequence from 3 reaches 5 and from then on is exactly the sequence from 5), which means that when computing the length of each sequence over a range, we may end up solving the same problem multiple times.
A caution and a challenge: Will we get exponential speedup like we usually do? Analyse!
2. Give a memoized pseudocode algorithm that takes a number n and computes the number of steps in the sequence required to reach 1 for every value of i from 1 up to (at least) n.
You may assume that your pseudocode language allows automatically resizing sparse arrays. That is, you can define an array without specifying how big it is and then reasonably efficiently store a value at any index 1 or more (or 0 or more if you prefer 0-based indexing) without explicit resizing.
SOLUTION: We essentially take the hailstone step function and call it repeatedly, caching values as we go.
ComputeAllHailstones(n):
  Let Table be an empty sparse array
  For i = 1 to n:
    ComputeHailstones(Table, i)

ComputeHailstones(T, i):
  if T does not contain an entry i:
    if i = 1:
      T[i] = 0
    else if i is even:
      T[i] = 1 + ComputeHailstones(T, i/2)
    else:
      T[i] = 1 + ComputeHailstones(T, 3*i+1)
  return T[i]
A couple of notes:
We may end up using a substantial amount of extra memory since the hailstone sequence can grow quite large before coming back down. We could perhaps skip caching values outside of the range 1 . . . n (or some larger pre-selected range).
It's presently unknown whether we might run into a cycle or divergent sequence and so never reach the base case of this function. We're assuming here that that won't happen; either one would cause an infinite recursion in our code (and so eventually crash due to out of memory errors in practice). Identifying cycles would be easy. (We can just set each table entry to a flag value before making our recursive calls and know we've hit a cycle if we ever reach an entry containing that flag value. Caveat: this may be quite expensive in terms of memory use and runtime, depending on the size of the cycle!) Identifying divergent sequences is far from easy.
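The memoized pseudocode can be sketched in Python with a dictionary standing in for the sparse array. Note that this is an iterative variation, not a line-by-line transcription: instead of recursing, it walks forward until it reaches a cached value and then fills in the counts on the way back, which avoids deep recursion. Like the pseudocode, it assumes every sequence reaches 1, and it caches values outside 1..n as well.

```python
def compute_all_hailstones(n):
    """Return a dict mapping i to its step count, covering at least 1..n."""
    table = {1: 0}  # a dict plays the role of the sparse array
    for start in range(1, n + 1):
        path = []
        i = start
        # Walk forward until we reach a value whose count is already cached.
        while i not in table:
            path.append(i)
            i = i // 2 if i % 2 == 0 else 3 * i + 1
        # Backfill: each value on the path is one step further from 1
        # than the value that follows it.
        steps = table[i]
        for j in reversed(path):
            steps += 1
            table[j] = steps
    return table
```

Each value is appended to `path` at most once over the whole run, because it is cached immediately afterwards; that is the saving memoization buys here.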
3. The obvious subproblem ordering for a dynamic programming variant is 1, 2, 3, 4, ... Explain why this subproblem ordering is not correct here.
SOLUTION: Subproblem ordering for DP must ensure that we solve all the subproblems a problem depends on before solving the problem itself. 3 is a counterexample to the correctness of this ordering.
ComputeHailstones(3) relies on ComputeHailstones(10), which we have not yet computed when we ask for 3 in this ordering.
We can look at the inverse relationship of our recurrence and (for example) perform a breadth-first search starting at 1 as our ordering, but this may compute many unnecessary values before we find counts for all our goal initial values. So, we'll just stick with memoization.
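The ordering problem is easy to see in code. This small sketch (the function names are illustrative) scans the order 1, 2, 3, ... and reports the first subproblem whose dependency comes later in that order.

```python
def next_hailstone(i):
    """One step of the hailstone recurrence."""
    return i // 2 if i % 2 == 0 else 3 * i + 1

def first_ordering_violation(n):
    """Return (i, dependency) for the first i in 2..n that depends on a
    larger, not-yet-computed subproblem, or None if there is none."""
    for i in range(2, n + 1):
        dependency = next_hailstone(i)
        if dependency > i:
            return i, dependency
    return None
```

For n = 5 this returns the pair (3, 10). Every odd i greater than 1 has the same problem, since 3i + 1 > i.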
3 Throwing Down the Gauntlet
LIKELY RELATIONSHIP TO FINAL EXAM: NP-completeness is important for the exam, and a problem that is a variant of an NP-complete problem you're familiar with would make a good exam question, but it won't be this POSSAT/LPSAT variant.
We create a new class of problems called positive SAT or POSSAT. In POSSAT, all literals must be positive, rather than negated. So, for example, this is a POSSAT problem:
(x1 ∨ x2) ∧ (x2 ∨ x3 ∨ x4 ∨ x5) ∧ (x1 ∨ x3 ∨ x5)
But this is not:
(x1 ∨ x2) ∧ (x2 ∨ x3 ∨ x4 ∨ x5) ∧ (x1 ∨ x3 ∨ x5)
As with satisfiability, the solution to a POSSAT instance is YES if and only if there is a truth assignment to the variables in the instance such that every clause contains at least one true literal (i.e., the entire statement is true).
1. Give a polynomial-time algorithm to solve any POSSAT instance. Hint: all instances of this problem are trivial in one sense or another.
SOLUTION: Every literal is positive. So, setting all variables to true makes every literal true, which must make all clauses except zero-length clauses true. Thus, the answer to a POSSAT instance like this is YES if and only if the instance has no zero-length clauses.
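The POSSAT decision procedure is short enough to write out. This sketch assumes an instance is given as a list of clauses, each a list of variable names; that representation is an assumption made for illustration.

```python
def possat(clauses):
    """Return True (YES) iff no clause is empty.

    Setting every variable to true satisfies every non-empty clause,
    so only a zero-length clause can make the instance unsatisfiable.
    """
    return all(len(clause) > 0 for clause in clauses)
```

This takes time linear in the number of clauses.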
2. Consider a slight variant of POSSAT called Limited POSSAT or LPSAT. LPSAT takes an extra argument k. The answer to an LPSAT instance is YES if and only if there is an assignment of truth values to the variables such that at least one literal in each clause is true, and at most k of the variables are true.
(a) Show that LPSAT is in NP.
SOLUTION: To show this, we need a certificate. The real answer to an LPSAT instance will be something like the set of variables to make true. We'll use that as our certificate. We can verify this in polynomial time using these steps:
check that there are at most k variables (each part of the LPSAT instance) in the certificate (linear time)
for each clause in the statement, check that it contains at least one variable that is in the certificate (linear time with modestly clever data structures, quadratic time done completely naively)
These are all the checks we need here. If they pass, the answer is indeed YES. If any fails, the certificate was not valid for the instance.
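The verification steps above can be sketched as follows. The representation (clauses as lists of variable names, the certificate as a collection of variables to set true) is an assumption, as is treating "the variables of the instance" as exactly those appearing in some clause.

```python
def verify_lpsat(clauses, k, certificate):
    """Check a proposed set of true variables against an LPSAT instance."""
    true_vars = set(certificate)
    instance_vars = {v for clause in clauses for v in clause}
    # At most k variables, each one part of the instance.
    if len(true_vars) > k or not true_vars <= instance_vars:
        return False
    # Every clause must contain at least one variable from the certificate.
    return all(any(v in true_vars for v in clause) for clause in clauses)
```

Using a set for the certificate is the "modestly clever data structure" that keeps the clause check linear in the total size of the clauses (in expectation).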
(b) CHALLENGE (but well worth practicing): Prove via reduction from the Vertex Cover (VC) problem that LPSAT is NP-hard.
Reminder: a VC instance is an undirected graph G = (V, E) and integer k. The answer is YES if and only if there exists a set of vertices S ⊆ V with |S| ≤ k such that for every edge (u, v) ∈ E, either u ∈ S or v ∈ S (or both).
Hint: a fairly short, simple reduction exists. If yours is long or complex, try a different approach!
If you cannot make a complete reduction, make clear progress on important pieces and document it!
SOLUTION: We need to model the rules of VC using the rules of LPSAT. The central rule of VC is that for each edge, at least one endpoint needs to be included in the cover. That sounds a bit like the rules for clauses in LPSAT: for each clause at least one literal must be true. So, we'll try making a clause for each edge with a literal for each node.
For each node v ∈ V in the VC instance, create a variable v in the LPSAT instance. If in a certificate for the LPSAT instance the variable is true, we take that to mean the vertex is in the cover; otherwise, we take it to mean the vertex is not in the cover. For each edge (u, v) ∈ E, create a clause (u ∨ v). Thus, for every edge, one or the other vertex's corresponding variable will have to be true, which by our plan above would mean to us that one or both vertices will be in the cover. Set k_LPSAT = k_VC. Thus, at most k of the variables will be true, which by our plan above means to us that at most k vertices will be in the cover.
Clearly this reduction takes polynomial time: creating a variable takes either no time or constant time (depending on representation). Creating a clause of two elements takes constant time. We create one such clause for each edge and so take at most linear time.
Further, the reduction is correct. A truth assignment with at most k true variables in a generated LPSAT instance precisely corresponds to a vertex cover with at most k vertices in the VC graph, as described above. (The core pieces of the proof are (1) the number of variables that are true is limited to k, just as the number of vertices in the cover is limited to k, and (2) an edge in the VC instance demands that one or both incident vertices be in the cover, just as the corresponding clause in the LPSAT instance demands that one or both variables be true.) Thus, the answer to a VC instance is YES (and so there is a certificate for the VC instance) if and only if the answer to the corresponding LPSAT instance is YES (and so there is a certificate for that instance).
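The reduction itself is only a few lines. The sketch below pairs it with an exponential-time LPSAT checker, which is there purely to sanity-check tiny instances and is not part of the reduction. The function names and the triangle example are illustrative assumptions.

```python
from itertools import combinations

def vc_to_lpsat(vertices, edges, k):
    """Map a Vertex Cover instance to an LPSAT instance.

    One variable per vertex, one two-literal clause per edge, same k.
    """
    variables = list(vertices)
    clauses = [[u, v] for (u, v) in edges]
    return variables, clauses, k

def lpsat_brute_force(variables, clauses, k):
    """Exponential-time LPSAT check, only for testing small instances."""
    for size in range(min(k, len(variables)) + 1):
        for true_vars in combinations(variables, size):
            chosen = set(true_vars)
            if all(any(v in chosen for v in clause) for clause in clauses):
                return True
    return False

# A triangle needs two vertices to cover all three edges.
triangle_edges = [("a", "b"), ("b", "c"), ("a", "c")]
variables, clauses, k = vc_to_lpsat("abc", triangle_edges, 2)
```

On the triangle, the generated instance is YES for k = 2 and NO for k = 1, matching the vertex cover answers.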
4 Potpourri
LIKELY RELATIONSHIP TO FINAL EXAM: Asking about preprocessing would be fun on the final exam. However, we won't necessarily have a problem related to the domain of consecutive sums discussed here.
Suppose we have an array A of size n. For some value k, we want to know if A contains a set of consecutive elements that sum to k.[1] We don't need to know what the elements are; we just want a YES or NO answer. We'll call the question about whether some set of consecutive elements sums to k a query.
1. Give BOTH a brute-force algorithm to solve this problem in polynomial time AND a good asymptotic bound on your algorithm's runtime. (Include sufficient justification for the runtime, which means at minimum annotations on the algorithm.)
SOLUTION: Here's a naive, pseudocode algorithm with annotations for runtime:
// We'll return a tuple (left, right), where A[left..right] sums
// to k. On failure, we'll return NONE.
BruteSumToK(A, n, k):
  // CONSTANT TIME SPECIAL CASE
  if k = 0:
    // Special case: an empty range answers the question.
    return (0, -1)

  // Try all possible starts/ends of sequences and sum them up.
  for i = 1 to n:              // n iterations
    for j = i to n:            // (n-i)+1 iterations
      sum = 0                  // O(1)
      for x = i to j:          // (j-i)+1 iterations
        add A[x] into sum      // O(1)
      if sum = k:              // O(1)
        return (i, j)          // O(1)

  // No sequence works.
  return NONE                  // O(1)
The three loops above dominate the runtime and take O(n^3) time together. The algorithm therefore runs in O(n^3) time.
We could easily improve this by noting that the sum for one (i, j) pair plus A[j + 1] is the sum for the pair (i, j + 1). That means we can dump the innermost loop and cut the runtime to O(n^2).
2. Now, we'll consider preprocessing the array A to make individual queries run faster. Preprocessing means spending some computing time (and possibly memory) up front to save computing time later.
Imagine that we want individual queries to run in expected constant time. Give a preprocessing strategy that would make this possible with some analysis: (1) what you would do for preprocessing, (2) how much time and memory it would take, (3) what you would do at query time, and (4) how many queries you expect to need before this strategy saves runtime compared to your brute force approach.
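The running-sum improvement can be sketched in Python as follows. It uses 0-based inclusive indices rather than the 1-based indices of the pseudocode, and keeps the empty-range convention `(0, -1)` for k = 0; the function name is illustrative.

```python
def sum_to_k_quadratic(a, k):
    """Return 0-based inclusive (i, j) with a[i] + ... + a[j] == k, or None."""
    if k == 0:
        return (0, -1)  # empty range, mirroring the special case above
    n = len(a)
    for i in range(n):              # n iterations
        running = 0
        for j in range(i, n):       # n - i iterations
            running += a[j]         # extend the previous sum in O(1)
            if running == k:
                return (i, j)
    return None
```

The two nested loops do constant work per iteration, giving the O(n^2) bound.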
[1] Consecutive means one immediately after the other. So, we cannot skip any elements in the middle of one of these sets of consecutive elements.
SOLUTION: Running in expected constant time says "hash table" to me. Besides, if you have a data structures question, the answer is probably hash tables! ;)
If we had a hash table mapping each achievable sum to a range that generates that sum, then our query algorithm could become: if the hash table contains k, return the value in the hash table at k; otherwise, return NONE. This will take expected constant time for a hash table. (Responds to (3).)
To set this up, we could use the O(n^2) algorithm above, inserting the key-value pair (sum, (i, j)) at each iteration. The insert would take expected (amortized) constant time. So, the algorithm still runs in O(n^2) time and, with a constant load factor, takes O(n^2) space (assuming i, j, k all take constant space to represent). (Responds to (1) and (2).)
Abusing notation a bit, the total time for m queries is then O(n^2) + O(m). Per query, this is O(n^2/m + 1). Comparing this to the original brute force approach, it's immediately better. Comparing it to the O(n^2) approach, it's better as soon as m ∈ ω(1). Comparing it to the constant-time long-run performance, we reach that performance once m ∈ Ω(n^2), i.e., once we issue a quadratic number of queries. (Responds to (4).)
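A minimal sketch of this preprocessing strategy, using a Python `dict` as the hash table. The function names are illustrative, indices are 0-based and inclusive, and storing the empty range under key 0 is an assumption carried over from the brute-force special case.

```python
def preprocess(a):
    """Map each achievable consecutive sum to one range (i, j) producing it.

    O(n^2) expected time and up to O(n^2) space.
    """
    ranges = {0: (0, -1)}  # empty range, as in the brute-force special case
    for i in range(len(a)):
        running = 0
        for j in range(i, len(a)):
            running += a[j]
            # Keep the first range found for each sum.
            ranges.setdefault(running, (i, j))
    return ranges

def query(ranges, k):
    """Expected O(1): return a range summing to k, or None."""
    return ranges.get(k)
```

After one call to `preprocess`, every `query` is a single dictionary lookup.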
3. Now consider a similar problem except we no longer demand that the sets of values from the array be consecutive: given array A and number k, we want to know if any subset of elements of A sums to k.
Sketch the key points in a proof that the brute force solution to this problem takes at least exponential time.
SOLUTION: How many of these subsets are there? Brute force should take at least constant time considering each one.
For each element, we have an independent choice whether to include it in a subset. Therefore, we have
$$\underbrace{2 \cdot 2 \cdots 2}_{n \text{ times}} = 2^n$$
subsets, and the runtime is in Ω(2^n), i.e., at least exponential.
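To make the count concrete, here is a hedged sketch of one possible brute-force solver that also reports how many subsets it examined. The function name and the choice to return the count are illustrative. Elements are chosen by position, so duplicate values still count as distinct elements.

```python
from itertools import combinations

def subset_sum_brute_force(a, k):
    """Return (answer, subsets_examined) after trying subsets of a in turn."""
    examined = 0
    for size in range(len(a) + 1):
        for subset in combinations(a, size):
            examined += 1
            if sum(subset) == k:
                return True, examined
    return False, examined
```

On a NO instance the solver must examine every subset, so the count for a 5-element array is 2^5 = 32.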
4. CHALLENGE (but well worth practicing): Consider the Partition Problem: Given a set of numbers S, determine whether there's a way to partition S into two sets S1 and S2 such that S1 and S2 have the same sum. Give a reduction from the partition problem to the not-necessarily-consecutive sum problem above.
SOLUTION: We're asked to reduce PARTITION to SUBSET SUM (in polynomial time, implicitly).
Note that this will not prove PARTITION NP-hard, although if we know PARTITION to be NP-hard, it will prove that SUBSET SUM is NP-hard.
As usual, the key idea is to simulate the rules of PARTITION using the rules of SUBSET SUM. In PARTITION, we receive a list of values and check if we can partition the list into two subsets with the same total, i.e., if some subset of that list sums to half the list's total. SUBSET SUM can ask almost the same question: given a list and a number k, does some subset of the list sum to k?
The reduction, then, just requires handing PARTITION's list off to SUBSET SUM and setting k to half the total of the values in PARTITION's list. We'll leave the rest (briefly proving correctness and polynomial runtime) to you!
In the first version of this solution, I misread the question and thought we were proving PARTITION's NP-hardness using SUBSET SUM's. Here's that solution, which is somewhat harder than the other direction:
We'll sketch the key insight. First, remember to prove PARTITION is NP-hard, we need to reduce to PARTITION from a known-NP-hard problem. SUBSET SUM is very similar and so promising, but somehow we need to make it so PARTITION picks a subset that sums to k. We can do that by making k be half the total of the values.
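A sketch of the PARTITION to SUBSET SUM reduction, under the assumption that the values are integers. The handling of an odd total is an addition not spelled out above: half of an odd total is not an integer, so the sketch emits a fixed NO instance in that case. The brute-force `subset_sum` helper exists only to exercise the reduction on small inputs, and all function names are illustrative.

```python
from itertools import combinations

def partition_to_subset_sum(values):
    """Map a PARTITION instance to a SUBSET SUM instance (list, k)."""
    total = sum(values)
    if total % 2 == 1:
        # Assumption: integer values. An odd total cannot be split evenly,
        # so return a fixed NO instance (no subset of [1] sums to 2).
        return [1], 2
    return list(values), total // 2

def subset_sum(values, k):
    """Exponential-time SUBSET SUM check, only for small test inputs."""
    return any(
        sum(subset) == k
        for size in range(len(values) + 1)
        for subset in combinations(values, size)
    )

def partition_via_subset_sum(values):
    """Answer PARTITION by reducing to SUBSET SUM."""
    instance, k = partition_to_subset_sum(values)
    return subset_sum(instance, k)
```

The reduction itself runs in linear time; only the test helper is exponential.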
How? Try adding one extra value that makes things work out right.
5 Other Problems/Algorithms/Concepts/Domains Strongly Under Consideration for the Final
This final quiz is well worth reviewing before the final exam!
In addition, here are some other resources or concepts well worth reviewing. This is not an exhaustive list. Instead, it's meant to be advice on problems and questions you might not otherwise review for the final exam that are definitely worth your time.
Our own midterm and practice midterm exams
Anything from the 2014W2 CPSC 320 Practice Final Exam questions except problems 3.1 and 3.2
Bloom Filters