Chapter 6 Obfuscating Program Structure
6.2 Finding and Reusing Duplicate Sequences (RDS)
Because of the nature of bytecode, there is often a fair amount of duplication. Even within one method, a sequence of instructions might appear a number of times. By finding these duplicates and replacing them with a single switched instance we can potentially reduce the size of the method and can definitely confuse the control flow.
Figure 6.1: Effects of adding dead-code switch instructions.
A complex analysis could be written to handle many duplicates of a sequence, including those that straddle try block boundaries. However, this would require a much more complicated analysis as well as the addition of more control flow logic. We chose a simple conservative approach instead. Every sequence of 20 instructions or fewer within a method is collected and then the method is checked for duplicates of these sequences. There are a number of rules which define whether a sequence is a proper duplicate or not.
• The sequence must be of the same length as the original and any bytecode instruction in the duplicate sequence bd at index i must equal the original bytecode bo at index i.
• The duplicate instructions and their parameters must match the original sequence. That is to say, an istore is only a proper duplicate instruction if it shares the same register parameter with the original istore, and an if instruction is only a proper duplicate if it shares the same jump destination as the original.
• The duplicate instruction bd at index i must be protected by exactly the same try blocks as the original instruction bo at index i. If the original is not protected at all, neither can the duplicate be protected. This simplifies the analysis a great deal.
• Every instruction in a sequence other than the first must have no predecessor instructions that fall outside the sequence. This prevents the analysis from having to verify that the original and duplicate sequences share the same predecessors, further simplifying the problem.
• Each instruction in the duplicate sequence bd at index i must share the same stack as its counterpart in the original sequence bo at index i. This ensures that the verifier will not complain and that each instruction in the method will always have the same initial stack height with the same initial stack type ordering.
• No instruction within the duplicate sequence can overlap with the instructions in the original sequence as per their layout within the method.
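The rules above can be sketched as a simple pairwise comparison. The following is a minimal illustration, not the implementation described in this chapter: the Insn class, its fields, and the isProperDuplicate method are hypothetical names over a simplified instruction model, and the predecessor rule is not modelled because it needs a control flow graph.

```java
import java.util.List;
import java.util.Set;

final class DuplicateCheck {
    /** Hypothetical, simplified instruction model; not the JVM's or any library's. */
    static final class Insn {
        final String opcode;          // e.g. "istore"
        final String operand;         // register, constant, or jump target ("" if none)
        final Set<Integer> tryBlocks; // ids of the try blocks protecting this instruction
        final String stackIn;         // types on the stack before it executes, e.g. "II"

        Insn(String opcode, String operand, Set<Integer> tryBlocks, String stackIn) {
            this.opcode = opcode;
            this.operand = operand;
            this.tryBlocks = tryBlocks;
            this.stackIn = stackIn;
        }

        /** Same instruction, same parameter, same try blocks, same incoming stack. */
        boolean matches(Insn other) {
            return opcode.equals(other.opcode)
                && operand.equals(other.operand)
                && tryBlocks.equals(other.tryBlocks)
                && stackIn.equals(other.stackIn);
        }
    }

    /** Checks code[dup, dup+len) against code[orig, orig+len). */
    static boolean isProperDuplicate(List<Insn> code, int orig, int dup, int len) {
        if (len <= 0 || orig < 0 || dup < 0
                || orig + len > code.size() || dup + len > code.size()) {
            return false;                       // both ranges must lie inside the method
        }
        if (Math.abs(orig - dup) < len) {
            return false;                       // the two ranges overlap
        }
        for (int i = 0; i < len; i++) {
            if (!code.get(orig + i).matches(code.get(dup + i))) {
                return false;                   // mismatch at index i
            }
        }
        return true;
    }
}
```

Comparing the set of protecting try blocks for equality captures the "exactly the same try blocks" rule, including the case where neither instruction is protected (both sets empty).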
The algorithm performs rounds, starting with the longest sequences first. Thus, in the first round the algorithm collects candidate sequences of length 20 and then attempts to find duplicates of those candidate sequences. If any are found, the algorithm then creates a new method local of integer type which will act as a control flow flag. The duplicate sequences are removed and replaced with an initialization of the control flow flag to a unique number (usually 1, 2, 3, and so on, as the duplicate sequences are removed) followed by a goto instruction directed at the first instruction in the original sequence. When all duplicates have been removed, the original sequence is prepended with two instructions which store 0 to the control flow flag and appended with a load of the control flow flag followed by a tableswitch instruction. The default action is to fall through to the next instruction (the original successor of the original sequence). A case for each duplicate sequence is added to the table which results in a jump to the original successor instruction for that sequence.
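The effect of this rewrite can be illustrated at the source level. The sketch below is only an analogy, under the assumption that a loop can stand in for the goto (which Java source does not have); the class and method names are hypothetical, and the real transformation operates on bytecode, not source.

```java
final class RdsSketch {
    /** Before: the two-statement sequence S appears twice. */
    static int original(int acc, int a, int b) {
        acc = acc * 31 + 7;   // S, original occurrence
        acc ^= acc >>> 3;
        acc += a;             // successor of the original sequence
        acc = acc * 31 + 7;   // S, duplicate occurrence
        acc ^= acc >>> 3;
        acc += b;             // successor of the duplicate sequence
        return acc;
    }

    /** After: a single switched instance of S, selected by a control flow flag. */
    static int reused(int acc, int a, int b) {
        int flag = 0;                 // prepended to the original: store 0 to the flag
        while (true) {                // loop top stands in for the goto target
            acc = acc * 31 + 7;       // the one remaining instance of S
            acc ^= acc >>> 3;
            switch (flag) {           // appended: load the flag, then tableswitch
                case 1:               // entered from the duplicate's site
                    acc += b;         // successor of the duplicate sequence
                    return acc;
                default:              // fall through to the original successor
                    acc += a;
                    flag = 1;         // duplicate's site: store 1, then "goto" S
            }
        }
    }

    public static void main(String[] args) {
        // Both versions compute the same value; prints true.
        System.out.println(original(5, 2, 3) == reused(5, 2, 3));
    }
}
```

Even in this small case the rewritten method is harder to follow: the straight-line code has become a loop whose exit depends on a flag that has no meaning in the original program.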
After the first round, the sequence length is decremented and the second round is run. This round of candidate selection and duplicate detection is performed on the new version of the method, which might have been changed by the previous round. This is repeated until the sequence length is less than 3. No doubt, sequences of length two are heavily repeated throughout Java bytecode, and replacing these would only increase the overall method size and decrease the overall performance. For each duplicate sequence there are three instructions added: the push of the unique integer constant onto the stack, the store of that constant to the control flow flag, and the goto instruction which jumps to the top of the original sequence. For each original sequence, a push/store instruction pair is prepended to it and a tableswitch is added to the end of it. The size of the tableswitch in the class file depends on the number of cases in the instruction: the more duplicate sequences, the more space is required to store it. Overall, this is three extra instructions for each duplicate plus one extra case in the tableswitch, and two extra instructions for each original sequence plus one tableswitch instruction: 3 + 4*duplicates. Clearly, for this transformation to at least break even on class size, the sequence length has to be at least 7 if there is a single duplicate, at least 6 if there are two duplicates, at least 5 for three, etc. An example can be seen in Listings 6.3 and 6.4, which show a method's code before and after obfuscation, respectively.
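The count of 3 + 4*duplicates can be checked with a few lines of arithmetic. This is a rough sketch that follows the text's own accounting, treating each instruction and each tableswitch case as one unit rather than measuring bytes; the class and method names are hypothetical.

```java
final class RdsCost {
    /** Units added: 3 for the original sequence plus 4 per duplicate (the text's count). */
    static int added(int duplicates) {
        return 3 + 4 * duplicates;
    }

    /** Instructions removed: every duplicate of the given length disappears. */
    static int removed(int length, int duplicates) {
        return length * duplicates;
    }

    /** Net change in units; a negative value means the method shrinks by this count. */
    static int netChange(int length, int duplicates) {
        return added(duplicates) - removed(length, duplicates);
    }

    /** Smallest length whose net change is not positive: ceil((3 + 4d) / d). */
    static int minBreakEvenLength(int duplicates) {
        return (added(duplicates) + duplicates - 1) / duplicates;
    }

    public static void main(String[] args) {
        for (int d = 1; d <= 3; d++) {
            System.out.println(d + " duplicate(s): length >= " + minBreakEvenLength(d));
        }
    }
}
```

Under this accounting the thresholds of 7, 6, and 5 are the lengths at which the added units no longer exceed the removed ones. A sequence of length 3 with a single duplicate adds a net four units, which is consistent with short duplicates confusing control flow without shrinking the class file.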
protected static void bitreverse(double data[]){ int n=data.length/2;
Listing 6.3: Example method bitreverse before duplicate sequences have been resolved into one.
Empirical testing has found that long duplicate sequences do occasionally occur. Lengths of 17 and 19 have been seen and almost all benchmarks tested had some duplicates of length 8 or more. Nevertheless, the majority of duplicates found are in the 3 to 5 length range. While these are not productive in reducing class file sizes, they are very useful in confusing control flow. Additionally, performance testing finds that in many cases this transformation can decrease running times, suggesting this is a possible area of study for compiler optimization.
6.2.1 Performance Results (RDS)
The effects of duplicate sequence reuse can be fairly significant, as seen in Figure 6.2. In some cases running times are improved in server mode, yet a number of benchmarks show slowdowns in interpreted mode. The server-mode improvement is likely due to positive JIT optimizations or improved instruction caching. The transformation usually enlarges methods and affects the amount of control flow overhead, so it is unsurprising to see measurable slowdowns in interpreted mode.
Figure 6.2: Effects of duplicate sequence reuse.
protected static void bitreverse(double[] r0){ // variable declarations and initializers removed.
label 2: while (i2< i1) {
default: if (i2>= i3) break label 1;
$d1 = r0[i4];
Listing 6.4: Example method bitreverse after duplicate sequences have been resolved into one.
Decompiled by Dava. The code is semantically equivalent but much more difficult to read. Local variable declarations were removed for space considerations.
The LU benchmark is the only one to show a marked slowdown in server mode, almost 30%. The slowdown is almost entirely encapsulated in a single method, factor, which is the same method that is isolated in Section 7.2.1 as the cause of a JIT slowdown (which is likely due to its complex nested structure). In this situation, the JIT is simply unable to significantly optimize the method.
Indeed, individual profiling of the factor method shows a ~26% slowdown after transformation. Since this method accounts for well over 90% of the program's normal runtime, it explains the performance degradation.