Definition 11: Soundness of ∆ and ∆
9. HANDLING ADVANCED FEATURES FOR POINTS-TO ANALYSIS USING GPGS
9.2. Handling Structures, Unions, and Heap Data
In this section, we describe the construction of GPGs for pointers to structures, unions, and heap-allocated data. We use an allocation-site-based abstraction for the heap, in which all locations allocated at a particular allocation site are over-approximated and treated alike. This approximation allows us to handle the unbounded nature of the heap as if it were bounded. However, since the allocation site might not be available during the GPG construction phase (because it could occur in the callers), heap accesses within a loop may remain unbounded, and we need a summarization technique to bound them. This section first introduces the concept of indirection lists (indlist) for handling structures and heap accesses, and then explains the summarization technique we have used.
Figure 18 illustrates the GPG edges corresponding to the basic pointer assignments in C for structures and heap. The indlev "i, j" of an edge $x \xrightarrow{i,j} y$ represents i dereferences of x and j dereferences of y. We can also view the indlev "i, j" as lists (also referred to as indirection lists or indlists) containing the dereference operator (∗) of length i and j. This representation naturally allows handling structures and heap field-sensitively by using indirection lists containing field dereferences. With this view, we can represent the two statements at line numbers 08 and 09 in the example of Figure 19 by GPG edges in the following two ways:
• Field-Sensitively. $y \xrightarrow{[*],[*]} x$ and $w \xrightarrow{[*],[*,n]} y$; field-sensitivity is achieved by enumerating the field dereferences.
• Field-Insensitively. $y \xrightarrow{1,1} x$ and $w \xrightarrow{1,2} y$; no distinction is made between field dereferences.
The dereference in the pointer expression y → n on line 09 is represented by an indlist [∗, n] associated with the pointer variable y. On the other hand, the access z.m on line 18 can be mapped to a location by adding the offset of m to the virtual address of z at compile time. Hence, it can be treated as a separate variable, which is represented by a node z.m with an indlist [∗] in the GPG. We can also represent z.m with a node z and an indlist [m]. For our implementation, we chose the former representation for z.m. For structures and heap, we ensure field-sensitivity by maintaining indlists in terms of field names. Unions are handled similarly to structures.
Recall that an edge composition $n \circ p$ involves balancing the indlev of the pivot in n and p. With indlist replacing indlev, the operations remain similar in spirit, although they now become operations on lists rather than operations on numbers. To motivate the operations on indlist, let us recall the operations on indlev as illustrated in the following example.
EXAMPLE 9.2. Consider the example in Figure 19. Edge composition $n \circ p$ requires balancing the indlevs of the pivot (Section 4), which involves computing the difference between the indlev of the pivot in n and p. This difference is then added to the indlev of the non-pivot node in n or p. Recall that an edge composition is useful (Section 4.2) only when the indlev of the pivot in n is greater than or equal to the indlev of the pivot in p. Thus, in our example with $p \equiv y \xrightarrow{1,1} x$ and $n \equiv w \xrightarrow{1,2} y$ with y as pivot, an edge composition is useful because the indlev of y in n (which is 2) is greater than the indlev of y in p (which is 1). The difference (2−1) is added to the indlev of x (which is 1), resulting in a reduced edge $r \equiv w \xrightarrow{1,(2-1+1)} x$. ✷
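The arithmetic of Example 9.2 can be sketched in a few lines of Python. This is only an illustrative sketch: the `Edge` type and the `compose_ts` function are hypothetical names introduced here, and the sketch covers only the case used in the example, where the pivot is the target of n and the source of p.

```python
from typing import NamedTuple, Optional


class Edge(NamedTuple):
    """Hypothetical GPG edge: src --(src_lev, dst_lev)--> dst."""
    src: str
    src_lev: int
    dst: str
    dst_lev: int


def compose_ts(n: Edge, p: Edge) -> Optional[Edge]:
    """Compose n with p when the pivot is the target of n and the source of p.

    Returns None when the two edges share no such pivot or when the
    composition is not useful (pivot indlev in n smaller than in p).
    """
    if n.dst != p.src:
        return None
    if n.dst_lev < p.src_lev:
        return None
    diff = n.dst_lev - p.src_lev  # difference of the pivot's indlevs
    # The difference is added to the indlev of the non-pivot node of p.
    return Edge(n.src, n.src_lev, p.dst, p.dst_lev + diff)


p = Edge("y", 1, "x", 1)  # y --(1,1)--> x
n = Edge("w", 1, "y", 2)  # w --(1,2)--> y
r = compose_ts(n, p)      # w --(1,2)--> x, since 2 - 1 + 1 = 2
```

The other pivot positions discussed in Section 4 would need their own cases; they are omitted here to keep the sketch aligned with the example.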
Analogously, we can define similar operations for indlist. An edge composition is useful if the indlist of the pivot in edge p is a prefix of the indlist of the pivot in edge n. In our example, the indlist of y in p (which is [∗]) is a prefix of the indlist of y in n (which is [∗, n]) and hence the edge composition is useful. The addition of the difference in the indlevs of the pivot to the indlev of one of the other two nodes is replaced by an append operation denoted by #.
The operation of computing the difference in the indlev of the pivot is replaced by the remainder operation remainder : indlist × indlist → indlist, which takes two indlists as its arguments, where the first is a prefix of the second, and returns the suffix of the second indlist that remains after removing the first indlist from it. Given $il_2 = il_1 \,\#\, il_3$, $\mathit{remainder}(il_1, il_2) = il_3$. Note that $il_3$ is $\epsilon$ when $il_1 = il_2$. Further, $\mathit{remainder}(il_1, il_2)$ is not computed when $il_1$ is not a prefix of $il_2$.
EXAMPLE 9.3. In our example, remainder([∗], [∗, n]) returns [n], and this indlist is appended to the indlist of node x (which is [∗]), resulting in a new indlist [∗] # [n] = [∗, n] and a reduced edge $w \xrightarrow{[*],[*,n]} x$. ✷
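The list-based operations can be sketched the same way. In this minimal sketch, indlists are modelled as Python tuples of strings, the append operation # is tuple concatenation, and `remainder` returns `None` to model the case where it is not computed; the `Edge` type and `compose_ts` are hypothetical names, and only the pivot position of Example 9.3 is handled.

```python
from typing import NamedTuple, Optional, Tuple

IndList = Tuple[str, ...]  # e.g. ("*", "n") stands for [*, n]


class Edge(NamedTuple):
    """Hypothetical GPG edge: src --(src_il, dst_il)--> dst."""
    src: str
    src_il: IndList
    dst: str
    dst_il: IndList


def remainder(il1: IndList, il2: IndList) -> Optional[IndList]:
    """Return the suffix of il2 left after removing the prefix il1.

    Returns None when il1 is not a prefix of il2 (the remainder is
    not computed in that case); returns () when il1 == il2.
    """
    if il2[:len(il1)] != il1:
        return None
    return il2[len(il1):]


def compose_ts(n: Edge, p: Edge) -> Optional[Edge]:
    """Compose n with p when the pivot is the target of n and the source of p."""
    if n.dst != p.src:
        return None
    rest = remainder(p.src_il, n.dst_il)
    if rest is None:  # composition is not useful
        return None
    # Append (#) the remainder to the indlist of the non-pivot node of p.
    return Edge(n.src, n.src_il, p.dst, p.dst_il + rest)


p = Edge("y", ("*",), "x", ("*",))      # y --([*],[*])--> x
n = Edge("w", ("*",), "y", ("*", "n"))  # w --([*],[*,n])--> y
r = compose_ts(n, p)                    # w --([*],[*,n])--> x
```

Taking `len` of each indlist recovers the field-insensitive indlev view, so the numeric composition is a special case of this one.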
Under the allocation-site-based abstraction for heap, line number 07 of procedure f can be viewed as a GPG edge $x \xrightarrow{1,0} \mathit{heap}_{07}$, where $\mathit{heap}_{07}$ is the heap location created at this allocation site. We expect the heap to be bounded by this abstraction, but the allocation site may not be available during GPG construction, as is the case in our example, where the heap is accessed through pointers x and y in a loop in procedure g whereas the allocation site is available in procedure f at line 07.
EXAMPLE 9.4. The fixed-point computation for the loop in procedure g will never terminate because the length of the indirection list keeps increasing. In the first iteration of the loop, at its exit, the edge composition results in a reduced edge $x \xrightarrow{[*],[*,m,n]} y$. In the next iteration, the reduced edge is $x \xrightarrow{[*],[*,m,n,m,n]} y$, indicating the access pattern of the heap. This continues as the length of the indirection list keeps increasing, leading to a non-terminating sequence of computations. Heap access where the allocation site is locally available does not face this problem of non-termination. ✷
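The growth pattern in Example 9.4 can be mimicked with a small sketch. Since Figure 19 is not reproduced here, the sketch simply assumes that every loop iteration appends the field sequence [m, n] to the indlist of y, which matches the two reduced edges listed in the example; the function name `unbounded_indlists` is hypothetical.

```python
from typing import List, Tuple

IndList = Tuple[str, ...]


def unbounded_indlists(iterations: int) -> List[IndList]:
    """Return the indlist of y in the reduced edge after each loop iteration.

    Assumption: each iteration appends the field sequence (m, n).
    """
    il: IndList = ("*",)
    history: List[IndList] = []
    for _ in range(iterations):
        il = il + ("m", "n")  # the list grows on every iteration
        history.append(il)
    return history


history = unbounded_indlists(4)
lengths = [len(il) for il in history]  # strictly increasing: no fixed point
```

Because every iteration yields a new, longer indlist, a fixed-point test that compares successive edges can never succeed under this assumption.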
This indicates the need for a summarization technique. We bound the indirection lists using a k-limiting technique, which limits the length of indirection lists to at most k dereferences. All dereferences beyond k are treated as an unbounded number of field-insensitive dereferences.
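One possible reading of k-limiting is sketched below. The summary marker `SUMMARY`, the function names, and the choice to absorb everything after the marker are assumptions made for illustration, not details taken from the implementation described here; the loop again assumes that each iteration appends the field sequence [m, n].

```python
from typing import Tuple

IndList = Tuple[str, ...]

# Hypothetical marker for "an unbounded number of field-insensitive dereferences".
SUMMARY = "*+"


def k_limit(il: IndList, k: int) -> IndList:
    """Keep at most k dereferences; summarize everything beyond them."""
    if SUMMARY in il:
        # Anything after the marker is absorbed by the summary.
        return il[:il.index(SUMMARY)][:k] + (SUMMARY,)
    if len(il) <= k:
        return il
    return il[:k] + (SUMMARY,)


def iterate_to_fixed_point(k: int, max_iters: int = 100) -> Tuple[IndList, int]:
    """Repeat the assumed loop body until the k-limited indlist stops changing."""
    il: IndList = ("*",)
    for i in range(1, max_iters + 1):
        new = k_limit(il + ("m", "n"), k)
        if new == il:
            return il, i
        il = new
    raise RuntimeError("no fixed point within max_iters")


final_il, iters = iterate_to_fixed_point(k=3)
```

With k = 3, the indlist stabilizes at [∗, m, n] followed by the summary marker, and the loop stops in the third iteration, in contrast to the unbounded growth of Example 9.4.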
Note that explicit summarization is required only for heap locations and not for stack locations because the indlists can grow without bound only for heap locations.