4.5 Node and Edge Resolutions
4.5.1 Resolution Rules
4.5.1.1 Resolutions in a single version
Before a node (or an edge) can be resolved across two versions, we must first be able to uniquely resolve all nodes within a single version. Sometimes a symbol is processed multiple times because it appears in multiple compilation units. Frappé already de-duplicates the resulting nodes within a single version. For example, suppose structure A is defined in a header file that both foo.c and bar.c include. Only a single node representing A is created, however many times A is seen through header inclusions. To uniquely identify a structure, Frappé uses the combination of its symbol name, its type, the source file in which it is defined and its use location (Ref. Section 4.2.2 for use location). Next, we present some examples that show why location information is needed to resolve entities.
Figure 4.5
    #ifdef BLAH
    struct foo { int a; };
    #else
    struct foo { int b; };
    #endif
Example 1. Figure 4.5 shows why use location is needed to uniquely distinguish nodes when the pre-processor is involved. Depending on whether the macro BLAH is defined, two different nodes must be created for the structure foo. If BLAH is defined, the node has ‘a’ as a field; otherwise it has ‘b’. The use location (found on the containment edge) is added to the name, type and source file attributes, so that the two definitions are identified as two distinct structures. In this example, the intricacies of the pre-processor make location an essential property for distinguishing nodes.
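The de-duplication rule and Example 1 can be illustrated with a minimal C sketch. It assumes a hypothetical StructSighting record for each definition encountered while a compilation unit is processed. It models the use location as a line/column pair; the actual definition is in Section 4.2.2. It also keys nodes by a canonical string rather than a hash. The names resolve_struct, make_key and the linear node table are illustrative only and are not Frappé's implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NODES 64
#define KEY_LEN 256

/* Hypothetical record for one sighting of a structure definition while a
   compilation unit is processed. use_line/use_col stand in for the use
   location of Section 4.2.2 (assumed here to be a line/column pair). */
typedef struct {
    const char *name;        /* symbol name, e.g. "foo" */
    const char *type;        /* node type, e.g. "struct" */
    const char *source_file; /* file containing the definition */
    int use_line, use_col;   /* use location */
} StructSighting;

/* Keys of the nodes created so far; a linear table keeps the sketch short. */
static char node_keys[MAX_NODES][KEY_LEN];
static int node_count = 0;

/* Build the canonical key: name, type, source file and use location. */
static void make_key(const StructSighting *s, char *buf, size_t n) {
    int len = snprintf(buf, n, "%s|%s|%s|%d:%d", s->name, s->type,
                       s->source_file, s->use_line, s->use_col);
    assert(len >= 0 && (size_t)len < n); /* key must not be truncated */
}

/* Return the node id for a sighting, creating a node only if no node with
   the same key exists yet (i.e. de-duplicating repeated sightings). */
static int resolve_struct(const StructSighting *s) {
    char key[KEY_LEN];
    make_key(s, key, sizeof key);
    for (int i = 0; i < node_count; i++)
        if (strcmp(node_keys[i], key) == 0)
            return i;
    assert(node_count < MAX_NODES);
    strcpy(node_keys[node_count], key);
    return node_count++;
}
```

Seeing structure A twice through the same header yields one node. The two conditional definitions of foo have different use locations, so they yield two nodes.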
Example 2. A commonly named local variable such as i may be declared multiple times within the same parent. The parent is the container (function, structure, union, etc.) in which the local variable is defined. The current Frappé model does not store scoping information. A local variable is therefore uniquely identified by the combination of its symbol name, type, source file, parent id and location. If a local variable is part of a macro, its spelling location (Ref. Section 4.2.2 for definition) is also required to uniquely identify it.
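The identity rule of Example 2 can be sketched the same way. In this hedged C sketch, the LocalVar fields and the Loc triple are assumptions about how the location attributes of Section 4.2.2 might be represented. So is the choice to append the spelling location only for declarations that come from a macro. local_key and same_local are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical location triple; the actual location attributes are the
   ones defined in Section 4.2.2. */
typedef struct { int file_id, line, column; } Loc;

/* Hypothetical view of a local variable with the attributes of Example 2. */
typedef struct {
    const char *name, *type, *source_file;
    long parent_id;  /* enclosing function, structure, union, ... */
    Loc loc;         /* location of the variable */
    int from_macro;  /* non-zero if declared inside a macro expansion */
    Loc spelling;    /* spelling location, only used when from_macro */
} LocalVar;

/* Write the identity key of a local variable into buf. */
static void local_key(const LocalVar *v, char *buf, size_t n) {
    int len = snprintf(buf, n, "%s|%s|%s|%ld|%d:%d:%d", v->name, v->type,
                       v->source_file, v->parent_id, v->loc.file_id,
                       v->loc.line, v->loc.column);
    assert(len >= 0 && (size_t)len < n);
    if (v->from_macro) {
        /* Variables from macro expansions can share a location, so the
           spelling location is added to tell them apart. */
        int extra = snprintf(buf + len, n - (size_t)len, "|%d:%d:%d",
                             v->spelling.file_id, v->spelling.line,
                             v->spelling.column);
        assert(extra >= 0 && (size_t)(len + extra) < n);
    }
}

/* Two sightings denote the same local variable iff their keys are equal. */
static int same_local(const LocalVar *a, const LocalVar *b) {
    char ka[256], kb[256];
    local_key(a, ka, sizeof ka);
    local_key(b, kb, sizeof kb);
    return strcmp(ka, kb) == 0;
}
```

Two loop counters named i in the same function stay distinct because their locations differ. Two macro-declared variables at the same location are separated only by their spelling locations.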
Table 4.3: Attributes used to resolve each NodeType in addition to name and type.

NodeType                                    Resolution attributes in addition to name, type
source_file, module, primitive, directory   (none)
namespace                                   parent_id
parameter                                   parent_id, index
macro                                       source_fileid, start_line
local, static_local                         parent_id, source_fileid, start_line, end_line, name_start_line, name_start_column, name_fileid
other                                       parent_id, source_fileid, start_line, signature
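Table 4.3 can be read as a lookup from node type to the attributes that are combined with name and type. The C sketch below encodes the table that way and combines the attributes into a single value. Several choices in it are our assumptions: the NodeView struct, the attribute enum, and 64-bit FNV-1a as the combining hash (the section does not say which hash Frappé uses). We also assume that function nodes fall under the "other" row.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FNV_OFFSET 14695981039346656037ULL
#define FNV_PRIME 1099511628211ULL

/* Hypothetical flattened view of a node. name, type and signature live on
   the node; the other fields would be gathered from incident edges. */
typedef struct {
    const char *name, *type, *signature;
    long parent_id, index, source_fileid, start_line, end_line;
    long name_start_line, name_start_column, name_fileid;
} NodeView;

/* A_NONE is 0 so that unused slots in a rule terminate the list. */
typedef enum { A_NONE, A_PARENT, A_INDEX, A_FILE, A_START, A_END,
               A_NAME_LINE, A_NAME_COL, A_NAME_FILE, A_SIGNATURE } Attr;

/* Table 4.3: attributes used in addition to name and type. */
static const struct { const char *type; Attr attrs[8]; } RULES[] = {
    {"source_file", {A_NONE}}, {"module", {A_NONE}},
    {"primitive", {A_NONE}},   {"directory", {A_NONE}},
    {"namespace", {A_PARENT}},
    {"parameter", {A_PARENT, A_INDEX}},
    {"macro", {A_FILE, A_START}},
    {"local", {A_PARENT, A_FILE, A_START, A_END,
               A_NAME_LINE, A_NAME_COL, A_NAME_FILE}},
    {"static_local", {A_PARENT, A_FILE, A_START, A_END,
                      A_NAME_LINE, A_NAME_COL, A_NAME_FILE}},
};
static const Attr OTHER[] = {A_PARENT, A_FILE, A_START, A_SIGNATURE, A_NONE};

/* FNV-1a over raw bytes. */
static uint64_t mix(uint64_t h, const void *p, size_t n) {
    const unsigned char *b = p;
    for (size_t i = 0; i < n; i++) { h ^= b[i]; h *= FNV_PRIME; }
    return h;
}
/* Strings are hashed with their NUL so adjacent fields cannot run together. */
static uint64_t mix_str(uint64_t h, const char *s) {
    if (s == NULL) s = "";
    return mix(h, s, strlen(s) + 1);
}
static uint64_t mix_long(uint64_t h, long v) { return mix(h, &v, sizeof v); }

static const Attr *rules_for(const char *type) {
    for (size_t i = 0; i < sizeof RULES / sizeof RULES[0]; i++)
        if (strcmp(RULES[i].type, type) == 0) return RULES[i].attrs;
    return OTHER; /* every type not listed uses the "other" row */
}

/* Hypothetical GenerateHash(v): name and type, then the per-type attributes. */
static uint64_t generate_hash_v(const NodeView *v) {
    uint64_t h = mix_str(mix_str(FNV_OFFSET, v->name), v->type);
    for (const Attr *a = rules_for(v->type); *a != A_NONE; a++) {
        switch (*a) {
        case A_PARENT:    h = mix_long(h, v->parent_id); break;
        case A_INDEX:     h = mix_long(h, v->index); break;
        case A_FILE:      h = mix_long(h, v->source_fileid); break;
        case A_START:     h = mix_long(h, v->start_line); break;
        case A_END:       h = mix_long(h, v->end_line); break;
        case A_NAME_LINE: h = mix_long(h, v->name_start_line); break;
        case A_NAME_COL:  h = mix_long(h, v->name_start_column); break;
        case A_NAME_FILE: h = mix_long(h, v->name_fileid); break;
        case A_SIGNATURE: h = mix_str(h, v->signature); break;
        default:          assert(0 && "unknown attribute");
        }
    }
    return h;
}
```

Under these rules, namespace nodes that differ only in start_line hash identically. Parameters of the same parent differ by index. A 64-bit hash can in principle collide. An implementation that must be exact could keep the attribute tuple alongside the hash and compare it when two hashes match.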
Table 4.3 summarises the attributes used to resolve specific node types. A hash function, GenerateHash(v), combines these attributes. Name, type and signature are attributes on nodes, while all others require traversing different incident edges of a node. For example, to retrieve the parent_id, the respective containment edges of a node (such as contains, has_param and has_local) must be visited. All file ids and location information are likewise recovered from its incident edges. Resolving the edges themselves is fairly straightforward. The Frappé graph model is a multi-graph that can have multiple edges between the same source and destination nodes. The combination of source, destination and edge type is therefore not sufficient to uniquely identify an edge. Instead, all the attributes on an edge, including location information, are used to uniquely identify it (in function GenerateHash(e)). This works because the graph is guaranteed to contain no duplicate edges with exactly the same properties.
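GenerateHash(e) can be sketched along the same lines. This C sketch is illustrative, and several of its choices are our assumptions: the Edge layout, string-valued attributes, identifying endpoints by their node hashes, sorting attributes by key before hashing, and FNV-1a as the hash.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FNV_OFFSET 14695981039346656037ULL
#define FNV_PRIME 1099511628211ULL
#define MAX_EDGE_ATTRS 8

typedef struct { const char *key, *value; } EdgeAttr;

/* Hypothetical edge record. Endpoints are assumed to be identified by the
   GenerateHash(v) values of the already-resolved nodes. */
typedef struct {
    uint64_t src, dst;
    const char *type;               /* e.g. "calls", "contains" */
    EdgeAttr attrs[MAX_EDGE_ATTRS]; /* every property, incl. location */
    size_t n_attrs;
} Edge;

/* FNV-1a over raw bytes. */
static uint64_t mix(uint64_t h, const void *p, size_t n) {
    const unsigned char *b = p;
    for (size_t i = 0; i < n; i++) { h ^= b[i]; h *= FNV_PRIME; }
    return h;
}
static uint64_t mix_str(uint64_t h, const char *s) {
    return mix(h, s, strlen(s) + 1); /* NUL separates adjacent fields */
}

static int by_key(const void *a, const void *b) {
    return strcmp(((const EdgeAttr *)a)->key, ((const EdgeAttr *)b)->key);
}

/* Hypothetical GenerateHash(e): source, destination and type alone are not
   unique in a multi-graph, so every attribute is mixed in as well.
   Attributes are sorted by key first so storage order does not matter. */
static uint64_t generate_hash_e(const Edge *e) {
    EdgeAttr sorted[MAX_EDGE_ATTRS];
    assert(e->n_attrs <= MAX_EDGE_ATTRS);
    memcpy(sorted, e->attrs, e->n_attrs * sizeof sorted[0]);
    qsort(sorted, e->n_attrs, sizeof sorted[0], by_key);

    uint64_t h = mix(FNV_OFFSET, &e->src, sizeof e->src);
    h = mix(h, &e->dst, sizeof e->dst);
    h = mix_str(h, e->type);
    for (size_t i = 0; i < e->n_attrs; i++) {
        h = mix_str(h, sorted[i].key);
        h = mix_str(h, sorted[i].value);
    }
    return h;
}
```

Two calls from the same function to the same callee on different lines then produce two distinct edges. This is the multi-graph case described above.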
Once all the nodes within a single version are resolved, we define a method to resolve entities across two versions. The rules used within a single version can also be applied across multiple versions. The only difference between distinguishing two nodes in a single version and identifying a particular node across versions is the notion of time. In principle, we should therefore be able to identify entities across versions with the same rules. In one of our experiments (Section 4.6.2), we verify the feasibility of this approach.
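If the single-version rules are reused unchanged, matching nodes across two versions reduces to comparing the GenerateHash(v) values of each version. The C sketch below assumes those hashes have already been computed and are unique within each version. It matches the two versions with a sort-and-merge over two arrays. diff_versions and VersionDiff are hypothetical names, and the sketch does not address refactorings such as renames.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Counts produced by matching one version against the next. */
typedef struct { size_t matched, removed, added; } VersionDiff;

static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Match the node hashes of two versions. Hashes are assumed unique within
   a version, which single-version resolution is meant to guarantee. Both
   arrays are sorted in place and then walked like a merge. */
static VersionDiff diff_versions(uint64_t *v1, size_t n1,
                                 uint64_t *v2, size_t n2) {
    VersionDiff d = {0, 0, 0};
    qsort(v1, n1, sizeof *v1, cmp_u64);
    qsort(v2, n2, sizeof *v2, cmp_u64);
    size_t i = 0, j = 0;
    while (i < n1 && j < n2) {
        if (v1[i] == v2[j])     { d.matched++; i++; j++; } /* same entity */
        else if (v1[i] < v2[j]) { d.removed++; i++; }      /* only in v1 */
        else                    { d.added++;   j++; }      /* only in v2 */
    }
    d.removed += n1 - i;
    d.added += n2 - j;
    assert(d.matched + d.removed == n1 && d.matched + d.added == n2);
    return d;
}
```

Suppose the old version hashes to {30, 10, 20} and the new one to {20, 40, 10, 50}. The sketch then reports two matched nodes (10 and 20), one removed node (30) and two added nodes (40 and 50).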
However, we also need to answer several fundamental questions before we can agree on the equivalence of entities across versions. Refactoring in the new version may lead end-users to interpret equivalence differently. For example, if a parameter is added to a function, do we identify the function as a new one in the new version? If a function is renamed, can we claim that it is equivalent to the function in the previous version? The answers to these questions are goal-specific. Our objective is to provide end-users with the right level of abstraction to satisfy a reasonable set of use cases. In the following subsections, we demonstrate the extent to which we support these different use cases. In Section 4.5.3, we discuss how we improve the model to account for missing or only relative location information. Constructing a versioned model of code dependency graphs presents a unique set of challenges not found in other graphs. These challenges are summarised below.
• Finding the deltas. A textual change in the source files may or may not change the dependencies in the corresponding graph. The delta should therefore be computed as a post-processing step, which relies on a mechanism that first finds equivalent entities.
• Storage cost of the proposed model. The current Frappé model is lightweight, storing only the most critical components from the build process needed to capture essential semantics. Additional information such as the AST could be maintained, but its storage cost needs to be considered.
• Pre-processor. Many of the challenges of resolving nodes within one version can be attributed to the pre-processor. As the examples above show, the same source can have very different meanings depending on which conditional-compilation path is taken.
• Right level of abstraction for multiple versions. It is also challenging to decide what makes two entities equivalent across two versions in a way that suits most use cases. Users will hold differing views on what equivalence means when code is refactored.
In the following sections, we present our experiences in building a versioned dependency graph and discuss how the above challenges were addressed in the process.