6.5 Performance considerations
7.1.3 Opcode-specific instrumentation
Each opcode handler is implemented in the file <Zend/zend_vm_def.h>. When custom instrumentation of an opcode is required, it should be added to the code in this file. Using the template <Zend/zend_vm_execute.skel> and the custom preprocessor script <Zend/zend_vm_gen.php>, the implementations are compiled into specific versions depending on the operand types that are supplied during execution. Several macros handled by this custom preprocessor are used for type-specific argument handling (e.g. retrieving the value from a literals store rather than the stack if the type is CONST), which should be taken into account when adding custom code to these handlers.
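The type-specific argument handling described above can be illustrated with a minimal, self-contained sketch. It is not the engine's code: the names operand_kind, exec_context, fetch_operand and handle_add are hypothetical, and the dispatch happens at run time here, whereas the preprocessor generates a separate handler per operand type ahead of time. The counters stand in for whatever instrumentation would be attached to each fetch path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operand kinds; the engine's own set is larger. */
typedef enum { OPERAND_CONST, OPERAND_TMP } operand_kind;

typedef struct {
    operand_kind kind;
    size_t       index;      /* slot in the literals store or on the stack */
} operand;

typedef struct {
    const long *literals;    /* compile-time constants */
    const long *stack;       /* temporaries produced at run time */
    unsigned    const_reads; /* illustrative instrumentation counters */
    unsigned    stack_reads;
} exec_context;

/* Fetch an operand value from the store that matches its kind. */
long fetch_operand(exec_context *ctx, operand op)
{
    assert(ctx != NULL);
    switch (op.kind) {
    case OPERAND_CONST:
        ctx->const_reads++;  /* instrumentation for the literals path */
        return ctx->literals[op.index];
    case OPERAND_TMP:
        ctx->stack_reads++;  /* instrumentation for the stack path */
        return ctx->stack[op.index];
    }
    return 0; /* not reached for valid operand kinds */
}

/* A toy "add" handler whose body is identical for every operand kind. */
long handle_add(exec_context *ctx, operand op1, operand op2)
{
    return fetch_operand(ctx, op1) + fetch_operand(ctx, op2);
}
```

The point of the sketch is that instrumentation placed in the fetch path has to account for both storage locations, even though the handler body itself is written only once.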
7.2 Data management
Although this section also discusses several implementation codepoints, the main focus is on the mechanisms that are introduced or modified for instrumentation. It relies heavily on the background of the internals of the PHP VM described in the previous chapter.
¹ All core functions are located in <Zend/zend.c>.
7.2.1 Literals
Literals (compile-time constants) are created within the compilation code and added to a literals collection of the appropriate zend_op_array object. Unfortunately, tracking them based on the literals address is not easy, since that address tends to change during compilation: whenever the collection needs to be resized, its memory is reallocated, which may cause the address of the literals collection to change³. To avoid having to track every reallocation, literals are tracked during compilation based on the zend_op_array memory address rather than the literals address (the zend_op_array address does remain static, as the object is not dynamically sized).
Because literals collections may be used in multiple contexts during execution (that is, the pointer value gets copied), the tracking at run-time does need to know the address of the literals collection itself. To support this, whenever a zend_op_array object is first used during execution, the instrumentation looks up the associated literals collection and rewrites the tracking table: every entry registered relative to the zend_op_array memory address is changed to its counterpart relative to the literals memory address.
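The re-keying step can be sketched as follows. This is a minimal illustration under assumptions, not the actual instrumentation: the track_entry and track_table types, the fixed table size, the integer source tag and all function names are hypothetical. Entries are keyed on a base address plus a slot index, so switching from the zend_op_array address to the literals address only requires replacing the base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACK_CAPACITY 64

/* Hypothetical tracking entry for one literal. */
typedef struct {
    uintptr_t base;   /* op_array address while compiling, literals address at run time */
    size_t    slot;   /* index of the literal within its collection */
    int       source; /* illustrative tag describing the origin of the value */
} track_entry;

typedef struct {
    track_entry entries[TRACK_CAPACITY];
    size_t      count;
} track_table;

void track_init(track_table *t)
{
    assert(t != NULL);
    t->count = 0;
}

/* Compile time: register a literal relative to the stable op_array address. */
int track_register(track_table *t, const void *op_array, size_t slot, int source)
{
    if (t->count >= TRACK_CAPACITY) {
        return -1; /* table full */
    }
    t->entries[t->count].base = (uintptr_t)op_array;
    t->entries[t->count].slot = slot;
    t->entries[t->count].source = source;
    t->count++;
    return 0;
}

/* First execution: move every entry of this op_array over to the literals
 * address. Returns the number of entries that were re-keyed. */
size_t track_rekey(track_table *t, const void *op_array, const void *literals)
{
    size_t moved = 0;
    for (size_t i = 0; i < t->count; i++) {
        if (t->entries[i].base == (uintptr_t)op_array) {
            t->entries[i].base = (uintptr_t)literals;
            moved++;
        }
    }
    return moved;
}

/* Look up an entry by base address and slot; NULL when nothing is registered. */
const track_entry *track_find(const track_table *t, const void *base, size_t slot)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->entries[i].base == (uintptr_t)base && t->entries[i].slot == slot) {
            return &t->entries[i];
        }
    }
    return NULL;
}
```

After track_rekey has run, a lookup by the literals address succeeds regardless of which context holds a copy of that pointer, while a lookup by the zend_op_array address no longer matches.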
7.2.2 Constants
Constants are global immutable variables that are defined at runtime. They can be assigned using the define() function, and retrieved by using their full name or by using the constant() function. In order to properly track constants in a central location, instrumentation is located in several helper methods. This prevents code duplication between the constant() function implementation and the ZEND_FETCH_CONSTANT opcode for direct referencing, and helps circumvent the constant resolution process (which is not straightforward because PHP allows for case-insensitive constant names)⁴. Note that there is also a ZEND_DECLARE_CONST opcode that is used for the const keyword, and overlaps with the define() function in functionality.
The instrumentation itself is not complicated: it simply treats constants as globally scoped variables. The input value used for the define() call is a data source for the constant, and the supplied name is considered a name source. When a constant does not resolve, PHP falls back to using the constant name as a string, which is then considered a compile-time constant by the instrumentation.
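The bookkeeping described in this section can be sketched in a few lines. This is a simplified model under assumptions, not the instrumented helper methods themselves: the source tags, the tracked_const and const_table types and the function names are hypothetical, and the lookup is a linear scan rather than PHP's actual resolution process. Names are stored by pointer, so the caller must keep them alive.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define CONST_CAPACITY 32

/* Hypothetical source tags. */
enum { SRC_COMPILE_TIME = 0, SRC_USER_INPUT = 1, SRC_SERVER = 2 };

typedef struct {
    const char *name;
    int         case_insensitive; /* constant was defined as case-insensitive */
    int         data_source;      /* origin of the value passed to define() */
    int         name_source;      /* origin of the name passed to define() */
} tracked_const;

typedef struct {
    tracked_const items[CONST_CAPACITY];
    size_t        count;
} const_table;

void const_table_init(const_table *t)
{
    assert(t != NULL);
    t->count = 0;
}

/* Case-insensitive comparison written out to stay within standard C. */
static int equal_ignoring_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == *b;
}

/* Model of define(): record both the data source and the name source. */
int const_define(const_table *t, const char *name, int case_insensitive,
                 int data_source, int name_source)
{
    if (t->count >= CONST_CAPACITY) {
        return -1;
    }
    t->items[t->count].name = name;
    t->items[t->count].case_insensitive = case_insensitive;
    t->items[t->count].data_source = data_source;
    t->items[t->count].name_source = name_source;
    t->count++;
    return 0;
}

/* Data source observed when a constant is read. An unresolved name falls
 * back to the name as a string, which counts as a compile-time constant. */
int const_read_source(const const_table *t, const char *name)
{
    for (size_t i = 0; i < t->count; i++) {
        const tracked_const *c = &t->items[i];
        if (strcmp(c->name, name) == 0 ||
            (c->case_insensitive && equal_ignoring_case(c->name, name))) {
            return c->data_source;
        }
    }
    return SRC_COMPILE_TIME;
}
```

Routing both the function-based and the opcode-based read through one helper such as const_read_source is what keeps the tracking logic in a single place.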
7.2.3 Arrays
Arrays can essentially be considered recursive variables, since they can contain any number of other variables. Programmers can modify an array in full, as well as modify specific elements. These properties mean that it makes sense to both track arrays as a whole and track each individual array element. This allows more fine-grained control and tracking of the contents of the variable, and prevents using arrays as value intermediaries to hide the origins of data from the tracking instrumentation.
As already noted in the beginning of this chapter, the tracking entry structure includes a list of member elements that are in turn other entries. Since the key of an array element can also be sourced from a variable, this includes tracking name sources for array elements during creation and modification of values in particular member slots.
Whenever a data influence is added to the array as a whole, it should also be added to each individual member. This is in fact an extension of the snapshotting and flattening logic: it makes it possible to determine the data sources of a particular member at any point in time, without needing to resolve any relationships the entry has with other entries aside from the first-order data influence relationships.
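The propagation rule for arrays can be sketched as a recursive walk over the member list of a tracking entry. The entry layout, the fixed capacities and the integer source identifiers below are hypothetical, and the sketch assumes the member structure is a tree (no entry is its own descendant), which is why the recursion carries no cycle check.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SOURCES 8
#define MAX_MEMBERS 8

/* Hypothetical tracking entry: first-order data influences plus members. */
typedef struct entry {
    int           sources[MAX_SOURCES];
    size_t        nsources;
    struct entry *members[MAX_MEMBERS]; /* entries for the array elements */
    size_t        nmembers;
} entry;

void entry_init(entry *e)
{
    assert(e != NULL);
    e->nsources = 0;
    e->nmembers = 0;
}

int entry_has_source(const entry *e, int source)
{
    for (size_t i = 0; i < e->nsources; i++) {
        if (e->sources[i] == source) {
            return 1;
        }
    }
    return 0;
}

/* Attach a member entry (an array element) to its containing array. */
int entry_add_member(entry *array, entry *member)
{
    if (array->nmembers >= MAX_MEMBERS) {
        return -1;
    }
    array->members[array->nmembers++] = member;
    return 0;
}

/* Add a data influence to an entry and, recursively, to every member.
 * Influences that do not fit in the fixed-size list are dropped here. */
void entry_add_source(entry *e, int source)
{
    assert(e != NULL);
    if (!entry_has_source(e, source) && e->nsources < MAX_SOURCES) {
        e->sources[e->nsources++] = source;
    }
    for (size_t i = 0; i < e->nmembers; i++) {
        entry_add_source(e->members[i], source);
    }
}
```

Because every member carries its own copy of the influence, answering "where did this element come from" only requires reading that member's source list. Influences added to a single member stay local to it in this sketch.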
7.2.4 Objects
Objects are essentially used for partitioning and scoping variables and functions, and controlling access to them where appropriate. The functions (i.e. class methods) are simply a specific case of user functions, with an additional set of private scopes they have access to (private instance and class properties) and several special properties (such as the $this variable). An object's properties are managed in a way similar to arrays: a containing variable managing any number of other variables. The instrumentation of these is thus roughly the same.
A key difference with arrays is that properties can be defined and initialized both on the class (statically) and on the instance (dynamically). When a new object is created, all definitions and values of class properties are copied over to the instance. On top of that, an instance still has access to static class properties. Instrumentation of this means equipping all member access functions and object method call wrappers with logic that follows the complicated flows of data used in the lifecycles of both classes and instances.
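The difference between copied instance properties and shared static properties can be sketched as follows. This is a deliberately small model under assumptions: the tracked_prop, class_def and instance types, the slot-based addressing and the integer source tags are all hypothetical and ignore visibility, inheritance and method calls.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROPS 8

/* Hypothetical property record: a value plus its tracking tag. */
typedef struct {
    long value;
    int  source;
} tracked_prop;

typedef struct {
    tracked_prop defaults[MAX_PROPS]; /* instance properties declared on the class */
    size_t       ndefaults;
    tracked_prop statics[MAX_PROPS];  /* static properties, shared by all instances */
    size_t       nstatics;
} class_def;

typedef struct {
    class_def   *cls;
    tracked_prop props[MAX_PROPS];    /* per-instance copies */
    size_t       nprops;
} instance;

void class_init(class_def *cls)
{
    assert(cls != NULL);
    cls->ndefaults = 0;
    cls->nstatics = 0;
}

/* Both helpers return the slot of the new property, or -1 when full. */
int class_add_default(class_def *cls, long value, int source)
{
    if (cls->ndefaults >= MAX_PROPS) {
        return -1;
    }
    cls->defaults[cls->ndefaults].value = value;
    cls->defaults[cls->ndefaults].source = source;
    return (int)cls->ndefaults++;
}

int class_add_static(class_def *cls, long value, int source)
{
    if (cls->nstatics >= MAX_PROPS) {
        return -1;
    }
    cls->statics[cls->nstatics].value = value;
    cls->statics[cls->nstatics].source = source;
    return (int)cls->nstatics++;
}

/* Object creation: values and tracking tags are copied to the instance. */
void instance_create(instance *obj, class_def *cls)
{
    assert(obj != NULL && cls != NULL);
    obj->cls = cls;
    obj->nprops = cls->ndefaults;
    for (size_t i = 0; i < cls->ndefaults; i++) {
        obj->props[i] = cls->defaults[i];
    }
}

/* Static properties are reached through the class, not copied. */
tracked_prop *instance_static(instance *obj, size_t slot)
{
    return slot < obj->cls->nstatics ? &obj->cls->statics[slot] : NULL;
}
```

A change to one instance's copied property leaves the class defaults and other instances untouched, whereas a change made through instance_static is visible to every instance of the class. The tracking has to follow both paths.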
7.2.5 External input
For all auto globals (see section 5.2.13), the contained values are annotated with the appropriate fundamental type. Although in previous sections all of these have been lumped together as ‘external input’, it makes sense to add an additional distinction between user-controlled input and server-controlled external input. The $_SESSION, $_SERVER and $_ENV auto globals are not user-controlled while the others all are, although it should be noted that $argv and $argc are only user-controlled in a CLI-initiated environment.
Note that all auto globals except $argc are arrays, for which both the array itself and each member element are annotated with the appropriate fundamental type. This means that if a script adds a member to e.g. the $_GET array that does not originate from user input, that member will not be annotated as user input (although its containing array will be).
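The distinction between user-controlled and server-controlled input can be expressed as a small classification function. The sketch below rests on assumptions: the enum and function name are hypothetical, names are written without the leading `$`, the list of user-controlled auto globals is assumed since section 5.2.13 is not reproduced here, and treating $argv and $argc as server-controlled outside a CLI environment is one possible reading of the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical annotation for the origin of an auto global. */
typedef enum { INPUT_NONE, INPUT_USER, INPUT_SERVER } input_origin;

/* Classify an auto global by name. is_cli is nonzero when the script was
 * started from the command line. */
input_origin classify_auto_global(const char *name, int is_cli)
{
    /* Named in the text as not user-controlled. */
    static const char *const server_controlled[] = { "_SESSION", "_SERVER", "_ENV" };
    /* Assumed list of the remaining, user-controlled auto globals. */
    static const char *const user_controlled[] = {
        "_GET", "_POST", "_COOKIE", "_FILES", "_REQUEST"
    };
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof server_controlled / sizeof server_controlled[0]; i++) {
        if (strcmp(name, server_controlled[i]) == 0) {
            return INPUT_SERVER;
        }
    }
    for (i = 0; i < sizeof user_controlled / sizeof user_controlled[0]; i++) {
        if (strcmp(name, user_controlled[i]) == 0) {
            return INPUT_USER;
        }
    }
    if (strcmp(name, "argv") == 0 || strcmp(name, "argc") == 0) {
        /* Only user-controlled when the script is CLI-initiated. */
        return is_cli ? INPUT_USER : INPUT_SERVER;
    }
    return INPUT_NONE; /* not an external-input auto global */
}
```

In the scheme described above, this classification would be applied once to the array and to each member present at startup; members added later by the script keep whatever annotation their own value carries.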
7.3 Language constructs
In this section, we look at specific flows of opcodes that need to be instrumented to enable tracking of certain information. Make sure to refer to the examples provided in the previous chapter of how certain PHP scripts are compiled and handled internally.