Object Instance Profiling


Lubomír Bulej¹,², Lukáš Marek¹, Petr Tůma¹

Technical report No. 2009/7, November 2009

Version 1.0, November 2009

¹ Distributed Systems Research Group, Department of Software Engineering, Faculty of Mathematics and Physics, Charles University

Malostranské nám. 25, 118 00 Prague, Czech Republic, phone +420-221914267, fax +420-221914323

² Institute of Computer Science, Academy of Sciences of the Czech Republic, Pod Vodárenskou věží 2, 182 07 Prague, Czech Republic

phone +420-266053831

Abstract. Existing Java profilers mostly use one of two distinct profiling methods, sampling and instrumentation. Sampling is less informative than instrumentation, but its overall overhead can be small. Instrumentation is more informative than sampling, since it intercepts every entry to and exit from the measured code, but its overhead is large. In this paper, we propose a method that collects profiling information associated with a specific object instance, rather than with a specific code location. Our method, called object instance profiling, can collect contextual information similarly to other instrumentation methods, but can be used more selectively and therefore with lower overhead.

1 Introduction

Profiling is a general method for determining application performance characteristics, which is based on collecting various performance data during application execution. The results of profiling typically allow identifying performance critical parts of an application, which are then prime candidates for developer attention when improving performance or addressing a performance issue.

Profiling techniques come in two basic flavors. One of them is sampling based profiling, which relies on periodic sampling³ of various data during application execution. At the very least, the collected data includes the value of the processor instruction pointer register, but other data, such as values of processor and operating system performance counters, can be collected as well. The sampled values of the instruction pointer are then mapped to individual methods and modules of the profiled application to determine their contribution to the total execution time. Sampling based profiling is generally applicable to native code applications, because the profiled application is executed without any modification of its code. For managed execution environments such as Java or the CLR, profiling must be supported by the virtual machine to obtain performance data related to the application code and not to the virtual machine itself. Naturally, profiling incurs overhead in the application execution, which largely depends on the sampling period and on the amount and kind of collected performance data.

³ The sampling is driven by interrupts from a timer or a processor performance counter.

The sampling period directly influences the precision of the results and can be configured to limit the profiling overhead.

The other technique is instrumentation based profiling, which relies on instrumenting the original application with special code to collect performance data upon occurrence of significant events in application execution. Depending on the placement of the instrumentation code in the application, the events can cover a wide range of situations, including method entry and exit, performing I/O operations, etc. Due to the ability to relate performance data to particular application events, instrumentation based profiling can provide more accurate and more specific information on application performance, but requires specialized tools to perform application instrumentation for a particular execution environment. The overhead of instrumentation based profiling depends on the number of instrumented places in the application as well as on the kind of performance data collected. To avoid excessive overhead, the instrumentation can be performed selectively or include support for activation and passivation of the instrumentation code at runtime.
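To make the notion of entry and exit instrumentation concrete, the following is a minimal hand-written sketch of what instrumented code might look like. The names CodeProfile and Parser are hypothetical, and a real tool would insert the hooks automatically rather than by hand.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical collector: accumulates call counts and elapsed time per code location.
final class CodeProfile {
    static final Map<String, LongAdder> calls = new ConcurrentHashMap<>();
    static final Map<String, LongAdder> nanos = new ConcurrentHashMap<>();

    // Hook placed at method entry; returns the entry timestamp.
    static long enter(String method) {
        calls.computeIfAbsent(method, k -> new LongAdder()).increment();
        return System.nanoTime();
    }

    // Hook placed at method exit; adds the elapsed time to the method total.
    static void exit(String method, long start) {
        nanos.computeIfAbsent(method, k -> new LongAdder()).add(System.nanoTime() - start);
    }

    static long callCount(String method) {
        LongAdder adder = calls.get(method);
        return adder == null ? 0 : adder.sum();
    }
}

// Hypothetical application class after instrumentation of one method.
final class Parser {
    int parse(String text) {
        long start = CodeProfile.enter("Parser.parse");
        try {
            return text.trim().length();   // original method body
        } finally {
            CodeProfile.exit("Parser.parse", start);
        }
    }
}
```

Note that the data is keyed by code location only, so all instances of Parser contribute to the same record.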

As for the kind of performance data collected during profiling, besides various performance counters, both profiling techniques often collect stack traces to provide more detailed information on the execution context. Using stack traces allows distinguishing among invocations of the same method by different callers, thus enabling, e.g., identification of code paths responsible for turning particular methods into hot spots. However, obtaining a stack trace is a time consuming operation, which significantly increases the profiling overhead. While the profiling overhead might be acceptable during development, it makes it undesirable to include support for instrumentation based profiling in production applications to implement runtime monitoring. For this purpose, we need to limit the use of stack traces to a minimum and employ different techniques for obtaining more information on the execution context.

To address the issue, we propose an instrumentation based profiling technique, referred to as object instance profiling, that uses object instances instead of stack traces to obtain information on the execution context. The technique is not a replacement for stack traces, because it can only be applied to instance methods and the provided context information is not equivalent to a call stack.

But in many cases, the instance based context information is sufficient, and the technique can also be used to collect instance-specific performance data, which is difficult with stack traces. We describe the principles of object instance profiling in Section 2 and highlight the differences between stack trace and instance based context information. We provide an overview of different implementation options in Section 3 and discuss related work in Section 4. Finally, we discuss future work and conclude the paper in Section 5.

2 Instance based context information

The results of commonly used profiling techniques, hereafter referred to as code profiling, are typically mapped back to application code with the granularity of methods, classes, and packages or name spaces. If stack traces are collected during profiling, the results are parametrized by the context (call stack) in which the code executed. The same code executed in different contexts may exhibit different performance, since different callers may use different instances containing different data. If a class only has a single instance, the performance will be the same in all contexts, but we may still be able to distill valuable information from the context-specific profiles.

The proposed method, hereafter referred to as object instance profiling, does not provide explicit caller context, but allows direct observation of instance-specific code performance. The caller context provided by object instance profiling is indirect, tied to object instances. Naturally, if a class only has a single instance, the observed performance will not differ from that obtained through classic code profiling, and we will also be unable to distinguish among different caller contexts, because callers sharing an object instance will appear to be in the same caller context.

If a class has multiple instances, object instance profiling requires each instance to have a unique identity, which must be mapped back to the code using the particular instance. The difficulty of this task varies with the kind of application and classes it is applied to. In case of component based applications, the difficulty may be lower, because such applications tend to have a static architectural backbone and typically require component instances to be named. On the other hand, the difficulty will be higher for general applications with a less explicit architecture, or for library classes used only temporarily in method scope.

Compared to classic code profiling with stack traces, the object instance profiling approach brings two potential advantages. The first is the reduced overhead of obtaining caller context. Even though the instance based context is not a direct substitute for the call stack, it may be sufficient in many cases and, unlike the call stack, does not need to be constructed on every method invocation. The second advantage is the already mentioned ability to collect instance-specific performance data on class methods. To demonstrate the advantages of the approach, consider the following two scenarios.

In one scenario, a method uses an abstract class or an interface to work with different types of instances. In case of classic instrumentation based code profiling, the instrumentation code of a method will associate performance data only with the class where the method was defined. If a method calls an overridden superclass method, the performance data for the superclass method will also be collected and associated with the superclass, increasing the profiling overhead and including it in the performance data of the outer method. Using object instance profiling, performance data will only be collected in the first outer method called on an instance, without duplicating the effort when calling overridden superclass methods.

In another scenario, a method calls a particular method on multiple instances of the same class, each holding different data. If the execution path of the called method depends on the data, classic stack trace based code profiling will aggregate (different) performance data from all the instances, providing meaningless results. To obtain better results, such cases need to be identified and instrumented by hand [1], whereas the object instance based approach will provide instance-specific performance data that can be parametrized by instance content.⁴
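The second scenario can be illustrated with a small sketch. The class Registry and its fields are hypothetical, and the number of loop iterations is used as a deterministic stand-in for execution time. The per-class total mixes the cost of a small and a large instance, while the per-instance totals keep them apart.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class whose method cost depends on the data held by the instance.
final class Registry {
    // Per-class total (what code profiling would report) and per-instance totals.
    static long classSteps = 0;
    static final Map<String, Long> instanceSteps = new HashMap<>();

    private final String name;   // assumed unique instance identity
    private final int[] keys;

    Registry(String name, int size) {
        this.name = name;
        this.keys = new int[size];
        for (int i = 0; i < size; i++) {
            keys[i] = i;
        }
    }

    // Linear search; the iteration count stands in for measured time.
    boolean contains(int key) {
        long steps = 0;
        boolean found = false;
        for (int k : keys) {
            steps++;
            if (k == key) {
                found = true;
                break;
            }
        }
        classSteps += steps;                         // aggregated over all instances
        instanceSteps.merge(name, steps, Long::sum); // kept separately per instance
        return found;
    }
}
```

Searching for a missing key in a 10-element and a 1000-element instance yields a class-level total of 1010 steps, which describes neither instance well, whereas the instance-level records show 10 and 1000 steps respectively.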

On the other hand, stack trace based context information is available for any method, not just an instance method. The stack trace provides information on linkage among classes, which is impossible to obtain when using instance based context information. Moreover, tracking context of an instance that is used throughout the whole application is very difficult, if not impossible. That, however, can be remedied by adding stack trace context information to the object instance profiling technique. If used selectively, it can mitigate the drawbacks of the technique in special cases while keeping the overhead low in the most common cases.

3 Implementation options

Due to its nature, object instance profiling is best implemented as an instrumentation based profiling technique, which requires modifying the application code so that it collects performance data. There are many ways to instrument an application [2] and we do not intend to describe them here in detail. Even though object instance profiling is a general technique, we are mostly interested in its implementation in a managed execution environment. Best suited for such an environment are source code and byte code instrumentation techniques.

There are several ways to implement object instance profiling. Since the work on implementing object instance profiling is still in progress, we will briefly describe each of the implementation options and only explain the basic idea without going into much detail. The common goal in all cases is to wrap the original instance and provide it with a unique name that would be stable between multiple invocations and allow tracing the use of a particular instance back to code.

3.1 Proxy objects

The most straightforward method to implement object instance profiling is to use proxy objects delegating method calls to the original instances. A proxy object will provide the same interface as the original object, but besides calling the original method on the target object to handle the call, the code of the proxy object will collect performance data (or trigger its collection).

⁴ Assuming the content does not change during the instance lifetime.

Depending on the link to the original method code, a proxy can be delegation based or inheritance based. In both cases, the implementation type of the proxy object differs from that of the original, even though it implements the same interface. A delegation based proxy object is typically a separate object instance that keeps an explicit reference to the target object and uses it to invoke the original methods. An inheritance based proxy object extends the class of the original object and overrides all methods that have to be instrumented. Due to inheritance, the proxy and the original share the same object instance. The main advantage of using proxy objects is precise control over instrumented instances: only the wrapped instances are instrumented and collect performance data. But there are several issues associated with proxy objects that need to be addressed in a particular context.
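As an illustration of the delegation based variant, the following minimal sketch uses the standard java.lang.reflect.Proxy facility. The interface Store, the class MapStore, and the handler ProfilingHandler are hypothetical names introduced for the example; this is one possible realization, not the implementation used in our prototype.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

interface Store {                          // hypothetical application interface
    String get(String key);
}

final class MapStore implements Store {    // hypothetical target; may be final
    private final Map<String, String> data = new HashMap<>();

    MapStore put(String key, String value) {
        data.put(key, value);
        return this;
    }

    @Override
    public String get(String key) {
        return data.get(key);
    }
}

// Delegation based proxy: a separate object holding a reference to the target.
final class ProfilingHandler implements InvocationHandler {
    private final Object target;
    final String instanceName;             // identity the data is associated with
    long calls;
    long nanos;

    ProfilingHandler(Object target, String instanceName) {
        this.target = target;
        this.instanceName = instanceName;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        long start = System.nanoTime();
        try {
            return method.invoke(target, args);   // delegate to the original instance
        } catch (InvocationTargetException e) {
            throw e.getCause();                   // rethrow the target's own exception
        } finally {
            calls++;
            nanos += System.nanoTime() - start;
        }
    }

    // Creates the proxy object that callers should use instead of the target.
    Store asStore() {
        return (Store) Proxy.newProxyInstance(
                Store.class.getClassLoader(), new Class<?>[] { Store.class }, this);
    }
}
```

Only calls made through the object returned by asStore() are observed; the unwrapped MapStore instance remains uninstrumented.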

The delegation based proxy objects can be applied to basically any object instance, including instances of final classes. However, the most pressing issue is related to object identity. Since there are two objects, there are two object references and we must ensure that only the reference to the proxy object is used by other code. If the target object leaks a reference to itself to other parties as a part of a method argument, return value, or global data, or if it calls its own public methods, the proxy object will be bypassed. The proxy object can include special code to replace the target object reference in return values, but little can be done for the other situations in which references are leaked. The inheritance based proxy objects elegantly sidestep the identity issue, because there is only a single object and therefore a single reference. However, the inheritance based approach cannot be easily applied to final classes.
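The identity issue can be demonstrated with a deliberately naive delegation based proxy. The types Counter, SimpleCounter, and CountingProxy are hypothetical; the target leaks a reference to itself through a return value and also calls its own public method.

```java
interface Counter {
    Counter increment();     // fluent style: returns a reference to the receiver
    void incrementTwice();   // implemented by calling increment() on itself
    int value();
}

final class SimpleCounter implements Counter {
    private int value;

    @Override
    public Counter increment() {
        value++;
        return this;         // leaks a reference to the target object
    }

    @Override
    public void incrementTwice() {
        increment();         // self-calls never reach the proxy
        increment();
    }

    @Override
    public int value() {
        return value;
    }
}

// Naive delegation based proxy that does not guard against leaked references.
final class CountingProxy implements Counter {
    private final Counter target;
    int observedCalls;

    CountingProxy(Counter target) {
        this.target = target;
    }

    @Override
    public Counter increment() {
        observedCalls++;
        return target.increment();   // passes the leaked target reference on
    }

    @Override
    public void incrementTwice() {
        observedCalls++;
        target.incrementTwice();
    }

    @Override
    public int value() {
        observedCalls++;
        return target.value();
    }
}
```

A caller that keeps the result of increment() holds the target rather than the proxy, so subsequent calls through that reference go unobserved. Returning the proxy itself whenever the target returns its own reference would repair this particular case, but not the self-calls inside incrementTwice().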

An additional issue plaguing the delegation based proxy objects is that the type of the proxy object is different from that of the target object. Therefore the types of all class members, method arguments and local variables intended to hold a reference to the target object must be changed to that of the proxy object, unless interfaces are used. This is not necessary for the inheritance based approach, because the proxy object class will be a subtype of the target object class.
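For comparison, an inheritance based proxy might be sketched as follows. The classes Account and ProfiledAccount are hypothetical. Because the proxy class is a subtype of the target class, existing variable types need not change, and self-calls are dispatched to the overriding method.

```java
// Hypothetical non-final target class.
class Account {
    private long balance;

    public void deposit(long amount) {
        balance += amount;
    }

    // Calls its own public method; virtual dispatch selects the override, if any.
    public void depositAll(long... amounts) {
        for (long amount : amounts) {
            deposit(amount);
        }
    }

    public long balance() {
        return balance;
    }
}

// Inheritance based proxy: proxy and original share a single object instance.
class ProfiledAccount extends Account {
    long depositCalls;

    @Override
    public void deposit(long amount) {
        depositCalls++;          // collect (or trigger collection of) performance data
        super.deposit(amount);   // run the original code
    }
}
```

An instance of ProfiledAccount can be stored in a variable of type Account, and the three deposits performed inside depositAll(1, 2, 3) are all observed.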

Another issue is that to use the proxy object approach, we need to control instance creation. Instances are usually created by using the new operator, which seems easy enough to intercept, but this is not necessarily true for larger projects. In projects using Spring or EJB, object instances are wired together using dependency injection and the instances are created using reflection, based on class names contained in external configuration files. Controlling instance creation in case of Spring or EJB therefore requires special support tailored to the particular framework.
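The following sketch shows why intercepting the new operator is not sufficient when instances are created reflectively from class names, and where a framework specific hook would have to be placed. The class InstanceFactory is a hypothetical stand-in for a dependency injection container and does not use any Spring or EJB API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a container that creates instances from configured names.
final class InstanceFactory {
    // Records which named instance was created from which class.
    static final List<String> registry = new ArrayList<>();

    static Object create(String className, String instanceName) {
        try {
            // No `new ClassName()` expression appears in the application code.
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            registry.add(instanceName + " -> " + className);
            // A profiling-aware container would wrap `instance` in a proxy here.
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot create " + className, e);
        }
    }
}
```

Since the container already knows the configured instance name at this point, the same hook could also supply the unique instance identity required by the technique.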

3.2 Proxy methods

An alternative method to implement object instance profiling is to wrap methods directly in the instrumented classes. The original methods will be renamed and called from the instrumentation code placed in methods using the original names.


This approach preserves object type and identity, and thus avoids most of the issues associated with using proxy objects. On the other hand, it does not provide precise control over instance instrumentation – all instances are instrumented and the instrumentation code must determine at runtime whether to collect performance data for a particular instance. When instrumenting a selected class, we can either modify only the selected class, or the selected class and all the classes along the path to the root of the inheritance hierarchy.

Single class modification. When instrumenting a single class only, all public or protected methods of the class have to be instrumented, i.e., including methods inherited from parent classes. If a class defines a method, it must be renamed and wrapped by the instrumentation code. If a class inherits a method from a superclass, the instrumentation code has to explicitly call the superclass method. As a result, all public or protected method calls on the instrumented class instances go through the instrumentation stub and trigger collection of instance-specific performance data.
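Written out by hand, the result of single class modification might look as in the following sketch. The classes Shape and Circle, the `$orig` naming scheme, and the observed counter are assumptions made for the example; an actual tool would produce the equivalent at the source or byte code level.

```java
// Unmodified parent class.
class Shape {
    public String describe() {
        return "shape";
    }
}

// Instrumented class: every public method goes through a stub.
class Circle extends Shape {
    long observed;               // hypothetical per-instance performance record
    private final double radius;

    Circle(double radius) {
        this.radius = radius;
    }

    // Proxy method carrying the original name.
    public double area() {
        observed++;
        return area$orig();
    }

    // Original method body, renamed by the instrumentation.
    private double area$orig() {
        return Math.PI * radius * radius;
    }

    // Inherited method: the stub explicitly calls the superclass implementation.
    @Override
    public String describe() {
        observed++;
        return super.describe();
    }
}
```

Object type and identity are preserved, since callers keep using the same Circle instance under the same method names.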

Even though only a single class is modified, the instrumentation is not limited to that particular class. While parent classes are unaffected by the instrumentation, any classes derived from the instrumented class will be affected, sometimes only partially so, when the child class overrides a virtual method. This is an undesired side effect of inheritance and to suppress it, the instrumentation code has to check the instance type at runtime and collect performance data only for instances of the instrumented class. This can be done efficiently by performing the check only once during instance creation and storing the result in a boolean variable that can be consulted by the instrumentation code on each method invocation.
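The one-time type check can be sketched as follows, with hypothetical classes Node and TreeNode. The flag is computed when the instance is created and merely read on each invocation.

```java
// Instrumented class.
class Node {
    // Evaluated once per instance: true only for direct instances of Node.
    private final boolean profiled = (getClass() == Node.class);
    long observed;               // hypothetical per-instance performance record

    // Proxy method carrying the original name.
    public int size() {
        if (profiled) {
            observed++;          // collect data only for the instrumented class
        }
        return size$orig();
    }

    // Original method body, renamed by the instrumentation.
    private int size$orig() {
        return 1;
    }
}

// Derived class: inherits the stub, but its instances are not measured.
class TreeNode extends Node {
}
```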

Instrumenting more classes in the inheritance hierarchy has one drawback, which is the accumulation of the instrumentation code from multiple classes. If an inherited method is defined in an ancestor class further in the class hierarchy, the invocation has to traverse a chain of proxy methods to reach the original code.⁵ Even though only the first proxy method in the chain will collect the performance data, the traversal of the chain of proxy methods will add unnecessary overhead.

Class hierarchy modification. To avoid long chains of proxy methods, we can spread the instrumentation along the class inheritance hierarchy. The idea is to create proxy methods only in classes which actually define the original methods, i.e., if a class inherits (but does not override) a method from an ancestor, the proxy method is created only in the ancestor. This helps to prune the chain of proxy methods, but we still need to check the instance type at runtime to determine whether to collect performance data. Moreover, while in the previous case the instrumentation code only needed to check the value of a boolean variable, in this case the value may need to be modified to disable performance data collection when calling an overridden method in the superclass. This is necessary to avoid collecting performance data twice when a class overrides a virtual method and also calls the overridden method.

⁵ Note that this is only true for some languages, e.g., Java. C++ makes it possible to call the original method directly from each proxy method.
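The handling of the flag in the class hierarchy modification might look roughly as follows. The classes Base and Derived and the field names are hypothetical, and the sketch ignores concurrent calls on the same instance, which would require a per-thread flag.

```java
class Base {
    boolean collect = true;      // per-instance switch, normally decided at creation
    long observed;               // hypothetical per-instance performance record

    // Proxy method in the class that defines the original method.
    public int work() {
        if (!collect) {
            return work$orig();  // nested call: data is already being collected
        }
        collect = false;         // disable collection for nested proxy methods
        try {
            observed++;
            return work$orig();
        } finally {
            collect = true;
        }
    }

    private int work$orig() {
        return 1;
    }
}

class Derived extends Base {
    // Proxy method for the overriding definition.
    @Override
    public int work() {
        if (!collect) {
            return work$derived();
        }
        collect = false;
        try {
            observed++;
            return work$derived();
        } finally {
            collect = true;
        }
    }

    // Renamed original body; it calls the overridden superclass method.
    private int work$derived() {
        return super.work() + 1;
    }
}
```

A call to work() on a Derived instance passes through both proxy methods, but only the outer one records data.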

3.3 Instance identification

In the discussion of instrumentation so far, we have assumed that performance data will be collected for all instances of the instrumented class. To allow collecting performance data from a specific instance, we need to be able to identify the instances at runtime. The most straightforward solution is to require the instances to provide a name, either in a field or as a result of a method invocation. The value should be immutable, even though the identification will probably be done only once, when the instance is created.
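Name based identification can be sketched as follows. The interface NamedInstance, the class Channel, and the set of selected names are hypothetical; the point is that the name is immutable and the selection is decided once, at creation.

```java
import java.util.Set;

// Hypothetical contract through which an instance exposes its name.
interface NamedInstance {
    String instanceName();
}

final class Channel implements NamedInstance {
    private final String name;        // immutable instance identity
    private final boolean profiled;   // decided once, when the instance is created
    long observed;                    // hypothetical per-instance performance record

    Channel(String name, Set<String> profiledNames) {
        this.name = name;
        this.profiled = profiledNames.contains(name);
    }

    @Override
    public String instanceName() {
        return name;
    }

    public void send(String message) {
        if (profiled) {
            observed++;               // collect data for selected instances only
        }
        // original method body would follow here
    }
}
```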

Another solution is to identify the instance by execution context in which the instance is created. The execution context can be obtained from a stack trace – in our case, this does not incur a significant profiling overhead since it is only required once during object creation. However, this can lead to the same problems as with the proxy object interceptions. Multiple instances can be created at the same place in the code, especially when the instantiation can be driven from outside the application.
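Identification by creation context can be sketched with the standard Throwable.getStackTrace() call. The classes Tracked and Sites are hypothetical, and the sketch assumes that stack trace information is available in the virtual machine. It also shows the limitation mentioned above: instances created at the same place in the code receive the same identity.

```java
final class Tracked {
    final String creationSite;

    Tracked() {
        // Frame 0 is this constructor; frame 1 is the code that executed `new Tracked()`.
        StackTraceElement[] trace = new Throwable().getStackTrace();
        if (trace.length > 1) {
            StackTraceElement caller = trace[1];
            creationSite = caller.getClassName() + "." + caller.getMethodName()
                    + ":" + caller.getLineNumber();
        } else {
            creationSite = "unknown";   // stack trace information not available
        }
    }
}

final class Sites {
    // All instances created here share a single creation site.
    static Tracked[] inLoop(int count) {
        Tracked[] result = new Tracked[count];
        for (int i = 0; i < count; i++) {
            result[i] = new Tracked();
        }
        return result;
    }

    // A different method, hence a different creation site.
    static Tracked elsewhere() {
        return new Tracked();
    }
}
```

The stack trace is obtained only once per instance, in the constructor, so the cost is not paid on every method invocation.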

4 Related work

In principle, our work could be related to many profiler projects, such as HProf [3], JFluid [4], JIP [5], JProfiler [6], JProbe [7], YourKit [8], and others. To the best of our knowledge, none of the profiler projects implements object instance profiling – that, however, is probably not the most important observation to be made.

A more interesting observation is that the overhead of managed language profilers is not yet comparable with that of native profilers – while OProfile [9] claims a typical overhead of 1-3 %, the JIP authors claim their profiler is extremely fast with 100 % overhead, and the overhead of HProf easily rises to the range of 500-1000 %. We believe this is one place where object instance profiling can bring an improvement. Although the profiling code is unlikely to have a significantly different overhead per observation, object instance profiling should incur this overhead less frequently than code profiling.

Another topic our work has in common with many other profiler projects is the need for instrumentation. Although this need has been around for a long time, satisfactory instrumentation methods are still not available – tools such as JVMTI [10] require bytecode manipulation, which is error prone and known to fail on some library classes, and frameworks such as AspectJ [11] exhibit similar problems on a higher level. Still, we believe we will be able to extend frameworks like AspectJ or InsECT [12] to intercept object instance invocations, to make them suitable for object instance profiling.


The analysis in Section 3 also points out that, without modifying the virtual machine internals, there might be no single way to intercept object instance invocations that would work in all situations. Unlike other profiling approaches, which (except for missing some library classes and sometimes interfering with reflection) appear to be reasonably complete, an object instance profiling tool would also have to identify situations in which the collected information is unreliable, e.g. because of reference leaking. By providing multiple interception methods, we believe we can make object instance profiling work in most practical situations.

5 Conclusion

We have presented object instance profiling as a profiling method that can, compared with code profiling, achieve lower overhead and still present detailed performance information. The basic idea of the method is to intercept invocations of particular object instances, rather than particular code locations.

The main problem with object instance profiling lies in its implementation – in current managed language environments, there seems to be no standard way to achieve object instance profiling. We work around the problem by designing three different interception methods with different combinations of advantages and drawbacks, aiming for a palette of methods of which some will likely be usable in any particular situation.

Our work is currently in the prototyping stage – the individual interception methods are being assessed in the context of a complex application case study in Java [13].

References

1. Whitehead, N.: Java run-time monitoring, Part 1: Run-time performance and availability monitoring for Java systems; page 26–29. http://www.ibm.com/developerworks/library/j-rtm1/index.html (2008)

2. Shende, S., Malony, A.D.: The TAU parallel performance system. Intl. Journal of High Performance Computing Applications 20(2) (2006)

3. Sun Microsystems, Inc.: Hprof: A heap/cpu profiling tool in J2SE 5.0. http://java.sun.com/developer/technicalArticles/Programming/HPROF.html (2004)

4. Sun Microsystems, Inc.: JFluid. http://research.sun.com/projects/dashboard.php?id=90 (2003)

5. Public development: The Java interactive profiler. http://jiprof.sourceforge.net/ (2005)

6. ej-technologies GmbH: JProfiler. http://www.ej-technologies.com/products/jprofiler/overview.html (2003)

7. Quest Software, Inc.: JProbe. http://www.quest.com/jprobe/ (2003)

8. YourKit, LLC: YourKit. http://www.yourkit.com/ (2003)

9. John Levon, P.E.: OProfile. http://oprofile.sourceforge.net (2003)

10. Sun Microsystems, Inc.: JVM Tool Interface. http://java.sun.com/j2se/1.5.0/docs/guide/jvmti/ (2004)

11. Eclipse Foundation: AspectJ. http://www.eclipse.org/aspectj/ (1998)

12. Anil Chawla, A.O.: A generic instrumentation framework for collecting dynamic information. In: ACM SIGSOFT Software Engineering Notes. (2004)

13. itemis AG: Q-impress enterprise soa showcase. http://www.q-impress.eu/wordpress/software/ (2009)