Hierarchical Performance Modeling for Distributed System Architectures
D. Smarkusky, R. Ammar, I. Antonios and H. Sholl*
Computer Science and Engineering Department, 191 Auditorium Road, Box U-3155
The University of Connecticut, Storrs, CT 06269-3155, USA {debs, reda, imad, hasc}@engr.uconn.edu
* This work is partially funded by the Defense Advanced Research Projects Agency (DARPA), Small Business Innovation Research (SBIR), System Synthesis Environment Phase II in cooperation with InfoPike, Inc.
Abstract
Performance modeling and evaluation techniques are essential when designing and implementing distributed software systems. Constructing performance models for such systems can require significant effort. This paper presents Hierarchical Performance Modeling as a technique to model performance for different layers of abstraction. Once the system architecture and software functionality have been specified, this model supports performance model generation for the evaluation and analysis of computation delays of software processes, communication delays of distributed software architectures, and hardware platform alternatives. A simplified example is presented to illustrate the concepts of the Hierarchical Performance Model.
Index Terms: Software performance engineering, software performance modeling, software design, queueing network models, computation structure models.
1. Introduction
Distributed systems are expected to meet their performance requirements from the time they are first developed, so a performance modeling technique is needed to evaluate the communication and computation delays that result from a distributed system architecture and from executing software on different hardware platforms. Performance modeling and evaluation techniques are essential when designing and implementing software systems in application domains such as air-traffic control, e-commerce, medical systems, high-speed communications, and other real-time distributed systems. Unfortunately, many systems fail to satisfy their performance requirements when they are initially developed [15].
In the absence of performance evaluation techniques, engineers must design and then implement a system before they can detect critical performance defects. What if the implemented system fails to meet its specified timing constraints? What if a bottleneck is known to exist and the designer modifies the design in the hope of improving, rather than degrading, the system's performance? Waiting until implementation or integration to identify performance defects reduces productivity, increases project costs, causes schedules to slip, and forces redesign of the software, which may inject functional defects into the system. Systems that are designed for performance from the start of the software lifecycle often exhibit better performance than those built with a "fix-it-later" approach [16]. Designers therefore need a methodology that helps them identify software architecture or design modifications that improve performance.
There are three principal methods for computer performance evaluation and analysis: direct measurement, simulation and analytic modeling. The Hierarchical Performance Model (HPM) combines analytic modeling and direct measurement, drawing on software performance engineering [5, 15, 20, 21], software performance modeling constructs [4, 5, 14, 22], hardware and software co-design [18], queueing networks [1, 7, 12, 19], software restructuring [6, 13], time cost evaluation criteria [2, 3, 8-11] and distributed computing concepts, to support the evaluation of hardware platforms and software architectures for the development of conventional distributed real-time software systems.
Related analytic performance modeling techniques are presented in Section 2, followed by a detailed discussion of the Hierarchical Performance Modeling framework in Section 3. Section 4 summarizes the advantages of using HPM for performance model generation and for the evaluation of computation, communication and hardware alternatives, which are critical to the development of distributed software architectures. Section 5 concludes the paper.
2. Related work
Queueing models have been used to model performance of software systems since the early 1970s [15]. Layered queueing models provided a framework in which queueing models were utilized in a client/server fashion to model contention for both hardware and software layers [12]. Angio traces were recently introduced as performance traces that could be generated early in the software development lifecycle to provide a mechanism for combining predictive and empirical performance modeling techniques [5].
Smith defined Software Performance Engineering (SPE) and emphasized that quantitative methods should be used from the start of the software development lifecycle to identify performance defects and to eliminate designs with unacceptable performance, thus reducing implementation and maintenance effort [15]. The criteria for constructing and evaluating software performance models are presented in [16]. Smith's approach consists of a software execution model and a system execution model [15]. Smith and Williams incorporate synchronization nodes and present an advanced model for the performance evaluation of distributed software architectures [17]. Like HPM, these approaches use queueing networks to model contention delays in the software architecture and hardware devices, but they do not generate performance models from primitive operations on different hardware platforms for the evaluation of distributed system architecture alternatives.
Modeling software performance requires numerous parameters for each layer or model. Access and storage of this information within a layer or communication of information between layers is often complex. The Maps, Paths, and Resources framework used a Core model for management and storage of performance information needed by views of the framework [20]. The HPM manages and distributes performance information between levels of our model.
3. Hierarchical performance model
Performance models are abstractions of the functional and performance characteristics of a system that are used to determine whether the system satisfies its performance requirements, given the user demands and the system architecture. The next four subsections discuss the System, Task, Module and Operation Levels of the Hierarchical Performance Model, which is shown in Figure 1. Hierarchical Performance Modeling research concepts are currently being realized in the Hierarchical Performance Modeling System (HPMS), which is being developed jointly by InfoPike, Inc. and the University of Connecticut.
[Figure 1. Hierarchical Performance Model: (a) System Level, Application View and Node View; (b) Task Level, Physical View with computation, communication (Cin, Cout) and interrupts I1 ... In; (c) Module Level, CSM, TCE or source code, including the CSM or TCE of each interrupt handler; (d) Operation Level, time costs of basic primitive operations (+, *, /, …), built-in functions (sqrt, sin, cos, …), and function calls with argument passing. System development proceeds top-down through the levels; performance information flows bottom-up.]
Example. This paper also uses a deliberately simplified example to illustrate the concepts of the Hierarchical Performance Model. The sample application is a distributed e-commerce system for browsing, purchasing and managing the inventory of consumer electronics and personal computer products. Customers may browse on-line catalogs, or purchase products and proceed to accounting for shipment and billing. Suppliers may modify inventory and proceed to accounting for payment.
3.1. System level
The System Level is the highest level of abstraction and represents a logical view of the application. Queueing networks are used to model the behavior of the system (hardware and software) and its connection with the external environment. Our research extends the work of [1, 7] on open and closed queueing models to support one- and two-moment analyses. The System Level is composed of two views, the Application View and the Node View (see Figure 1(a)). The Application View presents a global picture of the software system and represents interactions among software processes. The Node View presents a more detailed view of the queueing properties associated with each software process.
The arrival rates from external sources, software process service rates, message multipliers (the number of messages departing for each message processed), coefficients of variation, number of classes and flow probabilities for each class are the performance parameters to be specified at this level. Links represent the flow of information from one software process to another. The Cin and Cout structures represent the flow probabilities of messages between processes for specified message classes. The queue and the servers represent the combined computation and communication waiting time and service delays, respectively. Multiple servers are represented, one for each message class.
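To illustrate the difference between one- and two-moment analyses, the following Python sketch estimates the mean waiting time of a single queue in two ways. It assumes, as our own illustration rather than the exact HPM equations, that a one-moment analysis treats service times as exponential (an M/M/1 queue) and that a two-moment analysis adds the service-time coefficient of variation through the Pollaczek-Khinchine mean-value formula for an M/G/1 queue. The function names are hypothetical.

```python
def mm1_wait(arrival_rate: float, service_rate: float) -> float:
    """One-moment estimate: mean queueing delay of an M/M/1 queue."""
    rho = arrival_rate / service_rate
    if rho >= 1.0:
        raise ValueError("unstable queue: utilization must be below 1")
    return rho / (service_rate - arrival_rate)


def mg1_wait(arrival_rate: float, service_rate: float, cv: float) -> float:
    """Two-moment estimate: Pollaczek-Khinchine mean wait of an M/G/1 queue.

    cv is the coefficient of variation of the service time; cv = 1
    reproduces the exponential (M/M/1) case.
    """
    rho = arrival_rate / service_rate
    if rho >= 1.0:
        raise ValueError("unstable queue: utilization must be below 1")
    mean_service = 1.0 / service_rate
    return rho / (1.0 - rho) * (1.0 + cv * cv) / 2.0 * mean_service


for cv in (0.0, 1.0, 2.0):
    print(cv, mm1_wait(8.0, 10.0), mg1_wait(8.0, 10.0, cv))
```

With λ = 8 and μ = 10 messages per second (ρ = 0.8), the one-moment estimate is 0.4 s, while the two-moment estimate ranges from 0.2 s for deterministic service (cv = 0) to 1.0 s for highly variable service (cv = 2); with cv = 1 the two estimates coincide.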
Example. The System Level of our distributed e-commerce application is shown in Figure 2. The application contains two message classes: a customer class and a supplier class. Customers may browse the on-line catalog and then exit the system without making a purchase. If a customer makes a purchase, a message is sent to the Accounting process for billing and shipment. Suppliers, in contrast, may not browse the on-line catalog, because it includes competitors' products and pricing, but they may add, modify or delete products in the on-line catalog databases. A supplier therefore may not exit the system until the Accounting process has completed. These scenarios are represented within the model by the specification of flow probabilities.
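The flow probabilities, external arrival rates and message multipliers together determine how many messages of each class reach each process. A minimal sketch, assuming an open network whose per-class arrival rates satisfy the usual traffic equations λi = γi + Σj λj mj pji (γi external arrivals, mj message multiplier, pji flow probability), solves them by fixed-point iteration. All rates and probabilities below are hypothetical values chosen for the e-commerce example, not figures from this paper.

```python
from typing import Dict

Routing = Dict[str, Dict[str, float]]  # routing[src][dst] = flow probability


def solve_traffic(external: Dict[str, float], routing: Routing,
                  multiplier: Dict[str, float], tol: float = 1e-12,
                  max_iter: int = 10_000) -> Dict[str, float]:
    """Per-class arrival rate at each process of an open network:
        lambda_i = gamma_i + sum_j lambda_j * m_j * p_ji
    solved by fixed-point iteration.  `external` (gamma) must list every
    process; destinations not in it (e.g. "Exit") act as sinks."""
    rates = dict(external)
    for _ in range(max_iter):
        new = dict(external)
        for src, dests in routing.items():
            out = rates[src] * multiplier.get(src, 1.0)  # messages leaving src
            for dst, p in dests.items():
                if dst in new:
                    new[dst] += out * p
        if max(abs(new[k] - rates[k]) for k in new) < tol:
            return new
        rates = new
    raise RuntimeError("traffic equations did not converge")


# Hypothetical flow probabilities for the processes of Figure 2.
procs = ["User", "Computers", "Electronics", "Accounting"]
ones = {p: 1.0 for p in procs}  # no message multiplication in this sketch

customer = {
    "User": {"Computers": 0.4, "Electronics": 0.4, "Exit": 0.2},
    "Computers": {"User": 0.5, "Accounting": 0.2, "Exit": 0.3},
    "Electronics": {"User": 0.5, "Accounting": 0.2, "Exit": 0.3},
    "Accounting": {"Exit": 1.0},
}
# Suppliers update inventory and always pass through Accounting before exiting.
supplier = {
    "User": {"Computers": 0.5, "Electronics": 0.5},
    "Computers": {"Accounting": 1.0},
    "Electronics": {"Accounting": 1.0},
    "Accounting": {"Exit": 1.0},
}

cust_ext = {p: 0.0 for p in procs}
cust_ext["User"] = 5.0   # customer messages/s entering at User
supp_ext = {p: 0.0 for p in procs}
supp_ext["User"] = 1.0   # supplier messages/s entering at User

customer_rates = solve_traffic(cust_ext, customer, ones)
supplier_rates = solve_traffic(supp_ext, supplier, ones)
print(customer_rates)
print(supplier_rates)
```

For these values the customer class arrives at User at 25/3 ≈ 8.33 messages/s and at Accounting at 4/3 ≈ 1.33 messages/s, while every supplier message (1 per second) reaches Accounting. Fixed-point iteration is used here only because it is short; a linear solver would be preferable for large networks.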
3.2. Task level
The Task Level represents the physical view of the system under development and uses a two-queue model, with one queue for computation and one for communication. Each Task Level model describes the hardware profile of a single software process from the System Level, as seen by a single processor. Computation, communication and interrupt costs are modeled separately at this level (see Figure 1(b)).
Multiple computation servers are represented, one for each message class. Multiple communication "service centers" are also modeled, one for each physical channel of the allocated processor. Interrupts, computation servers and communication servers each have a corresponding Computation Structure Model (CSM) [14] (defined at the Module Level) that specifies the functionality of the service to be performed. The Cin and Cout structures represent the arriving and departing flow probabilities for message classes on physical communication channels. Creating this level requires that the software process defined at the System Level be allocated to a processor and assigned a processing power (the percentage of the processor dedicated to the task).
[Figure 2. System level (application view): WWW, User, Computers, Electronics and Accounting processes, with Exit nodes.]
The service rates for both computation and communication, coefficients of variation, message sizes, communication rates, interrupt service times, interrupt cycle times, and communication co-processor availability must be specified at this level for a one-moment or two-moment model. Although Task Level performance analysis is currently based on asynchronous point-to-point communications, the communication framework of HPM can be extended to support other communication protocols.
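As a rough illustration of how some of these Task Level parameters combine, the sketch below computes the computation and communication utilizations of one allocated process. It assumes that a process granted processing power p serves messages at p times its dedicated-processor rate and that each departing message occupies a single asynchronous point-to-point channel for message_size / channel_rate seconds; interrupts and co-processor availability are ignored. ClassParams and task_level_utilizations are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ClassParams:
    arrival_rate: float      # class messages/s arriving at the process
    service_rate: float      # messages/s on a dedicated (100%) processor
    message_size: float      # bytes per departing message
    multiplier: float = 1.0  # departing messages per processed message


def task_level_utilizations(classes: List[ClassParams],
                            processing_power: float,
                            channel_rate: float) -> Tuple[float, float]:
    """Return (computation utilization, communication utilization).

    processing_power: fraction (0, 1] of the processor given to the process.
    channel_rate: bytes/s of the single outgoing point-to-point channel.
    """
    comp_util = 0.0
    comm_util = 0.0
    for c in classes:
        effective_rate = processing_power * c.service_rate  # assumed linear scaling
        comp_util += c.arrival_rate / effective_rate
        departures = c.arrival_rate * c.multiplier          # messages/s sent
        comm_util += departures * c.message_size / channel_rate
    return comp_util, comm_util


classes = [
    ClassParams(arrival_rate=3.0, service_rate=20.0, message_size=2000.0),  # customer
    ClassParams(arrival_rate=1.0, service_rate=10.0, message_size=4000.0,
                multiplier=2.0),                                             # supplier
]
comp_util, comm_util = task_level_utilizations(classes, processing_power=0.5,
                                               channel_rate=100_000.0)
print(f"computation {comp_util:.2f}, communication {comm_util:.2f}")
```

With half of the processor allocated, this prints `computation 0.50, communication 0.14`; halving the processing power again would saturate the computation server, which is the kind of allocation question this level is meant to expose.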
Example. The User, Electronics, Computers and Accounting processes are allocated to different processors. The Task Level for the Computers process is shown in Figure 3. The incoming link carries the message flow rate vector [f1a, f2a] from the User process to the Computers process for the customer and supplier classes, respectively. The Cin structure merges the incoming flow rates for each class and builds class-specific messages, which are processed in first-come-first-served order. Message multipliers determine the number of class-specific messages departing from the computation component. Messages arriving at the Cout structure are class-specific and must be partitioned among the outgoing physical channels according to the process-to-channel mapping, represented as Pij. Outgoing message links carry multi-class information for communication with other processes. When the Computers process has completed its processing, customer and supplier messages depart at rates f1j and f2j, respectively. Messages are sent to the User (in response to a request), Accounting (to finalize a purchase), or Exit (the user was only browsing) processes according to the flow probabilities and the process-to-processor mapping.
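The labels in Figure 3 suggest how departing traffic is split. Assuming that class i arrives at rate λi = fia, that its message multiplier is βi, and that Pij is the fraction of class-i output sent on channel j, the outgoing flows are fij = βi λi Pij. The sketch below applies this assumed relation with hypothetical rates; channels 1, 2 and 3 stand for User, Accounting and Exit.

```python
from typing import List


def partition_outgoing(arrivals: List[float], multipliers: List[float],
                       p: List[List[float]]) -> List[List[float]]:
    """Assumed Cout relation f_ij = beta_i * lambda_i * P_ij.

    arrivals[i]    lambda_i, class-i arrival rate at the process (messages/s)
    multipliers[i] beta_i, departing messages per class-i message processed
    p[i][j]        P_ij, fraction of class-i output sent on channel j
    """
    flows = []
    for lam, beta, row in zip(arrivals, multipliers, p):
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of P must sum to 1")
        flows.append([beta * lam * pij for pij in row])
    return flows


# Hypothetical values; channels are (User, Accounting, Exit).
f_in = [3.0, 1.0]            # [f1a, f2a]: customer, supplier
beta = [1.0, 2.0]
P = [[0.5, 0.2, 0.3],        # customers: reply, purchase, or leave
     [0.0, 1.0, 0.0]]        # suppliers must go through Accounting
f_out = partition_outgoing(f_in, beta, P)
print(f_out)                 # rows [f11, f12, f13] and [f21, f22, f23]
```

Validating that each row of P sums to 1 catches mapping errors early, since any missing probability mass would silently lose messages from the model.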
3.3. Module level
The Module Level allows for the specification of software components, procedures and functions, known as Computation Structure Models (CSMs) [14], as shown in Figure 1(c). Software components can be specified via source code file(s), graphical models, or pre-calculated performance equations. These models may be repeated or layered to accurately represent the software structure.
Example. Because our application contains on-line catalogs or databases, a search algorithm is required. Figure 4 shows a sample CSM for a binary search algorithm used within the Computers process for each customer. Each flow value fi may be a discrete value, a random variable or a probability mass function, and corresponds to the number of times the TRUE branch of the associated condition is executed. Performance parameters for the functional components and decision nodes must also be specified.
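The flow values can be made concrete by instrumenting an ordinary binary search. In the sketch below, the mapping of the counters to the CSM nodes is our reading of the TCE given in Section 3.5, not a definition from the paper: f1 counts loop iterations in which TEST is TRUE, f2 counts iterations in which the equality test CMP1 is FALSE, and f3 counts iterations in which CMP2 is TRUE and DEC executes.

```python
from typing import List, Tuple


def binary_search_flows(a: List[int], key: int) -> Tuple[int, Tuple[int, int, int]]:
    """Binary search over sorted list `a` that also counts branch flows.

    Returns (index or -1, (f1, f2, f3)) under the assumed mapping:
      f1: TEST (low <= high) TRUE
      f2: CMP1 (a[mid] == key) FALSE
      f3: CMP2 (key < a[mid]) TRUE, so DEC runs
    """
    low, high = 0, len(a) - 1          # INIT
    f1 = f2 = f3 = 0
    while low <= high:                 # TEST
        f1 += 1
        mid = (low + high) // 2        # DIV
        if a[mid] == key:              # CMP1
            return mid, (f1, f2, f3)   # RTN2: key found
        f2 += 1
        if key < a[mid]:               # CMP2
            f3 += 1
            high = mid - 1             # DEC
        else:
            low = mid + 1              # INC
    return -1, (f1, f2, f3)            # key not found


print(binary_search_flows([1, 3, 5, 7, 9, 11, 13], 1))
```

Searching for 1 in this seven-element list takes three probes and yields flows (3, 2, 2). Repeating the search over every key of interest, weighted by how often each is requested, gives an empirical probability mass function for (f1, f2, f3).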
3.4. Operation level
The Operation Level provides the lowest level of abstraction for performance modeling (see Figure 1(d)). It provides time cost measurements for the primitive operations (addition, subtraction, multiplication, etc.), built-in functions (sqrt, sin, cos, etc.), function calls and argument passing that appear in each statement of a software component. This level captures the interaction of primitive computer instructions with the underlying processor architecture and therefore depends on the compiler, optimization settings, operating system, and other platform profile parameters.
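The sketch below illustrates the kind of measurement this level relies on: it times a few primitive operations, a built-in function and a function call with Python's timeit module, subtracting the cost of an empty loop body. It only demonstrates the baseline-subtraction idea; interpreter overhead dominates these numbers, so an actual platform profile would be measured in the planned implementation language with the target compiler, optimization settings and operating system. SETUP, STATEMENTS and primitive_costs are hypothetical names.

```python
import timeit

# Operands live in local variables (so nothing is constant-folded), and a
# trivial function provides the call-and-argument-passing cost.
SETUP = "import math\nx = 1.2345\ny = 6.789\ndef f(a):\n    return a"

STATEMENTS = {
    "add": "x + y",
    "mul": "x * y",
    "div": "x / y",
    "sqrt": "math.sqrt(x)",
    "call": "f(x)",
}


def primitive_costs(number: int = 100_000, repeat: int = 5) -> dict:
    """Estimated seconds per execution of each statement in STATEMENTS."""
    # Cost of the timing loop itself, subtracted from every measurement.
    baseline = min(timeit.repeat("pass", SETUP, number=number, repeat=repeat))
    costs = {}
    for name, stmt in STATEMENTS.items():
        best = min(timeit.repeat(stmt, SETUP, number=number, repeat=repeat))
        costs[name] = max(best - baseline, 0.0) / number
    return costs


if __name__ == "__main__":
    for name, seconds in primitive_costs().items():
        print(f"{name:5s} {seconds * 1e9:8.2f} ns")
```

Taking the minimum over several repetitions is a common way to reduce noise from other activity on the machine; the clamp at zero guards against a measurement falling below the baseline.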
3.5. Performance model generation
Once the application structure and the associated performance parameters have been specified, the designer may investigate the performance of the complete application or a part of it. The first case corresponds to selecting the whole application hierarchy, whereas the second case allows the designer to select the sub-hierarchy corresponding to the part of the application of interest. The model generation algorithms that are used to generate performance equations correspond to the levels of abstraction within the selected hierarchy or sub-hierarchy. A designer may specify levels of abstraction for performance model generation. Estimated performance measures can be substituted for levels in the hierarchy that have not yet been specified.
During performance model generation, performance information is communicated in a bottom-up manner. At the Operation Level, this means that performance information measured for primitive operations, based on the hardware profile and the planned implementation language, is propagated to the Module Level for substitution into Time Cost Expressions (TCEs), the performance model for that level.
[Figure 3. Task level for Computers process: incoming flow vector [f1a, f2a] from User merged by Cin (λ1 + λ2); Consumer Processing and Supplier Processing computation servers producing β1λ1 and β2λ2; communication via Cout with mapping probabilities P11, P12, P13, P21, P22, P23; outgoing flow vectors [f11, f21], [f12, f22] and [f13, f23] to the User, Accounting and Exit processes.]
Once software structure and performance parameters including flow distributions have been specified at the Module Level, a performance model representing the flow of execution through a CSM can be generated. Multiple CSM performance models can be generated with the higher layer model encapsulating the performance information of the lower layers. When the Module Level functionality is represented by graphical components, a graph reduction algorithm [8-10] is required for performance model generation. The result is a performance model, or set of performance equations, for each CSM layer and source code file(s). By applying the model generation algorithm to the graph in Figure 4, we obtain the following mean (average) TCE:
T = INIT + (1 + f1)*TEST + RTN1 + f1*DIV + f1*CMP1 + (f1 - f2)*RTN2 + f2*CMP2 + (f2 - f3)*INC + f3*DEC,
where fi is the flow distribution for flow i.
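Because the TCE is a weighted sum of node costs, evaluating it is straightforward once Operation Level costs are available. The sketch below transcribes the TCE above term by term; the node costs are hypothetical values, not measurements. Since the expression is linear in the flows, substituting mean flow values gives the mean time cost.

```python
def binary_search_tce(costs, f1, f2, f3):
    """Time cost of the Figure 4 CSM, transcribed term by term from the TCE.

    costs: time cost of one execution of each CSM node (INIT, TEST, ...).
    f1, f2, f3: flow values (counts, or their means for the mean TCE).
    """
    return (costs["INIT"]
            + (1 + f1) * costs["TEST"]
            + costs["RTN1"]
            + f1 * costs["DIV"]
            + f1 * costs["CMP1"]
            + (f1 - f2) * costs["RTN2"]
            + f2 * costs["CMP2"]
            + (f2 - f3) * costs["INC"]
            + f3 * costs["DEC"])


# Hypothetical Operation Level costs (ns) for one platform profile.
costs = {"INIT": 4, "TEST": 1, "RTN1": 2, "DIV": 3, "CMP1": 1,
         "RTN2": 2, "CMP2": 1, "INC": 1, "DEC": 1}

print(binary_search_tce(costs, 3, 2, 2))        # 28 (ns) for flows (3, 2, 2)
print(binary_search_tce(costs, 2.4, 1.9, 1.0))  # mean flows give the mean cost
```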
Performance information for each software component and control flow structure is represented in the TCE. Performance model generation at the Module Level yields the minimum, maximum, mean and variance of the time cost for each CSM. This information is substituted into the Task Level equations to evaluate computation, communication or interrupt processing for a specific message class.
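One simple way to obtain these statistics, assuming the joint probability mass function of the flow vector is known (the flows of one CSM are generally dependent, for example f2 ≤ f1), is to evaluate the TCE for every outcome. The linear TCE and the probabilities below are hypothetical.

```python
from typing import Callable, List, Tuple

Outcome = Tuple[float, Tuple[int, ...]]  # (probability, flow vector)


def tce_statistics(pmf: List[Outcome],
                   tce: Callable[..., float]) -> Tuple[float, float, float, float]:
    """Return (min, max, mean, variance) of the time cost over a joint PMF."""
    total = sum(p for p, _ in pmf)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    times = [(p, tce(*flows)) for p, flows in pmf]
    mean = sum(p * t for p, t in times)
    var = sum(p * (t - mean) ** 2 for p, t in times)
    support = [t for p, t in times if p > 0]   # outcomes that can occur
    return min(support), max(support), mean, var


def example_tce(f1, f2, f3):
    # Hypothetical linear TCE: fixed cost plus per-flow costs.
    return 10 + 3 * f1 + 2 * f2 + f3


pmf = [(0.25, (1, 0, 0)), (0.5, (2, 1, 1)), (0.25, (3, 2, 1))]
stats = tce_statistics(pmf, example_tce)
print(stats)
```

For this PMF the time cost ranges from 13 to 24 units, with mean 18.75 and variance 15.1875.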
At the Task Level, performance parameters (service time, utilization, number of customers, etc.) are generated for the computation and communication models. Interrupt utilization and service time are also generated. Performance equations are generated on a per-class basis and as a combined average over all classes. The tandem-queue computation and communication model is consolidated, on a per-class basis, into a mathematically equivalent single-queue model for substitution at the System Level.
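The consolidation can be sketched under a strong simplifying assumption: each stage is an M/M/1 queue for a single class, so the tandem's mean response time is the sum of the two stage response times, and the equivalent single queue is the M/M/1 queue with the same mean response time, whose service rate is λ + 1/R. This is one possible construction, equivalent in mean response time only, and not necessarily the one used in HPM.

```python
def tandem_response(arrival_rate: float, comp_rate: float, comm_rate: float) -> float:
    """Mean response time of computation followed by communication,
    each stage modeled as an M/M/1 queue."""
    for mu in (comp_rate, comm_rate):
        if arrival_rate >= mu:
            raise ValueError("unstable stage: arrival rate must be below service rate")
    return 1.0 / (comp_rate - arrival_rate) + 1.0 / (comm_rate - arrival_rate)


def equivalent_service_rate(arrival_rate: float, comp_rate: float,
                            comm_rate: float) -> float:
    """Service rate of a single M/M/1 queue with the same mean response time."""
    r = tandem_response(arrival_rate, comp_rate, comm_rate)
    return arrival_rate + 1.0 / r


mu_eq = equivalent_service_rate(2.0, 4.0, 6.0)
print(mu_eq, 1.0 / (mu_eq - 2.0))  # equivalent rate and its response time
```

For λ = 2, μc = 4 and μm = 6 messages/s, R = 0.5 + 0.25 = 0.75 s and the equivalent service rate is 2 + 1/0.75 ≈ 3.33 messages/s.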
The model generation algorithm at the System Level is an extension of standard queueing network algorithms for open and closed systems. Performance equations (service time, utilization, number of customers, etc.) are generated on a per-class basis and as a combined average over all classes. The performance parameters and mathematical models are generated for one- and two-moment analyses at each level of the HPM.
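A combined average over classes can be formed by weighting each class by its throughput. The sketch below assumes that per-class throughputs and mean response times are already known and uses Little's law, N = Σc λc Rc, for the mean number of messages in the system; the numbers are hypothetical.

```python
from typing import Sequence, Tuple


def combine_classes(rates: Sequence[float],
                    response_times: Sequence[float]) -> Tuple[float, float]:
    """Combine per-class results into overall measures.

    rates[c]: throughput of class c; response_times[c]: its mean response time.
    Returns (throughput-weighted mean response time, mean number in system),
    where the latter follows from Little's law N = sum_c lambda_c * R_c.
    """
    total_rate = sum(rates)
    if total_rate <= 0:
        raise ValueError("total throughput must be positive")
    n_in_system = sum(lam * r for lam, r in zip(rates, response_times))
    return n_in_system / total_rate, n_in_system


mean_response, n_in_system = combine_classes([3.0, 1.0], [0.5, 1.5])
print(mean_response, n_in_system)
```

With throughputs of 3 and 1 messages/s and response times of 0.5 s and 1.5 s, N = 3 messages and the combined mean response time is 0.75 s.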
3.6. Model summary
The HPM spans several layers, from primitive operations to software architecture, providing a degree of accuracy that cannot be achieved with single-layer models. The application is developed top-down, from general to specific, but performance information is generated bottom-up, linking the analytic models at the different levels into a composite model. The result is a quantitative performance assessment of an entire system comprising hardware, software and communication. The HPM provides a well-defined methodology that allows designers to evaluate an application against its system requirements and to fine-tune the values of its performance parameters.
4. Evaluation of distributed architectures
The performance analysis activities of our model include performance model generation and evaluation using mathematical procedures to assess an application's performance. This includes the computation time component for software processes, the communication delays of distributed processes and the evaluation of hardware platform alternatives. The hierarchical representation of a distributed application within HPM inherently provides for great flexibility in assessing these performance attributes.
Computation. The functionality of the software is represented within HPM as graphical models, source code file(s) or performance equations. The time cost of all or part of this functionality is captured in the generated performance model. This information can be propagated to the Task and System Levels of the hierarchy, or evaluated independently to identify bottlenecks on critical paths of the executing software.
[Figure 4. CSM for binary search: nodes START, INIT, TEST, DIV, CMP1, RTN1, RTN2, CMP2, INC, DEC and END, with flows f1, f2 and f3.]
Communication. The communication delays caused by distributed software architectures are modeled at the Task Level of HPM. Software processes can be statically allocated to the same or different processors for performance model generation and evaluation of software process allocation alternatives. This model also supports the evaluation of inter-process communications based on point-to-point asynchronous communication channels.
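Evaluating allocation alternatives amounts to recomputing the communication cost for each process-to-processor mapping. The sketch below compares two hypothetical allocations under simplifying assumptions: messages between co-located processes cost a fixed local_cost, every other message occupies a point-to-point channel for msg_size / channel_rate seconds, and queueing on the channel is ignored. Process names follow the example; all rates are invented for illustration.

```python
from typing import Dict, Tuple


def communication_load(allocation: Dict[str, str],
                       traffic: Dict[Tuple[str, str], float],
                       msg_size: float, channel_rate: float,
                       local_cost: float = 0.0) -> float:
    """Seconds of communication work generated per second of operation.

    allocation: process -> processor id
    traffic:    (src, dst) -> messages per second
    """
    load = 0.0
    for (src, dst), rate in traffic.items():
        if allocation[src] == allocation[dst]:
            load += rate * local_cost                 # co-located: local transfer
        else:
            load += rate * msg_size / channel_rate    # remote: channel transmission
    return load


traffic = {("User", "Computers"): 4.0,
           ("Computers", "Accounting"): 1.0,
           ("Computers", "User"): 2.0}
separate = {"User": "P1", "Computers": "P2", "Accounting": "P3"}
colocated = {"User": "P1", "Computers": "P1", "Accounting": "P2"}

for name, alloc in (("separate", separate), ("colocated", colocated)):
    print(name, communication_load(alloc, traffic, msg_size=1000.0,
                                   channel_rate=10_000.0, local_cost=0.001))
```

Here co-locating User and Computers lowers the communication load from about 0.7 to about 0.106 channel-seconds per second, at the price of placing the computation of both processes on one processor.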
Hardware Assessment. HPM supports the assessment of hardware alternatives by using measured time costs of language primitives. These primitives are identified, measured and selected based on processor, compiler and operating system characteristics, and are then propagated through the model.
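Because the same operation counts can be combined with different platform cost tables, comparing hardware alternatives reduces to re-evaluating the time cost with each table. In the sketch below, the operation counts and the per-primitive costs (in ns) for platforms A and B are hypothetical.

```python
from typing import Dict


def platform_time(op_counts: Dict[str, int], cost_table: Dict[str, float]) -> float:
    """Time cost of a component: sum over primitives of count * unit cost."""
    missing = set(op_counts) - set(cost_table)
    if missing:
        raise KeyError(f"no cost measured for: {sorted(missing)}")
    return sum(n * cost_table[op] for op, n in op_counts.items())


# Hypothetical operation counts for one software component...
op_counts = {"add": 120, "mul": 40, "div": 10, "call": 5}
# ...and hypothetical per-primitive costs (ns) on two candidate platforms.
platforms = {
    "A": {"add": 1.0, "mul": 3.0, "div": 20.0, "call": 15.0},
    "B": {"add": 1.0, "mul": 1.0, "div": 8.0, "call": 10.0},
}

for name, table in platforms.items():
    print(name, platform_time(op_counts, table))
best = min(platforms, key=lambda name: platform_time(op_counts, platforms[name]))
print("fastest:", best)
```

The component costs 515 ns on platform A and 290 ns on platform B, so B is preferred for this workload; raising an error for unmeasured primitives keeps a missing entry from being silently treated as free.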
Each performance model that is generated contains statistical parameters that allow for evaluation and analysis at each level of the hierarchy or between levels of the hierarchy. Assessing the application’s performance at this stage will assist the user in considering different designs and hardware architecture alternatives for satisfying the performance requirements.
5. Conclusion
This paper presented a hierarchical performance modeling technique for distributed systems which incorporated different levels of modeling abstraction. An example was used to demonstrate how the software structure and architecture are represented within corresponding levels of the model. This approach supports specification and performance model generation that incorporates computation and communication delays along with hardware profile characteristics to assist in the evaluation of performance alternatives. This model is currently being extended to support distributed multiprocessor environments.
6. References
[1] Allen, A.O., "Queuing Models of Computer Systems," IEEE Computer, Vol. 13, April 1980.
[2] Ammar, R.A., and Qin, B., "An Approach to Derive Time Costs of Sequential Computations", Journal of Systems and Software, Vol. 11, pp. 173-180, 1990.
[3] Bingman, T., MacKay, B., Schmitt, M., and Havira, M., "ASAP: A Tool for Analytic Performance Prediction of Software Systems," International Conference on Computer Applications in Industry and Engineering, Orlando FL., Dec. 1996.
[4] Booth, T.L., "Performance Optimization of Software Systems Processing Information Sequences Modeled by Probabilistic Languages", IEEE Transactions on Software Engineering, 5(1): 31-44, Jan. 1979.
[5] Hrischuk, C., Rolia, J. and Woodside, C. "Automatic Generation of a Software Performance Model using an Object-Oriented Prototype," Proc. of the IEEE International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, Durham, NC, Jan. 1995.
[6] Iyer, V. and Sholl, H.A., "Software Structuring for Distributed, Sequential, Pipelined Applications", IEEE Transactions on Software Engineering, October 1989.
[7] Lazowska, E., Zahorjan, J., Graham, G., Sevcik, K., Quantitative System Performance, Prentice Hall Inc., 1984.
[8] MacKay, B.M., and Sholl, H.A., "Communication Alternatives for a Distributed Real-Time System", Proceedings of the ISCA Computer Applications in Industry and Engineering Conference, Honolulu, HI, November 1995.
[9] MacKay, B.M., "Hierarchical Modeling for Parallel and Distributed Software Applications", Ph.D. Dissertation, University of Connecticut, 1996.
[10] MacKay, B.M., Sholl, H.A., Ammar, R.A., "An Overview of Hierarchical Modeling for Parallel and Distributed Software Applications", International Conference on Parallel and Distributed Computing Systems, Dijon, France, September 1996.
[11] Qin, B., and Ammar, R.A., "An Analytic Approach to Derive the Time Costs of Parallel Computations", the International Journal on Computer Systems: Science & Engineering, Vol. 8, No. 2, pp. 90-100, April 1993.
[12] Rolia, J.A., and Sevcik, K.C., "The Method of Layers", IEEE Transactions on Software Engineering, Vol. 21, No. 8, August 1995.
[13] Shah, S., and Sholl, H.A., "Task Partitioning of Incompletely Specified Real-Time Distributed Systems", ISCA 9th International Conference on Parallel and Distributed Computing Systems, Dijon, France, September 1996.
[14] Sholl, H.A. and Booth, T.L., "Software Performance Modeling Using Computation Structures," IEEE Transactions on Software Engineering, Vol.1, No. 4, Dec. 1975.
[15] Smith, C.U., Performance Engineering of Software Systems. Addison-Wesley, 1990.
[16] Smith, C.U. and Williams, L.G., "Software Performance Engineering: A Case Study Including Performance Comparison with Design Alternatives", IEEE Transactions on Software Engineering, 19(7): 720-741, July 1993.
[17] Smith, C.U. and Williams, L.G., "Performance Evaluation of a Distributed Software Architecture", Proceedings of the Computer Measurement Group, Anaheim, December 1998.
[18] Suzuki, K. and Sangiovanni-Vincentelli, A., "Efficient Software Performance Estimation Methods for Hardware/Software Codesign," Proc. of the 33rd Annual Design Automation Conference, Las Vegas NV, June 1996.
[19] Trivedi, K.S., Probability & Statistics with Reliability, Queuing and Computer Science Applications, Prentice-Hall, Inc., New Jersey, 1982.
[20] Woodside, C., "Three-View Model for Performance Engineering of Concurrent Software," IEEE Transactions on Software Engineering, Vol. 21, No. 9, Sept. 1995.
[21] Woodside, M., Hrischuk, C., Selic, B., Bayarov, S., "A Wideband Approach to Integrating Performance Prediction into a Software Design Environment", Proceedings of the First International Workshop on Software and Performance, Santa Fe, New Mexico, pp. 31-41, October 1998.
[22] Zhang, P., "A Design and Modeling Environment to Develop Real-time, Distributed Software Systems", Ph.D. Dissertation, University of Connecticut, 1993.