Creating a performance evaluation model for a basic Infinica Process Engine use

The performance of Infinica Process Engine processes is typically the most interesting use case, since there are often strict requirements on throughput or response times.

For this reason, the evaluation of the modelling framework itself, as well as the applicability of the technique to such processes, was of major interest, and the evaluation focused on this area. Hence, this section first describes the use case and then the model definition.

Use case architecture

The use case chosen for modelling is a basic document generation process in a configuration in which the templates are located on the local file system.

The setup of the architecture is shown in Figure 6.1. It consists of an instance of Infinica Process Engine hosted on a virtual machine named VM_Infinica. Processes and templates are hosted on that virtual machine as well. The persistent storage is written to every time the process state changes, as described in subsection 4.1.1. It is hosted in an Oracle database running in another virtual machine, named VM_DB. Process executions are performed on behalf of the clients, which may be running on one or more external machines. These machines are of no further interest in this analysis, as it is assumed that customer requests arrive continuously at the Infinica Process Engine instance.

For this use case, and also for the one described in section 6.3, a process workflow for PDF generation is used, as shown in Figure 6.2. The process starts in the start element by reading the process input parameters for the template and the XML data to be used, as well as for the destination URL of the generated PDF file. It then prepares the URLs to be read from and written to later. Afterwards, the process reads an XML input and an XSL-FO template resource and generates an FO file by performing an XSL transformation on the two files. The FO file is used as

48 Chapter 6: Definition and Test of an LQM using the Modelling Framework

Figure 6.1: Basic use case architecture.

input to the PDF renderer (Apache FOP), which also reads a configuration file itself, and outputs a PDF stream in return. This PDF stream is written to the desired destination with the urlWriter activity. Here the end element only marks the end of the process and no output parameters are returned to the client.

Figure 6.2: PDF rendering process flow.

6.2.1 Performance model for an Infinica Process Engine process with local storage

The performance model was created by first deriving the software contention graph and then the device contention graph. Finally, the service times at the devices were determined by analysing


application logs. The diagram for the derived graph is shown in Figure 6.3. It consists of the following elements in the software contention model:

Run process The entity representing the client requests, with an arbitrary number of customers (in this evaluation 1 to 100).

Process scheduler The process scheduler of Infinica Process Engine. As it runs in a single thread, it is not a multiserver entity. Every process instance has to be scheduled before it can be executed. For this evaluation, the scheduler time was also artificially increased by adding delays before each scheduling action, as described in section 6.1.

Execute process A multiserver entity which covers the actual process execution. It may run in Infinica Process Engine thread pools having arbitrary sizes. In this evaluation, the thread pool sizes 2 and 32 were used.

Database A multiserver entity representing the database. In this model the database was only used for persisting the process instances.

Figure 6.3: Performance model for a PDF rendering process with local template storage.

All software entities need to access devices. These are described in the device contention model part of Figure 6.3, which contains the following device entities:

Think time In this model, all users stay in the model at all times. When a request for a user is finished, the next request is fired. The think time defines how long a user waits between the end of the previous request and the start of the next request.


For this evaluation a think time of 1ms was set.

Disk The disk of the virtual machine on which the Infinica Process Engine instance was running.

CPU The CPU of the virtual machine on which the Infinica Process Engine instance was running. In this evaluation the virtual machine had 24 CPU cores assigned.

Disk DB The disk of the virtual machine on which the database (Oracle) was running.

CPU DB The CPU of the virtual machine on which the database (Oracle) was running. 4 CPU cores were assigned to it.
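The entities listed above can be collected in a small data structure. The following is only a sketch, not the modelling framework's own representation: the `Entity` class and `build_model` function are hypothetical names, the calls between entities shown in Figure 6.3 are not reproduced, and a server count of `None` marks values the text does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """One node of the model; servers=None means the text gives no count."""
    name: str
    layer: str                      # "software" or "device"
    servers: Optional[int] = None


def build_model(pool_size: int) -> dict:
    """Return the entities named in the text, keyed by name."""
    entities = [
        Entity("Run process", "software"),            # 1 to 100 customers
        Entity("Process scheduler", "software", 1),   # single thread
        Entity("Execute process", "software", pool_size),
        Entity("Database", "software"),               # multiserver, size not stated
        Entity("Think time", "device"),               # pure delay of 1ms
        Entity("Disk", "device"),
        Entity("CPU", "device", 24),
        Entity("Disk DB", "device"),
        Entity("CPU DB", "device", 4),
    ]
    return {e.name: e for e in entities}


model = build_model(pool_size=32)
# Entities with a known server count above one.
multiservers = sorted(n for n, e in model.items() if e.servers and e.servers > 1)
print(multiservers)
```

Passing `pool_size=2` or `pool_size=32` corresponds to the two thread pool sizes used in this evaluation.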

To determine the service times of the software entities, a single process was executed several times in succession so that averages could be built. The process execution logs of Infinica Process Engine were then analysed for the service times required by the software entity execute process, and the application logs were analysed for the process scheduler. The artificial scheduler delays were simply added to the scheduler times obtained from the parametrisation runs without delays.
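The averaging step and the addition of the artificial delay can be sketched as follows. The function names and the sample values are purely illustrative assumptions; they are not taken from the actual logs.

```python
from statistics import mean


def mean_service_time(samples_ms):
    """Average service time over repeated runs of a single process."""
    return mean(samples_ms)


def scheduler_service_time(base_samples_ms, delay_ms):
    """Scheduler time from runs without delay, plus the artificial delay."""
    return mean_service_time(base_samples_ms) + delay_ms


# Hypothetical scheduler times (in ms) from parametrisation runs without delay.
base_samples_ms = [2.0, 4.0, 3.0]

# The four scheduler delay settings used in this evaluation.
for delay_ms in (0, 25, 50, 100):
    print(delay_ms, scheduler_service_time(base_samples_ms, delay_ms))
```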

As for the database entity, the disk and CPU times were roughly estimated from the logs of Infinica Process Engine as well. Naturally, the logs do not directly show how much time an action spends using the disk and how much it spends on the CPU. However, since the process structure (Figure 6.2) and the behaviour of the activities are known, it was assumed that the activities urlReader, urlWriter and fopRenderer (the latter only while reading its configuration) spend their time on the disk, while the other process elements, and the remainder of the fopRenderer activity, mainly spend their time on the CPU. The CPU is also utilised in the time between the element executions in the workflow. Since all activities in this process flow run in the same main process execution thread, no additional CPU threads are needed. The total time during which database actions were running, as found in the logs, was also subtracted from the execution time. This total database time was used for the database service time estimations.
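The split of the execution time described above might be sketched as follows. This is a minimal illustration under stated assumptions: the record format, the function name and all numbers are hypothetical, and the text does not say how the configuration read of fopRenderer was separated from its rendering time, so it is passed in here as a separate value.

```python
# Activities assumed to spend their whole duration on the disk.
DISK_ACTIVITIES = {"urlReader", "urlWriter"}


def split_execution_time(total_ms, db_ms, activity_ms, fop_config_read_ms=0.0):
    """Split one process execution into disk, CPU and database time.

    activity_ms is a list of (activity name, duration in ms) pairs;
    a list is used because urlReader occurs more than once in the flow.
    """
    disk_ms = sum(d for name, d in activity_ms if name in DISK_ACTIVITIES)
    disk_ms += fop_config_read_ms
    # Everything that is neither database nor disk time is attributed to
    # the CPU, including the time between element executions.
    cpu_ms = total_ms - db_ms - disk_ms
    return {"disk": disk_ms, "cpu": cpu_ms, "db": db_ms}


# Hypothetical durations, not values from the actual logs.
activities = [("urlReader", 5), ("urlReader", 5), ("fopRenderer", 40), ("urlWriter", 10)]
print(split_execution_time(total_ms=100, db_ms=20, activity_ms=activities,
                           fop_config_read_ms=2.0))
```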

6.2.2 Results for the model with local storage

The model built in this chapter was tested with the method described in section 6.1. This resulted in four charts, one per scheduler delay setting, as seen in Figure 6.4. Each diagram shows the process throughput (in 1/s) over the number of active users. Every chart holds four data sets, two for a thread pool size of 2 and two for a thread pool size of 32. Each pair consists of the calculated values, obtained from the model outputs, and the measured values, which were the results of test runs on the actual system. To reduce the test effort on the actual system, the data sets of the measured values contain significantly fewer data points than the calculated data. However, since the measured data points appear stable as soon as a certain number of users is reached, it is possible


Figure 6.4: Result comparison for disk template storage with scheduler delays of 0ms (top left), 25ms (top right), 50ms (bottom left), and 100ms (bottom right).

to interpolate the data between two measurement points.
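One simple way to carry out such an interpolation is linear interpolation between neighbouring measurement points. The text does not state which interpolation scheme was used, so the following is an assumption, and the measurement values are invented for illustration.

```python
def interpolate(points, users):
    """Linearly interpolate the throughput for a given number of users.

    points is a list of (users, throughput) measurements with distinct
    user counts; users must lie within the measured range.
    """
    pts = sorted(points)
    if not pts[0][0] <= users <= pts[-1][0]:
        raise ValueError("users lies outside the measured range")
    if users == pts[0][0]:
        return pts[0][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= users <= x1:
            return y0 + (y1 - y0) * (users - x0) / (x1 - x0)


# Hypothetical measured points: (active users, throughput in 1/s).
measured = [(10, 20.0), (30, 40.0), (100, 40.0)]
print(interpolate(measured, 20))
```

Values outside the measured range are rejected rather than extrapolated, since the stability argument above only covers the range between measurement points.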

In the following paragraphs the results for the individual charts in Figure 6.4 are discussed.

Scheduler delay = 0ms

The model parameters for all scheduler settings were determined with no scheduler delay in place and only a single process being executed on the system. For this reason, it seems natural that the modelled results fit best for the test runs using the configuration closest to this, which is the one with a thread pool size of 2.

In comparison, the calculated throughputs for 32 threads are too high, especially for more than 30 simultaneous users. Because the model predicts a much higher throughput than was measured, it is assumed that the designed model is missing an as yet unidentified bottleneck.


Scheduler delay = 25ms

With an artificial delay of 25ms for every process execution, the calculated results are slightly higher than the measured data points for both thread pool sizes. However, especially for many users and 32 threads, the results are closer to the actual values. Interestingly, the throughput of the model was significantly reduced for 32 threads, while on the actual system the throughput stayed almost the same as when no delay was added. This suggests that, in the actual environment, the scheduler with a delay of 25ms is not the bottleneck in this configuration and that the actual bottleneck is to be found elsewhere.

Scheduler delay = 50ms

The artificial delay of 50ms also reduced the measured throughput values in the actual system. This means that at this point the scheduler became the system’s bottleneck. The model still predicts higher throughputs, though. Therefore, it is assumed that the second, unidentified bottleneck is responsible for this difference. In addition, the two calculated curves converge, while the measured curves still keep their distance. So in the measured data, 32 threads still result in faster response times than 2 threads. Because of this, it is believed that the responsible bottleneck takes effect individually in each process execution thread and does not slow down all processes at the same time, as the process scheduler does in its single thread.

Scheduler delay = 100ms

For a scheduler delay of 100ms, the measured and the calculated curves converge to a significantly lower throughput level. This means that at this point the unidentified bottleneck is superseded by the slow scheduler times.
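The convergence for large delays is consistent with the standard asymptotic bounds of operational analysis: an entity with m servers and a service demand D per process cannot sustain more than m/D processes per second, and N users with think time Z cannot generate more than N/(D_total + Z). The sketch below applies these bounds to two entities only. It is not the layered queueing model solved in this chapter, and the base scheduler time and execution time are hypothetical values chosen for illustration.

```python
def throughput_bound(users, think_s, stations):
    """Upper bound on throughput (1/s) for a closed model.

    stations is a list of (demand in s, number of servers) pairs.
    """
    total_demand = sum(demand for demand, _ in stations)
    light_load = users / (total_demand + think_s)
    saturation = min(servers / demand for demand, servers in stations)
    return min(light_load, saturation)


BASE_SCHEDULER_S = 0.005   # hypothetical scheduler time without delay
EXECUTE_S = 0.1            # hypothetical execution time per process
THINK_S = 0.001            # think time of 1ms, as set in this evaluation


def bound_for(pool_size, delay_s, users=100):
    stations = [
        (BASE_SCHEDULER_S + delay_s, 1),   # single-threaded process scheduler
        (EXECUTE_S, pool_size),            # execute process thread pool
    ]
    return throughput_bound(users, THINK_S, stations)


for delay_s in (0.0, 0.025, 0.05, 0.1):
    print(delay_s, bound_for(2, delay_s), bound_for(32, delay_s))
```

With a delay of 100ms the single-threaded scheduler alone caps the throughput below 10 processes per second for any thread pool size, which is why both pool sizes yield the same bound in this sketch.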

6.3 Creating a performance evaluation model for an

In document DIPLOMA THESIS. Mr Matthias Schwarz. Performance modelling of distributed systems based on the example of microservices (Page 59-64)