1
INTRODUCTION
The Bitfusion Profiler can be used to quickly understand the performance limitations of applications as well as the limitations of the instances on which these applications are deployed. Application improvements, such as those obtained by using Bitfusion Boost, can be quickly assessed before deployment. A scientific computing customer use-case demonstrates how the Bitfusion Profiler benchmarks and profiles workloads across different instance types before and after Bitfusion Boost is deployed on them.
2
APPLICATION PROFILING OVERVIEW
The Bitfusion Profiler offers a complete Linux sandbox environment in a browser, into which any application or workload can be deployed. For instance, a user can easily replicate their R, Python, or transcoding application on the profiler. Users can also create workload-specific notes for future reference. Additionally, an integrated testing capability helps ensure that everything goes smoothly during the profiling run. Upon pressing the finish button, the workload is saved and submitted for profiling.
Various permutations of a workload can be created and run from the profiler dashboard. Multiple workload benchmarks can be easily compared or combined on the dashboard. Workloads can also be easily copied so that one does not have to start from scratch. Additionally, there is a public repository of pre-created workloads and benchmarks that lets users get up and running quickly by examining existing reports and building on top of previous work.
3
SCIENTIFIC COMPUTING USE-CASE: BASELINE PERFORMANCE COMPARISON ACROSS DIFFERENT MACHINE TYPES
We used the Bitfusion Profiler (BF Profiler) to benchmark a customer use-case written in the Octave language across different Amazon Web Services (AWS) machine types. Octave is a high-level interpreted language, primarily intended for numerical computations. It provides capabilities for the numerical solution of linear and non-linear problems, and for performing other numerical experiments. The basic report results for the scientific computing use-case are shown in the charts below. Several key takeaways are immediately obvious. First, the workload failed to run properly on one of the instances, the t2.micro, as indicated by the red bar. A quick look at the log files reveals that this instance has insufficient memory to accommodate the customer workload. Either the application's memory management needs to improve or, if that is not an option, the customer needs to opt for one of the larger instance types. Because the run-time reported for a failed run is not a real run-time, the Profiler automatically excludes the t2.micro instance from the best value calculation.
Another point of interest is the poor performance of the m3.medium instance, which takes significantly longer than the other instances. A quick click on the m3.medium bar brings up the advanced report, which shows the processor, memory, and file I/O performance for that instance during the profiling run. The CPU usage shows that AWS is throttling the maximum CPU utilization on this instance. Additionally, the memory utilization is fairly high, approaching 100% on several occasions. Selecting graph regions in the detailed reports allows the user to zoom in along the x-axis and explore the data at a granularity as fine as one-tenth of a second.
By comparison, a click on the c4.2xlarge (the fastest machine) reveals a much better CPU utilization of nearly 100%, and a memory utilization which barely reaches 75%. However, even though the c4.2xlarge has multiple cores (4 cores, with 2 threads each), the application only uses a single core, CPU 1. This explains why the difference in run-time between all the instances excluding the m3.medium, as shown in the initial set of charts, is only ~25s. Even though many of them have multiple cores, the application is simply not taking advantage of them. Because of this limitation, the most efficient machine for this application is the c3.large, a single-threaded, two-core machine, as shown by the value graph, which depicts the number of runs that can be purchased for a single dollar. In the next section, we show the same benchmark run with Bitfusion Boost (BF Boost) and how these results change significantly when BF Boost is applied.
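The value metric used in this section, the number of runs that can be purchased for a single dollar, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Profiler's actual implementation: the `Run` record, the function names, and every instance name, run-time, and hourly price below are hypothetical placeholders rather than measured results. The sketch also mirrors the behavior described earlier, where a failed run is left out of the best value calculation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    """One profiling run (hypothetical record, not a Profiler data type)."""
    instance: str
    runtime_s: Optional[float]  # None marks a failed run with no real run-time
    price_per_hour: float       # USD per hour; placeholder value, not an AWS quote


def runs_per_dollar(run: Run) -> Optional[float]:
    """Return how many runs one dollar buys, or None for a failed run."""
    if run.runtime_s is None or run.runtime_s <= 0:
        return None
    cost_per_run = run.price_per_hour * run.runtime_s / 3600.0
    return 1.0 / cost_per_run


def best_value(runs: List[Run]) -> Optional[str]:
    """Return the instance with the most runs per dollar, skipping failed runs."""
    scored = [(runs_per_dollar(r), r.instance) for r in runs]
    valid = [(value, name) for value, name in scored if value is not None]
    if not valid:
        return None
    return max(valid)[1]


# Hypothetical inputs chosen only to exercise the functions.
runs = [
    Run("instance-a", 100.0, 0.10),
    Run("instance-b", 80.0, 0.40),
    Run("instance-c", None, 0.01),  # failed run, excluded from the ranking
]

best = best_value(runs)
print("best value:", best)
```

In this made-up data set the faster machine is not the best value, because its higher hourly price outweighs its shorter run-time. That is the same trade-off the value graph makes visible.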
4
SCIENTIFIC COMPUTING USE-CASE: BF BOOST ACCELERATION
The charts below show the results for the exact same use case as before, after BF Boost was deployed across the AWS systems. No modifications were made to the actual application or the customer source code. BF Profiler has a report comparison feature that allows for quick visual comparison of the reports from multiple application runs. As the graphs below show, BF Boost delivers a clear performance gain across all the machines, ranging from 1.75x to 5x.
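A gain such as "5x" is the ratio of the baseline run-time to the run-time with BF Boost on the same instance. The sketch below shows that arithmetic in Python. The function name, the report layout (a mapping from instance name to run-time in seconds), and all numbers are assumptions made for illustration; they are not values taken from the reports.

```python
from typing import Dict


def speedups(baseline_s: Dict[str, float], boosted_s: Dict[str, float]) -> Dict[str, float]:
    """Return baseline/boosted run-time ratios for instances present in both reports."""
    result = {}
    for instance, before in baseline_s.items():
        after = boosted_s.get(instance)
        if after is None or after <= 0:
            continue  # no comparable run for this instance
        result[instance] = before / after
    return result


# Hypothetical run-times in seconds, not measured data.
baseline = {"instance-a": 100.0, "instance-b": 70.0, "instance-c": 90.0}
boosted = {"instance-a": 20.0, "instance-b": 40.0}

gains = speedups(baseline, boosted)
for name, gain in sorted(gains.items()):
    print(f"{name}: {gain:.2f}x")
```

Instances that appear in only one report are skipped, since a ratio is only meaningful when the same workload ran on the same machine type in both cases.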
The results for the value graphs also change significantly. The previous value leader, the c3.large, now only places third when it comes to value. The new value leader is the t2.large, which enables the customer to complete over 600 application runs for every dollar spent, a 2.8x improvement over the best achievable throughput without BF Boost. A note of caution regarding t2 instances on AWS: t2 instances are credit based. Unlike other instance types, once a fixed set of credits is exhausted, t2 instances are throttled to a specific CPU level. For the t2.large this level is set to 60%, so for long-running jobs which are CPU limited the performance of the t2 would drop significantly. The following AWS information page explains how the credits work on the t2:
How does BF Boost achieve such dramatic acceleration across all AWS instances? Once again, the Bitfusion Profiler provides an answer by showing that with BF Boost, the application is automatically capable of utilizing all the hardware resources in the system more efficiently, including using all available CPUs and threads. As a result, several of the more powerful and more costly systems now deliver the best value per dollar.
Tip: Bitfusion Profiler allows the user to select and toggle each of the measured resources, such as the CPUs in the chart above, to quickly and clearly understand which resources are utilized and how.
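The difference between a run that is bound to one core and a run that spreads across all of them can be read from per-CPU utilization samples. The Python sketch below shows one rough way to do so. The function name, the 50% threshold, the data layout, and the sample values are all assumptions for illustration; none of them come from the Profiler or from the measured reports.

```python
from typing import Dict, List


def active_cpus(per_cpu_util: Dict[str, List[float]], threshold: float = 50.0) -> List[str]:
    """Return the CPUs whose mean utilization (percent) is at least `threshold`."""
    busy = []
    for cpu, samples in per_cpu_util.items():
        if samples and sum(samples) / len(samples) >= threshold:
            busy.append(cpu)
    return sorted(busy)


# Hypothetical utilization samples in percent, not profiler output.
single_core_run = {
    "cpu0": [2.0, 3.0, 1.0],
    "cpu1": [98.0, 99.0, 97.0],
    "cpu2": [1.0, 2.0, 2.0],
    "cpu3": [0.0, 1.0, 3.0],
}
all_core_run = {
    "cpu0": [91.0, 95.0, 93.0],
    "cpu1": [96.0, 94.0, 97.0],
    "cpu2": [90.0, 92.0, 95.0],
    "cpu3": [93.0, 96.0, 94.0],
}

print("single-core run:", active_cpus(single_core_run))
print("all-core run:", active_cpus(all_core_run))
```

A fixed threshold on the mean is a deliberately crude choice. A run that alternates between busy and idle phases would need a windowed comparison instead.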
5
CONCLUSION
Bitfusion Profiler was used to quickly analyze the limitations of a scientific computing application. Limitations of the application as well as the underlying AWS instances and hardware were identified via the advanced reporting capabilities. The Bitfusion Profiler comparison feature was used to demonstrate up to a 5x run-time improvement for a customer application when deployed with BF Boost, without any modifications to the source code. Bitfusion Profiler and Bitfusion Boost together allow customers to obtain significantly better performance and value on AWS.