Autonomic Cloud Computing
Walker Davis Davis Liu Arpit Sheth
[email protected] [email protected] [email protected]
NJ Governor’s School of Engineering & Technology 2012
1 Abstract
Cloud computing enables a variety of high-performance applications to be executed with limited investment from the user. Recently, the Cancer Institute of New Jersey (CINJ) created a content-based image retrieval program that analyzes photos of blood smears to determine if cancer cells are present. However, this application is too computationally intensive for one computer to run in a short time frame, so it has been adapted to use CometCloud, a cloud computing framework developed by the National Science Foundation’s Center for Autonomic Computing at Rutgers University (NSF CAC). This method utilizes multiple servers in the cloud to run the image analysis algorithms and return an accurate result in a reasonable time period. Despite the improvement in execution time, the program would have been too difficult for users to take advantage of because it lacked a simple graphical user interface.
To resolve this issue, a web application named CompariCell was created. On CompariCell’s graphical user interface, users select an image and adjust parameters for database comparison. The website passes input data to a remote back-end server, which accepts the transferred data and launches the CometCloud and content-based image retrieval programs. Finally, the output of the entire process is relayed back to the front-end server to be displayed on the CompariCell website.
2 Introduction
Using recent advances in content-based image retrieval (CBIR), the Cancer Institute of New Jersey has developed a program that compares a picture of a patient’s white blood cells to an image database of cancerous and noncancerous white blood cells to determine if the patient has cancer. While this program is very useful for doctors and medical researchers, it is also computationally intensive. The process can take up to fourteen weeks to analyze a single image if run on an average computer, which limits its usefulness [1].
Cloud computing offers a solution to this problem; by drawing on the resources of multiple servers and processors on demand, a far larger amount of data can be analyzed than a single machine could process. The content-based image retrieval program was adapted to make use of the CometCloud system, developed at Rutgers University by the National Science Foundation’s Center for Autonomic Computing. CometCloud enables the image analysis process to run on various servers, such as those on the Amazon EC2 cloud and others maintained by Rutgers University, in order to distribute computational tasks and output results in a reasonable timeframe. Because resource management is elastic and autonomous, the user automatically acquires additional processing power in response to situational demands.
Figure 1: Typical methods of resource management are inefficient (left); buying services as needed from the cloud can meet fluctuating demand more effectively (right). Both panels plot resource usage against time: fixed resources (maintaining extra or fewer resources) versus elastic resources (owned resources plus resources bought from the cloud).
For the user, this process is made simple through a web application called CompariCell. A website, located on a separate server from the other programs, provides users with a graphical user interface to input parameters and execute the application. Just as cloud computing makes it easier for users to launch resource-intensive programs, a web application also expands accessibility because most devices with Internet access can utilize it.
3 Background
3.1 Cloud Computing
As computers advance in processing power and efficiency, the problems they aspire to solve also increase in complexity. This has resulted in an increase in the user’s need for processing power, whether it is for businesses handling large amounts of electronic traffic or researchers dealing with massive data sets; however, for the average user, owning and maintaining these resources is prohibitively expensive [2]. Cloud computing is a form of distributed computing in which third-party resources are used to run an application or offer a service. Instead of following the typical model of the user buying software and hardware, the "cloud" allows users to rent resources or services. There are many benefits to the development of cloud-based applications, especially with regard to efficiency and scalability.
For instance, a website manager can maximize cost-effectiveness by using resources from the cloud rather than relying solely on their own resources (see Figure 1). Typically, websites encounter fluctuations in demand (e.g., more users during the holiday season). If the website buys enough servers to meet the maximum demand, it will waste resources when demand is not as high. If the website buys fewer servers in order to prevent resource waste, it will not be able to meet the maximum demand, leaving some users without access and leading to potential economic loss from unsatisfied users. A cloud-based service solves this problem through elastic resource management; more servers can be bought on demand from the cloud as necessary, preventing waste while also pleasing users [3].
Figure 2: Visualization of the CometCloud system (diagram labels: Node, Task, Overlay, CometSpace)
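The provisioning trade-off described for the website manager can be made concrete with a small calculation. The following is a minimal sketch, not part of the original project: the hourly demand values and the single price per server-hour are invented for illustration, and the model ignores the fact that rented servers usually cost more per hour than owned ones.

```python
def fixed_cost(demand, capacity, price_per_server_hour):
    """Cost of keeping `capacity` servers for every hour, plus unmet demand."""
    cost = capacity * price_per_server_hour * len(demand)
    unmet = sum(max(0, d - capacity) for d in demand)
    return cost, unmet


def elastic_cost(demand, price_per_server_hour):
    """Cost of renting exactly as many servers as each hour requires."""
    return sum(demand) * price_per_server_hour, 0


# Hypothetical hourly demand (servers needed) with one holiday-style spike.
demand = [2, 2, 3, 10, 3, 2]
PRICE = 1.0  # assumed price per server-hour, identical for both strategies

peak = fixed_cost(demand, max(demand), PRICE)  # provision for the peak
lean = fixed_cost(demand, 3, PRICE)            # provision for typical load
elastic = elastic_cost(demand, PRICE)          # rent on demand
```

Under these assumptions, provisioning for the peak pays for idle servers most of the time, provisioning for typical load leaves part of the spike unserved, and the elastic strategy pays only for what is used while serving every request.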
Cloud computing comes in three major varieties: (1) infrastructure as a service, (2) platform as a service, and (3) software as a service. Infrastructure as a service provides virtual machines for customers to use in order to upgrade computing power. For example, Amazon’s Elastic Compute Cloud (EC2) allows users to rent servers on a CPU-hour basis. Platform as a service, the next level up in terms of complexity, allows users to rent machines that can run a user’s customized cloud applications. This type of cloud service is typically used by application developers. Finally, software as a service presents a finished product that the renter can use on demand [4].
Cloud computing can be used in various high-performance computing (HPC) situations. One method of incorporating cloud computing into HPC applications is to simply move the entire application into the cloud. This approach runs into issues with cloud-based response latency, as well as virtual machine failures that can halt the application. A different approach to high-performance cloud computing is to run a portion of the HPC application on local machines and to request additional resources from third-party services, such as Amazon’s EC2; this is known as an HPC-plus Cloud [4].
3.2 CometCloud
The Rutgers CometCloud system acts as an HPC-plus Cloud platform with four major components: (1) the master, (2) the worker, (3) the scheduler, and (4) the tuple. CometCloud sets up an overlay server where the various cloud resources are located and connected to the network. Virtual machines called nodes are set up on the cloud processors. One node is designated as the "master", which creates tasks in the form of XML files, called "tuples", and inserts them into the "CometSpace" environment. This is a common space in which unassigned tasks are placed. Then, other nodes, called "workers", take task files as designated by the "scheduler" and process the task (see Figure 2). When a worker finishes, it reports the results to the master and then looks in the CometSpace for new tasks to complete. CometCloud then consolidates the result of each worker into a final output for the whole process [4].
Figure 3: An example of a database image (left) and query image (right) [7]
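The master/worker flow described for CometCloud can be sketched in a few lines. This is only an illustrative model under stated assumptions, not CometCloud's API: an in-process queue stands in for the CometSpace, threads stand in for worker nodes, tasks are plain Python values rather than XML tuples, and the scheduler is omitted, so each worker simply takes the next available task.

```python
import queue
import threading


def run_master_worker(tasks, work, num_workers=3):
    """Place tasks in a shared space, let workers pull them, merge results."""
    space = queue.Queue()      # stands in for the shared task space
    results = {}
    lock = threading.Lock()

    # The master inserts one entry per task into the shared space.
    for task_id, payload in enumerate(tasks):
        space.put((task_id, payload))

    def worker():
        while True:
            try:
                task_id, payload = space.get_nowait()
            except queue.Empty:
                return         # no unassigned tasks are left
            outcome = work(payload)
            with lock:         # report the result back to the master
                results[task_id] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # The master consolidates the results in task order.
    return [results[i] for i in range(len(tasks))]


squares = run_master_worker([1, 2, 3, 4, 5], lambda x: x * x)
```

Because workers pull tasks themselves, a faster worker naturally processes more tasks, which is the property that lets the same pattern scale across heterogeneous cloud nodes.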
CometCloud has already been used in a wide variety of applications. For example, CometCloud was used to tie together enough resources to model oil reservoirs. CometCloud connected two BlueGene clusters, one in New York and another in Saudi Arabia, and distributed computing tasks to the machines in order to simulate the system [5]. Additionally, it has been used in a multitude of other projects, such as making projections of stock market prices using a Monte Carlo simulation [6].
3.3 CBIR Algorithm
The content-based image retrieval program was originally developed using MATLAB, but was later converted to Java in order to utilize the CometCloud system. Using a database of about one thousand predetermined images of cancerous and noncancerous cells, the program determines whether a user-submitted query image is cancerous. Query images are individually compared with the images in the database and ranked according to similarity [1].
The pattern-recognition application utilizes a hierarchical search algorithm, which determines image similarity by focusing on the color content and distribution of an image. By dividing the query and database image (see Figure 3) into concentric rings, the hierarchical algorithm eliminates a proportion of database images after analyzing the RGB histograms of the first ring, and continues the process until a designated number of images is left. While the hierarchical algorithm is more accurate than analyzing the color content of the entire image at once, it involves a far larger number of comparisons and calculations than a typical CBIR algorithm, which dramatically increases the amount of time it takes to compare an image to the contents of a database [1].
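The ring-by-ring pruning idea can be illustrated with a toy implementation. The sketch below is an assumption-laden simplification, not the CINJ algorithm: images are small nested lists of RGB tuples, rings are square rather than circular, histograms are compared with an L1 distance, and the names `hist_bins`, `keep`, and `final` are hypothetical parameters introduced only for this example.

```python
def ring_index(x, y, width, height, rings):
    """Map a pixel to a concentric ring: 0 is the centre, rings-1 the border."""
    cx, cy = (width - 1) / 2, (height - 1) / 2
    # Normalised Chebyshev distance from the centre, in [0, 1].
    d = max(abs(x - cx) / max(cx, 1), abs(y - cy) / max(cy, 1))
    return min(int(d * rings), rings - 1)


def ring_histogram(image, ring, rings, hist_bins=4):
    """Normalised RGB histogram (3 * hist_bins values) of one ring."""
    height, width = len(image), len(image[0])
    hist = [0] * (3 * hist_bins)
    count = 0
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            if ring_index(x, y, width, height, rings) != ring:
                continue
            count += 1
            for channel, value in enumerate(pixel):
                slot = min(value * hist_bins // 256, hist_bins - 1)
                hist[channel * hist_bins + slot] += 1
    return [h / count for h in hist] if count else hist


def hierarchical_search(query, database, rings=2, keep=0.5, final=1):
    """Discard the worst candidates after each ring until `final` remain."""
    candidates = list(database)
    for ring in range(rings):
        q = ring_histogram(query, ring, rings)

        def distance(name):
            h = ring_histogram(database[name], ring, rings)
            return sum(abs(a - b) for a, b in zip(q, h))

        candidates.sort(key=distance)
        survivors = max(final, int(len(candidates) * keep))
        candidates = candidates[:survivors]
        if len(candidates) <= final:
            break
    return candidates[:final]


def framed(centre, border, size=4):
    """Test image: a `centre` coloured square surrounded by a `border`."""
    return [[centre if 0 < x < size - 1 and 0 < y < size - 1 else border
             for x in range(size)] for y in range(size)]


RED, GREEN, BLUE = (255, 0, 0), (0, 255, 0), (0, 0, 255)
database = {
    "match": framed(RED, BLUE),
    "red_green": framed(RED, GREEN),
    "green_blue": framed(GREEN, BLUE),
    "green_green": framed(GREEN, GREEN),
}
best = hierarchical_search(framed(RED, BLUE), database)
```

In this toy run the first ring eliminates the two images with a green centre, and the second ring separates the remaining two by their borders, mirroring the way each ring narrows the candidate set.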
To improve flexibility, the program accepts several parameters that affect the image analysis process, including: (1) overlap percentage, (2) bin size, (3) segment number, and (4) mean shift. Each of these parameters offers additional redundancy and accuracy at the cost of slower execution time.
• Overlap percentage instructs the application how much the current and previous patch searches should overlap. For example, if the user requested that the image analysis be performed with an overlap percentage of 90%, then each patch search would share 90% of its area with the previous one. In other words, the search area is incremented by 10% of the width of the query patch after each comparison.
• Bin size determines the number of concentric rings that the query image is divided into; for example, for a bin value of three, the query image would be split into three concentric rings.
• Segment number then determines how many equal regions each ring is divided into.
• Mean shift determines to what degree the application should cluster similar regions together to prevent repetitive results.
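The effect of the overlap percentage on the amount of work can be shown with a short calculation. This is a minimal one-dimensional sketch under assumed values (a 200-pixel-wide image and a 100-pixel-wide patch); the function name and the rounding of the step to whole pixels are choices made for the example, not details taken from the program.

```python
def patch_offsets(image_width, patch_width, overlap_percent):
    """Left edges of successive search patches along one image row."""
    # A 90% overlap means each search advances by 10% of the patch width.
    step = max(1, round(patch_width * (100 - overlap_percent) / 100))
    return list(range(0, image_width - patch_width + 1, step))


offsets = patch_offsets(image_width=200, patch_width=100, overlap_percent=90)
coarse = patch_offsets(image_width=200, patch_width=100, overlap_percent=50)
```

With these numbers a 90% overlap produces 11 patch positions per row while a 50% overlap produces only 3, which illustrates why higher overlap buys redundancy at the cost of execution time.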
By manipulating these parameters, the user can change the level of accuracy the program operates at, although increased accuracy inevitably comes at the cost of increased computing time [1]. Although each individual picture comparison can be done quickly through the algorithms in the application, the entire process takes a large amount of time and computing power because the query image must be compared to each of the hundreds of database images thousands of times. The process could be made much more efficient by distributing the tasks among several processors, since the tasks are largely independent.
Using the cloud, the application distributes the processing in "chunks" of images, giving a certain number of image comparisons as tasks to each processor. The comparisons can therefore be done in parallel, reducing the time needed to run the program for a single query image from fourteen weeks on an average computer to a few hours or minutes on high-performance cloud computing [1].
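Splitting the database into chunks is straightforward to sketch. The example below assumes a database of 1,000 images identified by integers and a chunk size of 64; both the identifiers and the chunk size are illustrative choices, since the text does not state the values actually used.

```python
def make_chunks(image_ids, chunk_size):
    """Split the database into consecutive chunks, one task per chunk."""
    return [image_ids[i:i + chunk_size]
            for i in range(0, len(image_ids), chunk_size)]


database_ids = list(range(1000))  # roughly the database size given in the text
tasks = make_chunks(database_ids, chunk_size=64)
```

Each chunk can then be handed to a different processor as an independent task; the last chunk is simply shorter when the database size is not a multiple of the chunk size.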
4 Methods
4.1 Application Overview
Various components are linked together to form the CompariCell web application. CometCloud and the Content Based Image Retrieval programs constitute the "back-end" aspect of the application. The graphical user interface, which the user interacts with, is part of the "front-end" of the application. PHP and Java agents are used to communicate and transfer information between the front and back ends of the web application (see Figure 4).
The purpose of developing a web application for this project is in accordance with a key goal of cloud computing: expanding accessibility. By adapting the CBIR program to the CometCloud framework, a user can run a computationally exhaustive program from a regular machine because the bulk of the process is done remotely in the cloud. Similarly, the web application resides on Internet servers, so any user with Internet browsing capabilities can access it. The end-user only needs enough computer resources to load a simple webpage in order to view the contents of the graphical user interface. The CompariCell web application effectively improves the user’s experience because it integrates all aspects of the process and provides a convenient interface for input and output.
4.2 Front-End
The CometCloud and Content Based Image Retrieval programs are not very user-friendly for doctors and medical researchers. To address this issue, a graphical user interface (GUI) was designed to simplify the execution of the program for the intended users. This GUI was developed as part of the CompariCell web application interface, and is hosted on a separate server from the CometCloud and image processing programs.
Figure 4: Overview of CompariCell application. The diagram shows the front-end (GUI website built with HTML/CSS, PHP agent, user input/output) exchanging parameters and results over websocket communication with the back-end, which starts the programs, establishes the cloud network, and runs the CometCloud and CBIR algorithms in the cloud.
The front-end was built using the common web technologies of HTML, CSS, JavaScript, and PHP. HTML was used to create the fundamental structure of the webpage through organized markup. CSS was used to style the design and format the layout. JavaScript helped enhance the user experience and was primarily used for previewing the uploaded query image. PHP was utilized to pass input data from the GUI to the local web application server and then connect with the remote server to run the main program.
While developing the web application, key goals were established for the interface design. The main purpose of the GUI is to make the remote execution of the back-end programs as simple as possible; for instance, the text size is large and subtle whites and grays are used to improve readability. Additionally, the input fields are responsive to user interactions to indicate their state of use. The fields were created using specialized features introduced in HTML5, which allow data type constraints and check whether all required fields have valid data before the parameters are submitted. For instance, the ability to select a query image is provided by a file browsing feature. When the user selects an image, it is previewed on the GUI through a JavaScript function. Inline tips are also available to aid in the user’s decision-making process. Finally, the submit button is kept the largest to emphasize its prominence as the final step. Special attention was devoted to practical design features in order to improve functionality and accessibility.
The CompariCell web interface manages the process of transferring the user’s input data and communicating with the remote back-end server, which runs the rest of the application. Once the user enters the required data into the input fields on the GUI, it is checked by the browser for validity. If the data is valid, it is passed to a PHP script using the "POST" method. The only exception to this is the selected image, which is uploaded through PHP’s file upload handling (the "$_FILES" array). The verification of the image is not done by the browser, but rather by PHP conditions. If the file type and file size of the selected image are valid, it is uploaded to the local server and stored in a directory. The uploaded image is then renamed to facilitate its transfer to the remote server later in the program’s process.
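The server-side checks on the uploaded image can be sketched as follows. The project performs them in PHP; this Python version only illustrates the logic, and the allowed extensions, the 5 MiB limit, and the `query_<id>` naming scheme are all assumptions made for the example rather than values from the original application.

```python
import os

# Assumed limits; the text does not state the actual accepted types or size.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}
MAX_BYTES = 5 * 1024 * 1024


def validate_upload(filename, size_bytes):
    """Return (ok, reason) for an uploaded query image."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        return False, "unsupported file type"
    if size_bytes <= 0 or size_bytes > MAX_BYTES:
        return False, "file size out of range"
    return True, "ok"


def stored_name(filename, upload_id):
    """Rename the upload to a predictable name for the later transfer."""
    extension = os.path.splitext(filename)[1].lower()
    return f"query_{upload_id}{extension}"
```

A predictable stored name avoids spaces and special characters in user-supplied filenames, which simplifies copying the file to the back-end server.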
The CompariCell web application then establishes a connection with the back-end server. First, all input parameters are consolidated into a single string, using an exclamation mark to separate one parameter from another. Then a websocket is opened by the local PHP client to the remote Java server. The parameter string is sent through the websocket to the Java server, which is perpetually listening for the connection. If the process is successful, the user sees a confirmation message on the CompariCell GUI; otherwise, they are alerted by an error message. The web interface then waits for an output response from the back-end server to display to the user; the delivery time of the output response depends mostly on the speed of the computation done in the cloud.
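The consolidation and transmission of the parameters can be illustrated with a short sketch. It is written in Python rather than the PHP used by the project, and it assumes that the "websocket" behaves like a plain stream socket and that messages are terminated by a newline; the example parameter values and their order are hypothetical. A local socket pair stands in for the network connection between the two servers.

```python
import socket

SEPARATOR = "!"  # the exclamation-mark delimiter described in the text


def encode_parameters(values):
    """Join the user's parameters into one delimiter-separated string."""
    parts = [str(value) for value in values]
    for part in parts:
        if SEPARATOR in part:
            raise ValueError("parameter must not contain the separator")
    return SEPARATOR.join(parts)


def send_message(sock, message):
    """Send one newline-terminated message (the framing is an assumption)."""
    sock.sendall(message.encode("utf-8") + b"\n")


# A connected local socket pair stands in for the front-end/back-end link.
client, server = socket.socketpair()
send_message(client, encode_parameters(["query_7.jpg", 90, 3, 4, 1]))
client.close()
with server.makefile("r", encoding="utf-8") as stream:
    received = stream.readline().rstrip("\n")
server.close()
```

Rejecting values that contain the separator keeps the message unambiguous, since a single-character delimiter offers no escaping mechanism.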
4.3 Back-End
The back-end server has two main components: (1) the Java agent and (2) the image analyzing programs. The agent is used to bridge communications from the web interface hosted on a separate server to the back-end server. The image analyzing programs comprise the CometCloud framework and the Content Based Image Retrieval algorithms.
The Java agent runs a Java websocket server to continuously listen for a connection to be established from the PHP client. Once the connection is accepted, the Java agent parses the string message sent from the PHP client through the websocket. This message contains the parameters submitted by the user on the web interface and consolidated by a PHP script. Since the parameters are separated by exclamation marks, the Java agent splits the message into individual parameters and stores them in a parameters file. The parameters file will be accessed later by the image analyzing programs when they execute. The next task for the Java agent is to transfer the uploaded query image from the CompariCell front-end interface to the back-end server. The image is copied through SCP (a secure method of copying files) and stored in the temporary directory of the back-end server so it can be discarded after use. At this point the Java agent has finished accepting input data, and the user is notified on the CompariCell web interface.
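The agent's parsing step can be sketched as follows. The real agent is written in Java; this Python version only shows the logic, and the field names, their order, and the `name=value` layout of the parameters file are assumptions, since the text does not specify them.

```python
import os
import tempfile

# Hypothetical field order; the original message layout is not documented.
FIELD_NAMES = ["image", "overlap", "bin_size", "segments", "mean_shift"]


def parse_parameters(message, separator="!"):
    """Split the delimiter-separated message into named parameters."""
    parts = message.strip().split(separator)
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(
            f"expected {len(FIELD_NAMES)} fields, got {len(parts)}")
    return dict(zip(FIELD_NAMES, parts))


def write_parameters_file(params, path):
    """Store one name=value pair per line for the analysis programs."""
    with open(path, "w", encoding="utf-8") as handle:
        for name, value in params.items():
            handle.write(f"{name}={value}\n")


params = parse_parameters("query_7.jpg!90!3!4!1")
path = os.path.join(tempfile.mkdtemp(), "parameters.txt")
write_parameters_file(params, path)
```

Checking the number of fields before writing the file lets the agent reject a malformed message early instead of launching the cloud programs with incomplete input.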
With all input data prepared on the back-end server, the Java agent begins to run the program. The agent executes a shell script which initializes the CometCloud Overlay on all nodes. In order to start the overlay servers on each node, the agent accesses each node through SSH and authenticates itself to the cloud servers with a generated passkey. Then, the shell script runs the Content Based Image Retrieval program starter on the master node. The master distributes the task of analyzing the query image with the large database of comparable images to the CometSpace. From the CometSpace, each worker node takes a task, runs the computationally intensive image processing algorithms, and outputs its result back to the cloud.
Figure 5: CompariCell web GUI and logo
Once all tasks have been completed by all worker nodes, the master consolidates all of their outputs into one. The result of this entire process is then forwarded back to the CompariCell web interface through a websocket created by the Java agent. Finally, the agent runs an additional shell script to end all Java processes, including the CometCloud Overlay and Content Based Image Retrieval program on all servers located in the cloud. This ensures a clean system when the CompariCell application is started again at a later time.
5 Results
The GUI, CometCloud, and CBIR work collectively to form the CompariCell web application. The graphical user interface allows users to intuitively upload an image and adjust various parameters for the image analysis. The front-end to back-end websockets and automated commands allow the user’s preferences and query images to be transferred and prepared for use in the application, without any additional interactions from the user. The agent on the back-end server runs several shell scripts that set up the CometCloud environment and prepare the CBIR program to analyze the query patch to the specifications selected by the user. The results from the CBIR process computed in the cloud environment are transferred back to the front-end server to be displayed on the GUI. Overall, a computationally exhaustive process can be easily executed within a few minutes by users who do not need to own expensive computer resources because the bulk of the process is done remotely in a cloud. The benefits of cloud computing are evident because the CompariCell web application itself can be launched from any device capable of Internet browsing.
6 Discussion
The interface designed by the group will greatly improve the utility of the application and will allow it to find a wider user base than it currently possesses. Without the GUI, anyone using the application would have to enter commands manually in the operating system’s terminal. For example, one must SCP the query patch directly to the back-end server, manually start the CometCloud Overlay, and choose the proper run file for the CBIR program depending on the selected parameters. With the user interface our group created, potential users only have to upload a file from their computer, input parameters into the appropriate fields, and click on the compute button. By streamlining the usage of the application and making global access a possibility through the Internet, many more medical professionals will be able to take advantage of CompariCell’s capabilities, made possible by the CometCloud framework and CBIR algorithms; having an efficient computer analysis of images of potentially cancerous cells can help professionals make better decisions and diagnoses.
7 Conclusion
In the future, the Cancer Institute plans to further improve its algorithm by adding new pattern recognition methods. Additional attention will also be paid to the autonomous aspect of the program, incorporating a machine learning algorithm into the system that will allow the application to experimentally determine what combination of resource allocation and image chunk size is optimal, improving image processing time and cost-effectiveness. Work has also already begun on creating an iPad version of the application, offering doctors using the CBIR program more mobility and convenience.
Cloud computing is a rapidly expanding field because it has wide-ranging applications in research and industry. Cloud services can provide for a profitable business model, as seen with the recent developments of personal online storage, web application software, and other services. Not only is the cloud model valuable in business, but it is also very useful in research. By providing access to high-end processing power to more scientists and researchers, cloud computing offers the ability to solve problems that would otherwise be too complex for them to manage on low-end resources.
As the CompariCell project shows, the CometCloud system has many practical and beneficial uses. The concept of using the cloud as a service is already being utilized by many companies, as seen through popular applications such as Google Drive and Dropbox. As cloud computing and automation methods continue to develop, users such as researchers and scientists will increasingly have access to otherwise hard-to-obtain processing power, allowing them to efficiently and cost-effectively tackle even the most complex of problems.
8 Acknowledgements
Without the time and energy contributed by numerous people, our project would not have been as successful as it has been. We would like to thank Dr. Ivan Rodero and Dr. Manish Parashar, our research mentors, for teaching us and providing guidance throughout the duration of the project. Their contribution in developing and researching the CometCloud technology provided the fundamental building blocks of our research. We also thank the Cancer Institute of New Jersey for allowing us to add extra functionality to their Content Based Image Retrieval program. Additionally, we would like to thank Stoyan Lazarov, our project RTA, for supervising our progress and helping to edit our paper. We also recognize the efforts of Program Coordinator Jean Patrick Antoine and Head Counselor Adrien Perkins. We would like to express our gratitude to the sponsors that made GSET 2012 possible: the State of New Jersey, Rutgers University, Morgan Stanley, PSE&G, Lockheed Martin, and South Jersey Industries Incorporated. Finally, our team is extremely indebted to the Board of Overseers, including program director Dean Ilene Rosen, who made the Governor’s School of Engineering and Technology possible and provided us with an opportunity to pursue this research.
9 References
[1] Hyunjoo Kim; Parashar, M.; Foran, D.J.; Lin Yang, “Investigating the use of autonomic cloudbursts for high-throughput medical image registration,” Grid Computing, 2009 10th IEEE/ACM International Conference on, pp. 34-41, 13-15 Oct. 2009.
[2] Buyya, R.; Chee Shin Yeo; Venugopal, S., “Market-Oriented Cloud Computing: Vision, Hype, and Reality for Delivering IT Services as Computing Utilities,” High Performance Computing and Communications, 2008. HPCC ’08. 10th IEEE International Conference on, pp. 5-13, 25-27 Sept. 2008.
[3] Varia, Jinesh. Cloud Architectures. Rep. Print.
[4] Parashar, Manish, Moustafa AbdelBaky, Ivan Rodero, and Aditya Devarakonda. Cloud Paradigms and Practices for CDS&E. Rep. Print.
[5] Parashar, Manish. “HPC in the Cloud: Blue Gene Sniffs for Black Gold in the Cloud.” HPC in the Cloud. Tabor Communications. Web. 21 July 2012.
[6] “CometCloud Applications VaR.” Center for Autonomic Computing. Web. 21 July 2012.