NEWT: A RESTful Service for High Performance Computing Web Application

3 Related Works

3.1 NEWT: A RESTful Service for High Performance Computing Web Application

The National Energy Research Scientific Computing (NERSC) Center is the flagship high performance scientific computing facility for research sponsored by the U.S. Department of Energy Office of Science. NERSC recognized that web interfaces offer an interactive and sophisticated way to understand and work with scientific data and methods. However, the jargon and the sheer number of web technologies make it unlikely that scientists will use such features unless they already have the knowledge and skills needed to build web applications. As part of its mission to provide high quality resources and services that accelerate scientific discovery through computation, NERSC is helping to build a number of web gateways for science.

3.1.1 Motivation

The NERSC Web Toolkit (NEWT) is a web service that enables interaction with High Performance Computing (HPC) resources through a science gateway. This way of interfacing with the HPC center is intended to enable rich client-side scientific computing applications that support end-to-end user workflows entirely in the web browser [CSB10]. Some of the specific functionalities provided to the user through this web toolkit are:

1. Run jobs and enable computing with minimal effort, in a simple and interactive way.

2. Access system resources such as batch jobs, status, and files, with authentication for access control.

3. Access accounting information from the NERSC Information Management (NIM) system.

4. Allow user-defined persistent data storage for future reference.

5. Run shell commands for enabling computations.
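The capabilities listed above can be pictured as a small table of resources and HTTP methods. The following is a minimal sketch only: the base URI, the paths, and the names `Endpoint`, `CAPABILITIES`, and `describe` are hypothetical and chosen for illustration, since the text does not enumerate NEWT's actual endpoints.

```python
from dataclasses import dataclass

# Hypothetical base URI; not taken from NEWT.
BASE = "https://gateway.example.org/api"


@dataclass(frozen=True)
class Endpoint:
    method: str  # HTTP method applied to the resource
    path: str    # URI template, filled in with str.format


# One illustrative resource per capability in the list above (all assumed).
CAPABILITIES = {
    "submit_job": Endpoint("POST", "/queue/{machine}"),
    "job_status": Endpoint("GET", "/queue/{machine}/{job_id}"),
    "read_file": Endpoint("GET", "/file/{machine}/{path}"),
    "accounting": Endpoint("GET", "/account/user/{username}"),
    "store_object": Endpoint("PUT", "/store/{bucket}/{key}"),
    "run_command": Endpoint("POST", "/command/{machine}"),
}


def describe(capability: str, **params: str) -> str:
    """Return the request line a client would issue for a capability."""
    ep = CAPABILITIES[capability]
    return f"{ep.method} {BASE}{ep.path.format(**params)}"


print(describe("job_status", machine="cluster1", job_id="12345"))
```

Keeping the method and the URI template together makes the point that, in a RESTful design, each capability is a resource plus the action performed on it.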

The majority of HPC work is still done through command line sessions and shell scripts that run user jobs, post-process output, and perform analysis. Many scientists, however, are uncomfortable with UNIX command line interfaces, which leaves only command line experts to run jobs and thereby limits cross-domain science. It also restricts the use of information that is already well exposed by various sources. The rationale is to allow users to interact with HPC resources with minimal knowledge of how to program the interface. In the same way, users do not need to install and build the computational library on each machine, and they need only minimal knowledge of the language whose library is being used. Abstracting these technical constraints away from the use of a method allows scientists to use the functions with ease. For example, in this model getting access to a dataset is simply a matter of entering a URI with appropriate parameters. There is no need to understand the XML structure of an application framework; NEWT handles all this automatically on the back end. The interfaces are more tightly integrated with the scientific applications themselves, allowing users to focus on working directly with their jobs and data. The motivation of this thesis is the same: binding datasets and other resources to URIs, together with appropriate parameters, for computational purposes. How the URIs are designed to locate a resource, and which parameters are needed to invoke the SciPy function requested by the user, is elaborated in later sections.
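To make the idea of "a URI with appropriate parameters" concrete, here is a minimal client-side sketch using only the Python standard library. The host name, the `/datasets/` path, and the parameter names `columns` and `limit` are assumptions for illustration and do not come from NEWT or from this thesis.

```python
from urllib.parse import quote, urlencode


def dataset_uri(base: str, dataset: str, **params) -> str:
    """Build a URI that names a dataset and carries its parameters.

    The '/datasets/' path segment is hypothetical.
    """
    # Percent-encode the dataset name so it is safe as a single path segment.
    uri = f"{base}/datasets/{quote(dataset, safe='')}"
    # Sort the parameters so the same request always yields the same URI.
    query = urlencode(sorted(params.items()))
    return f"{uri}?{query}" if query else uri


uri = dataset_uri(
    "https://gateway.example.org/api",  # hypothetical gateway
    "climate-2010",
    columns="temp",
    limit=100,
)
print(uri)
```

The user only has to know the resource name and its parameters; everything about how the back end locates and serializes the data stays hidden behind the URI.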

3.1.2 Approach and Implementation

NEWT requests consist of a combination of HTTP methods with URIs, parameters, and HTTP headers. NEWT responses typically comprise HTTP status codes, headers, and JSON (JavaScript Object Notation) objects. As described by Cholia et al. [CSB10], JSON was the format chosen for encoding responses in their application. Asynchronous JavaScript and XML (AJAX) and HTML5 technologies provide all the tools needed to interface with the HPC application. NEWT is based on the idea of encapsulating the building blocks in the form of HTTP URIs. The set of these URIs, together with the actions that can be performed on them, constitutes the API. This enables groups of scientists to build complete web gateways using client-side technologies alone. The RESTful architectural style supports many, often infinitely many, resources, each addressable through at least one URI and exposed through the uniform interface [RR07]. The advantages of the REST architecture over the alternatives are explained in Section 2.4. This thesis likewise proposes to expose its offerings as a web service through a RESTful API in which resources are requested via URIs. Unlike the service described by Cholia et al. [CSB10], however, the responses are not tied to a single format: the user chooses whether to receive JSON or XML. The data storage also differs from that of NEWT: in this thesis, not only is the user information kept in internal memory, but the computational results requested by the user are also stored on the server.
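Letting the user choose between JSON and XML responses can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in NEWT or in this thesis: the function name `render`, the `<response>` root element, and the simplified matching on the `Accept` header value are all hypothetical, and only flat payloads are handled.

```python
import json
import xml.etree.ElementTree as ET


def render(payload: dict, accept: str) -> tuple[str, str]:
    """Return (content_type, body) for a flat dict, based on the Accept value.

    A real service would parse the Accept header fully (quality values,
    wildcards); this sketch only checks for 'application/xml'.
    """
    if "application/xml" in accept:
        root = ET.Element("response")  # hypothetical root element name
        for key, value in payload.items():
            ET.SubElement(root, key).text = str(value)
        return "application/xml", ET.tostring(root, encoding="unicode")
    # JSON is the fallback when XML is not requested.
    return "application/json", json.dumps(payload)


result = {"mean": 2.5, "n": 4}
print(render(result, "application/json"))
print(render(result, "application/xml"))
```

The same computed result is serialized in two ways, so the choice of format stays a presentation concern that does not affect the resource itself.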

This data storage takes the form of data buckets. Access to it is governed by the assigned user roles, and the stored resources are handled via HTTP methods. The other resources the user needs for computation are accessed via HTTP methods as well. The desired SciPy function call is mapped to a URI, and its input parameters are passed as a query string. This preserves the RESTful API design.
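The mapping from a URI and query string to a function call can be sketched as a small dispatcher. To keep the example free of dependencies, functions from Python's standard `statistics` module stand in for SciPy functions; in a real service the registry would hold SciPy callables instead. The `/stats/<name>` path layout, the `data` parameter, and the names `FUNCTIONS` and `dispatch` are assumptions for illustration.

```python
import statistics
from urllib.parse import parse_qs, urlsplit

# Stand-ins from the standard library; a real service would register
# SciPy callables here instead.
FUNCTIONS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "pstdev": statistics.pstdev,
}


def dispatch(uri: str) -> dict:
    """Resolve the last path segment to a function and call it with the query data."""
    parts = urlsplit(uri)
    name = parts.path.rstrip("/").rsplit("/", 1)[-1]
    if name not in FUNCTIONS:
        return {"status": 404, "error": f"unknown function: {name}"}
    query = parse_qs(parts.query)
    try:
        # Hypothetical convention: data=<comma-separated numbers>
        data = [float(x) for x in query["data"][0].split(",")]
    except (KeyError, ValueError):
        return {"status": 400, "error": "expected data=<comma-separated numbers>"}
    return {"status": 200, "function": name, "result": FUNCTIONS[name](data)}


print(dispatch("/stats/mean?data=1,2,3,4"))
```

An unknown function name yields a 404-style result and malformed input a 400-style result, which keeps the error reporting aligned with HTTP status codes.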

Figure 3.1: Illustration of NEWT – RESTful Services for HPC [CSB10]

The architectural design of NEWT treats the computational functions within the scope of the framework as resources, alongside data storage and authentication. Since the requirements are similar to those of our work, this thesis follows a similar approach.

The REST API is built using the Django framework, both for NEWT and for this thesis. Django offers a flexible web framework based on the Model-View-Controller (MVC) pattern, which allows a clean separation between the data models, business logic, and front end [GHJV95][KP88]. It also has a powerful middleware stack that makes it easy to integrate authentication, user sessions, and header manipulation into our application. Django does much of the heavy lifting when it comes to URI routing and request/response handling.
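The idea of a middleware stack can be illustrated without Django itself. The sketch below imitates the style in which each middleware wraps a `get_response` callable; it does not import or use Django, and the dict-based request and response, the token check, and all names are hypothetical simplifications.

```python
from typing import Callable

Request = dict
Response = dict
Handler = Callable[[Request], Response]


def view(request: Request) -> Response:
    """The innermost handler: only reached by authenticated requests."""
    return {"status": 200, "headers": {}, "body": f"hello {request['user']}"}


def auth_middleware(get_response: Handler) -> Handler:
    def middleware(request: Request) -> Response:
        token = request.get("headers", {}).get("Authorization")
        if token != "secret-token":  # hypothetical credential check
            return {"status": 401, "headers": {}, "body": "unauthorized"}
        request["user"] = "alice"  # hypothetical user lookup
        return get_response(request)
    return middleware


def header_middleware(get_response: Handler) -> Handler:
    def middleware(request: Request) -> Response:
        response = get_response(request)
        response["headers"]["X-Served-By"] = "sketch"  # header manipulation
        return response
    return middleware


def build_stack(inner: Handler, middlewares: list) -> Handler:
    """Wrap the view so that the first middleware in the list is outermost."""
    handler = inner
    for mw in reversed(middlewares):
        handler = mw(handler)
    return handler


app = build_stack(view, [header_middleware, auth_middleware])
print(app({"headers": {"Authorization": "secret-token"}}))
print(app({"headers": {}}))
```

Because every layer sees the request on the way in and the response on the way out, concerns such as authentication and header handling stay out of the view code.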

As one of the most powerful MVC frameworks in Python, and the most widely adopted and documented Python framework, Django helped in structuring the resource model and mapping the resources to URIs with ease [HKM09]. Since Django applications are written in Python, it is also easier to access the rich ecosystem of Python-based tools and libraries for scientific computation. This thesis focuses on using Python's SciPy library for scientific computation, in particular its functions for statistical analysis.
