A File Transfer Component for Grids


Gregor von Laszewski

Mathematics and Computer Science Division, Argonne National Laboratory Argonne, Il 60440, U.S.A.

Alunkal Beulah Kurian

Mathematics and Computer Science Division, Argonne National Laboratory Argonne, Il 60440, U.S.A.

Ravi Madhuri

Mathematics and Computer Science Division, Argonne National Laboratory Argonne, Il 60440, U.S.A.

Pawel Plaszczak

Mathematics and Computer Science Division, Argonne National Laboratory Argonne, Il 60440, U.S.A.

Corresponding Author: [email protected]

Abstract

Grids have become an important asset in large-scale scientific and engineering research. An important part of the daily work of Grid researchers is access to distributed data, which is typically as important as access to distributed computational resources. Data is available through a wide variety of data sources supporting protocols such as HTTP, HTTPS, FTP, and GridFTP. In this paper, we present the design of components and APIs that make handling of data accessed through Grids easier for the novice Grid user and programmer. We introduce the architecture of a Grid component for file transfers that separates the functionalities of enabling the actual file transfer requests, performing the file transfer, and initiating the transfer through a user interface. Our component design cleanly separates functionality from implementation. Hence, it can be adapted to include third-party file transfer protocols. An important feature is that file transfer must be exposed conveniently to the user through an interactive component in order to provide transparent access to the Grid. Our File Transfer Component (FTC) is a collection of such interactive components. It leverages the Globus Project Grid middleware infrastructure and uses a newly developed Java client-side GridFTP API and the Reliable File Transfer component. Our intuitive GUI allows file transfers through a drag-and-drop mechanism, with a protocol-independent interface to varied data sources. Additionally, we have designed the components as JavaBeans, which enables them to be integrated into an Integrated Development Environment.

1 Introduction

Grids are a rapidly emerging new form of distributed computing in which the vision is to construct an environment for coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations [8]. To access such a complex environment, Grid computing environments and Grid portals must be provided to enable transparent and convenient access to the Grid [13].

A frequent obstacle to the creation of advanced applications using Grid environments is the need to access data that is not collocated at the site on which the computation is performed. Hence, file transfer is an elementary function in Grids. Many aspects must be considered when dealing with file transfer. Such aspects include the protocol supported during the transfer, how the file transfers are organized, which mechanisms are used to instantiate a file transfer, and how the progress of a file transfer can be observed. Additionally, we have to consider the different user communities that Grids address. These include common Grid users, who are interested in easy-to-use interfaces, and Grid developers, who are interested in easy-to-use APIs and shell commands. Hence, file transfer in Grids must address the whole spectrum of the Grid architecture.

We observe that in Grids, and on the Internet, files are stored on servers supporting a variety of protocols, such as FTP [11], GridFTP [1], and WebDAV [9]. GridFTP is a high-performance, secure, reliable data transfer protocol optimized for high-bandwidth wide-area networks that is based on FTP and developed as part of the Globus Project. WebDAV is a standard protocol for distributed authoring and versioning that defines the HTTP extensions necessary to enable distributed web authoring while providing interoperability among a large number of commodity tools. Designing a single interface to connect to the various servers using different protocols provides users with a uniform view of the files located on the different servers, irrespective of their protocol implementations. This enables the user to access a large number of file servers without being familiar with the details of each protocol.

Hence, we need to design an interface to these protocols that provides a common set of methods and allows future integration of additional protocols. Based on this common software engineering concept, we are able to support a multitude of stream-based protocols, such as FTP, HTTP, and GridFTP, for the file transfer.

As pointed out before, an additional concern is how file transfer requests are performed. Commodity applications such as SecureFTP [3] and LeechFTP [7] provide easy-to-use graphical interfaces, making them powerful tools for the Web community. They provide drag and drop for the instantiation of file transfers between clients and servers, a view of the status of the transfer, and bookmarking. Unfortunately, we observe that no such tools are available to the Grid community, which keeps the entry level into Grid file sharing unnecessarily high. It is desirable to develop such high-level components and their supporting services, which must provide features such as multiple file downloads and uploads, recursive directory transfers, transmission management through queues, and quality-of-service guarantees. Our Grid activities extend beyond file transfer to the design and implementation of an elementary set of such components: besides file transfer, we are also concerned with job submission and information browsing. We have recently started the development of such components as part of a larger effort to produce an Open Grid Computing Environment that promotes the easier use of Grids and can be based on a multitude of Grid toolkits, including the Globus Toolkit version 2 and version 3.

In the rest of the paper, we present a File Transfer Component (FTC) that allows protocol independent data access as well as file browsing capabilities for Grids. We also present a convenient Java client library that allows developing such components in modular fashion by the community. We first describe the requirements and our proposed solution. Next, we describe the design and implementation of FTC respectively. We conclude the paper with use cases, extensions, and future developments.

2 Requirements

The requirements for the design of our architecture follow the usual software engineering principles. We promote reusability, integrability, and portability on multiple levels. On the protocol level we must support a variety of protocols, as already mentioned in the previous section. Apart from removing protocol dependency in accessing data, we need an interactive interface for transferring files. In summary, we would like to address the following issues:

• Allow uniform access mechanisms to support a wide variety of data sources that may include common stream-based data services such as FTP or HTTP.

• Provide transparent and easy access to data in Grids by using both, a GUI and a Command Shell.

• Support expandability by providing mechanisms to integrate new protocols.

• Allow reuse and composability within Integrated Development Environments.

• Provide portability and minimize development effort through an appropriate choice of technologies.

Although our requirements are language and framework neutral, we have decided to implement the component in Java, allowing us to fulfill the requirements of easy porting and deployment as discussed in [12].

3 Architecture of the File Transfer Component

Our architecture is based on a number of reusable components. Figure 1 shows the architecture of the file transfer component (FTC). It consists of three main components, namely the graphical user interface, the access manager, and the transfer manager.

[Figure 1 layers, top to bottom: File Transfer Grid User Interfaces (FTC and Grid Shell); Access Manager Component and Transfer Manager Component; Access interface and Transfer interface; File Access providers (FTP, GridFTP, HTTP) and File Transfer providers (RFT, FTP third party, Grid UrlCopy); File system and File Servers.]

Figure 1: The architecture of the file transfer component

• The access manager can connect to a data source independent of the protocol, through a single interface. It is responsible for accessing the local and remote files. It provides uniform access to all the file transfer service providers, independent of the protocol they support, by interfacing through the access interface. The protocols we currently support are FTP, GridFTP, and HTTP. The user initiates file transfer requests through the access manager.

• The transfer manager manages specific data transfers. It performs the actual movement of files from one location to the other. It takes the transfer requests specified through the graphical user interface and schedules the transfers to the servers involved. The data movement can be provided by reusing services such as Grid UrlCopy or any FTP server providing third-party transfers. For instance, a user who needs reliable file transfers between GridFTP servers can use the Reliable File Transfer (RFT) service to initiate reliable transfers with quality-of-service provisions.

• The user interface supports uniform display of files and directories while at the same time allowing drag-and-drop features for a wide variety of data sources. Additionally, a Grid shell that allows command-line interaction with the file transfer servers is under development. It is this component that interacts with both the access manager and the transfer manager. It communicates with the access manager to provide a graphical representation of the files located in the local as well as the remote systems. It gets the file transfer requests from the access manager, queues the requests, and sends them to a transfer manager that performs the actual file transfers. Internally, the graphical user interface uses a generalized broker for file and job execution.
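The queueing step described in the last item can be sketched as follows. This is only an illustration under stated assumptions, not the FTC implementation: the names `TransferQueueSketch` and `TransferRequest` are hypothetical, and the transfer manager is modeled as a plain `Consumer`.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical sketch: the user interface queues transfer requests and
// hands them, in arrival order, to a transfer manager (a Consumer here).
class TransferQueueSketch {

    // A request carries the information needed to perform one transfer.
    static final class TransferRequest {
        final String fromUrl;
        final String toUrl;

        TransferRequest(String fromUrl, String toUrl) {
            this.fromUrl = fromUrl;
            this.toUrl = toUrl;
        }
    }

    private final Queue<TransferRequest> pending = new ArrayDeque<>();

    // Called, for example, when the user drops a file onto another window.
    void enqueue(String fromUrl, String toUrl) {
        pending.add(new TransferRequest(fromUrl, toUrl));
    }

    int size() {
        return pending.size();
    }

    // Drains the queue in FIFO order and returns the number of requests sent.
    int dispatchAll(Consumer<TransferRequest> transferManager) {
        int count = 0;
        TransferRequest next;
        while ((next = pending.poll()) != null) {
            transferManager.accept(next);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        TransferQueueSketch queue = new TransferQueueSketch();
        queue.enqueue("gsiftp://hostA.example.org/data/a.dat", "gsiftp://hostB.example.org/data/a.dat");
        queue.enqueue("ftp://hostA.example.org/b.txt", "ftp://hostB.example.org/b.txt");
        queue.dispatchAll(r -> System.out.println(r.fromUrl + " -> " + r.toUrl));
    }
}
```

Keeping the queue separate from the code that executes a transfer mirrors the separation between the user interface and the transfer manager.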

We integrated these reusable components into a single component that we name the File Transfer Component (FTC) for Grids. The implementation of our FTC architecture is based on the JavaBeans [2] component model, thus providing reusability of the components in an Integrated Development Environment. Hence, our components interact with other components, encapsulating a set of functionalities while being based on a clearly defined interface that conforms to a prescribed behavior common to all components within our architecture. Thus, multiple components may be easily composed to build other components.
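To make the JavaBeans conventions concrete, the sketch below shows a minimal bean with one bound property. It is an assumption-laden illustration: the class `DirectoryBeanSketch` and its `currentDirectory` property are hypothetical and are not taken from the FTC sources. Only the standard `java.beans` classes are used.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.io.Serializable;

// Hypothetical bean exposing the current directory as a bound property, so
// that an IDE or another component can observe changes to it.
// A real bean class would be declared public; it is package-private here
// only to keep the sketch in a single file.
class DirectoryBeanSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final PropertyChangeSupport changes = new PropertyChangeSupport(this);
    private String currentDirectory = "/";

    // Beans provide a no-argument constructor.
    public DirectoryBeanSketch() { }

    // Getter/setter pair following the JavaBeans naming pattern.
    public String getCurrentDirectory() {
        return currentDirectory;
    }

    public void setCurrentDirectory(String directory) {
        String old = currentDirectory;
        currentDirectory = directory;
        // Listeners are notified only when the value actually changes.
        changes.firePropertyChange("currentDirectory", old, directory);
    }

    public void addPropertyChangeListener(PropertyChangeListener listener) {
        changes.addPropertyChangeListener(listener);
    }

    public void removePropertyChangeListener(PropertyChangeListener listener) {
        changes.removePropertyChangeListener(listener);
    }

    public static void main(String[] args) {
        DirectoryBeanSketch bean = new DirectoryBeanSketch();
        bean.addPropertyChangeListener(e ->
                System.out.println(e.getPropertyName() + ": " + e.getOldValue() + " -> " + e.getNewValue()));
        bean.setCurrentDirectory("/data");
    }
}
```

Bound properties of this kind are what allow a visual builder to wire one component's state to another without either knowing the other's implementation.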

3.1 File Access Providers

As indicated previously, the FTC supports multiple protocols. This is made possible through so-called file access providers. We currently support basic providers for FTP, GridFTP, and HTTP. The file access providers allow us to access a file system through API calls that are implemented using the appropriate protocol. These providers can then easily be integrated into the display component. Thus, supporting a new protocol requires only designing a new file access provider, which is reused within the proper instantiation of the access manager.
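One way to organize such providers is a registry keyed by protocol name. The sketch below is hypothetical: `ProviderRegistrySketch`, `FileAccessProvider`, and `named` are illustrative names, and the provider is reduced to its protocol name for brevity.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical registry: the access manager looks up a provider by protocol
// name, so adding a protocol means registering one more factory.
class ProviderRegistrySketch {

    // Stand-in for a file access provider; only the protocol name is modeled.
    interface FileAccessProvider {
        String protocol();
    }

    private final Map<String, Supplier<FileAccessProvider>> factories = new HashMap<>();

    // Helper that builds a trivial provider for demonstration purposes.
    static FileAccessProvider named(String protocol) {
        return () -> protocol;
    }

    void register(String protocol, Supplier<FileAccessProvider> factory) {
        factories.put(protocol.toLowerCase(Locale.ROOT), factory);
    }

    // Protocol names are matched case-insensitively.
    FileAccessProvider providerFor(String protocol) {
        Supplier<FileAccessProvider> factory = factories.get(protocol.toLowerCase(Locale.ROOT));
        if (factory == null) {
            throw new IllegalArgumentException("No provider registered for " + protocol);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        ProviderRegistrySketch registry = new ProviderRegistrySketch();
        registry.register("ftp", () -> named("ftp"));
        registry.register("gridftp", () -> named("gridftp"));
        registry.register("http", () -> named("http"));
        System.out.println(registry.providerFor("GridFTP").protocol());
    }
}
```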

The GridFTP [1] provider is of special interest to the Grid community. GridFTP is an extension to the FTP protocol that uses the Grid Security Infrastructure (GSI) for authentication. This allows Grid applications to have ubiquitous, high-performance access to data in a way that is compatible with the most popular file transfer protocol in use today, FTP. A unique feature of GridFTP is the possibility of transferring a file using parallel streams, instead of having just one TCP connection between the sender and the receiver. In this model, multiple connections are opened, each with its own socket. The data to be sent is split into n partitions and fed down the n connections through the n sockets. At the receiving end, these data partitions are reassembled. This approach can substantially increase the efficiency of data transfers by making better use of the available network bandwidth. Since the TCP send and receive buffers are specific to one socket, having parallel sockets effectively increases the buffer sizes without any change to the kernel parameters.
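The partition-and-reassemble idea can be illustrated without any networking. In the sketch below, which is not GridFTP code, each worker thread stands in for one socket and copies one contiguous partition into the receiver's buffer; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical in-memory model of a parallel-stream transfer: the data is
// split into n contiguous partitions, each moved by its own worker, and the
// receiver reassembles them by writing each partition at its offset.
class ParallelStreamSketch {

    static byte[] transfer(byte[] data, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        byte[] received = new byte[data.length];
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            List<Future<?>> parts = new ArrayList<>();
            int chunk = (data.length + n - 1) / n; // partition size, rounded up
            for (int i = 0; i < n; i++) {
                final int start = Math.min(i * chunk, data.length);
                final int end = Math.min(start + chunk, data.length);
                Runnable copy = () -> System.arraycopy(data, start, received, start, end - start);
                parts.add(pool.submit(copy));
            }
            for (Future<?> part : parts) {
                part.get(); // wait until every partition has arrived
            }
            return received;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        byte[] copy = transfer(data, 4);
        System.out.println("bytes transferred: " + copy.length);
    }
}
```

Because each partition is written at its own offset, the order in which the workers finish does not matter.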

The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. It allows messages to be in the format of MIME-like messages, containing metainformation about the data transferred. Like any generic data transfer protocol, HTTP cannot regulate the content of the data that is transferred, nor is there any a priori method of determining the sensitivity of any particular piece of information within the context of any given request.

3.2 File Access Interface

The FTC contains an access interface that allows multiple file transfer protocols to be supported by implementing it. It defines basic operations such as connecting to a remote server, changing to a directory, listing the files present in a directory, making a new directory, and deleting files and directories. The pseudo code for the interface is shown in Figure 2.

The meaning of these methods is straightforward. The connectRemote() method is used initially to set up the connection with the remote server. The chdir() method changes the current directory. The list() method retrieves all the subdirectories and files present in a directory. The other methods provide the basic functionality required to support a simple file transfer protocol.
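As an illustration of how a provider might implement these operations, the sketch below backs the interface with the local file system through `java.nio.file`. It is an assumption, not the FTC code: the class `LocalAccessSketch`, the `entries()` accessor, and the `inTempDirectory()` factory are hypothetical, and the interface is repeated so that the example is self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LocalAccessSketch {

    // Same operations as the access interface described in the text.
    interface AccessInterface {
        void connectRemote();
        void chdir(String dirname);
        void list();
        void mkdir(String dirname);
        void rmdir(String dirname);
        void rmfile(String filename);
        void rename(String oldname, String newname);
        void refresh();
        void disconnectRemote();
    }

    // Helper type so file operations can be passed around as lambdas.
    interface IoAction {
        void run() throws IOException;
    }

    // Hypothetical provider for local files; "connecting" selects the root.
    static final class LocalProvider implements AccessInterface {
        private final Path root;
        private Path cwd;
        private List<String> entries = new ArrayList<>();

        LocalProvider(Path root) { this.root = root; }

        // Convenience factory for experiments: uses a fresh temporary directory.
        static LocalProvider inTempDirectory() {
            try {
                return new LocalProvider(Files.createTempDirectory("ftc-sketch"));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        // Names in the current directory as of the last refresh, sorted.
        List<String> entries() { return entries; }

        public void connectRemote() { cwd = root; refresh(); }
        public void chdir(String dirname) { cwd = cwd.resolve(dirname).normalize(); refresh(); }
        public void list() { refresh(); }
        public void mkdir(String dirname) { apply(() -> Files.createDirectory(cwd.resolve(dirname))); }
        public void rmdir(String dirname) { apply(() -> Files.delete(cwd.resolve(dirname))); }
        public void rmfile(String filename) { apply(() -> Files.delete(cwd.resolve(filename))); }
        public void rename(String oldname, String newname) {
            apply(() -> Files.move(cwd.resolve(oldname), cwd.resolve(newname)));
        }
        public void disconnectRemote() { cwd = null; entries = new ArrayList<>(); }

        public void refresh() {
            try (Stream<Path> children = Files.list(cwd)) {
                entries = children.map(p -> p.getFileName().toString()).sorted().collect(Collectors.toList());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        // Runs one file operation, then re-reads the directory listing.
        private void apply(IoAction action) {
            try {
                action.run();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            refresh();
        }
    }

    public static void main(String[] args) {
        LocalProvider provider = LocalProvider.inTempDirectory();
        provider.connectRemote();
        provider.mkdir("results");
        System.out.println(provider.entries());
        provider.rmdir("results");
        provider.disconnectRemote();
    }
}
```

A remote provider would implement the same nine methods with protocol commands instead of local file operations, which is what lets the display component treat both alike.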

3.3 File Transfer Providers

In contrast to the file access providers, which enable access to the actual file system, the file transfer providers are responsible for accessing file servers and managing the actual file transfers. File transfer requests contain all the information needed to perform such transfers. To support the Grid, we need third-party file transfer, that is, transfers between two different remote data sources initiated by a third party. We support such third-party transfers in the Grid through file transfer providers that reuse the Java CoG Kit's GridUrlCopy, which internally supports the protocols http, https, ftp, and gsiftp, and the Globus Project's Reliable File Transfer service (RFT) [4], which provides an easy-to-use reliable file transfer service.

interface AccessInterface {
    public void connectRemote();
    public void chdir(String dirname);
    public void list();
    public void mkdir(String dirname);
    public void rmdir(String dirname);
    public void rmfile(String filename);
    public void rename(String oldname, String newname);
    public void refresh();
    public void disconnectRemote();
}

Figure 2: Pseudo code for the access interface of the FTC.

Internally, RFT uses the GridFTP client libraries provided by the Java CoG Kit. It allows the transfer of files with quality-of-service provisions. Thus, if a file transfer is interrupted at some point, the reliable file transfer service will assist in restarting the transfer at another time and will try to complete the interrupted transfer. Problems such as dropped connections, machine reboots, and temporary network outages are dealt with automatically. This functionality is much the same as that commonly supported in commodity file transfer clients. Unlike such clients, however, RFT supports third-party transfers and, because it runs as a service, allows the user to reconnect at a later time without interrupting the transfer between the two machines, on both of which GridFTP servers must be installed.
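The restart behavior can be pictured with a small in-memory model. The sketch below is not RFT code and makes simplifying assumptions: a dropped connection is simulated by a byte budget per attempt, and `RestartableTransferSketch` and its methods are hypothetical names. The point it illustrates is that a restart marker records how much data has safely arrived, so a retry resumes from there rather than from the beginning.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical model of a restartable transfer using a restart marker.
class RestartableTransferSketch {
    private static final int CHUNK = 1024;

    private final byte[] source;
    private final ByteArrayOutputStream destination = new ByteArrayOutputStream();
    private int restartMarker = 0; // bytes delivered and acknowledged so far

    RestartableTransferSketch(byte[] source) {
        this.source = source.clone();
    }

    // One attempt: sends chunks starting at the restart marker until the data
    // is exhausted or the simulated connection drops after the given budget.
    // Returns true when the whole source has been delivered.
    boolean attempt(int bytesBeforeFailure) {
        int sentThisAttempt = 0;
        while (restartMarker < source.length) {
            int length = Math.min(CHUNK, source.length - restartMarker);
            if (sentThisAttempt + length > bytesBeforeFailure) {
                return false; // connection dropped; the marker is kept
            }
            destination.write(source, restartMarker, length);
            restartMarker += length; // advance only after the chunk arrived
            sentThisAttempt += length;
        }
        return true;
    }

    // Retries until the transfer completes; returns the number of attempts.
    int runUntilComplete(int bytesBeforeFailure, int maxAttempts) {
        for (int attempts = 1; attempts <= maxAttempts; attempts++) {
            if (attempt(bytesBeforeFailure)) {
                return attempts;
            }
        }
        throw new IllegalStateException("transfer did not complete in " + maxAttempts + " attempts");
    }

    byte[] received() {
        return destination.toByteArray();
    }

    public static void main(String[] args) {
        RestartableTransferSketch transfer = new RestartableTransferSketch(new byte[10_000]);
        int attempts = transfer.runUntilComplete(3_000, 20);
        System.out.println("completed after " + attempts + " attempts");
    }
}
```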

3.4 File Transfer Interface

The transfer manager communicates with a file transfer provider using this interface. The pseudo code for the interface, shown in Figure 3, is simple: two methods specify the endpoints of the transfer, and four methods control it.

interface TransferInterface {
    public void setFromUrl(String url);
    public void setToUrl(String url);
    public void startTransfer();
    public void suspendTransfer();
    public void resumeTransfer();
    public void cancelTransfer();
}

Figure 3: Pseudo code for the transfer interface of the FTC
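The control methods of the transfer interface imply a small life cycle for each transfer. The sketch below shows one plausible way a provider could enforce it; the states, the class `TransferStateSketch`, and the rule that both URLs must be set before starting are assumptions made for illustration, not behavior documented for the FTC.

```java
// Hypothetical provider that tracks the life cycle of a single transfer.
class TransferStateSketch {

    // Same operations as the transfer interface in Figure 3.
    interface TransferInterface {
        void setFromUrl(String url);
        void setToUrl(String url);
        void startTransfer();
        void suspendTransfer();
        void resumeTransfer();
        void cancelTransfer();
    }

    enum State { NEW, RUNNING, SUSPENDED, CANCELLED }

    static final class StatefulProvider implements TransferInterface {
        private String fromUrl;
        private String toUrl;
        private State state = State.NEW;

        State state() { return state; }

        public void setFromUrl(String url) { fromUrl = url; }
        public void setToUrl(String url) { toUrl = url; }

        public void startTransfer() {
            if (fromUrl == null || toUrl == null) {
                throw new IllegalStateException("set both URLs before starting");
            }
            move(State.NEW, State.RUNNING);
        }

        public void suspendTransfer() { move(State.RUNNING, State.SUSPENDED); }
        public void resumeTransfer() { move(State.SUSPENDED, State.RUNNING); }

        // Cancelling is allowed from any state and is final.
        public void cancelTransfer() { state = State.CANCELLED; }

        // Performs a transition only if the transfer is in the expected state.
        private void move(State expected, State next) {
            if (state != expected) {
                throw new IllegalStateException("cannot go from " + state + " to " + next);
            }
            state = next;
        }
    }

    public static void main(String[] args) {
        StatefulProvider provider = new StatefulProvider();
        provider.setFromUrl("gsiftp://hostA.example.org/data/a.dat");
        provider.setToUrl("gsiftp://hostB.example.org/data/a.dat");
        provider.startTransfer();
        provider.suspendTransfer();
        provider.resumeTransfer();
        System.out.println(provider.state());
    }
}
```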

The functionality of the methods of the interface is straightforward. The setFromUrl() method provides the details of the file that needs to be transferred. The setToUrl() method provides the location to which the file needs to be transferred. The String url taken as the parameter follows the usual syntax:

[protocol]://[user]@[host]:[port]/[file]
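A transfer URL of this shape can be taken apart with the standard `java.net.URI` class. The sketch below assumes that the file path follows the port after a slash, as in ordinary URLs; the class `TransferUrlSketch`, the host names, and the port number are made up for illustration.

```java
import java.net.URI;

// Hypothetical value class holding the parts of a transfer URL.
class TransferUrlSketch {
    final String protocol;
    final String user; // null when the URL carries no user
    final String host;
    final int port;    // -1 when the URL carries no port
    final String file;

    private TransferUrlSketch(String protocol, String user, String host, int port, String file) {
        this.protocol = protocol;
        this.user = user;
        this.host = host;
        this.port = port;
        this.file = file;
    }

    // Parses the URL; URI.create rejects syntactically malformed input with
    // an IllegalArgumentException, and we require a protocol and a host.
    static TransferUrlSketch parse(String url) {
        URI uri = URI.create(url);
        if (uri.getScheme() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("protocol and host are required: " + url);
        }
        return new TransferUrlSketch(uri.getScheme(), uri.getUserInfo(), uri.getHost(),
                uri.getPort(), uri.getPath());
    }

    public static void main(String[] args) {
        TransferUrlSketch url = parse("gsiftp://alice@host.example.org:2811/data/run1.dat");
        System.out.println(url.protocol + " " + url.user + " " + url.host + " " + url.port + " " + url.file);
    }
}
```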

3.5 Graphical User Component

The front end of the File Transfer Component can be implemented in a variety of frameworks. Initially, we have implemented it using the powerful Swing framework. We also envision implementing the component as a Jetspeed portlet to allow integration into portals with little software installed on the client side. Nevertheless, we believe that a more powerful component that supports drag and drop is needed for the easy use of Grids. The current generation of web browsers simply does not support such a modality.

Our prototype component consists of monitoring and file access windows that are designed as Java internal frames and specified as JavaBeans. A JavaBean is a reusable software component that can be manipulated visually in an Integrated Development Environment (IDE). FTC provides its components as JavaBeans. These beans can be used in an IDE such as JBuilder to develop a Grid environment. The file access frames are drag-and-drop enabled. FTC offers elaborate directory browsing and directory manipulation functionality, which works identically for both local and remote hosts. The user is able to:

• View the directory entries in tree structure.

• Transfer files between two windows with few clicks of the mouse.

• Refresh entries.

• Make new directories.

• Copy files/directories.

• Delete Files.

The Monitoring window supports queuing of jobs, checking status, setting options, and so on. The Message window displays the details of the actual file transfers. The graphical user interface is shown in Figures 4 and 5.

Figure 4: The GUI interface of the file transfer component.

Figure 5: The GUI interface to the reliable file transfer component.

3.6 Important Features of FTC

3.6.1 Authentication

GridFTP includes extensions to use the Grid Security Infrastructure (GSI) [6] for authentication. GSI provides a number of useful services for Grids, including mutual authentication, single sign-on, and protection of critical and sensitive data through strong encryption.

Authentication in FTC is performed through the visual-proxy-init program distributed with the Java CoG Kit. Authentication is necessary to access all GSI-enabled services such as GridFTP and RFT. FTC provides the ability to start up the proxy, check the proxy information, or destroy the proxy.

3.6.2 Connecting to different servers

Users have the flexibility to log in to any number of FTP or GridFTP servers. Files are displayed in a tree structure in separate internal frames. The user can perform basic file system operations on any of the files, as well as transfer files by dragging and dropping them between the windows.

3.6.3 Using RFT

RFT can be used to achieve reliable file transfers between GridFTP servers. To use this service, the RFT server needs to be installed [5]. The settings of the RFT server, such as its name and port, have to be set in the graphical interface before the RFT service is used. Requests for file transfers appear in the Request queue and can be submitted to the RFT server by clicking the Submit button. The Status queue enables the user to explicitly control the file transfer at the RFT server. The RFT service also lets the user view the performance of a transfer over time as a two-dimensional line plot using NetLogger [10].

3.7 Deployment

We are currently also distributing the component through the Java Web Start technology, which provides a mechanism similar to the one with which browser plug-ins such as Acrobat Reader or RealPlayer are distributed. Automatic updates are an additional useful feature. More information about the use of Web Start within Grids can be found in [12].

3.8 Java Client API to GridFTP

The Java client API has recently been developed as part of the Java CoG Kit to provide an interface to the FTP and GridFTP protocols. This library will be integrated in upcoming releases of the Globus Toolkit version 3. It implements the following features: file storage and retrieval to and from an FTP server (client-server transfer), third-party transfer, ASCII and IMAGE data types, the file data structure, non-print format control, stream transmission mode, operation in passive and active server mode, parallel transfers, striped transfers, restart markers, and performance markers. More information about this API can be found on the Java CoG Kit web pages.

4 Extensions

Application developers can benefit from the fact that FTC provides an easy interface for supporting new protocols such as WebDAV. The task of integration is simplified, as the developer only has to implement a single interface to incorporate a protocol into FTC.

Since the FTC components are designed as JavaBeans, it is easy to integrate them into a visual development environment such as JBuilder. For instance, we can develop applications such as a file explorer using the tree frame bean and an editor bean to display the contents of a file.

The components we defined are general enough to be reused effectively for other similar applications. For instance, we could have a list of jobs in one window and a list of machines displayed in another window. Dragging and dropping the jobs on the machines generates job requests. These requests could be delegated to the Job Manager to monitor job execution.

5 Conclusion

The work performed in this project brings a user-friendly environment for interactive and transparent file access to Grid users. Such an interactive model is required for computationally mediated science experiments that require a large degree of interaction with the system during the experimentation. It effectively supports the integration of new protocols that support file transfers. The bean components designed for FTC are reusable in an Integrated Development Environment such as JBuilder. Because we use interfaces throughout the project, we can in the future replace already implemented components with more sophisticated ones. This will also allow us to use the same component to build a uniform interface between Globus Toolkit version 2 and Globus Toolkit version 3 (OGSA) servers. This project is a continuation of the advanced services that we provide in the Java CoG Kit. This component will be distributed as part of the Java CoG Kit.

6 Acknowledgements

This work is supported by the Mathematical, Information, and Computational Science Division subprogram of the Office of Advanced Scientific Computing Research, U.S. Department of Energy, under Contract W-31-109-Eng-38. DARPA, DOE, and NSF support Globus Toolkit research and development.

We thank Dr. Xian-He Sun and Vijay Goup for the discussions during the development of the initial prototype. This work would not have been possible without the help of the Globus team. We thank all the members of the Globus Project for their valuable help. Globus Toolkit and Globus Project are trademarks held by the University of Chicago.

References

[1] The GridFTP protocol and software. http://www.globus.org/datagrid/gridftp.html.

[2] JavaBeans. http://java.sun.com/products/javabeans/.

[3] Secure FTP. http://www.glub.com/products/secureftp/.

[4] Bill Allcock and Ravikiran Madduri. Reliable file transfer service. http://www-unix.mcs.anl.gov/ madduri/RFT.html.

[5] Bill Allcock and Ravikiran Madduri. Reliable file transfer service installation instructions. http://www-unix.mcs.anl.gov/ madduri/HowTo.html.

[6] R. Butler, D. Engert, I. Foster, C. Kesselman, S. Tuecke, J. Volmer, and V. Welch. A national-scale authentication infrastructure. IEEE Computer, 33(12):60–66, 2000.

[7] Jan Debis. Leech ftp. http://www.iweb.net.au/Downloads/leech.html.

[8] I. Foster, C. Kesselman, and S. Tuecke. The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of Supercomputer Applications, 15(3), 2001.

[9] Y. Goland, E. Whitehead, UC Irvine, A. Faizi, S. Carter, and D. Jensen. Http extensions for distributed authoring – webdav. http://asg.web.cmu.edu/rfc/rfc2518.html, February 1999.

[10] D. Gunter, B. Tierney, B. Crowley, M. Holding, and J. Lee. NetLogger: A toolkit for distributed system performance analysis. In Proceedings of the IEEE Mascots 2000 Conference (Mascots 2000), number LBNL-46269, August 2000.

[11] J. Postel and J. Reynolds. File transfer protocol. http://www.w3.org/Protocols/rfc959/Overview.html, October 1985.

[12] Gregor von Laszewski, Eric Blau, Michael Bletzinger, Jarek Gawor, Peter Lane, Stuart Martin, and Michael Russell. Software, Component, and Service Deployment in Computational Grids. In Judith Bishop, editor, IFIP/ACM Working Conference on Component Deployment, volume 2370 of Lecture Notes in Computer Science, pages 244–256, Berlin, Germany, 20-21 June 2002. Springer.

[13] Gregor von Laszewski, Ian Foster, Jarek Gawor, Peter Lane, Nell Rehn, and Mike Russell. Designing Grid-based Problem Solving Environments and Portals. In 34th Hawaiian International Conference on
