Evaluating parallel file system security

Academic year: 2021


1. Motivation

After successful Internet attacks on HPC centers worldwide, there has been a paradigm shift in cluster security strategies. Clusters are no longer thought of as just a collection of individual computers but rather as an integrated single unit in which any breach may result in a "class break" compromise of the entire cluster. These words come from http://www.projects.ncassr.org/cluster-sec/ccgrid06/, a community that reviews the needs of cluster environments. A parallel file-system is part of any such cluster environment, so providing security at the file-system level becomes essential. This project aims at providing a variety of security interfaces to the file-system, to suit the varying performance budgets of cluster environments.

2. Introduction

A parallel file-system is one in which data is striped across many storage nodes connected by a high-speed network. Many such parallel file-systems exist, e.g. GPFS, Lustre, and the Parallel Virtual File System (PVFS) [pvfs_home]. PVFS is the most popular open-source parallel file-system in both the research and cluster user communities. This project therefore aims at adding security interfaces to the existing PVFS file-system.
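As a rough illustration of what "striped across many storage nodes" means, the sketch below maps a logical file offset to an io server under simple round-robin striping. The 64 KiB strip size and the helper names are assumptions made for this example and are not taken from the PVFS source; the server names are the io servers used later in this report.

```python
# Hypothetical round-robin striping model; not PVFS code.
STRIP_SIZE = 64 * 1024  # bytes per strip (assumed value)
SERVERS = ["penguin17", "penguin18", "penguin19", "penguin20", "penguin21"]


def locate(offset, strip_size=STRIP_SIZE, servers=SERVERS):
    """Map a logical file offset to (server, offset inside that server's local object)."""
    strip_index = offset // strip_size
    server = servers[strip_index % len(servers)]
    local_strip = strip_index // len(servers)
    local_offset = local_strip * strip_size + offset % strip_size
    return server, local_offset


def split_write(offset, length, strip_size=STRIP_SIZE, servers=SERVERS):
    """Split one contiguous write into per-server (server, local_offset, nbytes) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        server, local_offset = locate(offset, strip_size, servers)
        # Never cross a strip boundary within a single piece.
        nbytes = min(strip_size - offset % strip_size, end - offset)
        pieces.append((server, local_offset, nbytes))
        offset += nbytes
    return pieces
```

A single large write therefore touches every io server in turn, which is why the security cost of each client-to-server path matters for the aggregate bandwidth.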

The security interfaces implemented in this project mostly resemble those provided for a networked file-system. They include,

(1) Providing a secure communication path between clients/io servers/meta-data servers

a. Per node basis

b. Per file basis

(2) Providing encrypted storage on the IO servers

a. Per node basis

b. Per file basis

(3) Providing encrypted storage of all the meta-data information

There are a few exceptions in this project’s security policies though, from the rule that “meta-data is more valuable than the data itself”.

The existing security tools and libraries used to incorporate the above security interfaces into the file-system include,

(1) The setkey application, to enable IPSec between two nodes in the network

(2) The OpenSSL library, to establish secure socket connections between nodes at TCP connection granularity


3. Installation and setup of the PVFS on Penguin machines

The initial setup of the machines is shown in Figure1.

Figure1: Penguin15 is the client, Penguin17, 18, 19, 20, 21 are the io servers, and Penguin16 is the metadata server.

The reason for choosing such a simple configuration initially was to check the maximum performance overhead of adding security modules.

The above setup is currently configured for both the VFS interface and the PVFS library interface. The benefit of using the VFS interface is that existing Linux application binaries can be run unmodified on top of the PVFS file-system.

The file-system setup procedure, in simple steps, is the following,

(1) Run the script /usr/bin/pvfs2-genconfig from penguin15. This creates the configuration files, which record the <ip address, server’s role> of each node.

(2) Start the penguin16 server using
/usr/sbin/pvfs2-server /etc/pvfs2-fs.conf /etc/pvfs2-server.conf-penguin16 -f

(3) Similarly start the other file-system servers at penguin17, 18, 19, 20, 21

(4) Then, include an fstab entry on penguin15 (client) as follows,
tcp://penguin16:3334/pvfs2-fs /mnt/pvfs2 pvfs2 defaults,noauto 0 0

(5) Mount the file-system at penguin15 using
mount -t pvfs2 tcp://penguin16:3334/pvfs2-fs /mnt/pvfs2

Using the VFS interface involves one additional step: loading the kernel module. The kernel module was loaded on the penguin15 (client) machine and tested successfully.


With the module loaded, standard calls such as “fopen”, “fread” and “fwrite” can be used on files in a parallel file-system directory.

A large-write application is currently used to test that the security interfaces being added work correctly.
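The report does not list the write application itself. A minimal sketch of such a test, assuming the file-system is mounted at /mnt/pvfs2 as in the setup steps, could time ordinary buffered writes of increasing size. The function names, the 1 MiB block size and the file name write_test.dat are hypothetical.

```python
import os
import time


def timed_write(path, total_bytes, block_size=1024 * 1024):
    """Write total_bytes of zeros to path in block_size chunks; return elapsed seconds."""
    block = bytes(block_size)
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            chunk = block[: min(block_size, total_bytes - written)]
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data out before stopping the clock
    return time.perf_counter() - start


def sweep(directory, sizes_mb):
    """Time one write per size; returns a list of (size_mb, seconds)."""
    path = os.path.join(directory, "write_test.dat")  # hypothetical file name
    results = []
    for size_mb in sizes_mb:
        results.append((size_mb, timed_write(path, size_mb * 1024 * 1024)))
    if os.path.exists(path):
        os.remove(path)
    return results
```

Calling sweep("/mnt/pvfs2", [100, 200, 400]) once with the security module enabled and once without would give two completion-time curves of the kind plotted later in Figure3.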

4. Source code of PVFS relevant to the work

PVFS, written by a group from Clemson University and ANL, is organized as follows,

(1) Client side code with the system interfaces to

a. Look up the handle of a file from the records in the meta server

b. Read from the io nodes with the handle & distribution parameters obtained from the meta servers

c. Set an extended attribute on any file in the file-system

d. Maintain an attribute cache of recently looked up file handles

e. Other functions such as renaming a file, getting the statistics of a file etc.

(2) BMI layer code

a. Open a tcp/ip socket connection

b. Send and receive data between clients, meta servers, io servers via the connection

(3) TROVE layer

a. This layer looks after the data storage on the local file-system and the database

b. The Trove layer at the io server takes care of storing data in the local file-system

c. The Trove layer at the metadata server takes care of storing data in the Berkeley DB provided at the metadata server

(4) Meta server side code looks after,

a. Calling the trove layer interface with the key for the database. The key is the file name provided by the client. The lookup returns the record <handle, distribution> only if the credentials provided by the client are accepted.

b. Similarly, creating an entry in the database with <filename, handle, credentials, distribution> whenever a new file is created

(5) IO server side code looks after,

a. Reading (or writing) data from (or to) the local file-system at the IO server. This involves bridging the gap between the bmi layer and the trove layer
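The metadata server behaviour in item (4) can be pictured with a toy model: a table keyed by file name that returns <handle, distribution> only when the caller's credentials are accepted. The class below is a sketch for illustration, not PVFS code. The in-memory dict stands in for the Berkeley DB, and the acceptance rule (matching owner uid or gid) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Record:
    handle: int
    owner_uid: int
    owner_gid: int
    distribution: dict


class MetaServer:
    """Toy stand-in for the metadata server's database."""

    def __init__(self):
        self._db = {}          # filename -> Record
        self._next_handle = 1

    def create(self, filename, uid, gid, distribution):
        """Add <filename, handle, credentials, distribution>; return the new handle."""
        if filename in self._db:
            raise FileExistsError(filename)
        record = Record(self._next_handle, uid, gid, distribution)
        self._db[filename] = record
        self._next_handle += 1
        return record.handle

    def lookup(self, filename, uid, gid):
        """Return (handle, distribution) if the credentials are accepted."""
        record = self._db[filename]  # raises KeyError for an unknown file
        # Assumed policy: the caller must match the owner's uid or gid.
        if uid != record.owner_uid and gid != record.owner_gid:
            raise PermissionError(filename)
        return record.handle, record.distribution
```

The client-side lookup interface in item (1)a would receive the pair returned by lookup() and then use it to address the io servers directly.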


Figure2: The source code tree used/modified for this project

The use of these individual files (for instance create.c, geteattr.c, sys-io.c, lookup.c) will be discussed in detail in the later text.

5. Enabling IPSec between the Nodes

This security measure enables the communication path between two nodes to be encrypted. Thus, anyone eavesdropping on the network would see only encrypted, unintelligible messages.

Advantage: This security measure just requires a key to be exchanged between the two communicating end nodes.

Disadvantage: The data that is stored in the server nodes is still un-encrypted. Thus, if the server is compromised, this scheme doesn’t help much.

IPSec Setup: IPSec involves per-packet authentication as well as (optional) encryption. The various authentication options include,

(1) HMAC MD5

(2) HMAC SHA1

The encryption options include,

(1) AES (128, 192 bit key)

(2) BLOWFISH (192 bit key)

(3) 3DES

An example IPSec script file, to be used by the setkey application at penguin15 (client) to enable IPSec communication with penguin18 (io node) using HMAC MD5, is given by,

At Penguin15

#!/sbin/setkey -f
add 130.203.36.80 130.203.36.82 ah 10014 -A hmac-md5 "1234567890123456";
add 130.203.36.82 130.203.36.80 ah 10015 -A hmac-md5 "1234567890123456";
spdadd 130.203.36.82 130.203.36.80 any -P in ipsec ah/transport//require;
spdadd 130.203.36.80 130.203.36.82 any -P out ipsec ah/transport//require;

A corresponding script, with the in and out policy directions swapped, has to be run by the setkey application at penguin18.

The second and third lines, starting with add, are entries added to the kernel's security association database (SAD). The fourth and fifth lines, starting with spdadd, are entries added to the kernel's security policy database (SPD). The string “1234567890123456” is the key used by the HMAC MD5 algorithm. The security association database is looked up at the ingress and egress of each packet.

To get a first indication of the performance impact of IPSec between the client (penguin15) and one such server (penguin18), a response time vs. write size graph was drawn with one client (penguin15), one meta-data server (penguin16) and 5 io servers (penguin17 to 21).
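To make the per-packet cost concrete, the sketch below computes and checks an HMAC MD5 value over a payload using the same 16-byte key as the setkey script. It is a simplification: the real AH computation is done by the kernel and also covers parts of the IP header and the AH header itself. The truncation to 96 bits follows the HMAC-MD5-96 transform used with AH; the function names are made up for this example.

```python
import hashlib
import hmac

KEY = b"1234567890123456"  # the 16-byte key from the setkey script
ICV_LEN = 12               # HMAC-MD5-96: only the first 96 bits are carried in the packet


def sign(payload, key=KEY):
    """Return the truncated HMAC MD5 integrity check value for a payload."""
    return hmac.new(key, payload, hashlib.md5).digest()[:ICV_LEN]


def verify(payload, icv, key=KEY):
    """Recompute the check value and compare it in constant time."""
    return hmac.compare_digest(sign(payload, key), icv)
```

The sender runs the equivalent of sign() on every outgoing packet and the receiver runs verify() on every incoming one, so the hashing cost grows with the amount of data written.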

Figure3: A simple experiment where an application wrote data (divide X-axis by 100 MB) to the io servers with and without HMAC MD5 IPSec communication between clients and servers. This is a worst case experiment for evaluating the effect of adding IPSec security, because there is only one client and thus the aggregate network bandwidth is not completely utilized. The bandwidth difference caused by adding security is therefore felt more.

An ideal experiment

My expectations of how this graph would look with multiple clients and servers communicating are as follows (I am yet to start the performance analysis).

(1) The overhead due to the security modules would be minimal only when there are multiple clients. If the aggregate bandwidth can be made to reach the maximum network bandwidth, it means that processor bandwidth is no longer a bottleneck.

(2) The above point means that the network no longer waits for packets.
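This expectation can be expressed as a very small model: each client is limited by its own processor (which security slows down), and all clients together are limited by the network. The numbers in the usage note are invented purely for illustration and are not measurements from this project.

```python
def aggregate_throughput(n_clients, per_client_mb_s, network_mb_s):
    """Aggregate rate is capped by whichever saturates first: clients or network."""
    return min(n_clients * per_client_mb_s, network_mb_s)


def overhead_percent(n_clients, plain_mb_s, secure_mb_s, network_mb_s):
    """Relative throughput loss caused by the security module, in percent."""
    plain = aggregate_throughput(n_clients, plain_mb_s, network_mb_s)
    secure = aggregate_throughput(n_clients, secure_mb_s, network_mb_s)
    return 100.0 * (plain - secure) / plain
```

With hypothetical rates of 60 MB/s per client without security, 40 MB/s with security, and a 100 MB/s network, overhead_percent gives about 33% for one client, 20% for two clients and 0% for three clients, at which point the network rather than the processor is the bottleneck.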

It would be really interesting to evaluate the secure file-system using scientific benchmarks such as MPI-tile and BTIO, which utilize the aggregate network bandwidth to the full extent.

[Figure3 plot: "Write only application (1C, 1MS, 5IOS)". X-axis: write size (/100) MB. Y-axis: completion time in secs. Series: No integrity, HMAC-md5.]

6. Enabling secure channel on a per file basis in the parallel file system

Advantage: This provides differential service to different files. Thus, unnecessary performance overheads can be avoided.

Disadvantage: The metadata server has to maintain extra state about the level of security required by each file in the file-system.

The steps involved in the implementation are the following,

(1) Use the set extended attribute interface to set a security attribute for each file. This can be done using PVFS_sys_seteattr( ), which takes the file handle and the uid and gid of the client process as input parameters. The uid/gid is checked to see whether the client is permitted to change the security attribute. The uid/gid check already exists in PVFS.

(2) On a lookup of such a file, the security attribute is obtained from the meta-data server along with the handle. Then on subsequent reads (or writes), PVFS_sys_io( ) would inspect the security attribute and pass the appropriate parameter to the SSL socket. The definition of the function PVFS_sys_io( ) in sys-io.c has to be changed.

(3) The appropriate parameter (say, HMAC MD5 with ESP AES 128) is passed on to the SSL interface provided by the bmi layer.

(4) For the above to be done, the sockets in the bmi layer provided by PVFS have to be changed to secure sockets using the OpenSSL library. Thus, the source code bmi.c at the client needs to be changed.
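The control flow of these steps can be sketched independently of PVFS: an owner-only attribute update, followed by a connection routine that wraps the socket in TLS only when the file asks for it. This is an illustration using Python's standard ssl module rather than the planned C changes to bmi.c. The FileMeta class, the "channel_" prefix convention (modelled on the “channel_HMAC md5” value in the Figure4 caption) and the owner-only rule are assumptions, and the sketch does not select a particular cipher suite.

```python
import socket
import ssl
from dataclasses import dataclass
from typing import Optional

SECURE_PREFIX = "channel_"  # assumed naming convention for channel security attributes


@dataclass
class FileMeta:
    handle: int
    owner_uid: int
    security: Optional[str] = None  # e.g. "channel_HMAC md5"


def set_security_attr(meta, uid, value):
    """Step (1): only the owner may change the security attribute (assumed rule)."""
    if uid != meta.owner_uid:
        raise PermissionError("only the owner may change the security attribute")
    meta.security = value


def needs_secure_channel(meta):
    """Step (2): inspect the attribute returned with the handle at lookup time."""
    return bool(meta.security) and meta.security.startswith(SECURE_PREFIX)


def connect_to_io_server(host, port, meta):
    """Steps (3)-(4): open a plain socket, and wrap it in TLS only if the file requires it."""
    sock = socket.create_connection((host, port))
    if needs_secure_channel(meta):
        context = ssl.create_default_context()
        return context.wrap_socket(sock, server_hostname=host)
    return sock
```

Files without the attribute keep using the plain socket path, which is where the performance saving of the per-file scheme comes from.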

Figure4: The figure depicts how the io path between a client (penguin15) and an io server can be secured if the security attribute in the meta-data is turned on. Penguin14 (client) creates the file named “file1” and hence is its owner. Therefore, it can change the security attribute of the file to “channel_HMAC md5”. Later, when Penguin15 (client) looks up the metadata server to get the handle for “file1”, Penguin15 learns the security level of “file1”. Hence, the secure socket layer is used for communication with Penguin17, 18, 19, 20, 21 (only penguin17 is shown in the figure).

7. Enabling Encrypted storage on the io server on a per file basis

Advantage: This provides differential levels of security to each file. Also, the key exchange for each connection (session as in sockets) is avoided. Instead a key is fixed for every file over a period of time.

Disadvantages: Key revocation becomes a challenge. Also, the meta-data server becomes a “hot” location because it has the key for each file!

The steps involved in implementing the security scheme are,

(1) Use the set extended attribute interface to set a security attribute for each file. This can be done using PVFS_sys_seteattr( ), which takes the file handle and the uid and gid of the client process as input parameters. The uid/gid is checked to see whether the client is permitted to change the security attribute. The uid/gid check already exists in PVFS.

(2) On a lookup of such a file, the security attribute is obtained from the meta-data server along with the handle. Then on subsequent reads, PVFS_sys_io( ) would inspect the security attribute and use the cipher library provided by GNU Crypto 2.0.0. The definition of the function PVFS_sys_io( ) in sys-io.c has to be changed.

(3) The appropriate parameter (say, AES 128 bit encryption and the key for the file) is passed on to the decryption library function.

(4) Similarly, for writing data to a file, the key corresponding to the file is obtained from the meta-data attribute. Then, the file is encrypted at the client and sent to the io servers.
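The data flow of this scheme (key held with the meta-data, ciphertext held on the io servers, encryption and decryption done at the client) is sketched below. Python's standard library has no AES, so the cipher here is a toy SHA-256 counter-mode keystream used only as a stand-in for AES 128; it must not be used to protect real data. The dictionaries and function names are hypothetical.

```python
import hashlib
import os

BLOCK = 32  # bytes of keystream produced per SHA-256 call


def keystream_xor(key, data, offset=0):
    """XOR data with a keystream positioned at a byte offset (toy stand-in for AES)."""
    out = bytearray(len(data))
    pos = 0
    while pos < len(data):
        absolute = offset + pos
        counter = absolute // BLOCK
        pad = hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        start = absolute % BLOCK
        n = min(BLOCK - start, len(data) - pos)
        for i in range(n):
            out[pos + i] = data[pos + i] ^ pad[start + i]
        pos += n
    return bytes(out)


metadata_keys = {}  # file name -> per-file key, as kept by the metadata server
io_store = {}       # file name -> ciphertext, as kept by the io servers


def write_file(name, plaintext):
    """Client side: fetch (or create) the file key, encrypt, then ship ciphertext."""
    key = metadata_keys.setdefault(name, os.urandom(16))
    io_store[name] = keystream_xor(key, plaintext)


def read_file(name, offset=0, length=None):
    """Client side: fetch the key and ciphertext range, then decrypt locally."""
    key = metadata_keys[name]
    ciphertext = io_store[name]
    end = len(ciphertext) if length is None else offset + length
    return keystream_xor(key, ciphertext[offset:end], offset)
```

Because the keystream is addressed by byte offset, a client can decrypt any range of the file without reading the rest, which matters when the data is spread over several io servers. The io servers never see the key, but every reader must obtain it from the metadata server, which is the “hot” location noted in the disadvantages.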

Figure5: The figure depicts how the encrypted storage security attribute is turned on. Penguin14 (client) creates the file named “file1” and hence is its owner. Therefore, it can change the security attribute of the file to “channel_HMAC md5”. Later, when Penguin15 (client) looks up the metadata server to get the handle for “file1”, Penguin15 also learns the security level of “file1” and the AES key. Hence, while reading from penguin17, 18, 19, 20, 21, which are the io nodes, Penguin15 performs AES 128 decryption.

8. Current status

Currently, I am working on using the user-level crypto libraries from [crypto_library], which provide a comprehensive set of functions for most encryption and decryption tasks. I am also working on changing the bmi layer of PVFS to use the OpenSSL library. So far I have not found a neat way of using IPSec on a per-file basis, because the port number binding pattern seems to differ considerably across the various system interfaces PVFS offers.

9. References

[pvfs_home] http://www.pvfs.org/

[crypto_library] http://www.gnu.org/software/gnu-crypto/

[man_pages] Linux man pages on openssl, setkey
