
2.2 Data Management Systems

2.2.1 File Systems

The operating system (OS) is a software program that manages the computer hardware as well as the software resources for user applications. Most operating systems provide an overlapping set of services: process management and coordination, memory management, and IO and storage management. The file system is the subsystem of the operating system that provides data management over secondary storage.28

2.2.1.1 Files and Directories

Files and directories are the two most important file system user-level abstractions, or metaphors.

Traditionally, the file is defined as “a collection of related information defined by its creator [...]” and “files represent programs (both source and object forms) and data” [31, p. 26]. Alternatively, the definition of a file corresponds to the definition of data, so that a file is simply a sequence of bits stored on some storage medium. This second definition, however, can be misleading, as it conflates the concept of a file with that of data, and it includes neither the metadata related to the file nor the operations that can be carried out on it.

The user-level metaphor of a file consists of data, attributes, and operations. The attributes are the metadata used by the OS to describe the file and to access the actual data. The two main attributes of a file are the name and the identifier. The name is a human-readable string, also known as the user-level textual name, which is of use only to users and/or applications. The identifier, instead, identifies the file uniquely within the file system and is also known as the system-level identifier. Section 2.1.2 discusses in more detail the role of naming in storage systems, in particular the relationship between naming and location.

The other attributes of a file are, but are not limited to: type, location29, size, protection, and statistical information (e.g., time of last access or time of last modification). Attributes such as the location, size and protection are necessary for the operating system to actually access the data and to perform the various file operations. The most important file operations are: file creation, writing data to and reading data from a file, repositioning within a file (seek), file deletion, and file truncation.

(a) Terminal (iTerm2) file explorer via ls.

(b) MacOS (High Sierra, v. 10.13.3) Finder in columns view.

(c) [...] (v. 7.4 with GNOME 3.22.2) in icons view.

(d) Windows 10 file explorer in icons view.

Figure 2.3: Example of some typical file explorers for different operating systems.

28 Secondary storage is non-volatile memory, not directly accessible by the CPU, which usually stores large amounts of data compared to primary storage, at the cost of slower IO.

29 The location attribute contains information about the storage device where the file is stored and the location within that device.
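The file operations listed in this section map closely onto POSIX system calls. The following is a minimal sketch, assuming a POSIX system and a writable scratch path chosen by the caller; the function name file_lifecycle is hypothetical and introduced only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: runs create, write, seek, read, truncate and delete
 * on `path`. Returns the file size after truncation, or -1 on any error. */
long file_lifecycle(const char *path)
{
    char buf[6] = {0};
    struct stat st;

    /* Creation: O_CREAT allocates the file, 0600 sets its protection bits. */
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    /* Writing: 11 bytes are written at the current offset (0). */
    if (write(fd, "hello world", 11) != 11)
        goto fail;

    /* Seek and read: move the offset to byte 6, then read "world". */
    if (lseek(fd, 6, SEEK_SET) != 6 || read(fd, buf, 5) != 5)
        goto fail;
    assert(strcmp(buf, "world") == 0);

    /* Truncation: keep only the first 5 bytes; the size attribute changes. */
    if (ftruncate(fd, 5) != 0 || fstat(fd, &st) != 0)
        goto fail;

    close(fd);

    /* Deletion: remove the name from its directory. */
    if (unlink(path) != 0)
        return -1;
    return (long)st.st_size;

fail:
    close(fd);
    unlink(path);
    return -1;
}
```

Calling file_lifecycle("/tmp/demo.txt") on a typical Unix system should return 5, the size left after truncation, and leave no file behind.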

The other user-level abstraction, common to many file systems, is the directory or folder, which is used to impose a logical organisational structure on files and other directories in a file system. A Unix-like directory is simply a file containing a table of symbols that translates human-readable file names to the corresponding file identifiers or other directory identifiers. Figure 2.3 shows some examples of files and directories as seen by users via file managers (or file browsers) or a terminal window.
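The name-to-identifier table held by a directory can be observed from user space. The sketch below, which assumes a POSIX system, scans a directory with opendir() and readdir() and returns the identifier recorded for a given name; the function name lookup_in_directory is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: looks up `name` in the directory at `dir_path`.
 * On success stores the entry's identifier in *ino and returns 0;
 * returns -1 if the directory cannot be opened or has no such entry. */
int lookup_in_directory(const char *dir_path, const char *name, ino_t *ino)
{
    assert(dir_path != NULL && name != NULL && ino != NULL);

    DIR *dir = opendir(dir_path);
    if (dir == NULL)
        return -1;

    int found = -1;
    struct dirent *entry;
    /* Each entry pairs a human-readable name with a file identifier. */
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, name) == 0) {
            *ino = entry->d_ino;
            found = 0;
            break;
        }
    }
    closedir(dir);
    return found;
}
```

The kernel performs a lookup of this kind for every component of a path, which is how a textual name is eventually translated into a system-level identifier.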


2.2.1.2 File System Types

File systems can be classified in different ways. From a user perspective, file systems can be classified by how files and directories are organised. From an implementation perspective, they can be classified by how the file system is actually implemented.

The most common organisation for files and directories, from a user perspective, is the hierarchical one (see Figure 2.4). In a hierarchical file system, entities are linked through a path.

How Files are Organised

• Flat. In this type of file system all files reside at the same ‘level’ and no concept of directory exists. All files have to be named uniquely. The implementation of a flat file system is simple and it can be useful within embedded systems.

• Hierarchical. In this type of file system, files are organised over multiple ‘levels’ through the use of directories. The hierarchical organisation of files and directories often resembles a tree30, but this is not always the case. For example, any file system implementing the VFS (Virtual File System) can, and should, support symbolic links, which can break the acyclic property of trees. Figure 2.4 shows an example of a hierarchical file system. This is the most common type of file system.

Figure 2.4: Example of a hierarchical file system structure (the first level of the tree contains bin, dev, etc, ..., tmp and usr).

How Files and Directories are Implemented

• Block-based. Data is stored in blocks which are grouped together by meta-structures, like the inode in Unix systems. Block-based file systems are the focus of the rest of this section.

• Database-based. A database is used to store files, their attributes, and how they are organised. A database-based file system allows richer queries than traditional file systems. WinFS [76] and the Oracle Database File System (DBFS)31 are two examples of database-based file systems.

• Log-Structured. Data and its attributes are written sequentially to a log, which is implemented as a circular buffer. Log-structured file systems provide good write performance on sequential access storage, have easy-to-implement snapshotting32, and support easy recovery from failures.

• Object-based. The file system uses an object storage device or an object storage service to store its data and metadata. Section 2.2.6 discusses general object storage in more detail.

2.2.1.3 The Virtual File System

Unix systems provide a common abstraction for the supported file systems via the virtual file system layer (VFS) [77, 78]. The VFS provides two main functions:

• It abstracts different file system implementations by providing a common coherent interface for the OS.

• It allows files to be uniquely identified across the file system implementations (this feature is particularly important for networked file systems).

31 Introducing the Database File System. Oracle. https://docs.oracle.com/database/121/ADLOB/adlob_fs.htm [last accessed on 28/11/2017].

Figure 2.5: Schematic showing the interaction between the virtual file system and the actual file systems. The file-system interface sits on top of the VFS interface, which dispatches to a local file system of type A (on disk), a local file system of type B (on disk), and a remote file system of type A (over the network). Illustration derived from [31, p. 469].

The diagram shown in Figure 2.5 illustrates how the file-system interface (which enables file operations via system calls such as open(), close(), read(), and write()) interacts with a local file system or a remote file system via the VFS. From the perspective of the file-system interface there is no difference between a local and a remote file system, or between two completely different local file system implementations.
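The dispatching role of the VFS can be illustrated with a table of function pointers. The sketch below is a toy model and not the Linux VFS API: struct fs_ops, mount_table, vfs_read and the two back ends are hypothetical names, and the "file contents" are fixed strings standing in for real disk or network IO.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operations table that every file system type must provide. */
struct fs_ops {
    const char *type;
    /* Copies up to `len` bytes into `buf`; returns the number of bytes. */
    long (*read)(const char *path, char *buf, size_t len);
};

static long copy_out(const char *src, char *buf, size_t len)
{
    assert(buf != NULL);
    size_t n = strlen(src);
    if (n > len)
        n = len;
    memcpy(buf, src, n);
    return (long)n;
}

/* Two stand-in implementations: one "local", one "remote". */
static long local_read(const char *path, char *buf, size_t len)
{
    (void)path;
    return copy_out("from local disk", buf, len);
}

static long remote_read(const char *path, char *buf, size_t len)
{
    (void)path;
    return copy_out("from network", buf, len);
}

static const struct fs_ops local_fs = { "local type A", local_read };
static const struct fs_ops remote_fs = { "remote type A", remote_read };

/* Mount table: the most specific prefix is listed first. */
struct mount {
    const char *prefix;
    const struct fs_ops *ops;
};

static const struct mount mount_table[] = {
    { "/mnt/remote/", &remote_fs },
    { "/", &local_fs },
};

/* The single entry point seen by callers, whatever the back end is. */
long vfs_read(const char *path, char *buf, size_t len)
{
    for (size_t i = 0; i < sizeof mount_table / sizeof mount_table[0]; i++) {
        const struct mount *m = &mount_table[i];
        if (strncmp(path, m->prefix, strlen(m->prefix)) == 0)
            return m->ops->read(path, buf, len);
    }
    return -1; /* no file system is mounted for this path */
}
```

The caller of vfs_read() never names a file system type: the choice is made by the mount table, which mirrors the point that local and remote file systems look identical from the file-system interface.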

The Windows operating systems abstract multiple file system implementations via the Installable File System (IFS). This thesis does not cover the implementation details of the IFS, but it should be noted that the IFS provides similar functionalities to the Unix VFS.

2.2.1.4 The File-Control Block

The file-control block (FCB) is the data structure of the file system that contains information about the file [31]. A typical FCB contains the following information:

• File unique identifier.

• File permissions.

• Created time, last accessed time and last modified time.

• Owner, group and access control list (ACL).

• File size.

• Data blocks and/or pointers to data blocks.

In Unix operating systems the FCB is called the inode. The inode is the building block of file systems in the Unix family. Each inode is uniquely identified by its id (i_ino) within the file system, and there is exactly one inode for each file of the file system. Code 2.3 shows a simplified inode data structure (see Figure 2.6 for a graphical representation of an inode).33

    struct inode {
        umode_t i_mode;                        // file type and permissions
        unsigned long i_ino;                   // identifies the inode
        kuid_t i_uid;                          // user id
        kgid_t i_gid;                          // group id
        loff_t i_size;                         // file size in bytes
        struct timespec i_atime;               // time of last access
        struct timespec i_mtime;               // time of last modification
        struct timespec i_ctime;               // time of last inode change
        const struct inode_operations *i_op;   // operations on the inode
        blkcnt_t i_blocks;                     // number of blocks in use
    };

Code 2.3: Simplified inode struct. An example of a full inode struct can be found in the header at the path include/linux/fs.h of the Linux kernel.
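Most of the information held in an inode is exposed to applications through the POSIX stat() call. The following sketch, assuming a POSIX system, prints the user-visible counterparts of the fields in Code 2.3; the function name print_inode_attributes is hypothetical, and the mapping to kernel field names in the comments is indicative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: prints the inode-backed attributes of `path`.
 * Returns 0 on success, -1 if the file cannot be examined. */
int print_inode_attributes(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;

    /* Casts keep the format strings portable across type widths. */
    printf("inode id   : %llu\n", (unsigned long long)st.st_ino);  /* i_ino */
    printf("mode       : %o\n", (unsigned)st.st_mode);             /* i_mode */
    printf("uid / gid  : %u / %u\n",
           (unsigned)st.st_uid, (unsigned)st.st_gid);       /* i_uid, i_gid */
    printf("size       : %lld bytes\n", (long long)st.st_size);    /* i_size */
    printf("blocks     : %lld\n", (long long)st.st_blocks);      /* i_blocks */
    printf("hard links : %lu\n", (unsigned long)st.st_nlink);
    printf("accessed   : %lld\n", (long long)st.st_atime);        /* i_atime */
    printf("modified   : %lld\n", (long long)st.st_mtime);        /* i_mtime */
    printf("changed    : %lld\n", (long long)st.st_ctime);        /* i_ctime */
    return 0;
}
```

The three timestamps are printed as seconds since the epoch. The file name does not appear in the output because it is stored in the directory entry and not in the inode.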

2.2.1.5 Symbolic and Hard Links

In most modern file systems, the acyclic property of a hierarchical file system is broken by links. Links are “shortcuts” that give the user (or an application) the impression that a file exists in multiple locations. There are two types of links: hard links and symbolic links (sometimes referred to as symlinks or soft links). The term hard link refers to the association between a file name in a directory and an inode. In a file system each file has at least one hard link, but multiple, independent hard links are also allowed. Hard links are supported by Unix-like OSs as well as by the Windows NTFS [79], but not by the Windows file systems FAT [80] or ReFS [81]. A symbolic link, instead, is a name pointing to an inode of its own, whose content is the path of another file rather than that file's data.

Figure 2.6: Schematic of the inode data structure. The inode holds the fields mode, node id, uid, gid, size, access date, modify date, create date, block count and inode ops, followed by the data block pointers: twelve pointers at indirection level 0, and one pointer each at indirection levels 1, 2 and 3, which reach the data blocks through one, two and three indirection blocks respectively.

33 The source code for the Linux kernel is available at the following link: https://github.com/torvalds/linux [last accessed on 16/11/2017].

The advantage of hard links is that there is no difference between two hard links pointing to the same inode: if one of them is deleted, the other remains valid. A symbolic link, instead, becomes invalid when its target is deleted, as it then points to an entity that no longer exists in the file system. Hard links can exist only within the same device and cannot be used for directories, whereas symbolic links can cross devices and point to directories. In NTFS, links to directories exist, even across different volumes, and are called junction points [82].
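The different behaviour of the two link types can be checked with a short experiment. The sketch below assumes a POSIX system and an existing, writable directory given as an absolute path; the names struct link_demo and run_link_demo, as well as the scratch file names, are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Outcome of the experiment; each field is 1 (true) or 0 (false). */
struct link_demo {
    int same_inode;        /* hard link and original name share one inode */
    int symlink_own_inode; /* the symbolic link has an inode of its own */
    int hard_link_valid;   /* hard link resolves after the original is gone */
    int symlink_dangling;  /* symbolic link no longer resolves */
};

/* Hypothetical helper: `dir` must be an absolute path to a writable directory.
 * Returns 0 on success, -1 on error. */
int run_link_demo(const char *dir, struct link_demo *out)
{
    char orig[512], hard[512], soft[512];
    struct stat a, b, c;

    assert(dir != NULL && out != NULL);
    snprintf(orig, sizeof orig, "%s/gr_demo_original", dir);
    snprintf(hard, sizeof hard, "%s/gr_demo_hardlink", dir);
    snprintf(soft, sizeof soft, "%s/gr_demo_symlink", dir);
    unlink(orig); unlink(hard); unlink(soft);   /* clear leftovers */

    int fd = open(orig, O_CREAT | O_WRONLY, 0600);
    if (fd < 0)
        return -1;
    close(fd);

    /* A hard link adds a second name; a symlink stores the target path. */
    if (link(orig, hard) != 0 || symlink(orig, soft) != 0)
        goto fail;
    /* lstat() examines the link itself instead of following it. */
    if (stat(orig, &a) != 0 || stat(hard, &b) != 0 || lstat(soft, &c) != 0)
        goto fail;
    out->same_inode = (a.st_ino == b.st_ino && b.st_nlink == 2);
    out->symlink_own_inode = (S_ISLNK(c.st_mode) && c.st_ino != a.st_ino);

    /* Remove the original name and see which links survive. */
    if (unlink(orig) != 0)
        goto fail;
    out->hard_link_valid = (stat(hard, &b) == 0 && b.st_nlink == 1);
    out->symlink_dangling = (stat(soft, &c) != 0);

    unlink(hard); unlink(soft);
    return 0;

fail:
    unlink(orig); unlink(hard); unlink(soft);
    return -1;
}
```

On a file system that supports both link types, all four fields are expected to be 1 after a call such as run_link_demo("/tmp", &result).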

2.2.1.6 Application Layer File Systems

The file systems seen so far are implemented at the kernel layer, which is a difficult task. To avoid writing file systems at the kernel level, system designers can choose to implement file systems in user space with libraries such as FUSE (Filesystem in Userspace)34 for Unix systems, or Dokany (previously known as Dokan)35, WinFsp (Windows File System Proxy)36, and Eldos CBFS (Callback File System)37 for Windows.

Within Unix systems, file systems built in user space are known as user space file systems, while in Windows systems they are sometimes called user mode file systems. For the purpose of this work we introduce a more generic term: application layer file systems.

This section considers only the FUSE library, because of its maturity and popularity. Other libraries provide similar properties and functionality for Unix or Windows operating systems, or both.

FUSE

The FUSE library consists of three main components: a kernel module (fuse.ko), a user space library (libfuse.*) and a mount utility (fusermount). The mount utility allows FUSE-based file systems to be mounted and unmounted.38 The kernel module mimics a kernel file system and interacts with the VFS. The FUSE kernel module accesses the file system implementation through the user space library, which in turn communicates with the user-defined file system. The diagram in Figure 2.7 shows the interactions between the FUSE components and the OS in user space and kernel space.

34 libfuse. https://github.com/libfuse/libfuse [last accessed on 17/11/2017].

35 Dokany. https://github.com/dokan-dev/dokany [last accessed on 24/11/2017].

36 WinFsp. http://secfs.net/winfsp/ [last accessed on 24/11/2017].

37 Eldos Callback File System. https://sbb.eldos.com/cbfs/ [last accessed on 24/11/2017].

38 Mounting is the process of attaching a file system to a directory (usually under the root directory /, or under the directories /Volumes or /mnt) and making it available to the system.

Figure 2.7: Schematic of the FUSE library and its interaction with the VFS through the operating system user space. In user space, a command (<cmd> /tmp/fuse) and the file system program (./prog /tmp/fuse) run on top of glibc and libfuse; in kernel space, the VFS sits above FUSE, NFS, Ext3 and other file systems. Illustration derived from [83].

Examples of file systems built with FUSE are:

• GmailFS39, which uses Gmail to store data.

• MinFS40, which abstracts the Amazon S3 object store as a file system.

• SSHFS (SSH Filesystem)41, which provides access to a remote file system via SSH.

• WikipediaFS42, which allows access to Wikipedia as a file system.

• davfs243, which uses FUSE and a network library to provide resources on a WebDAV server as a file system.
