(1)

Flexible Storage Allocation

A. L. Narasimha Reddy

Department of Electrical and Computer Engineering, Texas A&M University

Students: Sukwoo Kang (now at IBM Almaden), John Garrison

(2)

Outline



Big Picture



Part I: Flexible Storage Allocation

– Introduction and Motivation

– Design of Virtual Allocation

– Evaluation



Part II: Data Distribution in Networked Storage Systems

– Introduction and Motivation

– Design of User-Optimal Data Migration

– Evaluation



Part III: Storage Management across diverse devices



Conclusion

(3)

Storage Allocation



Entire storage space is allocated at the time of file system creation



Storage space owned by one operating system cannot be used by another

[Figure: static allocation. Windows NT (NTFS), Linux (ext2), and AIX (JFS) partitions with their actual allocations (size labels: 30 GB, 50 GB, 70 GB, 50 GB, 98 GB); one file system is running out of space]

(4)

Big Picture



Memory systems employ virtual memory for several reasons



Current storage systems lack such flexibility



Current file systems allocate storage statically at the time of their creation

– Storage allocation: Space on the disk is not allocated well across multiple file systems

(5)

File Systems with Virtual Allocation



When a file system is created with X GB, VA allows it to be created with only Y GB, where Y << X

– Remaining space is used as one common available pool

– As the file system grows, storage space is allocated on demand

[Figure: Windows NT (NTFS), Linux (ext2), and AIX (JFS) file systems drawing their actual allocations from a 100 GB common storage pool (size labels: 30 GB, 50 GB, 98 GB, 10 GB, 10 GB, 60 GB, 40 GB)]

(6)

Our Approach to Design

[Figure: physical disk and physical block addresses]



Employ Allocate-on-write policy

– Storage space is allocated when the data is written

– Writes all data to disk sequentially based on the time at which data is written to the device

– Once data is written, it can be accessed from the same location, i.e., data is updated in place

(7)

Allocate-on-write Policy

[Figure: physical disk; an extent is allocated by a write at t = t’]



Storage space is allocated in units of extents when data is written



An extent is a group of file system blocks

– Fixed size

– Retains more spatial locality

– Reduces the information that must be maintained
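Because extents have a fixed size, translating a logical block into an extent takes only a division. The block map then needs one entry per extent instead of one per block. The Python sketch below assumes 4 KB file system blocks and 256-block (1 MB) extents. Both sizes are illustrative and are not the prototype's parameters:

```python
BLOCK_SIZE = 4096        # assumed file system block size, in bytes
BLOCKS_PER_EXTENT = 256  # assumed extent size: 256 blocks = 1 MB


def split_block(lblock: int) -> tuple[int, int]:
    """Return (logical extent index, block offset inside that extent)."""
    return divmod(lblock, BLOCKS_PER_EXTENT)


def map_entries(fs_bytes: int, per_block: bool = False) -> int:
    """Number of mapping entries needed to cover fs_bytes of storage."""
    unit = BLOCK_SIZE if per_block else BLOCK_SIZE * BLOCKS_PER_EXTENT
    return -(-fs_bytes // unit)  # ceiling division


if __name__ == "__main__":
    print(split_block(1000))                         # (3, 232)
    print(map_entries(100 * 2**30))                  # 102400 extent entries
    print(map_entries(100 * 2**30, per_block=True))  # 26214400 block entries
```

With these sizes, a 100 GB file system needs about 100 thousand extent entries instead of about 26 million per-block entries.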

(8)

Allocate-on-write Policy

[Figure: physical disk with Extent 0 and Extent 1; writes at t = t’ and t = t’’ (where t’’ > t’) are allocated sequentially]



Data is written to disk sequentially based on write time

– Further writes to the same data are updated in place

VA (Virtual Allocation) therefore requires an additional data structure

(9)

Block Map

[Figure: physical disk with Extents 0–2 allocated by writes at t = t’ and t = t’’ (where t’’ > t’), and the block map]



The block map keeps a mapping between logical storage locations and real (physical) storage locations
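The allocate-on-write policy and the block map can be sketched together in Python. Physical extents are handed out in the order writes arrive. Any later write to an already-mapped extent translates to the same physical location, so updates happen in place. The class and its names are hypothetical. The prototype is a Linux kernel module that uses an in-memory hash table for lookups; a Python dict plays that role here:

```python
from typing import Optional


class BlockMap:
    """Logical extent -> physical extent map with allocate-on-write (illustrative)."""

    def __init__(self, blocks_per_extent: int = 256) -> None:
        self.bpe = blocks_per_extent
        self.map: dict[int, int] = {}  # logical extent -> physical extent
        self.next_free = 0             # physical extents are handed out in write order

    def translate(self, lblock: int) -> Optional[int]:
        """Physical block for lblock, or None if its extent was never written."""
        lext, off = divmod(lblock, self.bpe)
        pext = self.map.get(lext)
        return None if pext is None else pext * self.bpe + off

    def write(self, lblock: int) -> int:
        """Return the physical block that a write to lblock should go to."""
        lext, off = divmod(lblock, self.bpe)
        if lext not in self.map:            # first write: allocate the next extent
            self.map[lext] = self.next_free
            self.next_free += 1
        return self.map[lext] * self.bpe + off  # later writes land in place
```

With 4 blocks per extent, writing logical block 100 and then block 9 places extent 25 at physical extent 0 and extent 2 at physical extent 1. The placement follows write time, not logical address.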

(10)

VA Metadata

[Figure: physical disk with Extents 0–2 and a VA metadata region; the block map is hardened to the VA metadata]



The block map is maintained in memory and regularly written to disk for hardening against system failures
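Hardening can be illustrated by serializing the block map into a fixed-format image and writing it durably. This Python sketch writes to a regular file for simplicity, whereas the prototype writes to the VA metadata region on disk. The record layout (magic, entry count, pairs of 64-bit extent numbers) is a hypothetical format chosen for the example:

```python
import os
import struct

HEADER = struct.Struct("<4sI")  # magic, entry count (hypothetical on-disk format)
ENTRY = struct.Struct("<QQ")    # logical extent, physical extent (unsigned 64-bit)
MAGIC = b"VAMD"


def serialize(block_map: dict[int, int]) -> bytes:
    """Pack the in-memory block map into a flat byte image."""
    parts = [HEADER.pack(MAGIC, len(block_map))]
    parts += [ENTRY.pack(lext, pext) for lext, pext in sorted(block_map.items())]
    return b"".join(parts)


def deserialize(buf: bytes) -> dict[int, int]:
    """Rebuild the block map from an image produced by serialize()."""
    magic, count = HEADER.unpack_from(buf, 0)
    if magic != MAGIC:
        raise ValueError("not a VA metadata image")
    return dict(ENTRY.unpack_from(buf, HEADER.size + i * ENTRY.size) for i in range(count))


def harden(block_map: dict[int, int], path: str) -> None:
    """Write the image durably: temporary file, fsync, then atomic rename."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(serialize(block_map))
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)
```

Hardening on a timer bounds how much mapping information a crash can lose. The sketch does not model any ordering constraints between this write and the file system's own (meta)data writes.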

(11)

On-disk Layout & Storage Expansion

[Figure: on-disk layout. The physical disk holds FS metadata, Extents 0–2, and VA metadata; the virtual disk continues with Extents 3–7, with the storage expansion threshold and the storage expansion marked]



When capacity is exhausted or usage reaches the storage expansion threshold, the physical disk can be expanded onto other available storage resources

– The file system is unaware of the actual space allocation and expansion
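The expansion rule can be sketched as an allocator. When usage crosses the threshold, it attaches another device from the common pool. It fails only when both the virtual disk and the pool are exhausted. The 0.8 threshold, the sizes in extents, and the list-based pool are assumptions for illustration:

```python
import errno


class VirtualDisk:
    """Virtual disk that grows from a common pool (sizes in extents; illustrative)."""

    def __init__(self, initial_extents: int, pool: list[int], threshold: float = 0.8) -> None:
        self.capacity = initial_extents
        self.pool = list(pool)      # sizes of spare devices in the common storage pool
        self.threshold = threshold  # fraction of capacity that triggers expansion
        self.used = 0
        self.expansions = 0

    def _expand(self) -> bool:
        """Attach the next spare device, if any; the file system never sees this."""
        if not self.pool:
            return False
        self.capacity += self.pool.pop(0)
        self.expansions += 1
        return True

    def allocate_extent(self) -> int:
        """Hand out the next physical extent, expanding when needed."""
        if self.used >= self.capacity and not self._expand():
            raise OSError(errno.ENOSPC, "virtual disk and common pool exhausted")
        pext = self.used
        self.used += 1
        if self.used >= self.threshold * self.capacity:
            self._expand()  # grow early, at the expansion threshold
        return pext
```

Expanding at a threshold, rather than only at exhaustion, leaves headroom, so allocations do not stall while a new device is attached.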

(12)

Write Operation

[Figure: write path. Application write request → File System → Buffer/Page Cache Layer (acknowledgement) → Block I/O Layer (VA): search the VA block map; if needed, allocate a new extent and update the mapping information → Disk 0 (FS metadata, extents, VA metadata; hardening)]
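The block I/O layer's part of the write path can be sketched in Python in three steps. First, look up the extent in the block map. Second, on a miss, allocate and record a new extent. Third, issue the write at the translated physical address. A bytearray stands in for the disk, and the block and extent sizes are assumptions. The `dirty` flag is a hypothetical hook for the periodic hardening of the block map:

```python
import errno

BLOCK = 4096  # assumed block size, in bytes
BPE = 4       # assumed blocks per extent (tiny, to keep the example small)


class VAWritePath:
    """Block I/O layer write path with a VA block map (illustrative sketch)."""

    def __init__(self, n_extents: int) -> None:
        self.disk = bytearray(n_extents * BPE * BLOCK)  # stands in for the device
        self.n_extents = n_extents
        self.block_map: dict[int, int] = {}  # logical extent -> physical extent
        self.next_free = 0
        self.dirty = False  # mapping changed since the last hardening pass

    def write_block(self, lblock: int, data: bytes) -> None:
        if len(data) != BLOCK:
            raise ValueError("this sketch handles whole-block writes only")
        lext, off = divmod(lblock, BPE)
        pext = self.block_map.get(lext)  # 1. search the VA block map
        if pext is None:                 # 2. miss: allocate a new extent, update mapping
            if self.next_free >= self.n_extents:
                raise OSError(errno.ENOSPC, "no free extents")
            pext = self.next_free
            self.next_free += 1
            self.block_map[lext] = pext
            self.dirty = True            # picked up by the next hardening pass
        start = (pext * BPE + off) * BLOCK  # 3. write at the physical location
        self.disk[start:start + BLOCK] = data
```

Only the first write to an extent changes the mapping. Overwrites go straight to the existing location and leave the metadata untouched.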

(13)

Read Operation

[Figure: read path. Application read request → File System → Buffer/Page Cache Layer → Block I/O Layer (VA): search the VA block map → Disk 0 (FS metadata, extents, VA metadata)]
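A read never allocates: the block I/O layer only translates the address. The function below is a hypothetical sketch that assumes 4 KB blocks and 4 blocks per extent. The deck does not say how the prototype handles reads of never-written blocks, so the zero-fill branch is an assumption:

```python
def va_read_block(block_map: dict[int, int], disk: bytearray, lblock: int,
                  bpe: int = 4, block: int = 4096) -> bytes:
    """Translate lblock through the VA block map and return its contents."""
    lext, off = divmod(lblock, bpe)
    pext = block_map.get(lext)  # search the VA block map
    if pext is None:
        # Never written, so no extent exists. Returning zeros is an
        # assumption of this sketch, not documented prototype behavior.
        return bytes(block)
    start = (pext * bpe + off) * block
    return bytes(disk[start:start + block])
```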

(14)

Allocate-on-write vs. Other Work



Key difference from log-structured file systems (LFS)

– Only allocation is done at the end of the log

– Updates are done in place after allocation



LVM still ties up storage at the time of file system creation

(15)

Design Issues

• Extent-based Policy Example (with Ext2)

– I (inode), B (data block), V (VA block map)

– A → B (B is allocated to A)

• File system-based Policy Example (with Ext3 ordered mode)



• VA Metadata Hardening (File System Integrity)

– Must keep certain update ordering of VA metadata and FS (meta)data

(16)

Design Issues (cont.)

• Extent Size

– Larger extent size: reduces block map size and retains more spatial locality, but causes data fragmentation

• Reclaiming allocated storage space of deleted files

– Needed to continue to provide the benefits of virtual allocation

– Without reclamation, virtual allocation can degrade into static allocation

• Interaction with RAID

– RAID remaps blocks to physical devices to provide device characteristics

– VA remaps blocks for flexibility

– Need to resolve the performance impact of VA’s extent size and RAID’s chunk size
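One way to reason about the interaction between extent size and chunk size is to count how many RAID-5 stripes a whole-extent write covers only partly. Full-stripe writes let RAID-5 compute parity without extra reads, while partial-stripe writes need additional reads (e.g., read-modify-write). The helper below is a hypothetical back-of-the-envelope tool, not part of VA. All sizes are in KB:

```python
def partial_stripes(start_kb: int, length_kb: int, chunk_kb: int, n_disks: int) -> int:
    """Count RAID-5 stripes that a write of [start, start + length) covers only partly.

    Each stripe holds (n_disks - 1) data chunks plus one parity chunk; a
    partly covered stripe needs extra reads (e.g., read-modify-write) to update parity.
    """
    stripe_kb = chunk_kb * (n_disks - 1)
    end_kb = start_kb + length_kb
    first, last = start_kb // stripe_kb, (end_kb - 1) // stripe_kb
    head = start_kb % stripe_kb != 0  # ragged first stripe
    tail = end_kb % stripe_kb != 0    # ragged last stripe
    if first == last:
        return 1 if (head or tail) else 0
    return int(head) + int(tail)
```

For example, take 64 KB chunks on 5 disks, which gives 256 KB of data per stripe. A 1 MB extent that starts on a stripe boundary covers four full stripes, while the same extent shifted by 128 KB leaves two partial stripes.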

(17)

Spatial Locality Observations & Issues

• Metadata and data separation

• Data clustering: reduce seek distance

• Multiple file systems

• Data placement policy

– Allocate hot data in a high data-rate region of the disk

– Allocate hot data in the middle of the partition

(18)

Implementation & Experimental Setup



Virtual allocation prototype

– Kernel module for Linux 2.4.22

– Employ a hash table in memory for speeding up VA lookups



Setup

– A 3 GHz Pentium 4 processor, 1 GB main memory

– Red Hat Linux 9 with a 2.4.22 kernel

– Ext2 file system and Ext3 file system



Workloads

– Bonnie++ (Large-file workload)

– Postmark (Small-file workload)

– TPC-C (Database workload)

(19)

VA Metadata Hardening

• Compare EXT2 and VA-EXT2-EX

• Compare EXT3 with VA-EXT3-EX and VA-EXT3-FS

(20)

Reclaiming Allocated Storage Space



Reclaim operation for deleted large files



How to keep track of deleted files?

– Employed a stackable file system that maintains a duplicated block bitmap

– Alternatively, could employ the “Life or Death at Block-Level” (OSDI’04) work
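With a duplicated copy of the file system's block bitmap, the block layer can tell which blocks a delete has just freed. An extent can return to the common pool once none of its blocks are live. The sketch below compares two bitmap snapshots; the list-of-bools bitmap and the function names are hypothetical. Only fully free extents are released, so the operation mainly benefits deleted large files:

```python
def newly_freed(prev: list[bool], cur: list[bool]) -> list[int]:
    """Blocks marked in use in the previous bitmap copy but free now."""
    return [i for i, (was, now) in enumerate(zip(prev, cur)) if was and not now]


def reclaim(prev: list[bool], cur: list[bool], block_map: dict[int, int],
            bpe: int, free_pool: list[int]) -> list[int]:
    """Unmap logical extents whose blocks are all free; return their indices."""
    candidates = {b // bpe for b in newly_freed(prev, cur)}
    released = []
    for lext in sorted(candidates):
        if lext in block_map and not any(cur[lext * bpe:(lext + 1) * bpe]):
            free_pool.append(block_map.pop(lext))  # physical extent back to the pool
            released.append(lext)
    return released
```

An extent with even one live block stays mapped. A file system that scatters small files across many extents may therefore see little space come back.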

(21)

VA with RAID-5

• Large-file workload

• Small-file workload

• Large-file workload with NVRAM

• Used Ext2 with software RAID-5 + VA

• NVRAM-X%: X% of total VA metadata size

(22)

Data Placement Policy (Postmark)

• VA NORMAL partition: same data rate across a partition

• VA ZCAV partition: hot data is placed in the high data-rate region of a partition

• VA-NORMAL: start allocation from the outer cylinders

• VA-MIDDLE: start allocation from the middle of a partition

(23)

Multiple File Systems

• VA-7GB: 2 × 3.5 GB partitions, 30% utilization

• VA-32GB: 2 × 16 GB partitions, 80% utilization

• Used Postmark

• VA-HALF: the 2nd file system is created after 40% of the 1st file system is written

• VA-FULL: the 2nd file system is created after 80% of the 1st file system is written

(24)

Real-World Deployment of Virtual Allocation

Prototype built

(25)

VA in Networked Storage Environment



Flexible allocation provided by VA raises a new issue

– Balancing locality vs. load balance

(26)

Part II: Data Distribution



Locality-based approach

– Use data migration (e.g., HP AutoRAID)

– Employ “hot” data migration from slower device (remote disk) to faster device (local disk)



Load balancing-based approach (Striping)

[Figure: hot data and cold data placement]

(27)

User-Optimal Data Migration




Locality is exploited first

– Data is migrated from Disk B to Disk A



Load balancing is also considered

– If the load on Disk A is too high, data is migrated from Disk A to Disk B

(28)

Migration Decision Issues


• Where to migrate: use I/O request response time

• When to migrate: migration threshold

– Initiate migration from Disk A to Disk B only when

• How to migrate: limit the number of concurrent migrations (migration token)
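The three decisions can be combined in a small policy object. It compares average I/O response times to pick a target, requires the gap to exceed a migration threshold, and caps concurrent migrations with tokens. The slide's exact migration condition is cut off in this copy, so the relative-gap test and the default values below are assumptions:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MigrationPolicy:
    """Response-time-driven migration with a token limit (illustrative)."""
    threshold: float = 0.2  # hypothetical: require a 20% response-time gap
    tokens: int = 2         # maximum number of concurrent migrations
    in_flight: int = field(default=0, init=False)

    def target(self, src: str, resp_ms: dict[str, float]) -> Optional[str]:
        """Where to migrate: the fastest device, if clearly faster than src."""
        best = min(resp_ms, key=resp_ms.__getitem__)
        if best != src and resp_ms[src] > (1 + self.threshold) * resp_ms[best]:
            return best
        return None

    def try_start(self, src: str, resp_ms: dict[str, float]) -> Optional[str]:
        """When and how to migrate: only past the threshold and if a token is free."""
        dst = self.target(src, resp_ms)
        if dst is None or self.in_flight >= self.tokens:
            return None
        self.in_flight += 1
        return dst

    def finish(self) -> None:
        """Return the token when a migration completes."""
        self.in_flight -= 1
```

The threshold keeps small, noisy response-time differences from triggering migrations back and forth. The token cap keeps migration traffic from swamping foreground I/O.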




(29)

Design Issues



Allocation policy

– Striping with user-optimal migration: will improve data access locality

– Sequential allocation with user-optimal migration: will improve load balancing



Multi-user environment

– Each user migrates data in a user-selfish manner

– Migrations will tend to improve the performance of all users over longer periods of time

(30)

Evaluation



Implemented as a kernel block device driver



Evaluated it using the SPECsfs benchmark



[Figure panels: Configuration; SPECsfs Performance Curve; Multi-User]

(31)

Single-User Environment

• Striping with user-optimal migration

• Seq. allocation with user-optimal migration

• Configuration: (Allocation Policy)-(Migration Policy)

– STR (Striping), SEQ (Seq. Alloc.), NOMIG (No migration), MIG (User-Optimal migration)

(32)

Single-User Environment (cont.)

• Comparison between migration systems

– Migration based on locality: hot data (remote → local), cold data (local → remote)

(33)

Multi-User Environment - Striping



Server A: Load from 100 to 700



Server B: Load from 50 to 350

(34)

Multi-User Environment – Seq. Allocation



Server A: Load from 100 to 1100



Server B: Load from 30 to 480

(35)

Storage Management Across Diverse Devices



Flash storage is becoming widely available

– More expensive than hard drives

– Faster random accesses

– Low power consumption



In laptops now



In hybrid storage systems soon



Manage data across different devices

– Match application needs to device characteristics

– Optimize for performance, power consumption

(36)

Motivation



VFS allows many file systems underneath



VFS maintains a one-to-one mapping from namespace to storage



Can we provide different storage options for different files for a single user?

– /user1/file1 → storage system 1, /user2/file2 → storage system 2, …

(37)

Normal File System Architecture

[Figure: applications (Calc, Impress, Writer, WinAmp) in user space access files through VFS in the kernel; Ext2 on a magnetic disk holds /user1/* (/user1/file1, /user1/file2) and FAT32 on a flash drive holds /user2/* (/user2/file3, /user2/file4)]

(38)

Umbrella File System

[Figure: applications (Calc, Impress, Writer, WinAmp) in user space access /user1/file1–file4 through VFS and UmbrellaFS in the kernel; UmbrellaFS places /FS1/user1/file3 on Ext3 (encrypted magnetic disk), /FS2/user1/file1 and /FS2/user1/file2 on Ext2 (magnetic disk), and /FS3/user1/file4 on FAT32 (flash drive)]

(39)

Example Data Organization

User view: /usr → /usr/dir1 → /usr/dir1/foo.avi, /usr/dir1/foo.txt, /usr/dir1/foo.jpg

Underlying data organization: /media/usr/dir1/foo.avi, /text/usr/dir1/foo.txt, /images/usr/dir1/foo.jpg

(40)

Motivation – Policy-Based Storage



User or system administrator choice

– Allow different types of files on different devices

– Reliability, performance, power consumption



Layered Architecture

– Leverage benefits of underlying file systems

– Map applications to file systems and underlying storage



Policy decisions can depend on namespace and metadata

– Example: Files not touched in a week → slow storage system

(41)

Rules Structure



Provided at mount time



User specified



Based on inode values (metadata) and filenames (namespace)



Provides an array of branches

(42)

Umbrella File System



Sits under VFS to enforce policy



Policy enforced at open and close times



Policy also enforced periodically (less often)



UmbrellaFS acts as a “router” for files

– Not only based on namespace, but also metadata

(43)

Inode Rules Structure

Rule | Inode/Filename | Field | Match | Value | Branch
1 | Inode | file permissions | = | Read Only | /fs1, /fs2
2 | Filename | n/a | n/a | n/a | n/a
3 | Inode | file creation time | >= | 8:00 am, August 3rd, 2007 | /fs2
4 | Inode | file length | < | 20 KB | /fs3

(44)

Inode Rules



Provided in order of precedence



First match



Compare inode value to rule

– At file creation, some inode values are indeterminate

– Pass over those rules
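The first-match evaluation can be sketched with the inode rules from the Inode Rules Structure table. The field names (`read_only`, `created`, `size`), the byte interpretation of 20 KB, and the default branch are hypothetical. A value of `None` stands for an inode value that is still indeterminate at file creation, and rules on such values are passed over:

```python
import operator
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Optional


@dataclass
class InodeRule:
    attr: str                       # hypothetical inode attribute name
    op: Callable[[Any, Any], bool]  # comparison from the "Match" column
    value: Any
    branches: list[str]


# In order of precedence; these mirror inode rules 1, 3 and 4 of the table.
RULES = [
    InodeRule("read_only", operator.eq, True, ["/fs1", "/fs2"]),
    InodeRule("created", operator.ge, datetime(2007, 8, 3, 8, 0), ["/fs2"]),
    InodeRule("size", operator.lt, 20 * 1024, ["/fs3"]),  # 20 KB, in bytes
]


def select_branches(inode: dict[str, Any], rules: Optional[list[InodeRule]] = None,
                    default: Optional[list[str]] = None) -> Optional[list[str]]:
    """Return the branches of the first rule that matches, else default."""
    for rule in (RULES if rules is None else rules):
        actual = inode.get(rule.attr)
        if actual is None:  # value not yet determined (e.g., size at creation)
            continue        # pass over this rule
        if rule.op(actual, rule.value):
            return rule.branches
    return default
```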

(45)

Filename Rules Structure

Rule | Match String | Branch
1 | /*.avi | /fs2, /fs1
2 | /home/*.txt | /fs1
3 | /home/jgarrison/* | /fs3

(46)

Filename Rules

• Once the first filename rule is triggered, all rules are checked

• Similar to longest prefix matching

• Double index based on

– Path matching

– Filename matching

• Example:

– Rules: /home/*/*.bar, /home/jgarrison/foo.bar

– File: /home/jgarrison/foo.bar

– The file matches the second rule more closely (path length 3 and 7 filename characters vs. path length 3 and 4 filename characters)
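Here is a Python sketch of the "check all, keep the closest" behavior. Every matching rule is scored by its path length and then by the number of literal characters in its filename part, which reproduces the /home/jgarrison/foo.bar example. The scoring tuple and the use of `fnmatch.fnmatchcase`, where `*` also matches `/`, are assumptions. This is not UmbrellaFS's double-index implementation:

```python
from fnmatch import fnmatchcase
from typing import Optional

WILDCARDS = set("*?[]")


def score(pattern: str) -> tuple[int, int]:
    """(path components in the rule, literal characters in its filename part)."""
    parts = [p for p in pattern.split("/") if p]
    literal = sum(1 for ch in parts[-1] if ch not in WILDCARDS) if parts else 0
    return len(parts), literal


def best_filename_rule(path: str, rules: list[tuple[str, list[str]]]) -> Optional[list[str]]:
    """Check every rule and return the branches of the closest match."""
    matches = [(score(pat), branches) for pat, branches in rules if fnmatchcase(path, pat)]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0])[1]
```

With the Filename Rules Structure table as input, /home/jgarrison/notes.txt matches both /home/*.txt and /home/jgarrison/*. The longer path of the second rule wins, so the file is routed to /fs3.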

(47)

Evaluation



Overhead

– Throughput

– CPU Limited

– I/O Limited



Example Improvement

(48)

UmbrellaFS Overhead

[Figure: Bonnie read overhead. Throughput (MB/s) of Ext2 vs. UmbrellaFS with 1, 2, 4, 8, 16, and 32 inode rules and filename rules]

(49)

CPU Limited Benchmarks

(50)

I/O Limited Benchmarks

(51)

Flash vs. RAID5 Read Performance

(52)

Flash vs. RAID5 Write Performance

[Figure: write performance. Throughput (MB/s) vs. file size (kB, 1 to 10,000) for RAID 5 and Flash SSD]

(53)

Flash and Disk Hybrid System

(54)

Disks with Encryption Hardware

[Figure: encryption example. Time (s) for partial encryption vs. full encryption]

(55)

Conclusion



Virtual allocation allows flexibility

– Improve the flexibility of managing storage across multiple file systems/platforms



Enabled user-optimal migration

– Balance disk access locality and load balance automatically and transparently

– Adapt to changes of workloads and loads in each storage device



Policy-based storage: Umbrella File System

– Allows matching application characteristics to devices
