Introducing VA Technologies
•UK Based System Integrator
•Specialising in High Performance ZFS Storage
•Partner of E4 Computing Engineering
VA HPC Powered by E4
•New HPC Solutions
•ARKA ARM and Quadro + Tegra 3 Blades
•Joint New Solutions
But First. ZFS
6 Great Reasons to Love ZFS
•Architecture
•Data Integrity
•Redundancy
•Transactional Copy on Write (COW)
•Snapshots
Architecture
Architecture ‐ ZFS Layer View
Layers, top to bottom:
•Consumers: raw, swap, dump, iSCSI, ??, ZFS, NFS, CIFS, ?? (e.g. pNFS)
•Interface layer: ZFS Volume Emulator (Zvol), ZFS POSIX Layer (ZPL), ??
•Transactional Object Layer
•Pooled Storage Layer
•Block Device Driver
Data Integrity
Data Integrity – Designer's Quote
“The job of any file system boils down to this: when asked to read a block, it should return the same data that was previously written to that block. If it can't do that -- because the disk is offline or the data has been damaged or tampered with -- it should detect this and return an error.”
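The behaviour described in the quote can be sketched in a few lines of Python. This is only an illustration, not ZFS code: the BlockStore class, its method names and the choice of SHA-256 are assumptions made for the example. The one idea taken from ZFS is that the checksum is kept apart from the data it protects (here it is handed back to the caller, standing in for a parent block pointer), so a damaged block cannot vouch for itself.

```python
import hashlib


class BlockReadError(IOError):
    """Raised when a block is missing or fails checksum verification."""


class BlockStore:
    """Hypothetical in-memory 'disk' used only to illustrate the idea."""

    def __init__(self):
        self._disk = {}  # block address -> raw bytes

    def write(self, addr, data):
        """Store the block and return its checksum for the caller to keep."""
        self._disk[addr] = bytes(data)
        return hashlib.sha256(data).digest()

    def read(self, addr, expected_checksum):
        """Return the data only if it matches what was originally written."""
        data = self._disk.get(addr)
        if data is None:
            raise BlockReadError(f"block {addr} is unavailable")
        if hashlib.sha256(data).digest() != expected_checksum:
            raise BlockReadError(f"block {addr} failed checksum verification")
        return data

    def corrupt(self, addr):
        """Simulate silent on-disk damage by flipping bits in the first byte."""
        damaged = bytearray(self._disk[addr])
        damaged[0] ^= 0xFF
        self._disk[addr] = bytes(damaged)
```

A caller writes a block, keeps the returned checksum, and passes it back on every read. After corrupt() the read raises BlockReadError instead of returning bad data.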
Redundancy
Redundancy – Mirrored Disks in 2 vdevs
Logical vdevs
Physical or leaf vdevs
Redundancy – RAIDz2 in 3 vdevs
Physical or leaf vdevs
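As a rough guide to what the two layouts above mean for usable space, the sketch below estimates pool capacity from its vdevs. The disk counts and the 4 TB disk size are hypothetical, since the slides do not give them, and the figures ignore metadata, padding and reserved space, so they are upper bounds rather than exact values.

```python
def mirror_vdev_capacity(disk_tb):
    """A mirror vdev stores one disk's worth of data, however many copies."""
    return disk_tb


def raidz_vdev_capacity(disks, disk_tb, parity=2):
    """A RAIDZ vdev gives up roughly `parity` disks' worth of space."""
    if disks <= parity:
        raise ValueError("a RAIDZ vdev needs more disks than parity devices")
    return (disks - parity) * disk_tb


def pool_capacity(vdev_capacities):
    """The pool stripes across its vdevs, so capacities simply add up."""
    return sum(vdev_capacities)


# Hypothetical example: 2 mirror vdevs of 4 TB disks.
mirrored_pool = pool_capacity([mirror_vdev_capacity(4) for _ in range(2)])

# Hypothetical example: 3 RAIDZ2 vdevs, each with 6 disks of 4 TB.
raidz2_pool = pool_capacity([raidz_vdev_capacity(6, 4) for _ in range(3)])

print(mirrored_pool, raidz2_pool)
```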
Redundancy – Dynamic Striping
Total write size = 2816 kBytes
RAID-0 Column size = 128 kBytes, stripe width = 384 kBytes
ZFS Dynamic Stripe recordsize = 128 kBytes 384 kBytes
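The RAID-0 numbers above can be checked with a little arithmetic. The sketch below assumes the stripe width is the column size multiplied by the number of disks, which gives 3 disks for the 128 kByte and 384 kByte figures on the slide. The function name is made up for the example, and it models only fixed RAID-0 striping, not the ZFS allocator.

```python
def raid0_layout(total_kb, column_kb, stripe_kb):
    """Split a write into fixed-size columns laid out round-robin over disks."""
    disks = stripe_kb // column_kb               # columns per full stripe
    chunks = -(-total_kb // column_kb)           # ceiling division
    full_stripes, leftover = divmod(chunks, disks)
    # Chunks landing on each disk: leftovers go to the first disks.
    per_disk = [full_stripes + (1 if i < leftover else 0) for i in range(disks)]
    return {
        "disks": disks,
        "chunks": chunks,
        "full_stripes": full_stripes,
        "leftover_chunks": leftover,
        "chunks_per_disk": per_disk,
    }


layout = raid0_layout(total_kb=2816, column_kb=128, stripe_kb=384)
print(layout)
```

With the slide's figures the 2816 kByte write becomes 22 columns: 7 full stripes plus 1 leftover column, so one disk receives one more column than the other two.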
Transactional Copy on Write (COW)
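A toy model can make the copy-on-write idea concrete, and it also shows why snapshots come almost for free. The CowStore class below is hypothetical and heavily simplified: a flat name-to-block map stands in for the block tree. It follows the same rule, though: live blocks are never overwritten, new data goes to newly allocated blocks, and the change becomes visible only when the root reference is switched.

```python
class CowStore:
    """Hypothetical copy-on-write store: blocks are written once, never edited."""

    def __init__(self):
        self._blocks = {}      # block id -> data (immutable once written)
        self._next_id = 0
        self._root = {}        # name -> block id; replaced, never mutated
        self._snapshots = {}   # label -> a retained old root

    def _alloc(self, data):
        block_id = self._next_id
        self._next_id += 1
        self._blocks[block_id] = data
        return block_id

    def write(self, name, data):
        new_block = self._alloc(data)   # 1. write new data to a fresh block
        new_root = dict(self._root)     # 2. build a new root that points to it
        new_root[name] = new_block
        self._root = new_root           # 3. commit by switching the root

    def snapshot(self, label):
        """A snapshot just keeps a reference to the current root."""
        self._snapshots[label] = self._root

    def read(self, name, snapshot=None):
        root = self._root if snapshot is None else self._snapshots[snapshot]
        return self._blocks[root[name]]


store = CowStore()
store.write("report", b"version 1")
store.snapshot("monday")
store.write("report", b"version 2")
```

Here store.read("report") returns the new data while store.read("report", snapshot="monday") still returns the old block, because nothing the snapshot points to was ever modified.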
Snapshots
Hybrid Storage Pools Mixing SSD with HDD
•New •Old
Hybrid Storage Pools Mixing SSD with HDD
•Adaptive Replacement Cache (ARC)
•Read optimized device (SSD): Level 2 ARC (L2ARC)
•Write optimized device (SSD)
•Main Pool: HDD, HDD, HDD
Hybrid Storage Pools – ARC
•Frequent Cache (MFU ... LFU): evict the oldest multiple-accessed entry
•Recent Cache (MRU ... LRU): evict the oldest single-use entry
•Hit / Miss
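The two lists on this slide can be sketched as follows. This is a deliberately simplified two-list cache, not the real ARC: it uses fixed list sizes and leaves out the ghost lists that ARC uses to adapt the balance between recent and frequent entries. The class name and the loader callback are assumptions made for the example.

```python
from collections import OrderedDict


class TwoListCache:
    """Simplified recent/frequent cache (no ghost lists, no adaptation)."""

    def __init__(self, recent_size, frequent_size):
        self.recent_size = recent_size
        self.frequent_size = frequent_size
        self.recent = OrderedDict()     # entries seen once, oldest first
        self.frequent = OrderedDict()   # entries seen more than once
        self.hits = 0
        self.misses = 0

    def get(self, key, loader):
        if key in self.frequent:
            # Hit in the frequent list: refresh its position.
            self.frequent.move_to_end(key)
            self.hits += 1
            return self.frequent[key]
        if key in self.recent:
            # Second access: promote from recent to frequent.
            value = self.recent.pop(key)
            self.frequent[key] = value
            if len(self.frequent) > self.frequent_size:
                self.frequent.popitem(last=False)  # oldest multiple-accessed entry
            self.hits += 1
            return value
        # Miss: fetch from the slower tier and remember it as recent.
        self.misses += 1
        value = loader(key)
        self.recent[key] = value
        if len(self.recent) > self.recent_size:
            self.recent.popitem(last=False)        # oldest single-use entry
        return value


cache = TwoListCache(recent_size=2, frequent_size=2)
for key in ["a", "a", "b", "c", "d"]:
    cache.get(key, loader=lambda k: k.upper())
```

After this sequence "a" sits in the frequent list because it was read twice, while "b" has been evicted from the recent list by the one-off reads of "c" and "d". A scan of single-use blocks therefore does not push out data that is accessed repeatedly.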
ZFS Limitations
Parallel Access? Distributed File System?
Hadoop on ZFS
•Rebuild Times
•Utilize SSDs with HDD
•Better Administration
Linux and Lustre on ZFS
•55PB – Now Active to Sequoia Users
•Lustre + ZFS Fully Configured
Lustre on ZFS – Write Performance
[Figure: Single shared file IOR (10G block, 1M transfers); y-axis in MB/s, 0 to 1600; x-axis 8 to 104]
Sequoia Workload
• 768 OSS Nodes
• 2048 Tasks per OSS
• 1,572,864 Compute Cores
LDISKFS
• Increased tasks per OSS degrades performance
ZFS - Constant performance
• Increase I/O size for RAIDZ2
LDISKFS+RAID6
Lustre on ZFS – Read Performance
LDISKFS
● mballoc allows larger I/O
ZFS
● 128K maximum block size
● IOPs limited for ZFS+RAID6
Perfect Opportunity for Read Caching
Lustre on ZFS – Coming Soon
•New Hardware Optimised for Lustre on ZFS
•Low Power Consumption OSS & OST
•Customisable for your Lustre deployment
Thanks Very Much!
Ryan Tyler VA Technologies