Transforming the UL into a Big Data University
Current status and planned evolutions
Sébastien Varrette, PhD, Prof. Pascal Bouvry, Prof. Volker Müller
December 6th, 2013
Preamble
■ Classical storage metrics
✓ Storage capacity: multiples of bytes (TB = 10^12 bytes; TiB = 1024^4 bytes) - see the sketch below
✓ Transfer rate on a medium: Mb/s or MB/s
✓ Other metrics: sequential vs. random R/W speed, IOPS
■ The Big Data challenge: the 4 V's
✓ [ Volume | Velocity | Variety | Veracity ]
‣ Also relevant for Luxembourg's research priorities
‣ Large number of diverse data sources to integrate
‣ Not just about storage capacity!
■ In this talk: storage infrastructure @ UL
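To make the TB vs. TiB distinction concrete, a minimal Python sketch (illustrative only, not part of the original slides) converting vendor-advertised decimal capacities into binary units:

```python
# Decimal (SI) vs. binary (IEC) storage units: TB = 10**12 bytes, TiB = 2**40 bytes.
TB  = 10**12   # terabyte (SI)
TiB = 2**40    # tebibyte (IEC)

def tb_to_tib(capacity_tb: float) -> float:
    """Convert a decimal-TB capacity (as advertised by disk vendors) to TiB."""
    return capacity_tb * TB / TiB

if __name__ == "__main__":
    for tb in (4, 240, 1208.4):   # disk, enclosure and platform sizes quoted later in the talk
        print(f"{tb:8.1f} TB = {tb_to_tib(tb):8.1f} TiB")
    # A '4 TB' disk therefore shows up as roughly 3.6 TiB once mounted.
```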
Storage Levels
[Memory hierarchy diagram: larger, slower and cheaper as we move away from the CPU]
Level 1: CPU registers                ~500 bytes       sub-ns (register reference)
Level 2: L1/L2/L3 caches              64 KB to 8 MB    1-2 cycles (L1), ~10 cycles (L2), ~20 cycles (L3)
Level 3: Main memory (DRAM), memory bus   ~1 GB        hundreds of cycles
Level 4: Disk, I/O bus                ~1 TB            tens of thousands of cycles
HDD vs. SSD Performance
■ HDD (SATA @ 7.2 krpm): R/W 100 MB/s; 190 IOPS
■ SSD: R/W 560 MB/s; 85,000 IOPS
■ HDD: 150 €
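As a rough illustration of why IOPS matter as much as sequential bandwidth, the Python sketch below (not from the slides; it assumes 4 KiB random accesses) converts the IOPS figures above into effective random-read throughput:

```python
# Effective throughput of purely random I/O is bounded by IOPS * block size,
# which is far below the sequential R/W figures quoted above.
BLOCK_SIZE = 4 * 1024  # assume 4 KiB random accesses (typical small-I/O workload)

def random_throughput_mb_s(iops: int, block_size: int = BLOCK_SIZE) -> float:
    """Approximate random-access throughput in MB/s for a device doing `iops` operations/s."""
    return iops * block_size / 1e6

print(f"HDD (~190 IOPS):   {random_throughput_mb_s(190):7.2f} MB/s random vs ~100 MB/s sequential")
print(f"SSD (~85000 IOPS): {random_throughput_mb_s(85_000):7.2f} MB/s random vs ~560 MB/s sequential")
```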
Interconnect
■ Latency: time to send a minimal (0 byte) message from A to B
■ Bandwidth: maximum amount of data communicated per unit of time

Technology            Effective Bandwidth     Latency
Gigabit Ethernet      1 Gb/s   = 125 MB/s     40 µs to 300 µs
Myrinet (Myri-10G)    9.6 Gb/s = 1.2 GB/s     2.3 µs
10 Gigabit Ethernet   10 Gb/s  = 1.25 GB/s    4 µs to 5 µs
Infiniband QDR        40 Gb/s  = 5 GB/s       1.29 µs to 2.6 µs
SGI NUMAlink          60 Gb/s  = 7.5 GB/s     1 µs
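A common way to reason about these two metrics together (not taken from the slides) is the linear cost model time = latency + size/bandwidth; the Python sketch below applies it to a few of the interconnects listed above:

```python
# Simple cost model: time(n) = latency + n / bandwidth.
# Bandwidth in bytes/s, latency in seconds; figures taken from the table above
# (best-case latencies are used where a range is given).
INTERCONNECTS = {
    "Gigabit Ethernet":    (125e6,  40e-6),
    "10 Gigabit Ethernet": (1.25e9, 4e-6),
    "Infiniband QDR":      (5e9,    1.29e-6),
}

def transfer_time(size_bytes: float, bandwidth: float, latency: float) -> float:
    """Estimated time (s) to move `size_bytes` over a link with the given bandwidth/latency."""
    return latency + size_bytes / bandwidth

for name, (bw, lat) in INTERCONNECTS.items():
    for size in (1024, 1e6, 1e9):  # 1 KiB, 1 MB, 1 GB messages
        print(f"{name:20s} {size:>10.0f} B -> {transfer_time(size, bw, lat) * 1e3:10.3f} ms")
```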
Data Management
■ Storage architectural classes & I/O layers
✓ DAS (Direct Attached Storage): disks reached through a local interface (SATA, SAS, Fibre Channel)
✓ NAS (Network Attached Storage): file-level access over the network (NFS, CIFS, AFP...)
✓ SAN (Storage Area Network): block-level access over a dedicated network (Fibre Channel, iSCSI...)
Data Management / HW Protection
■ RAID standard levels
■ RAID combined levels
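As a hedged illustration of what the standard levels trade off (not part of the slides), a small Python sketch computing usable capacity and tolerated disk failures for a few common RAID layouts:

```python
# Usable capacity and fault tolerance for common RAID levels on n identical disks.
# Simplified model: ignores hot spares, controller overhead, combined levels, etc.
def raid_capacity(level: str, n_disks: int, disk_tb: float) -> tuple[float, int]:
    """Return (usable capacity in TB, number of disk failures tolerated)."""
    if level == "RAID0":                 # striping only, no redundancy
        return n_disks * disk_tb, 0
    if level == "RAID1":                 # mirroring (assume 2-way)
        return n_disks / 2 * disk_tb, 1
    if level == "RAID5":                 # single distributed parity
        return (n_disks - 1) * disk_tb, 1
    if level == "RAID6":                 # double distributed parity
        return (n_disks - 2) * disk_tb, 2
    raise ValueError(f"unsupported level {level}")

# Example: a RAID6 LUN built from 8+2 disks of 2 TB, as used later on the gaia cluster.
usable, failures = raid_capacity("RAID6", 10, 2.0)
print(f"RAID6 (8+2) x 2 TB: {usable:.0f} TB usable, tolerates {failures} disk failures")
```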
Data Management / File System (FS)
■ Logical manner to store, manipulate and access data
■ Disk file systems
✓ FAT32, NTFS, HFS, ext3, ext4, xfs...
■ Network file systems
✓ NFS, SMB
■ Distributed and/or parallel file systems
✓ data are striped over multiple servers for high performance (see the sketch below)
✓ generally add robust failover and recovery mechanisms
‣ Lustre, GPFS, FhGFS, GlusterFS
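To illustrate the striping idea mentioned above, the following Python sketch (a toy model, not how Lustre or GPFS are actually implemented; server names and stripe size are hypothetical) maps file offsets onto storage servers in a round-robin fashion:

```python
# Toy round-robin striping: a file is cut into fixed-size stripes that are
# distributed cyclically over the available storage servers, so large
# sequential reads/writes hit all servers in parallel.
STRIPE_SIZE = 1 * 1024 * 1024                 # 1 MiB stripes (hypothetical default)
SERVERS = ["oss1", "oss2", "oss3", "oss4"]    # hypothetical object storage servers

def locate(offset: int) -> tuple[str, int]:
    """Return (server, offset within that server's object) for a byte offset in the file."""
    stripe_index = offset // STRIPE_SIZE
    server = SERVERS[stripe_index % len(SERVERS)]
    local_offset = (stripe_index // len(SERVERS)) * STRIPE_SIZE + offset % STRIPE_SIZE
    return server, local_offset

for off in (0, 512 * 1024, 3 * 1024 * 1024, 10 * 1024 * 1024):
    srv, local = locate(off)
    print(f"file offset {off:>10d} -> {srv}, local offset {local}")
```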
Storage HW Components Hosting
■ High-density disk enclosures
✓ include [redundant] HW RAID controllers
✓ RAID controller card performance differs!
‣ Basic (low cost): 300 MB/s
‣ Advanced (expensive): 1.5 GB/s
✓ Typical enclosure sizing: 4U for 48 to 60 disks of 4 TB (see the sketch below)
■ Storage racks: 42U capacity, 15 kW
‣ HPC rack: 30-40 kW
‣ Interconnect rack: 6-8 kW
■ Server rooms
✓ Power (UPS, battery), cooling, fire protection...
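Under the enclosure sizing quoted above, a back-of-the-envelope Python sketch (assumptions only, in particular the rack units reserved for servers/switches) of how much raw capacity fits in one 42U storage rack:

```python
# Back-of-the-envelope raw capacity of a 42U storage rack filled with
# 4U enclosures of 60 x 4 TB disks (sizing quoted above).
RACK_U, ENCLOSURE_U = 42, 4
DISKS_PER_ENCLOSURE, DISK_TB = 60, 4
RESERVED_U = 6                      # assumed headroom for controllers/servers/switches

enclosures = (RACK_U - RESERVED_U) // ENCLOSURE_U
raw_tb = enclosures * DISKS_PER_ENCLOSURE * DISK_TB
print(f"{enclosures} enclosures per rack -> {raw_tb} TB ({raw_tb / 1000:.1f} PB) raw per 42U rack")
```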
Data storage @ UL - SIU
■ Central storage service operated by the SIU for more than 5 years
✓ on a central file server - capacity ∼ 30 TB
‣ including backup / archiving for all university users
‣ archived data versions take about 180 TB (uncompressed)
✓ originally developed to secure "important user data"
■ Current situation (Dec. 2013)
✓ Central administration: 3 TB
✓ User data: 20 TB
✓ Research data: 196 TB
‣ undersized for 2 years now
UL HPC Platform - http://hpc.uni.lu
■ 2 geographical sites
■ 3 server rooms
■ 3 admins (+ 1.5 in 2014)
■ 4 clusters:
✓ 387 nodes, 4110 cores (43.21 TFlops)
✓ 1208.4 TB (raw storage, incl. backup)
‣ NFS + Lustre
■ > 5.7 M€ hardware investment so far
■ Open-source software stack
✓ Debian, SSH, OpenLDAP, Puppet, FAI...
Example: the gaia Cluster
■ Gaia cluster characteristics
✓ Computing: 250 nodes, 2408 cores; Rpeak ≈ 21.62 TFlops
✓ Storage: 240 TB (NFS) + 576 TB (NFS backup) + 240 TB (Lustre)
✓ Interconnect: Infiniband QDR 40 Gb/s (fat tree); 10 GbE / 1 GbE uplinks to Uni.lu and to the chaos cluster in Kirchberg (Cisco Nexus C5010 10GbE)
■ Computing nodes (Uni.lu, Belval)
✓ 1x BullX BCS enclosure (6U): 4 BullX S6030 [160 cores] (16*10c Intel Xeon E7-4850 @ 2 GHz), RAM: 1 TB
✓ 2x Viridis enclosures (4U): 96 ultra-low-power SoC [384 cores] (1*4c ARM Cortex A9 @ 1.1 GHz), RAM: 4 GB
✓ 1x Dell R820 (4U) [32 cores] (4*8c Intel Xeon E5-4640 @ 2.4 GHz), RAM: 1 TB
✓ 5x Bullx B enclosures (35U): 60 BullX B500 [720 cores] + 12 BullX B506 [144 cores] (2*6c Intel Xeon L5640 @ 2.26 GHz), RAM: 24 GB
✓ 20 GPGPU accelerators [12032 GPU cores]: 4 Nvidia Tesla M2070 [448c] + 20 Nvidia Tesla M2090 [512c]
■ Access / admin servers
✓ Gaia cluster access: Bull R423 (2U) (2*4c Intel Xeon L5620 @ 2.26 GHz), RAM: 16 GB
✓ Adminfront and Columbus servers: Bull R423 (2U) (2*4c Intel Xeon L5620 @ 2.26 GHz), RAM: 16 GB
■ NFS storage (Lustre Storage diagram)
✓ NFS server: Bull R423 (2U) (2*4c Intel Xeon L5630 @ 2.13 GHz), RAM: 24 GB
✓ Nexsan E60 + E60X (240 TB), FC8-attached: 120 disks (2 TB SATA 7.2 krpm) = 240 TB raw; multipathing over 2+2 controllers (cache mirroring); 12 RAID6 LUNs (8+2 disks) = 192 TB (lvm + xfs)
■ Lustre storage
✓ MDS1 / MDS2: 2x Bull R423 (2U) (2*4c Intel Xeon L5630 @ 2.13 GHz), RAM: 96 GB
✓ MDT: Nexsan E60 (4U, 12 TB), FC8-attached: 20 disks (600 GB SAS 15 krpm); multipathing over 2 controllers (cache mirroring); 2 RAID1 LUNs (10 disks) = 6 TB (lvm + lustre)
✓ OSS1 / OSS2: 2x Bull R423 (2U) (2*4c Intel Xeon L5630 @ 2.13 GHz), RAM: 48 GB
✓ OSTs: 2x Nexsan E60 (2*4U, 2*120 TB), FC8-attached: 2*60 disks (2 TB SATA 7.2 krpm) = 240 TB raw; multipathing over 2 controllers (cache mirroring); 2*6 RAID6 LUNs (8+2 disks) = 2*96 TB (lvm + lustre)
UL HPC: HW Investments
■ Cumulative investment: 5,749,432 €
■ Total yearly HW investment (VAT incl.):
  2006:    25,772 €
  2007:   119,274 €
  2008:    93,187 €
  2009: 1,039,410 €
  2010:   413,482 €
  2011: 2,249,294 €
  2012:   980,834 €
  2013:   828,178 €
✓ broken down into: server room(s)/racks, computing nodes, storage, servers, interconnect, software, power supply, other
[Chart: UL HPC total yearly HW investment (VAT incl.), excluding server rooms]
UL HPC: Power and cooling
[Chart: power & cooling usage in UL server rooms, 2006-2013]
✓ Max used capacity [kW]: 0.45 (2006), 12.18, 27.67, 29.17, 42.44, 89.35, 130.63, 179.41 (2013)
✓ Max available power capacity [kW]: 229
UL HPC Storage
[Chart: UL HPC raw storage capacity per cluster (Nyx, G5K, Gaia, Chaos), total and backup, 2006-2013]
✓ Total raw capacity [TB]: 0 (2006), 6.6, 6.6, 6.6, 32.4, 512.4, 1052.4, 1208.4 (2013)
✓ Backup: 576 TB
UL HPC Storage: 1208 TB (raw) in 2013
■ NFS-based storage [392 TB]
✓ home / work directories
■ Lustre-based storage [240 TB]
✓ SCRATCH
■ Backup devices [576 TB]
UL HPC Storage Benchmarking
■ IOZone based - increasing #nodes
■ Lustre: [plot: read/write I/O bandwidth (MiB/s) vs. number of nodes, file size 20G]
■ NFS (read): [plot: I/O bandwidth (GB/s), file size 20G]
UL HPC and the Grande Region
HPC in the Grande Region and around

Country      Name/Institute              #Cores   Rpeak [TFlops]   Storage [TB]   Manpower [FTEs]
Luxembourg   UL                            4110       43.213          1208.4        3
             CRP GL                         800        6.21            144          1.5
France       TGCC Curie, CEA              77184     1667.2            5000          n/a
             LORIA, Nancy                  3724       29.79              82          5.05
             ROMEO, UCR, Reims              564        4.128             15          2
Germany      Juqueen, Juelich            393216     5033.2             448          n/a
             MPI, RZG                      2556       14.1              n/a          5
             URZ (bwGrid), Heidelberg      1140       10.125             32          9
Belgium      UGent, VCS                    4320       54.541             82          n/a
             CECI, UMons/UCL               2576       25.108            156          > 4
UK           Darwin, Cambridge Univ        9728      202.3               20          n/a
             Legion, UCLondon              5632       45.056            192          6
Big Data @ UL: planned evolutions
Cloud Storage Analysis
■ Study initiated by LCSB (credits: Bob Pepin)
✓ Transfer rate / time should not be underestimated (see the sketch below)
■ Legal aspects of data externalization to the Cloud?

Year   Requirement [TB]   Yearly Cost [K€]   Total Cost [K€]
2013        200                  -                  -
2014        400                 183                183
2015        600                 302                486
2016        800                 405                890
2017       1000                 507              1,397
2018       1200                 609              2,006
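To back the point about transfer rate / time, a hedged Python sketch (link speeds and utilisation are assumptions, not figures from the LCSB study) estimating how long it would take to push the data volumes above to a cloud provider:

```python
# Time needed to externalize a given volume over a WAN link, assuming the link
# can be used at a given sustained efficiency (both values below are assumptions).
SECONDS_PER_DAY = 86_400

def upload_days(volume_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Days needed to push `volume_tb` TB over a `link_gbps` Gb/s link at `efficiency` utilisation."""
    volume_bits = volume_tb * 1e12 * 8
    return volume_bits / (link_gbps * 1e9 * efficiency) / SECONDS_PER_DAY

for volume in (200, 1200):                 # 2013 and 2018 requirements from the table above
    for link in (1, 10):                   # assumed 1 Gb/s and 10 Gb/s uplinks
        print(f"{volume:5d} TB over {link:2d} Gb/s: {upload_days(volume, link):6.1f} days")
```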
Big-Data @ UL Milestones
■ Early 2014: RFP for the acquisition of a Big UL NAS
✓ Scalable NAS with an initial effective capacity > 1 PB
✓ Storage / backups of all devices (user + research)
✓ High-performance interface with HPC and UL IT
‣ mounted on computing nodes via the IB QDR interconnect
‣ mounted on desktops / workstations via the [new] UL network
■ Continuous increase of HPC storage capacity
✓ SCRATCH (Lustre based) - 100 K€ / + 180 TB / + 3 GB/s
Big-Data @ UL Milestones
■ UL moves to Belval!
✓ 2016: CDC (Centre de Calcul) in Belval
‣ 3 storage server rooms (68 racks @ 15 kW)
‣ 2 HPC server rooms (52 racks @ 40 kW)
✓ Note: 1 additional floor for SIU + partners
■ Phase 1 of the UL internal cloud infrastructure (2014)
Conclusion / Big-Data @ UL: A Necessity
■ Required to correctly handle research @ UL
✓ especially on life-science topics
■ Access / synchronization tool
■ Backup / archiving tool
■ Expected budget requested (part of the next 4-year plan)
✓ 2014: Big UL NAS [1 M€]
✓ 2015: Medium-size tape library / Cloud connector [1.6 M€]
✓ 2016: Capacity extension (Big UL NAS) by 1 PB [800 K€]
✓ 2017: Capacity extension (tape / archiving) [700 K€]