Archiving and Managing Remote Sensing Data Using State-of-the-Art Storage Technologies
Ms B Lakshmi
C Chandrasekhar Reddy
SVSRK Kishore
SDAPSA, NRSC, Hyderabad
NRSC Functions
Remote Sensing Data
– Acquisition
– Processing
– Dissemination
– Analysis
IT Infrastructure Consolidation
• Storage Consolidation and Virtualization
• Network Consolidation
• Server Consolidation
• Security
Considerations for Storage Consolidation
Types of storage use:
– For online processing or archival
– High performance vs. cost effectiveness
– Write once/read many or continuous read/write
– Frequently accessed or infrequently accessed
– File data or RDBMS data
– Heterogeneous platform access
– Multiple users or single user/limited users
– Data acquisition processes and data processing chains require high-performance storage suitable for online processing
– Archival data is mainly write-once read-many
– The past year's archival data may be accessed more frequently than earlier years' data
Hence the need for multi-tiered, network-consolidated storage
IMGEOS Architecture
[Figure: IMGEOS architecture: Data Reception Facility (4 antenna systems), DP, DI, metadata and other servers sharing SAN storage and NAS over a Storage Area Network and Gigabit Ethernet, 41 workstations, test & development systems, a Data Exchange Gateway, a firewall, and WAN links to ISTRAC, SAC, ISAC and NRSC Balanagar (20 Mbps)]
Storage Tiering
• Tier-1: High-performance storage (~100 TB)
 – Workspace for the data processing chain
 – Recent 3 months' archival data
 – Reference data
 – RDBMS data
• Tier-2: Less expensive storage (~400 TB)
 – Pre-existing archival data
 – Archival data from the past 4-15 months
• Tier-3: Tape library
• A shared file system implementation facilitates efficient file sharing between the data processing servers and the other servers & subsystems
Storage Sizing
• Provide 75 TB of high-speed online FC disk storage for all satellite data (latest three months) to facilitate faster generation of satellite data products
• Provide about 10 TB of high-speed disk space for the data processing systems to generate products
• Provide about 10 TB of high-speed disk space for
Storage Sizing (Contd.)
• Provide cost-effective online disk storage (400 TB) for all recently acquired satellite data (latest 4-15 months) to facilitate generation of satellite data products
• Two online tape archive copies of all acquired satellite data
• Two copies of data: Backup (local site) and DR (remote site)
• Data available up to 2011 amounted to about 900 TB on offline DLTs
Data Acquisition Infrastructure
[Figure: Terminals 1-3 feed the data acquisition electronics & switching matrix, which connects to Acquisition Servers 1-4; these write to the network storage system (for archival, data processing etc.)]
Performance: 4/8 Gbps (switched); acquisition rate: 2 x 320 Mbps
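To put the quoted acquisition rate in perspective, the sketch below converts 2 x 320 Mbps into a byte rate and estimates the raw volume of one pass. The 10-minute pass duration is a hypothetical example, not a figure from the slides, and decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) are assumed.

```python
# Throughput arithmetic for the quoted acquisition rate (2 channels x 320 Mbps).
# The 10-minute pass used below is a hypothetical example.

CHANNELS = 2
RATE_MBPS_PER_CHANNEL = 320  # megabits per second per channel


def acquisition_rate_bytes_per_s(channels: int, rate_mbps: float) -> float:
    """Aggregate acquisition rate in bytes per second (decimal units)."""
    return channels * rate_mbps * 1e6 / 8


def pass_volume_gb(channels: int, rate_mbps: float, pass_seconds: float) -> float:
    """Raw data volume of one pass in gigabytes (1 GB = 1e9 bytes)."""
    return acquisition_rate_bytes_per_s(channels, rate_mbps) * pass_seconds / 1e9


if __name__ == "__main__":
    rate = acquisition_rate_bytes_per_s(CHANNELS, RATE_MBPS_PER_CHANNEL)
    print(f"Aggregate rate: {rate / 1e6:.0f} MB/s")  # 80 MB/s
    volume = pass_volume_gb(CHANNELS, RATE_MBPS_PER_CHANNEL, 10 * 60)
    print(f"Hypothetical 10-minute pass: {volume:.0f} GB")
```

At 80 MB/s, even continuous acquisition stays well within the nominal bandwidth of a single 4 Gbps FC link.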
IMGEOS Storage Architecture
[Figure: IMGEOS storage architecture: FC storage (75 TB), two SATA storage systems (200 TB each), a tape library and metadata servers, connected through highly available SAN switches and highly available Gigabit switches to Cluster 01, Cluster 02, Linux servers and Windows workstations]
Multi-Tiered Storage Products
• FC Storage (EMC Symmetrix): 100 TB (Tier-1)
• SATA Storage (EMC CLARiiON): 400 TB (Tier-2)
• Online Tape Storage (SL8500): 1.5 PB x 2 (Tier-3)
• Vaulted Tape Capacity: 1.5 PB
• Metadata Servers: Dell servers
• SAN File System: Quantum StorNext
• HSM/ILM Software: Quantum StorNext
IMGEOS Storage
• The FC storage consists of 295 x 450 GB 15k RPM disks and provides 75 TB of usable capacity after RAID overheads
• Two SATA storage systems, each consisting of 265 x 1 TB SATA disks, together offer 400 TB of usable capacity
• The tape library, with 4000 slots x 1.5 TB, provides 6 PB
The storage systems are connected to the servers
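The slides give disk counts and usable capacities; the sketch below derives the raw capacity of each system and the implied usable fraction. Decimal units are assumed (1 TB = 1000 GB, 1 PB = 1000 TB). The usable fraction lumps together RAID overhead and any other overheads (e.g. spares) that the slides do not break down, so no particular RAID level is implied.

```python
# Raw vs. usable capacity for the storage figures quoted above (decimal units).


def raw_tb(disks: int, disk_size_tb: float, systems: int = 1) -> float:
    """Raw capacity in TB of `systems` identical arrays of `disks` drives."""
    return systems * disks * disk_size_tb


def efficiency(usable_tb: float, raw: float) -> float:
    """Fraction of raw capacity that remains usable."""
    return usable_tb / raw


fc_raw = raw_tb(295, 0.45)              # 295 x 450 GB FC disks
sata_raw = raw_tb(265, 1.0, systems=2)  # two arrays of 265 x 1 TB SATA disks
tape_pb = 4000 * 1.5 / 1000             # 4000 slots x 1.5 TB cartridges

if __name__ == "__main__":
    print(f"FC:   {fc_raw:.2f} TB raw, {efficiency(75, fc_raw):.0%} usable")
    print(f"SATA: {sata_raw:.0f} TB raw, {efficiency(400, sata_raw):.0%} usable")
    print(f"Tape: {tape_pb:.1f} PB")
```

The lower usable fraction on the FC tier is consistent with its role as high-performance workspace, where more capacity is typically traded for redundancy and speed.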
IMGEOS Storage
A three-tiered storage design:
• FC Storage (Tier 1) for the first 3 months
• SATA Storage (Tier 2) for the following months (months 4-15)
• Tape Storage (Tier 3) for archival of the data
Data movement across the tiers uses the Hierarchical Storage Management (HSM) feature of StorNext.
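The age-based placement described above can be sketched as a simple rule. This is a minimal illustration, assuming thresholds of 3 and 15 months taken from the storage-sizing slides; the function and enum names are hypothetical, and the sketch does not reproduce StorNext's policy engine or configuration syntax. It only decides where the disk-resident copy of a file lives; tape copies exist from the moment the file is archived.

```python
from enum import Enum


class Tier(Enum):
    FC_DISK = 1    # Tier-1: high-performance FC storage
    SATA_DISK = 2  # Tier-2: lower-cost SATA storage
    TAPE = 3       # Tier-3: tape library (no disk-resident copy)


# Hypothetical thresholds in completed months since acquisition.
TIER1_MONTHS = 3   # latest 3 months stay on Tier-1
TIER2_MONTHS = 15  # months 4-15 stay on Tier-2


def primary_tier_for_age(age_months: int) -> Tier:
    """Return the tier holding a file's primary online copy, by age."""
    if age_months < 0:
        raise ValueError("age must be non-negative")
    if age_months < TIER1_MONTHS:
        return Tier.FC_DISK
    if age_months < TIER2_MONTHS:
        return Tier.SATA_DISK
    return Tier.TAPE


if __name__ == "__main__":
    for age in (0, 5, 20):
        print(age, primary_tier_for_age(age).name)
```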
SAN File System
The storage systems are virtualized by the StorNext SAN File System into 9 storage partitions, which are presented to the servers.
Network Connectivity
• Facilitate high-speed (FC 4 Gbps) data transfer between storage, data acquisition and processing systems (SAN)
• Facilitate metadata transfer (1 Gbps) for all SAN clients (Metadata Network)
• Facilitate data transfer (1 Gbps) among all the acquisition & processing systems/computing nodes, workstations, peripherals and others (Systems Network)
• Facilitate connectivity for dissemination of satellite data products to users through the Internet
• Provide data connectivity to NRSC-Balanagar for operations
• Provide data connectivity for ISRO participating centres (SAC, ISAC, ISTRAC, NRSC-Balanagar, ADRIN …) for software maintenance and testing
• Facilitate connectivity to receive satellite data acquired at Svalbard, Norway and Matera, Italy
Leased Lines
• Leased Lines for Operations
– One 10 Mbps leased line between the Shadnagar and Balanagar campuses is used for DP operations
– A second 10 Mbps leased line between the Shadnagar and Balanagar campuses is used for Intranet/Email/Internet operations
– The state vectors related to satellite pass schedules are provided by ISTRAC over Spacenet/ISROnet
• Leased Lines for Software Maintenance and Testing
– Software for IMGEOS will be provided by ISAC, SAC, ADRIN and NRSC; hence, one 2 Mbps leased line is provided to each centre
• Leased Lines to receive data from Svalbard
– 45 Mbps Connectivity exists to receive satellite data acquired at Svalbard and process at Shadnagar
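The link capacities above set how long bulk transfers take. The sketch below estimates the transfer time of a hypothetical 10 GB dataset over each class of leased line. The dataset size and the optional `efficiency` factor (for protocol overhead) are assumptions for illustration; decimal units are used.

```python
def transfer_time_s(size_gb: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Seconds to move `size_gb` gigabytes over a `link_mbps` link.

    `efficiency` (0 < efficiency <= 1) is a hypothetical allowance for
    protocol overhead; 1.0 means the full nominal line rate is achieved.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_gb * 1e9 * 8
    return bits / (link_mbps * 1e6 * efficiency)


if __name__ == "__main__":
    links = [("Svalbard link", 45), ("DP operations line", 10), ("Maintenance line", 2)]
    for name, mbps in links:
        minutes = transfer_time_s(10, mbps) / 60
        print(f"{name} ({mbps} Mbps): {minutes:.1f} min for a 10 GB dataset")
```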
[Figure: Systems network: two chassis switches (UTP-96, FC-12) connecting DP servers (17), DQE, VADS, PQC, DAQLD servers (4), ADP servers (3), station automation systems (2), WFM system (2), version control system, virtual reality system and test & dev systems (4); 10 Gbps uplinks to edge switches (UTP-24, 10G-2) serving 43 work stations; leased lines to Balanagar operations, SAC, ISAC, ADRIN, NRSC (Balanagar) and ISTRAC operations (Sky Link); Layer-3 switch, secure appliance and Data Exchange Gateway towards the product delivery systems]
Systems’ Network
• This network provides Ethernet connectivity for the data acquisition and data processing nodes
• Two enterprise-class switches serve as core switches; they are redundant to one another for high availability
• Each server in the data centre is connected to both switches
• 10 Gbps uplinks connect the core switches to edge switches located in the other buildings, which connect the workstations
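Connecting each server to two redundant core switches means a server loses connectivity only if both switches are down. Under the simplifying assumption of independent failures, where either switch alone can carry the load, the pair's availability is A_pair = 1 - (1 - a)^2 for a per-switch availability a. The sketch below evaluates this with a hypothetical a = 0.999; it is not a measured figure for IMGEOS.

```python
def parallel_availability(component_availability: float, n: int = 2) -> float:
    """Availability of n redundant components, any one of which suffices.

    Assumes independent failures: the system is down only if all n are down.
    """
    if not 0 <= component_availability <= 1:
        raise ValueError("availability must be in [0, 1]")
    return 1 - (1 - component_availability) ** n


if __name__ == "__main__":
    single = 0.999  # hypothetical availability of one core switch
    pair = parallel_availability(single)
    hours_per_year = 365 * 24
    print(f"One switch:   {single:.6f} (~{(1 - single) * hours_per_year:.1f} h/yr down)")
    print(f"Two switches: {pair:.6f} (~{(1 - pair) * hours_per_year * 60:.1f} min/yr down)")
```

In practice, common-mode failures (shared power, configuration errors) make real gains smaller than this idealised formula suggests.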
Servers
• A total of 60 servers are deployed for activities ranging from data ingest to data dissemination:
 – 42 four-CPU servers
 – 18 two-CPU servers
• A total of 46 workstations and 15 thin clients, arranged in 72 cubicles in 7 rows
• Workstations have 1 Gbps connectivity for SAN access; separate desktops are used for Internet access
Connectivity for Data Dissemination
• The products are disseminated using Web/FTP servers
• The Systems Network is connected to the product delivery servers through the Data Exchange Gateway (DEG)
• Finished data products are transferred from the Systems Network to the Web/FTP servers via the DEG
• Internet leased line with Ethernet interface
Network Security
• Layered Network Security
• Firewall to protect the infrastructure from the public network
• Intrusion prevention system for monitoring and preventing malicious activity
• Data Exchange Gateway to transfer the data products from Systems network to Web/FTP servers
• Anti-Virus Solution
• Leased lines from other ISRO centres are connected to the Systems Network through a security appliance
Scalability
• The Tier-1 FC disk system is envisaged to scale up to about 400 TB raw
• The Tier-2 SATA disk system being procured can be scaled up to about 800 TB raw
• The tape library can be expanded up to 10,000 slots
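The expansion limits can be compared with the currently installed raw capacity derived from the disk and slot counts quoted earlier. This sketch assumes decimal units and that tape cartridges remain at 1.5 TB each; newer media would raise the tape ceiling further.

```python
# Installed raw capacity vs. stated expansion limits (TB, decimal units).
# Assumes 1.5 TB per tape cartridge for both current and expanded library.
CURRENT_RAW_TB = {
    "Tier-1 FC": 295 * 0.45,      # 295 x 450 GB disks
    "Tier-2 SATA": 2 * 265 * 1.0,  # two arrays of 265 x 1 TB disks
    "Tier-3 tape": 4000 * 1.5,     # 4000 slots
}
MAX_RAW_TB = {
    "Tier-1 FC": 400,
    "Tier-2 SATA": 800,
    "Tier-3 tape": 10000 * 1.5,    # 10,000 slots
}


def growth_factor(tier: str) -> float:
    """Ratio of the stated maximum to the installed raw capacity."""
    return MAX_RAW_TB[tier] / CURRENT_RAW_TB[tier]


if __name__ == "__main__":
    for tier, current in CURRENT_RAW_TB.items():
        limit = MAX_RAW_TB[tier]
        print(f"{tier}: {current:,.0f} TB now, up to {limit:,.0f} TB ({growth_factor(tier):.1f}x)")
```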
Data Management
• Providing data access for the servers/workstations
• Data storing/archiving using defined policies
• Data re-use
• Data integrity
• Data security
• These activities are carried out using the SAN File System
Storage Partitions Layout (100 TB T1 + 400 TB T2)
• Level0_1 (163 TB) and Level0_2 (159 TB): input & output for Level 0 processing
• Workspace (5.1 TB): working space for all processes
• Prodspace (6.4 TB): product space for DP & VADS output
• OTS (6.4 TB): volume for OTS products
• FTP (6.4 TB): volume for FTP products
• REF (11 TB): volume for all permanent files
• IPO (7.4 TB): volume for Initial Phase Operations
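As a quick consistency check, the sketch below totals the partition sizes listed on this slide and compares them with the 500 TB disk pool (100 TB Tier-1 + 400 TB Tier-2). It covers only the eight partitions listed here and makes no assumption about how the remaining capacity is used.

```python
# Partition sizes (TB) as listed in the layout above.
PARTITIONS_TB = {
    "Level0_1": 163, "Level0_2": 159, "Workspace": 5.1, "Prodspace": 6.4,
    "OTS": 6.4, "FTP": 6.4, "REF": 11, "IPO": 7.4,
}
DISK_POOL_TB = 100 + 400  # Tier-1 + Tier-2

allocated = sum(PARTITIONS_TB.values())

if __name__ == "__main__":
    print(f"Listed partitions: {allocated:.1f} TB of {DISK_POOL_TB} TB "
          f"({allocated / DISK_POOL_TB:.0%})")
    level0_share = (PARTITIONS_TB["Level0_1"] + PARTITIONS_TB["Level0_2"]) / allocated
    print(f"Level 0 volumes: {level0_share:.0%} of the listed allocation")
```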
Data Archival
• Data is archived using Storage Policies
• Storage policies act as the primary channels through which data protection and data recovery operations are fulfilled.
• The storage policies are created based on the following parameters.
– Number of copies to create
– Media type to use when storing data
– Amount of time to wait after data is modified before storing it
– Amount of time (in days) before relocating a file
– Amount of time after a file is modified before truncating it
• Separate storage policies are created for each satellite, and compliance with each policy is monitored
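The policy parameters listed above can be modelled as a small data structure that reports which actions are due for a file of a given age. This is a minimal sketch: the class, field names and example values (90 days ≈ 3 months, 450 days ≈ 15 months, satellite "SAT-A") are hypothetical and mirror the slide's parameters, not StorNext's actual configuration keywords or policy engine.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePolicy:
    """Hypothetical per-satellite storage policy mirroring the listed parameters."""

    satellite: str
    copies: int          # number of copies to create
    media_type: str      # media type used when storing data
    store_days: int      # days after modification before the file is stored
    relocate_days: int   # days before the file is relocated to a lower disk tier
    truncate_days: int   # days after modification before the disk copy is truncated

    def __post_init__(self) -> None:
        if self.copies < 1:
            raise ValueError("at least one copy is required")
        if self.truncate_days < self.store_days:
            # Truncating before the file is stored would leave no copy at all.
            raise ValueError("truncate_days must be >= store_days")

    def due_actions(self, days_since_modified: int) -> list[str]:
        """Policy actions that are due for a file of the given age (in days)."""
        actions = []
        if days_since_modified >= self.store_days:
            actions.append("store")
        if days_since_modified >= self.relocate_days:
            actions.append("relocate")
        if days_since_modified >= self.truncate_days:
            actions.append("truncate")
        return actions


if __name__ == "__main__":
    policy = StoragePolicy("SAT-A", copies=2, media_type="tape",
                           store_days=0, relocate_days=90, truncate_days=450)
    for age in (10, 100, 500):
        print(age, policy.due_actions(age))
```

Keeping one policy object per satellite makes compliance monitoring a matter of comparing each file's state with the actions its policy says are due.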
Data Flow Management
• HSM/ILM Software is used for movement of data across the storage tiers.
• Level-0 data is stored in high-performance disk storage (Tier-1), where it is available to the data processing servers
• This data in the Tier-1 will be copied automatically to the tape library (Tier-3) based on the set policies.
• Two more copies of the same data are made on tape: one as an online backup of the archive (which remains in the tape library) and the other for vaulting (placed in fire-safe vaults)
• Based on the set (time-based) policies, files in Tier-1 are automatically moved to the other storage tiers (Tier-2 and Tier-3)
Data Protection
• Two online tape copies and two vaulted copies are created
 – One vault copy at the primary site
 – The second vault copy at the remote site
• As tape is a magnetic medium with electro-mechanical components, its reliability is checked regularly by performing a random read on each tape
• SFS performs data integrity checks: a checksum is computed for each file as it is written to tape and verified when the file is restored to disk
• Every day, a few tapes are selected at random and verified for data access
• The logs/admin alerts of the metadata servers, SFS and SM are continuously monitored for probable alerts; health checks are done periodically by running various diagnostic checks on the SFS
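The checksum-on-write, verify-on-restore idea and the daily random tape check can be sketched as follows. This is an illustration under stated assumptions: SHA-256 is used as the checksum (the slides do not say which algorithm SFS uses), a directory stands in for tape, and the function names, file name and tape ID format are hypothetical.

```python
from __future__ import annotations

import hashlib
import random
import shutil
import tempfile
from pathlib import Path


class ChecksumMismatch(Exception):
    """Raised when a restored file does not match its recorded checksum."""


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_with_checksum(src: Path, archive_dir: Path) -> tuple[Path, str]:
    """Copy `src` into `archive_dir` (standing in for tape) and record its checksum."""
    checksum = sha256_of(src)
    dest = archive_dir / src.name
    shutil.copyfile(src, dest)
    return dest, checksum


def restore_and_verify(archived: Path, restore_dir: Path, expected: str) -> Path:
    """Copy the archived file back to disk and verify it against the recorded checksum."""
    restored = restore_dir / archived.name
    shutil.copyfile(archived, restored)
    if sha256_of(restored) != expected:
        raise ChecksumMismatch(f"checksum mismatch for {restored}")
    return restored


def pick_tapes_for_daily_check(tape_ids: list[str], k: int, seed: int | None = None) -> list[str]:
    """Randomly choose k distinct tapes to read back today."""
    return random.Random(seed).sample(tape_ids, k)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("disk", "tape", "restore"):
            (root / name).mkdir()
        src = root / "disk" / "scene_001.dat"  # hypothetical file name
        src.write_bytes(b"example payload")
        archived, checksum = archive_with_checksum(src, root / "tape")
        restore_and_verify(archived, root / "restore", checksum)
        print("restore verified")
    print(pick_tapes_for_daily_check([f"T{i:04d}" for i in range(4000)], k=5, seed=1))
```

Recording the checksum before the write and recomputing it after the restore catches corruption introduced anywhere along the tape path, not just on the medium itself.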