• No results found

Data storage and high-speed streaming

N/A
N/A
Protected

Academic year: 2021

Share "Data storage and high-speed streaming"

Copied!
59
0
0

Loading.... (view fulltext now)

Full text

(1)

FYS3240

PC-based instrumentation and microcontrollers

Data storage and high-speed

streaming

Spring 2011 – Lecture #8

(2)

Data streaming

• Data written to or read from a hard drive at a

sustained rate is often referred to as

streaming

• Trends in data storage

Ever-increasing amounts of data

Record “everything” and play it back later

Hard drives: faster, bigger, and cheaper

RAID hardware

PCI Express

(3)

Overview

• Hard drive performance and alternatives

• File types

• RAID

• DAQ software design for high-speed

acquisition and storage

(4)

Streaming Data with the PCI Express Bus

• A PCI Express device receives dedicated bandwidth (250 MB/s or more).

• Data is transferred from onboard device memory (typically less than 512 MB), across a dedicated PCI Express link, across the I/O bus, and into system memory (RAM; 3 GB or more

possible). It can then be transferred from system memory, across the I/O bus, onto hard drives (TB´s of data). The

CPU/DMA-controller is responsible for managing this process. • Peer-to-peer data streaming is also possible between two PCI

(5)

PXI: Streaming to/from Hard Disk

Drives

(6)

RAM – Random Access Memory

SRAM – Static RAM: Each bit stored in a flip-flop

DRAM – Dynamic RAM: Each bit stored in a capacitor (transistor). Has to be refreshed (e.g. each 15 ms)

– EDO DRAM – Extended Data Out DRAM. Data available while next bit is being set up – Dual-Ported DRAM (VRAM – Video RAM).

Two locations can be accessed at the same time – SDRAM – Synchronous DRAM.

Latched to the memory bus clock read/write on one single clock cycle

– Rambus (RDRAM) - Intel had to use RDRAM 1996-2002. Much more expensive than SDRAM

– DDR SDRAM – Double Data Rate SDRAM (2.5 V). Data transferred on both rising and falling edge of clock. – DDR2 SDRAM (1.8 V)

Same, but lower power consumption and higher clock frequency

– DDR3 SDRAM (1.5 V)

(7)

Stream to/from disk rates

• Important parameters for streaming

• Seek times • Buffer size

• Rotational speed (RPM)

• Sustained streaming rates is most affected by the rotational speed

Peak, not

(8)

HDD Performance

• HDD’s Internal Data Rate (IDR) = density x RPM x disk diameter • Outer HDD track is faster, inner track is slower

• Example above: 62 MB/s at outer track, 36 MB/s at inner track • Windows OS allocates file space from outer track and inward

Track

(9)

SSD (Solid-State drive)

• A data storage device that uses solid-state (Flash) memory to store data.

• SSDs are distinguished from traditional hard disk drives (HDDs), which are electromechanical devices containing spinning disks and movable read/write heads.

• SSDs, in contrast, use microchips which retain data in non-volatile memory chips and contain no moving parts.

• SSDs use the same interface as HDDs, thus easily replacing them in most applications

(10)

SSDs Performance versus Capacity

More robust – used in

(11)

SSDs pros and cons

• Pros

– Robustness (Less susceptible to vibrations and shock)

– Increased write/read speeds (low access time and latency)

– Not a drop in write speed when the memory fills up (as for HDDs)

– Low power consumption (reduced heat generation)

– Low boot-up time (for OS) and quicker application-launches

• Cons

– High cost

– Low capacity (in # GB)

– Great quality variations have been experienced

– Reduced write speed experienced over time (for some suppliers)

(12)

SSD example: OCZ RevoDrive

• PCI express card with flash memory and RAID-controller (RAID-0)

• Available in 50 GB to 480 GB capacities

• PCI-Express interface (x4)

• For use as primary boot drive or data storage

• Read: Up to 540 MB/s • Write: Up to 480 MB/s

(13)

Selecting hard drives for DAQ systems

• A standard HDD is a 8/5 drive

– Designed for power on for 8 hours, 5 days a week

Select Enterprise/Extended operations/ES version HDDs

– They are 24/7 drives, meaning that they are designed for continuous operation and high sustained throughput

– Almost same price as standard versions (except server disks!)

• 2.5” HDDS designed for laptops are more rugged than 3.5”

HDDs designed for use in desktop PCs

(14)

Factors that affect streaming

performance

• Beyond overall application architecture, stream-to-disk or

stream-from-disk rates can be affected by some of the following factors

– Running background programs such as virus scan

• Recommended to disable the scheduled scans and updates for the entire duration of data streaming

– How the hard drive is formatted to group data – Location of the file on the hard drive(s)

• Locate the OS on a separate HDD (to free the fastest outer tracks for data storage)

(15)
(16)
(17)
(18)

Determining Storage Format

When determining the appropriate storage format for the data, consider the following:

• What will you do with your data once you have acquire them? • Will you write and read data with the same application?

• How much data will you acquire? • At what rate will you acquire data?

• Will you need to exchange data with another program? • Will you need to search your data files?

– ASCII – Binary – TDMS – Config File – Spreadsheet – AVI – XML

(19)

ASCII Files

Pros

– Human-readable

– Easily portable to other

applications such as Microsoft Excel

– Can easily add text information (first line) for each data column • Cons

– Large file size

(20)

Binary Files

Pros

– Compact file size – Fast streaming • Cons

– Not human-readable

– Less easily exchangeable

(21)

XML files

Pros

– Can store complex data structures

– Can be displayed in a Web browser or in a text editor • Cons

– Larger disk footprint – Not for data streaming

(22)

Database (MS Access etc.)

Pros

– Searchable • Cons

– Time-intensive programming

(23)

TDM and TDMS

• A file format from National Instruments

• TDMS = Technical Data Management Streaming • Three levels of hierarchy

TDMS

Optimized for high-speed streaming

Binary-based header

Binary bulk data

Single file for both header information and bulk data

Supported on real-time software platforms

TDM

XML-based header file

Separate binary bulk data file

(24)

TDMS File Format: Optimized for Data

Storage and Search

(25)
(26)
(27)
(28)
(29)

Applications Requiring high-speed

Data Streaming (examples)

• High speed data acquisition

• Radar

– RF recording and playback

• High resolution video

(30)

Key system components for

high-speed streaming

• Hardware Platform with High-Throughput and

Low-Latency

– PXI Express

• High-Speed Data Storage

– Hard Disk Drives (HDDs) – Solid-State Drives (SSDs)

• Software for Streaming to Disk at High Rates

• Streaming Front-End Instrumentation

(31)
(32)

PXI Express Hardware Platform

• In a system composed of a x4 PXIe chassis, x4 PXIe controller, and x1 PXIe module, the module will limit the maximum data streaming rate of that particular slot to 250 Mbytes/s.

• Processing is required of the controller to manage the transfer of data from device memory to controller memory to hard drives for acquisition or from hard drives to controller memory to device memory for generation.

(33)

RAID

• RAID = Redundant Array of Independent Drives.

• RAID is a general term for mass storage schemes that split or replicate data across multiple hard drives.

• The rate at which data can be written to or read from the hard drives limits the data streaming rates.

• One way to increase this rate is to use a RAID configuration – Can be the internal RAID of the remote controller

(workstation/PC) or an in-chassis PXI RAID

– Can be a RAID connected externally to the PXIe chassis

(34)

RAID-0

Striping without redundancy

– Improved speed over streaming to a single hard drive – Unimproved system reliability

– Transparently supported by Windows OS

(35)

RAID-1

Mirrored (redundancy)

– 100% data redundancy

• Each piece of data is written to two (or more) hard drives

– No write speed increase over single disk – Highest overhead of all raid configurations

(36)

RAID-5

Distributed parity (single parity)

– Parity data distributed on all disks

– Can only tolerate one drive failure (array continues to operate with one failed drive)

– Single-parity RAID levels are as vulnerable to data loss as a RAID-0 array until the failed drive is replaced and its data rebuilt

– More write overhead than RAID-1 because of the additional parity data that has to be created and written to the disk array

– Poor performance with small files

– Gives less space for measurement data (due to parity)

• Reduces the amount of storage space available by the size of one hard drive (due to parity information)

(37)

RAID-6

Double distributed parity

• Extends RAID 5 by adding an additional parity block • Provides fault tolerance from two drive failures (array

continues to operate with up to two failed drives)

• This becomes increasingly important as large-capacity

drives lengthen the time needed to recover from the failure of a single drive

• Double parity gives time to rebuild the array without the data being at risk if a single additional drive fails before the

rebuild is complete. • Very slow write

(38)

RAID-10 (1+0)

Striping and mirroring

– Both increased speed and redundancy compared to single drive

– Can sustain multiple drive failures

(39)

Kilobyte – kB vs KB (from Wikipedia)

• The kilobyte (symbol KB)[1] is a multiple of the unit byte for digital information.

Historically and in common usage, a kilobyte refers to 1024 (210) bytes in most

fields of computer science and information technology.[2][3][4] However, hard disk

drives measure capacity in strict accordance with the rule of the International System of Units’ unit prefixes (prefixes like kilo- and mega-) that are based on decimal math.[5][6] In such mass storage contexts, megabyte equals precisely

1,000,000 bytes and gigabyte is precisely 1,000 times greater than a megabyte. This association suggests that kilobyte can also denote 1,000 bytes in the

context of mass storage. Thus, “kilobyte” and its symbol “KB” can be ambiguous, their exact value (1,024 or 1,000) dependent upon context.

• In December 1998, an international standards organization attempted to

address these dual definitions of the conventional prefixes by proposing unique

binary prefixes and prefix symbols to denote multiples of 1024, such as “kibibyte (KiB)”, which exclusively denotes 210 or 1024 bytes.[7] Had this

proposal been widely and consistently adopted, it would have liberated the standard unit prefixes to unambiguously refer only to their strict decimal

definitions wherein kilobyte would be understood to represent only 1000 bytes. However, in the over-12 years that have since elapsed, the proposal has seen little adoption by the computer industry.[8][9][10][11]

(40)

kB vs KB

k = 1000 (decimal) K = 1024 (binary) GBbinary = 10243 byte TBbinary = 10244 byte (GBbinary ~= GBdesimal*0.93) (TBbinary ~= GBdesimal*0.90)
(41)
(42)
(43)

RAID configuration

• Use the RAID management software to configure you RAID system

• Do not create any single array larger than 2 TB when using

Windows XP or Vista (32-bit OS) as they recognize partitions of only 2 TB or less

(44)

RAID configuration II

• In order to speed up the write of multiple

(simultaneous) files NI recommends to write these

files to separate RAID volumes

– The RAID must be reconfigured with a suitable number of RAID volumes of a suitable size

– If for example three equally fast high-speed digital video

streams are to be stored on a 12 disk RAID, the RAID could be configured with 3 separate RAID volumes of 4 drives

each

• This should allow for writes (and reads) on the outer edge of each of the 3 volume’s respective disks.

(45)

Memory – Storage Chain for a large

DAQ system

• RAID-0 (HDD) • RAID-5 (HDD) • SSD • RAM

• RAID-5 (HDD) • Tape drives • HDDs

• MB/s needed ?

• Total amount of data (MB) needed ?

Master DAQ computer DAQ Clients, if this is a server ”Offline”

(46)
(47)
(48)

MXI bus

• It couples two physically separate buses with either a copper or fiber-optic, e.g

– PCIe to PXIe – PXIe to PXIe

• E.g. control a PXIe chassis with a PCIe based PC (remote controller)

• Use e.g. a x4 MXI Express cable and interface cards in each PC/PXI system

PCIe PXIe

PXIe PXIe

MXI MXI

(49)

Illustration of driver problems …

Problem:

NI HDD-8264 is only achieving ~510 MB/s of sustained throughput in in Windows 7, not 600 MB/s like with Windows XP (Performance

decrease)

Solution:

NI HDD-8264 uses an LSI RAID controller, and LSI has not released an official Windows 7 driver for this RAID card. LSI recommends the driver that is integrated in the Windows 7 kernel. However, this driver has decreased performance compared to the LSI developed, Windows XP driver. Benchmarking tests between Windows XP and Windows 7 performed by National Instruments show an approximate 15%

(50)

Another solution - Direct-to-Disk

controller

• When streaming to or from controller memory or hard drives, the PCI controller (in the case of PCI), I/O bus, controller

memory, and processor are shared data paths, which divides down data streaming bandwidth.

• Additionally, the operating system, drivers, and application software managing data flow introduce latencies.

(51)

Direct-to-Disk controller

• With a PCI Express direct-to-disk controller module, data is streamed directly from the device memory onboard a DAQ-card, across the PCI or PCI Express bus, to the direct-to-disk controller for acquisition

• This gives a minimum use of the PCI-buss and the CPU

http://www.conduant.com/

Streamstor Amazon Express Controller (using PCI Express)

(52)

32-Bit vs. 64-Bit OS for DAQ applications

• A 32-bit processor can reference 232 bytes, or 4 GB of memory

• A 64-bit processor are theoretically capable of referencing 264

locations in memory. However, all 64-bit versions of Microsoft operating systems currently impose a 16 TB limit on address space

• Increased performance of 64 bit OS and application programs for DAQ applications (provided that 64-bit drivers are available) can be seen in:

– High-channel count, high-speed data analysis

– Vision acquisition and applications with large amounts of data moving directly into memory at rapid rates

• LabVIEW 2009 and newer versions are now available for 64-bit operating systems (e.g. Windows 7 x64).

(53)

LabVIEW: Bad Programming Structure

for High-speed acquisition and storage

(54)
(55)
(56)

• LabVIEW handles the queue as a block of allocated PC memory. This memory block is utilized as a temporary storage FIFO for data

passing between two loops • LabVIEW handles all the

memory access to ensure that read-write race

(57)

Increase DAQ I/O Performance

• Use the option to disable

buffering (unbuffered I/O) in Win API

• Optimizes streaming applications

• Note that you must read from or write to the file in integer

multiples of the disk sector size (usually 512 bytes) when using this option

Track

(58)
(59)

Examples of Streaming Front-End

Instrumentation

PCIe x4 dual port NIC

Streaming over PCIe and PXIe busses

byte digital information. of computer science information technology hard disk drives International System of Units standards organization binary prefixes “kibibyte

References

Related documents