
Chapter 2: Background and Technology Trends

2.3 Storage Interfaces

2.4.1 Large Storage Systems

Disk/Trend reports that the disk drive industry as a whole shipped a total of 145 million disk drives in 1998 [DiskTrend99]. With an average drive size near 5 GB, this is a total of 725 petabytes (10^15 bytes) of new data storage added in a single year. Of this total, 75% went into desktop personal computers, 13% into server systems, and 12% into portables. This means that roughly 100 petabytes of new storage found its way into data centers and servers around the world.
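The capacity figures above follow from simple multiplication. The Python sketch below reproduces them; it assumes decimal units (1 PB = 10^6 GB) and, as a simplification not stated in the source, applies the 75/13/12 percent split to total capacity, which treats every segment as using the same ~5 GB average drive. Under that assumption the server share comes to about 94 PB; if server drives are larger than average, the server figure would be higher. The constant and function names are illustrative only.

```python
# Back-of-the-envelope check of the 1998 disk shipment figures quoted above.
# Assumption: the 75/13/12 percent split is applied to total capacity, i.e.
# every segment is treated as having the same ~5 GB average drive size.

DRIVES_SHIPPED = 145_000_000   # drives shipped in 1998 [DiskTrend99]
AVG_DRIVE_GB = 5               # approximate average drive size, in GB
GB_PER_PB = 1_000_000          # decimal units: 1 PB = 10^15 bytes = 10^6 GB

SEGMENT_SHARE = {"desktop": 0.75, "server": 0.13, "portable": 0.12}


def capacity_by_segment(drives, avg_gb, shares):
    """Return total capacity and per-segment capacity, both in petabytes."""
    total_pb = drives * avg_gb / GB_PER_PB
    return total_pb, {name: total_pb * share for name, share in shares.items()}


total_pb, segments = capacity_by_segment(DRIVES_SHIPPED, AVG_DRIVE_GB, SEGMENT_SHARE)
print(f"total: {total_pb:.0f} PB")
for name, pb in segments.items():
    print(f"{name:>8}: {pb:.0f} PB")
```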

There is a large variation in the types of workloads for such large data systems. The advantage of Active Disks is that they provide a mechanism whereby a wide variety of applications with a range of characteristics can be supported effectively and take advantage of the same underlying hardware components. The increased flexibility in placing functions allows applications to be structured in novel ways that are simply not possible in systems with “dumb” disks, where processing can only occur after data has been transferred to a host. This gives system designers a new avenue for optimization and planning. An Active Disk capability may also bring benefits in functionality and optimization to desktop drives, but the focus of this dissertation is on the benefits of parallelism and offloading in systems with multiple disks, outside of the commodity market for low-end single disks.

Table 2-7 provides a sample of several large storage systems in use today across a range of organizations and applications. We see that it is quite easy to reach several terabytes of data with even a modest number of users. We also see a significant variety among the uses for large data systems, meaning that storage systems must support different access patterns and concerns. A number of the most popular classes of usage are discussed below, along with the general trends in the demands they place on storage systems.

| Site | System | Storage | Size | Software | Type of Data |
| --- | --- | --- | --- | --- | --- |
| Motley Fool | AlphaServer 1000 | 2 x StorageWorks 310 | 2 x 30 GB | SQL Server | message boards, financial data |
| Atrieva.com | | StorageTek | 12 TB, 20 GB/week | custom | free Internet storage |
| Aramark Uniforms | AlphaServer 4100 | ESA 10000 | 1 TB | Oracle | sales & cust info, mining |
| Northrop Grumman | 2 x AlphaServer 8400 | StorageWorks | 2 TB | SAP, Oracle | 100% mirrored |
| Lycos | n x AlphaServer 8400 | StorageWorks | 5 TB | custom | web site, catalog |
| Mirage Resorts | Tandem, NT, AS400, UNIX | StorageTek PowderHorn | 450 GB/night | | backups |
| CERN | various AIX, Sun | storage arrays | 1 TB | AFS | 15 servers, 3-5,000 active users |
| Boeing Engineering | various RS/6000, Sun | | 50 TB | DFS | 3-5,000 seats |
| Nagano Olympics | 48 SP2 web servers | 2 x RS/6000, 16x9 SSA | 144 GB | DFS/Web | 4 complete replicas |
| Goddard SFC | Cray T3E, 128 GB | fibre channel disks | 960 GB | Unicos | 650 MFLOPS, 1024 nodes |
| Corbis | Compaq ProLiants | StorageWorks | 2 TB | NT, IIS, SQL | high-resolution images |
| Cathay Pacific | Sun Enterprise 10000 | Sun storage arrays | 1.5 TB | | data warehousing |

Table 2-7 Large storage customers and systems. Data from www.storage.digital.com, www.stortek.com,

2.4.2 Database

Traditionally, the most important use for large data systems has been to store the transaction databases that form the basis of the electronic world, whether in stock markets, banks, or grocery stores. As more of the world becomes computerized, more of our daily actions (and transactions) are stored and tracked.

The increasing size and performance of large transaction processing systems is illustrated in Table 2-8, which shows the evolution of systems over the six years since the introduction of the TPC-C benchmark. Three manufacturers and two classes of systems are shown. For IBM and Hewlett-Packard, there is data for “enterprise” class systems and for commodity or “workgroup” class systems. In the later years, data for commodity class systems from Dell is added. Over the six years there is a huge increase in performance and a huge drop in price: cost falls from over $2,000 per tpmC to about $17 per tpmC, for a system that performs 200 times as many transactions. Figure 2-7 graphically illustrates the cost trend, with the commodity machines dropping off more steeply than the high-end systems. We also see that the amount of storage in these systems has increased significantly. In a TPC-C benchmark, the amount of storage required is proportional to the transaction rate, and we see that this has increased 100-fold since the first TPC-C benchmark machines.
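The price/performance column in Table 2-8 is total system cost divided by tpmC. As a hedged sketch, the Python below recomputes it for the first and cheapest-per-transaction rows (the 1993 IBM RS/6000 POWERserver 230 and the 1999 Dell PowerEdge 6350) and derives growth factors between them; the `TpccResult` class is a hypothetical helper, not part of any TPC tooling. For this pair the throughput ratio comes to about 203x, in line with the 200-fold figure, and the storage ratio to about 161x.

```python
# Recompute price/performance ($/tpmC) and growth factors for two rows of
# Table 2-8. TpccResult is a hypothetical helper used only for illustration.
from dataclasses import dataclass


@dataclass(frozen=True)
class TpccResult:
    year: int
    system: str
    cost_usd: float     # total system cost as reported
    tpmc: float         # TPC-C throughput, transactions per minute
    storage_gb: float   # configured storage, in GB

    @property
    def dollars_per_tpmc(self) -> float:
        # Price/performance: total cost divided by throughput.
        return self.cost_usd / self.tpmc


first = TpccResult(1993, "IBM RS/6000 POWERserver 230 c/s", 245_273, 115.83, 10.6)
latest = TpccResult(1999, "Dell PowerEdge 6350", 404_386, 23_460.57, 1_703.0)

for r in (first, latest):
    print(f"{r.year} {r.system}: ${r.dollars_per_tpmc:,.2f} per tpmC")

print(f"throughput grew {latest.tpmc / first.tpmc:.0f}x")
print(f"$/tpmC fell {first.dollars_per_tpmc / latest.dollars_per_tpmc:.0f}x")
print(f"storage grew {latest.storage_gb / first.storage_gb:.0f}x")
```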

| Year | System | Processor | Memory | Storage | Cost | tpmC | $/tpmC |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1993 | IBM RS/6000 POWERserver 230 c/s | 45 MHz RISC | 64 MB | 10.6 GB | $245,273 | 115.83 | 2,118.00 |
| 1993 | HP3000 Series 957RX | 48 MHz PA-RISC | 384 MB | 32.7 GB | $487,710 | 253.70 | 1,923.00 |
| Enterprise Systems | | | | | | | |
| 1995 | HP 9000 K410 | 4 x 120 MHz PA | 2 GB | 341.0 GB | $1,384,763 | 3,809.46 | 364.00 |
| 1997 | IBM RS/6000 Enterprise Server J50 c/s | 8 x 200 PPC | 3 GB | 591.5 GB | $895,035 | 9,165.13 | 97.66 |
| 1997 | HP 9000 V2200 Enterprise Server | 16 x 200 MHz PA | 16 GB | 2,439.0 GB | $3,717,105 | 39,469.47 | 94.18 |
| 1999 | IBM RS/6000 Enterprise Server H70 c/s | 4 x 340 MHz RS | 8 GB | 1,884.4 GB | $1,343,526 | 17,133.73 | 78.50 |
| 1999 | HP 9000 N4000 Enterprise Server | 8 x 440 MHz PA | 16 GB | 3,787.0 GB | $2,794,055 | 49,308.00 | 56.67 |
| Commodity Systems | | | | | | | |
| 1995 | IBM RS/6000 Workgroup Server E20 c/s | 100 MHz PPC | 512 MB | 56.3 GB | $278,029 | 735.27 | 378.00 |
| 1997 | IBM RS/6000 Workgroup Server F50 c/s | 4 x 166 MHz PPC | 2.5 GB | 495.8 GB | $725,823 | 7,308.10 | 99.32 |
| 1997 | HP NetServer LX Pro | 2 x 200 MHz Pent | 2 GB | 512.4 GB | $584,286 | 7,351.50 | 79.48 |
| 1997 | Dell PowerEdge 6100 | 4 x 200 MHz PPro | 2 GB | 451.0 GB | $327,234 | 7,693.03 | 42.53 |
| 1999 | IBM Netfinity 7000 M10 c/s | 4 x 450 MHz Xeon | 4 GB | 1,992.9 GB | $577,117 | 22,459.80 | 25.70 |
| 1999 | HP NetServer LH 4r | 4 x 450 MHz PII | 4 GB | 1,310.0 GB | $440,047 | 19,050.17 | 23.10 |
| 1999 | Dell PowerEdge 6350 | 4 x 500 MHz Xeon | 4 GB | 1,703.0 GB | $404,386 | 23,460.57 | 17.24 |

Table 2-8 Comparison of large transaction processing systems over several years. The table compares the size, performance, and cost of large transaction processing systems, as given by TPC-C benchmark results, over the six-year period. Data from [TPC93], [TPC97], and [TPC99]. No attempt has been made to adjust the dollar figures for inflation; such an adjustment would only raise the costs of the older systems and make the improvements more striking.

A basic requirement embodied in the TPC-C benchmark is that the benchmark systems provide sufficient storage to maintain roughly four months of active data. If the system were to retain historical data beyond this window, for example to support longer-term trend analysis or decision support queries, the storage requirements would grow quickly. For example, Table 2-8 shows that a system of roughly 50,000 tpmC fills more than 3 TB of storage in only four months (the HP 9000 N4000 is configured with 3,787 GB). If a customer were to continue to collect that data and store it for further analysis, this system would grow at a rate of over 10 TB per year. If we look to process data of this size in a decision support system, we find that 3 TB is already the largest scale factor available for the TPC-D decision support benchmark [TPC98]. This means that the fastest transaction systems of today are rapidly swamping the largest decision support systems. Soon it may be necessary to add 30 TB and 300 TB scale factors to the TPC-D benchmark as such database sizes become commonplace.
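The growth estimate for retained transaction data can be made explicit. The Python below is a minimal sketch, assuming data arrives at a constant rate so that each four-month window adds as much as the storage configured for the benchmark; `retained_tb` is a hypothetical helper. Using the HP 9000 N4000 row of Table 2-8 (3,787 GB), one year of retention comes to about 11.4 TB, consistent with the estimate of over 10 TB per year and nearly four times the 3 TB TPC-D maximum.

```python
# Project storage accumulated by a TPC-C-sized system that keeps its data
# instead of discarding it after the ~four-month window the benchmark sizes for.
# Assumption: a constant arrival rate, so every window adds window_tb of data.

WINDOW_MONTHS = 4         # active data TPC-C requires the system to hold
TPCD_MAX_SCALE_TB = 3     # largest TPC-D scale factor cited in the text [TPC98]


def retained_tb(window_tb, years, window_months=WINDOW_MONTHS):
    """Total TB accumulated after `years`, adding window_tb per window."""
    windows_per_year = 12 / window_months
    return window_tb * windows_per_year * years


# HP 9000 N4000 row of Table 2-8: 49,308 tpmC with 3,787 GB configured.
n4000_window_tb = 3_787 / 1_000

for years in (1, 2, 5):
    total = retained_tb(n4000_window_tb, years)
    print(f"after {years} year(s): {total:5.1f} TB "
          f"({total / TPCD_MAX_SCALE_TB:.1f}x the largest TPC-D scale factor)")
```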