Proceedings of the International Conference on Information Technologies (InfoTech-2007), 21st – 22nd September 2007, Bulgaria, vol. 2

DISK STORAGE AND DATA BASES

Krassimira Shvertner, Joseph Shvertner

Sofia University, Faculty of Economics, Bulgaria

e-mail(s): shvertner@feb.uni-sofia.bg, schwert@feb.uni-sofia.bg

Abstract: Data Base Management Systems (DBMS) keep their data on disk devices. Due to the increasing amount of data handled by a DBMS and the requirements for high processing speed, fault tolerance and minimal maintenance delay, knowledge of modern disk storage principles and devices is very important for administrators.

Key words: data base management systems, storage, disk device, storage area network.

1. INTRODUCTION

The amount of data that can be lost in a failure of disk devices is a determining factor for the recovery strategies presented in this article.

If the company database were to fail during an outage, how long would it take until the business was negatively affected? This question can be answered by management. Financial institutions that send and receive data electronically 24 hours a day cannot afford to be down for any time at all without impairing business operations.

How long the business could survive without the database, and especially without the data, is the question managers ask database administrators.

2. DIRECT-ATTACHED STORAGE (DAS)

Direct-attached storage, or DAS, is the term used to describe a storage device that is directly attached to a host system. The simplest example of DAS is the internal hard drive of a server computer, though storage devices housed in an external box come under this banner as well. Network workstations must therefore access the server in order to connect to the storage device. This is in contrast to networked storage, DAS being simpler to deploy and having a lower initial cost when compared to networked storage. When considering DAS, it is important to know what the data availability requirements are. In order for clients on the network to access the storage device in the DAS model, they must be able to access the server it is connected to. If the server is down or experiencing problems, it will have a direct impact on users' ability to store and access data. In addition to storing and retrieving files, the server also bears the load of processing applications such as e-mail and databases. Network bottlenecks and slowdowns in data availability may occur as server bandwidth is consumed by applications, especially if there is a lot of data being shared from workstation to workstation.

DAS is ideal for localized file sharing in environments with a single server or a few servers - for example, small businesses or departments and workgroups that do not need to share information over long distances or across an enterprise. Small companies traditionally utilize DAS for file serving and e-mail, while larger enterprises may leverage DAS in a mixed storage environment that likely includes NAS and SAN. DAS also offers ease of management and administration in this scenario, since it can be managed using the network operating system of the attached server. However, management complexity can escalate quickly with the addition of new servers, since storage for each server must be administered separately.

From an economic perspective, the initial investment in direct-attached storage is cheaper. This is a great benefit for IT managers faced with shrinking budgets, who can quickly add storage capacity without the planning, expense, and greater complexity involved with networked storage. DAS can also serve as an interim solution for those planning to migrate to networked storage in the future. For organizations that anticipate rapid data growth, it is important to keep in mind that DAS is limited in its scalability. From both a cost efficiency and an administration perspective, networked storage models are much better suited to high scalability requirements.

Organizations that do eventually transition to networked storage can protect their investment in legacy DAS. One option is to place it on the network via bridge devices, which allows current storage resources to be used in a networked infrastructure without incurring the immediate costs of networked storage. Once the transition is made, DAS can still be used locally to store less critical data.

Although most DBAs think that placing data on such devices is not a big deal (after following the guidelines of Oracle's Optimal Flexible Architecture), there are some surprising facts.


Fig. 1. Technical structure of a disk device

2.1. Single Disk Performance

Single disk performance characteristics are:

- disks hold more data towards the outside edges than towards the center;
- the transfer rate is faster for data near the outside;
- place frequently used data towards the outside of the disk;
- if the outside half of the disk is used, the average transfer rate is 95% of optimal;
- the outer portion of the disk is faster and denser than the central portion;
- placing frequently used data on the outer part of the disk achieves most of the gain:
  - there is a minimum 4 ms latency for any seek;
  - using the outer half achieves 80% of the disk latency improvement and 80% of the maximum disk throughput.

There are two prevailing access patterns:
- random access;
- sequential access.

Optimize sequential access by using big IOs.
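In an Oracle database, one common way to issue big IOs for sequential scans is to raise the multiblock read count. The following is only a sketch with illustrative values (it assumes an 8 KB database block size and an spfile, neither of which is stated in this article):

    -- Sketch: with an 8 KB block size, 128 blocks per multiblock read
    -- yields 1 MB IOs for full table scans (illustrative value only).
    ALTER SYSTEM SET db_file_multiblock_read_count = 128 SCOPE = BOTH;

Such a setting makes a single sequential read roughly match the 1 MB stripe unit recommended in the next subsection.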

2.2. Multidisk Performance

Use the SAME (Stripe And Mirror Everything) principle.

Stripe data across all available disks. To achieve good sequential throughput, use a stripe width (unit) of at least 1 MB. Smaller stripe depths can improve disk throughput for a single process, but seek time becomes a large fraction of the total IO time. Larger stripe depths, on the other hand, do not hurt the efficiency of sequential IO. There are two reasons for this: 1. If two processes performing sequential IO happen to access the same disk, the disk wait time is small. 2. For sequential scans Oracle performs asynchronous read-ahead that can span many disks.

Mirror disks to achieve high availability. Striping does not increase the probability of data loss, but it does increase the damage (recovery time) when a failure occurs. Data loss only occurs on multiple disk failures, and the probability of this is very low.

3. NETWORK ATTACHED STORAGE (NAS)

In contrast to DAS, networked storage relieves the server of storage and file serving responsibilities, and provides a lot more flexibility in data access by virtue of being independent.

Network Attached Storage, or NAS, is a data storage mechanism that uses special devices connected directly to the network media. These devices are assigned an IP address and can then be accessed by clients via a server that acts as a gateway to the data, or in some cases allows the device to be accessed directly by the clients without an intermediary.

The beauty of the NAS structure is that it means that in an environment with many servers running different operating systems, storage of data can be centralized, as can the security, management, and backup of the data. An increasing number of companies already make use of NAS technology, if only with devices such as CD-ROM towers (stand-alone boxes that contain multiple CD-ROM drives) that are connected directly to the network.

Some of the big advantages of NAS include its expandability: if more storage space is needed, another NAS device can be added to expand the available storage. NAS also brings an extra level of fault tolerance to the network. In a DAS environment, a server going down means that the data that server holds is no longer available. With NAS, the data is still available on the network and accessible by clients. Fault tolerance measures such as RAID, which we'll discuss later, can be used to make sure that the NAS device does not become a point of failure.

NAS technology is not all that new. Several vendors have been selling this technology for several years, and others have announced offerings that are in the works. However, the recent hype about storage area networks (SANs) seems to have also generated interest in NAS. SANs and NAS are often confused. In a nutshell, NAS is the implementation of a disk block protocol (e.g., SCSI) over a common network protocol, such as TCP/IP; some implementations instead use common network file access protocols such as NFS. An example is Network Appliance's Filer, a NAS product that lets servers access their storage over a LAN connection. Essentially, the Filer device is a storage cabinet with an embedded processor and a "lite" OS that exposes disk units to network-based clients (which are usually servers themselves). Clients that access NAS-based storage must also run some sort of redirector or client software that lets the OS see the NAS-based disk as locally attached storage. From this point, the application simply accesses NAS-based storage as if it were local to the server. The downside of NAS-based storage is that the data transfer burden is shifted from the storage bus (SCSI, Fibre Channel) to the network, and the overhead of the underlying transport protocol is added to storage data transfer.

Although effective for applications such as file serving and excellent in terms of centralized management or backup and restore, NAS presents some potential pitfalls for applications such as Exchange Server. Implementing a disk protocol over IP and a dedicated LAN is relatively effective for a file server, but SCSI over IP cannot deliver the high I/Os per second that Exchange Server demands when supporting large user loads. Because disk I/O is the key to Exchange Server performance, Microsoft officially does NOT support NAS with Exchange Server. However, many organizations use NAS for their Exchange deployments and are quite satisfied. These organizations usually tout the great management, heterogeneous host support, and disaster recovery capabilities that NAS can bring to Exchange environments.

For centralized Exchange deployments supporting small user populations (500 users or less), NAS can be a worthwhile alternative to direct or SAN-based storage. However, if you're concerned about performance for servers with large populations, or if the fact that Microsoft doesn't support NAS with Exchange is important to you, stick to a SAN-based alternative for your future storage needs.

NAS is an ideal choice for organizations looking for a simple and cost-effective way to achieve fast data access for multiple clients at the file level. Implementers of NAS benefit from performance and productivity gains. First popularized as an entry-level or midrange solution, NAS still has its largest install base in the small to medium-sized business sector. Yet the hallmarks of NAS - simplicity and value - are equally applicable for the enterprise market. Smaller companies find NAS to be a plug and play solution that is easy to install, deploy and manage, with or without IT staff at hand. Thanks to advances in disk drive technology, they also benefit from a lower cost of entry.

In recent years, NAS has developed more sophisticated functionality, leading to its growing adoption in enterprise departments and workgroups. It is not uncommon for NAS to go head to head with storage area networks in the purchasing decision, or become part of a NAS/SAN convergence scheme. High reliability features such as RAID and hot swappable drives and components are standard even in lower end NAS systems, while midrange offerings provide enterprise data protection features such as replication and mirroring for business continuance. NAS also makes sense for enterprises looking to consolidate their direct-attached storage resources for better utilization. Since resources cannot be shared beyond a single server in DAS, systems may be using as little as half of their full capacity. With NAS, the utilization rate is high since storage is shared across multiple servers.

The perception of value in enterprise IT infrastructures has also shifted over the years. A business and ROI case must be made to justify technology investments. Considering the downsizing of IT budgets in recent years, this is no easy task. NAS is an attractive investment that provides tremendous value, considering that the main alternatives are adding new servers, which is an expensive proposition, or expanding the capacity of existing servers, a long and arduous process that is usually more expensive.

4. STORAGE AREA NETWORK (SAN)

A storage area network (SAN) is a dedicated network that connects servers to shared storage devices. Fibre Channel switches, much like a normal Ethernet networking switch, act as the connectivity point for SANs. Making it possible for devices to communicate with each other on a separate network brings with it many advantages. Consider, for instance, the ability to back up every piece of data on the network without having to 'pollute' the standard network infrastructure with gigabytes of data. This is just one of the advantages of a SAN, which is making it a popular choice with companies today, and is a reason why it is forecast to become the data storage technology of choice in the coming years.

With their high degree of sophistication, management complexity and cost, SANs are traditionally implemented for mission-critical applications in the enterprise space. In a SAN infrastructure, storage devices such as NAS, DAS, RAID arrays or tape libraries are connected to servers using Fibre Channel (4 Gbps). Fibre Channel is a highly reliable, gigabit interconnect technology that enables simultaneous communication among workstations, mainframes, servers, data storage systems and other peripherals. Without the distance and bandwidth limitations of SCSI, Fibre Channel is ideal for moving large volumes of data across long distances quickly and reliably.

Fig. 2. Storage area network

In contrast to DAS or NAS, which are optimized for data sharing at the file level, the strength of SANs lies in their ability to move large blocks of data. This is especially important for bandwidth-intensive applications such as database, imaging and transaction processing. The distributed architecture of a SAN also enables it to offer higher levels of performance and availability than any other storage medium today. By dynamically balancing loads across the network, SANs provide fast data transfer while reducing I/O latency and server workload. The benefit is that large numbers of users can simultaneously access data without creating bottlenecks on the local area network and servers.

SANs are the best way to ensure predictable performance and 24x7 data availability and reliability. The importance of this is obvious for companies that conduct business on the web and require high-volume transaction processing. Another example would be contractors that are bound to service-level agreements (SLAs) and must maintain certain performance levels when delivering IT services. SANs have a wide variety of failover and fault tolerance features built in to ensure maximum uptime. They also offer excellent scalability for large enterprises that anticipate significant growth in information storage requirements. And unlike direct-attached storage, excess capacity in SANs can be pooled, resulting in a very high utilization of resources.

There has been much debate in recent times about choosing SAN or NAS in the purchasing decision, but the truth is that the two technologies can prove quite complementary. Today, SANs are increasingly implemented in conjunction with NAS. With SAN/NAS convergence, companies can consolidate block-level and file-level data on common arrays.

Even with all the benefits of SANs, several factors have slowed their adoption, including cost, management complexity and a lack of standardization. The backbone of a SAN is management software. A large investment is required to design, develop and deploy a SAN, which has limited its market to the enterprise space. A majority of the costs can be attributed to software, considering the complexity that is required to manage such a wide scope of devices. Additionally, a lack of standardization has resulted in interoperability concerns, where products from different hardware and software vendors may not work together as needed. Potential SAN customers are rightfully concerned about investment protection, and many may choose to wait until standards become defined.

5. ORACLE MANAGED FILES (OMF) ORACLE9i

Oracle9i introduces a new powerful manageability feature called Oracle Managed Files (OMF).

The main benefits of the Oracle Managed Files are:

• Ease of Oracle file management

• Reduction of Oracle file management errors
• Disk space management improvements
• Easier third party application integration

The new OMF feature simplifies database administration by eliminating the need for administrators to directly manage the files of an Oracle database. This feature allows operations to be specified in terms of database objects. Oracle uses the standard operating system (OS) file system interfaces internally to create and delete files as needed. While the parameter DB_CREATE_FILE_DEST decides the default location of datafiles, the parameter DB_CREATE_ONLINE_LOG_DEST_<n>, where n is any integer between 1 and 5, decides the default location for copies of online logs and controlfiles. If the DB_CREATE_ONLINE_LOG_DEST_<n> parameters are not set, all the files (datafiles, controlfiles and online logs) will be created at the destination specified by the DB_CREATE_FILE_DEST parameter. Oracle Managed datafiles, created by default, will be 100 MB in size and autoextensible with unlimited maximum size. The default size of Oracle Managed online logs will also be 100 MB.
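As a minimal sketch of OMF in use (the directory paths and tablespace name below are hypothetical, not taken from the article), the destination parameters can be set dynamically and a tablespace then created without any file specification:

    -- Hypothetical destinations; Oracle names, sizes and later deletes the files
    ALTER SYSTEM SET db_create_file_dest = '/u01/oradata/mydb';
    ALTER SYSTEM SET db_create_online_log_dest_1 = '/u02/oradata/mydb';
    ALTER SYSTEM SET db_create_online_log_dest_2 = '/u03/oradata/mydb';

    -- Creates a 100 MB, autoextensible Oracle Managed datafile automatically
    CREATE TABLESPACE sales_data;

    -- Dropping the tablespace also removes the underlying OMF datafile
    DROP TABLESPACE sales_data;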

Advantages:

o DBAs don't need to specify file names, locations and sizes when creating a tablespace or database
o Automatic removal of files when a tablespace or log file is dropped
o Increased script portability, as OS-specific file names do not need to be hard coded
o Simplified creation of test and development systems

Disadvantages:

o Can only be used with file systems, not with raw volumes
o Generated file names and locations might not follow the site's naming standards
o Limited scope for file placement and I/O tuning may impact performance (although locations can be altered dynamically)

6. AUTOMATED STORAGE MANAGEMENT (ASM) ORACLE10g

6.1. What is Automated Storage Management?

Automated Storage Management (ASM) was designed to simplify database administration. ASM eliminates the need for the DBA to directly manage the thousands of Oracle database files that could be present in a modern Oracle instance. ASM does this by enabling ASM disk groups, which are logical units comprised of disks and the files that reside on them. Using ASM, the management of thousands of Oracle files is reduced to managing a small number of disk groups.

To use ASM, the SQL statements used for creating database structures, such as tablespaces, redo logs, archive log files, and control files, must specify file locations in terms of ASM disk groups. ASM will then create and manage the associated underlying files.
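For example (the disk group and tablespace names here are assumptions for illustration), placing a tablespace in an ASM disk group only requires naming the group:

    -- '+DATA' refers to an existing ASM disk group; ASM creates and
    -- manages the underlying datafile
    CREATE TABLESPACE app_data DATAFILE '+DATA' SIZE 500M;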


ASM is the logical extension of the power of Oracle-managed files (OMF). In previous releases of OMF, files were created and managed automatically for you, but with ASM you reap the additional benefits of features such as ASM disk group mirroring and striping. ASM was developed by the same group within Oracle Corporation that developed ODM (Oracle Disk Manager).

ASM was designed to preserve all existing database functionality. Existing databases using file systems or with storage on raw devices will operate as they always have. However, even in existing Oracle 10g databases, new files can be created as ASM files while old ones are administered in the old way. This means that a database can have a mixture of ASM files, Oracle-Managed files, and manually managed files all at the same time.

Fig. 3. Automatic Storage Management (Oracle)

6.2. ASM disk groups

A disk group is basically one or more ASM disks that are managed as a single logical unit. Any data structure stored in an ASM disk group is totally contained within that disk group, or self-contained. A database using ASM disks does not have to be shut down in order for a disk to be added or dropped. ASM rebalances the spread of data to ensure an even I/O load on all disks in a disk group when the disk group configuration changes.
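A brief sketch of such an online change follows (the disk group name and device path are assumptions):

    -- Add a disk to a mounted disk group without shutting down the database;
    -- ASM then rebalances extents across all disks in the group automatically
    ALTER DISKGROUP data ADD DISK '/dev/rdsk/c3t19d5s4';

    -- The rebalance effort can be throttled or accelerated (power 0-11 in 10g)
    ALTER DISKGROUP data REBALANCE POWER 4;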

We mentioned that any single ASM file is self-contained in a single ASM disk group. However, an ASM disk group can contain files belonging to several databases, and a single database can use storage from multiple ASM disk groups. You can specify a disk group as the default disk group for files created in a database by specifying the disk group in the file destination initialization parameters.
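For instance (the disk group names are hypothetical), the same file destination parameters used for OMF can point at disk groups instead of directories:

    -- New datafiles default to the DATA disk group; online logs and
    -- controlfiles are multiplexed across DATA and FRA
    ALTER SYSTEM SET db_create_file_dest = '+DATA';
    ALTER SYSTEM SET db_create_online_log_dest_1 = '+DATA';
    ALTER SYSTEM SET db_create_online_log_dest_2 = '+FRA';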

ASM divides files into 1MB extents and spreads the extents for each file evenly across all of the disks in a disk group. ASM uses pointers to record extent location instead of using a mathematical function to track the placement of each extent. When the disk group configuration changes, ASM moves individual extents of a file rather than having to move all extents to adhere to a formula based on the number of disks.

Separate ASM disk groups are typically created for the following reasons:

• To group disks with different external redundancy together; for example, JBOD (just a bunch of disks) would generally not be in the same disk group with disks from a RAID 1+0 or RAID 5 configuration, but this is possible using ASM.
• To separate work and recovery areas for a given database.

Note: In any installation, non-ASM managed operating system storage repositories are required, and are used for swap files, execution libraries, and user file systems. The Oracle database and ASM executable files and libraries must reside on the server's operating system file system and cannot reside in ASM files.

In a RAC database environment, files for loading into external tables are still located on a non-ASM file system, which can be a cluster file system or a local file system.

6.3. Types of disk groups

There are three types of ASM disk groups:

• Normal redundancy
• High redundancy
• External redundancy

With normal and high redundancy, the disk group template specifies the ASM redundancy attributes for all files in the disk group.

Configuration of ASM high redundancy provides a greater degree of protection. With external redundancy, ASM does not provide any redundancy for the disk group.

In external redundancy, the underlying disks in the disk group must provide redundancy (for example, using a RAID storage array). The redundancy level or type is specified at the time the disk group is created.
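A minimal sketch of creating disk groups with different redundancy types, run against the ASM instance (the disk paths, failure group names and disk group names are assumptions):

    -- Normal redundancy: ASM mirrors extents across the two failure groups
    CREATE DISKGROUP data NORMAL REDUNDANCY
      FAILGROUP controller1 DISK '/dev/rdsk/c1t1d0s4', '/dev/rdsk/c1t2d0s4'
      FAILGROUP controller2 DISK '/dev/rdsk/c2t1d0s4', '/dev/rdsk/c2t2d0s4';

    -- External redundancy: protection is left entirely to the storage array
    CREATE DISKGROUP fra EXTERNAL REDUNDANCY
      DISK '/dev/rdsk/c3t1d0s4';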

7. CONCLUSION

The practical application of the discussed advanced disk storage principles is covered in the practical exercises of the master's courses "Backup and Recovery of Oracle Data Base" and "Oracle Data Base Administration", as well as in real practice.

REFERENCES

Freeman, G. (2004). Oracle Database 10g New Features, Osborne Oracle Press Series.

Oracle Corporation (2005). Oracle® Database, Administrator's Guide 10g Release 2 (10.2), B14231-01.
