Mass Storage Devices - LPI Study Guide

Secondary storage devices, also known as mass storage devices, come in different types, which are determined by their physical interface. An interface is the means by which the device physically attaches to the computer. Numerous interfaces for connecting mass storage devices have been developed over time, but the four main types encountered today are:

• PATA – Parallel Advanced Technology Attachment, also known as IDE

• SATA – Serial Advanced Technology Attachment, the latest standard, replacing PATA especially on desktops and laptops

• SCSI – Small Computer System Interface disks are used in servers and other high-end machines. SCSI provides high-speed access as well as the ability to connect a large number of devices

• SAS – serial SCSI, a new SCSI standard aimed at servers

Besides the interface, the type of mass storage is also determined by the device itself. For example, you can have SATA optical drives (CD-ROMs/DVD-ROMs) as well as SATA hard disks, and SCSI disks as well as SCSI tape drives.

PATA

PATA is an obsolete standard but you might still encounter it on older machines. Parallel refers to the manner in which data is transferred from the device to the CPU and memory. There are several variations on the PATA standard such as IDE and EIDE but since the introduction of SATA they are collectively referred to as PATA devices.

Motherboards typically come with two PATA connectors, and each PATA cable supports up to two devices in a master/slave setup. Which device is master or slave depends on the location of the device on the cable and the jumper settings on the drives themselves. Since ATA drives have been around for a long time, they are well supported in Linux, and most BIOSes have no problem automatically identifying and configuring these devices.

Linux identifies ATA devices under the /dev file system with the naming convention /dev/hd[a-z]. The last letter of the device name is determined by whether the disk is master or slave on the primary or secondary connector.

Gaps can also appear in the device node naming. For example, the hard disk is often identified as /dev/hda, being the master drive on the primary connector, with the CD-ROM drive identified as /dev/hdc, being the master on the secondary connector. This is quite a common setup for desktop machines, as it is better to have your two most frequently accessed mass storage devices on separate cables for improved performance than to have them share a cable where access contention may arise.
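The mapping from cable position to device node can be sketched as a small shell function. This is only an illustration of the naming rule described above; pata_node is a hypothetical helper written for this example, not a system utility.

```shell
# Map a PATA position (connector, role) to its traditional /dev/hdX node.
# pata_node is a hypothetical helper written for this example.
pata_node() {
    case "$1:$2" in
        primary:master)   echo /dev/hda ;;
        primary:slave)    echo /dev/hdb ;;
        secondary:master) echo /dev/hdc ;;
        secondary:slave)  echo /dev/hdd ;;
        *) echo "unknown position: $1 $2" >&2; return 1 ;;
    esac
}

pata_node primary master      # the hard disk in the setup described above
pata_node secondary master    # the CD-ROM drive; /dev/hdb stays unused
```

With one drive per cable, the two calls print /dev/hda and /dev/hdc, which is the gap in naming mentioned above.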

Partitions on PATA disks are identified by a number following the letter. For example, the first primary partition on the /dev/hda drive is identified as /dev/hda1, the second primary partition as /dev/hda2, and so on. For more information on disk partition numbering, please refer to section 102.1.
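As a rough sketch of how a device node name breaks down into disk and partition number, the following shell function strips the trailing digits from a node name. split_node is a hypothetical helper, and it assumes the simple letter-then-number scheme described above.

```shell
# Split a device node such as /dev/hda2 into its disk and partition number.
# split_node is a hypothetical helper; it assumes names that end in a
# drive letter optionally followed by a partition number.
split_node() {
    node=$1
    # Remove any trailing digits to get the whole-disk node.
    disk=$(printf '%s\n' "$node" | sed 's/[0-9]*$//')
    # Whatever was removed is the partition number (empty for a whole disk).
    part=${node#"$disk"}
    echo "disk=$disk partition=${part:-none}"
}

split_node /dev/hda1
split_node /dev/hda2
split_node /dev/hdc
```

A node without a trailing number, such as /dev/hdc, refers to the whole device rather than a partition.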

What is important about understanding device node naming conventions at this point is being able to identify the type of hard disk device and its device name.

SATA

Serial ATA (SATA) drives have largely replaced PATA drives on desktops and laptops. SATA is a serial standard but offers higher throughput than the older PATA interface. SATA drives are not configured in a master/slave setup; each has its own dedicated controller or channel. The cables for SATA drives are considerably thinner than those for PATA devices, saving space and cost. SATA controllers use the Advanced Host Controller Interface (AHCI), which allows for hot-plugging and hot-swapping of SATA disks.

As with PATA devices, most BIOSes automatically detect SATA drives, and the Linux kernel usually has no problem identifying and loading the correct drivers for them. A peculiarity of SATA under Linux is that SATA disks make use of the SCSI disk subsystem, and hence the naming convention of these devices follows that of SCSI devices.

SATA drives use the SCSI subsystem naming convention, with devices labeled /dev/sd[a-z]. The final character of the device node name is determined by the order in which the Linux kernel discovers the devices. Partitions on SATA drives are numbered from 1 upwards.
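The effect of discovery order on naming can be illustrated with a short shell sketch. assign_sd_names is a hypothetical helper that simply hands out letters in the order its arguments are given; it only mimics the rule stated above, does not query the kernel, and handles only the first 26 devices.

```shell
# Assign /dev/sdX names to disks in the order they are passed in, mimicking
# the rule that the first device discovered becomes sda, the next sdb, etc.
# assign_sd_names is a hypothetical helper written for this example.
assign_sd_names() {
    letters=abcdefghijklmnopqrstuvwxyz
    i=1
    for disk in "$@"; do
        # Pick the i-th letter of the alphabet for the i-th device.
        letter=$(printf '%s' "$letters" | cut -c "$i")
        echo "/dev/sd$letter $disk"
        i=$((i + 1))
    done
}

# Same two devices, discovered in a different order: the names swap.
assign_sd_names "internal disk" "external disk"
assign_sd_names "external disk" "internal disk"
```

The device labels passed in are illustrative. The point is that the same physical disk ends up as /dev/sda in one run and /dev/sdb in the other.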

SCSI Devices

SCSI, like PATA and SATA, defines a physical interface as well as the protocol and commands for transferring data between computers and peripheral devices. SCSI is most commonly used for hard disks and tape drives, but it can connect a wide range of other devices, such as scanners and CD drives. Usually SCSI devices connect to a host adapter that has its own BIOS. SCSI storage devices are faster and more robust than SATA or PATA devices but are also more expensive, hence they are used mostly in servers or high-end workstations.

There are two types of SCSI interface: an 8-bit interface with a bus that supports 8 devices including the controller, which leaves space for 7 block devices; and a 16-bit (wide) interface that supports 16 devices including the controller, which leaves space for 15 block devices.

SCSI devices are uniquely identified using a set of 3 numbers called the SCSI ID:

a. the SCSI channel

b. the device ID number

c. the logical unit number (LUN)

The SCSI Channel

Each SCSI adapter supports one data channel on which to attach SCSI devices (disk, CD-ROM, etc.). These channels are numbered from 0 onwards.

Device ID number

Each device is assigned a unique ID number that can be set using jumpers on the disk. The IDs range from 0 to 7 for 8-bit controllers and from 0 to 15 for 16-bit controllers.

The Logical Unit Number (LUN) is used to differentiate between devices within a SCSI target number. This is used, for example, to indicate a particular partition within a disk drive or a particular tape drive within a multi-drive tape robot. It is not seen so often these days as host adapters are now less costly and can accommodate more targets per bus.
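The three components can be combined and range-checked in a few lines of shell. This is a minimal sketch: scsi_address is a hypothetical helper, the comma-separated output order (channel, device ID, LUN) simply follows the order in which the components are listed above, and the ID reserved for the controller is not modelled.

```shell
# Check a device ID against the bus width and print the channel,id,lun triple.
# scsi_address is a hypothetical helper; the ID used by the controller
# itself is not modelled here.
scsi_address() {
    width=$1 channel=$2 id=$3 lun=$4
    case "$width" in
        8)  max_id=7 ;;     # 8-bit bus: IDs 0 to 7
        16) max_id=15 ;;    # 16-bit bus: IDs 0 to 15
        *)  echo "unsupported bus width: $width" >&2; return 1 ;;
    esac
    if [ "$id" -lt 0 ] || [ "$id" -gt "$max_id" ]; then
        echo "device ID $id out of range 0-$max_id" >&2
        return 1
    fi
    echo "$channel,$id,$lun"
}

scsi_address 8 0 2 0      # valid on an 8-bit bus
scsi_address 16 0 12 0    # ID 12 is only valid on a 16-bit bus
```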

Hardware Detection

All detected devices are listed in the /proc/scsi/scsi file. The example below is from the SCSI-2.4-HOWTO

/proc/scsi/scsi

Attached devices:
Host: scsi0 Channel: 00 Id: 02 Lun: 00
  Vendor: PIONEER  Model: DVD-ROM DVD-303  Rev: 1.10
  Type:   CD-ROM                           ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: IBM      Model: DNES-309170W     Rev: SA30
  Type:   Direct-Access                    ANSI SCSI revision: 03
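A listing in this format can be condensed to one line per device with a short awk script. The sketch below assumes the layout shown in the example above (a "Host:" line followed by indented "Vendor:" and "Type:" lines); scsi_summary is a hypothetical helper that reads the listing from standard input.

```shell
# Summarise /proc/scsi/scsi-style text: one line per device with
# host, channel, ID, LUN and device type.
# scsi_summary is a hypothetical helper that reads from standard input.
scsi_summary() {
    awk '
        # Remember the address fields; adding 0 turns "02" into 2.
        $1 == "Host:" { host = $2; chan = $4 + 0; id = $6 + 0; lun = $8 + 0 }
        # The Type line completes the record, so print it.
        $1 == "Type:" { print host, chan, id, lun, $2 }
    '
}

scsi_summary <<'EOF'
Attached devices:
Host: scsi0 Channel: 00 Id: 02 Lun: 00
  Vendor: PIONEER  Model: DVD-ROM DVD-303  Rev: 1.10
  Type:   CD-ROM                           ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: IBM      Model: DNES-309170W     Rev: SA30
  Type:   Direct-Access                    ANSI SCSI revision: 03
EOF
```

On a system that provides the file, the same function could be fed with `scsi_summary < /proc/scsi/scsi`.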

Since SATA drives use the same SCSI subsystem as real SCSI drives, the naming convention for SCSI devices is the same as that set out above for SATA drives. USB drives also make use of the SCSI subsystem, hence you will find USB drives following the same naming convention.

The scsi_info tool uses the information in /proc/scsi/scsi to print out the SCSI_ID and the model of a specified device. From the file above, scsi_info would produce the following output:

# scsi_info /dev/sda
SCSI_ID="0,0,0"
MODEL="IBM DNES309170W"
FW_REV="SA30"

The system will boot from the device with SCSI ID 0 by default. This can be changed in the SCSI BIOS which can be configured at boot time. If the PC has a mixture of SCSI and SATA/PATA disks, then the boot order must be selected in the system's BIOS first.

SAS

Serial Attached SCSI (SAS) is the latest interface on the block and is to the SCSI protocol and interface what SATA is to PATA. In general, SCSI devices are faster and more reliable than SATA/PATA drives. The performance gap between the two technologies continues to close, but they are generally targeted at different markets: the consumer market for SATA and the enterprise market for SCSI. The enterprise has higher requirements for reliability and speed than the consumer market, and the prices of the drives reflect this.

Identifying the correct device ID for BOOT device

The purpose of listing the different types of mass storage devices and their naming conventions is to allow you to easily identify the device ID for your disks. This is important for identifying which of your drives is the boot device and which disk partitions contain the root and boot directories.

There is, however, a problem with the Linux naming convention, and not just for disk drives: device names may change between system reboots when hardware configuration changes are made. Since the naming convention has a component that depends on the order in which the device is discovered by the kernel, adding, moving or removing devices may result in changes to device names. This was not such a problem a few years ago, as changing hard disks is not a regular occurrence; but with the advent of USB, and the fact that it uses the SCSI subsystem for its device naming, the problem has become more severe. A way therefore has to be found to uniquely identify a device, irrespective of when it is discovered or where it is located. For disks this is done by writing a universally unique ID (UUID) to the disk metadata and using the UUID in configuration files rather than the device node. (It is not only hard disks that need to be uniquely identifiable. Other devices need to be as well, but may use different means of doing so; network cards use their MAC address, for example.) The blkid utility, which replaces vol_id, can be used to query the settings on a device or to search for a device with a specific UUID.

blkid /dev/sda1

for example produces the following output:

/dev/sda1: UUID="c7d63e4b-2d9f-450a-8052-7b8929ec8a6b" TYPE="ext4"

showing that the first partition on the device sda has an ext4 filesystem and a UUID of c7d63e4b-2d9f-450a-8052-7b8929ec8a6b.
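To show how such output might feed a configuration file, the sketch below pulls the UUID and TYPE fields out of one line of blkid output and prints a line in the style of /etc/fstab that refers to the filesystem by UUID. Note the assumptions: /etc/fstab is used as the example configuration file, uuid_fstab_line is a hypothetical helper, and the mount point /mnt/data and the "defaults 0 2" fields are illustrative values, not taken from a real system.

```shell
# Extract the UUID and TYPE fields from one line of blkid output and print
# an /etc/fstab-style line that refers to the filesystem by UUID.
# uuid_fstab_line is a hypothetical helper; the mount point is a parameter
# and the "defaults 0 2" fields are illustrative.
uuid_fstab_line() {
    line=$1 mountpoint=$2
    # The leading space in the pattern avoids matching other *UUID= fields.
    uuid=$(printf '%s\n' "$line" | sed -n 's/.* UUID="\([^"]*\)".*/\1/p')
    fstype=$(printf '%s\n' "$line" | sed -n 's/.* TYPE="\([^"]*\)".*/\1/p')
    [ -n "$uuid" ] && [ -n "$fstype" ] || return 1
    printf 'UUID=%s %s %s defaults 0 2\n' "$uuid" "$mountpoint" "$fstype"
}

uuid_fstab_line '/dev/sda1: UUID="c7d63e4b-2d9f-450a-8052-7b8929ec8a6b" TYPE="ext4"' /mnt/data
```

Because the resulting line names the filesystem by UUID, it stays valid even if the disk later appears under a different device node.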

blkid -U 75426429-cc4b-4bfc-beb9-305e1f7f8bc9

searches for the device with the specified ID and returns /dev/sdb1. Alternatively, you could look under /dev/disk/by-uuid to see which IDs map to which devices as seen by the Linux kernel.
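The directory lookup can also be scripted. Entries under /dev/disk/by-uuid are symbolic links named after the UUID and pointing at the device node, so resolving the link gives the device. The sketch below assumes a readlink that supports -f (as GNU coreutils does); find_by_uuid is a hypothetical helper, and the demonstration builds a throwaway directory of links so that it runs on any machine.

```shell
# Resolve a UUID to its device node by following the symlink in a
# by-uuid directory. find_by_uuid is a hypothetical helper; the directory
# defaults to /dev/disk/by-uuid but can be overridden for testing.
find_by_uuid() {
    uuid=$1
    dir=${2:-/dev/disk/by-uuid}
    [ -L "$dir/$uuid" ] || { echo "no device with UUID $uuid" >&2; return 1; }
    readlink -f "$dir/$uuid"
}

# Demonstration against a temporary stand-in for /dev/disk/by-uuid.
demo=$(mktemp -d)
touch "$demo/sda1"
ln -s "$demo/sda1" "$demo/c7d63e4b-2d9f-450a-8052-7b8929ec8a6b"
find_by_uuid c7d63e4b-2d9f-450a-8052-7b8929ec8a6b "$demo"
rm -rf "$demo"
```

On a real system the second argument would be omitted, so that the function consults /dev/disk/by-uuid directly.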
