• No results found

Chapter 1. Components of the logical volume manager

1.3 Operation of the logical volume manager

1.3.5 Logical volume device driver

The Logical Volume Manager consists of the Logical Volume Device Driver (LVDD) and the Logical Volume Manager subroutine interface library. The logical volume device driver is a pseudo-device driver that manages and processes all I/O and operates on logical volumes through the /dev/lvn special file. It translates logical addresses into physical addresses and sends I/O requests to specific device drivers.

Like the physical disk device driver, this pseudo-device driver provides character and block entry points with compatible arguments. Each volume group has an entry in the kernel device switch table. Each entry contains entry points for the device driver and a pointer to the volume group data structure. The logical volumes of a volume group are distinguished by their minor numbers.

The logical volume device driver comes in two portions: The pinned portion and the non pinned portion. The driver’s true non-pageable portion is in hd_pin_bot while the hd_pin portion is pageable. Prior to AIX 4.1, the driver was just called hd_pin and the entire driver was pinned into memory (that is, was not pageable).

The logical volume device driver is either called by the jfs file system or lvm library routines. When a request is received by the LVDD, it calls the physical disk device driver.

Character I/O requests are performed by issuing a read or write request on a /dev/rlvn character special file for a logical volume. The read or write is processed by a file system supervisor call (SVC) handler that calls the LVDD ddread or ddwrite entry point. The ddread or ddwrite entry point transforms the character request into a block request. This is done by building a buffer for the request and calling the LVDD ddstrategy entry point.

Block I/O requests are performed by issuing a read or write on a block special file /dev/lvn for a logical volume. These requests go through the supervisor call handler to the bread or bwrite block I/O kernel services. These services build buffers for the request and call the LVDD ddstrategy entry point. The

LVDD ddstrategy entry point then translates the logical address to a physical address (handling bad block relocation and mirroring) and calls the

appropriate physical disk device driver.

On completion of the I/O, the physical device driver calls the iodone kernel service on the device interrupt level. This service then calls the LVDD I/O completion handling routine. Once this is completed, the LVDD calls the iodone service again to notify the requester that the I/O is completed. The LVDD is logically split into top and bottom halves. The top half contains the ddopen, ddclose, ddread, ddwrite, ddioctl, and ddconfig entry points. The top half runs in the context of a process address space and can page fault. It contains the following entry points as shown in Table 9.

Table 9. LVDD entry points

Entry Point Description

ddopen Called by the file system, when a logical volume is mounted, to open the logical volume specified.

ddclose Called by the file system, when a logical volume is unmounted, to close the logical volume specified. ddconfig Initializes data structures for the LVDD.

ddread Called by the read subroutine to translate character I/O requests to block I/O requests. This entry point verifies that the request is on a 512-byte boundary and is a multiple of 512 bytes in length.

When a character request spans partitions or logical tracks (32 pages of 4 K bytes each), the LVDD ddread routine breaks it into multiple requests. The routine then builds a buffer for each request and passes it to the LVDD ddstrategy entry point, which handles logical block I/O requests. If the ext parameter is set (called by the readx subroutine), and the character device is opened. The ddread entry point passes this parameter to the LVDD ddstrategy routine in the b_options field of the buffer header.

ddwrite Called by the write subroutine to translate character I/O requests to block I/O requests. The LVDD ddwrite routine performs the same processing for a write request as the LVDD ddread routine does for read requests.

The bottom half of the LVDD contains the ddstrategy entry point, which contains block read and write code. This is done to isolate the code that must run fully pinned and has no access to user process context. The bottom half of the device driver runs on interrupt levels and is not permitted to page fault. In particular, the bottom half of the LVDD:

• Validates I/O requests. Checks requests for conflicts (such as overlapping block ranges) with requests currently in progress.

• Translates logical addresses to physical addresses. • Handles mirroring and bad-block relocation.

The bottom half of the LVDD runs on interrupt levels and, as a result, is not permitted to page fault. The bottom half of the LVDD is divided into the following three layers:

• Strategy Layer • Scheduler Layer • Physical Layer

Each logical I/O request passes down through the bottom three layers before reaching the physical disk device driver. Once the I/O is complete, the request returns back up through the layers to handle the I/O completion processing at each layer. Finally, control returns to the original requestor.

ddioctl Supports the following operations:

CACLNUP Causes the mirror write consistency (MWC) cache to be written to all physical volumes (PVs) in a volume group.

IOCINFO, XLATE Return LVM configuration information and PP status information.

LV_INFO Provides information about a logical volume. This ioctl operation is available beginning with AIX Version 4.2.1

PBUFCNT Increases the number of physical buffer headers (pbufs) in the LVM pbuf pool. Entry Point Description

1.3.5.1 Strategy layer

The strategy layer deals only with logical requests. The ddstrategy entry point is called with one or more logical buf structures. A list of buf structures for requests that are not blocked are passed to the second layer, the scheduler. 1.3.5.2 Scheduler layer

The scheduler layer schedules physical requests for logical operations and handles mirroring and the MWC cache. For each logical request the scheduler layer schedules one or more physical requests. These requests involve translating logical addresses to physical addresses, handling

mirroring, and calling the LVDD physical layer with a list of physical requests. When a physical I/O operation is complete for one phase or mirror of a logical request, the scheduler initiates the next phase (if there is one). If no more I/O operations are required for the request, the scheduler calls the strategy termination routine. This routine notifies the originator that the request has been completed.

The scheduler also handles the MWC cache for the volume group. If a logical volume is using mirror write consistency, then requests for this logical volume are held within the scheduling layer until the MWC cache blocks can be updated on the target physical volumes. When the MWC cache blocks have been updated, the request proceeds with the physical data write operations. When MWC is being used, system performance can be adversely affected. This is caused by the overhead of logging or journaling that a write request does when active in a logical track group (LTG) (32 4K-byte pages or 128K bytes). This overhead is for mirrored writes only. It is necessary to guarantee data consistency between mirrors, particularly if the system crashes before the write to all mirrors has been completed.

Mirror write consistency can be turned off for an entire logical volume. It can also be inhibited on a request basis by turning on the NO_MWC flag as defined in the /usr/include/sys/lvdd.h file.

1.3.5.3 The Physical Layer

The physical layer of the LVDD handles startup and termination of the physical request. The physical layer calls a physical disk device driver's ddstrategy entry point with a list of buf structures linked together. In turn, the physical layer is called by the iodone kernel service when each physical request is completed.

make use of the hardware function, otherwise, the LVDD will handle it totally. These details are hidden from the other two layers.

The only variation to this configuration would be if HAGEO/GeoRM (the IBM products that support geographically remote mirroring) or VSD (Virtual Shared Disk) are installed.

With the HAGEO/GeoRM products, a new logical device, the GeoMirror Device (GMD), is added between the application and the Logical Volume Device Driver. This pseudo-device has a local and remote comment and controls the replication of data between sites.

The VSD software is used to make AIX logical volumes appear to be located on more than one node in an SP Cluster. However, only the logical volume can be accessed using the VSD software, not the mounted file systems. Further information can be found in the product documentation SP

Administration Guide, GC23-3897.