TWO-BYTE OPCODE FOR VVADDF
OPERAND SPECIFIER FOR IMMEDIATE MODE (FOR CONTROL WORD)
CONTROL WORD <7:0>: V3 IS DESTINATION AND V2 IS A SOURCE
CONTROL WORD <15:8>: V1 IS A SOURCE, MASKED OPERATIONS ARE ENABLED, AND MATCH =

TWO-BYTE OPCODE FOR VSMULF
OPERAND SPECIFIER FOR IMMEDIATE MODE (FOR CONTROL WORD)
CONTROL WORD <7:0>: V5 IS DESTINATION AND V4 IS A SOURCE
CONTROL WORD <15:8>: VA IS IGNORED. UNDERFLOW EXCEPTION CHECKING IS ENABLED
OPERAND SPECIFIER FOR REGISTER MODE WITH SCALAR DATA IN R4

Figure 2 Vector Instruction Encoding
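The control word shown in Figure 2 can be illustrated with a small packing routine. The following is only a sketch: Figure 2 establishes that bits <7:0> carry two register fields and bits <15:8> carry a third register field plus flags, but the exact bit positions, the constant names, and the helper functions below are assumptions made for illustration.

```python
# Sketch of packing a 16-bit vector control word.
# ASSUMPTION: the field positions below are illustrative. Only the split into
# <7:0> (two register fields) and <15:8> (third register field plus flags)
# is taken from Figure 2.
VC_SHIFT = 0         # <3:0>   destination register (assumed position)
VB_SHIFT = 4         # <7:4>   source register (assumed position)
VA_SHIFT = 8         # <11:8>  source register (assumed position)
EXC_BIT = 1 << 13    # exception checking enable (assumed position)
MTF_BIT = 1 << 14    # match value for masked operations (assumed position)
MOE_BIT = 1 << 15    # masked operations enable (assumed position)


def pack_control_word(va=0, vb=0, vc=0, moe=False, mtf=False, exc=False):
    """Return the control word as an integer in the range 0..0xFFFF."""
    for name, reg in (("va", va), ("vb", vb), ("vc", vc)):
        if not 0 <= reg <= 15:
            raise ValueError(f"{name} must be a register number 0..15")
    word = (va << VA_SHIFT) | (vb << VB_SHIFT) | (vc << VC_SHIFT)
    if moe:
        word |= MOE_BIT
    if mtf:
        word |= MTF_BIT
    if exc:
        word |= EXC_BIT
    return word


def control_word_bytes(word):
    """Split the word into (low byte <7:0>, high byte <15:8>)."""
    return word & 0xFF, (word >> 8) & 0xFF


# VSMULF example from Figure 2: V5 is the destination, V4 is a source,
# VA is ignored, and underflow exception checking is enabled.
vsmulf_word = pack_control_word(vb=4, vc=5, exc=True)
low, high = control_word_bytes(vsmulf_word)
print(f"{low:#04x} {high:#04x}")
```

Under these assumed positions, the VSMULF example yields a low byte of 0x45 and a high byte of 0x20. The VVADDF example is not packed here because its match value is not legible in the figure.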

instruction, it proceeds to process other instructions and does not wait for the vector instruction to complete. An execution model is shown in Figure 3.

When the scalar processor attempts to issue a vector instruction, it checks to see if the vector processor is disabled, that is, whether it will accept further vector instructions. If the vector processor is disabled, then the scalar processor takes a "vector processor disabled" fault. An operating system handler is then invoked on the scalar processor to examine the various error-reporting registers on the vector processor to determine the disabling condition. The vector processor disables itself to report the occurrence of vector arithmetic exceptions or hardware errors. The operating system disables the vector processor, usually to indicate the unavailability of the vector processor, by writing to a privileged vector register. If the disabling condition can be corrected, the handler enables the vector processor and directs the scalar processor to reissue the faulted vector instruction.
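The issue, fault, and reissue sequence described above can be sketched as a small software model. This is a minimal illustration, not the hardware mechanism: the class names, the reason strings, and the rule that only an operating-system-imposed "unavailable" condition is correctable are all assumptions.

```python
class VectorProcessorDisabled(Exception):
    """Models the "vector processor disabled" fault on the scalar processor."""


class VectorProcessor:
    def __init__(self):
        self.enabled = True
        self.disable_reason = None   # stands in for the error-reporting registers
        self.queue = []              # accepted vector instructions

    def disable(self, reason):
        self.enabled = False
        self.disable_reason = reason

    def accept(self, instruction):
        if not self.enabled:
            raise VectorProcessorDisabled(self.disable_reason)
        self.queue.append(instruction)


def os_handler(vp):
    """Examine the disabling condition; return True if it was corrected."""
    if vp.disable_reason == "unavailable":   # hypothetical correctable case
        vp.enabled = True
        vp.disable_reason = None
        return True
    return False                             # e.g. a hardware error


def issue(vp, instruction):
    """Issue one vector instruction, reissuing it once if the fault is corrected."""
    try:
        vp.accept(instruction)
    except VectorProcessorDisabled:
        if not os_handler(vp):
            raise
        vp.accept(instruction)               # reissue the faulted instruction


vp = VectorProcessor()
vp.disable("unavailable")
issue(vp, "VVADDF")
print(vp.queue)
```

The scalar side never waits for completion in this model; `issue` returns as soon as the instruction has been accepted into the queue.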

Within the constraint of maintaining the proper ordering among the operations of data-dependent instructions, the architecture explicitly allows the vector processor to execute any number of the instructions in its queue concurrently and retire them out of order. Thus, a VAX vector implementation can chain and overlap instructions to the extent best suited for its technology and cost performance. In addition, by making this feature an explicit part of the architecture, software is provided with a programming model that ensures correct results regardless of the extent to which a particular implementation chains or overlaps. This approach differs from that of some other existing vector architectures, such as the IBM S/370 vector architecture, which give the appearance of sequential instruction execution.6

PHYSICAL MEMORY, 16 GB, INSTRUCTION STREAM, DATA STREAM, INSTRUCTIONS, DATA, VAX SCALAR CPU, VECTOR DATA

Vector Processing on the VAX 9000 System

A VAX vector implementation may have its own memory management hardware, translation buffer, and cache; or it may share those of the scalar processor. In high-end vector implementations, such as the VAX 9000 system, the vector and scalar processors are tightly coupled. The problems of limited chip area and translation buffer and cache coherency can be lessened by allowing high-speed memory management hardware and cache to be shared by both vector and scalar processors. For other implementations, such as the VAX 6000 Model 400 system, the vector and scalar processors are not so tightly coupled, and there is a performance advantage in allowing separate memory management hardware and cache.1 Little additional effort is necessary by an operating system to support separate vector memory management hardware and cache.

OPCODE, CONTROL WORD
DISABLE/STATUS

Figure 3 Vector Execution Unit

A vector processor can treat vector memory management exceptions (MME) in a synchronous manner, as the VAX 9000 V-box does. Once the scalar processor issues a vector memory instruction, it pauses until the vector processor determines whether an MME will be encountered by the instruction. If an MME will occur, then a precise exception is taken on the scalar processor and the appropriate operating system handler is invoked. If no MME will occur, the scalar processor proceeds to process other instructions and the vector processor completes the memory instruction. In the case of referencing a unity-strided vector, which occurs most frequently, the MME checking takes only a short time at the beginning because the vector is contained in two or fewer pages. (MME checking is done at the page level.)
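The claim that a unity-strided vector fits in two or fewer pages can be checked with a short calculation. The sketch below assumes a 512-byte page and a vector of 64 elements of 8 bytes each; neither figure is stated in this section, and the function name is hypothetical.

```python
PAGE_SIZE = 512  # bytes per page (assumption, not stated in this section)


def pages_touched(base, n_elements, element_size, stride=None):
    """Count the distinct pages referenced by a strided vector access.

    stride is the distance in bytes between element start addresses; the
    default (unity stride) places the elements contiguously in memory.
    """
    if stride is None:
        stride = element_size
    pages = set()
    for i in range(n_elements):
        first_byte = base + i * stride
        last_byte = first_byte + element_size - 1
        pages.update(range(first_byte // PAGE_SIZE, last_byte // PAGE_SIZE + 1))
    return len(pages)


# 64 contiguous 8-byte elements occupy exactly 512 bytes.
print(pages_touched(0x1000, 64, 8))              # page-aligned base
print(pages_touched(0x1008, 64, 8))              # base offset into a page
print(pages_touched(0x1000, 64, 8, stride=512))  # one element per page
```

With these assumed sizes, the unity-strided cases touch one or two pages, whereas a stride equal to the page size touches a separate page for every element, which is why page-level checking is cheap only in the unity-stride case.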

Context Switching

Because of the asynchronous operation of the vector and scalar processors, the vector context state of a process is separate from its scalar context state. Thus, it is possible for an operating system to swap in a new process to the scalar processor while allowing the vector context of the previous process to remain on the vector processor. When the previous process is swapped out, the vector processor is disabled by the operating system to prevent other processes from accessing this vector context.

If the subsequent processes do not use the vector processor, then the operating system avoids the overhead of saving and subsequently restoring 8 kilobytes (KB) of vector context state for the original process. If another process does use the vector processor, the operating system must reenable the vector processor, save the vector state of the original process, load the vector context of the new process, and, finally, make the vector processor available. This full context switch can take up to 100 microseconds on the VAX 9000 system.

Assuming that only a few processes require the vector processor, it is likely that when the original process is rescheduled to the same scalar/vector pair, the process will find its vector context state residing on the vector processor. By using this technique, which is referred to as "cheap vector context switching," both the VMS and ULTRIX operating systems reduce the time required to swap in a process that uses the vector processor.
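The deferred save and restore described in this section can be sketched as follows. This is a simplified model under stated assumptions, not the VMS or ULTRIX implementation: the class, its fields, and the opaque strings standing in for the 8 KB of vector state are all hypothetical.

```python
class LazyVectorScheduler:
    """Model of deferring the vector context switch until it is needed."""

    def __init__(self):
        self.current = None       # process running on the scalar processor
        self.resident = None      # process whose vector context is on the vector processor
        self.enabled = False      # whether the vector processor accepts instructions
        self.saved = {}           # pid -> saved vector context (opaque placeholder)
        self.full_switches = 0    # number of times another process's state was saved

    def swap_in(self, pid):
        # Scalar context switch only. The vector processor stays disabled
        # unless the resident vector context already belongs to this process.
        self.current = pid
        self.enabled = (self.resident == pid)

    def use_vector(self):
        # Called when the current process issues a vector instruction.
        if self.enabled:
            return "fast path"
        if self.resident is not None and self.resident != self.current:
            # Full context switch: save the previous owner's vector state.
            self.saved[self.resident] = f"vector state of {self.resident}"
            self.full_switches += 1
        # Load (or create) the vector context of the current process.
        self.saved.pop(self.current, None)
        self.resident = self.current
        self.enabled = True
        return "context loaded"


sched = LazyVectorScheduler()
sched.swap_in("A")
sched.use_vector()          # A becomes the owner of the vector context
sched.swap_in("B")          # B never touches the vector processor
sched.swap_in("A")
print(sched.use_vector(), sched.full_switches)
```

In this run, process A finds its vector context still resident after B has run, so no vector state is saved or restored.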

Exceptions

Most of the exceptions encountered by VAX vector instructions are identical to those that occur for VAX scalar instructions. The arithmetic exceptions are exactly the same. The memory management exceptions have been extended to include two new vector exceptions: vector I/O space reference and vector alignment fault. As in the VAX scalar architecture, the reporting of floating underflow and integer overflow exceptions can be disabled by setting the EXC bit in the vector control word.

Vector arithmetic exceptions are reported in an imprecise manner by vector processor disabled faults. When an exception occurs in the processing of a vector element, the vector processor records the exception in both a privileged exception register (the vector arithmetic exception register, VAER) and in the corresponding element of the destination vector register specified by the instruction. The vector processor then disables itself from receiving further vector instructions. However, the vector processor continues to execute the instruction that encountered the exception to completion by processing the remaining vector register elements.
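The behavior described above (record the exception, disable, and still finish the instruction) can be sketched with a toy vector divide. The model is a simplification: the VAER is represented as a set of exception names, and `FAULT_MARKER` is a hypothetical stand-in for whatever value the hardware writes into the faulting destination element.

```python
FAULT_MARKER = None  # hypothetical stand-in for the value stored in a faulting element


class VectorUnit:
    def __init__(self):
        self.enabled = True
        self.vaer = set()   # simplified model of the vector arithmetic exception register

    def divide(self, a, b):
        """Elementwise a / b that runs to completion even if an element faults."""
        if not self.enabled:
            raise RuntimeError("vector processor disabled")
        dest = []
        for x, y in zip(a, b):
            if y == 0:
                self.vaer.add("divide by zero")   # record in the exception register
                self.enabled = False              # refuse further vector instructions
                dest.append(FAULT_MARKER)         # record in the destination element
            else:
                dest.append(x / y)                # remaining elements are still processed
        return dest


vu = VectorUnit()
result = vu.divide([2.0, 1.0, 6.0], [1.0, 0.0, 3.0])
print(result, vu.enabled, vu.vaer)
```

The exception surfaces to software only when a later vector instruction is issued and the scalar processor takes the vector processor disabled fault, which is why the reporting is imprecise.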

As stated earlier, memory management exceptions can be reported precisely by a VAX vector processor to its scalar processor, as the VAX 9000 V-box does, and the scalar processor takes a normal VAX memory management fault. Exception information is placed on the stack in the same format as for scalar memory management exceptions. The use of the same format minimizes the effort needed by an operating system to support these exceptions. Memory management exceptions were extended for vectors to include two new exception parameter bits: vector I/O space reference and vector alignment fault. A vector I/O space reference occurs whenever an attempt is made to load or store vector data to I/O space. Because of the performance degradation of unaligned memory data, a vector alignment fault occurs whenever an element being accessed by a vector memory instruction does not begin at an address that is an integer multiple of the length of the element in bytes. For example, a longword (4-byte) element in memory should begin at an address that is an integer multiple of 4 bytes.
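The alignment rule stated above reduces to a modulus test on each element address. The following sketch uses hypothetical function names and treats the stride as a byte distance between element start addresses.

```python
def is_aligned(address, element_size):
    """True if address is an integer multiple of the element length in bytes."""
    return address % element_size == 0


def first_misaligned(base, stride, n_elements, element_size):
    """Return the index of the first element that would fault, or None."""
    for i in range(n_elements):
        if not is_aligned(base + i * stride, element_size):
            return i
    return None


print(is_aligned(0x1004, 4))                 # longword on a 4-byte boundary
print(is_aligned(0x1006, 4))                 # longword off a 4-byte boundary
print(first_misaligned(0x1000, 6, 4, 4))     # aligned base, stride of 6 bytes
```

Note that an aligned base address is not sufficient: with a stride that is not a multiple of the element length, a later element can still be misaligned.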

Synchronization

In most cases, it is desirable for the vector processor to operate asynchronously with the scalar processor to achieve good performance. However, there are cases in which the operation of the vector and scalar processors must be synchronized to ensure correct results. Rather than forcing the vector processor to detect and automatically provide synchronization in these cases, the architecture provides special instructions, which software can use, to accomplish the synchronization. Some of these instructions are discussed below. Software must determine when to use these synchronization instructions to ensure correct results or establish exception checkpoints. Given the necessary sophistication of vectorizing compilers, this requirement is not onerous.

Vector and scalar memory references may be issued simultaneously. Therefore, these references must be synchronized to prevent a conflict from occurring when accessing shared memory locations. This synchronization is provided by the MSYNC function of the MFVP instruction. Once the MSYNC function is invoked, the scalar processor does not issue further instructions until all previous vector and scalar memory references have completed.

Because the vector and scalar processors execute asynchronously, software cannot determine when a vector exception will be reported. However, software requires that exceptions be reported at certain checkpoints. For example, exceptions incurred in a procedure must be reported within the context of that procedure before another procedure is called. This exception reporting synchronization is provided by the SYNC function of the MFVP instruction. Once SYNC is invoked, the scalar processor does not issue further instructions until the exceptions of previous vector instructions, if any, are reported.
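The role of SYNC as an exception checkpoint can be sketched with a queue of outstanding vector instructions. This is an illustrative model only: the class, the `raises` parameter, and the `procedure` function are hypothetical, and the `sync` method merely imitates the effect of the SYNC function of MFVP rather than its implementation.

```python
from collections import deque


class VectorUnit:
    def __init__(self):
        self.pending = deque()   # issued vector instructions not yet checked

    def issue(self, name, raises=None):
        # The scalar side returns immediately; any exception stays unreported.
        self.pending.append((name, raises))

    def sync(self):
        """Block until every earlier vector instruction has reported its exceptions."""
        reported = []
        while self.pending:
            name, exception = self.pending.popleft()
            if exception is not None:
                reported.append((name, exception))
        return reported


def procedure(vu):
    vu.issue("VVADDF")
    vu.issue("VVDIVF", raises="divide by zero")
    # Checkpoint: exceptions incurred here are reported before returning,
    # so they are attributed to this procedure and not to its caller.
    return vu.sync()


vu = VectorUnit()
print(procedure(vu))
```

Without the checkpoint, the divide exception in this model would remain pending and would be observed only after control had left the procedure that caused it.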

In document dtj v02 04 1990 pdf (Page 66-69)