5 A64 INSTRUCTION SET
5.7 Advanced SIMD
5.7.22 Vector Load-Store Structure
All SIMD load-store structure instructions use the syntax term vaddr as shorthand for the following addressing modes:
[base]
Memory addressed by base register Xn or SP.
[base],Xm
Memory addressed by base register Xn or SP, post-incremented by 64-bit index register Xm.
[base],#imm
Memory addressed by Xn or SP, post-incremented by an immediate value which must equal the total number of bytes transferred to/from memory.
Register notation of the form Vt+n in the register lists below indicates that the register number is required to be equal to (t + n) MOD 32. Furthermore, the list braces “{ }” are literal symbols and do not indicate an optional field as they do elsewhere in this manual.
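The (t + n) MOD 32 rule can be illustrated with a small model. This is only a sketch: the function name register_list and its stride parameter (1 for consecutive lists, 2 for alternating lists) are hypothetical names chosen for illustration, not part of the instruction set.

```python
def register_list(t, count, stride=1):
    """Return the register numbers named by a list that starts at Vt.

    stride=1 models the consecutive form (Vt, Vt+1, ...),
    stride=2 models the alternating form (Vt, Vt+2, ...).
    Register numbers wrap around modulo 32.
    """
    if not 0 <= t <= 31:
        raise ValueError("t must be in the range 0..31")
    if count not in (1, 2, 3, 4):
        raise ValueError("a register list holds one to four registers")
    return [(t + n * stride) % 32 for n in range(count)]


print(register_list(30, 4))            # [30, 31, 0, 1]
print(register_list(29, 3, stride=2))  # [29, 31, 1]
```

Under this model a four-register list starting at V30 wraps around to V0 and V1.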
Like other load-store instructions, the SIMD load-store structure instructions permit arbitrary address alignment unless strict alignment checking is enabled, in which case alignment to the size of the element is checked. However, unlike the general-purpose load-store instructions, the vector load-store instructions make no guarantee of atomicity, even when the address is naturally aligned to the size of the element.
5.7.22.1 Load-Store Multiple Structures
In all of these instructions <T> is one of 8B, 16B, 4H, 8H, 2S, 4S or 2D; additionally, the LD1 and ST1 instructions support the 1D format. The post-increment immediate offset, if present, must be 8, 16, 24, 32, 48 or 64, depending on the number of elements transferred.
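As an illustration of how the permitted immediates arise, the sketch below computes the total bytes transferred as the register width implied by <T> (8 or 16 bytes) multiplied by the number of registers in the list. The names ARRANGEMENT_BYTES and multiple_structure_imm are hypothetical, and the model does not enforce that 1D is limited to LD1 and ST1.

```python
# Bytes held by one register for each arrangement <T>.
ARRANGEMENT_BYTES = {
    "8B": 8, "4H": 8, "2S": 8, "1D": 8,      # 64-bit arrangements
    "16B": 16, "8H": 16, "4S": 16, "2D": 16,  # 128-bit arrangements
}


def multiple_structure_imm(arrangement, num_registers):
    """Return the post-increment immediate required for a multiple-structure transfer."""
    if num_registers not in (1, 2, 3, 4):
        raise ValueError("a register list holds one to four registers")
    return ARRANGEMENT_BYTES[arrangement] * num_registers


# A three-register list of 16B registers transfers 48 bytes.
print(multiple_structure_imm("16B", 3))  # 48

# Every combination lands on one of the permitted values.
all_values = sorted({multiple_structure_imm(a, n)
                     for a in ARRANGEMENT_BYTES for n in (1, 2, 3, 4)})
print(all_values)  # [8, 16, 24, 32, 48, 64]
```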
LD1 {Vt.<T>}, vaddr
Load multiple 1-element structures (to one register)
LD1 {Vt.<T>, Vt+1.<T>}, vaddr
Load multiple 1-element structures (to two consecutive registers)
LD1 {Vt.<T>, Vt+1.<T>, Vt+2.<T>}, vaddr
Load multiple 1-element structures (to three consecutive registers)
LD1 {Vt.<T>, Vt+1.<T>, Vt+2.<T>, Vt+3.<T>}, vaddr
Load multiple 1-element structures (to four consecutive registers)
LD2 {Vt.<T>, Vt+1.<T>}, vaddr
Load multiple 2-element structures (to two consecutive registers)
LD2 {Vt.<T>, Vt+2.<T>}, vaddr
Load multiple 2-element structures (to two alternating registers)
LD3 {Vt.<T>, Vt+1.<T>, Vt+2.<T>}, vaddr
Load multiple 3-element structures (to three consecutive registers)
LD3 {Vt.<T>, Vt+2.<T>, Vt+4.<T>}, vaddr
Load multiple 3-element structures (to three alternating registers)
LD4 {Vt.<T>, Vt+1.<T>, Vt+2.<T>, Vt+3.<T>}, vaddr
Load multiple 4-element structures (to four consecutive registers)
LD4 {Vt.<T>, Vt+2.<T>, Vt+4.<T>, Vt+6.<T>}, vaddr
Load multiple 4-element structures (to four alternating registers)
ST1 {Vt.<T>}, vaddr
Store multiple 1-element structures (from one register)
ST1 {Vt.<T>, Vt+1.<T>}, vaddr
Store multiple 1-element structures (from two consecutive registers)
ST1 {Vt.<T>, Vt+1.<T>, Vt+2.<T>}, vaddr
Store multiple 1-element structures (from three consecutive registers)
ST1 {Vt.<T>, Vt+1.<T>, Vt+2.<T>, Vt+3.<T>}, vaddr
Store multiple 1-element structures (from four consecutive registers)
ST2 {Vt.<T>, Vt+1.<T>}, vaddr
Store multiple 2-element structures (from two consecutive registers)
ST2 {Vt.<T>, Vt+2.<T>}, vaddr
Store multiple 2-element structures (from two alternating registers)
ST3 {Vt.<T>, Vt+1.<T>, Vt+2.<T>}, vaddr
Store multiple 3-element structures (from three consecutive registers)
ST3 {Vt.<T>, Vt+2.<T>, Vt+4.<T>}, vaddr
Store multiple 3-element structures (from three alternating registers)
ST4 {Vt.<T>, Vt+1.<T>, Vt+2.<T>, Vt+3.<T>}, vaddr
Store multiple 4-element structures (from four consecutive registers)
ST4 {Vt.<T>, Vt+2.<T>, Vt+4.<T>, Vt+6.<T>}, vaddr
Store multiple 4-element structures (from four alternating registers)
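The descriptions above do not spell out how elements map to lanes. The sketch below models the usual reading: LDn treats memory as a sequence of n-element structures and places element i of each structure in register i of the list, and STn performs the inverse. Memory is modelled as a Python list of elements rather than bytes, and the names ldn_multiple and stn_multiple are hypothetical.

```python
def ldn_multiple(memory, n, lanes):
    """Model LDn: de-interleave lanes * n elements into n registers.

    memory : list of elements, in address order
    n      : elements per structure (2 for LD2, 3 for LD3, 4 for LD4)
    lanes  : number of lanes per register
    Returns n lists, one per register, each of length lanes.
    """
    elements = memory[: n * lanes]
    if len(elements) < n * lanes:
        raise ValueError("not enough elements in memory")
    # Register i receives element i of every structure.
    return [elements[i::n] for i in range(n)]


def stn_multiple(registers):
    """Model STn: interleave n registers back into structure order."""
    n = len(registers)
    lanes = len(registers[0])
    return [registers[i][j] for j in range(lanes) for i in range(n)]


memory = [0, 1, 2, 3, 4, 5, 6, 7]       # four 2-element structures
regs = ldn_multiple(memory, n=2, lanes=4)
print(regs)                  # [[0, 2, 4, 6], [1, 3, 5, 7]]
print(stn_multiple(regs))    # [0, 1, 2, 3, 4, 5, 6, 7]
```

A typical use of this pattern is splitting interleaved data, such as stereo audio samples or RGB pixels, into one register per channel.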
5.7.22.2 Load-Store Single Structure
In all of these instructions <T> is one of B, H, S or D, except that type B is not available in conjunction with the alternate register variant. The post-increment immediate offset, if present, must be 1, 2, 3, 4, 6, 8, 12, 16, 24 or 32, depending on the number of elements transferred.
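The list of permitted immediates follows from transferring one element per register in the list, so the total is the element size multiplied by the number of registers. A minimal sketch, using the hypothetical names ELEMENT_BYTES and single_structure_imm:

```python
# Size in bytes of one element for each type <T>.
ELEMENT_BYTES = {"B": 1, "H": 2, "S": 4, "D": 8}


def single_structure_imm(element_type, num_registers):
    """Return the post-increment immediate required for a single-structure transfer."""
    if num_registers not in (1, 2, 3, 4):
        raise ValueError("a register list holds one to four registers")
    return ELEMENT_BYTES[element_type] * num_registers


# A three-register list of S elements transfers 12 bytes.
print(single_structure_imm("S", 3))  # 12

single_values = sorted({single_structure_imm(t, n)
                        for t in ELEMENT_BYTES for n in (1, 2, 3, 4)})
print(single_values)  # [1, 2, 3, 4, 6, 8, 12, 16, 24, 32]
```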
LD1 {Vt.<T>}[index], vaddr
Load single 1-element structure to one lane (of one register)
LD2 {Vt.<T>, Vt+1.<T>}[index], vaddr
Load single 2-element structure to one lane (of two consecutive registers)
LD2 {Vt.<T>, Vt+2.<T>}[index], vaddr
Load single 2-element structure to one lane (of two alternating registers)
LD3 {Vt.<T>, Vt+1.<T>, Vt+2.<T>}[index], vaddr
Load single 3-element structure to one lane (of three consecutive registers)
LD3 {Vt.<T>, Vt+2.<T>, Vt+4.<T>}[index], vaddr
Load single 3-element structure to one lane (of three alternating registers)
LD4 {Vt.<T>, Vt+1.<T>, Vt+2.<T>, Vt+3.<T>}[index], vaddr
Load single 4-element structure to one lane (of four consecutive registers)
LD4 {Vt.<T>, Vt+2.<T>, Vt+4.<T>, Vt+6.<T>}[index], vaddr
Load single 4-element structure to one lane (of four alternating registers)
ST1 {Vt.<T>}[index], vaddr
Store single 1-element structure from one lane (of one register)
ST2 {Vt.<T>, Vt+1.<T>}[index], vaddr
Store single 2-element structure from one lane (of two consecutive registers)
ST2 {Vt.<T>, Vt+2.<T>}[index], vaddr
Store single 2-element structure from one lane (of two alternating registers)
ST3 {Vt.<T>, Vt+1.<T>, Vt+2.<T>}[index], vaddr
Store single 3-element structure from one lane (of three consecutive registers)
ST3 {Vt.<T>, Vt+2.<T>, Vt+4.<T>}[index], vaddr
Store single 3-element structure from one lane (of three alternating registers)
ST4 {Vt.<T>, Vt+1.<T>, Vt+2.<T>, Vt+3.<T>}[index], vaddr
Store single 4-element structure from one lane (of four consecutive registers)
ST4 {Vt.<T>, Vt+2.<T>, Vt+4.<T>, Vt+6.<T>}[index], vaddr
Store single 4-element structure from one lane (of four alternating registers)
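The single-structure forms can be modelled as touching exactly one lane per register. The sketch below assumes that a load leaves every other lane of each register unchanged; registers are modelled as Python lists of elements, and the names ldn_single and stn_single are hypothetical.

```python
def ldn_single(registers, index, memory):
    """Model LDn (single structure): load one n-element structure into lane index.

    registers : list of n registers, each a list of lane values
    index     : lane to overwrite in every register
    memory    : list of elements, in address order
    Returns new register contents; all other lanes keep their old values.
    """
    result = [list(reg) for reg in registers]
    for i in range(len(registers)):
        result[i][index] = memory[i]
    return result


def stn_single(registers, index):
    """Model STn (single structure): store lane index of each register."""
    return [reg[index] for reg in registers]


regs = [[0, 0, 0, 0], [0, 0, 0, 0]]
regs = ldn_single(regs, index=2, memory=[11, 22])
print(regs)                       # [[0, 0, 11, 0], [0, 0, 22, 0]]
print(stn_single(regs, index=2))  # [11, 22]
```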
5.7.22.3 Load Single Structure and Replicate
In all of these instructions <T> is one of 8B, 16B, 4H, 8H, 2S, 4S, 1D or 2D. The post-increment immediate offset, if present, must be 1, 2, 3, 4, 6, 8, 12, 16, 24 or 32, depending on the number of elements transferred.
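The replicate forms can be pictured as reading one n-element structure and broadcasting element i to every lane of register i. A minimal sketch under that assumption, with registers modelled as Python lists and the hypothetical name ldnr:

```python
def ldnr(memory, n, lanes):
    """Model LDnR: load one n-element structure and replicate it to all lanes.

    memory : list of elements, in address order
    n      : elements in the structure (1 for LD1R, 2 for LD2R, and so on)
    lanes  : number of lanes per register
    Returns n registers; every lane of register i holds memory[i].
    """
    if len(memory) < n:
        raise ValueError("not enough elements in memory")
    return [[memory[i]] * lanes for i in range(n)]


print(ldnr([7, 9], n=2, lanes=4))  # [[7, 7, 7, 7], [9, 9, 9, 9]]
```

Only n elements are read from memory regardless of the number of lanes, which is why the permitted post-increment immediates match those of the single-structure forms.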
LD1R {Vt.<T>}, vaddr
Load single 1-element structure to all lanes (of one register)
LD1R {Vt.<T>, Vt+1.<T>}, vaddr
Load single 1-element structure to all lanes (of two consecutive registers)
LD2R {Vt.<T>, Vt+1.<T>}, vaddr
Load single 2-element structure to all lanes (of two consecutive registers)
LD2R {Vt.<T>, Vt+2.<T>}, vaddr
Load single 2-element structure to all lanes (of two alternating registers)
LD3R {Vt.<T>, Vt+1.<T>, Vt+2.<T>}, vaddr
Load single 3-element structure to all lanes (of three consecutive registers)
LD3R {Vt.<T>, Vt+2.<T>, Vt+4.<T>}, vaddr
Load single 3-element structure to all lanes (of three alternating registers)
LD4R {Vt.<T>, Vt+1.<T>, Vt+2.<T>, Vt+3.<T>}, vaddr
Load single 4-element structure to all lanes (of four consecutive registers)
LD4R {Vt.<T>, Vt+2.<T>, Vt+4.<T>, Vt+6.<T>}, vaddr