
The following listing shows the ARM startup code used in the LegUp ARM hybrid system.

@==================================================================
@==================================================================
@
@ legup_arm.s
@
@ This file contains initialization and maintenance functions for the ARM Cortex
@ A9 MP Core processor found in the Altera Cyclone V
@
@ It includes:
@  * __legup_init_arm, the entry point for LegUp bare metal code to be run on ARM
@  * Functions for setting up the caches on the ARM Cortex A9 MP Core
@  * Cache maintenance functions
@  * Functions for setting up the MMU, page tables, etc.
@  * A function for enabling the fpga2sdram bridge
@
@ Code partially adapted from:
@  Cortex-A9 Embedded example - startup.s
@  <ACDS install directory>\embedded\ds-5\examples\Bare-metal_examples\DS-5Examples\startup_Cortex\
@  Cortex-A9 MP-Core Embedded example - startup.s
@  <ACDS install directory>\embedded\ds-5\examples\Bare-metal_examples\DS-5Examples\startup_Cortex-A9MPCore\
@  Hardware Library for Altera SoCs - alt_cache.c
@  <ACDS install directory>\embedded\ip\altera\hps\altera_hps\hwlib\src\hwmgr\

@ This is the memory-mapped address of the L2 cache controller (L2C-310) on the
@ Altera Cyclone V SoC
.equ L2CC_PL310, 0xfffef000

@ This is the memory-mapped address of the SDRAM controller configuration
@ on the Altera Cyclone V SoC
.equ SDRAM_CONTROL, 0xffc25000

@ This is the start address of the 64KB On-Chip RAM.
.equ OCRAM_START, 0xffff0000

@ This section includes the following data structures:
@  * Level 1 Translation Table
@  * String "DONE\n"

    .align 14           @ L1 Translation table must be aligned to 16kB boundary
    .skip 0x00004000    @ L1 Translation table takes up 16kB (4k entries X 4bytes each)

    .asciz "cycles: %d\nWARNING: cycle count may include printf()\n"
    .size count_string, 12

@ This code runs immediately after the preloader. It sets up the MMU with a
@ flat translation of the first 1 GB of the address space, and translates the
@ 2nd and 3rd GB to the first GB as well. Essentially any memory access with
@ address[31:30] = 00 or 01 or 10 will access the first GB of memory. Memory
@ accesses to the upper GB will go to the peripherals and FPGA slaves, as usual.
@
@ MPU Address Space      | SDRAM Address Space    | Memory type
@ ========================================================================
@ 0x00000000..0x3FFFFFFF | 0x00000000..0x3FFFFFFF | Cacheable normal memory
@ 0x40000000..0x7FFFFFFF | 0x00000000..0x3FFFFFFF | Cacheable normal memory
@ 0x80000000..0xBFFFFFFF | 0x00000000..0x3FFFFFFF | Cacheable normal memory
@ 0xC0000000..0xFFFFFFFF | 0xC0000000..0xFFFFFFFF | Strongly-ordered memory
@
@ The .startup section has several functions:
@  1) Enable the MMU, set up page tables, enable caches, etc.
@  2) Branch and link to the beginning of the .text section, indicated by
@     __text_start
@  3) Loops forever, should the .text section return here.
@
@==================================================================
@==================================================================
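The aliasing described in the header comment can be modelled in a few lines of C. This is only an illustrative sketch and not part of legup_arm.s: the names `mpu_to_sdram` and `check_address_map` are hypothetical, and the code mirrors the four-row table in the comment rather than the page-table code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of legup_arm.s: returns the SDRAM-side
 * address that the header comment's table assigns to an MPU address. */
uint32_t mpu_to_sdram(uint32_t mpu_addr)
{
    /* address[31:30] == 11: peripherals and FPGA slaves, mapped 1:1 */
    if ((mpu_addr >> 30) == 3u)
        return mpu_addr;
    /* address[31:30] == 00, 01 or 10: all alias the first GB */
    return mpu_addr & 0x3FFFFFFFu;
}

/* Checks one sample address per row of the table. */
void check_address_map(void)
{
    assert(mpu_to_sdram(0x00100000u) == 0x00100000u);
    assert(mpu_to_sdram(0x40100000u) == 0x00100000u);
    assert(mpu_to_sdram(0x80100000u) == 0x00100000u);
    assert(mpu_to_sdram(0xC0100000u) == 0xC0100000u);
}
```

Under this model, three different MPU addresses reach the same SDRAM location, which is why the mapping only makes sense on boards with 1 GB of HPS memory.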

.section .startup

Appendix A. ARM Startup Code 89

    .align
    .global __legup_init_arm
    .type __legup_init_arm #function
@ The linker script sets __legup_init_arm as the program entry point
__legup_init_arm:
    @ enable fpga2sdram bridge and MMU, set up page tables, enable caches...
    LDR sp, =__startup_stack_top

    @ __text_start is defined in the ARM linker script, and is the address of the
    @ beginning of the .text section in this file
    ldr r5, =__text_start

@ Enable the FPGA SDRAM Bridge
@
@ To enable the FPGA to SDRAM bridge it is necessary to configure the SDRAM
@ controller. To do this, we should run code from the on-chip ram so the
@ SDRAM controller is idle. We then put the fpga2sdram peripheral in reset,
@ configure the fpga2sdram peripheral, then bring it out of reset.
@
@------------------------------------------------------------------
@ copy code to on-chip ram
@------------------------------------------------------------------

@ This code gets copied to the On-Chip RAM at 0xFFFF0000
@------------------------------------------------------------------
code_start:
    MOV r1, #0

@ delay to make sure the SDRAM controller is idle
delay_loop:
    ADD r1, r1, #1
    CMP r1, #0x1000
    BNE delay_loop

    @ put fpga2sdram peripheral in reset
    MOV r0, #0xFF000000
    ORR r0, r0, #0x00C20000
    ORR r0, r0, #0x00005000
    MOV r1, #0
    STR r1, [r0,#0x80]      @ FPGAPORTRST

    @ apply the configuration
    LDR r1, [r0,#0x5C]      @ read STATICCFG reg
    ORR r1, r1, #(1 << 3)   @ set APPLYCFG bit
    STR r1, [r0,#0x5C]      @ write STATICCFG reg

    @ take fpga2sdram peripheral out of reset
    MOV r1, #0x3F00
    ORR r1, r1, #0x00FF
    STR r1, [r0,#0x80]      @ FPGAPORTRST

    BX lr
@------------------------------------------------------------------
@ end code copied to On-Chip RAM
@------------------------------------------------------------------
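The relocated code builds 0xFFC25000 (SDRAM_CONTROL) and 0x3FFF from several MOV/ORR immediates instead of loading them in one instruction. An A32 data-processing immediate must be an 8-bit value rotated right by an even amount, and neither constant fits that form. The following C sketch checks this. It is an illustration only: the function names are hypothetical and nothing here comes from legup_arm.s apart from the constants.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by r bits (0 <= r < 32). */
uint32_t rotl32(uint32_t v, unsigned r)
{
    assert(r < 32u);
    return r == 0u ? v : (v << r) | (v >> (32u - r));
}

/* Returns 1 if v fits the A32 "modified immediate" form:
 * an 8-bit value rotated right by an even amount. Rotating v
 * left by the same amount must therefore leave a value <= 0xFF. */
int is_arm_immediate(uint32_t v)
{
    for (unsigned r = 0u; r < 32u; r += 2u) {
        if (rotl32(v, r) <= 0xFFu)
            return 1;
    }
    return 0;
}

/* The three pieces the listing ORs together to form the base address. */
uint32_t sdram_control_base(void)
{
    return 0xFF000000u | 0x00C20000u | 0x00005000u;
}
```

An `LDR r0, =constant` pseudo-instruction would normally place such a value in a literal pool. Whether the listing avoids that because the block is copied to on-chip RAM is an assumption; the source does not say.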


@==================================================================
@==================================================================
@ Enable the Memory Management Unit
@==================================================================

@------------------------------------------------------------------
@ Disable L2CC in case it was left enabled from an earlier run
@ This does not need to be done from a cold reset
@------------------------------------------------------------------

    LDR r0, =L2CC_PL310         @VA = PA
    @ disable L2 via control register = 0x0
    LDR r1, =0x0
    STR r1, [r0,#0x100]         @ control register at offset 0x100


@------------------------------------------------------------------
@ Disable caches, MMU and branch prediction in case they were left enabled from an earlier run
@ This does not need to be done from a cold reset
@------------------------------------------------------------------

    MRC p15, 0, r0, c1, c0, 0   @ Read CP15 System Control register
    BIC r0, r0, #(0x1 << 12)    @ Clear I bit 12 to disable I Cache
    BIC r0, r0, #(0x1 << 2)     @ Clear C bit 2 to disable D Cache
    BIC r0, r0, #0x1            @ Clear M bit 0 to disable MMU
    BIC r0, r0, #(0x1 << 11)    @ Clear Z bit 11 to disable branch prediction
    MCR p15, 0, r0, c1, c0, 0   @ Write value back to CP15 System Control register



@------------------------------------------------------------------
@ Invalidate Data and Instruction TLBs and branch predictor
@------------------------------------------------------------------

    MOV r0,#0
    MCR p15, 0, r0, c8, c7, 0   @ I-TLB and D-TLB invalidation
    MCR p15, 0, r0, c7, c5, 6   @ BPIALL - Invalidate entire branch predictor array


@------------------------------------------------------------------
@ Cache Invalidation code for Cortex-A9
@------------------------------------------------------------------

    @ Invalidate L1 Instruction Cache

    MRC p15, 1, r0, c0, c0, 1   @ Read Cache Level ID Register (CLIDR)
    TST r0, #0x3                @ Harvard Cache?
    MOV r0, #0                  @ SBZ
    MCRNE p15, 0, r0, c7, c5, 0 @ ICIALLU - Invalidate instruction cache and flush branch target cache

    @ Invalidate Data/Unified Caches

    MRC p15, 1, r0, c0, c0, 1   @ Read CLIDR
    ANDS r3, r0, #0x07000000    @ Extract coherency level
    MOV r3, r3, LSR #23         @ Total cache levels << 1
    BEQ Finished                @ If 0, no need to clean

    MOV r10, #0                 @ R10 holds current cache level << 1
Loop1:
    ADD r2, r10, r10, LSR #1    @ R2 holds cache "Set" position
    MOV r1, r0, LSR r2          @ Bottom 3 bits are the Cache-type for this level
    AND r1, r1, #7              @ Isolate those lower 3 bits
    CMP r1, #2
    BLT Skip                    @ No cache or only instruction cache at this level

    MCR p15, 2, r10, c0, c0, 0  @ Write the Cache Size selection register
    ISB                         @ ISB to sync the change to the CacheSizeID reg
    MRC p15, 1, r1, c0, c0, 0   @ Reads current Cache Size ID register
    AND r2, r1, #7              @ Extract the line length field
    ADD r2, r2, #4              @ Add 4 for the line length offset (log2 16

    MOV r9, r4                  @ R9 working copy of the max way size (right aligned)

Loop3:
    ORR r11, r10, r9, LSL r5    @ Factor in the Way number and cache number into R11
    ORR r11, r11, r7, LSL r2    @ Factor in the Set number
    MCR p15, 0, r11, c7, c6, 2  @ Invalidate by Set/Way
    SUBS r9, r9, #1             @ Decrement the Way number
    BGE Loop3
    SUBS r7, r7, #1             @ Decrement the Set number
    BGE Loop2
Skip:
    ADD r10, r10, #2            @ increment the cache number
    CMP r3, r10

@------------------------------------------------------------------
@ Clear Branch Prediction Array
@------------------------------------------------------------------
    MOV r0, #0
    MCR p15, 0, r0, c7, c5, 6   @ BPIALL - Invalidate entire branch predictor array


@------------------------------------------------------------------
@ Cortex-A9 MMU Configuration
@ Set translation table base
@------------------------------------------------------------------


    @ Cortex-A9 supports two translation tables
    @ Configure translation table base (TTB) control register cp15,c2
    @ to a value of all zeros, indicates we are using TTB register 0.
    @ see section B4.1.153 of the ARM ARM

    MOV r0,#0x0
    MCR p15, 0, r0, c2, c0, 2

    @ write the address of our page table base to TTB register 0
    @ see section B4.1.154 of the ARM ARM
    LDR r0, =L1_TTB
    MOV r1, #0x08               @ RGN=b01 (outer cacheable write-back cached, write allocate)
    ORR r1,r1,#0x02             @ memory is shareable
    ORR r1,r1,#0x40             @ IRGN=b01 (inner cacheability for the translation table walk is Write-back Write-allocate)

@ Generate the page tables
@ Build a flat translation table for the whole address space.
@ ie: Create 4096 1MB sections from 0x000xxxxx to 0xFFFxxxxx
@

@ Bits[31:20] - Top 12 bits of VA is pointer into table
@ nG[17]=0 - Non global, enables matching against ASID in the TLB when set.
@ S[16]=1 - Indicates normal memory is shared when set.
@ AP2[15]=0
@ AP[11:10]=11 - Configure for full read/write access in all modes
@ TEX[14:12]=
@ CB[3:2]=
@


@ IMPP[9]=0 - Ignored
@ Domain[8:5]=0 - Set all pages to use domain 0
@ XN[4]=0 - Execute never disabled
@ Bits[1:0]=10 - Indicate entry is a 1MB section
@------------------------------------------------------------------

@ templates for Strongly-ordered and normal memory regions
@ The suffix denotes _TEX[2:0]_CB
@
@ Refer to section B3.8.2 of ARMv7-A Architecture Reference Manual
@ And Figure B3-4 in section B3.5.1
@
.equ L1_STRONGLY_ORDERED, 0x00000c02 @ strongly-ordered memory
.equ L1_DEVICE, 0x00000c02           @ device memory

.equ L1_NORMAL_000_10, 0x00010c0a @ outer and inner write-through, no write-allocate
.equ L1_NORMAL_000_11, 0x00010c0e @ outer and inner write-back, no write-allocate
.equ L1_NORMAL_001_11, 0x00011c0e @ outer and inner write-back, write-allocate
.equ L1_NORMAL_101_01, 0x00015c06 @ outer and inner write-back, write-allocate
.equ L1_NORMAL_110_10, 0x00016c0a @ outer and inner write-through, no write-allocate
.equ L1_NORMAL_101_10, 0x00015c0a @ outer write-back, write-allocate; inner write-through, no write-allocate
.equ L1_NORMAL_111_11, 0x00017c0e @ outer and inner write-back, no write-allocate
.equ L1_NORMAL_111_01, 0x00017c06 @ outer and inner write-back, outer no write-allocate, inner write-allocate
.equ L1_NORMAL_101_11, 0x00015c0e @ outer and inner write-back, outer write-allocate, inner no write-allocate
.equ L1_NORMAL_110_11, 0x00016c0e @ outer write-through, inner write-back, no write-allocate
.equ L1_NORMAL_111_10, 0x00017c0a @ outer write-back, no write-allocate
                                  @ inner write-through, no write-allocate

@ Set up the page table/translation table such that the first three GB of the
@ MPU address space all point to the first GB of the SDRAM address space, since
@ the DE1-SoC and SoCKit only have 1 GB of DDR3 memory on the HPS side.
@ This should be generalized later to work for boards with other quantities of
@ memory.
@
@ This address mapping is created:
@ MPU Address Space      | SDRAM Address Space    | Memory type             | loop
@ =====================================================================================
@ 0x00000000..0x3FFFFFFF | 0x00000000..0x3FFFFFFF | Cacheable normal memory | init_ttb_1
@ 0x40000000..0x7FFFFFFF | 0x00000000..0x3FFFFFFF | Cacheable normal memory | init_ttb_2
@ 0x80000000..0xBFFFFFFF | 0x00000000..0x3FFFFFFF | Cacheable normal memory | init_ttb_3
@ 0xC0000000..0xC7FFFFFF | 0xC0000000..0xC7FFFFFF | write-through cacheable | init_ttb_FPGA_mem
@ 0xC8000000..0xFFFFFFFF | 0xC8000000..0xFFFFFFFF | Strongly-ordered memory | init_ttb_SO_mem


    LDR r0, =L1_TTB
    LDR r1, =0xfff                  @ loop counter
    @ r0 contains the address of the translation table base
    @ r1 is loop counter
    @ r2 will be the level1 descriptor (bits 19:0 of each table entry)

    @ use loop counter to create 4096 individual table entries.
    @ this writes from address TTB_Base +
    @ offset 0x3FFC down to offset 0x0 in word steps (4 bytes)

    @ compare LMA and VMA.
    LDR r2, =__text_lma

@ Setup flat-mapped translation table
@------------------------------------------------------------------

@ set the normal memory type
.equ NORMAL_MEM_TYPE, L1_NORMAL_101_01

@ if LMA and VMA are the same, do a flat mapping
init_ttb_flat_map:
    CMP r1, #0xC00
    LDRPL r2, =L1_STRONGLY_ORDERED  @ set upper 1/4 of memory to STRONGLY_ORDERED memory
    LDRMI r2, =NORMAL_MEM_TYPE      @ set lower 3/4 of memory to normal mode
    ORR r3, r2, r1, LSL#20          @ R3 now contains full level1 descriptor to write
    STR r3, [r0, r1, LSL#2]         @ Str table entry at TTB base + loopcount*4
    SUBS r1, r1, #1                 @ Decrement loop counter
    BPL init_ttb_flat_map
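The flat-mapping loop above can be restated in C to make the descriptor arithmetic easier to follow. This is a hypothetical model for illustration, not code from the LegUp sources: `build_flat_ttb` is an invented name, and only the two descriptor templates and the 0xC00 threshold are taken from the listing.

```c
#include <assert.h>
#include <stdint.h>

#define L1_STRONGLY_ORDERED 0x00000c02u /* template value from the listing */
#define NORMAL_MEM_TYPE     0x00015c06u /* L1_NORMAL_101_01 in the listing */
#define L1_ENTRIES          4096u       /* one entry per 1 MB section */

/* Hypothetical C model of init_ttb_flat_map. Like the assembly, it
 * counts down from entry 0xFFF to entry 0 and stores
 * template | (index << 20) in each slot. */
void build_flat_ttb(uint32_t *ttb)
{
    assert(ttb != 0);
    for (int32_t i = (int32_t)L1_ENTRIES - 1; i >= 0; i--) {
        uint32_t index = (uint32_t)i;
        /* Sections at or above 0xC0000000 are strongly ordered,
         * everything below is cacheable normal memory. */
        uint32_t attrs = (index >= 0xC00u) ? L1_STRONGLY_ORDERED
                                           : NORMAL_MEM_TYPE;
        ttb[index] = attrs | (index << 20);
    }
}
```

For example, section 0xBFF receives 0xBFF15C06 and section 0xC00 receives 0xC0000C02: the top 12 bits carry the physical section base, and the low 20 bits carry the attribute template.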

@ Setup shifted translation table
@------------------------------------------------------------------

@ if LMA and VMA differ, subtract __vma_offset for all sections >= __text_vma
init_ttb_shifted_map:

@ STRONGLY_ORDERED memory is non-cacheable, and flat-mapped
init_ttb_SO_mem:
    CMP r1, #0xC80
    BMI init_ttb_FPGA_mem           @ if less than 0xC8000000 go to next 1/4
    LDR r2, =L1_STRONGLY_ORDERED    @ set upper 1/4 of memory to STRONGLY_ORDERED memory
    ORR r3, r2, r1, LSL#20          @ R3 now contains full level1 descriptor to write
    STR r3, [r0, r1, LSL#2]         @ Str table entry at TTB base + loopcount*4
    SUBS r1, r1, #1                 @ Decrement loop counter
    B init_ttb_SO_mem

init_ttb_FPGA_mem:
    CMP r1, #0xC00
    BMI init_ttb_shifted_mem        @ if less than 0xC0000000 go to next 1/4
    LDR r2, =L1_NORMAL_110_10       @ write-through no write-allocate cacheable
    ORR r3, r2, r1, LSL#20          @ R3 now contains full level1 descriptor to write
    STR r3, [r0, r1, LSL#2]         @ Str table entry at TTB base + loopcount*4
    SUBS r1, r1, #1                 @ Decrement loop counter
    B init_ttb_FPGA_mem

@ shifted memory is cacheable and mapped as follows: LMA = VMA - __vma_offset
init_ttb_shifted_mem:


    CMP r1, r5
    BMI init_ttb_startup            @ if less than __text_vma start flat mapping for startup code
    LDR r2, =NORMAL_MEM_TYPE
    SUBS r3, r1, r4                 @ subtract __vma_offset for this page
    ORR r3, r2, r3, LSL#20          @ R3 now contains full level1 descriptor to write
    STR r3, [r0, r1, LSL#2]         @ Str table entry at TTB base + loopcount*4
    SUBS r1, r1, #1                 @ Decrement loop counter
    B init_ttb_shifted_mem

@ startup memory is cacheable, and flat-mapped
init_ttb_startup:
    LDR r2, =NORMAL_MEM_TYPE
    ORR r3, r2, r1, LSL#20          @ R3 now contains full level1 descriptor to write
    STR r3, [r0, r1, LSL#2]         @ Str table entry at TTB base + loopcount*4
    SUBS r1, r1, #1                 @ Decrement loop counter
    BPL init_ttb_startup

@------------------------------------------------------------------
@ Setup domain control register - Set all domains to master mode
@------------------------------------------------------------------

setup_domain_control_reg:
    MRC p15, 0, r0, c3, c0, 0       @ Read Domain Access Control Register
    LDR r0, =0xFFFFFFFF             @ Initialize every domain entry to 0b11 (master)
    MCR p15, 0, r0, c3, c0, 0       @ Write Domain Access Control Register


@------------------------------------------------------------------
@ Enable MMU
@ Leave the caches disabled
@------------------------------------------------------------------

_enable_mmu:
    MRC p15, 0, r0, c1, c0, 0       @ Read CP15 System Control register
    BIC r0, r0, #(0x1 << 12)        @ Clear I bit 12 to disable I Cache
    BIC r0, r0, #(0x1 << 2)         @ Clear C bit 2 to disable D Cache
    BIC r0, r0, #0x2                @ Clear A bit 1 to disable strict alignment fault checking
    ORR r0, r0, #0x1                @ Set M bit 0 to enable MMU before scatter loading
    MCR p15, 0, r0, c1, c0, 0       @ Write CP15 System Control register

    @ Now the MMU is enabled, virtual to physical address translations will occur. This will affect the next
    @ instruction fetch.
    @
    @ The two instructions currently in the ARM pipeline will have been fetched before the MMU was enabled.
    @ The branch back to main is safe because the Virtual Address (VA) is the same as the Physical Address (PA)
    @ (flat mapping) of this code that enables the MMU and performs the branch


@ Initialization for L2 Cache Controller
@==================================================================
@==================================================================
    .align
    .global initialize_L2C

@------------------------------------------------------------------
@ In this example PL310 PA = VA. The memory was marked as Strongly-ordered memory
@ in previous stages when defining CORE0 private address space
    LDR r0, =L2CC_PL310

    @ Disable L2 Cache controller just in case it is already on
    LDR r1, =0x0

    ORR r1, r1, #(1 << 29)      @ Instruction prefetch enable
    ORR r1, r1, #(1 << 28)      @ Data prefetch enable
    STR r1, [r0,#0x104]         @ auxiliary control reg at offset 0x104

    @ Set tag RAM latency
    @ 1 cycle RAM write access latency
    @ 1 cycle RAM read access latency
    @ 1 cycle RAM setup latency
    LDR r1, =0x00000000
    STR r1, [r0,#0x108]         @ tag ram control reg at offset 0x108

    @ Set Data RAM latency
    @ 1 cycle RAM write access latency
    @ 2 cycles RAM read access latency
    @ 1 cycle RAM setup latency
    LDR r1, =0x00000000
    STR r1, [r0,#0x10C]         @ data ram control reg at offset 0x10C

    @ Cache maintenance - invalidate by way (0xff) - base offset 0x77C
    LDR r1, =0xFF
    STR r1, [r0,#0x77C]         @ invalidate by way register at offset 0x77C
poll_invalidate:
    LDR r1, [r0,#0x77C]         @ invalidate by way register at offset 0x77C
    TST r1, #1
    BNE poll_invalidate

    @ Ensure L2 remains disabled for the time being
    LDR r1, =0x0

    MRC p15, 4, r0, c15, c0, 0  @ Read periph base address
    @ SCU offset from base of private peripheral space = 0x000

    @ Set up address filtering


    @ From the ARM Cortex A9 MPCore TRM:
    @ When Address Filtering is enabled, SCU Control Register bit [1] = 1, any access that fits in the
    @ address range between the Filtering Start Address and the Filtering End Address is issued on the
    @ AXI Master port M1. All other accesses outside of this range are directed onto AXI Master port
    @ M0.
    @ We set up address filtering here so deadlocks do not occur.
    @ When the CPU reads results back from a LegUp accelerator, the accelerator
    @ asserts waitrequest until it is finished. If the accelerator tries to
    @ read global memory through the ACP, it is possible that the read may be
    @ issued to the same port, causing a deadlock. By enabling address filtering
    @ we can ensure the CPU->Accelerator read and Accelerator->Memory read are
    @ issued to different SCU ports.
    @ See section 2.2.5, 2.2.6, 2.3.5 in ARM Cortex A9 MPCore TRM
    MOV r1, #0xC0000000         @ FPGA Slaves start
    STR r1, [r0, #0x40]         @ Write Address Filtering Start Register
    MOV r1, #0xFC000000         @ Peripherals Start
    STR r1, [r0, #0x44]         @ Write Address Filtering End Register

    @ Turn on SCU with Address Filtering
    @ See section 2.2.1 in ARM Cortex A9 MPCore TRM
    @ SCU Control Register offset from base of private peripheral space = 0x00
    LDR r1, [r0, #0x0]          @ Read the SCU Control Register
    ORR r1, r1, #0x2            @ Set bit 1 (Address Filtering Enable bit)
    ORR r1, r1, #0x1            @ Set bit 0 (The Enable bit)
    STR r1, [r0, #0x0]          @ Write back modified value

@ See section 4.3.10 of the ARM Cortex A9 TRM
@==================================================================

    MRC p15, 0, r0, c1, c0, 1   @ Read Auxiliary Control Register
    ORR r0, r0, #(0x1 << 6)     @ Set bit 6 to enable SMP
    MCR p15, 0, r0, c1, c0, 1   @ Write Auxiliary Control Register

@ Enable caches and branch prediction
@==================================================================

    MRC p15, 0, r0, c1, c0, 0   @ Read System Control Register
    BIC r0, r0, #(0x1 << 28)    @ clear TRE bit 28 to disable TEX remap (cleared by default)
    ORR r0, r0, #(0x1 << 12)    @ Set I bit 12 to enable I Cache
    ORR r0, r0, #(0x1 << 2)     @ Set C bit 2 to enable D Cache
    ORR r0, r0, #(0x1 << 11)    @ Set Z bit 11 to enable branch prediction
    MCR p15, 0, r0, c1, c0, 0   @ Write System Control Register


@ Configure the Generic Interrupt Controller (GIC)
@
@ Taken from Using the ARM Generic Interrupt Controller
@ ftp://ftp.altera.com/up/pub/Altera_Material/14.0/Tutorials/Using_GIC.pdf

    /* CONFIG_INTERRUPT (int_ID (R0), CPU_target (R1)); */
    MOV R0, #72             // accelerator Interrupt ID = 72 (0 in qsys)
    MOV R1, #1              // this field is a bit-mask; bit 0 targets cpu0
    BL CONFIG_INTERRUPT

    /* Configure the GIC CPU Interface */
    LDR R0, =0xFFFEC100     // base address of CPU Interface
    /* Set the Interrupt Priority Mask Register (ICCPMR) */
    LDR R1, =0xFFFF         // enable interrupts of all priority levels
    STR R1, [R0, #0x04]

    /* Set the enable bit in the CPU Interface Control Register (ICCICR) */
    MOV R1, #1
    STR R1, [R0]

    /* Set the enable bit in the Distributor Control Register (ICDDCR) */
    LDR R0, =0xFFFED000

@ Configure Set Enable Registers (ICDISERn) and Interrupt Processor Target Registers (ICDIPTRn).
@ The default (reset) values are used for other registers in the GIC.
@ Arguments: R0 holds the Interrupt ID (N), and R1 holds the CPU target
@
@ Taken from Using the ARM Generic Interrupt Controller
@ ftp://ftp.altera.com/up/pub/Altera_Material/14.0/Tutorials/Using_GIC.pdf

    /* Configure Interrupt Set-Enable Registers (ICDISERn).
     * reg_offset = integer_div(N / 32) * 4; value = 1 << (N mod 32) */
    LSR R4, R0, #3          // calculate reg_offset
    BIC R4, R4, #3          // R4 = reg_offset
    LDR R2, =0xFFFED100
    ADD R4, R2, R4          // R4 = address of ICDISER



    AND R2, R0, #0x1F       // N mod 32
    MOV R5, #1              // enable
    LSL R2, R5, R2          // R2 = value
    /* using address in R4 and value in R2 set the correct bit in the GIC register */
    LDR R3, [R4]            // read current register value
    ORR R3, R3, R2          // set the enable bit
    STR R3, [R4]            // store the new register value

    /* Configure Interrupt Processor Targets Register (ICDIPTRn).
     * reg_offset = integer_div(N / 4) * 4; index = N mod 4 */

    /* using address in R4 and value in R2, write to (only) the appropriate byte */
    STRB R1, [R4]
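The register arithmetic in CONFIG_INTERRUPT can be checked with a small C model. This is a sketch for illustration only: the function names are hypothetical, and `ICDIPTR_BASE` is an assumption (distributor base 0xFFFED000 plus the GIC's 0x800 offset), since the lines that load it are not visible in this listing.

```c
#include <assert.h>
#include <stdint.h>

#define ICDISER_BASE 0xFFFED100u /* loaded into R2 in the listing */
#define ICDIPTR_BASE 0xFFFED800u /* assumption: not shown in the listing */

/* Address of the ICDISERn word that holds the enable bit for ID n. */
uint32_t icdiser_addr(uint32_t n)
{
    assert(n < 1020u); /* IDs 1020-1023 are special in the GIC */
    return ICDISER_BASE + (n / 32u) * 4u;
}

/* The same address computed the way the assembly does it:
 * LSR #3 followed by BIC #3. */
uint32_t icdiser_addr_asm_style(uint32_t n)
{
    return ICDISER_BASE + ((n >> 3) & ~3u);
}

/* Bit to set inside that word: 1 << (N mod 32). */
uint32_t icdiser_mask(uint32_t n)
{
    return 1u << (n % 32u);
}

/* Byte address of the ICDIPTRn target field for ID n:
 * reg_offset = (N / 4) * 4, index = N mod 4. */
uint32_t icdiptr_byte_addr(uint32_t n)
{
    return ICDIPTR_BASE + (n / 4u) * 4u + (n % 4u);
}
```

For the accelerator interrupt (ID 72) this gives ICDISER word 0xFFFED108 with mask 0x100, and, under the assumed base, target byte 0xFFFED848.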

@ This section includes the following functions:
@  * enable_L1_D_side_prefetch()

@ This code will be run from the virtual address space. It has several
@ responsibilities:
@  * Set the stack pointer
@  * Initialize the bss section to zero
@  * Branch to main
@  * Print "DONE"
@  * Loop forever
@
@ The symbol __text_start, set by the linker script, should point to this code
@==================================================================

    @ set the stack pointer
    LDR sp, =__stack_top

@ branch and link to main
__call_main:

@ suspend (Wait For Interrupt)
__suspend:
    wfi

@ loop forever, just in case WFI doesn’t work
__end_loop:

    MRC p15, 0, r0, c1, c0, 0   @ Read System Control Register
    ORR r0, r0, #(0x1 << 2)     @ Set C bit 2 to enable D Cache
    MCR p15, 0, r0, c1, c0, 0   @ Write System Control Register

    MRC p15, 0, r0, c1, c0, 0   @ Read System Control Register
    ORR r0, r0, #(0x1 << 12)    @ Set I bit 12 to enable I Cache
    MCR p15, 0, r0, c1, c0, 0   @ Write System Control Register

    BX lr



    MRC p15, 0, r0, c1, c0, 0   @ Read System Control Register
    ORR r0, r0, #(0x1 << 11)    @ Set Z bit 11 to enable branch prediction
    MCR p15, 0, r0, c1, c0, 0   @ Write System Control Register

@ Enable Store buffer device limitation
@==================================================================

    LDR r1, [r0,#0x104]         @ auxiliary control reg at offset 0x104
    ORR r1, r1, #(0x1 << 11)    @ set bit 11 in aux. control reg to enable store buffer device limitation
    STR r1, [r0,#0x104]         @ auxiliary control reg at offset 0x104

@ Clean and Invalidate L2 Cache
@
@ Clean & Invalidate L2
@ SYNC
@
@==================================================================

    LDR r0, =L2CC_PL310         @ L2C Controller base address

    LDR r1, =0xFFFF             @ 16 bits of 1’s to clean and invalidate all ways
    STR r1, [r0,#0x7fC]         @ clean and invalidate by way register

@==================================================================
@ cache SYNC: write a 0, then wait for a 0

    LDR r1, =0x0
    STR r1, [r0,#0x730]         @ write a 0 to sync register
sync_loop:
    DSB sy                      @ full SYstem dsb - See A8.8.43 of ARM ARM
    LDR r1, [r0,#0x730]         @ read sync register
    CMP r1, #1
    BEQ sync_loop               @ loop until sync register is 0

    LDR r0, =L2CC_PL310         @ L2C Controller base address

    LDR r1, =0xFFFF             @ 16 bits of 1’s to clean all ways
    STR r1, [r0,#0x7bC]         @ clean by way register

@==================================================================
@ cache SYNC: write a 0, then wait for a 0

    LDR r1, =0x0
    STR r1, [r0,#0x730]         @ write a 0 to sync register
clean_sync_loop:
    DSB sy                      @ full SYstem dsb - See A8.8.43 of ARM ARM
    LDR r1, [r0,#0x730]         @ read sync register
    CMP r1, #1
    BEQ clean_sync_loop         @ loop until sync register is 0



    LDR r1, [r0,#0x104]         @ auxiliary control reg at offset 0x104
    ORR r1, r1, #(0x1 << 30)    @ set bit 30 in aux. control reg to enable early BRESP
    STR r1, [r0,#0x104]         @ auxiliary control reg at offset 0x104


    LDR r1, [r0,#0x104]         @ auxiliary control reg at offset 0x104
    ORR r1, r1, #(0x1 << 0)     @ set bit 0 in aux. control reg to enable write full line of 0’s
    STR r1, [r0,#0x104]         @ auxiliary control reg at offset 0x104


    @ enable L2C-310
    LDR r1, =0x1
    STR r1, [r0,#0x100]

    MRC p15, 0, r0, c1, c0, 1   @ Read processor’s Auxiliary Control Register
    ORR r0, r0, #(0x1 << 3)     @ enable write full line of zeros
    MCR p15, 0, r0, c1, c0, 1   @ Write processor’s Auxiliary Control Register

@ Enable L1 D-side prefetch (A9 specific)
@==================================================================

    MRC p15, 0, r0, c1, c0, 1   @ Read Auxiliary Control Register
    ORR r0, r0, #(0x1 << 2)     @ Set DP bit 2 to enable L1 Dside prefetch
    MCR p15, 0, r0, c1, c0, 1   @ Write Auxiliary Control Register

    BX lr

@==================================================================
@==================================================================
@ Enable L2 prefetch hint
@ Note: The Preload Engine can also be programmed to improve L2 hit
@ rates. This may be too much work though.
@==================================================================

    MRC p15, 0, r0, c1, c0, 1   @ Read Auxiliary Control Register
    ORR r0, r0, #(0x1 << 1)     @ Set bit 1 to enable L2 prefetch hint
    MCR p15, 0, r0, c1, c0, 1   @ Write Auxiliary Control Register

    LDR r1, [r0,#0x104]         @ auxiliary control reg at offset 0x104
    ORR r1, r1, #(0x1 << 12)    @ set bit 12 in aux. control reg to enable
