A first attempt at learning about optimizing the TigerSHARC code
TigerSHARC assembly syntax
What we NOW KNOW!
• Can we return from an assembly language routine without crashing the processor?
• Return a parameter from assembly language routine – (Is it same for ints and floats?)
• Pass parameters into assembly language – (Is it same for ints and floats?)
• Do IF THEN ELSE statements
• Read and write values to memory
• Read and write values in a loop
• Do some mathematics on the values fetched from memory
All this stuff is demonstrated by coding HalfWaveRectifyASM( )
10/13/2010
TigerSHARC assemble code 3, M. Smith, ECE, University of Calgary,
Canada
3 / 28
Next Sprint stage
• Debug mode for C++ function
– Works
• Release mode (optimized) for C++ function
– Works
• First attempt at integer ASM function
– Works
• Next stage – Test for speed
• Then test for difference between integer and floating point speed
Tests for timing test code
Not bad for a first effort
Faster than compiler in debug mode
Cut and paste float version
Where did the float ASM code suddenly appear from?
• Integer 0 has bit pattern 0x0000 0000
• Float 0.0 has bit pattern 0x0000 0000
• Integer +6 has format PLUS FORMAT b 0??? ???? ???? ???? ???? ???? ???? ????
We know more than the compiler in this example
• Float +6.0 has format PLUS FORMAT b 0### #### #??? ???? ???? ???? ???? ????
• Integer ‐6 has format MINUS FORMAT b 1??? ???? ???? ???? ???? ???? ???? ????
FLOATING POINT EXPONENT SHOWN AS #####
• Float ‐6.0 has format MINUS FORMAT b 1### #### #??? ???? ???? ???? ???? ????
• Formats are very different, but the sign bit is in the same place
• Float algorithm ‐ if S == 1 (negative) set to zero, otherwise leave unchanged – same as integer algorithm
• Just re‐use integer algorithm with a change of name
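The sign-bit argument can be sketched in C++. This is an illustration only, not the lecture's ASM: the names rectifyBits, halfWaveRectifyInt and halfWaveRectifyFloat are hypothetical, and the float is assumed to be a 32-bit IEEE-754 value.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");

// Zero the 32-bit word if its top (sign) bit is set, otherwise leave it alone.
// The test is identical whether the word holds an int or a float.
static std::uint32_t rectifyBits(std::uint32_t bits) {
    return (bits & 0x80000000u) ? 0u : bits;
}

// Integer version: copy the bits out, apply the sign-bit test, copy back.
std::int32_t halfWaveRectifyInt(std::int32_t value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    bits = rectifyBits(bits);
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Float version: the same algorithm "with a change of name".
float halfWaveRectifyFloat(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    bits = rectifyBits(bits);
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Both functions share rectifyBits, which is the point of the slide: bit pattern 0x0000 0000 is zero in both formats, so no float arithmetic is needed.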
Interesting observations
• “C” Debug float and integer are about the same in timing
• “C” Release float is much slower than Release integer
• Our float ASM is slightly slower than integer ASM
• Extra jump (10 cycles?) split across 160 operations – not very much
How does the OLD 4.5 compiler do it faster?
Look at C++ source code and use mixed mode to show
• Warning – out of order instructions displayed
How does the LATEST 5.0 compiler do it faster?
Look at source code and use mixed mode to show
MINOR DIFFERENCES BETWEEN COMPILERS?
Many new and parallel instructions.
The ones inside the loop are key – they give the biggest bang for the buck for each change
How important is coding whether a conditional jump is predicted or not predicted (NP)?
0.0926 uS / pt → 0.0752 uS / pt – BIG, but data dependent
Many new instructions. Many parallel instructions. Ones inside loop are key
Change            Before (uS / pt)   After (uS / pt)
JMP (NP)          0.092              0.075
XR1 not J1        0.075              0.074
How important is not using J registers as a destination when reading from memory?
XR1 rather than J1
Now need condition XALT rather than JLT
PASS rather than COMP with 0
Many new instructions. Many parallel instructions. Ones inside loop are key
Change            Before (uS / pt)   After (uS / pt)
JMP (NP)          0.092              0.075
XR1 not J1        0.075              0.074
and ++ operator   0.074              0.072
How important is not using J registers as a destination when reading from memory, and using pointers (*pt++) rather than array (pt[count])?
XR1 rather than J1
Now need condition XALT rather than JLT
PASS (MOVE) rather than COMP WITH 0 (MATH)
Redoing our code to this point.
Note new instructions using XR2 and R2
Try a little thing. R2 = 0 is a constant – move it outside the loop
Data dependent. Will make a difference 1 time in 5 with this data
0.072 → 0.0717 uS / pt ‐‐ OPTIMIZATION TECHNIQUE – HOW MUCH TIME SAVED?
The IF THEN JUMPS in the loop are killing the pipeline.
Rewrite C++ code into optimized form
• Reduce loop size from 6 cycles if > 0 and 7 cycles if < 0 to 4 either way.
The jumps were costing us 9 cycles by disrupting the TigerSHARC pipeline
FLOATasm 0.072 uS → 0.038 uS
INTEGERasm = 0.038 uS too – but release C++ = 0.019 uS
Our ASM is still too slow
Need to get rid of this jump and counter increment.
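One way to picture the "optimized form" is a C++ loop body with no IF THEN jump in it. The sketch below is an assumption about what such a rewrite could look like, not the code used in the lecture; both function names are hypothetical, and the mask trick relies on two's-complement integers.

```cpp
#include <cassert>
#include <cstddef>

// Jump-based form: one conditional branch per element.
void halfWaveRectifyBranchy(const int* in, int* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (in[i] < 0) {
            out[i] = 0;
        } else {
            out[i] = in[i];
        }
    }
}

// Select form: the same work is done on every pass, so the loop body
// has no data-dependent jump to disrupt the pipeline.
void halfWaveRectifySelect(const int* in, int* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const int v = in[i];
        const int keep = -static_cast<int>(v >= 0);  // all ones when v >= 0, else all zeros
        out[i] = v & keep;
    }
}
```

Whether a compiler turns the first form into the second by itself depends on the compiler and optimization level, which is why comparing debug and release timings is worthwhile.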
Blackfin has hardware loops. Does the TigerSHARC? – Duh!!
Many new and parallel instructions. Ones inside loop are key – biggest bang per change
Change            Before (uS / pt)   After (uS / pt)
JMP (NP)          0.092              0.075
XR1 not J1        0.075              0.074
and ++ operator   0.074              0.072
Remove IF then    0.072              0.038
Hardware loop instructions
LC0 = loop counter 0 – may only be a few hardware loops possible
SHARC ADSP‐21061 – allows 6, Blackfin ADSP‐BF5XX – allows 2, so still need to understand software loops
IF NLC0E – if NOT hardware loop count 0 expired
Line 124 ‐‐ IF LC0E – if hardware loop expired – MM – why used!!
Insert hardware loop – check code passes test ‐‐ my new code
Failure indicates Excellent result
Some ideas on making code more parallel in general ‐‐ Step 1
Standard code
For
  X = read memory
  X1 = use X
  X2 = use X1
  write memory X2
EndFor JUMP to For
Rearrange loop
X = read memory
For
  X1 = use X
  X2 = use X1
  write memory X2
EndFor JUMP to For with X = read memory done in parallel
Some ideas on making code more parallel in general ‐‐ Step 2
Standard code
For
  X = read memory
  X1 = use X
  X2 = use X1
  write memory X2
EndFor JUMP to For
Rearrange loop
X = read memory
X1 = use X
For
  X2 = use X1 with X = read memory
  write memory X2 with X1 = use X
EndFor JUMP to For
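The loop rearrangement can be checked in C++ before trying it in assembly. The sketch below is illustrative only: useX and useX1 are hypothetical stand-ins for "X1 = use X" and "X2 = use X1", and C++ cannot express the parallel issue itself, only the reordering that makes it possible.

```cpp
#include <cassert>
#include <cstddef>

static int useX(int x) { return x * 3; }     // stand-in for "X1 = use X"
static int useX1(int x1) { return x1 + 1; }  // stand-in for "X2 = use X1"

// Standard code: read, use, use, write on every pass.
void standardLoop(const int* in, int* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const int x = in[i];
        const int x1 = useX(x);
        const int x2 = useX1(x1);
        out[i] = x2;
    }
}

// Rearranged loop: the first read and first "use X" are done before the loop,
// so inside the loop the read for the NEXT pass sits beside "X2 = use X1"
// and the write sits beside the next "X1 = use X".
void rearrangedLoop(const int* in, int* out, std::size_t n) {
    if (n == 0) return;
    int x = in[0];
    int x1 = useX(x);
    for (std::size_t i = 0; i + 1 < n; ++i) {
        x = in[i + 1];              // read for the next pass
        const int x2 = useX1(x1);   // finish the current pass
        out[i] = x2;                // write the current result
        x1 = useX(x);               // start the next pass
    }
    out[n - 1] = useX1(x1);         // last pass finishes after the loop
}
```

Both versions must give identical output. If they do not, the rearrangement has changed the algorithm rather than just the schedule.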
Rearrange the loop, Standard approach
0.0319 us, 0.0239 us
Changed the stalls when reading memory
Need to have a closer look at what compiler is doing better
USING NXALE and XALE
Got worse when we did that
Need to have a 2nd closer look at what compiler is doing better
• ALSO USING DIFFERENT ADDRESSING MODE
That causes a problem when we try it with our code
Still not better. What else is different?
That improvement was unexpected
Perhaps outside loop now a problem
Before we continue with the optimization
• C already works better than ours for int
• Ours works better than C for float
• Even if we found “all possible” optimizations (and we probably can’t), what is the best possible speed for this processor?
• Just how fast do we need to go?
• Typical target. The time for all the DSP algorithms added together must be less than 0.5 times the interval between samples
– Why 0.5 times and not 0.95 times?
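The 0.5 rule can be turned into a cycle budget. The sketch below is only an illustration: the function names are hypothetical, and the core clock and sample rate are parameters that must be taken from the real system (neither is stated on the slides).

```cpp
#include <cassert>
#include <cstdint>

// Whole core cycles available between two samples.
// sampleRateHz must be non-zero.
std::uint64_t cyclesBetweenSamples(std::uint64_t coreClockHz, std::uint64_t sampleRateHz) {
    return coreClockHz / sampleRateHz;
}

// Typical target from the slide: use at most 0.5 of the sample interval.
std::uint64_t cycleBudget(std::uint64_t coreClockHz, std::uint64_t sampleRateHz) {
    return cyclesBetweenSamples(coreClockHz, sampleRateHz) / 2;
}

// True when all DSP algorithms added together fit inside the budget.
bool meetsTarget(std::uint64_t totalDspCycles, std::uint64_t coreClockHz, std::uint64_t sampleRateHz) {
    return totalDspCycles < cycleBudget(coreClockHz, sampleRateHz);
}
```

Working in cycles rather than microseconds lets the budget be compared directly with cycle-counter measurements.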
What is the theoretical maximum speed?
• This is something I always work out BEFORE optimizing.
– I have a target to meet – normally finish all processing before next sample comes in.
– If my code (in theory) can’t meet that target, I need to find a different approach, not spend days optimizing useless code.
• In theory – if I have written the code with no hidden stalls – 1 cycle per instruction
– 6 instructions outside the loop
– 4 instructions inside the loop – N * 4 cycles
– Very short loop – I have read that getting out of a very short loop stalls the pipeline – let’s add 5 cycles for that
– 6 + 24 * 4 + 5 = 107 in theory, 138 in practice
– Difference 31 – close enough to being 24, or 1 stall per pass around the loop
– Can use the pipeline viewer to find out where the problem is occurring. In a long loop, done 4096 times, might be worth it.
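The back-of-envelope count above is easy to keep as a small helper so it can be repeated for other loop lengths. This is a sketch with hypothetical function names. It only encodes the slide's model of 1 cycle per instruction plus a fixed loop-exit penalty.

```cpp
#include <cassert>

// Theoretical cycles: instructions outside the loop, plus instructions per pass
// times the number of passes, plus an assumed penalty for leaving a short loop.
long theoreticalCycles(long outsideLoop, long perPass, long passes, long exitPenalty) {
    return outsideLoop + perPass * passes + exitPenalty;
}

// Average number of unexplained cycles on each pass around the loop.
double extraCyclesPerPass(long measured, long predicted, long passes) {
    return static_cast<double>(measured - predicted) / static_cast<double>(passes);
}
```

With the slide's figures, theoreticalCycles(6, 4, 24, 5) gives 107, and the same model for a loop done 4096 times gives 16395.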
Change tests to remove the time needed to “time the test”. IMPORTANT VALIDATION
“timer overhead removed”
Now using tests to explore (and document) “system behaviour”
Unexpected behaviour
Error is 300 larger than measured times
What we forgot – averaged timing error over 160 times round loop
2nd mistake
This indicates that not removing the time causes 12% error
We need to know variability in time measurement
Variability is roughly (max – min) / 2
• Precision of time measure seems good – much lower than the changes we are seeing
• Let’s save Test file under different name and try different testing method
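The two corrections discussed above (removing the timer overhead and estimating variability) can be written as small helpers. This is a sketch under assumptions: the function names are hypothetical, and the timer overhead is taken to be a value measured separately by timing an empty test.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Variability is roughly (max - min) / 2 over repeated measurements.
// The list of times must not be empty.
double variability(const std::vector<double>& times) {
    assert(!times.empty());
    const auto extremes = std::minmax_element(times.begin(), times.end());
    return (*extremes.second - *extremes.first) / 2.0;
}

// Time per point once the cost of "timing the test" has been removed.
double timePerPoint(double measuredTotal, double timerOverhead, int points) {
    return (measuredTotal - timerOverhead) / points;
}
```

A change in timing is only meaningful when it is clearly larger than the value returned by variability.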
Unclear what number to expect (in uS) – activate cycle counter instead
Doing print wrong (try convert to int)
Now printing “acceptable = 5”
4th Error – Did not turn on timer
• Timer seems to be running – error = 0x40 / 160
– Temporarily moved local array to be global array and then display TigerSHARC memory
Look at cycle counter values
Break lecture here
Key times for integer
• C++ debug 12022 cycles / function
• C++ release 1814 cycles / function
• First ASM 1352 cycles / function
• Do we believe these numbers?
– Let me count the cycles (mis‐quote from Shakespeare)
Program flow ‐‐ assumptions
• Each simple instruction line = 1 cycle
• Each jump taken – break pipeline
– Enter subroutine BP cycles
– Exit subroutine BP cycles
• Predicted jumps
– Break pipeline first time happens BP cycles
– Break pipeline if not taken BP
Operations
• Memory reads take extra MR cycles if fetched value used immediately
• Register to register moves take no extra time over cycle time for instruction
• Possible that math operations take extra time if result used immediately
• Can’t do two accesses to same memory bank (reads or read / write) in one cycle
• External memory operations take longer
Code outside loop lines 25 to 44
• Enter subroutine BP cycles (9?)
• 9 instructions + 2 instructions where result used immediately (COMPJ6 then use COMP result) and memory access
Code outside loop lines 57 to 59
• 2 instructions
• + break pipeline when return BP
• Total cycles outside loop – at least
– BP + 9 + 2 + BP ‐‐ at least 30 cycles
Code in loop
• 4 instructions + memory fetch where value used immediately + possible pipeline break when exits routine
• Total cycles = 30 + 4 * N = 670 cycles predicted (N = 160)
• Actual cycles 1352
• Difference = 682 cycles – about 4 extra each time around the loop
Switch to “cycle accurate” TigerSHARC simulator
• Takes much more time to simulate than emulate
WRONG SIMULATOR?
Pipeline viewer (Simulator debug)
Extra cycle on fetch
+1 cycle extra
+3 cycles extra
New prediction
• Old prediction
– Outside loop 30 cycles
– Inside loop N * 4
• New prediction
– Outside loop 30 cycles
– Inside loop 3 (first time hardware loop jump back) + N * 4 + N * memory stall when value used immediately on fetch
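The old and new predictions differ only in two extra terms, so one helper can express both. This is a sketch: the function name is hypothetical, and the memory stall per pass is left as a parameter because its value has to come from the pipeline viewer. The slides do not state a final figure.

```cpp
#include <cassert>

// Predicted cycles for the routine:
//   outside      - cycles outside the loop
//   firstJump    - one-off cost the first time the loop jumps back
//   perPass      - instructions on each pass (1 cycle each)
//   memoryStall  - extra cycles per pass when a fetched value is used immediately
//   passes       - N, the number of times round the loop
long predictedCycles(long outside, long firstJump, long perPass, long memoryStall, long passes) {
    return outside + firstJump + passes * (perPass + memoryStall);
}
```

Setting firstJump and memoryStall to 0 reproduces the old prediction: predictedCycles(30, 0, 4, 0, 160) gives 670. If the +1 and +3 extra cycles seen in the pipeline viewer were both charged to every pass (an assumption), predictedCycles(30, 3, 4, 4, 160) gives 1313, against 1352 measured.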
Trying to understand what we have done
• Most TigerSHARC instructions can be made conditional.
• WHY? Because doing a NOP instruction (if condition not met) is much less disruptive to the instruction pipeline than doing a JUMP (loss of 9 cycles if jump taken – probably more because of code format)
Why mostly conditional instructions?
• TigerSHARC has a very deep pipeline, so conditional jumps can cause a large disruption of the pipeline
• Better to use non‐jump instructions which don’t disrupt pipeline, even if instruction is not executed (acts as nop)
If (N < 1) return_value = NULL;
else return_value = value;
Why mostly conditional instructions?
If (N < 1) return_value = NULL;
else return_value = value;
With jumps
COMP(N, 1);;
IF NJLT, JUMP _ELSE;;
J5 = NULL;;
JUMP _END_IF;;
_ELSE:
J5 = value;;
_END_IF:
With conditional instructions
COMP(N, 1);;
IF JLT; DO, J5 = NULL;;
IF NJLT; DO, J5 = value;;
Concept is there – we need to check on whether syntax is correct
Trying to understand what we have done
• Use J registers for address operations, but store values from memory in XR1 and YR1
• WHY? Instructions like this [J1] = XR1;; have the potential to be put in parallel with more operations
Hardware – zero overhead loop.
About 4 * N cycles better (N is times round the loop)
LC0 = N;; Load counter 0 with value N
Start_of_loop_LABEL:
Loop code here ;;
IF NLC0E, JUMP Start_of_loop_LABEL;;
NLC0E – Not LC0 expired – essentially compare LC0 with 2
If less than 2, continue (don’t jump)
If 2 or more, then decrement LC0 and jump
All sorts of stall issues if instruction is not properly aligned – TigerSHARC manual 8‐23
CAN’T USE WHEN THERE IS A FUNCTION CALL IN THE LOOP?
WHY NOT? – WHAT HAPPENS – NEED TO EXPLORE MORE.
Hardware – zero overhead loop.
BIG WARNING
LC0 = N;; Load counter 0 with value N
LC0 uses UNSIGNED ARITHMETIC – MAKE SURE N is not negative, as a negative number has the same bit pattern as a VERY large unsigned number, and the processor will go around the loop for a week
We did a check for N <= 0 before entering the hardware loop as another part of our code – so we lucked in – otherwise could have big problems.
This issue is so important (and time wasting in the laboratories) that I will be deducting marks in quizzes and exams
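The unsigned-counter trap is easy to demonstrate in C++. The sketch below assumes the loop counter behaves like a 32-bit unsigned value; the function names are hypothetical and the guard simply mirrors the N <= 0 check mentioned above.

```cpp
#include <cassert>
#include <cstdint>

// What an unsigned 32-bit counter would hold if it were loaded with n.
std::uint32_t asLoopCounter(std::int32_t n) {
    return static_cast<std::uint32_t>(n);
}

// Guarded version: never hand a zero or negative count to the hardware loop.
std::uint32_t safeLoopCount(std::int32_t n) {
    return (n <= 0) ? 0u : static_cast<std::uint32_t>(n);
}
```

For example, asLoopCounter(-1) is 0xFFFFFFFF, which is more than four billion passes, while safeLoopCount(-1) is 0 so the caller can skip the loop entirely.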
What’s this XR1, YR1 and R1 stuff
• TigerSHARC is designed to do many things at once
• So you need appropriate syntax to control it
What’s this XR1, YR1 and R1 stuff
XYR1 = R2 + R3;;
does 2 adds
XR1 = XR2 + XR3 and
YR1 = YR2 + YR3;;
You can add the X values and not the Y values with this syntax
XR1 = R2 + R3;;
And NOT with XR1 = XR2 + XR3;;
Ugly – but they (ADI) will not change the syntax (DAMY)
What’s this XR1, YR1 and R1 stuff
XYR1 = [J0 += 0x1];;
Does a 32‐bit fetch and puts the same value into XR1 and YR1.
Same as doing XR1 = [J0 += 0];; AND YR1 = [J0 += 1];; at the same time
XYR1 = L[J0 += 0x2];;
Does a 64‐bit (dual) fetch and is the same as doing
XR1 = [J0 += 1];; AND YR1 = [J0 += 1];; at the same time
What’s this XR1, YR1 and R1 stuff
XYR1 = [J0 += 0x1];;
means
XR1 = [J0 += 0];; AND YR1 = [J0 += 1];;
XYR1 = L[J0 += 0x2];;
means
XR1 = [J0 += 1];; AND YR1 = [J0 += 1];; at the same time
XR1:0 = L[J0 += 0x2];;
means
XR0 = [J0 += 1];; AND XR1 = [J0 += 1];;
XYR1:0 = L[J0 += 0x2];;
means
XR0 = [J0 += 0];; AND YR0 = [J0 += 1];; AND XR1 = [J0 += 0];; AND YR1 = [J0 += 1];;
What’s this XR1, YR1 and R1 stuff
XYR1:0 = L[J0 += 0x2];;
means
XR0 = [J0 += 0];; AND YR0 = [J0 += 1];; AND XR1 = [J0 += 0];; AND YR1 = [J0 += 1];;
XR3:0 = Q[J0 += 0x4];;
means
XR0 = [J0 += 1];; AND XR1 = [J0 += 1];; AND XR2 = [J0 += 1];; AND XR3 = [J0 += 1];;
XYR3:0 = Q[J0 += 0x4];;
means
XR0 = [J0 += 0];; AND YR0 = [J0 += 1];; AND XR1 = [J0 += 0];; AND YR1 = [J0 += 1];; AND XR2 = [J0 += 0];; AND YR2 = [J0 += 1];; AND XR3 = [J0 += 0];; AND YR3 = [J0 += 1];;
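The first three load forms described above can be modelled in C++ to check which register receives which word. This is a toy model of the slide descriptions only, not of the real hardware: the struct and function names are hypothetical, and alignment rules for the L (64-bit) loads are ignored.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model of the X and Y compute-block register files.
struct ComputeBlocks {
    std::array<std::int32_t, 32> xr{};
    std::array<std::int32_t, 32> yr{};
};

// XYRn = [J0 += 1];;  one 32-bit word goes to both XRn and YRn.
void loadBroadcast(ComputeBlocks& cb, int reg, const std::int32_t*& j0) {
    cb.xr[reg] = j0[0];
    cb.yr[reg] = j0[0];
    j0 += 1;
}

// XYRn = L[J0 += 2];;  first word goes to XRn, second word to YRn.
void loadLongSplit(ComputeBlocks& cb, int reg, const std::int32_t*& j0) {
    cb.xr[reg] = j0[0];
    cb.yr[reg] = j0[1];
    j0 += 2;
}

// XR(n+1):n = L[J0 += 2];;  two consecutive words go to XRn and XR(n+1).
void loadLongX(ComputeBlocks& cb, int lowReg, const std::int32_t*& j0) {
    cb.xr[lowReg] = j0[0];
    cb.xr[lowReg + 1] = j0[1];
    j0 += 2;
}
```

Passing j0 by reference mimics the post-modify addressing: every load leaves the pointer at the next unread word.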
Float release generated by C++ compiler – identify new instructions
• I see 1 new instruction
Difference between integer and float operations
XYR1 = R2 + R3;;
does 2 INTEGER adds
XR1 = XR2 + XR3 and
YR1 = YR2 + YR3;;
SYNTAX XR1 = R2 + R3;;
And NOT with XR1 = XR2 + XR3;;
Use F syntax to make it a float operation
XYFR1 = R2 + R3;;
does 2 FLOATING adds
XFR1 = R2 + R3 and
YFR1 = R2 + R3;;
Exercise 1 – needed for Lab. 1
• FIR filter operation ‐‐ data and filter‐coefficients are both integer arrays – Write in C++
• New_value from Audio A/D, output sent to Audio D/A
for j = 1 to N − 1
  data[N − j] = data[N − j − 1];
data[0] = newvalue;
$output = \sum_{j=0}^{N-1} data[j] \cdot filter\_coeffs[j]$
Exercise – needed for Lab. 1
• FIR filter operation ‐‐ data and filter‐coefficients are both integer arrays ‐‐ ASM
ReadAudioSource(&newvalue);
for j = 1 to N − 1
  data[N − j] = data[N − j − 1];
data[0] = newvalue;
$output = \sum_{j=0}^{N-1} data[j] \cdot filter\_coeffs[j]$
WriteAudioSource(output);
Insert C++ code – for Lab. 1
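As a starting point, one possible shape for the C++ version is sketched below. It is an assumption, not the lab solution: the function name firFilter and its signature are hypothetical, the audio read and write calls are left to the caller, and the delay line is shifted towards higher indices so that the newest sample sits in data[0].

```cpp
#include <cassert>
#include <cstddef>

// One FIR step: shift the delay line, insert the new sample, form the sum
// of data[j] * filterCoeffs[j] for j = 0 .. n-1.
int firFilter(int* data, const int* filterCoeffs, std::size_t n, int newValue) {
    if (n == 0) return 0;
    // Oldest sample drops off the end; work from the high index down
    // so no value is overwritten before it has been moved.
    for (std::size_t j = 1; j <= n - 1; ++j) {
        data[n - j] = data[n - j - 1];
    }
    data[0] = newValue;
    int output = 0;
    for (std::size_t j = 0; j < n; ++j) {
        output += data[j] * filterCoeffs[j];
    }
    return output;
}
```

The shift loop moves N − 1 values on every sample. That is simple to test, and it gives the later assembly version something concrete to beat.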
Insert assembler code version (Lab. 2)
What we NOW KNOW – EVERYTHING FOR THE FINAL (REALLY ‐‐ ALMOST)!
• Can we return from an assembly language routine without crashing the processor?
• Return a parameter from assembly language routine – (Is it same for ints and floats?)
• Pass parameters into assembly language – (Is it same for ints and floats?)
• Do IF THEN ELSE statements
• Read and write values to memory
• Read and write values in a loop
• Do some mathematics on the values fetched from memory
All this stuff was demonstrated by coding HalfWaveRectifyASM( ) ‐‐ ☺