author: Eldred Habert <eldredhabert0@gmail.com>, 2019-02-18 14:58:18 +0100
committer: Eldred Habert <eldredhabert0@gmail.com>, 2019-02-18 14:58:18 +0100
commit: 5bf9a6c3a4885c2ddf80aa41264007d26721d0f6
tree: 8844e98fc7f28511cd70944ebaec6b6e7fcbfbf8 /Optimizing-assembly-code.md
parent: 015ed30c0162b2ddbbe20183a31abf93d4280f6d
Restructure snippets, add new ones
Diffstat (limited to 'Optimizing-assembly-code.md')
-rw-r--r-- Optimizing-assembly-code.md | 217
1 files changed, 147 insertions, 70 deletions
diff --git a/Optimizing-assembly-code.md b/Optimizing-assembly-code.md
index c3ca655..d89c923 100644
--- a/Optimizing-assembly-code.md
+++ b/Optimizing-assembly-code.md
@@ -3,37 +3,61 @@ Most tricks come from either [Jeff's GB Assembly Code Tips v1.0](http://www.devr
## Registers
### Set A to zero
+Don't do:
```asm
ld a, 0 ; 2 bytes, 2 cycles, no changes to flags
-
-;;;
-
+```
+But do:
+```asm
xor a ; 1 byte, 1 cycle, sets flags C to 0 and Z to 1
+```
+or
+```asm
sub a ; 1 byte, 1 cycle, sets flags C to 0 and Z to 1
```
+Beware that the optimized versions alter the flags. When later code depends on flags set earlier, `ld a, 0` must be left intact, as in the code below:
+```asm
+ ld a, [wIsTrainerBattle]
+ and a ; NZ if in trainer battle
+ ld a, 0
+ jr nz, .trainer
+```
+
### Set A to some constant minus A
+Don't do:
```asm
ld b, a ; 4 bytes, 4 cycles
ld a, CONST
sub b
-
-;;;
-
+```
+But do:
+```asm
cpl ; 3 bytes, 3 cycles
add CONST+1
```
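+
+As a worked example of why this holds: `cpl` computes `255 - a`, so adding `CONST + 1` wraps around to `CONST - a`. The trace below uses illustrative values that are assumptions, not taken from any real code: `a = $05` and `CONST = $20`.
+```asm
+    ; assume a = $05 on entry and CONST = $20 (illustrative values)
+    cpl            ; a = $FF - $05 = $FA
+    add $20 + 1    ; a = $FA + $21 = $11B, truncated to $1B = $20 - $05
+```
+Note that the carry flag ends up inverted compared to the `sub b` version: here `add` sets carry even though `$20 - $05` does not borrow. The replacement is therefore only a drop-in when the following code does not rely on the carry flag. `CONST` should also be below `$FF`, since `CONST+1` has to fit in one byte.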
-### Add A to HL
+### Add A to a 16-bit register
+(`hl` taken as an example, but any 16-bit register would work as well)
+
+Don't do:
```asm
add l ; 6 bytes, 6 cycles
ld l, a
ld a, 0
adc h
+ ld h,a
+```
+or
+```asm
+ add l ; 6 bytes, 6 cycles
+ ld l, a
+ ld a, h
+ adc 0
ld h, a
-
-;;;
-
+```
+But do:
+```asm
add l ; 5 bytes, 5 cycles
ld l, a
jr nc, .NoCarry
@@ -41,23 +65,35 @@ Most tricks come from either [Jeff's GB Assembly Code Tips v1.0](http://www.devr
.NoCarry:
```
+or better (doesn't require a label):
+```asm
+ add l ; 5 bytes, 5 cycles
+ ld l, a ; = a + l
+ adc a, h ; = a + l + carry + h
+ sub l ; = carry + h
+ ld h, a
+```
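+
+To see why the label-free version works, here is a trace with illustrative register values (assumptions, chosen so that the low byte overflows): `a = $90` and `hl = $12F0`, so the expected result is `hl = $1380`.
+```asm
+    ; assume a = $90, h = $12, l = $F0 on entry (illustrative values)
+    add l       ; a = $90 + $F0 = $180, truncated to $80; carry set
+    ld l, a     ; l = $80 (`ld` leaves the carry untouched)
+    adc a, h    ; a = $80 + $12 + 1 = $93
+    sub l       ; a = $93 - $80 = $13
+    ld h, a     ; hl = $1380 = $12F0 + $90
+```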
### Loading HL from a memory address
+Don't do:
```asm
ld a, [offset] ; 8 bytes, 10 cycles
ld l, a
ld a, [offset+1]
ld h, a
-
-;;;
-
+```
+But do:
+```asm
ld hl, offset ; 6 bytes, 8 cycles
ld a, [hli]
ld h, [hl]
ld l, a
```
-### Exchanging HE and DL
+### Exchanging two 16-bit registers
+(`hl` and `de` taken as examples, but any 16-bit registers are fine)
+
+If you care about speed:
```asm
ld a, d ; 6 bytes, 6 cycles
ld d, h
@@ -65,9 +101,9 @@ Most tricks come from either [Jeff's GB Assembly Code Tips v1.0](http://www.devr
ld a, e
ld e, l
ld l, a
-
-;;;
-
+```
+If you care about size:
+```asm
push de ; 4 bytes, 9 cycles
ld d, h
ld e, l
@@ -77,87 +113,128 @@ Most tricks come from either [Jeff's GB Assembly Code Tips v1.0](http://www.devr
## Branching
### Compare A to zero
+Don't do:
```asm
cp 0 ; 2 bytes, 2 cycles
-
-;;;
-
+```
+But do:
+```asm
or a ; 1 byte, 1 cycle
+```
+or
+```asm
and a ; 1 byte, 1 cycle
```
-### Compare A-1 to zero
+### Compare A to 1
```asm
cp 1 ; 2 bytes, 2 cycles
-
-;;;
-
+```
+If you don't care about the value in `a`:
+```asm
dec a ; 1 byte, 1 cycle, decrements a
```
-## Functions
-
-### Tail call optimization
-```asm
- call Function ; 4 bytes, 10 cycles
- ret
+Note that you can still restore `a` with `inc a` afterwards. The sequence is then the same size as the `cp 1` version, 1 cycle faster if the jump is taken, and no slower otherwise; compare:
+```asm
+ cp 1
+ jr z, .equals1
+```
+```asm
+ dec a
+ jr z, .equals1
+ inc a
+```
-;;;
+### Compare A to $FF
+```asm
+ cp $FF ; 2 bytes, 2 cycles
+```
+If you don't care about the value in `a`:
+```asm
+ inc a ; 1 byte, 1 cycle, increments a
+```
- jp Function ; 3 bytes, 4 cycles
+Note that you can still restore `a` with `dec a` afterwards. The sequence is then the same size as the `cp $FF` version, 1 cycle faster if the jump is taken, and no slower otherwise; compare:
+```asm
+ cp $FF
+ jr z, .equals255
+```
+```asm
+ inc a
+ jr z, .equals255
+ dec a
```
-### Executing subroutines
+### Chaining comparisons
+Don't do:
```asm
- ld hl, param1
- call Function1
- ld hl, param2
- call Function2
- ld hl, param3
- call Function1
- ...
- ...
+ cp 1
+ jr z, .equals1
+ cp 2
+ jr z, .equals2
+ cp 3
+ jr z, .equals3
+ ; ...
+```
+But do:
+```asm
+ dec a
+ jr z, .equals1
+ dec a
+ jr z, .equals2
+ dec a
+ jr z, .equals3
+ ; ...
+```
-.Function1:
- ...
- ret
-.Function2:
- ...
- ret
-
-;;;
- ld sp, calltable
- ret ; jump to sp (first entry on calltable)
+## Functions
-.Function1:
- pop hl
- ...
- ret
-.Function2:
- pop hl
- ...
+### Tail call optimization
+Don't do:
+```asm
+ call Function ; 4 bytes, 10 cycles
ret
-
-calltable:
- dw Function1, param1
- dw Function2, param2
- dw Function1, param3
+```
+But do:
+```asm
+ jp Function ; 3 bytes, 4 cycles
```
### Calling HL
+Don't do:
```asm
- ld de, .retadr ; 5 bytes, 8 cycles
+ ld de, .return ; 5 bytes, 8 cycles
push de
- jp [hl]
- .retadr:
+ jp hl
+.return
+ ...
+```
+But do:
+```asm
+ call DoJump ; 4 bytes, 7 cycles
...
-;;;
+DoJump: ; TODO: such a function already exists in the code; but where is it?
+ jp hl
+```
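+
+A typical use is dispatching through a table of routine addresses. The sketch below is only an illustration: `HandlerTable` and its layout (one `dw` per routine) are hypothetical, and the index in `a` is assumed to be below 128 so that doubling it does not overflow. It combines the 16-bit addition and pointer-loading snippets from the Registers section with `DoJump`.
+```asm
+    ld hl, HandlerTable ; hypothetical table: dw Routine0, Routine1, ...
+    add a               ; a = index * 2, since each entry is 2 bytes
+    add l
+    ld l, a
+    adc a, h
+    sub l
+    ld h, a             ; hl = HandlerTable + index * 2
+    ld a, [hli]
+    ld h, [hl]
+    ld l, a             ; hl = address of the selected routine
+    call DoJump         ; the routine's `ret` returns to the next line
+```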
- call DoJump ; 4 bytes, 7 cycles
+### Inlining
+Don't do:
+```asm
+ call GetOffset ; 4 additional bytes, 10 additional cycles
...
-DoJump:
- jp [hl]
+GetOffset:
+ add hl, bc
+ ld a, [hli]
+ ld h, [hl]
+ ld l, a
+ ret
+```
+if `GetOffset` is only called a handful of times. Instead, do:
+```asm
+ add hl, bc
+ ld a, [hli]
+ ld h, [hl]
+ ld l, a
```
\ No newline at end of file