author Rangi <remy.oukaour+rangi42@gmail.com> 2019-02-18 10:11:51 -0500
committer Rangi <remy.oukaour+rangi42@gmail.com> 2019-02-18 10:11:51 -0500
commit b17fc465e7f144715466e1afde66614e0b6a21f1
tree f36cc69ee96b025658e6cada1e32051ff9f87575 /Optimizing-assembly-code.md
parent 5bf9a6c3a4885c2ddf80aa41264007d26721d0f6
python toc.py Optimizing-assembly-code.md
Diffstat (limited to 'Optimizing-assembly-code.md')
-rw-r--r-- Optimizing-assembly-code.md | 217
1 files changed, 167 insertions, 50 deletions
diff --git a/Optimizing-assembly-code.md b/Optimizing-assembly-code.md
index d89c923..3e26854 100644
--- a/Optimizing-assembly-code.md
+++ b/Optimizing-assembly-code.md
@@ -1,22 +1,52 @@
-Most tricks come from either [Jeff's GB Assembly Code Tips v1.0](http://www.devrs.com/gb/files/asmtips.txt), or [the page on Z80 Optimization on WikiTI](http://wikiti.brandonw.net/index.php?title=Z80_Optimization)
+Sometimes the simplest way to write something in assembly code isn't the best. There are optimization techniques to make code smaller and/or faster.
+
+Most of these tricks come from either [Jeff's GB Assembly Code Tips v1.0](http://www.devrs.com/gb/files/asmtips.txt), or [WikiTI's Z80 Optimization page](http://wikiti.brandonw.net/index.php?title=Z80_Optimization). (Note that Z80 assembly is *not* the same as GBZ80; it has more registers and some different instructions.)
+
+
+## Contents
+
+- [Registers](#registers)
+ - [Set `a` to zero](#set-a-to-zero)
+ - [Set `a` to some constant minus `a`](#set-a-to-some-constant-minus-a)
+ - [Add `a` to a 16-bit register](#add-a-to-a-16-bit-register)
+ - [Loading from an offset to `hl`](#loading-from-an-offset-to-hl)
+ - [Exchanging two 16-bit registers](#exchanging-two-16-bit-registers)
+- [Branching](#branching)
+ - [Compare `a` to zero](#compare-a-to-zero)
+ - [Compare `a` to 1](#compare-a-to-1)
+ - [Compare `a` to 255](#compare-a-to-255)
+ - [Chaining comparisons](#chaining-comparisons)
+- [Functions](#functions)
+ - [Tail call optimization](#tail-call-optimization)
+ - [Calling `hl`](#calling-hl)
+ - [Inlining](#inlining)
+
## Registers
-### Set A to zero
+
+### Set `a` to zero
+
Don't do:
+
```asm
ld a, 0 ; 2 bytes, 2 cycles, no changes to flags
```
+
But do:
+
```asm
xor a ; 1 byte, 1 cycle, sets flags C to 0 and Z to 1
```
-or
+
+or do:
+
```asm
sub a ; 1 byte, 1 cycle, sets flags C to 0 and Z to 1
```
-Be careful that the optimized versions alter flags. As such, `ld a, 0` must be left intact in the code below:
+Don't use the optimized versions if you need to preserve flags. As such, `ld a, 0` must be left intact in the code below:
+
```asm
ld a, [wIsTrainerBattle]
and a ; NZ if in trainer battle
@@ -24,150 +54,213 @@ Be careful that the optimized versions alter flags. As such, `ld a, 0` must be l
jr nz, .trainer
```
-### Set A to some constant subtracted by A
+
+### Set `a` to some constant minus `a`
+
Don't do:
+
```asm
- ld b, a ; 4 bytes, 4 cycles
+ ; 4 bytes, 4 cycles
+ ld b, a
ld a, CONST
sub b
```
+
But do:
+
```asm
- cpl ; 3 bytes, 3 cycles
- add CONST+1
+ ; 3 bytes, 3 cycles
+ cpl
+ add CONST + 1
```
-### Add A to a 16-bit register
-(`hl` taken as an example, but any 16-bit register would work as well)
+
+### Add `a` to a 16-bit register
+
+(The example uses `hl`, but `bc` or `de` would also work.)
Don't do:
+
```asm
- add l ; 6 bytes, 6 cycles
+ ; 6 bytes, 6 cycles
+ add l
ld l, a
ld a, 0
adc h
- ld h,a
+ ld h, a
```
-or
+
+and don't do:
+
```asm
- add l ; 6 bytes, 6 cycles
+ ; 6 bytes, 6 cycles
+ add l
ld l, a
ld a, h
adc 0
ld h, a
```
+
But do:
+
```asm
- add l ; 5 bytes, 5 cycles
+ ; 5 bytes, 5 cycles
+ add l
ld l, a
- jr nc, .NoCarry
+ jr nc, .no_carry
inc h
-.NoCarry:
+.no_carry:
```
+
or better (doesn't require a label):
+
```asm
- add l ; 5 bytes, 5 cycles
- ld l, a ; = a + l
- adc a, h ; = a + l + carry + h
+ ; 5 bytes, 5 cycles
+ add l ; = a + l
+ ld l, a ; cache a + l
+ adc h ; = a + l + carry + h
sub l ; = carry + h
ld h, a
```
-### Loading from an offset to HL
+
+### Loading from an offset to `hl`
+
Don't do:
+
```asm
- ld a, [offset] ; 8 bytes, 10 cycles
+ ; 8 bytes, 10 cycles
+ ld a, [offset]
ld l, a
ld a, [offset+1]
ld h, a
```
+
But do:
+
```asm
- ld hl, offset ; 6 bytes, 8 cycles
+ ; 6 bytes, 8 cycles
+ ld hl, offset
ld a, [hli]
ld h, [hl]
ld l, a
```
+
### Exchanging two 16-bit registers
-(`hl` and `de` taken as examples, but any 16-bit registers are fine)
+
+(The example uses `hl` and `de`, but any pair of `bc`, `de`, or `hl` would also work.)
If you care about speed:
+
```asm
- ld a, d ; 6 bytes, 6 cycles
+ ; 6 bytes, 6 cycles
+ ld a, d
ld d, h
ld h, a
ld a, e
ld e, l
ld l, a
```
+
If you care about size:
+
```asm
- push de ; 4 bytes, 9 cycles
+ ; 4 bytes, 9 cycles
+ push de
ld d, h
ld e, l
pop hl
```
+
## Branching
-### Compare A to zero
+### Compare `a` to zero
+
Don't do:
+
```asm
cp 0 ; 2 bytes, 2 cycles
```
+
But do:
+
```asm
or a ; 1 byte, 1 cycle
```
-or
+
+or do:
+
```asm
and a ; 1 byte, 1 cycle
```
-### Compare A to 1
+
+### Compare `a` to 1
+
```asm
cp 1 ; 2 bytes, 2 cycles
```
+
If you don't care about the value in `a`:
+
+
```asm
dec a ; 1 byte, 1 cycle, decrements a
```
-Note that you can still do `inc a` afterwards, which is 1 cycle faster if the jump is taken; compare:
+Note that you can still do `inc a` afterwards, which is 1 cycle faster if the jump is taken. Compare:
+
```asm
cp 1
jr z, .equals1
```
+
+with:
+
```asm
dec a
jr z, .equals1
inc a
```
-### Compare A to $FF
+
+### Compare `a` to 255
+
+(255, or $FF in hexadecimal, is the same as −1 due to [two's complement](https://en.wikipedia.org/wiki/Two%27s_complement).)
+
```asm
- cp $FF ; 2 bytes, 2 cycles
+ cp $ff ; 2 bytes, 2 cycles
```
+
If you don't care about the value in `a`:
+
```asm
 inc a ; 1 byte, 1 cycle, increments a
 ```
-Note that you can still do `inc a` afterwards, which is 1 cycle faster if the jump is taken; compare:
+Note that you can still do `dec a` afterwards, which is 1 cycle faster if the jump is taken. Compare:
+
```asm
- cp $FF
+ cp $ff
jr z, .equals255
```
+
+with:
+
```asm
inc a
jr z, .equals255
dec a
```
+
### Chaining comparisons
+
Don't do:
+
```asm
cp 1
jr z, .equals1
@@ -177,7 +270,9 @@ Don't do:
jr z, .equals3
; ...
```
+
But do:
+
```asm
dec a
jr z, .equals1
@@ -192,49 +287,71 @@ But do:
## Functions
### Tail call optimization
+
Don't do:
+
```asm
- call Function ; 4 bytes, 10 cycles
+ ; 4 bytes, 10 cycles
+ call Function
ret
```
+
But do:
+
```asm
- jp Function ; 3 bytes, 4 cycles
+ ; 3 bytes, 4 cycles
+ jp Function
```
-### Calling HL
+
+### Calling `hl`
+
```asm
- ld de, .return ; 5 bytes, 8 cycles
+ ; 5 bytes, 8 cycles
+ ld de, .return
push de
jp hl
+
.return
...
```
+
But do:
+
```asm
- call DoJump ; 4 bytes, 7 cycles
+ ; 4 bytes, 7 cycles
+ call _hl_
+; return
...
+```
+
+`_hl_` is a routine already defined in [home.asm](../blob/master/home.asm):
-DoJump: ; TODO: such a function already exists in the code; but where is it?
+```asm
+_hl_::
jp hl
```
+
### Inlining
+
Don't do:
+
```asm
- call GetOffset ; 4 additional bytes, 10 additional cycles
+ ; 4 additional bytes, 10 additional cycles
+ call GetOffset
...
GetOffset:
- add hl, bc
- ld a, [hli]
- ld h, [hl]
- ld l, a
+ (some code)
+ ret
```
+
if `GetOffset` is only called a handful of times. Instead, do:
+
```asm
- add hl, bc
- ld a, [hli]
- ld h, [hl]
- ld l, a
-```
\ No newline at end of file
+; GetOffset
+ (some code)
+```
+
+You can set `(some code)` apart with blank lines and put a comment on top to make its self-contained nature clear without the extra `call` and `ret`.
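The two arithmetic rewrites in this diff (`cpl` / `add CONST + 1` for "constant minus `a`", and `add l` / `ld l, a` / `adc h` / `sub l` / `ld h, a` for "add `a` to `hl`") can be checked by brute force, since 8-bit inputs are easy to enumerate. The following is a minimal Python sketch and is not part of the patch. The helper names (`add8`, `adc8`, `sub8`, `cpl`) are hypothetical, and they model only the 8-bit result and the carry flag, not the other flags or any timing.

```python
# Hypothetical 8-bit helpers; they model only the result and the carry flag.

def add8(a, b):
    """Model `add`: return (8-bit result, carry out)."""
    total = a + b
    return total & 0xFF, total > 0xFF


def adc8(a, b, carry):
    """Model `adc`: add with carry in, return (8-bit result, carry out)."""
    total = a + b + int(carry)
    return total & 0xFF, total > 0xFF


def sub8(a, b):
    """Model `sub`: return (8-bit result, borrow)."""
    return (a - b) & 0xFF, a < b


def cpl(a):
    """Model `cpl`: flip every bit of a."""
    return a ^ 0xFF


def const_minus_a(a, const):
    """Model `cpl` followed by `add CONST + 1`.

    The immediate is masked to 8 bits here as an assumption of this model;
    a real assembler may reject CONST + 1 when CONST is $ff.
    """
    result, _ = add8(cpl(a), (const + 1) & 0xFF)
    return result


def add_a_to_hl(a, h, l):
    """Model `add l` / `ld l, a` / `adc h` / `sub l` / `ld h, a`."""
    a, carry = add8(a, l)     # a = a + l
    l = a                     # ld l, a
    a, _ = adc8(a, h, carry)  # a = l + h + carry
    a, _ = sub8(a, l)         # a = h + carry
    h = a                     # ld h, a
    return h, l


def check_all():
    """Check both rewrites against plain modular arithmetic."""
    for a in range(256):
        for const in range(256):
            assert const_minus_a(a, const) == (const - a) & 0xFF
        for l in range(256):
            # h is sampled (0, 17, ..., 255) to keep the run short;
            # a and l, which decide the carry, are covered exhaustively.
            for h in range(0, 256, 17):
                new_h, new_l = add_a_to_hl(a, h, l)
                expected = ((h << 8 | l) + a) & 0xFFFF
                assert (new_h << 8 | new_l) == expected
    return True
```

Calling `check_all()` raises an `AssertionError` if either rewrite disagrees with the straightforward computation, and returns `True` otherwise.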