author | Rangi <remy.oukaour+rangi42@gmail.com> | 2019-02-18 10:11:51 -0500 |
---|---|---|
committer | Rangi <remy.oukaour+rangi42@gmail.com> | 2019-02-18 10:11:51 -0500 |
commit | b17fc465e7f144715466e1afde66614e0b6a21f1 (patch) | |
tree | f36cc69ee96b025658e6cada1e32051ff9f87575 /Optimizing-assembly-code.md | |
parent | 5bf9a6c3a4885c2ddf80aa41264007d26721d0f6 (diff) |
python toc.py Optimizing-assembly-code.md
Diffstat (limited to 'Optimizing-assembly-code.md')
-rw-r--r-- | Optimizing-assembly-code.md | 217 |
1 files changed, 167 insertions, 50 deletions
diff --git a/Optimizing-assembly-code.md b/Optimizing-assembly-code.md
index d89c923..3e26854 100644
--- a/Optimizing-assembly-code.md
+++ b/Optimizing-assembly-code.md
@@ -1,22 +1,52 @@
-Most tricks come from either [Jeff's GB Assembly Code Tips v1.0](http://www.devrs.com/gb/files/asmtips.txt), or [the page on Z80 Optimization on WikiTI](http://wikiti.brandonw.net/index.php?title=Z80_Optimization)
+Sometimes the simplest way to write something in assembly code isn't the best. There are optimization techniques to make code smaller and/or faster.
+
+Most of these tricks come from either [Jeff's GB Assembly Code Tips v1.0](http://www.devrs.com/gb/files/asmtips.txt), or [WikiTI's Z80 Optimization page](http://wikiti.brandonw.net/index.php?title=Z80_Optimization). (Note that Z80 assembly is *not* the same as GBZ80; it has more registers and some different instructions.)
+
+
 ## Contents
+
+- [Registers](#registers)
+  - [Set `a` to zero](#set-a-to-zero)
+  - [Set `a` to some constant minus `a`](#set-a-to-some-constant-minus-a)
+  - [Add `a` to a 16-bit register](#add-a-to-a-16-bit-register)
+  - [Loading from an offset to `hl`](#loading-from-an-offset-to-hl)
+  - [Exchanging two 16-bit registers](#exchanging-two-16-bit-registers)
+- [Branching](#branching)
+  - [Compare `a` to zero](#compare-a-to-zero)
+  - [Compare `a` to 1](#compare-a-to-1)
+  - [Compare `a` to 255](#compare-a-to-255)
+  - [Chaining comparisons](#chaining-comparisons)
+- [Functions](#functions)
+  - [Tail call optimization](#tail-call-optimization)
+  - [Calling `hl`](#calling-hl)
+  - [Inlining](#inlining)
+
+
 ## Registers
-### Set A to zero
+
+### Set `a` to zero
+
 Don't do:
+
 ```asm
     ld a, 0 ; 2 bytes, 2 cycles, no changes to flags
 ```
+
 But do:
+
 ```asm
     xor a ; 1 byte, 1 cycle, sets flags C to 0 and Z to 1
 ```
-or
+
+or do:
+
 ```asm
     sub a ; 1 byte, 1 cycle, sets flags C to 0 and Z to 1
 ```
-Be careful that the optimized versions alter flags. As such, `ld a, 0` must be left intact in the code below:
+Don't use the optimized versions if you need to preserve flags. As such, `ld a, 0` must be left intact in the code below:
+
 ```asm
     ld a, [wIsTrainerBattle]
     and a ; NZ if in trainer battle
@@ -24,150 +54,213 @@ Be careful that the optimized versions alter flags. As such, `ld a, 0` must be l
     jr nz, .trainer
 ```
-### Set A to some constant subtracted by A
+
+### Set `a` to some constant minus `a`
+
 Don't do:
+
 ```asm
-    ld b, a ; 4 bytes, 4 cycles
+    ; 4 bytes, 4 cycles
+    ld b, a
     ld a, CONST
     sub b
 ```
+
 But do:
+
 ```asm
-    cpl ; 3 bytes, 3 cycles
-    add CONST+1
+    ; 3 bytes, 3 cycles
+    cpl
+    add CONST + 1
 ```
-### Add A to a 16-bit register
-(`hl` taken as an example, but any 16-bit register would work as well)
+
+### Add `a` to a 16-bit register
+
+(The example uses `hl`, but `bc` or `de` would also work.)
 Don't do:
+
 ```asm
-    add l ; 6 bytes, 6 cycles
+    ; 6 bytes, 6 cycles
+    add l
     ld l, a
     ld a, 0
     adc h
-    ld h,a
+    ld h, a
 ```
-or
+
+and don't do:
+
 ```asm
-    add l ; 6 bytes, 6 cycles
+    ; 6 bytes, 6 cycles
+    add l
     ld l, a
     ld a, h
     adc 0
     ld h, a
 ```
+
 But do:
+
 ```asm
-    add l ; 5 bytes, 5 cycles
+    ; 5 bytes, 5 cycles
+    add l
     ld l, a
-    jr nc, .NoCarry
+    jr nc, .no_carry
     inc h
-.NoCarry:
+.no_carry:
 ```
+
 or better (doesn't require a label):
+
 ```asm
-    add l ; 5 bytes, 5 cycles
-    ld l, a ; = a + l
-    adc a, h ; = a + l + carry + h
+    ; 5 bytes, 5 cycles
+    add l ; = a + l
+    ld l, a ; cache a + l
+    adc h ; = a + l + carry + h
     sub l ; = carry + h
     ld h, a
 ```
-### Loading from an offset to HL
+
+### Loading from an offset to `hl`
+
 Don't do:
+
 ```asm
-    ld a, [offset] ; 8 bytes, 10 cycles
+    ; 8 bytes, 10 cycles
+    ld a, [offset]
     ld l, a
     ld a, [offset+1]
     ld h, a
 ```
+
 But do:
+
 ```asm
-    ld hl, offset ; 6 bytes, 8 cycles
+    ; 6 bytes, 8 cycles
+    ld hl, offset
     ld a, [hli]
     ld h, [hl]
     ld l, a
 ```
+
 ### Exchanging two 16-bit registers
-(`hl` and `de` taken as examples, but any 16-bit registers are fine)
+
+(The example uses `hl` and `de`, but any pair of `bc`, `de`, or `hl` would also work.)
 If you care about speed:
+
 ```asm
-    ld a, d ; 6 bytes, 6 cycles
+    ; 6 bytes, 6 cycles
+    ld a, d
     ld d, h
     ld h, a
     ld a, e
     ld e, l
     ld l, a
 ```
+
 If you care about size:
+
 ```asm
-    push de ; 4 bytes, 9 cycles
+    ; 4 bytes, 9 cycles
+    push de
     ld d, h
     ld e, l
     pop hl
 ```
+
 ## Branching
-### Compare A to zero
+### Compare `a` to zero
+
 Don't do:
+
 ```asm
     cp 0 ; 2 bytes, 2 cycles
 ```
+
 But do:
+
 ```asm
     or a ; 1 byte, 1 cycle
 ```
-or
+
+or do:
+
 ```asm
     and a ; 1 byte, 1 cycle
 ```
-### Compare A to 1
+
+### Compare `a` to 1
+
 ```asm
     cp 1 ; 2 bytes, 2 cycles
 ```
+
 If you don't care about the value in `a`:
+
+
 ```asm
     dec a ; 1 byte, 1 cycle, decrements a
 ```
-Note that you can still do `inc a` afterwards, which is 1 cycle faster if the jump is taken; compare:
+Note that you can still do `inc a` afterwards, which is 1 cycle faster if the jump is taken. Compare:
+
 ```asm
     cp 1
     jr z, .equals1
 ```
+
+with:
+
 ```asm
     dec a
     jr z, .equals1
     inc a
 ```
-### Compare A to $FF
+
+### Compare `a` to 255
+
+(255, or $FF in hexadecimal, is the same as −1 due to [two's complement](https://en.wikipedia.org/wiki/Two%27s_complement).)
+
 ```asm
-    cp $FF ; 2 bytes, 2 cycles
+    cp $ff ; 2 bytes, 2 cycles
 ```
+
 If you don't care about the value in `a`:
+
 ```asm
     dec a ; 1 byte, 1 cycle, decrements a
 ```
-Note that you can still do `inc a` afterwards, which is 1 cycle faster if the jump is taken; compare:
+Note that you can still do `inc a` afterwards, which is 1 cycle faster if the jump is taken. Compare:
+
 ```asm
-    cp $FF
+    cp $ff
     jr z, .equals255
 ```
+
+with:
+
 ```asm
     inc a
     jr z, .equals255
     dec a
 ```
+
 ### Chaining comparisons
+
 Don't do:
+
 ```asm
     cp 1
     jr z, .equals1
@@ -177,7 +270,9 @@ Don't do:
     jr z, .equals3
     ; ...
 ```
+
 But do:
+
 ```asm
     dec a
     jr z, .equals1
@@ -192,49 +287,71 @@ But do:
 ## Functions
 ### Tail call optimization
+
 Don't do:
+
 ```asm
-    call Function ; 4 bytes, 10 cycles
+    ; 4 bytes, 10 cycles
+    call Function
     ret
 ```
+
 But do:
+
 ```asm
-    jp Function ; 3 bytes, 4 cycles
+    ; 3 bytes, 4 cycles
+    jp Function
 ```
-### Calling HL
+
+### Calling `hl`
+
 ```asm
-    ld de, .return ; 5 bytes, 8 cycles
+    ; 5 bytes, 8 cycles
+    ld de, .return
     push de
     jp hl
+
 .return
     ...
 ```
+
 But do:
+
 ```asm
-    call DoJump ; 4 bytes, 7 cycles
+    ; 4 bytes, 7 cycles
+    call _hl_
+; return
     ...
+```
+
+`_hl_` is a routine already defined in [home.asm](../blob/master/home.asm):
-DoJump: ; TODO: such a function already exists in the code; but where is it?
+```asm
+_hl_::
     jp hl
 ```
+
 ### Inlining
+
 Don't do:
+
 ```asm
-    call GetOffset ; 4 additional bytes, 10 additional cycles
+    ; 4 additional bytes, 10 additional cycles
+    call GetOffset
     ...
 GetOffset:
-    add hl, bc
-    ld a, [hli]
-    ld h, [hl]
-    ld l, a
+    (some code)
+    ret
 ```
+
 if `GetOffset` is only called a handful of times. Instead, do:
+
 ```asm
-    add hl, bc
-    ld a, [hli]
-    ld h, [hl]
-    ld l, a
-```
\ No newline at end of file
+; GetOffset
+    (some code)
+```
+
+You can set `(some code)` apart with blank lines and put a comment on top to make its self-contained nature clear without the extra `call` and `ret`.
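
Review note: the label-free `add l` / `ld l, a` / `adc h` / `sub l` / `ld h, a` sequence in the diff above relies on `sub l` cancelling the low byte that `adc h` folded in, which leaves `h` plus the carry. The following is a minimal Python sketch of that arithmetic, assuming ordinary 8-bit wraparound and carry behaviour for `add`, `adc`, and `sub`. The function names `add_a_to_hl` and `check_all` are hypothetical and are not part of the wiki page or of pokecrystal.

```python
def add_a_to_hl(a: int, h: int, l: int) -> tuple:
    """Model the five-instruction sequence on 8-bit registers.

    Returns the resulting (h, l). Only the carry flag is modeled;
    this is an illustrative assumption, not an emulator.
    """
    total = a + l                      # add l
    a, carry = total & 0xFF, total >> 8
    l = a                              # ld l, a   (cache a + l)
    a = (a + h + carry) & 0xFF         # adc h     (a + l + carry + h)
    a = (a - l) & 0xFF                 # sub l     (carry + h)
    h = a                              # ld h, a
    return h, l


def check_all() -> bool:
    """Compare the modeled sequence against plain 16-bit addition."""
    for h in (0x00, 0x01, 0x7F, 0xFF):     # sampled high bytes
        for a in range(256):
            for l in range(256):
                expected = ((h << 8) + l + a) & 0xFFFF
                new_h, new_l = add_a_to_hl(a, h, l)
                if (new_h << 8) | new_l != expected:
                    return False
    return True


if __name__ == "__main__":
    print(check_all())
```

`check_all()` samples only a few values of `h`, because `h` enters the sequence through a single addition; every combination of `a` and `l` is still covered.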