Sometimes the simplest way to write something in assembly code isn't the best. All of your resources are limited: CPU speed, ROM size, RAM space, register use. You can rewrite code to use those resources more efficiently (sometimes by trading one for another).

Most of these tricks come from either [Jeff's GB Assembly Code Tips v1.0](http://www.devrs.com/gb/files/asmtips.txt) or [WikiTI's Z80 Optimization page](http://wikiti.brandonw.net/index.php?title=Z80_Optimization). (Note that Z80 assembly is *not* the same as GBZ80; it has more registers and some different instructions.)

## Contents

- [Registers](#registers)
  - [Set `a` to 0](#set-a-to-0)
  - [Set `a` to some constant minus `a`](#set-a-to-some-constant-minus-a)
  - [Invert the bits of `a`](#invert-the-bits-of-a)
  - [Multiply `hl` by 2](#multiply-hl-by-2)
  - [Add `a` to a 16-bit register](#add-a-to-a-16-bit-register)
  - [Loading from an address to `hl`](#loading-from-an-address-to-hl)
  - [Exchanging two 16-bit registers](#exchanging-two-16-bit-registers)
  - [Loading a constant into `[hl]`](#loading-a-constant-into-hl)
  - [Loading two constants into a register pair](#loading-two-constants-into-a-register-pair)
  - [Loading a constant into `[hl]` and incrementing or decrementing `hl`](#loading-a-constant-into-hl-and-incrementing-or-decrementing-hl)
  - [Incrementing or decrementing `[hl]`](#incrementing-or-decrementing-hl)
- [Branching (control flow)](#branching-control-flow)
  - [Relative jumps](#relative-jumps)
  - [Compare `a` to 0](#compare-a-to-0)
  - [Compare `a` to 1](#compare-a-to-1)
  - [Compare `a` to 255](#compare-a-to-255)
  - [Add `a` to `hl` without using a 16-bit register](#add-a-to-hl-without-using-a-16-bit-register)
- [Subroutines (functions)](#subroutines-functions)
  - [Tail call optimization](#tail-call-optimization)
  - [Calling `hl`](#calling-hl)
  - [Inlining](#inlining)
  - [Fallthrough](#fallthrough)
- [Jump and lookup tables](#jump-and-lookup-tables)
  - [Chaining comparisons](#chaining-comparisons)

## Registers

### Set `a` to 0

Don't do:

```asm
    ld a, 0 ; 2 bytes, 2 cycles, no changes to flags
```

But do:

```asm
    xor a ; 1 byte, 1 cycle, sets flags C to 0 and Z to 1
```

or do:

```asm
    sub a ; 1 byte, 1 cycle, sets flags C to 0 and Z to 1
```

Don't use the optimized versions if you need to preserve flags. As such, `ld a, 0` must be left intact in the code below:

```asm
    ld a, [wIsTrainerBattle]
    and a ; NZ if in trainer battle
    ld a, 0
    jr nz, .trainer
```

### Set `a` to some constant minus `a`

Don't do:

```asm
; 4 bytes, 4 cycles
    ld b, a
    ld a, CONST
    sub b
```

But do:

```asm
; 3 bytes, 3 cycles
    cpl
    add CONST + 1
```

### Invert the bits of `a`

Don't do:

```asm
    xor $ff ; 2 bytes, 2 cycles
```

But do:

```asm
    cpl ; 1 byte, 1 cycle
```

### Multiply `hl` by 2

Don't do:

```asm
; 4 bytes, 4 cycles
    sla l
    rl h
```

But do:

```asm
; 1 byte, 2 cycles
    add hl, hl
```

(The `SpeciesItemBoost` routine in [engine/battle/effect_commands.asm](../blob/master/engine/battle/effect_commands.asm) actually does this!)

### Add `a` to a 16-bit register

(The example uses `hl`, but `bc` or `de` would also work.)

Don't do:

```asm
; 6 bytes, 6 cycles
    add l
    ld l, a
    ld a, 0
    adc h
    ld h, a
```

and don't do:

```asm
; 6 bytes, 6 cycles
    add l
    ld l, a
    ld a, h
    adc 0
    ld h, a
```

But do:

```asm
; 5 bytes, 5 cycles
    add l
    ld l, a
    jr nc, .no_carry
    inc h
.no_carry:
```

Or better (doesn't require a label):

```asm
; 5 bytes, 5 cycles
    add l ; = a + l
    ld l, a ; cache a + l
    adc h ; = a + l + carry + h
    sub l ; = carry + h
    ld h, a
```

### Loading from an address to `hl`

Don't do:

```asm
; 8 bytes, 10 cycles
    ld a, [Address]
    ld l, a
    ld a, [Address+1]
    ld h, a
```

But do:

```asm
; 6 bytes, 8 cycles
    ld hl, Address
    ld a, [hli]
    ld h, [hl]
    ld l, a
```

### Exchanging two 16-bit registers

(The example uses `hl` and `de`, but any pair of `bc`, `de`, or `hl` would also work.)

If you care about speed:

```asm
; 6 bytes, 6 cycles
    ld a, d
    ld d, h
    ld h, a
    ld a, e
    ld e, l
    ld l, a
```

If you care about size:

```asm
; 4 bytes, 9 cycles
    push de
    ld d, h
    ld e, l
    pop hl
```

### Loading a constant into `[hl]`

Don't do:

```asm
; 3 bytes, 4 cycles
    ld a, CONST
    ld [hl], a
```

But do:

```asm
; 2 bytes, 3 cycles
    ld [hl], CONST
```

### Loading two constants into a register pair

(The example uses `bc`, but `hl` or `de` would also work.)

Don't do:

```asm
; 4 bytes, 4 cycles
    ld b, ONE
    ld c, TWO
```

But do:

```asm
; 3 bytes, 3 cycles
    ld bc, ONE << 8 | TWO
```

Or better, use the `lb` macro in [macros/code.asm](../blob/master/macros/code.asm):

```asm
; 3 bytes, 3 cycles
    lb bc, ONE, TWO
```

### Loading a constant into `[hl]` and incrementing or decrementing `hl`

Don't do:

```asm
; 2 bytes, 4 cycles
    ld [hl], a
    inc hl
```

But do:

```asm
; 1 byte, 2 cycles
    ld [hli], a
```

And don't do:

```asm
; 2 bytes, 4 cycles
    ld [hl], a
    dec hl
```

But do:

```asm
; 1 byte, 2 cycles
    ld [hld], a
```

### Incrementing or decrementing `[hl]`

Don't do:

```asm
; 3 bytes, 5 cycles
    ld a, [hl]
    inc a
    ld [hl], a
```

But do:

```asm
; 1 byte, 3 cycles
    inc [hl]
```

And don't do:

```asm
; 3 bytes, 5 cycles
    ld a, [hl]
    dec a
    ld [hl], a
```

But do:

```asm
; 1 byte, 3 cycles
    dec [hl]
```

## Branching (control flow)

### Relative jumps

Don't do:

```asm
    jp Somewhere ; 3 bytes, 4 cycles
```

But do:

```asm
    jr Somewhere ; 2 bytes, 3 cycles
```

This only applies if `Somewhere` is close enough to the jump: `jr` takes a signed 8-bit offset, so the target must be within −128 to +127 bytes of the instruction that follows the `jr`.

### Compare `a` to 0

Don't do:

```asm
    cp 0 ; 2 bytes, 2 cycles
```

But do:

```asm
    or a ; 1 byte, 1 cycle
```

or do:

```asm
    and a ; 1 byte, 1 cycle
```

### Compare `a` to 1

Don't do:

```asm
    cp 1 ; 2 bytes, 2 cycles
```

But if you don't care about the value in `a`, do:

```asm
    dec a ; 1 byte, 1 cycle, decrements a
```

Note that you can still do `inc a` afterwards to restore the value, which is 1 cycle faster if the jump is taken (and the same speed if it is not).
Compare:

```asm
    cp 1
    jr z, .equals1
```

with:

```asm
    dec a
    jr z, .equals1
    inc a
```

### Compare `a` to 255

(255, or $FF in hexadecimal, is the same as −1 due to [two's complement](https://en.wikipedia.org/wiki/Two%27s_complement).)

Don't do:

```asm
    cp $ff ; 2 bytes, 2 cycles
```

But if you don't care about the value in `a`, do:

```asm
    inc a ; 1 byte, 1 cycle, increments a
```

Note that you can still do `dec a` afterwards to restore the value, which is 1 cycle faster if the jump is taken. Compare:

```asm
    cp $ff
    jr z, .equals255
```

with:

```asm
    inc a
    jr z, .equals255
    dec a
```

### Add `a` to `hl` without using a 16-bit register

Don't do:

```asm
; 4 bytes, 5 cycles
    ld d, 0
    ld e, a
    add hl, de
```

But do (one byte larger, but it leaves `de` untouched):

```asm
; 5 bytes, 5 cycles
    add l
    ld l, a
    jr nc, .no_carry
    inc h
.no_carry:
```

## Subroutines (functions)

### Tail call optimization

Don't do:

```asm
; 4 bytes, 10 cycles
    call Function
    ret
```

But do:

```asm
; 3 bytes, 4 cycles
    jp Function
```

### Calling `hl`

Don't do:

```asm
; 5 bytes, 8 cycles
    ld de, .return
    push de
    jp hl
.return:
    ...
```

But do:

```asm
; 3 bytes, 7 cycles
    call _hl_
    ; return
    ...
```

`_hl_` is a routine already defined in [home.asm](../blob/master/home.asm):

```asm
_hl_::
    jp hl
```

### Inlining

Don't do:

```asm
; 4 additional bytes, 10 additional cycles
    call GetOffset
    ...

GetOffset:
    (some code)
    ret
```

if `GetOffset` is only called a handful of times. Instead, do:

```asm
    ; GetOffset
    (some code)
```

You can set `(some code)` apart with blank lines and put a comment on top to make its self-contained nature clear without the extra `call` and `ret`.

### Fallthrough

Don't do:

```asm
    ...
    call Function
    ret

Function:
    ...
```

But do:

```asm
    ...
    ; fallthrough

Function:
    ...
```

You can still `call Function` elsewhere, but one tail call can be optimized into a fallthrough.

## Jump and lookup tables

### Chaining comparisons

Don't do:

```asm
    cp 1
    jr z, .equals1
    cp 2
    jr z, .equals2
    cp 3
    jr z, .equals3
    ...
```

But do:

```asm
    dec a
    jr z, .equals1
    dec a
    jr z, .equals2
    dec a
    jr z, .equals3
    ...
```

Or do:

```asm
    dec a
    ld hl, .jumptable
    ld e, a
    ld d, 0
    add hl, de
    add hl, de
    ld a, [hli]
    ld h, [hl]
    ld l, a
    jp hl

.jumptable:
    dw .equals1
    dw .equals2
    dw .equals3
    ...
```

Or better, do:

```asm
    dec a
    ld hl, .jumptable
    rst JumpTable
    ...

.jumptable:
    dw .equals1
    dw .equals2
    dw .equals3
    ...
```
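
When each case only needs a different *value* rather than different code, the same indexing idea can be applied to a lookup table of data instead of a table of jump targets. The following is only a minimal sketch, assuming `a` holds 1, 2, or 3 and that `de` and `hl` are free to overwrite; `.values` and the `VALUE_*` constants are hypothetical names used for illustration, not something defined in the repository:

```asm
    dec a ; a = index 0, 1, or 2
    ld e, a
    ld d, 0
    ld hl, .values ; hypothetical table label
    add hl, de ; entries are 1 byte each, so add the index once
    ld a, [hl] ; a = the entry for the original value
    ...

.values:
    db VALUE_1 ; used when a was 1 (hypothetical constant)
    db VALUE_2 ; used when a was 2
    db VALUE_3 ; used when a was 3
```

Unlike the jump table above, whose `dw` entries are 2 bytes wide and so need `add hl, de` twice, a table of `db` entries only needs the index added once.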