Sometimes the simplest way to write something in assembly code isn't the best. All of your resources are limited: CPU speed, ROM size, RAM space, register use. You can rewrite code to use those resources more efficiently (sometimes by trading one for another).

Most of these tricks come from either [Jeff's GB Assembly Code Tips v1.0](http://www.devrs.com/gb/files/asmtips.txt), or [WikiTI's Z80 Optimization page](http://wikiti.brandonw.net/index.php?title=Z80_Optimization). (Note that Z80 assembly is *not* the same as GBZ80; it has more registers and some different instructions.)

## Contents

- [Registers](#registers)
  - [Set `a` to 0](#set-a-to-0)
  - [Invert the bits of `a`](#invert-the-bits-of-a)
  - [Set `a` to some constant minus `a`](#set-a-to-some-constant-minus-a)
  - [Multiply `hl` by 2](#multiply-hl-by-2)
  - [Add `a` to a 16-bit register](#add-a-to-a-16-bit-register)
  - [Add `a` to an address](#add-a-to-an-address)
  - [Increment or decrement a 16-bit register](#increment-or-decrement-a-16-bit-register)
  - [Load from an address to `hl`](#load-from-an-address-to-hl)
  - [Exchange two 16-bit registers](#exchange-two-16-bit-registers)
  - [Load two constants into a register pair](#load-two-constants-into-a-register-pair)
  - [Load a constant into `[hl]`](#load-a-constant-into-hl)
  - [Increment or decrement `[hl]`](#increment-or-decrement-hl)
  - [Load a constant into `[hl]` and increment or decrement `hl`](#load-a-constant-into-hl-and-increment-or-decrement-hl)
- [Branching (control flow)](#branching-control-flow)
  - [Relative jumps](#relative-jumps)
  - [Compare `a` to 0](#compare-a-to-0)
  - [Compare `a` to 1](#compare-a-to-1)
  - [Compare `a` to 255](#compare-a-to-255)
- [Subroutines (functions)](#subroutines-functions)
  - [Tail call optimization](#tail-call-optimization)
  - [Call `hl`](#call-hl)
  - [Inlining](#inlining)
  - [Fallthrough](#fallthrough)
- [Jump and lookup tables](#jump-and-lookup-tables)
  - [Chain comparisons](#chain-comparisons)

## Registers

### Set `a` to 0

Don't do:

```asm
    ld a, 0 ; 2 bytes, 2 cycles, no changes to flags
```

But do:

```asm
    xor a ; 1 byte, 1 cycle, sets flags C to 0 and Z to 1
```

Or do:

```asm
    sub a ; 1 byte, 1 cycle, sets flags C to 0 and Z to 1
```

Don't use the optimized versions if you need to preserve flags. As such, `ld a, 0` must be left intact in the code below:

```asm
    ld a, [wIsTrainerBattle]
    and a ; NZ if [wIsTrainerBattle] is nonzero
    ld a, 0
    jr nz, .trainer
```

### Invert the bits of `a`

Don't do:

```asm
    xor $ff ; 2 bytes, 2 cycles
```

But do:

```asm
    cpl ; 1 byte, 1 cycle
```

### Set `a` to some constant minus `a`

Don't do:

```asm
    ; 4 bytes, 4 cycles
    ld b, a
    ld a, CONST
    sub b
```

But do:

```asm
    ; 3 bytes, 3 cycles
    cpl
    add CONST + 1
```

### Multiply `hl` by 2

Don't do:

```asm
    ; 4 bytes, 4 cycles
    sla l
    rl h
```

But do:

```asm
    add hl, hl ; 1 byte, 2 cycles
```

(The `SpeciesItemBoost` routine in [engine/battle/effect_commands.asm](../../blob/master/engine/battle/effect_commands.asm#L2812-L2814) actually does this!)

### Add `a` to a 16-bit register

(The example uses `hl`, but `bc` or `de` would also work.)

Don't do:

```asm
    ; 6 bytes, 6 cycles
    add l
    ld l, a
    ld a, 0
    adc h
    ld h, a
```

and don't do:

```asm
    ; 6 bytes, 6 cycles
    add l
    ld l, a
    ld a, h
    adc 0
    ld h, a
```

But do:

```asm
    ; 5 bytes, 5 cycles (jr taken: 3 cycles; jr not taken: 2 cycles plus 1 for inc h)
    add l
    ld l, a
    jr nc, .no_carry
    inc h
.no_carry:
```

Or better, do:

```asm
    ; 5 bytes, 5 cycles
    add l
    ld l, a
    adc h
    sub l
    ld h, a
```

Or if you can spare another 16-bit register and want to optimize for size over speed, do:

```asm
    ; 4 bytes, 5 cycles
    ld d, 0
    ld e, a
    add hl, de
```

### Add `a` to an address

(The example uses `hl`, but `bc` or `de` would also work.)

Don't do:

```asm
    ; 7 bytes, 8 cycles; uses another 16-bit register
    ld e, a
    ld d, 0
    ld hl, Address
    add hl, de
```

But do:

```asm
    ; 7 bytes, 7 cycles
    add a, LOW(Address)
    ld l, a
    adc a, HIGH(Address)
    sub l
    ld h, a
```

### Increment or decrement a 16-bit register

When possible, avoid doing:

```asm
    inc hl ; 1 byte, 2 cycles
```

```asm
    dec hl ; 1 byte, 2 cycles
```

If the low byte won't overflow, then do:

```asm
    inc l ; 1 byte, 1 cycle
```

```asm
    dec l ; 1 byte, 1 cycle
```

### Load from an address to `hl`

Don't do:

```asm
    ; 8 bytes, 10 cycles
    ld a, [Address]
    ld l, a
    ld a, [Address+1]
    ld h, a
```

But do:

```asm
    ; 6 bytes, 8 cycles
    ld hl, Address
    ld a, [hli]
    ld h, [hl]
    ld l, a
```

### Exchange two 16-bit registers

(The example uses `hl` and `de`, but any pair of `bc`, `de`, or `hl` would also work.)

If you care about speed:

```asm
    ; 6 bytes, 6 cycles
    ld a, d
    ld d, h
    ld h, a
    ld a, e
    ld e, l
    ld l, a
```

If you care about size:

```asm
    ; 4 bytes, 9 cycles
    push de
    ld d, h
    ld e, l
    pop hl
```

### Load two constants into a register pair

(The example uses `bc`, but `hl` or `de` would also work.)

Don't do:

```asm
    ; 4 bytes, 4 cycles
    ld b, ONE
    ld c, TWO
```

But do:

```asm
    ld bc, ONE << 8 | TWO ; 3 bytes, 3 cycles
```

Or better, use the `lb` macro in [macros/code.asm](../blob/master/macros/code.asm):

```asm
    lb bc, ONE, TWO ; 3 bytes, 3 cycles
```

### Load a constant into `[hl]`

Don't do:

```asm
    ; 3 bytes, 4 cycles
    ld a, CONST
    ld [hl], a
```

But do:

```asm
    ld [hl], CONST ; 2 bytes, 3 cycles
```

### Increment or decrement `[hl]`

Don't do:

```asm
    ; 3 bytes, 5 cycles
    ld a, [hl]
    inc a
    ld [hl], a
```

```asm
    ; 3 bytes, 5 cycles
    ld a, [hl]
    dec a
    ld [hl], a
```

But do:

```asm
    inc [hl] ; 1 byte, 3 cycles
```

```asm
    dec [hl] ; 1 byte, 3 cycles
```

### Load a constant into `[hl]` and increment or decrement `hl`

Don't do:

```asm
    ; 2 bytes, 4 cycles
    ld [hl], a
    inc hl
```

```asm
    ; 2 bytes, 4 cycles
    ld [hl], a
    dec hl
```

But do:

```asm
    ld [hli], a ; 1 byte, 2 cycles
```

```asm
    ld [hld], a ; 1 byte, 2 cycles
```

## Branching (control flow)

### Relative jumps

Don't do:

```asm
    jp Somewhere ; 3 bytes, 4 cycles
```

But do:

```asm
    jr Somewhere ; 2 bytes, 3 cycles
```

This only applies if `Somewhere` is close enough: `jr` takes a signed 8-bit offset, so the target must lie within −128 to +127 bytes of the instruction that follows the `jr`.

### Compare `a` to 0

Don't do:

```asm
    cp 0 ; 2 bytes, 2 cycles
```

But do:

```asm
    or a ; 1 byte, 1 cycle
```

Or do:

```asm
    and a ; 1 byte, 1 cycle
```

### Compare `a` to 1

```asm
    cp 1 ; 2 bytes, 2 cycles
```

If you don't care about the value in `a`:

```asm
    dec a ; 1 byte, 1 cycle, decrements a
```

Note that you can still do `inc a` afterwards, which is one cycle faster if the jump is taken. Compare:

```asm
    cp 1
    jr z, .equals1
```

with:

```asm
    dec a
    jr z, .equals1
    inc a
```

### Compare `a` to 255

(255, or $FF in hexadecimal, is the same as −1 due to [two's complement](https://en.wikipedia.org/wiki/Two%27s_complement).)

```asm
    cp $ff ; 2 bytes, 2 cycles
```

If you don't care about the value in `a`:

```asm
    inc a ; 1 byte, 1 cycle, increments a
```

Note that you can still do `dec a` afterwards, which is one cycle faster if the jump is taken.
Compare:

```asm
    cp $ff
    jr z, .equals255
```

with:

```asm
    inc a
    jr z, .equals255
    dec a
```

## Subroutines (functions)

### Tail call optimization

Don't do:

```asm
    ; 4 bytes, 10 cycles
    call Function
    ret
```

But do:

```asm
    jp Function ; 3 bytes, 4 cycles
```

### Call `hl`

Don't do:

```asm
    ; 5 bytes, 8 cycles
    ld de, .return
    push de
    jp hl
.return:
    ...
```

But do:

```asm
    call _hl_ ; 4 bytes, 7 cycles, counting the definition of _hl_
    ...
```

`_hl_` is a routine already defined in [home.asm](../blob/master/home.asm):

```asm
_hl_::
    jp hl
```

### Inlining

Don't do:

```asm
    ; 4 additional bytes, 10 additional cycles
    call GetOffset
    ...

GetOffset:
    (some code)
    ret
```

if `GetOffset` is only called a handful of times. Instead, do:

```asm
    ; GetOffset
    (some code)
```

You can set `(some code)` apart with blank lines and put a comment on top to make its self-contained nature clear without the extra `call` and `ret`.

### Fallthrough

Don't do:

```asm
    ...
    call Function
    ret

Function:
    (some code)
    ret
```

And don't do:

```asm
    ...
    jp Function

Function:
    (some code)
    ret
```

But do:

```asm
    ...
    ; fallthrough

Function:
    (some code)
    ret
```

You can still `call Function` elsewhere, but one tail call can be optimized into a fallthrough.

## Jump and lookup tables

### Chain comparisons

Don't do:

```asm
    cp 1
    jr z, .equals1
    cp 2
    jr z, .equals2
    cp 3
    jr z, .equals3
    ...
```

But do:

```asm
    dec a
    jr z, .equals1
    dec a
    jr z, .equals2
    dec a
    jr z, .equals3
    ...
```

Or do:

```asm
    dec a
    ld hl, .jumptable
    ld e, a
    ld d, 0
    add hl, de
    add hl, de
    ld a, [hli]
    ld h, [hl]
    ld l, a
    jp hl

.jumptable:
    dw .equals1
    dw .equals2
    dw .equals3
    ...
```

Or better, do:

```asm
    dec a
    ld hl, .jumptable
    rst JumpTable
    ...

.jumptable:
    dw .equals1
    dw .equals2
    dw .equals3
    ...
```
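The jump table versions read whatever pointer sits at index `a`, so a value outside the expected range would jump to garbage. As a rough sketch only, the code below factors the table lookup into a reusable helper and guards it with a range check. `JumpTableSketch`, `DispatchSketch`, the section name, and the values loaded in the `.equalsN` handlers are all made up for illustration and are not taken from the repository.

```asm
SECTION "Jump table sketch", ROM0

; Hypothetical helper (not from the repository): jump to entry `a` (0-based)
; of the table of `dw` pointers at `hl`. Preserves `de`.
JumpTableSketch:
    push de
    ld e, a
    ld d, 0
    add hl, de
    add hl, de ; hl = table + 2 * a, since each entry is 2 bytes
    ld a, [hli]
    ld h, [hl]
    ld l, a ; hl = address read from the table
    pop de
    jp hl

; Hypothetical caller: dispatch on a = 1, 2, or 3 and ignore anything else.
DispatchSketch:
    dec a ; 1..3 becomes 0..2; 0 wraps around to 255
    cp 3 ; number of table entries
    ret nc ; out of range, so don't read past the table
    ld hl, .jumptable
    jp JumpTableSketch ; tail call instead of call + ret

.jumptable:
    dw .equals1
    dw .equals2
    dw .equals3

.equals1:
    ld a, 10 ; placeholder handler
    ret
.equals2:
    ld a, 20 ; placeholder handler
    ret
.equals3:
    ld a, 30 ; placeholder handler
    ret
```

The `cp 3` / `ret nc` guard adds 3 bytes and, for an in-range value, 4 cycles. Whether that is worth paying depends on whether the caller can already guarantee the range of `a`.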