Sometimes the simplest way to write something in assembly code isn't the best. All of your resources are limited: CPU speed, ROM size, RAM space, register use. You can rewrite code to use those resources more efficiently (sometimes by trading one for another).
Most of these tricks come from [Jeff's GB Assembly Code Tips v1.0](http://www.devrs.com/gb/files/asmtips.txt), [WikiTI's Z80 Optimization page](http://wikiti.brandonw.net/index.php?title=Z80_Optimization), [z80 Heaven's optimization tutorial](http://z80-heaven.wikidot.com/optimization), and [GBDev Wiki's ASM Snippets](https://gbdev.gg8.se/wiki/articles/ASM_Snippets). (Note that Z80 assembly is *not* the same as GBZ80; it has more registers and some different instructions.)
WikiTI's advice fully applies here:
> Note that the following tricks act much like a [peephole optimizer](https://en.wikipedia.org/wiki/Peephole_optimization) and are the last optimization step: remember to first optimize your algorithm and register allocation before applying any of the following if you really want the fastest speed and the smallest code.
>
> Also note that nearly every trick turns the code less understandable and documenting them is a good idea. You can easily forgot after a while without reading parts of the code.
>
> Be warned that some tricks are not exactly equivalent to the normal way and may have exceptions on their use; comments warn about them. Some tricks apply to other cases, but again you have to be careful.
>
> There are some tricks that are nothing more than the correct use of the available instructions on the Z80. Keeping an [instruction set summary](https://rednex.github.io/rgbds/gbz80.7.html) helps to visualize what you can do during coding.
(There's also a "cheat sheet" [table of instructions](https://gbdev.io/gb-opcodes//optables/classic) summarizing their bytes, cycles, and affected flags, if you don't need a long listing of what each one does.)
## Contents
- [8-bit registers](#8-bit-registers)
  - [Set `a` to 0](#set-a-to-0)
  - [Increment or decrement `a`](#increment-or-decrement-a)
  - [Invert the bits of `a`](#invert-the-bits-of-a)
  - [Rotate the bits of `a`](#rotate-the-bits-of-a)
  - [Reverse the bits of `a`](#reverse-the-bits-of-a)
  - [Set `a` to some constant minus `a`](#set-a-to-some-constant-minus-a)
  - [Set `a` to one constant or another depending on the carry flag](#set-a-to-one-constant-or-another-depending-on-the-carry-flag)
  - [Increment or decrement `a` when the carry flag is set](#increment-or-decrement-a-when-the-carry-flag-is-set)
  - [Toggle `a` between two different constants](#toggle-a-between-two-different-constants)
  - [Divide `a` by 8 (shift `a` right 3 bits)](#divide-a-by-8-shift-a-right-3-bits)
  - [Divide `a` by 16 (shift `a` right 4 bits)](#divide-a-by-16-shift-a-right-4-bits)
  - [Set `a` to some value plus or minus carry](#set-a-to-some-value-plus-or-minus-carry)
  - [Add or subtract the carry flag from a register besides `a`](#add-or-subtract-the-carry-flag-from-a-register-besides-a)
  - [Load from HRAM to `a` or from `a` to HRAM](#load-from-hram-to-a-or-from-a-to-hram)
- [16-bit registers](#16-bit-registers)
  - [Multiply `hl` by 2](#multiply-hl-by-2)
  - [Add `a` to a 16-bit register](#add-a-to-a-16-bit-register)
  - [Subtract an 8-bit constant from a 16-bit register](#subtract-an-8-bit-constant-from-a-16-bit-register)
  - [Set a 16-bit register to `a` plus a constant](#set-a-16-bit-register-to-a-plus-a-constant)
  - [Set a 16-bit register to `a` multiplied by 16](#set-a-16-bit-register-to-a-multiplied-by-16)
  - [Increment or decrement a 16-bit register](#increment-or-decrement-a-16-bit-register)
  - [Add or subtract the carry flag from a 16-bit register](#add-or-subtract-the-carry-flag-from-a-16-bit-register)
  - [Load from an address to `hl`](#load-from-an-address-to-hl)
  - [Load from an address to `sp`](#load-from-an-address-to-sp)
  - [Exchange two 16-bit registers](#exchange-two-16-bit-registers)
  - [Subtract two 16-bit registers](#subtract-two-16-bit-registers)
  - [Load two constants into a register pair](#load-two-constants-into-a-register-pair)
  - [Load a constant into `[hl]`](#load-a-constant-into-hl)
  - [Increment or decrement `[hl]`](#increment-or-decrement-hl)
  - [Load a constant into `[hl]` and increment or decrement `hl`](#load-a-constant-into-hl-and-increment-or-decrement-hl)
- [Branching (control flow)](#branching-control-flow)
  - [Relative jumps](#relative-jumps)
  - [Compare `a` to 0](#compare-a-to-0)
  - [Compare `a` to 1](#compare-a-to-1)
  - [Compare `a` to 255](#compare-a-to-255)
  - [Compare `a` to 0 after masking it](#compare-a-to-0-after-masking-it)
  - [Compare `a` to a mask after masking it](#compare-a-to-a-mask-after-masking-it)
  - [Test whether `a` is negative (compare `a` to $80)](#test-whether-a-is-negative-compare-a-to-80)
- [Subroutines (functions)](#subroutines-functions)
  - [Tail call optimization](#tail-call-optimization)
  - [Call `hl`](#call-hl)
  - [Inlining](#inlining)
  - [Fallthrough](#fallthrough)
  - [Conditional fallthrough](#conditional-fallthrough)
  - [Conditional return](#conditional-return)
  - [Conditional call](#conditional-call)
  - [Conditional `rst $38`](#conditional-rst-38)
  - [Enable interrupts and return](#enable-interrupts-and-return)
- [Jump and lookup tables](#jump-and-lookup-tables)
  - [Chain comparisons](#chain-comparisons)
  - [Off-by-one `AddNTimes`](#off-by-one-addntimes)
## 8-bit registers
### Set `a` to 0
Don't do this:
```asm
ld a, 0 ; 2 bytes, 2 cycles; no changes to flags
```
Instead, do this:
```asm
xor a ; 1 byte, 1 cycle, sets flags C to 0 and Z to 1
```
Or do this:
```asm
sub a ; 1 byte, 1 cycle, sets flags C to 0 and Z to 1
```
Don't use the optimized versions if you need to preserve flags. For example, `ld a, 0` must be left intact in the code below:
```asm
ld a, [wIsTrainerBattle]
and a ; sets zero flag if [wIsTrainerBattle] == 0
ld a, 0 ; sets a to 0 without affecting zero flag
jr nz, .is_trainer_battle
; is not trainer battle
```
### Increment or decrement `a`
When possible, avoid doing this:
```asm
add 1 ; 2 bytes, 2 cycles; sets carry for -1 to 0 overflow
```
```asm
sub 1 ; 2 bytes, 2 cycles; sets carry for 0 to -1 underflow
```
If you don't need to set the carry flag, then do this:
```asm
inc a ; 1 byte, 1 cycle
```
```asm
dec a ; 1 byte, 1 cycle
```
### Invert the bits of `a`
Don't do this:
```asm
xor $ff ; 2 bytes, 2 cycles
```
Instead, do this:
```asm
cpl ; 1 byte, 1 cycle
```
### Rotate the bits of `a`
Don't do this:
```asm
rl a ; 2 bytes, 2 cycles; updates Z and C flags
```
```asm
rlc a ; 2 bytes, 2 cycles; updates Z and C flags
```
```asm
rr a ; 2 bytes, 2 cycles; updates Z and C flags
```
```asm
rrc a ; 2 bytes, 2 cycles; updates Z and C flags
```
Instead, do this:
```asm
rla ; 1 byte, 1 cycle; updates C flag
```
```asm
rlca ; 1 byte, 1 cycle; updates C flag
```
```asm
rra ; 1 byte, 1 cycle; updates C flag
```
```asm
rrca ; 1 byte, 1 cycle; updates C flag
```
The exception is if you need the zero flag set when the result in `a` is 0: the two-byte operations set `z` accordingly, while the one-byte operations always reset it.
### Reverse the bits of `a`
(This optimization is based on [Retro Programming](http://www.retroprogramming.com/2014/01/fast-z80-bit-reversal.html)).
(The example uses `b` and `c`, but any of `d`, `e`, `h`, or `l` would also work.)
Don't do this:
```asm
; 25 bytes, 25 cycles
rept 8
rra ; nor rla
rl b ; nor rr b
endr
ld a, b
```
And don't do this:
```asm
; 17 bytes, 17 cycles
ld b, a
rlca
rlca
xor b
and $aa
xor b
ld b, a
rlca
rlca
rlca
rrc b
xor b
and $66
xor b
```
Instead, do this:
```asm
; 15 bytes, 15 cycles
ld b, a
rlca
rlca
xor b
and $aa
xor b
ld b, a
swap b
xor b
and $33
xor b
rrca
```
Or if you really want to optimize for size over speed, then don't do this:
```asm
; 10 bytes, 59 cycles
ld bc, 8 ; lb bc, 0, 8
.loop
rra ; nor rla
rl b ; nor rr b
dec c
jr nz, .loop
ld a, b
```
Instead, do this:
```asm
; 8 bytes, 50 cycles
ld b, 1
.loop
rra
rl b
jr nc, .loop
ld a, b
```
Or if you really want to optimize for speed over size, then do this:
```asm
; 6 bytes, 12 cycles
; (4 bytes, 5 cycles if you don't need the push hl/pop hl)
push hl
ld h, HIGH(ReversedBitTable)
ld l, a
ld a, [hl]
pop hl
```
```asm
; 256 bytes; place it in ROM0, or in the same ROMX bank as the code that uses it
SECTION "ReversedBitTable", ROM0, ALIGN[8]
ReversedBitTable::
for x, 256
; http://graphics.stanford.edu/~seander/bithacks.html#ReverseByteWith32Bits
db LOW(((((x * $802) & $22110) | ((x * $8020) & $88440)) * $10101) >> 16)
endr
```
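Since the 15-byte sequence is hard to follow by eye, here is a small Python model (not GBZ80 code) that replays it instruction by instruction and checks it, along with the expression used to fill `ReversedBitTable`, against a naive bit reversal for all 256 inputs. The helper names are made up for this sketch.

```python
def rlca(a: int) -> int:
    """Rotate left by one bit; bit 7 wraps around to bit 0."""
    return ((a << 1) | (a >> 7)) & 0xFF

def rrca(a: int) -> int:
    """Rotate right by one bit; bit 0 wraps around to bit 7."""
    return ((a >> 1) | (a << 7)) & 0xFF

def swap(r: int) -> int:
    """Exchange the high and low nibbles."""
    return ((r << 4) | (r >> 4)) & 0xFF

def reverse_15_bytes(a: int) -> int:
    b = a                      # ld b, a
    a = rlca(rlca(a))          # rlca / rlca
    a = ((a ^ b) & 0xAA) ^ b   # xor b / and $aa / xor b (bits set in $aa come from a)
    b = a                      # ld b, a
    b = swap(b)                # swap b
    a = ((a ^ b) & 0x33) ^ b   # xor b / and $33 / xor b
    return rrca(a)             # rrca

def table_entry(x: int) -> int:
    """The expression used to fill ReversedBitTable (bits 16-23 of the product)."""
    return (((((x * 0x802) & 0x22110) | ((x * 0x8020) & 0x88440)) * 0x10101) >> 16) & 0xFF

def reverse_reference(a: int) -> int:
    """Naive reference: read the 8 bits in the opposite order."""
    return int(f"{a:08b}"[::-1], 2)

for value in range(256):
    assert reverse_15_bytes(value) == reverse_reference(value)
    assert table_entry(value) == reverse_reference(value)
print("all 256 inputs reversed correctly")
```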
### Set `a` to some constant minus `a`
Don't do this:
```asm
; 4 bytes, 4 cycles
ld b, a
ld a, FOOBAR
sub b
```
Instead, do this:
```asm
; 3 bytes, 3 cycles
cpl
add FOOBAR + 1
```
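The `cpl` trick relies on the two's complement identity `~a == -a - 1`, so `~a + FOOBAR + 1 == FOOBAR - a` modulo 256. A quick Python check of the resulting values (a sketch; flags are not modeled, and the carry flag generally comes out differently between the two versions):

```python
def cpl_then_add(a: int, foobar: int) -> int:
    """Model of `cpl` followed by `add FOOBAR + 1`, with 8-bit wraparound."""
    a = ~a & 0xFF                    # cpl: invert every bit of a
    return (a + foobar + 1) & 0xFF   # add FOOBAR + 1

def load_then_sub(a: int, foobar: int) -> int:
    """Model of `ld b, a` / `ld a, FOOBAR` / `sub b`."""
    return (foobar - a) & 0xFF

assert all(cpl_then_add(a, k) == load_then_sub(a, k)
           for a in range(256) for k in range(256))
```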
("What's [foobar](https://en.wikipedia.org/wiki/Foobar)?")
### Set `a` to one constant or another depending on the carry flag
(The example sets `a` to `CVAL` if the carry flag is set (`c`), or `NCVAL` if the carry flag is not set (`nc`).)
Don't do this:
```asm
; 6 bytes, 5 or 6 cycles
ld a, CVAL
jr c, .carry
ld a, NCVAL
.carry
```
And don't do this:
```asm
; 6 bytes, 5 or 6 cycles
ld a, NCVAL
jr nc, .no_carry
ld a, CVAL
.no_carry
```
And if either is 0, don't do this:
```asm
; 5 bytes, 5 cycles
ld a, CVAL ; nor NCVAL
jr c, .carry ; nor jr nc
xor a
.carry
```
And if either is 1 more or less than the other, don't do this:
```asm
; 5 bytes, 5 cycles
ld a, CVAL ; nor NCVAL
jr c, .carry ; nor jr nc
inc a ; nor dec a
.carry
```
Instead use `sbc a`, which copies the carry flag to all bits of `a`. So do this:
```asm
; 5 bytes, 5 cycles
sbc a ; if carry, then $ff, else 0
and CVAL - NCVAL ; $ff becomes CVAL - NCVAL, 0 stays 0
add NCVAL ; CVAL - NCVAL becomes CVAL, 0 becomes NCVAL
```
Or do this:
```asm
; 5 bytes, 5 cycles
sbc a ; if carry, then $ff, else 0
and CVAL ^ NCVAL ; $ff becomes CVAL ^ NCVAL, 0 stays 0
xor NCVAL ; CVAL ^ NCVAL becomes CVAL, 0 becomes NCVAL
```
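Both general recipes can be checked exhaustively. This Python sketch (not GBZ80 code; the function names are illustrative) models `sbc a` as producing $ff or 0 from the carry flag, and confirms that each recipe yields `CVAL` when carry is set and `NCVAL` otherwise, for every pair of byte constants:

```python
def sbc_a(carry: int) -> int:
    """`sbc a` computes a - a - carry: $ff if carry is set, else 0."""
    return (-carry) & 0xFF

def select_with_add(carry: int, cval: int, ncval: int) -> int:
    a = sbc_a(carry)                # sbc a
    a &= (cval - ncval) & 0xFF      # and CVAL - NCVAL
    return (a + ncval) & 0xFF       # add NCVAL

def select_with_xor(carry: int, cval: int, ncval: int) -> int:
    a = sbc_a(carry)                # sbc a
    a &= cval ^ ncval               # and CVAL ^ NCVAL
    return a ^ ncval                # xor NCVAL

for cval in range(256):
    for ncval in range(256):
        for carry, expected in ((1, cval), (0, ncval)):
            assert select_with_add(carry, cval, ncval) == expected
            assert select_with_xor(carry, cval, ncval) == expected
```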
And if certain conditions apply, then do something more efficient:
If `CVAL` == $FF (aka −1) and `NCVAL` == 0, then do this:
```asm
; 1 byte, 1 cycle
sbc a ; if carry, then $ff, else 0
```
If `CVAL` == 0 and `NCVAL` == $FF (aka −1), then do this:
```asm
; 2 bytes, 2 cycles
ccf ; invert carry flag
sbc a ; if originally carry, then 0, else $ff
```
If `CVAL` == 0 and `NCVAL` == 1, then do this:
```asm
; 2 bytes, 2 cycles
sbc a ; if carry, then $ff aka -1, else 0
inc a ; -1 becomes 0, 0 becomes 1
```
If `CVAL` == $FF (aka −1), then do this:
```asm
; 3 bytes, 3 cycles
sbc a ; if carry, then $ff, else 0
or NCVAL ; $ff stays $ff, $00 becomes NCVAL
```
If `NCVAL` == 0, then do this:
```asm
; 3 bytes, 3 cycles
sbc a ; if carry, then $ff, else 0
and CVAL ; $ff becomes CVAL, 0 stays 0
```
If `CVAL` == `NCVAL - 1` (aka `CVAL + 1` == `NCVAL`), then do this:
```asm
; 3 bytes, 3 cycles
sbc a ; if carry, then $ff aka -1, else 0
add NCVAL ; -1 becomes NCVAL - 1 aka CVAL, 0 becomes NCVAL
```
If `CVAL` == `NCVAL - 2` (aka `CVAL + 2` == `NCVAL`), then do this:
```asm
; 3 bytes, 3 cycles
sbc a ; if carry, then $ff aka -1, else 0; doesn't change the carry flag
sbc -NCVAL ; -1 becomes NCVAL - 2 aka CVAL, 0 becomes NCVAL
```
If `CVAL` == 0, then do this:
```asm
; 4 bytes, 4 cycles
ccf ; invert carry flag
sbc a ; if originally carry, then 0, else $ff
and NCVAL ; 0 stays 0, $ff becomes NCVAL
```
If `NCVAL` == $FF (aka −1), then do this:
```asm
; 4 bytes, 4 cycles
ccf ; invert carry flag
sbc a ; if originally carry, then 0, else $ff
or CVAL ; $00 becomes CVAL, $ff stays $ff
```
If `CVAL` == `NCVAL + 1` (aka `CVAL - 1` == `NCVAL`), then do this:
```asm
; 4 bytes, 4 cycles
ccf ; invert carry flag
sbc a ; if originally carry, then 0, else $ff aka -1
add CVAL ; -1 becomes CVAL - 1 aka NCVAL, 0 becomes CVAL
```
If `CVAL` == `NCVAL + 2` (aka `CVAL - 2` == `NCVAL`), then do this:
```asm
; 4 bytes, 4 cycles
ccf ; invert carry flag
sbc a ; if originally carry, then 0, else $ff aka -1; doesn't change the carry flag
sbc -CVAL ; -1 becomes CVAL - 2 aka NCVAL, 0 becomes CVAL
```
### Increment or decrement `a` when the carry flag is set
Don't do this:
```asm
; 3 bytes, 3 cycles
jr nc, .ok
inc a
.ok
```
```asm
; 3 bytes, 3 cycles
jr nc, .ok
dec a
.ok
```
Instead, do this:
```asm
adc 0 ; 2 bytes, 2 cycles
```
```asm
sbc 0 ; 2 bytes, 2 cycles
```
### Toggle `a` between two different constants
Don't do this:
```asm
; 12 bytes, 9 or 10 cycles
cp FOO
jr z, .foo_to_bar
jr .bar_to_foo
.foo_to_bar
ld a, BAR
jr .done
.bar_to_foo
ld a, FOO
.done
...
```
And don't do this:
```asm
; 10 bytes, 7 or 9 cycles
cp FOO
jr z, .foo_to_bar ; nor jr nz, .bar_to_foo
ld a, FOO ; nor ld a, BAR
jr .done
.foo_to_bar ; nor .bar_to_foo
ld a, BAR ; nor ld a, FOO
.done
...
```
(That would be applying the "[Conditional fallthrough](#conditional-fallthrough)" optimization to the first way.)
Instead, do this:
```asm
xor FOO ^ BAR ; 2 bytes, 2 cycles
```
(This works for the same reason as the [XOR swap algorithm](https://en.wikipedia.org/wiki/XOR_swap_algorithm) for swapping the values of two variables.)
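The toggle works because XOR-ing with the same value twice cancels out: `FOO ^ (FOO ^ BAR) == BAR` and `BAR ^ (FOO ^ BAR) == FOO`. A brute-force Python check over every pair of byte constants (note that it only says something about inputs that are already `FOO` or `BAR`; any other value of `a` is mapped to something else):

```python
def toggle(a: int, foo: int, bar: int) -> int:
    """Model of `xor FOO ^ BAR`; FOO ^ BAR is folded into one constant at assembly time."""
    return a ^ (foo ^ bar)

for foo in range(256):
    for bar in range(256):
        assert toggle(foo, foo, bar) == bar
        assert toggle(bar, foo, bar) == foo
```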
### Divide `a` by 8 (shift `a` right 3 bits)
Don't do this:
```asm
; 6 bytes, 9 cycles
; (15 bytes, at least 21 cycles, counting the definition of SimpleDivide)
ld c, 8 ; divisor
call SimpleDivide
ld a, b ; quotient
```
And don't do this:
```asm
; 6 bytes, 6 cycles
srl a
srl a
srl a
```
Instead, do this:
```asm
; 5 bytes, 5 cycles
rrca
rrca
rrca
and %00011111
```
### Divide `a` by 16 (shift `a` right 4 bits)
Don't do this:
```asm
; 6 bytes, 9 cycles
; (15 bytes, at least 21 cycles, counting the definition of SimpleDivide)
ld c, 16 ; divisor
call SimpleDivide
ld a, b ; quotient
```
And don't do this:
```asm
; 8 bytes, 8 cycles
srl a
srl a
srl a
srl a
```
Instead, do this:
```asm
; 4 bytes, 4 cycles
swap a
and $f
```
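Both division tricks replace shifts with a rotate plus a mask: rotating moves the bits that a shift would discard into the top of `a`, and the `and` clears them. A Python model (not GBZ80 code) confirming the results match integer division for every byte:

```python
def rrca(a: int) -> int:
    """Rotate right by one bit; bit 0 wraps around to bit 7."""
    return ((a >> 1) | (a << 7)) & 0xFF

def swap(a: int) -> int:
    """Exchange the high and low nibbles."""
    return ((a << 4) | (a >> 4)) & 0xFF

def div8(a: int) -> int:
    """rrca x3, then `and %00011111` clears the 3 bits that wrapped into the top."""
    return rrca(rrca(rrca(a))) & 0b00011111

def div16(a: int) -> int:
    """`swap a`, then `and $f` keeps the old high nibble, now in the low nibble."""
    return swap(a) & 0x0F

for a in range(256):
    assert div8(a) == a // 8
    assert div16(a) == a // 16
```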
### Set `a` to some value plus or minus carry
(The example uses `b` and `c`, but any registers besides `a` would also work, including `[hl]`.)
Don't do this:
```asm
; 4 bytes, 4 cycles
ld b, a
ld a, c
adc 0
```
```asm
; 4 bytes, 4 cycles
ld b, a
ld a, c
sbc 0
```
And don't do this:
```asm
; 4 bytes, 4 cycles
ld b, a
ld a, 0
adc c
```
Instead, do this:
```asm
; 3 bytes, 3 cycles
ld b, a
adc c
sub b
```
```asm
; 3 bytes, 3 cycles
ld b, a
sbc a ; if carry, then $ff aka -1, else 0
add c
```
Also, don't do this:
```asm
; 5 bytes, 5 cycles
ld b, a
ld a, N
adc 0
```
```asm
; 5 bytes, 5 cycles
ld b, a
ld a, N
sbc 0
```
And don't do this:
```asm
; 5 bytes, 5 cycles
ld b, a
ld a, 0
adc N
```
Instead, do this:
```asm
; 4 bytes, 4 cycles
ld b, a
adc N
sub b
```
```asm
; 4 bytes, 4 cycles
ld b, a
sbc a ; if carry, then $ff aka -1, else 0
add N
```
(If the original value of `a` was not backed up in `b`, this optimization would not apply.)
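The `adc` versions work because the backed-up value cancels out: `(a + c + carry) - a == c + carry` modulo 256, even when the `adc` overflows. A Python check of that identity for every `a`, every 8-bit `c` (or constant `N`), and both carry states:

```python
def adc_trick(a: int, value: int, carry: int) -> int:
    """Model of `ld b, a` / `adc value` / `sub b` (value is c, [hl], or a constant N)."""
    b = a                              # ld b, a
    a = (a + value + carry) & 0xFF     # adc value
    return (a - b) & 0xFF              # sub b: the original a cancels out

for a in range(256):
    for value in range(256):
        for carry in (0, 1):
            assert adc_trick(a, value, carry) == (value + carry) & 0xFF
```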
### Add or subtract the carry flag from a register besides `a`
(The example uses `b`, but any of `c`, `d`, `e`, `h`, or `l` would also work.)
Don't do this:
```asm
; 4 bytes, 4 cycles
ld a, b
adc 0
ld b, a
```
```asm
; 4 bytes, 4 cycles
ld a, b
sbc 0
ld b, a
```
And don't do this:
```asm
; 4 bytes, 4 cycles
ld a, 0
adc b
ld b, a
```
Instead, do this:
```asm
; 3 bytes, 3 cycles
jr nc, .no_carry
inc b
.no_carry
```
```asm
; 3 bytes, 3 cycles
jr nc, .no_carry
dec b
.no_carry
```
### Load from HRAM to `a` or from `a` to HRAM
Don't do this:
```asm
ld a, [hFoobar] ; 3 bytes, 4 cycles
```
```asm
ld [hFoobar], a ; 3 bytes, 4 cycles
```
Instead, do this:
```asm
ldh a, [hFoobar] ; 2 bytes, 3 cycles
```
```asm
ldh [hFoobar], a ; 2 bytes, 3 cycles
```
## 16-bit registers
### Multiply `hl` by 2
Don't do this:
```asm
; 4 bytes, 4 cycles
sla l
rl h
```
Instead, do this:
```asm
add hl, hl ; 1 byte, 2 cycles
```
### Add `a` to a 16-bit register
(The example uses `hl`, but `bc` or `de` would also work.)
Don't do this:
```asm
; 6 bytes, 6 cycles
add l
ld l, a
ld a, 0
adc h
ld h, a
```
And don't do this:
```asm
; 6 bytes, 6 cycles
add l
ld l, a
ld a, h
adc 0
ld h, a
```
And don't do this:
```asm
; 5 bytes, 5 cycles
add l
ld l, a
jr nc, .no_carry
inc h
.no_carry
```
Instead, do this:
```asm
; 5 bytes, 5 cycles; no labels
add l
ld l, a
adc h
sub l
ld h, a
```
Or if you can spare another 16-bit register and want to optimize for size over speed, then do this:
```asm
; 4 bytes, 5 cycles
ld d, 0
ld e, a
add hl, de
```
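In the label-free version, `adc h` adds the new `l` on top of `h + carry`, and `sub l` removes it again. This Python model (not GBZ80 code) steps through those five instructions and compares the result with a plain 16-bit addition over a spread of starting `hl` values:

```python
def add_a_to_hl(a: int, h: int, l: int) -> tuple[int, int]:
    """Model of `add l` / `ld l, a` / `adc h` / `sub l` / `ld h, a`; returns (h, l)."""
    total = a + l
    a, carry = total & 0xFF, total >> 8   # add l (carry set on overflow)
    l = a                                 # ld l, a
    a = (a + h + carry) & 0xFF            # adc h
    a = (a - l) & 0xFF                    # sub l: leaves h + carry
    h = a                                 # ld h, a
    return h, l

for a in range(256):
    for hl in range(0, 0x10000, 257):     # 256 starting values, including $ffff
        h, l = add_a_to_hl(a, hl >> 8, hl & 0xFF)
        assert (h << 8 | l) == (hl + a) & 0xFFFF
```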
### Subtract an 8-bit constant from a 16-bit register
(The example uses `hl`, but `bc` or `de` would also work.)
Don't do this:
```asm
; 8 bytes, 8 cycles
ld a, l
sub FOOBAR
ld l, a
ld a, h
sbc 0
ld h, a
```
Instead, do this:
```asm
; 7 bytes, 7 cycles
ld a, l
sub FOOBAR
ld l, a
jr nc, .no_carry
dec h
.no_carry
```
(This is a case of "[Add or subtract the carry flag from a register besides `a`](#add-or-subtract-the-carry-flag-from-a-register-besides-a)", applied to the high part of a 16-bit register.)
Or if you can spare another 16-bit register, do this:
```asm
; 4 bytes, 5 cycles
ld de, -FOOBAR
add hl, de
```
### Set a 16-bit register to `a` plus a constant
(The example uses `hl`, but `bc` or `de` would also work.)
Don't do this:
```asm
; 7 bytes, 8 cycles; uses another 16-bit register
ld e, a
ld d, 0
ld hl, FooBar
add hl, de
```
And don't do this:
```asm
; 8 bytes, 8 cycles
ld hl, FooBar
add l
ld l, a
adc h
sub l
ld h, a
```
And don't do this:
```asm
; 8 bytes, 8 cycles
ld h, HIGH(FooBar)
add LOW(FooBar)
ld l, a
jr nc, .no_carry
inc h
.no_carry
```
Instead, do this:
```asm
; 7 bytes, 7 cycles
add LOW(FooBar)
ld l, a
adc HIGH(FooBar)
sub l
ld h, a
```
Or if the constant is 8-bit and nonzero (i.e. 0 < `FooBar` < 256), then do this:
```asm
; 6 bytes, 6 cycles
sub LOW(-FooBar)
ld l, a
sbc a
inc a
ld h, a
```
Or if the constant is zero (i.e. `FooBar` == 0 and `a` + `FooBar` == `a`), then do this:
```asm
; 3 bytes, 3 cycles
ld l, a
ld h, 0
```
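Both constant-adding versions can be checked the same way. In this Python sketch, `foobar` stands in for `FooBar`; the second function relies on `sub LOW(-FooBar)` borrowing exactly when `a + FooBar` does *not* carry into the high byte, which is why it needs 0 < `FooBar` < 256:

```python
def a_plus_const(a: int, foobar: int) -> int:
    """add LOW(FooBar) / ld l, a / adc HIGH(FooBar) / sub l / ld h, a"""
    total = a + (foobar & 0xFF)                    # add LOW(FooBar)
    a, carry = total & 0xFF, total >> 8
    l = a                                          # ld l, a
    a = (a + (foobar >> 8 & 0xFF) + carry) & 0xFF  # adc HIGH(FooBar)
    a = (a - l) & 0xFF                             # sub l
    h = a                                          # ld h, a
    return h << 8 | l

def a_plus_small_const(a: int, foobar: int) -> int:
    """sub LOW(-FooBar) / ld l, a / sbc a / inc a / ld h, a  (needs 0 < FooBar < 256)"""
    subtrahend = -foobar & 0xFF            # LOW(-FooBar)
    borrow = 1 if a < subtrahend else 0    # carry flag after the sub
    a = (a - subtrahend) & 0xFF            # sub LOW(-FooBar)
    l = a                                  # ld l, a
    a = -borrow & 0xFF                     # sbc a: $ff if borrow, else 0
    a = (a + 1) & 0xFF                     # inc a: 0 if borrow, else 1
    h = a                                  # ld h, a
    return h << 8 | l

for a in range(256):
    for foobar in range(0, 0x10000, 97):
        assert a_plus_const(a, foobar) == (a + foobar) & 0xFFFF
    for foobar in range(1, 256):
        assert a_plus_small_const(a, foobar) == a + foobar
```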
### Set a 16-bit register to `a` multiplied by 16
(The example uses `hl`, but `bc` or `de` would also work.)
You can do this:
```asm
; 7 bytes, 11 cycles
ld l, a
ld h, 0
add hl, hl
add hl, hl
add hl, hl
add hl, hl
```
```asm
; 7 bytes, 11 cycles
ld l, a
ld h, 0
rept 4
add hl, hl
endr
```
But if `a` is definitely small enough, and its value can be changed, then do one of these:
```asm
; 7 bytes, 10 cycles; sets a = a * 2; requires a < $80
add a
ld l, a
ld h, 0
add hl, hl
add hl, hl
add hl, hl
```
```asm
; 7 bytes, 9 cycles; sets a = a * 4; requires a < $40
add a
add a
ld l, a
ld h, 0
add hl, hl
add hl, hl
```
```asm
; 7 bytes, 8 cycles; sets a = a * 8; requires a < $20
add a
add a
add a
ld l, a
ld h, 0
add hl, hl
```
```asm
; 5 bytes, 5 cycles; sets a = a * 16; requires a < $10
swap a
ld l, a
ld h, 0
```
Or if the value of `a` can be changed and you want to optimize for speed over size, then do one of these:
```asm
; 8 bytes, 8 cycles; sets a = l
swap a
ld l, a
and $f
ld h, a
xor l
ld l, a
```
```asm
; 8 bytes, 8 cycles; sets a = h
swap a
ld h, a
and $f0
ld l, a
xor h
ld h, a
```
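The two speed-oriented versions split the swapped byte into its nibbles: one `and` keeps the half that belongs in one byte, and the `xor` recovers the other half. A Python model (not GBZ80 code) checking both against `a * 16` and confirming which byte is left in `a`:

```python
def swap(a: int) -> int:
    """Exchange the high and low nibbles."""
    return ((a << 4) | (a >> 4)) & 0xFF

def times16_low_first(a: int) -> tuple[int, int, int]:
    """swap a / ld l, a / and $f / ld h, a / xor l / ld l, a; returns (h, l, a)."""
    a = swap(a)   # swap a
    l = a         # ld l, a
    a &= 0x0F     # and $f: old high nibble, the high byte of a * 16
    h = a         # ld h, a
    a ^= l        # xor l: clears l's low nibble, leaving old low nibble << 4
    l = a         # ld l, a
    return h, l, a

def times16_high_first(a: int) -> tuple[int, int, int]:
    """swap a / ld h, a / and $f0 / ld l, a / xor h / ld h, a; returns (h, l, a)."""
    a = swap(a)   # swap a
    h = a         # ld h, a
    a &= 0xF0     # and $f0: old low nibble << 4, the low byte of a * 16
    l = a         # ld l, a
    a ^= h        # xor h: leaves the old high nibble, the high byte
    h = a         # ld h, a
    return h, l, a

for x in range(256):
    h, l, a = times16_low_first(x)
    assert (h << 8 | l) == x * 16 and a == l
    h, l, a = times16_high_first(x)
    assert (h << 8 | l) == x * 16 and a == h
```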
### Increment or decrement a 16-bit register
When possible, avoid doing this:
```asm
inc hl ; 1 byte, 2 cycles
```
```asm
dec hl ; 1 byte, 2 cycles
```
If the low byte *definitely* won't overflow (or underflow, for `dec l`), then do this:
```asm
inc l ; 1 byte, 1 cycle
```
```asm
dec l ; 1 byte, 1 cycle
```
This is applicable, for instance, if you're reading a data table via `hl` one byte at a time, it has no more than 256 entries, and it's in its own `SECTION` which has been `ALIGN`ed to 8 bits. It's unlikely to apply to pokecrystal's existing systems.
### Add or subtract the carry flag from a 16-bit register
(The example uses `hl`, but `bc` or `de` would also work.)
Don't do this:
```asm
; 8 bytes, 8 cycles
ld a, l ; nor ld a, 0
adc 0 ; nor adc l
ld l, a
ld a, h ; nor ld a, 0
adc 0 ; nor adc h
ld h, a
```
```asm
; 8 bytes, 8 cycles
ld a, l
sbc 0
ld l, a
ld a, h
sbc 0
ld h, a
```
And don't do this:
```asm
; 7 bytes, 7 cycles
ld a, l ; nor ld a, 0
adc 0 ; nor adc l
ld l, a
adc h
sub l
ld h, a
```
```asm
; 7 bytes, 7 cycles
ld a, l
sbc 0
ld l, a
sbc a ; if carry, then $ff aka -1, else 0
add h
ld h, a
```
(That would be applying the "[Set `a` to some value plus or minus carry](#set-a-to-some-value-plus-or-minus-carry)" optimization to part of the first way.)
And don't do this:
```asm
; 7 bytes, 7 cycles
ld a, l ; nor ld a, 0
adc 0 ; nor adc l
ld l, a
jr nc, .no_carry
inc h
.no_carry
```
```asm
; 7 bytes, 7 cycles
ld a, l
sbc 0
ld l, a
jr nc, .no_carry
dec h
.no_carry
```
(That would be applying the "[Add or subtract the carry flag from a register besides `a`](#add-or-subtract-the-carry-flag-from-a-register-besides-a)" optimization to part of the first way.)
Instead, do this:
```asm
; 3 bytes, 3 or 4 cycles
jr nc, .no_carry
inc hl
.no_carry
```
```asm
; 3 bytes, 3 or 4 cycles
jr nc, .no_carry
dec hl
.no_carry
```
### Load from an address to `hl`
Don't do this:
```asm
; 8 bytes, 10 cycles
ld a, [Address] ; LSB first
ld l, a
ld a, [Address+1]
ld h, a
```
Instead, do this:
```asm
; 6 bytes, 8 cycles
ld hl, Address
ld a, [hli]
ld h, [hl]
ld l, a
```
And don't do this:
```asm
; 8 bytes, 10 cycles
ld a, [Address] ; MSB first
ld h, a
ld a, [Address+1]
ld l, a
```
Instead, do this:
```asm
; 6 bytes, 8 cycles
ld hl, Address
ld a, [hli]
ld l, [hl]
ld h, a
```
### Load from an address to `sp`
Don't do this:
```asm
; 9 bytes, 12 cycles
ld a, [Address]
ld l, a
ld a, [Address+1]
ld h, a
ld sp, hl
```
And don't do this:
```asm
; 7 bytes, 10 cycles
ldh a, [hAddress]
ld l, a
ldh a, [hAddress+1]
ld h, a
ld sp, hl
```
And don't do this:
```asm
; 7 bytes, 10 cycles
ld hl, Address
ld a, [hli]
ld h, [hl]
ld l, a
ld sp, hl
```
(That would be applying the "[Load from an address to `hl`](#load-from-an-address-to-hl)" optimization to the first way.)
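Instead, do this (as long as no interrupt can fire before the final `ld sp, hl`, since it would push its return address onto the data around `Address`):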
```asm
; 5 bytes, 8 cycles
ld sp, Address
pop hl
ld sp, hl
```
Or if the address is already in `hl`, then don't do this:
```asm
; 4 bytes, 7 cycles
ld a, [hli]
ld h, [hl]
ld l, a
ld sp, hl
```
Instead, do this:
```asm
; 3 bytes, 7 cycles
ld sp, hl
pop hl
ld sp, hl
```
### Exchange two 16-bit registers
(The example uses `hl` and `de`, but any pair of `bc`, `de`, or `hl` would also work.)
If you care about speed, then do this:
```asm
; 6 bytes, 6 cycles
ld a, d
ld d, h
ld h, a
ld a, e
ld e, l
ld l, a
```
If you care about size, then do this:
```asm
; 4 bytes, 9 cycles
push de
ld d, h
ld e, l
pop hl
```
### Subtract two 16-bit registers
(The example uses `hl` and `de`, but any pair of `bc`, `de`, or `hl` would also work.)
Don't do this:
```asm
; 10 bytes, 12 cycles; modifies subtrahend de
ld a, $ff
xor d
ld d, a
ld a, $ff
xor e
ld e, a
inc de ; ~de + 1 == -de
add hl, de
```
And don't do this:
```asm
; 8 bytes, 10 cycles; modifies subtrahend de
ld a, d
cpl
ld d, a
ld a, e
cpl
ld e, a
inc de ; ~de + 1 == -de
add hl, de
```
Instead, do this:
```asm
; 6 bytes, 6 cycles
ld a, l
sub e
ld l, a
ld a, h
sbc d
ld h, a
```
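The `sub`/`sbc` pair propagates the borrow from the low byte into the high byte. This Python sketch checks it against plain 16-bit subtraction, and also shows that adding only the bitwise complement of `de` comes out one short, since `~de == -de - 1`:

```python
def sub_hl_de(hl: int, de: int) -> int:
    """Model of ld a, l / sub e / ld l, a / ld a, h / sbc d / ld h, a."""
    h, l = hl >> 8, hl & 0xFF
    d, e = de >> 8, de & 0xFF
    borrow = 1 if l < e else 0     # carry flag after `sub e`
    l = (l - e) & 0xFF             # sub e
    h = (h - d - borrow) & 0xFF    # sbc d also subtracts the borrow
    return h << 8 | l

for hl in range(0, 0x10000, 251):
    for de in range(0, 0x10000, 263):
        assert sub_hl_de(hl, de) == (hl - de) & 0xFFFF
        # Complementing d and e, then `add hl, de`, without an extra +1:
        assert (hl + (~de & 0xFFFF)) & 0xFFFF == (hl - de - 1) & 0xFFFF
```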
### Load two constants into a register pair
(The example uses `bc`, but `hl` or `de` would also work.)
Don't do this:
```asm
; 4 bytes, 4 cycles
ld b, FOO
ld c, BAR
```
Instead, do this:
```asm
ld bc, FOO << 8 | BAR ; 3 bytes, 3 cycles
```
Or better, use the `lb` macro in [macros/code.asm](../blob/master/macros/code.asm):
```asm
lb bc, FOO, BAR ; 3 bytes, 3 cycles
```
### Load a constant into `[hl]`
Don't do this:
```asm
; 3 bytes, 4 cycles
ld a, FOOBAR
ld [hl], a
```
Instead, do this:
```asm
ld [hl], FOOBAR ; 2 bytes, 3 cycles
```
### Increment or decrement `[hl]`
Don't do this:
```asm
; 3 bytes, 5 cycles
ld a, [hl]
inc a
ld [hl], a
```
```asm
; 3 bytes, 5 cycles
ld a, [hl]
dec a
ld [hl], a
```
Instead, do this:
```asm
inc [hl] ; 1 byte, 3 cycles
```
```asm
dec [hl] ; 1 byte, 3 cycles
```
### Load a constant into `[hl]` and increment or decrement `hl`
Don't do this:
```asm
; 2 bytes, 4 cycles
ld [hl], a
inc hl
```
```asm
; 2 bytes, 4 cycles
ld [hl], a
dec hl
```
Instead, do this:
```asm
ld [hli], a ; 1 byte, 2 cycles
```
```asm
ld [hld], a ; 1 byte, 2 cycles
```
And if you can overwrite `a`, then don't do this:
```asm
; 3 bytes, 5 cycles
ld [hl], FOO
inc hl
```
```asm
; 3 bytes, 5 cycles
ld [hl], FOO
dec hl
```
Instead, do this:
```asm
; 3 bytes, 4 cycles
ld a, FOO
ld [hli], a
```
```asm
; 3 bytes, 4 cycles
ld a, FOO
ld [hld], a
```
## Branching (control flow)
### Relative jumps
Don't do this:
```asm
jp Somewhere ; 3 bytes, 4 cycles
```
Instead, do this:
```asm
jr Somewhere ; 2 bytes, 3 cycles
```
This only applies if `Somewhere` is within range: `jr` stores a signed 8-bit offset (-128 to +127) relative to the address just after the two-byte `jr` instruction.
You can define a `jmp` macro to use instead of `jp`, which will warn you when it can be `jr` instead:
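```asm
jmp: MACRO
	if _NARG == 1
		jp \1
	else
		jp \1, \2
		shift
	endc
	; @ is now just past the 3-byte jp, so a 2-byte jr starting where the jp
	; starts could reach targets from @ - 129 up to @ + 126
	assert warn, (\1) - @ > 126 || (\1) - @ < -129, "jp can be jr"
ENDM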
```
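The bounds in an assertion like this depend on where `@` points. Assuming, as in the macro, that the `assert` is evaluated just after the three-byte `jp`, this Python sketch lists which distances (measured from that point) a two-byte `jr` placed at the `jp`'s address could still reach:

```python
JP_SIZE = 3  # bytes in `jp n16`
JR_SIZE = 2  # bytes in `jr e8`

def jr_can_reach(distance_from_start: int) -> bool:
    """Can a jr starting at some address reach a target this many bytes away?"""
    offset = distance_from_start - JR_SIZE  # the offset is relative to the end of the jr
    return -128 <= offset <= 127

# The assert runs after the jp has been emitted, so it measures distances
# from JP_SIZE bytes past where the jr would start.
seen_by_assert = [d - JP_SIZE for d in range(-300, 300) if jr_can_reach(d)]
print(min(seen_by_assert), max(seen_by_assert))  # -129 126
```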
### Compare `a` to 0
Don't do this:
```asm
cp 0 ; 2 bytes, 2 cycles
```
And don't do this:
```asm
or 0 ; 2 bytes, 2 cycles
```
And don't do this:
```asm
and $ff ; 2 bytes, 2 cycles
```
Instead, do this:
```asm
or a ; 1 byte, 1 cycle
```
Or do this:
```asm
and a ; 1 byte, 1 cycle
```
### Compare `a` to 1
Do this:
```asm
cp 1 ; 2 bytes, 2 cycles; updates Z and C flags
```
Or if you don't care about the value in `a`, and don't need to set the carry flag, then do this:
```asm
dec a ; 1 byte, 1 cycle; decrements a, updates Z flag
```
Note that you can still do `inc a` afterwards to restore `a` when the jump isn't taken; the result is the same size as `cp 1`, and one cycle faster if the jump is taken. Compare this:
```asm
; 4 bytes, 4 or 5 cycles
cp 1
jr z, .equals1
```
with this:
```asm
; 4 bytes, 4 cycles
dec a
jr z, .equals1
inc a
```
### Compare `a` to 255
(255, or $FF in hexadecimal, is the same as −1 due to [two's complement](https://en.wikipedia.org/wiki/Two%27s_complement).)
Do this:
```asm
cp $ff ; 2 bytes, 2 cycles; updates Z and C flags
```
Or if you don't care about the value in `a`, and don't need to set the carry flag, then do this:
```asm
inc a ; 1 byte, 1 cycle; increments a, updates Z flag
```
Note that you can still do `dec a` afterwards to restore `a` when the jump isn't taken; the result is the same size as `cp $ff`, and one cycle faster if the jump is taken. Compare this:
```asm
; 4 bytes, 4 or 5 cycles
cp $ff
jr z, .equals255
```
with this:
```asm
; 4 bytes, 4 cycles
inc a
jr z, .equals255
dec a
```
### Compare `a` to 0 after masking it
Don't do this:
```asm
; 3 bytes, 3 cycles; sets zero flag if a == 0
and MASK
and a
```
Instead, do this:
```asm
and MASK ; 2 bytes, 2 cycles; sets zero flag if a == 0
```
### Compare `a` to a mask after masking it
Don't do this:
```asm
; 4 bytes, 4 cycles; sets zero flag if a == MASK and carry flag if a < MASK
and MASK
cp MASK
```
If you don't need to set the carry flag, and don't need the masked value of `a`, then do this:
```asm
; 3 bytes, 3 cycles; sets zero flag if all the MASK bits of a were set (a & MASK == MASK)
or ~MASK
inc a
```
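`or ~MASK` sets every bit outside the mask, so `a` becomes $ff (and `inc a` wraps it to 0) exactly when every bit inside the mask was already set. An exhaustive Python check over all masks and inputs (a sketch; only the zero flag is modeled):

```python
def or_inc_zero_flag(a: int, mask: int) -> bool:
    """Model of `or ~MASK` / `inc a`; returns whether the zero flag ends up set."""
    a |= ~mask & 0xFF       # or ~MASK: force every bit outside MASK to 1
    a = (a + 1) & 0xFF      # inc a: only $ff wraps around to 0
    return a == 0

for mask in range(256):
    for a in range(256):
        assert or_inc_zero_flag(a, mask) == ((a & mask) == mask)
```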
### Test whether `a` is negative (compare `a` to $80)
If you don't need to preserve the value in `a`, then don't do this:
```asm
; 4 bytes, 4 or 5 cycles
cp $80
jr nc, .negative
```
And don't do this:
```asm
; 4 bytes, 4 or 5 cycles
bit 7, a
jr nz, .negative
```
Instead, do this:
```asm
; 3 bytes, 3 or 4 cycles; modifies a
rlca
jr c, .negative
```
## Subroutines (functions)
### Tail call optimization
Don't do this:
```asm
; 4 bytes, 10 cycles
call Function
ret
```
Instead, do this:
```asm
jp Function ; 3 bytes, 4 cycles
```
### Call `hl`
Don't do this:
```asm
; 5 bytes, 8 cycles
(some code)
ld de, .return
push de
jp hl
.return:
(some more code)
```
Instead, do this:
```asm
; 3 bytes, 6 cycles
; (4 bytes, 7 cycles, counting the definition of _hl_)
(some code)
call _hl_
(some more code)
```
`_hl_` is a routine already defined in [home/call_regs.asm](../blob/master/home/call_regs.asm):
```asm
_hl_::
jp hl
```
### Inlining
Don't do this:
```asm
; 4 additional bytes, 10 additional cycles
(some code)
call Function
(some more code)
Function:
(function code)
ret
```
if `Function` is only called from one place (or from a handful of places, if you're willing to spend ROM space to save cycles). Instead, do this:
```asm
(some code)
; Function
(function code)
(some more code)
```
You shouldn't do this if `Function` uses any `ret`urns besides the one at the very end, or if inlining its code would make some `jr`s too distant from their targets.
### Fallthrough
Don't do this:
```asm
(some code)
call Function
ret
Function:
(function code)
ret
```
And don't do this:
```asm
(some code)
jp Function
Function:
(function code)
ret
```
Instead, do this:
```asm
(some code)
; fallthrough
Function:
(function code)
ret
```
Fallthrough is what you get when you combine inlining with tail calls. You can still `call Function` elsewhere, but one tail call can be optimized into a fallthrough.
### Conditional fallthrough
(The example uses `z`, but `nz`, `c`, or `nc` would also work.)
Don't do this:
```asm
(some code)
jr z, .foo
jr .bar
.foo
(foo code)
.bar
(bar code)
```
Instead, do this:
```asm
(some code)
jr nz, .bar
; fallthrough
.foo
(foo code)
.bar
(bar code)
```
### Conditional return
(The example uses `z`, but `nz`, `c`, or `nc` would also work.)
Don't do this:
```asm
; 3 bytes, 3 or 6 cycles
jr z, .skip
ret
.skip
...
```
And don't do this:
```asm
; 3 bytes, 7 or 2 cycles
jr nz, .return
...
.return
ret
```
Instead, do this:
```asm
; 1 byte, 5 or 2 cycles
ret nz
...
```
### Conditional call
(The example uses `z`, but `nz`, `c`, or `nc` would also work.)
Don't do this:
```asm
; 5 bytes, 3 or 8 cycles
jr nz, .skip
call Foo
.skip
```
Instead, do this:
```asm
; 3 bytes, 6 or 3 cycles
call z, Foo
```
And don't do this:
```asm
; 5 bytes, 3 or 6 cycles
jr nz, .skip
jp Foo
.skip
```
Instead, do this:
```asm
; 3 bytes, 4 or 3 cycles
jp z, Foo
```
### Conditional `rst $38`
(The example uses `z`, but `nz`, `c`, or `nc` would also work.)
Don't do this:
```asm
; 5 bytes, 3 or 14 cycles
call z, RstVector38
...
RstVector38:
rst $38
ret
```
And don't do this:
```asm
; 3 bytes, 3 or 6 cycles
jr nz, .no_rst_38
rst $38
.no_rst_38
...
```
And don't do this:
```asm
; 3 bytes, 3 or 6 cycles
call z, $0038
...
```
Instead, do this:
```asm
; 2 bytes, 2 or 7 cycles
jr z, @ + 1 ; the byte for @ + 1 is $ff, which is the opcode for rst $38
...
```
(The symbol `@` evaluates to the current `pc` value, which in `jr z, @ + 1` is the address of the `jr` instruction itself. The instruction consists of two bytes, the opcode and the relative offset, so `@ + 1` is the address of the offset byte. The `jr` instruction encodes its offset relative to the *end* of the instruction, i.e. the *next* `pc` value after the instruction has been read, so the relative offset is `-1`, aka `$ff`. When the jump is taken, the CPU executes that `$ff` byte as `rst $38`, which pushes the address of the instruction after the `jr`, so the handler returns there.)
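The offset arithmetic can be spelled out in a few lines of Python (an illustrative helper, not part of any toolchain), assuming a two-byte `jr` whose offset is relative to the address right after it:

```python
def jr_offset_byte(jr_address: int, target: int) -> int:
    """Offset byte of a 2-byte `jr` at jr_address that jumps to target."""
    next_pc = jr_address + 2          # jr is relative to the following instruction
    offset = target - next_pc
    assert -128 <= offset <= 127, "target out of jr range"
    return offset & 0xFF              # stored as a signed 8-bit value

# `jr z, @ + 1` at an arbitrary, hypothetical address:
address = 0x1234
assert jr_offset_byte(address, address + 1) == 0xFF  # $ff is the opcode of rst $38
```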
### Enable interrupts and return
Don't do this:
```asm
; 2 bytes, 5 cycles
ei
ret
```
Instead, do this:
```asm
; 1 byte, 4 cycles
reti
```
## Jump and lookup tables
### Chain comparisons
Don't do this:
```asm
cp 1
jr z, .equals1
cp 2
jr z, .equals2
cp 3
jr z, .equals3
...
```
Instead, do this:
```asm
dec a
jr z, .equals1
dec a
jr z, .equals2
dec a
jr z, .equals3
...
```
Or do this:
```asm
dec a
ld hl, .jumptable
ld e, a
ld d, 0
add hl, de
add hl, de
ld a, [hli]
ld h, [hl]
ld l, a
jp hl
.jumptable:
dw .equals1
dw .equals2
dw .equals3
...
```
Or better, do:
```asm
dec a
ld hl, .jumptable
rst JumpTable
ret
.jumptable:
dw .equals1
dw .equals2
dw .equals3
...
```
`JumpTable` is an `rst` routine already defined in [home/header.asm](../blob/master/home/header.asm):
```asm
JumpTable::
push de
ld e, a
ld d, 0
add hl, de
add hl, de
ld a, [hli]
ld h, [hl]
ld l, a
pop de
jp hl
```
### Off-by-one `AddNTimes`
Don't do this:
```asm
ld hl, Foo
ld bc, BAR
dec a
call AddNTimes
```
Instead, as long as you don't rely on `a` = 0 wrapping around to 255 (which would add `BAR` 255 times), do this:
```asm
ld hl, Foo - BAR
ld bc, BAR
call AddNTimes
```