| | | |
|---|---|---|
| author | Eldred Habert <eldredhabert0@gmail.com> | 2019-02-18 14:58:18 +0100 |
| committer | Eldred Habert <eldredhabert0@gmail.com> | 2019-02-18 14:58:18 +0100 |
| commit | 5bf9a6c3a4885c2ddf80aa41264007d26721d0f6 (patch) | |
| tree | 8844e98fc7f28511cd70944ebaec6b6e7fcbfbf8 /Optimizing-assembly-code.md | |
| parent | 015ed30c0162b2ddbbe20183a31abf93d4280f6d (diff) | |
Restructure snippets, add new ones
Diffstat (limited to 'Optimizing-assembly-code.md')
-rw-r--r-- | Optimizing-assembly-code.md | 217 |
1 files changed, 147 insertions, 70 deletions
````diff
diff --git a/Optimizing-assembly-code.md b/Optimizing-assembly-code.md
index c3ca655..d89c923 100644
--- a/Optimizing-assembly-code.md
+++ b/Optimizing-assembly-code.md
@@ -3,37 +3,61 @@ Most tricks come from either [Jeff's GB Assembly Code Tips v1.0](http://www.devr
 ## Registers
 
 ### Set A to zero
+Don't do:
 ```asm
     ld a, 0 ; 2 bytes, 2 cycles, no changes to flags
-
-;;;
-
+```
+But do:
+```asm
     xor a ; 1 byte, 1 cycle, sets flags C to 0 and Z to 1
+```
+or
+```asm
     sub a ; 1 byte, 1 cycle, sets flags C to 0 and Z to 1
 ```
 
+Be careful that the optimized versions alter flags. As such, `ld a, 0` must be left intact in the code below:
+```asm
+    ld a, [wIsTrainerBattle]
+    and a ; NZ if in trainer battle
+    ld a, 0
+    jr nz, .trainer
+```
+
 ### Set A to some constant subtracted by A
+Don't do:
 ```asm
     ld b, a ; 4 bytes, 4 cycles
     ld a, CONST
     sub b
-
-;;;
-
+```
+But do:
+```asm
     cpl ; 3 bytes, 3 cycles
     add CONST+1
 ```
 
-### Add A to HL
+### Add A to a 16-bit register
+(`hl` taken as an example, but any 16-bit register would work as well)
+
+Don't do:
 ```asm
     add l ; 6 bytes, 6 cycles
     ld l, a
     ld a, 0
     adc h
+    ld h,a
+```
+or
+```asm
+    add l ; 6 bytes, 6 cycles
+    ld l, a
+    ld a, h
+    adc 0
     ld h, a
-
-;;;
-
+```
+But do:
+```asm
     add l ; 5 bytes, 5 cycles
     ld l, a
     jr nc, .NoCarry
@@ -41,23 +65,35 @@ Most tricks come from either [Jeff's GB Assembly Code Tips v1.0](http://www.devr
 .NoCarry:
 ```
 
+or better (doesn't require a label):
+```asm
+    add l ; 5 bytes, 5 cycles
+    ld l, a ; = a + l
+    adc a, h ; = a + l + carry + h
+    sub l ; = carry + h
+    ld h, a
+```
 
 ### Loading from an offset to HL
+Don't do:
 ```asm
     ld a, [offset] ; 8 bytes, 10 cycles
     ld l, a
     ld a, [offset+1]
     ld h, a
-
-;;;
-
+```
+But do:
+```asm
     ld hl, offset ; 6 bytes, 8 cycles
     ld a, [hli]
     ld h, [hl]
     ld l, a
 ```
 
-### Exchanging HE and DL
+### Exchanging two 16-bit registers
+(`hl` and `de` taken as examples, but any 16-bit registers are fine)
+
+If you care about speed:
 ```asm
     ld a, d ; 6 bytes, 6 cycles
     ld d, h
@@ -65,9 +101,9 @@ Most tricks come from either [Jeff's GB Assembly Code Tips v1.0](http://www.devr
     ld a, e
     ld e, l
     ld l, a
-
-;;;
-
+```
+If you care about size:
+```asm
     push de ; 4 bytes, 9 cycles
     ld d, h
     ld e, l
@@ -77,87 +113,128 @@ Most tricks come from either [Jeff's GB Assembly Code Tips v1.0](http://www.devr
 ## Branching
 
 ### Compare A to zero
+Don't do:
 ```asm
     cp 0 ; 2 bytes, 2 cycles
-
-;;;
-
+```
+But do:
+```asm
     or a ; 1 byte, 1 cycle
+```
+or
+```asm
     and a ; 1 byte, 1 cycle
 ```
 
-### Compare A-1 to zero
+### Compare A to 1
 ```asm
     cp 1 ; 2 bytes, 2 cycles
-
-;;;
-
+```
+If you don't care about the value in `a`:
+```asm
     dec a ; 1 byte, 1 cycle, decrements a
 ```
 
-## Functions
-
-### Tail call optimization
-```asm
-    call Function ; 4 bytes, 10 cycles
-    ret
+Note that you can still do `inc a` afterwards, which is 1 cycle faster if the jump is taken; compare:
+```asm
+    cp 1
+    jr z, .equals1
+```
+```asm
+    dec a
+    jr z, .equals1
+    inc a
+```
 
-;;;
+### Compare A to $FF
+```asm
+    cp $FF ; 2 bytes, 2 cycles
+```
+If you don't care about the value in `a`:
+```asm
+    dec a ; 1 byte, 1 cycle, decrements a
+```
 
-    jp Function ; 3 bytes, 4 cycles
+Note that you can still do `inc a` afterwards, which is 1 cycle faster if the jump is taken; compare:
+```asm
+    cp $FF
+    jr z, .equals255
+```
+```asm
+    inc a
+    jr z, .equals255
+    dec a
 ```
 
-### Executing subroutines
+### Chaining comparisons
+Don't do:
 ```asm
-    ld hl, param1
-    call Function1
-    ld hl, param2
-    call Function2
-    ld hl, param3
-    call Function1
-    ...
-    ...
+    cp 1
+    jr z, .equals1
+    cp 2
+    jr z, .equals2
+    cp 3
+    jr z, .equals3
+    ; ...
+```
+But do:
+```asm
+    dec a
+    jr z, .equals1
+    dec a
+    jr z, .equals2
+    dec a
+    jr z, .equals3
+    ; ...
+```
 
-.Function1:
-    ...
-    ret
-.Function2:
-    ...
-    ret
-
-;;;
-    ld sp, calltable
-    ret ; jump to sp (first entry on calltable)
+## Functions
 
-.Function1:
-    pop hl
-    ...
-    ret
-.Function2:
-    pop hl
-    ...
+### Tail call optimization
+Don't do:
+```asm
+    call Function ; 4 bytes, 10 cycles
     ret
-
-calltable:
-    dw Function1, param1
-    dw Function2, param2
-    dw Function1, param3
+```
+But do:
+```asm
+    jp Function ; 3 bytes, 4 cycles
 ```
 
 ### Calling HL
 ```asm
-    ld de, .retadr ; 5 bytes, 8 cycles
+    ld de, .return ; 5 bytes, 8 cycles
     push de
-    jp [hl]
-    .retadr:
+    jp hl
+.return
+    ...
+```
+But do:
+```asm
+    call DoJump ; 4 bytes, 7 cycles
     ...
 
-;;;
+DoJump: ; TODO: such a function already exists in the code; but where is it?
+    jp hl
+```
 
-    call DoJump ; 4 bytes, 7 cycles
+### Inlining
+Don't do:
+```asm
+    call GetOffset ; 4 additional bytes, 10 additional cycles
     ...
 
-DoJump:
-    jp [hl]
+GetOffset:
+    add hl, bc
+    ld a, [hli]
+    ld h, [hl]
+    ld l, a
+```
+if `GetOffset` is only called a handful of times. Instead, do:
+```asm
+    add hl, bc
+    ld a, [hli]
+    ld h, [hl]
+    ld l, a
 ```
\ No newline at end of file
````
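
Two of the rewrites in this patch rest on 8-bit arithmetic identities that are easy to get wrong by eye: `cpl` / `add CONST+1` for computing `CONST - a`, and the label-free `add l` / `ld l, a` / `adc a, h` / `sub l` / `ld h, a` sequence for adding `a` to `hl`. The following is a rough sanity check in Python, not part of the commit. The helper names (`add8`, `const_minus_a`, `add_a_to_hl`, `verify`) are hypothetical, the model tracks register values and the carry bit only, and it assumes `CONST+1` is truncated to 8 bits.

```python
"""Brute-force check of two 8-bit arithmetic rewrites from the patch.

The helpers below are a hypothetical, minimal model of the relevant
instructions (values and carry only); they are not an emulator.
"""


def add8(x, y, carry_in=0):
    """Return (8-bit sum, carry out) for x + y + carry_in."""
    total = x + y + carry_in
    return total & 0xFF, total >> 8


def const_minus_a(a, const):
    """Model `cpl` / `add CONST+1`."""
    a ^= 0xFF                           # cpl: a = 255 - a
    # Assumption: CONST+1 is truncated to 8 bits (only matters for CONST = $FF).
    a, _ = add8(a, (const + 1) & 0xFF)  # add CONST+1
    return a


def add_a_to_hl(a, h, l):
    """Model `add l` / `ld l, a` / `adc a, h` / `sub l` / `ld h, a`."""
    a, carry = add8(a, l)     # add l     -> low byte of a + l, carry on overflow
    l = a                     # ld l, a
    a, _ = add8(a, h, carry)  # adc a, h  -> l + h + carry
    a = (a - l) & 0xFF        # sub l     -> h + carry
    h = a                     # ld h, a
    return h, l


def verify():
    """Check both rewrites against plain integer arithmetic."""
    for a in range(256):
        for const in range(256):
            assert const_minus_a(a, const) == (const - a) & 0xFF
    # h only enters through a single addition, so a handful of edge values
    # is used instead of all 256 to keep the run short.
    for h in (0x00, 0x01, 0x7F, 0x80, 0xFE, 0xFF):
        for l in range(256):
            for a in range(256):
                expected = ((h << 8 | l) + a) & 0xFFFF
                new_h, new_l = add_a_to_hl(a, h, l)
                assert (new_h << 8 | new_l) == expected
    return True


if __name__ == "__main__":
    print("both rewrites hold:", verify())
```

The check covers results only. Flag side effects, such as those the patch warns about for `xor a`, are outside this model and still need to be reviewed against an instruction reference.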