python toc.py Optimizing-assembly-code.md

author: Rangi <remy.oukaour+rangi42@gmail.com> 2019-02-18 10:11:51 -0500
committer: Rangi <remy.oukaour+rangi42@gmail.com> 2019-02-18 10:11:51 -0500
commit: b17fc465e7f144715466e1afde66614e0b6a21f1 (patch)
tree: f36cc69ee96b025658e6cada1e32051ff9f87575 /Optimizing-assembly-code.md
parent: 5bf9a6c3a4885c2ddf80aa41264007d26721d0f6 (diff)
1 files changed, 167 insertions, 50 deletions
diff --git a/Optimizing-assembly-code.md b/Optimizing-assembly-code.md
index d89c923..3e26854 100644
--- a/Optimizing-assembly-code.md
+++ b/Optimizing-assembly-code.md
@@ -1,22 +1,52 @@
-Most tricks come from either [Jeff's GB Assembly Code Tips v1.0](http://www.devrs.com/gb/files/asmtips.txt), or [the page on Z80 Optimization on WikiTI](http://wikiti.brandonw.net/index.php?title=Z80_Optimization)
+Sometimes the simplest way to write something in assembly code isn't the best. There are optimization techniques to make code smaller and/or faster.
+
+Most of these tricks come from either [Jeff's GB Assembly Code Tips v1.0](http://www.devrs.com/gb/files/asmtips.txt), or [WikiTI's Z80 Optimization page](http://wikiti.brandonw.net/index.php?title=Z80_Optimization). (Note that Z80 assembly is *not* the same as GBZ80; it has more registers and some different instructions.)
+
+
+## Contents
+
+- [Registers](#registers)
+  - [Set `a` to zero](#set-a-to-zero)
+  - [Set `a` to some constant minus `a`](#set-a-to-some-constant-minus-a)
+  - [Add `a` to a 16-bit register](#add-a-to-a-16-bit-register)
+  - [Loading from an offset to `hl`](#loading-from-an-offset-to-hl)
+  - [Exchanging two 16-bit registers](#exchanging-two-16-bit-registers)
+- [Branching](#branching)
+  - [Compare `a` to zero](#compare-a-to-zero)
+  - [Compare `a` to 1](#compare-a-to-1)
+  - [Compare `a` to 255](#compare-a-to-255)
+  - [Chaining comparisons](#chaining-comparisons)
+- [Functions](#functions)
+  - [Tail call optimization](#tail-call-optimization)
+  - [Calling `hl`](#calling-hl)
+  - [Inlining](#inlining)
+
 
 ## Registers
 
-### Set A to zero
+
+### Set `a` to zero
+
 Don't do:
+
 ```asm
 	ld a, 0 ; 2 bytes, 2 cycles, no changes to flags
 ```
+
 But do:
+
 ```asm
 	xor a ; 1 byte, 1 cycle, sets flags C to 0 and Z to 1
 ```
-or
+
+or do:
+
 ```asm
 	sub a ; 1 byte, 1 cycle, sets flags C to 0 and Z to 1
 ```
 
-Be careful that the optimized versions alter flags. As such, `ld a, 0` must be left intact in the code below:
+Don't use the optimized versions if you need to preserve flags. For example, `ld a, 0` must be left intact in the code below:
+
 ```asm
 	ld a, [wIsTrainerBattle]
 	and a ; NZ if in trainer battle
@@ -24,150 +54,213 @@ Be careful that the optimized versions alter flags. As such, `ld a, 0` must be l
 	jr nz, .trainer
 ```
 
-### Set A to some constant subtracted by A
+
+### Set `a` to some constant minus `a`
+
 Don't do:
+
 ```asm
-	ld b, a ; 4 bytes, 4 cycles
+	; 4 bytes, 4 cycles
+	ld b, a
 	ld a, CONST
 	sub b
 ```
+
 But do:
+
 ```asm
-	cpl ; 3 bytes, 3 cycles
-	add CONST+1
+	; 3 bytes, 3 cycles
+	cpl
+	add CONST + 1
 ```
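
The replacement works because `cpl` computes 255 − `a`, so adding `CONST + 1` yields `CONST` − `a` modulo 256. As a sanity check outside the assembler, here is a minimal Python sketch that models the two sequences on 8-bit values and compares them for every `a` and every `CONST`. The helper names are made up for illustration, and only the final value of `a` is modeled; the flags left behind by the two sequences are not compared and may differ.

```python
def cpl(a):
    """Complement all 8 bits of a, like the `cpl` instruction."""
    return a ^ 0xFF


def add8(a, n):
    """8-bit addition with wraparound; flags are not modeled."""
    return (a + n) & 0xFF


def const_minus_a_slow(a, const):
    # ld b, a / ld a, CONST / sub b
    b = a
    return (const - b) & 0xFF


def const_minus_a_fast(a, const):
    # cpl / add CONST + 1
    # (assumption: the immediate is truncated to 8 bits in this model)
    return add8(cpl(a), (const + 1) & 0xFF)


# Compare both sequences for every possible 8-bit a and CONST.
mismatches = [
    (a, const)
    for a in range(256)
    for const in range(256)
    if const_minus_a_slow(a, const) != const_minus_a_fast(a, const)
]
print(len(mismatches))  # prints 0
```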
 
-### Add A to a 16-bit register
-(`hl` taken as an example, but any 16-bit register would work as well)
+
+### Add `a` to a 16-bit register
+
+(The example uses `hl`, but `bc` or `de` would also work.)
 
 Don't do:
+
 ```asm
-	add l ; 6 bytes, 6 cycles
+	; 6 bytes, 6 cycles
+	add l
 	ld l, a
 	ld a, 0
 	adc h
-	ld h,a
+	ld h, a
 ```
-or
+
+and don't do:
+
 ```asm
-	add l ; 6 bytes, 6 cycles
+	; 6 bytes, 6 cycles
+	add l
 	ld l, a
 	ld a, h
 	adc 0
 	ld h, a
 ```
+
 But do:
+
 ```asm
-	add l ; 5 bytes, 5 cycles
+	; 5 bytes, 5 cycles
+	add l
 	ld l, a
-	jr nc, .NoCarry
+	jr nc, .no_carry
 	inc h
 
-.NoCarry:
+.no_carry:
 ```
+
 or better (doesn't require a label):
+
 ```asm
-	add l ; 5 bytes, 5 cycles
-	ld l, a ; = a + l
-	adc a, h ; = a + l + carry + h
+	; 5 bytes, 5 cycles
+	add l ; = a + l
+	ld l, a ; cache a + l
+	adc h ; = a + l + carry + h
 	sub l ; = carry + h
 	ld h, a
 ```
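
The label-free version relies on `adc h` followed by `sub l` leaving exactly `h` plus the carry in `a`. The following Python sketch models that five-instruction sequence on 8-bit registers and checks it against a plain 16-bit addition. It is an illustrative model rather than emulator code: the function names are hypothetical, flags other than carry are ignored, and `h` is only sampled at a few values because the carry depends solely on `a` and `l`.

```python
def add_a_to_hl(a, h, l):
    """Model `add l / ld l, a / adc h / sub l / ld h, a` on 8-bit registers."""
    total = a + l
    a, carry = total & 0xFF, total >> 8   # add l     (a = a + l, carry out)
    l = a                                 # ld l, a   (cache a + l)
    a = (a + h + carry) & 0xFF            # adc h     (a = a + l + carry + h)
    a = (a - l) & 0xFF                    # sub l     (a = carry + h)
    h = a                                 # ld h, a
    return h, l


def check_all():
    """Compare the model with 16-bit addition for all a and l, sampled h."""
    for a in range(256):
        for l in range(256):
            for h in (0x00, 0x01, 0x7F, 0x80, 0xFF):
                expected = ((h << 8) + l + a) & 0xFFFF
                got_h, got_l = add_a_to_hl(a, h, l)
                if (got_h << 8) | got_l != expected:
                    return False
    return True


print(check_all())  # prints True
```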
 
-### Loading from an offset to HL
+
+### Loading from an offset to `hl`
+
 Don't do:
+
 ```asm
-	ld a, [offset] ; 8 bytes, 10 cycles
+	; 8 bytes, 10 cycles
+	ld a, [offset]
 	ld l, a
 	ld a, [offset+1]
 	ld h, a
 ```
+
 But do:
+
 ```asm
-	ld hl, offset ; 6 bytes, 8 cycles
+	; 6 bytes, 8 cycles
+	ld hl, offset
 	ld a, [hli]
 	ld h, [hl]
 	ld l, a
 ```
 
+
 ### Exchanging two 16-bit registers
-(`hl` and `de` taken as examples, but any 16-bit registers are fine)
+
+(The example uses `hl` and `de`, but any pair of `bc`, `de`, or `hl` would also work.)
 
 If you care about speed:
+
 ```asm
-	ld a, d ; 6 bytes, 6 cycles
+	; 6 bytes, 6 cycles
+	ld a, d
 	ld d, h
 	ld h, a
 	ld a, e
 	ld e, l
 	ld l, a
 ```
+
 If you care about size:
+
 ```asm
-	push de ; 4 bytes, 9 cycles
+	; 4 bytes, 9 cycles
+	push de
 	ld d, h
 	ld e, l
 	pop hl
 ```
 
+
 ## Branching
 
-### Compare A to zero
+### Compare `a` to zero
+
 Don't do:
+
 ```asm
 	cp 0 ; 2 bytes, 2 cycles
 ```
+
 But do:
+
 ```asm
 	or a ; 1 byte, 1 cycle
 ```
-or
+
+or do:
+
 ```asm
 	and a ; 1 byte, 1 cycle
 ```
 
-### Compare A to 1
+
+### Compare `a` to 1
+
 ```asm
 	cp 1 ; 2 bytes, 2 cycles
 ```
+
 If you don't care about the value in `a`:
+
+
 ```asm
 	dec a ; 1 byte, 1 cycle, decrements a
 ```
 
-Note that you can still do `inc a` afterwards, which is 1 cycle faster if the jump is taken; compare:
+Note that you can still restore `a` with `inc a` afterwards; the result is the same size as `cp 1` and 1 cycle faster if the jump is taken. Compare:
+
 ```asm
 	cp 1
 	jr z, .equals1
 ```
+
+with:
+
 ```asm
 	dec a
 	jr z, .equals1
 	inc a
 ```
 
-### Compare A to $FF
+
+### Compare `a` to 255
+
+(255, or $FF in hexadecimal, is the same as −1 due to [two's complement](https://en.wikipedia.org/wiki/Two%27s_complement).)
+
 ```asm
-	cp $FF ; 2 bytes, 2 cycles
+	cp $ff ; 2 bytes, 2 cycles
 ```
+
 If you don't care about the value in `a`:
+
 ```asm
 	inc a ; 1 byte, 1 cycle, increments a
 ```
 
-Note that you can still do `inc a` afterwards, which is 1 cycle faster if the jump is taken; compare:
+Note that you can still restore `a` with `dec a` afterwards; the result is the same size as `cp $ff` and 1 cycle faster if the jump is taken. Compare:
+
 ```asm
-	cp $FF
+	cp $ff
 	jr z, .equals255
 ```
+
+with:
+
 ```asm
 	inc a
 	jr z, .equals255
 	dec a
 ```
 
+
 ### Chaining comparisons
+
 Don't do:
+
 ```asm
 	cp 1
 	jr z, .equals1
@@ -177,7 +270,9 @@ Don't do:
 	jr z, .equals3
 	; ...
 ```
+
 But do:
+
 ```asm
 	dec a
 	jr z, .equals1
@@ -192,49 +287,71 @@ But do:
 ## Functions
 
 ### Tail call optimization
+
 Don't do:
+
 ```asm
-	call Function ; 4 bytes, 10 cycles
+	; 4 bytes, 10 cycles
+	call Function
 	ret
 ```
+
 But do:
+
 ```asm
-	jp Function ; 3 bytes, 4 cycles
+	; 3 bytes, 4 cycles
+	jp Function
 ```
 
-### Calling HL
+
+### Calling `hl`
+
+Don't do:
+
 ```asm
-	ld de, .return ; 5 bytes, 8 cycles
+	; 5 bytes, 8 cycles
+	ld de, .return
 	push de
 	jp hl
+
 .return
 	...
 ```
+
 But do:
+
 ```asm
-	call DoJump ; 4 bytes, 7 cycles
+	; 4 bytes, 7 cycles
+	call _hl_
+; return
 	...
+```
+
+`_hl_` is a routine already defined in [home.asm](../blob/master/home.asm):
 
-DoJump: ; TODO: such a function already exists in the code; but where is it?
+```asm
+_hl_::
 	jp hl
 ```
 
+
 ### Inlining
+
 Don't do:
+
 ```asm
-	call GetOffset ; 4 additional bytes, 10 additional cycles
+	; 4 additional bytes, 10 additional cycles
+	call GetOffset
 	...
 
 GetOffset:
-	add hl, bc
-	ld a, [hli]
-	ld h, [hl]
-	ld l, a
+	(some code)
+	ret
 ```
+
 if `GetOffset` is only called a handful of times. Instead, do:
+
 ```asm
-	add hl, bc
-	ld a, [hli]
-	ld h, [hl]
-	ld l, a
-```
-\ No newline at end of file
+; GetOffset
+	(some code)
+```
+
+You can set `(some code)` apart with blank lines and put a comment on top to make its self-contained nature clear without the extra `call` and `ret`.