I wrote some 6502 ASM tutorials a while back.
Feel free to check them out (5 pages total). You can test various things with this online 6502 emulator…
https://skilldrick.github.io/easy6502/
All the information here will transfer perfectly toward 65816 programming. Stay on this until you understand it before moving on.
Or, if you prefer video tutorials…
6502 Opcode References
http://www.6502.org/tutorials/6502opcodes.html
http://www.obelisk.me.uk/6502/reference.html
Quick Explanation of 65816 ASM
I will just cover some basics, and then mention the differences between 6502 and 65816.
Data transfer.
You need to load data to a register to move it. Any of the registers can do this.
LDA $1000 ; load A from address $1000
STA $800 ; store/copy A to address $800
LDX $1000 ; load X from address $1000
STX $800 ; store/copy X to address $800
LDY $1000 ; load Y from address $1000
STY $800 ; store/copy Y to address $800
Depending on the register size, this would move 1 byte or 2. If it moved 2 bytes, it would get the lower byte from $1000 and the upper byte from $1001.
Note, you can write comments in ASM with a ; semicolon. Everything after the semicolon is ignored by the assembler.
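Here is a minimal sketch of the 1 byte vs 2 byte difference, in ca65 syntax. It assumes the CPU is already in native mode. The .a8 / .a16 lines are ca65 directives (not CPU instructions) that tell the assembler what size A currently is.

```asm
.p816                   ; ca65: enable 65816 instructions

        sep #$20        ; A is now 8 bit
.a8
        lda $1000       ; reads 1 byte from $1000
        sta $0800       ; writes 1 byte to $0800

        rep #$20        ; A is now 16 bit
.a16
        lda $1000       ; reads $1000 (low byte) and $1001 (high byte)
        sta $0800       ; writes $0800 (low byte) and $0801 (high byte)
```

The same two instructions do different amounts of work depending on the size of A, so it pays to always know what mode you are in.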
Addressing modes.
Depending on how the LDA is written in assembly, you can perform multiple kinds of operations.
Direct Page
(similar to the zero page from 6502)
LDA $12 – load A from the direct page address $12. If direct page register is $0000 this will load A from $000012 (direct page is always in the $00 bank).
Absolute
LDA $1234 – loads A from the address $1234, in the bank defined by the Data Bank Register. If the Data Bank is $00… will load A from $001234.
Absolute Long
LDA $123456 – loads A from address $3456 in the $12 bank.
Immediate
LDA #$12 – loads A with the value $12. Always needs a preceding #. Might be an 8 bit or a 16 bit value depending on the mode of A.
Direct Page Indexed
Indexed modes are for arrays of bytes, using index registers to select an element of that byte array. Direct page is always in bank zero.
LDA $12, X – same as direct page, but the X register is added to the address number. If X is $10, this would load A from the address $22.
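LDX $12, Y – same, but the Y register is added to the address number. (Direct page, Y only exists for LDX and STX. If you write LDA $12, Y the assembler will use absolute, Y instead.)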
(X and Y are NOT restricted to 8 bit, and can extend $ffff bytes forward, except that the final address bank will be $00. Direct page mode always uses bank $00 as the final location.)
Absolute Indexed
LDA $1234, X – same as absolute, but the X register is added to the address number.
LDA $1234, Y – same as absolute, but the Y register is added to the address number.
(X and Y don’t wrap, and if address + X > $ffff it will temporarily increase the data bank byte to extend into the next bank. This is true of every indexed mode except for the direct page indexed.)
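As a rough example of indexing into a byte array, this sketch copies 256 bytes using absolute indexed mode. The addresses $1000 and $0800 are just placeholders, and it assumes native mode with both arrays in the current data bank.

```asm
.p816
        sep #$20        ; A 8 bit, we copy 1 byte at a time
.a8
        rep #$10        ; X and Y 16 bit
.i16
        ldx #$0000
copy_loop:
        lda $1000, x    ; read byte X of the source array
        sta $0800, x    ; write it to the same position in the destination
        inx
        cpx #$0100      ; copied 256 bytes yet?
        bne copy_loop
```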
Absolute Indexed Long
LDA $123456, X – same as absolute long, but the X register is added to the address number. (only X can do this mode)
Indirect
This is how pointers work on the 6502 (65816) CPU. The pointer is loaded to 2 consecutive direct page addresses.
LDA ($12) – $12 is an address in the Direct Page. It takes a byte from $12 (lower byte) and $13 (upper byte) to construct an address, then the bank byte from data bank register, and then loads from that address. If $12 = $00 and $13 = $80, then this would load A with the value at address $018000 (if the data bank is $01).
Indirect Long
Like Indirect, but 3 consecutive bytes are stored in the Direct Page to construct a long address. Low byte, High byte, then Bank byte.
LDA [$12] – If $12 = $00 and $13 = $80 and $14 = $02, loads A with the value at address $028000.
Indirect, Y
LDA ($12), Y – same as Indirect, but the indirect address is added to the Y register to get a final address to load to A from.
Indirect Long, Y
LDA [$12], Y – same as Indirect Long, but the indirect long address is added to the Y register to get a final address to load to A from.
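To make the pointer setup concrete, here is a small sketch that builds the same 3 byte pointer ($028000) in the direct page and reads through it. It assumes native mode and a direct page register of $0000. The offset of 5 in Y is arbitrary.

```asm
.p816
        rep #$30        ; A, X, Y 16 bit
.a16
.i16
        lda #$8000
        sta $12         ; pointer low byte to $12, high byte to $13
        sep #$20        ; A 8 bit
.a8
        lda #$02
        sta $14         ; pointer bank byte to $14
        ldy #$0005
        lda [$12], y    ; loads from $028000 + 5 = $028005
```

Because the bank byte comes from the pointer, this works no matter what the data bank register holds.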
Indirect, X
This is for an array of pointers. Each pointer (2 bytes each) is in the Direct Page, and you will need to increase X by 2 to switch between them.
LDA ($12, X) – Let’s say X is 2, so we don’t want to look at RAM addresses $12 and $13, but rather $14 and $15. RAM address $14 holds the value 00, RAM address $15 holds $80, and the data bank is $01. This will load A with the value at address $018000.
https://wiki.superfamicom.org/65816-reference
Changes in the 65816 (from 6502)
** If you don’t understand all these things, don’t worry. You can always come back to it later, as these things come up. I frequently have to check the WDC manual to be reminded of all the details of each instruction, and I’ve been doing this 10 years. **
Zero page has been replaced with direct page, which is movable by changing the DP register. Just keep it $0000 for most purposes.
The hardware stack is no longer fixed. It can be any address in the zero bank. (On the SNES it should be set to $1fff at the start of the program.)
The A, X, and Y registers can be 8 or 16 bits. See SEP / REP below.
Many operations can now be 8 or 16 bits depending on the size of the A register. ADC, AND, ASL, BIT, CMP, DEC, EOR, INC, LDA, LSR, ORA, PHA, PLA, ROL, ROR, SBC, STA, STZ, TRB, and TSB… are all dependent on the size of the A register.
BRK has its own vector. Could be used for software purposes or debugging.
NEW INSTRUCTIONS
Long addressing
(can’t be Y)
ADC long
ADC long, X
AND long
AND long, X
CMP long
CMP long, X
EOR long
EOR long, X
JMP long aka JML
JSR long aka JSL (also RTL return long)
LDA long
LDA long, X
ORA long
ORA long, X
SBC long
SBC long, X
STA long
STA long, X
Store Zero
Stores zero at an address without changing A. (1 or 2 bytes depending on size of A)
STZ dp
STZ dp, X
STZ absolute
STZ absolute, X
(can’t do long)
Branching
BRA branch always
BRL branch always long (2 bytes, signed)
(don’t use BRL, just do JMP. BRL is for a system that might load a program anywhere in the RAM, relocatable code. Not really for the SNES.)
JMP (indirect) will look for a 2 byte address on bank zero, and jump to that address, but always jumping to the current program bank. If it says JMP ($1234) it will look at $001234 and $001235. If $001234 is $50 and $001235 is $60, it will jump to address $6050 in the current program bank.
JMP [indirect long] will look for a 3 byte address on bank zero, and combine them to create a long jump address to anywhere. If the 2 byte value in brackets is [$1234] it will look at $001234, $001235, and $001236 for the 3 bytes, combine them to a long address, and jump to that.
JMP (indirect, X) is for an array of function pointers (a jump table), using X to switch between the different indirect jump addresses. Unlike JMP (indirect), which looks for the indirect address on the zero bank, the JMP (indirect, X) mode will look for the indirect address in the CURRENT PROGRAM BANK. (and it will jump to an address in the current program bank). X should be an even number. You should have a table of addresses (2 bytes each) at this location, and use X to choose which one. This indirect jump is the most useful. Remember it.
JSR (indirect, X) . same as above, except you can return from the function with RTS.
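A jump table could look something like this sketch. All the label names (dispatch, jump_table, state_title, and so on) are made up for the example. It assumes A and X are 16 bit, A holds a state number from 0 to 2, and the table sits in the same bank as the code.

```asm
.p816
.a16
.i16
dispatch:
        asl a                   ; each table entry is 2 bytes, so double the index
        tax
        jsr (jump_table, x)     ; call the handler, it returns here with rts
        rts

jump_table:
        .addr state_title       ; X = 0
        .addr state_game        ; X = 2
        .addr state_pause       ; X = 4

state_title:
        rts
state_game:
        rts
state_pause:
        rts
```

Doubling A before the TAX is what keeps X an even number.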
INC / DEC
now available for the A register.
dec A . is the same as A = A – 1
inc A . is the same as A = A + 1
Indirect with or without Y Index
(dp means that the pointer needs to be located in the direct page. The (dp), Y form already existed on the 6502. The form without Y is the new one.)
ADC (dp) . . ADC (dp), Y
AND (dp) . . AND (dp), Y
CMP (dp) . . CMP (dp), Y
EOR (dp) . . EOR (dp), Y
LDA (dp) . . LDA (dp), Y
ORA (dp) . . ORA (dp), Y
SBC (dp) . . SBC (dp), Y
STA (dp) . . STA (dp), Y
Indirect Long and Indirect Long Indexed
With or without Y indexing
ADC [dp] . . ADC [dp], Y
AND [dp] . . AND [dp], Y
CMP [dp] . . CMP [dp], Y
EOR [dp] . . EOR [dp], Y
LDA [dp] . . LDA [dp], Y
ORA [dp] . . ORA [dp], Y
STA [dp] . . STA [dp], Y
SBC [dp] . . SBC [dp], Y
SEP/REP
To set register size, we use REP or SEP (reset processor flag, set processor flag).
REP #$20 set A 16 bit
SEP #$20 set A 8 bit
REP #$10 set XY 16 bit
SEP #$10 set XY 8 bit
or combine them…
REP #$30 set AXY 16 bit
SEP #$30 set AXY 8 bit
(REP and SEP can be used to change other processor status flags).
(note the # for immediate addressing)
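One thing that trips people up: REP and SEP change the CPU, but the assembler also needs to know the register size so it can assemble immediate values with the right number of bytes. In ca65 that is done with the .a8 / .a16 / .i8 / .i16 directives (or by turning on .smart mode, which follows REP / SEP for you). A small sketch, assuming native mode:

```asm
.p816
        rep #$20        ; CPU: A is 16 bit
.a16                    ; assembler: immediates for A are 2 bytes
        lda #$1234      ; assembles as A9 34 12

        sep #$20        ; CPU: A is 8 bit
.a8                     ; assembler: immediates for A are 1 byte
        lda #$12        ; assembles as A9 12

        rep #$10        ; CPU: X and Y are 16 bit
.i16                    ; assembler: immediates for X and Y are 2 bytes
        ldx #$0100      ; assembles as A2 00 01
```

If the CPU and the assembler disagree about the size, the CPU will read the wrong number of operand bytes and the code after it will be garbage.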
Transfers between registers now include
TXY – transfer x to y
TYX – transfer y to x
TCS – transfer A register to stack pointer
TSC – transfer stack pointer to A register
Size mismatch from transfers between A and index registers X or Y. Think about the destination size, that will tell you how many bytes will transfer.
A8 -> X16 or Y16 transfers 2 bytes. Remember that when A is 8 bit, its high byte still exists.
A16 -> X8 or Y8 transfers 1 byte
X8 or Y8 -> A16 transfers 2 bytes, and the upper byte of A is zeroed. XY in 8 bit always have zero as their upper byte.
X16 or Y16 -> A8 transfers 1 byte, the upper byte of A unchanged
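Two of those cases, worked through as a sketch (native mode assumed, values chosen arbitrarily):

```asm
.p816
        rep #$30        ; A, X, Y 16 bit
.a16
.i16
        lda #$1234
        sep #$20        ; A 8 bit: low byte is $34, hidden high byte is still $12
.a8
        tax             ; destination X is 16 bit, so X = $1234

        ldx #$abcd
        txa             ; destination A is 8 bit, so only $cd is copied
                        ; the hidden high byte of A is still $12
```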
Stack Relative
Uses the stack pointer as a base, added to a constant as the index.
You would push variables to the stack before calling a jsr or jsl.
The stack pointer always points to 1 less than the last value pushed, so start from 1. If you JSR to a function, add 2 more. If you JSL to a function, add 3 more.
ADC sr, S
AND sr, S
CMP sr, S
EOR sr, S
LDA sr, S
ORA sr, S
SBC sr, S
STA sr, S
Example… STA 1, S
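Here is a sketch of passing one 16 bit argument on the stack. The labels caller and double_it are made up, and it assumes A, X, and Y are all 16 bit. Since the function is called with JSR, the argument is at 1 + 2 = 3.

```asm
.p816
.a16
.i16
caller:
        pea $0005       ; push a 16 bit argument
        jsr double_it   ; returns with A = $000a
        plx             ; remove the argument from the stack (X gets overwritten)
        rts

double_it:
        lda 3, s        ; 1, S and 2, S are the return address, the argument starts at 3, S
        asl a
        rts
```

If double_it were called with JSL (and returned with RTL), the argument would be at 4, S instead.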
Stack Relative Indirect
Push a pointer to an array to the stack. Index that array with Y.
ADC (sr, S), Y
AND (sr, S), Y
CMP (sr, S), Y
EOR (sr, S), Y
LDA (sr, S), Y
ORA (sr, S), Y
SBC (sr, S), Y
STA (sr, S), Y
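A sketch of the same idea with a pointer instead of a value. The labels are made up, A / X / Y are assumed to be 16 bit, and my_array is assumed to be in the bank the data bank register points to (the pointer on the stack is only 16 bits, so the bank comes from the data bank register).

```asm
.p816
.a16
.i16
caller:
        pea my_array        ; push the 16 bit address of the array
        jsr second_word
        plx                 ; remove the pointer from the stack
        rts

second_word:
        ldy #$0002          ; byte offset of the second 16 bit entry
        lda (3, s), y       ; pointer is at 3, S (after the JSR return address)
        rts                 ; A = $2222

my_array:
        .word $1111, $2222, $3333
```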
Block Moves
To copy a chunk of bytes from one memory area to another. MVN Block Move Next and MVP Block Move Previous.
MVN increments X and Y as it copies, and MVP decrements them. This only matters when the source and destination blocks overlap: use MVN when the destination is at a lower address than the source, and MVP when the destination is at a higher address. For MVN, X holds the start address of src and Y holds the start address of dest, and A (always 16 bit, regardless of size of A) holds the # of bytes to transfer minus 1. For MVP, X holds the end address of the src block and Y holds the end address of the dest block, and A holds the same count minus 1.
Just use MVN, it’s easier to use.
The byte order in the binary is opposite of what the standard syntax indicates, so I tend to use a macro to handle this, because it’s confusing. And there was a change in ca65 source code which reverses the order, so code will break if you use the wrong version of ca65 (grumble).
MVN src bank, dest bank
MVP src bank, dest bank
The registers should be 16 bit before using MVN or MVP. Also, they have an annoying issue, where they will overwrite the data bank register, so it is probably a good idea to push that register to the stack before MVN/MVP and restore it (pull it from the stack) after the MVN/MVP procedure.
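Putting that together, a block move might look like this sketch. The addresses and banks are placeholders. The operand form on the MVN line is an assumption: it is the "source bank, then destination bank, each with a #" form that I believe recent versions of ca65 accept, so check it against your version before trusting it.

```asm
.p816
        rep #$30            ; A, X, Y 16 bit
.a16
.i16
        phb                 ; MVN changes the data bank register, so save it
        ldx #$2000          ; source start address
        ldy #$3000          ; destination start address
        lda #$00ff          ; number of bytes minus 1 ($100 bytes)
        mvn #$7e, #$7f      ; source bank $7e, destination bank $7f (assumed ca65 syntax)
        plb                 ; restore the data bank register
```

A quick way to confirm the operand order is to look at the assembled bytes: in the binary, the destination bank comes first and the source bank second.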
Push to stack
PEA address
PEI (dp)
PER relative-address
PEA is called push effective “address”, but it really just pushes a 16 bit value to the stack without using a register. It doesn’t have to be an address. It is very useful for any 16 bit immediate push to the stack. You don’t need to change a register size either, it always pushes a 16 bit value.
PEI pushes a (16 bit) value stored on the direct page (in bank zero) to the stack.
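PER pushes a 16 bit address in the same bank, calculated as a 16 bit relative distance from this instruction (the same way BRL calculates its target). It pushes the address itself, not the value stored there. After pushing a value or address to the stack, you could read it with stack relative addressing or pull it into a register.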
NOTE: the standard syntax here is confusing for PEA and PEI. PEA actually works like a 16-bit immediate mode, but (for unknown reasons) omits the # hash. PEI actually works like Direct Page Addressing, but (for unknown reasons) has unnecessary parentheses () making it look like an Indirect Mode. I have reread the documents 4-5 times and it works like PEI $12… but the official syntax is PEI ($12). ca65 expects the official syntax.
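All three side by side, as a sketch in ca65's (official) syntax. The label some_table is made up, X is assumed to be 16 bit, and the direct page register is assumed to be $0000.

```asm
.p816
.i16
        pea $1234           ; pushes the value $1234 (acts like an immediate, but no #)
        pei ($12)           ; pushes the 16 bit value stored at $12 and $13
        per some_table      ; pushes the 16 bit address of some_table

        plx                 ; X = address of some_table
        plx                 ; X = the value that was at $12 and $13
        plx                 ; X = $1234
        rts

some_table:
        .byte $01, $02, $03
```

The pulls come back in the reverse order of the pushes.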
Pushing / pulling the new registers
PHB – push data bank register to stack
PHD – push direct page register to stack
PHK – push program bank register to stack
PHX – push X register to stack
PHY – push Y register to stack
PLB – pull from stack to data bank register
PLD – pull from stack to direct page register
PLX – pull from stack to X register
PLY – pull from stack to Y register
Transfers with A
(always copies 16 bits regardless of size of A)
TCD – transfer from A to direct page register
TCS – transfer from A to stack pointer
TDC – transfer from direct page register to A
TSC – transfer from stack pointer to A
Test and Set Bits / Test and Reset Bits
TRB dp
TRB address
TSB dp
TSB address
TRB, test and reset bits. A register (8 or 16 bits) has the bits to change. If a bit in A is 1 it will be zeroed at the address location. If a bit in A is 0 it remains unchanged.
TSB, test and set bits. A register (8 or 16 bits) has the bits to change. If a bit in A is 1 it will be set (1) at the address location. If a bit in A is 0 it remains unchanged.
There is also a testing operation, as if the value in A was ANDed with the address, and the z flag is set if A AND value at address would equal zero. Unrelated to the setting or resetting operation.
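Here is a worked sketch with actual bit patterns, using direct page address $12 as a scratch variable (an arbitrary choice) and A in 8 bit mode.

```asm
.p816
        sep #$20            ; A 8 bit
.a8
        lda #%11110000
        sta $12             ; memory = %11110000

        lda #%00110011
        tsb $12             ; memory = %11110011, Z clear (A AND old memory = %00110000)

        lda #%00000011
        trb $12             ; memory = %11110000, Z clear (A AND old memory = %00000011)

        lda #%00001111
        trb $12             ; memory unchanged,   Z set   (A AND old memory = 0)
```

Note that the Z flag always comes from the memory value before it was changed.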
More
COP – jump to COP vector (for a coprocessor routine)
XBA – swap high and low bytes of A (works even if A is 8 bit)
XCE – exchange the carry flag with the emulation flag, which selects the CPU mode (emulation or native). CLC then XCE switches to native mode.
STP – stops the CPU, only reset will start it again. Don’t use this.
WAI – wait till interrupt, halts the CPU until IRQ or NMI trigger.
WDM # – nothing, but useful for debugging. Followed by a number, which could be used to locate where you are in the code (in a debugger).
(in older version of ca65, WDM won’t work. I think it was fixed around 2017.)
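The most common use of XCE is at the very start of the program, to get out of emulation mode. A minimal sketch of that, also setting the stack and direct page to the values suggested earlier ($1fff and $0000). The label reset is made up, and a real SNES init routine does a lot more than this.

```asm
.p816
reset:
        sei                 ; no IRQs during setup
        clc
        xce                 ; carry (0) goes into the emulation flag = native mode
        rep #$30            ; A, X, Y 16 bit
.a16
.i16
        ldx #$1fff
        txs                 ; stack pointer = $1fff
        lda #$0000
        tcd                 ; direct page register = $0000
```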
Some more links, to other descriptions of 65816 ASM
https://www.smwcentral.net/?p=section&a=details&id=14268
http://6502.org/tutorials/65c816opcodes.html
And these links again, for reference.
https://wiki.superfamicom.org/65816-reference