I wrote some 6502 ASM tutorials a while back.
Feel free to check them out (5 pages total). You can test various things with this online 6502 emulator…
All the information here transfers directly to 65816 programming. Stay on it until you understand it before moving on.
Or, if you prefer video tutorials…
Opcodes References 6502
Quick Explanation of 65816 ASM
I will just cover some basics, and then mention the differences between 6502 and 65816.
To move data, you load it into a register and then store it somewhere else. A, X, and Y can all do this.
LDA $1000 ; load A from address $1000
STA $800 ; store/copy A to address $800
LDX $1000 ; load X from address $1000
STX $800 ; store/copy X to address $800
LDY $1000 ; load Y from address $1000
STY $800 ; store/copy Y to address $800
Depending on the register size, each load/store pair moves 1 byte or 2. When it moves 2 bytes, the load gets the low byte from $1000 and the high byte from $1001, and the store writes them to $800 and $801.
Note: you can write comments in ASM with a ; semicolon. Everything after the semicolon is ignored by the assembler.
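To make the byte order concrete, here is a small Python model (not 65816 code). It is a sketch that assumes memory is a plain dict of address to byte; `load` and `store` are hypothetical helper names, and `wide=True` stands for a 16-bit register.

```python
# Model of a 1- or 2-byte load/store. The 65816 is little-endian:
# the low byte lives at the lower address.

def load(mem, addr, wide):
    """Like LDA/LDX/LDY addr. wide=True means a 16-bit register."""
    lo = mem.get(addr, 0)
    if not wide:
        return lo
    hi = mem.get(addr + 1, 0)
    return (hi << 8) | lo

def store(mem, addr, value, wide):
    """Like STA/STX/STY addr."""
    mem[addr] = value & 0xFF
    if wide:
        mem[addr + 1] = (value >> 8) & 0xFF

mem = {0x1000: 0x34, 0x1001: 0x12}
a = load(mem, 0x1000, wide=True)    # 16-bit register gets $1234
store(mem, 0x800, a, wide=True)     # $800 = $34, $801 = $12
print(hex(a), hex(mem[0x800]), hex(mem[0x801]))  # 0x1234 0x34 0x12
```

With `wide=False` the same call returns only $34, the byte at $1000.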
How the LDA is written determines its addressing mode, that is, where the value comes from.
Direct Page
(similar to the zero page from 6502)
LDA $12 – load A from the direct page address $12. If direct page register is $0000 this will load A from $000012 (direct page is always in the $00 bank).
LDA $1234 – loads A from the address $1234, in the bank defined by the Data Bank Register. If the data bank is $00, this loads A from $001234.
LDA $123456 – loads A from address $3456 in the $12 bank.
LDA #$12 – loads A with the value $12. Always needs a preceding #. The value is 8 bit or 16 bit depending on the size of A.
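Here is a rough Python sketch of how the direct page, absolute, and long modes turn the operand into a 24-bit address. It assumes native mode, and the function names are made up for illustration. Immediate mode has no address at all: the operand is the value.

```python
# Effective-address sketch for three addressing modes (native mode assumed).

def ea_direct(d_reg, operand):
    """LDA $12: direct page. D + operand, always in bank $00."""
    return (d_reg + operand) & 0xFFFF

def ea_absolute(dbr, operand):
    """LDA $1234: absolute. The bank byte comes from the Data Bank Register."""
    return (dbr << 16) | (operand & 0xFFFF)

def ea_long(operand):
    """LDA $123456: long. The operand already is the full 24-bit address."""
    return operand & 0xFFFFFF

# Immediate (LDA #$12) has no effective address: the operand itself is the value.
assert ea_direct(0x0000, 0x12) == 0x000012
assert ea_absolute(0x00, 0x1234) == 0x001234
assert ea_long(0x123456) == 0x123456
```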
Direct Page Indexed
Indexed modes are for arrays of bytes, using index registers. Direct page is always in bank zero.
LDA $12, X – same as direct page, but the X register is added to the address number. If X is $10, this would load A from the address $22.
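LDA $12, Y – careful: LDA has no direct page,Y mode (only LDX and STX do). An assembler will typically encode this as the absolute indexed LDA $0012, Y instead, which reads from the data bank, not the direct page.
(X and Y are NOT restricted to 8 bit. The index can reach up to $ffff bytes past the base address, but the result always stays in bank $00: it wraps from $00ffff back to $000000 instead of continuing into bank $01.)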
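The bank $00 wraparound for direct page indexed can be sketched in Python like this, assuming native mode; `ea_direct_indexed` is a hypothetical helper name.

```python
# Direct page indexed (LDA $12, X), native mode: the sum is truncated to
# 16 bits, so the result never leaves bank $00.

def ea_direct_indexed(d_reg, operand, index):
    return (d_reg + operand + index) & 0xFFFF   # bank byte is always $00

assert ea_direct_indexed(0x0000, 0x12, 0x10) == 0x000022   # the $22 example
# A large 16-bit X wraps around inside bank $00 instead of reaching bank $01:
assert ea_direct_indexed(0x0000, 0xFF, 0xFFF0) == 0x0000EF
```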
Absolute Indexed
LDA $1234, X – same as absolute, but the X register is added to the address number.
LDA $1234, Y – same as absolute, but the Y register is added to the address number.
(X and Y don’t wrap here. If address + X is greater than $ffff, the carry goes into the bank byte of the final address, so the read continues into the next bank. The Data Bank Register itself is not changed. This is true of every indexed mode except for direct page indexed.)
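For comparison with the direct page case, here is a Python sketch of absolute indexed addressing as a full 24-bit add, assuming native mode. The helper name is hypothetical.

```python
# Absolute indexed (LDA $1234, X): a full 24-bit add, so a carry out of the
# low 16 bits moves into the bank byte. The DBR itself is never modified.

def ea_absolute_indexed(dbr, operand, index):
    return ((dbr << 16) + operand + index) & 0xFFFFFF

assert ea_absolute_indexed(0x00, 0x1234, 0x10) == 0x001244
# $01:FFF0 + $20 crosses into bank $02 (direct page indexed would wrap instead)
assert ea_absolute_indexed(0x01, 0xFFF0, 0x20) == 0x020010
```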
Absolute Indexed Long
LDA $123456, X – same as absolute long, but the X register is added to the address number. (only X can do this mode)
Indirect
This is how pointers work on the 6502 (65816) CPU. The pointer is stored in 2 consecutive direct page addresses.
LDA ($12) – $12 is an address in the Direct Page. The CPU reads the low byte of an address from $12 and the high byte from $13, takes the bank byte from the data bank register, and then loads A from that address. If $12 = $00, $13 = $80, and the data bank is $01, this loads A with the value at address $018000.
Indirect Long
Like Indirect, but 3 consecutive bytes are stored in the Direct Page to construct a long address: low byte, high byte, then bank byte.
LDA [$12] – If $12 = $00, $13 = $80 and $14 = $02, this loads A from address $028000.
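The two pointer lookups can be sketched in Python as follows. This assumes native mode and a dict of address to byte for memory; all function names are hypothetical.

```python
# Indirect (LDA ($12)) and indirect long (LDA [$12]) pointer lookups.

def read_dp_byte(mem, addr):
    # Pointer bytes live in the direct page, which is always in bank $00.
    return mem.get(addr & 0xFFFF, 0)

def ea_indirect(mem, d_reg, dp, dbr):
    ptr_addr = d_reg + dp
    lo = read_dp_byte(mem, ptr_addr)
    hi = read_dp_byte(mem, ptr_addr + 1)
    return (dbr << 16) | (hi << 8) | lo          # bank byte from the DBR

def ea_indirect_long(mem, d_reg, dp):
    ptr_addr = d_reg + dp
    lo = read_dp_byte(mem, ptr_addr)
    hi = read_dp_byte(mem, ptr_addr + 1)
    bank = read_dp_byte(mem, ptr_addr + 2)       # bank byte from the pointer
    return (bank << 16) | (hi << 8) | lo

mem = {0x12: 0x00, 0x13: 0x80, 0x14: 0x02}
assert ea_indirect(mem, 0x0000, 0x12, dbr=0x01) == 0x018000
assert ea_indirect_long(mem, 0x0000, 0x12) == 0x028000
```

The only difference is where the bank byte comes from: the data bank register for ( ), the third pointer byte for [ ].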
Indirect, Y
LDA ($12), Y – same as Indirect, but the Y register is added to the indirect address to get the final address to load A from.
Indirect Long, Y
LDA [$12], Y – same as Indirect Long, but the Y register is added to the indirect long address to get the final address to load A from.
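A Python sketch of the two Y-indexed pointer modes, assuming native mode; helper names are hypothetical. Y is added after the pointer is built, as a 24-bit add, so it can carry into the next bank.

```python
# Indirect indexed: LDA ($12), Y and LDA [$12], Y.

def dp_byte(mem, addr):
    return mem.get(addr & 0xFFFF, 0)     # pointers are read from bank $00

def ea_indirect_y(mem, d_reg, dp, dbr, y):
    ptr = dp_byte(mem, d_reg + dp) | (dp_byte(mem, d_reg + dp + 1) << 8)
    return ((dbr << 16) + ptr + y) & 0xFFFFFF

def ea_indirect_long_y(mem, d_reg, dp, y):
    ptr = (dp_byte(mem, d_reg + dp)
           | (dp_byte(mem, d_reg + dp + 1) << 8)
           | (dp_byte(mem, d_reg + dp + 2) << 16))
    return (ptr + y) & 0xFFFFFF

mem = {0x12: 0x00, 0x13: 0x80, 0x14: 0x02}
assert ea_indirect_y(mem, 0, 0x12, dbr=0x01, y=5) == 0x018005
assert ea_indirect_long_y(mem, 0, 0x12, y=5) == 0x028005
```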
Indexed Indirect
This is for an array of pointers. Each pointer (2 bytes) is in the Direct Page, and you increase X by 2 to switch between them.
LDA ($12, X) – Let’s say X is 2, so we don’t want to look at $12 and $13, but rather $14 and $15. If $14 = $00, $15 = $80, and the data bank is $01, this loads A with the value at address $018000.
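The contrast with ($12), Y is that X is added before the pointer is read, to pick which pointer to use. A Python sketch, assuming native mode, with a hypothetical helper name:

```python
# Indexed indirect: LDA ($12, X).

def ea_indexed_indirect(mem, d_reg, dp, x, dbr):
    ptr_addr = (d_reg + dp + x) & 0xFFFF          # stays in bank $00
    lo = mem.get(ptr_addr, 0)
    hi = mem.get((ptr_addr + 1) & 0xFFFF, 0)
    return (dbr << 16) | (hi << 8) | lo

# Two pointers: $12/$13 -> $9000, $14/$15 -> $8000
mem = {0x12: 0x00, 0x13: 0x90, 0x14: 0x00, 0x15: 0x80}
assert ea_indexed_indirect(mem, 0, 0x12, x=0, dbr=0x01) == 0x019000
assert ea_indexed_indirect(mem, 0, 0x12, x=2, dbr=0x01) == 0x018000
```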
Changes in the 65816 (from 6502)
** If you don’t understand all these things, don’t worry. You can always come back to it later, as these things come up. I frequently have to check the WDC manual to be reminded of all the details of each instruction, and I’ve been doing this 10 years. **
Zero page has been replaced with direct page, which is movable by changing the DP register. Just keep it $0000 for most purposes.
The hardware stack is no longer fixed to page $01. It can be anywhere in bank $00 (on the SNES, it should be set to $1fff at the start of the program).
The A, X, and Y registers can be 8 or 16 bits. See SEP / REP below.
Many operations can now be 8 or 16 bits depending on the size of the A register. ADC, AND, ASL, BIT, CMP, DEC, EOR, INC, LDA, LSR, ORA, PHA, PLA, ROL, ROR, SBC, STA, STZ, TRB, and TSB… are all dependent on the size of the A register.
BRK has its own vector in native mode. It could be used for software purposes or debugging.
Long Addressing
(the long indexed modes can only use X, not Y)
ADC long, X
AND long, X
CMP long, X
EOR long, X
JMP long aka JML
JSR long aka JSL (also RTL return long)
LDA long, X
ORA long, X
SBC long, X
STA long, X
STZ
Stores zero at an address without changing A (1 or 2 bytes, depending on the size of A).
STZ dp . . STZ dp, X
STZ absolute . . STZ absolute, X
(there is no long version of STZ)
BRA branch always
BRL branch always long (16-bit signed offset, so it can reach anywhere in the current bank)
(Don’t use BRL, just use JMP. BRL is meant for relocatable code, on a system that might load a program anywhere in RAM. It’s not really useful on the SNES.)
JMP (absolute, X) . for an array of function pointers (a jump table), using X to switch between the different indirect jump addresses. Note that the table is read from the current program bank, not the direct page.
JMP [absolute] . jump indirect long. Like the JMP (absolute) indirect jump instruction, but the pointer is 3 bytes, making a long address to jump to. The pointer itself is read from bank $00.
INC / DEC
Now available for the A register.
dec A . is the same as A = A – 1
inc A . is the same as A = A + 1
Indirect with or without Y Index
(dp means that the pointer needs to be located in the direct page)
ADC (dp) . . ADC (dp), Y
AND (dp) . . AND (dp), Y
CMP (dp) . . CMP (dp), Y
EOR (dp) . . EOR (dp), Y
LDA (dp) . . LDA (dp), Y
ORA (dp) . . ORA (dp), Y
SBC (dp) . . SBC (dp), Y
STA (dp) . . STA (dp), Y
Indirect Long and Indirect Long Indexed
With or without Y indexing
ADC [dp] . . ADC [dp], Y
AND [dp] . . AND [dp], Y
CMP [dp] . . CMP [dp], Y
EOR [dp] . . EOR [dp], Y
LDA [dp] . . LDA [dp], Y
ORA [dp] . . ORA [dp], Y
STA [dp] . . STA [dp], Y
SBC [dp] . . SBC [dp], Y
To set register size, we use REP or SEP (reset processor flag, set processor flag).
REP #$20 set A 16 bit
SEP #$20 set A 8 bit
REP #$10 set XY 16 bit
SEP #$10 set XY 8 bit
or combine them…
REP #$30 set AXY 16 bit
SEP #$30 set AXY 8 bit
(REP and SEP can be used to change other processor status flags).
(note the # for immediate addressing)
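Since REP and SEP just clear or set bits in the processor status byte, their effect on register sizes can be sketched in Python. This assumes native mode (in emulation mode the M and X bits stay 1), and the helper names are hypothetical. It does not model the side effect that setting the X bit also clears the high bytes of X and Y.

```python
# REP / SEP as bit operations on the processor status byte P (native mode).
# Bit $20 is M (accumulator size), bit $10 is X (index size); 1 means 8 bit.

FLAG_M = 0x20
FLAG_X = 0x10

def rep(p, mask):
    """REP #mask: clear every status bit that is 1 in mask."""
    return p & ~mask & 0xFF

def sep(p, mask):
    """SEP #mask: set every status bit that is 1 in mask."""
    return (p | mask) & 0xFF

def sizes(p):
    """Return (A size, XY size) in bits for a given P."""
    a = 8 if p & FLAG_M else 16
    xy = 8 if p & FLAG_X else 16
    return a, xy

p = 0x30                       # A, X, Y all 8 bit
p = rep(p, 0x20)               # REP #$20
assert sizes(p) == (16, 8)
p = rep(p, 0x10)               # REP #$10
assert sizes(p) == (16, 16)
assert sizes(sep(p, 0x30)) == (8, 8)   # SEP #$30
```

Because only the bits set in the mask change, REP #$30 leaves other flags such as carry alone.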
Transfers between registers.
TXY – transfer x to y
TYX – transfer y to x
TCS – transfer A register to stack pointer
TSC – transfer stack pointer to A register
Watch out for size mismatches when transferring between A and the index registers X or Y. The destination size tells you how many bytes will transfer.
A8 -> X16 or Y16 transfers 2 bytes. Remember that even when A is 8 bit, its high byte still exists, and it is copied too.
A16 -> X8 or Y8 transfers 1 byte
X8 or Y8 -> A16 transfers 2 bytes, and the upper byte of A is zeroed. XY in 8 bit always have zero as their upper byte.
X16 or Y16 -> A8 transfers 1 byte, and the upper byte of A is unchanged
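The four size cases can be summarized in a small Python model. This is a sketch assuming native mode; `tax` and `txa` are hypothetical names borrowed from the TAX/TXA opcodes (TAY/TYA follow the same rule), and `c` stands for the full 16-bit accumulator.

```python
# Transfers between A and X/Y: the DESTINATION size decides how many bytes move.
# c is the full 16-bit accumulator; its high byte exists even when A is 8 bit.

def tax(c, index_is_16):
    """TAX (same rule for TAY)."""
    return c & 0xFFFF if index_is_16 else c & 0x00FF

def txa(c, x, a_is_16):
    """TXA (same rule for TYA). An 8-bit X always has a zero high byte."""
    if a_is_16:
        return x & 0xFFFF
    return (c & 0xFF00) | (x & 0x00FF)        # upper byte of A is kept

assert tax(0x12FF, index_is_16=True) == 0x12FF       # A8 -> X16: 2 bytes
assert tax(0x1234, index_is_16=False) == 0x0034      # A16 -> X8: 1 byte
assert txa(0xABCD, 0x0042, a_is_16=True) == 0x0042   # X8 -> A16: high byte zero
assert txa(0xABCD, 0x1234, a_is_16=False) == 0xAB34  # X16 -> A8: high byte kept
```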
Stack Relative
Uses the stack pointer as a base, with a constant added to it.
You would push variables to the stack before a JSR or JSL.
The stack pointer always points 1 byte below the last value pushed, so the last byte pushed is at 1, S. If you JSR to a function, add 2 more (for the return address). If you JSL to a function, add 3 more.
ADC sr, S
AND sr, S
CMP sr, S
EOR sr, S
LDA sr, S
ORA sr, S
SBC sr, S
STA sr, S
Example… STA 1, S
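To see where the offsets come from, here is a small Python model of the stack. It is a sketch with hypothetical names (`Stack`, `call`); the return address bytes pushed by the simulated JSR are dummy zeros, since only their count matters here.

```python
# A small model of the hardware stack: arguments start at 1, S and shift
# by 2 after JSR or 3 after JSL.

class Stack:
    def __init__(self, sp=0x1FFF):
        self.sp = sp                # the usual SNES starting value
        self.mem = {}

    def push8(self, value):
        # Write at SP, then decrement: SP ends 1 byte below the last push.
        self.mem[self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFFFF

    def push16(self, value):
        self.push8(value >> 8)      # high byte first...
        self.push8(value)           # ...so the low byte lands at the lower address

    def read_sr(self, offset):
        # 8-bit "LDA offset, S": reads bank $00 at S + offset.
        return self.mem.get((self.sp + offset) & 0xFFFF, 0)

def call(stack, return_bytes):
    # JSR pushes a 2-byte return address, JSL a 3-byte one (dummy zeros here).
    for _ in range(return_bytes):
        stack.push8(0)

s = Stack()
s.push8(0x42)            # the argument
assert s.read_sr(1) == 0x42
call(s, 2)               # JSR
assert s.read_sr(3) == 0x42
```

A 16-bit push puts the low byte at 1, S and the high byte at 2, S, which is the order a 16-bit LDA 1, S reads them back in.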
Stack Relative Indirect
Push a pointer to an array to the stack. Index that array with Y.
ADC (sr, S), Y
AND (sr, S), Y
CMP (sr, S), Y
EOR (sr, S), Y
LDA (sr, S), Y
ORA (sr, S), Y
SBC (sr, S), Y
STA (sr, S), Y
Block Move
These copy a chunk of bytes from one memory area to another: MVN Block Move Next and MVP Block Move Previous.
MVN counts X and Y upward after each byte, and MVP counts them downward. That only matters when the source and destination overlap: then use MVN if the destination is lower than the source, and MVP if the destination is higher. For MVN, X holds the start address of src and Y holds the start address of dest, and A (always 16 bit, regardless of size of A) holds the # of bytes to transfer minus 1. For MVP, X holds the end address of the src block and Y holds the end address of the dest block.
For copies that don’t overlap, just use MVN; it’s easier to set up.
The operand byte order in the binary is the opposite of what the standard syntax shows (the destination bank comes first), so I tend to use a macro to handle this, because it’s confusing. Also, a change in the ca65 source code reversed the order, so code will break if you use the wrong version of ca65 (grumble).
MVN src bank, dest bank
MVP src bank, dest bank
The registers should be 16 bit before using MVN or MVP. Also, they have an annoying side effect: they change the data bank register to the destination bank. So it is a good idea to push the data bank register to the stack (PHB) before MVN/MVP and restore it (PLB) afterward.
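Here is a Python model of MVN and MVP, showing why the direction matters for overlapping blocks. It is a sketch under assumptions: 16-bit X and Y, memory as a dict of 24-bit address to byte, and hypothetical function names that return the registers afterward as (x, y, c, dbr). Interrupt handling in the middle of a move is not modeled.

```python
# Models of MVN and MVP. C is the full 16-bit accumulator = byte count minus 1.

def mvn(mem, src_bank, dest_bank, x, y, c):
    while True:
        mem[(dest_bank << 16) | y] = mem.get((src_bank << 16) | x, 0)
        x = (x + 1) & 0xFFFF          # MVN counts upward
        y = (y + 1) & 0xFFFF
        c = (c - 1) & 0xFFFF
        if c == 0xFFFF:               # stops when C goes past zero
            break
    return x, y, c, dest_bank         # DBR ends up as the destination bank

def mvp(mem, src_bank, dest_bank, x, y, c):
    while True:
        mem[(dest_bank << 16) | y] = mem.get((src_bank << 16) | x, 0)
        x = (x - 1) & 0xFFFF          # MVP counts downward
        y = (y - 1) & 0xFFFF
        c = (c - 1) & 0xFFFF
        if c == 0xFFFF:
            break
    return x, y, c, dest_bank

# Overlapping copy of 4 bytes from $0100 up to $0102 (destination is higher):
up = {0x100 + i: i for i in range(4)}
mvn(up, 0, 0, 0x100, 0x102, 3)
assert [up[0x102 + i] for i in range(4)] == [0, 1, 0, 1]   # MVN corrupts it

up = {0x100 + i: i for i in range(4)}
mvp(up, 0, 0, 0x103, 0x105, 3)         # X and Y point at the block ends
assert [up[0x102 + i] for i in range(4)] == [0, 1, 2, 3]   # MVP is correct
```

When the blocks don’t overlap, both give the same result, which is why MVN is the simpler default.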
Push to stack
PEA is called push effective “address”, but it really just pushes a 16 bit value to the stack without using a register. It doesn’t have to be an address; it is useful for any 16 bit immediate push to the stack. You don’t need to change a register size either, since it always pushes a 16 bit value.
PEI pushes a value stored on the direct page (16 bit) to the stack.
PER was designed for a computer system that can load a program anywhere in RAM and run it (relocatable code). It isn’t really useful for the SNES. It pushes a 16 bit address, calculated as the address of the next instruction plus a signed 16 bit offset, so the result points into the same bank as the code. After pushing, you could use stack relative addressing, or pull the address into a register.
NOTE: the standard syntax here is confusing for PEA and PEI. PEA actually works like a 16-bit immediate mode, but (for unknown reasons) omits the # hash. PEI actually works like Direct Page Addressing, but (for unknown reasons) has unnecessary parentheses () making it look like an Indirect Mode. I have reread the documents 4-5 times and it works like PEI $12… but the official syntax is PEI ($12). ca65 expects the official syntax.
Pushing / pulling the new registers
PHB – push data bank register to stack
PHD – push direct page register to stack
PHK – push program bank register to stack
PHX – push X register to stack
PHY – push Y register to stack
PLB – pull from stack to data bank register
PLD – pull from stack to direct page register
PLX – pull from stack to X register
PLY – pull from stack to Y register
Transfers with A
(always copies 16 bits regardless of size of A)
TCD – transfer from A to direct page register
TCS – transfer from A to stack pointer
TDC – transfer from direct page register to A
TSC – transfer from stack pointer to A
Test and Set Bits / Test and Reset Bits
TRB, test and reset bits. A register (8 or 16 bits) has the bits to change. If a bit in A is 1 it will be zeroed at the address location. If a bit in A is 0 it remains unchanged.
TSB, test and set bits. A register (8 or 16 bits) has the bits to change. If a bit in A is 1 it will be set (1) at the address location. If a bit in A is 0 it remains unchanged.
There is also a testing operation, unrelated to the setting or resetting: the value in A is ANDed with the value at the address (before it is changed), and the Z flag is set if the result is zero.
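A Python sketch of TSB and TRB on a single memory value, including the Z test. The helper names are hypothetical; `wide` stands for a 16-bit A. The key detail is that Z is computed from the original memory value, before any bits change.

```python
# TSB / TRB on a memory value, returning (new value, Z flag).

def tsb(mem_value, a, wide):
    mask = 0xFFFF if wide else 0xFF
    z = (a & mem_value & mask) == 0
    return (mem_value | a) & mask, z

def trb(mem_value, a, wide):
    mask = 0xFFFF if wide else 0xFF
    z = (a & mem_value & mask) == 0
    return (mem_value & ~a) & mask, z

assert tsb(0b0001, 0b0110, wide=False) == (0b0111, True)    # no bits in common
assert trb(0b0111, 0b0110, wide=False) == (0b0001, False)   # bits were set
```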
COP – jump to COP vector (for a coprocessor routine)
XBA – swap high and low bytes of A (works even if A is 8 bit)
XCE – exchange the carry flag with the emulation flag (switches between emulation and native modes)
STP – stops the CPU; only a reset will start it again. Don’t use this.
WAI – wait for interrupt; halts the CPU until an IRQ or NMI triggers.
WDM # – does nothing, but is useful for debugging. It is followed by a number, which could be used to locate where you are in the code (in a debugger).
(In older versions of ca65, WDM won’t work. I think it was fixed around 2017.)
Some more links to other descriptions of 65816 ASM
And these links again, for reference.