March 2016 – nesdoug

30. ASM part 5

Probably the final 6502 ASM lesson. I’m going to try to cover everything I forgot.

Switch (foo){
case 0:
…
break;
case 1:
…
break;
case 2:
…
}

Let’s say we have a variable ‘state’ that if state = 0, we do one thing. If state = 1, we do another thing. Etc. Generally, it would be handled like this…

  LDA foo      load A from address foo, will set a zero flag if foo = 0
  BNE Check_1  branch ahead if foo != 0
  ...
  JMP Done     break

Check_1:       A is still loaded with value from foo
  CMP #1       compare to value 1, sets zero flag if foo = 1
  BNE Check_2  branch ahead if foo != 1
  ...
  JMP Done     break

Check_2:       A is still loaded with value from foo
  CMP #2       compare to value 2, sets zero flag if foo = 2
  BNE Done     branch ahead if foo != 2
  ...

Done:

Comments are done with a ; in ca65

;this is a comment

You add additional ASM source code like this…

.include “Second_ASM_File.asm”

You add binary files like this…

.incbin “Something.bin”

Often, you put a binary file just below a label, so you can index it from the label.

Label_Name:
.incbin “Something.bin”

You might wonder why I never add the lines .P02 (to set to 6502 mode) or -t nes (to set the target as ‘NES’), and the answer is…it makes no difference. The default mode of ca65 assembles exactly how I want, so I don’t bother.

I skipped over the V flag (overflow). The V flag is only set by ADC and SBC (and BIT). This is a way to treat the numbers as signed -128 to +127.

For ADC, the V flag is only set if 2 positive numbers add together to get a negative, or if 2 negative numbers add together to get a positive. Any other ADC operation will clear the V flag.

For SBC, the V flag is only set if Pos-Neg = Neg…or if Neg-Pos = Pos. All other SBC operations will clear the V flag.

What you do with the result is up to you, but you have BVC (branch if V clear) and BVS (branch if V set) to help you decide.

MULTIPLICATION/DIVISION

I went and wrote several routines that would do these as efficiently as I could think…and then I found these webpages, which do the same thing about 10x faster.

http://6502org.wikidot.com/software-math-intmul

http://6502org.wikidot.com/software-math-intdiv

I’ve tried them out. They work great.

OH, and before I forget, I found another webpage with an online assembler, that you can test out code.

https://skilldrick.github.io/easy6502/

I can’t think of anything else at this time, but you can look at these resources for more infomation…

http://www.6502.org/

http://wiki.nesdev.com/w/index.php/Programming_guide

Click to access Programmanual.pdf

The last one has lots of info about 65C02 and 65816 processors too, but if you scroll down to Chapter 18 (p.326) it will describe how all the instructions work. Most of them are relevant to 6502 programming also.

29. ASM part 4

Yet another 6502 ASM lesson.

Arrays

The way to access arrays in 6502 ASM is to use indexed addresses. The X (or Y) register is used as the indexer. As usual, X=0 will get the first byte of the array.

LDX #0         load X with value 0
LDA Array1, X  will load A from the address Array + X
STA foo        A = $3f, store A at foo
...

Array1:
.byte $3f, $4f, $5f, $6f

*Warning, if your array address is in the zero page, and your index would put the address in the next page, it won’t fetch from the $100 page, but rather from zero-page.

This is a bug of zero-page indexing on the 6502 processor. If you absolutely must put an array half in the zero-page and half out (I don’t know why you would), you can force the assembler to use an ‘absolute address’…ie. a 16-bit address, like this…

LDA a:Array2, X ;this will correctly get the byte from the $100 page

Here’s another array example using the Y register.

LDY #1
LDA Array1, Y

And, you can use STA the same way…to fill an array.
LDA #1
LDX #5
STA Array1, X store the value 1, at the address ‘Array1’ + 5

Loops

Loops are fairly easy…

for (X = 0;X < 50; X++)

  LDX #0
Loop:
  ...some code...
  INX 
  CPX #50   compare X to 50
  BNE Loop  not 50, branch back to Loop

It can also be done like this…

  LDX #50
Loop:
  ...some code...
  DEX         X--, if result = 0, sets zero flag
  BNE Loop    if no zero flag, branch back to Loop

Bigger Loop, if you need a loop bigger than 256

  LDY #4
  LDX #0
Loop:
  ...some code...
  DEX
  BNE Loop
  DEY
  BNE Loop  Will loop 1024 times

Way bigger than anyone will ever need Loop, just for fun…

  LDA #5
  STA counter
  LDY #0
  LDX #0
Loop:
  ...some code...
  DEX
  BNE Loop
  DEY
  BNE Loop
  DEC counter
  BNE Loop  256*256*5 = 327680 times

Indirect Indexing

LDA (ZP_address,X)
LDA (ZP_address),Y
The first…(ZP_address, X)…I never use, and I don’t like it, so I’m going to skip it altogether. Sorry. I’ve never seen any code that uses it.

The second…(ZP_address), Y…is very useful. It’s the 6502 equivalent of a pointer. You store an address in the zero-page, and you can access the data at the address that it points to…or index from that address with the Y register.

pointer = 2 zero-page addresses reserved

LDA #<SOME_ARRAY
STA pointer
LDA #>SOME_ARRAY
STA pointer+1
LDY #0
LDA (pointer), y load a from address pointer is pointing to...SOME_ARRAY
  A = $5e
LDY #1
LDA (pointer), y load a from address pointer is pointing to plus Y...SOME_ARRAY + 1, A = $7f
SOME_ARRAY:
.byte $5e, $7f

Let’s say you have multiple rooms in the game, and you want to load the graphics for room #3. So, you index to a list of addresses of each room’s data, and store the address in the zero-page, and now you can indirect index from that address using the Y register as the indexer. In this example, pointer and pointer+1 are zero-page addresses.

  LDA room  room = 3
  ASL A     we multiply by 2, because each address is 2 bytes long
  TAX       transfer A to X
  LDA ADDRESSES, X load A with the low byte of the room address
  STA pointer  store A in the zero-page RAM
  LDA ADDRESSES+1, X load A with the high byte of the room address
  STA pointer+1  store A in the zero-page RAM
  LDY #0
LOOP:
  LDA (pointer), Y load A with the fist byte of the array Room3
  STA somewhere, Y Maybe we store this data to another array, for parsing later
  CMP #$ff         let's say, the data set is terminated with $ff
  BEQ EXIT_LOOP    if = $ff, leave this loop
  INY
  BNE LOOP         it will keep looping for 256 bytes, 
                   when Y wraps around to zero

EXIT_LOOP:

ADDRESSES:
.word Room0, Room1, Room2, Room3 
the assembler will replace these with the addresses of each label.

Room0:
...data for room0
Room1:
...data for room1
Room2:
...data for room2
Room3:
...data for room3

Multiple-condition If/then statements…some more examples.

if ((foo == 0)&&(bar < 20))…do code if both true

LDA foo   load A from address foo, sets a few flags, zero flag if = 0
BNE Skip_Ahead  skip the code if foo != 0
LDA bar   load A from address, bar
CMP #20   compare to value 20
BCS Skip_Ahead  skip the code if bar >= 20
...   some code here

Skip_Ahead:

if ((foo == 0) || (bar < 20))…do code if either true

  LDA foo
  BNE Check_Bar  skip if foo != 0, but also check bar
Do_Code:
  ...
  JMP Ahead
Check_Bar:
  LDA bar
  CMP #20
  BCC Do_Code  branch to Do_Code if bar < 20

Ahead:

28. ASM part 3

Welcome to part 3 of my 6502 ASM lessons.

Jumping, moves the execution of the program somewhere else.

Examples:

  LDA #5
  JMP Skip_Next_Line jump to label 'Skip_Next_Line'
  LDA #7             never does this
Skip_Next_Line:
  STA foo            A = 5, store A at address foo

Infinite_Loop:
  LDA #5
  JMP Infinite_Loop     jump to the label 'Infinite_Loop'
  STA foo               never does this

Indirect jumping

One way to control flow of a program is to have an array of addresses of various parts of the program, and jump to them indirectly.

LDA program_state load A from address program_state
ASL A             multiply by 2, since each address is 2 bytes long
TAX               transfer A to X
LDA ADDRESSES, X  load A from ADDRESSES + X (low byte of an address)
STA jump_address  store A at jump_address
LDA ADDRESSES+1, X load A from ADDRESSES+1 + X (high byte of an address)
STA jump_address+1 store A at jump_address+1
JMP (jump_address) jump to the address pointed to by jump_address

ADDRESSES:
.word FUNCTION1, FUNCTION2 
the assembler will replace these with the address of each label

FUNCTION1:
 ...

FUNCTION2:
 ...

*Warning, there is a bug related to indirect jumps. The first byte of the indirect jump address can’t be on the last byte of a page (such as $3ff). Rather than fetching the second byte from the next page $400, it will fetch the second byte from the same page $300. In the example above, jump_address can’t be located $xff. The addresses of Function1 and Function2 can be anywhere.

Sub-Routines

When you use JSR, you jump to the label, and save the return address on the stack. Once that sub-routine is complete, use RTS to return from where we were before. (the program will pull the return address from the stack)

LDA #2
JSR Multiply_16
STA foo         A=32, store A at address foo

...
Multiply_16:
  ASL A
  ASL A
  ASL A
  ASL A
  RTS

Branching

First lets review the 6502 processor status flags.

c = carry flag
z = zero flag
i = interrupt flag
d = decimal mode (a removed feature, not functioning on the NES)
v = overflow flag
n = negative flag

Most of these flags are useful for comparisons and flow control (branching). Here are some instructions that will set/clear various flags.

CLC = clear the carry flag
SEC = set the carry flag
CLI = clear the interrupt disable flag (allows IRQ interrupts to work)
SEI = set the interrupt disable flag (prevents IRQ interrupts)
CLD = clear decimal flag (set hexadecimal math)
SED = set demical flag (set decimal math) (does not work on the NES)
CLV = clear the overflow flag

It’s important to know why and when each flag is set (and which operations won’t set flags).

ADC, SBC sets the z, n, c, v flags flags
AND, ORA, EOR sets the z,n flags
ASL, LSR, ROR, ROL sets the z, n, c flags
BIT sets the z, n, v flags
CMP, CPX, CPY sets the z, n, c flags
DEC, DEX, DEY sets the z, n flags
INC, INX, INY sets the z, n flags
LDA, LDX, LDY sets the z, n flags
TAX, TXA, TAY, TYA sets the z, n flags
PLA sets z, n flags
JMP, JSR, RTS, and BRANCHES do not set any flags
STA, STX, STY do not set any flags
PHA and PHP do not set any flags
PLP changes ALL the flags…that’s what it’s supposed to do

Between the event that set a flag, and the logic that handles the flag, it is safe to store the value somewhere, and safe to branch/jump to another location.

*note: RTI will wipe all your flags, and replace them from a value stored in the stack. When an NMI or IRQ occur, it pushes the processor status and return address to the stack. RTI will return from these interrupts, and it will restore processor status flags (but not the A,X,Y register values).

COMPARISONS (using flags to branch)

CMP compares A to a value
CPX compares X to a value
CPY compares Y to a value

Comparisons work as if a subtraction happened, but without changing the value of A. So think of CMP #5 as SEC,SBC #5. (Also, you don’t need to SEC before CMP.)
If the result is zero, z = 1, else z = 0.
If the result is negative, n = 1, else n = 0.
If the A < value, c = 0. If A >= value, c = 1.
OK, now some branching examples.

LDA foo
CMP bar   does foo = bar ?
BEQ They_are_equal 
 if zero flag set, branch to They_are_equal
BNE They_are_not_equal 
 if zero flag not set, branch to They_are_not_equal

They_are_equal:
  ...
  JMP Next_code

They_are_not_equal:
  ...

Next_code:

LDA foo
CMP #1   does foo = 1 ?
BEQ Foo_Is_One  
 if zero flag set, branch to Foo_Is_One
BNE Foo_Is_Not_One 
 if zero flag not set, branch to Foo_Is_Not_One

Foo_Is_One:
  ...
  JMP Next_code

Foo_Is_Not_One:
  ...

Next_code:

Also, we can use BEQ/BNE to test if a value is zero, because LDA/LDX/LDY sets the zero flag if the value being loaded is zero.

LDA foo     if foo = 0, zero flag set
BEQ Foo_is_zero
BNE Foo_is_not_zero

Foo_is_zero:
  ...
  JMP Next_code

Foo_is_not_zero:
  ...

Next_code:

I discourage the use of CMP with BMI and BPL. You should use BCC and BCS for > < comparisons. Here’s an example without CMP.

*note $80-ff are considered negative. $0-$7f are considered positive. Look at them in binary…
$80 = 10000000
$7f = 01111111
So, if the upper bit = 1, it’s considered negative. If 0, positive.

LDA foo   if foo = negative, n flag set
BMI Foo_is_negative  branch if n flag set
BPL Foo_is_positive  branch if n flag not set

Foo_is_negative:
  ...
  JMP Next_code

Foo_is_positive:
  ...

Next_code:

Comparisons. BCC is equivalent to ‘Branch if Less Than’. BCS is equivalent to ‘Branch if Greater Than or Equal’.

(if foo < 40)…branch

LDA foo
CMP #40
BCC Somewhere branch if foo < 40

(if foo <= 40)…branch

LDA foo
CMP #40
BCC Somewhere branch if foo < 40
BEQ Somewhere branch if foo = 40

or…

LDA foo
CMP #41
BCC Somewhere branch if foo < 41

or…reverse them

LDA #40
CMP foo
BCS Somewhere branch if 40 >= foo

(if foo >= 40)…branch

LDA foo
CMP #40
BCS Somewhere branch if foo >= 40

(if foo > 40)…branch

LDA foo
CMP #41
BCS Somewhere branch if foo >= 41

or…reverse them…

lda #40
CMP foo
BCC Somewhere branch if 40 < foo

And, CPX for X register. CPY for Y register. They work the same as CMP…
LDX foo
CPX #41
BCS Somewhere branch if foo >= 41

More about BCC and BCS

There are many, many more uses for BCC and BCS.

Let’s say, you want to add numbers, but if result > 255, you want to force it to stay at 255. This works because, if the result of ADC is over 255, the carry flag is set.

LDA foo
CLC
ADC #5   ;carry will only be set if result > 255
BCC Still_Under_256 ;branch if carry clear
  LDA #255  
Still_Under_256:

Similarly, say you want to subtract, but if the result < 0, you want to keep it at zero.

LDA foo
SEC
SBC #5
BCS Still_Zero_Or_More
LDA #0
Still_Zero_Or_More:

You can also use BCC and BCS with ASL/LSR, ROL/ROR. Maybe, you want to use LSR as a modulo 2…to see if a number is even or odd.

LDA foo
LSR A   ;bit-shift right, rightmost bit goes into carry flag
BCC Foo_is_Even
BCS Foo_is_Odd

Foo_is_Even:
  ...
  JMP Next_Code
Foo_is_Odd:
  ...
Next_Code:

One more kind of comparison…

BIT, test a memory without affecting any register A,X,Y.
8-bits are like…(76543210)
bit 7 goes to n (negative flag)
bit 6 goes to v (overflow flag)

example:
BIT foo
BMI foo_is_negative

Also, the BIT instruction tests specific bits of a RAM address, as if an AND instruction were used. LDA with a value…let’s say #1. BIT $30 (tests the bit zero of RAM address 0x30). If it’s that #1 bit is set, Z (zero flag) = 0. If it’s 0, Z = 1. That is…if the result of an AND of the 2 values is zero or not, it affects the Z flag.

You would use BEQ (branch if z set) or BNE (branch if z not set) to control flow from here.

*final note:you can only branch +127 or -128 bytes (relative to the byte after the branch instruction. Any more and the assembler will give you branch-out-of-range errors. The standard solution is to replace those long branches with the opposite branch, and a jump to the label.

BEQ label
…over 127 bytes of code…error, too far
label:
……………………………….replace it with…

BNE :+
JMP label
:
…
label:
: (colon) is an unnamed label. :+ means branch forward to the next unnamed label. :- means branch backwards to the next unnamed label.

27. ASM part 2

Bit Shifting

The 6502 has 2 ways of shifting bits left and right. In these examples, I will number each bit…there are 8 bits, numbered 0-7. Only the accumulator (A register) can do bit shifts and bitwise operations. Also, you can do bit shifting to a RAM address, without affecting A.

LSR – shift right

zero -> 76543210 -> carry flag

ROR – roll right

old carry flag -> 76543210 -> new carry flag

ASL – shift left

carry flag <- 76543210 <- zero

ROL – roll left

new carry flag <- 76543210 <- old carry flag

LSR shifts all the bits right one position, and a zero goes into the highest bit. Effectively, this is the same as dividing by 2, with some rounding error.

zero -> 00010000
        00001000, carry = 0

zero -> 00001111
        00000111, carry = 1

ROR works the same as LSR, except the old carry flag goes in rather than zero.

ASL shifts all the bits left one position, and a zero goes into the lowest bit. Effectively, this is the same as multiplying by 2. Right to left here…

   00010000 <- zero
<- 00100000
carry = 0

   11110000 <- zero
<- 11100000
carry = 1

ROL works the same as ASL, except the old carry flag goes in rather than zero.

Now, some examples, using C programming examples to start with.

foo = bar << 2; // 8-bit numbers

LDA bar  load A from address bar
ASL A    bit-shift A left
ASL A    bit-shift A left
STA foo  store A to address foo

foo = bar >> 3; //8-bit numbers

LDA bar  load A from address bar
LSR A  bit-shift A right
LSR A  bit-shift A right
LSR A  bit-shift A right
STA foo  store A to address foo

And, here's some 16-bit examples.

foo = bar << 2; // 16-bit numbers

LDA bar+1  load A from address bar+1 (the high byte)
STA foo+1  store A to a address foo+1 (the high byte)
LDA bar    lda A from address bar (the low byte)
ASL A      bit-shift A left (high bit shifted into carry flag)
ROL foo+1  bit-shift left address 'foo+1', rolling that carry flag in
ASL A      bit-shift A left (high bit shifted into carry flag)
ROL foo+1  bit-shift left address 'foo+1', rolling that carry flag in
STA foo    store A to the address foo (the low byte)

foo = bar >> 3; //16-bit numbers

LDA bar   load A from address bar (the low byte)
STA foo   store A to the address foo (the low byte)
LDA bar+1 load A from the address bar+1 (the high byte)
LSR A     bit-shift A right (low bit shifted into carry flag)
ROR foo   bit shift right address 'foo', rolling that carry flag in
LSR A     ...
ROR foo
LSR A
ROR foo   ...3 times
STA foo+1 store A to the address foo+1 (the high byte).

Bitwise Operations.

AND, OR, and XOR…called here AND, ORA, and EOR. These things only work with the A register.

Here’s what AND does. bit by bit.

AND #value
A, AND value= result

0 AND 0 = 0
0 AND 1 = 0
1 AND 0 = 0
1 AND 1 = 1

AND only sets a bit if both bits in A and value are 1.

Example:

A      00010001
value  00000101
result 00000001

AND is used to isolate a single bit. The way I handle button presses, I roll them into joypad1. If I want to find out if the Left button is being pressed…I know that it is this bit of joypad1 00000010 ($02). Here’s how the code would usually go…

LEFT = $02 ;defined at the top of the page

LDA joypad1  load A from address joypad1
AND #LEFT    AND A with value 2, result now in A
BEQ :+       branch if the result is zero (to unnamed label)
             skipping this next line of code
JSR Left_Pressed jump to sub-routine handline left button presses
:            just a label

Another use for AND, is to ‘mask’ out certain bits. Let’s say, I have a peice of data, where the upper bit is a special flag, and the lower 7 bits is the data. If I want just the data, I would AND #$7f (01111111) to remove the upper-bit…

LDA data
AND #$7f

ORA (bitwise OR operation)

Here’s what ORA does. bit by bit.

ORA #value
A, ORA value= result
0 ORA 0 = 0
0 ORA 1 = 1
1 ORA 0 = 1
1 ORA 1 = 1

If either A or the value has a bit set, the result will have that bit set.

Example:
A      00010001
value  00000101
result 00010101

ORA is a way ensure that certain bits are set, without effecting the other bits (as math would do).

Music code is a good example. The left 4 bits control the sound. The right 4 bits volume. So, if you want to keep the ‘instrument’ the same, the first 4 bits would always be $C (for example), while the volume may change. So, you might store $c0 in variable ‘instrument’ and store the volume in variable ‘volume’. When you need to combine them, you would use ORA.

LDA instrument  instrument is $c0
ORA volume      volume is 0 - $0f
STA $4000       result stored to music register, address $4000.

EOR (exclusive OR operation), means one or the other, but not both.

EOR #value
A, EOR value= result
0 EOR 0 = 0
0 EOR 1 = 1
1 EOR 0 = 1
1 EOR 1 = 0

Example:
A      00010001
value  00000101
result 00010100

EOR is usually used to get the negative value of a number. Say you have -5 ($fb) and you want to turn it into 5, you EOR #$ff and add 1. The same for converting back to -5.

LDA foo   Let's say foo = $fb (-5)
EOR #$ff  A = 4 now
CLC
ADC #1    A = 5 now

and reverse works too…

LDA foo   Let's say foo = 5
EOR #$ff  A = $fa now
CLC
ADC #1    A = $fb now (-5)

TRANSFERRING registers

TAX A transfers to X
TXA X transfers to A
TAY A transfers to Y
TYA Y transfers to A

TXS X transferred to the stack pointer
TSX stack pointer transferred to X

This is the only way to access the stack pointer. Usually, the stack pointer is set to $ff at the start of the program and never thought of again.

LDX #$ff load X with value $ff
TXS      transfer to stack pointer

(the stack grows down from $1ff)

More Stack Operations
PHA push A to the stack (and adjust the stack pointer -1)
PLA pull A (pop A) from the stack (and adjust the stack pointer +1)

PHP push the processor status to the stack (and stack pointer -1)
PLP pull the processor status from the stack (and stack pointer +1)

Unfortunately, you can’t push a few arguments to the stack, jump to a sub-routine and then use those numbers…not easily, at least. Because, the jump to sub-routine also pushes the return address to the stack, on top of your numbers. PHA and PLA can be used as a cheap local variable. But, be careful. If you’re inside a sub-routine and you PHA, and forget to PLA, your program will crash when it tries to pull the return address, and gets your PHA number instead.

A few more things…

NOP does nothing but wastes 2 cycles of CPU time
BRK a non-maskable interrupt…will jump the program to wherever the BRK vector tells it…this usually only happens if a big error has occured, as the machine code for BRK is #00…which indicates that the program has branched to an area of the ROM with nothing there.

And, next time we will go over jumping, branching, and comparison.

26. ASM Basics

Intro to 6502 ASM, with the ca65 assembler. Links for reference…

http://www.6502.org/tutorials/6502opcodes.html

http://www.obelisk.me.uk/6502/reference.html

I’ve had a lot of questions about 6502 ASM. One of the features of cc65 is the ca65 assembler, which is a very good one. You can write any, or all, your functions in assembly. But, it would help if you knew how 6502 ASM works…so I’m going to write a few tutorials. All my examples will be ca65 specific, but it might be useful for users of other assemblers. Note, I will usually use $ to indicate hex numbers.

The 6502 has 3 registers. A (the accumulator) and X and Y (index registers). A is for most purposes and calculations. X and Y are for counting loops or indexing arrays.

The NES has 256 zeropage RAM addresses (that is addresses 0 – 0xff) and 1792 non-zeropage RAM addresses (addresses 0x100 – 0x7ff). Some games also have an additional RAM chip on the cartridge, usually mapped to 0x6000-0x7fff. If powered by a battery, it SAVES the game. We prefer zeropage variables to the others, because they only need 1 byte of ROM, to refer to it, rather than 2. Also, they are faster operations.

Let’s go over the syntax of the most common opcode, LDA (load the A register).

LDA 10 (will load the A register from the zero page address $000A). $A is hexadecimal for decimal 10.

LDA $10 (will load the A register from the zero page address $0010. $ means hexadecimal number.)

LDA #10 (will load the A register with the value 10. # means immediate mode. We want a value and not a RAM address.)

LDA #$10 (will load the A register with the value 16, which is the same as $10. Again, the # means it’s a value, not an address.)

Declaring constants.

zip = 0
FOO = $3f
LIVES = $03e3

When used in assembly code, the assembler will replace the symbol with the value you define.

Examples:

-> here means ‘assembles into’
the left-most part is what you will be typing

LDA #zip -> LDA #0   load A with value 0, the '#' means value, not address
LDA zip  -> LDA 0  load A from the 8-bit address $00
LDA #FOO -> LDA #$3f load A with the value $3f
LDA FOO  -> LDA $3f  load A from the 8-bit address $3f
LDA LIVES -> LDA $03e3 load A from the 16-bit address $03e3
LDA # LDA #$e3 load A with the value (lower byte of $03e3 = $e3)
LDA #>LIVES -> LDA #$03 load A with the value (upper byte of $03e3 = $03)

< gets the lower byte of a 2 byte expression > gets the upper byte of a 2 byte expression

LDA #LIVES -> produces an error. The assembler was expecting an 8-bit number.

Declaring variables.

.segment "ZEROPAGE"
foo: .res 1
bar: .res 2

foo will be reserved 1 byte at address $00, and bar will be reserved 2 bytes at addresses $01 and $02.

LDA foo  -> LDA $00 load A from the 8-bit address $00
LDA bar  -> LDA $01 load A from the 8-bit address $01
LDA bar+1  -> LDA $02 load A from the 8-bit address ($01 + 1 = $02)

Note that the bar+1 here is a “compile time constant”. The assembler knows the value of bar and will add one to it at compile time. This will still execute as a standard LDA from a zero page address.

.segment  "BSS"
fooz: .res 1
baz: .res 2

As I described above, I’m defining the BSS section in the .cfg file as being from $300-$3ff. Therefore, the assembler will reserve 1 byte for fooz at $0300 and 2 bytes for baz at $0301 and $0302.

LDA fooz  -> LDA $0300 load A from the 16-bit address $0300
LDA baz  -> LDA $0301 load A from the 16-bit address $0301
LDA baz+1  -> LDA $0302 load A from the 16-bit address ($0301 + 1 = $0302)

Importantly, we don’t need to know what value is reserved when writing code. The assembler will keep track of the values and addresses of every label, you just have to reference them in your code using the labels/variable name.
* constants and variables should be defined at the very top of the ASM page.

Referencing ROM addresses in code, using labels.

.segment  "CODE"
LDA Table1
LDA Table1+1
...
Table1:
.byte $01, $02

(note, you could also put Table1 in the “RODATA” segment, if you like)

Let’s say that the assembler has calculated that Table1 will be at address $8050. This will assemble into…

LDA $8050 load A from the 16-bit address $8050
LDA $8051 load A from the 16-bit address $8051

when the program is RUNNING, the first line will load A with the value #01 and the second line will load A with the value #02…because those are the values in the ROM at 8050 and 8051.

OK, now that we understand how the labels work, let’s do some code…using C examples, and how to do it in ASM. There are 3 registers in the 6502. A, X, and Y.

foo = 3;

LDA #3  	load A with value 3
STA foo  	store A at address foo

or we could have used the other registers…

LDX #3  	load X with value 3
STX foo  	store X at address foo

or

LDY #3  	load Y with value 3
STY foo  	store Y at address foo

There is no difference which register you use for this kind of thing.
.

bar = $31f; //a 16-bit value

From working with cc65, I now have a habit of using A for low bytes and X for high bytes (as the cc65 compiler tends to do)…

LDA #$1f 	load A with the value $1f
LDX #$03 	load X with the value 3
STA bar  	store A to address bar
STX bar+1 	store X to the address bar+1

we could also have done...
LDA #$1f 	load A with the value $1f
STA bar  	store A to address bar
LDA #$03 	load A with the value 3
sta bar+1 	store A to the address bar+1

baz = bar; //a 16-bit value

LDA bar  	load A from the address bar
LDX bar+1 	load X from the address bar+1
STA baz  	store A to the addres baz
STX baz+1 	store X to the address baz+1

again, we could have done...
LDA bar  	load A from the address bar
STA baz  	store A to the addres baz
LDA bar+1 	load A from the address bar+1
STA baz+1 	store A to the addres baz+1

Next thing…increment / decrement

foo++;

INC foo  add 1 to the value stored at foo

foo- -;

DEC foo  subtract 1 from the value stored at foo

you can also increment and decrement the X and Y registers

INX  add 1 to the X register

INY  add 1 to the Y register

DEX  subtract 1 from the X register

DEY  subtract 1 from the Y register

You will have to use adding/subtraction to ++ or – – the A register.
Which brings us to simple math…in fact very very simple math. The 6502 can only do addition and subtraction, and bit-shift multiplication. And, ONLY the A register can do math or bit-shifting.

Adding is always done ‘with carry’. The 6502 has certain FLAGS to assist math, and for doing comparisons. If the result of addition is > 255, then it sets the carry flag – in case you are doing 16-bit math (or more). If the result of addition is <= 255, the the carry flag is reset to zero. But, it always adds A + value + carry flag. Therefore, we must ‘clear the carry flag’ before addition. Here’s an example…

A reg. + value + carry flag = 
result now in A // carry flag
4+4+0  = A = 8, carry = 0
4+4+1  = A = 9, carry = 0
255+4+0  = A = 3, carry = 1
255+4+1 = A = 4, carry = 1

foo = fooz + 1; //8-bit only

LDA fooz 	load A from address fooz
CLC  		clear the carry flag
ADC #1  	add w carry A + value 1, the result is now in A
STA foo  	store A to the address foo

or, reverse them, get the same result…
foo = 1 + fooz; //8-bit only

LDA #1  	load A with value 1
CLC  		clear the carry flag
ADC fooz 	add w carry A + value at address fooz, result now in A
STA foo  	store A to the address foo

Let’s do a 16-bit example.

bar = baz + $315; 16-bit values

LDA baz  	load A from address baz
CLC  		clear the carry flag
ADC #$15 	add w carry A + value $15 (the low byte)
  if the result is > 255, the carry flag will be set, else reset to zero
STA bar  	store A to the address bar
LDA baz+1 	load A from the address baz+1
  ...notice, we don't clear the carry flag, we are using the
  carry flag result of the previous addition as part of this addition
ADC #$03 	add w carry A + value $03 (the high byte)
STA bar+1 	store A to the address bar+1

And, some subtraction. Like the ADC, subtraction always uses the carry flag, but in reverse. It’s called Subtract with Carry. You need to SET the carry flag before a SBC operation. If the result of subtraction underflows below 0, it will reset the carry flag to zero. Else, it will set the carry flag. Again, this is in case you want to do 16-bit (or more) math.

(on a side note the carry flag here works the opposite as is does on most other microprocessors, where the carry flag works as a borrow.)

Here’s some examples…
! = NOT…ie, the opposite

A reg. - value - !carry flag = 
result now in A // new carry flag status
8-4-!1  = A = 4 // carry = 1
8-4-!0  = A = 3 // carry = 1
4-5-!1  = A = 255 // carry = 0
4-5-!0  = A = 254 // carry = 0

foo = fooz – 1; //8-bit only

LDA fooz 	load A from address fooz
SEC  		set the carry flag
SBC #1  	subtract value 1 from A, result is now in A
STA foo  	store A to address foo

And the reverse, which is a different thing altogether…

foo = 1 – fooz; //8-bit only

LDA #1  	load A with value 1
SEC  		set the carry flag
SBC fooz 	subtract value at address fooz from A, result is now in A
STA foo  	store A to address foo

And a 16-bit example…
bar = baz – $315; //16-bit numbers

LDA baz  	load A from address baz
SEC  		set the carry flag
SBC #$15 	subtract value $15 from A, result is now in A
STA bar  	store A to address bar
LDA baz+1 	load A from addres baz+1
  ...notice, we DON'T set the carry flag. We are using the result
  of the last math to set/reset the carry flag.
SBC #$03 	subtract value 3 from A (and subtract !carry), result now in A
STA bar+1 	store A to the address bar+1

Stay tuned for many more ASM lessons to come.

	alangfiles on 24. Advanced Mapper –…
	iNCEPTIONAL on SNES Programming Guide
	dougfraker on SNES Programming Guide
	iNCEPTIONAL on SNES Programming Guide
	matthughson on 24. Advanced Mapper –…