SNES game development, continued…
Just one more subject before we can actually get to write our SNES program. Using the assembler. You should have read some of the 6502 tutorials and read up on 65816 assembly basics… before heading any further.
First, we need to write our program in a text editor. I use Notepad++. You can use any similar app that can save a plain text file. We will save our files as .s or .asm. It might help if you include a path to the ca65 “bin” folder in environmental variables, so windows can find it. You can also just type a path in the command prompt, which will tell the system to look for ca65 and ld65 in the bin folder, which is one level up from the current directory.
ca65 is a command line tool. If you just double click ca65, a box will open and then close. To run it, you need to first open a command prompt (terminal). To open a command prompt in Windows 10, you click on the address bar and type CMD. A black box should appear. You would type something like…
ca65 main.asm -g
for each assembly file. The -g means include the debugging symbols. If it assembles correctly, you should have .o (object files) of the same name. Then you use another program ld65 (the linker) to put them all together using a .cfg file as a map of how all the peices go together.
ld65 -C lorom256k.cfg -o program1.sfc main.o -Ln labels.txt
The -C is to indicate the .cfg filename (lorom256k.cfg). The -o indicates the output filename (program1.sfc). Then it lists all the object files (there is only 1, main.o). Finally, the -Ln labels.txt outputs the addresses of all the labels (for debugging purposes).
I use a batch file to automate the writes to the command line. Instead of opening a command prompt box, I just double click on the compile.bat file. I don’t want to go into detail about writing batch files, but mostly you will just need to add a ca65 line for each assembly file (unless they are “included” in the main assembly file, in which case they become part of that asm file). Then edit the ld65 line to include all object files.
Here’s some links to the ca65 and ld65 documents.
Take a look at some of my example code, such as this one.
It has a .cfg file and some basic assembly files just to get to square one. There is some initial code (init.asm), which zeroes the RAM and the hardware registers back to a standard state. We don’t want to touch that code. It works. Then there is a header section of the ROM so that emulators will know what kind of SNES file we have (see header.asm).
.byte “ABCDEFGHIJKLMNOPQRSTU” ;rom name 21 chars
.byte $30 ;LoROM FastROM
.byte $00 ; extra chips in cartridge, 00: no extra RAM; 02: RAM with battery
.byte $08 ; ROM size (2^# kByte, 8 = 256kB)
.byte $00 ; backup RAM size
.byte $01 ;US
.byte $33 ; publisher id
.byte $00 ; ROM revision number
.word $0000 ; checksum of all bytes
.word $0000 ; $FFFF minus checksum
The checksum isn’t actually important. If it’s wrong, nothing bad will happen. The important line is the one that says “LoROM FastROM” after it.
And there are VECTORS here. The vectors are part of how the 65816 chip works. It is a table of addresses of important program areas. The reset vector is where the CPU jumps when the SNES is first turned on, or if the user presses RESET. There are some interrupt vectors like NMI and IRQ which we can discuss later. The important thing is that our reset vector points to the start of our init code, and that the end of the init code jumps to our main code. Also, our reset code MUST be in bank 00.
;ffe4 – native mode vectors
ABORT (not used)
RESET (not used in native mode)
;fff4 – emulation mode vectors
COP (not used)
ABORT (not used)
Let’s talk about the basic terminology of assembly files.
Foo = 62
They look like this. Foo is just a symbol that the assembler will convert to a number at compile time. It should go above the code that uses it.
LDA #Foo …becomes… LDA #62
There are 2 types of variables. BSS (standard) and Zero Page. On the SNES we call it Direct Page, but the assembler still calls it Zero Page. You have to put their definitions in a zeropage segment, which our linker file will specifically define as zeropage type (it will recognize this as a special type of RAM).
temp1: .res 2
This reserves 2 bytes for the variable “temp1”.
pal_buffer: .res 512
This reserves 512 bytes for a palette buffer. Our linker .cfg file will probably define the BSS segment to be in the $100-$1fff range.
Our code will go in a ROM / read-only type segment.
Main: LDA #1 STA $100
Main: is a label. It should be flush left in the line. To the assembler, Main is a number, an address in the ROM file. We could then jump to Main…
or branch to Main…
One assembly file may not know the value of a label in another file. So we might need a .export Main in the file where Main lives, and a .import Main in the other file.
Also called opcodes. These are 3 letter mnemonics that the assembler converts to machine code. Some assemblers require whitespace to be on the left of the instructions (such as a tab or 2-3 spaces). I don’t believe ca65 requires this, but you might as well follow that standard practice.
Code: LDA cats AND #1 CLC ADC #$23 STA cats JSR sleep
Use a semicolon ; to start a comment. The assembler will ignore anything after the semicolon. In the linker .cfg file, use # to start a comment.
These are commands that the assembler will understand.
segment tells the assembler that everything below this should go in the “blah” segment. 816 tell it that we are using a 65816 cpu. smart means automatically set the assembler to 8-bit or 16-bit depending on SEP and REP instructions. a16 sets the assembler to generate 16-bit assembler instructions for the A register. a8 for 8-bit. i16 sets the assembler to have 16-bit index instructions. i8 for 8 bit index registers. “byte” is to insert an 8-bit value into the ROM ($12 in this example). “word” is to insert a 16-bit value into the ROM ($34 then $12 in this example).
There are many other directives. Here are some important ones…
to include an assembly language file in another file.
to include a binary (ie. data) file in an assembly file. This example, CHR, is a graphics file.
65816 specific precautions
The most important thing to be careful with is register size. Your code needs REP and SEP commands to change the register size ( I use macros called A8, A16, XY8, XY16, AXY8, and AXY16). If you have .smart at the top of the code, the assembler will automatically adjust the assembly to the correct register size when it sees a REP or SEP that affect the register size flags… but, it is a good idea to put the explicit directives in at the top of each function. We need to make sure that the function above it doesn’t set the wrong register sizes. Those directives are .a8 .i8 .a16 and .i16.
Just to clarify– .a8 is an assembler directive to change the assembly output. A8 is a macro that will output a SEP #$20, which (when executed) will set the CPU into 8-bit Accumulator mode. .smart will see the SEP #$20 and automatically set the assembly output to 8-bit. But there are still possible errors, for example, something like this…
Something_stupid: A16 ;set A to 16 bit mode lda controller1 and #KEY_B beq Next_Bit A8 ;set A to 8 bit mode lda #2 sta some_variable Next_Bit:
What do you think would happen? The assembler will think everything below A8 has the A register in 8 bit mode, including everything below Next_Bit, even though the beq could branch there with the processor still in 16 bit mode. This could crash or create unusual bugs. So, you should put an A16 directly after the Next_Bit label, to ensure registers are in a consistent size.
Also, you might want to bookend many of your subroutines with php (at the start) and plp (at the end) if the subroutine changes the processor size in any way. This will ensure that it returns safely from the subroutine with the exact processor size that it arrived with.
Alternatively, you could try to do have a consistent register size for most of your code. For example, keep the A register 8 bit and the XY registers 16 bit… or perhaps keep all registers 16 bit for 90% of the code. An approach like that would reduce REP SEP changes and have fewer potential register size bugs.
If the subroutine changes any other registers (such as the data bank register B) you should also push that to the stack at the beginning of the subroutine and restore it at the end.
It is common to have data, and the code that manages that data, in the same bank. An easy way to set the data bank register to the same bank that the code is executing in is PHK (push program bank) then PLB (pull data bank). I have seen code that jumps to another bank do this, to save/restore the original data bank settings…
PHB PHK PLB JSR code PLB RTL
But, maybe we don’t need to do that at EVERY subroutine. The overhead would be quite tedious and slow.
Let’s review the linker file. lorom256k.cfg.
# Physical areas of memory
ZEROPAGE: start = $000000, size = $0100;
BSS: start = $000100, size = $1E00;
BSS7E: start = $7E2000, size = $E000;
BSS7F: start = $7F0000, size =$10000;
ROM0: start = $808000, size = $8000, fill = yes;
ROM1: start = $818000, size = $8000, fill = yes;
ROM2: start = $828000, size = $8000, fill = yes;
ROM3: start = $838000, size = $8000, fill = yes;
ROM4: start = $848000, size = $8000, fill = yes;
ROM5: start = $858000, size = $8000, fill = yes;
ROM6: start = $868000, size = $8000, fill = yes;
ROM7: start = $878000, size = $8000, fill = yes;
# Logical areas code/data can be put into.
# Read-only areas for main CPU
CODE: load = ROM0, align = $100;
RODATA: load = ROM0, align = $100;
SNESHEADER: load = ROM0, start = $80FFC0;
CODE1: load = ROM1, align = $100, optional=yes;
RODATA1: load = ROM1, align = $100, optional=yes;
CODE2: load = ROM2, align = $100, optional=yes;
RODATA2: load = ROM2, align = $100, optional=yes;
CODE3: load = ROM3, align = $100, optional=yes;
RODATA3: load = ROM3, align = $100, optional=yes;
CODE4: load = ROM4, align = $100, optional=yes;
RODATA4: load = ROM4, align = $100, optional=yes;
CODE5: load = ROM5, align = $100, optional=yes;
RODATA5: load = ROM5, align = $100, optional=yes;
CODE6: load = ROM6, align = $100, optional=yes;
RODATA6: load = ROM6, align = $100, optional=yes;
CODE7: load = ROM7, align = $100, optional=yes;
RODATA7: load = ROM7, align = $100, optional=yes;
# Areas for variables for main CPU
ZEROPAGE: load = ZEROPAGE, type = zp, define=yes;
BSS: load = BSS, type = bss, align = $100, optional=yes;
BSS7E: load = BSS7E, type = bss, align = $100, optional=yes;
BSS7F: load = BSS7F, type = bss, align = $100, optional=yes;
The memory area defines several RAM areas. Then it defines 8 ROM areas ROM0, ROM1, etc. Notice they all start at xx8000 and are all $8000 bytes (32kB). This is typical for LoROM mapping. In LoROM, the ROM is always mapped to the $8000-FFFF area. The 0-7FFF area is almost always a mirror of this…
$0-1FFF LoRAM (mirror of 7e0000-7e1fff)
$2000-$4FFF Hardware registers
In LoROM, we have access to these almost all the time with regular addressing modes.
The alternative is called HiROM, which can have ROM banks extend from $0000-FFFF. This doubles the maximum size of ROM, but makes access to LoRAM and Hardware Registers more awkward. This tutorial won’t be using HiROM.
You might notice that the bank is $80 instead of $00. $80 is a mirror of $00 (they access the same memory), but $80+ has faster ROM accesses, whereas $00 are slower. (you also need to change a hardware setting in the $420d register, and should indicate FastROM type in the SNES header). The game will reset into the $00 bank, and we need to jump long to the $80 bank to speed it up slightly.
On a side note, a 256kB ROM size is actually unusually small. 512 and 1024 would be more standard, and you should be able to double the size of the test ROMs with no trouble. Just double the number of ROM banks in the config file and in the header file. LoROM should be able to go up to 2048 kB also (I believe HiROM goes up to 4096 kB).
Ok. Some real code next time.