Things I Forgot

There are so many topics to discuss. It would be very difficult to cover 100%. Let’s briefly mention a few of the things I forgot to mention.

 

Multiplication and Division

SNES has hardware to perform multiplication and division. Actually, it has 2 ways to do multiplication. You can look at my EasySNES code to find examples of these (link below, search for multiply: and divide:). They are a little slow, so you can’t expect to do this 100x a frame. You have to wait several cycles before you get a result from the regular multiply and divide functions (see the NOP opcodes, which do nothing but wait).

If you aren’t using Mode 7, there is a second multiply function (signed) which is much faster. It’s called multiply_fast: in this link.

https://github.com/nesdoug/SNES_00/blob/master/easySNES.asm

Any other higher level math will have to be done with LUTs (look up tables), precalculated byte arrays.
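As a rough illustration of the LUT idea, here is a small Python sketch that precalculates a square root table and prints it as ca65 .byte lines. The function names and the SQRT_LUT label are made up for this example, and square root is just one possible choice of "higher level math".

```python
import math

def build_sqrt_lut(size=256):
    """Return a list where entry i is floor(sqrt(i)); every entry fits in one byte."""
    return [math.isqrt(i) for i in range(size)]

def to_ca65_bytes(values, per_line=16):
    """Format byte values as ca65 .byte lines (hex, 16 values per line by default)."""
    lines = []
    for start in range(0, len(values), per_line):
        chunk = values[start:start + per_line]
        lines.append(".byte " + ", ".join(f"${v:02x}" for v in chunk))
    return "\n".join(lines)

if __name__ == "__main__":
    print("SQRT_LUT:")  # hypothetical label name
    print(to_ca65_bytes(build_sqrt_lut()))
```

On the SNES side you would then index into the table with the input value, instead of calculating anything at run time.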

 

Development Cart

I have a Super Everdrive, made by Krikzz, from Ukraine. And, ooh, they even have one that supports the SuperFX chip. I wish I had that one.

https://krikzz.com/store/home/54-fxpak-pro.html

Well, this one is over $200 US, but the basic model is less than $100, and it is well worth it. It works great. It uses a MicroSD card to hold the game ROMs.

 

Mode 7

This is the big enchilada, but I haven’t quite figured out this mode. Especially, setting up a tool chain for editing. Mode 7 can stretch and zoom and rotate. Many of the coolest SNES games use this in some way. It’s just currently above my skill level.

I do plan to work on this in the future. I might even make my own tools. But, that might be months, or even years from now.

 

IRQ Timers

This is for timing mid screen events. You should try to use HDMA instead. If all 8 HDMA channels are being used, you could do a 9th thing with IRQ timers.

You need to enable IRQ timers (probably just the V timer), and use CLI to enable IRQs on the CPU. You also need to add code to the IRQ handler. Once set, the counter will trigger an IRQ signal once the PPU reaches a specific scanline. The H counter would fire an IRQ signal every scanline, and you probably don’t want that.

 

Enhancement Chips

Another thing that is a bit over my head. The SA-1 chip is just a much faster 65816 chip, and that might be the easiest to use.

Some chip names = DSP1, DSP1A, DSP1B, DSP2, DSP3, DSP4, GSU1 (aka MarioChip1 aka SuperFX), GSU2, GSU2-SP1, OBC1, SA-1, S-DD1, S-RTC, SPC7110, ST010, ST011, ST018, CX4.

Some of the functions they do…

decompression code

trigonometry functions

image zooming and rotation

converting bitmaps to tiles

drawing vector graphics, triangles

real time clock

enemy AI functions (probably wouldn’t be useful to you)

.

The cool chip is the SuperFX chip (GSU). That’s what StarFox used. It would be nice if I could figure it out, and explain it. But, I can not.

 

Other Modes

Hi resolution. Modes 5 and 6 are double horizontal resolution. They can also, optionally, do an interlaced mode which doubles vertical resolution. Very few games used hi resolution.

Offset per tile Modes 2 and 4. I need to investigate these a bit more. I don’t want to put incorrect information here.

 

SRAM

For LoROM, SRAM is mapped to banks $70–$7D in the $0000-$7FFF addresses. And also in the $FE-$FF banks in the $0000-$7FFF addresses. (7e and 7f banks are the WRAM, so that couldn’t be used for SRAM). That gives a total possible 512kB SRAM (battery backed save RAM).
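To double check the arithmetic, here is a small Python sketch that models only the LoROM ranges described above (it is not a full memory map, and the names are made up for the example).

```python
# Banks that hold SRAM according to the description above: $70-$7D plus $FE-$FF.
SRAM_BANKS = list(range(0x70, 0x7E)) + [0xFE, 0xFF]

def is_lorom_sram(bank, addr):
    """True if bank:addr falls in one of the SRAM areas described in the text."""
    return bank in SRAM_BANKS and 0x0000 <= addr <= 0x7FFF

def max_sram_bytes():
    """Each bank contributes $8000 bytes (32kB) of SRAM space."""
    return len(SRAM_BANKS) * 0x8000
```

With 16 banks of 32kB each, max_sram_bytes() comes out to 512kB, which matches the total above.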

HiROM, as usual, is completely different.

https://en.wikibooks.org/wiki/Super_NES_Programming/SNES_memory_map

You will also need to indicate in the header that the ROM is using SRAM. The SRAM size byte is mapped to $FFD8 (the byte at $FFD7 is the ROM size), and it’s this line in the header.asm file

.byte $00 ; backup RAM size

The size in kB is 2 raised to this value. 3 is 8kB, 4 is 16kB, 5 is 32kB, 6 is 64kB, 7 is 128kB, 8 is 256kB, and 9 is 512kB. 0 means 0kB (no SRAM).
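If you would rather not do the power of 2 in your head, the header value can be worked out like this. This is a Python sketch with a made-up function name, and it only accepts the sizes listed above.

```python
def sram_size_byte(size_kb):
    """Return the header value for an SRAM size in kB (0, or 8 to 512 in powers of 2)."""
    if size_kb == 0:
        return 0  # no SRAM
    is_power_of_two = size_kb & (size_kb - 1) == 0
    if not (8 <= size_kb <= 512) or not is_power_of_two:
        raise ValueError("size must be 0 or a power of 2 from 8 to 512 kB")
    # For a power of 2, bit_length() - 1 is the exponent (8 -> 3, 512 -> 9).
    return size_kb.bit_length() - 1
```

For example, sram_size_byte(8) gives 3, so the header line would be .byte $03 for an 8kB save RAM.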

Oh, and the cartridge type byte (a little earlier in the header, mapped to $FFD6) should have the d1 bit (0000 0010) set, to indicate a battery for the SRAM.

Once you have correctly set this, the emulator should automatically be creating SRAM save files, that persist after power off. You can freely read and write to this anytime, and you can save your game by keeping the progress stored in the SRAM.

 


Color Math

SNES programming tutorial. Example 12.

https://github.com/nesdoug/SNES_12

 

What is color math? If you’ve ever worked with Photoshop, it would be like blending 2 different layers. In this case the layers are the MAIN screen and the SUB screen. Everything we have done so far deals with the MAIN screen. So let me try to explain the SUB screen.

All that stuff that the SNES does to produce a picture, putting layers on top of each other, tile priorities, sprite priorities, etc… it does all that TWICE. If you set the settings for the MAIN screen exactly the same as the settings for the SUB screen, it would produce the exact same picture TWICE… with 1 difference.  The main screen uses color index zero as the backdrop color (any pixel that is transparent), and the sub screen uses the “fixed color” as the backdrop color (register $2132).

You would never see the SUB screen, unless you turned on the color math registers, which would then blend the 2 pictures together, using either addition or subtraction. And then there is an optional halving step after that. For each pixel on the screen, the R values are added or subtracted, then the G values, then the B values. Each result is clamped to the min and max instead of overflowing. (Each RGB value is 0-31.)

Let’s say we have it set to ADD. And the main screen pixel is gray 15,15,15, and the sub screen pixel is dark red 10,0,0. The final pixel would be 25,15,15.

If we added the HALF option, each value would shift right once (rounding down), giving a final pixel of 12,7,7.

If we set the color math to SUBTRACT (no halving), the final pixel would be 5,15,15. The RGB values of the sub screen are subtracted from the RGB values on the main screen.

If we added the HALF option, each value would shift right once (rounding down), giving a final pixel of 2,7,7.

Note, any pixel in the sub screen that is transparent will not be halved.
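The arithmetic in the examples above can be modeled with a few lines of Python. This is only a sketch: the function name is made up, it ignores the transparent pixel exception, and it assumes the halving happens before the final clamp (so that add plus half is a true average). For the pixel values used above, the order makes no difference.

```python
def color_math(main, sub, subtract=False, half=False):
    """Blend two (r, g, b) pixels, each component 0-31, the way the text describes."""
    result = []
    for m, s in zip(main, sub):
        value = m - s if subtract else m + s
        if half:
            value >>= 1  # shift right once, rounding down
        result.append(max(0, min(31, value)))  # clamp to 0-31, no wraparound
    return tuple(result)

gray = (15, 15, 15)
dark_red = (10, 0, 0)
print(color_math(gray, dark_red))                            # add
print(color_math(gray, dark_red, half=True))                 # add and half
print(color_math(gray, dark_red, subtract=True))             # subtract
print(color_math(gray, dark_red, subtract=True, half=True))  # subtract and half
```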

The main use for Color Math is for transparency effects. You will want Adding and Halving. That would equally blend the main and sub screen.

The least useful setting is the subtract and halving. That would just produce a very dark picture, and almost no games used this.

.

There is a completely different kind of color math operation, that uses ONLY the fixed color. That color is applied to the entire MAIN screen, and if halving is set, it will work for the whole screen. If you set the fixed color register to green, and had the color math set to ADD, it would add a green tint to the screen.

The fixed color register is weird. The wiki example suggests writing each color separately to it (3 writes, for R, G, and B). However, you could set them all to the same value with 1 write. For example, LDA #$E0, STA $2132 would set all three fixed color components to zero.
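To make the bit layout of a $2132 write concrete: the top 3 bits select which components to change (blue, green, red, from bit 7 down) and the low 5 bits are the intensity. A small Python sketch, with a made-up function name:

```python
def fixed_color_write(value, red=False, green=False, blue=False):
    """Build one byte to write to $2132: bgr in the top 3 bits, intensity 0-31 below."""
    if not 0 <= value <= 31:
        raise ValueError("intensity must be 0-31")
    return (int(blue) << 7) | (int(green) << 6) | (int(red) << 5) | value
```

fixed_color_write(0, red=True, green=True, blue=True) gives $E0, the "everything to zero" write mentioned above, and fixed_color_write(15, red=True) gives $2F, red at roughly half intensity.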

Before we dive into the code, here’s a video. You can probably skip most of this video, which goes into too many details about how the 2 different screens are generated.

 

Example ROM

I put BG1 on the main screen (gray rocks) and BG2 on the sub screen (color bars).

No effect. Color Math disabled.

SNES_12_000

 

Just the Sub screen. (seen by setting the “clipping always to black” bits in the color math logic, and adding the sub screen).

SNES_12_006

Note, the top left is black (non-zero index). The bottom left is zero index (transparent). The sub screen will show the “fixed color” (register $2132) wherever it is transparent. Right now the fixed color is black. Color halving will not work for a transparent pixel on the sub screen. If you look closely, the bottom left square does not change at all in these examples, even when halving is set.

 

Color Math Adding.

SNES_12_001

 

Color Math Adding and Halving.

SNES_12_002

 

Color Math Subtracting.

SNES_12_003

 

Color Math Subtracting and Halving.

SNES_12_004

 

Fixed color only (red at 50%), Color Math Adding.

SNES_12_005

 

Here’s a YouTube video of the Example code.

 

Example Code

$2130 
ccmm--sd
cc------ = main screen black if... *
--mm---- = prevent color math if... *
------0- = fixed color
------1- = sub screen
d is for an unrelated thing

* 00 => Never
  01 => Outside Color Window only
  10 => Inside Color Window only
  11 => Always

$2131
shbo4321
0------- add
1------- subtract
-0------ normal
-1------ result is halved
b = backdrop, o = sprites, 4321 = layers enabled for color math
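If the bit diagrams are hard to read, the two register values can also be built up field by field. This Python sketch uses made-up helper names based on the color_add_sel and color_add_des labels used in the example code, and it reproduces the same hex values that appear in the walkthrough.

```python
# Values for the "cc" and "mm" fields of $2130.
NEVER, OUTSIDE_WINDOW, INSIDE_WINDOW, ALWAYS = 0, 1, 2, 3

def color_add_sel_value(clip_main=NEVER, prevent_math=NEVER,
                        use_subscreen=False, direct_color=False):
    """Build a $2130 value from the ccmm--sd fields."""
    return ((clip_main << 6) | (prevent_math << 4)
            | (int(use_subscreen) << 1) | int(direct_color))

def color_add_des_value(subtract=False, half=False, layers=0x3F):
    """Build a $2131 value: s = subtract, h = half, low 6 bits = affected layers."""
    return (int(subtract) << 7) | (int(half) << 6) | (layers & 0x3F)

print(hex(color_add_sel_value(prevent_math=ALWAYS)))  # color math off
print(hex(color_add_des_value(half=True)))            # add, half, all layers
```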

 

So let’s go over each example.

1- no effect, turn off color math

lda #$30 ; = off
sta color_add_sel ; $2130
;and make sure fixed color is black
lda #$e0 ; RGB, value = 0
sta color_fixed ; $2132

2- adding

lda #$02 ; color math with subscreen
sta color_add_sel ; $2130

;adding, not half, affect all layers 
lda #$3f
sta color_add_des ; $2131

3- adding and half, same as last one, just add one bit to the 2131 write

;adding, half, affect all layers 
lda #$7f
sta color_add_des ; $2131

4- subtracting

lda #$02 ; color math with subscreen
sta color_add_sel ; $2130

;subtracting, not half, affect all layers 
lda #$bf
sta color_add_des ; $2131

5- subtracting and half. Same as last one, but add one bit to the 2131 write

;subtract, half, affect all layers	
lda #$ff
sta color_add_des ; $2131

6- fixed color only

;turn on color math, fixed color mode
lda #$00
sta color_add_sel ; $2130

;adding, not half, affect all layers 
lda #$3f
sta color_add_des ; $2131

;set the fixed color to red 50%
lda #$2f ;red at 50%
sta color_fixed ; $2132

We could have also set half mode.

7- see just the sub screen. We did this by setting the “always clip main screen to black” bits in 2130, and then adding the sub screen to the now completely black main screen.

lda #$c2 ;= clip main always to black
sta color_add_sel ; $2130

;adding, not half, affect all layers 
lda #$3f
sta color_add_des ; $2131

 

Other examples

Color math only affects some sprites. Only sprites that use palettes 4-7 are affected by color math. That is why Mario (and the little ghosts) are solid.

Super Mario World (USA)_006

Windowing can affect where the color math applies. With HDMA adjusting the window, you can make some cool effects.

Contra III - The Alien Wars (USA)_000

Metroid1

Super Mario World (USA)_008

 

Tint the whole screen (adding a fixed color)… actually, upon further investigation, this is subtracting, which makes the screen slightly darker than the original. Also, the COLOR MATH is not in fixed color mode, it’s in subscreen mode, but NOTHING is enabled on the subscreen, so the subscreen is filled with the backdrop color (which for the sub screen is the fixed color). I guess that works too.

Legend of Zelda, The - A Link to the Past (USA)_000

 

Smooth Transparencies (add and halving). This is the most common transparency effect on the SNES.

Legend of Zelda, The - A Link to the Past (USA)_001

Sparkster, the water.

Sparkster (USA)_000

 

And creating shadows (subtracting), as in Mortal Kombat II. It’s hard to tell, but their shadows are created by color math subtraction. You could also give the appearance of clouds moving overhead by subtracting a cloud shape and having it scroll.

Mortal Kombat II (USA)_000

 

Links.

http://www.romhacking.net/documents/428/

https://wiki.superfamicom.org/transparency

 


HDMA Examples

SNES programming tutorial. Example 11.

https://github.com/nesdoug/SNES_11

 

HDMA is a way to write to PPU registers while the screen is drawing. You can change values at specific scanlines, to create unique effects.

The H is for H-Blank. Remember before, when we talked about V-blank (vertical blank), where the PPU isn’t doing anything for a short while after drawing each screen? Well, it also pauses a VERY SHORT time after drawing each horizontal line. Just long enough for the 8 HDMA channels to quickly change a register or send data, before the screen goes to write the next line.

They work in order, channel 0 through channel 7. They can each write 1 thing (1, 2 or 4 bytes) per line. Or you can set them to wait a specific number of lines before changing a value.

HDMA uses the same registers as the DMA registers, and you shouldn’t use both at the same time. You should write zero to HDMA enable ($420c) before performing a DMA, because the oldest revision of the SNES has a bug where it can crash if they both happen at the same time.

Here’s an interesting video on DMA and HDMA.

 

Examples

Here are some things you can do with HDMA.

Changing the BG color with HDMA, to create a color gradient.

Batman

 

Changing the Window (there are 2 windows) with HDMA, to block off portions of the screen. The windows have left and right registers, which need to be written every scanline to create these shapes.

Super Mario World (USA)_000

Super Mario World (USA)_003

Mode 7 parameters. (lots of registers to change hundreds of times a frame).

Fzero

.

How HDMA works

https://wiki.superfamicom.org/registers

(for this link, scroll down to 43x0)

When you look at the HDMA registers, it looks like you need to write MORE values than for a DMA (DMA uses 4300-4306 for channel 0 and HDMA uses 4300-430a)… but you actually write FEWER values. Let’s go over them briefly, then in detail, for channel 0.

4300 – yes
4301 – yes
4302-4 – yes
4305-6 – no
4307 – yes, only if using Indirect Mode.
4308-a – no

4300 - Control Register. da---ttt

D=direction, probably want 0, from CPU to PPU.

A=HDMA mode. 0 for direct, 1 for indirect. More on this later.

TTT = transfer mode, which will vary by which register we use.

000 => 1 register write once (1 byte: p)
001 => 2 registers write once (2 bytes: p, p+1)
010 => 1 register write twice (2 bytes: p, p)
011 => 2 registers write twice each (4 bytes: p, p, p+1, p+1)
100 => 4 registers write once (4 bytes: p, p+1, p+2, p+3)
101 => 2 registers write twice alternate (4 bytes: p, p+1, p, p+1)
110 => 1 register write twice (2 bytes: p, p)
111 => 2 registers write twice each (4 bytes: p, p, p+1, p+1)

4301 – the PPU destination register. 21xx. So if you write $22 here, the HDMA will write to $2122, the CGRAM Data Register.

4302-4 – the address of the HDMA table. 2=low, 3=middle, 4=upper/bank #.

4307 – if using Indirect HDMA, this is the bank # of the Indirect Addresses.

Anything marked “no”, don’t touch them. They are used by the HDMA hardware.

Then you write the channel (bitfield, each bit represents a channel, 1 for ch0, 2 for ch1, 4 for ch2, 8 for ch3, etc) to $420c, the HDMA enable register. Presumably, you would do this each frame during v-blank.
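The control byte and the enable bitfield are both just a few bits packed together. Here is a Python sketch with made-up helper names, numbering the channels 0-7 to match the 43x0 register names.

```python
def hdma_control(transfer_mode, indirect=False, ppu_to_cpu=False):
    """Build a 43x0 value from the da---ttt fields."""
    if not 0 <= transfer_mode <= 7:
        raise ValueError("transfer mode must be 0-7")
    return (int(ppu_to_cpu) << 7) | (int(indirect) << 6) | transfer_mode

def hdma_enable_mask(channels):
    """Build a $420c value from a list of channel numbers (0-7)."""
    mask = 0
    for channel in channels:
        if not 0 <= channel <= 7:
            raise ValueError("channel must be 0-7")
        mask |= 1 << channel
    return mask
```

For instance, hdma_control(2, indirect=True) gives $42 (indirect, 1 register write twice), and hdma_enable_mask([0, 1]) gives 3.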

.

OK, so we are pointing the HDMA registers to a table (byte array). For a direct mode, the table would be a scanline count, then (depending on the TTT mode) 1,2, or 4 bytes to be written. Then another scanline count, then more bytes. Scanline count, bytes. Scanline count, bytes. Etc, until it sees a zero in the scanline count slot. From my own examples…

H_TABLE6:
.byte 32, $0f
.byte 32, $1f
.byte 32, $2f
.byte 32, $4f
.byte 32, $6f
.byte 32, $9f
.byte 32, $ff
.byte 0 ;end

That reads 32 lines, value $0f. 32 lines, value $1f. 32 lines… etc down to the terminating 0. One interesting thing is that the $0f is written immediately at the very top of the screen. THEN it waits 32 lines.

Here’s another example, when the transfer mode is “1 register write twice”.

H_TABLE2:
.byte 10, 0, 0
.byte 10, 1, 0
.byte 10, 2, 0
.byte 10, 3, 0
.byte 10, 4, 0
.byte 10, 5, 0
.byte 10, 6, 0
.byte 10, 7, 0
.byte 10, 8, 0
.byte 10, 9, 0
...etc...
.byte 0 ;end

10 is the scanline count. Then 2 bytes to write. Then 10 scanline count. Then 2 bytes. Etc.
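Typing tables like this by hand gets tedious, so you might generate them with a script. This Python sketch (the function name is made up) builds a table in the same shape as H_TABLE2, assuming the 2 data bytes are the low and high byte of a color whose red component increases by 1 per step.

```python
def red_gradient_table(steps, lines_per_step=10):
    """Direct mode table: scanline count, color low byte, color high byte, ..., 0."""
    if not 0 <= steps <= 32:
        raise ValueError("red only has 32 levels")
    table = []
    for red in range(steps):
        color = red  # red sits in the lowest 5 bits of the 15-bit color
        table += [lines_per_step, color & 0xFF, color >> 8]
    table.append(0)  # terminating scanline count
    return table

for i in range(0, 9, 3):
    print(".byte " + ", ".join(str(b) for b in red_gradient_table(3)[i:i + 3]))
print(".byte 0 ;end")
```

Keep in mind that the scanline counts should not add up to more than the visible screen height, so 22 steps of 10 lines is about the limit here.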

Indirect Mode

In the 43x0 register, we can set the HDMA to Indirect. Only with indirect do you need to write to 43x7, the bank of the indirect address. The table will always be in sets of 3 bytes: first the scanline count, then an indirect address (ie. pointer) to where our data is. I wrote the HDMA table like this…

H_TABLE5:
.byte 8
.addr $1000
.byte 8
.addr $1002
.byte 8
.addr $1004
.byte 8
.addr $1006
.byte 8
.addr $1008
...etc...
.byte 0 ;end

The .addr directive outputs a 16 bit value, low byte then high byte. I think you could have also used the .word directive. So, 8 is the scanline count, then an indirect address. Our bank byte is $7e, so the first one points to WRAM $7e1000. The second one points to $7e1002. Etc.

One of the advantages of the indirect system is that you can have a repeated pattern that changes.

I had copied the Indirect Table to $7e1000. It looks like this.

IND_TABLE:
.byte 0, 0
.byte 3, 0
.byte 6, 0
.byte 7, 0
.byte 8, 0
.byte 7, 0
.byte 6, 0
.byte 3, 0
.byte 0, 0
.byte $fd, 0
.byte $fa, 0
.byte $f9, 0
.byte $f8, 0
.byte $f9, 0
.byte $fa, 0
.byte $fd, 0
.byte 0,0

All of these are values to be written with HDMA to a PPU register. In this case, a horizontal scroll register, which is write twice (low then high bytes).

This is example 3. I am also shuffling these values every 4 frames, which causes the movement of the sine wave.
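The low bytes in IND_TABLE are just a sine wave with an amplitude of 8, over 16 steps, with the negative values wrapped to a byte. Here is a Python sketch that reproduces them. The rotate function is only a guess at one simple way to "shuffle" the values, not necessarily what the example code does.

```python
import math

def sine_scroll_values(amplitude=8, steps=16):
    """One period of a sine wave as byte values (negative values wrap, -3 -> $fd)."""
    values = []
    for i in range(steps):
        offset = round(amplitude * math.sin(2 * math.pi * i / steps))
        values.append(offset & 0xFF)
    return values

def rotate(values, by=1):
    """Move the first 'by' entries to the end, so the wave appears to travel."""
    by %= len(values)
    return values[by:] + values[:by]

for low_byte in sine_scroll_values():
    print(f".byte ${low_byte:02x}, 0")
```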

 

Example Code

No effect. HDMA is turned off by writing zero to $420c.

NO_EFFECT

 

Example 1. Changing the BG color.

I’m actually setting up 2 separate HDMA transfers. First to set the CG address to zero. Second to write 2 bytes to change the #0 color. You have to rewrite the address each time, because it auto-increments when writing a color.

stz $4300 ;1 register, write once
lda #$21 ;pal_addr
sta $4301 ;destination
ldx #.loword(H_TABLE1)
stx $4302 ;address
lda #^H_TABLE1
sta $4304 ;address

lda #2
sta $4310 ;1 register, write twice
lda #$22 ;pal_data
sta $4311 ;destination
ldx #.loword(H_TABLE2)
stx $4312 ;address
lda #^H_TABLE2
sta $4314 ;address

lda #3 ;channels 0 and 1 (bits 0 and 1)
sta hdma_enable ;$420c

And we have 2 HDMA tables (see the example code). Each time we are waiting 10 scanlines between changes. Each time, adding a little more red.

On a side note, you could set this up as a single HDMA channel, with a “2 registers write twice each” mode. You would be doing double writes to the CG address, and you would need 4 bytes after each scanline count.

EFFECT1

 

Example 2. Changing window 1 left and right positions.

A window punches a hole in one or more layer. There are 2 windows, but we only need 1 for this example. The only parameters you can set are left and right (and inverse, and combinations with the other window). But with HDMA, you can adjust the window parameters as the screen draws, and draw a shape. Circle shapes are very popular.

If the left position is greater than the right position, the window will not appear. That is what we are doing for the top and the bottom of the screen. You also have to tell it which layers are affected with the $212e (window for main screen) and with the $2123-5 registers.

I’m using 2 HDMA channels, and writing 1 byte to 1 register. To 2126 and 2127.

(Again this could have been done with 1 channel with a 2 register transfer mode).

This example shows the Multi Single Scanline feature. If the scanline count is >128 (not including 128), it signals a series of single scanline writes. Subtract 128 from the count to get the number of lines. Each of those lines gets its own data bytes, with no scanline count in between.

.byte 60, $ff ;first we wait 60 scanlines
.byte $c0 ;192-128 = 64 lines of single entries
.byte $7f ;1st write value
.byte $7e ;2nd write value
.byte $7d ;3rd write value
.byte $7c ;4th write value
...etc...64 lines.
.byte 0 ;end

It waits 1 scanline between each write.
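To check that a table does what you expect, it can help to walk through it the way the hardware would. This Python sketch (made-up name, direct mode only) lists which scanline each write lands on. It treats line 0 as the top of the screen and ignores exact timing.

```python
def expand_hdma_table(table, bytes_per_entry=1):
    """Return a list of (scanline, data bytes) writes for a direct mode HDMA table."""
    writes = []
    line = 0
    pos = 0
    while table[pos] != 0:  # a zero scanline count ends the table
        count = table[pos]
        pos += 1
        if count > 0x80:
            # Repeat mode: (count - 128) lines, each with its own data.
            for _ in range(count - 0x80):
                writes.append((line, table[pos:pos + bytes_per_entry]))
                pos += bytes_per_entry
                line += 1
        else:
            # Normal mode: write once, then wait 'count' lines.
            writes.append((line, table[pos:pos + bytes_per_entry]))
            pos += bytes_per_entry
            line += count
    return writes

print(expand_hdma_table([2, 0xFF, 0x83, 1, 2, 3, 0]))
```

In that small test table, $ff is written on line 0, then 1, 2 and 3 are written on lines 2, 3 and 4.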

Each line, I am moving the left position and right position further apart, and then closer together, which forms a diamond shape.
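A diamond like this could be generated with a script rather than typed out. The following Python sketch is not the table from the example ROM. The names, the center position, and the final entry that closes the window again are all assumptions. It builds one table for the left position and one for the right position.

```python
def diamond_window_tables(center=128, lines=64, top=60):
    """Return (left_table, right_table) for a diamond shaped window."""
    if not 1 <= lines <= 127:
        raise ValueError("repeat mode can cover at most 127 lines per entry")
    if not 1 <= top <= 128:
        raise ValueError("a normal scanline count must be 1-128")
    # Wait 'top' lines with left > right (window hidden), then start repeat mode.
    left = [top, 0xFF, 0x80 + lines]
    right = [top, 0x00, 0x80 + lines]
    for i in range(lines):
        half_width = 1 + min(i, lines - 1 - i)  # grows to the middle, then shrinks
        left.append(center - half_width)
        right.append(center + half_width)
    # Assumed final entry: hide the window again, then end the table.
    left += [1, 0xFF, 0]
    right += [1, 0x00, 0]
    return left, right
```

With the default values, the left table starts 60, $ff, $c0, $7f, $7e, which has the same shape as the table shown above.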

EFFECT2

 

Example 3. Changing BG1 horizontal scrolling position.

This was already discussed above, in the Indirect Mode section. We are using a sine wave pattern to create a wave in the picture, writing twice to 1 register, the horizontal scroll of BG1.

I wanted to include at least 1 example of indirect mode. I copied the value table to the RAM, so I was able to change the values to make the pattern move. See the Shuffle_f3 function in the HMDA3.asm file. The table of Indirect Addresses points to the RAM where our actual values are stored.

This example would have been even nicer if we wrote new values every scanline. Currently, we are only changing values every 8 scanlines (to make the table simpler / smaller). Maybe even every 2 scanlines would have been enough.

EFFECT3

 

Example 4. Changing the Mosaic filter.

This is the simplest example. A single write to a single register. I haven’t discussed the Mosaic filter before, $2106. The upper nibble is the mosaic amount (0 = normal, 1 = 2×2, etc up to $f = 16×16), and the lower nibble says which layers are affected. Sprites are never affected.

stz $4300 ;1 register, write once
lda #$06 ;mosaic
sta $4301 ;destination
ldx #.loword(H_TABLE6)
stx $4302 ;address
lda #^H_TABLE6
sta $4304 ;address

lda #1 ;channel 0 (bit 0)
sta hdma_enable ;$420c

It waits 32 lines before increasing the mosaic value. There are bigger squares at the bottom. You probably wouldn’t use this exact HDMA effect in a game, but it is just an example of what is possible. You can change so many settings, even the BG mode $2105, which layers are active, change the location of a tilemap or tileset. I think you can even write new data to the VRAM (a little bit at a time).
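The data bytes in H_TABLE6 follow directly from the $2106 layout, with the mosaic amount in the upper nibble and all 4 layers selected in the lower nibble. A Python sketch, with a made-up function name:

```python
def mosaic_value(size, layers=0x0F):
    """Build a $2106 value: size 0-15 in the upper nibble, layer bits in the lower."""
    if not 0 <= size <= 15:
        raise ValueError("mosaic size must be 0-15")
    return (size << 4) | (layers & 0x0F)

print([f"${mosaic_value(size):02x}" for size in (0, 1, 2, 4, 6, 9, 15)])
```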

EFFECT4

Here’s a YouTube video of these examples.

 

Caution

Because HDMA settings need to be rewritten every frame, this code example would break in a lag frame… like if our game logic ran so long that it took 2 frames to complete one loop. You would have half of the frames missing the HDMA effect.

Therefore, you should put HDMA code inside the NMI code. NMI code is guaranteed to execute every frame, during v-blank. You should also put your DMA code here (copying bytes to the VRAM or OAM). Some games have very elaborate NMI code. I don’t currently have a proper example with HDMA in the NMI code. I’ll put that on my TODO list.

 

Links.

https://wiki.superfamicom.org/dma-and-hdma

https://wiki.superfamicom.org/grog’s-guide-to-dma-and-hdma-on-the-snes

 


SNES Music

SNES programming tutorial. Example 10.

https://github.com/nesdoug/SNES_10

 

Here’s an interesting video on the SPC700 and SNES audio.

 

Today, we are going to talk about SNES music. The APU (Sony SPC700) is a different chip entirely, and has its own 64k of RAM. At the beginning of our program, we need to load the APU with our SPC file. It is an audio program that runs automatically.

The APU is connected to an 8 channel DSP (digital sound processor). The song will direct the DSP to play different sound samples at different rates to make tones. If you are familiar with MIDI, it is similar. The samples can be looped or not. The samples are compressed into a native compression called BRR (bit rate reduction).

BRR samples are very large, and you will probably be only able to fit 10-15 samples. Each will have to be edited (perhaps with Audacity) to less than a second each, and at a reduced sample rate. We are going to work with SNESGSS (written by Shiru). I got the files from here, and from the source code for one of Shiru’s SNES projects.

https://github.com/nathancassano/snesgss

However, don’t use the .exe here. Apparently, there is a bug in the SNES hardware, where if the APU reads a value at the exact moment it is changed, it sometimes reads the wrong value. This can cause the game to crash. See the discussion here…

https://forums.nesdev.com/viewtopic.php?f=12&t=18096

Calima wrote a bug patch, which I have directly patched into the .exe (now called snesgssP.exe, P for patch). You can get it here…

https://github.com/nesdoug/SNES_00/tree/master/MUSIC

and also grab the music.asm file. You will need it.

Anyway, the patch works, but it forced me to overwrite something, and I chose to disable the streaming function. I tested this on a real SNES dozens of times with no problems.

.

SNESGSS prefers to have 16-bit MONO WAV samples at sample rate 32000 or 16000. I have tried 8000, but there is no improvement.

There seems to be a bug in Audacity, when you resample to another rate (Tracks/Resample), it seems to work, but then saves it to the original rate. But if you resample it, cut it, and then open a new window and start a new project at the target rate, and then paste it and then save it, it works. I don’t know what’s up with that.

Recording at the desired rate has no problems. 16000 seems to be a nice sweet spot on audio quality and file size.

SNESGSS also suggests tuning the samples to B +21 cents. I did not. I left all my samples at C. They are not in tune with the samples provided with SNESGSS, which I did not use. I think those are tuned to B +21.

SNESGSS3

Hit the WAV button near the middle of the screen to load your samples. Setting the envelopes similar to this seems good (15, 1, 7, 16). You can press the 2x or 4x buttons if you run out of room for files, to downsample by half. You can set a Loop, and adjust the numbers in “from” and “to”… which I find incredibly difficult to use.

I wouldn’t mess with the volume or EQ settings. That is something you should have done in Audacity while editing. Just keep in mind that the SNES tends to weaken the upper range and make bright sounds feel dull. You might have to do a treble boost for the lead instruments.

This tracker will convert our samples to BRR, but not until your final export. Unfortunately, you can’t import BRR samples to it from other sources. You could use other SNESGSS instruments (the ones that Shiru provided, for example).

SNESGSS4

Here you can check the size of all the files. Obviously, you can’t have a bigger SPC file than 64k, the size of the APU RAM.

I should note that we only load 1 song in the APU RAM at a time. Starting a new song will load it over the previous song, so that only one song is loaded at any time. That should give you a little flexibility on overall size.

SNESGSS

Here is the main editor. You type Z-M keys for lower octave, Q-P keys for upper octave. You can change the octave by pressing the octave button. So, this is a standard tracker, it goes downward as the song plays.

You can toggle channels on and off by clicking on the word “channel 1”, etc. You can divide things into sections. Press the spacebar to mark the end of a section. Then you can repeat the previous section with an R00 command.

The order of things is Note, Instrument, Volume 0-99, and Special effects. The SP column is for song speed (smaller is faster). You can scroll up and down with the PgUp and PgDn keys, and Home and End go to the next section.

CTRL+End marks the end of the song, and CTRL+Home marks the loop back point.

You can import Famitracker and MIDI files (notes only), but I haven’t tried.

SNESGSS2

On this page, you can mark a song as a “Sound effect”.

Once the songs are done, you File/Export. And that will produce several files.

spc700.bin is our main SPC file. It holds the program and the samples and the sound effects data.

music_1.bin (one file per song) is the song data.

sounds.asm and sounds.h we don’t need. Don’t include them. This was for a different assembler / C compiler. You might want to look at it to find the value of each sound effect.

.define SFX_DING 0

…tells us that the DING sound effect is called with the value zero.

If you look in

https://github.com/nesdoug/SNES_10/tree/master/MUSIC

you will see a file called Split.py. This is a python script to split the SPC file into 2 pieces if needed. Let me explain that a bit. We are mapping our SNES game as LoROM, which means that our banks are only 32k. The SPC file could be 64k in size. It needs to be split up to be included without editing the linker file.

Also the music loader function will fail to load correctly if it is >32k. Because of that, I have been copying the SPC file to the 7f0000-7fffff WRAM first, and then calling the INIT function (which copies the SPC file to the APU RAM).

The example code, however, is smaller than 32k, so this step is unnecessary.

 

CODE

Let’s go over the music.asm file, which you should have grabbed from one of my example folders. I had to modify the original code to work with ca65.

spc_init – should be called at the start of the game, with interrupts off (NMI, IRQ, controllers). With AXY16, load A with the address of the SPC file (spc700.bin) and X with the bank of the SPC file, then JSL to spc_init.

This is all well and good if the SPC file is < 32k, but if it’s over 32k, and we are mapped as LoROM (ROM banks of 32k), I have had to first copy all the SPC files to $7f0000-7fffff WRAM and then load A with 0000 and X with $7f and then JSL to spc_init.

spc_init expects the SPC file to be contiguous.

SPC file > 32k also means I need to either modify the linker script, or split the SPC file into 2 chunks, and include them in 2 different banks. I decided on the latter. You could do this in a hex editor. I wrote a little Python script to do the same (Split.py). If you have python3 installed, you can open a command line prompt and type “Split.py spc700.bin” and it will split it into 2 files (smaller than 32k).
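For anyone curious what such a split involves, here is a minimal Python sketch of the idea. It is not the actual Split.py from the repo. The function names are made up, and the _1 / _2 output naming is only assumed from the spc700_1.bin and spc700_2.bin file names used in the example code.

```python
import os

def split_chunks(data, chunk_size=0x8000):
    """Cut a bytes object into pieces of at most chunk_size bytes (32k by default)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def split_file(path, chunk_size=0x8000):
    """Write path's contents out as name_1.ext, name_2.ext, ... and return the names."""
    base, ext = os.path.splitext(path)
    with open(path, "rb") as source:
        data = source.read()
    names = []
    for number, chunk in enumerate(split_chunks(data, chunk_size), start=1):
        name = f"{base}_{number}{ext}"
        with open(name, "wb") as output:
            output.write(chunk)
        names.append(name)
    return names
```

Each piece is at most $8000 bytes, so it fits in one LoROM bank.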

By the way. Running this function takes a long time. It could take 2 seconds or more.

spc_load_data is an internal function, for loading data to the APU RAM.

spc_play_song loads a song (data) to the APU RAM and then starts playing it. This also should be done with interrupts off. Note that this system only loads one song at a time to the APU RAM. If you have a song in and then load another song, it overwrites the first song.

With AXY16, load A with the address of the song data (like music_1.bin) and load X with the bank of the song, then JSL to spc_play_song. Once it’s done, it will begin playing the song automatically.

spc_command_asm is an internal function. It’s what sends signals to the APU.

spc_stereo is to set mono (default) or stereo audio. Load A (8 or 16) with 0 for mono, 1 for stereo. Audio channels can be panned left or right.

spc_global_volume is to set the max volume, 0-127. It can also be used to fade in or fade out. One of the variables is called speed, and it is the step value, to go from previous volume to the new volume. 255 is the default speed, which is instant change (any value >= 127 would be instant). Speed of 7 seems nice for a fade, and will take 2 seconds to transition. Don’t give it a speed of zero, the volume won’t change.

AXY8 or AXY16, load A with the speed of volume change (1-255), and load X with the new volume (0-127), then JSL spc_global_volume.

The SNES has a master volume variable, which affects all channels. That’s what this sets, and doesn’t affect individual channel volumes.

spc_channel_volume sets the max volume for an individual audio channel. AXY8 or AXY16, load A with the channels and load X with the volume (0-127), and then JSL to spc_channel_volume. I’m not sure under what circumstances I would use this. Maybe to silence or dim a lead instrument, for a change in dramatic tone.

Note, the channel here is a bitfield, with each bit representing a channel.

0000 0001 = channel 1
0000 0010 = channel 2
0000 0100 = channel 3
0000 1000 = channel 4
0001 0000 = channel 5
0010 0000 = channel 6
0100 0000 = channel 7
1000 0000 = channel 8

For example, LDA #$42 (0100 0010) would affect channels 2 and 7.
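The same bitfield can be worked out from a list of channel numbers. A Python sketch with a made-up function name, using the 1-8 channel numbering from the list above:

```python
def channel_mask(channels):
    """Build the channel bitfield from channel numbers 1-8 (channel 1 = bit 0)."""
    mask = 0
    for channel in channels:
        if not 1 <= channel <= 8:
            raise ValueError("channel must be 1-8")
        mask |= 1 << (channel - 1)
    return mask

print(f"${channel_mask([2, 7]):02x}")
```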

music_stop stops the song. JSL here.

music_pause will pause and unpause the song (and not affect the sound effects that are playing). Load A (8 or 16) with 1 for pause and 0 for unpause, then JSL here.

sound_stop_all stops all sounds, song and sound effects. JSL here.

sfx_play_center plays a sound effect, pan center. With AXY8 or AXY16, load A with the # of the sound effect, load X with the max volume of the sound effect (0-127), and load Y with the channel (0-7), then JSL here and the sound effect should play. The channel needs to be higher than the max channel for the song playing. Therefore, you must reserve some empty channels in the song, if you want sound effects to play with it.

sfx_play_left is the same, but pan left.

sfx_play_right is the same, but pan right.

sfx_play is an internal function that the 3 above functions call.

Streaming has been removed.

 

EXAMPLE CODE

This is the audio loading code from the example. It was for an SPC file smaller than 32k.

AXY16
;copy music to 7f0000
BLOCK_MOVE (music_code_end-music_code), music_code, $7f0000

;copy the music code and samples to the Audio RAM 
lda #0000
ldx #$7f ;address 7f0000
jsl spc_init

AXY16
lda #$0001
jsl spc_stereo

…and at the bottom we have

.segment "RODATA6"
music_code:
.incbin "MUSIC/spc700.bin"
music_code_end:

.

In another test project, I had an SPC file bigger than 32k. I split the SPC file in half and put the halves in ROM banks 6 and 7.

.segment "RODATA6"
music_code:
.incbin "MUSIC/spc700_1.bin"

.segment "RODATA7"
music_code2:
.incbin "MUSIC/spc700_2.bin"
music_code2_end:

And the copying to the APU RAM was the same, except that I had 2 move instructions to copy to $7f0000.

;copy music to 7f0000
BLOCK_MOVE $8000, music_code, $7f0000

BLOCK_MOVE (music_code2_end-music_code2), music_code2, $7f8000

;copy the music code and samples to the Audio RAM 
lda #0000
ldx #$7f ;address 7f0000
jsl spc_init

…so that spc_init could load the entire SPC file as one contiguous chunk.
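
The split itself can be done with any tool. Here is a minimal Python sketch, assuming the first piece must be exactly $8000 bytes (to match the first BLOCK_MOVE above) and reusing the file names from the example. split_spc and split_file are hypothetical helpers, not part of SNESGSS.

```python
FIRST_SIZE = 0x8000  # 32k, the length of the first BLOCK_MOVE

def split_spc(data, first_size=FIRST_SIZE):
    """Split the exported data into a 32k first piece and the remainder."""
    return data[:first_size], data[first_size:]

def split_file(src, dst1, dst2):
    """Read src and write the two pieces to dst1 and dst2."""
    with open(src, "rb") as f:
        part1, part2 = split_spc(f.read())
    with open(dst1, "wb") as f:
        f.write(part1)
    with open(dst2, "wb") as f:
        f.write(part2)

# Example: split_file("spc700.bin", "spc700_1.bin", "spc700_2.bin")
```

Because the first piece is a full $8000 bytes, copying the second piece to $7f8000 puts it directly after the first one in RAM.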

.

Then I load the song, and start it playing (before I turn on NMI interrupts).

AXY16
lda #.loword(song1)
ldx #^song1
jsl spc_play_song

By the way “.loword()” gets a 16 bit value from a 24 bit label. ^ gets the bank of a label.
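
To make the two operators concrete, here is a Python model of what they extract from a 24 bit address. The address used for song1 is made up for illustration.

```python
def loword(addr):
    """Low 16 bits of a 24-bit address, like ca65's .loword()."""
    return addr & 0xFFFF

def bank(addr):
    """Top 8 bits of a 24-bit address, like ca65's ^ operator."""
    return (addr >> 16) & 0xFF

song1 = 0x868123  # made-up address, for illustration only
print(hex(loword(song1)), hex(bank(song1)))  # 0x8123 0x86
```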

.

Now I just need to set up a trigger for the sound effect. We already have that yellow block triggering the screen to go dark, so I just snuck in a little more code there. I didn’t want it re-starting the same sound effect over and over and over each frame, so I added a variable to remember the LAST FRAME, if we were over the yellow block, and skip a trigger in that case.

cmp bright_var2 ;compare to last frame
beq Past_Yellow ;skip if last frame is true

AXY8
lda #0 ;= ding
ldx #127 ;= volume
ldy #6 ; = channel
jsl sfx_play_center

Our song plays from channels 1-4 (ie. 0-3), and our sound effect uses 2 channels, so we could have set this to 4,5, or 6. This function is zero based index, ie. values 0-7. So 6 means it will play on channels 7 and 8. Sorry for flip flopping between zero based and one based numbers. Hope this isn’t too confusing.

However, if we loaded Y with 0,1,2, or 3, it would not play. If we loaded Y with 7, only the first channel of the sound effect would play.
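
The channel rule can be modeled in a few lines of Python. This is only a sketch of the behavior as described in this section, not the driver's actual code, and sfx_channels is a hypothetical name.

```python
def sfx_channels(start, sfx_size, song_channels):
    """Return the zero-based channels a sound effect would actually use.

    start: channel number passed to sfx_play_center (0-7)
    sfx_size: how many channels the sound effect needs
    song_channels: how many channels the song occupies (0..n-1)
    """
    if start < song_channels:
        return []  # overlaps the song, so the effect does not play
    return [ch for ch in range(start, start + sfx_size) if ch <= 7]

print(sfx_channels(6, 2, 4))  # [6, 7]
print(sfx_channels(7, 2, 4))  # [7]
print(sfx_channels(3, 2, 4))  # []
```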

 

Here’s a picture of the demo again. It looks the same as the previous example.

Example09

Here’s a Youtube video, if you want to hear it.

 

https://wiki.superfamicom.org/spc700-reference

https://wiki.superfamicom.org/bit-rate-reduction-(brr)

.

There are other programs for getting music onto a Super Nintendo.

You could use SNESMOD with OpenMPT. I still need to research this more before I can recommend it. I have heard that a version of SNESMOD by AugustusBlackheart and KungFuFurby is good. Sorry I can’t be more informative here.

Another program, BRRTools, can convert audio files to BRR. I haven’t used it, but the SNESGSS tool uses the same code. It says you can turn BRR samples into WAV and WAV into BRR. This could be a way to use existing BRR samples in our SNESGSS projects (by using this tool to convert them into WAV files first).

https://www.smwcentral.net/?p=section&a=details&id=17670

 

SNES main page

 

BG Collision

SNES programming tutorial. Example 9.

https://github.com/nesdoug/SNES_09

 

This time we are going to make a collision map, and make a sprite collide with the background. The actual graphics are not that important.

I took some pictures of some blocks (and a sketch of a cube with eyes) and Photoshopped them (GIMP) into 16×16 sized PNG (indexed 16 colors) and converted them to SNES .chr files with Superfamiconv. Actually, I expanded them to 32×16, with the left just filled with black. I thought that would give us a consistent zero index color of black, and it seems to have worked. Then I loaded everything into my M1TE tool.

I made the BG map in M1TE. It would have been nice if it supported 16×16 tiles, (all of our blocks are 16×16) but I didn’t program in 16×16 tile mode yet. I can load this into the game, no problem, but there is no easy way to make a collision map out of this. I could type one by hand. It would be an array of numbers 16×14 (224 total). Zero for blank, 1 for wall. Each number would represent a 16×16 square area of the screen.

M1TE

This time I used Tiled Map Editor to make a collision map. I loaded a picture of my 3 tiles as the tileset (see right side), and recreated the BG map that I had made in M1TE. The entire purpose of this is to export a .csv file of our collision map… which is that collision array I was talking about.

Tiled

The CSV file exported from Tiled.

csv

I added some .byte directives so it can be loaded as a byte array into the asm code. 0 is blank, 1 is red wall, 2 is the yellow square. Maybe in the future I will program some clever app or tool to speed this up. This time I just copied and pasted the word “.byte” onto each line and resaved it.

byte

 

Our code calculates where our guy is on the map, and prevents movements if it’s over a 1 (wall). We now collide with the red walls.

Example09

How does it do that? Let’s go over the code. So our byte array has each block 16×16. We need to divide x and y pixel coordinates by 16 (the same as shift right 4x). But we also need to multiply the y by 16 to get to the correct row in our array, which cancels out the divide 16. So the algorithm is (Y & 0xf0) + (X >> 4). If we look at that index in the byte array, it will tell us if a point is in a wall or not. This is the code, with X and Y registers holding the X and Y coordinates…

tya
and #$f0
sta temp1
txa
lsr a
lsr a
lsr a
lsr a
ora temp1
tax
lda HIT_MAP, x
rts
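
The same index calculation, modeled in Python to show why the mask and the shift give the right array slot. hit_map_index is a made-up name, and the coordinates are assumed to be 8 bit (0-255).

```python
def hit_map_index(x, y):
    """Index into a 16-wide collision array for 8-bit pixel coordinates."""
    return (y & 0xF0) | (x >> 4)  # (y // 16) * 16 + (x // 16)

# Pixel (40, 100) is in column 2, row 6 -> index 6 * 16 + 2 = 98
print(hit_map_index(40, 100))  # 98
```

The OR in the assembly works the same as the + in the formula, because the two halves never share any bits.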

I handled each direction separately. First do the X move, then see if any of the corners of our sprite are inside a wall. If yes, revert to previous X position. Then do the Y move, see if any of the corners of our sprite are inside a wall. If yes, revert to previous Y position.

This code would need to be a little more complex if we move more than 1 pixel per frame. If we are moving 2-3 pixels per frame, and the distance to the wall is 1 pixel, we should allow 1 pixel movement toward the wall… and not be stuck 1 pixel away from the wall. So, this code will need to be improved.
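
One possible improvement, sketched in Python rather than 65816 assembly, is to move one pixel at a time and stop as soon as the next pixel is blocked. This is only one way to do it and is not what the example does. The is_wall function is a stand-in for the corner checks against the collision map.

```python
def move_x(x, speed, y, is_wall):
    """Move x by `speed` pixels, one pixel at a time, stopping at a wall.

    is_wall(x, y) stands in for the corner checks against the
    collision map. Returns the new x position.
    """
    step = 1 if speed > 0 else -1
    for _ in range(abs(speed)):
        if is_wall(x + step, y):
            break  # next pixel is blocked, stay where we are
        x += step
    return x

# Hypothetical wall occupying every x >= 100
wall = lambda x, y: x >= 100
print(move_x(98, 3, 50, wall))  # 99
```

With a speed of 3 and a wall 2 pixels away, the object ends up touching the wall instead of being stuck short of it.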

.

Touching the yellow square will darken the screen. We are just looking if 1 point (the middle of our guy) is over a 2 in the collision map, and changing the screen brightness variable. Remember that the $2100 register is the screen brightness. I am writing to it every frame, during v-blank. Full brightness is $0f. Half brightness is $07.

Example09b

If we were scrolling in a larger world, the collision map would have to be the size of the world. You could have it compressed, and decompress it to the WRAM. You would have to keep track of X and Y movements with 2 byte variables. One thing I would not recommend is trying to read from the VRAM to see what kind of tile you are standing over. The visuals of the level should probably be separate from the collision map.

One more thing. It wouldn’t be too much trouble to turn this simple example into a platformer. You would just need to add gravity, which is adding a little bit to the Y speed every frame, and then cancelling that if your feet touch the floor. Jumping would be a sudden negative Y speed.

This is a really cool page that explains collision maps in more detail.

http://higherorderfun.com/blog/2012/05/20/the-guide-to-implementing-2d-platformers/

 


 

BG Scrolling

SNES programming tutorial. Example 8.

https://github.com/nesdoug/SNES_08

 

So, this isn’t so complicated. I’m using the Example 4 backgrounds, and scrolling them with the controllers. I’m not going to go over the process of making backgrounds again. We will just talk about the scrolling code.

If you press A, B, X, or Y, you will toggle which background is selected. The selection is shown by the sprite in the corner (1,2,3). This is the map_selected variable, which has a value 0-2.

The up/down/left/right functions will do a case switch style check on the map_selected variable. Normally, you would do CMP #1, CMP #2, CMP #3, etc. But you don’t actually need to do a CMP #0. This is something I see new 6502/65816 programmers do. The previous line “lda map_selected” already sets the z flag if map_selected is zero. Lots of instructions set the z (zero) and n (negative) flags: LDA, LDX, LDY, TAX, TXA, TXY, PLA, PLX, PLY, etc. If a register is loaded with zero, the z flag is set and BEQ will work.

Right_Handler:
.a16
.i16
  php
  A8
  lda map_selected
  bne @1or2
@0: ;BG1
  dec bg1_x
  bra @end
@1or2:
  cmp #1
  bne @2
@1: ;BG2
  dec bg2_x
  bra @end
@2: ;BG3
  dec bg3_x
@end: 
  plp
  rts

Let’s follow this for each value. If map_selected is zero, the BNE won’t branch, it goes to the @0, dec bg1_x and then exits. If map_selected is 1, the first BNE will branch to @1or2. A is still loaded with map_selected, we compare it to #1, they are equal, so the BNE won’t branch, and we do @1, dec bg2_x and exit. If map_selected is 2, the first BNE branches to @1or2, cmp #1 finds they are not equal, so the bne @2 branches us to the @2 dec bg3_x line.

Notice, moving the map right means decreasing the horizontal scroll variable. Moving it left means increasing it. Likewise, moving a screen down is decreasing the vertical scroll, and moving it up is increasing it.

Each scrolling register needs to be written twice (8 bits each time, low byte then high byte). Always write twice. You can actually write to these registers any time, but we want to do it during v-blank so we don’t get any shearing of the background in the middle for 1 frame. Near the top of the game loop, we have jsr set_scroll. Let’s look at set_scroll.

lda bg1_x
sta bg1_scroll_x ;$210d 
stz bg1_scroll_x
lda bg1_y
sta bg1_scroll_y ;$210e
stz bg1_scroll_y

lda bg2_x
sta bg2_scroll_x ;$210f
stz bg2_scroll_x
lda bg2_y
sta bg2_scroll_y ;$2110
stz bg2_scroll_y

lda bg3_x
sta bg3_scroll_x ;$2111
stz bg3_scroll_x
lda bg3_y
sta bg3_scroll_y ;$2112
stz bg3_scroll_y

bg1_x is a 1 byte variable, because our maps are set to 1 screen only (32×32 map and 8×8 tiles). If you made the tilemap bigger (or made the tile size larger), you would need 2 bytes for each scroll variable. With 64×32 our x needs 9 bits. If you also increase tilesize to 16×16 then we need 10 bits.
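
The bit counts above can be checked with a little arithmetic. This Python sketch (a model with made-up helper names, not SNES code) also shows how a 2 byte scroll value would be split into the two 8 bit writes.

```python
def scroll_bits(map_tiles, tile_size):
    """Bits needed to address every pixel across one map dimension."""
    pixels = map_tiles * tile_size
    return (pixels - 1).bit_length()

def scroll_writes(value):
    """The two 8-bit writes for a scroll register: low byte, then high byte."""
    return value & 0xFF, (value >> 8) & 0xFF

print(scroll_bits(32, 8))    # 8
print(scroll_bits(64, 8))    # 9
print(scroll_bits(64, 16))   # 10
print(scroll_writes(0x123))  # (35, 1)
```

In set_scroll the second write is always zero (stz), which is the high byte for a 1 byte scroll variable.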

You can move each layer independently. Usually, you would have BG1 be the foreground and BG2 be the background and BG3 be either the far background or the HUD (scoreboard) always fixed in one place in the front.

Example08

 


 

Pong. Sprite collisions.

SNES programming tutorial. Example 7.

https://github.com/nesdoug/SNES_07

 

I made a simple Pong demo to show sprite collisions.

Well… I was trying to keep it simple, but I decided to use some of the more complicated code I have previously written. I copied these to the library.asm file from some of the EasySNES files: oam_spr (copies one sprite to the buffer), oam_meta_spr (copies multiple sprites to the buffer), oam_clear (clears the buffer), map_offset (gets an address from a specific x/y coordinate in a map). I did change the return from these functions from RTL to RTS, because all of our code is in the same bank.

Link if you are interested.

https://github.com/nesdoug/SNES_00/blob/master/easySNES.asm

These functions are complex, and you don’t really need to understand them right now. Just focus on the sprite collision code and drawing numbers to the scoreboard.

check_collision is new. I will discuss that a bit later.

Let’s talk about the process of making this. I made a circle gradient in GIMP for the background, and converted to indexed 4 color (with dithering). Sized 256×192 (it won’t cover the entire screen).

Grad

Saved as uncompressed PNG. Converted to .chr .map .pal with superfamiconv. Loaded the background in M1TE (my tile/map editor).

BG1

Then I drew some numbers for BG3, and filled a little on the top and bottom.

BG3

Clicked the priority checkbox for this map.

Priority

Saved all the maps and tiles and palette. Pretty much the same as previous examples of loading a background.

Now I opened SPEZ (my sprite editor) and drew some simple box shapes for the ball and paddle. Saved them as metasprites.asm and saved their tiles (chr) and palette.

SPEZ7

Everything is .incbin -ed in the main.asm file. We are loading everything just like the previous examples, with DMAs to the VRAM. One difference is that I wrote a macro for DMAs to the VRAM. This made the code a little easier to read and write. Let’s look at an example…

DMA_VRAM $700, Map1, $6000

This is the DMA_VRAM macro definition…

.macro DMA_VRAM length, src_addr, dst_addr
;dst is address in the VRAM
;a should be 8 bit, xy should be 16 bit
ldx #dst_addr
stx vram_addr

lda #1
sta $4300 ; transfer mode, 2 registers 1 write
; $2118 and $2119 are a pair Low/High
lda #$18 ; $2118
sta $4301 ; destination, vram data
ldx #.loword(src_addr)
stx $4302 ; source
lda #^src_addr
sta $4304 ; bank
ldx #length
stx $4305 ; length
lda #1
sta $420b ; start dma, channel 0
.endmacro

So where it says length, the macro will insert the $700 bytes (not $800, because the screen is only 224 pixels high, so I’m not filling the entire 256 pixel high map). Where it says src_addr, it replaces it with Map1. Where it says dst_addr, it replaces it with VRAM address $6000. All that code could be written in one line.

DMA_VRAM $700, Map1, $6000

Doesn’t this look nicer though? Simple. Elegant. Easy to read. Macros are your friends.
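
As for where the $700 comes from, it is just the map arithmetic: 32 tiles wide, 28 tile rows (224 / 8), and 2 bytes per map entry. A quick Python check, with map_bytes as a made-up helper name:

```python
def map_bytes(tiles_wide, pixel_rows, tile_size=8, bytes_per_entry=2):
    """Bytes of tilemap needed to cover `pixel_rows` of screen height."""
    tile_rows = pixel_rows // tile_size
    return tiles_wide * tile_rows * bytes_per_entry

print(hex(map_bytes(32, 224)))  # 0x700
print(hex(map_bytes(32, 256)))  # 0x800
```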

Everything between InfiniteLoop and line 426 jmp InfiniteLoop is the game loop. Every frame we wait till v-blank. Copy the OAM_BUFFER to the OAM. Print the score to the top of the screen. Read the controllers. Move the paddles if up or down are pressed.

  lda pad1
  and #KEY_UP
  beq @not_up

@up:
  A8
  lda paddle1_y
  cmp #$20 ;max up
  beq @not_up ;too far up
  bcc @not_up

  dec paddle1_y
  dec paddle1_y

  dec paddle2_y
  dec paddle2_y

@not_up:

This code is moving both paddles, because this is just example code. You could modify it, so that controller2 moves the paddle on the right. Copy this whole thing, and replace pad1 with pad2, and only move paddle2. Also change the label names, so you don’t have duplicates.

We are only moving the ball while it is “active”. Press START to make it active, and choose a random direction to go (based on a frame counter).

lda #1
sta ball_active

ball_x_speed and ball_y_speed are the directions of the ball. Either 1 or -1 ($ff). Every frame we are adding the speed variable to the position variable. If speed is 1, we add 1 and it moves it to the right 1 pixel.
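
The reason $ff works as -1 is that the addition wraps around at 8 bits. Here is a small Python model of that (add8 is a made-up name):

```python
def add8(position, speed):
    """8-bit add with wraparound, like an 8-bit CLC + ADC."""
    return (position + speed) & 0xFF

print(add8(100, 0x01))  # 101
print(add8(100, 0xFF))  # 99
```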

If the ball is active, it moves up/down until it reaches the ceiling or floor.

;bounce off ceilings
cmp #$20
bcs @above20

lda #1
sta ball_y_speed

@above20:
;bounce off floor
lda ball_y
cmp #$c7
bcc @ball_done

lda #$ff ; -1
sta ball_y_speed

@ball_done:

Sprite Collisions

It moves left/right until it reaches the end of the room. But we want it to bounce off the paddles, so we need to check collisions with hitboxes. I wrote this a long time ago (modified slightly). It’s the check_collision function in the library.asm file.

So we need the dimensions and location of the 4 sides of both boxes. That’s 8 numbers, that I copy to these variables…
obj1x, obj1w, obj1y, obj1h
obj2x, obj2w, obj2y, obj2h
x = left side of sprite object
w = width (minus 1), added to x to get the right side
y = top side of the sprite object
h = height (minus 1) , added to y to get the bottom side

I defined some of these with constants at the top of main.asm

BALL_SIZE = 7
PADDLE_W = 7
PADDLE_H = 47

Of course, the x and y values are changing. Those are defined as variables in the zero page (direct page).

paddle1_x, paddle1_y
paddle2_x, paddle2_y,
ball_x, ball_y

I copy these to the obj1 obj2 stuff, and then call check_collision, which sets the “collision” variable to 0 or 1. If collision is true, we bounce the ball. This collision check is for 8 bit positions only, and assumes that no object goes off the screen at all. The code won’t work right at the very edges of the screen.

Here’s what the collision code is doing, under the hood. (The actual function is some optimized ASM.)

if((obj1_right >= obj2_left) &&
(obj2_right >= obj1_left) &&
(obj1_bottom >= obj2_top) &&
(obj2_bottom >= obj1_top)) return 1;
else return 0;
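
Here is the same test as a runnable Python sketch, using the x, w, y, h layout described above (w and h are size minus 1). It is a model of the logic only. It does not reproduce the 8 bit limits of the real check_collision, and the ball and paddle positions are made up.

```python
BALL_SIZE = 7   # width and height, minus 1
PADDLE_W = 7    # width, minus 1
PADDLE_H = 47   # height, minus 1

def check_collision(x1, w1, y1, h1, x2, w2, y2, h2):
    """Return 1 if the two boxes overlap, else 0 (w and h are size minus 1)."""
    if (x1 + w1 >= x2 and x2 + w2 >= x1 and
            y1 + h1 >= y2 and y2 + h2 >= y1):
        return 1
    return 0

# Ball at (20, 60) against a paddle at (16, 40): overlapping
print(check_collision(20, BALL_SIZE, 60, BALL_SIZE, 16, PADDLE_W, 40, PADDLE_H))  # 1
# Ball moved right to x=24: the paddle's right edge is 23, so no overlap
print(check_collision(24, BALL_SIZE, 60, BALL_SIZE, 16, PADDLE_W, 40, PADDLE_H))  # 0
```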

 

.

I’m also drawing the sprites each frame. That uses the oam_meta_spr library function to read from a metasprite data set in metasprites.asm. I used SPEZ (my sprite editor tool) to generate this file.

Each sprite needs x, y, tile, attributes (flipping and palette), and size. The byte arrays listed in metasprites.asm provide all that. We just need to give it the starting x and y positions.

So, copy the x to spr_x, and the y to spr_y, then load A and X with the address of the metasprite data, and call our function. Remember ^ is for bank number.
A16
lda #.loword(Meta_00) ;left paddle
ldx #^Meta_00
jsr oam_meta_spr

And this will automatically put all the data in the OAM_BUFFER at the correct x and y positions. It also adjusts the high table bit shifting and keeps track of exactly how many sprites have been added (sprid).

*spr_x is 9 bits (uses 2 bytes). If the sprite never leaves the screen, just leave the upper byte of spr_x as zero. If you pass it more than 9 bits, it will ignore the extra bits.

The ball uses another function, oam_spr. This is for putting 1 sprite in the OAM BUFFER. You have to provide all the details of the sprite. Pass the x (9 bits) to spr_x, the y to spr_y, the tile # to spr_c, the attributes to spr_a, and set the size with spr_sz. spr_sz needs to be either 0 (small) or 2 (large). Then jsr oam_spr.

The metasprite function is actually easier to use, so you may choose to just use that every time. You just need to make a byte array of each sprite object.

Writing to the background

The print_score function always runs during v-blank. It has to, because it is writing to the VRAM. I’m using this map_offset function (in library.asm). It wants you to load X with the tile’s x position 0-31 and load Y with the tile’s y position 0-31. If you only have pixel X and Y, just shift right (lsr a) 3 times to get the 0-255 value to 0-31 (tile) for 8×8 tiles.

map_offset does some bit shifting to convert that to a VRAM address. It returns A16 = the offset. You add that to the base address (our BG3 map is at $7000).

ldx #12
ldy #1
jsr map_offset ; returns a16 = vram address offset
clc
adc #$7000 ;layer 3 map
sta vram_addr

Then we copy 2 values per number on screen. We are writing with the VRAM increment set to +32. That means that the second write will go below the first one.

lda #V_INC_32
sta vram_inc ;$2115
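
To see where the two writes land, here is a Python model of the address math. It assumes a single 32×32 map with one VRAM word per tile, so the offset is y * 32 + x. The real map_offset in library.asm may handle more cases than this sketch does, and digit_addresses is a made-up helper.

```python
BG3_MAP = 0x7000  # base VRAM address of the BG3 map in this example

def map_offset(tile_x, tile_y):
    """Offset of a tile within a 32x32 map (one VRAM word per tile)."""
    return (tile_y << 5) + tile_x

def digit_addresses(tile_x, tile_y, base=BG3_MAP):
    """VRAM addresses of two tiles written with the increment set to +32."""
    top = base + map_offset(tile_x, tile_y)
    return top, top + 32  # second write lands one row down

print([hex(a) for a in digit_addresses(12, 1)])  # ['0x702c', '0x704c']
```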

Some of these values might be hard to understand, like, why are we adding $10 to the points_L? Our tiles for numbers begin at $10. 0 is at $10. 1 is at $11, etc. Anyway, here’s our finished program. Press START to get it going.

Example7

.
Try to make this into a game by having controller 2 move the right paddle.

The ball is a bit slow, though. Moving 2 pixels per frame might be too fast. It would be best to use “fixed point” math, that’s a 16-bit variable for ball speed and position, where the upper byte refers to a pixel position, and the lower byte is a sub-pixel position (and speed). Then we could have 1 1/2 pixel per frame movement.
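
A small Python model of that fixed point idea, with the pixel in the upper byte and the sub-pixel in the lower byte. The names and the starting position are made up for illustration.

```python
def step(position, speed):
    """Advance a 16-bit 8.8 fixed-point position by a 16-bit speed."""
    return (position + speed) & 0xFFFF

def pixel(position):
    """The upper byte is the on-screen pixel position."""
    return position >> 8

SPEED = 0x0180  # 1.5 pixels per frame (1 + $80/$100)

pos = 0x2000    # pixel 32, sub-pixel 0
trail = []
for _ in range(4):
    pos = step(pos, SPEED)
    trail.append(pixel(pos))
print(trail)  # [33, 35, 36, 38]
```

On screen the ball alternates between 1 and 2 pixel steps, which averages out to 1 1/2 pixels per frame.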

I wish we had some sound effects too. Maybe a little later for that.

 


 

 

Controllers and NMI

SNES programming tutorial. Example 6.

https://github.com/nesdoug/SNES_06

Controller reads

There is a set of registers that can be read like NES registers. Originally, they wanted to make it easy to transition from programming NES games to programming SNES games. They even used the same addresses, $4016 and $4017 (ports 1 and 2). However, you shouldn’t read these. Instead you should turn on the auto-read feature… and also the NMI enable from register $4200.

With auto-controller reads set, the hardware will automatically read all the buttons from both controllers soon after the end of each frame (at the start of v-blank), without any work from our code, and then store the values at $4218-$421b.

$4218-19 port 1
$421a-1b port 2
(if a multitap for 4 player games installed, 421c-d and 421e-f for controllers 3+4)

The button order is…
KEY_B = $8000
KEY_Y = $4000
KEY_SELECT = $2000
KEY_START = $1000
KEY_UP = $0800
KEY_DOWN = $0400
KEY_LEFT = $0200
KEY_RIGHT = $0100
KEY_A = $0080
KEY_X = $0040
KEY_L = $0020
KEY_R = $0010

And I use these constants as a bit mask (AND operation) to isolate the buttons.

The pad_poll function also does some bit twiddling to figure out which buttons have just been pressed this frame.
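
A common way to do that bit twiddling is to keep last frame's buttons and mask them out of this frame's buttons. Here is a Python model of the idea. The pad_poll in the example may differ in detail, and this version takes the previous state as an argument instead of reading the registers.

```python
KEY_B = 0x8000
KEY_LEFT = 0x0200

def pad_poll(current, previous):
    """Return (held, new): all buttons down, and those pressed this frame."""
    new = current & ~previous & 0xFFFF  # down now, but not last frame
    return current, new

# B was already held last frame; LEFT is pressed for the first time
held, new = pad_poll(KEY_B | KEY_LEFT, KEY_B)
print(hex(held), hex(new))  # 0x8200 0x200
```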

pad1 and pad2 are any button that is down, even if you have been holding it down.
pad1_new and pad2_new are buttons that have just been newly pressed this frame.
We need to call pad_poll each frame. How do we know that a new frame has started? That’s where the NMI comes in.

NMI

When the screen is on, the PPU spends most of its time drawing pixels to the screen, one line at a time. Starting at the top, it goes left to right and draws a line. Then it jumps back to the left and draws the next line.

It does this so fast you can’t see it. But, since the PPU is busy, you can’t send new data to the VRAM. You can’t send new data to many of the PPU registers, such as the OAM. But when the screen is done drawing, the PPU rests in a vertical blank period for a little bit. During this v-blank period, you CAN access the PPU registers.

If you turn on NMI interrupts, when the PPU is done drawing to the screen… nearly at the very beginning of v-blank, the PPU sends an NMI signal. This happens every frame, which is 60 times a second (50 in Europe). That signal causes the CPU to pause and jump to the NMI vector (an address stored at $00ffea in the ROM). We have it set to jump to NMI: which is located in the init.asm file. (note, the NMI code needs to be in the 00 bank, or its mirror, the $80 bank).

The NMI code is just this.

pha
lda $4210 ; it is required to read this register
; in the NMI handler, don’t ask me why.
inc in_nmi
pla
rti

(many games have much more elaborate NMI code, btw).

And our code is waiting for the in_nmi variable to change. When it changes we know that we are in the v-blank period. Now is a good time to write to PPU registers or send data to the VRAM. But, also, we are using this to time our game loop.

wait_nmi: waits until we are in v-blank. We call this at the top of the game loop. Notice that I put a WAI (wait for interrupt) instruction here. If you neglected to turn NMI interrupts on, this would crash the game, as it waits forever for a signal that never comes. IRQ interrupts could also trip the WAI instruction, which is why I also wait for the in_nmi variable to change to be sure. You could delete the WAI instruction, if you would like*. Some games use this waiting loop to spin a random number generator. You could do that as well…. like adding a large prime number over and over, or just ticking a variable +1 over and over.

* someone told me that WAI could make an emulator run less laggy, as it would have less to do each frame. It also saves electricity, because the CPU uses less power while it waits. You decide if you need it or not.

Soon after the wait_nmi function runs, we run our DMA to the OAM (copy our sprite buffer to the sprite RAM). This needs to be done during v-blank, which is why we do it first. Then, we run our pad_poll to read new button presses. Then we enter the game logic. Here’s an example of what we are doing to move the sprite.

Our sprite is composed of 3 sprites that move together (16×16 each). Each time we press the right button, we need to increase the X value of each sprite. Left, we decrease the X values. Each sprite uses 4 bytes, so each sprite X value is 4 bytes apart. So we do this…

  AXY16
  lda pad1
  and #KEY_LEFT
  beq @not_left
@left:
  A8
  dec OAM_BUFFER ;decrease the X values
  dec OAM_BUFFER+4
  dec OAM_BUFFER+8
  A16
@not_left:

LDA loads the A register with pad1, which has all the button presses for controller 1. We apply a bit mask (AND) to isolate the left button. If it is zero, the button isn’t being pressed, and it will branch (BEQ) over our code. Otherwise, it will continue to the dec OAM_BUFFER lines. Dec can be 8 bit or 16 bit, depending on the size of the A register. We want 8 bit, so we A8 for that. We need the A16, to make sure we exit this bit of code with A always in 16 bit mode.

We repeat that process 3 more times for RIGHT, UP, and DOWN buttons.

And these values are copied to the OAM each frame, which moves our sprites on screen. Let’s look closely at what happens when you scroll off screen. Y values 224 to 255 are off screen to the bottom, but they would eventually wrap around to the top.

example6b

But moving the sprite to the left, from x=00 to x=$ff (255)

example6d

The sprite suddenly disappears. Which is weird.

example6f

That’s why we need that 9th x bit in the high table. Here’s what it looks like at the far right without the high table x bit set.

example6c

Here’s with the high table x bit set.

example6e

So we need it for smoothly moving off the left side of the screen.

We didn’t do that in this example, but I worked up some code that can manage this, for next time.

.

Final note. The code here is not particularly good. Keeping the sprite variables in the OAM_BUFFER itself is not very good practice. I have seen other tutorials do this, and I did it so it would be easier to understand, but I don’t like it. It would be better to keep every sprite object as a set of variables (x and y maybe as 16-bit variables), that get copied to the OAM_BUFFER in a dynamic way, so that enemy objects can be created and destroyed without causing holes/gaps in the OAM_BUFFER. With that approach, you would clear the buffer at the start of each frame, and then draw the necessary sprites as needed every frame.

That might be slower code, but much more flexible.

.


 

Sprites

SNES programming tutorial. Example 5.

https://github.com/nesdoug/SNES_05

 

Sprites are the graphic objects that can move around the screen. Nearly all characters are made of sprites… Mario, Link, Megaman, etc. The OAM RAM controls how each sprite appears.

Mario

You will notice that Mario is made of 2 16×16 sprites. It is common to use more than 1 sprite for a character. Rex is also made of 2 16×16 sprites, with the lower sprite several pixels to the right of the top one. You can also layer sprites on top of each other, but with 15 colors to choose from, you shouldn’t have to.

You could increase the large sprite size to 32×32, but that would end up wasting more VRAM space on blank spaces. 8×8 and 16×16 are more common. I call it a “metasprite” when it is a collection of multiple sprites to make up 1 character. The SPEZ sprite editor I wrote saves these as tables of numbers (metasprite/save options). And the tiles save as 4bpp CHR files.

https://github.com/nesdoug/SPEZ

I actually used SPEZ to draw the example sprites, but you may choose to use YY-CHR or some other tile editor. But let’s go over how sprites work.

spez

 

OAM

The official docs call sprites “objects”. You need to write data to the OAM RAM to get them to show up on screen.

There are 2 tables in the OAM, and you need to write both of them, usually a DMA during v-blank or forced blank. (v-blank is the vertical blank period, the slight pause after each frame is drawn to the TV where the PPU isn’t doing anything, and can be updated for the next frame).

Low Table

The low table (512 bytes) is divided into 4 bytes per sprite, with sprite #0 using bytes 0,1,2,3 and sprite #1 using bytes 4,5,6,7, etc… up to sprite #127. 4 x 128 = 512 bytes.
Those bytes are, in this order…
x, y, tile #, attributes.
X and Y are screen relative, in pixels (for the top left of the sprite).

Attributes

vhoopppN
v vertical flip
h horizontal flip
oo priority
ppp palette
N 1st or 2nd set of tiles (you can have up to 512 tiles for sprites).
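
Putting those fields together is plain bit shifting. Here is a Python model of packing the attribute byte (sprite_attr is a made-up helper name):

```python
def sprite_attr(vflip=0, hflip=0, priority=0, palette=0, tile_set=0):
    """Pack the attribute byte: v h oo ppp N."""
    return ((vflip & 1) << 7 | (hflip & 1) << 6 | (priority & 3) << 4 |
            (palette & 7) << 1 | (tile_set & 1))

print(hex(sprite_attr(priority=2)))                      # 0x20
print(hex(sprite_attr(hflip=1, priority=2, palette=3)))  # 0x66
```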

The High Table

There are 32 bytes in the high table for 128 sprites. That’s 2 bits per sprite, and it can be very tedious to manage. Lots of bit shifting. The bits are

sx (s upper bit, x lower bit)
s= size (small or large)
x = 9th bit for x

The extra X bit is so you can smoothly move a sprite off the left side of the screen. With that bit set and the regular X set to $ff, that would be like -1. Whereas, without the extra X bit, $ff would be the far right of the screen, with only 1 pixel wide showing.

How are the 2 bits put together?
Let’s say,
Sprite 0 = aa
Sprite 1 = bb
Sprite 2 = cc
Sprite 3 = dd
Then the first byte of the high table is
ddccbbaa
or (dd << 6) + (cc << 4) + (bb << 2) + aa
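
Here is that packing as a Python model, including the 9th x bit for a sprite that is partly off the left side. The helper names are made up, and x is treated as a signed pixel position.

```python
def high_bits(x, large):
    """Two high-table bits for one sprite: size bit, then 9th x bit."""
    x9 = (x >> 8) & 1  # also works for negative x (two's complement)
    return (int(large) << 1) | x9

def high_table_byte(sprites):
    """Pack up to 4 (x, large) pairs; the first sprite goes in the low bits."""
    value = 0
    for i, (x, large) in enumerate(sprites[:4]):
        value |= high_bits(x, large) << (i * 2)
    return value

# Three large sprites, all on screen (9th x bit clear)
print(hex(high_table_byte([(0x80, True), (0x80, True), (0x7C, True)])))  # 0x2a
# x = -1: the low byte is $ff and the 9th bit is set
print(hex(-1 & 0xFF), high_bits(-1, False))  # 0xff 1
```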

Palettes

Sprites use the second half of the CGRAM (palette). It is 15 colors + transparency for each palette. Sprite palette #0 uses indexes 128-143. Sprite palette #1 uses indexes 144-159. And so forth.

Priorities

I like to set sprite priority to 2. That would be in front of bg layers (but behind layer 3 if it’s set as super-priority in front of everything). Higher sprite priority would be in front of sprites with lower priority.

Besides priorities…Low index sprites will go in front of higher index ones. Sprite #0 would be in front of Sprite #1. Sprite #1 would be in front of Sprite #2. Sprite #2 would be in front of Sprite #3. Etc.

There is a limit to how many sprites can fit on a horizontal line: 32 sprites. And using larger sprites doesn’t improve that. Internally it splits sprites up into 8×1 slivers, and only 34 slivers can fit on a line. Anything past either limit disappears. Because of this, you could shuffle the sprites every frame. That’s a lot of sprites, so I see most games just ignore this problem, and try not to put too many sprites on each line. Space shooter games (lots of sprites on screen at once) re-order the sprites in the OAM manually every frame, with some kind of shuffling algorithm, to make sure no bullets hit you that you couldn’t see.

Caution. Don’t put sprites at X position 0x100. (with the 9th bit 1 and the regular X at 00) They will be off screen, but will somehow count towards the 32 sprites per line limit.

Clearing Sprites

If you leave the OAM zeroed, it will display sprites at X=0, Y=0, Tile=0, palette=0… and the top left of the screen would have 128 sprites on top of each other. If you just want ALL sprites off screen, you could just turn them off from the main screen ($212c). But to put an individual sprite off screen, you should put its Y value at 224 (assuming screens are left to the default 224 pixel height). This would put 8×8,16×16, and 32×32 sprites off screen, but 64×64 sprites would wrap around to the top of the screen… so it’s a good idea to also reset the sprite size bit to 0.
.

Let’s go over the code.

Code

We need to change a few settings, first.
$2101 sets the sprite size and the location of the sprite tiles.
sssnnbbb
sss = size mode*
nn = offset for 2nd set of sprite tiles. leave it at zero, standard.
bbb = base address for the sprite tiles.
Again, the upper bit is useless. So, each b is a step of $2000.

* size modes are

000 = 8×8 and 16×16 sprites
001 = 8×8 and 32×32 sprites
010 = 8×8 and 64×64 sprites
011 = 16×16 and 32×32 sprites
100 = 16×16 and 64×64 sprites
101 = 32×32 and 64×64 sprites

.

lda #2
sta $2101 ; sprite tiles at VRAM $4000, sizes are 8×8 and 16×16
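
The value written to $2101 can be worked out from the sssnnbbb layout. Here is a Python model (obsel and SIZE_MODES are made-up names), taking the sprite tile base as a VRAM address in steps of $2000:

```python
SIZE_MODES = {
    (8, 16): 0, (8, 32): 1, (8, 64): 2,
    (16, 32): 3, (16, 64): 4, (32, 64): 5,
}

def obsel(small, large, tile_base, name_offset=0):
    """Build the $2101 value (sssnnbbb) from sizes and a VRAM address."""
    if tile_base % 0x2000:
        raise ValueError("tile base must be a multiple of $2000")
    return ((SIZE_MODES[(small, large)] << 5) |
            ((name_offset & 3) << 3) |
            ((tile_base // 0x2000) & 7))

print(obsel(8, 16, 0x4000))        # 2
print(hex(obsel(16, 32, 0x6000)))  # 0x63
```

The first result matches the lda #2 used in the example.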

And we need to make sure sprites show up on the main screen.

lda #$10
sta $212c ; main screen

https://wiki.superfamicom.org/sprites

https://wiki.superfamicom.org/registers

From here on out, I am going to use BUFFERS. Buffers are temporary locations in local RAM that will be copied (DMA) each frame to the actual memory (the OAM RAM)… during the v-blank period. Well, next time we will do that. In this example, we are doing it once during forced blank (2100 bit 7 set), which is also fine.

We are using a block move macro to copy from the ROM to the BUFFER.

BLOCK_MOVE 12, Sprites, OAM_BUFFER

to set up a MVN operation (to copy a block of data from the ROM to the RAM). See macros.asm for details.

And I’m writing just one byte to the high table. We only need 3 sprites in this example, so we will only need 2×3=6 bits, setting the size of each to large (16×16).

lda #$2A ;= 00101010
sta OAM_BUFFER2
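To show where $2A comes from, here is a small sketch that builds the same value from named bits. Each sprite gets 2 bits in the high table: the lower one is the 9th X bit and the upper one selects the size. The SPRn_LARGE names are made up for this illustration, and it assumes 8-bit A.

```asm
.p816
.a8                        ; assumption: A is 8-bit at this point

; hypothetical constants: the size bit for each of the first 4 sprites
SPR0_LARGE = %00000010
SPR1_LARGE = %00001000
SPR2_LARGE = %00100000
SPR3_LARGE = %10000000

    lda #SPR0_LARGE | SPR1_LARGE | SPR2_LARGE   ; = $2A, sprites 0-2 large
    sta OAM_BUFFER2        ; first high table byte covers sprites 0-3
```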

Now I will DMA both tables at once. A DMA to the OAM looks like this…

; DMA from OAM_BUFFER to the OAM RAM
ldx #$0000
stx oam_addr_L ;$2102 (and 2103)

stz $4300 ; transfer mode 0 = 1 register write once
lda #4 ;$2104 oam data
sta $4301 ; destination, oam data
ldx #.loword(OAM_BUFFER)
stx $4302 ; source
lda #^OAM_BUFFER
sta $4304 ; bank
ldx #544
stx $4305 ; length
lda #1
sta $420b ; start dma, channel 0

That’s 544 bytes (512 for the low table + 32 for the high table) being copied to $2104 (the OAM DATA register) after we zeroed the OAM address registers ($2102-3). We need to write the whole thing. I recommend always writing to the OAM with a 544 byte DMA. Even if you don’t want to set any of the high table bits, write both tables every time. The first demo I made didn’t work on a real SNES because I failed to write to the high table. DMA all 544 bytes to the OAM, and you won’t have problems.

The data we are transferring looks like this…

Sprites:
;4 bytes per sprite = x, y, tile #, attribute
.byte $80, $80, $00, SPR_PRIOR_2
.byte $80, $90, $20, SPR_PRIOR_2
.byte $7c, $90, $22, SPR_PRIOR_2

With the top left sprite at x = $80 and y = $80. We are using tiles 00,20,22, and all of the sprites use palette #0 and priority #2 (above BG layers).

And this is what it looks like.

example5
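The 4th byte of each sprite entry (the attribute) is laid out as vhoopppN: vertical flip, horizontal flip, 2 priority bits, 3 palette bits, and the high bit of the tile number. As a sketch, here is one more entry that is flipped horizontally and uses sprite palette 1. The ATTR_ names and the label are made up for this illustration. ATTR_PRIOR_2 is assumed to have the same value as the SPR_PRIOR_2 used above.

```asm
; hypothetical constants for the attribute byte (vhoopppN)
ATTR_V_FLIP  = %10000000
ATTR_H_FLIP  = %01000000
ATTR_PRIOR_2 = %00100000   ; oo = 10, priority 2
ATTR_PAL_1   = %00000010   ; ppp = 001, sprite palette 1

Sprite_Example:            ; hypothetical label
;     x,   y,   tile, attribute
.byte $90, $80, $00,  ATTR_PRIOR_2 | ATTR_H_FLIP | ATTR_PAL_1   ; attribute = $62
```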

Try drawing your own sprite, and getting it to show up on screen.

 


 

 

Layers / Priority

SNES programming tutorial. Example 4.

https://github.com/nesdoug/SNES_04

Last time we created a background (tiles and map) and got it to show up on screen. This time we are going to add more layers.

In Mode 1, we get 3 background layers. Layers 1 and 2 are 4bpp (16 colors) and Layer 3 is 2bpp (4 colors). If you make the 2bpp tiles in YY-CHR, select the 2bpp GB setting. I made the graphics in GIMP and converted to 4 color indexed mode, saved to PNG (with no compression), and converted to SNES tiles with superfamiconv (with the BPP set to 2).

See the -B 2 setting in the batch file…

https://github.com/nesdoug/SNES_04/blob/master/ImageConverter/convert.bat

https://github.com/Optiroc/SuperFamiconv

Now I loaded all the maps and tiles to M1TE and quickly drew a line across layer 2, just to test it. Here’s each layer…

bg1 layer 1

bg2 layer 2

bg3 layer 3

Now let’s talk about how the layers work. Normally, layer 1 is on top, then layer 2 is next, and layer 3 on the bottom. Like this.

Example4

But each tile on the map has a PRIORITY setting. Normally, this is to determine if the BG tile will go behind a sprite on the same layer, or in front of it. In mode 1, the layers go like this…

(top)
Sprites with priority 3
BG1 tiles with priority 1
BG2 tiles with priority 1
Sprites with priority 2
BG1 tiles with priority 0
BG2 tiles with priority 0
Sprites with priority 1
BG3 tiles with priority 1
Sprites with priority 0
BG3 tiles with priority 0
(bottom)

However, if bit 3 of $2105 is set, BG3 will be in front of everything (if the priority bit is set on the map). In M1TE, you can set all the priority bits for the whole map by checking a box.

Priority

I did that for BG3. The only difference between the picture above and the one below is that bit 3 of $2105 is set. (see these links for reference)

https://wiki.superfamicom.org/registers

https://wiki.superfamicom.org/backgrounds

Example4b

With $2105 d3 set and the priority bits in the BG3 map set, the BG3 tiles appear on top. This can be very useful for text boxes that appear in front of everything, or a HUD / Scoreboard that you can always see. Because BG3 is only 2bpp, it won’t be very colorful, so it will be ideal for text messages.

The code for putting all this together is very similar to the previous page. The 2bpp tiles were loaded to $3000 in the VRAM with a DMA.

ldx #$3000
stx vram_addr ; set an address in the vram of $3000

lda #1
sta $4300 ; transfer mode, 2 registers 1 write
lda #$18 ; $2118
sta $4301 ; destination, vram data
ldx #.loword(Tiles2)
stx $4302 ; source
lda #^Tiles2
sta $4304 ; bank
ldx #(End_Tiles2-Tiles2)
stx $4305 ; length
lda #1
sta $420b ; start dma, channel 0

The maps were loaded similarly with DMAs, the BG2 map to $6800 and the BG3 map to $7000. Then we need to tell the PPU where in the VRAM to find the BG3 tiles and each of the maps.

lda #$03
sta bg34_tiles ; $210c put BG3 TILES at VRAM address $3000

lda #$60 ; bg1 map at VRAM address $6000
sta tilemap1 ; $2107

lda #$68 ; bg2 map at VRAM address $6800
sta tilemap2 ; $2108

lda #$70 ; bg3 map at VRAM address $7000
sta tilemap3 ; $2109
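If you would rather not work out the register values by hand, the assembler can derive them from the VRAM address. This is only a sketch with made-up constant names, assuming 8-bit A and 32×32 maps. The map registers hold the upper 6 bits of the address (steps of $400) above 2 size bits, and $210c holds the BG3 tile address in steps of $1000 in its low nibble.

```asm
.p816
.a8                         ; assumption: A is 8-bit at this point

; hypothetical constants, named only for this illustration
BG2_MAP_ADDR  = $6800       ; must be a multiple of $400
BG3_TILE_ADDR = $3000       ; must be a multiple of $1000
MAP_32x32     = 0           ; both size bits clear

    lda #(BG2_MAP_ADDR >> 8) | MAP_32x32   ; = $68
    sta $2108               ; BG2 map address and size
    lda #BG3_TILE_ADDR >> 12               ; = $03
    sta $210c               ; BG3 tiles (low nibble), BG4 left at 0
```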

We need to make sure that all 3 layers are active on the main screen.

lda #BG_ALL_ON ;$0f
sta main_screen ; $212c

and that will give us this picture (same as above)

Example4

with BG3 behind everything.

When we flip bit 3… 00001000 at $2105, the BG3 tiles will show up on top (if their priority bits are set on the map). Note BG3_TOP is defined as 8.

lda #1|BG3_TOP ; mode 1, tilesize 8×8 all, layer 3 on top
sta bg_size_mode ; $2105

Like this.

Example4b

Each of these layers scrolls independently of the others. You would adjust them with these registers. They are write-twice registers (low byte, then high byte).

$210d – BG1 Horizontal

$210e – BG1 Vertical

$210f – BG2 Horizontal

$2110 – BG2 Vertical

$2111 – BG3 Horizontal

$2112 – BG3 Vertical

(Note: not used in this example)
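A write-twice register just means 2 writes to the same address. Here is a minimal sketch for the BG1 horizontal scroll, assuming 8-bit A and a 2 byte variable that I am calling scroll_x (a made-up name, not from the example project).

```asm
.p816
.a8                  ; assumption: A is 8-bit at this point

; scroll_x is a hypothetical 2 byte variable in RAM
    lda scroll_x
    sta $210d        ; BG1 horizontal scroll, low byte
    lda scroll_x+1
    sta $210d        ; same register again, high byte
```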

 

Maps, In Depth

All of our examples use maps set to 32×32 tiles (the screen is set to 224 pixels high, so you can’t see all the tiles at once). Each address in the map uses 2 bytes, since the VRAM is set up for 16 bits per address. This can be very confusing to look at in a hex editor (VRAM memory viewer) that shows bytes. You will have to multiply the VRAM address by 2 to find it in the hex editor. VRAM address $6000 is going to be found at $C000 in the emulator’s viewer.

Each map uses $800 bytes, but only $400 addresses (32×32 = 1024 = $400). They would be arranged like…

0,1,2,3,4… 31 =  1st row / top of screen
32,33,34,35… 63 = 2nd row
64,65,66,67… 95 = 3rd row
etc. on down to 32nd row, below the bottom of the screen

So, if you go down 1 on the map, you add 32 to the address.
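As a sketch of that arithmetic, here is how you might change a single map entry at column 5, row 3 of the BG1 map. The constant names are made up for this illustration. It assumes 8-bit A, 16-bit X/Y, and that it runs during forced blank or v-blank (the only time VRAM writes work).

```asm
.p816
.a8                      ; assumption: A is 8-bit at this point
.i16                     ; assumption: X/Y are 16-bit at this point

; hypothetical constants, named only for this illustration
MAP1_BASE = $6000        ; BG1 map address used in this example
MAP_COL   = 5
MAP_ROW   = 3

    lda #$80
    sta $2115            ; increment the VRAM address after the high byte write
    ldx #MAP1_BASE + (MAP_ROW * 32) + MAP_COL   ; = $6065
    stx $2116            ; VRAM address (and $2117)
    ldx #$0001           ; map entry: tile 1, palette 0, no flip, priority 0
    stx $2118            ; writes $2118 (low) then $2119 (high)
```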

But what if we made the map 64 wide (for BG1 $2107, bit 0 = 1)? Think of it as 2 screens, left and right. The first screen works exactly the same as in 32×32 mode, with the 33rd map location being just below the 1st one. If the left screen is at $6000, the right screen would be at $6400. If you scrolled past the right screen, it would wrap back to the left one.

Or, we could have made the map 32 wide and 64 tall (for BG1 $2107, bit 1 = 1). That would be like 2 screens on top of each other. If the top screen is at $6000, the bottom screen would be at $6400. Scrolling down below the bottom screen would wrap back to the top.

Lastly, we could make the map 64×64 (for BG1, both bits 0 and 1 set). If the first screen is at $6000, the four screens would be arranged like this (top row, then bottom row)…

$6000 – $6400

$6800 – $6c00
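To find a tile in a 64×64 map, first pick the screen, then use the same 32 wide math inside it. Here is a sketch that lets the assembler do the calculation for column 40, row 10. All of the names are made up for this illustration.

```asm
; hypothetical constants, named only for this illustration
BIG_MAP_BASE = $6000
BIG_COL      = 40        ; 0-63
BIG_ROW      = 10        ; 0-63

; 0 = top left, 1 = top right, 2 = bottom left, 3 = bottom right
BIG_SCREEN   = (BIG_COL / 32) + (2 * (BIG_ROW / 32))        ; = 1

; each screen is $400 addresses, then row * 32 + column within that screen
BIG_ADDR     = BIG_MAP_BASE + (BIG_SCREEN * $400) + ((BIG_ROW & 31) * 32) + (BIG_COL & 31)   ; = $6548
```

Column 40 is past the first 32 columns, so it lands in the top right screen at $6400, 10 rows down and 8 tiles across.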

If tiles were set to 8×8 size ($2105, called “character size”), a 64×64 map would be 512×512 pixels in size.

If tiles were set to 16×16, the same map would be 1024×1024 pixels. This should explain why we need 16 bit scrolling registers.

 

Tiles in the Map

So, I said that each entry in the map is 16 bits. Those bits are arranged like this…

vhopppcc cccccccc
v/h = Vertical/Horizontal flip this tile.
o = Tile priority.
ppp = Tile palette.
cc cccccccc = Tile number.
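Here is a sketch of one map entry built from those bits: tile 5, palette 2, priority set, flipped horizontally. The constant names and the label are made up for this illustration.

```asm
; hypothetical constants for a map entry (vhopppcc cccccccc)
MAP_V_FLIP = $8000
MAP_H_FLIP = $4000
MAP_PRIOR  = $2000
MAP_PAL_2  = 2 << 10     ; ppp = 010 -> $0800

Map_Entry_Example:       ; hypothetical label
.word MAP_H_FLIP | MAP_PRIOR | MAP_PAL_2 | $0005   ; = $6805
```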

Each tileset can theoretically hold up to 1024 tiles (for BG), since the tile number is 10 bits.

And, one more thing about palettes.

Palettes in Mode 1

Palette2

4bpp tiles use an entire row (left to right). If you set a tile’s palette to 0, it uses the top row (indexes 0-15). Palette 1 uses the next row (indexes 16-31), and so forth down to the 8th row (palette 7). That’s indexes 0 – 127 for background. Sprites would use the indexes 128 – 255 similarly. Sprites also use 4bpp tiles and 16 colors per tile.

2bpp tiles (BG3) share the top 2 rows. Each palette only uses 4 colors, so palette 0 uses indexes 0-3, palette 1 uses indexes 4-7, palette 2 uses indexes 8-11, and palette 3 uses indexes 12-15… all in the top row. Palettes 4-7 similarly would use the next row. Every 0th color in each palette would be transparent. I usually reserve the top row for BG3 and the other 7 rows for BG1 and BG2.
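So for a 4bpp layer, the color index is (palette × 16) + color, and for BG3 it is (palette × 4) + color. As a sketch, this sets color 1 of 4bpp background palette 1 (index 17) to pure red. The constant names are made up for this illustration. It assumes 8-bit A and that it runs during forced blank or v-blank.

```asm
.p816
.a8                        ; assumption: A is 8-bit at this point

; hypothetical constants, named only for this illustration
BG_PAL_NUM   = 1
BG_COLOR_NUM = 1

    lda #(BG_PAL_NUM * 16) + BG_COLOR_NUM   ; = 17
    sta $2121              ; color index to write to
    lda #$1f               ; low byte of $001F (format 0bbbbbgg gggrrrrr)
    sta $2122
    lda #$00               ; high byte
    sta $2122              ; second write to the same register
```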

Behind all the layers, the universal background color shows (index 0 of the palette), wherever there are transparent pixels. This is true for every layer. The black that fills most of these pictures is the background color showing through.

 
