Things I Forgot

There are so many topics to discuss that it would be very difficult to cover everything. Let’s briefly go over a few of the things I forgot to mention.

Multiplication and Division

SNES has hardware to perform multiplication and division. Actually, it has 2 ways to do multiplication. You can look at my EasySNES code to find examples of these (link below, search for multiply: and divide:). They are a little slow, so you can’t expect to do this 100x a frame. You have to wait several cycles before you get a result from the regular multiply and divide functions (see the NOP opcodes, which do nothing but wait).
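Here is a minimal sketch of the bare register sequence for the regular multiply and divide. It is not copied from EasySNES. It assumes 8-bit A and 16-bit X/Y, and the number of NOPs is on the cautious side.

```asm
; Sketch only. Assumes 8-bit A, 16-bit X/Y.
sep #$20
rep #$10
.a8
.i16

; unsigned 8 bit x 8 bit multiply: 12 x 10
lda #12
sta $4202 ; WRMPYA, first factor
lda #10
sta $4203 ; WRMPYB, writing this starts the multiply
nop ; wait about 8 cycles for the result
nop
nop
nop
ldx $4216 ; RDMPYL/H, 16 bit product (120)

; unsigned 16 bit / 8 bit divide: 1000 / 7
ldx #1000
stx $4204 ; WRDIVL/H, 16 bit dividend
lda #7
sta $4206 ; WRDIVB, writing this starts the divide
nop ; wait about 16 cycles for the result
nop
nop
nop
nop
nop
nop
nop
ldx $4214 ; RDDIVL/H, quotient (142)
ldy $4216 ; RDMPYL/H, remainder (6)
```

Note that the multiply product and the divide remainder share the same result register, so read the result before starting another operation.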

If you aren’t using Mode 7, there is a second multiply function (signed) which is much faster. I need to rewrite that function (TODO). But the regular Multiply function works fine.
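A minimal sketch of that second multiplier, assuming it works the way the register documentation describes: a signed 16-bit value written to $211b (two writes, low then high), a signed 8-bit value written to $211c, and a signed 24-bit result read from $2134-$2136. It reuses Mode 7 matrix registers, which is why it is only safe when Mode 7 isn’t drawing. This is not the EasySNES function.

```asm
; Sketch only. Assumes 8-bit A, 16-bit X/Y, and not in Mode 7.
sep #$20
rep #$10
.a8
.i16

; signed 16 bit x signed 8 bit: 1000 x -3
lda #<1000
sta $211b ; M7A, low byte
lda #>1000
sta $211b ; M7A, high byte (same register, written twice)
lda #$fd ; -3 as a signed 8 bit value
sta $211c ; M7B
ldx $2134 ; MPYL/MPYM, low 16 bits of the result ($f448)
lda $2136 ; MPYH, top 8 bits of the result ($ff)
; 24 bit result = $fff448 = -3000
```

There are no NOPs here because the result should be available right away.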

https://github.com/nesdoug/SNES_00/blob/master/easySNES.asm

Any other higher level math will have to be done with LUTs (look up tables), precalculated byte arrays.

Development Cart

I have a Super Everdrive, made by Krikzz, from Ukraine. And, ooh, they even have one that supports the SuperFX chip. I wish I had that one.

https://krikzz.com/store/home/54-fxpak-pro.html

Well, this one is over $200 US, but the basic model is less than $100, and it is well worth it. It works great. It uses a MicroSD card to hold the game ROMs.

Mode 7

This is the big enchilada, but I haven’t quite figured out this mode, especially setting up a tool chain for editing. Mode 7 can stretch and zoom and rotate. Many of the coolest SNES games use this in some way. It’s just currently above my skill level.

I do plan to work on this in the future. I wrote a tool called M8TE which can import an image to Mode 7 and create mode 7 maps.

IRQ Timers

This is for timing mid screen events. You should try to use HDMA instead. If all 8 HDMA channels are being used, you could do a 9th thing with IRQ timers.

You need to enable IRQ timers (probably just the V timer), and CLI to enable IRQs on the CPU. And you need to add code to the IRQ handler. Once set, the counter will trigger an IRQ signal once the PPU reaches a specific scanline. The H counter by itself would fire an IRQ signal every scanline, and you probably don’t want that.
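A rough sketch of those steps, with several assumptions: 8-bit A and 16-bit X/Y at setup time, an IRQ_Handler label (hypothetical) hooked to the native mode IRQ vector, and a value for $4200 that also keeps NMI and auto joypad read turned on. Adjust that value to whatever your program already writes there.

```asm
; Sketch only. IRQ_Handler is a hypothetical label.
.a8
.i16
ldx #100
stx $4209 ; VTIMEL/H, fire at scanline 100
lda #$a1 ; NMI on ($80), V timer IRQ on ($20), auto joypad ($01)
sta $4200 ; NMITIMEN
cli ; allow IRQs on the CPU

IRQ_Handler:
rep #$30 ; 16 bit A so the whole register is saved
pha
sep #$20 ; 8 bit A
lda f:$004211 ; TIMEUP, reading it acknowledges the IRQ
; the mid screen register change would go here
rep #$30
pla
rti
```

If the handler uses X or Y, it would need to save and restore those as well.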

Enhancement Chips

Another thing that is a bit over my head. The SA-1 chip is just a much faster 65816 chip, and that might be the easiest to use.

Some chip names = DSP1, DSP1A, DSP1B, DSP2, DSP3, DSP4, GSU1 (aka MarioChip1 aka SuperFX), GSU2, GSU2-SP1, OBC1, SA-1, S-DD1, S-RTC, SPC7110, ST010, ST011, ST018, CX4.

Some of the functions they do…

decompression code

trigonometry functions

image zooming and rotation

converting bitmaps to tiles

drawing vector graphics, triangles

real time clock

enemy AI functions (probably wouldn’t be useful to you)


The cool chip is the SuperFX chip (GSU). That’s what Star Fox used. It would be nice if I could figure it out and explain it. But I cannot.

Other Modes

Hi resolution. Modes 5 and 6 are double horizontal resolution. They can also, optionally, do an interlaced mode which doubles vertical resolution. Very few games used hi resolution.

Offset per tile Modes 2 and 4. I need to investigate these a bit more. I don’t want to put incorrect information here.

SRAM

For LoROM, SRAM is mapped to banks $70–$7D at addresses $0000-$7FFF, and also to banks $FE-$FF at addresses $0000-$7FFF. (Banks $7E and $7F are the WRAM, so they can’t be used for SRAM.) That gives a total possible 512kB of SRAM (battery backed save RAM): 16 banks of 32kB each.

HiROM, as usual, is completely different.

https://en.wikibooks.org/wiki/Super_NES_Programming/SNES_memory_map

You will also need to indicate in the header that the ROM is using SRAM. The SRAM size byte is mapped to $FFD8 (right after the ROM size byte at $FFD7). It’s this line in the header.asm file

.byte $00 ; backup RAM size

The value is (2^# in kB). 3 is 8kB, 4 is 16kB, 5 is 32kB, 6 is 64kB, 7 is 128 kB, 8 is 256kB, and 9 is 512kB. 0 means 0kB.

Oh, and the cartridge type byte a little earlier in the header, mapped to $FFD6, should have the d1 bit (0000 0010) set, to indicate a battery for the SRAM.

Once you have correctly set this, the emulator should automatically be creating SRAM save files, that persist after power off. You can freely read and write to this anytime, and you can save your game by keeping the progress stored in the SRAM.
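As a small sketch of what that looks like, assuming a LoROM project with 8kB of SRAM and 8-bit A. The two header values follow the description above, and the save value is made up for illustration.

```asm
; Sketch only. In the header:
; .byte $02 ; cartridge type, ROM + RAM + battery ($FFD6)
; .byte $03 ; backup RAM size, 2^3 = 8kB ($FFD8)

; In the program, with 8 bit A:
sep #$20
.a8
lda #$a5 ; some value worth saving (made up)
sta $700000 ; first byte of SRAM, bank $70, long addressing
lda $700000 ; read it back later, even after power off
```

A real save system would probably also store a checksum or a signature, so the game can tell a valid save from an empty or corrupted SRAM.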


Color Math

SNES programming tutorial. Example 12.

https://github.com/nesdoug/SNES_12

What is color math? If you’ve ever worked with Photoshop, it would be like blending 2 different layers with the settings on Add or Subtract. In this case the layers are the MAIN screen and the SUB screen. Everything we have done so far deals with the MAIN screen. So let me try to explain the SUB screen.

All that stuff that the SNES does to produce a picture, putting layers on top of each other, tile priorities, sprite priorities, etc… it does all that TWICE. If you set the settings for the MAIN screen exactly the same as the settings for the SUB screen, it would produce the exact same picture TWICE… with 1 difference.  The main screen uses color index zero as the backdrop color (any pixel that is transparent), and the sub screen uses the “fixed color” as the backdrop color (register $2132).

You would never see the SUB screen, unless you turned on the color math registers, which would then blend the 2 pictures together, using either addition or subtraction. And then there is an optional halving step after that. For each pixel on the screen, the R values are added or subtracted, then the G values, then the B values. Each result is clamped to the max or min rather than wrapping around. (Each RGB value is 0-31.)

Let’s say we have it set to ADD. And the main screen pixel is gray 15,15,15, and the sub screen pixel is dark red 10,0,0. The final pixel would be 25,15,15.

If we added the HALF option, each value would shift right once (rounding down), giving a final pixel of 12,7,7.

If we set the color math to SUBTRACT (no halving), the final pixel would be 5,15,15. The RGB values of the sub screen are subtracted from the RGB values on the main screen.

If we added the HALF option, each value would shift right once (rounding down), giving a final pixel of 2,7,7.

Note, any pixel in the sub screen that is transparent will not be halved.

The main use for Color Math is for transparency effects. You will want Adding and Halving. That would equally blend the main and sub screen.

The least useful setting is the subtract and halving. That would just produce a very dark picture, and almost no games used this.


There is a completely different kind of color math operation, that uses ONLY the fixed color. That color is applied to the entire MAIN screen, and if halving is set, it will work for the whole screen. If you set the fixed color register to green, and had the color math set to ADD, it would add a green tint to the screen.

The fixed color register $2132 is weird. The wiki example suggests writing each color separately to it (3 writes, for R, G, and B). However, you can set them all to the same value with 1 write. For example, LDA #$E0, STA $2132 would set all 3 fixed color channels to zero.
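For a color where the 3 channels are different, the 3 separate writes would look something like this. This is a sketch assuming 8-bit A; the orange color is just an example value.

```asm
; Sketch only. Set the fixed color to orange: R = 31, G = 15, B = 0.
sep #$20
.a8
lda #$20 | 31 ; red plane bit + intensity
sta $2132 ; COLDATA
lda #$40 | 15 ; green plane bit + intensity
sta $2132
lda #$80 | 0 ; blue plane bit + intensity
sta $2132
```

Each write only changes the planes whose upper bits are set, which is why the same register can take 3 different values in a row.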

Before we dive into the code, here’s a video. You can probably skip most of this video, which goes into too many details about how the 2 different screens are generated.

Example ROM

I put BG1 on the main screen (gray rocks) and BG2 on the sub screen (color bars).

No effect. Color Math disabled.

SNES_12_000

Just the Sub screen. (seen by setting the “clipping always to black” bits in the color math logic, and adding the sub screen).

SNES_12_006

Note, the top left is black (non-zero index). The bottom left is zero index (transparent).  The sub screen will show the “fixed color” (register $2132) wherever it is transparent. Right now the fixed color is black. Color halving will not work for a transparent pixel on the sub screen. If you notice, the bottom left square does not change at all in these examples, even when halving is indicated.

Color Math Adding.

SNES_12_001

Color Math Adding and Halving.

SNES_12_002

Color Math Subtracting.

SNES_12_003

Color Math Subtracting and Halving.

SNES_12_004

Fixed color only (red at 50%), Color Math Adding.

SNES_12_005

Here’s a YouTube video of the Example code.

Example Code

$2130 
ccmm--sd
cc = main screen black if... *
--mm---- = prevent color math if... *
------0- = fixed color
------1- = sub screen
d is for an unrelated thing

* 00 => Never
  01 => Outside Color Window only
  10 => Inside Color Window only
  11 => Always

$2131
shbo4321
0------- add
1------- subtract
-0------ normal
-1------ result is halved
b = backdrop, o = sprites, 4321 = layers enabled for color math

$2132 (fixed color)
bgrccccc 
b/g/r = Which color plane(s) to set the intensity for. 
ccccc = Color intensity.

So let’s go over each example.

1- no effect, turn off color math

lda #$30 ; = off
sta CGWSEL ; $2130
;and make sure fixed color is black
lda #$e0 ; RGB, value = 0
sta COLDATA ; $2132

2- adding

lda #$02 ; color math with subscreen
sta CGWSEL ; $2130

;adding, not half, affect all layers 
lda #$3f
sta CGADSUB ; $2131

3- adding and half, same as last one, just add one bit to the 2131 write

;adding, half, affect all layers 
lda #$7f
sta CGADSUB ; $2131

4- subtracting

lda #$02 ; color math with subscreen
sta CGWSEL ; $2130

;subtracting, not half, affect all layers 
lda #$bf
sta CGADSUB ; $2131

5- subtracting and half. Same as last one, but add one bit to the 2131 write

;subtract, half, affect all layers	
lda #$ff
sta CGADSUB ; $2131

6- fixed color only

;turn on color math, fixed color mode
lda #$00
sta CGWSEL ; $2130

;adding, not half, affect all layers 
lda #$3f
sta CGADSUB ; $2131

;set the fixed color to red 50%
lda #$2f ;red at 50%
sta COLDATA ; $2132

We could have also set half mode.

7- see just the sub screen. We did this by setting the “always clip main screen to black” bits in 2130, and then adding the sub screen to the now completely black main screen.

lda #$c2 ;= clip main always to black
sta CGWSEL ; $2130

;adding, not half, affect all layers 
lda #$3f
sta CGADSUB ; $2131

Other examples

Color math only affects some sprites. Only sprites that use palettes 4-7 are affected by color math. That is why Mario (and the little ghosts) are solid.

Super Mario World (USA)_006

Windowing can affect where the color math applies. With HDMA adjusting the window, you can make some cool effects.

Contra III - The Alien Wars (USA)_000

Metroid1

Super Mario World (USA)_008

Tint the whole screen (adding a fixed color)… actually, upon further investigation, this is subtracting, which makes the screen slightly darker than the original. Also, the COLOR MATH is not in fixed color mode, it’s in subscreen mode, but NOTHING is enabled on the subscreen, so the subscreen is filled with the backdrop color (which for the sub screen is the fixed color). I guess that works too.

Legend of Zelda, The - A Link to the Past (USA)_000

Smooth Transparencies (add and halving). This is the most common transparency effect on the SNES.

Legend of Zelda, The - A Link to the Past (USA)_001

Sparkster, the water.

Sparkster (USA)_000

And creating shadows (subtracting) Mortal Kombat II. It’s hard to tell, but their shadows are created by color math subtraction. You could also give the appearance of clouds moving overhead by subtracting a cloud shape and having it scroll.

Mortal Kombat II (USA)_000

Links.

http://www.romhacking.net/documents/428/

https://wiki.superfamicom.org/transparency


HDMA Examples

SNES programming tutorial. Example 11.

https://github.com/nesdoug/SNES_11

HDMA is a way to write to PPU registers while the screen is drawing. You can change values at specific scanlines, to create unique effects.

The H is for H-Blank. Remember before, when we talked about V-blank (vertical blank), where the PPU isn’t doing anything for a short while after drawing each screen? Well, it also pauses a VERY SHORT time after drawing each horizontal line. Just long enough for the 8 HDMA channels to quickly change a register or send data, before the screen goes to write the next line.

They work in order, channel 0 through channel 7. Each one can write 1 thing (1, 2, or 4 bytes) per line. Or you can set them to wait a specific number of lines before changing a value.

HDMA uses the same registers as the DMA registers, and you shouldn’t use both at the same time. You should write zero to HDMA enable ($420c) before performing a DMA, because the oldest revision of the SNES has a bug where it can crash if they both happen at the same time. Or, just make sure they aren’t used at the same time.

Here’s an interesting video on DMA and HDMA.

Examples

Here are some things you can do with HDMA.

Changing the BG color with HDMA, to create a color gradient.

Batman

Changing the Window (there are 2 windows) with HDMA, to block off portions of the screen. The windows have left and right registers, which need to be written every scanline to create these shapes.

Super Mario World (USA)_000

Super Mario World (USA)_003

Mode 7 parameters. (lots of registers to change hundreds of times a frame).

Fzero


How HDMA works

https://wiki.superfamicom.org/registers

(for this link, scroll down to 43x0)

When you look at the HDMA registers, it looks like you need to put MORE values, but you don’t need to write to them all. Let’s go over each register.

For channel 0.

4300 – yes
4301 – yes
4302-4 – yes
4305-6 – no
4307 – yes, only if using Indirect Mode.
4308-a – no

4300 - Control Register. da---ttt

D=direction, probably want 0, from CPU to PPU.

A=HDMA mode. 0 for direct, 1 for indirect. More on this later.

TTT = transfer mode, which will vary by which register we use.

000 => 1 register write once (1 byte: p)
001 => 2 registers write once (2 bytes: p, p+1)
010 => 1 register write twice (2 bytes: p, p)
011 => 2 registers write twice each (4 bytes: p, p, p+1, p+1)
100 => 4 registers write once (4 bytes: p, p+1, p+2, p+3)
101 => 2 registers write twice alternate (4 bytes: p, p+1, p, p+1)
110 => 1 register write twice (2 bytes: p, p)
111 => 2 registers write twice each (4 bytes: p, p, p+1, p+1)

4301 – the PPU destination register. 21xx. So if you write $22 here, the HDMA will write to $2122, the CGRAM Data Register.

4302-4 – the address of the HDMA table. 2=low, 3=middle, 4=upper/bank #.

4307 – if using Indirect HDMA, this is the bank # of the Indirect Addresses.

Anything marked “no”, don’t touch them. They are used by the HDMA hardware.

Then you write the channel (bitfield, each bit represents a channel, 1 for ch0, 2 for ch1, 4 for ch2, 8 for ch3, etc) to $420c, the HDMA enable register. Presumably, you would do this step during v-blank or during forced blank. I think it will misbehave for 1 frame if you turn HDMA on mid-frame.


OK, so we are pointing the HDMA registers to a table (byte array). For a direct mode, the table would be a scanline count, then (depending on the TTT mode) 1,2, or 4 bytes to be written. Then another scanline count, then more bytes. Scanline count, bytes. Scanline count, bytes. Etc, until it sees a zero in the scanline count slot. From my own examples…

H_TABLE6:
.byte 32, $0f
.byte 32, $1f
.byte 32, $2f
.byte 32, $4f
.byte 32, $6f
.byte 32, $9f
.byte 32, $ff
.byte 0 ;end

That reads 32 lines, value $0f. 32 lines, value $1f. 32 lines… etc down to the terminating 0. One interesting thing is that the $0f is written immediately at the very top of the screen. THEN it waits 32 lines.

Here’s another example, when the transfer mode is “1 register write twice”.

H_TABLE2:
.byte 10, 0, 0
.byte 10, 1, 0
.byte 10, 2, 0
.byte 10, 3, 0
.byte 10, 4, 0
.byte 10, 5, 0
.byte 10, 6, 0
.byte 10, 7, 0
.byte 10, 8, 0
.byte 10, 9, 0
...etc...
.byte 0 ;end

10 is the scanline count. Then 2 bytes to write. Then 10 scanline count. Then 2 bytes. Etc.

Indirect Mode

In the 43x0 register, we can set the mode to Indirect. Only with indirect do you need to write to 43x7, the bank of the indirect address. The table will always be in sets of 3 bytes. First the scanline count, then an indirect address (ie. pointer) to where our data is. I wrote the HDMA table like this…

H_TABLE5:
.byte 8
.addr $1000
.byte 8
.addr $1002
.byte 8
.addr $1004
.byte 8
.addr $1006
.byte 8
.addr $1008
...etc...
.byte 0 ;end

The .addr directive outputs a 16 bit value, low byte then high byte. I think you could have also used the .word directive. So, 8 is the scanline count, then an indirect address. Our bank byte is $7e, so the first one points to WRAM $7e1000. The second one points to $7e1002. Etc.

One of the advantages of the indirect system is that you can have a repeated pattern that changes.

I had copied the Indirect Table to $7e1000. It looks like this.

IND_TABLE:
.byte 0, 0
.byte 3, 0
.byte 6, 0
.byte 7, 0
.byte 8, 0
.byte 7, 0
.byte 6, 0
.byte 3, 0
.byte 0, 0
.byte $fd, 0
.byte $fa, 0
.byte $f9, 0
.byte $f8, 0
.byte $f9, 0
.byte $fa, 0
.byte $fd, 0
.byte 0,0

All of these are values to be written with HDMA to a PPU register. In this case, a horizontal scroll register, which is write twice (low then high bytes).

This is example 3. I am also shuffling these values every 4 frames, which causes the movement of the sine wave.

Example Code

No effect. HDMA is turned off by writing zero to $420c.

NO_EFFECT


Example 1. Changing the BG color.

I’m actually setting up 2 separate HDMA transfers. First to set the CG address to zero. Second to write 2 bytes to change the #0 color. You have to rewrite the address each time, because it auto-increments when writing a color.

stz $4300 ;1 register, write once
lda #$21 ; CGRAM Address
sta $4301 ;destination
ldx #.loword(H_TABLE1)
stx $4302 ;address
lda #^H_TABLE1
sta $4304 ;address

lda #2
sta $4310 ;1 register, write twice
lda #$22 ; CGRAM DATA
sta $4311 ;destination
ldx #.loword(H_TABLE2)
stx $4312 ;address
lda #^H_TABLE2
sta $4314 ;address

lda #3 ;channels 0 and 1 (bits 0 and 1)
sta HDMAEN ;$420c

And we have 2 HDMA tables (see the example code). Each time we are waiting 10 scanlines between changes. Each time, adding a little more red.

EFFECT1


Example 2. Changing window 1 left and right positions.

A window punches a hole in one or more layer. There are 2 windows, but we only need 1 for this example. The only parameters you can set are left and right (and inverse, and combinations with the other window). But with HDMA, you can adjust the window parameters as the screen draws, and draw a shape. Circle shapes are very popular.

If the left position is greater than the right position, the window will not appear. That is what we are doing for the top and the bottom of the screen. You also have to tell it which layers are affected with the $212e (window for main screen) and with the $2123-5 registers.

I’m using 2 HDMA channels, each writing 1 byte to 1 register: one to $2126 and one to $2127.

lda #1 ;windows active on layer 1 on main screen
sta TMW ;$212e
lda #2 ;window 1 active on layer 1
sta W12SEL ;$2123

stz $4300 ;1 register, write once
lda #$26 ;2126 WH0
sta $4301 ;destination
ldx #.loword(H_TABLE3)
stx $4302 ;address
lda #^H_TABLE3
sta $4304 ;address


stz $4310 ;1 register, write once
lda #$27 ;2127 WH1
sta $4311 ;destination
ldx #.loword(H_TABLE4)
stx $4312 ;address
lda #^H_TABLE4
sta $4314 ;address

lda #3 ;channels 0 and 1 (bits 0 and 1)
sta HDMAEN ;$420c

(This could have been done with 1 channel with a 2 register transfer mode).
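A sketch of that 1 channel version, assuming transfer mode 1 (2 registers, write once: p, p+1) pointed at $2126, so each table entry carries the left value and then the right value. H_TABLE_WIN is a hypothetical table name, and the table is cut short for illustration.

```asm
; Sketch only. Assumes 8-bit A, 16-bit X/Y.
.a8
.i16
lda #1
sta $4300 ;2 registers, write once (p, p+1)
lda #$26 ;2126 WH0, so p+1 is 2127 WH1
sta $4301 ;destination
ldx #.loword(H_TABLE_WIN)
stx $4302 ;address
lda #^H_TABLE_WIN
sta $4304 ;address

lda #1 ;channel 0 only
sta HDMAEN ;$420c

H_TABLE_WIN: ;hypothetical table
.byte 60, $ff, $00 ;left > right, no window for 60 lines
.byte 1, $7f, $80 ;then 1 line each, left and right
.byte 1, $7e, $81
.byte 1, $7d, $82
.byte 1, $ff, $00 ;close the window again
.byte 0 ;end
```

This frees up a channel, at the cost of a table where the left and right values are interleaved.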

Note… If the scanline count is greater than 128 (not including 128), it signals a series of single scanline writes. The count minus 128 gives the number of lines, and you then list one entry per line without repeating the scanline count.

.byte 60, $ff ;first we wait 60 scanlines
.byte $c0 ;192-128 = 64 lines of single entries
.byte $7f ;1st write value
.byte $7e ;2nd write value
.byte $7d ;3rd write value
.byte $7c ;4th write value
...etc...64 lines.
.byte 0 ;end

It waits 1 scanline between each write.

Each line, I am moving the left position and right position further apart, and then closer together, which forms a diamond shape.

EFFECT2


Example 3. Changing BG1 horizontal scrolling position.

This was already discussed above, in the Indirect Mode section. We are using a sine wave pattern to create a wave in the picture, writing twice to 1 register, the horizontal scroll of BG1.

I wanted to include at least 1 example of indirect mode. I copied the value table to the RAM, so I was able to change the values to make the pattern move. See the Shuffle_f3 function in the HMDA3.asm file. The table of Indirect Addresses points to the RAM where our actual values are stored.

This example would have been even nicer if we wrote new values every scanline. Currently, we are only changing values every 8 scanlines (to make the table simpler / smaller). Maybe even every 2 scanlines would have been enough.

lda #$42 ;indirect mode = the 0100 0000 bit ($40)
sta $4300 ;1 register, write twice
lda #$0d ;BG1HOFS horizontal scroll bg1
sta $4301 ;destination
ldx #.loword(H_TABLE5)
stx $4302 ;address
lda #^H_TABLE5
sta $4304 ;address
lda #$7e
sta $4307 ;indirect address bank

lda #1 ;channel 0 (bit 0)
sta HDMAEN ;$420c

EFFECT3


Example 4. Changing the Mosaic filter.

This is the simplest example. A single write to a single register. I haven’t discussed the Mosaic filter before, $2106 . The upper nibble is the mosaic amount (0 = normal, 1 = 2×2, etc up to $f = 16×16), and the lower nibble says which layers are affected. Sprites are never affected.

stz $4300 ;1 register, write once
lda #$06 ;mosaic
sta $4301 ;destination
ldx #.loword(H_TABLE6)
stx $4302 ;address
lda #^H_TABLE6
sta $4304 ;address

lda #1 ;channel 0 (bit 0)
sta HDMAEN ;$420c

It waits 32 lines before increasing the mosaic value. There are bigger squares at the bottom. You probably wouldn’t use this exact HDMA effect in a game, but it is just an example of what is possible. You can change so many settings, even the BG mode $2105, which layers are active, change the location of a tilemap or tileset. I think you can even write new data to the VRAM (a little bit at a time).

EFFECT4


Example 5. Windows with Color Math.

(the next example page will talk more about Color Math registers)

Some of the coolest effects were done with this combination. We are adding a fixed color to tint the picture, and using the windows to shape the color box. If we made the HDMA table for the window more elaborate, it could be a circle, or any simple shape. We could change the table and make it grow or squish. The windows work the same as before, but we just needed to change the settings so that the window affects color math and nothing else.

lda #$20 ;Window 1 active for color, not inverted
sta $2125 ;WOBJSEL - window mask for obj and color
lda #$10 ;prevent outside color window, clip never, 
;add fixed color (not subscreen)
sta $2130 ;CGWSEL - color addition select
lda #$3f ;color math on all things, add, not half
sta $2131 ;CGADSUB - color math designation

We are setting the color math to use only the fixed color, and not the subscreen. So, we need to set the color of the fixed color register $2132.

lda #$8f ;blue at 50%
sta $2132 ;COLDATA set the fixed color

This register (fixed color) is a bit complicated, I will explain it more on the next page, but the upper bits are the color selectors, and the lower 5 bits are for value. 8 is blue and $0f is for 15 (out of 31), or just under 50%. With color math addition (with the fixed color) turned on, it would color the entire screen blue. But, our HDMA window blocks out the effect for the top, sides, and bottom of the screen, leaving a blue box in the size/shape of the painting.

EFFECT5


Here’s a YouTube video of these examples (video is old, doesn’t include the 5th effect).


IMPORTANT NOTE – the top most scanline is never drawn. It’s blank. At the end of that 0th scanline, it sends the first value of the HDMA tables, and then it sets the scanline count. The HDMA table looks like the scanline count is before the value to send, but it does it the opposite… it sends the value and then it waits. That means that the first value affects the very top of the visible screen.

The HDMA table resets at the end of v-blank. Automatically. Even if it didn’t complete the table because it was longer than the number of scanlines on screen, it jumps back to the 0th item on the HDMA table, and continues the next frame from the top of the table.

Also, a count of zero terminates the HDMA for the rest of the frame, but it still resets again, at the top of the next frame, and keeps going. If there is a value listed after the zero, it isn’t sent.


Links.

https://wiki.superfamicom.org/dma-and-hdma

https://wiki.superfamicom.org/grog’s-guide-to-dma-and-hdma-on-the-snes

https://sneslab.net/wiki/HDMA



SNES Music

This is a lesson on using SNESGSS to make SNES music.

Before you read this… I have made improvements to SNESGSS, the newest version has a Q at the end, and it has a different music.asm file. See this link for more info.

SNES Music 2


NOTE – If SNESGSS editor stops playing the music correctly (silences some or all the notes), the issue is that there are too many things loaded to fit in the ARAM. This could happen unexpectedly, because adding to the song editor might overflow the available RAM, without warning. Click over to INFO and it will probably say there is no memory left. You will have to shorten the song or remove unused samples.


SNES programming tutorial. Example 10.

https://github.com/nesdoug/SNES_10

Here’s an interesting video on the SPC700 and SNES audio.

Today, we are going to talk about SNES music. The APU (Sony SPC700) is a different chip entirely, and has its own 64k of RAM. At the beginning of our program, we need to load the APU with our SPC file. It is an audio program that runs automatically.

The APU is connected to an 8 channel DSP (digital sound processor). The song will direct the DSP to play different sound samples at different rates to make tones. If you are familiar with MIDI, it is similar. The samples can be looped or not. The samples are compressed into a native compression called BRR (bit rate reduction).

BRR samples are very large, and you will probably be only able to fit 10-15 samples. Each will have to be edited (perhaps with Audacity) to less than a second each, and at a reduced sample rate. We are going to work with SNESGSS (written by Shiru).

I originally found it here (you might want to look here to get the sample instruments), but don’t use this .exe because it has a bug.

https://github.com/nathancassano/snesgss

I patched out the bug, and also added echo functions. You can get that version here…

https://github.com/nesdoug/SNES_00/tree/master/MUSIC

and also grab the music.asm file. You will need it.


SNESGSS prefers to have 16-bit MONO WAV samples at sample rate 32000 or 16000. I have tried 8000, but usually the sound quality is too bad at 8000. 8000 might be ok for a bass sample.

There seems to be a bug in Audacity, when you resample to another rate (Tracks/Resample), it doesn’t actually change the project sample rate, nor will it save the project at that sample rate. What you need to do is Open the WAV file in Audacity, Select All and COPY it. Then Open a NEW PROJECT, and change the project’s sample rate (at the bottom left) to 16000. Then PASTE it. Now it will save at the correct sample rate.

Recording at the desired rate has no problems. 16000 seems to be a nice sweet spot on audio quality and file size.

SNESGSS also suggests tuning the samples to B +21 cents. I did not. I left all my samples at C. They are not in tune with the samples provided with SNESGSS, which I did not use. I think those are tuned to B +21. The reason behind the unusual tuning is to make it easier to make looped samples without clicks. BRR format is forced to be blocks of 16 samples, so a multiple of 16 samples (such as 256 samples per wave cycle) at a sample rate of 32000 (or 16000) samples per second works out to B +21. (32000 / 256 = 125 Hz, and a B is about 123.47 Hz, so 125 Hz is roughly 21 cents sharp of B.)

But, feel free to use whatever tuning is easiest for you.

SNESGSS3

Hit the WAV button near the middle of the screen to load your samples. Setting the envelopes similar to this sounds good to me (15, 1, 7, 16). If you managed to loop the sample perfectly, you may prefer to leave the last envelope setting (SR) at 0, for a tone that can continue infinitely.

You can press the 2x or 4x buttons if you run out of room for files, to downsample by half.

To loop, click the “On” button and type in the loop start (FROM) and loop end (TO) numbers. Note – BRR sample length needs to be a multiple of 16, and the loop start and end points need to be a multiple of 16. SNESGSS doesn’t tell you that… you will probably have to use a calculator to calculate a multiple of 16 and type exact numbers in.

I wouldn’t mess with the volume or EQ settings. That is something you should have done in Audacity while editing. Just keep in mind that the SNES tends to weaken the upper range and make bright sounds feel dull. You might have to do a treble boost for the lead instruments.

This tracker will convert our samples to BRR, but not until your final export. Unfortunately, you can’t import BRR samples to it from other sources. And you can’t export BRR samples, although you could export an SPC and use spc_decoder.exe from BRRtools to extract the BRR samples from the SPC.

SNESGSS4

Here you can check the size of all the files. Obviously, you can’t have a bigger SPC file than 64k, the size of the APU RAM.

I should note that we only load 1 song in the APU RAM at a time. Starting a new song will load it over the previous song, so that only one song is loaded at any time. That should give you a little flexibility on overall size.

SNESGSS

Here is the main editor. You type Z-M keys for lower octave, Q-P keys for upper octave. You can change the octave by pressing the octave button. So, this is a standard tracker, it goes downward as the song plays.

You can toggle channels on and off by clicking on the word “channel 1”, etc. You can divide things into sections. Press the spacebar to mark the end of a section. Then you can repeat the previous section with an R00 command.

The order of things is Note, Instrument, Volume 0-99, and Special effects. The SP column is for song speed (smaller is faster). You can scroll up and down with PgUp and PgDn keys, and also Home and End goes to the next section.

CTRL+End marks the end of the song, and CTRL+Home marks the loop back point.

You can import Famitracker and MIDI files (notes only), but I haven’t tried.

SNESGSS2

On this page, you can mark a song as a “Sound effect”.

Once the songs are done, you File/Export. And that will produce several files.

spc700.bin is our main SPC file. It holds the program and the samples and the sound effects data.

music_1.bin (one file per song) is the song data.

sounds.asm and sounds.h we don’t need. Don’t include them. This was for a different assembler / C compiler. You might want to look at it to find the value of each sound effect.

.define SFX_DING 0

…tells us that the DING sound effect is called with the value zero.


https://github.com/nesdoug/SNES_10/tree/master/MUSIC

I changed the asm code in mid 2021. Make sure you have the latest music.asm so it can handle SPC files larger than 32k. Here’s how we can include the file across 2 different LoROM banks.

(ca65/ld65 linker specific commands).

If the SPC file is larger than 32k, you can add arguments to the .incbin command to split the file.

.incbin "MUSIC/spc700.bin", 0, 32768

and

.incbin "MUSIC/spc700.bin", 32768

The top one says copy 32768 bytes starting at 0. The second one (with 1 number) says to include from 32768 to the end of the file. The newest version of SPC_Init can copy the entire thing to the SPC RAM (even across multiple banks).
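Put together, the split might look something like this. The segment names RODATA1 and RODATA2 and the label SPC_FILE are hypothetical; they stand for whatever two consecutive 32k banks your linker config defines and whatever label you pass to SPC_Init.

```asm
; Sketch only. Segment names and label are hypothetical.
.segment "RODATA1" ;one full LoROM bank
SPC_FILE:
.incbin "MUSIC/spc700.bin", 0, 32768 ;first 32768 bytes

.segment "RODATA2" ;the next bank
.incbin "MUSIC/spc700.bin", 32768 ;from 32768 to the end
```

I would expect the first half to need to start at the very beginning of its bank and fill it completely, so that the second half continues right where the first one ends.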


CODE

Let’s go over the music.asm file, which you should have grabbed from one of my example folders. I had to modify the original code to work with ca65.

SPC_Init – should be called at the start of the game, with interrupts off (NMI, IRQ, controllers). With AXY16, you load A with the address of the SPC file (spc700.bin) and X with the bank of the SPC file, and JSL to SPC_Init.

By the way, running this function takes a long time. It could take 2 seconds or more.

SPC_Load_Data is an internal function, for loading data to the APU RAM.

SPC_Play_Song loads a song (data) to the APU RAM and then starts playing it. This also should be done with interrupts off. Note that this system only loads one song at a time to the APU RAM. If you have a song in and then load another song, it overwrites the first song.

With AXY16 load A with the address of the song data (like music_1.bin) and load X with the bank of the song, then JSL to SPC_Play_Song. Once it’s done, it will begin playing the song automatically.

SPC_Command_ASM is an internal function. It’s what sends signals to the APU.

SPC_Stereo is to set mono (default) or stereo audio. Load A (8 or 16) with 0 for mono, 1 for stereo. Audio channels can be panned left or right.

SPC_Global_Volume is to set the max volume, 0-127. It can also be used to fade in or fade out. One of the variables is called speed, and it is the step value, to go from previous volume to the new volume. 255 is the default speed, which is instant change (any value >= 127 would be instant). Speed of 7 seems nice for a fade, and will take 2 seconds to transition. Don’t give it a speed of zero, or the volume won’t change.

AXY8 or AXY16, load A with the speed of volume change (1-255), and load X with the new volume (0-127), then JSL SPC_Global_Volume.

The SNES has a master volume variable, which affects all channels. That’s what this sets, and doesn’t affect individual channel volumes.

SPC_Channel_Volume sets the max volume for an individual audio channel. AXY8 or AXY16, load A with the channels and load X with the volume (0-127), then JSL to SPC_Channel_Volume. I’m not sure under what circumstances I would use this. Maybe to silence or dim a lead instrument, for a change in dramatic tone.

Note, the channel here is a bitfield, with each bit representing a channel.

0000 0001 = channel 1
0000 0010 = channel 2
0000 0100 = channel 3
0000 1000 = channel 4
0001 0000 = channel 5
0010 0000 = channel 6
0100 0000 = channel 7
1000 0000 = channel 8

For example, LDA #$42 (0100 0010) would affect channels 2 and 7.
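
If it helps, here is a small Python sketch of how that bitfield is built from 1-based channel numbers. The function name channel_mask is made up for this illustration; it isn’t part of music.asm.

```python
def channel_mask(channels):
    """Build the 8-bit channel bitfield from 1-based channel numbers (1-8)."""
    mask = 0
    for ch in channels:
        if not 1 <= ch <= 8:
            raise ValueError("channel must be 1-8")
        mask |= 1 << (ch - 1)  # channel 1 is bit 0, channel 8 is bit 7
    return mask

print(hex(channel_mask([2, 7])))  # 0x42
```

The value it prints is the one you would load into A before the JSL to SPC_Channel_Volume.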

Music_Stop stops the song. JSL here.

Music_Pause will pause and unpause the song (and not affect the sound effects that are playing). Load A (8 or 16) with 1 for pause and 0 for unpause, then JSL here.

Sound_Stop_All stops all sounds, song and sound effects. JSL here.

SFX_Play_Center plays a sound effect, pan center. With AXY8 or AXY16, load A with the # of the sound effect, load X with the max volume of the sound effect (0-127), and load Y with the channel (0-7), then JSL here, and the sound effect should play. The channel needs to be higher than the max channel for the song playing. Therefore, you must reserve some empty channels in the song if you want sound effects to play with it.

SFX_Play_Left, is the same, but pan left.

SFX_Play_Right, is the same, but pan right.

SFX_Play is an internal function that the 3 above functions call.

Streaming has been removed. See the 13th SNES example page for echo functions.

SNES Music 2

EXAMPLE CODE

;copy the music code and samples to the Audio RAM 
AXY16
lda #.loword(music_code)
ldx #^music_code
jsl SPC_Init

;turn on stereo sound
AXY16
lda #$0001
jsl SPC_Stereo

…and at the bottom we have

.segment "RODATA6"
music_code:
.incbin "MUSIC/spc700.bin"
music_code_end:

.

Then I load the song, and start it playing (before I turn on NMI interrupts).

AXY16
lda #.loword(song1)
ldx #^song1
jsl SPC_Play_Song

By the way “.loword()” gets a 16 bit value from a 24 bit label. ^ gets the bank of a label.

.

Now I just need to set up a trigger for the sound effect. We already have that yellow block triggering the screen to go dark, so I just snuck in a little more code there. I didn’t want it re-starting the same sound effect over and over and over each frame, so I added a variable to remember the LAST FRAME, if we were over the yellow block, and skip a trigger in that case.

cmp bright_var2 ;compare to last frame
beq Past_Yellow ;skip if last frame is true

AXY8
lda #0 ;= ding
ldx #127 ;= volume
ldy #6 ; = channel
jsl SFX_Play_Center

Our song plays from channels 1-4 (ie. 0-3), and our sound effect uses 2 channels, so we could have set this to 4,5, or 6. This function is zero based index, ie. values 0-7. So 6 means it will play on channels 7 and 8. Sorry for flip flopping between zero based and one based numbers. Hope this isn’t too confusing.

However, if we loaded Y with 0, 1, 2, or 3, it would not play. If we loaded Y with 7, only the first channel of the sound effect would play.

Here’s a picture of the demo again. It looks the same as the previous example.

Example09

Here’s a Youtube video, if you want to hear it.

https://wiki.superfamicom.org/spc700-reference

https://wiki.superfamicom.org/bit-rate-reduction-(brr)

.

There are other programs for getting music onto a Super Nintendo.

You could use SNESMOD with OpenMPT. I still need to research this more before I can recommend it. I have heard that a version of SNESMOD by AugustusBlackheart and KungFuFurby is good. Sorry I can’t be more informative here.

Another program, BRRTools, can convert audio files to BRR. I haven’t used it, but the SNESGSS tool uses the same code. It says you can turn BRR samples into WAV and WAV into BRR. This could be a way to use existing BRR samples in our SNESGSS projects (by using this tool to convert them into WAV files first).

https://www.smwcentral.net/?p=section&a=details&id=17670

SNES main page

BG Collision

SNES programming tutorial. Example 9.

https://github.com/nesdoug/SNES_09

This time we are going to make a collision map, and make a sprite collide with the background. The actual graphics are not that important.

I took some pictures of some blocks (and a sketch of a cube with eyes) and resized them in GIMP to 16×16 sized blocks. Then I imported everything into my M1TE tool. From there I made 3 metatiles: green, red, and yellow. Red will be the collision blocks (1 = wall). This is what it looks like in M1TE.

Blocks

You can save the map screen as an image (File/Export Image). That was then loaded into Tiled Map Editor as the tileset.

Tiled Map Editor is a free game design tool. The entire purpose of this is to export a .csv file of our collision map… which is that collision array I was talking about.

Tiled

The CSV file exported from Tiled.

csv

I added some .byte directives so it can be loaded as a byte array into the asm code. 0 is blank, 1 is red wall, 2 is the yellow square. Now we can .include it into our ASM file.

byte

But how did I draw the map? Back when I was programming NES games, I had a whole metatile system worked out. I am doing a similar thing here, but I manually typed out each tile needed to construct a block, in metatiles.asm.

Metatiles:
;tile 0
.byte $02, TILE_PAL_0
.byte $03, TILE_PAL_0
.byte $12, TILE_PAL_0
.byte $13, TILE_PAL_0
;tile 1
.byte $04, TILE_PAL_1
.byte $05, TILE_PAL_1
.byte $14, TILE_PAL_1
.byte $15, TILE_PAL_1
;tile 2
.byte $02, TILE_PAL_5
.byte $03, TILE_PAL_5
.byte $12, TILE_PAL_5
.byte $13, TILE_PAL_5

And in main, it does a loop, converting each byte of the collision map (HIT_MAP) into 4 screen tiles. And copying them one by one to the VRAM on the map. The key thing is that we again use the HIT_MAP to stop movements of the sprite.

Our code calculates where our guy is on the map, and if we are over a 1, cancel the movement. That makes him collide with the red walls.

Example09

How does it do that? Let’s go over the code. So our byte array has each block 16×16. We need to divide x and y pixel coordinates by 16 (the same as shift right 4x). But we also need to multiply the y by 16 to get to the correct row in our array, which cancels out the divide 16. So the algorithm is (Y & 0xf0) + (X >> 4). If we look at that index in the byte array, it will tell us if a point is in a wall or not. This is the code, with X and Y registers holding the X and Y coordinates…

tya
and #$f0
sta temp1
txa
lsr a
lsr a
lsr a
lsr a
ora temp1
tax
lda HIT_MAP, x
rts
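
For anyone who finds the assembly hard to follow, here is the same index math as a small Python model. It is only an illustration (the names hit_map_index and tile_at are made up here), assuming a map that is 16 blocks wide and pixel coordinates in the range 0-255.

```python
def hit_map_index(x, y):
    """Index into the collision array for pixel (x, y), both 0-255."""
    # (y & 0xF0) is (y // 16) * 16: the start of the row.
    # (x >> 4) is x // 16: the column within that row.
    return (y & 0xF0) | (x >> 4)

def tile_at(hit_map, x, y):
    """Look up the collision value (0 = blank, 1 = wall, 2 = yellow)."""
    return hit_map[hit_map_index(x, y)]

print(hit_map_index(40, 100))  # 98, which is row 6, column 2
```

The OR works in place of an add because the two halves never share a bit: the row lives in the upper 4 bits and the column in the lower 4.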

I handled each direction separately. First do the X move, then see if any of the corners of our sprite are inside a wall. If yes, revert to previous X position. Then do the Y move, see if any of the corners of our sprite are inside a wall. If yes, revert to previous Y position.

This code would need to be a little more complex if we move more than 1 pixel per frame. If we are moving 2-3 pixels per frame, and the distance to the wall is 1 pixel, we should allow 1 pixel movement toward the wall… and not be stuck 1 pixel away from the wall. So, this code will need to be improved.
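
One possible way to handle faster speeds (just a sketch, not what the example code does) is to step one pixel at a time and stop at the first blocked pixel. In this Python model, move_axis and the blocked callback are hypothetical names. blocked stands in for the “is any corner of the sprite inside a wall” check.

```python
def move_axis(pos, speed, blocked):
    """Move up to `speed` pixels along one axis, one pixel at a time.

    `blocked(new_pos)` is a hypothetical callback that returns True if
    the sprite would overlap a wall at new_pos.
    """
    step = 1 if speed > 0 else -1
    for _ in range(abs(speed)):
        if blocked(pos + step):
            break  # stop right next to the wall
        pos += step
    return pos

# Wall starts at pixel 10; moving 3 pixels from 8 ends at 9, flush with it.
print(move_axis(8, 3, lambda p: p >= 10))  # 9
```

You would call it once for X and once for Y, the same order as the per-direction handling described above.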

.

Touching the yellow square will darken the screen. We are just looking if 1 point (the middle of our guy) is over a 2 in the collision map, and changing the screen brightness variable. Remember that the $2100 register is the screen brightness. I am writing to it every frame, during v-blank. Full brightness is $0f. Half brightness is $07.

Example09b

If we were scrolling in a larger world, the collision map would have to be the size of the world. You could have it compressed, and decompress it to the WRAM. You would have to keep track of X and Y movements with 2 byte variables. One thing I would not recommend is trying to read from the VRAM to see what kind of tile you are standing over. The visuals of the level should probably be separate from the collision map.

One more thing. It wouldn’t be too much trouble to turn this simple example into a platformer. You would just need to add gravity, which is adding a little bit to the Y speed every frame, and then cancelling that if your feet touch the floor. Jumping would be a sudden negative Y speed.
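
Here is a rough Python sketch of that gravity idea. Everything in it is an assumption for illustration: the constants, the function name, and the use of sub-pixel units (256 units = 1 pixel) so that “a little bit” can be less than a whole pixel.

```python
GRAVITY = 0x40       # added to the Y speed each frame (assumed value)
JUMP_SPEED = -0x400  # sudden negative Y speed, 4 pixels per frame (assumed)

def update_vertical(y, y_speed, on_floor, jump_pressed):
    """One frame of vertical movement. y and y_speed are in 1/256 pixel units."""
    if on_floor:
        # Feet on the floor: cancel the fall, or start a jump.
        y_speed = JUMP_SPEED if jump_pressed else 0
    else:
        y_speed += GRAVITY
    return y + y_speed, y_speed
```

The on_floor flag would come from the same kind of collision map check as the walls, looking at the points under the character’s feet.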

This is a really cool page that explains collision maps in more detail.

http://higherorderfun.com/blog/2012/05/20/the-guide-to-implementing-2d-platformers/

.

TODO – I need to write some code to automate generating metatile tables, or come up with some other kind of BG object system. Especially for a larger world… hand editing data tables will get very tedious.

.

BG Scrolling

SNES programming tutorial. Example 8.

https://github.com/nesdoug/SNES_08

So, this isn’t so complicated. I’m using the Example 4 backgrounds, and scrolling them with the controllers. I’m not going to go over the process of making backgrounds again. We will just talk about the scrolling code.

If you press A, B, X, or Y, you will toggle which background is selected. Visible by the sprite in the corner (1,2,3). This is the map_selected variable, which has a value 0-2.

The up/down/left/right functions will do a case switch style check on the map_selected variable. Normally, you would do CMP #1, CMP #2, CMP #3, etc. But you don’t actually need to do a CMP #0. This is something I see new 6502/65816 programmers do. The previous line “lda map_selected” already sets the z flag if map_selected is zero. Lots of instructions set the z (zero) and n (negative) flags. LDA, LDX, LDY, TAX, TXA, TXY, PLA, PLX, PLY, etc. If a register is loaded with zero, the z flag is set and BEQ will work.

Right_Handler:
.a16
.i16
  php
  A8
  lda map_selected
  bne @1or2
@0: ;BG1
  dec bg1_x
  bra @end
@1or2:
  cmp #1
  bne @2
@1: ;BG2
  dec bg2_x
  bra @end
@2: ;BG3
  dec bg3_x
@end: 
  plp
  rts

Let’s follow this for each value. If map_selected is zero, the BNE won’t branch, it goes to the @0, dec bg1_x and then exits. If map_selected is 1, the first BNE will branch to @1or2. A is still loaded with map_selected, we compare it to #1, the BNE won’t branch, so we do @1, dec bg2_x and exit. If map_selected is 2, the first BNE branches to @1or2, cmp #1 is false, so the bne @2 branches us to the @2 dec bg3_x line.

Notice, moving the map right means decreasing the horizontal scroll variable. Moving it left means increasing it. Likewise, moving a screen down is decreasing the vertical scroll, and moving it up is increasing it.

Each scrolling register is “write twice” (two 8 bit writes, low byte then high byte). Always write twice. You can actually write to these registers any time, but we want to do it during v-blank so we don’t get any shearing of the background in the middle for 1 frame. Near the top of the game loop, we have jsr set_scroll. Let’s look at set_scroll.

lda bg1_x
sta BG1HOFS ;$210d 
stz BG1HOFS 
lda bg1_y
sta BG1VOFS ;$210e
stz BG1VOFS

lda bg2_x
sta BG2HOFS ;$210f
stz BG2HOFS 
lda bg2_y
sta BG2VOFS ;$2110
stz BG2VOFS 

lda bg3_x
sta BG3HOFS ;$2111
stz BG3HOFS 
lda bg3_y
sta BG3VOFS ;$2112
stz BG3VOFS

bg1_x is a 1 byte variable, because our maps are set to 1 screen only (32×32 map and 8×8 tiles). If you made the tilemap bigger (or made the tile size larger), you would need 2 bytes for each scroll variable. With 64×32 our x needs 9 bits. If you also increase tilesize to 16×16 then we need 10 bits.
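
To check those bit counts, here is a tiny Python sketch. The helper names scroll_bits and scroll_writes are made up for this example. The second helper shows how a 2 byte scroll value would be split into the two writes (the 1 byte version above just writes zero for the high byte with stz).

```python
def scroll_bits(map_tiles_wide, tile_size):
    """Bits needed to address every pixel column of the tilemap."""
    pixels = map_tiles_wide * tile_size
    return (pixels - 1).bit_length()

def scroll_writes(value):
    """Split a scroll value into the two 8-bit writes: low byte, then high byte."""
    return value & 0xFF, (value >> 8) & 0xFF

print(scroll_bits(32, 8))   # 8  (one screen, 1 byte is enough)
print(scroll_bits(64, 8))   # 9
print(scroll_bits(64, 16))  # 10
```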

You can move each layer independently. Usually, you would have BG1 be the foreground and BG2 be the background and BG3 be either the far background or the HUD (scoreboard) always fixed in one place in the front.

Example08

Pong. Sprite collisions.

SNES programming tutorial. Example 7.

https://github.com/nesdoug/SNES_07

I made a simple Pong demo to show sprite collisions.

Well… I was trying to keep it simple, but I decided to use some of the more complicated code I have previously written, copied to the library.asm file from some of the EasySNES files: OAM_Spr (copies one sprite to the buffer), OAM_Meta_Spr (copies multiple sprites to the buffer), oam_clear (clears the buffer), Map_Offset (gets an address from a specific x/y coordinate in a map). I did change the return from these functions from RTL to RTS, because all of our code is in the same bank.

I will discuss these functions further below.

Check_Collision is new. I will discuss that a bit later.

Let’s talk about the process of making this. I made a circle gradient in GIMP for the background, and converted to indexed 4 color (with dithering). Sized 256×192 (it won’t cover the entire screen).

Grad

Saved as a PNG. Imported to M1TE.

BG1

Then I drew some numbers for BG3, and filled a little on the top and bottom.

BG3

Clicked the priority checkbox for this map.

Priority

Saved all the maps and tiles and palette. Pretty much the same as previous examples of loading a background.

Now I opened SPEZ (my sprite editor) and drew some simple box shapes for the ball and paddle. Saved them as metasprites.asm and saved their tiles (chr) and palette.

SPEZ7

Everything is .incbin -ed in the main.asm file. We are loading everything just like the previous examples, with DMAs to the VRAM. One difference is that I wrote a macro for DMAs to the VRAM. This made the code a little easier to read and write. Let’s look at an example…

DMA_VRAM $700, Map1, $6000

This is the DMA_VRAM macro definition…

.macro DMA_VRAM length, src_addr, dst_addr
;dst is address in the VRAM
;a should be 8 bit, xy should be 16 bit
ldx #dst_addr
stx $2116 ; vram address

lda #1
sta $4300 ; transfer mode, 2 registers 1 write
; $2118 and $2119 are a pair Low/High
lda #$18 ; $2118
sta $4301 ; destination, vram data
ldx #.loword(src_addr)
stx $4302 ; source
lda #^src_addr
sta $4304 ; bank
ldx #length
stx $4305 ; length
lda #1
sta $420b ; start dma, channel 0
.endmacro

So where it says length, the macro will insert the $700 bytes (not $800, because the screen is only 224 pixels high, so I’m not filling the entire 256 pixel high map). Where it says src_addr, it replaces it with Map1. Where it says dst_addr, it replaces it with VRAM address $6000. All that code could be written in one line.

DMA_VRAM $700, Map1, $6000

Doesn’t this look nicer though? Simple. Elegant. Easy to read. Macros are your friends.

Everything between InfiniteLoop and, somewhere below that, jmp InfiniteLoop is the game loop. Every frame we wait till v-blank. Copy the OAM_BUFFER to the OAM. Print the score to the top of the screen. Read the controllers. Move the paddles if up or down are pressed.

  lda pad1
  and #KEY_UP
  beq @not_up

@up:
  A8
  lda paddle1_y
  cmp #$20 ;max up
  beq @not_up ;too far up
  bcc @not_up

  dec paddle1_y
  dec paddle1_y

  dec paddle2_y
  dec paddle2_y

@not_up:

This code is moving both paddles, because this is just example code. You could modify it, so that controller2 moves the paddle on the right. Copy this whole thing, and replace pad1 with pad2, and only move paddle2. Also change the label names, so you don’t have duplicates.

We are only moving the ball while it is “active”. Press START to make it active, and choose a random direction to go (based on a frame counter).

lda #1
sta ball_active

ball_x_speed and ball_y_speed are the directions of the ball. Either 1 or -1 ($ff). Every frame we are adding the speed variable to the position variable. If speed is 1, we add 1 and it moves it to the right 1 pixel.

If the ball is active, it moves up/down until it reaches the ceiling or floor.

;bounce off ceilings
cmp #$20
bcs @above20

lda #1
sta ball_y_speed

@above20:
;bounce off floor
lda ball_y
cmp #$c7
bcc @ball_done

lda #$ff ; -1
sta ball_y_speed

@ball_done:
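
If the “-1 ($ff)” part looks odd, this little Python model may help. It assumes nothing more than 8 bit addition that wraps around at 256, which is what the CPU does with A in 8 bit mode (ignoring the carry). The name add8 is made up for the illustration.

```python
def add8(position, speed):
    """Add two 8-bit values and wrap at 256, like an 8-bit ADC ignoring carry."""
    return (position + speed) & 0xFF

print(add8(100, 0x01))  # 101, speed +1 moves one pixel down/right
print(add8(100, 0xFF))  # 99, speed $ff behaves like -1
```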

Sprite Collisions

It moves left/right until it reaches the end of the room. But we want it to bounce off the paddles, so we need to check collisions with hitboxes. I wrote this a long time ago (modified slightly). It’s the Check_Collision function in the library.asm file.

So we need the dimensions and location of the 4 sides of both boxes. That’s 8 numbers, that I copy to these variables…
obj1x, obj1w, obj1y, obj1h
obj2x, obj2w, obj2y, obj2h
x = left side of sprite object
w = width (minus 1), added to x to get the right side
y = top side of the sprite object
h = height (minus 1) , added to y to get the bottom side

I defined some of these with constants at the top of main.asm

BALL_SIZE = 7
PADDLE_W = 7
PADDLE_H = 47

Of course, the x and y values are changing. Those are defined as variables in the zero page (direct page).

paddle1_x, paddle1_y
paddle2_x, paddle2_y
ball_x, ball_y

I copy these to the obj1 obj2 stuff, and then call Check_Collision, which sets the “collision” variable to 0 or 1. If collision is true, we bounce the ball. This collision check is for 8 bit positions only, and assumes that no object goes off the screen at all. The code won’t work right at the very edges of the screen.

Here’s what the collision code is doing, under the hood, in some optimized ASM.

if((obj1_right >= obj2_left) &&
(obj2_right >= obj1_left) &&
(obj1_bottom >= obj2_top) &&
(obj2_bottom >= obj1_top)) return 1;
else return 0;
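
Here is the same test as a runnable Python sketch, using the x / w / y / h convention from above (w and h are size minus 1). It is a model of the logic only, not a translation of the real Check_Collision routine, and Python integers don’t wrap at 8 bits, so it doesn’t show the screen edge problem.

```python
def check_collision(x1, y1, w1, h1, x2, y2, w2, h2):
    """Return 1 if the two boxes overlap, else 0.

    w and h are width-1 and height-1, so x+w is the last pixel column
    the box covers (same convention as obj1w / obj1h in the text).
    """
    right1, bottom1 = x1 + w1, y1 + h1
    right2, bottom2 = x2 + w2, y2 + h2
    if (right1 >= x2 and right2 >= x1 and
            bottom1 >= y2 and bottom2 >= y1):
        return 1
    return 0

BALL_SIZE = 7
PADDLE_W = 7
PADDLE_H = 47

# Paddle at (16, 40); a ball at x=23 still touches its last column, x=24 does not.
print(check_collision(23, 50, BALL_SIZE, BALL_SIZE, 16, 40, PADDLE_W, PADDLE_H))  # 1
print(check_collision(24, 50, BALL_SIZE, BALL_SIZE, 16, 40, PADDLE_W, PADDLE_H))  # 0
```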

.

Placing Sprites

Every frame I DMA the OAM buffer. Then I clear it with Clear_OAM and then rebuild it by writing to either OAM_Spr or OAM_Meta_Spr. The metasprites were made with SPEZ, and exported to the Sprites/metasprites.asm file. It’s a list of all the sprites needed to make a metasprite.

The OAM_Meta_Spr function works like this.

Copy the x position to spr_x, the y position to spr_y, and then load A and X with the address of the metasprite data, and call our function. Remember ^ is for bank number. Like this.

lda paddle1_x
sta spr_x
lda paddle1_y
sta spr_y
A16
lda #.loword(Meta_00) ;left paddle
ldx #^Meta_00
jsr OAM_Meta_Spr

And this will automatically put all the data in the OAM_BUFFER at the correct x and y positions. It also adjusts the high table bit shifting and keeps track of exactly how many sprites have been added (sprid).

*spr_x is 9 bits (uses 2 bytes). If the sprite never leaves the screen, just leave the upper byte of spr_x as zero. If you pass it more than 9 bits, it will ignore the extra bits.

The ball uses another function, OAM_Spr. This is for putting 1 sprite in the OAM BUFFER. You have to provide all the details of the sprite. Pass the x position to spr_x, the y position to spr_y, the tile # to spr_c, the attributes to spr_a, and set the size with spr_sz. spr_sz needs to be either 0 (small) or 2 (large). Then jsr OAM_Spr.

lda ball_x
sta spr_x
lda ball_y
sta spr_y
lda #2
sta spr_c
lda #SPR_PAL_5|SPR_PRIOR_2
sta spr_a
stz spr_sz ;8×8
jsr OAM_Spr

If you are placing multiple balls on screen, all using the same palette, then you would only need to change the spr_x and spr_y before calling OAM_Spr again.

Writing to the background

The print_score function always runs during v-blank. It has to, because it is writing to the VRAM. That is why we do it as soon as possible after the jsr Wait_NMI.

I’m using this Map_Offset function (in library.asm) to get the VRAM address of the numbers at the top of the screen. It wants you to load X with the tile’s x position 0-31 and load Y with the tile’s y position 0-31. If you only have pixel X and Y, just shift right (lsr a) 3 times to get the 0-255 value to 0-31 (tile) for 8×8 tiles.

Map_Offset does some bit shifting to convert that to a VRAM address. It returns A16 = the offset. You add that to the base address (our BG3 map is at $7000).

ldx #12
ldy #1
jsr Map_Offset ; returns a16 = vram address offset
clc
adc #$7000 ;layer 3 map
sta VMADDL ;$2116

Then we copy 2 values per number on screen (by writing to $2118-$2119). We are writing with the VRAM increment set to +32. That means that the second write will go below the first one.

lda #V_INC_32
sta VMAIN ;$2115

Some of these values might be hard to understand, like, why are we adding $10 to the points_L? Our tiles for numbers begins at $10.
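
As a rough model of the address math, here is a Python sketch. I am assuming Map_Offset does the usual thing for a 32×32 map, which is y × 32 + x, with one VRAM word per tile. The function below is a stand-in written for this example, not the library code.

```python
def map_offset(tile_x, tile_y):
    """Word offset of a tile in a 32x32 tilemap (tile_x, tile_y in 0-31)."""
    return (tile_y & 31) * 32 + (tile_x & 31)

BG3_MAP = 0x7000  # base VRAM address of the layer 3 map

vram_addr = BG3_MAP + map_offset(12, 1)
print(hex(vram_addr))  # 0x702c
```

This also shows why an increment of +32 lands directly below: the tile under (12, 1) is (12, 2), which is exactly 32 words further on.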

Try the demo. Press START to get it going.

Example7

.
Try to make this into a game by having controller 2 move the right paddle.

The ball is a bit slow, though. Moving 2 pixels per frame might be too fast. It would be best to use “fixed point” math, that’s a 16-bit variable for ball speed and position, where the upper byte refers to a pixel position, and the lower byte is a sub-pixel position (and speed). Then we could have 1 1/2 pixel per frame movement.
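
Here is a small Python sketch of that fixed point idea, with the upper byte as the pixel and the lower byte as the sub-pixel. The names and the starting position are made up for the illustration.

```python
SPEED = 0x0180  # 1.5 pixels per frame (0x0100 = 1 pixel)

def step(pos, speed):
    """Advance a 16-bit 8.8 fixed-point position by speed, wrapping at 16 bits."""
    return (pos + speed) & 0xFFFF

def pixel(pos):
    """The upper byte is the on-screen pixel position."""
    return pos >> 8

pos = 0x2000  # pixel 32, sub-pixel 0
pixels = []
for _ in range(4):
    pos = step(pos, SPEED)
    pixels.append(pixel(pos))
print(pixels)  # [33, 35, 36, 38]
```

On screen the ball alternates between moving 1 and 2 pixels per frame, which averages out to 1 1/2.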

I wish we had some sound effects too. Maybe a little later for that.

Controllers and NMI

SNES programming tutorial. Example 6.

https://github.com/nesdoug/SNES_06

Controller reads

There is a set of registers that can be read like NES registers. Originally, they wanted to make it easy to transition from programming NES games to programming SNES games. They even used the same addresses, $4016 and $4017 (ports 1 and 2). However, you shouldn’t read these. Instead you should turn on the auto-read feature (and also the NMI enable) from register $4200.

With auto-controller reads set, the hardware will (soon after the end of each frame) automatically read all the buttons from both controllers and then store the values at $4218-$421b.

$4218-19 port 1
$421a-1b port 2
(if a multitap for 4 player games is installed, $421c-d and $421e-f for controllers 3+4)

The button order is…
KEY_B = $8000
KEY_Y = $4000
KEY_SELECT = $2000
KEY_START = $1000
KEY_UP = $0800
KEY_DOWN = $0400
KEY_LEFT = $0200
KEY_RIGHT = $0100
KEY_A = $0080
KEY_X = $0040
KEY_L = $0020
KEY_R = $0010

And I use these constants as a bit mask (bitwise AND operation) to isolate the buttons.

The pad_poll function also does some bit twiddling to figure out which buttons have just been pressed this frame.

pad1 and pad2 variables tell you which buttons are being pressed.
pad1_new and pad2_new tell you which buttons have just been newly pressed this frame.
We need to call pad_poll each frame. How do we know that a new frame has started? That’s where the NMI comes in.
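
I haven’t reproduced pad_poll here, but the usual trick for “newly pressed” is to keep last frame’s buttons and mask them out of this frame’s buttons. A Python sketch of that idea, with poll as a made-up name:

```python
KEY_START = 0x1000
KEY_UP = 0x0800

def poll(current, previous):
    """Return (held, newly_pressed) from this frame's and last frame's buttons."""
    # A button is new if it is down now and was not down last frame.
    newly_pressed = current & ~previous & 0xFFFF
    return current, newly_pressed

# UP was already held last frame, START was just pressed.
held, new = poll(KEY_START | KEY_UP, KEY_UP)
print(hex(held), hex(new))  # 0x1800 0x1000
```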

NMI

When the screen is on, the PPU spends most of its time drawing pixels to the screen, one horizontal line at a time, one pixel at a time. Starting at the top, it goes left to right and draws a line. Then it jumps down and draws the next line. Etc, etc, until the frame is completed.

While it is drawing pixels to the screen, the PPU is busy, and you can’t send new data to the VRAM. You can’t send new data to the OAM or the CGRAM (palette) either. After the screen is done drawing, the PPU rests in a vertical blank period for a little bit. During this v-blank period, you CAN access the PPU registers.

If you turn on NMI interrupts, when the PPU is done drawing to the screen… nearly at the very beginning of v-blank, the PPU sends an NMI signal to the CPU. This happens every frame, which is 60 times a second (50 in Europe). That signal causes the CPU to pause and jump to the NMI vector (an address it finds at $00ffea in the ROM). We have it set to jump to the label called NMI: which is located in the init.asm file. (note, the NMI code needs to be in the 00 bank).

The NMI code is just this.

bit $4210 ; it is required to read this register during NMI
inc in_nmi
rti

(many games have much more elaborate NMI code than this)

Our main code is waiting for the in_nmi variable to change. When it changes we know that we are in the v-blank period. Now is a good time to write to PPU registers or send data to the VRAM. But, also, we are using this to time our game loop.

wait_nmi: waits until we are in v-blank. We call this at the top of the game loop. Notice that I put a WAI (wait for interrupt) instruction here. If you neglected to turn NMI interrupts on, this would crash the game, as it waits forever for a signal that never comes. IRQ interrupts could also trip the WAI instruction, which is why I also wait for the in_nmi variable to change to be sure. You could delete the WAI instruction, if you would like*. Some games use this waiting loop to spin a random number generator. You could do that as well…. like adding a large prime number over and over, or just ticking a variable +1 over and over.

* someone told me that WAI could make an emulator run less laggy, as it would have less to do each frame. It also saves electricity, because the CPU uses less power while it waits. You decide if you need it or not.

Soon after the wait_nmi function runs, we run our DMA to the OAM (copy our sprite buffer to the sprite RAM). This needs to be done during v-blank, which is why we do it first. Then, we run our pad_poll to read new button presses. Then we enter the game logic. Here’s an example of what we are doing to move the sprite.

Our sprite is composed of 3 sprites that move together (16×16 each). Each time we press the right button, we need to increase the X value of each sprite. Left, we decrease the X values. Each sprite uses 4 bytes, so each sprite X value is 4 bytes apart. So we do this…

  AXY16
  lda pad1
  and #KEY_LEFT
  beq @not_left
@left:
  A8
  dec OAM_BUFFER ;decrease the X values
  dec OAM_BUFFER+4
  dec OAM_BUFFER+8
  A16
@not_left:

LDA loads the A register with pad1, which has all the button presses for controller 1. We apply a bit mask (AND) to isolate the left button. If it is zero, the button isn’t being pressed, and it will branch (BEQ) over our code. Otherwise, it will then go to the dec OAM_BUFFER lines. Dec can be 8 bit or 16 bit, depending on the size of the A register. We want 8 bit, so we A8 for that. We need the A16, to make sure we exit this bit of code with A always in 16 bit mode.

We repeat that process 3 more times for RIGHT, UP, and DOWN buttons. You see, our character moves around the screen. This code isn’t very good, though. We aren’t handling that 9th X bit.

With this code, you can move smoothly off the top and bottom of the screen, like this…

example6b

But if you try to move left off screen, it suddenly disappears. Like this below…

example6d

example6f

That’s why we need that 9th X bit in the high table. Here’s what it looks like at X=248, with the 9th bit = 0.

example6c

And below shows what the same X=248, with the high table (9th bit) = 1

example6e

We didn’t do that in this example, but I worked up some code that can manage this. If you look in the next example files, in the library.asm file, you will see the functions called OAM_Spr and OAM_Meta_Spr. The spr_x variable is 9 bit so that we can move a sprite object smoothly off the left side without suddenly disappearing.

https://github.com/nesdoug/SNES_07/blob/master/library.asm

To use OAM_Spr, first we set the variables spr_x, spr_y, spr_c (tile), spr_a (attributes), and spr_sz (size), then call this function, and it will load the OAM buffers with the appropriate values (and also handle that awkward high table).

To use OAM_Meta_Spr, we first set spr_x, and spr_y, and then load the A and X registers with the address of the metasprite data. (A16 with absolute address, and X with the bank #). The metasprite data is generated by SPEZ and it is a list of each sprite in the multi-sprite object (5 bytes per sprite). This function will automatically calculate the relative position of each sprite, and write them in the OAM buffers.

.

Sprites

SNES programming tutorial. Example 5.

https://github.com/nesdoug/SNES_05

Sprites are the graphic objects that can move around the screen. Nearly all characters are made of sprites… Mario, Link, Megaman, etc. The OAM RAM controls how each sprite appears.

Mario

You will notice that Mario is made of 2 16×16 sprites. It is common to use more than 1 sprite for a character. Rex is also made of 2 16×16 sprites, with the lower sprite several pixels to the right of the top one. You can also layer sprites on top of each other, but with 15 colors to choose from, you shouldn’t have to.

You could increase the large sprite size to 32×32, but that would end up wasting more VRAM space on blank spaces. 8×8 and 16×16 are more common. I call it a “metasprite” when it is a collection of multiple sprites to make up 1 character. The SPEZ sprite editor I wrote saves these as tables of numbers HOWEVER I didn’t do that this time. This time I manually typed the sprite values in main.asm at the Sprites: label. In SPEZ, I saved the tiles and palette, which we .incbin at the bottom of main.asm.

https://github.com/nesdoug/SPEZ

You may prefer to draw your sprites in another tool, and import those images into SPEZ.

spez

OAM

The official docs call sprites “objects”. You need to write data to the OAM RAM to get them to show up on screen.

There are 2 tables in the OAM, and you need to write both of them, usually a DMA during v-blank or forced blank.

Low Table

The low table (512 bytes) is divided into 4 bytes per sprite, with sprite #0 using bytes 0,1,2,3 and sprite #1 using bytes 4,5,6,7, etc… up to sprite #127. 4 x 128 = 512 bytes.
Those bytes are, in this order…

byte 1 = X position

byte 2 = Y position

byte 3 = Tile #

byte 4 = attributes
.
X and Y are screen relative, in pixels (for the top left of the sprite).

Attributes

vhoopppN
v vertical flip
h horizontal flip
oo priority
ppp palette
N 1st or 2nd set of tiles (you can have up to 512 tiles for sprites).
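
To make the bit layout concrete, here is a Python sketch that packs an attribute byte from its parts. sprite_attr is a name invented for this example. In the asm you would normally just OR constants together.

```python
def sprite_attr(palette, priority, hflip=False, vflip=False, second_table=False):
    """Pack the OAM attribute byte in the order vhoopppN."""
    if not (0 <= palette <= 7 and 0 <= priority <= 3):
        raise ValueError("palette is 0-7, priority is 0-3")
    return ((int(vflip) << 7) | (int(hflip) << 6) |
            (priority << 4) | (palette << 1) | int(second_table))

print(hex(sprite_attr(palette=5, priority=2)))  # 0x2a
```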

The High Table

There are 32 bytes in the high table for 128 sprites. That’s 2 bits per sprite, and it can be very tedious to manage. Lots of bit shifting. The bits are

sx (s upper bit, x lower bit)
s= size (small or large)
x = 9th bit for x

The extra X bit is so you can smoothly move a sprite off the left side of the screen. With that bit set and the regular X set to $ff, that would be like -1. Whereas, without the extra X bit, $ff would be the far right of the screen, with only 1 pixel wide showing.
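
One way to picture this is to treat the 9 bits as a signed number. This Python sketch (screen_x is a made-up helper) converts the low table X and the high table bit into a screen position:

```python
def screen_x(x_low, x_bit9):
    """Turn the 8-bit X plus the high-table bit into a signed screen position."""
    x = (x_bit9 << 8) | x_low      # 9-bit value, 0-511
    return x - 512 if x >= 256 else x

print(screen_x(0xFF, 1))  # -1, one pixel off the left edge
print(screen_x(0xFF, 0))  # 255, far right of the screen
```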

How are the 2 bits put together?
Let’s say,
Sprite 0 = aa
Sprite 1 = bb
Sprite 2 = cc
Sprite 3 = dd
Then the first byte of the high table is
ddccbbaa
or (dd << 6) + (cc << 4) + (bb << 2) + aa
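
Here is that packing as a Python sketch, building all 32 bytes from a list of 128 (size, 9th X bit) pairs. The function name and the input format are assumptions for the illustration. The library code has its own way of doing this.

```python
def pack_high_table(sprites):
    """sprites: list of 128 (size, x_bit9) pairs, each 0 or 1. Returns 32 bytes."""
    table = bytearray(32)
    for i, (size, x9) in enumerate(sprites):
        two_bits = (size << 1) | x9                 # the "sx" pair
        table[i // 4] |= two_bits << ((i % 4) * 2)  # sprite 0 lands in the low 2 bits
    return bytes(table)

sprites = [(0, 0)] * 128
sprites[0] = (1, 0)  # large, on screen
sprites[3] = (1, 1)  # large, 9th X bit set
print(hex(pack_high_table(sprites)[0]))  # 0xc2
```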

Palettes

Sprites use the second half of the CGRAM (palette). It is 15 colors + transparency for each palette. Sprite palette #0 uses indexes 128-143. Sprite palette #1 uses indexes 144-159. And so forth.

Priorities

I like to set sprite priority to 2. That would be in front of bg layers (but behind layer 3 if it’s set as super-priority in front of everything). Higher sprite priority would be in front of sprites with lower priority.

Besides priorities…Low index sprites will go in front of higher index ones. Sprite #0 would be in front of Sprite #1. Sprite #1 would be in front of Sprite #2. Sprite #2 would be in front of Sprite #3. Etc.

There is a limit to how many sprites can fit on a horizontal line: 32 sprites. The 33rd one disappears. Using larger sprites doesn’t improve that. Internally it splits sprites up into 8×1 slivers, and only 34 slivers can fit on a line. Because of this, you could shuffle the sprites every frame. That’s a lot of sprites, so I see most games just ignore this problem, and try not to put too many sprites on each line. Space shooter games (lots of sprites on screen at once) re-order the sprites in the OAM manually every frame. Some kind of shuffling algorithm, to make sure no bullets hit you that you couldn’t see.

Caution. Don’t put sprites at X position 0x100 (with the 9th bit 1 and the regular X at 00). They will be off screen, but will somehow count towards the 32 sprites per line limit.

Clearing Sprites

If you leave the OAM zeroed, it will display sprites at X=0, Y=0, Tile=0, palette=0… and the top left of the screen would have 128 sprites on top of each other. If you just want ALL sprites off screen, you could just turn them off from the main screen ($212c). But to put an individual sprite off screen, you should put its Y value at 224 (assuming screens are left to the default 224 pixel height). This would put 8×8, 16×16, and 32×32 sprites off screen, but 64×64 sprites would wrap around to the top of the screen (224 + 64 is past 256)… so maybe don’t use 64×64 sprites (or make sure to set the sprite’s size to small before pushing it off screen).

Let’s go over the code.

Code

We need to change a few settings first.
$2101 sets the sprite size and the location of the sprite tiles. Its bits are
sssnnbbb
sss = size mode*
nn = offset for the 2nd set of sprite tiles. Leave it at zero, the standard.
bbb = base address for the sprite tiles.
Again, the upper bit of bbb is useless. So, each step of bbb moves the base address by $2000.

* size modes are

000 = 8×8 and 16×16 sprites
001 = 8×8 and 32×32 sprites
010 = 8×8 and 64×64 sprites
011 = 16×16 and 32×32 sprites
100 = 16×16 and 64×64 sprites
101 = 32×32 and 64×64 sprites
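As a sanity check on the bit layout, here is a Python sketch that builds the $2101 value from its parts. The function name obsel and its parameters are made up for illustration. It assumes the base address is given as a VRAM address in steps of $2000, and that only $0000-$6000 are usable since the upper bit of bbb is useless.

```python
def obsel(size_mode, base_addr, gap=0):
    """Build the sssnnbbb byte for $2101."""
    if not 0 <= size_mode <= 5:
        raise ValueError("size mode must be 0-5")
    if not 0 <= gap <= 3:
        raise ValueError("nn is only 2 bits")
    if base_addr % 0x2000 or not 0 <= base_addr <= 0x6000:
        raise ValueError("base must be $0000, $2000, $4000 or $6000")
    return (size_mode << 5) | (gap << 3) | (base_addr // 0x2000)


print(obsel(0, 0x4000))       # 2: 8x8 and 16x16 sprites, tiles at $4000
print(hex(obsel(3, 0x6000)))  # 0x63: 16x16 and 32x32 sprites, tiles at $6000
```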


lda #2
sta OBSEL ; $2101 sprite tiles at VRAM $4000, sizes are 8×8 and 16×16

And we need to make sure sprites show up on the main screen.

lda #$10 ; sprites active
sta TM ; $212c main screen

https://wiki.superfamicom.org/sprites

https://wiki.superfamicom.org/registers

From here on out, I am going to use BUFFERS. Buffers are temporary locations in local RAM that will be copied (DMA) each frame to the actual memory (the OAM RAM) during the v-blank period. Well, next time we will do that. In this example, we are doing it once during forced blank ($2100 bit 7 set), which is also fine.

We are using a block move macro to copy from the ROM to the BUFFER.

BLOCK_MOVE 12, Sprites, OAM_BUFFER

The macro sets up an MVN operation, which copies a block of data (here 12 bytes, 3 sprites × 4 bytes each) from the ROM to the RAM. See macros.asm for details.

And I’m writing just one byte to the high table. We only need 3 sprites in this example, so we will only need 2×3=6 bits, setting the size of each to large (16×16).

lda #$6A ; = %01 101010, sprites 0-2 = 10 (large, 9th X bit clear)
sta OAM_BUFFER2

Now I will DMA both tables at once. A DMA to the OAM looks like this. (Sorry, I changed the code a bit, but this is essentially the same thing.) jsr DMA_OAM will do this…

; DMA from OAM_BUFFER to the OAM RAM
ldx #$0000
stx $2102 ;OAM address

stz $4300 ; transfer mode 0 = 1 register write once
lda #4 ;$2104 oam data
sta $4301 ; destination, oam data
ldx #.loword(OAM_BUFFER)
stx $4302 ; source
lda #^OAM_BUFFER
sta $4304 ; bank
ldx #544
stx $4305 ; length
lda #1
sta $420b ; start dma, channel 0

That’s 544 bytes (512 for the low table + 32 for the high table) being copied to $2104 (the OAM DATA register) after we zeroed the OAM address registers ($2102-3). I recommend always writing to the OAM with a 544 byte DMA, once per frame (during v-blank).

The data we are transferring looks like this…

Sprites:
;4 bytes per sprite = x, y, tile #, attribute
.byte $80, $80, $00, SPR_PRIOR_2
.byte $80, $90, $20, SPR_PRIOR_2
.byte $7c, $90, $22, SPR_PRIOR_2

The top left sprite is at x = $80 and y = $80. We are using tiles $00, $20, and $22, and all of the sprites use palette #0 and priority #2 (above BG layers).
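To see how the pieces fit into the full 544 byte buffer, here is a rough Python sketch that builds one from a sprite list. This is not how the example project does it (that uses the BLOCK_MOVE macro and a single store to the high table). The names build_oam and HIDDEN_Y are made up, and SPR_PRIOR_2 = $20 is an assumption based on the priority bits sitting in bits 5-4 of the attribute byte. Unused sprites are parked at Y = 224, as described in Clearing Sprites.

```python
SPR_PRIOR_2 = 0x20   # assumption: priority 2 in bits 5-4 of the attribute byte
HIDDEN_Y = 224       # just below a 224 pixel high screen


def build_oam(sprites):
    """sprites: list of (x, y, tile, attr, large) with x in 0..255."""
    low = bytearray()      # 128 sprites x 4 bytes = 512 bytes
    high = bytearray(32)   # 128 sprites x 2 bits  = 32 bytes
    for i in range(128):
        if i < len(sprites):
            x, y, tile, attr, large = sprites[i]
        else:
            x, y, tile, attr, large = 0, HIDDEN_Y, 0, 0, 0
        low += bytes([x, y, tile, attr])
        high[i // 4] |= (large << 1) << ((i % 4) * 2)   # 9th X bit left at 0
    return bytes(low + high)


oam = build_oam([
    (0x80, 0x80, 0x00, SPR_PRIOR_2, 1),
    (0x80, 0x90, 0x20, SPR_PRIOR_2, 1),
    (0x7C, 0x90, 0x22, SPR_PRIOR_2, 1),
])
print(len(oam))        # 544
print(hex(oam[512]))   # 0x2a
```

The first high table byte comes out as $2A here instead of the $6A used in the example, because this sketch leaves sprite 3's bits at 00 and hides it with its Y position.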

And this is what it looks like.

example5

Try drawing your own sprite, and getting it to show up on screen.


Layers / Priority

SNES programming tutorial. Example 4.

https://github.com/nesdoug/SNES_04

Last time we created a background (tiles and map) and got it to show up on screen. This time we are going to add more layers.

In Mode 1, we get 3 background layers. Layers 1 and 2 are 4bpp (16 color) and Layer 3 is 2bpp (4 color). I made the graphics in GIMP and resized them to 256×256 or less (the moon was 112×128, the text was 256×32).

Now I imported these into M1TE. The moon was imported while Layer 1 (4bpp) was active. The text was imported while Layer 3 (2bpp) was active. I just drew some moon tiles in blue on Layer 2 (also 4bpp).

bg1 layer 1

bg2 layer 2

bg3 layer 3

Now let’s talk about how the layers work. Normally, layer 1 is on top, then layer 2 is next, and layer 3 on the bottom. Like this.

Example4

But each tile on the map has a PRIORITY setting. Normally, this determines whether the BG tile goes behind a sprite or in front of it, depending on the sprite’s own priority. In mode 1, the layers go like this…

(top)
Sprites with priority 3
BG1 tiles with priority 1
BG2 tiles with priority 1
Sprites with priority 2
BG1 tiles with priority 0
BG2 tiles with priority 0
Sprites with priority 1
BG3 tiles with priority 1
Sprites with priority 0
BG3 tiles with priority 0
(bottom)

Anywhere there is color #0 on a tile, it will be transparent on that layer. Behind all the layers (if there isn’t a solid pixel on any layer) it will be filled with color #0.

Layers

However, if bit 3 of $2105 is set, BG3 will be in FRONT of everything (if the priority bit is set on the map). In M1TE, you can set all the priority bits for the whole map by checking a box.
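The ordering can be written out as a little Python model, just to play with it. This is only a sketch of the list above, with the one change that BG3 tiles with priority 1 move to the very top when bit 3 of $2105 is set. The names mode1_order and visible_layer are made up.

```python
MODE1_ORDER = [
    "Sprites prio 3", "BG1 prio 1", "BG2 prio 1", "Sprites prio 2",
    "BG1 prio 0", "BG2 prio 0", "Sprites prio 1", "BG3 prio 1",
    "Sprites prio 0", "BG3 prio 0",
]


def mode1_order(bg3_top=False):
    """Front-to-back layer order in Mode 1."""
    order = list(MODE1_ORDER)
    if bg3_top:                       # $2105 bit 3 set
        order.remove("BG3 prio 1")
        order.insert(0, "BG3 prio 1")
    return order


def visible_layer(opaque, bg3_top=False):
    """Which layer wins at a pixel, given the layers that are solid there."""
    for layer in mode1_order(bg3_top):
        if layer in opaque:
            return layer
    return "background color"


pixel = {"BG1 prio 0", "BG3 prio 1", "Sprites prio 2"}
print(visible_layer(pixel))                # Sprites prio 2
print(visible_layer(pixel, bg3_top=True))  # BG3 prio 1
```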

Priority

I did that for BG3. The only difference between the picture above and the one below is that bit 3 of $2105 is set. (see these links for reference)

https://wiki.superfamicom.org/registers

https://wiki.superfamicom.org/backgrounds

Example4b

With $2105 d3 set and priority bits in BG3 map set, they appear on top. This can be very useful for text boxes that appear in front of everything, or a HUD / Scoreboard that you can always see. Because BG3 is only 2bpp, it won’t be very colorful, so it will be ideal for text messages.

The code for putting all this together is very similar to the previous page. The 2bpp tiles were loaded to $3000 in the VRAM with a DMA.

ldx #$3000
stx VMADDL ; set an address in the vram of $3000

lda #1
sta $4300 ; transfer mode, 2 registers 1 write
lda #$18 ; $2118
sta $4301 ; destination, vram data
ldx #.loword(Tiles2)
stx $4302 ; source
lda #^Tiles2
sta $4304 ; bank
ldx #(End_Tiles2-Tiles2)
stx $4305 ; length
lda #1
sta $420b ; start dma, channel 0

The maps were loaded similarly with DMAs, the BG2 map to VRAM address $6800 and the BG3 map to $7000. We need to tell the PPU where our tiles and maps are.

stz BG12NBA ; $210b BG 1 and 2 TILES at $0000

lda #$03
sta BG34NBA ; $210c put BG3 TILES at VRAM address $3000

lda #$60 ; bg1 map at VRAM address $6000
sta BG1SC ; $2107

lda #$68 ; bg2 map at VRAM address $6800
sta BG2SC ; $2108

lda #$70 ; bg3 map at VRAM address $7000
sta BG3SC ; $2109
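If you want to double check those register values, here is a Python sketch of the arithmetic. It assumes the usual register layouts: the map registers ($2107-$2109) hold the map address in steps of $400 in their upper 6 bits and the map size in the lower 2 bits, and the tile registers ($210b/$210c) hold one address per nibble in steps of $1000. The function names bgsc and nba are made up.

```python
def bgsc(map_addr, size=0):
    """Value for $2107-$2109: map base address plus map size bits."""
    if map_addr % 0x400 or not 0 <= map_addr < 0x8000:
        raise ValueError("map address must be a multiple of $400")
    if not 0 <= size <= 3:
        raise ValueError("size is 2 bits")
    return ((map_addr // 0x400) << 2) | size


def nba(low_bg_addr, high_bg_addr=0):
    """Value for $210b (BG1 low nibble, BG2 high) or $210c (BG3 low, BG4 high)."""
    for addr in (low_bg_addr, high_bg_addr):
        if addr % 0x1000 or not 0 <= addr < 0x8000:
            raise ValueError("tile address must be a multiple of $1000")
    return ((high_bg_addr // 0x1000) << 4) | (low_bg_addr // 0x1000)


print(hex(bgsc(0x6000)), hex(bgsc(0x6800)), hex(bgsc(0x7000)))  # 0x60 0x68 0x70
print(nba(0x0000, 0x0000), nba(0x3000))                         # 0 3
```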

We need to make sure that all 3 layers are active on the main screen.

lda #BG_ALL_ON ;$0f
sta TM ; $212c

and that will give us this picture (same as above)

Example4

with BG3 behind everything.

When we flip bit 3… 00001000 at $2105, the BG3 tiles will show up on top (if their priority bits are set on the map). Note BG3_TOP is defined as 8.

lda #1|BG3_TOP ; mode 1, tilesize 8×8 all, layer 3 on top
sta BGMODE ; $2105

Like this.

Example4b

Each of these layers scrolls independently of the others. You would adjust them with these registers. Each one is written twice (low byte, then high byte).

$210d – BG1 Horizontal

$210e – BG1 Vertical

$210f – BG2 Horizontal

$2110 – BG2 Vertical

$2111 – BG3 Horizontal

$2112 – BG3 Vertical

(Note: not used in this example)

Maps, In Depth

All of our examples use maps set to 32×32 tiles (the screen is set to 224 pixels high, so you can’t see all the tiles at once). Each address in the map holds 2 bytes, since the VRAM is set up for 16 bits per address. This can be very confusing in a hex editor (VRAM memory viewer) that shows bytes. You will have to multiply the VRAM address by 2 to find it in the hex editor. VRAM address $6000 is going to be found at $C000 in the emulator’s memory viewer.

Each map uses $800 bytes, but only $400 addresses (32×32 = 1024 = $400). They would be arranged like…

0,1,2,3,4… 31 =  1st row / top of screen
32,33,34,35… 63 = 2nd row
64,65,66,67… 95 = 3rd row
etc. on down to 32nd row, below the bottom of the screen

So, if you go down 1 on the map, you add 32 to the address.
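Here is that address math as a short Python sketch (hypothetical helper names, for a single 32×32 map). The first function gives the VRAM address of a map entry, and the second gives where to look in a byte-based memory viewer.

```python
def tile_address(map_base, col, row):
    """VRAM (word) address of the map entry at column/row of a 32x32 map."""
    if not (0 <= col < 32 and 0 <= row < 32):
        raise ValueError("col and row must be 0-31")
    return map_base + row * 32 + col


def viewer_offset(vram_addr):
    """Byte offset of a VRAM address in a hex editor that shows bytes."""
    return vram_addr * 2


print(hex(tile_address(0x6000, 0, 1)))   # 0x6020, one row down = +32
print(hex(viewer_offset(0x6000)))        # 0xc000
```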

Larger Maps

Maps

We could have made the map 64 tiles wide (for BG1 $2107, bit 0 = 1). If the left screen is at $6000, the right screen would be at $6400. In M1TE, you could construct each 32×32 part as a separate map.

We could have made the map 64 tiles tall (for BG1 $2107, bit 1 = 1). The upper screen at $6000 and the lower screen at $6400.

Lastly, we could have made the map 64×64 (for BG1, both bits 0 and 1 set). If the first screen is at $6000, the screens would be arranged like

$6000 – $6400

$6800 – $6c00
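For the larger maps, the same lookup needs one more step, picking which 32×32 screen the tile is on. A Python sketch of that, following the arrangements described above (the function name and the wide/tall flags are made up; they stand for bits 0 and 1 of the map register):

```python
def big_map_address(map_base, col, row, wide=False, tall=False):
    """VRAM address of a map entry when the map is 64 wide and/or 64 tall."""
    if not (0 <= col < (64 if wide else 32) and 0 <= row < (64 if tall else 32)):
        raise ValueError("col/row outside the map")
    screen_x = col // 32
    screen_y = row // 32
    if wide and tall:
        screen = screen_y * 2 + screen_x   # $6000 $6400 / $6800 $6c00
    else:
        screen = screen_x + screen_y       # only one of them can be 1
    return map_base + screen * 0x400 + (row % 32) * 32 + (col % 32)


print(hex(big_map_address(0x6000, 32, 0, wide=True)))              # 0x6400
print(hex(big_map_address(0x6000, 0, 32, tall=True)))              # 0x6400
print(hex(big_map_address(0x6000, 32, 32, wide=True, tall=True)))  # 0x6c00
```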

If tiles were set to 8×8 size ($2105, called “character size”), a 64×64 map would be 512×512 pixels in size.

If tiles were set to 16×16, the same map would be 1024×1024 pixels. This should explain why we need 16 bit scrolling registers.

Tiles in the Map

So, I said that each entry in the map is 16 bits. Those bits are arranged like this…

vhopppcc cccccccc
v/h = Vertical/Horizontal flip this tile.
o = Tile priority.
ppp = Tile palette.
cc cccccccc = Tile number.
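A quick Python sketch of packing one map entry from those fields. The name map_entry is made up, and this is the kind of thing a map editor does for you, so treat it as an illustration of the bit layout only.

```python
def map_entry(tile, palette=0, priority=0, hflip=0, vflip=0):
    """Build the 16-bit vhopppcc cccccccc value for one map position."""
    if not 0 <= tile <= 0x3FF:
        raise ValueError("tile number is 10 bits")
    if not 0 <= palette <= 7:
        raise ValueError("palette is 3 bits")
    return (vflip << 15) | (hflip << 14) | (priority << 13) | (palette << 10) | tile


entry = map_entry(0x1FF, palette=2, priority=1)
print(hex(entry))                        # 0x29ff
print(entry.to_bytes(2, "little").hex()) # ff29, low byte first in the map data
```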

With 10 bits for the tile number, each tileset can theoretically be as big as 1024 tiles (for BG).

And, one more thing about palettes.

Palettes in Mode 1

Palette2

4bpp tiles use an entire row (left to right). If you set its palette to 0, it uses the top row (indexes 0-15). Palette 1, the next row (indexes 16-31), and so forth down to the 8th row (palette 7). That’s indexes 0 – 127 for background. Sprites would use the indexes 128 – 255 similarly. Sprites also use 4bpp tiles and 16 colors per tile.

2bpp tiles (BG3) share the top 2 rows. Each palette only uses 4 colors, so palette 0 uses indexes 0-3, palette 1 uses indexes 4-7, palette 2 uses indexes 8-11, and palette 3 uses indexes 12-15… all in the top row. Palettes 4-7 similarly would use the next row. Every 0th color in each palette would be transparent. I usually reserve the top row for BG3 and the other 7 rows for BG1 and BG2.
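The palette index math for Mode 1 boils down to one line. Here it is as a Python sketch (the function name cgram_index and its arguments are made up for illustration):

```python
def cgram_index(palette, color, bpp=4, sprite=False):
    """CGRAM index of a color within a palette (Mode 1)."""
    if bpp not in (2, 4) or (sprite and bpp != 4):
        raise ValueError("BG tiles are 2bpp or 4bpp here, sprites are 4bpp")
    colors = 1 << bpp                      # 4 or 16 colors per palette
    if not (0 <= palette <= 7 and 0 <= color < colors):
        raise ValueError("palette is 0-7, color must fit in the palette")
    base = 128 if sprite else 0            # sprites use the second half
    return base + palette * colors + color


print(cgram_index(1, 0))               # 16, start of the 2nd row
print(cgram_index(3, 3, bpp=2))        # 15, last color in the top row
print(cgram_index(1, 0, sprite=True))  # 144
```

This also shows the overlap to watch for: 2bpp palettes 0-3 and 4bpp palette 0 land on the same 16 indexes, which is why reserving the top row for BG3 keeps things simple.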

Behind all the layers, the universal background color shows (index 0 of the palette), wherever there are transparent pixels. This is true for every layer. The black that fills most of these pictures is the background color showing through.
