Pong. Sprite collisions.

SNES programming tutorial. Example 7.



I made a simple Pong demo to show sprite collisions.

Well… I was trying to keep it simple, but I decided to use some of the more complicated code I have previously written. Copied to the library.asm file from some of the EasySNES files. oam_spr (copies one sprite to the buffer), oam_meta_spr (copies multiple sprites to the buffer), oam_clear (clears the buffer), map_offset (gets an address from a specific x/y coordinate in a map). I did change the return from these functions from RTL to RTS, because all of our code is in the same bank.

Link if you are interested.


NOTE: I will probably change the metasprite code. It is very slow.

check_collision is new. I will discuss that a bit later.

Let’s talk about the process of making this. I made a circle gradient in GIMP for the background, and converted to indexed 4 color (with dithering).


We could have used Mode 0 for that, huh? But I stuck with Mode 1 anyway. Saved as uncompressed PNG. Converted to .chr .map .pal with superfamiconv. Loaded the background in M1TE (my tile/map editor).


Then I drew some numbers for BG3, and filled a little on the top and bottom.


Clicked the priority checkbox for this map.


Saved all the maps and tiles and palette.

Now I opened SPEZ (my sprite editor) and drew some simple box shapes for the ball and paddle. Saved them as metasprites.asm and saved their tiles (chr) and palette.


Everything is .incbin -ed in the main.asm file. We are loading everything just like the previous examples, with DMAs to the VRAM. One difference is that I’m turning everything on for the main screen.

lda #ALL_ON_SCREEN ; $1f
sta main_screen ; $212c

We didn’t really need BG2, since it’s blank. Oh well.

You might notice that I wrote a macro for DMAs to the VRAM. This made the code a little easier to read and write. Let’s look at an example…

DMA_VRAM $700, Map1, $6000

This is the DMA_VRAM macro definition…

.macro DMA_VRAM length, src_addr, dst_addr
;dst is address in the VRAM
;a should be 8 bit, xy should be 16 bit
ldx #dst_addr
stx vram_addr

lda #1
sta $4300 ; transfer mode, 2 registers 1 write
; $2118 and $2119 are a pair Low/High
lda #$18 ; $2118
sta $4301 ; destination, vram data
ldx #.loword(src_addr)
stx $4302 ; source
lda #^src_addr
sta $4304 ; bank
ldx #length
stx $4305 ; length
lda #1
sta $420b ; start dma, channel 0

So where it says length, the macro will insert the $700 bytes (not $800, because the screen is only 224 pixels high, so I’m not filling the entire 256 pixel high map). Where it says src_addr, it replaces it with Map1. Where it says dst_addr, it replaces it with VRAM address $6000. Doesn’t this look nicer though?

DMA_VRAM $700, Map1, $6000

Simple. Elegant. Easy to read. Macros are your friends.
Everything between InfiniteLoop and line 426 jmp InfiniteLoop is the game loop. Every frame we wait till v-blank. Copy the OAM_BUFFER to the OAM. Print the score to the top of the screen. Read the controllers. Move the paddles if up or down are pressed.

  lda pad1
  and #KEY_UP
  beq @not_up

  lda paddle1_y
  cmp #$20 ;max up
  beq @not_up ;too far up
  bcc @not_up

  dec paddle1_y
  dec paddle1_y

  dec paddle2_y
  dec paddle2_y


This code is moving both paddles, because this is just example code. You could modify it, so that controller2 moves the paddle on the right. Copy this whole thing, and replace pad1 with pad2, and only move paddle2. Also change the label names, so you don’t have duplicates.

We are only moving the ball while it is “active”. Press START to make it active, and choose a random direction to go (based on a frame counter).

lda #1
sta ball_active

ball_x_speed and ball_y_speed are the directions of the ball. Either 1 or -1 ($ff).

If the ball is active, it moves up/down until it reaches the ceiling or floor.

;bounce off ceilings
cmp #$20
bcs @above20

lda #1
sta ball_y_speed

;bounce off floor
lda ball_y
cmp #$c7
bcc @ball_done

lda #$ff ; -1
sta ball_y_speed

Sprite Collisions

It moves left/right until it reaches the end of the room. But we want it to bounce off the paddles, so we need to check collisions with hitboxes. I wrote this a long time ago (modified slightly). It’s the check_collision function in the library.asm file.

So we need the dimensions and location of 2 boxes. That’s 8 numbers, that I copy to these variables…
obj1x, obj1w, obj1y, obj1h
obj2x, obj2w, obj2y, obj2h
x = left side of sprite object
w = width (minus 1)
y = top side of the sprite object
h = height (minus 1)

I defined some of these with constants at the top of main.asm


Of course, the x and y values are changing. Those are defined as variables in the zero page (direct page).

paddle1_x, paddle1_y
paddle2_x, paddle2_y,
ball_x, ball_y

I copy these to the obj1 obj2 stuff, and then call check_collision, which sets the “collision” variable to 0 or 1. If collision is true, we bounce the ball.

I’m also drawing the sprites each frame. That uses the oam_meta_spr function to read from a metasprite data set in metasprites.asm.

I’m using a similar approach as the collision code. I’m copying a bunch of values to the spr_h*, spr_x, spr_y, and then loading A and X with the address of the metasprite data…
lda #.loword(Meta_00) ;left paddle
ldx #^Meta_00
jsr oam_meta_spr

And this will automatically put all the data in the OAM_BUFFER at the correct x and y positions. It also adjusts the high table bit shifting and keeps track of exactly how many sprites have been added (sprid).

*spr_h is just that extra 9th X bit. It should be 0 or 1. We are keeping it 0 for all these examples, for simplicity. This function is a bit complex, but don’t worry about it right now.

Writing to the background

The print_score function always runs during v-blank. It has to, because it is writing to the VRAM. I’m using this map_offset function (in library.asm). It wants you to load X with the tile’s x position 0-31 and load Y with the tile’s y position 0-31. If you only have pixel X and Y, just shift right (lsr a) 3 times to get the 0-255 value to 0-31 (tile) for 8×8 tiles.

map_offset does some bit shifting to convert that to a VRAM address. It returns A16 = the offset. You add that to the base address (our BG3 map is at $7000).

ldx #12
ldy #1
jsr map_offset ; returns a16 = vram address offset
adc #$7000 ;layer 3 map
sta vram_addr

and then copying 2 values per number on screen. We are writing with the VRAM increment set to +32. That means that the second write will go below the first one.

lda #V_INC_32
sta vram_inc ;$2115

Some of these values might be hard to understand, like, why are we adding $10 to the points_L? Our tiles for numbers begins at $10. 0 is at $10. 1 is at $11, etc. Anyway, here’s our finished program. Press START to get it going.


Try to make this into a game by having controller 2 to move the right paddle.

The ball is a bit slow, though. Moving 2 pixels per frame might be too fast. It would be best to use “fixed point” math, that’s a 16-bit variable for ball speed and position, where the upper byte refers to a pixel position, and the lower byte is a sub-pixel position (and speed). Then we could have 1 1/2 pixel per frame movement.

I wish we had some sound effects too. Maybe a little later for that.





Controllers and NMI

SNES programming tutorial. Example 6.


Controller reads

There is a set of registers that can be read like NES registers. Originally, they wanted to make it easy to transition from programming NES games to programming SNES games. They even used the same number $4016 and $4017n (ports 1 and 2). However, you shouldn’t read these. Instead you should turn on the auto-read feature… and also the NMI enable from register $4200.

With auto-controller reads set, the CPU will interrupt itself soon after the end of each frame and read all the buttons from both controllers and then store the values at $4218-$421b.

$4218-19 port 1
$421a-1b port 2
(if a multitap for 4 player games installed, 421c-d and 421e-f for controllers 3+4)

The button order is…
KEY_B = $8000
KEY_Y = $4000
KEY_SELECT = $2000
KEY_START = $1000
KEY_UP = $0800
KEY_DOWN = $0400
KEY_LEFT = $0200
KEY_RIGHT = $0100
KEY_A = $0080
KEY_X = $0040
KEY_L = $0020
KEY_R = $0010

And I use these constants as a bit mask (AND operation) to isolate the buttons.

The pad_poll function also does some bit twiddling to figure out which buttons have just been pressed this frame.

pad1 and pad2 are any button that is down, even if you been holding it down.
pad1_new and pad2_new are buttons that have just been newly pressed this frame.
We need call pad_poll each frame. How do we know that a new frame has started. That’s where the NMI comes in.


When the screen is on, the PPU spends most of it’s time drawing pixels to the screen, one line at a time. Starting at the top, it goes left to right and draw a line. Then it jumps to the left and draws the next line.

It does this so fast you can’t see it. But, since the PPU is busy, you can’t send new data to the VRAM. You can’t send new data to many of the PPU registers, such as the OAM. But when the screen is done drawing, the PPU rest in a vertical blank period for a little bit. During this v-blank period, you CAN access the PPU registers.

If you turn on NMI interrupts, when the PPU is done drawing to the screen… nearly at the very beginning of v-blank, the PPU sends an NMI signal. This happens every frame, which is 60 times a second (50 in Europe). That signal causes the CPU to pause and jump to the NMI vector (an address stored at $00ffea in the ROM). We have it set to jump to NMI: which is located in the init.asm file. (note, the NMI code needs to be in the 00 bank, or it’s mirror, the $80 bank).

The NMI code is just this.

lda $4210 ; it is required to read this register
; in the NMI handler, don’t ask me why.
inc in_nmi

(many game have much more elaborate NMI code, btw).

And our code is waiting for the in_nmi variable to change. When it changes we know that we are in the v-blank period. Now is a good time to write to PPU registers or send data to the VRAM. But, also, we are using this to time our game loop.

wait_nmi: waits until we are in v-blank. We call this at the top of the game loop. Notice that I put a WAI (wait for interrupt) instruction here. If you neglected to turn NMI interrupts on, this would crash the game, as it waits forever for a signal that never comes. IRQ interrupts could also trip the WAI instruction, which is why I also wait for the in_nmi variable to change to be sure. You could delete the WAI instruction, if you would like*. Some games use this waiting loop to spin a random number generator. You could do that as well…. like adding a large prime number over and over, or just ticking a variable +1 over and over.

* someone told me that WAI could make an emulator run less laggy, as it would have less to do each frame. It also saves electricity, because the CPU uses less while it waits. You decide if you need it or not.

Soon after the wait_nmi function runs, we run our DMA to the OAM (copy our sprite buffer to the sprite RAM). This needs to be done during v-blank, which is why we do it first. Then, we run our pad_poll to read new button presses. Then we enter the game logic. Here’s an example of what we are doing to move the sprite.

Our sprite is composed of 3 sprites that move together (16×16 each). Each time we press the right button, we need to increase the X value of each sprite. Left, we decrease the X values. Each sprite uses 4 bytes, so each sprite X value is 4 bytes apart. So we do this…

  lda pad1
  and #KEY_LEFT
  beq @not_left
  dec OAM_BUFFER ;decrease the X values
  dec OAM_BUFFER+4
  dec OAM_BUFFER+8

LDA loads the A register with pad1, which has all the button presses for controller 1. We apply a bit mask (AND) to isolate the left button. If it is zero, the button isn’t being pressed, and it will branch (BEQ) over our code. Otherwise, it will then to the dec OAM_BUFFER lines. Dec can be 8 bit or 16 bit, depending on the size of the A register. We want 8 bit, so we A8 for that. We need the A16, to make sure we exit this bit of code with A always in 16 bit mode.

We repeat that process 3 more times for RIGHT, UP, and DOWN buttons.

And these values are copied to the OAM each frame, which moves our sprites on screen. Let’s look closely what happens when you scroll off screen. Y values 224 to 256 are off screen to the bottom, but they would eventually wrap around to the top.


But moving the sprite to the left, from x=00 to x=$ff (255)


The sprite suddenly disappears. Which is weird.


That’s why we need that 9th x bit in the high table. Here’s what it looks like at the far right without the high table x bit set.


Here’s with the high table x bit set.


So we need it for smoothly moving off the left side of the screen.

We didn’t do that in this example, but I worked up some code that can manage this, for next time.


Final note. The code here is not particularly good. Keeping the sprite variables in the OAM_BUFFER itself is not very good practice. I have seen other tutorials do this, and I did it so it would be easier to understand, but I don’t like it. It would be better to keep every sprite object as a set of variables (x and y maybe as 16-bit variables), that get copied to the OAM_BUFFER in a dynamic way, so that enemy objects can be created and destroyed without causing holes/gaps in the OAM_BUFFER. With that approach, you would clear the buffer at the start of each frame, and then draw the necessary sprites as needed every frame.

That might be slower code, but much more flexible.




SNES programming tutorial. Example 5.



Sprites are the graphic objects that can move around the screen. Nearly all characters are made of sprites… Mario, Link, Megaman, etc. The OAM RAM controls how each sprites appear.


You will notice that Mario is made of 2 16×16 sprites. It is common to use more than 1 sprite for a character. Rex is also made of 2 16×16 sprites, with the lower sprite several pixels to the right of the top one. You can also layer sprites on top of each other, but with 15 colors to choose from, you shouldn’t have to.

You could increase the large sprite size to 32×32, but that would end up wasting more VRAM space on blank spaces. 8×8 and 16×16 are more common. I call it a “metasprite” when it is a collection of multiple sprites to make up 1 character. The SPEZ sprite editor I wrote saves these as tables of numbers (metasprite/save options). And the tiles save as 4bpp CHR files.


I actually used SPEZ to draw the example sprites, but you may choose to use YY-CHR or some other tile editor. But let’s go over how sprites work.




The official docs call sprites “objects”. You need to write data to the OAM RAM to get them to show up on screen.

There are 2 tables in the OAM, and you need to write both of them, usually a DMA during v-blank or forced blank. (v-blank is the vertical blank period, the slight pause after each frame is drawn to the TV where the PPU isn’t doing anything, and can be updated for the next frame).

Low Table

The low table (512 bytes) is divided into 4 bytes per Sprite, with sprite #0 using bytes 0,1,2,3 and sprite , #1 using bytes 4,5,6,7, etc… up to sprite #127. 4 x 128 = 512 bytes.
Those bytes are, in this order…
x, y, tile #, attributes.
X and Y are screen relative, in pixels (for the top left of the sprite).


v vertical flip
h horizontal flip
oo priority
ppp palette
N 1st or 2nd set of tiles (you can have up to 512 tiles for sprites).

The High Table

There are 32 bytes in the high table for 128 sprites. That’s 2 bits per sprite, and it can be very tedious to manage. Lots of bit shifting. The bits are

sx (s upper bit, x lower bit)
s= size (small or large)
x = 9th bit for x

The extra X bit is so you can smoothly move a sprite off the left side of the screen. With that bit set and the regular X set to $ff, that would be like -1. Whereas, without the extra X bit, $ff would be the far right of the screen, with only 1 pixel wide showing.

How are the 2 bits put together?
Let’s say,
Sprite 0 = aa
Sprite 1 = bb
Sprite 2 = cc
Sprite 3 = dd
The the first byte of the high table is
or (dd << 6) + (cc << 4) + (bb << 2) + aa


Sprites use the second half of the CGRAM (palette). It is 15 colors + transparency for each palette. Sprite palette #0 uses indexes 128-143. Sprite palette #1 uses indexes 144-159. And so forth.


I like to set sprite priority to 2. That would be in front of bg layers (but behind layer 3 if it’s set as super-priority in front of everything). Higher sprite priority would be in front of sprites with lower priority.

Besides priorities…Low index sprites will go in front of higher index ones. Sprite #0 would be in front of Sprite #1. Sprite #1 would be in front of Sprite #2. Sprite #2 would be in front of Sprite #3. Etc.

There is a limit to how many sprites can fit on a horizontal line. And using larger sprites doesn’t improve that, internally it splits sprites up into 8×1 slivers, and only 32 slivers can fit on a line. The 33rd one disappears. Because of this, you could shuffle the sprites every frame. That’s a lot of sprites, so I see most games just ignore this problem, and try not to put too many sprites on each line. Space shooter games (lots of sprites on screen at once) re-order the sprites in the OAM manually every frame. Some kind of shuffling algorithm, to make sure no bullets hit you that you couldn’t see.

Caution. Don’t put sprites at X position 0x100. (with the 9th bit 1 and the regular X at 00) They will be off screen, but will somehow count towards the 32 sprites per line limit.

Clearing Sprites

If you leave the OAM zeroed, it will display sprites at X=0, Y=0, Tile=0, palette=0… and the top left of the screen would have 128 sprites on top of each other. If you just want ALL sprites off screen, you could just turn them off from the main screen ($212c). But to put an individual sprite off screen, you should put its Y value at 224 (assuming screens are left to the default 224 pixel height). This would put 8×8,16×16, and 32×32 sprites off screen, but 64×64 sprites would wrap around to the top of the screen… so it’s a good idea to also reset the sprite size bit to 0.

Let’s go over the code.


We need to change a few settings, first.
$2101 sets the sprite size and the location of the sprite tiles.
sss = size mode*
nn = offset for 2nd set of sprite tiles. leave it at zero, standard.
bbb = base address for the sprite tiles.
Again, the upper bit is useless. So, each b is a step of $2000.

* size modes are

000 = 8×8 and 16×16 sprites
001 = 8×8 and 32×32 sprites
010 = 8×8 and 64×64 sprites
011 = 16×16 and 32×32 sprites
100 = 16×16 and 64×64 sprites
101 = 32×32 and 64×64 sprites


lda #2
sta $2101 ; sprite tiles at VRAM $4000, sizes are 8×8 and 16×16

And we need to make sure sprites show up on the main screen.

lda #$10
sta $212c ; main screen



From here on out, I am going to use BUFFERS. Buffers are temporary locations in local RAM that will be copied (DMA) each frame to the actual memory (the OAM RAM)… during the v-blank period. Well, next time we will do that. In this example, we are doing it once during forced blank (2100 bit 7 set), which is also fine.

We are using a block move macro to copy from the ROM to the BUFFER.


to set up a MVN operation (to copy a block of data from the ROM to the RAM). See macros.asm for details.

And I’m writing just one byte to the high table. We only need 3 sprites in this example, so we will only need 2×3=6 bits, setting the size of each to large (16×16).

lda #$2A ;= 00101010

Now I will DMA both tables at once. A DMA to the OAM looks like this…

; DMA from OAM_BUFFER to the OAM RAM
ldx #$0000
stx oam_addr_L ;$2102 (and 2103)

stz $4300 ; transfer mode 0 = 1 register write once
lda #4 ;$2104 oam data
sta $4301 ; destination, oam data
ldx #.loword(OAM_BUFFER)
stx $4302 ; source
sta $4304 ; bank
ldx #544
stx $4305 ; length
lda #1
sta $420b ; start dma, channel 0

That’s 544 bytes being copied to the $2104 (OAM DATA register) after we zeroed the OAM address registers ($2102-3). We need to write the whole thing. I recommend always writing to the OAM with a 544 byte DMA. Even if you don’t want to set any of the high table bits, write both tables every time. The first demo I made didn’t work on a real SNES because I failed to write to the high table. DMA all 544 bytes to the OAM, and you won’t have problems.

The data we are transferring looks like this…

;4 bytes per sprite = x, y, tile #, attribute
.byte $80, $80, $00, SPR_PRIOR_2
.byte $80, $90, $20, SPR_PRIOR_2
.byte $7c, $90, $22, SPR_PRIOR_2

With the top left sprite at x = $80 and y = $80. We are using tiles 00,20,22, and all of the sprites use palette #0 and priority #2 (above BG layers).

And this is what it looks like.


Try drawing your own sprite, and getting it to show up on screen.




Layers / Priority

SNES programming tutorial. Example 4.


Last time we created a background (tiles and map) and got it to show up on screen. This time we are going to add more layers.

In Mode 1, we get 3 background layers. Layer 1 and 2 are 4bpp (16 color) and Layer 3 is 2bpp (4 color). If you make the 2bpp tiles in YY-CHR, select the 2bpp GB setting. I made the graphics in GIMP and converted to 4 color indexed mode, saved to PNG (with no compression), and converted to SNES tiles with superfamiconv (with the BPP set to 2).

See the -B 2 setting in the batch file…



Now I loaded all the maps and tiles to M1TE and quickly drew a line across layer 2, just to test it. Here’s each layer…

bg1 layer 1

bg2 layer 2

bg3 layer 3

Now let’s talk about how the layers work. Normally, layer 1 is on top, then layer 2 is next, and layer 3 on the bottom. Like this.


But each tile on the map has a PRIORITY setting. Normally, this is to determine if the BG tile will go behind a sprite on the same layer, or in front of it. In mode 1, the layers go like this…

Sprites with priority 3
BG1 tiles with priority 1
BG2 tiles with priority 1
Sprites with priority 2
BG1 tiles with priority 0
BG2 tiles with priority 0
Sprites with priority 1
BG3 tiles with priority 1
Sprites with priority 0
BG3 tiles with priority 0

However, if bit 3 of $2105 is set, BG3 will be in front of everything (if the priority bit is set on the map). In M1TE, you can set all the priority bits for the whole map by checking a box.


I did that for BG3. The only difference between the picture above and below is the bit 3 of $2105 is set. (see these links for reference)




With$2105 d3 set and priority bits in BG3 map set, they appear on top. This can be very useful for text boxes that appear in front of everything, or a HUD / Scoreboard that you can always see. Because BG3 is only 2bpp, it won’t be very colorful, so it will be ideal for text messages.

The code for putting all this together is very similar to the previous page. The 2bpp tiles were loaded to $3000 in the VRAM with a DMA.

ldx #$3000
stx vram_addr ; set an address in the vram of $3000

lda #1
sta $4300 ; transfer mode, 2 registers 1 write
lda #$18 ; $2118
sta $4301 ; destination, vram data
ldx #.loword(Tiles2)
stx $4302 ; source
lda #^Tiles2
sta $4304 ; bank
ldx #(End_Tiles2-Tiles2)
stx $4305 ; length
lda #1
sta $420b ; start dma, channel 0

and the maps were loaded similarly with DMAs. BG2 map to $6800 and BG3 map to $7000. The maps for those layers will then be loaded to those VRAM addresses.

lda #$03
sta bg34_tiles ; $210c put BG3 TILES at VRAM address $3000

lda #$60 ; bg1 map at VRAM address $6000
sta tilemap1 ; $2107

lda #$68 ; bg2 map at VRAM address $6800
sta tilemap2 ; $2108

lda #$70 ; bg3 map at VRAM address $7000
sta tilemap3 ; $2109

We need to make sure that all 3 layers are active on the main screen.

lda #BG_ALL_ON ;$0f
sta main_screen ; $212c

and that will give us this picture (same as above)


with BG3 behind everything.

When we flip bit 3… 00001000 at $2105, BG3 will show up on top (if their priority bits are set on the map). Note BG3_TOP is defined as 8.

lda #1|BG3_TOP ; mode 1, tilesize 8×8 all, layer 3 on top
sta bg_size_mode ; $2105

Like this.


Each of these layers scroll independently of each other. You would adjust them with these registers. They are write twice (low then high).

$210d – BG1 Horizontal

$210e – BG1 Vertical

$210f – BG2 Horizontal

$2110 – BG2 Vertical

$2111 – BG3 Horizontal

$2112 – BG3 Vertical

(Note: not used in this example)


Maps, In Depth

All of our examples are with maps set to 32×32 tiles. (the screen is set to 224 pixels high, so you can’t see all the tiles at once). Each address in the map uses 2 bytes, since the VRAM is set up for 16 bits per address. It can be very confusing to look at in a Hex Editor (VRAM memory viewer) that show bytes, you will have to multiply x2 the VRAM address to find it in the hex editor. VRAM address $6000 is going to be found at $C000 in the emulator’s viewer.

Each map uses $800 bytes, but only $400 addresses (32×32 = 1024 = $400). They would be arranged like…

0,1,2,3,4… 31 =  1st row / top of screen
32,33,34,35… 63 = 2nd row
64,65,66,67… 95 = 3rd row
etc. on down to 32nd row, below the bottom of the screen

So, if you go down 1 on the map, you add 32 to the address.

But what if we made the map 64 wide (for BG1 $2107, bit 0 = 1). Think of it as 2 screens left and right. The first screen works exactly the same as if in 32×32 mode, with the 33rd map location being just below the 1st one. If the left screen is at $6000, the right screen would be at $6400. If you scrolled past the right screen, it would wrap back to the left one.

Or, we could have made the map 32 wide and 64 tall (for BG1 $2107, bit 1 = 1). That would be like 2 screens on top of each other. If the top screen is at $6000, the bottom screen would be at $6400. Scrolling down below the bottom screen would wrap back to the top.

Lastly, if we made the map 64×64 (for BG1, both bits 0 and 1 set). If the first screen is at $6000, the screens would be arranged like

$6000 – $6400

$6800 – $6c00

If tiles were set to 8×8 size ($2105, called “character size”), a 64×64 map would be 512×512 pixels in size.

If tiles were set to 16×16, the same map would be 1024×1024 pixels. This should explain why we need 16 bit scrolling registers.


Tiles in the Map

So, I said that each entry in the map is 16 bits. Those bits are arranged like this…

vhopppcc cccccccc
v/h = Vertical/Horizontal flip this tile.
o = Tile priority.
ppp = Tile palette.
cc cccccccc = Tile number.

Each tileset is theoretically as big as 1024 tiles (for BG).

And, one more thing about palettes.

Palettes in Mode 1


4bpp tiles use an entire row (left to right). If you set it’s palette to 0, it uses the top row (indexes 0-15). Palette 1, the next row (indexes 15-31), and so forth down to the 8th row (palette 7). That’s indexes 0 – 127 for background. Sprites would use the indexes 128 – 255 similarly. Sprites also use 4bpp tiles and 16 colors per tile.

2bpp tiles (BG3) shares the top 2 rows. Each palette only uses 4 colors, so palette 0 uses indexes 0-3, palette 1 uses index 4-7, palette 2 uses index 8-11, and palette 3 uses index 12-15… all in the top row. Palettes 4-7 similarly would use the next row. Every 0th color in each palette would be transparent. I usually reserve the top row for BG3 and the other 7 rows for BG1 and BG2.

Behind all the layers, the universal background color shows (index 0 of the palette), wherever there are transparent pixels. This is true for every layer. The black that fills most of these pictures is the background color showing through.




SNES programming tutorial. Example 3.


So, this is the big lesson. There are about a dozen things we need to do just to see a picture on the screen. We need to set a video mode. We need to enable a layer on the main screen. We need to set the address for tiles. We need to set the address for a tilemap. We need to make tiles and a tilemap (and a palette). We need to copy all the things to the VRAM. And we need to turn screen brightness on (and end forced blank).

I picked a random picture of a moon. Open in GIMP, resize to 128×128. Resize canvas to 256 pixel wide. Change picture mode to index, 16 colors, saved (export) as PNG with no compression.

(a little later, the image looked stretched out when I got it running on my actual SNES, due to the aspect ratio of SNES pixels being about 8/7. And also the order of the palette colors seemed oddly out of order, so I started over…)

OK. GIMP. Open file, resize to 112×128 to fix the sideways stretching. Resize canvas to 256 pixel wide. Then I made a 16 color grayscale palette like #000000 #101010 #202020 etc. Then I changed color mode to indexed using the custom palette (I shouldn’t have left the “remove unused colors” checked… oh, well). This fixed the out of order palette issue. That isn’t a big deal now, but if I had to process multiple images using the same SNES palette, you should have a standard GIMP palette for all the images to use. YY-CHR can rearrange color indexes, but it becomes a real pain for 16 colors over multiple images.

Then export as PNG with no compression.


Now we need superfamiconv.


This is a great tool to convert PNG to SNES palette, tiles, and map. The image needs to be 256 wide to correctly make a map. The PNG needs to be 16 color indexed and no compression. Superfamiconv is a command line tool, but I wrote a batch file (.bat) to automate it. (Just double click on the batch file)


and it produced moon.pal, moon.chr, and moon.map.

The .chr file is in 4bpp SNES format.

Now I can load all 3 files in M1TE (my mode 1 tile editor tool).


When I loaded the .map file, I clicked a few tiles down from the top and chose “load map to selected Y”. I changed the map height to 28, and resaved the full map as moon2.map.

All these files (moon.pal, moon.chr, and moon2.map) were added to the project by adding .incbin lines at the bottom of main.asm.

Now the code, which I will go over, line by line… but first I want to figure out where we are putting things in the VRAM. This is what I have been using, and it seems to work for my current needs. This arrangement is optional. You can rearrange the VRAM any way you like.


$0000 4bpp BG tiles (768 of them)
$3000 2bpp BG tiles (512 of them)
$4000 4bpp sprite tiles (512 of them)
$6000 layer 1 map (up to 2 screens)
$6800 layer 2 map (up to 2 screens)
$7000 layer 3 map (up to 2 screens)
$7800 layer 4 map (up to 2 screens)

So we need to put the 4bpp tiles at $0000 and the layer 1 map at $6000.


; DMA from Palette_Copy to CGRAM
; see previous tutorial page for explanation
; with A8 and XY16

; DMA from Tiles to VRAM 
lda #V_INC_1 ; the value $80
sta vram_inc ; $2115 register, set the increment mode +1
; each write will go +1 the previous write address
ldx #$0000
stx vram_addr ; set an address in the vram of $0000

; now we set up the DMA
lda #1
sta $4300 ; transfer mode, 2 registers 1 write
; $2118 and $2119 are a pair Low/High
lda #$18 ; $2118
sta $4301 ; destination, vram data
ldx #.loword(Tiles)
stx $4302 ; source
lda #^Tiles
sta $4304 ; bank
ldx #(End_Tiles-Tiles)
; let the assembler calculate the size of transfer
; using 2 labels before and after our tiles.
stx $4305 ; length
lda #1
sta $420b ; start dma, channel 0

; DMA from Tilemap to VRAM 
ldx #$6000
stx vram_addr ; set an address in the vram of $6000

lda #1
sta $4300 ; transfer mode, 2 registers 1 write
; $2118 and $2119 are a pair Low/High
lda #$18 ; $2118
sta $4301 ; destination, vram data
ldx #.loword(Tilemap)
stx $4302 ; source
lda #^Tilemap
sta $4304 ; bank
ldx #$700
stx $4305 ; length
lda #1
sta $420b ; start dma, channel 0 

; a is still 8 bit.
lda #1 ; mode 1, tilesize 8x8 all
sta bg_size_mode ; $2105

stz bg12_tiles ; $210b tiles for BG 1+2 at VRAM address $0000

lda #$60 ; bg1 map at VRAM address $6000
sta tilemap1 ; $2107

lda #BG1_ON ; $01 = only bg 1 is active
sta main_screen ; $212c

lda #FULL_BRIGHT ; $0f = turn the screen on (end forced blank)
sta fb_bright ; $2100

Let’s go over these last lines. $2105 is for video mode. We want mode 1, and 8×8 tiles. Our tiles will need to be 4bpp for layers 1 and 2. (we are only using layer 1). The bits of $2105 are 4321Emmm, with mmm for BG Mode. E will affect priority of BG3. 4321 are zero so all the layers have 8×8 tiles. If any of these are set, the corresponding layer will have 16×16 tiles.

$210b tells the PPU where our tiles are (for layer 1 and 2). Low nibble for layer 1 and high nibble for layer 2. Our tiles are at $0000 so we are just storing zero. But if we wanted to, we could put our tiles at $1000 for layer 1 by storing #1 to 210b. They are steps of $1000.

This is the perfect opportunity to point out that for all these “VRAM address X is for Y” registers the upper bit is always zero. There are only $8000 VRAM addresses, and the registers always look like they go up to $FFFF, but they don’t. My guess is that the original engineers were told that there would be 128 kB of VRAM, but some bean counter said “128 kB is too expensive, we only need 64 kB”.

So, bbbb aaaa is really -bbb -aaa (a = layer 1, b = layer2).

So, for 210b, only values 0-7 make sense for each nibble.

Maps have 6 bits for VRAM address. They are steps of $400, but the low 2 bits of the $2107 register are for map size… aaaaaayx where a is VRAM address and yx is map size (is really -aaaaayx with upper bit always 0 since VRAM addresses don’t go above $8000). It looks like you are just multiplying by $100. The value $60 is for VRAM address $6000, where our tile map for layer 1 will go.

Next time I will discuss the yx bits. You can reference this page for more information.


And we need to turn on layer 1 on the main screen $212c. (It would look really weird if we turned on ALL layers right now. All the other maps are still set to $0000 where our tiles are).

My main focus was setting up a tool chain for putting an image on screen. Here’s what it looks like. Try to repeat the process with a different picture. Maybe something more colorful?



What’s that RLE folder?

Another option for M1TE files (map and tiles) is to save as .rle (run length encoding). It’s slightly more advanced than a simple rle. It is designed for map compression. It works especially well for this particular tool chain (superfamiconv) because that tool puts tiles in order 1,2,3,4,etc. on the map and this .rle format can compress that. The full screen map is 1792 bytes and the compressed version is 76 bytes. It didn’t do so well for compressing tiles (5632 bytes became 4763 bytes). Other tilesets compress 50% or so.

The code to decompress is unrle.asm provided in the same main folder. It is set to decompress to $7f0000 and has a original maximum file size of $8000 (32k) bytes.

See mainB.asm. You pass a pointer to the compressed file to unrle the subroutine. With AXY16, A = source address, X = source bank. JSR unrle, it decompresses to $7f0000 (sometimes does a second rearrangement of bytes, so it may not be exactly 7f0000). And it returns AX with the address of the unpacked data, and Y the number of bytes.) We pass those to the DMA registers and send our data to the VRAM.

You may choose to leave everything uncompressed and skip the RLE stuff, and only use it if you run out of ROM space. You decide.


DMA palette

SNES programming tutorial. Example 2.



What we want to do is fill the palette. We could set up a loop that writes to the CGDATA register 512 times (register $2122). But there is a much faster way to do this called DMA (direct memory access).



Ignore the HDMA stuff for now (which use the same registers). The main use for DMA is to write to $2104, $2118, and $2122 (OAM data, VRAM data, and CG data). DMA is just a hardware copy loop for transferring data from the CPU bus (ROM and RAM) to the PPU bus (VRAM, palette, and OAM). You should use a DMA when you are transferring more than a dozen bytes to any of these RAMs.

What it can’t do… It can’t copy from CPU bus to the CPU bus (such as WRAM to WRAM, or ROM to WRAM). You need to use a MVN or MVP block move operation to do that.

The example code is for palette DMA. DMA needs to happen during *forced blank or during **v-blank. First you set up the transfer. There are 8 channels, but let’s focus on channel 0. All of these are 8 bit values.

* forced blank is when the PPU is off, register $2100 upper bit set.

** v-blank or vertical blank, happens once per frame when the the PPU is on. First the PPU will draw the entire screen, line by line, then it will pause slightly before it jumps back to the top. This pause is the vertical blank period, and the PPU is idle, so you can send new data to the VRAM and OAM and CGRAM during this time.


DMA Registers

$4300 to set up the transfer mode.

$4301 is the destination register $21xx. So, $04 = $2104. $18 = $2118. Etc.

$4302 is the source address, low byte

$4303 is the source address, high byte

$4304 is the source address, bank byte

$4305 is the number of bytes, low byte

$4306 is the number of bytes, high byte

Then, for channel 0, you write #1 to $420b to start the transfer. This locks up the CPU until the transfer is complete.

If you were using channel 1, the registers would be 4310,4311,4312,etc and you would write #2 to 420b. If you were using channel 2, the registers would be 4320,4321,4322,etc and you would write #4 to 420b. And so forth.

We are writing to the palette, we need to first zero the palette address, then send to $2122, the CG data register.

; DMA from BG_Palette to CGRAM
stz pal_addr ; $2121 cg address = zero

stz $4300 ; transfer mode 0 = 1 register write once
lda #$22 ; $2122
sta $4301 ; destination, pal data
ldx #.loword(BG_Palette)
stx $4302 ; source
lda #^BG_Palette
sta $4304 ; bank
ldx #256 ; BG_Palette only has 128 colors
stx $4305 ; length
lda #1
sta $420b ; start dma, channel 0

Note, I only transferred 256 bytes (128 colors). Let’s look at why.

I just used the default palette from the M1TE editor. This was designed for editing BG tiles, so the palette only has 128 entries. I thought having a ROM that was just a black screen would be dull, so I changed color #0 (the top left = the background color) to blue. Palette / Save.


You can include a binary file using the .incbin directive (see bottom of main.asm). There is a label here BG_Palette: which we can reference in our code. I put it in an entirely different bank (RODATA1 segment is bank $81), just to show that it can be done easily.

Then I ran the DMA code twice to copy the same 256 bytes to both the BG palette (first 128 colors) and the Sprite palette (last 128 colors). It would be nice if you could just write #1 to $420b again, but the transfer changes some of the DMA registers (transfer length counted down to zero and the source address will be adjusted upward). So I had to rewrite the those and THEN write #1 to $420b.

When I run the ROM in MESEN-S, I can open the Palette Viewer tool, and see that our palette has been copied twice, as expected.


Well… we haven’t copied any actual tiles yet, so we can’t show anything but a plain screen. Try changing the color in M1TE to some other color, and reassembling and see if you can get it to work. This is what it looked like for me.


For reference, I will post the example code for DMA to VRAM and DMA to OAM. See also…


;DMA from Tiles to VRAM 
lda #$80
sta $2115 ; set the increment mode +1
ldx #$0000
stx $2116 ; set an address in the vram of $0000
; (and $2117)
lda #1
sta $4300 ; transfer mode, 2 registers 1 write
; $2118 and $2119 are a pair Low/High
lda #$18 ; $2118 vram data
sta $4301 ; destination
ldx #.loword(Tiles)
stx $4302 ; source
lda #^Tiles
sta $4304 ; bank
ldx #$2000
stx $4305 ; length
lda #1
sta $420b ; start dma, channel 0


ldx #$0000
stx $2102 ; oam address (and $2103)

stz $4300 ; transfer mode 0 = 1 register write once
lda #4 ;$2104 oam data
sta $4301 ; destination, oam data
ldx #.loword(OAM_BUFFER)
stx $4302 ; source
sta $4304 ; bank
ldx #544
stx $4305 ; length
lda #1
sta $420b ; start dma, channel 0

I will be covering these a little later.

On a side note. HMDA is a different thing altogether (but uses the same registers). It is for changing register values midscreen and it can change lots of different registers, such as mode 7 parameters, scroll registers, colors, mosaic, windowing, etc. You should be aware that there is a bug which happens if HDMA and DMA happen at the same time. It can crash the game (on the early revision SNES model). If you are using both, you might want to write #0 to $420c (the HDMA enable register) before performing a DMA, to disable HDMA.


SNES Example 1

Finally, some real programming!


Let’s do the simplest possible thing, to make sure that ca65 is working and we can get something to assemble correctly. We are going to turn the screen red.



.include "defines.asm"
.include "macros.asm"
.include "init.asm"

.segment "CODE"

  ;enters here in forced blank
.a16 ;just a reminder of the setting from init code

  stz pal_addr ;set color address to 0
  lda #$1f     ;palette low byte gggrrrrr
  sta pal_data ;1f = all the red bits
  lda #$00     ;palette high byte -bbbbbgg
  sta pal_data ;store zero for high byte

  ;turn the screen on (end forced blank)
  lda #$0f
  sta $2100

  jmp InfiniteLoop

.include "header.asm"


Notice that every asm file is included into the main file. That is to keep our compile.bat as simple and error proof as possible. I have put the SNES_01 folder inside my cc65 folder, and so the path to the bin folder is ..\bin\


Let’s go over every line.

.p816 – puts the assembler in 65816 mode

.smart – tell the assembler to automatically adjust register size depending on REP / REP changes (handled through macros like A8, AXY16, etc)

.include – all of our constants, macros, init code, and header file.

.segment “CODE” – where will our main code end up, in the CODE segment

main: – our label, where the init code jumps to at startup

.a16 / .i16 – this is what the register size was when it left init. These are assembler directives to set A and XY assembly to 16 bit size.

phk / plb – not that important here, but sets the Data Bank Register to the same as the Program Bank (which is currently $80, a mirror of $00, and necessary for fastROM).

A8 – a macro to put the A register in 8 bit mode, I know it’s confusing since I just told the assembler to do A16 a second ago. If you like, you can delete the .a16 line since we change it right away, but I like to leave directives just to remind myself what the settings were just before we got here.

stz pal_addr ;$2121 – set the palette address register to zero

lda #$1f – load A register with the value $1f – the low byte of the color $001f = red.

sta pal_data ;$2122 – send the low byte to the palette (CGRAM)

lda #$00 – load A register with the value $00 – the upper byte of the color.

sta pal_data ;$2122 – send the high byte to the palette

lda #$0f – full brightness, no forced blank
sta fb_bright ;$2100 – effectively turns the screen ON

$2100 hardware register bits… x- – -bbbb
x = forced blank if set (turns off screen rendering)… $80
b = screen brightness, from 0 = black to $0f = full brightness

jmp InfiniteLoop – will jump repeatedly to this line

Now, we did NOT activate anything on the “main screen”, such as layers 1,2,3, or 4, or sprites (objects). With nothing on main, only the background color will show, which is the 0th palette entry.


Each color is 15 bits, BGR (the upper bit should be 0). Try changing the color and re-assembling. If you run it in MESEN-S, remember to reload from file/open and not click on the picture of the file (which only reloads the savestate, rather than reloading from file).

-bbbbbgg gggrrrrr
black = $0000
red = $001f
green = $03e0
blue = $7c00
white = $7fff

Here’s what it looks like. It’s not much, but we need to start somewhere.


And, if you look at the palette viewer, you can see our color at index 0.



Before we go, I just want to briefly mention the other files here.

Init.asm is the RESET code (and a few other bits). It zeroes all the registers and RAM and gets us to square one, before jumping long to main. Don’t feel like you need to understand every line in this file. Focus on the main code for now.

Macros.asm is a few macro definitions, like A8, AXY16, etc.

Defines.asm is a bunch of constants. I really hate the official names of each register, and I made my own list of names that I can read more easily. If you want to see the official list (and read about the hardware registers) go here…



Now, I had written a library of code (EasySNES), but I did not use most of it here. I was discussing the matter with some friends, and I feel that SNESdev could be taught better with simpler examples, and the library will just obfuscate what is really going on.


(SNESdev SNES SFC Super Nintendo Super Famicom programming tutorial)


My apologies, since this first tutorial is essentially the same as this one…


I didn’t copy it. We just both coincidentally arrived at the same first step. Oh well. The next steps will be different.