SNES programming tutorial. Example 6.
There is a set of registers that can be read like NES registers. Originally, they wanted to make it easy to transition from programming NES games to programming SNES games. They even used the same addresses, $4016 and $4017 (ports 1 and 2). However, you shouldn’t read these. Instead you should turn on the auto-read feature… and also the NMI enable, both from register $4200.
With auto-controller reads set, the hardware will automatically read all the buttons from both controllers soon after the end of each frame (at the start of v-blank) and then store the values at $4218-$421b. The read takes a few scanlines to finish.
$4218-19 port 1
$421a-1b port 2
(If a multitap for 4-player games is installed, $421c-d and $421e-f are for controllers 3 and 4.)
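As a minimal sketch of turning both features on (this assumes the A8 and A16 macros used later in this example switch the A register size, and that this runs once during init):

```asm
	A8
	lda #$81    ; bit 7 = NMI enable, bit 0 = auto-joypad read enable
	sta $4200   ; write both enable bits at once
	A16
```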
The button order is…
KEY_B = $8000
KEY_Y = $4000
KEY_SELECT = $2000
KEY_START = $1000
KEY_UP = $0800
KEY_DOWN = $0400
KEY_LEFT = $0200
KEY_RIGHT = $0100
KEY_A = $0080
KEY_X = $0040
KEY_L = $0020
KEY_R = $0010
I use these constants as bit masks (with an AND operation) to isolate the buttons.
The pad_poll function also does some bit twiddling to figure out which buttons have just been pressed this frame.
pad1 and pad2 hold every button that is down, even if you have been holding it down.
pad1_new and pad2_new hold only the buttons that have been newly pressed this frame.
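Here is a rough sketch of how a pad_poll like this could work. It is one common way to do it, not necessarily line for line what the example file does. pad1_old and pad2_old are hypothetical variables, and the routine assumes it is entered and exited with A in 16-bit mode.

```asm
pad_poll:
	A8
@wait:
	lda $4212       ; status register
	and #$01        ; bit 0 is set while the auto-read is still running
	bne @wait       ; wait until the hardware is done
	A16
	lda pad1
	sta pad1_old    ; hypothetical variable: last frame's buttons
	lda $4218       ; port 1, all 16 bits
	sta pad1
	eor pad1_old    ; bits that changed since last frame
	and pad1        ; keep only the ones that are down now
	sta pad1_new
	lda pad2
	sta pad2_old    ; hypothetical variable
	lda $421a       ; port 2, all 16 bits
	sta pad2
	eor pad2_old
	and pad2
	sta pad2_new
	rts
```

A button shows up in pad1_new only on the first frame it goes down, which is handy for things like menu selections or jumping.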
We need to call pad_poll each frame. How do we know that a new frame has started? That’s where the NMI comes in.
When the screen is on, the PPU spends most of its time drawing pixels to the screen, one line at a time. Starting at the top, it goes left to right and draws a line. Then it jumps back to the left and draws the next line.
It does this so fast you can’t see it. But, since the PPU is busy, you can’t send new data to the VRAM. You can’t send new data to the OAM or to many of the PPU registers either. But when the screen is done drawing, the PPU rests in a vertical blank period for a little bit. During this v-blank period, you CAN access the PPU registers.
If you turn on NMI interrupts, when the PPU is done drawing to the screen… nearly at the very beginning of v-blank, the PPU sends an NMI signal. This happens every frame, which is 60 times a second (50 in Europe). That signal causes the CPU to pause and jump to the NMI vector (an address stored at $00ffea in the ROM). We have it set to jump to the NMI: label, which is located in the init.asm file. (Note: the NMI code needs to be in bank $00, or its mirror, bank $80.)
The NMI code is just this.
bit $4210 ; it is required to read this register
; bit does a read, without changing the A register
(Many games have much more elaborate NMI code, btw.)
And our code is waiting for the in_nmi variable to change. When it changes we know that we are in the v-blank period. Now is a good time to write to PPU registers or send data to the VRAM. But, also, we are using this to time our game loop.
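A minimal sketch of a handler that changes in_nmi might look something like this. The inc and rti lines are an assumption for illustration; it also assumes in_nmi lives in low RAM and the data bank is one where low RAM and the registers are visible (such as $00 or $80).

```asm
NMI:
	bit $4210     ; it is required to read this register
	inc a:in_nmi  ; assumed: change the variable the main loop watches
	rti           ; return from the interrupt
```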
wait_nmi: waits until we are in v-blank. We call this at the top of the game loop. Notice that I put a WAI (wait for interrupt) instruction here. If you neglected to turn NMI interrupts on, this would freeze the game, as it waits forever for a signal that never comes. IRQ interrupts could also trip the WAI instruction, which is why I also wait for the in_nmi variable to change, to be sure. You could delete the WAI instruction, if you would like*. Some games use this waiting loop to spin a random number generator. You could do that as well… like adding a large prime number over and over, or just ticking a variable +1 over and over.
* Someone told me that WAI could make an emulator run less laggy, as it would have less to do each frame. It also saves electricity, because the CPU uses less power while it waits. You decide if you need it or not.
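The waiting logic described above can be sketched like this, assuming in_nmi is a one-byte variable that the NMI code changes every frame, and that we enter and exit with A in 16-bit mode:

```asm
wait_nmi:
	A8
	lda in_nmi   ; remember the current value
@loop:
	wai          ; sleep until any interrupt fires
	cmp in_nmi   ; did the NMI code change it?
	beq @loop    ; no (maybe it was an IRQ), so keep waiting
	A16
	rts
```

If you delete the WAI line, the loop still works. It just spins on the compare instead of sleeping.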
Soon after the wait_nmi function runs, we run our DMA to the OAM (copy our sprite buffer to the sprite RAM). This needs to be done during v-blank, which is why we do it first. Then, we run our pad_poll to read new button presses. Then we enter the game logic. Here’s an example of what we are doing to move the sprite.
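Put together, the top of the game loop has roughly this shape. This is only an outline: dma_oam is a hypothetical name for whatever routine does the DMA to the OAM, and the label names are made up for illustration.

```asm
main_loop:
	jsr wait_nmi   ; returns once we are in v-blank
	jsr dma_oam    ; hypothetical name: copy the sprite buffer to the OAM
	jsr pad_poll   ; update pad1, pad1_new, pad2, pad2_new
	; ...game logic goes here (move the sprites, etc.)...
	jmp main_loop
```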
Our sprite is composed of 3 sprites that move together (16×16 each). Each time we press the right button, we need to increase the X value of each sprite. Left, we decrease the X values. Each sprite uses 4 bytes, so each sprite X value is 4 bytes apart. So we do this…
	AXY16
	lda pad1
	and #KEY_LEFT
	beq @not_left
@left:
	A8
	dec OAM_BUFFER ;decrease the X values
	dec OAM_BUFFER+4
	dec OAM_BUFFER+8
	A16
@not_left:
LDA loads the A register with pad1, which has all the button presses for controller 1. We apply a bit mask (AND) to isolate the left button. If the result is zero, the button isn’t being pressed, and it will branch (BEQ) over our code. Otherwise, it will fall through to the dec OAM_BUFFER lines. DEC can be 8 bit or 16 bit, depending on the size of the A register. We want 8 bit, so we use A8 for that. We need the A16 to make sure we exit this bit of code with A always in 16 bit mode.
We repeat that process 3 more times for RIGHT, UP, and DOWN buttons.
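As a sketch of two of those repeats, here are RIGHT and UP, following the same pattern as the left button code. The Y value is the second byte of each 4-byte sprite entry, so it sits at +1, +5, and +9. The label names are just for illustration.

```asm
	lda pad1
	and #KEY_RIGHT
	beq @not_right
@right:
	A8
	inc OAM_BUFFER     ;increase the X values
	inc OAM_BUFFER+4
	inc OAM_BUFFER+8
	A16
@not_right:

	lda pad1
	and #KEY_UP
	beq @not_up
@up:
	A8
	dec OAM_BUFFER+1   ;decrease the Y values (up is toward 0)
	dec OAM_BUFFER+5
	dec OAM_BUFFER+9
	A16
@not_up:
```

DOWN would be the same as UP, but with inc instead of dec.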
And these values are copied to the OAM each frame, which moves our sprites on screen. Let’s look closely at what happens when you scroll off screen. Y values 224 to 255 are off screen to the bottom, but they would eventually wrap around to the top.
But moving the sprite to the left, from x=$00 to x=$ff (255), the sprite suddenly disappears. Which is weird.
That’s why we need that 9th x bit in the high table. Here’s what it looks like at the far right without the high table x bit set.
Here’s with the high table x bit set.
So we need it for smoothly moving off the left side of the screen.
We didn’t do that in this example, but I worked up some code that can manage this, for next time.
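Just to show where those bits live, here is a rough sketch of setting the 9th x bit for our 3 sprites. It assumes OAM_BUFFER is the full 544 bytes, with the high table starting at OAM_BUFFER+512. Each high table byte covers 4 sprites, 2 bits each (the low bit of each pair is the 9th x bit, the other is the size bit). This is only an illustration, not the code for next time.

```asm
	A8
	lda OAM_BUFFER+512   ; first high table byte (sprites 0-3)
	ora #%00010101       ; set the 9th x bit for sprites 0, 1, and 2
	sta OAM_BUFFER+512   ; the size bits are left unchanged
	A16
```

With the bit set, an x value of $ff means -1, so the sprite is drawn one pixel off the left edge instead of jumping to the far right.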
Final note. The code here is not particularly good. Keeping the sprite variables in the OAM_BUFFER itself is not very good practice. I have seen other tutorials do this, and I did it so it would be easier to understand, but I don’t like it. It would be better to keep every sprite object as a set of variables (x and y maybe as 16-bit variables), that get copied to the OAM_BUFFER in a dynamic way, so that enemy objects can be created and destroyed without causing holes/gaps in the OAM_BUFFER. With that approach, you would clear the buffer at the start of each frame, and then draw the necessary sprites as needed every frame.
That might be slower code, but much more flexible.