13. Scrolling up

This was much harder than scrolling right. If I had chosen to scroll downward, it might have been easier, because scrolling up is like going backwards in the code.

Skipping over the Y values of 0xf0-0xff wasn’t that bad, as long as you use my functions add_scroll_y() and sub_scroll_y(), which handle this for you. The code assumes that the Y position is between 0 and 0xef, so the values 0xf0-0xff have to be skipped.

int add_scroll_y(unsigned char add, unsigned int scroll);

This adds an 8 bit value with the 16 bit scroll value, and returns a new scroll value.


int sub_scroll_y(unsigned char add, unsigned int scroll);

This subtracts an 8 bit value from the 16 bit scroll value, and returns a new scroll value.


But, I also had to use them with the BG collision code, since you can be half in one nametable and half in another.


Since scrolling up is going backwards, I’m starting at the maximum scroll position, and going backwards to 0.

I had to build the array of room arrays backwards too.

const unsigned char * const Rooms[]= {
Room5, Room4, Room3, Room2, Room1
};

Anyway, I made 5 more rooms, in Tiled, and exported 5 more csv files, converted to C arrays, and imported those into the code.


And after a few days of debugging (yikes) it finally works. I have to give some due respect to the guys who made games like Kid Icarus, or Metroid, or Ice Climber, because scrolling upward without bugs was not easy.

Note: we are using the horizontal mirroring (vertical scrolling) cfg file.




12. Scrolling right

Ok, THIS is where having an automatic vram buffer system that does updates during NMI really starts to become useful.

There are so many ppu updates in scrolling continuously, it helps to have an auto update system. You want to update the nametable opposite the one you are on, so the user can’t see it change. And only a little at a time, so you don’t overload the system. Let’s look at it frame by frame as I’m scrolling to the right. I’m doing buffer_4_mt() twice per frame, which draws 8 blocks per frame.





So, I don’t want to do too many PPU updates in 1 frame, that’s why I break it up into 4 frames of updates.
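One way to picture the 4-frame split is as a small phase counter. This is only a sketch, not the code from the example: column_rows_for_phase() and the phase numbering are made-up names. It assumes just what was said above, 2 calls to buffer_4_mt() per frame, each covering a 32×32 pixel block, so that 4 frames cover a full column of 8 blocks.

```c
#include <stdint.h>

/* Hypothetical helper: for update phase 0..3, give the Y pixel offsets
   of the two 32x32 blocks that would be drawn on this frame. */
void column_rows_for_phase(uint8_t phase, uint8_t out_y[2])
{
    phase &= 3;                                /* 4 phases, then repeat */
    out_y[0] = (uint8_t)(phase * 0x40);        /* first block */
    out_y[1] = (uint8_t)(phase * 0x40 + 0x20); /* second block, 32 px lower */
}
```

Phase 0 would give Y = 0x00 and 0x20, and phase 3 would give 0xc0 and 0xe0, which is the same last row the room loader stops at.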


Level design in Tiled.


So, I made 5 rooms worth of levels (easily expandable). Once you scroll right, it automatically updates just off the screen to the right, where you can’t see. It’s reading data from the collision map, like before, but I have to reload new maps as we move.

There are now 2 collision maps of 16×15 metatiles (240 bytes each). Every time you cross into another room, it loads the next collision map just off the screen to the right. We could have read directly from the ROM for collisions, but the advantage of having it in RAM is that it is destructible / modifiable. Like when Mario breaks a block, or when Link (Zelda) pushes a block to the side.

I’m still making the rooms 1 room at a time (see the previous metatile page). The Tiled .csv files are processed with a py script into a C array. I imported those into the code, and added an array of pointers to these arrays.

Note, if you expand this, you need to also change the max scroll constant.

I decided that 1 pixel per frame seemed a bit slow, so I modified all the movement and collision code to handle variable speeds bigger than 1 pixel, and also collision ejection bigger than 1 pixel. It seems to work ok now, but I didn’t test speeds over 2 pixels per frame.
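To show what “ejection bigger than 1 pixel” means, here is a rough sketch. It is not the exact code from the example, and eject_right() / eject_left() are names I made up for it. It assumes blocks are 16 pixels wide and that the edge being tested has already been found to be inside a solid block.

```c
#include <stdint.h>

/* Pixels to subtract from X so that a right edge that has sunk into a
   solid 16-pixel block ends up on the last pixel before the block. */
uint8_t eject_right(uint8_t right_edge)
{
    return (uint8_t)((right_edge & 0x0f) + 1);
}

/* Pixels to add to X so that a left edge inside a solid block ends up
   on the first pixel after the block. */
uint8_t eject_left(uint8_t left_edge)
{
    return (uint8_t)(0x10 - (left_edge & 0x0f));
}
```

For example, a right edge at 0x42 is 3 pixels into the block that starts at 0x40, so subtracting 3 puts it at 0x3f. At 1 pixel per frame this always comes out as 1, which is why the older code could get away with a fixed eject.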

And it also works across the room edges, which is the tricky part, but that complicates the math. When it checks bg collision points, each of those points might be in a different room, so I accounted for that.
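Here is a minimal sketch of how a collision point can pick the right room. The names c_map2 and solid_at() are placeholders, and it assumes that even numbered rooms are loaded into the first map and odd numbered rooms into the second. The real code may organize this differently.

```c
#include <stdint.h>

#define MAP_SIZE 240 /* 16 x 15 metatiles */

/* Two RAM collision maps. c_map2 is an assumed name for the second one. */
uint8_t c_map[MAP_SIZE];
uint8_t c_map2[MAP_SIZE];

/* world_x is a 16-bit position: high byte = room number, low byte = X
   inside that room. Returns 1 if the point is inside a solid block. */
uint8_t solid_at(uint16_t world_x, uint8_t y)
{
    const uint8_t *map = ((world_x >> 8) & 1) ? c_map2 : c_map;
    uint8_t index;

    if (y >= 0xf0) return 0; /* below the 240 pixel screen, treat as empty */
    index = (uint8_t)((y & 0xf0) + ((world_x & 0xff) >> 4));
    return map[index] != 0;
}
```

Two points only 1 pixel apart, like 0x00ff and 0x0100, end up reading from different maps. That is the case the movement code has to get right at the room edges.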





11. Metatiles

What’s a metatile? Just a fancy word for a block of tiles. 2×2 in my case. Why use them? It’s a form of compression. And it simplifies the BG. And the BG palette table (attribute table) can’t modify each tile’s palette selection, only a 2×2 block of tiles, at minimum.

Instead of thinking of a room as 32×30 tiles (960 bytes), the code will treat it like 16×15 metatiles (an array of 240 bytes). We only need 4 bits from X and 4 bits from Y to calculate the offset into a collision array.

index = (y & 0xf0) + (x >> 4);
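Here is the same formula as a tiny function, with a worked example. The function name is just for illustration, and it assumes Y is below 0xf0, since the screen is only 240 pixels tall.

```c
#include <stdint.h>

/* Index into a 16x15 metatile map from a pixel position.
   (y & 0xf0) is the row number times 16, (x >> 4) is the column. */
uint8_t metatile_index(uint8_t x, uint8_t y)
{
    return (uint8_t)((y & 0xf0) + (x >> 4));
}
```

A point at x = 0x53, y = 0x27 is in column 5 of row 2, so the index is 0x20 + 5 = 0x25. The bottom right corner of the screen (0xff, 0xef) gives 0xef = 239, the last byte of the array.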

But, I take it one step further. The collision map can double as an array to draw the BG. I use a system, which allows me to use Tiled Map Editor to draw my levels. Rooms of size 16×15.

But we have to jump through some hoops to make this happen. We can’t draw our graphics in Tiled. The game graphics need to be 2bpp binary CHR files. We could make our graphics in a tile editor. I still recommend YY CHR to start. Then, import the graphics to NES Screen Tool. Then make some 16×16 blocks (2 tiles x 2 tiles). This is a good size because it is the size of the attribute table squares. And, importantly, choose a palette for each block. If I have 2 identical blocks of different color, I make them into 2 different metatiles (see the last 2 blocks… identical tiles, but different palette).


Start at the top left and make 16×16 blocks, which are our metatile blocks. My system can only handle 51 metatiles, because it assumes a data set under 256 bytes, and each metatile takes 5 bytes for definition. 51×5 = 255. The latest versions of NES Screen Tool (and NEXXT) can export the nametable as a BMP. This can be imported into Tiled, as a tileset, with the tile dimensions set to 16×16.

(I had to make the tile size in Tiled 32×32 because I actually made a screenshot of NES Screen Tool, and it doubles the pixel dimensions, and saved it as metatiles.png in the BG folder).

We’re not done with NES Screen Tool yet. Save as a nametable with attributes, uncompressed binary as a .nam file. Use my meta.py (python 3 script) to convert it into a c array. Here’s what it output…

const unsigned char metatiles[]={
2, 2, 2, 2, 3,
4, 4, 4, 4, 1,
9, 9, 9, 9, 2,
5, 6, 8, 7, 1,
5, 6, 8, 7, 0
};

4 tiles + 1 palette = 5 bytes per metatile. I copy this array into the code.
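To make the 5 byte layout concrete, here is a small sketch of looking up one definition. MetatileDef and get_metatile() are hypothetical names, not part of my library, and the array is just a copy of the one above. Note the 8 bit offset, which is where the 51 metatile limit comes from.

```c
#include <stdint.h>

/* Copy of the array above: 4 tiles + 1 palette per metatile. */
const uint8_t metatiles[] = {
    2, 2, 2, 2, 3,
    4, 4, 4, 4, 1,
    9, 9, 9, 9, 2,
    5, 6, 8, 7, 1,
    5, 6, 8, 7, 0
};

/* Hypothetical view of one definition. */
typedef struct {
    uint8_t tiles[4];
    uint8_t palette;
} MetatileDef;

/* Fetch metatile 'number'. No bounds check: the caller has to pass a
   number that exists in the array (0-4 here). */
MetatileDef get_metatile(uint8_t number)
{
    MetatileDef def;
    uint8_t offset = (uint8_t)(number * 5); /* 8 bit offset, so 51 * 5 = 255 is the max */
    uint8_t i;

    for (i = 0; i < 4; ++i) {
        def.tiles[i] = metatiles[offset + i];
    }
    def.palette = metatiles[offset + 4];
    return def;
}
```

Metatiles 3 and 4 come back with the same 4 tiles (5, 6, 8, 7) but palettes 1 and 0, which is the “identical tiles, different palette” case.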

Now, to Tiled, and make my level data. It’s a nice GUI for designing video game levels. We imported the metatiles.png as the tileset. And drew our level.


Make a level. Save it. Then, export to a csv file. You could easily turn that into a C array by hand, but I made a .py file to automate that step, csv2c.py, which turns our csv file into a C array, “Room1.c”. Import that into the game.


How does the metatile system work?

Ok. We have our metatile data. We have our level data.

So my metatile system is an extension of the vram buffer system. It can run while the screen is on, but in this example it writes while the screen is off. Instead of waiting for NMI to push our data to the ppu, we’re going to speed up everything by immediately pushing the vram buffer to ppu with flush_vram_update2(). I guess flush usually means empty the buffer, but for this code it means “send it right away”.

I already set up a pointer to the vram buffer… set_vram_buffer(). Also, we need to set a pointer to the metatile definitions set_mt_pointer(metatiles1), and set a pointer to the room data set_data_pointer(Room1). Now we can use the metatile functions.

I made 2 functions for pushing metatiles to the buffer.

buffer_1_mt(address, metatile); // push 1 metatile, no palette changes. (this doesn’t need a pointer to the room data). We could have used this if we read each byte off the room array, and sent that byte as the metatile #. But, you can also use this anytime to change 1 metatile on the screen. However, this function does not send the palette information. I didn’t make it smart enough to modify the attribute table.

buffer_4_mt(address, index); // push 4 metatiles (2×2 block) and set their palettes. Index is the index into the room data array. It actually finds 4 metatile blocks in the room, and pushes each to the vram buffer. This function also sets the attribute table, and you should use this one to load the level.

The room loader code does a loop, and goes to every 32 x 32 pixel block, and converts all the data to metatiles, and prints them to the screen.

for(y=0; ;y+=0x20){
	for(x=0; ;x+=0x20){
		address = get_ppu_addr(0, x, y);
		index = (y & 0xf0) + (x >> 4);
		buffer_4_mt(address, index); // ppu_address, index to the data
		if (x == 0xe0) break;
	}
	if (y == 0xe0) break;
}

flush_vram_update2() immediately pushes to the ppu, so the screen needs to be off.

I updated the movement / BG collision code (Feb 2023). We are checking 2 points for X (either 2 on the right, if moving right… or 2 on the left, if moving left). We eject if collision X. Then we check 2 points for Y (either top or bottom) and eject if collision.

The hitbox is actually 13×13 instead of 15×15, so that he can slide into smaller spaces a little better. If you had a large metasprite, bigger than 16×16, you might have to check 3 or more points per side.

We are still colliding with the c_map ( = collision map). Every non zero metatile is solid. This could be improved, so that each bit represents some property of the tile. But for now, let’s just say that != 0 means solid.

Here’s what the game looks like, and he’s colliding nicely.






Example 2

And then I made another one, because I wanted to show how buffer_1_mt() worked. It’s basically the same code as the last time… I did have to change the main BG color to green to get this to work how I wanted.

buffer_1_mt() is supposed to be for changing a metatile mid-game. But it doesn’t affect palettes. Changing just 1 metatile’s worth of palettes would require changing just 2 bits of a byte of the attribute table, and doing that is quite annoyingly complicated.
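For the curious, this is roughly what “changing just 2 bits” involves. It is only a sketch with made-up function names, for nametable 0 only. The old attribute byte has to come from somewhere, for example a RAM copy of the attribute table that you keep yourself. That bookkeeping is an assumption here, and my library does not do it.

```c
#include <stdint.h>

/* Address of the attribute byte that covers pixel (x, y) in nametable 0.
   Each byte covers a 32x32 pixel area, 8 bytes per row. */
uint16_t attr_addr_nt0(uint8_t x, uint8_t y)
{
    return (uint16_t)(0x23c0 + ((y >> 5) << 3) + (x >> 5));
}

/* Replace the 2 palette bits for the 16x16 block containing (x, y),
   leaving the other 3 blocks in the byte alone. */
uint8_t attr_set_block(uint8_t old_byte, uint8_t x, uint8_t y, uint8_t palette)
{
    uint8_t shift = (uint8_t)(((y >> 4) & 1) * 4 + ((x >> 4) & 1) * 2);
    uint8_t mask = (uint8_t)(3 << shift);

    return (uint8_t)((old_byte & ~mask) | ((palette & 3) << shift));
}
```

For the block at pixels 32,32 the address works out to 0x23c9, and the block is the top left one in that byte, so setting palette 1 on an all zero byte gives 0x01.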

buffer_1_mt() requires that you turn on the vram system, set_vram_buffer(), and it needs a pointer set for the metatile definitions set_mt_pointer(metatiles). You just need to give it an address, and tell it which metatile you want to print there.

This was the level data.


And I added just 1 more metatile using buffer_1_mt() at the top left.


The right parameter (1) means put the #1 metatile (brick), with its top left tile starting at the 4th tile from the left and the 4th tile from the top.

Notice its palette is wrong. buffer_1_mt() doesn’t change attribute bytes. You could fix this, if you knew which bits to send. I don’t want to get into that, but if you uncomment these lines in the code, it would color that block correctly.

address = get_at_addr(0, 32, 32); // tile 4,4 = pixels 32,32
one_vram_buffer(0x01,address); // top left = palette 1


I gave the little guy a stick, which pokes to the right if you press A or B. When the stick is out, it checks the collision map to see if there is a block there, and replaces it with a blank block using the buffer_1_mt() function (and also changes the collision map byte to zero, so you can walk through it).

void break_wall(void){
	temp_x = BoxGuy1.x >> 8;
	temp_x += 0x16;
	temp_y = BoxGuy1.y >> 8;
	temp_y += 5;
	coordinates = (temp_x>>4) + (temp_y & 0xf0);
	if(c_map[coordinates] == 1){ // if brick
		c_map[coordinates] = 0; // can walk through it
		address = get_ppu_addr(0, temp_x, temp_y);
		buffer_1_mt(address, 0); // put metatile #0 here = blank grass
	}
}

So the system is a bit complicated, but as you can see, the code is pretty straightforward.




We could easily make a non-scrolling game, with just this. You would just need to make room data for each room, and call the room loader when you walk to the edge of the screen. Lots of games work like that.

But, I decided to make some scrolling examples, because ultimately, I want to make a platformer. Next time, we add scrolling.



More than 51 Metatiles

Someone asked me what to do if you need more than 51 kinds of metatiles, so, I made a version that can handle up to 256 metatiles. The nesdoug.s file is different, and does a 16 bit calculation to find the position of the metatile in the array. It’s not very fast, so this could be improved in the future.

(actually the limit is 240, for the way I’m making the metatiles in NES Screen Tool.)




10. Game loop

Let’s talk about a game loop. Here’s some sample code for a simplified breakout clone.

Games need an infinite loop. while (1) works.

The first item is ppu_wait_nmi(). This sits and waits till the start of a new frame. The nmi triggers 60 times per second (50 in Europe). But, if the game logic takes too long, and we’re already past the start of nmi before we get to the ppu_wait_nmi(), then it waits an extra frame, and the game will seem to slow down. That hasn’t happened yet, because our code is so short, but it will later, in the platformer game, so keep the loop short and concise.

Then I read the controllers. Both regular pad_poll(0) and the new presses get_pad_new(0).

Then I draw the scoreboard each frame. I’m using one_vram_buffer() and the vram buffer system so we never have to turn the screen off and on to write to the PPU.

Then the game logic, move the paddle. Move the ball. Check if collision with blocks.

Move paddle…

if(pad1 & PAD_LEFT){
	Paddle.X -= 2;
	if(Paddle.X < PADDLE_MIN) Paddle.X = PADDLE_MIN;
}
if(pad1 & PAD_RIGHT){
	Paddle.X += 2;
	if(Paddle.X > PADDLE_MAX) Paddle.X = PADDLE_MAX;
}

Move ball, if active…

if(ball_direction == GOING_UP){
	Ball.Y -= 3;
	if(Ball.Y < MAX_UP){
		ball_direction = GOING_DOWN;
	}
}
else { // going down
	Ball.Y += 3;
	if(Ball.Y > MAX_DOWN){
		ball_state = BALL_OFF;
	}
}

Then draw the sprites, first by clearing the old, then redrawing all the active sprite objects.

If a block is hit with the ball, hit_block(), it deletes it from the collision map, c_map[temp1] = 0. Then write some blank tiles (zero) to the background at that same position. get_ppu_addr() will find the address of a specific x and y. We just need to push 2 zero tiles at that location. I use the vram buffer system (using one_vram_buffer() twice) to automatically send it to the PPU during the next v-blank.

Normally, I would have different “game_states”, like for title, game, pause, end. I am using different “ball_states” to handle the ball off screen wait “BALL_OFF”, the ball ready to go “BALL_STUCK” (stuck on the paddle), and ball moving “BALL_ACTIVE”.

I started out by making the background in NES Screen Tool. The breakable tiles are defined in an array, c1.csv. I did not end up using Tiled for it, because it was easy to type. If this was modified, it would change the layout of the breakable tiles.

const unsigned char c1[]={


The gray bricks were drawn in NES Screen Tool, and exported as a compressed RLE file. vram_unrle(breaky_bg2) will decompress it to our nametable (remember, this function needs to be used when the screen is off).


I changed the attribute tables beforehand, in NES Screen Tool, and saved the background. That’s what gives the tiles their color. They use different palettes, but they all use the same tiles. Here is the attribute checker view (press “A”).


And it updates the scoreboard every frame. Notice, I’m keeping each digit of the score as a separate variable (value assumed to be 0-9). score10 is tens digit and score01 is ones digit. Division and modulus are very slow operations on the NES, due to the lack of built in math in the 6502 processor, so keeping each digit separate speeds up the code.
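A minimal sketch of the separate digits idea, using a manual carry instead of division. The function add_point() is a made-up name, and stopping at 99 is my own choice for this sketch, not necessarily what the example does.

```c
#include <stdint.h>

uint8_t score10; /* tens digit, 0-9 */
uint8_t score01; /* ones digit, 0-9 */

/* Add 1 point with a manual carry, so no division or modulus is needed. */
void add_point(void)
{
    if (score10 == 9 && score01 == 9) {
        return; /* already at 99, stay there (assumption of this sketch) */
    }
    ++score01;
    if (score01 > 9) { /* carry into the tens digit */
        score01 = 0;
        ++score10;
    }
}
```

To print the score, each digit just needs the tile number of “0” added to it. No math beyond an add.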



Feel free to turn this into a full game. Sideways movement would complicate the logic a bit more. I wanted to keep the example code as simple as possible. Well, line 183 is a bit complicated (sorry about that).

temp1 = (temp_x>>4) + (((temp_y-0x30) << 1) & 0xf0);

but what it does is convert the x and y position of the ball to a position on the array of blocks to break. >>4 is like divide 16… which gets a value 0-15 from the x. The screen Y is misaligned to the array by 0x30. The tiles are actually 8 pixels high (instead of 16), so I had to modify my usual background collision code with a << 1 to get the y value to line up to the array.
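Here is that line wrapped in a function so you can try some numbers. The function name is only for illustration, and it assumes the ball’s Y is at least 0x30 (at or below the top row of breakable blocks).

```c
#include <stdint.h>

/* Convert a ball position to an index into the array of blocks.
   Rows are 8 pixels high and start at screen Y = 0x30. */
uint8_t block_index(uint8_t x, uint8_t y)
{
    return (uint8_t)((x >> 4) + (((y - 0x30) << 1) & 0xf0));
}
```

A ball at x = 0x48, y = 0x38 is 8 pixels below the top of the block area, so it is in row 1, column 4. (0x08 << 1) & 0xf0 = 0x10, plus 4, gives index 0x14.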

09. Scrolling

Scrolling means moving the background around. It does not affect sprites.

The NES PPU has 1 scroll register, 2005. You write to it twice, first the X scroll, and then the Y scroll. This is another thing that needs to happen during v-blank, and is handled automatically by neslib. neslib has this function scroll(x,y), you pass it the shift amounts. Adding to X scroll, shifts the screen left. Adding to the Y scroll shifts the screen up.

But, I decided that I didn’t like the way it handled Y scrolling. Y scrolling is a bit odd anyway, since values 0-$ef are real positions, and $f0 – ff are treated as negative values, and not what you want. neslib subtracts $f0 if the Y value is > $ef, and assumes that you are going to manage the maximum at $1df.

So, long story short, I do things differently than everyone else using C. I made 2 functions called set_scroll_x(x) and set_scroll_y(y). You can pass the set_scroll_y any int value, and the high byte will tell you which nametable you are in. Even means top, odd means bottom. If you have 2 collision maps, you know even = use the first one, odd = use the second one. Simple. Well, not perfect.

Our code still has to skip over the $f0-ff region, because our screen is only 240 pixels high. Luckily, I wrote some functions to do this for us.

add_scroll_y(add, old y) to add to the y scroll.

sub_scroll_y(add, old y) to subtract from the y scroll.

Each returns a value, which will have to be passed to set_scroll_y(), to change the screen scroll.



scroll_y = 0xef, add 5. This returns 0x104

scroll_y = add_scroll_y(5, scroll_y);



scroll_y = 0x104, subtract 5. Returns 0xef

scroll_y = sub_scroll_y(5, scroll_y);

Again, skipping over the 0xf0 – 0xff invalid Y scroll values.
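If you want to see the idea in plain C, here is a portable sketch. This is not the assembly from my library, and the _c names are made up so they don’t clash with the real functions. It assumes the low byte of the scroll value you pass in is already a valid position (0 to 0xef).

```c
#include <stdint.h>

/* Add to a Y scroll value, skipping the invalid low bytes 0xf0-0xff.
   The high byte counts nametables, the low byte stays in 0-0xef. */
uint16_t add_scroll_y_c(uint8_t add, uint16_t scroll)
{
    unsigned int low = (scroll & 0xff) + add;
    uint16_t high = scroll >> 8;

    while (low >= 0xf0) { /* went past the bottom of a nametable */
        low -= 0xf0;
        high++;
    }
    return (uint16_t)((high << 8) | low);
}

/* Subtract from a Y scroll value, skipping 0xf0-0xff on the way down. */
uint16_t sub_scroll_y_c(uint8_t sub, uint16_t scroll)
{
    int low = (int)(scroll & 0xff) - sub;
    uint16_t high = scroll >> 8;

    while (low < 0) { /* went past the top of a nametable */
        low += 0xf0;
        high--;
    }
    return (uint16_t)((high << 8) | (uint16_t)low);
}
```

With these, 0xef plus 5 gives 0x104, and 0x104 minus 5 gives 0xef again, matching the examples above.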


Horizontal scrolling (Vertical mirroring)

Remember from the intro page, I said that the NES only has enough VRAM for 2 nametables. How they are arranged depends on the mirroring. The mirroring is set in the iNES header in crt0.s, which actually uses a linker symbol “NES_MIRRORING” found in the .cfg file. On a real cartridge they would have soldered one of the connections to permanently set it to H or V mirroring.

So with vertical mirroring the nametables are arranged like this.



With the lower 2 nametables being copies of the top 2.

This is good for sideways scrolling. If you scroll past the right screen, it will wrap back to the left. If you want a level that’s bigger than 2 screens wide, you have to change BG tiles as you go.

(The numbers on the screen are sprites. They don’t scroll with the background.)





Vertical scrolling (Horizontal mirroring)

This is basically the same, except the right 2 nametables are copies of the left 2.


This is good for vertical scrolling.





There is also 4 screen mode, which almost zero games used. It required a special cartridge with an extra RAM chip. Gauntlet and Rad Racer II, for example, use it.

This would be good for all direction scrolling. Most games just used the standard 2 screen layout, and had glitchy tiles at the edges. Old TVs tended to cut off the edges, so it usually wasn’t too noticeable.

There are special boards (mappers) that can change the mirroring layout. See Metroid, which sometimes scrolls horizontally and sometimes scrolls vertically. Instead of being hardwired to one layout, it can alternate between them. But, that is a more advanced topic.

08. BG collisions

Well, bg is a little different than sprites. We can’t read the bytes in the PPU, not easily. So let’s have a map of all the solid blocks in the room. Having each block 16×16 simplifies everything, and you can stuff the entire room into a 240 byte array. X from 0 to 15, Y from 0 to 14. I copied the array to the RAM, in case we want to make the BG destructible. (Which I will demonstrate a little later). Here’s what the array might look like…

const unsigned char c2[]={

And the 1’s here match the blocks in the game.


To check collision, you just need to take the high nibbles of X and Y and combine them YX… (X >> 4) + (Y & 0xf0), and check that byte in the array. 1 = brick, 0 = nothing.

For sprites, I have been checking 4 points, each corner, and setting L R U D variables if collision, and eject as needed. First I do an X move, check collision points, and eject X if hit. Then I do a Y move, check collision, and eject Y if hit. See bg_collision() below.

This code is simplified, because X and Y moves are fixed at 1 pixel per frame. A little later, I will modify this so we can test variable speeds. 1 pixel per frame is a bit slow, and might be dull gameplay.

I have been using Tiled to make level data, to use for the collision map. It’s simple to use, and it can export a csv file, which is easy to make into a C array. But, it can’t import NES style .chr file, so I had to make a picture of all the types of blocks. This is very easy, since we have only 2 types, blank and block.

First I make tiles in NES Screen Tool. Then I draw the 2×2 blocks. It looks kind of dumb here, because I only have 2 types, but when we have a more detailed game with dozens of blocks, it will start to make more sense. There isn’t a way to export a picture of the nametable, so I just do a screen capture, and crop it in GIMP, save as metatile.png.

Now, import it as a tileset to Tiled. The dimensions are 32×32 per tile, because NES screen tool doubles the pixels. Now design the levels, and export CSV.


It’s a piece of cake to turn a CSV file into a C array, but I made a python 3 script anyway to automate it. CSV2C.py. Then I import the C arrays into my code, and have an array of pointers to each array.

#include "CSV/c1.c"
#include "CSV/c2.c"
#include "CSV/c3.c"
#include "CSV/c4.c"

const unsigned char * const All_Collision_Maps[] = {c1,c2,c3,c4};

Now, I wrote some code to print the array as a block of 2×2 tiles to the screen with a big loop and some vram_put() statements. vram_put() needs the screen to be off. Left to right ppu writes wrap around to the next line. So, you don’t have to change the address to do even the entire screen.

And I have it so that, if you press “Start” it loads a new collision map, and draws it to the screen.

When you (for example) press the right button, it adds 1 to the X position. It then checks 4 points of collision, and if the ones on the right are 1 in the collision map, it ejects (subtract 1 from the X position).
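As a sketch, the “move right, check, eject” step could look something like this. Box, solid() and move_right() are hypothetical names for this illustration, and it only handles the fixed 1 pixel per frame case described here. It is not the bg_collision() code from the example.

```c
#include <stdint.h>

/* Hypothetical object layout: top left corner plus size in pixels. */
typedef struct {
    uint8_t x, y, width, height;
} Box;

/* 1 if the pixel (x, y) is inside a non-zero block of a 16x15 map. */
static uint8_t solid(const uint8_t *map, uint8_t x, uint8_t y)
{
    if (y >= 0xf0) return 0; /* below the 240 pixel screen, treat as empty */
    return map[(y & 0xf0) + (x >> 4)] != 0;
}

/* Move right 1 pixel. If either right-hand corner lands in a solid
   block, undo the move (eject). Returns 1 if a wall was hit. */
uint8_t move_right(const uint8_t *map, Box *b)
{
    uint8_t right, bottom;

    b->x += 1;
    right = (uint8_t)(b->x + b->width);
    bottom = (uint8_t)(b->y + b->height);
    if (solid(map, right, b->y) || solid(map, right, bottom)) {
        b->x -= 1; /* eject */
        return 1;
    }
    return 0;
}
```

The other 3 directions would follow the same pattern, each checking the 2 corners on the side you are moving toward.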

So, test it out. Bump into the walls. Collisions work. Press start and the background changes. Collisions still work, because it loaded a new collision map to the RAM.

Note, I shifted the whole screen down 1 pixel with the scroll. Y scroll = 0xff (-1). Sprites always show up 1 pixel low, so shifting the bg down 1 lines them up.



The loading code isn’t very good, because it can only draw 1 kind of tile block, and it never changes the attribute table. I’m going to cover a much better loading system (see page on metatiles), a bit later, but first I wanted to talk about scrolling.

07. Controllers

There are 2 controller ports on the NES. You can read them anytime, using ports 4016 and 4017. Behind the scenes, it is strobing the 4016 port off and on, and then reading the buttons, 1 button at a time, times 8 reads, and then shifting them into a variable.

With neslib, use these functions.

pad1 = pad_poll(0) to read controller 1.

pad2 = pad_poll(1) to read controller 2.

pad_state(0) or pad_state(1) if you forgot the value, and want to get it again without re-reading the controllers.

pad_trigger() gets the newly pressed buttons. I don’t use it. If you did, the order would be pad_trigger() and then pad_state(), since trigger runs the pad_poll() function. You don’t want to poll the controllers more than once per frame. You could read the controller like this…

pad1_new = pad_trigger(0);

pad1 = pad_state(0);

I wrote a function get_pad_new(), and it returns the PAD_STATET variable, which is the same thing that pad_trigger() returned… the new button presses. You need to run pad_poll() first, and then get_pad_new(). Generally, you would want both values so you could test buttons held down (say for running left and right) and buttons newly pressed (for jumping or pausing the game). This is what I do…

pad1 = pad_poll(0);

pad1_new = get_pad_new(0);

We use pad1_new for checking the pause button. We don’t want it to continuously pause and unpause if you hold Start down. It only changes modes if you let go of the Start button and press it once again.

pad1 is a char (8 bit), basically a bit field of 8 buttons. And we have to apply bit masks to get the individual button presses.

if(pad1_new & PAD_START){
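To show what “newly pressed” means at the bit level, here is a small sketch. new_presses() is a made-up name and EXAMPLE_PAD_START is an illustrative value only. In real code, use pad_poll() / get_pad_new() and the PAD_ constants from neslib.h.

```c
#include <stdint.h>

/* Illustrative bit value only. The real PAD_START comes from neslib.h. */
#define EXAMPLE_PAD_START 0x10

/* New presses = buttons that are down now, but were not down last frame. */
uint8_t new_presses(uint8_t current, uint8_t previous)
{
    return (uint8_t)(current & (uint8_t)~previous);
}
```

If Start is held for 2 frames in a row, the first frame reports it as new and the second frame reports nothing, so a pause toggle only fires once per press.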


Sprite vs. sprite collisions.

I have each sprite controlled by a different controller. When they collide, I’m changing the background color.

if (collision){

And, I wrote a function that can test any 2 sprite objects to see if they are touching. But you have to pass 2 structs (or arrays) of 4 bytes each, where the byte order is (x, y, width, height). I made this function take in 2 void pointers, because I wanted to be able to use different types of structs in the future. At least, that was the plan.

Here’s the example in the code…

collision = check_collision(&BoxGuy1, &BoxGuy2);

I suppose we could have put this inside the if condition, if you like.

if(check_collision(&BoxGuy1, &BoxGuy2))

The ASM function is an optimized version of this code…

if((obj1_right >= obj2_left) &&
(obj2_right >= obj1_left) &&
(obj1_bottom >= obj2_top) &&
(obj2_bottom >= obj1_top)) return 1;
else return 0;

And we know it’s working, because the screen turns white when they touch. The code breaks a bit when one object is half off the edge of the screen. It’s working well enough for my needs.
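Written out in full, a plain C version of that test might look like this. It is a sketch, not the ASM function: the names Hitbox and boxes_touch() are made up, and it does the math in int, so it doesn’t wrap around at the screen edge the way 8 bit math can.

```c
#include <stdint.h>

/* 4 bytes per object, in the order the function expects. */
typedef struct {
    uint8_t x, y, width, height;
} Hitbox;

/* Takes 2 void pointers to any 4-byte (x, y, width, height) objects.
   Returns 1 if the boxes overlap or touch, 0 if not. */
int boxes_touch(const void *obj1, const void *obj2)
{
    const unsigned char *a = (const unsigned char *)obj1;
    const unsigned char *b = (const unsigned char *)obj2;
    int a_left = a[0], a_top = a[1];
    int a_right = a[0] + a[2], a_bottom = a[1] + a[3];
    int b_left = b[0], b_top = b[1];
    int b_right = b[0] + b[2], b_bottom = b[1] + b[3];

    return (a_right >= b_left) && (b_right >= a_left) &&
           (a_bottom >= b_top) && (b_bottom >= a_top);
}
```

Because it reads raw bytes through the pointers, any struct works as long as its first 4 bytes are x, y, width, height.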




06. Sprites

What’s a sprite? A sprite is a tile that can be moved freely all over the screen. Sprites are usually 8×8, but they can also be 8×16 (a little more complicated). I will be using 8×8 examples. Sprites are defined by the 256 bytes in the OAM part of the PPU. There are 64 sprites. That’s 4 bytes per sprite.

But 8×8 is so small. How do we make Mario so big? We combine multiple sprites to move together on the screen. This is called a metasprite. Look below. Small Mario is made up of 4 sprites, and large Mario is made up of 8 sprites.


Sprites on the NES have an annoying limitation. 8 sprites per horizontal line. That’s all you get. Any more than that, and the next sprite will disappear. The order of the sprites in the OAM determines which 8 will show and which disappear. First in the OAM (the 0th sprite) has top priority. It will show up in front of the others and will count first toward the 8 sprite limit.

You might have seen sprites flickering in NES games. To avoid disappearing sprites, it is common to rotate the order of the sprites in the OAM, so that the sprite that disappears alternates, creating flickering. That’s better than an invisible sprite, I suppose.

Another oddity, is that sprites are always shifted down 1 pixel. If you put a sprite at Y = 0, the top of the sprite won’t appear until the next line down. Look at Mario’s feet, and you see that he is 1 pixel into the floor. This might look ok for platform games, but a top down game might look better with sprites aligned to the background. We can do this easily by shifting the BG down 1 pixel.

Sprites can go anywhere on the screen. However, they are not very good at moving smoothly off the left side of the screen. There is an option (PPU Mask 2001 bits xxxx x11x) that, if zero, turns off the left 8 pixels of the screen, and THEN you can smoothly move off the left side of the screen.

Any sprite Y position >= $ef is off the screen. When you call the function oam_clear() or oam_hide_rest(), it puts the sprite’s Y at $ff, which is below the screen. Sprites don’t wrap around.

So… there are 4 bytes for sprites. Y, tile #, attributes, and X.

Attributes (copied from the wiki)

||||||++- Palette (4 to 7) of sprite
|||+++--- Unimplemented
||+------ Priority (0: in front of background; 1: behind background)
|+------- Flip sprite horizontally
+-------- Flip sprite vertically
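The same bit layout, as a small sketch in C. The SPR_ names and sprite_attr() are made up for this example (neslib has its own constants for the flip and priority bits), and the struct is just one way to picture the 4 bytes in hardware order.

```c
#include <stdint.h>

#define SPR_BEHIND_BG 0x20 /* bit 5: priority, behind background */
#define SPR_FLIP_H    0x40 /* bit 6: flip horizontally */
#define SPR_FLIP_V    0x80 /* bit 7: flip vertically */

/* One OAM entry, in the hardware byte order. */
typedef struct {
    uint8_t y, tile, attr, x;
} OamEntry;

/* Build an attribute byte from a sprite palette number (0-3) and flags.
   The 3 unimplemented bits are always left at zero. */
uint8_t sprite_attr(uint8_t palette, uint8_t flags)
{
    return (uint8_t)((palette & 3) |
                     (flags & (SPR_BEHIND_BG | SPR_FLIP_H | SPR_FLIP_V)));
}
```

So sprite palette 2, flipped horizontally, is 0x42.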


So how do we make sprites appear? Like writing to the background, you can only write to the sprites during v-blank, which is handled by neslib in the nmi code. The standard way to do this, is to set aside a 256 byte buffer, aligned exactly to xx00. The picture above is at $700, but neslib usually uses $200 (defined in crt0.s as OAM_BUF). The nmi code will do a quick OAM DMA and copy all the sprites from the buffer to the OAM.

Since we are using a buffer, you should be able to write to the buffer at any time. I prefer to clear the buffer every frame, and rebuild it from scratch every frame.

oam_clear(); //Clear the sprite buffer.

sprid = 0; //Set the index to the buffer to zero.

sprid = oam_spr(x,y,tile,attribute,old sprid); //Push 1 sprite to the buffer.

sprid = oam_meta_spr(x,y,sprid,*data); //Push 1 metasprite to the buffer.

NOTE: I changed all this 9/17/2019, and removed sprid from my code to speed the functions up a bit. Now it looks like this.

oam_clear(); //Clear the sprite buffer.

oam_spr(x,y,tile,attribute); //Push 1 sprite to the buffer.

oam_meta_spr(x,y,*data); //Push 1 metasprite to the buffer.

Fewer variables pushed to the c stack = faster.

Also added were these functions, just in case you need to access the sprid.

oam_set(); // manually set the index to the sprite buffer

oam_get(); // returns SPRID, the current index to the sprite buffer

And to make it all work, I had to add an internal variable in crt0.s called SPRID.


When I made the graphics file, I put the sprite graphics in the second half. We have to remember to tell neslib that we want to use the second half for sprites…


And make sure to define a palette for both BG and sprites.

Ok, so, how do I make a metasprite? NES Screen Tool has a tool for making metasprites. I find it a bit difficult to use. Sometimes I just copy and paste a definition from a same sized metasprite (and change the tile #s). But, if you use NES Screen Tool, you can “put single metasprite to clipboard as C” and then paste it into the code. Then you can pass that to the oam_meta_spr() function.

oam_meta_spr(x,y, * data).

The sprite definitions used by neslib (and NES Screen Tool) are out of order. It goes x,y,tile,attribute, as opposed to the NES’s actual byte order (y,tile,attribute,x). Keep that in mind, if you want to just retype it by hand, like I sometimes do.
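Here is a sketch of what that reordering looks like. It is not neslib’s code: push_metasprite() and oam_buffer are made-up names. It assumes the metasprite list ends with the byte 128, which is how the definitions exported by NES Screen Tool end, as far as I know. Unlike the real functions, this one stops when the buffer is full.

```c
#include <stdint.h>

#define META_END 128 /* assumed terminator byte of a metasprite list */

uint8_t oam_buffer[256]; /* stand-in for the 256 byte sprite buffer */

/* Copy a metasprite (x, y, tile, attribute for each sprite) into the
   buffer in hardware order (y, tile, attribute, x), offset by the
   object's position. Returns the new index into the buffer. */
uint16_t push_metasprite(uint16_t index, uint8_t x, uint8_t y,
                         const uint8_t *data)
{
    while (data[0] != META_END && index + 4 <= 256) {
        oam_buffer[index + 0] = (uint8_t)(y + data[1]);
        oam_buffer[index + 1] = data[2];
        oam_buffer[index + 2] = data[3];
        oam_buffer[index + 3] = (uint8_t)(x + data[0]);
        index += 4;
        data += 4;
    }
    return index;
}
```

A 2 sprite metasprite drawn at (100, 50) takes up 8 bytes of the buffer, with the Y values first in each group of 4.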

If you want to have a metasprite that changes direction (and flips horizontally), then you should make 2 separate metasprites, one for each direction.

One limitation. None of these functions keep track of how many sprites are in the buffer. You could easily put in too many, and overwrite the ones in the beginning of the buffer.

This example uses 1 basic sprite and 2 metasprites, and moves them down 1 pixel per frame.





Oh, one more thing. If you are transitioning from one part of a game to another, and you turn off the screen, make sure you clear the sprites before you turn the screen back on so you don’t have 1 frame of junk sprites left on the screen.


05. Palettes

A little more about the NES palette.


There are 64 choices (0-$3f), but many of those are black. The neslib forces you to use $0f for black and $30 for white. Don’t use the xD colors, especially $0D (it glitches some TVs, see YouTube videos of “Immortal” glitched title screen). Well, you can’t anyway, since the neslib converts it to $0f.

The background uses the PPU addresses $3f00-3f0f for palettes.

The sprites use PPU addresses $3f10-3f1f for palettes.

Color index #0 (at PPU address $3f00) is the universal BG color. All 4 of the bg palettes will reuse that same color as their 0th color.

There are 4 Background palettes.

U = universal color
U123 U123 U123 U123
That makes 13 unique BG colors on screen.

The 0 color index for each Sprite palette is transparent.

There are 4 Sprite palettes.
x = transparent
x123 x123 x123 x123
That makes 12 unique Sprite colors on screen.


So far, we have been working with only 1 palette (4 colors), so let’s make something using all the bg palettes.

You can change an entire palette (32 bytes) with pal_all(), or change the 16 byte bg palette with pal_bg() and the 16 byte sprite palette with pal_spr(). Just pass it an array of 16 bytes. And, you can change just 1 color with pal_col(index, color), where index is 0-$1f. There will be an example of pal_col in the next page (sprite collisions).
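To make the 16 byte layout concrete, here is a sketch that builds a background palette array from 1 universal color and 4 sets of 3 colors. build_bg_palette() is a hypothetical helper, not part of neslib. The resulting array is the kind of thing you would pass to pal_bg().

```c
#include <stdint.h>

/* Build a 16 byte background palette: 4 palettes of 4 entries each,
   where entry 0 of every palette is the shared universal color. */
void build_bg_palette(uint8_t out[16], uint8_t universal,
                      const uint8_t colors[4][3])
{
    uint8_t p, c;

    for (p = 0; p < 4; ++p) {
        out[p * 4] = universal; /* U */
        for (c = 0; c < 3; ++c) {
            out[p * 4 + 1 + c] = colors[p][c]; /* 1, 2, 3 */
        }
    }
}
```

Entries 0, 4, 8 and 12 all end up holding the universal color, which is the U123 U123 U123 U123 pattern shown above.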

Although the palette is in the PPU, and usually it can’t be written while the screen is on… all the neslib palette functions write to a buffer, which is copied to the PPU only during v-blank (in the nmi code). So, feel free to use these functions anytime, with the screen on or off.

Background Attribute Table

In the PPU, at the end of each nametable (tilemap), is the attribute table. For map#0, that’s $23c0-$23ff. The only “attribute” they can have is palette selection, so, you can think of it as the palette table.

A nametable only has 64 bytes to work with for palette choices, so that makes an 8×8 grid. Each byte represents a 32×32 pixel chunk of the BG. Each byte in the attribute table is further divided into 2 bit segments, and each 2 bits represents a 16×16 pixel chunk of the BG.


So, each tile doesn’t get its own palette choice. You can only define a palette choice for a 2×2 block of tiles.
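
To make the 8×8 grid concrete, here is a sketch of how the address of an attribute byte can be worked out from a pixel position. attr_addr() is a made-up name for this example. It only assumes the standard PPU layout, where each nametable's attribute table sits at the end of that nametable ($23c0 for map #0, $27c0 for map #1, and so on).

```c
/* Hypothetical helper: PPU address of the attribute byte that covers
   pixel (x, y) in nametable nt (0-3).
   Assumes x is 0-255 and y is 0-239. */
unsigned int attr_addr(unsigned char nt, unsigned char x, unsigned char y)
{
    /* $23c0, $27c0, $2bc0 or $2fc0 */
    unsigned int base = 0x23c0 + ((unsigned int)(nt & 3) << 10);

    /* 8 bytes per row, 1 byte per 32x32 pixel chunk */
    return base + (unsigned int)((y >> 5) << 3) + (unsigned int)(x >> 5);
}
```

So every pixel from (0,0) to (31,31) maps to the same byte at $23c0, and the bottom right corner of the screen maps to $23ff.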

Try NES Screen Tool and draw some simple graphics and place them on the map. Now unclick “Apply tiles” and choose a different palette and draw on the tilemap. It will only change the palette instead of applying tiles. You can easily see the limitations of the attribute table. You can also highlight the attribute grid by clicking on the 2x grid button.

Most games are just designed in blocks of 16×16 pixels. I do this too.


Notice how the floor blocks and the window blocks are exactly 16×16 pixels. The columns are exactly 32 pixels wide. The curtain area is exactly 32 pixels wide.

Having multiple palettes to choose from extends our tileset, since we can reuse the same tiles for different objects by changing their palette. Look at the clouds and the bushes. They are using the same tiles, but they are colored differently because they are assigned a different palette.


Again, 1 attribute byte is further divided into 2 bits per block. So, the layout of an attribute byte goes like this, bitwise…

If you look at the byte in binary, these bits represent tiles like this… DDCCBBAA


So, AA is the top left tiles of that block, BB is the top right, CC is the bottom left, and DD is the bottom right. If bits DD are 00, the bottom right tiles in that block will use the #0 palette. If they are 01, it will use the #1 palette, etc.
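
As a sketch of that bit layout, this helper packs 4 palette numbers into 1 attribute byte. attr_byte() is a made-up name for this example.

```c
/* Hypothetical helper: build one attribute byte (DDCCBBAA)
   from four palette numbers, each 0-3. */
unsigned char attr_byte(unsigned char top_left,     /* AA */
                        unsigned char top_right,    /* BB */
                        unsigned char bottom_left,  /* CC */
                        unsigned char bottom_right) /* DD */
{
    return (unsigned char)((top_left & 3)
                         | ((top_right & 3) << 2)
                         | ((bottom_left & 3) << 4)
                         | ((bottom_right & 3) << 6));
}
```

With palettes 0, 1, 2, 3 in that order, the result is 0xe4, which is 11 10 01 00 in binary.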

So, I made a background in NES screen tool. The entire picture was gray, using palette #0, but the code is writing to the Attribute Table with fills, changing the palette choices. Notice I used get_at_addr(0,0,0) to calculate an address in the attribute table. Then I used vram_fill() to set the attribute bytes.

get_at_addr(char nt, char x, char y); — x and y are pixel positions 0-255

vram_fill(unsigned char n,unsigned int len); — n is the fill value

Palette 0 = grays, 1 = blues, 2 = reds, 3 = greens.


The attribute table is fairly hard to modify mid game…involving bit shifting and masking. You would keep a full copy of the attribute table as an array in the RAM, and modify 2 bits at a time, and copy the byte to the correct PPU address after it is modified. Many games just avoid changing it, except as part of the scrolling engine. You could design the game as 32×32 blocks, and you would just change a full byte, rather than worry about bit shifting.
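
Here is a rough sketch of the shifting and masking that would be involved, written as plain C that works on a RAM copy only. attr_shadow and attr_set_block() are made-up names for this example, and sending the changed byte to the PPU is left out.

```c
/* RAM copy of one nametable's attribute table (hypothetical name) */
unsigned char attr_shadow[64];

/* Set the palette (0-3) of the 16x16 block that contains pixel (x, y).
   Assumes x is 0-255 and y is 0-239.
   Returns the index (0-63) of the byte that was changed. */
unsigned char attr_set_block(unsigned char x, unsigned char y,
                             unsigned char palette)
{
    /* which of the 64 bytes: 8 per row, 1 per 32x32 chunk */
    unsigned char index = (unsigned char)(((y >> 5) << 3) | (x >> 5));

    /* which 2 bits inside the byte: 0 = AA, 2 = BB, 4 = CC, 6 = DD */
    unsigned char shift = (unsigned char)(((x >> 3) & 2) | ((y >> 2) & 4));

    unsigned char value = attr_shadow[index];
    value &= (unsigned char)~(3 << shift);             /* clear the old 2 bits */
    value |= (unsigned char)((palette & 3) << shift);  /* put in the new ones */
    attr_shadow[index] = value;

    return index;
}
```

After the call, attr_shadow[index] is the byte to copy to the PPU. For map #0 its address is $23c0 + index.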

I wrote 1 vram_put() statement to change 1 attribute byte, so you can see its size and abilities. See that multi colored block on the lower left?

vram_put(0xe4); // push 1 byte (in binary, that’s 11 10 01 00)

For now, if you make the background in NES screen tool, just save the nametable with the attribute bytes as a compressed rle, and it will copy those when your game loads them.




Part of my library is a metatile system, which can handle attribute bytes for you.
For another time.

04. Full background

Writing a full screen with an RLE compressed file.

Reduce the size of the image to 256×240 or maybe 128×128, and then change the mode to 4 color indexed.

From there, you can cut and paste into YY-CHR.

I think you get slightly better results first converting to grayscale, then to mode/indexed (4 colors).

Sometimes (frequently) YY-CHR would get the index in the wrong order and you would have to use the color replace tool to get it correct. Then save as .chr, which you can open in NESST.


BUT — I decided to make my own graphics conversion tool, called NESIFIER.  It can convert an image (.png .jpg .bmp or .gif) to NES format — nametable, .chr graphics, and palette. You can now (with version 2.2) export as an 8-bit indexed BMP, which is the format the NES Screen Tool uses to import a file. This could be useful if you have too many unique tiles (use the lossy import option).


Originally, I made a 256×240 image. But, I had too many unique tiles… So, in GIMP I resized the image smaller (about 160×160), but then padded the canvas size to 256×240.


I saved as .png. I opened that in NESIFIER, manually selected 4 colors, chose the dither settings (Floyd-Steinberg, 10), and pressed “convert”. This is the result.


Notice that the number to the left of “Tiles” is 254. Good. We need it 255 or less. Then I save the tiles “save final CHR”, and save the tilemap “save nametable”, and save the palette “Palette/Save NES 16 bytes”.

You might have to play with the dither settings or dither pattern to get better results. Note that higher dither value looks better, but tends to create more unique tiles.

Open NES Screen Tool and load all these files.

With NES Screen Tool I saved the tilemap, “Nametable/Save Nametable and Attributes/RLE packed as C header .h”. Now we can import it into the C code, with #include “NES_ST/Girl5.h”.

A full nametable is 1024 bytes. You don’t want to leave nametables uncompressed… you would very quickly run out of space. The RLE version is compressed to 339 bytes. The game code needs to decompress this Girl5.h file.

We can’t do this with the screen on, so turn it off, then set a starting address, and call the rle function.


vram_adr(NAMETABLE_A); // set the destination address, the top left of the screen

vram_unrle(Girl5); // decompress our rle file, copy to the screen. Girl5 is the name of the char array in the Girl5.h file.
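
To give a feel for what the decompression does, here is a host-side sketch of a tag-based RLE decoder that writes to a plain array instead of the PPU. The byte layout in the comment is my assumption about how this kind of RLE works, and unrle() is a made-up name. It is not the neslib routine.

```c
#include <stddef.h>

/* Sketch of a tag-based RLE decoder (assumed layout):
   src[0] is the tag byte.
   After that, any byte that is not the tag is copied as-is.
   A tag byte is followed by a count:
     count 0 ends the stream,
     otherwise the last copied byte is repeated count more times.
   Returns the number of bytes written, or 0 if dst is too small. */
size_t unrle(const unsigned char *src, unsigned char *dst, size_t dst_size)
{
    unsigned char tag = *src++;
    unsigned char last = 0;
    size_t out = 0;

    for (;;) {
        unsigned char b = *src++;
        if (b != tag) {
            if (out >= dst_size) return 0;
            last = b;
            dst[out++] = b;
        } else {
            unsigned char count = *src++;
            if (count == 0) return out;              /* end of data */
            if (count > dst_size - out) return 0;    /* would overflow dst */
            while (count--) dst[out++] = last;
        }
    }
}
```

A long run of the same tile, like an empty sky, shrinks to 3 bytes this way, which is why a mostly empty 1024 byte nametable compresses so well.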


So far, I’ve forgotten to mention the palette. NES Screen Tool can copy the palette to the clipboard, which I pasted into the C code as an array of chars. pal_bg() sets the palette for the Background. The palette itself is just a byte array of 16 bytes. We pass the name of the array to pal_bg(palette_name) to copy it to the NES palette.






Fade In / Fade Out

neslib makes it easy to change the brightness of the screen. You can do this with pal_bright(), using a value between 0 (black) and 8 (white). 4 = normal.

I borrowed a function from Shiru’s “Chase” game, and it’s very easy to use.

pal_fade_to(0,4); // fade from black to normal

pal_fade_to(4,0); // fade from normal to black
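
I won't reproduce the borrowed function here, but the general idea of a fade can be sketched in plain C: step the brightness 1 level at a time toward the target. fade_steps() is a made-up name for this example, and it records the levels in an array instead of calling pal_bright().

```c
/* Hypothetical model of a fade. Records each brightness level the fade
   passes through, in order, and returns how many steps were taken.
   steps needs room for 8 entries, since brightness is 0-8. */
unsigned char fade_steps(unsigned char from, unsigned char to,
                         unsigned char *steps)
{
    unsigned char n = 0;

    while (from != to) {
        if (from < to) {
            ++from;
        } else {
            --from;
        }
        /* on the NES you would wait a few frames here,
           then call pal_bright(from) */
        steps[n++] = from;
    }
    return n;
}
```

Fading from 0 to 4 passes through 1, 2, 3, 4, and fading from 4 to 0 passes through 3, 2, 1, 0.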

And if you run the fade.nes file, you see that it fades in and out in an infinite loop. Fading could be used for transitions, like from the title to the game, or from level to level.




Side Note –

With the NESIFIER tool, you can also save tiles and tilemap as a DZ4 file, which is a compression format that I came up with. I haven’t integrated that into the neslib / nesdoug code yet, so you can just skip it and use the RLE format that came with neslib originally. But, DZ4 would work similarly. It can sometimes get better compression than the RLE format.

There are lots of compression tools out there. NES Screen Tool RLE is good enough for now.