24. Advanced Mapper – MMC3

Note – code updated Feb 2023. Bug fix, to prevent possible conflicts with interrupts using the Bank Swap Register (which could crash the game). The neslib.s file is also changed. Also, I moved the palette buffer out of the hardware stack.  The new code is more flexible, and the main code should be able to swap PRG and CHR banks whenever you want.

Note – code updated Oct-27-2021, added another CLI to the asm code, to make sure that IRQs work. Setting a scanline count less than 10, we might still be in the NMI code, and NMIs automatically disable IRQs until the RTI is reached. The music code is inside the NMI, and if the music code went long, it might be as low as scanline 10 or 11 before allowing IRQs again, which caused at least one user to have glitchy / delayed IRQs. This fixes the issue.

MMC3 is the 2nd most popular mapper. Many of the most popular games were MMC3. Several of the Megaman games. Super Mario Bros 2 + 3. This photo is borrowed from bootgod. The little chip at the top says MMC3B.


MMC3 PRG banks are $2000 in size. Max size is 512k (64 of those PRG banks).

MMC3 CHR banks are $400 in size. Max size is 256k (256 of those CHR banks). This extra small size makes it possible to change 1/4 of a tileset.

Although, only one of the tilesets can do this. The other can only change 1/2 at a time. Which one is optional, but I would prefer the BG to have the smaller banks, so that it more easily animate the background without wasting too much memory.

MMC3 can have WRAM, some times with a battery for saving the game.

MMC3 can change between Horizontal and Vertical mirroring, freely.

Best of all. MMC3 has a scanline counter hardware IRQ. That means you can very exactly time mid-frame changes, such as scrolling changes. You can do parallax scrolling, like the train stage on Ninja Gaiden 2.

Since IRQ code needs to be programmed in Assembly, instead of having you, the casual C user, try to write your own… I made an automated system, that can do most anything you would need it to do mid-screen. (I will discuss a little later)

I don’t want to get into all the details of how the hardware works, which you can find here, if you want to read them.


There are multiple screen splits (timed by IRQ) and I’m animating the gears by swapping CHR banks. Also, the music is in a swappable bank.
Let’s go over every thing that had to be changed…


I made several $2000 byte swappable banks. The C code and all the libraries need to go in a fixed bank. I decided to keep $a000-ffff fixed to the last 3 banks. That means all the swappable banks will go in the $8000 slot… which is why we have all those banks start at $8000. Technically, MMC3 has the ability to swap the $a000-bfff bank, but I’m adapting the code from the MMC1 example, which only expects 1 swappable region, so let’s pretend that you can’t change $a000.

The startup code (reset code) needs to be in the very last $e000-ffff area, because this is the only bank that, by default, we know the position of. Vectors also go here.

Again, this mapper can have WRAM at $6000, so I made that segment to test it. You would need to change the BSS name to XRAM (what I called this segment) to declare a variable or array here.

In the Assembly code, just inserting a .segment like .segment “CODE” makes everything below that point map into that bank.

In the C code, you need to use a pragma, like
#pragma bss-name(push, “ZEROPAGE”)
#pragma bss-name(pop)


#pragma rodata-name (“BANK0”)
#pragma code-name (“BANK0”)

to direct the various types of things into the correct bank.

At the bottom of MMC3_128_128.cfg, it defines some symbols, NES_MAPPER=4.
And size of each ROM chip (NES_PRG_BANKS=8). MMC3 can be expanded to  512k PRG and 256k CHR, if you need.


This is the reset code and where most .s files are included. For example, mmc3_code.asm is included here. All in the “STARTUP” segment, so we know it is mapped to $e000-ffff, the default fixed bank.

I inserted a basic MMC3 initialization, which puts all the banks in place (and bank 0 into the $8000 slot). It also, importantly, does a CLI to allow IRQs to work on the CPU chip.

Also, I defined “SOUND_BANK 12”, and put the music data there, in BANK 12. All the music code will swap that bank in place, to play songs.

MMC3 folder


is all the hidden code that you shouldn’t have to worry about. Except if you want to change the A12_INVERT from $80 to 0. This determines which tileset will get the smaller bank $400 size and which will get the larger $800. I have it $80 so that background has the smaller tile banks, so that swapping BG CHR banks (to animate the background) uses less space.

If you prefer to have smaller sprite banks, change this number to 0. This also changes which CHR bank mode you need to change BG vs sprite tiles.

;if invert bit is 0
;mode 0 changes $0000-$07FF
;mode 1 changes $0800-$0FFF

;mode 2 changes $1000-$13FF
;mode 3 changes $1400-$17FF
;mode 4 changes $1800-$1BFF
;mode 5 changes $1C00-$1FFF

;if invert bit is $80
;mode 0 changes $1000-$17FF
;mode 1 changes $1800-$1FFF

;mode 2 changes $0000-$03FF
;mode 3 changes $0400-$07FF
;mode 4 changes $0800-$0BFF
;mode 5 changes $0C00-$0FFF


is the bank swapping code. This is the same as the MMC1 example.
You only need the banked_call() function. Anytime you call a function that is in a swappable bank, you use..

banked_call(unsigned char bankId, void (*method)(void))

(which bank is it in, the name of the function)
Don’t nest more than 10 function calls deep this way, or it will crash.


Here is the new stuff that you need to know.

set_prg_8000(unsigned char bank_id);

this changes which bank is mapped to $8000. I prefer that you use the banked_call() function, which automatically does this for you. You could use this to read data from a certain bank.


returns the number of the bank at $8000


don’t use this, for how I have it set up. It would change the bank mapped at $a000. Currently, we have code mapped here, which could crash if it was missing.

set_chr_mode_0(unsigned char chr_id);
set_chr_mode_1(unsigned char chr_id);
set_chr_mode_2(unsigned char chr_id);
set_chr_mode_3(unsigned char chr_id);
set_chr_mode_4(unsigned char chr_id);
set_chr_mode_5(unsigned char chr_id);

these change which parts of the CHR ROM are mapped to the tilesets.
See the chart above.

set_mirroring(unsigned char mirroring)

to change the PPU mirroring layout. Horizontal or Vertical.


this could turn the WRAM on and off. I turned it ON by default.


turns off the irq, and redirects the IRQ SYSTEM to point to 0xff (end).

set_irq_ptr(char * address)

turns on the IRQ SYSTEM, and points it to a char array.


So I should talk about this IRQ stuff. An IRQ is a hardware interrupt.
The MMC3 has a scanline counter. Once it is set, it counts down each line that is drawn. When it reaches zero, it jumps to the IRQ code. The standard use for the scanline IRQ is to make mid-screen scrolling changes, such as parallax scrolling.

The IRQ code would need to be written in Assembly. Instead of expecting you to learn this to make it work, I wrote an automated system that will parse a set of instructions (char array) and execute the necessary code behind the scenes. Here’s how it works.

A value < 0xf0, it’s a scanline count
zero is valid, it triggers an IRQ at the end of the current line

if >= 0xf0…
f0 = 2000 write, next byte is write value
f1 = 2001 write, next byte is write value
f2-f4 unused – future TODO ?
f5 = 2005 write, next byte is H Scroll value
f6 = 2006 write, next 2 bytes are write values

f7 = change CHR mode 0, next byte is write value
f8 = change CHR mode 1, next byte is write value
f9 = change CHR mode 2, next byte is write value
fa = change CHR mode 3, next byte is write value
fb = change CHR mode 4, next byte is write value
fc = change CHR mode 5, next byte is write value

fd = very short wait, no following byte
fe = short wait, next byte is quick loop value
(for fine tuning timing of things)

ff = end of data set
Once it sees a scanline value or an 0xff, it exits the parser.
Small example…
irq_array[0] = 47; // wait 48 lines

irq_array[1] = 0xf5; // 2005, H scroll change
irq_array[2] = 0; // change it to zero
irq_array[3] = 0xff; // end of data

This will cause it to, at the very top of the screen, set the scanline counter for 47. The screen will draw the top 48 lines (it’s usually 1 more than the number) then an IRQ will fire.

It will read the next lines, and change the Horizontal scroll to zero, then it will see the 0xff, and exit.

If it saw another scanline value (any # < 0xf0) it would instead set another scanline counter. And so on. Let’s look at the example.


There are 4 IRQ splits here (red lines). I’m doing quite a lot here. This is sort of an extreme example. Let’s go over the array, and see all what’s going on.


Anything put BEFORE the first scanline count will happen in v-blank,  and affect the entire screen.

irq_array[0] = 0xfc; // CHR mode 5, change the 0xc00-0xfff tiles
irq_array[1] = 8; // top of the screen, static value
irq_array[2] = 47; // value < 0xf0 = scanline count, 1 less than #

0xfc, change some tiles, mode 5, use CHR bank 8. Every frame this will always be 8 at the top of the screen. If you look at the video or the ROM, you see the top gear is not spinning. I did this on purpose to show the tile change mid-screen better.

It sees the 47, which is less than 0xf0, so it sets the scanline counter and exits.

At line 48 of the screen, an IRQ fires, the IRQ parser reads the next command.

irq_array[3] = 0xf5; // H scroll change, do first for timing
temp = scroll2 >> 8;
irq_array[4] = temp; // scroll value
irq_array[5] = 0xfc; // CHR mode 5, change the 0xc00-0xfff tiles
irq_array[6] = 8 + char_state; // value = 8,9,10,or 11
irq_array[7] = 29; // scanline count

First, it changes the Horizontal scroll. This code actually takes up an entire scanline, to try to time it near the end of a scanline so it’s not so visible. Note, you can’t change the Vertical scroll this way, only the Horizontal.

Then, it changes the background tiles, and cycles them through 4 options. This causes the gear to spin. The entire screen below this point is affected.

Then, it sets another scanline count and exits. 30 scanlines later another IRQ fires. It then reads this…

irq_array[8] = 0xf5; // H scroll change
temp = scroll3 >> 8;
irq_array[9] = temp; // scroll value
irq_array[10] = 0xf1; // $2001 test changing color emphasis
irq_array[11] = 0xfe; // value COL_EMP_DARK 0xe0 + 0x1e
irq_array[12] = 30; // scanline count

We change the H scroll again. Just to show what ELSE we can do, it’s writing a value to the $2001 register, setting all the color emphasis bits, which darkens the screen. If you move the sprite guy down, you see that he is also affected.

Another scanline count is set, for 30. It takes 31 lines, probably.
irq_array[16] = 0xf5; // H scroll change
irq_array[17] = 0; // to zero, to set the fine X scroll
irq_array[18] = 0xf6; // 2 writes to 2006 shifts screen
irq_array[19] = 0x20; // need 2 values…
irq_array[20] = 0x00; // PPU address $2000 = top of screen
irq_array[21] = 0xff; // end of data

And just to be fancy, I’m showing an example of using the $2006 PPU register mid-screen. But, to make it work, I also needed to set the $2005 register to zero with the 0xf5 command. The 0xf6 command is the only one that is expecting 2 values after it. High byte, then Low byte, of a PPU address.

Writing to $2006 mid-frame causes the scroll to realign to that address. In this example, the address $2000 is the top of the nametable, which means that the top of the nametable is drawn AGAIN, below that point.

Instead of seeing a value < 0xf0, it sees an 0xff, and exits without setting another scanline count. The parser does nothing until the top of the next line.


In the code, a double buffering system was added so that you aren’t editing the same array that the system is currently reading. Every irq_array[] reference above has been replaced with double_buffer[]. Then at the end of the frame logic, it waits till the IRQ System is done while(!is_irq_done() ).

is_irq_done() returns zero if it’s not done, and 0xff if it is done. The preceding ! not operator flips the result, so it becomes a “while this function returns zero, do nothing”.

Then, it copies the double_buffer array to the irq_array.

Swappable PRG banks

These work the same as the MMC1 code, except that the bank size is $2000.

To make the code / data go in BANK 0, use these directives

#pragma rodata-name (“BANK0”)
#pragma code-name (“BANK0”)

The function_bank0() function is compiled into bank 0. To call it, we use

banked_call(0, function_bank0);

0 is the bank, function_bank0 is the name of the function.

This jumps to banked_call(), which is in the fixed bank, it automatically swaps bank 0 in place, and then jumps to function_bank0() with a function pointer. Then it swaps the previous bank back in place, and returns.

As you see, this means that I can’t pass arguments to function_bank0().
So, we have to pass arguments via global variables. arg1, arg2, for example. Obviously, this isn’t safe practice. You could immediately pass those values to static local variables, to avoid errors.

If you call a function in the SAME swapped bank that you are already in, you can safely use regular function calls.

You can call a function from one swapped bank to another swapped bank. But, don’t keep going from one to another swapped bank too much, not more than 10 deep.

Other important notes.

For the IRQ scanline counter to work, background must use tilset #0 (which is the default)and sprites must use tileset #1 bank_spr(1). And, the screen must be on. ppu_on_all().

Here’s the link to the example code.


Future plans… maybe some time, I will make a UxROM (perhaps UNROM 512) code example. This would require VRAM, which would involve compression. Otherwise, the PRG works the same as the MMC1, which we already have working example code, so it wouldn’t be too difficult to convert.