So this is a story all about how I tried to recommend an NES emulator as a 'confidence building' project way back when. Then the topic somewhat came up again recently, and I shifted gears to an 'NSF Player' (which is just a mini-NES emulator ... just the CPU and audio functionality -- NSF files are the music code extracted from the game).
For whatever reason... I really want someone here to do a project like this. I can't explain why. I have a weird fascination with it. It's like living vicariously through other board members. It's a little insane, I know, but at the same time I can't help it.
So I'd like to take a minute, just sit right there, I'll tell you all about the basics of NES emulation.
PART 1: ADDRESSING SPACE
The NES CPU has a 16-bit address bus, which means there are only 64K addresses ($0000-$FFFF). It uses memory-mapped I/O, so different addresses correspond to different parts of the system.
Here is a [very] simplified mapping (areas not mentioned are either "mirrors" or are open bus... I'll get into that later... maybe):
$0000 - $07FF = RAM
$4000 - $4017 = APU registers (writes to here will generate audio)
$6000 - $7FFF = More RAM
$8000 - $FFFF = ROM (i.e., the NSF file containing the code/data to play the music)
You can think of this like a giant array. The Game/NSF code will read/write different areas in this array. To emulate the system, you merely have to catch the special reads/writes and make them do stuff.
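Here's a minimal sketch of what "catching the special reads/writes" could look like for the simplified map above. The names bus_read, bus_write, apu_write and the apu_regs array are made up for illustration; a real player would turn APU writes into sound, and mirrors/open bus are simply ignored here.

```c
#include <stdint.h>

/* Hypothetical backing storage for each region of the simplified map. */
static uint8_t ram[0x0800];    /* $0000-$07FF */
static uint8_t apu_regs[0x18]; /* $4000-$4017, stand-in for a real APU */
static uint8_t wram[0x2000];   /* $6000-$7FFF */
static uint8_t rom[0x8000];    /* $8000-$FFFF, loaded from the NSF file */

/* Hypothetical hook: a real player would update its sound channels here. */
static void apu_write(uint16_t addr, uint8_t value) {
    apu_regs[addr - 0x4000] = value;
}

uint8_t bus_read(uint16_t addr) {
    if (addr <= 0x07FF) return ram[addr];
    if (addr >= 0x6000 && addr <= 0x7FFF) return wram[addr - 0x6000];
    if (addr >= 0x8000) return rom[addr - 0x8000];
    return 0; /* mirrors and open bus are not modeled in this sketch */
}

void bus_write(uint16_t addr, uint8_t value) {
    if (addr <= 0x07FF) ram[addr] = value;
    else if (addr >= 0x4000 && addr <= 0x4017) apu_write(addr, value);
    else if (addr >= 0x6000 && addr <= 0x7FFF) wram[addr - 0x6000] = value;
    /* writes to ROM and unmapped areas are silently dropped */
}
```

The CPU core then only ever talks to memory through these two functions, so all the "special" behavior lives in one place.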
PART 2: CPU BASICS
The NES has a slightly modified NMOS 6502. The main difference as far as emulation is concerned is that there's no decimal mode... so ADC and SBC behave the same whether or not the D flag is set. But "WHATEVER WTF DOES THAT MEAN I DON'T KNOW ANYTHING ABOUT ASSEMBLY USE TERMS I CAN UNDERSTAND YOU ASSHOLE".
Let's start with the registers. There are 3 'main' registers, called 'A', 'X', and 'Y'. Not very descriptive names, but who cares, right? They basically just act as 8-bit variables. To emulate them... all you need is an unsigned 8-bit variable:
```c
#include <stdint.h>
uint8_t A; // <- you successfully emulated the 6502 'A' register! congrats!
```
(yes yes, A stands for "Accumulator" and X,Y are actually called "X-index" and "Y-index"... but I'm trying not to get too technical dammit)
Different instructions will do different things with the A,X,Y registers... and possibly with memory (that addressing space business I mentioned earlier).
But that's only HALF the fun! The other half is the "Addressing Modes", which determine where an instruction gets its data from. But before I get into those... let's look at an example instruction: LDA #$06
This is the 'LDA' instruction. LDA stands for "Load A" and, as you might guess, this assigns the literal value $06 to A. In C, this would be the same as: A = 0x06;
The $ denotes hexadecimal notation. The # denotes "immediate" mode (that's an addressing mode). Without the # symbol... the instruction means something else. LDA $06 works like this:
```c
// conceptual equivalent C code:
A = memory[0x06];
```
This is "Zero Page" mode. Unlike Immediate mode, it is not assigning 6 to A, but instead 6 is an address... and it will read from address $0006 and whatever value it reads will get put into A.
Of course... since the address space is 16 bits, you can also use 16-bit addresses, for example LDA $6000 (read from address $6000 and put the value into A).
This is "Absolute" mode, and it is identical to "Zero Page" mode except that it takes a 16-bit address instead of an 8-bit one. Zero page is used because it's slightly faster and because it takes up less space in the program.
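To make the difference between these three modes concrete, here's a sketch in C. The function names are hypothetical; each one takes the operand byte(s) that follow the opcode in the program. In actual machine code, LDA #$06 assembles to $A9 $06, LDA $06 to $A5 $06, and LDA $6000 to $AD $00 $60. Note that 16-bit addresses are stored low byte first, and that the zero page form is one byte shorter.

```c
#include <stdint.h>

static uint8_t memory[0x10000]; /* the "giant array" from Part 1 */
static uint8_t A;

/* LDA #$nn (immediate): the operand byte IS the value */
void lda_immediate(uint8_t operand) { A = operand; }

/* LDA $nn (zero page): the operand byte is an address in $0000-$00FF */
void lda_zero_page(uint8_t operand) { A = memory[operand]; }

/* LDA $nnnn (absolute): two operand bytes, low byte first */
void lda_absolute(uint8_t lo, uint8_t hi) {
    uint16_t addr = (uint16_t)(lo | (hi << 8));
    A = memory[addr];
}
```

So lda_immediate(0x06) puts 6 in A, while lda_zero_page(0x06) puts whatever is stored at $0006 in A.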
PART 3: INDEXING, INDIRECTION, AND YOU
So this is fine and dandy and all... but reading/writing at fixed addresses doesn't give the programmer much flexibility. This is where X and Y come in. They are used as "indexes" for instructions, as in LDA $6000,X:
```c
// C equivalent
A = memory[0x6000 + X];
```
This is "Absolute,X" mode, and as you can see, it merely uses X to index memory... as if it were an array. The comma here denotes addition.
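A sketch of indexed loads, with hypothetical function names. One detail worth knowing for an emulator: Absolute,X wraps around at $FFFF, while the zero page version (LDA $F0,X, which isn't covered above) wraps around within the zero page, so $F0 + $20 reads $0010, not $0110.

```c
#include <stdint.h>

static uint8_t memory[0x10000];
static uint8_t A, X;

/* LDA $nnnn,X: 16-bit base plus X, wrapping at $FFFF */
void lda_absolute_x(uint16_t base) {
    A = memory[(uint16_t)(base + X)];
}

/* LDA $nn,X: 8-bit base plus X, wrapping inside $0000-$00FF */
void lda_zero_page_x(uint8_t base) {
    A = memory[(uint8_t)(base + X)];
}
```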
But even that is limiting... to get real flexibility you need something like those weird "pointer" things. This is done with the "Indirect" addressing modes, such as LDA ($10),Y:
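```c
// C equivalent: fetch a 16-bit pointer (low byte first) from $10/$11, then index it with Y
uint16_t temp_address = memory[0x10] | (memory[0x11] << 8);
A = memory[(uint16_t)(temp_address + Y)]; // wraps at $FFFF like the real CPU
```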
This is "(Indirect),Y" mode. And yeah yeah I know... this is more complicated. But chillax, it's not so bad.
The (parentheses) here denote indirection. Indirection basically just means fetching a 16-bit pointer from the given address (low byte first, then high byte)... then reading from THAT pointer, using Y to index it.
Programs can use this to write a pointer to some kind of lookup table or data in the ROM and then use Y to index it. For example:
```
LDA #$A0    ; load low byte of address
STA $10     ; write it to memory[0x10]
LDA #$80    ; load high byte
STA $11     ; write it
            ; at this point... memory[0x0010] and memory[0x0011] form a 16 bit pointer, pointing to address $80A0
LDY #$02    ; put literal 2 in Y
LDA ($10),Y ; reads from address $80A2 (pointer address of $80A0 + contents of Y (2) = $80A2)
```
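The same example traced in C, using a hypothetical lda_indirect_y function. This sketch assumes the real 6502 behavior that the pointer's high byte is fetched from within the zero page (a pointer stored at $FF takes its high byte from $00).

```c
#include <stdint.h>

static uint8_t memory[0x10000];
static uint8_t A, Y;

/* LDA ($zp),Y: read a pointer from zero page, then add Y to it */
void lda_indirect_y(uint8_t zp) {
    uint16_t ptr = (uint16_t)(memory[zp] | (memory[(uint8_t)(zp + 1)] << 8));
    A = memory[(uint16_t)(ptr + Y)];
}
```

With memory[0x10] = 0xA0, memory[0x11] = 0x80 and Y = 2, calling lda_indirect_y(0x10) reads from $80A2, just like the assembly above.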
Indirection with X can also be done... but it's different... and stupider and much less useful: LDA ($10,X)
Notice how the ",X" is INSIDE the parentheses whereas the ",Y" was outside? That's because with X, the indexing is done before the indirection. So it ends up being like this:
```c
// the zero-page reads wrap around: $FF + 1 reads $00, not $0100
uint16_t pointer = memory[(uint8_t)(0x10 + X)] | (memory[(uint8_t)(0x11 + X)] << 8);
A = memory[pointer];
```
Don't ask me to explain why anyone would find this useful. I still haven't figured it out.
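For what it's worth, one situation where this mode can come in handy is when the zero page holds several pointers side by side and X picks which one to use (X = 0 for the first, 2 for the second, and so on). The sketch below assumes exactly that layout; the function name and the three example pointers are made up for illustration.

```c
#include <stdint.h>

static uint8_t memory[0x10000];
static uint8_t A, X;

/* LDA ($zp,X): add X to the zero page address first, then read the pointer */
void lda_indexed_indirect(uint8_t zp) {
    uint8_t lo_addr = (uint8_t)(zp + X);     /* stays inside zero page */
    uint8_t hi_addr = (uint8_t)(zp + X + 1); /* ditto */
    uint16_t ptr = (uint16_t)(memory[lo_addr] | (memory[hi_addr] << 8));
    A = memory[ptr];
}
```

Here X = 0, 2, 4 would select pointers stored at $10/$11, $12/$13, and $14/$15.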
PART 4: THE STACK, THE PC, AND JUMPING
Remember back on the memory map, when I mentioned that addresses $0000-$07FF were RAM? Well, one 'page' ($100-byte block) of that, $0100-$01FF, is "the stack" and is treated somewhat specially.
If you're unfamiliar with the concept of a stack... think of a stack of plates. You can put a plate on top (push) or you can take a plate off the top (pull). But you don't really have [easy] access to any plates other than the one on the top.
This concept of pushing/pulling values onto a stack is the same idea that is behind the 6502 stack. It's easily illustrated with 2 instructions: PHA (push A onto stack) and PLA (pull A off of stack):
```
LDA #$05
PHA       ; pushes 5 onto the stack: [bottom] $05 [top]
LDA #$03
PHA       ; pushes 3 onto the stack: [bottom] $05 $03 [top]
LDA #$00  ; erase A. A=0
PLA       ; A=3 (value off top of stack): [bottom] $05 [top]
PLA       ; A=5 (value off top of stack): [stack is empty]
```
To keep track of the stack... there's an 8-bit 'SP' (stack pointer) register. This register isn't loaded and stored by the code like A, X, and Y are (apart from copying it to/from X with TSX/TXS). Instead... it's used implicitly by instructions like PHA.
The stack starts at address $01FF and grows down (so if the stack is completely full... $01FF is the bottom of the stack, and $0100 is the top). SP holds the low byte of the stack address, so SP=$FF means the next push goes to $01FF.
With that in mind...
```c
// PHA in C:
memory[0x0100 + SP] = A;
--SP;
// PLA in C:
++SP;
A = memory[0x0100 + SP];
```
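Here's the PHA/PLA example from above run through C. The push/pull helper names are hypothetical, and SP starting at $FF (empty stack, next push lands at $01FF) is an assumption. Flags are ignored, since they're Part 5 material (a real PLA also updates some of them).

```c
#include <stdint.h>

static uint8_t memory[0x10000];
static uint8_t A;
static uint8_t SP = 0xFF; /* assumption: stack starts empty */

/* write at $0100+SP, then move SP down (uint8_t wraps $00 -> $FF) */
void push(uint8_t value) { memory[0x0100 + SP] = value; SP--; }

/* move SP back up, then read at $0100+SP */
uint8_t pull(void) { SP++; return memory[0x0100 + SP]; }

void pha(void) { push(A); }
void pla(void) { A = pull(); }
```

Putting push/pull in helpers pays off later, because JSR and RTS use the same stack.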
On a somewhat unrelated note... in addition to A, X, Y, and SP... there's also a 'PC' (program counter) register. The PC register is 16 bits and basically just tells the CPU which address it should read the next instruction from. Every time an instruction executes, PC gets advanced past it (the opcode plus any operand bytes) so it points to the next instruction.
So if PC=$8000, this just means that the CPU will read the next instruction opcode from address $8000, then increment PC.
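A rough sketch of that fetch/increment cycle. The names step and fetch are hypothetical, and only three real opcodes are handled ($A9 = LDA immediate, $A5 = LDA zero page, $AD = LDA absolute); a full emulator needs a case for every opcode.

```c
#include <stdint.h>

static uint8_t memory[0x10000];
static uint8_t A;
static uint16_t PC;

/* read the byte at PC, then advance PC (uint16_t wraps at $FFFF) */
static uint8_t fetch(void) { return memory[PC++]; }

/* Executes one instruction. Returns 0 for opcodes this sketch doesn't know. */
int step(void) {
    uint8_t opcode = fetch();
    switch (opcode) {
    case 0xA9: /* LDA #imm */
        A = fetch();
        return 1;
    case 0xA5: /* LDA zp */
        A = memory[fetch()];
        return 1;
    case 0xAD: { /* LDA abs: low byte, then high byte */
        uint8_t lo = fetch();
        uint8_t hi = fetch();
        A = memory[(uint16_t)(lo | (hi << 8))];
        return 1;
    }
    default:
        return 0;
    }
}
```

An NSF player would call step() in a loop until the play routine returns.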
Like SP, the PC cannot be directly modified in the 6502 code like A,X,Y are. Instead, the program will "jump" to different areas in the code with the JMP instruction:
```
JMP $C000 ; jumps to address $C000
```

```c
// equivalent C code:
PC = 0xC000;
```
JMP is basically like a goto. Except this is 6502 assembly, so it's not evil like goto in C++ is.
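A sketch of JMP absolute (opcode $4C), assuming PC has already been advanced past the opcode byte so it points at the two address bytes. The function name is hypothetical.

```c
#include <stdint.h>

static uint8_t memory[0x10000];
static uint16_t PC;

/* JMP $nnnn: read the little-endian target address, then "goto" it */
void jmp_absolute(void) {
    uint8_t lo = memory[PC];                 /* low byte of the target */
    uint8_t hi = memory[(uint16_t)(PC + 1)]; /* high byte of the target */
    PC = (uint16_t)(lo | (hi << 8));
}
```

So JMP $C000 sits in memory as $4C $00 $C0, and afterwards PC is simply $C000.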
But JMP sucks for subroutines, because you want to be able to "return" (jump back to whatever called this code). For that... there's JSR ("jump to subroutine") and RTS ("return from subroutine").
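JSR works exactly like JMP does... only it first pushes the return address onto the stack (as 2 bytes, high byte first, since the PC is 16 bits). One detail that matters for an emulator: the value pushed is actually the address of the last byte of the JSR instruction, i.e. the return address minus 1.
RTS pulls those 2 bytes back off the stack, adds 1, and sticks the result in PC to jump back (basically undoing the JSR).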
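A sketch of JSR ($20) and RTS ($60). SP starting at $FF is an assumption, the helper names are hypothetical, and jsr() is assumed to be called with PC pointing at the first operand byte. It reproduces the real CPU's quirk: JSR pushes the address of its own last byte (return address minus 1), and RTS adds 1 after pulling it.

```c
#include <stdint.h>

static uint8_t memory[0x10000];
static uint8_t SP = 0xFF; /* assumption: stack starts empty */
static uint16_t PC;

static void push(uint8_t v) { memory[0x0100 + SP] = v; SP--; }
static uint8_t pull(void) { SP++; return memory[0x0100 + SP]; }

/* JSR $nnnn: push (return address - 1), high byte first, then jump */
void jsr(void) {
    uint8_t lo = memory[PC];
    uint8_t hi = memory[(uint16_t)(PC + 1)];
    uint16_t ret = (uint16_t)(PC + 1); /* last byte of the JSR instruction */
    push((uint8_t)(ret >> 8));
    push((uint8_t)(ret & 0xFF));
    PC = (uint16_t)(lo | (hi << 8));
}

/* RTS: pull low byte, then high byte, and add 1 */
void rts(void) {
    uint8_t lo = pull();
    uint8_t hi = pull();
    PC = (uint16_t)((lo | (hi << 8)) + 1);
}
```

For a JSR $C000 stored at $8000-$8002, this pushes $80 then $02, and the matching RTS lands on $8003, the instruction right after the JSR.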
PART 5: FLAGS AND BRANCHING
To be written. I need to take a game break.