At the start of VBlank, the 'vblank' status bit of $2002 is set (so reading $2002 will return the high bit set), and the PPU will generate an NMI (if enabled via the high bit of $2000). When an NMI is triggered, an interrupt occurs in the CPU (the current PC and status flags are pushed to the stack, and the CPU jumps to the 'NMI' vector, i.e. the address stored at $FFFA-$FFFB). NMIs are how games get notified that VBlank has started. One should happen every frame when enabled. When disabled, none should occur.
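To make the sequence concrete, here is a minimal sketch of what an emulator might do at the start of VBlank. The `Cpu` and `Ppu` classes, their field names, and the flat 64 KiB memory are all hypothetical simplifications for illustration, not a real emulator's layout.

```python
class Cpu:
    """Hypothetical minimal CPU state: just enough to show NMI entry."""

    def __init__(self, memory):
        self.mem = memory      # flat 64 KiB bytearray (assumption of this sketch)
        self.pc = 0x8000
        self.sp = 0xFD
        self.status = 0x24

    def push(self, value):
        # The 6502 stack lives in page 1 and grows downward.
        self.mem[0x0100 + self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFF

    def nmi(self):
        # Push PC (high byte first), then the status flags.
        self.push(self.pc >> 8)
        self.push(self.pc & 0xFF)
        # Pushed copy of the flags: B (0x10) clear, bit 5 (0x20) set.
        self.push((self.status & ~0x10) | 0x20)
        self.status |= 0x04    # interrupts are masked on entry
        # Jump to the address stored at $FFFA (low byte) / $FFFB (high byte).
        self.pc = self.mem[0xFFFA] | (self.mem[0xFFFB] << 8)


class Ppu:
    """Hypothetical PPU stub holding only the two registers involved."""

    def __init__(self):
        self.ctrl = 0      # last value written to $2000
        self.status = 0    # value returned by reading $2002

    def start_vblank(self, cpu):
        self.status |= 0x80        # set the 'vblank' status bit
        if self.ctrl & 0x80:       # NMI enabled via the high bit of $2000
            cpu.nmi()
```

You would call `start_vblank` once per frame, when the PPU reaches the first VBlank scanline.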
VBLANK, PPU ON/OFF SWITCH
The $2001 PPU Register has 2 very important control bits.
ref:
http://wiki.nesdev.com/w/index.php/PPU_registers#Mask_.28.242001.29_.3E_write
Bit 3 (0x08), when set, the background (BG) will be visible
Bit 4 (0x10), when set, the sprites (movable objects, like Mario, Link, etc) will be visible
When either of these bits is set... the PPU is considered "on". When both of them are clear, the PPU is considered "off" and behaves differently than it does when on.
The game will set/clear these bits by writing to $2001:
LDA #$18    ; bits 3 and 4 set (BG and sprites visible)
STA $2001   ; Turns on the PPU
LDA #$00    ; bits 3 and 4 clear
STA $2001   ; Turns off the PPU
Note that as previously mentioned... turning off the PPU does not stop the frame from progressing. "off" is a bit of a misnomer because the PPU is still powered and is doing stuff... it's just doing less work.
During the "idle" and "VBlank" scanlines... the PPU is effectively doing nothing but waiting for time to pass. During these scanlines the PPU is "not in rendering".
The "prerender" and "render" scanlines are when the PPU is drawing pixels to the screen, updating internal registers, and doing all its crazy work to get the image displayed. If the PPU is on during these scanlines... the PPU is "in rendering". If the PPU is off during this time... it is "not in rendering".
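The on/off and rendering rules boil down to one small predicate. This is only a sketch: the scanline numbering below (render lines first, then idle, VBlank, prerender, for 262 NTSC scanlines in total) is an assumption of the example, and your emulator may count scanlines differently.

```python
# Assumed NTSC scanline numbering for this sketch (262 scanlines per frame).
RENDER_LAST = 239       # render scanlines are 0..239
IDLE = 240              # the single idle scanline
VBLANK_FIRST = 241      # VBlank scanlines are 241..260
PRERENDER = 261         # the single prerender scanline

MASK_SHOW_BG = 0x08         # $2001 bit 3
MASK_SHOW_SPRITES = 0x10    # $2001 bit 4


def in_rendering(scanline, mask):
    """Return True when the PPU counts as 'in rendering'.

    scanline: current scanline, using the numbering assumed above.
    mask:     last value written to $2001.
    """
    ppu_on = (mask & (MASK_SHOW_BG | MASK_SHOW_SPRITES)) != 0
    on_drawing_line = scanline <= RENDER_LAST or scanline == PRERENDER
    return ppu_on and on_drawing_line
```

An emulator could use a check like this to decide whether a $2007 access is happening at a safe time.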
When the PPU is "in rendering" (or from here on, just "rendering"), it is unsafe to access various PPU registers. For example, $2007 (CPU<->PPU data port) can only be accessed safely outside of rendering. Same with $2003 and $2004 (Sprite address and data regs). Therefore games can only update sprites/BG outside of rendering. This means either waiting for VBlank (or the idle scanline)... or forcibly turning the PPU off via $2001.
Typically, games will turn the PPU off... draw the entire screen by doing a bazillion writes to $2007, then turn the PPU back on. While the PPU is on... they make minor changes to the screen every frame, during VBlank.
For example in Super Mario Bros... once you press start at the title screen:
- the screen will go black for a few frames because the PPU is switched off so it can clear the BG and print how many lives you have remaining
- then the screen will turn back on (PPU on) to show that info to you for a few frames.
- then it goes black again (PPU off) to draw the visible tiles from level 1-1
- then PPU goes on again and you start playing the game
- while playing the game, the PPU remains on... and as you move through the level, more of it is drawn with the PPU remaining on (because the drawing is being done by the CPU during VBlank)
MORE SCANLINE DETAILS
As mentioned, the PPU is effectively just waiting during the idle and VBlank scanlines... so nothing happens on those lines.
However during rendering, the PPU is doing all sorts of crap. I'm not going to get into details just yet. For extreme details on the timing you can refer to the wiki (or ask... I don't mind answering... I just don't want to overwhelm you). I recommend you start with a simple emu and don't focus too much on the extreme timing. At least not until later.
If interested in the exact timing... a diagram can be found here:
http://wiki.nesdev.com/w/images/d/d1/Ntsc_timing.png (note it refers to the "idle" scanline as "post-render", but it's the same thing)
The basic work done by the PPU during a rendering scanline is:
- Fetch tile data for visible tiles
- Output pixels (1 pixel per dot). Only 256 pixels are output per scanline.
- Update scroll (will explain in another section)
- Fetch sprite tile data for the next scanline
For a basic emu... if you don't care about cycle-level timing (pixel accurate emulation)... you can just draw one row of pixels every 341 dots.
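A scanline-based PPU along those lines might look like the sketch below. The `ScanlinePpu` class and the `draw_row` callback are hypothetical names, and the 262-scanline frame with 240 visible rows assumes NTSC.

```python
DOTS_PER_SCANLINE = 341
SCANLINES_PER_FRAME = 262   # NTSC (assumption of this sketch)
VISIBLE_SCANLINES = 240


class ScanlinePpu:
    """Draws one whole row of pixels each time 341 dots have passed."""

    def __init__(self, draw_row):
        self.draw_row = draw_row   # hypothetical callback: draw_row(y)
        self.dot = 0               # dots accumulated on the current scanline
        self.scanline = 0

    def run(self, dots):
        self.dot += dots
        while self.dot >= DOTS_PER_SCANLINE:
            self.dot -= DOTS_PER_SCANLINE
            # Only the visible scanlines produce a row of pixels.
            if self.scanline < VISIBLE_SCANLINES:
                self.draw_row(self.scanline)
            self.scanline = (self.scanline + 1) % SCANLINES_PER_FRAME
```

This won't get mid-scanline effects right, but it is enough to get a picture on screen.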
SYNCING PPU AND CPU + TECHNIQUES
It is important to keep track of where both the CPU and PPU are in the current frame. Games will do various raster effects by writing to PPU registers mid-frame (and in some cases, even mid-scanline!). The most common technique is basic screen splitting, where the game will change the scroll mid-screen.
For an example of splitting the screen... you can look at Super Mario Bros. It has a status bar at the top of the screen which stays stationary, whereas the map scrolls horizontally below it. This is accomplished by having the scroll set to 0,0 at the start of the frame... then about 70 or so scanlines into rendering... it will change the PPU scroll values. This results in the visible "split". Super Mario Bros waits those 70 or so scanlines by having the CPU effectively wait/spin until a certain number of CPU cycles has passed. (and... it also uses "Sprite 0 hit" which is another topic I'll get into later).
There are a couple of techniques for emulating CPU and PPU timing interactions. The most common one, and the one I'm going to focus on, is the "catch up" approach. It works like so:
- You run the CPU ahead of the PPU... have it run for a full frame's worth of cycles.
- Keep track of a CPU 'timestamp' which increments with each passing cycle.
- Whenever the CPU does something that could impact the PPU (ie: read/write any PPU register), you pause CPU execution and run the PPU up to the current CPU timestamp.
- Once the PPU "catches up" to the CPU timestamp... you perform the register read/write, and continue executing the CPU.
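The steps above can be sketched roughly as follows. Everything here is illustrative: the `Bus` and `Ppu` classes and their method names are made up for the example, memory is flat, register mirrors are ignored, and both timestamps are counted in dots using the NTSC 3:1 ratio.

```python
PPU_REG_FIRST, PPU_REG_LAST = 0x2000, 0x2007   # register mirrors ignored here
DOTS_PER_CPU_CYCLE = 3                         # NTSC ratio


class Ppu:
    """Hypothetical PPU stub: it only counts dots."""

    def __init__(self):
        self.timestamp = 0            # measured in dots
        self.registers = [0] * 8

    def run_until(self, target):
        # A real PPU would fetch tiles and output pixels for each dot here.
        while self.timestamp < target:
            self.timestamp += 1


class Bus:
    """Sits between the CPU core and everything it reads or writes."""

    def __init__(self, ppu):
        self.ppu = ppu
        self.ram = bytearray(0x10000)  # flat memory, a simplification
        self.cpu_timestamp = 0         # also measured in dots

    def tick(self, cpu_cycles):
        # Called by the CPU core as cycles pass.
        self.cpu_timestamp += cpu_cycles * DOTS_PER_CPU_CYCLE

    def _sync_ppu(self):
        # Let the PPU "catch up" to the CPU timestamp.
        self.ppu.run_until(self.cpu_timestamp)

    def read(self, addr):
        if PPU_REG_FIRST <= addr <= PPU_REG_LAST:
            self._sync_ppu()
            return self.ppu.registers[addr & 7]
        return self.ram[addr]

    def write(self, addr, value):
        if PPU_REG_FIRST <= addr <= PPU_REG_LAST:
            self._sync_ppu()
            self.ppu.registers[addr & 7] = value & 0xFF
        else:
            self.ram[addr] = value & 0xFF
```

Ordinary RAM accesses never touch the PPU, so it only runs when a register access forces it to. You would also want one last catch-up at the end of each frame so the PPU finishes drawing.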
This can be done in a single thread. Or if you want to get very adventurous you can try multithreading. Note that I do not recommend pre-emptive multithreading (which is probably what you are thinking of when I say "multithreading"). However, cooperative multithreading actually works pretty well. The idea with cooperative is that the two threads do not run simultaneously. Instead, only one thread is running at a time... and you can "switch" to other threads whenever you want.
The reason cooperative is better for this is because you will have to sync up the PPU and CPU a LOT. Several times (possibly several dozen times) per frame. Having one thread wait for the other that many times creates a lot of blocking which ultimately makes things very slow.
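To show the flavor of cooperative switching, here is a tiny sketch that uses a Python generator as a stand-in for a cooperative thread. This is not how libco works internally and none of these names come from it. The `state` dict and both functions are hypothetical, and timestamps are in dots at 3 per CPU cycle.

```python
def ppu_thread(state):
    """Generator acting as the PPU's cooperative thread."""
    target = yield                     # first switch-in delivers a target timestamp
    while True:
        while state["ppu_time"] < target:
            state["ppu_time"] += 1     # one dot of PPU work would go here
        target = yield                 # caught up: switch back to the CPU


def cpu_thread(state, ppu, accesses):
    """Plain function acting as the CPU side.

    accesses: list of CPU cycle counts, each followed by a PPU register access.
    """
    for cycles_before_access in accesses:
        state["cpu_time"] += 3 * cycles_before_access
        ppu.send(state["cpu_time"])    # switch to the PPU until it catches up
        state["syncs"] += 1            # the register read/write would happen here


state = {"cpu_time": 0, "ppu_time": 0, "syncs": 0}
ppu = ppu_thread(state)
next(ppu)                              # run the PPU thread up to its first yield
cpu_thread(state, ppu, [10, 20, 5])
```

Only one side ever runs at a time, and each `send`/`yield` pair is an explicit switch, so there is no locking and no blocking.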
If interested in cooperative threading... a very simple to use lib is libco, which is available here:
https://www.dropbox.com/s/cqfhidb8djug9lq/libco.zip
Libco is used in various emus.
As for keeping track of timestamps... I recommend keeping both timestamps in the same "base" so comparisons between them are easier. So for example:
- increment the CPU timestamp by 3 every CPU cycle
- increment the PPU timestamp by 1 every dot
That will keep the 3:1 ratio.
OR, a better way might be:
- increment the CPU timestamp by 15 every CPU cycle
- increment the PPU timestamp by 5 every dot
This will keep the 3:1 ratio, and will also make PAL support easier if/when you add it in the future.
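Here is a small sketch of why the 15/5 base helps. The PAL step of 16 is my own addition, not something stated above: a PAL CPU cycle lasts 3.2 dots, which is 16:5 in whole numbers, so only the CPU step has to change. The dict layout and `dots_owed` are hypothetical names for the example.

```python
NTSC = {"cpu_step": 15, "ppu_step": 5}   # 15:5 is the 3:1 ratio
PAL = {"cpu_step": 16, "ppu_step": 5}    # 16:5 is the 3.2:1 ratio (assumption, see above)


def dots_owed(cpu_cycles, timing, ppu_timestamp=0):
    """How many whole dots the PPU must run to catch up to the CPU.

    cpu_cycles:    CPU cycles executed so far.
    timing:        NTSC or PAL step table.
    ppu_timestamp: PPU timestamp in the shared base (not in dots).
    """
    cpu_timestamp = cpu_cycles * timing["cpu_step"]
    return (cpu_timestamp - ppu_timestamp) // timing["ppu_step"]


ntsc_dots = dots_owed(100, NTSC)
pal_dots = dots_owed(100, PAL)
```

With a plain 3/1 base there is no whole number to add per CPU cycle on PAL, which is the problem the larger base avoids.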