I've built a routine in my emulator that processes pixels one by one until vblank, after which the frame is rendered using SDL.
Rendering with SDL takes about 50ms (giving about 30fps with empty screens: the emulator constantly renders empty screens, dedicating all its time to SDL resizing and rendering an empty buffer).
At the moment the pixels are processed one at a time in a loop, in both graphics and text modes. Every pixel takes about 30 microseconds to render to the buffer (30/1,000,000 of a second, so a maximum of about 33,333 pixels per second, which at 320x200 resolution is about 0.5fps).
I want to get the speed up to at least 10fps, which requires every pixel to be rendered in about 1.5 microseconds (assuming 320x200 at 10fps with no time spent on rendering). Is this even possible on a 333MHz CPU, and if so, how?
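For reference, the arithmetic behind these figures can be sketched in a few lines of C. The function names are made up for this post and are not part of the emulator. At 320x200 and 10fps the budget comes out at 1.5625 microseconds per pixel, which at 333MHz is roughly 520 clock cycles per pixel.

```c
#include <assert.h>

/* Microseconds available per pixel for a given resolution and frame rate,
   assuming no time is spent on anything except per-pixel processing. */
static double pixel_budget_us(unsigned width, unsigned height, double fps)
{
    assert(width > 0 && height > 0 && fps > 0.0);
    return 1000000.0 / ((double)width * (double)height * fps);
}

/* Frame rate reached when every pixel costs us_per_pixel microseconds. */
static double fps_at_cost(unsigned width, unsigned height, double us_per_pixel)
{
    assert(width > 0 && height > 0 && us_per_pixel > 0.0);
    return 1000000.0 / ((double)width * (double)height * us_per_pixel);
}

/* Clock cycles available per pixel: a clock of N MHz gives N cycles
   per microsecond. */
static double cycles_per_pixel(double cpu_mhz, double us_per_pixel)
{
    return cpu_mhz * us_per_pixel;
}
```

Calling `fps_at_cost(320, 200, 30.0)` reproduces the current 0.5fps figure, and `pixel_budget_us(320, 200, 10.0)` gives the 10fps target.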
For each pixel, the rendering routine of my VGA emulation does the following:
1. Read the pixel on/off and/or attribute from emulated VRAM. This can be either a pixel on/off from the character generator or a permanent pixel on from a graphics pixel. The attribute is also returned.
2. The attribute controller checks and processes the pixel on/off and attribute into a DAC index according to the attribute controller registers.
3. The DAC uses a lookup table to convert the 8-bit DAC index into a 24-bit RGB value and writes it to the screen buffer.
Step 1 takes the most time (about 15-20 us), step 2 takes a lot less (about 7-9us), step 3 takes almost no time at all (1-3us).
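To make the three steps concrete, here is a minimal sketch of them for one scanline of a text-mode row. It is not the emulator's actual code. Everything named here is an assumption for illustration: the `VgaLuts` struct, the 8x16 glyph layout of `font`, and a simplified attribute controller that only maps the foreground nibble (low 4 bits) or background nibble (high 4 bits) through 16 palette entries, ignoring blink and the other attribute registers. The font byte and attribute are fetched once per character cell rather than once per pixel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GLYPH_HEIGHT 16  /* assumed: 8x16 glyphs, one byte per glyph row */

typedef struct {
    uint8_t  attr_palette[16]; /* simplified attribute controller palette */
    uint32_t dac_rgb[256];     /* DAC lookup: index -> 0x00RRGGBB */
} VgaLuts;

/* Step 2: pixel on/off plus attribute -> DAC index (simplified). */
static inline uint8_t attr_to_dac(const VgaLuts *luts, int pixel_on,
                                  uint8_t attribute)
{
    uint8_t nibble = pixel_on ? (uint8_t)(attribute & 0x0F)
                              : (uint8_t)(attribute >> 4);
    return luts->attr_palette[nibble];
}

/* Renders one scanline of a text row: `columns` cells, 8 pixels each.
   `out` must have room for columns * 8 pixels. */
static void render_text_scanline(const VgaLuts *luts,
                                 const uint8_t *chars, const uint8_t *attrs,
                                 const uint8_t *font, unsigned columns,
                                 unsigned glyph_row, uint32_t *out)
{
    assert(glyph_row < GLYPH_HEIGHT);
    for (unsigned col = 0; col < columns; ++col) {
        /* Step 1: fetch glyph bits and attribute for this cell. */
        uint8_t bits = font[(size_t)chars[col] * GLYPH_HEIGHT + glyph_row];
        uint8_t attribute = attrs[col];
        for (unsigned bit = 0; bit < 8; ++bit) {
            int pixel_on = (bits >> (7 - bit)) & 1; /* leftmost pixel = bit 7 */
            uint8_t dac_index = attr_to_dac(luts, pixel_on, attribute);
            *out++ = luts->dac_rgb[dac_index];      /* Step 3: DAC lookup */
        }
    }
}
```

Whether this layout matches the real VRAM and register handling depends on the emulated mode, so treat it only as a picture of where the per-pixel work sits.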
When vblank occurs, the entire buffer is rendered/cropped to a screen SDL surface, which is periodically rendered together with the VGA and other surfaces. Finally, the active surface is flipped to show the screen. All rendering and cropping happens only when the buffer and/or surfaces are dirty (modified).
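The dirty check described above can be sketched roughly as follows. This is a simplified stand-in, not the real code: `FrameBuffer`, `fb_put` and `fb_present_if_dirty` are hypothetical names, and the `memcpy` stands in for the SDL crop/blit/flip, which is left out so the example stays self-contained.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FB_W 320
#define FB_H 200

typedef struct {
    uint32_t pixels[FB_W * FB_H];
    bool     dirty;  /* set when any pixel changed since the last present */
} FrameBuffer;

/* Writes one pixel and marks the buffer dirty only if the value changed. */
static void fb_put(FrameBuffer *fb, unsigned x, unsigned y, uint32_t rgb)
{
    uint32_t *p = &fb->pixels[y * FB_W + x];
    if (*p != rgb) {
        *p = rgb;
        fb->dirty = true;
    }
}

/* Called at vblank: copies the buffer to the screen only when it is dirty.
   Returns true if a copy was made. */
static bool fb_present_if_dirty(FrameBuffer *fb, uint32_t *screen)
{
    if (!fb->dirty)
        return false;
    memcpy(screen, fb->pixels, sizeof fb->pixels);
    fb->dirty = false;
    return true;
}
```

With this arrangement an unchanged frame costs only the flag test at vblank, while the per-pixel processing cost is unaffected.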