Increase framerate of software rendering (VGA in emulator)?

My emulator is currently running in video mode 2/3 (set via INT 10h; 80x25 text mode, 720x400) at about 1-2 FPS.

Every pixel is drawn using the following steps:
1. Character generator or graphics generator (fetches the character, if needed, and its attribute, and converts them into foreground/background pixels).
2. Attribute processor (processes the fetched input according to the attribute controller registers).
3. DAC (converts the final DAC index into an output color and writes it to the emulated screen surface).
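One way to cut per-pixel work in stages 2 and 3 is to fold the attribute palette lookup and the DAC lookup into a single 16-entry table that is rebuilt only when a palette or DAC register is written. This is a minimal sketch under a simplified model: the ColorState type and function names are hypothetical, DAC entries are assumed to be stored already converted to packed 0xRRGGBB, and the Color Select register, palette-size bits and 256-color mode are ignored. It is not how the linked source is organised.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified VGA colour state: 16 attribute palette
   registers (6-bit DAC indices) and a 256-entry DAC of packed RGB. */
typedef struct {
    uint8_t  palette[16];   /* attribute controller palette registers */
    uint32_t dac[256];      /* DAC entries, assumed already 0xRRGGBB */
    uint32_t fused[16];     /* cache: 4-bit colour -> final RGB */
    int      fused_dirty;   /* set whenever palette or DAC is written */
} ColorState;

/* Rebuild the fused table; only called when something changed. */
static void rebuild_fused(ColorState *cs)
{
    for (int i = 0; i < 16; i++)
        cs->fused[i] = cs->dac[cs->palette[i] & 0x3F];
    cs->fused_dirty = 0;
}

/* Register-write hooks only mark the cache stale. */
static void write_palette(ColorState *cs, int index, uint8_t value)
{
    assert(index >= 0 && index < 16);
    cs->palette[index] = value;
    cs->fused_dirty = 1;
}

static void write_dac(ColorState *cs, int index, uint32_t rgb)
{
    assert(index >= 0 && index < 256);
    cs->dac[index] = rgb;
    cs->fused_dirty = 1;
}

/* Per-pixel work: one flag test and one array read. */
static uint32_t pixel_color(ColorState *cs, uint8_t color4)
{
    if (cs->fused_dirty)
        rebuild_fused(cs);
    return cs->fused[color4 & 0x0F];
}
```

Register writes are rare compared to pixels, so the rebuild cost is paid a handful of times per frame at most, while every pixel only pays for a single table read.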

I notice that the DAC is pretty fast: it uses a simple lookup table plus some flags for black-and-white mode and an emulator-defined black-and-white conversion, which makes the output look like a monochrome monitor with 256 grey levels even when the input is in color.
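Following the same idea, the black-and-white flags could be applied once when the output table is built, instead of being tested for every pixel. A minimal sketch, assuming the DAC table holds packed 0xRRGGBB values and using integer Rec. 601 luma weights as a stand-in for the emulator's own conversion; the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Convert 0xRRGGBB to grey 0xYYYYYY using integer Rec. 601 weights
   (77/150/29 out of 256 approximates 0.299/0.587/0.114). */
static uint32_t to_grey(uint32_t rgb)
{
    uint32_t r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
    uint32_t y = (77 * r + 150 * g + 29 * b) >> 8;
    return (y << 16) | (y << 8) | y;
}

/* Build the table read by the per-pixel path. Call this only when a DAC
   entry or the b/w setting changes, not once per pixel. */
static void build_output_table(const uint32_t dac[256], int greyscale,
                               uint32_t out[256])
{
    assert(dac != out);
    for (int i = 0; i < 256; i++)
        out[i] = greyscale ? to_grey(dac[i]) : dac[i];
}
```

With this, the per-pixel DAC stage reduces to `out[index]` with no branches on the b/w flags.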

The steps take the following time per pixel:
- Attribute processing: 28us
- Text: 56us
- Combined: 51us

To reach ~10fps I would need to get this down to about 0.3us per pixel (640x480 pixels x 10fps = 3,072,000 pixels per second, and 1,000,000us / 3,072,000 = about 0.33us per pixel).
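The budget arithmetic can be checked with a small helper (the name us_per_pixel is hypothetical):

```c
#include <assert.h>

/* Time available per pixel, in microseconds, for a given resolution
   and target frame rate. */
static double us_per_pixel(int width, int height, int fps)
{
    assert(width > 0 && height > 0 && fps > 0);
    return 1000000.0 / ((double)width * height * fps);
}
```

For example, us_per_pixel(640, 480, 10) is about 0.326, and the current 720x400 text mode at 10fps gives about 0.347us per pixel.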

Rendering currently proceeds one step at a time along each horizontal row (character or graphics, depending on the current mode), left to right and top to bottom, the same way the scanlines work on a normal monitor.

So when processing characters, one step usually covers a 1x8, 1x14 or 1x16 block; when processing graphics with a cell height of 1 pixel, one step covers a single pixel.
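Processing a character row as a unit also allows the font byte to be fetched once and expanded into 8 pixels with bit tests, instead of running the whole pipeline separately for each pixel. A minimal sketch, assuming 8-dot-wide characters, a flat font table of 256 glyphs by cell_height bytes with bit 7 as the leftmost pixel, and a simplified attribute byte (low nibble foreground, high nibble background; blink, cursor, underline and 9-dot mode are ignored). All names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Expand one 8-dot font row into 8 colour indices; bit 7 is leftmost. */
static void expand_glyph_row(uint8_t bits, uint8_t fg, uint8_t bg,
                             uint8_t out[8])
{
    for (int x = 0; x < 8; x++)
        out[x] = (bits & (0x80u >> x)) ? fg : bg;
}

/* Render one scanline of a text row: one font fetch per character,
   8 output pixels per step. */
static void render_text_scanline(const uint8_t *chars, const uint8_t *attrs,
                                 int columns, const uint8_t *font,
                                 int cell_height, int row_in_cell,
                                 uint8_t *out)
{
    assert(cell_height > 0 && row_in_cell >= 0 && row_in_cell < cell_height);
    for (int c = 0; c < columns; c++) {
        uint8_t bits = font[chars[c] * cell_height + row_in_cell];
        uint8_t attr = attrs[c];
        expand_glyph_row(bits, attr & 0x0F, attr >> 4, out + 8 * c);
    }
}
```

The 4-bit indices produced here would then go through a palette/DAC lookup table once per pixel.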

When the full screen has been rendered, the emulator's graphics routines are invoked: the screen is resized, extra surfaces are overlaid, and the result is displayed on-screen. This usually takes about 50-60ms, so ~30fps at full speed (without rendering the VGA). It has since been improved to skip this work when the screen hasn't been updated, i.e. when no extra surfaces have changed and there are no VGA screen or resolution changes, which brings it down to about 5-10us.
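The "skip when not updated" check could be driven from the VRAM write path itself, so that a frame is only re-rendered when a write actually changes memory. A minimal sketch, assuming a flat 256 KiB VRAM array (4 planes of 64 KiB) and a hypothetical Vram type; a real emulator would also have to set the flag on register writes that affect the output (palette, DAC, mode, scroll offsets):

```c
#include <assert.h>
#include <stdint.h>

#define VRAM_SIZE 0x40000u  /* assumed: 4 planes x 64 KiB */

typedef struct {
    uint8_t vram[VRAM_SIZE];
    int     dirty;          /* set when any byte actually changes */
} Vram;

/* Write hook: writes that store the same value do not dirty the frame. */
static void vram_write(Vram *v, uint32_t addr, uint8_t value)
{
    assert(addr < VRAM_SIZE);
    if (v->vram[addr] != value) {
        v->vram[addr] = value;
        v->dirty = 1;
    }
}

/* Called once per frame: returns 1 if the frame must be re-rendered,
   and clears the flag for the next frame. */
static int vram_take_dirty(Vram *v)
{
    int was_dirty = v->dirty;
    v->dirty = 0;
    return was_dirty;
}
```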

VGA source code: http://superfury.heliohost.org/cplusplus/vga_20140527_1810.zip
- The vga_screen folder contains the rendering functions.
- vga_vram.c contains the direct VRAM access functionality.
- vga_vramtext.c contains the functions used to render characters from VRAM plane 2 (with a separate 8x8 font for the emulator's extra surface output).
- vga_screen/vga_screen.c: the function VGA_generateScreenLine is the root function that renders VGA pixels to the emulated screen in the GPU (emulator specific). It is called by a timer running at the rendering speed, which depends on the resolution and targets 60fps: usually the total number of on-screen pixels times 60, so 640x480 (32-bit color) at 60fps means 18,432,000 calls per second, with a minimum timer interval of 1us.
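Using the numbers in that paragraph, the interval between timer callbacks depends on how much work each callback does. A small helper (hypothetical name) makes the comparison explicit:

```c
#include <assert.h>

/* Interval between timer callbacks, in microseconds, when each callback
   handles one unit (a pixel or a scanline) and a frame has
   units_per_frame units. */
static double callback_interval_us(long units_per_frame, int fps)
{
    assert(units_per_frame > 0 && fps > 0);
    return 1000000.0 / ((double)units_per_frame * fps);
}
```

Per pixel at 640x480 and 60fps the interval is about 0.054us, well under the 1us timer limit, so a timer capped at 1,000,000 calls per second cannot keep up. Per scanline (480 lines at 60fps) the interval is about 34.7us.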

Can anyone help me speed this up? It currently runs at 0.5-1.5fps.

I want to get it up to at least 10fps (decent gaming speed), and if possible 30fps, which is the maximum I can get with SDL rendering including the extra surfaces.

Does anyone have tips on software rendering? I'm rendering to a 1024x768 pixel buffer whose used width and height vary and are managed by the GPU itself; the screen is only updated when changes occur.

This typically consists of:
1. Rendering the buffer to an SDL surface (only when changed/dirty).
2. Resizing the SDL surface to the console screen size when needed (only when changed/dirty).
3. Copying the surface to the destination screen with extra black borders, to keep the aspect ratio intact.
4. Rendering all text surfaces over the destination screen to get the final image.
5. Flipping the screen (double buffering) when the rendering surface (SDL_GetVideoSurface) is dirty.

Steps 3-5 are only executed when the surface from steps 1-2 is dirty and/or any of the extra (fixed-resolution) text surfaces are dirty.
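The dirty-flag gating of steps 1-5 can be expressed as a small piece of control flow. In this sketch the Presenter type is hypothetical and its counters stand in for the real SDL work (blit, scale, overlay, flip), so only the decision logic is shown:

```c
#include <assert.h>

/* Hypothetical presenter state; counters replace the real SDL calls. */
typedef struct {
    int buffer_dirty;   /* emulated 1024x768 buffer changed */
    int text_dirty;     /* any fixed-resolution text surface changed */
    int renders, resizes, composites, flips;
} Presenter;

static void present_frame(Presenter *p)
{
    assert(p != 0);
    if (p->buffer_dirty) {
        p->renders++;      /* step 1: buffer -> SDL surface */
        p->resizes++;      /* step 2: scale to console size */
    }
    if (p->buffer_dirty || p->text_dirty) {
        p->composites++;   /* steps 3-4: borders and text overlays */
        p->flips++;        /* step 5: flip the double buffer */
    }
    p->buffer_dirty = 0;
    p->text_dirty = 0;
}
```

A text-only change therefore skips the expensive steps 1-2 entirely, and a frame with no changes does no work at all.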

(All steps together take about 50ms, or 4-5ms when nothing has been updated; about 30-40ms of that is spent in the first two steps.)
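Since most of that time is spent in steps 1-2, the scaling step is a natural target. A common software technique is nearest-neighbour scaling with a precomputed source-column table, so the inner loop is an indexed copy with no per-pixel division. A minimal sketch, assuming 32-bit pixels in tightly packed arrays (no SDL surface pitch handling); scale_nearest is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Nearest-neighbour scale of a 32-bit image. The source column for each
   destination x is computed once into xmap, so the inner loop is a plain
   indexed copy. Returns 0 on success, -1 if allocation fails. */
static int scale_nearest(const uint32_t *src, int sw, int sh,
                         uint32_t *dst, int dw, int dh)
{
    assert(sw > 0 && sh > 0 && dw > 0 && dh > 0);
    int *xmap = malloc((size_t)dw * sizeof *xmap);
    if (!xmap)
        return -1;
    for (int x = 0; x < dw; x++)
        xmap[x] = (int)((long long)x * sw / dw);
    for (int y = 0; y < dh; y++) {
        const uint32_t *srow = src + (size_t)((long long)y * sh / dh) * sw;
        uint32_t *drow = dst + (size_t)y * dw;
        for (int x = 0; x < dw; x++)
            drow[x] = srow[xmap[x]];
    }
    free(xmap);
    return 0;
}
```

The column table only changes when the source or destination size changes, so it could also be cached between frames instead of being rebuilt on every call.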

Can anyone help me make the VGA rendering faster?
Does anyone have tips regarding rendering speed? (As far as I know, lookup tables are already used where possible.)
```c
start: //Process all overscan!
{
    if (!y2--) goto finishdac; //Break out if nothing is left!
    Sequencer_VGA->CurrentScanLine[y2] = Sequencer.overscancolor; //Overscan! Already DAC Index!
    goto start; //Return!
}
finishdac: //Finish the DAC
```
WHY?

I'd like to run your code through a profiler. Could you send me something I can easily compile and run on Windows?
I don't program for Windows (it's made to run on the PSP), and it isn't C++ either (just plain C). I've made the loops into simple jumps because, for some reason, they run faster than their for(;y2;y2--) equivalents.
That answers your question about performance, then. Get a better compiler.
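For reference, the goto loop quoted above writes indices y2-1 down to 0, which is exactly what a plain `while (y2--)` loop does. A `for(;y2;y2--)` loop whose body still indexes with `[y2]` writes indices y2 down to 1 instead, so it is not an exact equivalent. A sketch with a hypothetical fill_overscan function, assuming byte-sized scanline entries:

```c
#include <assert.h>
#include <stdint.h>

/* Fill entries [0, count) with the overscan colour, highest index first,
   the same order in which the goto loop writes them. */
static void fill_overscan(uint8_t *line, unsigned count, uint8_t color)
{
    assert(count == 0 || line);
    while (count--)
        line[count] = color;
}
```

An optimizing compiler will typically generate the same code for the goto and the while forms; for a byte array the loop could also be replaced by memset(line, color, count).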