std::async / CPU-usage question


Hello everyone,

I had never used multithreading before this week, so I'm not really familiar with it.

I added some simple std::async(std::launch::async, ...) calls to an existing Windows application that uses OpenGL and wxWidgets. There are some visible effects: for example, rearranging several windows now works without interrupting the OpenGL rendering, which it did before.

For testing, I implemented an option to switch between std::async and normal calls at runtime, but here the effect is not the one I expected, if there is any at all.
With normal calls, my program obviously runs on one CPU core, which the resource monitor shows as mostly busy (~80%). With the async calls activated, the usage of that core seems to grow, while there is no effect on the other cores.

Can anyone explain what the reason might be?



I also wonder about the following: the number of threads shown in the resource monitor for my program changes when I switch modes, but in both cases it is more than 100.

So my final question is: is there any reason why my application seems to use only one core even though the resource monitor shows a lot of threads?



Thank you for reading...


Regards,
Frank
http://sscce.org/
Instead of your vast edifice of "OpenGL and wxWidgets", replace each with a trivial 'doWork' placeholder that does nothing but consume CPU time for various intervals.

In the first instance, this gives you something to study, because "i never used any multithreading before this week". Well, doing this exercise will give you the opportunity to study what's going on without massive graphical distractions.

In the second instance, you'll have something you can post that others can actually comment on.
I have no idea what you've run in parallel or what your code structure is.

Parallelism isn't a magic bullet that'll make anything go faster. As you've noted, it can make things slower and unstable.
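
A minimal sketch of what such a test program could look like. The names doWork, runSequential and runParallel are made up for illustration, and the busy loop is only an assumed stand-in for real work. It times the same set of intervals once on the calling thread and once as std::async tasks whose futures are kept until the end:

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <future>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical placeholder for real work: spin for roughly `ms`
// milliseconds and return how many loop iterations that took.
std::uint64_t doWork(int ms) {
    assert(ms >= 0);
    const auto end = Clock::now() + std::chrono::milliseconds(ms);
    std::uint64_t spins = 0;
    while (Clock::now() < end) ++spins;
    return spins;
}

// Milliseconds elapsed since `start`.
double elapsedMs(Clock::time_point start) {
    return std::chrono::duration<double, std::milli>(Clock::now() - start).count();
}

// Run every interval one after another on the calling thread.
double runSequential(const std::vector<int>& intervals) {
    const auto start = Clock::now();
    for (int ms : intervals) doWork(ms);
    return elapsedMs(start);
}

// Start every interval as its own task, keep the futures,
// and only then wait for all of them.
double runParallel(const std::vector<int>& intervals) {
    const auto start = Clock::now();
    std::vector<std::future<std::uint64_t>> tasks;
    for (int ms : intervals)
        tasks.push_back(std::async(std::launch::async, doWork, ms));
    for (auto& task : tasks) task.get();
    return elapsedMs(start);
}
```

Calling runSequential({200, 200, 200, 200}) and runParallel({200, 200, 200, 200}) while watching the resource monitor should show whether more than one core gets busy. How much faster the second call is depends on the number of free cores.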
Hello,

I thought my question was a mostly theoretical one, so not really dependent on the details of my code. Also, I'm not sure which parts of my code might be relevant to the question, and it is not easy to describe what I do...

Concerning the structure of my application:
My application generates video mappings using OpenGL. It has a control window and one or more output windows. For the window management I use wxWidgets 3.1.2, and all of this is controlled by my own script interpreter.

Besides some helper classes there are mainly two huge classes: clsParser (the interpreter) and clsControl (representing any type of window). On starting the program, one clsParser object is instantiated that interprets a single script instruction: include "Config/loader.txt";.
Then, for each object such as menus, controls etc., one clsParser is instantiated, and some of them create clsControl instances. In this way the whole UI of my program is created.

If the user, for example, clicks a button, the clsControl instance sends a small script to the associated clsParser instance: eval(OnClick);, where "OnClick" might be a string-hashtable key created during the initialisation process. If it exists, it is interpreted.

The main loop:
The rendering loop uses the same technique, with a timer event that sends eval(OnTimer); to the root clsParser instance. There is only one "OnTimer" script; it is processed sequentially and controls the rendering of one video frame at a fixed frame rate.

The interpreter knows a pair of instructions: "LinkTimer" and "EvaluateTimer". "LinkTimer" is called with an index (and a char*) when an object is instantiated, e.g. at startup, and causes a this-pointer to be entered into one of several lists, selected by the given index.

"EvaluateTimer" is also called with an index and is used in the OnTimer script. It causes the interpreter to loop through the list with the given index, sending an event instruction to each of the linked clsParser instances (still sequentially). In this way the execution is grouped into the following ordered steps:
1. Update of basic parameters
2. OpenGL-vertex-array-calculations (only for vertex-based 3d-rendering)
3. OpenGL-in-memory-rendering (to framebuffer)
4. Render frame to output-windows
5. Render frames to previews in the control-window

Steps 4 & 5 of this sequence can be run in a parallel thread. What I expect to happen here is the following:
wxWidgets refreshes the UI after the root clsParser instance returns from processing the OnTimer instruction, which is a non-OpenGL process that also doesn't need my interpreter. Therefore I want to use my interpreter in a parallel thread to render the OpenGL content to the output viewports.


Concerning code:
The following C++ lines are part of clsParser:
    else if (!strcmp (nam, "run")) {
        if (useThreads) std::async(std::launch::async, &clsParser::Exec, this, (char*)list[0] ) ;
        else Exec((char*)list[0]);
    }

...where "(char*)list[0]" contains the string eval(Thread_REFRESH); referring to the following script code string:
var Thread_REFRESH     = '
    EvaluateTread       (_T_THREAD_IDX_OUTPUT      );
    if (uiCNT<=0) {     uiCNT = 1 + (8 * CPU * CPU);
        EvaluateTread   (_T_THREAD_IDX_REFRESH     );
    };
';

The c++-code of "EvaluateThread" is:
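    else if (!strcmp (nam, "EvaluateTread")) {
        // Take the list of linked entries selected by the index argument.
        wxArrayPtrVoid items = TimerThreads[(int)(list[0][0])];
        clsVar *tStr;

        for (size_t i = 0; i < items.Count(); i++) {
            // Look up the script fragment stored under the linked name and
            // execute it only if the entry, the variable and its string exist.
            if (items[i] && (tStr = ((sTTI*)items[i])->p->strHash[
                    (char*)((sTTI*)items[i])->name
                ]) && (char*)*tStr) {
                    ((sTTI*)items[i])->p->Exec((char*)*(tStr));
            }
        }
    }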

... with "list" as a parameter list (similar to argc, argv) containing "clsVar" instances; clsVar is a type-independent container class. (...)->name contains the mentioned char*, set by the "LinkTimer" call, so that (...)->p->Exec((char*)*(tStr)); starts a local interpretation of a locally stored script fragment accessed via "->name".
When refreshing the OpenGL outputs, this locally interpreted script code may look like this:
 
    Display.Refresh ();

...where "Refresh" is associated with the following C++ code in clsParser:
    else if  (!strcmp (nam, "Refresh")) {  
        if (Control->CTRL && Control->CTRL->IsShownOnScreen()) {
            Control->CTRL->Refresh(false); 
            Control->CTRL->Update();
        } 
    }                          

Here Control is the clsControl instance, CTRL is a pointer to a wxGLCanvas, Refresh() triggers the wxPaint event and Update() causes immediate execution of the paint event handler. I tried it with and without the Update() call.

Remember that the rendering is already done at this point, but all images are still in GPU memory. So inside the paint event, OpenGL only renders a scaled version of the framebuffer to the wxGLCanvas window, and the vertex array has only four points.

So, if this is not done by the main thread, the main thread returns after calling if (useThreads) std::async(std::launch::async, &clsParser::Exec, this, (char*)list[0] ) ;. It then sits idle and has time to refresh all the other UI elements.
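
One thing that may be worth checking about the call above: according to the C++ standard, the future returned by std::async(std::launch::async, ...) waits in its destructor until the task has finished. If the return value is not stored, the temporary future is destroyed at the end of the statement, so the caller may not continue as early as expected. The sketch below only illustrates that rule; slowTask is a hypothetical stand-in for clsParser::Exec, and the 50 ms sleep is an arbitrary assumption.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <thread>

// Hypothetical stand-in for clsParser::Exec: sleeps, then sets a flag.
void slowTask(std::atomic<bool>& done) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    done = true;
}

// Returns true if the task had already finished by the time the
// std::async statement itself was over.
bool discardedFutureBlocks() {
    std::atomic<bool> done{false};
    // The returned future is a temporary. Its destructor runs at the end
    // of this statement and waits for slowTask to complete.
    std::async(std::launch::async, slowTask, std::ref(done));
    return done.load();
}

// Returns true if the caller got control back before the task finished.
bool storedFutureReturnsEarly() {
    std::atomic<bool> done{false};
    auto pending = std::async(std::launch::async, slowTask, std::ref(done));
    // The task sleeps for 50 ms, so it is almost certainly still running here.
    const bool finishedAtLaunch = done.load();
    pending.get();  // wait explicitly once the result is actually needed
    assert(done.load());
    return !finishedAtLaunch;
}
```

If discardedFutureBlocks() returns true on the compiler in use, the "run" branch would only overlap with other work when the future is kept somewhere, for example in a container that is emptied once per frame.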



Sorry that it is so much text, but I had no idea how I could have described it more briefly. The problem is that, because of the interpreter, the executed code is split into little parts that are located all over the source code of my program.


---

Okay, so much for the theory. But I'm not sure if it really does what I think.

In the meantime I found out that with "useThreads = true", the measured time between the eval(OnTimer); instruction and the root parser's return is visibly shorter. So apparently the execution really runs in parallel, and it does not crash the program.

But, as said in the first post, the CPU graph in the resource monitor grows in this case.

As I have no real experience with multithreading, I was looking for more information, but on the internet I mostly find tutorials on how to program complex multithreaded applications, and not much about what is happening in a running parallel thread and how it interacts with, for example, the window management of the system.
So I would be happy if someone could tell me more about this, or where to find more information.


Thank you very much,
Frank


I must say it was quite ambitious of you to even attempt to add threads to a large existing program without really having an understanding of threads to begin with.

A couple of things spring to mind.
> 4. Render frame to output-windows
> 5. Render frames to previews in the control-window

1. Are you creating a separate thread for every frame?
If your process monitor is simply tracking thread creation/destruction, then only displaying the results every second or so, then you're going to see a lot of threads within that time interval.

Another thing to watch out for is that wxWidgets is not thread-safe. All updates to the windows should be done via the UI thread (the one that sprang from main()).
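
A toolkit-independent sketch of that rule, assuming a hypothetical UiQueue helper: worker threads never touch window state themselves, they only post small closures, and the UI thread runs them later. wxWidgets has its own mechanisms for this (wxEvtHandler::CallAfter and wxQueueEvent, as far as I know), which would be the natural choice in a real wx program. The class below is not part of wxWidgets and only shows the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: workers post closures, the UI thread drains them.
class UiQueue {
public:
    // May be called from any thread.
    void post(std::function<void()> fn) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push(std::move(fn));
    }

    // Call only from the UI thread, e.g. from a timer or idle handler.
    // Runs everything posted so far and returns how many closures ran.
    std::size_t drain() {
        std::queue<std::function<void()>> local;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            std::swap(local, pending_);
        }
        std::size_t count = 0;
        while (!local.empty()) {
            local.front()();
            local.pop();
            ++count;
        }
        return count;
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> pending_;
};

// Demo: four workers each request one "refresh". The counter stands in
// for window state and is only ever modified by the calling thread.
int demoUiQueue() {
    UiQueue ui;
    int refreshCount = 0;
    const auto uiThread = std::this_thread::get_id();

    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i) {
        workers.emplace_back([&] {
            ui.post([&] {
                assert(std::this_thread::get_id() == uiThread);
                ++refreshCount;
            });
        });
    }
    for (auto& worker : workers) worker.join();

    ui.drain();
    return refreshCount;
}
```

The design choice here is that the lock only protects the queue, not the window state, so the UI code itself stays single-threaded.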

I don't know the intricacies of OpenGL, whether it has a separate thread-safe context for different windows.
Assuming all is well, you still might find that multiple threads are blocking each other over some shared mutex.






Hello salem c,

sorry for the delay....

> I must say it was quite ambitious of you to even attempt to add threads to a large existing program without really having an understanding of threads to begin with.


Phew, okay, but if you saw what I really do... To me, this seems to be only a small additional feature...

However:
> 1. Are you creating a separate thread for every frame?

I think so.
I call std::async without storing what it returns, so I can't address an existing thread a second time. This is one of my reasons for asking my question. It would be quite simple to keep the thread alive for longer, because I can use a technique similar to the one I use for several other objects too, but I was not sure whether this is more efficient.
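
For comparison, a minimal sketch of the "keep the thread over a longer time" variant, under the assumption that the work can be handed over as closures. Worker and framesOnOneThread are hypothetical names, not taken from the program described here. One thread is created once and then fed one job per frame:

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <set>
#include <thread>
#include <utility>

// Hypothetical long-lived worker: one thread, created once, fed many jobs.
class Worker {
public:
    Worker() : thread_([this] { loop(); }) {}

    ~Worker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        wake_.notify_one();
        thread_.join();  // jobs still queued are finished before the thread exits
    }

    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(job));
        }
        wake_.notify_one();
    }

private:
    void loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stop_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stop requested and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // run outside the lock so submit() is never blocked by a job
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> jobs_;
    bool stop_ = false;
    std::thread thread_;  // declared last so the members above exist before it starts
};

// Demo: submit one job per "frame" and count how many ran.
int framesOnOneThread(int frames) {
    std::set<std::thread::id> ids;
    int done = 0;
    {
        Worker worker;
        for (int i = 0; i < frames; ++i)
            worker.submit([&] { ids.insert(std::this_thread::get_id()); ++done; });
    }  // the destructor drains the queue and joins the thread
    assert(ids.size() == 1);  // every job ran on the same thread
    return done;
}
```

Whether this is measurably more efficient than one std::async call per frame would have to be timed. It does avoid creating and destroying a thread for every frame, which may also change the thread count shown in the resource monitor.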

> watch out for is that WxWidgets is not thread-safe

I know about this. There is a wx function (I don't remember the name right now, but I have it somewhere in my code, probably commented out) that can be used as some kind of mutex.

> I don't know the intricacies of OpenGL

OpenGL is a state machine and therefore not thread-safe.

---

However...
As far as I understand, I can do some things in parallel, for example GPU-internal things and refreshing wxWidgets controls. It also seems to be possible to render to a wxGLCanvas using OpenGL while other controls are refreshed by wxWidgets, because OpenGL content is drawn by the "system" while wxWidgets only provides the graphics context.

In particular, the fact that OpenGL is a state machine is somehow an advantage in this case, because even in a single-threaded program that renders several images per output frame, you have to do things in isolated blocks. The interpreter is also helpful, for example:

My script interpreter knows the common keyword "eval". Replacing "eval" with "run" in the script code currently causes a std::async call at this point. During generation of a single frame, hundreds of "eval"-initiated subscript executions are done, and it is quite simple to see whether OpenGL or wxWidgets calls are made inside them.

What I mean: the "eval" call is the key to it all. Currently I use it like this:
        ...
        else if (!strcmp(instruction, "eval")) {
            [evaluate script-string single-threaded and wait for return]
        }
        else if (!strcmp(instruction, "run")) {
            if (useMultithreading) {
                [evaluate script-string parallel and continue immediately]
            }
            else {
                [evaluate script-string single-threaded and wait for return]
            }
        }
        ...


In this way I don't need to parallelize lots of functions in order to parallelize the execution of some parts of the rendering loop.

Finally, this means that I do not have to hard-code where I use threads. It's part of the scripts, but I have to know more about what happens if system routines (for example, refreshing a window) are called from inside my parallelized "run".
Refreshing windows, for example, is done in idle time, but which thread does the work? The calling thread, the main thread or a system thread?
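
For the plain C++ part of that question, a small sketch (the function names are made up for illustration): everything called directly from inside a std::launch::async task executes on the task's own thread, not on the thread that started it. What a toolkit call such as Refresh() does after that, for instance whether it only queues a repaint for the main event loop, is a separate matter that would have to be checked in the wxWidgets documentation.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Returns the id of the thread that actually executes the task body.
std::thread::id whoRunsAsyncTask() {
    auto task = std::async(std::launch::async,
                           [] { return std::this_thread::get_id(); });
    return task.get();
}

// True if the task body ran on a thread other than the caller's.
bool asyncTaskRunsOffCallerThread() {
    const auto caller = std::this_thread::get_id();
    const auto runner = whoRunsAsyncTask();
    assert(runner != std::thread::id{});  // a real thread executed the task
    return runner != caller;
}
```

Logging std::this_thread::get_id() at the start of the paint handler and comparing it with the id of the main thread would be one way to see where the repaint really runs.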

Best,
Frank