I found two open-source projects, each with its own site, that help simplify the complexities of threading in C and C++.
You should check out Intel Cilk Plus. cilkplus.org has sample code, contributed libraries, open specifications, and other information. Cilk Plus supports both task and data parallelism, which makes it easier to take advantage of more cores and to scale, and it works with the ICC and GCC compilers. Its array notation also simplifies vectorization, which saves time.
Have you tried Intel Threading Building Blocks (TBB)? At threadingbuildingblocks.org you can download the free open-source version and check out the contributions, documentation, etc. TBB saves time over hand threading (pthreads, etc.) and has a large set of components for higher-level, task-based parallelism in scalable applications. It is also compiler independent.