I often use the TBB library (or TPL on Windows) for shared-memory parallel programming.
As far as I understand, these libraries build a thread pool on top of the threads managed by the OS and call the minimum unit of processing a task. The tasks are then executed concurrently by the pool, which is what makes the parallel computation efficient.
To maximize performance, I would like to give each minimum unit of processing its own cache data (a variable or a pooled data region, for example std::vector<DataPoolType> _cache(thread_number), where the index corresponds to the thread (or task?) ID). However, I am afraid that a data race might occur if the same cache slot is accessed by threads (or tasks?) with different indices.
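To make the setup concrete, here is a minimal sketch of the pattern I have in mind. It uses plain std::thread instead of TBB so that it is self-contained, and the names DataPoolType (as a struct with a partial_sum field) and sum_with_per_thread_cache are hypothetical, chosen only for illustration. Each worker is handed a unique index and touches only its own slot, so no two threads ever write to the same element.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical per-thread scratch type; stands in for DataPoolType.
struct DataPoolType {
    long long partial_sum = 0;
};

// Sums the integers 0..n-1 using `thread_number` workers.
// Each worker owns exactly one slot of `cache`, selected by its unique id.
long long sum_with_per_thread_cache(std::size_t n, std::size_t thread_number) {
    assert(thread_number > 0);

    std::vector<DataPoolType> cache(thread_number);
    std::vector<std::thread> workers;
    workers.reserve(thread_number);

    for (std::size_t id = 0; id < thread_number; ++id) {
        workers.emplace_back([&cache, id, n, thread_number] {
            // Strided partition: worker `id` handles id, id + T, id + 2T, ...
            for (std::size_t i = id; i < n; i += thread_number) {
                cache[id].partial_sum += static_cast<long long>(i);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }

    // All workers have finished, so reading every slot here is safe.
    long long total = 0;
    for (const auto& slot : cache) {
        total += slot.partial_sum;
    }
    return total;
}
```

Here the uniqueness of the index is guaranteed because I assign it myself when creating each thread. With a task-based library, the same pattern is only safe if the index returned by the library has that same guarantee, which is exactly what I am unsure about.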
In TBB, I obtain the thread index with tbb::task_arena::current_thread_index(), but I am not sure whether this is the correct way to get a unique (non-overlapping) index in the range from zero to the number of threads.
If you know anything about this problem, I would appreciate your advice.