I sometimes think about load balancing in TBB algorithms.
I know that TBB automatically balances the load of tasks across threads.
So what is the difference between tbb::parallel_for and tbb::parallel_for_each in terms of load balancing?
(Q: Do TBB algorithms (e.g. parallel_do) automatically balance the load according to data chunks?)
For example, with the code below,
is the parallel performance of the two almost the same?
(Or does parallel_for have some advantage over parallel_for_each?)
```cpp
#include <cstddef>
#include <vector>
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <tbb/parallel_for_each.h>

std::vector<Obj> v;
// initialize v

tbb::parallel_for(tbb::blocked_range<std::size_t>(0, v.size()),
    [&](const tbb::blocked_range<std::size_t>& r) {
        for (std::size_t j = r.begin(); j != r.end(); ++j) {
            // Do something to v[j]
        }
    });

tbb::parallel_for_each(v.begin(), v.end(),
    [&](auto& v_elem) {  // generic lambda: requires C++14
        // Do something to v_elem
    });
```
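As a rough illustration of what balancing "according to data chunks" refers to, here is a simplified model of how a tbb::blocked_range is subdivided: a range counts as divisible while its size exceeds the grain size (default 1, and it must be positive), and each split cuts it roughly in half. `Range` and `split_into_chunks` are hypothetical names, and the sketch ignores what TBB's partitioners and work-stealing scheduler actually decide at run time (the default partitioner may stop splitting earlier); it only shows which contiguous pieces a range can be broken into.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for tbb::blocked_range<std::size_t>:
// a half-open index range [begin, end).
struct Range {
    std::size_t begin;
    std::size_t end;
    std::size_t size() const { return end - begin; }
};

// Recursively split `r` while it is larger than `grain` (mirroring the
// "divisible while size > grainsize" rule) and collect the leaf chunks in
// left-to-right order. A real TBB partitioner may stop splitting earlier.
void split_into_chunks(Range r, std::size_t grain, std::vector<Range>& out) {
    if (grain == 0) grain = 1;  // TBB also requires a positive grain size
    if (r.size() > grain) {
        const std::size_t mid = r.begin + r.size() / 2;  // split roughly in half
        split_into_chunks(Range{r.begin, mid}, grain, out);
        split_into_chunks(Range{mid, r.end}, grain, out);
    } else {
        out.push_back(r);  // small enough: one chunk handed to one body call
    }
}
```

With a grain of 3, the range [0, 10) ends up as [0, 2), [2, 5), [5, 7), [7, 10). Each leaf would be passed to the parallel_for body as one contiguous subrange.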
Although it is not a load-balancing issue,
I understand that tbb::parallel_for benefits from cache-friendly sequential access within each subrange.
(I observed a dramatic performance improvement.)
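To make the sequential-access point concrete without depending on TBB, the sketch below hands each worker one contiguous block and lets it sweep that block in order, which is the access pattern a blocked_range body sees. `for_each_contiguous_block` is a hypothetical std::thread-based static-partitioning helper, not a TBB function, and it has none of TBB's work stealing, so it says nothing about load balancing.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not part of TBB): split v into at most num_threads
// contiguous blocks and apply body to each element, one std::thread per block.
// Static partitioning only: no work stealing, so no dynamic load balancing.
template <typename T, typename Body>
void for_each_contiguous_block(std::vector<T>& v, unsigned num_threads, Body body) {
    const std::size_t n = v.size();
    if (num_threads == 0) num_threads = 1;
    const std::size_t block = (n + num_threads - 1) / num_threads;  // ceiling division
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t first = std::min(n, t * block);
        const std::size_t last = std::min(n, first + block);
        if (first == last) break;  // remaining blocks would be empty
        workers.emplace_back([&v, &body, first, last] {
            // Sequential sweep over adjacent elements of one block.
            for (std::size_t j = first; j != last; ++j) body(v[j]);
        });
    }
    for (auto& w : workers) w.join();  // body stays alive until all threads finish
}
```

The body is shared by reference across threads, so it should not carry mutable state; in TBB the analogous requirement is that the body passed to parallel_for is callable as const.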