Vector of struct vs. two 1D vectors

Pages: 123
if I recall correctly, .at is slower than []


Your recollection ability isn't impaired. [] doesn't do bounds checking but .at() does.
Can't compute_task_endpoints() be refactored as:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
static std::vector<int> compute_task_endpoints(int n_tasks, int n_iterations) {
	double constexpr a = -1;
	double constexpr b = 2 * n_part + 1;
	double constexpr a2 = 2 * a;
	double constexpr a4 = 2 * a2;
	auto constexpr n_part2 { n_part * (n_part + 1) };

	double const sqrb = squared(b);
	auto const n_part3 { squared(n_part) + n_part };

	std::vector<int> result(n_tasks + 1, 0);
	double k_previous = 0.0;

	for (int task = 1; task <= n_tasks; ++task) {
		double const c = ((n_tasks * (n_part - k_previous) * (n_part - k_previous + 1) -
			n_part2) / n_tasks) - n_part3;

		k_previous = (-b + sqrt(sqrb - a4 * c)) / a2;
		result[task] = static_cast<int>(floor(k_previous));
	}

	//sanity_check_task_endpoints(result, n_iterations);

	return result;
}

I'm sure it can, thanks.
I didn't bother optimizing it because it only runs one time ever.

I would leave the call to sanity_check_task_endpoints in because we're using floating point math to compute indices into an array, and we ought to detect rounding error if possible.
Last edited on
I would leave the call to sanity_check_task_endpoints in


Ah. I only commented it out so that the code would compile for me to make sure I had no typo/syntax errors (like unmatched brackets!). I forgot to remove the // for line 22 after I pasted the code. Doh!
Thanks mbozzi for the interesting reference and your prototyped implementation.

Since you "only" reached a speedup of 4% and the results in the paper don't look completely groundbreaking either, I think I will stick to the implemented dynamic schedule.
But I will keep this in mind for future optimization, I will come across loops like this again and again.

I think there are some typos in the paper, though. For example, the total number of iterations for N elements should be 0.5*N(N-1) not 0.5*N(N+1).
Also, in eq. (7) it says one needs to upround the resulting K1 to obtain the I1, whereas in (8) they round down.

I will likely open another thread soon to talk about modularity/utility of the code written with help of this thread.
Topic archived. No new replies allowed.
Pages: 123