Currently i have to implement the square root operation using the newton method.
For this i have to implement some functions in a header file. The rest is already provided. First part was to implement newton method itself which i got done somehow
Here is the code for this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
|
float sqrt1(float * a) {
float root;
// from here
// TODO: your code
float *ap = a; // value: 1354.77
float *rp = &root;
// ap is a pointer to the float value
// rp is a pointer to the value of the root
int *ai = reinterpret_cast<int *>(ap); // value: 1151948964
int *initial = reinterpret_cast<int *>(rp); // value: 0
// to get a good initial value
*initial = (1 << 29) + (*ai >> 1) - (1 << 22) - 0x4C000;// value: 1108339794
// casting initial back to float
float *initial2 = reinterpret_cast<float *>(initial); // value: 35.9808
// run newton
while(true) {
*rp = ((*ap / *initial2) + *initial2) / 2;
if (*initial2 != *rp) { *initial2 = *rp;}
else {break;}
}
// to here
return root; // value: 36.8072
}
|
(Probably not clean but its working and thats not what i am here for. I never learned C++ so i am jumping in cold water here)
My issue is with the second task. The description reads as followed:
Implement the function
"void sqrt 2 (float __restrict__ a, float __restrict__ root)"
which approximately calculates and returns the square root for four float values using the above method. a must be a pointer to four float values root is an output parameter that points to a memory area with four float values in which the results are written. The four calculations may not be made with a loop, so that the compiler can generate the SIMD commands, the calculation must be programmed in such a way that four identical commands (for a [0], a [1], a [2], a [3]) appear one after the other.
strict states that the pointers (guaranteed by the programmer) don't
point to the same memory areas. This is necessary so that the compiler has more optimization possibilities.
What i dont understand is how a single pointer can point to four different values. Same goes for the root output parameter how can this point to a memory area of four float values.
As far as i can see the parameters are just float pointers.
I just went ahead and treated it like an array or a list in any other language.
So my approach was to take the code of the first task and instead of taking a directly i took a[0] and the result goes into root[0] and then i copy this 4 times for a[0]-a[3] and root[0]-root[3].
Reading a[0] seems to work (even tho i have no idea how) but i cant make it work for root i can only write the result into root directly (root = &result).
Can someone explain me what is meant with "a must be a pointer to four float values" and "root is an output parameter that points to a memory area with four float values" and how to access it?
because i am pretty sure what i did is completely wrong.
Just in case here is a snipped how sqrt2 is called
1 2 3 4 5 6 7 8 9 10 11 12 13 14
|
const static int LOOP = 1000;
const static int N = 100000;
alignas(128) float floats[N * 4];
alignas(128) float roots[N * 4];
...
std::cout << "sqrt2 (Newton method for sequence of 4 floats)" << std::endl;
time.start_clock();
for (int j = 0; j < LOOP; j++) {
for (int i = 0; i < 4 * N; i += 4) {
sqrt2<LOOPS>(floats + i, roots + i);
}
}
|