sqrt with newton for four float values

Currently i have to implement the square root operation using the newton method.
For this i have to implement some functions in a header file. The rest is already provided. First part was to implement newton method itself which i got done somehow
Here is the code for this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
  float sqrt1(float * a) {
  float root;
  // from here
  // TODO: your code
  float *ap = a;                                          // value: 1354.77
  float *rp = &root;
  
  // ap is a pointer to the float value
  // rp is a pointer to the value of the root
  int *ai = reinterpret_cast<int *>(ap);                  // value: 1151948964
  int *initial = reinterpret_cast<int *>(rp);             // value: 0
  // to get a good initial value
  *initial = (1 << 29) + (*ai >> 1) - (1 << 22) - 0x4C000;// value: 1108339794
  // casting initial back to float
  float *initial2 = reinterpret_cast<float *>(initial);   // value: 35.9808

  // run newton
  while(true) {
    *rp = ((*ap / *initial2) + *initial2) / 2;
    if (*initial2 != *rp) { *initial2 = *rp;}
    else {break;}
  }
  // to here
  return root;                                            // value: 36.8072
}


(Probably not clean but its working and thats not what i am here for. I never learned C++ so i am jumping in cold water here)
My issue is with the second task. The description reads as followed:

Implement the function
"void sqrt 2 (float __restrict__ a, float __restrict__ root)"
which approximately calculates and returns the square root for four float values using the above method. a must be a pointer to four float values root is an output parameter that points to a memory area with four float values in which the results are written. The four calculations may not be made with a loop, so that the compiler can generate the SIMD commands, the calculation must be programmed in such a way that four identical commands (for a [0], a [1], a [2], a [3]) appear one after the other.
strict states that the pointers (guaranteed by the programmer) don't
point to the same memory areas. This is necessary so that the compiler has more optimization possibilities.

What i dont understand is how a single pointer can point to four different values. Same goes for the root output parameter how can this point to a memory area of four float values.
As far as i can see the parameters are just float pointers.
I just went ahead and treated it like an array or a list in any other language.
So my approach was to take the code of the first task and instead of taking a directly i took a[0] and the result goes into root[0] and then i copy this 4 times for a[0]-a[3] and root[0]-root[3].
Reading a[0] seems to work (even tho i have no idea how) but i cant make it work for root i can only write the result into root directly (root = &result).

Can someone explain me what is meant with "a must be a pointer to four float values" and "root is an output parameter that points to a memory area with four float values" and how to access it?
because i am pretty sure what i did is completely wrong.

Just in case here is a snipped how sqrt2 is called

1
2
3
4
5
6
7
8
9
10
11
12
13
14
const static int LOOP = 1000;
const static int N = 100000;
alignas(128) float floats[N * 4];
alignas(128) float roots[N * 4];

...

std::cout << "sqrt2 (Newton method for sequence of 4 floats)" << std::endl;
    time.start_clock();
    for (int j = 0; j < LOOP; j++) {
      for (int i = 0; i < 4 * N; i += 4) {
        sqrt2<LOOPS>(floats + i, roots + i);
      }
    }    
Last edited on
sqrt1 has undefined behavior because it violates [basic.lval], the strict aliasing rule
https://eel.is/c++draft/basic.lval#11

Use memcpy instead of pointer type punning.
pointers are like arrays.
float * fp = new float[4];
fp[0] = 3.14;
fp[1] = 2.78;
fp[2] = 42.0;
fp[3] = 2.0;

you should delete what you new when done with it.
delete[] fp;

however, you CAN just use arrays.
void root4(float* f) {code;}
float fp[4];
...
root4(fp); //fine, the compiler turns fp array to a pointer for you and root4 accepts this.

if you understand pointers,
What i dont understand is how a single pointer can point to four different values.

the answer is that out there in memory is a block of bytes that you have gotten access to (either as an array on the stack or new out in ram somewhere). Both arrays and pointers that are allocated as a block (as I did above) are solid blocks .. no bytes of other data from other programs etc between the locations ... so if you can get at the first location, you can get to all of them, they are just sizeof(float) offsets from the address in memory where it all was. the [] operator does the dirty work for you, but you can do it yourself (and many early / first pointer programs have you do that sort of thing with the addresses). There isnt any point to doing it, beyond seeing that its just bytes out in memory ...

if you want to play with sqrt, look at the ancient method of Babylon for fun too. Its amazing, given what they had to work with.
Last edited on
Ok i think i understand it now thanks for the replies.
Also figured out i had some error on Windows (probably stackoverflow) because of that i was not able to reference root[0] . On Linux it worked.
(On Windows too but only if i reduce the amount of loops that called sqrt2 which i didnt want to do).

Just to make a clean cut here the code i ended up with:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
// get squareroot for four values of float
void sqrt2(float * __restrict__ a, float * __restrict__ root) {
  // from here
  for(int i = 0; i < 4; i++) {
		
    int *ai = reinterpret_cast<int *>(&a[i]);
	int *initial = reinterpret_cast<int *>(&root[i]);
	*initial = (1 << 29) + (*ai >> 1) - (1 << 22) - 0x4C000;
	float *initial2 = reinterpret_cast<float *>(initial);

	for(int j = 0; j < (int)LOOPS; j++) {
	   root[i] = ((a[i] / *initial2) + *initial2) / 2;
	   if (*initial2 != root[i]) {*initial2 = root[i];}
        }

  }
  // to here
}
I do not know what you mean about windows. I ran this in vs 2019 and it is fine. and 1000 loops is nuts, it converges in what, 10? less? Nice that you got it working!
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
#define __restrict__
#define LOOPS 1000
void sqrt2(float* __restrict__ a, float* __restrict__ root) {
    // from here
    for (int i = 0; i < 4; i++) {

        int* ai = reinterpret_cast<int*>(&a[i]);
        int* initial = reinterpret_cast<int*>(&root[i]);
        *initial = (1 << 29) + (*ai >> 1) - (1 << 22) - 0x4C000;
        float* initial2 = reinterpret_cast<float*>(initial);

        for (int j = 0; j < (int)LOOPS; j++) {
            root[i] = ((a[i] / *initial2) + *initial2) / 2;
            if (*initial2 != root[i]) { *initial2 = root[i]; }
        }

    }
    // to here
}


int main(int argc, char** argv)
{  
    float a[4]{ 2,4,17,50 };
    float r[4]{};
    sqrt2(a, r);
    int i = 0;
    for (auto& x : r)
        std::cout << x <<" " <<sqrt(a[i++]) << std::endl;
    return 0;
}
Last edited on
Topic archived. No new replies allowed.