Convert MATLAB function to C++

I have a function from a textbook: https://people.maths.ox.ac.uk/trefethen/spectral.html

Basically it computes the Chebyshev differentiation matrix, and I have been using it quite a bit, so I'd like to have a similar function in C/C++.

MATLAB function:
function [ D, x ] = cheb ( N )
  if ( N == 0 )
    D = 0.0;
    x = 1.0;
    return
  end
  x = cos ( pi * ( 0 : N ) / N )';
  c = [ 2.0; ones(N-1,1); 2.0 ] .* (-1.0).^(0:N)';
  X = repmat ( x, 1, N + 1 );
  dX = X - X';
%  Set the off diagonal entries.
  D =( c * (1.0 ./ c )' ) ./ ( dX + ( eye ( N + 1 ) ) );
%  Diagonal entries.
  D = D - diag ( sum ( D' ) );

  return
end
 


My C/C++ attempt:
void cheb(int N, double*& D, std::vector<double>& ygl) {
    if (N == 0) {
        D = (double*)fftw_malloc(sizeof(double));
        *D = 0.0;
        ygl.push_back(1.0); //adds a new element at the end of the vector, after its current last element 
        return;
    }

    ygl.resize(N + 1);
    for (int y = 0; y <= N; ++y) { //for (int i = 0; i <= N; ++i) {
        ygl[y] = 1. * cos(((y) * M_PI )/N);
    }

    std::vector<double> c(N + 1);
    for (int y = 0; y <= N; ++y) {
        c[y] = (y == 0 || y == N) ? 2.0 : 1.0;
        c[y] *= std::pow(-1.0, y);

    }
    double *X; 
    X = (double*) fftw_malloc(((N+1)*(N+1))*sizeof(double));
    for (int y = 0; y <= N; ++y) { 
        for (int x = 0; x <= N; ++x) {
            X[x + y * N] = ygl[y];
        }
    }

    double *dX; 
    dX = (double*) fftw_malloc(((N+1)*(N+1))*sizeof(double));
    for (int y = 0; y <= N; ++y) {
        for (int x = 0; x <= N; ++x) {
            dX[x + y * N] = X[x + y * N] - X[y + x * N];
        }
    }

    D = (double*)fftw_malloc((N + 1) * (N + 1) * sizeof(double));
    for (int y = 0; y <= N; ++y) {
        for (int x = 0; x <= N; ++x) { 
            D[x + y * N] += c[y] / (dX[x + y * N] + ((x == y) ? 1.0 : 0.0));
        }

    }

        for (int x = 0; x <= N; ++x) {
            D[x + x * N] -= D[x + x * N];
        }
    //test
    if (N == 8) {
        std::cout << "D for N=8:\n";
        for (int y = 0; y <= N; ++y) {
            for (int x = 0; x <= N; ++x) {
                std::cout << D[x + y * N] << " ";
            }
            std::cout << "\n";
        }
    }
    fftw_free(X);
    fftw_free(dX);
}


Main C/C++ code:
static const int nx = 8;
static const int ny = 8;
int main(){	
double *D; 
std::vector<double>  x;
cheb(ny, D, x);
fftw_free(D);
}

My C/C++ version is incorrect: the calculations of x, c, X, and dX are correct, but the final matrix D is wrong.

Correct output:

21.5000  -26.2741    6.8284   -3.2398    2.0000   -1.4465    1.1716   -1.0396    0.5000
    6.5685   -3.1543   -4.6131    1.8478   -1.0824    0.7654   -0.6131    0.5412   -0.2599
   -1.7071    4.6131   -0.7071   -3.0824    1.4142   -0.9176    0.7071   -0.6131    0.2929
    0.8100   -1.8478    3.0824   -0.2242   -2.6131    1.3066   -0.9176    0.7654   -0.3616
   -0.5000    1.0824   -1.4142    2.6131         0   -2.6131    1.4142   -1.0824    0.5000
    0.3616   -0.7654    0.9176   -1.3066    2.6131    0.2242   -3.0824    1.8478   -0.8100
   -0.2929    0.6131   -0.7071    0.9176   -1.4142    3.0824    0.7071   -4.6131    1.7071
    0.2599   -0.5412    0.6131   -0.7654    1.0824   -1.8478    4.6131    3.1543   -6.5685
   -0.5000    1.0396   -1.1716    1.4465   -2.0000    3.2398   -6.8284   26.2741  -21.5000
 

My C/C++ output:

0 26.2741 6.82843 3.23983 2 1.44646 1.17157 1.03957 -13.1371
-13.1371 0 -4.61313 -1.84776 -1.08239 -0.765367 -0.613126 -0.541196 0
0 -4.61313 0 3.08239 1.41421 0.917608 0.707107 0.613126 0
0 1.84776 3.08239 0 -2.61313 -1.30656 -0.917608 -0.765367 0
0 -1.08239 -1.41421 -2.61313 0 2.61313 1.41421 1.08239 0
0 0.765367 0.917608 1.30656 2.61313 0 -3.08239 -1.84776 0
0 -0.613126 -0.707107 -0.917608 -1.41421 -3.08239 0 4.61313 0
0 0.541196 0.613126 0.765367 1.08239 1.84776 4.61313 0 -0.519783
-0.519783 -1.17157 -1.44646 -2 -3.23983 -6.82843 -26.2741 inf 0


Thanks
Remember that C++ indexes from 0 and MATLAB indexes from 1.
I say this because the second, third, ... entries in your C++ output match, but the first and last are borked, so it looks like somewhere you just iterated by the wrong value.
Well, and the signs. Something is up there too.
pow is terribly slow, BTW, so the pow(-1.0, y) thing is inefficient, but get it working first.

.. just in your output loop I see <= N.
To print 8 values it's for(i = 0; i < 8; i++), not <= ... if you did this in the computations it would fry it.
You're allocating memory for N + 1 objects and then iterating from 0 to N inclusive (for (int y = 0; y <= N; ++y)). As Jonnin mentions above, shouldn't you be allocating memory for N objects and then iterating from 0 to N - 1?

 
for (int y = 0; y < N; ++y) 

etc.
The matrix should have dimensions N+1 by N+1, so it's not wrong to have <=. But listen to Dijkstra: prefer < over <=.

The problem is that your array indexing uses N as the stride instead of N + 1. For example you've got
D[x + y * N]
but it should be
D[x + y * (N + 1)], because the matrix has N+1 columns.
Secondly, didn't you want column-major order? x + y * width does row-major indexing, and column-major indexing looks like y + x * height.

Consider consistently using a function like ij_col_major from the other thread. This way you can forget about this detail.

Here's my translation of the MATLAB stuff.
#include <iostream>
#include <vector>
#include <cmath>
#include <iomanip>
#include <fftw3.h>

static const int nx = 8;
static const int ny = 8;

int xy_col_major(int x, int y, int number_of_rows) { return y + x * number_of_rows; }

// Compute the mathematically correct result of pow(-1.0, n) for integer n.  The
// function name "m1_pow_n" abbreviates "minus 1 to the power of n".
double m1_pow_n(int n) { return (n % 2)? -1.0: +1.0; }
double squared(double x) { return x * x; }

struct chebyshev_matrix { double* D; double* x; };
chebyshev_matrix cheb2(int const n)
{
  int const dim = n + 1; // dimension (width & height) of the resulting matrix.

  double* D  = fftw_alloc_real(squared(dim));
  double* xv = fftw_alloc_real(dim); // Renamed x to xv to avoid name conflicts with row index x.
  double* c  = fftw_alloc_real(dim);
  double* X  = fftw_alloc_real(squared(dim));
  double* dX = fftw_alloc_real(squared(dim));
  if (!D or !xv or !c or !X or !dX) std::abort();
  
  if (n == 0)
  {
    fftw_free(dX); fftw_free(X); fftw_free(c); // don't leak the temporaries
    D[0] = 0.0; xv[0] = 1.0;
    return { D, xv };
  }
  
  for (int i = 0; i < dim; ++i) xv[i] = std::cos(i * M_PI / n);
  for (int i = 0; i < dim; ++i) c[i] = m1_pow_n(i);
  c[0] = 2.0 * m1_pow_n(0);
  c[n] = 2.0 * m1_pow_n(n);

  for (int x = 0; x < dim; ++x)
    for (int y = 0; y < dim; ++y)
      X[xy_col_major(x, y, dim)] = xv[y];

  for (int x = 0; x < dim; ++x)
    for (int y = 0; y < dim; ++y)
      dX[xy_col_major(x, y, dim)] = X[xy_col_major(x, y, dim)] - X[xy_col_major(y, x, dim)];

  for (int x = 0; x < dim; ++x)
    for (int y = 0; y < dim; ++y)
    {
      D[xy_col_major(x, y, dim)] = (c[y] * (1.0 / c[x]));
      D[xy_col_major(x, y, dim)] /= (dX[xy_col_major(x, y, dim)] + (x == y));
    }

  for (int y = 0; y < dim; ++y)
  {
    double sum_of_row = 0.0;
    for (int x = 0; x < dim; ++x)
      sum_of_row += D[xy_col_major(x, y, dim)];
    D[xy_col_major(y, y, dim)] -= sum_of_row;
  }

  fftw_free(dX);
  fftw_free(X);
  fftw_free(c);
  
  return { D, xv };
}

void print(double* m, int width, int height)
{
  std::cout << std::fixed << std::setprecision(4);
  for (int y = 0; y < height; ++y)
  {
    for (int x = 0; x < width; ++x)
      std::cout << std::setw(9) << m[xy_col_major(x, y, width)] << ' ';
    std::cout << '\n';
  }

  std::cout << '\n';
}

int main()
{	
  for (int n = 0; n < 6; ++n)
  {
    chebyshev_matrix const result = cheb2(n);
    double* D = result.D;
    double* x = result.x;
    print(D, n + 1, n + 1);
    fftw_free(x);
    fftw_free(D);
  }
}
Thanks A LOT everyone for the incredible help. I guess it's not such a great idea to code at 1 am after all!

@mbozzi
Secondly didn't you want column-major order?


Yes! But I have two copies of my code, one column-major and the other row-major, for learning purposes, and I just keep everything documented. I am trying several methods to solidify my C++ knowledge.

Also, I noticed you use struct a lot and I wonder if there's a preference between struct and class? Thanks again for the help.
Also, I noticed you use struct a lot and I wonder if there's a preference between struct and class?

The difference between struct and class is that class members are private by default, while struct members are public by default.

Beyond how much you have to explicitly specify non-default access for some members, the choice also shows your intent.
A "class" implies that access to data members is via (public) member functions.
A "struct" implies direct access to data members -- less encapsulation.
Usually one translates from MATLAB to C for performance reasons.
These days, professionally, you would probably translate MATLAB to Eigen. I did a lot of this stuff by hand back in the day (think 486 era) to avoid unnecessary matrix operations required by the linear algebra packages of the day, and had a whole MATLAB-to-C library I wrote.

But 95% of the performance work was the memory management, and that boiled down to two simple ideas that will help you a lot if that is your goal:
1) Reuse memory via a manager class. Allocate a big pile, chop it up, hand it out, and recycle it when done. This saves a ton of allocate/free pairings that get expensive across all your intermediate work. It also keeps everything in a few pages, since it's all allocated up front as one big block instead of here and there.

2) Page faults and memory access. A quick example is the basic (not the smarter, weird one) brute-force matrix multiply: transposing in place and then iterating across rows instead of columns gave me a large speedup. After realizing that, I rewrote large chunks of the library to work on the transposed matrices, as you are doing incidentally. Then a little hand-waving in the larger (non-library) meta functions -- typically transposing part of the input up front, doing a lot of work, and transposing back at the end using the alternate forms -- gave me performance boosts similar to the multiply one on the grand scale. By storing your matrices in 1-D instead of 2-D, you can transpose in place very quickly and just swap the row/col dimension values, with no need to copy or reallocate the memory.

Hope that helps guide you as you dig into this.
@keskiverto
Thanks! This helps me a lot in understanding the main differences!

@jonnin
I have been mainly using Eigen with FFTW and it was super helpful. However, things sort of started getting messy. For example, using FFTW requires copying my fftw_malloc data to/from the Eigen matrices' data, and so on. Beyond that, the code I am building runs many hundreds (sometimes thousands) of iterations, and I noticed Eigen significantly slows down the process and uses a lot of RAM. For those reasons I am currently not using Eigen, which may be a mistake, but my code is not that heavy, at least for now.

1) reuse memory via a manager class. Allocated a big pile, chop it up, hand it out, and recycle it when done. This saves a ton of allocate/free pairings that get expensive for all your intermediate work. This also keeps it all in a few pages, since its all allocated up front as one big block instead of here and there.


I am trying to do that, at least with the FFTs, since I use them A LOT across my code and my end goal is to parallelize everything with OpenMP.

All of your comments are helping me slowly achieve my goal, but I am a slow learner, so unfortunately I may come back with more questions. Thanks!
Heh, so much RAM ... my 2-year-old PC has 64 GB, a number that, in my old age, I associate more with a hard disk (my first PC had, I think, 4 MB of RAM!). Serious boxes have much more, of course. Don't fear RAM use: remember the time-space tradeoff? Using less memory may or may not be a good thing, as long as it's well used and you get something out of it.

Mistake? Hard to say. Rewriting your own, if it's small, can be rewarding in both performance and learning. But linear algebra goes sideways in a hurry if your matrices are not just so, due to numerical woes. I was lucky; I was working on very stable matrices from a control problem, so I could all but ignore most numerical issues (we hit a few and handled them as they came). But the general-purpose stuff has to assume you have bad matrices, and probably does extra work when it's not necessary.

The bottom line is that you can probably code up a solver that will run faster than Eigen if the problem is relatively small or simple. It's similar to how you can write assembly that runs faster than C++: it takes a lot of time, is bug-prone, and has lots of negatives, but if it's worth doing that way, it's rewarding. So the question to consider is whether reinventing the wheel is worth it for a little speed. If Eigen isn't fast enough, the first thing to do is review your process and make sure you have the right algorithm. If you are sure about that part, check that you used the library to express that algorithm in the best way. If you did all that and it's still not fast enough, then maybe you are on the right track to build a dedicated solver of your own.
@jonnin

Mistake? Hard to say. Rewriting your own, if its small, can be rewarding in both performance and learning to do it


That's truly the most apt description of my current situation. I have never formally learned to code, and in writing this code I have learned SO MUCH, maybe too much too fast. Nonetheless, it has been rewarding, and I have found this website to be an amazing resource. I am running more performance tests this month on the Eigen and non-Eigen code and will hopefully reach a solid conclusion about the best choice going forward. Thanks again!
@mbozzi

Also, I noticed something: when I try to use the output D of cheb2(int const n) to perform matrix multiplication with different sizes, I run into the issue of indexing D correctly. For example,

fftw_complex *duyk;
duyk = (fftw_complex*) fftw_malloc((nx*(ny + 1)) * sizeof(fftw_complex));
for(int y = 0; y < ny+1; y++){
            for(int x = 0; x < nx; x++){
                  for (int k = 0; k < ncomp; k++){
                        duyk[xy(x, y, nx)][k] = uFunk[xy(x, y, nx)][k] * D[xy(x, y, (ny+1))];
                  }
            }
      }

This obviously returns an incorrect result, since D must be indexed as in print(D, n + 1, n + 1), while uFunk and duyk must be indexed differently, as in printK(uFunk, nx, ny+1). I am not sure if I am being clear about what the issue is.
I noticed something when I try to use the output [...] to perform matrix multiplication
I'm not sure it's the right algorithm. Try something like this:

void matrix_product(
  fftw_complex* out,
  fftw_complex const* a, int a_width, int a_height,
  double const*       b, int b_width, int b_height)
{
  for (int x = 0; x < b_width; ++x)
    for (int y = 0; y < a_height; ++y)
    { 
      fftw_complex inner_product {0.0, 0.0};
      for (int i = 0; i < a_width; ++i)
      {
        double const p_real = a[xy(i, y, a_width)][0];
        double const p_imag = a[xy(i, y, a_width)][1];
        double const q = b[xy(x, i, b_width)];
        
        inner_product[0] += p_real * q;
        inner_product[1] += p_imag * q;
      }
      
      out[xy(x, y, b_width)][0] = inner_product[0];
      out[xy(x, y, b_width)][1] = inner_product[1];
    }
}

// ...

fftw_complex *duyk = (fftw_complex*) fftw_malloc((nx*(ny + 1)) * sizeof(fftw_complex));
matrix_product(duyk, uFunk, nx, ny + 1, D, n + 1, n + 1);

@mbozzi

This returns an error:
corrupted size vs. prev_size
Aborted (core dumped)
which I think is a size issue having to do with the line:
out[xy(x, y, b_width)][1] = inner_product[1];

So I will have to look into this more carefully. Thanks!
My mistake. I think the problem is that duyk is too small to hold the entire matrix product, which should have ny + 1 rows and n + 1 columns.

Test code:

#include <complex>
#include <iostream>
#include <iomanip>

#include <fftw3.h>

int xy(int x, int y, int number_of_cols) { return x + y * number_of_cols; }

void matrix_product(
  fftw_complex* out,
  fftw_complex const* a, int a_width, int a_height,
  double const*       b, int b_width, int b_height)
{
  for (int x = 0; x < b_width; ++x)
    for (int y = 0; y < a_height; ++y)
    {
      // compute inner product of xth column of b with yth row of a
      
      fftw_complex inner_product {0.0, 0.0};
      for (int i = 0; i < a_width; ++i)
      {
        double const p_real = a[xy(i, y, a_width)][0];
        double const p_imag = a[xy(i, y, a_width)][1];
        double const q = b[xy(x, i, b_width)];
        
        inner_product[0] += p_real * q;
        inner_product[1] += p_imag * q;
      }
      
      out[xy(x, y, b_width)][0] = inner_product[0];
      out[xy(x, y, b_width)][1] = inner_product[1];
    }
}

void print(fftw_complex* m, int width, int height)
{
  std::cout << std::fixed << std::setprecision(2);
  for (int y = 0; y < height; ++y)
  {
    for (int x = 0; x < width; ++x)
      std::cout << std::setw(6) << std::complex{m[xy(x, y, width)][0], m[xy(x, y, width)][1]} <<  ' ';
    std::cout << '\n';
  }

  std::cout << '\n';
}

int main()
{
  fftw_complex a[] {
    {1.0, 0.0}, {2.0, 0.0}, {3.0, 0.0},
    {4.0, 0.0}, {5.0, 0.0}, {6.0, 0.0}
  };
  int const a_width = 3;
  int const a_height = 2;

  double b[] {
    7.0,  8.0,
    9.0, 10.0,
    11.0, 12.0
  };
  int const b_width = 2;
  int const b_height = 3;

  int const p_width = b_width;
  int const p_height = a_height;
  fftw_complex p[p_width * p_height];
  matrix_product(p, a, a_width, a_height, b, b_width, b_height);
  print(p, p_width, p_height);
}

@mbozzi

I think the problem is that duyk is too small to hold the entire matrix product, which should have ny + 1 rows and n + 1 columns.


Fixing the size of duyk does get rid of the error; however, the result isn't really correct. The thing is, duyk is supposed to be ny+1 rows by nx columns, and that is multiplied by D of size ny+1 by ny+1, which I guess MATLAB handles differently?
Topic archived. No new replies allowed.