Conjugate gradient using MPI

Forum

Forum
General C++ Programming
Conjugate gradient using MPI

Conjugate gradient using MPI

Cplusc (448)

Hello everyone.
I've deleted this question because I've posted it twice, sorry for inconvenience in advance.

Last edited on

TheIdeasMan (6781)

Please don't start a new topic for the same subject, it's ultimately a time waster when different people reply to both, often saying the same things.

Can you use a debugger to solve your problem?

lastchance (6980)

When I ran your code I got:

With ONE processor:

C:\c++>"C:\Program Files\Microsoft MPI\bin"\mpiexec -n 1 test 
a  :
7.00000 3.00000 1.00000 
3.00000 7.00000 0.00000 
1.00000 0.00000 10.00000 
b :
1.00000 -5.00000 2.00000 

b :
1.00000 -5.00000 2.00000 

x :
0.52417 -0.93893 0.14758

With TWO processors (which shouldn't work because 2 doesn't divide into 3):

C:\c++>"C:\Program Files\Microsoft MPI\bin"\mpiexec -n 2 test 
a  :
7.00000 3.00000 1.00000 
3.00000 7.00000 0.00000 
1.00000 0.00000 10.00000 
b :
1.00000 -5.00000 2.00000 

b :
1.00000 -5.00000 2.00000 

x :
-0.16667 0.83333 -0.33333

With THREE processors:

C:\c++>"C:\Program Files\Microsoft MPI\bin"\mpiexec -n 3 test 
a  :
7.00000 3.00000 1.00000 
3.00000 7.00000 0.00000 
1.00000 0.00000 10.00000 
b :
1.00000 -5.00000 2.00000 

b :
1.00000 -5.00000 2.00000 

x :
-0.16667 0.83333 -0.33333

You won't be able to run this problem with more than three processors because your matrix is only 3x3.

Also, T doesn't have to be as large as 1000 - it only needs to be as large as the number of rows in the matrix.

Last edited on

lastchance (6980)

Here's a serial solver for conjugate gradients that you can try.

Your results for more than one processor (i.e. actually using MPI) are wrong. You should test matrix multiplication separately.

X:
       0.524173      -0.938931       0.147583


Check:
B:
              1             -5              2
AX:
              1             -5              2

#include <iostream>
#include <iomanip>
#include <string>
#include <vector>
#include <algorithm>
#include <numeric>
#include <cmath>
#include <cassert>
using namespace std;

const double SMALL = 1.0E-30;          // used to stop divide-by-zero
const double NEARZERO = 1.0E-10;       // interpretation of "zero" for printing purposes

using vec    = vector<double>;         // vector
using matrix = vector<vec>;            // matrix (=collection of (row) vectors)

//======================================================================

void print( const string &title, const matrix &A )
{
   if ( title != "" ) cout << title << '\n';

   for ( auto &row : A )
   {
      for ( auto x : row ) cout << setw( 15 ) << ( abs( x ) < NEARZERO ? 0.0 : x );
      cout << '\n';
   }
}

//======================================================================

void print( const string &title, const vec &A )
{
   if ( title != "" ) cout << title << '\n';

   for ( auto x : A ) cout << setw( 15 ) << ( abs( x ) < NEARZERO ? 0.0 : x );
   cout << '\n';
}

//======================================================================

matrix matmul( const matrix &A, const matrix &B )          // Matrix times matrix
{
   int rowsA = A.size(),   colsA = A[0].size();
   int rowsB = B.size(),   colsB = B[0].size();
   assert( colsA == rowsB );

   matrix C( rowsA, vec( colsB, 0.0 ) );
   for ( int i = 0; i < rowsA; i++ )
   {
      for ( int j = 0; j < colsB; j++ )
      {
         // scalar product of ith row of A and jth column of B
         for ( int k = 0; k < colsA; k++ ) C[i][j] += A[i][k] * B[k][j];
      }
   }
   return C;
}

//======================================================================

vec matmul( const matrix &A, const vec &V )                // Matrix times vector
{
   int rowsA = A.size(),   colsA = A[0].size();
   int rowsV = V.size();
   assert( colsA == rowsV );

   vec C( rowsA, 0.0 );
   for ( int i = 0; i < rowsA; i++ )
   {
      for ( int k = 0; k < colsA; k++ ) C[i] += A[i][k] * V[k];
   }
   return C;
}

//======================================================================

double innerProduct( const vec &U, const vec &V )          // Inner product of U and V
{
   return inner_product( U.begin(), U.end(), V.begin(), 0.0 );
}

//======================================================================

double norm( const vec &V )                                // L2 norm of V
{
   return sqrt( innerProduct( V, V ) );
}

//======================================================================

bool conjugateGradient( const matrix &A, const vec &B, vec &X )
{
   double TOLERANCE = 1.0e-10;
   int n = A.size();
   X = vec( n, 0.0 );
   vec R = B, P = B;

   for ( int k = 0; k < n; k++ )
   {
      vec Rold = R;                                         // Store previous residual
      vec AP = matmul( A, P );

      double alpha = innerProduct( R, R ) / max( innerProduct( P, AP ), SMALL );
      for ( int i = 0; i < n; i++ ) 
      {
         X[i] += alpha * P[i];                              // Next estimate of solution
         R[i] -= alpha * AP[i];                             // Residual
      }
      if ( norm( R ) < TOLERANCE ) break;                   // Convergence test

      double beta = innerProduct( R, R ) / max( innerProduct( Rold, Rold ), SMALL );
      for ( int i = 0; i < n; i++ ) P[i] = R[i] + beta * P[i];       // Next gradient     
   }

   return true;
}

//======================================================================

int main()
{
   matrix A = { { 7, 3, 1  },
                { 3, 7, 0  },
                { 1, 0, 10 } };
   vec B = { 1, -5, 2 };

   vec X;
   if ( conjugateGradient( A, B, X ) )
   {
      print( "X:" , X );
      cout << "\n\nCheck:\n";
      print( "B:" , B );
      print( "AX:", matmul( A, X ) );
   }
   else
   {
      cout << "Unable to solve\n";
   }
}

lastchance (6980)

The main problem with your code is in routine VecxVec.
Only the root processor knows the content of arrays x[] and y[], so the lines

for (int i = 0; i < local_N; i++)
	{
		local_x[i] = x [i + my_rank * local_N];
		local_y[i] = y [i + my_rank * local_N];
	}

are invalid and should be replaced by root (processor 0) scattering the data as in your other routine:

1
2

        MPI_Scatter(x, n / num_procs, MPI_DOUBLE, local_x, n / num_procs, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        MPI_Scatter(y, n / num_procs, MPI_DOUBLE, local_y, n / num_procs, MPI_DOUBLE, 0, MPI_COMM_WORLD);

When this is corrected then I can run on THREE processors:

C:\c++>"C:\Program Files\Microsoft MPI\bin"\mpiexec -n 3 test 
a  :
7.00000 3.00000 1.00000 
3.00000 7.00000 0.00000 
1.00000 0.00000 10.00000 
b :
1.00000 -5.00000 2.00000 

b :
1.00000 -5.00000 2.00000 

x :
0.524173  -0.938931  0.147583

The final answer is correct.

There are many problems with your code:
(1) You are leaking memory like a sieve; every new should have a corresponding delete[].
(2) The number of iterations T need be no more than N_DIM.
(3) Since matrix a[][] never changes you are wasting MPI communications by continually sending it out.
(4) Remove MPI_Barrier!!!
(5) (For this problem) you can ONLY solve on 1 or 3 processors. Consider using MPI_Scatterv and MPI_Gatherv, rather than MPI_Scatter and MPI_Gather, so that you aren't forced to balance the load on all processors.

In case anyone is interested in multi-processor, high-performance computing (even on a PC!) then some useful links (reference and tutorial) for MPI (message-passing interface) are:
https://www.mpi-forum.org/index.html
https://www.open-mpi.org/doc/current/
https://docs.microsoft.com/en-us/message-passing-interface/mpi-functions
https://www.mpich.org/
https://www.rookiehpc.com/mpi/docs/
https://mpitutorial.com/tutorials/
https://hpc-tutorials.llnl.gov/mpi/
https://stackoverflow.com/questions/tagged/mpi

If it's any use, the first exposure I got to MPI was via a training course at work (the meat of the course starts at about slide 28, after a lot of "housekeeping"; note that the course covered both C/C++ and Fortran bindings):
http://wiki.rac.manchester.ac.uk/community/MPI/course/Intro?action=AttachFile&do=get&target=ITS_RMPIINTRO_Nov2012.pdf

Last edited on

Cplusc (448)

Many thanks for your answer. could you please explain item number 3 with more details? and thanks for sharing mpi tutorial links with us.

lastchance (6980)

You have deleted your code from this thread, which makes it impossible for anyone else to see the gist of the replies. Please put it back.

However. Item 3 related to the fact that (for this problem) you were always using the same matrix to multiply a vector. Since MPI transfers are much slower than normal copies between memory locations you should avoid repeatedly scattering the same matrix rows to other processors. In this instance you could just do the scattering once in main() and only pass the local matrix as procedure argument.

Actually, your matrix multiply routines could have been refactored to use the local arrays only, with all the MPI transfers, reduction operations and array sizing taking place in main(). This avoids the repeated declaration of new arrays, which leads on to the following ...

Your more significant problem will be all the new without deletes. Unlike Fortran, C++ won't free the locally allocated memory at the end of a procedure, so you will gradually consume more and more of your available memory, with previous allocations becoming inaccessible. Google "memory leak".

Last edited on

againtry (2313)

Why not just put this re-direct:
http://www.cplusplus.com/forum/general/278038/#msg1200161

Topic archived. No new replies allowed.