MPI_Op_create

I am working with two MPI processes and I need to create my own MPI reduction operation, recovering the results into a global vector which will be of size 4 at the end.

Local result for Processor 0:
vector result {1,3,4} // local values
vector local2global {0,1,2} // says which elements of the global vector (in part or in whole) are computed on this processor
vector global2local {0,1,2,-1} // auxiliary vector used to access the local result values; entries not computed here are -1
vector mpi_repetition {2,1,2,1} // the number of times each node is repeated across processors


Local result for Processor 1:
vector result {2,5,6} // local values
vector local2global {0,2,3} // says which elements of the global vector (in part or in whole) are computed on this processor
vector global2local {0,-1,1,2}
vector mpi_repetition {2,1,2,1}


I wrote the following piece of code, but when I run it I get the error "_STL_ASSERT(_Ptr_user[-2] == _Big_allocation_sentinel, "invalid argument");"

#include "iostream"
#include "mpi.h"
#include "vector"
#include <stdio.h>
#include <stdlib.h>

typedef std::vector<int> vecint;
typedef std::vector<vecint> vecvecint;

void my_reduce_function(void* inputBuffer, void* outputBuffer, int *len, MPI_Datatype* datatype)
{
    int myrank, num_procs;
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);

    vecvecint* input = (vecvecint*)inputBuffer;
    vecint* output = (vecint*)outputBuffer;

//input[0]=local result 
//input[1]=local2global 
//input[2]=global2local 
//input[3]=mpi_repetition 

    for (int proc = 0; proc < num_procs; proc++) {
        if (myrank == proc) {
            for (int i = 0; i < (*input)[1].size(); i++)
            {

                (*output)[(*input)[1][i]] += ((*input)[0][(*input)[2][(*input)[1][i]]])/(*input)[3][(*input)[1][i]];

            }
        }

        MPI_Barrier(MPI_COMM_WORLD);
    }
}


int main(int argc, char* argv[])
{

    int myrank,num_procs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);
    int root_rank = 0;
    vecvecint data;

    //if (num_procs != 2)
    //{
    //    printf("This application is meant to be run with 2 processes.\n");
    //    MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
    //}

    for (int proc = 0; proc < num_procs; proc++) {

        if (proc == 0) {

            data = { { 1,3,4 },{ 0,1,2 },{ 0,1,2,-1},{ 2,1,2,1 } };
        }

        else {

            data = { { 2,5,6 },{ 0,2,3 },{ 0,-1,1,2 },{ 2,1,2,1 } };
        }
    }

    MPI_Op operation;
    int len = 4;
    MPI_Op_create(&my_reduce_function, 1,&operation);
    vecint reduction_results;
    MPI_Reduce(&data, &reduction_results, len, MPI_INT, operation, root_rank, MPI_COMM_WORLD);

        if (myrank == root_rank)
        {
            printf("The sum of first elements of data is %d.\n", reduction_results[0]);
            printf("The sum of second elements of data is %d.\n", reduction_results[1]);
            printf("The sum of third elements of data is %d.\n", reduction_results[2]);
            printf("The sum of fourth elements of data is %d.\n", reduction_results[3]);
        }

    MPI_Op_free(&operation);
    MPI_Finalize();
    return EXIT_SUCCESS;
}
This is truly weird code.

In your
MPI_Reduce(&data, &reduction_results, len, MPI_INT, operation, root_rank, MPI_COMM_WORLD);

the contents of your send buffer are definitely not MPI_INTs.

https://www.mpich.org/static/docs/v3.3/www3/MPI_Reduce.html
Why isn't it MPI_INT?
Cplusc wrote:
Why isn't it MPI_INT?


Because data is a std::vector, not a C-style array, and its contents are vector<int>, not int.




Also, if you write
    for (int proc = 0; proc < num_procs; proc++) {
        if (myrank == proc) {

in your my_reduce_function then I don't think you understand how MPI works. Each process already has its own data, in this case its own input buffer; the user-defined operation is called on those buffers and should not loop over ranks or call things like MPI_Barrier.
The data for the first processor is {1,3,4}; this actually represents {val,val,val,0}.
The data for the second processor is {2,5,6}, representing {val,0,val,val}.
In this simple case I don't need such a procedure, but when these local vectors contain lots of elements, many of them zero, implementing it this way saves memory.

I was aiming to do the following steps in my code.

1- Sum the first element of the local result vector on processor 0 and the first element of the vector on processor 1, divide by the number of repetitions, and put the result in the first position of the global vector.

2- Put the second element of the vector on processor 0 directly into the second position of the global vector.

3- Sum the third element of the vector on processor 0 and the second element of the vector on processor 1, divide by the number of repetitions, and put the result in the third position of the global vector.

4- Put the third element of the vector on processor 1 directly into the 4th position of the global vector.

For doing this I need the local2global, global2local and mpi_repetition vectors. That's why I defined a vector of vectors as the input buffer. If it were just int* input = (int*)inputBuffer, then I probably would not be having problems with the buffers.

Would it be a good idea to make a structure containing the 4 vectors local_res, local2global, global2local and mpi_repetition? Then my_reduce_function would receive the structure inside the buffer. I'm also confused about something: when writing a user-defined operation in MPI, if several vectors, arrays, or vectors of vectors are needed, how can I pass them through the function so that MPI can handle them as
typedef void MPI_User_function( void *invec, void *inoutvec, int *len, MPI_Datatype *datatype);
Also, I read about commutativity and associativity and I didn't get when one is preferred over the other and more efficient. It seems to depend on the code implementation and the application. Any help with this?
Cplusc wrote:
Also I read about the commutativity and associativity and I didn't get when one is preferred over the other and more efficient.

Do you mean:
a * b = b * a (commutativity)
(a * b) * c = a * (b * c) (associativity)

They are closely related properties that some operators have.
Cplusc wrote:
Is that a good idea if I make a structure containing 4 vectors including local_res, local2global, global2local and mpi_repetition.


If you intend to use MPI, then, no. MPI distributes contiguous data of POD type.



Cplusc wrote:
When writing user defined operation in mpi, if several vectors,array or vector of vector are needed how I can pass them through the function that mpi can handle them as


Again, you can't. You can define new MPI datatypes to handle structures containing plain old data (POD) only, but you would have to send advanced container types like vectors as a size (one send) followed by their data buffer (another send).

Under no circumstances should you attempt to send vectors of vectors like this - they don't contain contiguous data. In Fortran you can allocate multi-dimensional arrays on the fly with a single statement and guarantee contiguous data, but in C++ you should use flattened arrays with MPI.



There are a lot of people advocating always using vectors rather than new and delete with standard arrays, and (coughs) "smart" pointers rather than "raw" pointers ... but they really aren't doing you any favours when it comes to parallelisation with MPI.
@lastchance I was thinking of using int MPI_Type_create_struct(int block_count, const int block_lengths[], const MPI_Aint displacements[], MPI_Datatype block_types[], MPI_Datatype* new_datatype);
and then sending this structure as the input buffer to the user-defined function (my_reduce_function) and doing the reduction there. Flattening the vectors doesn't apply here, because I may have vectors of different types. Could you give me any idea how to solve my problem?
You can't send a type of structure with something variable-length inside it. You would have to send those individual components separately, or combine, say, all the doubles into a single 1d array and send that (together with enough displacement information to unpack it at the other end).
@lastchance Exactly, the problem is I can't send it because it's not a constant size and, as you said, it's variable-length. It would be great if it were possible to write a user-defined function in MPI which takes several input buffers.
I don't really understand what you are trying to do, but you might consider using MPI_Pack
https://www.open-mpi.org/doc/v3.1/man3/MPI_Pack.3.php
(and, obviously, MPI_Unpack).

Note that this would require you to pack everything you want to send into a single 1d buffer before sending (and unpack it at the other end). Both processors must also have the information to do this.

Reminder, however, that you cannot just send a std::vector per se, and you certainly can't send a vector of vectors.
for (int i = 0; i < local2global.size(); i++) {

    global_result[i] += local_result[global2local[local2global[i]]];
}

Topic archived. No new replies allowed.