How can I send rows of a matrix to all the processes using MPI_Scatterv?
I am working with the MPI interface. I want to split a matrix (by rows) and distribute the parts among every process.
For example, I have this 7x7 square matrix M.
M = [
0.00 1.00 2.00 3.00 4.00 5.00 6.00
7.00 8.00 9.00 10.00 11.00 12.00 13.00
14.00 15.00 16.00 17.00 18.00 19.00 20.00
21.00 22.00 23.00 24.00 25.00 26.00 27.00
28.00 29.00 30.00 31.00 32.00 33.00 34.00
35.00 36.00 37.00 38.00 39.00 40.00 41.00
42.00 43.00 44.00 45.00 46.00 47.00 48.00
];
I have 3 processes, so the split could be:
- Process 0 gets rows 0, 1
- Process 1 gets rows 2, 3, 4
- Process 2 gets rows 5, 6
After the Scatterv, it should look like this:
Process 0:
M0 = [
0.00 1.00 2.00 3.00 4.00 5.00 6.00
7.00 8.00 9.00 10.00 11.00 12.00 13.00
];
Process 1:
M1 = [
14.00 15.00 16.00 17.00 18.00 19.00 20.00
21.00 22.00 23.00 24.00 25.00 26.00 27.00
28.00 29.00 30.00 31.00 32.00 33.00 34.00
];
Process 2:
M2 = [
35.00 36.00 37.00 38.00 39.00 40.00 41.00
42.00 43.00 44.00 45.00 46.00 47.00 48.00
];
I hope that makes clear what I want to achieve; feel free to ask if I haven't explained it well.
Here is my code:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define BLOCK_LOW(id,p,n) ((id)*(n)/(p))
#define BLOCK_HIGH(id,p,n) ((id+1)*(n)/(p) - 1)
#define BLOCK_SIZE(id,p,n) ((id+1)*(n)/(p) - (id)*(n)/(p))
#define BLOCK_OWNER(index,p,n) (((p)*((index)+1)-1)/(n))

void **matrix_create(size_t m, size_t n, size_t size) {
    size_t i;
    void **p = (void **) malloc(m*n*size + m*sizeof(void *));
    char *c = (char *) (p + m);
    for (i = 0; i < m; ++i)
        p[i] = (void *) c + i*n*size;
    return p;
}

void matrix_print(double **M, size_t m, size_t n, char *name) {
    size_t i, j;
    printf("%s=[", name);
    for (i = 0; i < m; ++i) {
        printf("\n ");
        for (j = 0; j < n; ++j)
            printf("%f ", M[i][j]);
    }
    printf("\n];\n");
}

main(int argc, char *argv[]) {
    int npes, myrank, root = 0, n = 7, rows, i, j, *sendcounts, *displs;
    double **m, **mParts;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &npes);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    // Matrix M is generated in the root process (process 0)
    if (myrank == root) {
        m = (double **) matrix_create(n, n, sizeof(double));
        for (i = 0; i < n; ++i)
            for (j = 0; j < n; ++j)
                m[i][j] = (double) (n * i + j);
    }

    // Array containing the numbers of rows for each process
    sendcounts = malloc(n * sizeof(int));
    // Array containing the displacement for each data chunk
    displs = malloc(n * sizeof(int));
    // For each process ...
    for (j = 0; j < npes; j++) {
        // Sets each number of rows
        sendcounts[j] = BLOCK_SIZE(j, npes, n);
        // Sets each displacement
        displs[j] = BLOCK_LOW(j, npes, n);
    }
    // Each process gets the number of rows that he is going to get
    rows = sendcounts[myrank];
    // Creates the empty matrixes for the parts of M
    mParts = (double **) matrix_create(rows, n, sizeof(double));

    // Scatters the matrix parts through all the processes
    MPI_Scatterv(m, sendcounts, displs, MPI_DOUBLE, mParts, rows, MPI_DOUBLE, root, MPI_COMM_WORLD);

    // This is where I get the Segmentation Fault
    if (myrank == 1) matrix_print(mParts, rows, n, "mParts");

    MPI_Finalize();
}
I get a Segmentation Fault when I try to read the scattered data, suggesting that the scatter operation did not work. I already did this with one-dimensional arrays and it worked. But with two-dimensional arrays, things get a little trickier.
Could you help me find the bug, please?
Thanks
MPI_Scatterv needs a pointer to the data, and that data has to be contiguous in memory. Your program is fine on the second point, since matrix_create allocates the row pointers and the actual data in a single block, but in your call MPI_Scatterv receives m, which is a pointer to an array of row pointers, not a pointer to the data itself. So it would be better to change the call to:
MPI_Scatterv(&m[0][0], sendcounts, displs, MPI_DOUBLE, &mParts[0][0], sendcounts[myrank], MPI_DOUBLE, root, MPI_COMM_WORLD);
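If you want to convince yourself that the data behind &m[0][0] really is one contiguous block, here is a small standalone check; it is only an illustration and simply reuses the allocator from your question:
#include <assert.h>
#include <stdlib.h>

/* Same allocator as in the question: one malloc holding the row
   pointers, followed immediately by the m*n data elements. */
void **matrix_create(size_t m, size_t n, size_t size) {
    size_t i;
    void **p = (void **) malloc(m*n*size + m*sizeof(void *));
    char *c = (char *) (p + m);
    for (i = 0; i < m; ++i)
        p[i] = (void *) c + i*n*size;
    return p;
}

int main(void) {
    size_t n = 7, i;
    double **m = (double **) matrix_create(n, n, sizeof(double));

    /* The data block starts right after the n row pointers ... */
    assert((void *) &m[0][0] == (void *) (m + n));
    /* ... and row i starts exactly i*n doubles after row 0, so
       &m[0][0] addresses one contiguous n*n array of doubles. */
    for (i = 1; i < n; ++i)
        assert(&m[i][0] == &m[0][0] + i * n);

    free(m);
    return 0;
}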
There are also a couple of things to change for sendcounts and displs: to go 2D, both have to be multiplied by n, because MPI_Scatterv counts in elements of the datatype (here doubles), not in rows. And the receive count in MPI_Scatterv is no longer rows but sendcounts[myrank].
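For n = 7 and three processes, the BLOCK_ macros then give, for example:
sendcounts = { 2*7, 2*7, 3*7 } = { 14, 14, 21 }   /* doubles sent to each rank */
displs     = { 0*7, 2*7, 4*7 } = {  0, 14, 28 }   /* offset of each block, in doubles */
so ranks 0 and 1 each receive two rows and rank 2 receives three (a 2/2/3 split rather than the 2/3/2 split in your example, but the idea is the same).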
Here is the final code, with one extra tweak: only root passes &m[0][0] as the send buffer, because m is never allocated on the other ranks and MPI ignores the send arguments there anyway:
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define BLOCK_LOW(id,p,n) ((id)*(n)/(p))
#define BLOCK_HIGH(id,p,n) ((id+1)*(n)/(p) - 1)
#define BLOCK_SIZE(id,p,n) ((id+1)*(n)/(p) - (id)*(n)/(p))
#define BLOCK_OWNER(index,p,n) (((p)*((index)+1)-1)/(n))

void **matrix_create(size_t m, size_t n, size_t size) {
    size_t i;
    void **p = (void **) malloc(m*n*size + m*sizeof(void *));
    char *c = (char *) (p + m);
    for (i = 0; i < m; ++i)
        p[i] = (void *) c + i*n*size;
    return p;
}

void matrix_print(double **M, size_t m, size_t n, char *name) {
    size_t i, j;
    printf("%s=[", name);
    for (i = 0; i < m; ++i) {
        printf("\n ");
        for (j = 0; j < n; ++j)
            printf("%f ", M[i][j]);
    }
    printf("\n];\n");
}

int main(int argc, char *argv[]) {
    int npes, myrank, root = 0, n = 7, rows, i, j, *sendcounts, *displs;
    double **m, **mParts;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &npes);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    // Matrix M is generated in the root process (process 0)
    if (myrank == root) {
        m = (double **) matrix_create(n, n, sizeof(double));
        for (i = 0; i < n; ++i)
            for (j = 0; j < n; ++j)
                m[i][j] = (double) (n * i + j);
    }

    // One entry per process: the number of doubles sent to each rank ...
    sendcounts = malloc(npes * sizeof(int));
    // ... and the offset of each block in the send buffer, also in doubles
    displs = malloc(npes * sizeof(int));
    for (j = 0; j < npes; j++) {
        // Number of rows for process j, times n to count doubles
        sendcounts[j] = BLOCK_SIZE(j, npes, n) * n;
        // Displacement of process j's block, also in doubles
        displs[j] = BLOCK_LOW(j, npes, n) * n;
    }
    // Number of rows this process receives
    rows = sendcounts[myrank] / n;
    // Creates the empty matrix for this process's part of M
    mParts = (double **) matrix_create(rows, n, sizeof(double));

    // Scatters the matrix parts to all the processes. The send buffer is
    // only significant at root, so the other ranks pass NULL and never
    // touch their unallocated pointer m.
    MPI_Scatterv(myrank == root ? &m[0][0] : NULL, sendcounts, displs, MPI_DOUBLE,
                 &mParts[0][0], sendcounts[myrank], MPI_DOUBLE, root, MPI_COMM_WORLD);

    // Process 1 prints its block to check the result
    if (myrank == 1) matrix_print(mParts, rows, n, "mParts");

    MPI_Finalize();
    return 0;
}
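As a side note, if you would rather keep sendcounts and displs expressed in rows instead of doubles, one common variant (only a sketch, reusing the variables from the code above) is to build a derived datatype for one row of n doubles and scatter with that:
// A datatype describing one full row of n doubles, so counts and
// displacements can stay in units of rows.
MPI_Datatype row_type;
MPI_Type_contiguous(n, MPI_DOUBLE, &row_type);
MPI_Type_commit(&row_type);

for (j = 0; j < npes; j++) {
    sendcounts[j] = BLOCK_SIZE(j, npes, n);   // rows per process
    displs[j] = BLOCK_LOW(j, npes, n);        // first row of each block
}
rows = sendcounts[myrank];

MPI_Scatterv(myrank == root ? &m[0][0] : NULL, sendcounts, displs, row_type,
             &mParts[0][0], rows, row_type, root, MPI_COMM_WORLD);

MPI_Type_free(&row_type);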
If you want to know more about 2D arrays and MPI, look here
Look also at the DMDA structure of the PETSc library here and there