Apr 4, 2012 at 11:27am UTC
Hi all,
I have a question concerning matrix multiplication. The following code is the way I would do it (the example is just for 3x3 matrices).
1 2 3 4 5 6 7 8 9 10 11 12
int C[3][3];
int A[3][3] = {{1,6,2},{2,7,9},{5,2,2}};
int B[3][3] = {{2,1,5},{1,1,4},{2,9,3}};
for (int leftLoop = 0; leftLoop < 3; leftLoop++){
for (int rightLoop = 0; rightLoop < 3; rightLoop++){
C[leftLoop][rightLoop] = 0.0;
for (int innerLoop = 0; innerLoop < 3; innerLoop++)
C[leftLoop][rightLoop] += A[leftLoop][innerLoop]*B[innerLoop][rightLoop];
}
}
But in my "great" book it is said that it is actual better to first transpose Matrix B and use the following lines for the last loop (BT is transposed matrix B):
C[leftLoop][rightLoop] += A[leftLoop][innerLoop]*BT[rightLoop][innerLoop];
I just don't see the advantage to do it like this. Am I missing something?
Last edited on Apr 4, 2012 at 11:29am UTC
Apr 4, 2012 at 2:18pm UTC
There is no advantage to using transpose, only a waste of time. Maybe the book though it was more clear that way.
Apr 4, 2012 at 2:27pm UTC
Just taking a guess here, but wouldn't a transposed B be better for cache performance taking into account that A is used "per row" and B "per column"?
Last edited on Apr 4, 2012 at 2:27pm UTC
Apr 4, 2012 at 2:54pm UTC
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
Matrix Matrix::operator *(const Matrix& test)
{
if ( test.num_cols() == this ->num_rows()) {
Matrix tmp(num_rows(), test.num_cols());
for (int i = 0; i < num_rows(); i++){
for (int j = 0; j < test.num_cols(); j++){
for (int l = 0; l < num_cols(); l++){
tmp.mydata[i][j] += this ->mydata[i][l] * test.mydata[l][j];
}
}
}
return tmp;
}
else {
Matrix tmp(num_rows(), test.num_cols());
tmp.set_rows(0);
return tmp;
}
}
thats my code of how i did it with an overloaded operator of 2 matrix classes, you can use that to figure it out .
Last edited on Apr 4, 2012 at 2:57pm UTC
Apr 5, 2012 at 8:43am UTC
I don't know how this class should help me decide which method is better (transposed or not). Maybe I should just look at the running time for both implementations.