High-performance linear algebra algorithms

Hi,
I have a MATLAB code that performs a lot of matrix-vector products (more than 500 per iteration, and there can be more than 100 iterations), with matrices of size on the order of (100000,100000). I am thinking of converting this code to C++ for better efficiency, and I am considering one of these two solutions:
1- using BLAS and parallel CPU computing
2- using a GPU (cuBLAS, for example)
Can someone advise me on which I should choose: solution 1, solution 2, or something else?

Thank you
If you have a GPU and the data fits in its memory, then GPGPU would be my first choice. A matrix product is essentially a large group of simple, independent multiply-and-add operations, and GPUs are built for exactly that kind of work.
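To make the "fits in its memory" condition and the "independent multiply and add" point concrete, here is a minimal C++ sketch. It assumes the matrix is dense, stored row-major, and in double precision, none of which the original post states. The names dense_bytes and matvec are hypothetical helpers for illustration, not part of BLAS or cuBLAS.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: bytes needed to store a dense rows x cols matrix
// whose elements each take elem_size bytes.
std::uint64_t dense_bytes(std::uint64_t rows, std::uint64_t cols,
                          std::uint64_t elem_size) {
    return rows * cols * elem_size;
}

// Hypothetical helper: y = A * x for a dense row-major matrix A (rows x cols).
// Each output element y[i] depends only on row i of A and on x, so the
// outer loop iterations are independent of one another. That independence
// is what lets a multithreaded BLAS or a GPU compute many rows at once.
std::vector<double> matvec(const std::vector<double>& A,
                           const std::vector<double>& x,
                           std::size_t rows, std::size_t cols) {
    std::vector<double> y(rows, 0.0);
    for (std::size_t i = 0; i < rows; ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j < cols; ++j) {
            sum += A[i * cols + j] * x[j];  // one multiply and one add
        }
        y[i] = sum;
    }
    return y;
}
```

Under those assumptions, dense_bytes(100000, 100000, sizeof(double)) is 80,000,000,000 bytes, about 80 GB, which is more than many single GPUs offer. So it is worth checking the footprint before committing to the GPU route. If the matrix is actually sparse or single precision, the number is smaller. In a real port the inner loops would be replaced by a library call rather than hand-written code.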