No. I've had good results with Eigen's performance.
I recently used Eigen to replace old code that used ATLAS; the use case was non-negative matrix factorization. The resulting code was roughly three times faster.
I'm not sure whether the performance is actually that good relative to modern versions of BLAS, but you have not provided a basis for comparison. Can you provide the specifics of your problem? What specifically is slow?
Thanks for sharing your experience! My problem is this: in each iteration of a loop, I need to multiply two 2x2 matrices. I first wrote a simple subroutine to do that, and the code ran pretty fast. I then used Eigen for the matrix multiplication, and the code ran significantly slower. What do you think might be the problem? Many thanks!
I really can't say more without code and compiler flags.
- Are you compiling with a high level of optimizations enabled?
- Are you compiling with vectorization enabled?
- Are you using Eigen's fixed-size matrices?
- Compile for the native architecture (e.g., pass -march=native) and optimize for speed (-O3). The first flag allows the compiler to generate instructions available on your machine that otherwise wouldn't be used. SIMD (i.e., vector instruction) extensions are important examples of this.
- Use fixed-size matrices. Dynamically-sized matrices do not perform a small-size optimization and allocate their storage on the heap; fixed-size matrices keep their data inline. This avoids the extra indirection and the overhead of the allocation itself (see the sketch after this list).
- Consider switching to matrices of single-precision float, as long as your results maintain sufficient precision. The payoff is that twice as many operations can fit in a single vector instruction (assuming 64-bit double and 32-bit float).
- Consider passing -ffast-math, as long as your results maintain sufficient accuracy. Floating-point math is not strictly associative, but -ffast-math allows the compiler to perform certain transformations (e.g., reorderings) that would otherwise be illegal.
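
To make the fixed-size point concrete, here is a minimal sketch (not your actual code; the iteration count, values, and include path are placeholders) comparing a dynamic-size 2x2 product against the fixed-size `Matrix2d` path, with the suggested compile flags shown in a comment:

```cpp
// Hypothetical build line; adjust the include path to your Eigen checkout.
// Optionally add -ffast-math if the accuracy loss is acceptable:
//   g++ -O3 -march=native example.cpp -I /path/to/eigen
#include <Eigen/Dense>
#include <iostream>

int main() {
    // Dynamic-size: the 2x2 dimensions are runtime values, so the storage is
    // heap-allocated and the product goes through Eigen's general code path.
    Eigen::MatrixXd a = Eigen::MatrixXd::Random(2, 2);
    Eigen::MatrixXd b = Eigen::MatrixXd::Random(2, 2);
    Eigen::MatrixXd dyn_acc = Eigen::MatrixXd::Zero(2, 2);

    // Fixed-size: Matrix2d keeps its data inline (no heap allocation) and the
    // 2x2 product can be unrolled and vectorized at compile time.
    // Matrix2f is the single-precision counterpart if float is precise enough.
    Eigen::Matrix2d c = Eigen::Matrix2d::Random();
    Eigen::Matrix2d d = Eigen::Matrix2d::Random();
    Eigen::Matrix2d fix_acc = Eigen::Matrix2d::Zero();

    const int iterations = 1000000;  // placeholder loop count
    for (int i = 0; i < iterations; ++i) {
        dyn_acc += a * b;  // dynamic-size multiply
        fix_acc += c * d;  // fixed-size multiply
    }

    // Print the accumulators so the work isn't discarded entirely.
    std::cout << dyn_acc << "\n\n" << fix_acc << "\n";
    return 0;
}
```

If your hand-written 2x2 routine was beating Eigen, a dynamic-size matrix type or a build without optimizations enabled is the most likely explanation.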