Sparse Matrix transpose

hello everyone.
I have a matrix in which most of the elements are zero. For efficiency purpose, I stored this matrix in csr format(Compressed row storage). Now I need to transpose this matrix. The transpose of csr is a matrix stored in csc(compressed column storage)format. I can either store the matrix in both format and then whenever I need the transpose I use the csc format or recover the original matrix using row_ptr,col_ind and non_zero value then obtain the transpose of matrix but none of them will be efficient in my work. I'm looking for an algorithm to give me the transpose just using three mentioned vectors.

example of csr storage
matrix A


csr storage format:

non_zero value{1,2,1,2,3,1,4,5,1,4};

Last edited on
No need to change the format. Just flip the row and columns on demand.
for(all the row ptr)
for(all the col ind)
do something with value, but instead of being at row,col its at col, row.

if you really need it you can generate the transpose by swapping the row and column pointers, right?
Last edited on
@jonnin thanks for your answer. the only thing that matters here is that the order of value in non-zero vector will change. but i don't know how make a connection between current row_tr and col_ind and new place of each non-zero element. I'm thinking to find the pair of row_ptr and col_ind for each non-zero element and then the new position instead of (row,col) will be (col,row), like what you said. but still I'm taking memory for this. I want to find one algorithm to take row_ptr and col_ind and give me new non_zero vector.
Last edited on
I don't think you can do it without SOME extra memory.
here is someone else's code:

I suspect that if you could chop value by its row start indices (so you can quickly access the start of each row) you could cook it up but that takes at least # of rows extra space.
doing it that way, you just go to the start of each row that exists (rows that don't exist are all zero, and so are the transposed columns!) and get the first value (tied to col 0) then the second value (tied to col 1) and so on to reassemble it.

I don't see any way to do it in place with this storage scheme.
an alternative scheme could do it in place... you need one where the row and col for the value are all 3 stored. The current scheme already has 2 of the 3 and almost the third, so a little more memory to store the rows explicitly makes this much easier to deal with.
you can do THAT with 2 values if you 'pack' the row and col into one integer, eg 32 bit int is 16 bits of row, 16 bits of col, or similar for 64 bit if your stuff is gigantic (32 and 32 in a 64).
Last edited on
I think you are right. It's not possible to do that without taking extra memory.
The only extra memory needed is that required to store the data in COO format, which you must use to transpose (or even convert between CSR and CSC). Transposition is otherwise an in-place transformation.

    • three arrays (R,C,V) where corresponding elements ⟶ (row, column, value) tuples
    • require the elements to be sorted in (R,C) order, as they are in your example
      (the strict sort requirement is for easy transposition conversion to|from CSR,CSC format)

To transpose a COO, simply perform a stable sort on the tuples with the column as the sort criterion, then exchange the row and column arrays.

    The array swap should be a pointer swap.
    If you are using std::vector, use the .swap() method.

Is there any technical reason you are using CSR over COO? A vector multiplication algorithm that operates over CSR, perhaps? Easy slicing?

[edit] fixed typo
Last edited on
Thanks for the answer. Yes there is a reason, I'm using this to operate a matrix-vector multiplication.
Okay, so, storing as CSR is only really useful for those particular operations — you should otherwise have space to store as COO normally. Convert to COO, perform your transpose, and convert back to CSR. This should not require additional memory — only that memory nominally required to have a COO to begin with. Hope this helps.
BTW, a CSR may actually use more space than a COO. The compression aspect of CSR (and CSC) only starts when there are more elements in the sparse matrix than rows (or columns). Again, the usefulness of each depends entirely on the algorithm you intend to use on the matrix...

How are you doing with this? Want some code?
Thanks for your answer.
What do you mean by this?
Again, the usefulness of each depends entirely on the algorithm you intend to use on the matrix

Actually I am using this storage format to assemble different matrices in the finite element. There will be more non-zero element than the number of rows or columns. that's why I am using csr. but I never thought which one will be more efficient, csr or csc. When would you prefer one over the abother?
Last edited on
Actually I am using this storage format to assemble different matrices in the finite element.

OK, so, that is something I have not done and cannot advise on intelligently.

What do you mean by this?
Again, the usefulness of each depends entirely on the algorithm you intend to use on the matrix

Choice of data structure depends on the algorithm being applied. Is there a specific structure most useful for the FEM algorithms you will apply? (What does the literature you have studied use?)

I am not convinced that the savings of CSR over COO is all that great, and if I read correctly, COO is just as available to FEM algorithms as CSR and other row-oriented storage structures. Just make sure to maintain invariant the (row, column) ordering of elements in the COO (as you would in CSR or any other matrix).

but I never thought which one will be more efficient, csr or csc. When would you prefer one over the abother?

Define “efficient”.

The best choice will be the one where required algorithms are easiest to understand and/or implement. For FEM stuff it looks like CSR would match your criterion.

Last edited on
Thanks for your answer. As you said in finite element all COO,CSR and CSC are available. I chose CSR cuz it was more convenient to apply. As far as I know some (row, column) pairs may appear more than once in COO. The solver will subsequently add them together but they take more memory than needed and also for some solvers COO is not a convenient format, so they will spend time converting it to a more suitable representation. I don't see that much difference between CSR and CSC.
Last edited on
Well, you cannot invert a CSR in-place. You must either:

  • have an intermediate conversion to COO
  • or produce a new instance of the CSR.

And again, I do not believe that the space savings are so great to be worth the effort of being particularly stuck on only using one format. It is not inefficient to use a little more memory than needed. The question should be only whether the memory used is excessive given normal memory constraints, or whether memory constraints are so strict that they require you to do some Tricky Programming.

That is, unless your memory constraints are so tight that you have to worry about them, then you have plenty of memory to store and convert between COO, CSR, and CSC, as needed by your algorithms.

But again I really do not know anything about your requirements other than a vague desire to reduce memory usage. I think you are focusing on the wrong part by playing with COO vs CSR|CSC.

The only differences between CSR and CSC are:

  • computationally: whether the algorithm works over rows or over columns, and
  • in memory: whether the number of rows or columns in the data is below an identifiably-useful threshold.

You really have not given me the impression that you have data supporting any significant need for small memory optimizations. The overhead of having the solver do a little extra work does not in itself justify storing your data in a slightly larger smaller memory footprint.

In the matrix you gave in the first post, for example, 40% of it is used, which is a lot, and the non-zero elements are spread such that there is only a 4-element advantage to using CSR|CSC over COO.

If you store your indices as (unsigned char) then that is only a 4-byte difference.
If you store your indices as (long or int) then that is only a 16-byte difference.
If you store your indices as (std::size_t) then that is only a 32-byte difference.

The savings only improve if you have fewer occupied rows (for CSR) or columns (for CSC), but again you will notice that the savings are really unremarkable. Even at the worst possible combination of types for your original example array it uses a maximum of 240 bytes (not counting bookkeeping for the vectors or arrays and matrix bounds, which adds maybe another 112 bytes of memory for 3×(ptr, capacity, size) + matrix bounds, none of this counting stuff on the stack) — all this assuming a 64-bit architecture using 16-byte pointers and 8-byte size_t.

The point is, you are piddling with grains of sand over CSR vs CSC vs COO.
Make your program convert your data to whatever structure is most useful as needed.

Hope this helps.

[edit] Fixed typo
Last edited on
I am glad you said that. I tried for a while to do it in place, and its maddening, seems like you almost can but never quite have all the info in a form you can use to do it.
OK, so, I need to step back a little.

You could do it in-place, but only if some specific (non-general) conditions exist: primarily that there exist equal number of non-zero columns as non-zero rows AND that there exist 1:1 partitioning between the initial column and final row subgraphs. I haven’t actually given it very much thought (at all), so I am very likely missing some other mathematical properties, but this already takes us to a point where the algorithm is far too complicated, requiring additional storage to simply build and process a transformation tree, either as an explicit structure or as data on the recursion stack.

In other words, doing it is programming nightmare fuel for an edge case.

You’re welcome :O)

I should add that my in-place algorithm above with a COO works, but it does require some care with sorting: in C++ it would be easiest to make a specialized iterator for the sparse matrix tuples that handles swap properly.

The simplest algorithm is to just build a new sparse matrix, taking the source as const argument.

And now... I’ve spent waaaaay too much time on this, lol.
Last edited on
Thanks for you good explanation. As jonnin said it seems you almost can but never have all the information in a form that you can use. At the end I used below algorithm to convert my csr to csc format.

Void csr_csc(m, n, nnz, csrRowPtr, csrColIdx, csrVal,
cscColPtr, cscRowIdx, cscVal)
// construct an array of size n to record current
available position in each column
∗curr = new int[n]();
for i ←0; i < m; i++ do
for j ←csrRowPtr[i]; j <csrRowPtr[i+1]; j++ do
cscColPtr[csrColIdx[j] + 1] + +;
for i ←1; i < n + 1; i++ do
cscColPtr[i]+=cscColPtr[i − 1];
for i ←0; i < m; i++ do
for j ←csrRowPtr[i]; j <csrRowPtr[i+1]; j++ do
loc = cscColPtr[csrColIdx[j]] + curr[csrColIdx[j]] + +;
cscRowIdx[loc] = i;
cscVal[loc] = csrV al[j];
delete[] curr;
Last edited on
...and now the transpose is ridiculously easy. :O)
Topic archived. No new replies allowed.