Huge matrix that adds and deletes rows dynamically

I'm trying to simulate the mean behaviour of an ensemble of neurons. This means I need to do calculations with a matrix of around ten billion elements (steps ≈ 10^6, neurons ≈ 10^4).

To avoid eating my whole RAM (and dying trying) I decided to delete rows from the matrix as soon as I'm done doing calculations with them. This is the code I've come up with (based on this question: http://www.cplusplus.com/forum/general/833/), but I'm still getting high RAM usage and a segmentation fault if the matrix is too big (it works well if steps = 1000, for instance):

int neurons = pow(10, 4);
int steps = pow(10, 6);

// Membrane potential matrix using std::vector
vector< vector<double> > v;

// First time step (initial state)
vector<double> row; // Create an empty row
for (int n = 0; n < neurons; n++) { // Initialise first row
	row.push_back(1 * n); // Add a column to the row
}
v.push_back(row); // Add the row to the main matrix
for (int n = 0; n < neurons; n++) {
	v[0][n] = v0;
}

double v_avg[steps + 1] = {v0}; // Mean membrane potential

// Loop
for (int i = 1; i < steps + 1; i++) {
	vector<double> row; // Create an empty row
	for (int n = 0; n < neurons; n++) { // Initialise row i
		row.push_back(i * n); // Add a column to the row
	}
	v.push_back(row); // Add the row to the main matrix
	for (int n = 0; n < neurons; n++) {
		// Calculations
		if(v[i-1][n] >= vp) {
			v[i][n] = -vp;
		}
		else {
			v[i][n] = v[i-1][n] + h * ( pow(v[i-1][n], 2) + I[i] + eta[n] );
		}
		v_avg[i] += v[i][n]; // Mean membrane potential
	}
	cout << "step " << i << "/" << steps << " done\n";
	v[i-1].clear(); // Clear row v[i-1]
	v_avg[i] = v_avg[i]/neurons;
}
v[steps].clear(); // Clear last row 
Why do you need to store data at all (time-)steps?

At any time you are only working with time levels i and i-1, so you just need vectors of about 10000 elements to store each of these.

The one array you have asked a million elements for is v_avg, and you could write these values to disk instead if you really needed to store them at every time level (which I doubt).

Incidentally, as it stands, the line
double v_avg[steps + 1] = ...
is strictly illegal anyway, since steps isn't known at compile time, though you could make steps a const int to avoid this.

The other large-index variable is I[i]. You don't appear to set this anywhere after declaring your number of steps (int steps = pow(10, 6);). If it stands for the injected current, shouldn't it be a function call, not a preset array element? It's not clear whether eta[n] is set anywhere either. If you are calling on variables here that you haven't set then you could easily create a segmentation fault by going outside allocated memory.

Use X * X, not pow(X, 2), for this and other small-integer exponents: it's far more efficient, and also more correct if the answer is supposed to be an int (pow() returns a double).

Finally, your integrate-and-fire model (the reset at v[i][n] = -vp and the quadratic update) is designed to produce "spiky" neurons. You are going to need a tiny timestep h to keep this accurate and/or stable.
The reason why I need the whole matrix is because I need to implement a refractory time for the neurons that emit a spike ( if(v[i-1][n] >= vp){ ... } ), so that they stay at -vp for a certain amount of time (say 10 time steps) before being able to behave "normally".

I would need to create a new matrix that saves the times when a spike is emitted (store a 1 when it happens, pretty much). I haven't programmed that yet because I first want to deal with the memory.

The reason I decided to create rows on demand and delete "old" ones is because of the spikes. I actually implemented the loop using only a 2×neurons matrix (v[0] for the "present" step, and v[1] for the "future" step, reassigning v[0] = v[1] when the calculations were done) and it worked beautifully. But needing to track time on each neuron independently of the others when they emit a spike complicates the whole thing.

Technically, any method that would allow me to implement a refractory time, and to save both the v_avg (which I already do) and the spikes matrix to a file, would help me.
To avoid eating my whole RAM (and dying trying) I decided to delete rows from the matrix as soon as I'm done doing calculations with it.

So then do you really need a multidimensional vector?

Why delete the rows?

Why not just reuse the vector and assign new values to the existing vector?

By the way, do you know you can size a vector to the proper size when you construct it? Then, instead of all the expensive push_back() operations, you can just assign values to each element:
vector<double> row(neurons); // Create a row with "neurons" elements (value-initialised to 0.0).
for (int n = 0; n < neurons; n++) { // Initialize the row.
	row[n] = n; 
}


// Also remember that arrays and vectors are zero-based: they start at element zero (v[0]) and end at size - 1.
for (int i = 1; i < steps + 1; i++) {
You're probably accessing your array out of bounds in this loop.

Be careful with this line: double v_avg[steps + 1] = {v0}; // Mean membrane potential
In C++ array sizes must be compile-time constants (steps is not a compile-time constant). I'd suggest that you consider using a std::vector instead of the array.



The reason why I need the whole matrix is because I need to implement a refractory time to the neurons that emit a spike ( if(v[i-1][n] >= vp){ ... } ), so that they stay in -vp for a certain amount of time (say 10 time steps) before being able to behave "normally".

I would need to create a new matrix that saves the times when a spike is emitted


That's not true; you could store the time of last firing for each neuron as a vector with neurons elements.

Nowhere in your code or the underlying mathematical model do I see the need for a multi-dimensional array.

@jlb has indicated several points where you are likely to go outside array bounds. I still can't see what sort of entities I[] and eta[] are, and where they have been set.
I guess you are right, I could reuse the v vector (my original plan) and check if the neuron is on the refractory state in some other way.

Also remember that arrays and vectors use zero based arrays
Yeah, thanks for the warning, but the dimensions are correct (actually steps is defined as int steps = int((t_final - t_init)/h)).

I'd suggest that you consider using a std::vector instead of the array.
That's a good idea, thanks.

I'll play with the code and see if I can find an optimal way to do the simulation in terms of memory (and computation time, which shouldn't really be a problem).

yeah, thanks for the warning, but the dimensions are correct (actually steps is defined as int steps = int((t_final - t_init)/h)).

What? In the code you supplied steps is defined as: int steps = pow(10, 6);.


OP: could you post your entire program as it stands?
I still can't see what sort of entities I[] and eta[] are, and where they have been set.

Yeah, sorry, they are just vectors (the intensity I[] depends on the timestep, and the eta[] is a different constant for each neuron).

That's not true; you could store the time of last firing for each neuron as a vector with neurons elements.

That's a brilliant idea! I guess I've been thinking about this code too much from a linear algebra point of view (mind you, I'm a physicist, not a computer scientist; also I've only been poking around with C++ for a couple of weeks, so this is not the kind of code I generally write).

EDIT: The entire program (without further modifications from the answers given here).
#include <iostream>
#include <fstream>
#include <vector>
#include <algorithm> // for std::fill_n
#include <cmath>
#include <cstdlib>
#include <ctime>
using namespace std;

int main () {
	// Time
	double t_final = 100.0, t_init = 0.0;
	double h = 0.0001;
	int steps = int((t_final - t_init)/h);

	// Parameters
	int neurons = pow(10, 4);
	// int neurons = 5;
	double v0 = -2.0, vp = 100.0;
	double I0 = 3.0;

	// Initialize eta vector
	double eta[neurons];

	srand((unsigned)time(0));
	for ( int n = 0; n < neurons; n++ ) {
		eta[n] = tan(M_PI*(((double) rand() / (RAND_MAX))- 1/2 )) - 5;
	}
	// double I[steps + 1] = {3.0};
	double I[steps+1];
	std::fill_n(I, steps+1, I0);

	// Membrane potential matrix using std::vector
	vector< vector<double> > v;

	// First time step (initial state)
	vector<double> row; // Create an empty row
	for (int n = 0; n < neurons; n++) { // Initialise first row
		row.push_back(1 * n); // Add a column to the row
	}
	v.push_back(row); // Add the row to the main vector

	for (int n = 0; n < neurons; n++) { // Assign values to the first row
		v[0][n] = v0;
	}

	double v_avg[steps + 1] = {v0};

	// Loop
	for (int i = 1; i < steps + 1; i++) {
		vector<double> row; // Create an empty row
		for (int n = 0; n < neurons; n++) { // Initialise row i
			row.push_back(i * n); // Add a column to the row
		}
		v.push_back(row); // Add the row to the main vector
		for (int n = 0; n < neurons; n++) {
			// Calculations
			if(v[i-1][n] >= vp) {
				v[i][n] = -vp;
			}
			else {
				v[i][n] = v[i-1][n] + h * ( pow(v[i-1][n], 2) + I[i] + eta[n] );
			}
			v_avg[i] += v[i][n]; // Mean membrane potential
		}
		cout << "step " << i << "/" << steps << " done\n";
		v[i-1].clear(); // Clear row v[i-1]
		v_avg[i] = v_avg[i]/neurons;
	}
	v[steps].clear(); // Clear last row

	// Save values to text file
	ofstream myfile ("v_avg.dat");
	if (myfile.is_open()){
		for ( int i = 0; i < steps+1; i++ ) {
			myfile << v_avg[i] << ", " ;
		}
		myfile.close();
	}
	else cout << "Unable to open file";

	return 0;
}

@aldomann

Thank you for bringing an interesting problem to the forum and for posting your code. Just a few comments on the latest version, from inspection. I appreciate that you haven't yet made the changes we have flooded you with.

int neurons = pow(10, 4);
pow() returns a double, with possible floating-point inaccuracy. Better just to write
int neurons = 10000;

Lines like
double eta[neurons];
double I[steps+1];

would be rejected by a standards-compliant compiler - the array size needs to be known at compile-time. You use vector elsewhere - why not use it here?

In
eta[n] = tan(M_PI*(((double) rand() / (RAND_MAX))- 1/2 )) - 5;
I'm afraid that 1/2 will do integer division ... and always result in 0. Better just to write 0.5 explicitly.

std::fill_n(I, steps+1, I0);
If I is a constant ... then save memory and code it just as the constant I0. Even if you are leaving open the possibility of subsequently having a time-dependent current then you could do it by a function call, rather than a pre-filled array. It only need be evaluated once per i-loop.

You still don't need a 2-dimensional vector v - just 2 time levels will suffice.

With v_avg[] I meant keeping a file open and writing the value (just the scalar v_avg at the end of each timestep), rather than storing everything to the end and then writing.

For the refractory period, perhaps set a vector<int> nextfire(neurons) which you set when the potential drops back to -vp.


However, my biggest questions are these.

(1) Why do you need arrays at all, multi-dimensional or otherwise? At the moment your nth neuron doesn't interact with any others and you could just do 10000 simulations sequentially. Is the interaction between neurons something that you are going to add later?

(2) Just a comment, but if I[] is constant and you have no interactions between neurons (as at present), the equation being solved in your update step is
dv/dt = v^2 + constant
This can be solved exactly, without the need for numerical solution. (This will change if I[] ceases to be constant, or if you allow interaction between neurons - maybe this is something you are going to add.)
Why do you need arrays at all, multi-dimensional or otherwise?

Honestly, no idea. When working on systems like this analytically, I would just use linear algebra for convenience. Obviously, handling this problem in a program introduces a new paradigm and I should've thought about this more carefully.

Even if you are leaving open the possibility of subsequently having a time-dependent current then you could do it by a function call

That is indeed the plan. It's how I actually do things in R (with which I'm much more familiar) when simulating the macroscopic system using firing-rate equations.


Thanks, guys, for all the tips, for not being condescending, and for actually helping (unlike on StackOverflow). Sorry for the Frankenstein code; your answers have been truly enlightening.