Regression Function

Hey, since I got such useful feedback the last time I posted part of this program, I'm doing it again. I'm worried about performance: this formula does a lot of vector creation, multiplication, and accumulation, and all of that has a real chance of causing slowdown once the program has been running for a while and the vectors start to get chunky (and the robots that own them get more numerous).

I'd also love it if someone double-checked the regression math I've done! College was a while ago, and while we got large doses of theory, most of the practical work was plugging data into a computer program rather than the back end of telling the computer how to do it.

Notes:
vector<int> history_gather holds gather history from past periods

int Food::get_history_slope(){
	//http://en.wikipedia.org/wiki/Simple_linear_regression#Numerical_example
	//quick brush up on regression w/ example

	const int historysize = history_gather.size(); //reinforces that vector should not change size

	std::vector<int> time; 
	for (int i=0; i<historysize; i++)
		time.push_back(i);

	const int timesize = time.size(); 
	
	int stime = accumulate(time.begin(), time.end(), 0); 
	
	int sgather = accumulate(history_gather.begin(), history_gather.end(), 0); 
	
	//timextime block, would prefer to do this in vector<int> time rather than toss into new vector
	//would have to move below stxg init if that happened
	std::vector<int> timextime; 
	for (int i=0; i<timesize; i++)
		timextime.push_back(time.at(i)*time.at(i));
	int stxt = accumulate(timextime.begin(), timextime.end(), 0);

	//timexgather block, this one needs the new vector, might still be a quicker way to do it
	std::vector<int> timexgather;
	for (int i=0; i<historysize; i++)
		timexgather.push_back(history_gather.at(i)*time.at(i));
	int stxg = accumulate(timexgather.begin(), timexgather.end(), 0); 
	
	//this can result in a double, not too concerned about losing tail
	int top = ((timesize * stxg)-(stime*sgather));
	int bottom = ((timesize * stxt)-(stime*stime));
	int slope = top/bottom;

	return (slope);

}
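
On the slowdown worry: the four sums can be accumulated in one loop without building the time, timextime, or timexgather vectors at all. Here is a minimal sketch, not a drop-in replacement. It assumes a free function (history_slope is a made-up name) that takes the history as a parameter instead of reading the Food member, returns a double so the fractional part of the slope survives the division, and treats fewer than two samples as a flat slope, which is my assumption rather than anything from the post.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical free-function version of Food::get_history_slope().
// Time values are the indices 0..n-1, as in the posted code.
double history_slope(const std::vector<int>& history_gather) {
    const std::size_t n = history_gather.size();
    if (n < 2)
        return 0.0;  // assumption: a slope is undefined here, so report "flat"

    // Accumulate all four sums in a single pass, with no temporary vectors.
    long long stime = 0, sgather = 0, stxt = 0, stxg = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const long long t = static_cast<long long>(i);
        const long long g = history_gather[i];
        stime += t;
        sgather += g;
        stxt += t * t;
        stxg += t * g;
    }

    const long long count = static_cast<long long>(n);
    const double top = static_cast<double>(count * stxg - stime * sgather);
    const double bottom = static_cast<double>(count * stxt - stime * stime);
    assert(bottom > 0.0);  // holds whenever n >= 2 and the times are distinct
    return top / bottom;   // floating-point division keeps the fraction
}
```

For example, a history of {0, 1, 1, 2} gives top = 12 and bottom = 20, so this returns 0.6, whereas the same division done on ints would give 0.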


Question:
Time and history vectors are almost certainly exactly the same size, since time is made from historysize, but things happen, eh? Should I use a check, or is it not worth the slight risk?

int size;
if (historysize == timesize)
size = historysize;
else
//piss pants 
Since your function builds the time vector from the history vector, a size mismatch could only come from a programming bug inside your get_history_slope() function. If you want to check at all, make it an assert( time.size() == history_gather.size() );

(assert should be used to detect outright programming bugs, unless you are writing code where lives depend on it.)
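
To make the assert suggestion concrete, here is a small sketch. The helper name make_time_axis is hypothetical and not from the original code; it only isolates the step where the time vector is built so the check has an obvious home.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds the 0..n-1 time axis for a history vector.
std::vector<int> make_time_axis(const std::vector<int>& history_gather) {
    std::vector<int> time;
    time.reserve(history_gather.size());  // one allocation instead of several
    for (std::size_t i = 0; i < history_gather.size(); ++i)
        time.push_back(static_cast<int>(i));

    // This can only fail if the loop above is wrong, i.e. a programming bug.
    assert(time.size() == history_gather.size());
    return time;
}
```

Keep in mind that assert is compiled out when NDEBUG is defined, so it costs nothing in a typical release build but also checks nothing there.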
Here's the new version, which also calculates standard deviation. Instead of returning a value, the function now fills an object (sds) within Food that holds both the standard deviation and the slope. I ended up not adding the assert, although I probably still should.

Again, any tips or tricks to improve the code would be welcomed.

void Food::get_history_reg(){
	//http://en.wikipedia.org/wiki/Simple_linear_regression#Numerical_example
	//quick brush up on regression w/ example

	const int historysize = history_gather.size(); //reinforces that vector should not change size

	std::vector<int> time; 
	for (int i=0; i<historysize; i++)
		time.push_back(i);

	const int timesize = time.size(); 
	
	int stime = accumulate(time.begin(), time.end(), 0); 
	
	int sgather = accumulate(history_gather.begin(), history_gather.end(), 0); 
	
	//timextime block, would prefer to do this in vector<int> time rather than toss into new vector
	//would have to move below stxg init if that happened
	std::vector<int> timextime; 
	for (int i=0; i<timesize; i++)
		timextime.push_back(time.at(i)*time.at(i));
	int stxt = accumulate(timextime.begin(), timextime.end(), 0);

	//timexgather block, this one needs the new vector, might still be a quicker way to do it
	std::vector<int> timexgather;
	for (int i=0; i<historysize; i++)
		timexgather.push_back(history_gather.at(i)*time.at(i));
	int stxg = accumulate(timexgather.begin(), timexgather.end(), 0); 
	
	//this can result in a double, not too concerned about losing tail
	int top = ((timesize * stxg)-(stime*sgather));
	int bottom = ((timesize * stxt)-(stime*stime));
	int slope = top/bottom;

	//standard deviation block
	std::vector<int> gather_devs;
	for (int i=0; i<historysize; i++) {
		int left = history_gather.at(i) - mean;
		gather_devs.push_back(left * left);
	}

	double dev_sum = accumulate(gather_devs.begin(), gather_devs.end(), 0);
	double avg_dev_sum = dev_sum/historysize;
	int std_dev = sqrt(avg_dev_sum);

	//sds is a struct that holds slope and std dev 
	sds.slope = slope;
	sds.std_dev = std_dev;
}
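
One thing to check in the standard deviation block: mean is not declared anywhere in the function as posted, so it is either a member of Food or missing. The sketch below assumes it is meant to be the average of history_gather and keeps it as a double so each deviation is not truncated before squaring. The name history_std_dev and the free-function form are hypothetical, it computes the population standard deviation (dividing by n, as the posted code does), and returning 0 for an empty history is my assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical free-function version of the standard deviation block.
double history_std_dev(const std::vector<int>& history_gather) {
    const std::size_t n = history_gather.size();
    if (n == 0)
        return 0.0;  // assumption: no data means no spread

    // Mean of the gather values, kept in floating point.
    double sum = 0.0;
    for (int g : history_gather)
        sum += g;
    const double mean = sum / n;

    // Sum of squared deviations from the mean, with no temporary vector.
    double dev_sum = 0.0;
    for (int g : history_gather) {
        const double d = g - mean;
        dev_sum += d * d;
    }

    return std::sqrt(dev_sum / n);  // population standard deviation
}
```

With the history {2, 4, 4, 4, 5, 5, 7, 9} the mean is 5, the squared deviations sum to 32, and the result is sqrt(32 / 8) = 2.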