Linear Regression Calculation issue

I'm not coming up with the right answer and can't find my error. Any feedback would be greatly appreciated. The answer should be y = 34.53x - 29.8.

Reading in the following from a text file:
0 0
1 6.88
1.5 15.48
2 27.52
2.5 43
3 61.92
3.27 73.56
3.496 84.08
3.5 84.28
3.524 85.44
3.732 95.82
4 110.1
4.5 139.3
5 172

#include <iostream>
#include <fstream>
#include <cmath>
#include <cstdlib>

using namespace std;

//global variables
ifstream Infile;
int i;
int n;
float x[50];
float y[50];
float sumX;
float sumY;
float sumXY;
float sumX2;
float sumY2;
float slope;
float base;
float b, A, B, C, D;

//function prototypes
void getData();
float sumOfX();
float sumOfY();
float xsquared();
float ysquared();
void calcSlope();
void printResults();





int main()

{
    //call functions in an orderly fashion
    
    getData();//grab data from text file, separate into x and y, line 76
    sumOfX();//calculate sum of x values, line 101
    sumOfY();//calculate sum of y values, line 121
    xsquared();//calculate sum of each x value squared, line 139
    ysquared();//calculate sum of each y value squared, line 155
    calcSlope();//calculate slope and base, line 170
    printResults();// output results to screen, line 203
  
   
return 0;
}








//Function that pulls data in from a text file
void getData()
{
     cout<<"We will review the data that was imported from the supporting text file."<<endl;
     
     Infile.open("RegressionData.txt");
    while(!Infile.eof())
                        {
                         Infile>>x[i]; Infile>>y[i];
                         
                         system("pause");
                         cout<<x[i]<<"  "<<y[i]<<endl<<endl;
                         i++;
                         }
    Infile.close();
    n=i;
}








//Function to calculate the sum of x values
float sumOfX()
{
  int i;  
  float sumX=0;
  
   for(i=0; i<n; i++)
       sumX = sumX + x[i];
 
	return sumX;
}








//Function to calculate the sum of y values
float sumOfY()
{
  int i;  
  float sumY=0;
  
  for(i=0; i<n; i++)
       sumY = sumY + y[i];
 
 	return sumY;
}







//Function to calculate all x variables squared and added together
float xsquared()
{
      for(int i=0;i<=(n-1);i++)
             {
             sumX2 = sumX2 + pow((x[i]),2);
             }
      return sumX2;
}







//Function to calculate all Y variables squared and add them together
float ysquared()
{
      for(int i=0;i<=(n-1);i++)
             {
             sumY2 = sumY2 + pow((y[i]),2);
             }
      return sumY2;
}







void calcSlope()
{
    
    int i; //create i integer locally
	//float slope=0;//initialize slope
	
	//Assign data to local variables for use in calculation
	A=sumX;
	B=sumY;
	C=sumX2;
	D=sumXY;


	for(i=0; i<=n; i++)
	         {
	         A +=x[i];
             B +=y[i];          
             C +=x[i]*x[i];
             D +=x[i]*y[i];
	         }
     
      //Calculate Slope
      slope = (n*D-A*B) / (n*C-A*A);// M = (N*xysum) - (xsum*ysum)/(N*x2sum) - (xsum*xsum)
      
      //Calculate Base
	  base = (C*B-A*D) / (n*C);// B = (x2sum*ysum) - (xsum*xysum) / (N*x2sum) - (xsum*xsum)
	
	
}




void printResults()
{
     cout<<"\nThe sample size is "<<n<<endl;// display total pairs of coordinates
     cout<<"\nThe sum of the X values is "<<sumOfX()<<endl;// display sum of x coordinates
     cout<<"\nThe sum of the Y values is "<<sumOfY()<<endl;// display sum of y coordinates
     sumXY = sumOfX() * sumOfY();
     cout<<"\nThe product of X sum multiplied by Y sum is "<<sumXY<<endl;// display product of x and y sums
     cout<<"\nEach X value squared and added together is "<<xsquared()<<endl;// display x values squared and added
     cout<<"\nEach Y value squared and added together is "<<ysquared()<<endl;// display y values squared and added
     cout<<"\nThe slope is "<<slope<<endl;// display slope
     cout<<"\nThe intercept is "<<base<<endl;// display intercept
     cout<<"\nThe linear regression equation is "<<"Y="<<slope<<"X (+) "; cout<<base<<endl;
     cout<<"\n\n\nHere we are, trapped in the amber of the moment. There is no why."<<endl;//provide thought provoking quote
     cout<<" - Vonnegut\n\n"<<endl;
     
   system("pause");
}   
closed account (SECMoG1T)
Consider initializing all these variables to 0. They contain garbage, so using them in expressions such as sumX = sumX + x[i]; will just give you the wrong result, i.e. garbage + x[i].

 int i;
 int n; 
 float sumX;
 float sumY;
 float sumXY;
 float sumX2; 
 float sumY2;
 float slope;
 float base;
Figured it out. Never finished the formula in line 178.

 
base = (C*B-A*D) / (n*C);// B = (x2sum*ysum) - (xsum*xysum) / (N*x2sum) - (xsum*xsum) 


Duh.
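
For reference, the completed intercept formula uses the same denominator as the slope, (n*C - A*A). Below is a minimal sketch of both formulas in their "sum" form. The names LineFit and fitLine are made up for this illustration, and double is used instead of float, so this is not the code from the post above, just one way it could look.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type: slope and intercept of the fitted line.
struct LineFit {
    double slope;
    double intercept;
};

// Least-squares fit using the "sum" form of the formulas:
//   slope     = (n*Sxy - Sx*Sy)   / (n*Sxx - Sx*Sx)
//   intercept = (Sxx*Sy - Sx*Sxy) / (n*Sxx - Sx*Sx)
LineFit fitLine(const double x[], const double y[], std::size_t n)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sx  += x[i];
        sy  += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    const double denom = n * sxx - sx * sx;
    assert(denom != 0.0);  // needs at least two distinct x values

    LineFit fit;
    fit.slope     = (n * sxy - sx * sy) / denom;
    fit.intercept = (sxx * sy - sx * sxy) / denom;
    return fit;
}
```

With the 14 data pairs from the question, this should give roughly y = 34.53x - 29.8, which matches the expected answer.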

I'll be back with more questions. Been lurking in this forum and now that C++ class is getting harder I'll be visiting more often.

Cheers!

Tom
Your code may work, but it's probably by accident. What about what andy1992 said? Your code relies on most of these values being zero when the program starts. As it turns out, andy1992 is wrong here: global and static variables are initialized to zero, but local variables are not.
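
A small sketch of that rule, using made-up names (globalTotal, sumValues): the global is zero-initialized before main runs, while the local accumulator gets an explicit initializer, because reading an uninitialized local is undefined behavior.

```cpp
#include <cassert>
#include <cstddef>

// Objects with static storage duration (globals and statics) are
// zero-initialized before the program starts.
float globalTotal;

float sumValues(const float values[], std::size_t count)
{
    assert(values != nullptr || count == 0);

    // A plain "float total;" here would hold an indeterminate value,
    // so the local is initialized explicitly.
    float total = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}
```

Relying on the global default works, but initializing explicitly keeps the code correct if the variable is later moved into a function.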

You can remove the reliance on zero and also shorten your code by using function parameters and return values. For example, instead of sumOfX() and sumOfY(), create one function that adds up the numbers in an array:
float sumOfArray(float array[], int size)
{
    float total = 0.0;
    for (int i=0; i<size; ++i) {
        total = total + array[i];
    }
    return total;
}


Then inside main you can add up x and y like this:
sumX = sumOfArray(x, n);
sumY = sumOfArray(y, n);

In a similar way you can replace xsquared() and ysquared().
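
As a rough sketch of that idea, one helper (called sumOfProducts here, a name made up for this example) could cover both sums of squares, and the sum of x*y as well, by multiplying two arrays element by element:

```cpp
#include <cassert>

// Adds up a[i] * b[i] for i in [0, size).
// Passing the same array twice gives the sum of squares.
float sumOfProducts(const float a[], const float b[], int size)
{
    assert(size >= 0);
    float total = 0.0f;
    for (int i = 0; i < size; ++i) {
        total += a[i] * b[i];
    }
    return total;
}
```

Assuming the x, y and n from the original program, the calls would look like this:
sumX2 = sumOfProducts(x, x, n);
sumY2 = sumOfProducts(y, y, n);
sumXY = sumOfProducts(x, y, n);
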
I haven't looked too hard through your code, but I'm not sure you're using the right formula.

m = SUM((x[i] - x_average) * (y[i] - y_average)) / SUM((x[i] - x_average)**2)
b = y_average - m*x_average

There's really no easy way to separate those terms out.


Since this is for CS, I recommend you also try to do it without global variables. Your main function might look something like:

#include <iostream>
using namespace std;

int read_from_file( double x[], double y[], int n_max );
double average( double x[], int n );
double sum_product( double x[], double x_average, double y[], double y_average, int n );

int main()
{
  double x[50];
  double y[50];

  int n = read_from_file( x, y, 50 );
  cout << "There are " << n << " samples.\n";

  double x_average = average( x, n );
  double y_average = average( y, n );

  double m_numerator   = sum_product( x, x_average, y, y_average, n );
  double m_denominator = sum_product( x, x_average, x, x_average, n );

  double m = m_numerator / m_denominator;
  double b = y_average - m * x_average;

  cout << "y = " << m << "x + " << b << "\n";
}

The sum_product() function takes a little thinking. Remember, the formula is:

    SUM ((x[i] - x_average) * (y[i] - y_average))
    ---------------------------------------------
    SUM ((x[i] - x_average) * (x[i] - x_average))

Notice that the only difference between the two is what the second factor is -- whether it comes from x[] or y[]. The routine computes the same thing, using either x[] and y[] or x[] and x[].
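
One possible way to fill in the three prototypes is sketched below. It keeps the signatures as given and takes the file name RegressionData.txt from the original post. The loop condition and the assert are choices made for this sketch, not something the thread prescribes.

```cpp
#include <cassert>
#include <fstream>

// Reads "x y" pairs until the file ends or n_max pairs are stored.
// Returns the number of pairs actually read.
int read_from_file( double x[], double y[], int n_max )
{
  std::ifstream infile( "RegressionData.txt" );
  int n = 0;
  // Testing the extraction itself avoids counting a failed read.
  while (n < n_max && infile >> x[n] >> y[n])
    ++n;
  return n;
}

// Arithmetic mean of the first n values.
double average( double x[], int n )
{
  assert( n > 0 );
  double total = 0.0;
  for (int i = 0; i < n; ++i)
    total += x[i];
  return total / n;
}

// SUM ((x[i] - x_average) * (y[i] - y_average))
double sum_product( double x[], double x_average, double y[], double y_average, int n )
{
  double total = 0.0;
  for (int i = 0; i < n; ++i)
    total += (x[i] - x_average) * (y[i] - y_average);
  return total;
}
```

Calling sum_product() with x[] in both positions gives the denominator, and calling it with x[] and y[] gives the numerator.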

That's a lot of help. Good luck!