Calculating Y = Kx + B getting wrong answers

I'm given a start year and an end year, followed by each year's "X" and "Y" data. I have to calculate K and B and output them, but my program's output is always slightly off, and I can't figure out why.

#include <iomanip>
#include <iostream>
using namespace std;
int main() {
double a, b;
cin >> a >> b;
double xtotal, ytotal =0.0;
int diff = (int)(b-a);
double xarr[diff];
double yarr[diff];
//cout<<"diff was "<<diff<<" "<<endl;
for (int i=0; i<diff; i++){
    string dummy; cin>>dummy;
    double x, y; cin>>x>>y;
    xarr[i] =x; yarr[i]=y;
    xtotal+=x; ytotal+=y;
    } 
    
    double avgx = xtotal/(b-a);
    double avgy = ytotal/(b-a);
    //cout<<"Xavg was "<<avgx<<" yavg was "<<avgy<<" ";
    
    // ez, use x-mean squared and diff x * diff y to calc avgY - k*avgX, where k = sum(varx)/sum(vary).
    double varx, vary=0.0;
for (int i=0; i<diff; i++){
    varx+= (xarr[i]-avgx)*(xarr[i]-avgx);
    vary+= (xarr[i]-avgx)*(yarr[i]-avgy);
    //cout<<"At xarr[i]= "<<xarr[i]<<" Varx became "<<varx<<" Vary became "<< vary<<" ";
    } 
    double var = vary/varx;
    cout<<setprecision(12)<<var<<" "<<setprecision(12)<<avgy-var*avgx;
    return 0;
}


input: 1920 2010
1920: 62 264
1921: 89 336
1922: 90 333
1923: 60 269
1924: 95 344
1925: 111 388
1926: 85 327
1927: 132 447
1928: 67 279
1929: 68 317
1930: 62 211
1931: 73 299
1932: 125 434
1933: 120 441
1934: 99 347
1935: 79 357
1936: 69 283
1937: 57 261
1938: 125 404
1939: 138 441
1940: 118 404
1941: 144 421
1942: 114 372
1943: 87 328
1944: 85 293
1945: 92 334
1946: 55 253
1947: 61 265
1948: 145 451
1949: 149 414
1950: 139 449
1951: 127 421
1952: 50 243
1953: 52 240
1954: 142 403
1955: 62 267
1956: 75 294
1957: 115 394
1958: 65 283
1959: 142 449
1960: 62 268
1961: 142 445
1962: 102 356
1963: 118 393
1964: 60 268
1965: 78 307
1966: 100 348
1967: 146 453
1968: 123 401
1969: 105 365
1970: 80 312
1971: 116 389
1972: 107 350
1973: 116 383
1974: 80 303
1975: 135 409
1976: 122 403
1977: 147 457
1978: 120 451
1979: 137 440
1980: 96 314
1981: 84 312
1982: 80 383
1983: 82 298
1984: 95 409
1985: 108 337
1986: 135 437
1987: 99 359
1988: 67 284
1989: 52 248
1990: 149 451
1991: 65 244
1992: 98 347
1993: 95 334
1994: 127 357
1995: 103 369
1996: 52 245
1997: 125 444
1998: 116 387
1999: 98 347
2000: 127 412
2001: 132 420
2002: 78 322
2003: 125 405
2004: 136 390
2005: 143 451
2006: 129 422
2007: 66 282
2008: 52 279
2009: 57 264
2010: 67 283

output: 2.12395042531 141.282641951

expected: 2.12418318843 141.253016439

any insight?
    varx+= (xarr[i]-avgx)*(xarr[i]-avgx);
    vary+= (xarr[i]-avgx)*(yarr[i]-avgy);

> vary+= (xarr[i]-avgx)
Looks like an artifact of copy-pasting the same line? Or maybe that's intentional.

Edit: In case that isn't the only issue, more importantly, you should turn on compiler warnings.
    double varx, vary=0.0;
for (int i=0; i<diff; i++){
    varx+= (xarr[i]-avgx)*(xarr[i]-avgx);

What is the initial value of varx?
Turning on compiler warnings will give you a warning that varx is being used uninitialized.
int diff = (int)(b-a)

for (int i=0; i<diff; i++)
Almost certainly not. If a=1920 and b=2010 you should have 91 pieces of data, NOT 90. In code: b-a+1



The above is what is giving you your cited error.
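To make the off-by-one concrete, here is a minimal sketch of counting records when both endpoint years are included. The helper name `inclusiveCount` is made up for illustration and is not from the original code; it assumes the first year is not later than the last.

```cpp
#include <cassert>

// Number of yearly records when both endpoint years are included.
// (inclusiveCount is a hypothetical helper name, not from the original post.)
int inclusiveCount(int firstYear, int lastYear)
{
    assert(firstYear <= lastYear);    // assumption: the range is given in order
    return lastYear - firstYear + 1;  // +1 because both ends are counted
}
```

For the posted input, `inclusiveCount(1920, 2010)` gives 91, and that same value would serve as both the loop bound and the divisor for the averages.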
However, ... in addition:

int diff = (int)(b-a);
double xarr[diff];
double yarr[diff];

This is illegal in standard C++. For a "standard" array, the size must be known at compile time. Use vectors, or, at least, arrays with some surplus size.
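As a rough sketch of the vector approach, the records could be read into `std::vector<double>` containers whose size is decided at run time. The names `Samples`, `readSamples` and `readSamplesFromText` are assumptions made for this example, and the input is assumed to look like the posted data (`1920: 62 264`).

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the two data columns.
struct Samples {
    std::vector<double> x;
    std::vector<double> y;
};

// Reads up to `count` records of the form "1920: 62 264" from `in`.
Samples readSamples(std::istream& in, int count)
{
    assert(count >= 0);
    Samples s;
    s.x.reserve(static_cast<std::size_t>(count));  // size chosen at run time
    s.y.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) {
        std::string yearLabel;                     // e.g. "1920:", read and ignored
        double x = 0.0, y = 0.0;
        if (!(in >> yearLabel >> x >> y)) break;   // stop on missing or malformed input
        s.x.push_back(x);
        s.y.push_back(y);
    }
    return s;
}

// Convenience wrapper for trying the reader on a string instead of std::cin.
Samples readSamplesFromText(const std::string& text, int count)
{
    std::istringstream in(text);
    return readSamples(in, count);
}
```

In the original program this would be called as `readSamples(std::cin, n)`, with `n` being the number of years including both endpoints.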



double xtotal, ytotal =0.0;
Nope, this only initialises ytotal, not xtotal.



double varx, vary=0.0;
Same problem.
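In a declaration with several variables, an initializer applies only to the variable it is attached to, so each one needs its own. A small sketch of one way to write this (the `Totals` struct and `accumulate` function are illustrative names, not part of the original program):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type; default member initializers start both fields at zero.
struct Totals {
    double x = 0.0;
    double y = 0.0;
};

Totals accumulate(const std::vector<double>& xs, const std::vector<double>& ys)
{
    assert(xs.size() == ys.size());
    // double xtotal, ytotal = 0.0;     // would leave xtotal uninitialized
    double xtotal = 0.0, ytotal = 0.0;  // each declarator gets its own initializer
    for (std::size_t i = 0; i < xs.size(); ++i) {
        xtotal += xs[i];
        ytotal += ys[i];
    }
    Totals t;
    t.x = xtotal;
    t.y = ytotal;
    return t;
}
```

Compilers such as GCC and Clang can usually report a read of an uninitialized variable when warnings like `-Wall` are enabled, although whether a given case is caught depends on the compiler version and settings.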




You do know that there are simpler formulae to do linear regression, don't you? You shouldn't have to use two separate loops.
Slope, m = (N Sxy - Sx Sy ) / ( N Sxx - Sx Sx )
Intercept, c = ( Sy - m Sx ) / N

where
Sx = sum(x), Sy=sum(y), Sxx=sum(x * x), Sxy=sum(x * y)

and these sums can be incremented in a single loop as you read in x and y.
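The single-pass formulae and the two-loop "deviations from the mean" approach are algebraically the same fit, so they should agree once N is right. The sketch below compares the two on a tiny data set; `Line`, `fitOnePass` and `fitTwoPass` are names invented for this example, and it assumes at least two points with x values that are not all equal (otherwise the denominator is zero).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Line { double slope; double intercept; };  // hypothetical result type

// One pass: accumulate Sx, Sy, Sxx, Sxy, then apply the closed-form formulae.
Line fitOnePass(const std::vector<double>& x, const std::vector<double>& y)
{
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double Sx = 0.0, Sy = 0.0, Sxx = 0.0, Sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        Sx  += x[i];
        Sy  += y[i];
        Sxx += x[i] * x[i];
        Sxy += x[i] * y[i];
    }
    const double m = (n * Sxy - Sx * Sy) / (n * Sxx - Sx * Sx);
    return { m, (Sy - m * Sx) / n };
}

// Two passes: compute the means first, then sums of deviations from the means.
Line fitTwoPass(const std::vector<double>& x, const std::vector<double>& y)
{
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double xsum = 0.0, ysum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        xsum += x[i];
        ysum += y[i];
    }
    const double xbar = xsum / n, ybar = ysum / n;
    double dxx = 0.0, dxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        dxx += (x[i] - xbar) * (x[i] - xbar);
        dxy += (x[i] - xbar) * (y[i] - ybar);
    }
    const double m = dxy / dxx;
    return { m, ybar - m * xbar };
}
```

For the points (1,3), (2,5), (3,7), (4,9), which lie exactly on y = 2x + 1, both functions give a slope of 2 and an intercept of 1. The one-pass form saves storing the data; the two-pass form tends to be less sensitive to rounding when the values are large relative to their spread.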

You are still getting the number of data points wrong when you divide by (b-a). It should be (b-a+1) on lines 19 and 20 (the avgx and avgy calculations).



You don't need arrays.

#include <iostream>
#include <string>
using namespace std;

int main()
{
   int a, b;
   cin >> a >> b;

   int N = b - a + 1;
   double Sx = 0, Sy = 0, Sxx = 0, Sxy = 0;
   for ( int i = 1; i <= N; i++ )
   {
      string dummy;
      double x, y;
      cin >> dummy >> x >> y;
      Sx  += x;
      Sy  += y;
      Sxx += x * x;
      Sxy += x * y;
   }
   double m = ( N * Sxy - Sx * Sy ) / ( N * Sxx - Sx * Sx );
   double c = ( Sy - m * Sx ) / N;
   cout << "Slope: " << m << "      intercept: " << c << '\n';
}


Slope: 1.91101      intercept: 78.2802
Also, the first loop is for reading the data, and the second loop is for the actual calculation. Thanks for the help, guys.