find the fit parameters which minimize chi-square functions

dear programmers,

I have a small computationnal problem. I move to C++ as Mathematica (which I was using so far) is no longer adpated for this.
I would like to use/write a minimization routine for my chisquare function, and know if there is a way to implement this. For each dataset, I would like to minimize the function to find a and b. Then knowing a and b, I want to sum up all the chisquare functions to have a big chisquare (depending on c and d only), that I would like to minimize. Here is the code that I have:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
#include <iostream>
#include <fstream>
#include <string>
#include <iomanip>
#include <cmath>
#include <vector>
#include <stack>
using namespace std;


double fitfunction(double a, double b, double c, double d, double x, double y)
{
	double fct = a + b*x + c*y + d*y*y;
	return fct;
}

double chisquarecalculation(double a, double b, double c, double d, double x, double y, double z)
{
	double chisquare = ( (z - fitfunction(a,b,c,d,x,y) )*(z - fitfunction(a,b,c,d,x,y) ) ) / (z*z);
	return chisquare;	

}

int main()
{

	//definition of the input and output files
	ifstream input1("dataset1_usedtofit.txt");  
	ifstream input2("dataset2_usedtofit.txt");
	ifstream input3("dataset3_usedtofit.txt");
	ifstream input4("dataset4firstpart_usedtofit.txt");
	ifstream input5("dataset4secondpart_usedtofit.txt");
	ifstream input6("dataset4thirdpart_usedtofit.txt");
	ifstream input7("dataset4lastpart_usedtofit.txt");

	ofstream output1("chisquare1.txt");
	ofstream output2("chisquare2.txt");
	ofstream output3("chisquare3.txt");
	ofstream output4("chisquare4.txt");
	ofstream output5("chisquare5.txt");
	ofstream output6("chisquare6.txt");
	ofstream output7("chisquare7.txt");


	//definition of the chisquares function as a function of a,b,c,d
	double a=1; //fit parameters which I would like to use as theoretical entities, uninitialized
	double b=1;
	double c=1;
	double d=1;
	double x,y,z;

	double sum(0.);

	while(input1) //we didn't reach the end of the file
    	{
        	input1>> x >> y >> z;   //we take out values from the dataset

        	if(input1) //if we can read something
        	{
			output1<<chisquarecalculation(a,b,c,d,x,y,z)<<" "<<endl;
			sum+=chisquarecalculation(a,b,c,d,x,y,z);
        	}
		
    	}
	cout<< "chisquare for the 1st dataset by direct method: "<<sum<<endl;
	sum=0;

	while(input2) //we didn't reach the end of the file
    	{
        	input2>> x >> y >> z;   //we take out values from the dataset

        	if(input2) //if we can read something
        	{
			output2<<chisquarecalculation(a,b,c,d,x,y,z)<<" "<<endl;
			sum+=chisquarecalculation(a,b,c,d,x,y,z);
        	}
		
    	}
	cout<< "chisquare for the 2nd dataset by direct method: "<<sum<<endl;
	sum=0;

	while(input3) //we didn't reach the end of the file
    	{
        	input3>> x >> y >> z;   //we take out values from the dataset

        	if(input3) //if we can read something
        	{
			output3<<chisquarecalculation(a,b,c,d,x,y,z)<<" "<<endl;
			sum+=chisquarecalculation(a,b,c,d,x,y,z);
        	}
		
    	}
	cout<< "chisquare for the 3rd dataset by direct method: "<<sum<<endl;
	sum=0;

	while(input4) //we didn't reach the end of the file
    	{
        	input4>> x >> y >> z;   //we take out values from the dataset

        	if(input4) //if we can read something
        	{
			output4<<chisquarecalculation(a,b,c,d,x,y,z)<<" "<<endl;
			sum+=chisquarecalculation(a,b,c,d,x,y,z);
        	}
		
    	}
	cout<< "chisquare for the 4th dataset by direct method: "<<sum<<endl;
	sum=0;

	while(input5) //we didn't reach the end of the file
    	{
        	input5>> x >> y >> z;   //we take out values from the dataset

        	if(input5) //if we can read something
        	{
			output5<<chisquarecalculation(a,b,c,d,x,y,z)<<" "<<endl;
			sum+=chisquarecalculation(a,b,c,d,x,y,z);
        	}
		
    	}
	cout<< "chisquare for the 5th dataset by direct method: "<<sum<<endl;
	sum=0;

	while(input6) //we didn't reach the end of the file
    	{
        	input6>> x >> y >> z;   //we take out values from the dataset

        	if(input6) //if we can read something
        	{
			output6<<chisquarecalculation(a,b,c,d,x,y,z)<<" "<<endl;
			sum+=chisquarecalculation(a,b,c,d,x,y,z);
        	}
		
    	}
	cout<< "chisquare for the 6th dataset by direct method: "<<sum<<endl;
	sum=0;

	while(input7) //we didn't reach the end of the file
    	{
        	input7>> x >> y >> z;   //we take out values from the dataset

        	if(input7) //if we can read something
        	{
			output7<<chisquarecalculation(a,b,c,d,x,y,z)<<" "<<endl;
			sum+=chisquarecalculation(a,b,c,d,x,y,z);
        	}
		
    	}
	cout<< "chisquare for the last dataset by direct method: "<<sum<<endl;
	sum=0;
}



I don't know how to have a function of let say a and b without affecting these a and b in C++. I don't want these variables to be initialized by specific values but the code to work with abstract variables. As it is not the standard behaviour of a c/c++ program to deal with abstract variables, I don't know if it is possible to do what I am looking for. Also, are there already minimization algorithms suitable for my case? I tried to seek into stl or other sources but I didn't find anything.

Thanks in advance,

Jean-Philippe
Last edited on
I don't know how to have a function of let say a and b without affecting these a and b in C++.

If you have a function f(double a), and you call it with f(b), then f won't modify b.

FWIW, your code can be simplified a lot:

Call fitfunction() only once in chisquarecalculation() and remember the result:
1
2
3
4
5
6
7
8
double
chisquarecalculation(double a, double b, double c, double d, double x, double y, double z)
{
	double fit = fitfunction(a, b, c, d, x, y);
    double chisquare = ((z - fit) * (z -fit)) / (z * z);
    return chisquare;

}


The input loops can be greatly simplified. For example
1
2
3
4
5
    while (input1 >> x >> y >> z) {
		double chi = chisquarecalculation(a, b, c, d, x, y, z);
	    output1 << chi << " " << endl;
	    sum += chi;
	}


Better yet, put all of that in a function:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
#include <iostream>
#include <fstream>
#include <string>
#include <iomanip>
#include <cmath>
#include <vector>
#include <stack>
using namespace std;

double
fitfunction(double a, double b, double c, double d, double x, double y)
{
    double fct = a + b * x + c * y + d * y * y;
    return fct;
}

double
chisquarecalculation(double a, double b, double c, double d, double x, double y, double z)
{
    double fit = fitfunction(a, b, c, d, x, y);
    double chisquare = ((z - fit) * (z -fit)) / (z * z);
    return chisquare;

}

void inAndOut(const char *inFileName, const char *outFileName)
{
    ifstream input(inFileName);
    ofstream output(outFileName);

    //definition of the chisquares function as a function of a,b,c,d
    double a = 1;		//fit parameters which I would like to use as theoretical entities, uninitialized
    double b = 1;
    double c = 1;
    double d = 1;
    double x, y, z;

    double sum(0.);

    while (input >> x >> y >> z) {
	double chi = chisquarecalculation(a, b, c, d, x, y, z);
	output << chi << " " << endl;
	sum += chi;
    }
    cout << "chisquare for " << inFileName << " dataset by direct method: " << sum << endl;
}


int
main()
{
    inAndOut("dataset1_usedtofit.txt", "chisquare1.txt");
    inAndOut("dataset2_usedtofit.txt", "chisquare2.txt");
    inAndOut("dataset3_usedtofit.txt", "chisquare3.txt");
    inAndOut("dataset4firstpart_usedtofit.txt", "chisquare4.txt");
    inAndOut("dataset4secondpart_usedtofit.txt", "chisquare5.txt");
    inAndOut("dataset4thirdpart_usedtofit.txt", "chisquare6.txt");
    inAndOut("dataset4lastpart_usedtofit.txt", "chisquare7.txt");
}
Hi,

Just my 2 cents worth, in addition to dhayden's vastly superior knowledge & experience, and not related to your problem.

Parameters to functions can often be const:

10
11
12
13
14
15
16
17
18
19
20
double
fitfunction(const double a,
               const double b, 
               const double c, 
               const double d, 
               const double x, 
               const double y)
{
    double fct = a + b * x + c * y + d * y * y;
    return fct;
}


C++11 also has constexpr , auto and brace initialisation, apparently all of which should be used as much as possible, according to the link below :+)

https://github.com/isocpp/CppCoreGuidelines/blob/master/CppCoreGuidelines.md


Good Luck !!
@dhayen, TheIdeasMan: thanks for your replies; I will simplify my code using your advices. @dhayden,
If you have a function f(double a), and you call it with f(b), then f won't modify b.
: this I know. What I wanted to point out or ask is if there is a way to solve equations or define my function (here chiquare) without initializing a,b,c,d. So that the minimization routine can use these variables unitialized (what I call maybe wrongly theoretical entities) and find the values which minimizes the function.

I ask this because as far as I remember, when you don't initialize variables, C++ do it for you with crazy values. I don't know if we can avoid this behaviour.

Thanks for clarifying. I would definitely initialize those variables, both to ensure consistent results from run to run and because there are special values of type double for infinity and "not a number". I suspect that you really REALLY don't want a,b,c,d to start with either of these values.
obviously not. But if you initialize these variables, how can you make sure that the minimization algorithm won't just stay with those variables without calculating the a,b,c,d minimizing really the function? I mean if at a certain point I calculate the derivative of a function with respect to a for instance, and that a is initialized to a specific value, it can cause troubles...

I have very little experience in numerical recipes for this kind of problem. I know that maybe gsl is adapted for this problem, but I don't know how to use it. Are there other ways?
Last edited on
Topic archived. No new replies allowed.