### Help me with my first OO project

Hi everyone,

I started working on my first "real" object-oriented project. It is supposed to be a tool that allows me to easily perform various experiments in my daily work as an applied maths PhD student.

I would like to get some feedback, and also ask for guidance on how to increase its utility (I want to turn it into something that could easily be picked up on by other students as well).

Before I post the code, a high-level summary of what it is about.
The core of the project is a collection of sampling methods that draw samples from (continuous) probability densities.

For the less mathematical/physical people here, it is easier to think in terms of a particle moving across a landscape with hills and valleys.
I start at one position on this landscape, with a given velocity. Due to the slopes of the many hills in the landscape, the particle experiences forces and accelerates or decelerates, changes directions, and so on. The force that the particle experiences is of course dependent on its position. How the particle changes its movement based on the current force is governed by the sampling schemes themselves.
The samples that I take are the sequence of positions and velocities that result. Or, rather, I typically compute and store other quantities based on those positions and velocities.

If I develop new sampling schemes, I want to compare their performance on various problem classes, i.e. various different landscapes in the language from above. I want to have a framework where it is as simple as possible to define a new problem/landscape, and then run my various sampling schemes on it.

The schemes themselves should be "blind" to the landscape at hand, as well as to the kind of measurements that need to be obtained along the way (i.e. what kind of quantities I want to compute from my positions and velocities).

The user should only define the landscape, as well as the specifics of the measurement. They then should simply call one of the various sampling routines.

I think, in the language of software development, what I try to do is write a sampling scheme library and a corresponding API to use it? Not sure...
I now post the code. Afterwards I name a few design issues I see.
Note that I will be using mpi to draw different trajectories on different processors, but this is not implemented yet.
Note also that the code uses the word "parameter" instead of "position".

An example main-file. This is what every user needs to set up himself.
 ``1234567891011121314151617181920212223242526272829303132`` ``````#include "samplers.h" #include "setup_classes.h" int main(int argc, char *argv[]){ // we want to sample the double well potential using the OBABO sampler double T = 1; // sampler hyperparameters double gamma = 1; double h = 0.01; int N = 50000; // no. of iteration bool tavg = false; // perform time-average? int t_meas = 5; // take measurement every t_meas iterations (passed to sampler as well as print functions). OBABO testsampler(T, gamma, h); // construct OBABO object defined in header "samplers.h" PROBLEM double_well; /* construct object of the problem class defined by the user in header "setup_classes.h". */ measurement results = testsampler.run_mpi_simulation(N, tavg, double_well, t_meas); /* run sampler. "measurement" needs to be defined by user in header "setup_classes.h". It holds information of what quantities need to be obtained by the sampler and how to compute and print them. */ results.print_to_csv(t_meas); // print to file, as specified by the user in the "measurement" class. return 0; }``````

The library, the samplers themselves, are given in the header "samplers.h" Right now, there is just one, the OBABO sampler.
 ``1234567891011121314151617181920212223242526272829303132333435363738394041424344454647`` ``````#ifndef SAMPLERS_H #define SAMPLERS_H #include #include #include #include #include #include #include #include #include #include #include #include #include #include "setup_classes.h" // #include /* The OBABO sampler. It requires the "PROBLEM" class and the "measurement" class in header "setup_classes.h". They need to be written by the user and define the sampling problem and what measurements to take. Note that the member functions below require those classes to be structured in a certain way. */ class OBABO{ private: const double T; const double gamma; const double h; measurement collect_samples(const int N, const bool tavg, PROBLEM POTCLASS, const int randomseed, const int t_meas); // draws a single sampling trajectory public: // constructors OBABO(double T, double gamma, double h): T{T}, gamma{gamma}, h{h} { } measurement run_mpi_simulation(const int N, const bool tavg, PROBLEM POTCLASS, const int t_meas); /* sets up mpi environment and calls "collect_samples" on each process within. Also performs averaging. */ void print_sampler_params(); // print sampler hyperparameters. }; #endif // SAMPLERS_H ``````

The implementation of the samplers:
 ``123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110`` ``````#include "samplers.h" void OBABO::print_sampler_params(){ std:: cout << "Sampler parameters:\n"; std:: cout << "Temperature = " << T << ",\nFriction = " << gamma << ",\nStepsize = " << h << ".\n" << std:: endl; }; measurement OBABO::run_mpi_simulation(const int N, const bool tavg, PROBLEM POTCLASS, const int t_meas){ int seed = 1; print_sampler_params(); measurement RESULTS = OBABO::collect_samples(N, tavg, POTCLASS, seed, t_meas); return RESULTS; }; measurement OBABO::collect_samples(const int N, const bool tavg, PROBLEM problem, const int randomseed, const int t_meas){ std:: cout << "Starting OBABO simulation...\n" << std:: endl; // set integrator constants const double a = exp(-1*gamma*h); const double sqrt_a = sqrt(a); const double sqrt_aT = sqrt((1-a)*T); const double h_half = 0.5*h; size_t No_params = problem.parameters.size(); // number of parameters // COMPUTE INITIAL FORCES problem.compute_force(); // COMPUTE INITIAL MEASUREMENTS measurement RESULTS; RESULTS.take_measurement(problem.parameters, problem.velocities); // PREPARE RNG std:: mt19937 twister; std:: seed_seq seq{1,20,3200,403,5*randomseed+1,12000,73667,9474+randomseed,19151-randomseed}; std:: vector < std::uint32_t > seeds(1); seq.generate(seeds.begin(), seeds.end()); twister.seed(seeds.at(0)); std:: normal_distribution<> normal{0,1}; double Rn; auto t1 = std:: chrono::high_resolution_clock::now(); // MAIN LOOP - from here, we start our journey across the "landscape" for ( size_t i = 1; i <= N; ++i ) { // O + B steps for ( size_t j = 0; j < No_params; ++j ) { Rn = normal(twister); problem.velocities[j] = sqrt_a * problem.velocities[j] + sqrt_aT * Rn + h_half * problem.forces[j]; } // A step for ( size_t j = 0; j < No_params; ++j ) { problem.parameters[j] += h * problem.velocities[j]; } // COMPUTE NEW FORCES problem.compute_force(); // B STEP for ( size_t j = 0; j < No_params; ++j ) { problem.velocities[j] += h_half * problem.forces[j]; } // O STEP for ( size_t j = 0; j < No_params; ++j ) { Rn = normal(twister); problem.velocities[j] = sqrt_a * problem.velocities[j] + sqrt_aT * Rn; } // TAKE MEASUREMENT if( i % t_meas == 0 ) { // take measurement any t_meas steps. RESULTS.take_measurement(problem.parameters, problem.velocities); } if( i % int(1e5) == 0 ) std:: cout << "Iteration " << i << " done!\n"; } // END MAIN LOOP // FINALIZE auto t2 = std:: chrono:: high_resolution_clock:: now(); auto ms_int = std:: chrono:: duration_cast < std:: chrono:: seconds > (t2 - t1); std:: cout << "Execution took " << ms_int.count() << " seconds!\n "; return RESULTS; };``````

Note that the samplers require classes "PROBLEM" (which defines the landscape) and "measurement" which defines what measurements to collect.
Both of those need to be given by the user in the header "setup_classes", to be seen below.

The file "setup_classes.h"
Currently, the "landscape" is a simple double well in two dimensions.
 ``123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125`` ``````#ifndef SETUP_CLASSES_H #define SETUP_CLASSES_H #include #include #include #include constexpr double PI = 3.141592653589793; // some useful constants constexpr double eps = 1e-12; class PROBLEM { // THIS PROBLEM IS THE DOUBLE WELL POTENTIAL IN 2 DIMENSIONS /* In case one wants to change the problem to something else, eg. harmonic oscillator: The samplers require a "compute_force" function as well as the vectors "parameters", "velocities", and "forces" storing the corresponding quantities. */ public: // parameters, velocities, forces std:: vector parameters {1,0}; // CARE! THE SAMPLERS REQUIRE THIS VECTOR std:: vector velocities {0,0}; // CARE! THE SAMPLERS REQUIRE THIS VECTOR std:: vector forces {0,0}; // CARE! THE SAMPLERS REQUIRE THIS VECTOR // fill force vector void compute_force(){ /* CARE! THE SAMPLERS REQUIRE THIS FUNCTION. It needs to be written by the user. */ double x = parameters[0]; // current parameters (positions) double y = parameters[1]; double e1, e2, rho; // some help constants double x_mu_diff1 = x-mu1[0]; double x_mu_diff2 = x-mu2[0]; double y_mu_diff1 = y-mu1[1]; double y_mu_diff2 = y-mu2[1]; // the exponentials e1 = exp( -inv_two_det_SIG1 * ( SIG1[2]*x_mu_diff1*x_mu_diff1 - 2*SIG1[1]*x_mu_diff1*y_mu_diff1 + SIG1[0]*y_mu_diff1*y_mu_diff1 ) ); e2 = exp( -inv_two_det_SIG2 * ( SIG2[2]*x_mu_diff2*x_mu_diff2 - 2*SIG2[1]*x_mu_diff2*y_mu_diff2 + SIG2[0]*y_mu_diff2*y_mu_diff2 ) ); rho = pref1 * det1 * e1 + pref2 * det2 * e2; // the density //force part without noise, i.e. double well forces[0] = -1/rho * ( pref1 * e1 * (SIG1[2]*x_mu_diff1 - SIG1[1]*y_mu_diff1) + pref2 * e2 * (SIG2[2]*x_mu_diff2 - SIG2[1]*y_mu_diff2) ); forces[1] = -1/rho * ( pref1 * e1 * (-SIG1[1]*x_mu_diff1 + SIG1[0]*y_mu_diff1) + pref2 * e2 * (-SIG2[1]*x_mu_diff2 + SIG2[0]*y_mu_diff2) ); // if(sig!=0){ // normal_distribution<> normal{0,sig}; // add noise // F.fx += normal(twister); // F.fy += normal(twister); // } return; }; private: // constants that define the potential const std:: vector mu1 {1, 0}; // (x,y) of the first well const std:: vector mu2 {-1, 0}; // (x,y) of the second well const std:: vector SIG1 {0.6, 0.085, 0.02}; // elements of the two covariance matrices sig11, sig12, sig22 (note: sig12=sig21) const std:: vector SIG2 {0.6, 0.085, 0.02}; const double phi1 = 0.5; // mixing factor (phi2 = 1-phi1) // constants used by the force computation const double det1 = SIG1[0]*SIG1[2] - SIG1[1]*SIG1[1]; // determinants of cov. matrices const double det2 = SIG2[0]*SIG2[2] - SIG2[1]*SIG2[1]; const double inv_two_det_SIG1 = 1 / (2*det1) ; // used in the exponents in the force const double inv_two_det_SIG2 = 1 / (2*det2) ; const double pref1 = phi1 / ( 2*PI*pow( det1, 1.5) ); // prefactors in the force const double pref2 = (1-phi1) / ( 2*PI*pow( det2, 1.5) ); }; class measurement{ /* The measurement class that defines what quantities are collected by the samplers, how they are computed, and printed to a file. This class needs to be modified by the user. In this example, we only save the first parameter (corresponding to the x-coordinate when used in the double well problem above), as well as the kinetic energy. */ public: void take_measurement(std:: vector parameters, std:: vector velocities){ /* CARE!!! The samplers need this function. It needs to be written by the user. */ x.push_back(parameters[0]); kin_energy.push_back( 0.5* (velocities[0]*velocities[0] + velocities[1]*velocities[1]) ); } void print_to_csv(const int t_meas){ /* print results to file. this routine would be called in the main-file. It needs to be written by the user. "t_meas" gives the number of sampler iterations between two measurements (it is passed here to get the correct iteration count in the output file.)*/ std:: ofstream file{"TEST.csv"}; std:: cout << "Writing to file...\n"; for ( size_t i = 0; i x; std:: vector kin_energy; }; #endif // SETUP_CLASSES_H ``````

I implemented the methods of the "PROBLEM" and "measurement" classes inline because I don't want to burden the users with changing two files (the header and the implementation). I imagine I write a simple github tutorial on how to edit this file if one wants to change the problem/landscape or the measurement.

What is problematic:
Assume now that, in the "setup_classes.h" file, I want to change my problem from a double well to something else (say, a triple-well, ha) i.e. I need to change the PROBLEM class. As mentioned in the comments, no matter the PROBLEM class, all of them will need certain members (parameters, velocities, forces, albeit with different sizes), as well as a method to compute the position-dependent force (as it will be called by the samplers). Therefore, the user needs to be made aware not to mess this up (eg. by forgetting to include the force computation function). I do this with big "CARE!!!" comments now. The rest needs to come from the manual I guess...

However, the sampling library expects the class's name to still be "PROBLEM". That means I would need to comment out the current "PROBLEM" class and replace it with my new version.
What I would rather have is two different classes, one called "double_well", the current one, and the new one called "triple_well", and the samplers should be fine with receiving either of them.

Am I right in assuming that these things can be accomplished with class templates? Or how would one handle this? Can the concepts of polymorphism or inheritance somehow help me? I would need to learn them first...
Are there any other things I could do to increase the utility of the whole thing? Note that this is just a side-project, I do not have time to create a GUI or something like that (not that I knew how to do that anyway lol).

Thanks for reading, would appreciate any input!
Last edited on
Hi PhysicsIsFun,
If no one has replied to your problem, it is because the problem statement is not clear. If you present the problem in a more concise form, i think you will get the help that you need. I am not from a physics background and as you are aware, most people are here are not. Maybe, you present the full problem statement with
- What you intend to achieve
- What you receive as input
- What you get as output
- Any constraints for the inputs?
- Any constraints for the output?
- Any specific logic or formulae to consider?

It is generally difficult to interpret code without domain knowledge and proper problem description.

If you have already figured out your solution, then fine. Otherwise, try to provide more details.

Regards
I do have a slight physics background, but nothing like your level .. the first 2 or 3 college classes (calc based) and 15 years working on drones ( planes & boats ). Nothing useful at the atomic level :)

All the above, lets go back to basics and ignore your specific problem.
-names. Jargon names have a place, eg dx or alpha, in some contexts, but the more readable you make your names, the better your code will be.
class OBABO{ //what in the world is this? no one but you will ever know.

or, for another example:
int N = 50000; // no. of iteration
why not
int max_iterations{50000}; //does not need a comment anymore.
why? In large projects, the user does not have to memorize what N means or keep looking it up to read the comment, the comment is IN THE NAME. Its a little more typing as you crank it out, but its very worth it 5 years from now when you publish a paper and other people have to see the code.

look at it like this. You are on page 157 of c++ code in the middle of trying to understand the simulation, and you see this:
for(int i = 0; i < N; i++)
{
bunch of random things
}
vs this:
for(int i = 0; i < max_iterations; i++)
{
bunch of random things
}
which one is easier to follow?

single letter variable names, cryptic variable names, try to avoid them. I understand, and argue FOR, having short names SPECIFICALLY for math variables in well documented math heavy code. Eg x is x, if you are doing some f(x) thing. Fine. t is time and dt is a time slice in a simulation, fine. But if you are going to do that, you need to make sure that each and every one is crystal clear what it means and refers to. If there is any chance of confusion, use more letters. Outside the math code that closely follows known formulae, use longer names, please (and you are doing this mostly, but N and the class name immediately stand out as hard to make sense of).

use <cmath> not <math.h>. math.h is from C and is missing a few things that C++ offers.

pi is already defined in c++, use that one for consistency across platforms and to meet reader expectations.

None of that is more than housekeeping, though.
As far as feedback, it looks like you are trying to make a library.
Things like print sampler should be virtual, so the user can over-ride your console output and put it up in a gui or send it to a printer or whatever.
all input and output should be configurable this way, letting the user craft their own replacements.

when providing objects for future use, its a pain, to be honest. You have to step outside your vision of 'this is how that will be used' and imagine that someone is trying to use it in another way. Will your MPI work be stable if they have a vector of OBGYN objects? What other weird things can they do that might break things?
If you're doing something that requires others to change, then:
1. Which piece(s) do you want to let them change, and which not?
2. How are they going to change them? If they're not programmers, a UI is usually the only possible way, and if they are programmers, try to make your design suitable for some additional code. The best way is to provide APIs in your code and others only have to call them, as if your code is a complete software.
3. Does 'change' mean that the users have to rewrite what you have written? If so, try to turn that part of your code into an example, not a direct implementation. If not, then make sure they know what do they have to give - datas, programs or user-friendly inputs are all OK.
Hi guys,

thank you for your responses! I realize that I posted too much code and did not give structured enough explanations of what I want to do.

I might give it another try as soon as I go back to this work and need further help. I think I already have an idea of how to proceed.
A suggestion when dealing with a lot of code (a lot is a relative term) and wanting others to code review and offer revision help.

Create a git repository. Github is one repository service that is used a lot.

Having a git repository makes it easier for other people to check out your code. And if you allow it people who have cloned your repo can change the code and submit it back to the repo.

The repository owner has built-in versioning available for tracking changes.

If you are unfamiliar with git there is a free book available online in several different eBook formats.

https://git-scm.com/book/en/v2

I don't know what your OS is, and honestly don't care. :) If Windows there is a very nice (IMO) git client available. console mode Bash interface and a GUI.

https://git-scm.com/

I use Github for my repositories.

https://github.com/GeorgePimpleton
Topic archived. No new replies allowed.