I'm still not 100% confident I'm doing this right. I found it easier to put the smaller parts of code into functions first rather than the larger sections, though I have now broken down three large parts of the main function into separate functions. One of my main concerns is the number of variables I'm passing to my functions. Some of them require 15 or more arguments. This is an issue for readability to begin with, but because I'm passing by value, the data are also copied every time they are passed to a function. When the data are large arrays, it seems like I should expect a decrease in efficiency, particularly when those data are passed on to functions called inside the other functions.
I can see a couple of ways around this. Global variables would solve the problem, but are so frowned upon that I wouldn't even consider using them for this purpose. Passing by reference would solve the problem of copying data, but would introduce an extra possible source for bugs. Combining some of the passed variables into vectors would partly solve the issue of having so many arguments, but may not actually improve readability.
Am I on the right track with this? Or is there a better way to deal with it?
I've struggled with this problem many times (the first time I programmed a bigger algorithm, I had functions passing up to 20 arguments. I think the average number was around 7), but lately my attempts at circumventing it have been quite successful. The key is properly structuring your problem, most likely using an OOP design.
I was opposed to the idea of OOP to solve this kind of problem ("my data is needed everywhere, so encapsulation is useless!"), but it ultimately was the key. I regret not listening to this advice sooner. I'm still not that experienced in it (only my current project is somewhat of a success story), but I'll try to explain how I do it: [Note: if someone else pops in and provides a completely different way of doing things, you might be better off following his/her advice.]
It boils down to data management and program flow. Data management is difficult at first, but even if, like in my case, nearly everything needs access to nearly all data, you can usually still separate it into parts.
For example, in my case, I can differentiate between "Problem-related data" (e.g. stuff read in from a file; constant during runtime, but differs between runtimes) and "Solution-related data" (e.g. calculation results and decision variables; changes during runtime).
Since the first batch is independent, it's a separate class. Basically, it's just a large collection of data with basic functionality (a "read-from-file" function plus some "transform this data into that data"). The rest are accessor functions (not all data needs accessor functions, but the more specific things can benefit from them. See the "distance matrix" example below).
The second class (solution-specific data) needs access to data from the first class, so it keeps a reference to an object of that type. (Technically, it could just keep an object of that type, but I need the problem-specific data outside of the second class as well, so I just keep a reference).
At the bottom of it all, I have a single "control" class that contains an object of type class 1 and class 2 (and some other stuff). This class mostly contains high-level logic that separates "what I want to do" from "how I want to do it". The functions of this class often have very understandable "human" names ("buildStartingSolution"), while the other two have more functional names ("hPop()").
Then, algorithm logic is done in a top-down fashion. Program flow is expressed in very high-level controls, such as "read data", "build starting solution", "optimize solution", "report results". These are functions in the control class. Each function is then broken down separately and expressed in mid-level interface functions (e.g. "findNextStep" is a Solution-function called by the Control-function "optimize", but it still shows nothing of the "inner workings").
The mid-level functions ultimately delegate into low-level functions, i.e. the "how I want to do it" logic.
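To make the three-class layout a bit more concrete, here is a minimal sketch. Everything in it is an assumption made up for illustration: the class names ProblemData, Solution and Control, the toy "pick the cheapest remaining item" step, and the vector of costs standing in for data read from a file. It is not my actual code, just the shape of it.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Problem-related data: filled once, then read-only for the rest of the run.
// (Hypothetical: a plain vector of costs stands in for data read from a file.)
class ProblemData {
public:
    explicit ProblemData(std::vector<double> costs) : costs_(std::move(costs)) {}
    std::size_t size() const { return costs_.size(); }
    double cost(std::size_t i) const { return costs_[i]; }
private:
    std::vector<double> costs_;
};

// Solution-related data: changes during the run and reads the problem
// through a const reference instead of receiving it as function arguments.
class Solution {
public:
    explicit Solution(const ProblemData& problem) : problem_(problem) {}

    // Mid-level step: pick the cheapest item not chosen yet.
    // Returns false when nothing is left to pick.
    bool findNextStep() {
        const std::size_t none = problem_.size();
        std::size_t best = none;
        for (std::size_t i = 0; i < problem_.size(); ++i) {
            if (chosen(i)) continue;
            if (best == none || problem_.cost(i) < problem_.cost(best)) best = i;
        }
        if (best == none) return false;
        order_.push_back(best);
        total_ += problem_.cost(best);
        return true;
    }
    double total() const { return total_; }
    const std::vector<std::size_t>& order() const { return order_; }
private:
    bool chosen(std::size_t i) const {
        for (std::size_t j : order_) if (j == i) return true;
        return false;
    }
    const ProblemData& problem_;
    std::vector<std::size_t> order_;
    double total_ = 0.0;
};

// Control: expresses "what I want to do" and knows nothing about "how".
class Control {
public:
    explicit Control(std::vector<double> costs)
        : problem_(std::move(costs)), solution_(problem_) {}
    Control(const Control&) = delete;  // a copy would refer to the old problem_

    void optimizeSolution() { while (solution_.findNextStep()) {} }
    double reportResult() const { return solution_.total(); }
    const Solution& solution() const { return solution_; }
private:
    ProblemData problem_;  // declared before solution_, which refers to it
    Solution solution_;
};
```

Note that none of the member functions take the data as parameters; swapping the linear search in findNextStep for a heap would not touch Control at all.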
This structure allows me to quickly test different implementations of the same "step". Sometimes "findNextStep" uses a heap, or a BST, or just a vector. The "Control" class doesn't care; it gets its nextStep regardless of implementation. If I want to change how my input data is stored, I can just change that class and adjust the interfacing functions (e.g. I read from a problem-specific distance table using dist(i, j), but the logic behind it has changed drastically between implementations: at first it was a 2D array, while another time it was a linearized "triangular-half-matrix").
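As a rough illustration of the dist(i, j) idea, here is one way the "triangular-half-matrix" variant could look. The class name DistanceTable and the set() function are assumptions for this sketch, and it presumes the distances are symmetric; callers only ever see dist(i, j), so the storage could be swapped back to a 2D array without changing them.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Symmetric distance table stored as a linearized lower triangle
// (diagonal included), so only n*(n+1)/2 values are kept.
class DistanceTable {
public:
    explicit DistanceTable(std::size_t n) : n_(n), data_(n * (n + 1) / 2, 0.0) {}

    void set(std::size_t i, std::size_t j, double d) { data_[index(i, j)] = d; }
    double dist(std::size_t i, std::size_t j) const { return data_[index(i, j)]; }
    std::size_t size() const { return n_; }

private:
    // Row i of the lower triangle starts at offset i*(i+1)/2.
    // Swapping makes (i, j) and (j, i) map to the same slot.
    static std::size_t index(std::size_t i, std::size_t j) {
        if (i < j) std::swap(i, j);
        return i * (i + 1) / 2 + j;
    }
    std::size_t n_;
    std::vector<double> data_;
};
```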
The main advantages of this structure are:
a) I rarely pass any parameters to functions. Each class has access to the data it needs, or can delegate the work to a different class.
b) It's easier to change the program, because "what" and "how" are not so tightly connected. Sometimes it's still annoying to test two different implementations (I'm still not that good at program-design), but for most things it has become much easier.
c) It's easier to build the program, thanks to this top-down structure. I can express program flow in very high level functions (e.g. "optimizeSolution") and work my way down to very low level implementation-specific functions (e.g. "hShiftup").
d) It's easier to debug the program, because it's easier to localize the problem step by step. Check the high level functions to see which one is the problem and work your way down from there.
Ultimately, OOP has made my life much easier. The first few tries were a nightmare, but I'm glad I pushed through it. Don't get me wrong; I'm certain my code will make a decent programmer burst into tears, but I no longer get depressed at the thought of having to change a key subroutine.
Just one comment about passing by reference. I don't see why your code has to be prone to bugs when you do that. Actually, passing by reference is all I do in my programs (of course, I use OOP for everything, but I mean for function calls).
Just keep in mind that when you pass by reference, you have to "protect" the passed variable if you don't want it to be changed. For example, if you only want to read the data from your 3D vector, take it as a const reference.
Notice that const. That const is your protection against any mistake you make in the function that may change your variable. Using const is an art in C++, because with it you can ensure nothing is gonna change unless YOU want it to be changed.
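Something along these lines (a minimal sketch; the alias Grid3D and the function sumAll are names I made up for illustration, assuming the 3D data is a nested std::vector of doubles):

```cpp
#include <vector>

// Hypothetical alias for a 3D array stored as nested vectors.
using Grid3D = std::vector<std::vector<std::vector<double>>>;

// The grid is passed by const reference: nothing is copied, and any
// attempt to modify it inside the function is a compile error.
double sumAll(const Grid3D& grid) {
    double total = 0.0;
    for (const auto& plane : grid)
        for (const auto& row : plane)
            for (double v : row)
                total += v;
    return total;
}
```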
Another thing, use OOP and templates, and build your own library.
With templates, you can ensure that your code can be used with any type in the future, so you can use that previous function not only with doubles but with any other type.
Create a single folder for all your programs, and within it create a folder called "libs" where you always keep your common code: code for your containers and operations that you think you're gonna use in the future.
And one last piece of advice... reeeeeeeeeeeeeead as much as you can about C++. I read 1500+ pages from books to become what I am now in C++ :-)
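For instance, a templated version of a read-only sum over a 3D vector could look like this (again a sketch, with sumAll as an assumed name; T can be any type that supports += and value-initialization):

```cpp
#include <vector>

// Same read-only traversal, but the element type T is deduced from
// the argument, so it works for int, float, double, long double, ...
template <typename T>
T sumAll(const std::vector<std::vector<std::vector<T>>>& grid) {
    T total = T();  // value-initialized: zero for arithmetic types
    for (const auto& plane : grid)
        for (const auto& row : plane)
            for (const T& v : row)
                total += v;
    return total;
}
```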
I've never used templates before, but I do have a trick for "last-minute type changes". Many of my variables are, code-wise, the same type (e.g. unsigned int) but are conceptually completely different (e.g. an index vs a calculation result). I keep them apart by defining a type for each conceptually different type. It looks a bit stupid to have 4 different names for "unsigned int", but it does allow for some flexibility, e.g.:
-Changing distance precision from float to double (or int).
-If requirements change, I can quickly switch ints to shorts or longs.
Also, the expressed conceptual difference avoids silly mistakes. If you're calculating a PROFIT, you know the calculation should be VALUE - COST. That's way more obvious than DOUBLE = DOUBLE - DOUBLE, especially if you tend to switch between PROFIT and LOSS calculations (or any similar example that only differs by sign).
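In code, the trick is roughly the following. The alias names (Index, Value, Cost, Profit) and computeProfit are assumptions for this sketch. One caveat worth knowing: aliases are only different names for the same type, so the compiler will not stop you from mixing them up; the benefit is readability and having one line to edit when the underlying type changes.

```cpp
#include <cstddef>
#include <type_traits>

// One alias per conceptually different quantity. Changing, say, the
// precision of all costs means editing a single line here.
using Index  = std::size_t;
using Value  = double;
using Cost   = double;
using Profit = double;

// The signature now reads as PROFIT = VALUE - COST.
Profit computeProfit(Value value, Cost cost) {
    return value - cost;
}

// Aliases do not create new types: Value and Cost are both just double,
// so swapping the arguments by accident still compiles.
static_assert(std::is_same<Value, Cost>::value,
              "aliases are names, not distinct types");
```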
Actually, integer types aren't a problem for me as much as floating types. I usually templatise over floats to choose between float, double and long double.
So it depends on the problem you're dealing with :-)
Having different type names is a very good idea. It is very good practice, and I use it all the time.
You have to be careful, however, when you decide what a type is. Your example of profit and loss, while illustrative, does not, in my mind, show the best understanding of types.
I think the type for value, cost, profit and loss would be dollars (or yen, rubles, rupees, etc.). To me a type is (approximately) the units of a variable, and value, cost, profit and loss all share the same units.
In your example, the argument names, not the types, should show the meaning of the values:
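A small sketch of what that might look like, assuming a single alias MoneyType for all monetary quantities and two hypothetical functions; the parameter names in the prototypes carry the meaning, the type only carries the units:

```cpp
// One type for everything measured in money (double is an assumption).
using MoneyType = double;

// Prototypes with named arguments: the names say which is which.
MoneyType computeProfit(MoneyType value, MoneyType cost);
MoneyType computeLoss(MoneyType cost, MoneyType value);

// Definitions.
MoneyType computeProfit(MoneyType value, MoneyType cost) {
    return value - cost;
}

MoneyType computeLoss(MoneyType cost, MoneyType value) {
    return cost - value;
}
```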
It is for this reason that I always put my argument names in my function prototypes.
Also, what if you decide to change your representation of money from a float to an int representing the number of cents? Defining value, cost, profit and loss to all be of type DollarType (or maybe better yet MoneyType) will allow those things to change easily. This is easier than your suggestion of redefining VALUE, COST, PROFIT and LOSS separately (although this is still a HUGE step up from searching the code for all the places where floats were used for monetary values, which is part of why you suggested it).
When you learn more about classes, you might even use (or write) a DollarType class that can hide the underlying storage of dollars and cents and be able to parse input and print output in the correct format.
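Just to give a flavor of such a class, here is a very reduced sketch. The name Money, the choice of storing whole cents in a long long, and the "$" output format are all assumptions; input parsing is left out entirely.

```cpp
#include <cstdlib>
#include <string>

// Hides how money is stored (here: whole cents) behind a small interface.
class Money {
public:
    static Money fromCents(long long cents) { return Money(cents); }

    long long cents() const { return cents_; }

    Money operator+(Money other) const { return Money(cents_ + other.cents_); }
    Money operator-(Money other) const { return Money(cents_ - other.cents_); }

    // Formats as dollars and two-digit cents, with a leading minus if negative.
    std::string toString() const {
        const long long magnitude = std::llabs(cents_);
        std::string text = cents_ < 0 ? "-$" : "$";
        text += std::to_string(magnitude / 100);
        text += ".";
        const long long rest = magnitude % 100;
        if (rest < 10) text += "0";  // pad single-digit cents
        text += std::to_string(rest);
        return text;
    }

private:
    explicit Money(long long cents) : cents_(cents) {}
    long long cents_;
};
```

Code that only uses fromCents, the operators and toString would not need to change if the storage were later switched to something else.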
But again, I agree wholeheartedly with the main thrust of your post. A better example may be a function to calculate distance from velocity and time.
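For example (a sketch only; the alias names, the use of double, and the units in the comments are assumptions), here the three quantities really do have different units, so separate type names are justified:

```cpp
// Different units, so different type names make sense here.
using Velocity = double;  // e.g. metres per second
using Time     = double;  // e.g. seconds
using Distance = double;  // e.g. metres

// distance = velocity * time
Distance computeDistance(Velocity velocity, Time time) {
    return velocity * time;
}
```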
You make a very good point. I never went that step further because it's not necessary in my applications, as most of my numbers are abstract, unitless measures. Even COST can be anything (a distance cost, a time cost, a monetary cost, etc.), but that doesn't change the meaning of the program. I threw the PROFIT/LOSS example in just to make it clear, but I clearly didn't put enough thought into it.
In my applications, it's generally not possible to consistently relay meaning through parameter names, so "dollarType - dollarType" would still leave me clueless as to whether I'm calculating a PROFIT or a LOSS. Normally, I guess you could avoid this by consistently using the sign of the number to relay meaning (-dollarType is a cost, +dollarType is a value), but I lack the Consistency Gene and thus have to resort to these kinds of tricks.
Anyway, suggestion to OP: as doug4 has shown, there are other ways of doing it and I'm guessing there's no "best" way. In the end, it's all about making it easier on YOU. Your compiler doesn't give a shit about any of this.