This seems like a basic question, but I'm having trouble finding the answer by searching online.
I have arrays of the form vector<vector<vector<complex<double> > > > arrayName;
These are fairly large. At the moment, the largest are 128x128x101, but I plan on increasing this, possibly to 512x512x101 or more. Most of them are also complex, and there are several of these arrays in my program.
The thing is that, depending on user input, large sections of the code may be skipped using an "if" statement. I naively thought that I could just put all these parts of the code in "if" statements.
Unfortunately, I get 'undeclared identifier' errors, even though the declarations are in "if" statements that would also be run prior to the "if" statements where these arrays are used.
So, I either need to do away with the "if" statements around the initialisations completely, or I can declare the arrays outside the "if" statements and then initialise them inside.
The former option is a more attractive solution in terms of code readability, but I have no idea if it takes longer than simply declaring the arrays (for the case where these chunks of code are not run). Basically, I don't want to spend time setting 1654784 complex elements of each array to 0+0i if it takes longer than simply declaring it, when they are not going to be used later in the program due to user input.
Why not just declare it empty, and then resize it according to the user's input? That declaration takes essentially no time, because the vector has 0 elements; its internal initialisation amounts to no more than a few integers (or pointers) for the size and the reserved capacity.
When dealing with multidimensional arrays, you have 2 choices to improve the speed:
1- If the array is going to reach a fixed size at some point and not be resized after that, and you're using vectors, make sure you resize it only once. If you're using push_back to fill your arrays, call reserve first to avoid redundant memory allocations.
2- If you're going to have an array whose size changes dynamically, DON'T use vectors. Vectors store their elements contiguously, so growing a vector past its capacity reallocates the whole buffer and copies (or, since C++11, moves) every element. For the outer vector of a nested vector, that means relocating all of the inner vectors. Use deque instead (if you need operator[]) or list (if you have no problem accessing your elements with iterators).
deque is good, but accessing its elements is a little slower than with vectors. Adding elements at either end is faster, though, because internally a deque holds chunks of C arrays rather than a single array like vector does.
You're right. I was making an issue out of nothing. I was thinking that having the declaration followed by the "if" statement and then the initialisation would make it messy with the number of arrays involved here, but of course, putting them at the beginning of the program will be fine.
The arrays will not change size during any individual execution of the program, so resizing issues won't come into it.
I was looking at other aspects of optimization, and came across this:
In C, all variables must be declared at the top of a function. It seems natural to use this same method in C++. However, in C++, declaring a variable can be an expensive operation when the object has a non-trivial constructor or destructor. C++ allows you to declare a variable wherever you need to. The only restriction is that the variable must be declared before it's used. For maximum efficiency, declare variables in the minimum scope necessary, and then only immediately before they're used.
// Declare Outside (b is true half the time)
T x;
if (b)
x = t;
// Declare Inside (b is true half the time)
if (b)
T x = t;
Without exception, it's as fast or faster to declare the objects within the scope of the if statement.
So now I'm confused. When I tried to declare the arrays inside "if" statements, I got "undeclared identifier" errors when I tried to assign a value to the arrays in a later "if" statement. Does this mean that I can declare inside the "if" statement if the value is changed only within the scope of the "if" statement? It would require a fair amount of work to rearrange the code so that all of these parts fall within the scope of a single "if" statement.
All variables are scoped. I'm not really sure what you mean by "It would require a fair amount of work to rearrange the code so that all of these parts fall within the scope of a single "if" statement."
It's possible that I'm using incorrect terminology. By "scope", I mean the section of code that is executed if the condition of the "if" statement evaluates to "true".
For example:
if(a==1)
{
//scope
}
The code is hundreds of lines for the main() function, so I won't post it here, but what I have is basically a loop that creates 2 or 3 sets of arrays depending on user input. The first two sets are always created, and if the user wishes to create 3 sets, a third set is created in addition to the first two. There are 101 arrays in each set.
Then an algorithm is used to combine either two sets of arrays or three, again depending on what the user chose at the beginning. All arrays in all sets must already exist in order to run the algorithm, so this requires a second "for" loop containing another "if" statement.
for(size_t t=0;t<101;t++)
{
//initialize and compute arrays 1 and 2
if(choice==3)
{
//initialize and compute array 3
}
}
for(size_t t=0;t<101;t++)
{
if(choice==2)
{
//perform algorithm that utilizes 2 sets of arrays
}
else
{
//perform algorithm that utilizes 3 sets of arrays
}
}
But if I declare array 3 when I initialize it in the first "if" statement, the compiler doesn't recognize it later as having been declared. Considering the page I linked to earlier said that it is better to declare variables inside the "if" statement, I can only assume that all mentions of this variable then need to be inside that statement.
It is only "better" to declare variables inside the if statement if they're limited to the scope of the if block.
Yours aren't, so it's not "better" for you to declare them there.
@ballzac: from your example, I think you're interpreting scope wrongly. There is no single shared "for scope" or "if scope": every block gets its own scope, so two separate for loops don't share the same scope.
Scope is about hierarchy. A for loop in an if in a function in a class has access to all variables declared in:
a) the for loop itself.
b) the if around the for loop.
c) the function where the if is checked.
d) the class to which the function belongs.
e) the global scope (anything declared outside all functions and classes).
Thus, if I do this:
#include <iostream>
using namespace std;
int x = 6; // Global: visible in every function below.
void myFunction(void) {
    int y = 7;
    x += y; // That's okay: x is global, y is in the function scope.
}
// y = 3; // Not okay (commented out so the rest compiles): y belongs to myFunction() and doesn't exist outside of it.
void myOtherFunction(void) {
    // y = 12; // Not okay: y belongs to myFunction and doesn't exist outside of it.
    int y = 12; // Okay: as far as myOtherFunction is concerned, 'y' didn't exist yet.
    x += y; // Okay, because THIS 'y' belongs to myOtherFunction.
}
int main() {
    myFunction();
    myOtherFunction();
    cout << x; // Okay, because x is global.
    // cout << y; // Not okay, because neither of the 'y's exists here.
}
If you're ever unsure about scoping, make a small class with a print/cout in the constructor and destructor. You'll see when an object dies, i.e. goes out of scope.
Thanks to both of you. Gaminic, your example is extremely clear. Thank you for that. It explains everything I needed to know about that so succinctly.
I've been using C++ for about 18 months now (I'm more familiar with MATLAB), but only doing things in a way that I knew worked, never thinking about optimization. It's amazing to realize just how poorly I understand some of these basic aspects.
I wouldn't count on scoping for optimization. What TheDestroyer said about containers can have a big impact, but declaring a variable sooner than necessary isn't going to make much difference.
It's not so much that it will be declared sooner than necessary, but that it will always be declared regardless of whether it is later used or not.
I have no idea whether the size of an array affects how long it takes to declare it. But since I may be declaring 101 large complex arrays that are never used, it would have been worth changing if there were a substantial difference. It seems that can't be done, though, at least not without a major overhaul of the program structure.
Anyway, I've made much more impressive improvements (I originally had all three sets of arrays being calculated regardless of whether they are used or not) and have shaved about 20% of the processing time off the total, so for the moment I'm happy with the way it is. :)
If you truly want to delay instantiation (not declaration), you can create pointers to your container types at some scope outside any conditional statements that need them. Then, inside your control structure, use new to allocate the containers and assign the results to those outer pointers. Because the pointers live outside the control structure's scope, you can use the containers elsewhere.
This was your problem before if I'm not mistaken:
//Pseudo syntax for readability, disregard code that will not compile.
if(choice == 1)
{
created container 1.
}
if(choice == 2)
{
created container 2.
}
//Somewhere else outside of the scope of those control structures...
container1.doSomething(); //Trying to use container1 outside of the scope with which it was defined, resulting in 'undeclared identifier'
This is what I gather you were trying to do... To use this type of logic, you need something that is declared outside the control structures and therefore outlives their scope; in your case, a pointer. Using the above example again with pointers:
//Somewhere before any instantiation of the containers...
Container * pContainer = NULL; //Maybe you have more than one pointer I don't know...
if(choice == 1)
{
pContainer = new created container 1.
}
if(choice == 2)
{
pContainer = new created container 2.
}
//Somewhere else outside of the scope of those control structures...
pContainer->doSomething(); //You no longer have the 'undeclared identifier' and you also didn't instantiate the container until you wanted to.
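Also, in your case I think you are confusing initialization and declaration. A declaration just introduces a name to the compiler at compile time. A definition like the one below also creates the object, and that is where any run-time construction cost comes from. When you say...
//somewhere in code..
vector<vector<MySuperClass> > classvec; //This declares AND defines classvec, initializing it as an empty vector.
//Somewhere else....
classvec.push_back(blabla); //This is not initialization; it modifies an already-initialized vector.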
You will also need to free this memory (with delete) somewhere in your program, once you're done with the container.
The code is hundreds of lines for the main() function, so I won't post it here, but what I have is basically a loop that creates 2 or 3 sets of arrays depending on user input.
Whoa, think my brain blocked that out when I read it. Don't do massive main functions. Try to limit each function to 5~10 lines of code, unless you really have to.
It's impossible to write 100 consecutive lines correctly. By limiting yourself to 5~10 at once, you reduce the chances of errors. That being said, I just messed up a 1-line function a few minutes ago. Must limit self to 0.5-line functions!
When I started this project about 18 months ago, I got most of the code from another person who had worked on a similar project, and I have expanded and modified it to perform the tasks I need. Considering I don't understand C++ (and programming in general) all that well, I just mostly followed the format of the original code, but wrote my own functions as needed.
I'm sure I could reduce the size of the main() function substantially, but don't see how I could possibly reduce it to fewer than several hundred lines.
I don't want to waste your time with basic questions, so if you know of a good reference that explains this, I'd be happy to peruse it :)
clanmjc,
I was considering using references as another method of optimization. I don't know anything about pointers, so I will have to look into that. Thanks for your input :)
Considering I don't understand C++ (and programming in general) all that well, I just mostly followed the format of the original code, but wrote my own functions as needed.
Given that your experience is limited, I'll give you the simplest suggestion of all: since you know how to create functions, create functions that take over the instructions you already have in main.
My meaning in code:
int main() //Main has thousands of lines of code
{
//For 100 lines of code the following instructions do something with a container
//For the next 50 lines of code these instructions take the results of that container and do something else
//For 500 lines of code bla bla bla
//...
}
Now start taking those blocks of "functionality" and incorporate them into a function (passing parameters that are needed and returning results if needed), so main now looks like this...
int main() //Main now just delegates to a few functions
{
doSomethingWithContainer(param1, param2, param3);
doSomethingWithResultingContainer();
doSomethingForTheRestOfTheProgram(param1,param2);
//...
}
These are terrible names, by the way; they're just an example. What you end up with is a main that calls three functions (so only 3 lines of code), and those functions may in turn call other functions that each perform a task. (Don't take my numbers literally, and that includes having only 3 functions in main.)
Normally you would probably have classes etc., but it sounds like that would open an entire new can of worms for you. Either way, the way I clean up code is to look at it, and if I can't immediately tell what it's doing, I can probably functionize it, objectize it, or structureize it so that its intent is instantly obvious. Your experience is probably going to limit you to functionizing, which will be good (if you want to learn).
Thanks heaps for that advice. I'm working on my PhD in physics, so I don't have the time to be programming anything that will not save time in the long run, but having the code structure in that form should make it easier to follow and save me time. Objects etc. may be something I can incorporate into future code if I have the time to learn, and I think that would be great.
I'm also using programming as a functional part of my PhD, while not having any actual programming background. I find that improving my programming skills has helped me to greatly reduce the time it takes to get from an idea to results, just because it takes me less time to translate ideas to code, and less time to translate code to working code.
(It's sad that I still have to differentiate between two types of "code".)