Does access to vector elements scale with vector size?

Hi all. First, I would like to thank all the forum members for the insightful posts that have helped me tremendously with C++. I have been self-studying and using C++ in my research for the last couple of years, and this forum and website have been an incredible resource to help me along.

With that said, my question is whether the time to access vector elements via [] increases with the size of the vector. I am writing a program for doing a Monte-Carlo simulation on an abstracted configuration space of -1 and 1 representing the orientation of OH groups in my material. The program uses a class I wrote that takes care of some mathematics related to "cluster" expansions on a lattice. In the class, a 3-D array of numbers is read, in a particular format, into a vector. To do the mathematics I want, I then just map each element of the 3-D array to an element of the vector. Anyway, the trouble is that as I increase the size of the array in my simulation the performance really lags, even though I am not accessing any more elements of the array than before. Could the problem be related to optimization? When I try using the -O2 or -O3 flags I get errors about multiple definitions of the various functions in my classes.
I just learned about problems with multiple headers yesterday and thought I had eliminated any of that.
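For reference, my understanding of the pattern that avoids multiple definitions (include guards, plus marking functions that are defined in a header as inline) is roughly the following; the class and function names here are made up:

// cluster.h -- hypothetical header illustrating the pattern
#ifndef CLUSTER_H
#define CLUSTER_H

#include <cstddef>
#include <vector>

class Cluster {
public:
    double energy_at(std::size_t i) const;   // declared here, defined below
private:
    std::vector<int> grid_;
};

// Defined in the header, so it must be inline; otherwise every .cpp
// that includes this header gets its own copy of the function and
// the linker reports multiple definitions.
inline double Cluster::energy_at(std::size_t i) const
{
    return static_cast<double>(grid_[i]);
}

#endif // CLUSTER_H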



Any advice would be appreciated. The code seemed to be too long to submit, but I can post particular parts of it as appropriate.


Or, for the sake of completeness, you can post the entire code online at various places like pastie, github, etc. and just post the link here.
With that said, my question is whether the time to access vector elements via [] increases with the size of the vector.

The elements are stored consecutively, and the complexity of access is O(1).
Any given element n is exactly n * sizeof(value_type) bytes from the start.
The sizeof(value_type) is a compile-time constant.
Computing p + n * s is one multiply and one addition, no matter what values p, n, and s have. A 3D version of it has a few more operations, but is still O(1).
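A sketch of that arithmetic for a 3D array flattened into a vector (the names and dimensions are illustrative):

#include <cstddef>
#include <vector>

int main()
{
    std::size_t nx = 50, ny = 50, nz = 50;       // illustrative dimensions
    std::vector<double> v3(nx * ny * nz, 0.0);

    std::size_t i = 5, j = 7, k = 9;

    // Row-major 3D -> 1D mapping: two multiplies and two additions,
    // no matter how large nx, ny, nz are.
    std::size_t idx = (i * ny + j) * nz + k;

    double x = v3[idx];   // effectively *(v3.data() + idx): O(1)
    (void)x;
}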


However, there are hardware-dependent factors in the background. The caches within the CPU are fast but small; you probably cannot fit the entire vector into them. The CPU loads a block of bytes from slow main memory into its fast cache memory, and that block contains more than one value.
If the next instructions need values from the same block, those are already loaded -- cached.
If the next values are elsewhere, a different block has to be retrieved into the cache -- a cache miss.
A sub-block of a 3D matrix is probably not contiguous in memory. (That can be good or bad, depending on the data access pattern.)

It can be beneficial to rearrange the data in some ways, but CPU types differ; there is no one data layout strategy that would be optimal everywhere.
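As an illustration (a sketch, not a prescription), here is the same sum in two loop orders over an n x n row-major matrix; the first order matches the layout and reuses cached blocks, the second strides through memory:

#include <cstddef>
#include <vector>

double sum_all(const std::vector<double>& a, std::size_t n)
{
    double s = 0.0;

    // Cache-friendly: k varies fastest, matching the row-major layout,
    // so consecutive accesses fall in the same cached block.
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            s += a[i * n + k];

    // Cache-hostile: consecutive accesses are n doubles apart, so for
    // large n nearly every access can be a cache miss.
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t i = 0; i < n; ++i)
            s += a[i * n + k];

    return s;
}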


1. Are you sure that there is no other source of algorithmic complexity? Can you profile the job to show that the vector operations are the primary problem?

2. Is it possible to run that simulation in parallel? Threaded? MPI? GPGPU? They do have their downsides, but are very effective for some problems.
Following gunnerfunner's advice, I have made pastie files for the program:

http://pastie.org/10953953 for the MonteCarlo
http://pastie.org/10953959#2 for the cluster class
http://pastie.org/10953961 for the file class

MonteCarlo is the .cpp file, which has the cluster class in its header;
the file class is included in the header of the cluster class.

The loop in the MonteCarlo.cpp file that goes from ii to num_trials is where the Monte-Carlo takes place. The array, or grid, of configuration variables contained in the cluster object is accessed using i, j, k indices, which the cluster class functions then map to specific indices of the vector that is filled with the values of the array. The vector is initialized from a text file representation of the desired array in such a way that the (i-1)*j_sze + (k-1)*i_sze*j_sze + j - 1 element of the vector is the i, j, k element of the array, where i_sze, j_sze, and k_sze are the dimensions of the array.

In the main loop of my program, no matter the size of the array I use, I only ever access the same number of elements for each iteration, as the algorithm only requires looking at differences in energy depending on whether a site is 1 or -1. The energy difference is calculated from a local set of pairwise interactions. As it is now, the code runs slower and slower for larger array sizes.
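In code, the mapping is essentially this (the helper name is made up for illustration):

#include <cstddef>

// Hypothetical helper showing the mapping: the (i,j,k) element of the
// array, with 1-based indices, lives at this position in the vector.
std::size_t to_vector_index(std::size_t i, std::size_t j, std::size_t k,
                            std::size_t i_sze, std::size_t j_sze)
{
    return (i - 1) * j_sze + (k - 1) * i_sze * j_sze + (j - 1);
}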

To keskvierto: when you say profile, do you mean clock various parts of the code?
Yes, parts of it can be parallelized, but that may not be necessary if I can optimize it fairly well for serial execution.
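Something like the following is what I would try for clocking a section (just a sketch):

#include <chrono>
#include <iostream>

int main()
{
    using clock = std::chrono::steady_clock;

    auto t0 = clock::now();
    // ... the part of the Monte-Carlo loop being timed ...
    auto t1 = clock::now();

    std::chrono::duration<double> elapsed = t1 - t0;
    std::cout << "section took " << elapsed.count() << " s\n";
}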
I found the problem. I had called the wrong function in the .spC() function of the Cluster class: instead of calling sigMs_p() for cluster type 7, I called sigM_p(). The former performs a single cluster multiplication, while the latter performs a sum over the entire array for a certain cluster type.