What to use to store (and work on) lots of data?

Time appropriate greetings everybody!

I am currently working on a little program that has to be able to load, save and work on "large amounts" of data.
I mean thousands of integers, strings, floats, ....
(If they were individual variables ...)

So, my first thought was to use arrays for each "data-type", kind of like each array is a row in a table and the "index" is the column number.
And I can use vectors to resize the arrays at runtime.

Then I remembered that there was a thing in C++ called "structs".
They seem to do exactly what I want!
Have multiple "sets" of data that all had the same "variable types in them".
But I can't really find an (easy) way to dynamically "create new objects" (or variables, I have found both variants).

I have already been messing around with using arrays to do what I want to do, but it got messy and I thought I'd just completely "redo" everything from scratch.

So, should I just use arrays (I know how to work with arrays) or is this "struct" thing the better approach?
It looks like a class or structure would be the way to go.

But I can't really find a (easy) way to dynamically "create new objects" (or variables, I have found both variants).


Why not? Do you realize that a class or structure is a user-defined type (UDT) and that UDTs act basically the same as any other type? Lastly, you can have a vector/array of a UDT as well.

So, should I just use arrays (I know how to work with arrays) or is this "struct" thing the better approach?


I suggest you get used to using std::vector and classes if you want to work in C++. Trying to maintain a bunch of parallel arrays will get really cumbersome rather quickly.

basic structs and vectors are very, very simple.
look.

#include <string>
#include <vector>

struct stuff
{
    int i;
    double d;
    std::string s;
};

int main()
{
    std::vector<stuff> stuffz(100); // like array: stuff stuffz[100];
    stuffz[3].s = "hello world";    // access fourth array location, the string variable
}

structs can get more complicated and vectors can do a lot more, but this is the first cut at it in the most simple form, and even cut down to these bare elements it is enough to do what you asked.

thousands of items is not a lot of data. Or in terms of bytes, less than a gigabyte of data is not terribly interesting on modern PCs. If you get that large, you should consider doing things efficiently, and if it gets past 1/2 the target computer's memory in size, you want to start using files or a database or something instead of hogging all the ram.

the vector can grow to hold more items, review the push_back() function.
Thanks for all of the quick replies!

Yeah, like I said "lots of data" is relative.
I have done some work with programming embedded microcontrollers in C, and there, having a few MB of memory and a few kB of RAM is lots of space ...

(I am also working on building the hardware for an 8 bit processor system "from scratch", so yeah, I am more into the "low level hardware" side of things ...)


Oh!
I think I just "got it". So you can use that struct like a variable (that's why it's called VARIABLE, duh!).
So instead of int ExampleVariable or float ExampleVariable, I can have mystruct ExampleVariable.
And then use a . to "access" the members of that "variable".
And I can also have an array of variables of that "custom type" ...

This sounds stupid, but I think I just understood the very simplified basics of the "struct"-stuff ...

Thanks!
I'll be using both then: An array of structs!
Thanks very much!
Another quick question:

Is there a way to resize an array of those structs?
Kind of like you can an array of "regular type"?
Like this:

struct Unit
  {
    int A;
    int B;
  };

Unit UnitArray [1];  //Make an array with 1 "variable" of the type "Unit" for now ([1] is the element count, not the highest index)

//Have a function to resize the array to whatever size needed later

void ResizeUnitStorageArray(int NewSize)
  {
    UnitArray.resize(NewSize);
  }



Because if I try that, I get an error:
48:13: error: request for member 'resize' in 'UnitArray', which is of non-class type 'Unit [1]'
48 | UnitArray.resize(UnitsToGenerate);
| ^~~~~~
to be precise,
the struct is a user defined TYPE. I made a variable OF that type (a vector of them!). But it sounds like you get it now.


you can resize a vector, and push_back will do so automatically.
you cannot resize a normal array. the above example is a normal array, or C array may be a better name for it.

push_back will increase the size of the VECTOR for you.
arrays are really just handles to blocks of raw memory bytes. They have no object oriented methods or tools. Vector is an object oriented wrapper for them.
Ok, thanks for the clarification.

I will look into that push_back thing.

Until now I have always used resize() to change the size of a vector.

I always thought that every bit of text that had some square brackets after it was an array, like Example[i].
But I guess I was wrong ...

EDIT:
I wrote a simple program to set the size of the struct array to a user-specified value, put some random values into the member variables of the struct and then print out all the struct array entries.
And it works!
Thanks for the recommendation!
A std::vector can be sized when constructed, if you have an idea of the approximate number of elements you might be working with. You can add or subtract elements "on the fly" as needed.

Need more elements in a block? Resize it to the larger number of elements. You can also reduce the number of elements and the memory that takes.

A vector manages the memory for you, so you don't need to sweat the details. There are lots of things you can do with a std::vector that are consistent with the other C++ containers:
http://www.cplusplus.com/reference/vector/vector/

There are three different ways to access the elements:

1. operator[] - No bounds checking, so you can go out of bounds.

2. at() member function - Does check for out-of-bounds access. Is a bit slower than operator[].

3. iterators - Useful with the <algorithm> library functions. Can be used in for loops, especially range-based for loops.

Vectors take a lot of the drudgery out of dealing with varying amounts of data.
Yes, I just found out about the "no bounds checking".
I accidentally tried to read from [-1] and got a lot of "garbage" from somewhere in the RAM (or wherever ...) ....
You tried to read a memory location that is WAY outside your container. A vector's operator[] takes an unsigned index, so what does -1 become when converted to an unsigned integer? A nice big number, for sure. For an unsigned type with n bits it is 2^n - 1.

Going out of bounds is "undefined behavior."
https://en.cppreference.com/w/cpp/language/ub

A couple of things to note about at(): yes, it does bounds checking, but it won't stop you from trying to go out of bounds. An out-of-range index makes it throw a std::out_of_range exception. If you don't catch that exception your program will terminate. Because of the bounds checking it is marginally slower than operator[].

Undefined behavior, or a possible program crash. Going out of bounds is bad, period.
With C-style pointer indexing, a negative index can be valid (although not for std::vector/std::array), provided the resulting address is still inside the same array. Consider:

#include <iostream>

int main()
{
	int arr[5] {1,2,3,4,5};

	auto myarr = &arr[3];

	std::cout << myarr[-2] << '\n';
}


which displays 2 - as myarr points to the 4 so 2 items before the 4 is the 2.