Need help reading/sorting/writing data to and from a file.

Jan 29, 2019 at 6:30pm

closed account (Ezyq4iN6)

Let me start by saying I know how to read from a file and write back to it, but I am having trouble with picking the best way to go about doing these things. Please read on for the details.

I am practicing creating a game that can have custom scenarios by reading a file, think like dungeons and dragons, choose your own adventure, etc. Right now I am handwriting the attributes into the file, but I want to be able to make changes to it in the program.

So my 3 problems are this:
1. I need to efficiently read the data in from the file and place it into
easy to access structures.
2. I need to be able to find and convert that read data to variables to
use in the game (like hit points, town names, etc.)
3. I need to be able to edit values and store them back into the file.

My file layout:

Game Attributes
Area Attributes
Item Attributes

Since this is practice done as a learning example, I have made each section a fixed length, just as every item is a fixed length for ease. In other words, I allow 1 byte for a timer in an area, 30 bytes for the name, etc. and that won't change. So the length of every field and all 3 main sections is known.

So here are my thoughts on each section:

1. READING FILE
When I read in the data, I tend to think about either reading it all into one long character array, or 3 character arrays. My reasoning is to minimize the amount of reads.

2. USING DATA
Here is where it gets complicated. Should I create a struct or class for each section's variables and then harvest them from the character array, or is there another way? Also, what is an efficient way to grab 50+ variables from a section? They are mixed integers and strings, so my main theory is to have a 2D array for each section that marks the starting character and length of each field in the array. However, I would still need 50+ functions to grab all of those specific values and then store them into specific variables. I guess another option would be to pass the array to an object constructor and then just grab all of the data there, but then an initializer list wouldn't work. So yeah, I'm really kind of lost here.

3. WRITING DATA
So once I have made some edits to the values, I have to write them back. Since I should be able to change any value, I feel that I have a similar problem to #2. In other words, how do I efficiently cram all those new variables back into the character array(s) they were snatched from before overwriting the old file with the new data? Again, I just don't feel the 50+ functions way is efficient, and my constructor idea still seems crude.

I don't have any code at present worth adding, but as help comes in, I will add code (at least parts if it becomes long) to clarify things or to ask new questions in reference to the help I receive. Thanks!

Jan 29, 2019 at 7:12pm

jonnin (11494)

1. I need to efficiently read the data in from the file and place it into
easy to access structures.

The simple way to do this is a binary file of structs. Watch out for containers in the struct; avoid them if you can, even to the point of using C-strings and arrays instead of vectors, strings, etc. You can promote back to higher-level tools a level above this; it's just for the I/O. If you decide to use containers that have pointers, you have to break up your read and write statements to handle that, so it's really just as easy to use simpler data types up front and avoid that aggravation.

2. If you did the above, the data lands in the fields of the structs, which are easily moved to local variables, etc.

3. Update the struct and write it back. You can update only one record or all the records at once, as you need.

Jan 29, 2019 at 8:33pm

AbstractionAnon (6954)

I guess I'm missing why you think you need a function per variable to read and a second function per variable to write.

If you think about your attributes, they break down into three types (Game, Area, Item) and three basic data types (int, string, double).

I think you need to give some thought to the classes in your program.
Typically, each of your classes that you want to save and restore will have functions to do that.
i.e. Your Game class will have a Game.Save(ostream &) function that is responsible for writing all Game-related data out to a file. Likewise, you would have a Game.Restore(istream &) function that is responsible for reading back the file (or portion of it) that was written out.

A Game class is rather easy since it is a singleton. Your Area class is a little harder since you will probably have a variable number of areas. This can be handled by having a Game attribute that is the number of areas.

You don't need to identify the name of each attribute in the file, since you would be writing out the elements in a known order and reading them back in the same order.

However, this raises the question of versioning the file, since any change to the file layout would introduce incompatibilities.

Here's a very simple example:

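#include <iostream>

class Game
{
    int num_players;
    int num_areas;

public:
    // Write every Game attribute in a fixed order, separated by spaces.
    void Save(std::ostream & os) const
    {
        os << num_players << " ";
        os << num_areas << " ";
    }
    // Read the attributes back in the same order they were written.
    void Restore(std::istream & is)
    {
        is >> num_players;
        is >> num_areas;
    }
};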

If you want to be fancy, you can overload the << and >> operators instead of using Save() and Restore().
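As a rough sketch of that operator-overloading variant: the struct below is a simplified stand-in for the Game class (public members, for brevity), and round_trip is a hypothetical helper that uses an in-memory stream in place of a file.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>

// Simplified stand-in for the Game class, with public members for brevity.
struct Game
{
    int num_players = 0;
    int num_areas = 0;
};

// Write the fields in a fixed order, separated by spaces.
std::ostream& operator<<(std::ostream& os, const Game& g)
{
    return os << g.num_players << ' ' << g.num_areas << ' ';
}

// Read the fields back in the same order they were written.
std::istream& operator>>(std::istream& is, Game& g)
{
    return is >> g.num_players >> g.num_areas;
}

// Hypothetical helper: save to an in-memory stream, then restore from it.
Game round_trip(const Game& original)
{
    std::stringstream buffer;
    buffer << original;
    Game restored;
    buffer >> restored;
    return restored;
}
```

With these in place, `file << game;` and `file >> game;` work on a std::ofstream or std::ifstream just as they do on the stringstream here.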

Last edited on Jan 29, 2019 at 8:34pm

Jan 29, 2019 at 9:16pm

poteto (525)

JSON, INI, or something similar.

The only problem with using raw structs is that the file isn't cross-platform, and it may not be compatible with new versions of your compiler or with different target settings (like optimized versus debug builds). You may need compiler-specific macros to make your struct layout consistent, and you can't easily have unbounded strings (which is important for descriptions). If you can be happy with that, then go ahead; it is really fast (but remember you don't pay at runtime for what you don't use, so worry more about what goes on at runtime, and benchmark before choosing instead of assuming).
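To make the layout concern concrete, here is a small sketch. AreaRecord and both helper functions are hypothetical names; the point is only that the compiler may insert padding between fields, so sizeof of the struct can be larger than the sum of its members, and the amount is implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record: a 1-byte timer followed by a 4-byte count.
struct AreaRecord
{
    std::uint8_t timer;
    std::uint32_t item_count;
};

// Bytes the fields need when written one by one, with no padding.
constexpr std::size_t packed_size()
{
    return sizeof(std::uint8_t) + sizeof(std::uint32_t);
}

// Bytes the compiler added for alignment; the value is implementation-defined.
constexpr std::size_t padding_bytes()
{
    return sizeof(AreaRecord) - packed_size();
}
```

On many common compilers padding_bytes() comes out as 3, but nothing guarantees that, which is why a file written as raw structs by one build may not match another.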

Jan 29, 2019 at 10:50pm

closed account (Ezyq4iN6)

@jonnin

What exactly do you mean by a binary file of structs? I know that a file can contain data meant to be interpreted as binary instead of ASCII, but I don't understand the struct part. I'll avoid using vectors, but thanks for warning me about the strings too. I'll make sure all of my strings stay character arrays. What is the best way to read in a file to structs? fscanf? Also, once you have that data in the structs, where in the program should the conversion of the character arrays to integers take place? With the writing back I guess I imagined something like a get and set function for each variable, so I could use that. However, don't you have to overwrite the entire file with all data once you are ready to save?

@AbstractionAnon

I used get/set functions for private object values in the past, so I guess I figured I needed some kind of structure like that to work with the values.

To your point about each class having a save function, I thought when you wrote a file you had to rewrite the whole thing or append to the end, but couldn't actually go in and just edit parts of it. For this reason I planned only on having one save option that just told the program to write all sections, in order, back to the file. I envisioned 1 or 3 character arrays read from the file, those values would then be extracted into variables somewhere. When it was time to save, they would be inserted back into the arrays, and then those arrays would be written to the file, which I assume destroys the old one and creates a new one by the same name with the new values. Again, I'm not saying this is the correct way, it's just what was in my head at the time.

Thanks for the example, but I'm not sure I understand completely. Is it ok to read each field separately like that? I mean, assuming I had a lot of fields wouldn't that slow the load time immensely? Also, how do you deal with writing the other sections (Area, Item) to the file then? Sorry, it's confusing for me.

@poteto
Thank you for pointing out the possible platform issues, as I may one day like to move this between Windows and Linux, and even use it with other compilers. Right now this is just for practice so I can see what can be done for future ideas, but I will keep that in mind, thanks!

Jan 30, 2019 at 2:49am

jonnin (11494)

Here is a better tutorial than I have time to write. See the bottom, where he uses a class (same as a struct in terms of what I was saying to do): https://courses.cs.vt.edu/cs2604/fall02/binio.html

No, don't use the C file functions; C++ fully supports this. Just watch for the string/vector/pointer problem. Once you understand the above we can talk about those issues if you want to go there.

Binary data is not converted to text or back; it's the raw bytes. A 64-bit int is 8 bytes in a binary file, even if it holds the value 1000000000, which is more than 8 ASCII characters (and it takes up 8 bytes for 0, too).
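A quick way to see the size difference is to write the same value both ways into in-memory streams and compare lengths. This is only a sketch; text_size and binary_size are made-up helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Bytes used when the value is formatted as decimal text.
std::size_t text_size(std::int64_t value)
{
    std::ostringstream os;
    os << value;
    return os.str().size();
}

// Bytes used when the value's in-memory representation is written raw.
std::size_t binary_size(std::int64_t value)
{
    std::ostringstream os(std::ios::binary);
    os.write(reinterpret_cast<const char*>(&value), sizeof value);
    return os.str().size();
}
```

Here text_size(1000000000) is 10 and text_size(0) is 1, while binary_size is 8 for both.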

Last edited on Jan 30, 2019 at 2:52am

Jan 31, 2019 at 6:46pm

closed account (Ezyq4iN6)

@jonnin

Thanks for the link, it helps a lot. Here's a bit of the code I have so far that reads in the first section only. I am including the file contents below the code.

#include <fstream>
#include <iostream>
#include <string.h>
#include <math.h>

int char_to_num(char * const convertee, int char_size)
{
  int sum = 0;
  for (int i = 0; i < char_size; i++)
  {
    int multiplier = 1;
    for (int j = char_size - 1; j > i; j--)
      multiplier *= 10;
    sum += multiplier * (convertee[i] - '0');
  }
  return sum;
}

struct World
{
  char name[50];
  char width[4];
  char height[4];
  char starting_x[4];
  char starting_y[4];
  char total_areas[3];
  int w;
  int h;
  int start_x;
  int start_y;
  int area_count;
};

int main()
{
  // Opened as binary in case I use that to save space for numbers later.
  std::ifstream file("game.dat", std::ios::binary);

  World new_world;

  // Read in the character bytes from the file
  file.read(new_world.name, sizeof(new_world.name));
  file.read(new_world.width, sizeof(new_world.width));
  file.read(new_world.height, sizeof(new_world.height));
  file.read(new_world.starting_x, sizeof(new_world.starting_x));
  file.read(new_world.starting_y, sizeof(new_world.starting_y));
  file.read(new_world.total_areas, sizeof(new_world.total_areas));

  new_world.w = char_to_num(new_world.width, sizeof(new_world.width));
  new_world.h = char_to_num(new_world.height, sizeof(new_world.height));
  new_world.start_x = char_to_num(new_world.starting_x, sizeof(new_world.starting_x));
  new_world.start_y = char_to_num(new_world.starting_y, sizeof(new_world.starting_y));
  new_world.area_count = char_to_num(new_world.total_areas, sizeof(new_world.total_areas));

  std::cout << "World Name: " << new_world.name << "\n";
  std::cout << "World Width: " << new_world.w << " m\n";
  std::cout << "World Height: " << new_world.h << " m\n";
  std::cout << "Character X: " << new_world.start_x << " m\n";
  std::cout << "Character Y: " << new_world.start_y << " m\n";
  std::cout << "Total Areas: " << new_world.area_count << "\n";

  file.close();

  return 0;
}


// game.dat
123456789012345678901234567890123456789012345678901111222233334444555

Jan 31, 2019 at 7:43pm

jonnin (11494)

<cmath> (math.h puts its names in the global namespace rather than std, and may break something in a big program).
<cstring> (same)

Don't forget that C-strings need +1 space for the terminating zero.
If you need to write 4 letters, you need 5 locations. It's possible to not store the ending zero, but you would have to put it back in the char arrays when you read, which is extra code to save a few bytes. Don't go there.

It's a good start, but you do realize you can do this, right:

file.read(&new_world, sizeof(World));

Dunno what char_to_num does, but if it's converting text numbers to numbers, you do that with C-strings via sprintf or atoi/atof (sprintf is number to text; atoi and atof are text to int and text to floating point, respectively), or shovel your C-strings into strings at a higher level in your class structure and use C++ approaches for this.

Last edited on Jan 31, 2019 at 7:53pm

Feb 1, 2019 at 3:53pm

closed account (Ezyq4iN6)

I'll remove the math and string includes from the program. What alternative should I use for math functions and strings in the future?

I read about adding the '\0' to the end of each character string, but would that cause problems when writing back to the file? Also, do they need to be in the data file too?

That read statement only works for reading into structs with character arrays, right? I ask because otherwise I would read it straight into the integers too.
(Edit: Tried this, but it didn't work for me. Here's the error: no matching function for call to 'std::basic_ifstream<char>::read(World*, unsigned int)')

char_to_num does indeed convert the character arrays for each integer into actual integers. I can switch it to sprintf or atoi instead. Is either of those preferred? Also, is there an easy way to return the data from an integer to a character array so that I can easily write the data back to the file?

Thanks for all your help thus far.

Last edited on Feb 1, 2019 at 3:59pm

Feb 1, 2019 at 4:31pm

jlb (4973)

char_to_num does indeed convert the character arrays for each integer into actual integers.

Why are you storing the numbers as characters? Using characters will take up much more space than just "writing" and "reading" the numbers as numbers.

I read about adding the '\0' to the end of each character string, but would that cause problems when writing back to the file?

If you write a character string to the file using the "write" method then the end of string character should still be present since the "write" method will write the entire array.

That read statement only works for reading into structs with character arrays, right?

The write()/read() methods can be used to read and write a structure that contains both C-strings and numeric values.

(Edit: Tried this, but it didn't work for me. Here's the error: no matching function for call to 'std::basic_ifstream<char>::read(World*, unsigned int)')

Yes, it appears that you failed to cast the World* to char*. Also, don't forget you must first successfully write the information before you can successfully read it.

Something like the following (untested) (After removing all the unnecessary C-strings):

World new_world;
// Fill in the structure.
std::ofstream fout("game.dat", std::ios::binary);
fout.write(reinterpret_cast<char*>(&new_world), sizeof(new_world));
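
A fuller round trip might look like the sketch below. It assumes a hypothetical numeric version of World (name as a char array, sizes as plain ints), and it takes streams as parameters so the same functions work with a std::ofstream or std::ifstream opened with std::ios::binary, or with the in-memory stream used by the round_trip helper.

```cpp
#include <cassert>
#include <cstring>
#include <iostream>
#include <sstream>
#include <type_traits>

// Hypothetical numeric version of World: the name stays a char array,
// the numbers are plain ints instead of digit characters.
struct World
{
    char name[50];
    int width;
    int height;
};

// Raw read()/write() is only safe for types without pointers or containers.
static_assert(std::is_trivially_copyable<World>::value,
              "World must be trivially copyable");

// Write the object's bytes; the cast only tells write() to treat them as bytes.
bool save_world(std::ostream& out, const World& w)
{
    out.write(reinterpret_cast<const char*>(&w), sizeof w);
    return out.good();
}

// Read the same number of bytes back into an existing object.
bool load_world(std::istream& in, World& w)
{
    in.read(reinterpret_cast<char*>(&w), sizeof w);
    return in.good();
}

// Round trip through an in-memory binary stream standing in for the file.
World round_trip(const World& original)
{
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    World copy{};
    bool ok = save_world(buffer, original) && load_world(buffer, copy);
    assert(ok);
    (void)ok;
    return copy;
}
```

The file produced this way is only guaranteed to be readable by a program built with the same struct layout.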

I can switch it to sprintf or atoi instead. Is either of those preferred?

Only if you're writing C code. And note atoi() is also usually frowned upon in C code since it will silently fail. If you're using C++ then you should prefer stringstreams or stoX().
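
For illustration, here is a minimal sketch of both C++ approaches. parse_int and parse_int_stream are hypothetical helper names; std::stoi and std::istringstream are the standard facilities being shown. Unlike atoi, both report failure instead of quietly returning 0.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Convert text to int with std::stoi, reporting failure to the caller.
bool parse_int(const std::string& text, int& out)
{
    try
    {
        std::size_t used = 0;
        int value = std::stoi(text, &used);  // skips leading whitespace
        if (used != text.size())
            return false;                    // trailing junk such as "12abc"
        out = value;
        return true;
    }
    catch (const std::invalid_argument&) { return false; }  // no digits at all
    catch (const std::out_of_range&)     { return false; }  // does not fit in int
}

// The same conversion with a string stream (trailing junk is not checked here).
bool parse_int_stream(const std::string& text, int& out)
{
    std::istringstream is(text);
    int value = 0;
    if (!(is >> value))
        return false;
    out = value;
    return true;
}
```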

Also, is there an easy way to return the data from an integer to a character array so that I can easily write the data back to the file?

The easiest way would be avoid converting the numbers to and from strings in the first place.

Feb 1, 2019 at 7:00pm

jonnin (11494)

Only if you're writing C code. And note atoi() is also usually frowned upon in C code since it will silently fail. If you're using C++ then you should prefer stringstreams or stoX(). -- The easiest way would be avoid converting the numbers to and from strings in the first place.

Right. To clarify, I would have your struct being put inside a c++ class that promotes the char arrays back to strings, and use the above to do the conversion. If you NEED to work with the char arrays a little, the C functions can do that, but best avoided (esp if you are not well versed in them). Converting to and from text for numbers is one of the slowest things that is commonly done all over (many peoples') code. Do as little of it as possible.

The cast to char* is a little clunky but necessary for the struct read/write operations. Think of it as a cast to bytes, not a cast to characters. THAT is what it is really doing: it's a low-level operation and it's asking you "gimme a block of bytes, tell me how many bytes there are, and I'll read/write it". Char* is the block of bytes. And that is my mistake, I forgot the cast in my quick example too. It's easy to forget, but the compiler is quite happy to tell you all about it every time.

Integer to char array is sprintf. But you don't need this. Keep it as integers all the time, and let cout (or similar) do the dirty work on demand. If you absolutely find you must do this, it looks like this: sprintf(array, "%i", integervar); Doubles get trickier; my go-to for those is "%1.20f", but you probably want to tailor the double format to your needs, and there are 5 or 6 codes and nearly infinite formatting you can apply here.
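
If the text really has to be a fixed number of characters, one possible sketch uses std::snprintf with a zero-padded width. to_fixed_width is a hypothetical helper, and returning an empty string on overflow is just one way to signal that the value does not fit.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format value as exactly `width` characters, zero-padded on the left.
// Returns an empty string if the value needs more than `width` characters.
std::string to_fixed_width(int value, int width)
{
    char buffer[32];
    if (width <= 0 || width >= static_cast<int>(sizeof buffer))
        return std::string();
    // "%0*d" takes the field width from the argument list.
    int written = std::snprintf(buffer, sizeof buffer, "%0*d", width, value);
    if (written != width)
        return std::string();
    return std::string(buffer, static_cast<std::string::size_type>(written));
}
```

For example, to_fixed_width(50, 3) gives "050", while to_fixed_width(1000, 3) gives an empty string because four digits cannot fit in three characters.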

Last edited on Feb 1, 2019 at 7:03pm

Feb 4, 2019 at 3:36pm

closed account (Ezyq4iN6)

@jlb
I'm still relatively new to C++, so I assumed the only way to read in from a file was to read in the bytes sequentially as a character array. I did not know that I could read the numbers, because since many of the number fields take up a different amount of bytes ie. (10, 200, 3000, etc.), I didn't know how to tell the program to expect a certain number of bytes for an integer or float.

Ok, so if I used a \0 after reading I should definitely make sure I write back with one less character to drop the \0.

How do I read into an int in the struct if say this is my file info:
SOME STRING25ANOTHER STRING500

The 25 is 2 characters, and the 500 is 3 characters. How does the read know to grab 2 characters one time and 3 the next time?

I didn't know about reinterpret_cast, but I added that and now it works for reading in the characters.

Since I am using C++, I'll read up on stoX() and stringstreams to see how they work.

@jonnin
So if char arrays are not good and as mentioned by jlb I can read the numbers in directly as numbers, what code would I use to read in the file strings to strings and the numbers to ints (or floats later if needed)? I am fine adding binary data later, but I would like to know how to do it from ASCII text too so that I can make some programs where the data can be easily edited by hand in a text editor. Also, I guess I would need to know how to write it back as ASCII as well.

I'm fine using whatever code is needed to read in the data correctly and it doesn't have to be char strings, as all I care about it getting the data, being able to edit it, and then save it back in the same way I retrieved it.

Thanks for all the help so far.

Feb 4, 2019 at 4:41pm

jlb (4973)

I did not know that I could read the numbers, because since many of the number fields take up a different amount of bytes ie. (10, 200, 3000, etc.), I didn't know how to tell the program to expect a certain number of bytes for an integer or float.

In a binary file numbers take a fixed number of bytes to hold any number that can be represented by the type. For example an int value of 1 is held in the same number of bytes as 1000000.

Ok, so if I used a \0 after reading I should definitely make sure I write back with one less character to drop the \0.

Why would you want to do that? The end of string character '\0' is what makes a string a string.

How do I read into an int in the struct if say this is my file info:

In binary mode you need to know exactly how the information has been written to the file. The read operation needs to read the same number of bytes in the same order.

By the way this: SOME STRING25ANOTHER STRING500 is not a binary representation, but rather a text mode representation.

A binary representation requires special programs to "read" since it is not considered human readable. Here is what a binary representation looks like:

00000000   54 68 65 20  66 69 72 73  74 20 73 74  72 69 6E 67  The first string
00000010   00 00 00 00  00 00 00 00  00 00 00 00  00 00 0A 00  .
00000020   00 00 54 68  65 20 73 65  63 6F 6E 64  20 73 74 72  ..The second str
00000030   69 6E 67 00  00 00 00 00  00 00 00 00  00 00 00 00  ing.
00000040   D0 07 00 00                                         ....

What was written to the file was "The first string" (actually an array of char 30 bytes in size), followed by the number 10 (0A 00 00 00), followed by "The second string" (again an array of char 30 bytes long), followed by the number 2000 (D0 07 00 00). Note that both numbers are ints, which take 4 bytes on my system.
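
A layout like this one can be produced by writing each field separately, as in the sketch below. The helper names are made up, std::int32_t is used to pin the integer size at 4 bytes, and the byte order inside each integer depends on the machine (the dump above is little-endian).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sstream>
#include <string>

// Write text into a zero-filled field of exactly `size` bytes,
// truncating if needed so at least one terminating zero remains.
void write_text_field(std::ostream& out, const std::string& text, std::size_t size)
{
    assert(size > 0);
    std::string field(size, '\0');
    text.copy(&field[0], size - 1);
    out.write(field.data(), static_cast<std::streamsize>(field.size()));
}

// Write the 4 raw bytes of the integer in the machine's native byte order.
void write_int_field(std::ostream& out, std::int32_t value)
{
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
}

// Build the record in memory: 30 + 4 + 30 + 4 = 68 bytes.
std::string make_record()
{
    std::ostringstream out(std::ios::binary);
    write_text_field(out, "The first string", 30);
    write_int_field(out, 10);
    write_text_field(out, "The second string", 30);
    write_int_field(out, 2000);
    return out.str();
}
```

Reading it back means reading the same field sizes in the same order: 30 bytes, 4 bytes, 30 bytes, 4 bytes.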

I didn't know about reinterpret_cast, but I added that and now it works for reading in the characters.

Well you really don't need reinterpret_cast when dealing with arrays of characters.

Since I am using C++, I'll read up on stoX() and stringstreams to see how they work.

Don't forget that those functions work with C++ strings not arrays of char or C-strings.

Edit: By the way unless you really need binary mode I suggest you steer clear of binary and just stick with text mode using some kind of separator to separate your data (one item per line is the easiest).

Last edited on Feb 4, 2019 at 4:47pm

Feb 6, 2019 at 1:41am

closed account (Ezyq4iN6)

@jlb
I will definitely aim to have future numeric values in my files written in binary for the sake of making the file easier for the program to read. Since you mentioned that it wouldn't be human readable, I would have to design a program to put the values into the file in the first place.

As for the \0, I guess that's similar to the binary issue where I would want to have an editor program to attach that to all strings when it is written to the file.

As you mentioned in your last part, I'll probably stick with text mode for this current project I'm working on, mainly because I want to be able to easily hand edit the input file.

However, I am still a bit confused about reading in text mode. I like the idea of a separator character to keep the fields apart, but what would be the code to read in something like that? Would I read the file character by character until I saw that separator character and then skip ahead for the next field?

Also, I wanted to try one other way similar to my initial example. Assuming I give an exact character length for each field that never changes, how could I read that in?

Ex.
World name is 15 bytes, world width is 3, world height is 3, start area is 10:
World Name 10050 Mountains

I want to capture "World Name" as a string (or char array if better) for the world name, 100 as an integer for width, 50 as an integer for height, and "Mountains" as a string for the start area name.

Also, if I were to say change 100 to 200 in the program, and "Mountains" to "Plains," can that easily be written back? My current method has been reading in everything to set character array widths, then converting the arrays to numbers, then writing them back to arrays, and then writing all of the arrays back in order.

Just as an addendum, the reason I am doing it this way is that one of my professors said that there are many legacy files and also header parts to files that are done this way, so I am trying also learn how to read and write data like this so that I may be able to work with files like that in the future, but I am using my game example to make the learning more fun. :)

Feb 6, 2019 at 4:53am

jlb (4973)

I will definitely aim to have future numeric values in my files written in binary for the sake of making the file easier for the program to read.

Using binary mode won't necessarily make a file easier to read. In fact, IMO, binary mode makes things more difficult because it is harder to tell what the file contains. Also binary mode is not portable so if you write the files on one system there is no guarantee that you can easily transfer the contents to another system.
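
One common way to reduce the portability problem for integers is to fix both the size and the byte order yourself instead of writing the in-memory bytes. The sketch below is only an illustration with made-up function names; it always stores a 32-bit unsigned value least significant byte first, whatever the machine does natively.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <sstream>
#include <string>

// Write a 32-bit value as 4 bytes, least significant byte first.
void write_u32_le(std::ostream& out, std::uint32_t value)
{
    for (int shift = 0; shift < 32; shift += 8)
        out.put(static_cast<char>((value >> shift) & 0xFF));
}

// Read 4 bytes written by write_u32_le; returns false on a short read.
bool read_u32_le(std::istream& in, std::uint32_t& value)
{
    std::uint32_t result = 0;
    for (int shift = 0; shift < 32; shift += 8)
    {
        int byte = in.get();
        if (byte == std::char_traits<char>::eof())
            return false;
        result |= static_cast<std::uint32_t>(byte & 0xFF) << shift;
    }
    value = result;
    return true;
}

// Convenience helper: the encoded bytes as a string.
std::string encode_u32_le(std::uint32_t value)
{
    std::ostringstream out(std::ios::binary);
    write_u32_le(out, value);
    return out.str();
}
```

This covers integers only; floating-point values and struct padding need their own decisions.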

As for the \0, I guess that's similar to the binary issue where I would want to have an editor program to attach that to all strings when it is written to the file.

If you write a string to a file you should write the end of string character to the file, that end of string character is what makes a string a string and not just an array of char.

As you mentioned in your last part, I'll probably stick with text mode for this current project I'm working on, mainly because I want to be able to easily hand edit the input file.

You are probably better off sticking with text mode for most everything. Reserve binary mode for special occasions.

However, I am still a bit confused about reading in text mode.

What are you confused about, exactly?

I like the idea of a separator character to keep the fields apart, but what would be the code to read in something like that?

That really depends on what separator you use to separate your data and on the data itself. The easiest separator to deal with is the newline character (this means each field is on its own line).

Would I read the file character by character until I saw that separator character and then skip ahead for the next field?

Usually not; reading character by character is very slow and error-prone. You would normally want to read an entire piece of data at a time.
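
As a sketch of the one-field-per-line approach: the World struct, its field order, and the function names below are assumptions for illustration. Each std::getline call reads one whole field, and the numbers are converted from their line of text.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical record with one field per line in the file.
struct World
{
    std::string name;
    int width = 0;
    int height = 0;
    std::string start_area;
};

// Read one integer that sits alone on its own line.
bool read_int_line(std::istream& in, int& out)
{
    std::string line;
    if (!std::getline(in, line))
        return false;
    std::istringstream fields(line);
    return static_cast<bool>(fields >> out);
}

// Read the fields in the order they were written.
bool load(std::istream& in, World& w)
{
    return std::getline(in, w.name)
        && read_int_line(in, w.width)
        && read_int_line(in, w.height)
        && std::getline(in, w.start_area);
}

// Write the fields back, one per line, in the same order.
void save(std::ostream& out, const World& w)
{
    out << w.name << '\n'
        << w.width << '\n'
        << w.height << '\n'
        << w.start_area << '\n';
}
```

Because std::getline keeps spaces inside a line, a name like "World Name" survives intact. Note that it only strips '\n', so a file saved with Windows line endings and read elsewhere may leave a '\r' at the end of each string.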

Also, I wanted to try one other way similar to my initial example. Assuming I give an exact character length for each field that never changes, how could I read that in?

First, how are you going to write that file and ensure that all the sizes are fixed? Remember, reading requires something to read, so work on writing the file first.

World name is 15 bytes, world width is 3, world height is 3, start area is 10:
World Name 10050 Mountains

Well first you need to make sure each field is the proper width. In your example "World Name" is only 10 characters (bytes). Next your width happens to be 3 in this case, but remember that means that you can only have a width of 100 to 999, is that what you envision? Next look at the height, you say it is 3 bytes but you're only showing two characters (bytes). So the first thing, if you're going to go this route, is to figure out how to write the file with the proper format.
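
One possible sketch of that fixed-width text format, assuming the widths from the example (15, 3, 3, 10), space padding, and hypothetical names throughout. The writer pads every field to its width, and the reader slices the line at the same offsets.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <stdexcept>
#include <string>

struct World
{
    std::string name;
    int width = 0;
    int height = 0;
    std::string start_area;
};

// Remove the trailing spaces used as padding.
std::string rtrim(const std::string& s)
{
    std::size_t end = s.find_last_not_of(' ');
    return end == std::string::npos ? std::string() : s.substr(0, end + 1);
}

// Slice a 31-character record: name 15, width 3, height 3, start area 10.
bool parse_record(const std::string& line, World& w)
{
    if (line.size() != 31)
        return false;
    try
    {
        w.name = rtrim(line.substr(0, 15));
        w.width = std::stoi(line.substr(15, 3));   // stoi skips leading spaces
        w.height = std::stoi(line.substr(18, 3));
        w.start_area = rtrim(line.substr(21, 10));
    }
    catch (const std::exception&)
    {
        return false;                              // a number field had no digits
    }
    return true;
}

// Pad each field to its width; numbers are assumed to be within 0..999.
std::string format_record(const World& w)
{
    std::ostringstream out;
    out << std::left << std::setw(15) << w.name.substr(0, 15)
        << std::right << std::setw(3) << w.width
        << std::setw(3) << w.height
        << std::left << std::setw(10) << w.start_area.substr(0, 10);
    return out.str();
}
```

Changing a value is then just a matter of editing the struct and calling format_record again; the padding keeps every field at its fixed position.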

Also, if I were to say change 100 to 200 in the program, and "Mountains" to "Plains," can that easily be written back? My current method has been reading in everything to set character array widths, then converting the arrays to numbers, then writing them back to arrays, and then writing all of the arrays back in order.

Well look at all that conversion you have to do every time you want to read a file, seems quite complex and generally a waste of time. You really should strive to read the file once, storing the information into a data structure for continued use. Then at the end of the session (and probably other logical times) re-write the file with the current information before exiting the program.

Just as an addendum, the reason I am doing it this way is that one of my professors said that there are many legacy files and also header parts to files that are done this way, so I am trying also learn how to read and write data like this so that I may be able to work with files like that in the future,

That may be true but if you want to learn this you really should find one of those file formats and actually try to write and read that particular format. For example a .bmp file is a good example of a binary file format that has a header and data section.

Feb 7, 2019 at 5:21pm

closed account (Ezyq4iN6)

I never considered the portability issue when using binary. I actually use my programs on both a Win 10 and Ubuntu 16.04 computer, so would using binary be a poor choice to move between these systems?

For \0 I didn't realize it had to be manually added for strings, just character arrays. I will keep that in mind when writing strings to a file.

I'll take your advice and stick with text mode.

The newline character makes a lot of sense as a delimiter (is that what it's called?). It would sure make my files easier to read by a person. I would guess that using Notepad++ or another text editor would insert the correct newline character if I hit Enter/Return after a line, correct? This method also makes it much easier, since I don't have to read characters one at a time, but rather a line at a time. I guess getline() is what is used?

As to what I said about having a file with fixed length fields, what I meant is that I will pad the fields with spaces for unused characters. So for example that 50 I used for the second number would actually be [space]50. Padding the empty spaces with 0's for the numbers is also an option. I guess the extra spaces I added in my example got erased. I would also check to make sure I didn't erase spaces between characters inside of string fields, so "World Name" wouldn't become "WorldName".

My thought is that if I have something like [space]50, that I can feed that through a function to erase leading spaces and then convert the 5 and 0 characters to the integer 50. A similar thing could be done for 050, which is why I wrote the char_to_num function in a previous example. And also, in answer to your question, 000-999 is what I want to be able to hold in the 3 character number.

I agree that the number conversion seems like a pain, and that is why I hoped someone had a quicker way to convert lots of text mode numbers into integers, floats, etc. Also, I think what you have mentioned here is how I planned to deal with the file: read in data from the 3 sections into maybe 3 structs, convert the numeric data into integers, make changes to the data, convert the numbers back to characters and write them back to the structs, and then rewrite the 3 structs to the file in order before I exit the program.

I'll have to take a look at reading bmp files. I never thought about trying to read a graphics file, but I'll give it a shot.

Topic archived. No new replies allowed.