C++: Reading and Sorting Binary Files

Jun 2, 2013 at 3:37am
I've been scratching my head and putting this homework off for a couple of days, but now that I've hunkered down to try to do it I'm coming up empty. There are four things I need to do:

1) Read a binary file and place that data into arrays

2) Sort the list according to the test scores from lowest to highest

3) Average the scores and output it

4) Create a new binary file with the sorted data

This is what the binary data file SHOULD look like as an unsorted text file:

A. Smith 89

T. Phillip 95

S. Long 76

But the .dat (the binary file) looks something like A.Smith ÌÌÌÌÌÌÌÌÌÌÌY T. Phillip ÌÌÌÌÌÌÌÌ_ S. Long ip ÌÌÌÌÌÌÌÌL J. White p ÌÌÌÌÌÌÌÌd

I can probably handle the sorting, since I think I know how to use parallel arrays and index sorting, but reading the binary file and placing that data into an array is confusing as hell to me, as my book doesn't really explain it very well.

So far this is my preliminary code which doesn't really do much:

#include "stdafx.h"
#include <iostream>
#include <fstream>
#include <cstdlib>
#include <Windows.h>
using namespace std;

int get_int(int default_value);
int average(int x, int y, int z);

int main()
{
    char filename[MAX_PATH + 1];
    int n = 0;
    char name[3];
    int grade[3];
    int recsize = sizeof(name) + sizeof(int);

    cout << "Enter directory and file name of the binary file you want to open: ";
    cin.getline(filename, MAX_PATH);

    // Open file for binary read.
    fstream fbin(filename, ios::binary | ios::in);
    if (!fbin) {
        cout << "Could not open " << filename << endl;
        system("PAUSE");
        return -1;
    }
}
Jun 2, 2013 at 5:25am
Where did the binary file come from originally, for example was it created in another program, or supplied by someone else?

At any rate, additional information is required in order to interpret the data. Ideally you will know from the specifications provided along with the file what is the length of the string which holds the name. Also how is the numeric data stored, is it type int, or float or what? I can make guesses at these but that isn't the proper approach, the information should really be known.

Failing that, you could open the .dat file using a hex editor in order to examine the data in hexadecimal mode.


Last edited on Jun 2, 2013 at 6:10am
Jun 2, 2013 at 1:29pm
It was a binary file provided by my professor. And sorry the record structure is name (20 bytes) and grade (int).
Jun 2, 2013 at 2:45pm
OK, I can see that this is an assignment, but I'm going to help you a bit here.
The record is made of a 20-byte name and an integer grade.
Let's take a look at the first record:
A.Smith ÌÌÌÌÌÌÌÌÌÌÌY

The names are shorter than 20 bytes, which is why they are padded with Ì until they reach 20 bytes.
In this record, A.Smith has 6 letters, one dot and one whitespace, so it should be padded with 12 Ìs.
After the name comes the integer grade. If you look at the file with a hex editor, you will see that the letter Y is actually the value 0x59, which is 89 in decimal: the grade of A. Smith.
Judging from the text view alone, this integer doesn't look like it is stored in a fixed length, so you may be able to use the standard extraction methods to extract it.
After the integer there appears to be a whitespace that separates the records.
Jun 2, 2013 at 2:49pm
You know something, in the records you provided the names are all 19 bytes long, not 20.
Are you sure that the names should be 20 bytes, and that those records are right?
Last edited on Jun 2, 2013 at 2:50pm
Jun 2, 2013 at 2:53pm
Yeah, that's what the assignment said: "the record structure: name (20 bytes), grade (integer)".
Jun 2, 2013 at 3:29pm
Just a quick comment, I was going to say more, but I'm short of time right now.
It looked to me as though the name was char[19] and the grade was a 2-byte integer, usually defined as short.

There are two ways to find out for sure, one is to write the code using whichever values actually work, a bit of trial-and-error.

My preferred approach would be to examine the file in hexadecimal, for Windows I can recommend the free Hexplorer.
http://sourceforge.net/projects/hexplorer/
Jun 2, 2013 at 3:41pm
Yeah Animus, you should definitely have a hex editor. Not just for this assignment; every programmer should have one.
If you can, open the file and post exactly what the hex editor displays (I mean copy-paste).
That might uncover some more information about the records file.
Jun 2, 2013 at 3:55pm
I downloaded HxD, and opened the binary file with it and this is what it shows.


41 2E 53 6D 69 74 68 00 CC CC CC CC CC CC CC CC CC CC CC 59 00 00 00 00 54 2E 20 50 68 69 6C 6C 69 70 00 CC CC CC CC CC CC CC CC 5F 00 00 00 00 53 2E 20 4C 6F 6E 67 00 69 70 00 CC CC CC CC CC CC CC CC 4C 00 00 00 00 4A 2E 20 57 68 69 74 65 00 70 00 CC CC CC CC CC CC CC CC 64 00 00 00
Jun 2, 2013 at 4:28pm
That looks a bit odd.
What I'm seeing there is:
19-bytes name
4-bytes  integer
1-byte   padding
except for the last line which doesn't have any padding.

Here's a quick program which reads the file according to the hex data posted above:
#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int main()
{
    string filename = "data_edited.dat";
    ifstream fin(filename.c_str(), ios::binary);

    const int nameLength = 19;
    const int maxRecords = 100;

    int scores[maxRecords];
    char names[maxRecords][nameLength];

    int count = 0;

    // read individual items, stopping when the arrays are full or the data runs out
    while ( count < maxRecords
         && fin.read(names[count], sizeof(names[0]))
         && fin.read((char *) &scores[count], sizeof(scores[0])) )
    {
        count++;
        fin.ignore(); // skip the padding byte
    }

    for (int i=0; i<count; i++)
    {
        cout << "Name: " << names[i] << " Score: " << scores[i] << endl;
    }
}

Output:
Name: A.Smith Score: 89
Name: T. Phillip Score: 95
Name: S. Long Score: 76
Name: J. White Score: 100
Last edited on Jun 2, 2013 at 4:28pm
Jun 2, 2013 at 4:51pm
Here's an alternative version, which reads the entire record in one go - but in order to make this work I had to edit the dat file and insert an extra padding byte at the very beginning of the file.
#include <iostream>
#include <fstream>
#include <string>

using namespace std;

const int nameLength = 19;
const int maxRecords = 100;

struct record {
    char filler;
    char name[nameLength];
    int score;
};

int main()
{
    string filename = "data_editedB.dat";
    ifstream fin(filename.c_str(), ios::binary);

    record rows[maxRecords];

    int count = 0;

    // read entire record, stopping when the array is full or the data runs out
    while ( count < maxRecords && fin.read((char *) &rows[count], sizeof(record)))
        count++;

    for (int i=0; i<count; i++)
        cout << "Name: " << rows[i].name << " Score: " << rows[i].score << endl;
}



Jun 2, 2013 at 4:56pm
Thanks, but now I'm confused because the assignment wants me to have that data placed into arrays and then sorted, and averaged. This program reads the data and just outputs it to you right? How would you alter this so it reads and then does what I mentioned? I tried looking it up and came across something like putting that data into a buffer but I couldn't make sense of it.
Jun 2, 2013 at 5:07pm
Both programs posted above read the data into arrays.

The output is purely for diagnostic purposes, to verify that it is working. I suggest you read through the code slowly - some of it is obvious, but there are one or two tricky bits.
    int scores[maxRecords];
    char names[maxRecords][nameLength];

Those are the arrays. scores is just a single-dimension array. names is a 2D array as each name is itself an array of characters.
Last edited on Jun 2, 2013 at 5:18pm
Jun 2, 2013 at 6:23pm
One more question. The .dat file which I used in version 1 of my code was 95 bytes long. It contains four records. I would expect a binary file to utilise a fixed number of bytes for each record, therefore the file length should be divisible by 4.

So Animus, would it be possible (if you would be so kind) to double check the length of the .dat file you have? If it is indeed not evenly divisible by 4 (or whatever the number of records is), I would suggest you query this with your professor and see if this can be explained.

Last edited on Jun 2, 2013 at 6:26pm
Jun 2, 2013 at 7:10pm
I would ask him, but the assignment is due on Tuesday. Anyway, should I upload the file and have you download it? Because all the assignment sheet says is "(the record structure: name (20 bytes), grade (integer))".
Last edited on Jun 4, 2013 at 1:22pm
Jun 2, 2013 at 7:29pm
Thanks, yes, if you could upload the file I'd be interested to take a look.
Jun 2, 2013 at 7:48pm
Jun 2, 2013 at 8:13pm
Thanks for that. It confirms what you previously posted (the hex data).
I'd still go with what I suggested in the previous post here: http://www.cplusplus.com/forum/beginner/103593/#msg558359

But it leaves me feeling uneasy, as I can't match the file with a string length of 20, no matter how I look at it.

My main concern is that part of your project requires you to output a file of your own. When you reach that stage the question will arise: should you do the job properly according to the specification, or try to create a file with a similarly botched format to the one you were given?



Jun 2, 2013 at 8:15pm
Will a new file in the botched format produce weird output, or will it just not have the specified 20 bytes for the name?
Jun 2, 2013 at 8:27pm
Not sure I understand. The new file will depend on what code you choose to write. So it's a design decision that needs to be made as to whether to follow the precedent set by the file you were given, or to follow the specifications.

In the real world problems like this can arise, and the decision made could depend upon the circumstances. If the supplied file is currently used by some other programs, you would tend to be pragmatic and follow the same format. But you'd still request clarification.

On the other hand, if a discrepancy is uncovered during program development, then it points to someone having made a mistake somewhere, and you would tend to favour the written specification, but would also query it with the other developers, to make sure you all end up working to the same spec.

I understand that there probably isn't time to get this fully resolved before the project is due. Perhaps you could discuss this with other students on the same course.
Last edited on Jun 2, 2013 at 8:32pm