I need advice on creating a class for a tabulated, multi-type text file

Hi all,

I need to create a class to manage text files like this one:

Country  Town    QtPeople   BirthRate
France   Paris   10000000   0.023
Germany  Munchen 6000000    0.012


Here is a use case, for an existing file like the one above:

vector<string> countries(CurrentItemsQt);
vector<string> towns(CurrentItemsQt);
vector<int> population(CurrentItemsQt);
vector<float> birthRate(CurrentItemsQt);
vector<string> columnsNames = {"Country", "Town", "QtPeople", "BirthRate"};
vector<vector<auto> > myVectors = {&countries, &towns, &population, &birthRate}; // not valid C++: this mixed-type container is what I don't know how to express
vector<OpeningMode> modes = {READ, READ, RW, RW};
// the following constructor opens the file and fills the vectors
TableInFile myTable("stats.txt", columnsNames, myVectors, modes);
// the following method updates the corresponding vector, equivalent to : birthRate[0] = 0.022; birthRate[1] = 0.126
myTable.update("BirthRate", {0.022, 0.126});

vector<float> deathRate(CurrentItemsQt);
// init deathRate with data from somewhere
.....
// the following method adds a new column
myTable.addColumn("DeathRate", deathRate); // or myTable.addColumn("DeathRate", deathRate, typeid(float));
// the following method (or the destructor) rewrites the file
myTable.write();


Here is what the class would look like:

#include <string>
#include <vector>

enum OpeningMode {READ, WRITE, RW, IGNORE};

class TableInFile {
public:
	TableInFile(const std::string &aFileName, std::vector<std::string> &aColumnNames, std::vector<std::vector <void *> > &aVectors, std::vector<enum OpeningMode> &aModes);
	~TableInFile();
	int update(const std::string &aColumnName, std::vector <void *> aVector);
	int addColumn(const std::string &aColumnName, std::vector <void *> aVector);
	int write();
private:
	std::string mFileName;
	std::vector<std::string> mColumnNames;
	std::vector<std::vector <void *> > mVectors;
	std::vector<enum OpeningMode> mModes;
};


I really don't know how to handle the arbitrary types of my columns in these parameters:
std::vector<std::vector <void *> > &aVectors
std::vector <void *> aVector

Any help would be welcome.
You want to keep each row together; that way you don't need parallel vectors.

#include <string>
#include <vector>

struct CityInfo{
  std::string country;
  std::string town;
  unsigned int population;
  double birthRate;
};

class Encyclopedia{
private:
  std::vector<CityInfo> cities;

// make rest of class
};
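To make the row-oriented suggestion concrete, here is a sketch of filling `CityInfo` records from the example file, assuming whitespace-separated fields, a single header line and no types line. The `operator>>` overload and `readCities` are illustrative additions, not part of LowestOne's post:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct CityInfo {
    std::string country;
    std::string town;
    unsigned int population;
    double birthRate;
};

// Reads one whitespace-separated row; the stream fails if any field is malformed.
std::istream& operator>>(std::istream& in, CityInfo& c) {
    return in >> c.country >> c.town >> c.population >> c.birthRate;
}

// Skips the header line, then reads every remaining row into a vector of structs.
std::vector<CityInfo> readCities(std::istream& in) {
    std::string header;
    std::getline(in, header);  // e.g. "Country Town QtPeople BirthRate"
    std::vector<CityInfo> cities;
    CityInfo c;
    while (in >> c) cities.push_back(c);  // stops at end of input or the first bad row
    return cities;
}
```

With this layout the compiler knows every field's type, which is exactly the predefined structure the OP wants to avoid in the next reply.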
Thanks for your answer LowestOne, but:
1) My question remains open.
2) I need parallel vectors rather than whole rows in a structure, because a) the client uses parallel vectors, and b) parallel vectors don't require pre-defining every structure to be used.
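For holding columns of arbitrary types without `void *`, one common C++ technique (not something proposed in this thread) is type erasure: an abstract base class with virtual parse/print operations, plus a class template that wraps a reference to the client's own `std::vector<T>`. A minimal sketch, assuming whitespace-separated cells and C++14; `ColumnBase`, `Column` and `makeColumn` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface: the table keeps one ColumnBase per column and never
// needs to know the element type.
class ColumnBase {
public:
    virtual ~ColumnBase() = default;
    virtual const std::string& name() const = 0;
    // Parses one text cell and appends it to the client's vector; false on a parse error.
    virtual bool append(const std::string& text) = 0;
    // Writes the cell at `row` to `out`; throws std::out_of_range if `row` is too large.
    virtual void print(std::ostream& out, std::size_t row) const = 0;
    virtual std::size_t size() const = 0;
};

// T is known where the client creates the column, so the template captures it
// there; the table only ever sees ColumnBase.
template <typename T>
class Column : public ColumnBase {
public:
    Column(std::string name, std::vector<T>& data) : name_(std::move(name)), data_(data) {}
    const std::string& name() const override { return name_; }
    bool append(const std::string& text) override {
        std::istringstream in(text);
        T value{};
        if (!(in >> value)) return false;
        data_.push_back(value);
        return true;
    }
    void print(std::ostream& out, std::size_t row) const override { out << data_.at(row); }
    std::size_t size() const override { return data_.size(); }

private:
    std::string name_;
    std::vector<T>& data_;  // refers to the client's own parallel vector, not a copy
};

// Factory: T is deduced from the vector, so the client never spells the type twice.
template <typename T>
std::unique_ptr<ColumnBase> makeColumn(std::string name, std::vector<T>& data) {
    return std::make_unique<Column<T>>(std::move(name), data);
}
```

The table would then keep a `std::vector<std::unique_ptr<ColumnBase>>` and loop over it when reading or writing, so the client's parallel vectors stay as they are. One caveat: for 8-bit types such as `u_int8_t`, `>>` reads a character rather than a number, so they would need their own specialization.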
I am trying something like this:

TableInFile::TableInFile(const string &aFileName, vector<string> &aColumnNames, vector<string> &aColumnTypes, vector<vector<void *> > &aVectors, vector<enum OpeningMode> &aModes)
		: mFileName(aFileName), mColumnNames(aColumnNames), mColumnTypes(aColumnTypes), mVectors(aVectors), mModes(aModes) {
	// some code
		if (mColumnTypes[j] == typeid(string).name()) {
			vector<string> p = reinterpret_cast<vector<string> >(mVectors[j]);
	// some code 


and I get this compilation error:

invalid cast from type ‘std::vector<void*>’ to type ‘std::vector<std::basic_string<char> >’


I thought reinterpret_cast was not checked by the compiler. How can I do this?
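`reinterpret_cast` is restricted: it essentially converts only pointers, references and integers, never one class object into another, which is why `std::vector<void*>` cannot become `std::vector<std::string>`. A `std::vector<void*>` also holds pointers, not strings, so there is nothing meaningful to reinterpret. What can work is to type-erase the address of the whole vector as a single `void *` and cast it back with `static_cast`, which is well-defined only if the target type is exactly the original one. A small sketch; `asVector` is a hypothetical helper:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: recover the typed vector from a type-erased pointer.
// Only valid if `p` really came from a std::vector<T>*; nothing checks this.
template <typename T>
std::vector<T>& asVector(void* p) {
    return *static_cast<std::vector<T>*>(p);
}
```

The caller stores `&towns` (a `std::vector<std::string>*`) in a `void *` and later calls `asVector<std::string>(ptr)`; choosing the wrong `T` compiles but is undefined behaviour.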
Finally, I wrote it like this. It works, though I imagine it is not state-of-the-art C++ code; don't hesitate to comment on it:

.h
#include <fstream>
#include <string>
#include <vector>

enum OpeningMode {READ, WRITE, RW, IGNORE};

class TableInFile {
public:
	TableInFile(const std::string &aFileName, std::vector<std::string> &aColumnNames, std::vector<std::string> &aColumnTypes, std::vector<void *> &aVectors, std::vector<enum OpeningMode> &aModes);
	~TableInFile();
	int update(const std::string aColumnName, std::vector <void *> aVector);
	int addColumn(const std::string &aColumnName, const std::string &aColumnType, void * &aVector, const std::string &aColumnNameToInsertBefore);
	int write();
private:
	std::string mFileName;
	std::vector<std::string> mColumnNames;
	std::vector<std::string> mColumnTypes;
	std::vector<void *> mVectors;
	std::vector<enum OpeningMode> mModes;
	std::ifstream mFile;
	std::ofstream mTmpFile;
	std::vector<int> mPosInFile2VectorId;
	static std::string msSeparator;
};
.cpp
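#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <typeinfo>
#include <vector>
#include <sys/types.h> // u_int8_t ... u_int64_t (BSD/glibc, not standard C++)
using namespace std;
// is_file() and the TableInFile declaration are assumed to come from the project's own headers.

string TableInFile::msSeparator = " ";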

TableInFile::TableInFile(const string &aFileName, vector<string> &aColumnNames, vector<string> &aColumnTypes, vector<void *> &aVectors, vector<enum OpeningMode> &aModes)
		: mFileName(aFileName), mColumnNames(aColumnNames), mColumnTypes(aColumnTypes), mVectors(aVectors), mModes(aModes) {
	// initialisations
	mPosInFile2VectorId.clear(); // one entry per file column, appended while reading the header line; -1 = column ignored
	//
	if (is_file(aFileName.c_str())) { // exists ?
		// Read the input file
		string line, item;
		mFile.open(aFileName.c_str());
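		int iNoComment = 0; // non-comment lines seen so far: 1 = names, 2 = types, >= 3 = data
		while (getline(mFile, line)) {
			if (line.empty() || line[0] == '#') continue; // skip blank lines and comments
			iNoComment++;
			const int i = iNoComment - 3; // index of the current data row in the client's vectors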
			istringstream sline( line );
			for(int j = 0; getline(sline, item); j++) {
				if (iNoComment == 1) { // first line == ColumnNames
					// position of this file column among the requested columns, or -1 if it was not requested;
					// ignored columns are not loaded into RAM, to save resources
					vector<string>::iterator it = find(mColumnNames.begin(), mColumnNames.end(), item);
					mPosInFile2VectorId.push_back(it != mColumnNames.end() ? int(it - mColumnNames.begin()) : -1);
				}
				else if (iNoComment == 2) { // second line == Types
					if (mPosInFile2VectorId[j] != -1) {
						assert(item == mColumnTypes[mPosInFile2VectorId[j]]);
					}
				}
				else if (mPosInFile2VectorId[j] != -1 && mModes[mPosInFile2VectorId[j]] != IGNORE){ // read data
					if (mPosInFile2VectorId[j] == -1) continue;
					if (mColumnTypes[mPosInFile2VectorId[j]] == "string") { // typeid(string).name()) {
						vector<string>* p = static_cast<vector<string>*>(mVectors[mPosInFile2VectorId[j]]);
						p->at(i) = item;
					}
					else if (mColumnTypes[mPosInFile2VectorId[j]] == "int") { // typeid(int).name()) {
						vector<int>* p = static_cast<vector<int>*>(mVectors[mPosInFile2VectorId[j]]);
						p->at(i) = atoi(item.c_str());
					}
					else if (mColumnTypes[mPosInFile2VectorId[j]] == "u_int64_t") { // typeid(u_int64_t).name()) {
						vector<u_int64_t>* p = static_cast<vector<u_int64_t>*>(mVectors[mPosInFile2VectorId[j]]);
						p->at(i) = strtoull(item.c_str(), NULL, 0);
					}
					else if (mColumnTypes[mPosInFile2VectorId[j]] == "u_int32_t") { // typeid(u_int32_t).name()) {
						vector<u_int32_t>* p = static_cast<vector<u_int32_t>*>(mVectors[mPosInFile2VectorId[j]]);
						p->at(i) = strtoul(item.c_str(), NULL, 0);
					}
					else if (mColumnTypes[mPosInFile2VectorId[j]] == "u_int16_t") { // typeid(u_int16_t).name()) {
						vector<u_int16_t>* p = static_cast<vector<u_int16_t>*>(mVectors[mPosInFile2VectorId[j]]);
						p->at(i) = strtoul(item.c_str(), NULL, 0);
					}
					else if (mColumnTypes[mPosInFile2VectorId[j]] == "u_int8_t") { // typeid(u_int8_t).name()) {
						vector<u_int8_t>* p = static_cast<vector<u_int8_t>*>(mVectors[mPosInFile2VectorId[j]]);
						p->at(i) = strtoul(item.c_str(), NULL, 0);
					}
					else if (mColumnTypes[mPosInFile2VectorId[j]] == "float") { // typeid(float).name()) {
						vector<float>* p = static_cast<vector<float>*>(mVectors[mPosInFile2VectorId[j]]);
						p->at(i) = atof(item.c_str()); // strtof absent !
					}
					else if (mColumnTypes[mPosInFile2VectorId[j]] == "double") { // typeid(double).name()) {
						vector<double>* p = static_cast<vector<double>*>(mVectors[mPosInFile2VectorId[j]]);
						p->at(i) = atof(item.c_str()); // strtod absent !
					}
					else throw std::runtime_error(string("Error : unknown type : ") + mColumnTypes[mPosInFile2VectorId[j]]);
				}
			}
		}
		mFile.close();
	}
	else { // new file

	}
}
#define areTypesEqual(x, y) (strcmp((x).c_str(), typeid(y).name()) == 0)

int TableInFile::write() {
	string tmpFileName = "." + mFileName;
	if (is_file(tmpFileName.c_str())) throw std::runtime_error(string("Error : temporary file exists : ") + tmpFileName);
	mTmpFile.open(tmpFileName.c_str());
	if (is_file(mFileName.c_str())) { // exists ?
		// Read the input file and write in the output file
		string line, item;
		mFile.open(mFileName.c_str());
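		int iNoComment = 0; // non-comment lines seen so far: 1 = names, 2 = types, >= 3 = data
		while (getline(mFile, line)) {
			if (line.empty() || line[0] == '#') {
				mTmpFile << line << endl; // copy blank lines and comments unchanged
				continue;
			}
			iNoComment++;
			if (iNoComment == 1 || iNoComment == 2) { // first line == column names, second == types
				mTmpFile << line << endl;
				continue;
			}
			const int i = iNoComment - 3; // index of the current data row in the client's vectors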
			istringstream sInputLine( line );
			ostringstream sOutputLine;
			for(int j = 0; getline(sInputLine, item); j++) {
				if (mPosInFile2VectorId[j] == -1 || mModes[mPosInFile2VectorId[j]] == READ || mModes[mPosInFile2VectorId[j]] == IGNORE) {
					sOutputLine << item << msSeparator; // keep the separator for copied cells too
					continue;
				}
				if (mColumnTypes[mPosInFile2VectorId[j]] == "string") { // string(typeid(string).name())) {
					vector<string>* p = static_cast<vector<string>*>(mVectors[mPosInFile2VectorId[j]]);
					sOutputLine << p->at(i);
				}
				else if (mColumnTypes[mPosInFile2VectorId[j]] == "int") { // string(typeid(int).name())) {
					vector<int>* p = static_cast<vector<int>*>(mVectors[mPosInFile2VectorId[j]]);
					sOutputLine << p->at(i);
				}
				else if (mColumnTypes[mPosInFile2VectorId[j]] == "u_int64_t") { // string(typeid(u_int64_t).name())) {
					vector<u_int64_t>* p = static_cast<vector<u_int64_t>*>(mVectors[mPosInFile2VectorId[j]]);
					sOutputLine << p->at(i);
				}
				else if (mColumnTypes[mPosInFile2VectorId[j]] == "u_int32_t") { // string(typeid(u_int32_t).name())) {
					vector<u_int32_t>* p = static_cast<vector<u_int32_t>*>(mVectors[mPosInFile2VectorId[j]]);
					sOutputLine << p->at(i);
				}
				else if (mColumnTypes[mPosInFile2VectorId[j]] == "u_int16_t") { // string(typeid(u_int16_t).name())) {
					vector<u_int16_t>* p = static_cast<vector<u_int16_t>*>(mVectors[mPosInFile2VectorId[j]]);
					sOutputLine << p->at(i);
				}
				else if (mColumnTypes[mPosInFile2VectorId[j]] == "u_int8_t") { // string(typeid(u_int8_t).name())) {
					vector<u_int8_t>* p = static_cast<vector<u_int8_t>*>(mVectors[mPosInFile2VectorId[j]]);
					sOutputLine << p->at(i);
				}
				else if (mColumnTypes[mPosInFile2VectorId[j]] == "float") { // string(typeid(float).name())) {
					vector<float>* p = static_cast<vector<float>*>(mVectors[mPosInFile2VectorId[j]]);
					sOutputLine << p->at(i);
				}
				else if (mColumnTypes[mPosInFile2VectorId[j]] == "double") { // string(typeid(double).name())) {
					vector<double>* p = static_cast<vector<double>*>(mVectors[mPosInFile2VectorId[j]]);
					sOutputLine << p->at(i);
				}
				else throw std::runtime_error(string("Error : unknown type : ") + mColumnTypes[mPosInFile2VectorId[j]]);
				sOutputLine << msSeparator;
			}
			mTmpFile << sOutputLine.str() << endl;
			if (i % 60 == 59) {
				mTmpFile << "# ";
				for (vector<string>::iterator ii = mColumnNames.begin(); ii != mColumnNames.end(); ii++) mTmpFile << *ii << msSeparator;
				mTmpFile << endl;
			}
		}
		mFile.close();
	}
	else { // new file
		// line 1: column names
		for (vector<string>::iterator ii = mColumnNames.begin(); ii != mColumnNames.end(); ii++) mTmpFile << *ii << msSeparator;
		mTmpFile << endl;
		// line 2: column types
		for (vector<string>::iterator ii = mColumnTypes.begin(); ii != mColumnTypes.end(); ii++) mTmpFile << *ii << msSeparator;
		mTmpFile << endl;
		bool isNotFinished = true;
		for (int i = 0; isNotFinished; i++) {
			ostringstream sOutputLine;
			try {
				for (u_int32_t j = 0; j < mVectors.size(); j++) {
					if (mColumnTypes[j] == "string") { // string(typeid(string).name())) {
						vector<string>* p = static_cast<vector<string>*>(mVectors[j]);
						sOutputLine << p->at(i);
					}
					else if (mColumnTypes[j] == "int") { // string(typeid(int).name())) {
						vector<int>* p = static_cast<vector<int>*>(mVectors[j]);
						sOutputLine << p->at(i);
					}
					else if (mColumnTypes[j] == "u_int64_t") { // string(typeid(u_int64_t).name())) {
						vector<u_int64_t>* p = static_cast<vector<u_int64_t>*>(mVectors[j]);
						sOutputLine << p->at(i);
					}
					else if (mColumnTypes[j] == "u_int32_t") { // string(typeid(u_int32_t).name())) {
						vector<u_int32_t>* p = static_cast<vector<u_int32_t>*>(mVectors[j]);
						sOutputLine << p->at(i);
					}
					else if (mColumnTypes[j] == "u_int16_t") { // string(typeid(u_int16_t).name())) {
						vector<u_int16_t>* p = static_cast<vector<u_int16_t>*>(mVectors[j]);
						sOutputLine << p->at(i);
					}
					else if (mColumnTypes[j] == "u_int8_t") { // string(typeid(u_int8_t).name())) {
						vector<u_int8_t>* p = static_cast<vector<u_int8_t>*>(mVectors[j]);
						sOutputLine << p->at(i);
					}
					else if (mColumnTypes[j] == "float") { // string(typeid(float).name())) {
						vector<float>* p = static_cast<vector<float>*>(mVectors[j]);
						sOutputLine << p->at(i);
					}
					else if (mColumnTypes[j] == "double") { // string(typeid(double).name())) {
						vector<double>* p = static_cast<vector<double>*>(mVectors[j]);
						sOutputLine << p->at(i);
					}
					else throw std::runtime_error(string("Error : unknown type : ") + mColumnTypes[j]);
					sOutputLine << msSeparator;
				}
				mTmpFile << sOutputLine.str() << endl;
			}
			catch(...) {
				isNotFinished = false;
			}
			if (i % 60 == 59) {
				mTmpFile << "# ";
				for (vector<string>::iterator ii = mColumnNames.begin(); ii != mColumnNames.end(); ii++) mTmpFile << *ii << msSeparator;
				mTmpFile << endl;
			}
		}
	}
	mTmpFile.close();
	return 0;
}

TableInFile::~TableInFile() {
	write();
}
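Since comments were invited: the same `if` / `else if` chain on type names appears three times above (reading, and both branches of `write()`). One way to keep a single list of supported types is a lookup table from type name to a parsing function. A sketch, assuming C++11 and `operator>>` parsing; `CellParser`, `makeParser`, `parsers` and `parseCell` are hypothetical names. `u_int8_t` is deliberately left out, because `>>` into an 8-bit unsigned type reads a character rather than a number:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Stores the parsed `text` at `row` of the std::vector<T> behind `vec`.
using CellParser = std::function<void(void* vec, std::size_t row, const std::string& text)>;

template <typename T>
CellParser makeParser() {
    return [](void* vec, std::size_t row, const std::string& text) {
        std::istringstream in(text);
        T value{};
        if (!(in >> value)) throw std::runtime_error("cannot parse: " + text);
        static_cast<std::vector<T>*>(vec)->at(row) = value;
    };
}

// Built once on first use; a map lookup replaces the if / else-if chain.
const std::map<std::string, CellParser>& parsers() {
    static const std::map<std::string, CellParser> table = {
        {"string", makeParser<std::string>()},
        {"int", makeParser<int>()},
        {"u_int64_t", makeParser<std::uint64_t>()},
        {"u_int32_t", makeParser<std::uint32_t>()},
        {"u_int16_t", makeParser<std::uint16_t>()},
        {"float", makeParser<float>()},
        {"double", makeParser<double>()},
    };
    return table;
}

void parseCell(const std::string& type, void* vec, std::size_t row, const std::string& text) {
    auto it = parsers().find(type);
    if (it == parsers().end()) throw std::runtime_error("Error : unknown type : " + type);
    it->second(vec, row, text);
}
```

A matching table of formatting functions (writing with `operator<<`) would replace the chains in `write()` in the same way.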
Usage:
	vector<string> col = {"string1", "uint2", "float3"};
	vector<string> typ = {"string", "u_int32_t", "float"};
	vector<string> string1 = {"raw1", "raw2", "raw3", "raw4"};
	vector<u_int32_t> uint2 = {111, 222, 333, 444};
	vector<float> float3 = {1.1, 2.2, 3.3, 4.4};
	vector<void *> vec = {static_cast<void *>(&string1), static_cast<void *>(&uint2), static_cast<void *>(&float3)};
	vector<enum OpeningMode> mod = {READ, READ, RW};
	TableInFile t("table.txt", col, typ, vec, mod);


Result:
$ cat .table.txt
string1 uint2 float3 
string u_int32_t float 
raw1 111 1.1 
raw2 222 2.2 
raw3 333 3.3 
raw4 444 4.4 
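As the output shows, `write()` leaves the table in the temporary file `.table.txt` and never replaces `table.txt`. A minimal sketch of that missing last step with `std::rename` from `<cstdio>`; `commitFile` is a hypothetical helper, and the behaviour when the target already exists differs between platforms, as noted in the comments:

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: make the freshly written temporary file the real one.
// On POSIX, std::rename replaces an existing target atomically. The Microsoft CRT
// rename fails if the target exists, so it is removed first there (not atomic).
void commitFile(const std::string& tmpName, const std::string& finalName) {
#ifdef _WIN32
    std::remove(finalName.c_str());
#endif
    if (std::rename(tmpName.c_str(), finalName.c_str()) != 0)
        throw std::runtime_error("Error : cannot rename " + tmpName + " to " + finalName);
}
```

Separately, because the destructor calls `write()`, which throws when the temporary file already exists (as it will after any earlier `write()`), wrapping that call in `try`/`catch` inside `~TableInFile()` avoids `std::terminate`: destructors are implicitly `noexcept` since C++11.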

man wrote:
scanf, fscanf, sscanf, vscanf, vsscanf, vfscanf - input format conversion
       int scanf(const char *format, ...);
       int fscanf(FILE *stream, const char *format, ...);
       int sscanf(const char *str, const char *format, ...);


The scanf() family of functions scans input according to format. This format may contain conversion specifications; the results from such conversions, if any, are stored in the locations pointed to by the pointer arguments that follow format. Each pointer argument must be of a type that is appropriate for the value returned by the corresponding conversion specification.
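Applied to the example file, the `sscanf` approach puts the column types in the format string. A sketch, assuming the four-column layout from the first post and tokens shorter than 64 characters; `Row` and `parseRow` are illustrative names:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Illustrative row type; %s needs fixed-size char buffers.
struct Row {
    char country[64];
    char town[64];
    int population;
    float birthRate;
};

// True when all four conversions succeed. %63s stops at whitespace and leaves
// room for the terminating '\0'; %d needs an int*, %f a float*.
bool parseRow(const std::string& line, Row& r) {
    return std::sscanf(line.c_str(), "%63s %63s %d %f",
                       r.country, r.town, &r.population, &r.birthRate) == 4;
}
```

Here the format string is fixed when the program is compiled, which is the limitation the OP raises in the next reply.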




		for(int i = 0; getline(mFile, line); i++) { //read 1 line
			if (line[0] == '#') continue;
			iNoComment++;
			istringstream sline( line ); //construct an stream with that line.
			for(int j = 0; getline(sline, item); j++) { //¿how many lines are in 1 line?  
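The question in that last comment points at a real bug: `std::getline(sline, item)` without a delimiter reads up to the next `'\n'`, so inside a one-line stream it yields the whole line as a single item. Extracting with `>>` splits on runs of whitespace instead. A sketch; `splitFields` is a hypothetical helper:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// operator>> skips any run of spaces/tabs and extracts one field at a time,
// unlike std::getline(stream, item), which only stops at '\n'.
std::vector<std::string> splitFields(const std::string& line) {
    std::istringstream sline(line);
    std::vector<std::string> fields;
    std::string item;
    while (sline >> item) fields.push_back(item);
    return fields;
}
```

`std::getline(sline, item, ' ')` would also split, but on single spaces only, producing empty items for the aligned columns in the first post's example.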
Thanks ne555, but:

The function expects a sequence of pointers as additional arguments, each one pointing to an object of the type specified by their corresponding %-tag within the format string, in the same order.


As I don't know a priori the order or the types the client will use...
You do know the types and the order.
vector<string> countries(CurrentItemsQt); //types
vector<string> towns(CurrentItemsQt);
vector<int> population(CurrentItemsQt);
vector<float> birthRate(CurrentItemsQt);
//vector<string> columnsNames = {"Country", "Town", "QtPeople", "BirthRate"}; //read from the file
vector<vector<auto> > myVectors = {&countries, &towns, &population, &birthRate}; //order
vector<vector<enum Modes> > modes = {READ, READ, RW, RW};
// the following constructor opens the file and fills the vectors
TableInFile myTable("stats.txt", columnsNames, myVectors, modes);
IIRC you wanted a wrapper for
for(size_t K=0; input>>towns[K]>>population[K]>>birthRate[K]; ++K) //someway the vectors have the right size
  ;


There is a technique using variadic templates to replace scanf(), but I can't recall it.
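One way to get that behaviour with variadic templates is a C++17 fold expression over the column vectors. A sketch, not necessarily the method ne555 has in mind; `readField`, `readRow` and `readTable` are hypothetical names, and the column types must be known at compile time:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads one field into a temporary and appends it to `col`.
template <typename T>
bool readField(std::istream& in, std::vector<T>& col) {
    T value{};
    if (!(in >> value)) return false;
    col.push_back(value);
    return true;
}

// Fold over &&: fields are read left to right, stopping at the first failure.
template <typename... Ts>
bool readRow(std::istream& in, std::vector<Ts>&... cols) {
    return (readField(in, cols) && ...);
}

// Reads rows until one fails to parse; returns the number of complete rows.
template <typename... Ts>
std::size_t readTable(std::istream& in, std::vector<Ts>&... cols) {
    std::size_t rows = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;  // skip blanks and comments
        std::istringstream fields(line);
        if (!readRow(fields, cols...)) break;
        ++rows;
    }
    (cols.resize(rows), ...);  // drop any half-read row from every column
    return rows;
}
```

Because each field is pushed as soon as it is read, a row that fails halfway would leave the vectors with different lengths; the final `resize` trims them back to the number of complete rows.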
You do know the types and the order.


No, that's only a test client I wrote. I don't know what the real clients will do: some may want just a few columns, others will want to derive new files by picking some columns, adding new ones of their own, or reordering columns.
To be able to use your class, the client needs to create the vectors.
To create the vectors, it needs to know the types of the columns.
When choosing the columns it wants, the user needs to know the order (otherwise there could be a problem with repeated columns).

So you read the column specification from the file, and the user tells which ones they want. ¿What info is missing?
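If the schema is read from the file itself (the names and types lines that `write()` produces), the storage does not have to be known at compile time either. One way, using C++17 `std::variant` (a swapped-in technique, not something proposed in the thread), is to pick the vector type per column from the types line. A sketch under those assumptions; `ColumnData`, `makeColumnFor`, `tokens` and `loadTable` are hypothetical names, all integer types are widened to `long long` (so `u_int64_t` values beyond its range would fail to parse), and `float` to `double`:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>
#include <vector>

// One column whose element type is chosen at run time from the types line.
using ColumnData = std::variant<std::vector<std::string>, std::vector<long long>, std::vector<double>>;

// Hypothetical mapping from the type names used in the thread to storage.
ColumnData makeColumnFor(const std::string& type) {
    if (type == "string") return std::vector<std::string>{};
    if (type == "int" || type == "u_int8_t" || type == "u_int16_t" ||
        type == "u_int32_t" || type == "u_int64_t")
        return std::vector<long long>{};
    if (type == "float" || type == "double") return std::vector<double>{};
    throw std::runtime_error("Error : unknown type : " + type);
}

// Splits a line into whitespace-separated tokens.
std::vector<std::string> tokens(const std::string& line) {
    std::istringstream s(line);
    std::vector<std::string> out;
    for (std::string t; s >> t;) out.push_back(t);
    return out;
}

// Assumed layout: line 1 = names, line 2 = types, then data rows.
std::map<std::string, ColumnData> loadTable(std::istream& in) {
    std::string line;
    std::getline(in, line);
    const std::vector<std::string> names = tokens(line);
    std::getline(in, line);
    const std::vector<std::string> types = tokens(line);
    if (names.size() != types.size()) throw std::runtime_error("names/types mismatch");

    std::vector<ColumnData> cols;
    for (const std::string& t : types) cols.push_back(makeColumnFor(t));

    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;  // skip blanks and repeated-header comments
        std::istringstream s(line);
        for (ColumnData& col : cols) {
            // std::visit calls the lambda with whichever vector type this column holds.
            std::visit([&s, &line](auto& vec) {
                typename std::decay_t<decltype(vec)>::value_type value{};
                if (!(s >> value)) throw std::runtime_error("bad cell in line: " + line);
                vec.push_back(value);
            }, col);
        }
    }

    std::map<std::string, ColumnData> table;
    for (std::size_t k = 0; k < names.size(); ++k) table.emplace(names[k], std::move(cols[k]));
    return table;
}
```

A client still has to state the type it expects when reading a column back, through `std::get` or `std::visit`, but it can discover the names and types at run time.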
So you read the column specification from the file, and the user tells which ones they want.
No! I don't have this information. The library will be delivered and then the client will use it; there will be no feedback. It has to be flexible: multi-user, multi-project.
In fact, it seems that I have rediscovered databases! Silly me.