parsing text file

Hi, I'm new to this forum but not so new to programming, except that I've never really worked with reading and tokenizing string data from a file and writing it out as a CSV file. The problem is that the information I'm parsing has a specific format, and I can't think of the best way to handle it and store it in a vector of structs that I have made.

Here's the data I'm parsing:
AUTO PARTES Y ARNESES DE MEXICO (AMSA)
D= MAGNETO 950 G= ARNESES
P= SR MINORU SUZUKI
C= LIC. RAYMUNDO SANTILLAN
R= LIC.MARIA EUGENIA VEGA
T= ABRAHAM MEJIA E= 440
TEL. 639-82-00 FAX EXT. 3050
E MAIL raymundo.santillan@yzk.com.mx

PRODUCTOS ELECTRICOS
D=PARQUE INDUSTRIAL JUAREZ
G=ARNESES PARA AUTOS
P= ING.RAFAEL PARDA
R=LETICIA GONZALEZ
C=ING.JORGE LOPEZ
T=HECTOR ACOSTA
TEL.629-52-00

ADVANCE TRANSFORMER (PHILLIPS)
D= PARQUE INDUSTRIAL FERNANDEZ
TEL.-623-57-47

APLICADORES MEXICANOS
D= MAGNETO #951 C.P. 32630
G= AIRE ACONDICIONADO (REGILLAS)
M= PLEWS EDELMANN, CHICAGO, ILL.
P= RICARDO AGUILAR
C= VANESA MORALES
R= MANUEL VALLES
T= RENE RIOS E= 287
TEL. 630-11-44 EXT 124

BILCO (B.C.H. DE MEXICO)
D= CALLE DEPORTISTAS #7820
G=ESCOTILLAS DE SEGURIDAD
TEL. 640-62-37
E MAIL manuelo@bilco.com


And the code I have so far is this:
struct comp_info{
	string compania;
	string domicilio; // D=
	string planta_m; // M=  planta matriz
	string gerente_p; // P= planta
	string gerente_c; // C= compras
	string gerente_rh; // R= recursos humanos
	string gerente_m; // T= mantenimiento
	int n_empleados; // E= number de 
	string insumo; // I= requerido
	string giro; // G= 
};

void Tokenize(const string& str,vector<string>& tokens,const string& delimiters = "")  // via http://www.geocities.com/eric6930/cplus.html unknown author presumed Eric.
{
    string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    string::size_type pos     = str.find_first_of(delimiters, lastPos);
    while (string::npos != pos || string::npos != lastPos)
    {
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        lastPos = str.find_first_not_of(delimiters, pos);
        pos = str.find_first_of(delimiters, lastPos);
    }
}

int _tmain(int argc, _TCHAR* argv[])
{
	string file_n;
	vector<comp_info> ci; // this is the vector where i want to store all the info for the companies
	vector<string> tok;
	vector<string> tf;
	
	cout<<"Enter file to parse: ";
	getline(cin,file_n);
	cout<<endl<<"The file to parse is: "<<file_n<<endl;


	ifstream file;
	
	file.open(file_n.c_str(),ios::in);
	string ch[2000];
	if(file.is_open())
	{
		while(!file.eof())
		{
			getline(file,*ch);
			tf.push_back(*ch);		
		}
	}
	else
	{
		cout<<"Wrong filename, enter correct filename"<<endl;
		return 0;
	}
	file.close();
	// input operations completed

	// begin parsing
	comp_info Ci;
	for(int i=0;i<tf.size();i++)
	{
		tok.clear();
		Tokenize(tf.at(i),tok,"=");

		for(int a=0;a<tok.size();a++)
		{

		// right here is where i need the help
		}
	}

	return 0;
}


OK, this is information from a website that has a directory of a lot of companies, and it is in the format shown above.

What I need help with: since not every entry has the same amount of information, I need to check each line of an entry and add it to the 'Ci' struct, so that I can push the completed struct back onto the ci vector, whose contents I will eventually write out to a CSV file. So far I have been unsuccessful in creating a method that parses each entry (entries are separated by a blank line) and adds all of its information to the same struct before pushing it back onto the ci vector.

Some help would be very much appreciated, as I have been stuck on this; I have about 700 business leads in this format and I need to import them into a database.

Thanks in advance, corntoe
getline(file,*ch);
Read into a plain string here, not an array: string ch[2000] declares 2000 strings, and *ch only ever uses the first one. Same for the .push_back() etc.

Then you want to iterate through the vector. On a new entry, create a new structure and fill its contents while iterating through (parsing) each line until you get to a new entry, then create a new structure and repeat.

e.g.
while (lines remain in vector)
  line = next line
  if (line starts a new entry)
    currentEntry = createNewEntry()
  else if (line.startsWith("E MAIL"))
    currentEntry.parseEmail(line)
  // etc
end while
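
The loop above could be fleshed out roughly like this in C++11. It's only a sketch under a few assumptions: entries are separated by a blank line, the first line of each block is the company name, and a tag is one of the struct's key letters followed by '=' at the start of a line or right after a space (which is how a line like "T= ABRAHAM MEJIA E= 440" gets split into two fields). The telefono and email members are additions that are not in the original struct, and trim, isTag, setField, parseTags and parseCompanies are made-up helper names.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

struct comp_info {
    std::string compania, domicilio, planta_m, gerente_p, gerente_c,
                gerente_rh, gerente_m, insumo, giro;
    std::string telefono, email;   // assumed extra fields, not in the original struct
    int n_empleados = 0;
};

// Strip leading and trailing spaces, tabs and carriage returns.
static std::string trim(const std::string& s)
{
    const char* ws = " \t\r";
    std::string::size_type b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    return s.substr(b, s.find_last_not_of(ws) - b + 1);
}

// True if s[i] starts an "X=" tag: a known key letter at the start of the
// line or after a space, immediately followed by '='.
static bool isTag(const std::string& s, std::string::size_type i)
{
    return i + 1 < s.size() && s[i + 1] == '=' &&
           std::string("DMPCRTEIG").find(s[i]) != std::string::npos &&
           (i == 0 || s[i - 1] == ' ');
}

// Store one value in the member that matches its key letter.
static void setField(comp_info& c, char key, const std::string& v)
{
    switch (key) {
    case 'D': c.domicilio  = v; break;
    case 'M': c.planta_m   = v; break;
    case 'P': c.gerente_p  = v; break;
    case 'C': c.gerente_c  = v; break;
    case 'R': c.gerente_rh = v; break;
    case 'T': c.gerente_m  = v; break;
    case 'I': c.insumo     = v; break;
    case 'G': c.giro       = v; break;
    case 'E': c.n_empleados = std::atoi(v.c_str()); break;
    }
}

// Walk one line and store every "X= value" field found in it.
static void parseTags(comp_info& c, const std::string& line)
{
    std::string::size_type i = 0;
    while (i < line.size()) {
        if (!isTag(line, i)) { ++i; continue; }
        std::string::size_type end = i + 2;          // value runs to the next tag
        while (end < line.size() && !isTag(line, end)) ++end;
        setField(c, line[i], trim(line.substr(i + 2, end - (i + 2))));
        i = end;
    }
}

std::vector<comp_info> parseCompanies(const std::vector<std::string>& lines)
{
    std::vector<comp_info> out;
    comp_info cur;
    bool open = false;                               // are we inside an entry?
    for (const std::string& raw : lines) {
        const std::string line = trim(raw);
        if (line.empty()) {                          // blank line ends the entry
            if (open) out.push_back(cur);
            open = false;
        } else if (!open) {                          // first line is the name
            cur = comp_info();
            cur.compania = line;
            open = true;
        } else if (line.compare(0, 6, "E MAIL") == 0) {
            cur.email = trim(line.substr(6));
        } else if (line.compare(0, 3, "TEL") == 0) {
            std::string::size_type p = line.find_first_not_of(".- ", 3);
            cur.telefono = (p == std::string::npos) ? "" : line.substr(p);
        } else {
            parseTags(cur, line);                    // lines with no tag are skipped
        }
    }
    if (open) out.push_back(cur);                    // file may not end with a blank line
    return out;
}
```

Fields that an entry doesn't have simply stay empty (or 0 for n_empleados), so every struct ends up with the same columns when it's written out to CSV later.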
