How do I extract only the data I need from different lines of a text file?

Here is the problem: I was asked to extract some data from a text file and insert it into a structure.

Like this:

employee.txt

Emp Id = 1290229
Name = Luke Skywalker
Phone Number = 736-6587

Emp Id = 1201195
Name = Alice Watson
Phone Number = 789-4652

Emp Id = 1924899
Name = Jake Muller
Phone Number = 790-0658



And the structure looks like this:

struct Employee
{
char id[12];
char name[30];
char phone_num[10];
};



So the question is: how do I extract only the data I want from different lines (only the names, IDs and phone numbers, without the other unwanted words)?

Coz I only know how to extract a whole line from a text file...

I need the concept in detail, and if possible some code examples would help me a lot

Coz Google cannot give me the answer I wanted

Thanks in advance for your help guys.


You probably would like to:
Employee e;
while ( in >> e ) {
  // add e to collection
}


The question is how to make in >> e work, which requires an operator:
std::istream& operator>> ( std::istream& lhs, Employee& rhs );

What should that operator do?
1. Read lines from lhs until it gets a line that starts with "Emp Id ="
2. You do know how many characters are before the ID, so use substr() to get the suffix part of the line
3. Copy suffix into rhs.id
4. Get next line. You should check that it starts with "Name ="
5. Again, copy suffix into rhs.name
6. Get next line. You should check that it starts with "Phone Number ="
7. Again, copy suffix into rhs.phone_num
8. return the lhs
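
The steps above could be sketched roughly as follows. This is only one way to do it, assuming every record has exactly the three lines shown in employee.txt, in that order. takeField and countEmployees are hypothetical helper names, not part of the original answer.

```cpp
#include <cstring>
#include <istream>
#include <sstream>
#include <string>

struct Employee {
    char id[12] {};
    char name[30] {};
    char phone_num[10] {};
};

// Hypothetical helper: if line starts with label, copy what follows the label
// into dest (truncated to fit, always null-terminated) and return true.
template <std::size_t N>
bool takeField(const std::string& line, const std::string& label, char (&dest)[N])
{
    if (line.compare(0, label.size(), label) != 0)
        return false;
    const std::string value = line.substr(label.size());
    std::strncpy(dest, value.c_str(), N - 1);
    dest[N - 1] = '\0';
    return true;
}

std::istream& operator>>(std::istream& lhs, Employee& rhs)
{
    std::string line;
    Employee tmp;

    // Steps 1-3: read lines until one starts with "Emp Id = ".
    while (std::getline(lhs, line)) {
        if (takeField(line, "Emp Id = ", tmp.id))
            break;
    }
    if (!lhs)
        return lhs;  // ran out of input before finding a record

    // Steps 4-7: the next two lines must be the name and the phone number.
    if (!std::getline(lhs, line) || !takeField(line, "Name = ", tmp.name) ||
        !std::getline(lhs, line) || !takeField(line, "Phone Number = ", tmp.phone_num)) {
        lhs.setstate(std::ios::failbit);  // malformed record
        return lhs;
    }

    rhs = tmp;   // only overwrite rhs once the whole record was read
    return lhs;  // step 8
}

// Hypothetical convenience for trying the operator on in-memory text.
int countEmployees(const std::string& text)
{
    std::istringstream in(text);
    int n = 0;
    for (Employee e; in >> e; )
        ++n;
    return n;
}
```

With this in place, the while ( in >> e ) loop shown earlier works on a std::ifstream as well as on a std::istringstream.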
First of all, thanks for the reply!!

Honestly, I am just a fresh college student, and this is one of the problems I met while doing my assignment... (but I'm not copying the whole question here!! I just want to ask for the concept coz I am really confused)

Is there any other method besides this, one which only requires basic knowledge of fstream, strings, structures and pointers?

Coz I really don't know what you mean in your answer, and even if I understood the concept it might not be accepted by my college this semester tho...


Hello ThisAintMeDude,

Here is something that might be in line with what you know:

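inFile.ignore(9);          // Skips "Emp Id = ".
inFile.get(id, 8);         // Reads up to 7 characters and leaves the '\n' in the stream.

inFile.ignore(8);          // Skips that '\n' plus "Name = ".
inFile.get(name, 31);      // Needs an array of at least 31 chars.

inFile.ignore(16);         // Skips the '\n' plus "Phone Number = ".
inFile.get(phoneNum, 11);  // Needs an array of at least 11 chars.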

The "ignore" statements will get you past everything including the space after the (=) allowing you to store what is left in the variable.

In the struct you might want to increase the size of the arrays by +1 to leave room for the (\0) at the end of the string. For example, get(id, 8) reads at most 8 - 1 = 7 characters for the ID and then stores the (\0) after them.
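
To see the "count minus one" rule in isolation, here is a small sketch that reads from in-memory text instead of a file. IdResult and readId are hypothetical names used only for this illustration.

```cpp
#include <sstream>
#include <string>

struct IdResult {
    std::string id;  // what get() stored in the array
    char next;       // first character still waiting in the stream
};

// Hypothetical demo: text is expected to start with "Emp Id = ".
IdResult readId(const std::string& text)
{
    std::istringstream in(text);
    char id[8] {};           // 7 characters for the ID + '\0'
    in.ignore(9);            // skip "Emp Id = "
    in.get(id, sizeof(id));  // stores at most sizeof(id) - 1 = 7 characters

    IdResult r;
    r.id = id;
    r.next = static_cast<char>(in.peek());  // get() does not consume the '\n'
    return r;
}
```

For the text "Emp Id = 1290229\nName = ..." this yields the ID 1290229 with '\n' still in the stream, which is why the following ignore has to count that newline as well.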

See http://www.cplusplus.com/reference/istream/istream/get/

I have to ask is this a C or C++ program? And why the character array and not a "std::string"?

Andy
> Coz I just know how to extract whole line from a text file...
Is that as a std::string or a char array?

Basically, reading the line and finding the = is a good place to start.
std::string - use https://www.cplusplus.com/reference/string/basic_string/find/
char array - use https://www.cplusplus.com/reference/cstring/strchr/
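
A minimal sketch of the std::string variant, assuming each line looks like "Label = value" and the result has to end up in a char array. valueAfterEquals is a hypothetical helper name.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: copy the text that follows '=' (and any spaces after it)
// into dest, truncated to fit and always null-terminated.
// Returns false if the line contains no '='.
bool valueAfterEquals(const std::string& line, char* dest, std::size_t destSize)
{
    const std::string::size_type eq = line.find('=');
    if (eq == std::string::npos || destSize == 0)
        return false;

    // First character after the '=' that is not a space (npos if there is none).
    const std::string::size_type start = line.find_first_not_of(' ', eq + 1);
    const std::string value = (start == std::string::npos) ? "" : line.substr(start);

    std::strncpy(dest, value.c_str(), destSize - 1);
    dest[destSize - 1] = '\0';
    return true;
}
```

Called once per line after getline(inFile, line), for example valueAfterEquals(line, e.name, sizeof(e.name)), this fills one member of the struct without counting characters by hand.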

> char id[7];
Your IDs need at least 8 characters.

If this is your 'tutor's' boilerplate code, you're on the wrong course.
*The id[7] has been edited to id[12], sry my typo T^T

Andy: Thanks for the answer!! Your answer is more understandable to me than the first ans given by keskiverto

Also, this is a c++ program indeed, but I also dunno why we have to use char array instead of strings....Maybe there are other reasons that i dont know?





salem c:
> Coz I just know how to extract whole line from a text file...

What I mean here is that as a newbie in C++ I only know how to use getline(inFile,line) to read a whole line from a text file.... That is why I am stuck here

Also, thanks for the concept!!
Consider:

#include <fstream>
#include <iostream>
#include <iomanip>
#include <vector>
#include <limits>

struct Employee {
	char id[12] {};
	char name[30] {};
	char phone_num[10] {};
};

std::ostream& operator<<(std::ostream& os, const Employee& e)
{
	return os << std::setw(7) << e.id << "  " << std::left << std::setw(30) << e.name << "  " << e.phone_num;
}

std::istream& operator>>(std::istream& is, Employee& e)
{
	is.ignore(9);	// Skip "Emp Id = "
	is.get(e.id, sizeof(e.id));	// Read to the end of the line, or until the array is full
	is.ignore(std::numeric_limits<std::streamsize>::max(), '\n');	// Discard the rest of the line

	is.ignore(7);	// Skip "Name = "
	is.get(e.name, sizeof(e.name));
	is.ignore(std::numeric_limits<std::streamsize>::max(), '\n');

	is.ignore(15);	// Skip "Phone Number = "
	is.get(e.phone_num, sizeof(e.phone_num));
	is.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
	is.ignore(std::numeric_limits<std::streamsize>::max(), '\n');	// Blank line between records

	return is;
}

int main()
{
	std::ifstream ifs("Employee.txt");

	if (!ifs)
		return (std::cout << "Cannot open file\n"), 1;

	std::vector<Employee> employees;

	for (Employee e; ifs >> e; employees.emplace_back(e));

	for (const auto& e : employees)
		std::cout << e << '\n';

	std::cout << '\n';
}


Which displays:


1290229  Luke Skywalker                  736-6587
1201195  Alice Watson                    789-4652
1924899  Jake Muller                     790-0658


Note that parsing the input file like this is 'fragile' in that if there is a minor change to the format, then the code will need to be changed.
Hello ThisAintMeDude,

If you have any code that you have started with, post the complete code so everyone can see where you are at and does not have to guess at what you need.

I can put some code together, but it may not be what you have and may go in the wrong direction.

And just in case:

PLEASE ALWAYS USE CODE TAGS (the <> formatting button), to the right of this box, when posting code.

Along with the proper indenting it makes it easier to read your code and also easier to respond to your post.

http://www.cplusplus.com/articles/jEywvCM9/
http://www.cplusplus.com/articles/z13hAqkS/

Hint: You can edit your post, highlight your code and press the <> formatting button. This will not automatically indent your code. That part is up to you.

You can use the preview button at the bottom to see how it looks.

I found the second link to be the most help.



Andy
seeplus:

I looked through your code and found that it uses limits and vectors, which I believe are forbidden in my current subject... but I might take this as a new method to solve this kind of problem

Thank you for spending your time typing such long code for me tho


Andy:

Thanks for the reminder
This is my first time posting discussions here so I still dont know how to post codes heheh
Luckily your answer is understandable and applicable to me
OK Without using vector:

#include <fstream>
#include <iostream>
#include <iomanip>
#include <limits>

struct Employee {
	char id[12] {};
	char name[30] {};
	char phone_num[10] {};
};

constexpr size_t MAXEMP {50};

std::ostream& operator<<(std::ostream& os, const Employee& e)
{
	return os << std::setw(7) << e.id << "  " << std::left << std::setw(30) << e.name << "  " << e.phone_num;
}

std::istream& operator>>(std::istream& is, Employee& e)
{
	is.ignore(9);
	is.get(e.id, sizeof(e.id));
	is.ignore(std::numeric_limits<std::streamsize>::max(), '\n');

	is.ignore(7);
	is.get(e.name, sizeof(e.name));
	is.ignore(std::numeric_limits<std::streamsize>::max(), '\n');

	is.ignore(15);
	is.get(e.phone_num, sizeof(e.phone_num));
	is.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
	is.ignore(std::numeric_limits<std::streamsize>::max(), '\n');

	return is;
}

int main()
{
	std::ifstream ifs("Employee.txt");

	if (!ifs)
		return (std::cout << "Cannot open file\n"), 1;

	Employee employees[MAXEMP] {};
	size_t noEmp {};

	for (Employee e; noEmp < MAXEMP && ifs >> e; employees[noEmp++] = e);

	for (size_t i = 0; i < noEmp; ++i)
		std::cout << employees[i] << '\n';

	std::cout << '\n';
}


Note that <limits> is used for .ignore(). If you don't want to use limits, just replace std::numeric.... with say 1000. It's just to skip over chars to the next \n.
Hello ThisAintMeDude,

Here is an idea to consider. This should be closer to what you have learned.
#include <iostream>
#include <string>

#include <fstream>

constexpr unsigned MAXSIZE{ 10 };

struct Employee
{
    char id[12]{};
    char name[30]{};
    char phone_num[10]{};
};

int main()
{
    const std::string inFileName{ "employee.txt" };  // <--- Put File name here.

    std::ifstream inFile(inFileName);

    if (!inFile)
    {
         //return std::cout << "\n\n     File " << std::quoted(inFileName) << " did not open.\n", 1;
        return std::cout << "\n\n     File \"" << inFileName << "\" did not open.\n", 1;
    }

    unsigned records{}, idx{};  // <--- ALWAYS initialize all your variables.
    Employee employees[MAXSIZE];


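    while (inFile.ignore(9) && inFile >> employees[idx].id)
    {
        inFile.ignore(8);  // <--- Skips the newline left after the ID plus "Name = ".
        inFile.get(employees[idx].name, 31);  // <--- 31 is one more than the array holds. sizeof(employees[idx].name) is the safe count.

        inFile.ignore(16);  // <--- Skips the newline left after the name plus "Phone Number = ".
        inFile.get(employees[idx++].phone_num, 11);  // <--- Same here, the array holds 10.

        inFile.ignore();  // <--- Eats the newline that ends the phone number line. The blank line is absorbed by the next ignore(9) and >>.

        records++;

        if (idx >= MAXSIZE) break;  // <--- Stops before writing past the end of the array.
    }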

    for (unsigned idx{}; idx < records; idx++)
    {
        std::cout
            << "Emp Id       = " << employees[idx].id << '\n'
            << "Name         = " << employees[idx].name << '\n'
            << "Phone Number = " << employees[idx].phone_num << "\n\n";
    }



	// <--- Keeps console window open when running in debug mode on Visual Studio. Or a good way to pause the program.
	// The next line may not be needed. If you have to press enter to see the prompt it is not needed.
	//std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');  // <--- Requires header file <limits>.
	std::cout << "\n\n Press Enter to continue: ";
	std::cin.get();

	return 0;  // <--- Not required, but makes a good break point for testing.
}


This gives the output of:

Emp Id       = 1290229
Name         = Luke Skywalker
Phone Number = 736-6587

Emp Id       = 1201195
Name         = Alice Watson
Phone Number = 789-4652

Emp Id       = 1924899
Name         = Jake Muller
Phone Number = 790-0658



 Press Enter to continue:



That should give you something to work with.

Andy
Consider this version, which doesn't require knowing how many chars to skip at the beginning, but ignores until it finds a '=':

#include <fstream>
#include <iostream>
#include <iomanip>

struct Employee {
	char id[12] {};
	char name[30] {};
	char phone_num[10] {};
};

constexpr size_t MAXEMP {50};
constexpr std::streamsize IGNORE {1000};

std::ostream& operator<<(std::ostream& os, const Employee& e)
{
	return os << std::setw(7) << e.id << "  " << std::left << std::setw(30) << e.name << "  " << e.phone_num;
}

std::istream& operator>>(std::istream& is, Employee& e)
{
	is.ignore(IGNORE, '=');
	is.ignore();
	is.get(e.id, sizeof(e.id));
	is.ignore(IGNORE, '\n');

	is.ignore(IGNORE, '=');
	is.ignore();
	is.get(e.name, sizeof(e.name));
	is.ignore(IGNORE, '\n');

	is.ignore(IGNORE, '=');
	is.ignore();
	is.get(e.phone_num, sizeof(e.phone_num));
	is.ignore(IGNORE, '\n');
	is.ignore(IGNORE, '\n');	// Blank line

	return is;
}

int main()
{
	std::ifstream ifs("Employee.txt");

	if (!ifs)
		return (std::cout << "Cannot open file\n"), 1;

	Employee employees[MAXEMP] {};
	size_t noEmp {};

	for (Employee e; noEmp < MAXEMP && ifs >> e; employees[noEmp++] = e);

	for (size_t i = 0; i < noEmp; ++i)
		std::cout << employees[i] << '\n';

	std::cout << '\n';
}

@Andy.

inFile.get(employees[idx].name, 31);

inFile.ignore(16);


What happens if the amount of data is greater than the size of the char array? That's why I have the extra .ignore() after to remove anything left. See above.
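
A small sketch of the difference, using in-memory text rather than a file. NameResult and readName are hypothetical names, and the 1000 limit simply mirrors the IGNORE constant used above.

```cpp
#include <sstream>
#include <string>

struct NameResult {
    std::string name;      // what fit in the 30-char array
    std::string nextLine;  // what the following read sees
};

// Hypothetical demo: text is expected to look like "Name = ...\n<next line>\n".
NameResult readName(const std::string& text, bool discardRest)
{
    std::istringstream in(text);
    char name[30] {};

    in.ignore(1000, '=');        // skip up to and including '='
    in.ignore();                 // skip the space after '='
    in.get(name, sizeof(name));  // stores at most 29 characters

    if (discardRest)
        in.ignore(1000, '\n');   // drop whatever did not fit, plus the '\n'
    else
        in.ignore();             // assumes only the '\n' is left

    NameResult r;
    r.name = name;
    std::getline(in, r.nextLine);
    return r;
}
```

With a 39-character name, discardRest == false leaves the tail of the name in the stream, so the next read picks up "Skywalker" instead of the phone number line. With discardRest == true the next read starts at "Phone Number = ...".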
@seeplus,

Good point, but I feel this is a school assignment meant to cover what has been taught so far. Therefore the struct may have been given with array sizes chosen to cover what might occur, and the chances of any line being larger than the array would be small.

Having to deal with anything larger than any array may be for a future class or OJT.

Strange thing is when I tested the name "Jake Muller Alice Watson Luke Skywalker" it stored the whole name and the for loop printed it out with no problem.

My understanding of the reference page above is that the input should only have taken this much:
"Jake Muller Alice Watson Luke ".

If it does that then there is nothing left but the (\n) to ignore.

Even https://en.cppreference.com/w/cpp/io/basic_istream/get says:

3) Same as get(s, count, widen('\n')), that is, reads at most count-1 characters and stores them into character string pointed to by s until '\n' is found.



Since it is taking the whole string, one would think that there would be a run-time error when the string exceeds the character array.

Andy
Hello ThisAintMeDude,

I did find something I missed in the while loop.

while (inFile.ignore(9) && inFile >> employees[idx].id)

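should be:
while (inFile.ignore(9) && inFile.get(employees[idx].id, 12))

Because get() does not skip leading whitespace the way >> does, the inFile.ignore() at the end of the loop then needs to become inFile.ignore(2), so that the blank line between records is consumed as well.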

Andy
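
One thing to watch when swapping >> for get(): >> skips leading whitespace before it reads, get() does not. A minimal sketch of the difference, using in-memory text and hypothetical helper names, for a record whose "Emp Id" line is preceded by a leftover newline:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical demo helpers; text is expected to look like "\nEmp Id = 1201195\n".

std::string idUsingExtraction(const std::string& text)
{
    std::istringstream in(text);
    char id[12] {};
    in.ignore(9);               // consumes "\nEmp Id =", the space stays
    in >> std::setw(12) >> id;  // skips the space, then reads the digits
    return id;
}

std::string idUsingGet(const std::string& text)
{
    std::istringstream in(text);
    char id[12] {};
    in.ignore(9);               // consumes "\nEmp Id =", the space stays
    in.get(id, sizeof(id));     // stores the space as part of the ID
    return id;
}
```

For the text "\nEmp Id = 1201195\n" the first helper returns "1201195" and the second returns " 1201195", so with get() the newline has to be consumed before the ignore(9) runs.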
Andy:

Thanks for the correction once again



seeplus:

your concept of "keeps ignoring until '=' is found" does help a lot, thanks
