Reading individual words from text file

Hello,

I am struggling with finding the best way to read individual words in a text file.
I can use getline to extract whole lines, but I need to pull out individual words and store them in a string array for comparison to another word. Overall it's just a word search program.

Is there a simple way of reading words in C++ or do I have to have it look at every character and find spaces or something?

Thanks for the help!
The formatted extraction operator, >>, already does this. It reads until the next whitespace and skips any leading or trailing whitespace.
The operator >> skips only leading whitespace. It stops extracting when it reaches the end of a word.
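A minimal sketch of that behavior, for anyone who wants to check it. The function name first_word and the sample text are made up for illustration; the stream is an istringstream rather than a file so the example is self-contained.

```cpp
#include <sstream>
#include <string>

// Extracts the first word of text with >> and stores whatever >> left
// unread (up to the end of the line) in rest.
std::string first_word(const std::string& text, std::string& rest)
{
    std::istringstream in(text);
    std::string word;
    in >> word;              // skips leading whitespace, stops at the whitespace after the word
    std::getline(in, rest);  // the whitespace after the word is still in the stream
    return word;
}
```

Called with "   hello  world", this should return hello and leave "  world" (including its two leading spaces) in rest, which shows that only the leading whitespace was skipped.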
I just ran into this issue myself, only I wish to have it ignore the whitespaces.
When you say 'ignore the whitespaces', this could mean one of many things - could you be more specific? To me, >> already ignores whitespace, but I assume this is not what you mean.
I mean I was attempting to grab the whole line, not individual words. I didn't really intend to hijack the thread, though. It appears the OP wants to grab individual words rather than whole lines. There must be something fundamentally different in how we've coded, resulting in each of us applying the code in the wrong way. This is the code I have, which could very well be the solution he is looking for.
using namespace std;
int main()
{
    string r_content,w_content;
    fstream outfile;
    outfile.open ("data.txt",ios_base::out);
    if (!outfile)
    {
       cout<<"Error writing to file!";
       return 0;
    }
    else
    {
    cout<<"Please enter a string to be written to the file:\n";
    cin>>w_content;
    cout<<"Writing to file...\n";
    //send data to file before closing
    outfile<<w_content;
    outfile.close();
    }

    fstream infile;
    infile.open("data.txt",ios_base::in);
    if (!infile)
    {
        cout<<"Error reading file!";
        return 0;
    }
    else
    {
    cout<<"\nReading from file...\n";
    //read data in before closing
    getline(infile,r_content);
    cout<<"The file contents are:\n";
    cout<<r_content;
    infile.close();
    }
}

Please enter a string to be written to the file:
Hey You
Writing to file...

Reading from file...
The file contents are:
Hey

When I check the contents of data.txt manually, I find that only Hey is present, so I was wrong that the issue was my getline implementation. Rather, the problem is cin >> w_content, which stops reading at the first space. Changing line 15 (the cin>>w_content; statement) to the following makes it work properly.
getline(cin,w_content);
Please enter a string to be written to the file:
Hey You
Writing to file...

Reading from file...
The file contents are:
Hey You
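
The difference between the two reads can be sketched without a console or a file. The function names below are hypothetical, and an istringstream stands in for cin so the same input can be fed to both versions.

```cpp
#include <sstream>
#include <string>

// Reads with >>: extraction stops at the first whitespace, so only the
// first word of the typed text comes back.
std::string read_with_extraction(const std::string& typed)
{
    std::istringstream in(typed);
    std::string text;
    in >> text;
    return text;
}

// Reads with getline: takes everything up to (not including) the newline.
std::string read_with_getline(const std::string& typed)
{
    std::istringstream in(typed);
    std::string text;
    std::getline(in, text);
    return text;
}
```

For the input "Hey You\n", the first function should give Hey and the second Hey You, matching the two runs shown above.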

So in the end, I answered my own question. If Darth shows us some code, I'm sure we can nudge him in the right direction. I know that I've used getline to treat commas as delimiters, and put it in a loop to assign each word to its own array position. Depending on the structure of his input file, this may or may not be a practical solution, or you may wish to have it treat spaces the same way. Try using getline like this within a loop:
getline(inputfilehandle,s[i],' ');
where s is an array of strings and i is the loop counter. If you are confused about the inputfilehandle part, take a look at how I used infile and outfile above.
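As a rough sketch of the delimiter idea, here is the same loop written against a string instead of a file, with a vector instead of a fixed array. The name split_fields and the sample line are assumptions for illustration only.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits one line into fields, ending each field at the given delimiter.
std::vector<std::string> split_fields(const std::string& line, char delimiter)
{
    std::istringstream in(line);
    std::vector<std::string> fields;
    std::string field;
    // getline with a third argument stops at that character instead of '\n'
    while (std::getline(in, field, delimiter))
        fields.push_back(field);
    return fields;
}
```

split_fields("Ventura,CA,90734", ',') should produce three fields. Passing ' ' as the delimiter works the same way, with the caveat that only a single space character ends a field.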
So it appears I need to use >>. I will try that. Right now the code just uses getline to grab entire lines of text from a text file. So what I need is to figure out how to properly implement the >> operator in my code. Let me try it and if I can't figure it out I'll post my code.

Thanks for the help guys!
Some further experimentation reveals that passing ' ' as the third parameter makes getline treat spaces as delimiters.
Here's where I'm at...

#include <iostream>
#include <fstream>
#include <string>
#include <sstream>

using namespace std;

int main()
{
    string testline;
    string word;

    ifstream Test ( "TesterFile.txt" );

    if ( !Test.is_open() )
    {
        cout << "There was a problem opening the file. Press any key to close." << endl;
    }

    while( Test.good() )
    {
        getline ( Test, testline );
        cin >> testline;
        cout << testline << endl;
    }

    cin.ignore();

    return 0;
}


This is killing me. It's supposed to be part of a program I made that can search inside docx files for keywords, and everything works except that I can't read individual words from a text file. I feel stupid!

How can I make this read an individual word? Right now the console appears and just sits there with nothing displayed. It doesn't seem to use a lot of memory, so I don't think it's an infinite loop.

Ideas? I tried using .eof() and .good() in the loop, no luck.
Try changing line 22 (the getline call) to getline(Test,testline,' '); and see what happens.

EDIT: Actually it required another change. Try the change mentioned above, but also remove line 23 (cin >> testline;). Also, on line 15, you can simply use !Test as the condition for your if statement.

Check out this code, adapted from yours.
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <conio.h>
using namespace std;

int main()
{
    string testline;
    string word[16];

    ifstream Test ( "Data.txt" );

    if (!Test)
    {
        cout << "There was a problem opening the file. Press any key to close.\n";
        getch();
        return 0;
    }

    int i=0;
    //store words in array while outputting, skipping blank lines
    while( Test.good() )
    {
        getline ( Test, testline, ' ');
        if (testline!="")
        {
            word[i]=testline;
            cout << word[i] << endl;
            i++;
        }
    }

    //output whole array with spaces between each word
    cout<<"\nArray contents:\n";
    for (int k=0;k<i;k++)
        cout<<word[k]<<" ";
    return 0;
}

My input file looks like this:
Steven Seagal
1234 Post Drive
Ventura, CA 90734

Adam Sandler
356 Golf Street
Calabasas, CA 92136

and my output looked like this:
Steven
Seagal
1234
Post
Drive
Ventura,
CA
90734

Adam
Sandler
356
Golf Street
Calabasas,
CA
92136

Array contents:
Steven Seagal
1234 Post Drive
Ventura, CA 90734

Adam Sandler
356 Golf Street
Calabasas, CA 92136

It's not perfect, though. When outputting word[1], it printed
Seagal
1234

So either more is needed to help it differentiate between words and numerical values, or it has to do with retained newline characters, which are invisible in the output. I'm going to look into this further.
why not just use:
ifstream file("the_file.whatever");
while(file >> some_string)
    some_vector_of_type_string.push_back(some_string);
? It grabs whitespace-separated words.
If you use getline with ' ' as the delimiter, it will not skip multiple spaces and it will keep other forms of whitespace that >> would normally skip.
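A small side-by-side sketch of that difference, reading from a string rather than Data.txt. The function names and the sample text are hypothetical; the sample deliberately contains a newline and a double space.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits with getline and ' ' as the delimiter: only a single space
// ends a token, so newlines stay inside tokens and repeated spaces
// produce empty tokens.
std::vector<std::string> split_on_space(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> tokens;
    std::string token;
    while (std::getline(in, token, ' '))
        tokens.push_back(token);
    return tokens;
}

// Splits with >>: any run of whitespace (spaces, tabs, newlines)
// separates one word from the next.
std::vector<std::string> split_with_extraction(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string word;
    while (in >> word)
        words.push_back(word);
    return words;
}
```

Given "Steven Seagal\n1234  Post", the getline version should yield the tokens Steven, "Seagal\n1234", an empty string, and Post, while the >> version yields Steven, Seagal, 1234, Post. The second getline token is the same kind of entry reported above for word[1].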
This would explain a lot about my program's behavior. Upon modification, I realized that some values, despite being outputted, were not properly being stored. I changed the array output loop at the end to
 cout<<"\nArray contents:\n";
        for (int i=0;i<11;i++)
        cout<<word[i]<<"("<<i<<")"<<endl;

hoping to display each array position's string. Not only were some entries printed across multiple lines, but some values, such as the first zip code, were never stored. If you output
word[4]
it is the CA from the first address, yet when outputting
word[5]
it only displays Adam.

I tried making changes as suggested above, but it only made things worse.
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <conio.h>
#include <vector>
using namespace std;

int main()
{
    string testline;
    vector<string> word;

    ifstream Test ( "Data.txt" );

    if (!Test)
    {
        cout << "There was a problem opening the file. Press any key to close.\n";
        getch();
        return 0;
    }

    //store words in array while outputting, skipping blank lines
    while( Test>>testline )
    {
        getline ( Test, testline, ' ');
            cout<<testline;
            word.push_back(testline);
    }
    //output whole array with spaces between each word
    cout<<"\nArray contents:\n";
        for (int i=0;i<word.size();i++)
        cout<<word[i]<<"("<<i<<")"<<endl;
    return 0;
}


1234
Ventura,

Adam
356
Calabasas,92136
Array contents:
(0)

1234(1)
(2)

Ventura,(3)
(4)


Adam(5)

356(6)
(7)

Calabasas,(8)
(9)
92136(10)

Changing the condition of the while loop on line 25 to just Test gives output closer to what I want, yet it still shows some oddities.
Steven
Seagal
1234
Post
Drive
Ventura,
CA
90734

Adam
Sandler
356
Golf
Street
Calabasas,
CA
92136
92136

Array contents:
Steven(0)
Seagal
1234(1)
Post(2)
Drive
Ventura,(3)
CA(4)
90734

Adam(5)
Sandler
356(6)
Golf(7)
Street
Calabasas,(8)
CA(9)
92136(10)
92136(11)

I have two questions:
#1. Why does it output and store the last zip code twice when reading it in?
#2. Why are multiple strings being combined into single entries? (I assume it has something to do with newline characters, but am unsure). Outputting
word[5]
gives me
90734

Adam
now.
Yet I wish to store the information as 16 distinct entries, not 12, and without the redundancy of storing zip codes twice.
I got it mostly solved now. LB was right all along about the use of >>, it really is key to reading in individual words.
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <conio.h>
#include <vector>
using namespace std;

int main()
{
    string testline;
    vector<string> word;

    ifstream Test ( "Data.txt" );

    if (!Test)
    {
        cout << "There was a problem opening the file. Press any key to close.\n";
        getch();
        return 0;
    }

    //store words in array while outputting
    while( Test )
    {
            Test>>testline;
            cout<<testline<<endl;
            word.push_back(testline);
    }
    //output whole array with array position numbers for each entry
    cout<<"\nArray contents:\n";
        for (int i=0;i<word.size();i++)
        cout<<word[i]<<"("<<i<<")"<<endl;

    return 0;
}

Array contents:
Steven(0)
Seagal(1)
1234(2)
Post(3)
Drive(4)
Ventura,(5)
CA(6)
90734(7)
Adam(8)
Sandler(9)
356(10)
Golf(11)
Street(12)
Calabasas,(13)
CA(14)
92136(15)
92136(16)

Why is it storing the last value twice? Other than that, this code should demonstrate how to collect individual words from file input.

EDIT: Problem solved. I changed the condition of the while loop on line 24 to !Test.eof() and this prevented the last entry from being stored twice. Using the above code with that change should allow you to read any entries within a text file to an array or vector of strings. If choosing to go with an array, use word[i]=testline; with i as a counter, instead of push_back inside of the while loop. Just so you know, Darth, the console window was awaiting input because of line 23 (cin >> testline;) in the code you posted last.
CplusplusAcolyte wrote:
Why is it storing the last value twice?
Because your loop condition is incorrect:
    //store words in array while outputting
    while( Test )
    {
            Test>>testline;
            cout<<testline<<endl;
            word.push_back(testline);
    }
Reading the last word of a file does not necessarily set the eof or fail flags (for example, when a newline follows that word), so the stream is still in a good state and the loop runs one extra time. On that extra pass the input operation fails, leaving 'testline' unchanged, and since the result is never checked, the stale value is pushed onto the vector again. Always perform input in the loop condition:
while(Test >> testline)
{
    cout << testline << endl;
    word.push_back(testline);
}
CplusplusAcolyte wrote:
EDIT: Problem solved. I changed the condition of the while loop on line 26 to !Test.eof() and this prevented the last entry being stored twice.
No, it didn't, you just got lucky. Never loop on eof.
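To make the "got lucky" part concrete, here is a sketch that runs both loop styles over in-memory text. The function names are made up for this illustration, and the only thing varied is whether the text ends with a newline.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Loops on eof(): the result of the read is never checked, so a failed
// read still pushes whatever was left in word from the previous pass.
std::vector<std::string> read_looping_on_eof(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string word;
    while (!in.eof())
    {
        in >> word;             // may fail, leaving word unchanged
        words.push_back(word);
    }
    return words;
}

// Performs the input in the loop condition: the body runs only after a
// successful read.
std::vector<std::string> read_checking_input(const std::string& text)
{
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string word;
    while (in >> word)
        words.push_back(word);
    return words;
}
```

With "CA 92136" (no trailing newline) both functions should return two words, because reading the last word runs into the end of the input and sets the eof flag. With "CA 92136\n" the eof version should return three, the last zip code being stored twice, while the second version still returns two. Whether the eof loop appears to work therefore depends on how the file happens to end.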
Thanks LB, that explained a lot. So my earlier attempt in using the loop the way you have shown didn't work because I included my redundant older attempt to read words into the string.
while(Test >> testline)
{
    cout << testline << endl;
    word.push_back(testline);
}

is much simpler. While I hadn't heard that loop conditions based on eof were bad practice, I like this solution better anyway, and it greatly simplified my code.
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
using namespace std;

int main()
{
    string testline;
    vector<string> word;

    ifstream Test ( "Data.txt" );

    if (!Test)
    {
        cout << "There was an error opening the file.\n";
        return 0;
    }

    //store words in vector
    while( Test>>testline )
            word.push_back(testline);

    //output whole vector with position numbers for each entry
    cout<<"Array contents:\n";
    for (size_t i=0;i<word.size();i++)
        cout<<word[i]<<"("<<i<<")"<<endl;

    return 0;
}

Would you similarly say that looping based on vector size was bad practice, or is this determination limited to eof?
Can't thank you guys enough! I'm learning a lot here.

I don't understand vectors, I'm still a noob and it wasn't covered in the class I took. After some googling, my understanding is a vector is essentially a dynamic array... Is that correct? And what does push_back() do?

Can I use strcmp with a vector that's a string type?


Thanks again for all the help I really appreciate it!
You are correct, a vector is similar to an array, except the size does not need to be known in advance. push_back(testline) appends testline to the end of the vector, and is similar to array[i]=testline;, although with an array, you need to know what position you are adding it to and make use of a loop to control the value of i.

The code below does the same thing, but stores the values in an array so you can see the difference. You will see that we had to declare how many values the array holds, and had to base our output loop on the size of the array.

An array cannot have values added to it beyond the number specified at declaration, whereas a vector can have additional entries added through the use of push_back, and doesn't need to know how many values it will hold. Lastly, a vector allows us to output without knowing the number of values by using vectorname.size() in the output loop.
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
using namespace std;

int main()
{
    string testline;
    string word[16];

    ifstream Test ( "Data.txt" );

    if (!Test)
    {
        cout << "There was an error opening the file.\n";
        return 0;
    }

    //store words in array, stopping once the array is full
    int i=0;
    while( i<16 && Test>>testline )
    {
        word[i]=testline;
        i++;
    }

    //output whole array with array position numbers for each entry
    cout<<"Array contents:\n";
        for (int i=0;i<16;i++)
        cout<<word[i]<<"("<<i<<")"<<endl;

    return 0;
}
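
On the strcmp question: a vector of std::string can be compared element by element with ==, so strcmp is not needed. The sketch below assumes the words are already in a vector; the names count_matches and equal_with_strcmp are hypothetical, and the second function only shows what the strcmp route would look like.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Counts how many stored words equal target, comparing std::string
// values directly with ==.
std::size_t count_matches(const std::vector<std::string>& words,
                          const std::string& target)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < words.size(); ++i)
        if (words[i] == target)
            ++count;
    return count;
}

// The strcmp route: strcmp works on C strings, so each std::string has
// to be converted with c_str() first. It returns 0 when they match.
bool equal_with_strcmp(const std::string& a, const std::string& b)
{
    return std::strcmp(a.c_str(), b.c_str()) == 0;
}
```

For a word search, something like count_matches(word, "CA") on the vector filled above would be one way to do the comparison. Both approaches are case-sensitive, so Adam and adam would not match.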
CplusplusAcolyte wrote:
Would you similarly say that looping based on vector size was bad practice, or is this determination limited to eof?
Yes, I would similarly say that looping on vector size is bad practice. Use iterators instead, or the range-based for-loop when possible.
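For reference, a sketch of the two alternatives mentioned, applied to a vector of words. The function names are invented for the example, and the range-based version assumes a compiler with C++11 support.

```cpp
#include <string>
#include <vector>

// Range-based for (C++11): visits every element without an index or a
// size check.
std::string join_range_for(const std::vector<std::string>& words)
{
    std::string result;
    for (const std::string& w : words)
        result += w + " ";
    return result;
}

// Explicit iterators: the same traversal from begin() to end(), which
// also works with compilers that predate C++11.
std::string join_iterators(const std::vector<std::string>& words)
{
    std::string result;
    for (std::vector<std::string>::const_iterator it = words.begin();
         it != words.end(); ++it)
        result += *it + " ";
    return result;
}
```

Both functions should produce the same string for the same vector. An index loop can still be a reasonable choice when the position itself is needed, as with the (i) labels printed earlier in the thread.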
I suppose I'm still learning the use of iterators, as vectors are fairly new to me and I found this usage simpler. I'll explore this issue further in another thread if I have trouble grasping it. Thanks for all the feedback, LB. It helped me a lot, and likely benefited Darth too.