Read variable multiple words between numbers into single string

Aug 15, 2015 at 1:16pm
Hi Everyone,
My task is to read some data from a .txt file, for example:

"11 03 AC 78 cplusplus is the best 4595890 5677...
83 G 450 A My topics 2344879 139..
WQ 3 11 124 UNIX/Linux Programming 234235 1341..."


Each line starts with 4 fields, each of which may be a number or a string. These are followed by a variable number of words that I would like to store in one single string field, and after that the numerical code and fields.

With a known number of words I can use fscanf() to read the line field by field, but the number of words varies, as in "cplusplus is the best", "My topics", and "UNIX/Linux Programming" shown above. How can I capture those words and store them in a single string field?

Thanks, and looking forward to your help.
Last edited on Aug 15, 2015 at 1:17pm
Aug 15, 2015 at 1:26pm
> ... after that the numerical code and fields.

How many trailing fields are there in a line?

Is this C or is it C++?
Aug 15, 2015 at 5:07pm
C++

Each line has, for example, 10 trailing fields after the multiple words.
Aug 15, 2015 at 5:27pm
Use getline(cin, str, ' ') to read the first 4 fields. Then use getline(cin, str) to read the last one.
Aug 15, 2015 at 5:40pm
There are spaces between the first 4 fields and others.
Aug 15, 2015 at 5:50pm
Brute force; but this should be quite adequate unless the number of lines in the file is huge.

#include <iostream>
#include <string>
#include <sstream>
#include <vector>
#include <iterator>
#include <algorithm>

struct data
{
    static constexpr std::size_t HDR_SIZE = 4 ;
    std::string head[HDR_SIZE] ;

    std::string text ;

    static constexpr std::size_t TAIL_SIZE = 10 ;
    std::string tail[TAIL_SIZE] ;
};

bool get_data( std::string str, data& d )
{
    std::istringstream stm(str) ;
    const auto error = [&d] { d = {} ; return false ; } ;

    // read the header fields
    for( std::string& str : d.head ) stm >> str ;
    if( !stm ) return error() ;

    // read each remaining white space separated token into a vector
    std::vector<std::string> vec ;
    while( stm >> str ) vec.push_back(str) ;

    const std::size_t n = vec.size() ;
    if( n < data::TAIL_SIZE ) return error() ; // #fields is less than expected

    // extract the last TAIL_SIZE fields into tail in reverse order
    for( std::string& s : d.tail ) { s = vec.back() ; vec.pop_back() ; }
    std::reverse( std::begin(d.tail), std::end(d.tail) ) ; // and reverse the tail

    // EDIT: redundant. vec.resize( n - data::TAIL_SIZE ) ; // throw away the extracted fields
    d.text.clear() ; // start with an empty string for text
    // concatenate the fields that were left
    for( const std::string& str : vec ) d.text += str + ' ' ;
    if( !d.text.empty() ) d.text.pop_back() ; // remove the last extra space (text is empty if no words were left)

    return true ;
}

int main()
{
    std::istringstream file( "11 03 AC 78 cplusplus is the best 0 1 2 3 4 5 6 7 8 9\n"
                             "83 G 450 A My topics aaa bbb ccc ddd eee fff ggg hhh iii jjj\n"
                             "WQ 3 11 124 UNIX/Linux Programming a0 b1 c2 d3 e4 f5 g6 h7 i8 j9\n" ) ;

    std::string line ;
    data d ;
    while( std::getline( file, line ) && get_data( line, d ) )
    {
        std::cout << "\nhead: " ; for( auto s : d.head ) std::cout << s << ' ' ;
        std::cout << "\n\ntext: '" << d.text << "'\n" ;
        std::cout << "\ntail: " ; for( auto s : d.tail ) std::cout << s << ' ' ;
        std::cout << "\n\n----------------\n" ;
    }
}

http://coliru.stacked-crooked.com/a/6e1268e7c9c99eb7
Last edited on Aug 15, 2015 at 6:10pm
Aug 16, 2015 at 1:49am
Thanks JLBorges.
I haven't used the iostream library before, only math.h, ctype.h, string.h, stdlib.h, time.h and stdio.h.
I may need some time to understand your code.
Last edited on Aug 16, 2015 at 2:22am
Aug 16, 2015 at 7:28am
Can we make this without iostream library?
Aug 16, 2015 at 3:24pm
> Can we make this without iostream library?

Yes, we can; but should we? For the beginner, the library provides easy to use facades like std::ifstream - far easier than, say, understanding the intricacies of std::scanf().

If we must do this with the standard C subset of the input/output library,
something like this, perhaps (caveat: not even cursorily tested):

#include <cstdio>
#include <string>
#include <cstring>
#include <deque>
#include <iterator>
#include <vector>
#include <cctype>

struct data
{
    static constexpr std::size_t HDR_SIZE = 4 ;
    std::string head[HDR_SIZE] ;

    std::string text ;

    static constexpr std::size_t TAIL_SIZE = 10 ;
    std::string tail[TAIL_SIZE] ;
};

std::string next_token( const std::string& str, std::size_t& pos )
{
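    // cast to unsigned char: passing a negative char value to std::isspace is undefined behaviour
    const auto is_space = []( char c ) { return std::isspace( static_cast<unsigned char>(c) ) != 0 ; } ;
    while( pos < str.size() && is_space(str[pos]) ) ++pos ;
    auto begin = pos ;
    while( pos < str.size() && !is_space(str[pos]) ) ++pos ;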
    return str.substr( begin, pos-begin ) ;
}

std::deque<std::string> split( const std::string& str )
{
    std::deque<std::string> tokens ;

    std::string tok ;
    std::size_t pos = 0 ;
    while( !( tok = next_token( str, pos ) ).empty() ) tokens.push_back(tok) ;

    return tokens ;
}

std::string extract_front( std::deque<std::string>& deq )
{
    if( deq.empty() ) return {} ;

    std::string front = deq.front() ;
    deq.pop_front() ;
    return front ;
}

bool get_data( std::string str, data& d )
{
    const auto error = [&d] { d = {} ; return false ; } ;

    auto tokens = split(str) ;
    if( tokens.size() < ( data::HDR_SIZE + 1 + data::TAIL_SIZE ) ) return error() ;

    // read the header fields
    for( std::string& str : d.head ) str = extract_front(tokens) ;

    // read text
    const auto n_tokens_in_text = tokens.size() - data::TAIL_SIZE ;
    d.text.clear() ; // start with an empty string for text
    for( std::size_t i = 0 ; i < n_tokens_in_text ; ++i ) d.text += extract_front(tokens) + ' ' ;
    d.text.pop_back() ; // remove the last extra space

    // read tail
    for( std::string& str : d.tail ) str = extract_front(tokens) ;

    return true ;
}

// home-grown getline: std::string from C stream
bool lk_getline( std::FILE* file, std::string& str, std::size_t max_line_sz = 1'000'000 )
{
    static std::vector<char> buffer(max_line_sz+1) ;
    if( std::fgets( buffer.data(), buffer.size(), file ) == nullptr ) { str.clear() ; return false ; }

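    // use only the characters up to the null terminator written by fgets:
    // the static buffer may still hold residue from an earlier, longer line
    std::size_t len = std::strlen( buffer.data() ) ;
    if( len > 0 && buffer[len-1] == '\n' ) --len ; // drop the trailing newline
    str.assign( buffer.data(), len ) ;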
    return true ;
}

int main()
{
    std::FILE* file = std::fopen( "whatever", "r" ) ;
    if( file == nullptr ) { std::perror( "fopen" ) ; return 1 ; } // could not open the file

    std::string line ;
    data d ;
    while( lk_getline( file, line ) ) if( get_data( line, d ) )
    {
        std::printf( "\nhead: " ) ;
        for( auto s : d.head ) std::printf( "%s ", s.c_str() ) ;

        std::printf( "\n\ntext: '%s'\n", d.text.c_str() ) ;

        std::printf( "\ntail: " ) ;
        for( auto s : d.tail ) std::printf( "%s ", s.c_str() ) ;

        std::printf( "\n\n----------------\n" ) ;
    }

    std::fclose(file) ;
}