Read a variable number of words between fields into a single string

Hi Everyone,
My task is to read some data from a .txt file, for example:

"11 03 AC 78 cplusplus is the best 4595890 5677...
83 G 450 A My topics 2344879 139..
WQ 3 11 124 UNIX/Linux Programming 234235 1341..."


Each line starts with 4 string fields (each may be a number or a string), followed by multiple words that I would like to store in one single string field, and after that come the numerical codes and other fields.

With a known number of words I can use fscanf() to read the line field by field, but here the number of words varies, as in "cplusplus is the best", "My topics" and "UNIX/Linux Programming" shown above. How can I capture those words and store them in a single string field?

Thanks, and looking forward to your help.
> ... after that the numerical code and fields.

How many trailing fields are there in a line?

Is this C or is it C++?
C++

Each line has, for example, 10 trailing fields after the multiple words.
Use getline(cin, str, ' ') to read the first 4 fields. Then use getline(cin, str) to read the last one.
There are spaces between the first 4 fields and others.
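Note that reading the rest of the line with getline() leaves the trailing fields attached to the text. One possible way around that, assuming the number of trailing fields is known (10 in this thread), is to peel fields off the right-hand end of the string. This is only a sketch; split_text_and_tail is a made-up name, not a library function.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from any library): 'rest' is everything on the line
// after the header fields. The last n_tail whitespace separated fields go into
// 'tail' (in their original order); whatever remains becomes 'text'.
// Returns false if there are fewer than n_tail fields.
bool split_text_and_tail( const std::string& rest, std::size_t n_tail,
                          std::string& text, std::vector<std::string>& tail )
{
    const char* const ws = " \t\r\n" ;
    std::string s = rest ; // work on a copy

    // remove white space at the right-hand end of s
    const auto rtrim = [&s,ws]
    {
        const auto p = s.find_last_not_of(ws) ;
        s.erase( p == std::string::npos ? 0 : p+1 ) ;
    } ;

    tail.assign( n_tail, std::string() ) ;
    for( std::size_t i = n_tail ; i > 0 ; --i )
    {
        rtrim() ;
        if( s.empty() ) return false ; // ran out of fields

        const auto p = s.find_last_of(ws) ; // white space before the last field
        const std::size_t begin = ( p == std::string::npos ) ? 0 : p+1 ;
        tail[i-1] = s.substr(begin) ;
        s.erase(begin) ;
    }

    rtrim() ;
    const auto first = s.find_first_not_of(ws) ; // skip leading white space
    text = ( first == std::string::npos ) ? std::string() : s.substr(first) ;
    return true ;
}
```

The first 4 fields could be read with four >> extractions, then std::getline() for the remainder, which is then passed to this function with n_tail = 10. White space inside the text is kept exactly as it appears in the file.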
Brute force; but this should be quite adequate unless the number of lines in the file is huge.

#include <iostream>
#include <string>
#include <sstream>
#include <vector>
#include <iterator>
#include <algorithm>

struct data
{
    static constexpr std::size_t HDR_SIZE = 4 ;
    std::string head[HDR_SIZE] ;

    std::string text ;

    static constexpr std::size_t TAIL_SIZE = 10 ;
    std::string tail[TAIL_SIZE] ;
};

bool get_data( std::string str, data& d )
{
    std::istringstream stm(str) ;
    const auto error = [&d] { d = {} ; return false ; } ;

    // read the header fields
    for( std::string& str : d.head ) stm >> str ;
    if( !stm ) return error() ;

    // read each remaining white space separated token into a vector
    std::vector<std::string> vec ;
    while( stm >> str ) vec.push_back(str) ;

    const std::size_t n = vec.size() ;
    if( n < data::TAIL_SIZE ) return error() ; // #fields is less than expected

    // extract the last TAIL_SIZE fields into tail in reverse order
    for( std::string& s : d.tail ) { s = vec.back() ; vec.pop_back() ; }
    std::reverse( std::begin(d.tail), std::end(d.tail) ) ; // and reverse the tail

    // EDIT: redundant. vec.resize( n - data::TAIL_SIZE ) ; // throw away the extracted fields
    d.text.clear() ; // start with an empty string for text
    // concatenate the fields that were left, separated by single spaces
    for( const std::string& str : vec ) d.text += str + ' ' ;
    if( !d.text.empty() ) d.text.pop_back() ; // remove the last extra space (text may be empty)

    return true ;
}

int main()
{
    std::istringstream file( "11 03 AC 78 cplusplus is the best 0 1 2 3 4 5 6 7 8 9\n"
                             "83 G 450 A My topics aaa bbb ccc ddd eee fff ggg hhh iii jjj\n"
                             "WQ 3 11 124 UNIX/Linux Programming a0 b1 c2 d3 e4 f5 g6 h7 i8 j9\n" ) ;

    std::string line ;
    data d ;
    while( std::getline( file, line ) && get_data( line, d ) )
    {
        std::cout << "\nhead: " ; for( auto s : d.head ) std::cout << s << ' ' ;
        std::cout << "\n\ntext: '" << d.text << "'\n" ;
        std::cout << "\ntail: " ; for( auto s : d.tail ) std::cout << s << ' ' ;
        std::cout << "\n\n----------------\n" ;
    }
}

http://coliru.stacked-crooked.com/a/6e1268e7c9c99eb7
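The program above reads from an in-memory std::istringstream so that it can run online. For an actual .txt file, a std::ifstream can be read line by line in the same way. A minimal sketch, where read_lines is a made-up helper name and the file name is whatever the caller passes in:

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: returns every line of the file as a separate string.
// Returns an empty vector if the file could not be opened.
std::vector<std::string> read_lines( const std::string& path )
{
    std::vector<std::string> lines ;

    std::ifstream file(path) ; // opened for reading; closed automatically at the end of the function
    std::string line ;
    while( std::getline( file, line ) ) lines.push_back(line) ; // the loop does not run if the open failed

    return lines ;
}
```

Each string returned could then be handed to get_data(), just like the lines read from the string stream in main().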
Thanks JLBorges.
I haven't used the iostream library before, only math.h, ctype.h, string.h, stdlib.h, time.h and stdio.h.
I may need some time to understand your code.
Can we make this without iostream library?
> Can we make this without iostream library?

Yes, we can; but should we? For the beginner, the library provides easy to use facades like std::ifstream - far easier than, say, understanding the intricacies of std::scanf().

If we must do this with the standard C subset of the input/output library,
something like this, perhaps (caveat: not even cursorily tested):

#include <cstdio>
#include <string>
#include <cstring>
#include <deque>
#include <iterator>
#include <vector>
#include <cctype>

struct data
{
    static constexpr std::size_t HDR_SIZE = 4 ;
    std::string head[HDR_SIZE] ;

    std::string text ;

    static constexpr std::size_t TAIL_SIZE = 10 ;
    std::string tail[TAIL_SIZE] ;
};

std::string next_token( const std::string& str, std::size_t& pos )
{
    while( pos < str.size() && std::isspace(str[pos]) ) ++pos ;
    auto begin = pos ;
    while( pos < str.size() && !std::isspace(str[pos]) ) ++pos ;
    return str.substr( begin, pos-begin ) ;
}

std::deque<std::string> split( const std::string& str )
{
    std::deque<std::string> tokens ;

    std::string tok ;
    std::size_t pos = 0 ;
    while( !( tok = next_token( str, pos ) ).empty() ) tokens.push_back(tok) ;

    return tokens ;
}

std::string extract_front( std::deque<std::string>& deq )
{
    if( deq.empty() ) return {} ;

    std::string front = deq.front() ;
    deq.pop_front() ;
    return front ;
}

bool get_data( std::string str, data& d )
{
    const auto error = [&d] { d = {} ; return false ; } ;

    auto tokens = split(str) ;
    if( tokens.size() < ( data::HDR_SIZE + 1 + data::TAIL_SIZE ) ) return error() ;

    // read the header fields
    for( std::string& str : d.head ) str = extract_front(tokens) ;

    // read text
    const auto n_tokens_in_text = tokens.size() - data::TAIL_SIZE ;
    d.text.clear() ; // start with an empty string for text
    for( std::size_t i = 0 ; i < n_tokens_in_text ; ++i ) d.text += extract_front(tokens) + ' ' ;
    d.text.pop_back() ; // remove the last extra space

    // read tail
    for( std::string& str : d.tail ) str = extract_front(tokens) ;

    return true ;
}

// home-grown getline: std::string from C stream
// note: the buffer is sized on the first call; a longer line is returned in pieces
bool lk_getline( std::FILE* file, std::string& str, std::size_t max_line_sz = 1'000'000 )
{
    static std::vector<char> buffer(max_line_sz+1) ;
    if( std::fgets( buffer.data(), static_cast<int>( buffer.size() ), file ) == nullptr ) { str.clear() ; return false ; }

    // keep only the characters before the first '\n' (or before the terminating null);
    // anything after the null is stale data left over from an earlier, longer line
    str.assign( buffer.data(), std::strcspn( buffer.data(), "\n" ) ) ;
    return true ;
}

int main()
{
    std::FILE* file = std::fopen( "whatever", "r" ) ;
    if( file == nullptr ) { std::perror( "could not open file" ) ; return 1 ; }

    std::string line ;
    data d ;
    while( lk_getline( file, line ) ) if( get_data( line, d ) )
    {
        std::printf( "\nhead: " ) ;
        for( auto s : d.head ) std::printf( "%s ", s.c_str() ) ;

        std::printf( "\n\ntext: '%s'\n", d.text.c_str() ) ;

        std::printf( "\ntail: " ) ;
        for( auto s : d.tail ) std::printf( "%s ", s.c_str() ) ;

        std::printf( "\n\n----------------\n" ) ;
    }

    std::fclose(file) ;
}
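Since the original question mentions fscanf(): the scanf family can also tell us where the header ends, via the %n conversion, so that the rest of the line is available as one string. A rough sketch only; split_header is a made-up name, and the limit of 31 characters per header field is an assumption (a longer field would be split across two fields).

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: reads the four header fields from 'line' and stores
// everything after them (text plus trailing fields) in 'rest'.
// Returns false if the line has fewer than four fields.
bool split_header( const char* line, std::string (&head)[4], std::string& rest )
{
    char field[4][32] ; // assumption: a header field has at most 31 characters
    int consumed = -1 ; // set by %n to the number of characters read so far

    const int n = std::sscanf( line, "%31s %31s %31s %31s %n",
                               field[0], field[1], field[2], field[3], &consumed ) ;
    if( n != 4 || consumed < 0 ) return false ; // %n is not counted in the return value

    for( int i = 0 ; i < 4 ; ++i ) head[i] = field[i] ;
    rest = line + consumed ; // the space before %n has already skipped the white space
    return true ;
}
```

The string in rest still ends with the trailing fields, so these have to be separated from the text afterwards, for instance by counting tokens from the end as in get_data() above.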