File I/O: convert rows to columns

I have a few text files with 10000 rows, and every row has 100 columns. In some cases the size is not known in advance. I just want to convert rows to columns.

The data is a mix of integers, strings, floats and doubles.

I tried, but if I don't use "endl" after each value, all the data is written on one row, and if I do use it, all the data is written in one very long column.

Can anyone tell me what mistake I am making?

8,8,8,8,8,8,0.595974,8,40877036,73,601,8,8,73,1481,1479,2153,4922.35,8,6.13849,8,8,8,8,8,8,8,8,8,8,8,8,8,909,26154,8,8,129,0,8,1481,1481,0,8328.6,239633,6.11697,8,8,8,8,8,8,8,8,8,8,8,8,0.595974,1722821,40877036,73,601,8,45,73,1481,1479,2153,8,8,6.13849,7.86395,0.00847591,8,8,8,8,8,8,8,8,8,8
6.18782,8,8,8,8,8,0.595974,8,40877036,73,601,8,8,8,8,1479,2153,4922.35,116792,8,7.86395,0.00847591,0.00737798,8,8,8,8,8,8,8,8,8,0.5,909,8,129,8,8,8,1481,1481,8,8,8328.6,8,6.11697,7.86403,0.00046383,0.00608826,8,8,8,8,8,8,8,8,8,0.595974,1722821,8,73,8,8,45,73,8,8,8,4922.35,8,6.13849,8,0.00847591,0.00737798,8,8,8,8,8,8,8,8,8
7.86844,8,8,8,8,8,8,8,40877036,8,8,130,45,73,8,1479,2153,4922.35,116792,6.13849,7.86395,0.00847591,0.00737798,8,8,8,8,8,8,8,8,8,8,8,26154,129,8,8,0,1481,1481,8,0,8,8,8,8,8,0.00608826,8,8,8,8,8,8,8,8,8,0.595974,1722821,8,8,8,130,8,8,8,8,2153,4922.35,8,6.13849,7.86395,0.00847591,8,8,8,8,8,8,8,8,8,8
9,0.0010252,9,9,9,9,0.595974,1722821,40877036,73,601,130,45,9,1481,1479,9,9,9,6.13849,9,9,0.00737798,9,9,9,9,9,9,9,9,9,9,909,9,129,9,9,9,9,1481,1481,9,9,239633,6.11697,9,0.00046383,9,9,9,9,9,9,9,9,9,9,0.595974,1722821,9,73,601,130,45,9,1481,9,2153,9,9,6.13849,9,0.00847591,9,9,9,9,9,9,9,9,9,9
4,4,4,4,4,4,0.595974,1722821,4,73,601,130,4,4,1481,4,4,4922.35,4,6.13849,7.86395,0.00847591,0.00737798,4,4,4,4,4,4,4,4,4,4,909,26154,4,129,4,4,4,1481,4,4,8328.6,4,4,4,0.00046383,0.00608826,4,4,4,4,4,4,4,4,4,4,1722821,40877036,4,601,4,4,73,1481,1479,4,4,4,4,7.86395,4,0.00737798,4,4,4,4,4,4,4,4,4
4,0.0208309,4,4,4,4,0.595974,1722821,40877036,73,601,130,4,73,1481,4,2153,4,116792,6.13849,7.86395,0.00847591,0.00737798,4,4,4,4,4,4,4,4,4,4,909,4,4,129,4,0,1481,1481,4,4,4,4,4,4,0.00046383,0.00608826,4,4,4,4,4,4,4,4,4,0.595974,1722821,40877036,4,4,130,45,4,1481,4,2153,4922.35,116792,4,4,4,0.00737798,4,4,4,4,4,4,4,4,4
7.86634,0.00121498,3,3,3,3,0.595974,3,40877036,73,601,3,45,73,1481,1479,3,4922.35,3,6.13849,7.86395,3,0.00737798,3,3,3,3,3,3,3,3,3,3,909,3,129,3,129,3,1481,3,3,3,8328.6,239633,6.11697,3,3,3,3,3,3,3,3,3,3,3,3,0.595974,3,40877036,3,601,130,3,73,1481,1479,2153,4922.35,116792,6.13849,7.86395,3,0.00737798,3,3,3,3,3,3,3,3,3
0,0.00127912,0,0,0,0,0.595974,1722821,0,0,0,130,45,73,1481,1479,2153,0,0,6.13849,7.86395,0.00847591,0.00737798,0,0,0,0,0,0,0,0,0,0,0,26154,129,129,129,0,1481,1481,1481,0,8328.6,239633,6.11697,0,0,0,0,0,0,0,0,0,0,0,0,0.595974,1722821,0,0,0,130,45,0,0,0,0,4922.35,116792,6.13849,7.86395,0.00847591,0,0,0,0,0,0,0,0,0,0
7.85176,0.00126195,7,7,7,7,0.595974,1722821,40877036,7,601,7,7,7,1481,1479,2153,7,116792,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,129,129,129,7,1481,1481,7,7,7,239633,7,7,7,0.00608826,7,7,7,7,7,7,7,7,7,7,1722821,40877036,7,601,130,7,73,1481,1479,2153,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7


std::string line;
while (getline(infile_txt, line))
{
	std::stringstream   linestream(line);
	std::string         value;

	while (getline(linestream, value, ','))
	{
		outfile  << value <<endl;
	}
}
> I just want to convert rows to columns.

Do you mean like this?
1 2 3   1 4 7
4 5 6 → 2 5 8
7 8 9   3 6 9


If so, you will need to either make multiple passes over the file or store the input somewhere.

As soon as you insert a newline symbol, you are done with that line. You cannot append to it anymore. So it is important that a whole output line is written before you start writing the next one.
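
The "multiple passes" option can be sketched roughly as follows. This is only an illustration, not code from the original post: the names nth_field and transpose_multipass are made up here, fields are assumed to be plain comma-separated text with no quoting, and the input stream is assumed to be seekable. It needs almost no memory, but it re-reads the whole input once per output line, so it is slow for wide files.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: stores the n-th comma-separated field of `line` in `out`
// and returns true, or returns false when the line has fewer than n+1 fields.
bool nth_field(const std::string& line, std::size_t n, std::string& out)
{
    std::istringstream fields(line);
    for (std::size_t i = 0; std::getline(fields, out, ','); ++i)
        if (i == n) return true;
    out.clear();
    return false;
}

// Multi-pass transpose: one full pass over `in` for every output line.
// `in` must be seekable (for example a file stream or a string stream).
void transpose_multipass(std::istream& in, std::ostream& out)
{
    for (std::size_t col = 0; ; ++col)
    {
        in.clear();   // reset the eof/fail flags left by the previous pass
        in.seekg(0);  // rewind to the start of the input

        std::string line, field, out_line;
        bool any = false;    // did any row still have a field in this column?
        bool first = true;
        while (std::getline(in, line))
        {
            if (!first) out_line += ',';
            first = false;
            if (nth_field(line, col, field)) { any = true; out_line += field; }
        }
        if (!any) break;          // we are past the widest row: done
        out << out_line << '\n';  // the newline is written only once the line is complete
    }
}
```

With a std::ifstream as `in` and a std::ofstream as `out`, this would transpose a file without holding it in memory. Storing the input first is usually the faster choice when the file fits in memory.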
Yes, exactly like this.

1 2 3       1 4 7
4 5 6  ->  2 5 8
7 8 9       3 6 9


I am reading the whole line first and then sending it to the output file, but it gives me output like this:
1
2
3
4
5
6
7
8
9
1. read file
2. write file

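#include <algorithm>
#include <cstddef>
#include <deque>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

void transpose_CSV( const std::string& filename )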
{
  typedef std::vector <std::string> record;
  std::deque <record> table;
  std::size_t cols = 0;

  // read the file
  {
    std::ifstream f( filename );
    std::string s;
    while (std::getline( f, s ))
    {
      record r;
      std::istringstream ss( s );
      std::string cell;
      while (std::getline( ss, cell, ',' ))
        r.emplace_back( cell );
      table.emplace_back( r );
      cols = std::max <std::size_t> ( cols, r.size() );
    }
  }

  // write the file, transposing (col <--> row)
  {
    std::ofstream f( filename );
    for (std::size_t col = 0; col < cols; col++)
    {
      for (std::size_t row = 0; row < table.size(); row++)
      {
        if (row > 0) f << ",";
        // a record that is too short (including the first one) contributes an empty field
        if (col < table[ row ].size()) f << table[ row ][ col ];
      }
      f << "\n";
    }
  }
}

I don't believe I missed anything.
The function does not assume that the CSV is square. (That is, it will still work if not all the records are the same size.)
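
To illustrate what "not square" means here, a small in-memory sketch follows. It is not part of the function above: the alias Table and the name transpose_ragged are invented for this example, and it assumes that a missing cell should simply become an empty string.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

using Table = std::vector<std::vector<std::string>>;  // hypothetical alias, not from the post

// Transpose a table whose rows may differ in length; missing cells become empty strings.
Table transpose_ragged(const Table& rows)
{
    // the widest row decides how many output rows there are
    std::size_t cols = 0;
    for (const auto& r : rows) cols = std::max(cols, r.size());

    // cols x rows.size() table, every cell initially ""
    Table result(cols, std::vector<std::string>(rows.size()));
    for (std::size_t row = 0; row < rows.size(); ++row)
        for (std::size_t col = 0; col < rows[row].size(); ++col)
            result[col][row] = rows[row][col];
    return result;
}
```

For the rows (1,2,3), (4) and (7,8) this yields (1,4,7), (2,"",8) and (3,"","").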

Hope this helps.

[edit] Fixed some errors.
The program gives these errors:
void transpose_CSV( const std::string& filename )
{
  std::vector <std::string> record;
  std::deque <record> table;        // <----------- ERROR:  deque is not a member of std
  std::size_t cols = 0;

  // read the file
  {
    std::ifstream f( filename );
    std::string s;
    while (std::getline( f, s ))
    {
      record r;                           //<---------------ERROR: undeclared identifier
      std::istringstream ss( s );
      std::string cell;
      while (std::getline( ss, cell, ',' ))
        r.emplace_back( cell );
      table.emplace_back( r );
      cols = std::max( cols, r.size() );      //<--------------ERROR: no instance of overload function matches the argument list.
    }
  }

  // write the file, transposing (col <--> row)
  {
    std::ofstream f( filename );
    for (std::size_t col = 0; col < cols; col++)
    {
      f << table[ 0 ][ col ];
      for (std::size_t row = 1; row < table.size(); row++)
      {
        f << ",";
        if (col < table[ row ].size()) f << table[ row ][ col ];
      }
      f << "\n";
    }
  }
}
#include <deque>
Okay, and there are still two other errors...

record r;

You used the name of a vector of strings as the type in the declaration of another variable. I didn't get that.

And secondly:
std::max

I have read about it; it is declared in <algorithm>, but which version are you using?
There are 4 overloads.
Sorry, forgot to typedef that as a record.
And max should have worked -- both arguments are size_t... but fixed that as well.
Hopefully it will work for you now.
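
For reference, a short sketch of how std::max deduces its type. The function names widest and at_least are made up for this example. In the code posted above the failing call was most likely a knock-on effect of `record` not being a type (so `r` was never declared), but naming the template argument explicitly is harmless and also covers the case where the two arguments really do have different types.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns the length of the longest record.
std::size_t widest(const std::vector<std::vector<std::string>>& table)
{
    std::size_t cols = 0;
    for (const auto& r : table)
    {
        // std::max( cols, r.size() ) compiles only if both arguments have exactly the same type.
        // Naming the type explicitly converts both arguments to std::size_t first.
        cols = std::max<std::size_t>(cols, r.size());
    }
    return cols;
}

// With mixed types (int and std::size_t) deduction fails unless the type is named.
std::size_t at_least(int minimum, std::size_t n)
{
    // return std::max(minimum, n);            // error: conflicting deduced types
    return std::max<std::size_t>(minimum, n);  // OK; assumes minimum is not negative
}
```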
http://www.cplusplus.com/forum/beginner/171037/#msg852563
Thank you so much Duoas! :-)

It worked!
One concern: since I have 10000 rows and every row has 100 columns, it takes a long time to convert that file.

With fewer rows, it executes faster.

Is there a way to optimize it?
For example: don't store too much data, just read row by row and output column by column.

Is that possible?
Since you are transposing, there is no other way than to read the entire file.

BUT... it does take a long time to mess with parsing the text while reading it.
For lickety-split speed, you need to read the whole file at once, then work with indices into that string, then write from that.

#include <algorithm>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

//----------------------------------------------------------------------------
std::string load_file_at_once( const std::string& filename )
{
  // binary mode: read the bytes as they are, without newline translation
  std::ifstream f( filename, std::ios::binary );
  if (!f)
  {
    std::cerr << "Could not open INFILE " << filename << "\n";
    std::exit( 1 );
  }
  std::ostringstream ss;
  ss << f.rdbuf();
  return ss.str();
}

//----------------------------------------------------------------------------
typedef std::vector <std::vector <std::size_t> > indices;
typedef std::tuple <std::string, indices, std::size_t> index_info;

//----------------------------------------------------------------------------
index_info parse( const std::string& text )
{
  indices     xys;
  std::size_t max_col = 0;
  std::size_t index   = 0;
  
  auto push_new = [ &xys, &index ]() -> void
  {
    xys.resize( xys.size() + 1 );
    xys.back().push_back( index );
  };
  
  push_new();
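  for (char c : text)
  {
    ++index;
    switch (c)
    {
      case ',':  xys.back().push_back( index ); break;
      case '\n': max_col = std::max( max_col, xys.back().size() );
                 push_new();
                 break;
      case '\r': break;
    }
  }
  // drop the empty row started by a trailing newline; otherwise count the unterminated last line
  if (xys.back().front() >= text.size()) xys.pop_back();
  else max_col = std::max( max_col, xys.back().size() );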

  return std::make_tuple( text, std::move( xys ), max_col );
}

//----------------------------------------------------------------------------
void save_file( const std::string& filename, const index_info& info )
{
  const std::string& text    = std::get <0> (info);
  const indices&     xys     = std::get <1> (info);
  const std::size_t  max_col = std::get <2> (info);
  
  auto get = [ &text, &xys, &max_col ]( std::size_t row, std::size_t col ) -> std::string
  {
    if (col < xys[ row ].size())
    {
      std::size_t n = xys[ row ][ col ];
      std::size_t x = text.find_first_of( ",\n\r", n );
      return text.substr( n, x - n );
    }
    return "";
  };
  
  std::ofstream f( filename );
  for (std::size_t col = 0; col < max_col; col++)
  {
    f << get( 0, col );
    for (std::size_t row = 1; row < xys.size(); row++)
    {
      f << "," << get( row, col );
    }
    f << '\n';  // '\n' rather than std::endl, which would flush the file after every line
  }
}

//----------------------------------------------------------------------------
int main( int argc, char** argv )
{
  if (argc != 3)
  {
    std::cout << "usage:  " << argv[0] << " INFILE OUTFILE\n";
    return 1;
  }
  save_file( argv[2], parse( load_file_at_once( argv[1] ) ) );
}

Enjoy!
> since i have 10000 rows and every row has 100 columns, its taking long time to convert that file.

How much time does it take when compiled with optimisations enabled?
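
If you want to measure it from inside the program rather than with `time`, a minimal sketch using std::chrono could look like this. The helper name elapsed_ms is made up for this example, and it measures a single run only, so the figure will vary from run to run. Build with optimisations (for example -O2 or -O3) before timing anything.

```cpp
#include <chrono>

// Hypothetical helper: runs `work()` once and returns the elapsed
// wall-clock time in milliseconds.
template <typename F>
double elapsed_ms(F&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Something like `elapsed_ms( []{ transpose_CSV( "data.csv" ); } )` would then time one call of the earlier function; the file name is only a placeholder.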

Brute force, parsing each line on the fly, as it is read:

#include <iostream>
#include <string>
#include <vector>
#include <sstream>
#include <fstream>
#include <iomanip>

std::string trim( std::string str )
{
    static std::istringstream stm ;
    stm.clear() ;
    stm.str(str) ;
    return stm >> str ? str : "" ;
}

void process_line( std::vector< std::vector<std::string> >& vec, const std::string& line )
{
    static std::istringstream stm ;

    stm.clear() ;
    stm.str(line) ;

    if( vec.empty() ) vec.resize(1) ;
    std::size_t col = 0 ;

    std::string token ;
    while( std::getline( stm, token, ',' ) )
    {
        if( vec.size() < ( col+1 ) )
        {
            vec.resize( col+1 ) ;

            // for all previous lines, pad up with empty strings
            vec[col].resize( vec.front().size() - 1 ) ;
        }

        vec[col++].push_back( trim(token) ) ;
    }

    // take care of this line not having all the values
    while( col < vec.size() ) vec[col++].emplace_back( "" ) ;
}

std::vector< std::vector<std::string> > transpose_read_csv( std::ifstream file )
{
    std::vector< std::vector<std::string> > result ;

    std::string line ;
    while( std::getline( file, line ) ) process_line( result, line ) ;

    return result ;
}

void create_test_file( const char* path, int nlines, int nvalues )
{
    std::ofstream file( path ) ;
    for( int i = 0 ; i < nlines ; ++i )
    {
        for( int j = 0 ; j < nvalues + (i%3) * (i%5+1) ; ++j ) file << i * 10 + j + 10 << ',' ;
        if( i%2 ) file << 88 ;
        file << '\n' ;
    }

    // to verify that empty values, and trimming are handled correctly
    file << "   ab   " << "    ,  ,,   " << 98 << "    , , cd,,,," << 99 << ",,,,," ;
}

int main()
{
    const char* const path = "test.txt" ;
    const int nlines = 5 ;
    const int nvalues_per_line = 10 ;
    create_test_file( path, nlines, nvalues_per_line ) ;

    std::cout << std::ifstream( path ).rdbuf() << "\n-------------\n\n" ;

    for( const auto& row : transpose_read_csv( std::ifstream( path ) ) )
    {
        // use a period as placeholder for empty strings, a vertical bar as placeholder for end of line
        for( const auto& str : row ) std::cout << std::setw(4) << ( str.empty() ? "." : str ) ;
        std::cout << "|\n" ;
    }
}

http://coliru.stacked-crooked.com/a/8fa8cdd1dc6933a2


With 6668 lines, each having 100+ values (coliru wouldn't allow creation of larger files)
http://coliru.stacked-crooked.com/a/76565f70a40bf923

ln -s /Archive2/76/565f70a40bf923/main.cpp create.cpp
echo -e '======== create (clang++) ==========' && clang++ -std=c++14 -stdlib=libc++ -O3 -Wall -Wextra -pedantic-errors create.cpp -o create && time ./create 
ls -l test.txt
echo -e '\n======== parse (clang++) ==========' && clang++ -std=c++14 -stdlib=libc++ -O3 -Wall -Wextra -pedantic-errors main.cpp -oparse && time ./parse
echo -e '======== create (g++) ==========' && g++ -std=c++14 -O3 -Wall -Wextra -pedantic-errors create.cpp -o create && time ./create 
ls -l test.txt
echo -e '\n======== parse (g++) ==========' && g++ -std=c++14 -O3 -Wall -Wextra -pedantic-errors main.cpp -oparse && time ./parse
======== create (clang++) ==========

real	0m0.283s
user	0m0.268s
sys	0m0.012s
-rw-r--r-- 1 2001 2000 4020976 Aug  9 09:32 test.txt

======== parse (clang++) ==========
111 rows 6668 cols

real	0m0.389s
user	0m0.344s
sys	0m0.040s
======== create (g++) ==========

real	0m0.111s
user	0m0.088s
sys	0m0.020s
-rw-r--r-- 1 2001 2000 4020976 Aug  9 09:32 test.txt

======== parse (g++) ==========
111 rows 6668 cols

real	0m0.294s
user	0m0.236s
sys	0m0.060s

http://coliru.stacked-crooked.com/a/87299ffa6ff9c60a

The bytes in the file (about 4MB) would be cached by the OS for the read; to compensate for that, we can add the two together, and we would get:
clang++ / libc++ : 672 millisecs elapsed wall clock time
g++ / libstdc++ : 405 millisecs elapsed wall clock time

Your file has more lines, and each value in a line would be somewhat larger; but this would scale linearly, and it shouldn't take more than a couple of seconds.

The code that Duoas posted would be somewhat faster. I wrote the brute force version, parsing each line as it is read, because I was quite curious about how much the difference in performance would be.
ln -s /Archive2/76/565f70a40bf923/main.cpp create.cpp
echo -e '======== create srce file (g++) ==========' && g++ -std=c++14 -O3 -Wall -Wextra -pedantic-errors create.cpp -o create_srce && time ./create_srce 
ls -l test.txt
echo -e '\n======== create dest file (g++) ==========' && g++ -std=c++14 -O3 -Wall -Wextra -pedantic-errors main.cpp -ocreate_dest && time ./create_dest test.txt test_out.txt 
ls -l test_out.txt
======== create srce file (g++) ==========

real	0m0.113s
user	0m0.100s
sys	0m0.012s
-rw-r--r-- 1 2001 2000 4020976 Aug  9 10:00 test.txt

======== create dest file (g++) ==========

real	0m0.252s
user	0m0.192s
sys	0m0.060s
-rw-r--r-- 1 2001 2000 4067748 Aug  9 10:00 test_out.txt

http://coliru.stacked-crooked.com/a/20830ffd2be4be78
Duoas,

I am not able to run it.
When I run it, nothing happens.

JLBorges,

You are creating a file from within the program. We already have a file, and we need to read every row and output it as columns.
Actually, I didn't get how to combine your code with mine.
> When i run it, nothing happens.

Run it with a command line: <program> <input_file> <output_file>, for instance:
./create_dest test.txt test_out.txt
http://coliru.stacked-crooked.com/a/20830ffd2be4be78


> You are creating a file from within the program,
> We have a file already and we need to read every row and output as columns.

Replace main() with:

int main()
{
    const char* const path = "test.txt" ; // **** modify this with the path to the actual input file
    
    // get the transposed data
    const auto transposed =  transpose_read_csv( std::ifstream(path) ) ; 

    // use the transposed data
    // ...
}

I have pasted the whole code here as it looks now, and I get this error:

Error: identifier "transpose_read_csv" is undefined

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <tuple>
#include <vector>

//----------------------------------------------------------------------------
std::string load_file_at_once(const std::string& filename)
{
	std::string s;
	std::ifstream f(filename, std::ios::binary);
	if (!f)
	{
		std::cerr << "Could not open INFILE " << filename << "\n";
		std::exit(1);
	}
	std::ostringstream ss;
	ss << f.rdbuf();
	return std::move(ss.str());
}

//----------------------------------------------------------------------------
typedef std::vector <std::vector <std::size_t> > indices;
typedef std::tuple <std::string, indices, std::size_t> index_info;

//----------------------------------------------------------------------------
index_info parse(const std::string& text)
{
	indices     xys;
	std::size_t max_col = 0;
	std::size_t index = 0;

	auto push_new = [&xys, &index]() -> void
	{
		xys.resize(xys.size() + 1);
		xys.back().push_back(index);
	};

	push_new();
	for (char c : text)
	{
		++index;
		switch (c)
		{
		case ',':  xys.back().push_back(index); break;
		case '\n': max_col = std::max(max_col, xys.back().size());
			push_new();
		case '\r': break;
		}
	}
	if (xys.back().size() < 2) xys.resize(xys.size() - 1);

	return std::make_tuple(text, std::move(xys), max_col);
}

//----------------------------------------------------------------------------
void save_file(const std::string& filename, const index_info& info)
{
	const std::string& text = std::get <0>(info);
	const indices&     xys = std::get <1>(info);
	const std::size_t  max_col = std::get <2>(info);

	auto get = [&text, &xys, &max_col](std::size_t row, std::size_t col) -> std::string
	{
		if (col < xys[row].size())
		{
			std::size_t n = xys[row][col];
			std::size_t x = text.find_first_of(",\n\r", n);
			return text.substr(n, x - n);
		}
		return "";
	};

	std::ofstream f(filename);
	for (std::size_t col = 0; col < max_col; col++)
	{
		f << get(0, col);
		for (std::size_t row = 1; row < xys.size(); row++)
		{
			f << "," << get(row, col);
		}
		f << std::endl;
	}
}

//----------------------------------------------------------------------------
int main()
{
	const char* const path = "test.txt"; // **** modify this with the path to the actual input file

	// get the transposed data
	const auto transposed = transpose_read_csv(std::ifstream(path));  //ERROR

	// use the transposed data
	// ...
}

> i paste the whole code here, how it looks now and i got this error.

These are two different programs: the Duoas program and the JLBorges program.

It appears to me (hopefully I'm completely wrong) that you are doing a blind copy and paste job without making an attempt to understand either program.

Try to reason out on your own: why do you get the error 'identifier "transpose_read_csv" is undefined',
even though each of those programs compiled cleanly in isolation.
It would be a good learning experience.
JLBorges,

You are right! Sorry about that.
As you said, I replaced main() with the new one you sent, but nothing happened. I am not able to get any output.

Duoas,

Your program is also not giving any output.
I am sorry to say, at this point, theseus, that you need to put some actual thought into what you are doing.

Both JLBorges and I have provided you complete, working programs that will actually do your task for you.

For mine, if you simply compile it and run it, it will work.

For JLBorges's, a simple change in main(), as he instructed, is all it takes.

int main()
{
    const char* const infile_path = "test.txt" ; // **** modify this with the path to the actual input file
    const char* const outfile_path = "test2.txt";  // **** modify this as well. It is okay if it is the same as the infile path.
    
    // get the transposed data
    const auto transposed =  transpose_read_csv( std::ifstream(infile_path) ) ; 

    // use the transposed data (write it to the output file)
    std::ofstream f(outfile_path);
    for (const auto& row : transposed)
    {
        const char* sep = "";
        for (const auto& field : row)
        {
            f << sep << field;
            sep = ", ";
        }
        f << std::endl;
    }
}

If neither of these work, it is because you are doing something wrong.
OK, I will re-check it. Maybe I am doing something wrong.
Repeat of what Duoas posted; deleted.