Reading strings from file and putting them in an array

Mmmmhhh yes, bad_alloc errors.
I have run into some of those when I tried creating an array that had 1000000000 entries ...
Fun stuff ...

That last bit of code from you "againtry" with the structs is something that I can actually understand ....

The previous things might be more "elegant" or "performance optimized", but I don't "copy paste" things that I don't understand ....
That just makes it impossible to debug if something goes wrong ...


I don't really care about how quickly/slowly this executes, as long as it works ...
(And yes, I found out that putting a cout in for every line that got read REALLY slows things down ...)
alleged bad_alloc error, if in fact that's what happened


Yep - it's definitely bad_alloc as that is the caught exception:

catch (std::bad_alloc& ba)
{
	std::cout << cnt << "  Bad allocation\n";
}
catch (...)
{
	std::cout << cnt << "  error\n";
}


[cnt is the number of read lines]
If, in my code, data can be guaranteed to remain in scope for as long as the items are used, then the struct can use string_view rather than string. This reduces the memory footprint and also gives a performance improvement. Consider:

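#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <string_view>
#include <vector>

//ID#Hours#Minutes#Seconds#MinValue#MaxValue#Comments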
struct Item {
	std::string_view id;
	int hours {};
	int min {};
	int sec {};
	int minval {};
	int maxval {};
	std::string_view comm;
};

int main()
{
	const auto startr = std::chrono::high_resolution_clock::now();

	constexpr char delim {'#'};
	const std::string fnam {"bigfile.txt"};

	std::ifstream ifs(fnam);

	if (!ifs.is_open())
		return (std::cout << "Cannot open input file\n"), 1;

	std::string data;

	ifs.seekg(0, std::ios_base::end);
	data.resize(static_cast<size_t>(ifs.tellg()));
	ifs.seekg(0, std::ios_base::beg);
	//ifs.read(data.data(), data.size());
	ifs.read(&data[0], data.size());

	std::vector<Item> items;
	items.reserve(std::count(data.begin(), data.end(), '\n') + 1);

	for (size_t tokenStart = 0, delimPosition = data.find('\n'); delimPosition != std::string::npos; delimPosition = data.find('\n', tokenStart)) {
		const auto d {data.find(delim, tokenStart + 1)};

		Item itm;
		char* ed {};

		itm.id = std::string_view(data.data() + tokenStart, d - tokenStart);
		itm.hours = strtol(data.data() + d + 1, &ed, 10);
		itm.min = strtol(ed + 1, &ed, 10);
		itm.sec = strtol(ed + 1, &ed, 10);
		itm.minval = strtol(ed + 1, &ed, 10);
		itm.maxval = strtol(ed + 1, &ed, 10);
		itm.comm = std::string_view(ed + 1, delimPosition - 1 - (ed - data.data()));

		items.emplace_back(itm);
		tokenStart = ++delimPosition;
	}

	const auto diffr = std::chrono::high_resolution_clock::now() - startr;
	std::cout << "read/parse " << std::chrono::duration<double, std::milli>(diffr).count() << " ms" << std::endl;

	//for (const auto & itm : items)
		//std::cout << itm.id << "   " << itm.hours << "  " << itm.min << "  " << itm.sec << "  " << itm.minval << "  " << itm.maxval << "  " << itm.comm << '\n';
}


For 10,000,000 lines this takes 3,001ms on my computer (nearly a second faster than the previous 3,917ms).
I tried it on my machine and using 1-, 5-, 10-, 20-million lines there is no dispute about the time, simply because they are two different machines.

This problem lends itself to parallel processing and not hardware grunt anyway.

But be that as it may.

For any of those numbers of lines, including processing and displaying ranges, there is no exception error or mis-translation at my end. Unless you can account for it more substantially than by making wild claims, I can only conclude that your apparent use of mingw (IIRC), which is notoriously buggy, is the problem.

In any case OP only wants to process 1000 lines, so what's the leg-pissing all about? You do your thing and I'll do mine. OP can decide what suits.

Besides I do these primarily for my own benefit as a personal challenge and with a view in mind to write clear and proven working lines of code that might be of help/interest to the original poster. You are not of any interest to me, very few here are beyond my general focus.

If you have a different agenda, then that's great for you but leave me out because to put it bluntly, I have seen hundreds of lines of your code and most of it might work, might be faster, and that's if you are telling the truth, but most of it is unreadable, unintelligible and not remotely worth my while in considering.
I use VS2019.

#include <iostream>
#include <vector>
#include <fstream>
#include <sstream>
#include <string>
#include <chrono>

struct Data_Point
{
	std::string ID;
	int hours;
	int minutes;
	int seconds;
	int max_val;
	int min_val;
	std::string comment;
};

std::ostream& operator<<(std::ostream& os, const Data_Point& dt)
{
	os
		<< "ID: " << dt.ID << '\n'
		<< "H: " << dt.hours << " M: " << dt.minutes << " S: " << dt.seconds << '\n'
		<< "Min: " << dt.min_val << " Max: " << dt.max_val << '\n'
		<< "Range: " << dt.max_val - dt.min_val << '\n'
		<< "Comment: " << dt.comment << "\n\n";

	return os;
}

std::string getString(std::stringstream& iss, const char delim)
{
	std::string item;
	std::getline(iss, item, delim);
	return item;
}

int getInteger(std::stringstream& iss, const char delim)
{
	return std::stoi(getString(iss, delim));
}

int main()
{
	// ID#Hours#Minutes#Seconds#MinValue#MaxValue#Comments
	const auto startr = std::chrono::high_resolution_clock::now();

	std::vector<Data_Point> Data;
	std::ifstream file("bigfile.txt");

	if (file.is_open())
	{
		std::string line, data;
		std::stringstream iss;

		Data_Point temp;
		int cnt = 0;

		try {

			while (std::getline(file, line))
			{
				++cnt;
				iss << line;
				temp.ID = getString(iss, '#');
				temp.hours = getInteger(iss, '#');
				temp.minutes = getInteger(iss, '#');
				temp.seconds = getInteger(iss, '#');
				temp.min_val = getInteger(iss, '#');
				temp.max_val = getInteger(iss, '#');
				temp.comment = getString(iss, '#');

				Data.push_back(temp);

				data.clear();
				iss.ios_base::clear();
			}
			file.close();
		}
		catch (std::bad_alloc& ba)
		{
			std::cout << cnt << "  Bad allocation\n";
		}
		catch (...)
		{
			std::cout << cnt << "  error\n";
		}
	}
	else
	{
		std::cout << "UNABLE TO OPEN FILE\n";
		exit(1);
	}

	const auto diffr = std::chrono::high_resolution_clock::now() - startr;
	std::cout << "read/parse " << std::chrono::duration<double, std::milli>(diffr).count() << " ms" << std::endl;

	// DISPLAY
	/*
	for (auto i : Data)
	{
		std::cout << i << '\n';
	}*/
}


produces on my computer:


7972439  Bad allocation
read/parse 13654.3 ms


As the OP prefers your solution, that's fine. I'm not bothered. :)

Others reading this may be interested.
I'm not bothered. :)

Yes, cheer up.
You obviously are bothered and why not. I am still keen to see where/why the exception occurs on your machine and not mine. When you're ready.
I was compiling as 32bit/release/optimised...... The problem is here, as expected.

 
Data.push_back(temp);


when memory re-allocation is attempted after 7,972,439 lines. As the size of Data_Point is 68 bytes, the vector is using 542,125,852 bytes (the strings use no extra memory, as they are short enough to fit within the SBO), and attempting to allocate a new block larger than that (possibly double, depending upon the allocation algorithm) fails.

If the memory for the vector is reserved at the beginning, then the program works OK and reads/extracts the 10,000,000 lines in a time of 14,071ms on my computer.

Compiling as 64 bit, the program works without .reserve() with a time of 14,690ms. With the .reserve(), the time is then 13,666ms.

Cheers! :)

PS recompiling my code as 64 bit gives a time of 3,025ms

So what?

At least 20,000,000 lines of sample data, amounting to a text file of 1,628,888,897 bytes, runs without exception at this end.
I tried to compile that last chunk of code (from seeplus) and I got this error:
main.cpp:76:19: error: 'class std::ios_base' has no member named 'clear'
76 | iss.ios_base::clear();
| ^~~~~
That's mainly againtry's code - not mine. I posted it there re the run-time error as it has timing info and try/catch etc. It compiles OK for me using VS2019.

Try:

//iss.ios_base::clear();
iss.clear();



