Reading strings from file and putting th

Try this. Also, it's not good practice to have global variables like this. It's better to have LoadDataFromFile() return the vector. Since C++11, returning a local vector by value moves it (or elides the copy entirely), so the elements are not copied.

#include <vector>
#include <string>
#include <fstream>
#include <iostream>
using namespace std;

//The function to load text from a file into the array
auto LoadDataFromFile(void)
{
	vector<string> TextFromFile;

	ifstream UnitDataFile("UnitDataFile.txt");

	if (UnitDataFile.is_open())
	{
		//Loop runs for every line in the file
		for (string LineFromFile; getline(UnitDataFile, LineFromFile); TextFromFile.push_back(LineFromFile));

		//After reading all data, close the file
		UnitDataFile.close();
	}

	//If the file can't be opened, print an error message:
	else
		cout << "ERROR: Can't open file 'UnitDataFile.txt'.\n";

	return TextFromFile;
}

int main()
{
	const auto TextFromFile {LoadDataFromFile()};

	for (const auto& v : TextFromFile)
		std::cout << v << '\n';
}


If you know the number of lines, or have an estimate, then you can do this:

auto LoadDataFromFile(void)
{
	constexpr size_t numberOfLines {2000};
	vector<string> TextFromFile;

	ifstream UnitDataFile("UnitDataFile.txt");

	if (UnitDataFile.is_open())
	{
		TextFromFile.reserve(numberOfLines);

		//Loop runs for every line in the file
		for (string LineFromFile; getline(UnitDataFile, LineFromFile); TextFromFile.push_back(LineFromFile));

		//After reading all data, close the file
		UnitDataFile.close();
	}

	//If the file can't be opened, print an error message:
	else
		cout << "ERROR: Can't open file 'UnitDataFile.txt'.\n";

	return TextFromFile;
}

This will save the reallocation(s).

PS What is the size of the file?

Note that when a vector re-allocates, it obtains new memory and copies the old values to the new memory and then frees the old memory. So there must be available memory to have 2 copies of the largest vector that resizes. Hence the reserve() which if large enough will stop the re-allocations.
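To make the effect of reserve() visible, here is a minimal sketch that counts how often the vector's capacity changes while lines are appended. The function name countReallocations and the reserveHint parameter are made up for illustration, and the growth pattern without reserve() depends on the standard library in use.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Counts how many times the vector's capacity changes while appending
// `lineCount` strings. `reserveHint` is a hypothetical estimate of the
// number of lines; pass 0 to skip the reserve() call.
std::size_t countReallocations(std::size_t lineCount, std::size_t reserveHint)
{
    std::vector<std::string> lines;
    if (reserveHint > 0)
        lines.reserve(reserveHint);

    std::size_t reallocations = 0;
    std::size_t lastCapacity = lines.capacity();

    for (std::size_t i = 0; i < lineCount; ++i) {
        lines.push_back("#DefaultID#00#00#00#53#84#-#");
        if (lines.capacity() != lastCapacity) {
            // A capacity change means the vector obtained new memory.
            ++reallocations;
            lastCapacity = lines.capacity();
        }
    }
    return reallocations;
}
```

With a hint at least as large as the number of lines the count is zero. Without a hint the vector has to grow several times on the way to 2000 entries.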


Handy Andy (5051)

Hello leander g,

Looking at your code I would consider this:

vector <string> TextFromFile;  // <--- Avoid global variables. Should be in "main" and passed to the function.

void LoadDataFromFile(void)  // <--- 2nd "void" not necessary. Very old method.
{
    string LineFromFile;      //Variable to store the "text" of the currently read line
    int CurrentlyReadingLine = 0;  //The line number that is currently being read

    ifstream UnitDataFile("UnitDataFile.txt");
    if (UnitDataFile.is_open())
    {
        //Loop runs for every line in the file
        while (getline(UnitDataFile, LineFromFile))
        {
            //Resize the array to 1 entry larger than before
            //TextFromFile.resize(CurrentlyReadingLine + 1);  // <--- Not needed. Over thinking this.

            //Write a line from the file into the array
            //TextFromFile[CurrentlyReadingLine] = LineFromFile;  // <--- I see your point, but over thinking what you are doing.
            TextFromFile.emplace_back(LineFromFile);  // <--- Does the same as the previous two lines.

            //Increment the CurrentlyReadingLine counter
            CurrentlyReadingLine++;
        }
        //After reading all data, close the file
        UnitDataFile.close();  // <--- Optional. The file closes automatically when UnitDataFile goes out of scope.
    }

    //If the file can't be opened, print an error message:
    else
    {
        cout << "ERROR: Can't open file 'UnitDataFile.txt'.\n";
    }
}

The emplace_back() call just adds to the vector without having to "resize" it.

In VS2017 when I did a test:

int main()
{
    std::vector<std::string> vec;
    std::string str;

    vec.emplace_back("This is a tesst");
}

The vec declaration creates a vector with no size.

The str declaration creates a string with a size of 0, but a capacity of 15. If I understand this correctly, 15 bytes of memory are set aside to hold a string.

In the emplace_back() call the quoted text (excuse my poor typing, but it makes a good point) has 16 characters, and it tells me the size of the string is 16 with a capacity of 31 now.

When you think about it, if a string has, say, 100 characters, its capacity might be 200, as an example. So when you multiply 200 * 1000 you will have ~200,000 bytes of storage space for the strings.

seeplus wrote:
Note that when a vector re-allocates, it obtains new memory and copies the old values to the new memory and then frees the old memory. So there must be available memory to have 2 copies of the largest vector that resizes.

So you could have over 400,000 bytes of storage space needed. And the question would be does your computer have enough memory free to do this?

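By just adding to the end of the vector you avoid the explicit resize() on every line, although the vector will still reallocate whenever its capacity is exceeded unless you reserve() enough space first.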

I am not sure if I am right, but that is my thought for now.

Andy

Thanks for all of the quick replies!

A bit more info about the "txt file" that I am trying to load:
Every line looks like this:

#ID#Hours#Minutes#Seconds#MinValue#MaxValue#Comments#

ID is a string
Hours, Minutes, Seconds, MinValue and MaxValue are integers
Comments is a string

All values are separated by a #, so my program can "dissect" each line into the respective values and store them each in their own array.
(Kind of like a "comma separated values" file, but with hashtags ...)
So the lines are not "that long" (maybe 200 characters maximum, but more like 50 or 100)
And there will be 1000 lines at least ...

Yeah, my computer should have enough RAM for a few thousand strings ... (8GB).

About the "don't use so many global variables":

I am new to this whole "programming" thing.
I have lots of experience with hardware design and I think of global variables like a "bus" that connects different "building blocks" (functions) together and allows them to exchange information in an easy, uncomplicated way ...
(Each "function" has access to all of the variables it could possibly need. No need to rewrite every "function call" if I decide to change the parameters of a function. And I can have as many return-values as I want!)


Handy Andy (5051)

Hello leander g,

(Each "function" has access to all of the variables it could possibly need. No need to rewrite every "function call" if I decide to change the parameters of a function.

This may be true, but just as all those functions and lines of code can use the variable so can it be changed. After you have spent a day trying to track down where a global variable was changed when it should not have been changed then tell me how great the global variable is. You have more control defining the variable in "main" and passing it to the function. This makes it easier to find where something went wrong.

And I can have as many return-values as I want!

No. A function can only return one thing. That thing could be a struct, a vector, or a std::array, but most of the time it is just a single variable value.
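To illustrate "one thing that is a struct", here is a small sketch that hands back three values at once. The names Duration and splitSeconds are hypothetical and only chosen to match the Hours/Minutes/Seconds fields in this thread.

```cpp
// A plain struct groups the values that belong together.
struct Duration {
    int hours;
    int minutes;
    int seconds;
};

// Returns three values by returning a single struct.
Duration splitSeconds(int totalSeconds)
{
    return Duration{ totalSeconds / 3600,
                     (totalSeconds % 3600) / 60,
                     totalSeconds % 60 };
}
```

The caller writes something like Duration d = splitSeconds(3725); and then reads d.hours, d.minutes and d.seconds, so no global variables are needed to get several results out of a function.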

I realize that your input file is large, so if you do not have a link to the file that you can share then post the first 5 or 10 lines so everyone has something to work with and can at least test with the same information. It would not be hard to duplicate those lines into 1000 lines or more.

Andy


I get what you are saying with the "pass the variables / values to the function as parameters". And I am trying to get into the habit of treating "functions" like actual mathematical functions, where the "output" only depends on the parameters, and nothing else.

I know that a function can't literally "return" multiple things, but the function can change the global variables directly...
I know that that can get messy with lots of functions "writing to" lots of different globals ...

Just for testing I had copy/pasted the same line over and over again a few thousand times in the UnitDataFile.txt

#DefaultID#00#00#00#53#84#-#

Ganado (6785)

getline can take a third parameter for the delimiter character. In your case, it would be '#'.
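A minimal sketch of the delimiter form of getline, applied to one line of the file. The helper name splitFields is made up for this example. Note that a line starting with '#' produces an empty first field.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits one line into fields at every `delim` character.
std::vector<std::string> splitFields(const std::string& line, char delim = '#')
{
    std::vector<std::string> fields;
    std::istringstream stream(line);

    // The third argument makes getline stop at `delim` instead of '\n'.
    for (std::string field; std::getline(stream, field, delim); )
        fields.push_back(field);

    return fields;
}
```

For the test line #DefaultID#00#00#00#53#84#-# this gives eight fields: an empty one (the text before the leading '#'), then DefaultID, 00, 00, 00, 53, 84 and -.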

I know, but I thought it might be crashing because I was trying to read the file and do some other stuff, so I thought I'd split it up.
One function that just reads the file and puts it into an array.
Another function that does the "decoding" of the previously read stuff ...

Another take on this is to read the whole file into a string and then parse. A simple method is as below, but it's not the fastest. I'll do some re-coding later when I have time to speed it up. Note: needs C++17.

#include <iostream>
#include <string>
#include <fstream>
#include <filesystem>
#include <algorithm>
#include <sstream>
#include <vector>

//#ID#Hours#Minutes#Seconds#MinValue#MaxValue#Comments#
struct Item {
	std::string id;
	int hours {};
	int min {};
	int sec {};
	int minval {};
	int maxval {};
	std::string comm;
};

int main()
{
	const std::string fnam {"bigfile.txt"};

	std::ifstream ifs(fnam);

	if (!ifs.is_open())
		return (std::cout << "Cannot open input file\n"), 1;

	std::string data;

	data.resize(std::filesystem::file_size(fnam));
	ifs.read(data.data(), data.size());

	const auto nolines {std::count(data.begin(), data.end(), '\n') + 1};

	std::vector<Item> items;
	items.reserve(nolines);

	for (std::size_t tokenStart = 0, delimPosition = data.find('\n'); delimPosition != std::string::npos; delimPosition = data.find('\n', tokenStart)) {
		Item itm;
		char del;

		const auto d{data.find('#', tokenStart + 1)};
		std::istringstream iss(data.substr(d + 1, delimPosition - d - 2));  // text between the 2nd '#' and the trailing '#'

		itm.id = data.substr(tokenStart + 1, d - tokenStart - 1);
		iss >> itm.hours >> del >> itm.min >> del >> itm.sec >> del >> itm.minval >> del >> itm.maxval >> del >> itm.comm;
		items.push_back(itm);
		tokenStart = ++delimPosition;
	}

	for (const auto& itm : items)
		std::cout << itm.id << "   " << itm.hours << "  " << itm.min << "  " << itm.sec << "  " << itm.minval << "  " << itm.maxval << "  " << itm.comm << '\n';
}


Handy Andy (5051)

Hello leander g,

I have a question.

Just for testing I had copy/pasted the same line over and over again a few thousand times in the UnitDataFile.txt

#DefaultID#00#00#00#53#84#-#

In your example is the first and last "#" required?

If not this would make it easier to read the line: DefaultID#00#00#00#53#84#-

Like the (,) in a CSV file you are using the (#) to separate the fields. The first and last (#) makes it more difficult to process the line, but not impossible.

For what it is worth I did this to create a file to use:

#include <iostream>
#include <iomanip>
#include <string>
#include <limits>

#include <fstream>

int main()
{
    const std::string outFileName{ "Data.txt" };

	std::ofstream outFile(outFileName);

	if (!outFile)
	{
		std::cout << "\n File " << std::quoted(outFileName) << " did not open" << std::endl;

        return 1;
	}

    constexpr int MAXSIZE{ 10 };

    char ht{ '#' };
    std::string id{ "Default Id" };
    int hours{ 12 };
    int min{ 00 };
    int sec{ 00 };
    int minval{ 53 };
    int maxval{ 84 };
    std::string comm{ "This is a long comment here and could be lonnger." };

    for (int lc = 0; lc < MAXSIZE; lc++)
    {
        outFile << id << lc + 1 << ht << hours << ht << min << ht << sec << ht << minval << ht << maxval << ht << comm << '\n';
    }


	// <--- Keeps console window open when running in debug mode on Visual Studio. Or a good way to pause the program.
	// The next line may not be needed. If you have to press enter to see the prompt it is not needed.
	//std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n');  // <--- Requires header file <limits>.
	std::cout << "\n\n Press Enter to continue: ";
	std::cin.get();

	return 0;  // <--- Not required, but makes a good break point.
}

Feel free to adjust to what you need.

It produces the file:

Default Id1#12#0#0#53#84#This is a long comment here and could be lonnger.
Default Id2#12#0#0#53#84#This is a long comment here and could be lonnger.
Default Id3#12#0#0#53#84#This is a long comment here and could be lonnger.
Default Id4#12#0#0#53#84#This is a long comment here and could be lonnger.
Default Id5#12#0#0#53#84#This is a long comment here and could be lonnger.
Default Id6#12#0#0#53#84#This is a long comment here and could be lonnger.
Default Id7#12#0#0#53#84#This is a long comment here and could be lonnger.
Default Id8#12#0#0#53#84#This is a long comment here and could be lonnger.
Default Id9#12#0#0#53#84#This is a long comment here and could be lonnger.
Default Id10#12#0#0#53#84#This is a long comment here and could be lonnger.

Notice there is no opening (#) and no closing(#).

Now that I have something to work with I will work on testing your original function. Some of the other replies may also be worth using if you can.

Andy

No, the first and last # are not necessary.
Originally I had some extra data before / after that, but I removed that.
(Before the first #, I had an integer, but I can just use the "line number" for that ...)

Yeah, it is like a CSV, but with #s, so an HSV (hashtag separated values, or .hsv; I just renamed a txt file to .hsv, that shouldn't matter, right?)

I thought I'd use a character that I am sure I wouldn't use in either of the strings, so using ; as the "separating character" was not really an option ...

Also, great idea to use a program to create "test files".
I just copy pasted lines "by hand".
But this way you can have different values (with a random function) just to test ...
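Something along these lines could produce varied test lines. This is only a sketch: the function name makeTestLine, the value ranges and the fixed comment text are all assumptions for illustration, and it writes the format without the leading and trailing #.

```cpp
#include <algorithm>
#include <random>
#include <sstream>
#include <string>

// Builds one line in the form ID#Hours#Minutes#Seconds#MinValue#MaxValue#Comments
// with random numbers. The ranges used here are arbitrary choices.
std::string makeTestLine(int lineNumber, std::mt19937& rng)
{
    std::uniform_int_distribution<int> hour(0, 23);
    std::uniform_int_distribution<int> minuteOrSecond(0, 59);
    std::uniform_int_distribution<int> value(0, 100);

    const int h = hour(rng);
    const int m = minuteOrSecond(rng);
    const int s = minuteOrSecond(rng);
    const int a = value(rng);
    const int b = value(rng);

    std::ostringstream out;
    out << "Id" << lineNumber << '#' << h << '#' << m << '#' << s << '#'
        << std::min(a, b) << '#' << std::max(a, b) << '#' << "test comment";
    return out.str();
}
```

Calling this in a loop and writing each result plus '\n' to a std::ofstream gives a file of any length. Seeding the std::mt19937 with a fixed number makes the file reproducible between runs.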

Yes, I will try some of the suggested things here as soon as I can (not feeling too well right now and I don't need a headache too )
;-)


I've re-vamped my previous program. It's now much faster and doesn't need C++17. It reads and parses a 1,000,000 line file in just over a second on my computer using the original format.

#include <iostream>
#include <string>
#include <fstream>
#include <algorithm>
#include <vector>

//#ID#Hours#Minutes#Seconds#MinValue#MaxValue#Comments#
struct Item {
	std::string id;
	int hours {};
	int min {};
	int sec {};
	int minval {};
	int maxval {};
	std::string comm;
};

int main()
{
	const std::string fnam {"bigfile.txt"};

	std::ifstream ifs(fnam);

	if (!ifs.is_open())
		return (std::cout << "Cannot open input file\n"), 1;

	std::string data;

	ifs.seekg(0, std::ios_base::end);
	data.resize(static_cast<size_t>(ifs.tellg()));
	ifs.seekg(0, std::ios_base::beg);
	ifs.read(data.data(), data.size());

	std::vector<Item> items;
	items.reserve(std::count(data.begin(), data.end(), '\n') + 1);

	for (std::size_t tokenStart = 0, delimPosition = data.find('\n'); delimPosition != std::string::npos; delimPosition = data.find('\n', tokenStart)) {
		const auto d {data.find('#', tokenStart + 1)};

		Item itm;
		char* ed {};

		itm.id = data.substr(tokenStart + 1, d - tokenStart - 1);
		itm.hours = strtol(data.data() + d + 1, &ed, 10);
		itm.min = strtol(ed + 1, &ed, 10);
		itm.sec = strtol(ed + 1, &ed, 10);
		itm.minval = strtol(ed + 1, &ed, 10);
		itm.maxval = strtol(ed + 1, &ed, 10);
		itm.comm = data.substr(ed - data.data() + 1, delimPosition - 2 - (ed - data.data()));

		items.emplace_back(itm);
		tokenStart = ++delimPosition;
	}

	for (const auto & itm : items)
		std::cout << itm.id << "   " << itm.hours << "  " << itm.min << "  " << itm.sec << "  " << itm.minval << "  " << itm.maxval << "  " << itm.comm << '\n';
}

PS If you change the format to remove the beginning and ending #, then this program will need changing.

There's only 2 lines to change. The originals are commented out. The delimiter is now defined as a const char at the beginning so can be easily changed.

#include <iostream>
#include <string>
#include <fstream>
#include <algorithm>
#include <vector>
#include <cstdlib>  // strtol

//ID#Hours#Minutes#Seconds#MinValue#MaxValue#Comments
struct Item {
	std::string id;
	int hours {};
	int min {};
	int sec {};
	int minval {};
	int maxval {};
	std::string comm;
};

int main()
{
	constexpr char delim {'#'};
	const std::string fnam {"bigfile.txt"};

	std::ifstream ifs(fnam);

	if (!ifs.is_open())
		return (std::cout << "Cannot open input file\n"), 1;

	std::string data;

	ifs.seekg(0, std::ios_base::end);
	data.resize(static_cast<size_t>(ifs.tellg()));
	ifs.seekg(0, std::ios_base::beg);
	ifs.read(data.data(), data.size());

	std::vector<Item> items;
	items.reserve(std::count(data.begin(), data.end(), '\n') + 1);

	for (std::size_t tokenStart = 0, delimPosition = data.find('\n'); delimPosition != std::string::npos; delimPosition = data.find('\n', tokenStart)) {
		const auto d {data.find(delim, tokenStart + 1)};

		Item itm;
		char* ed {};

		//itm.id = data.substr(tokenStart + 1, d - tokenStart - 1);
		itm.id = data.substr(tokenStart, d - tokenStart);
		itm.hours = strtol(data.data() + d + 1, &ed, 10);
		itm.min = strtol(ed + 1, &ed, 10);
		itm.sec = strtol(ed + 1, &ed, 10);
		itm.minval = strtol(ed + 1, &ed, 10);
		itm.maxval = strtol(ed + 1, &ed, 10);
		//itm.comm = data.substr(ed - data.data() + 1, delimPosition - 2 - (ed - data.data()));
		itm.comm = data.substr(ed - data.data() + 1, delimPosition - 1 - (ed - data.data()));

		items.emplace_back(itm);
		tokenStart = ++delimPosition;
	}

	for (const auto & itm : items)
		std::cout << itm.id << "   " << itm.hours << "  " << itm.min << "  " << itm.sec << "  " << itm.minval << "  " << itm.maxval << "  " << itm.comm << '\n';
}


Ok, I tried copy & pasting your code and I get an error when trying to compile it:

32:20: error: invalid conversion from 'const char*' to 'std::basic_istream<char>::char_type*' {aka 'char*'} [-fpermissive]
32 | ifs.read(data.data(), data.size());
| ~~~~~~~~~^~
| |
| const char*
In file included from c:\mingw\lib\gcc\mingw32\9.2.0\include\c++\iostream:40,
from F:/..../main.cpp:1:
c:\mingw\lib\gcc\mingw32\9.2.0\include\c++\istream:486:23: note: initializing argument 1 of 'std::basic_istream<_CharT, _Traits>& std::basic_istream<_CharT, _Traits>::read(std::basic_istream<_CharT, _Traits>::char_type*, std::streamsize) [with _CharT = char; _Traits = std::char_traits<char>; std::basic_istream<_CharT, _Traits>::char_type = char; std::streamsize = int]'
486 | read(char_type* __s, streamsize __n);
| ~~~~~~~~~~~^~~

By the way, I have absolutely no idea what or how that code that you posted works ...
But using a struct instead of arrays is an interesting idea ...
Can you dynamically create struct objects at runtime?
(Have a for-loop that reads every line from the file and writes the stuff into the object-struct-thing or whatever that's called ...)
(I think I really need to print out those "basics" and hang them on my wall, so I don't forget what those things are called ...)


#include <iostream>
#include <vector>
#include <fstream>
#include <sstream>
#include <string>

int main()
{
    // ID#Hours#Minutes#Seconds#MinValue#MaxValue#Comments
    
    std::vector<std::string> ID;
    
    std::vector<std::vector<int>> Data; // Hours - MaxValue
    Data.resize(5);
    
    std::vector<std::string> Comments;
    
    std::ifstream file ("data_source.txt");
    
    if (file.is_open())
    {
        std::string line, data;
        std::stringstream iss;
        int count{0};
        
        while(std::getline(file, line))
        {
            iss << line;
            
            count = 0;
            while(std::getline(iss, data, '#'))
            {
                if(count == 0)
                {
                    ID.push_back(data);
                }
                
                if(count > 0 and count < 6)
                {
                    Data[count - 1].push_back( std::stoi(data) );
                }
                count++;
            }
            
            Comments.push_back(data);
            
            data.clear();
            iss.clear();  // reset the eof/fail flags so the stream can be reused for the next line
        }
        file.close();
    }
    else
    {
        std::cout << "UNABLE TO OPEN FILE\n";
        exit(1);
    }
    
    
    // DISPLAY SOME VECTOR CONTENTS
    for(auto i: ID)
    {
        std::cout << "ID: " << i << '\n';
    }
    
    for(auto i: Data[0])
    {
        std::cout << "Hours: " << i << '\n';
    }
    
    for(auto i: Data[4])
    {
        std::cout << "Max Value: " << i << '\n';
    }
    
    for(auto i: Comments)
    {
        std::cout << "Comments: " << i << '\n';
    }

    return 0;
}

data_source.txt


Default Id1#12#0#0#53#84#This is a long comment here and could be lonnger.
Default Id2#13#1#0#54#85#This is a long comment here and could be lonnger.
Default Id3#14#2#0#55#86#This is a long comment here and could be lonnger.
Default Id4#15#3#0#56#87#This is a not so short comment here and could be lonnger.
Default Id5#16#0#0#57#84#This is a long comment here and could be lonnger.
Default Id6#17#0#0#53#84#This is a long comment here and could be lonnger.
Default Id7#18#9#8#53#84#This is a long comment here and could be lonnger.
Default Id8#19#7#6#53#84#This is a long comment here and could be lonnger.
Default Id9#20#0#0#53#84#This is a long comment here and could be lonnger (sic).

output


ID: Default Id1
ID: Default Id2
ID: Default Id3
ID: Default Id4
ID: Default Id5
ID: Default Id6
ID: Default Id7
ID: Default Id8
ID: Default Id9
Hours: 12
Hours: 13
Hours: 14
Hours: 15
Hours: 16
Hours: 17
Hours: 18
Hours: 19
Hours: 20
Max Value: 84
Max Value: 85
Max Value: 86
Max Value: 87
Max Value: 84
Max Value: 84
Max Value: 84
Max Value: 84
Max Value: 84
Comments: This is a long comment here and could be lonnger.
Comments: This is a long comment here and could be lonnger.
Comments: This is a long comment here and could be lonnger.
Comments: This is a not so short comment here and could be lonnger.
Comments: This is a long comment here and could be lonnger.
Comments: This is a long comment here and could be lonnger.
Comments: This is a long comment here and could be lonnger.
Comments: This is a long comment here and could be lonnger.
Comments: This is a long comment here and could be lonnger (sic).
Program ended with exit code: 0

@OP
BTW, once you've done the above simple extraction, structs or classes to encapsulate the data naturally follow and lead to a single <vector> solution. Use the same const for resize and number of columns, of course.

Fairly easy to adapt for further mixed strings ints doubles etc.

And, just for luck, here it is with a struct.

#include <iostream>
#include <vector>
#include <fstream>
#include <sstream>
#include <string>

struct Data_Point
{
    std::string ID;
    int hours;
    int minutes;
    int seconds;
    int max_val;
    int min_val;
    std::string comment;
};

std::ostream& operator<<(std::ostream& os, const Data_Point& dt)
{
    os
    << "ID: " << dt.ID << '\n'
    << "H: " << dt.hours << " M: " << dt.minutes << " S: " << dt.seconds << '\n'
    << "Min: " << dt.min_val << " Max: " << dt.max_val << '\n'
    << "Range: " << dt.max_val - dt.min_val << '\n'
    << "Comment: " << dt.comment << "\n\n";
    
    return os;
}

std::string getString(std::stringstream& iss, const char delim)
{
    std::string item;
    std::getline(iss, item, delim);
    return item;
}

int getInteger(std::stringstream& iss, const char delim)
{
    return std::stoi( getString(iss, delim) );
}

int main()
{
    // ID#Hours#Minutes#Seconds#MinValue#MaxValue#Comments
    
    std::vector<Data_Point> Data;
    std::ifstream file ("data_source.txt");
    
    if (file.is_open())
    {
        std::string line, data;
        std::stringstream iss;
        
        Data_Point temp;
        
        while(std::getline(file, line))
        {
            iss << line;
            temp.ID = getString(iss, '#');
            temp.hours = getInteger(iss, '#');
            temp.minutes = getInteger(iss, '#');
            temp.seconds = getInteger(iss, '#');
            temp.min_val = getInteger(iss, '#');
            temp.max_val = getInteger(iss, '#');
            temp.comment = getString(iss, '#');
            
            Data.push_back(temp);
            
            data.clear();
            iss.clear();  // reset the eof/fail flags so the stream can be reused for the next line
        }
        file.close();
    }
    else
    {
        std::cout << "UNABLE TO OPEN FILE\n";
        exit(1);
    }
    
    // DISPLAY
    for(auto i: Data)
    {
        std::cout << i << '\n';
    }
    
    return 0;
}


32:20: error: invalid conversion from 'const char*' to 'std::basic_istream<char>::char_type*' {aka 'char*'} [-fpermissive]
32 | ifs.read(data.data(), data.size());

Sorry, I forgot about the change in the spec of .data() with C++17 (I use MS VS2019). Before C++17, std::string::data() only returns a const char*, so use &data[0] instead:

	//ifs.read(data.data(), data.size());
	ifs.read(&data[0], data.size());


For info, for a file of 1,000,000 lines @againtry's code takes 1,865ms, mine 386ms (on my computer).

For a file of 10,000,000 lines, mine takes 3,917ms and @againtry's code causes a run-time error (bad_alloc) after reading 7,972,439 lines in 13,650ms.

Update. The run-time error was due to compiling as 32bit. It works when compiled as 64bit. See later posts.
