How to replace NaN values in text file?

I'm fairly new to C++ programming and I am working on my first project. The issue I am facing right now is that the file I read in contains "NaN" within the values.

int main()
{
	cout << "Welcome to the Mobile Phone Carrier Data Analysis Program " << endl;
	while (opt != 1 || opt != 2 || opt != 3)//User Must Enter Option To Advance
	{
		cout << "Please Enter A Option Number To Proceed \n" << endl;
		cout << "1" << " ----- " << " Read In File " << "\n" << endl;
		cin >> opt;
		cout << "\n" << endl;
		switcHfucntion(opt);//Function Allows Multiple Options For User To Choose From
	}
    return 0;
}

int switcHfucntion(int x)//Function Created Outside Main To Allow Easier Understanding When Reading Code
{
	switch (x)
	{
	case 1:
		if (x == 1)
		{
			cout << "Reading In File" << endl;
			ifstream infile("sms-call-internet-mi-2013-11-01_trunc.txt");
			try
			{
				if (infile.is_open())
				{
					cout << "File Has Been Read In" << endl;
					while (!infile.eof())
					{
						getline(infile, inContent);
						cout << inContent << endl;
					}

				}
				else
				{
					throw 1000;
				}
			}
			catch (int e)
			{
				cout << "Something Went Wrong File Not Read In" << endl;
				cout << "Error Code: " << e << endl;
			}
		}
		break;
	}
	return x;
}
Hello nicholasjb1996,

First off it is hard to tell you how or what you can do to read your file without knowing what this file looks like. I thought of using a string stream, but I do not know what would be there or how to process it.

For line 29, the "while (!infile.eof())" loop:

Do not use "!infile.eof()" as the loop condition. It does not work the way that you are thinking. "eof" is only set after a read has tried to go past the end of the file, so the loop usually runs one time too many. What happens is that the last line of the file is read and processed, then the while condition is checked and is still not "eof", so the loop is entered again and the next read is attempted. There is nothing left to read, so that read fails and "eof" is set, but the loop body still processes the result of the failed read before the while condition is checked again and finally fails. With "getline" the string comes back empty; with other kinds of reads the variables may still hold the values from the previous read.

A more accepted and better way is:
while (getline(infile, inContent))
{
	cout << inContent << endl;
}

This way, when the read fails (at the end of the file or on an error), the while condition is false and the program moves on to whatever comes after the while loop.
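To make the difference visible, here is a minimal sketch that counts how many times each loop body runs. The function names are made up for this illustration, and it reads from any std::istream so it can be tried on a std::istringstream instead of a real file.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Counts loop iterations using the eof()-controlled loop from the question.
std::size_t count_with_eof_loop(std::istream& in)
{
    std::size_t iterations = 0;
    std::string line;
    while (!in.eof())
    {
        std::getline(in, line);  // the final call fails, but the body still runs
        ++iterations;
    }
    return iterations;
}

// Counts loop iterations using the read itself as the loop condition.
std::size_t count_with_getline_loop(std::istream& in)
{
    std::size_t iterations = 0;
    std::string line;
    while (std::getline(in, line))  // false as soon as a read fails
    {
        ++iterations;
    }
    return iterations;
}
```

For the two-line text "one\ntwo\n" (ending in a newline, as most text files do) the first function returns 3 and the second returns 2.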

I do not think I would use the try/catch quite the way you did, but it may still be useful in a different way.

I am thinking of changing the if statement to if (!infile.is_open()) and, if true, printing an error message, pausing and exiting the program, because there is nothing else that can be done until the file is opened. Otherwise move on to the rest of the code.

I will load up the program and see if there is anything else wrong.

Hope that helps,

Andy
Hello nicholasjb1996,

Initial errors I found:

1. No header files declared at the beginning of the program.

2. Best to leave out "using namespace std;" and qualify what is in the standard namespace with "std::".

3. There is no prototype for your function.

4. In main "opt" is an undeclared variable.

5. In the function "inContent" is an undeclared variable.

That was just to get the program to compile.

Andy
#include "stdafx.h"
#include <iostream>
#include <fstream>
#include <string>

using namespace std;

int switcHfucntion(int x);

int opt; int s; char c;
string inContent;
int main()
{
	cout << "Welcome to the Mobile Phone Carrier Data Analysis Program " << endl;
	while (opt != 1 || opt != 2 || opt != 3)//User Must Enter Option To Advance
	{
		cout << "Please Enter A Option Number To Proceed \n" << endl;
		cout << "1" << " ----- " << " Read In File " << "\n" << endl;
		cin >> opt;
		cout << "\n" << endl;
		switcHfucntion(opt);//Function Allows Multiple Options For User To Choose From
	}
    return 0;
}

int switcHfucntion(int x)//Function Created Outside Main To Allow Easier Understanding When Reading Code
{
	switch (x)
	{
	case 1:
		if (x == 1)
		{
			cout << "Reading In File" << endl;
			ifstream infile("sms-call-internet-mi-2013-11-01_trunc.txt");
			try
			{
				if (infile.is_open())
				{
					cout << "File Has Been Read In" << endl;
					while (getline(infile, inContent))
					{
						s++;
						cout << inContent << endl;
					}

				}
				else
				{
					throw 1000;
				}
			}
			catch (int e)
			{
				cout << "Something Went Wrong File Not Read In" << endl;
				cout << "Error Code: " << e << endl;
			}
		}
		break;
	}
	return x;
}
Thanks for the reply, Handy Andy. I only included the main function at first because I was unsure how to use the forum. I have now included my whole code and I have replaced the "!infile.eof()".
My current problem is that I must read in a file with data and perform some statistics on that data, but the file has "NaN" inside the data and I have to replace each one with a "0". I am unsure how to do that. I think I must loop through and count the number of characters until I arrive at the "NaN" and then use the replace function to replace it with a "0", and so far I have been trying to find some way to do that.
This is an example of the data I am reading in.
1 1383266NaN0000 39 0.10803881889584513 0.10803881889584513 7.338716012816092
1 1383267000000 0 0.02730046487718618
1 1383267000000 39 0.05701250935247166 0.14490010379312837 6.779704731239178
1 1383267600000 39 0.13953817347663006 0.05522519924697222 7.19216205511537
1 1383268200000 39 0.05343788914147278 0.02730046487718618 7.50331NaN68316NaN9
1 1383268800000 39 0.02730046487718618 0.05460092975437236 0.02792473436978604 6.169533683920985
1 1383269NaN0000 0 0.02730046487718618
1 1383269NaN0000 39 0.05996286007087068 0.0563882398598718 0.029087774982685617 7.605452185775456
1 1383270000000 39 0.13417624316013174 0.05343788914147278 0.05343788914147278 6.56956533543098
1 1383270000000 49 0.02730046487718618
1 1383270600000 39 0.02730046487718618 0.02730046487718618 0.026137424264286602 6.1398216394456995
1 1383271200000 0 0.02730046487718618
1 1383271200000 39 0.193514833738NaN256 0.2190279885100893 0.026137424264286602 6.027583930846255
1 1383271800000 39 0.2463284533872755 0.2730046487718618 5.6863041NaN968626
1 1383272NaN0000 39 0.10920185950874473 0.10803881889584513 0.0017873101054994376 5.9573984NaN715992
1 1383273000000 39 0.30387973386004685 0.052274848528573205 5.466614342419241
1 1383273600000 39 0.08073835NaN1865896 0.05343788914147278 4.652791329675553
1 1383274200000 39 0.1911032541NaN30327 0.1899NaN213527NaN367 5.23137752NaNNaN662
#include <iostream>
#include <string>
#include <regex>
#include <fstream>

// replace all occurrences of "NaN" (case insensitive) with " 0 "
std::string nan_to_zero( const std::string& str )
{
    // http://en.cppreference.com/w/cpp/regex/regex_replace (4)
    static const std::regex nan( "nan", std::regex::icase ) ;
    return std::regex_replace( str, nan, " 0 " ) ;
}

// make a clean copy of the file, replacing "NaN" with " 0 "
bool make_clean_copy( std::string in_file_name, std::string clean_file_name )
{
    if( std::ifstream in_file{in_file_name} )
    {
        if( std::ofstream clean_file{clean_file_name} )
        {
            std::string line ;
            while( std::getline( in_file, line ) ) clean_file << nan_to_zero(line) << '\n' ;
            return true ;
        }
    }

    return false ;
}

int main()
{
    const std::string input_file = "sms-call-internet-mi-2013-11-01_trunc.txt" ;
    const std::string clean_file = input_file + ".cleaned.txt" ;
    make_clean_copy( input_file, clean_file ) ;

    // open clean_file for input
    // etc.
}
Hello nicholasjb1996,

Thanks for the info. I will think about it a little tonight and dig into it more tomorrow.

It should not be too hard to replace "NaN" with "space 0 space" and then extract the numbers from the new line.

Andy
JLBorges, your code ran perfectly, but I would like to understand exactly what it did. Could you explain in further detail?

Handy Andy, no problem. I will also wait for your solution so that I can apply it or use it for future reference.
A regular expression std::regex encapsulates a pattern in a sequence of characters.

In this regular expression std::regex nan( "nan", std::regex::icase ),
the pattern is simple: a sequence of the three characters 'n', 'a' and 'n' , ignoring case.
This regex would match "NaN", "nAn", "NAN" etc.

The result of std::regex_replace( str, nan, " 0 " ) is a new string,
with every occurrence of the pattern "NaN", "nAn", "NAN" etc. in str replaced with " 0 ".

Regex tutorial: https://www.regexone.com/
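As a small worked example of the same call, the sketch below takes the replacement text as a parameter so the effect of the surrounding spaces can be seen. The name replace_nan and the extra parameter are made up for this illustration; they are not part of the code posted earlier.

```cpp
#include <regex>
#include <string>

// Replaces every case-insensitive "nan" in str with the given replacement text.
std::string replace_nan(const std::string& str, const std::string& replacement)
{
    static const std::regex nan("nan", std::regex::icase);
    return std::regex_replace(str, nan, replacement);
}
```

Called on "1 1383266NaN0000 39" with " 0 " this gives "1 1383266 0 0000 39", so the damaged number is split into three separate fields. With "0" (no spaces) it gives "1 138326600000 39", which keeps it as one field. Which of the two is wanted depends on how the cleaned file will be read afterwards.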



// make a clean copy of the file, replacing "NaN" with " 0 "
bool make_clean_copy( std::string in_file_name, std::string clean_file_name )
{
    if( std::ifstream in_file{in_file_name} ) // if in_file was opened successfully for input
    {
        if( std::ofstream clean_file{clean_file_name} ) // and if clean_file was opened successfully for output
        {
            std::string line ;
            while( std::getline( in_file, line ) ) // for every line read from the input file
                clean_file << nan_to_zero(line) << '\n' ; // write the line to the clean file with each "NaN" replaced by " 0 "
            return true ; // we are done creating a clean file
        }
    }

    return false ; // the operation failed: either the input file or the output file could not be opened
}
Hello nicholasjb1996,

I do like JLBorges's use of "regex" to solve the problem, but if you have not learned that yet or cannot use "regex" yet, this may work for you:

int switcHfucntion(int x)//Function Created Outside Main To Allow Easier Understanding When Reading Code
{
	std::string inContent;


	//std::ifstream infile("sms-call-internet-mi-2013-11-01_trunc.txt");
	std::ifstream infile("Data.txt");  // <--- My change and use.

	if (!infile.is_open())  // <--- Better used here to see if you can continue.
	{
		std::cout << "\n File \"Data.txt\" did not open" << std::endl;
		std::this_thread::sleep_for(std::chrono::seconds(3));  // Requires header files "chrono" and "thread"
		exit(1);  // <--- Exit program because there is nothing to read. No point to continue.
	}

	switch (x)
	{
	case 1:
		std::cout << "Reading In File" << std::endl;  // <--- Better put here.

		if (x == 1)
		{
			//try  // <--- Not the best place for a try/catch. I do not believe it is set up right or will work.
			//{
			while (std::getline(infile, inContent))
			{
				std::cout << inContent << std::endl;
				NaNtoZero(inContent);
				//ProcessInput(inContent);  // <--- This function not complete yet. Not sure what you want to do.
				//std::cout << std::endl;  // <--- Used for testing.
			}

			std::cout << "File Has Been Read In" << std::endl;  // <--- Changed location. Works better here.

			//else
			//{
			//	throw 1000;
			//}
			//}
			//catch (int e)
			//{
			//std::cout << "Something Went Wrong File Not Read In" << std::endl;
			//std::cout << "Error Code: " << e << std::endl;
			//std::cout << "Error Code: " << e.what() << std::endl;  // <--- what() only exists when catching a std::exception, not an int.
			//}
		}
		break;
	}

	infile.close();  // <--- Putting the opening above the switch allows this here.

	return x;
}


I have included the whole function because I changed some things around, like checking whether the file stream is open right after the stream has been defined and the file opened.

I commented out the try/catch as I do not believe it is of any use here right now and I do not believe some parts are coded correctly.

As I am writing this I noticed that the if statement on line 21, "if (x == 1)", has no real use now and is redundant. By the time the switch has entered "case 1" you already know that "x == 1". Without the try/catch and the else there is no real point in it.

Moving the opening of the file stream to the top allowed me to put the close on line 50, "infile.close();", at the end of the function where it belongs. The close is not strictly necessary, because the stream closes the file when it goes out of scope at the end of the function, but it is good form.

Below are the functions that I used in the while loop of the read. The first one does the real work for now:

void NaNtoZero(std::string& line)
{
	std::string zero{ " 0 " };
	std::size_t pos = line.find_first_of("NaN");

	while (pos!=std::string::npos)
	{
		line.replace(pos, 3, zero);
		pos = line.find_first_of("NaN", pos + 1);
	}
}

void ProcessInput(std::string& line)
{
	std::istringstream ss(line);

	// This space for processing string stream into variable(s).
	//ss >> variable or array of doubles;
}

When I know what you need to read this information into I can better finish the second function. For now it is an idea.
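One possible way to fill in that second function, as a rough sketch only: it assumes every field on the cleaned line is a number separated by whitespace and simply collects them into a vector of doubles. The name parse_numbers and the return type are my own choice for the illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Extracts every whitespace-separated number on the line into a vector.
// Extraction stops at the first token that is not a valid number.
std::vector<double> parse_numbers(const std::string& line)
{
    std::istringstream ss(line);
    std::vector<double> values;
    double value;
    while (ss >> value)
    {
        values.push_back(value);
    }
    return values;
}
```

For the line "1 1383267000000 39 0.5" this returns four values. The number of values differs from line to line in the sample data, which is one reason a vector is convenient here.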

This is not as fancy as what JLBorges has shown you, but should be closer to what you might know at this point.

Hope this helps,

Andy
Thanks Handy Andy. JLBorges's code did solve the problem really fast, but I had not yet learned regex and I am currently trying to. I coded the try/catch to handle the case where the file is not present in the project directory. It wasn't there at the time because I forgot about it, so I got the error as soon as I ran the program, which reminded me to put the file in. I was trying to code a function similar to
void NaNtoZero(std::string& line)
{
	std::string zero{ " 0 " };
	std::size_t pos = line.find_first_of("NaN");

	while (pos!=std::string::npos)
	{
		line.replace(pos, 3, zero);
		pos = line.find_first_of("NaN", pos + 1);
	}
}

but it wouldn't run. I'll try your code and post the result.
I created a file filled with words with NaN in between them. The code ran and cleaned up all the NaN but placed an extra zero. I have realised that any word with the letters "a, n" gets replaced, but since the file I want to clean is just numbers it will work without problem.
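The stray replacements in words most likely come from "find_first_of", which searches for any single character out of the given set (here 'N' or 'a') rather than for the whole text "NaN". A sketch of the same loop using "find", which looks for the complete substring, could look like this. The function name is made up so it does not clash with the original.

```cpp
#include <string>

// Replaces each exact, case-sensitive occurrence of "NaN" with " 0 ".
void replace_nan_substring(std::string& line)
{
    const std::string target{ "NaN" };
    const std::string zero{ " 0 " };

    std::size_t pos = line.find(target);  // find() matches the whole substring
    while (pos != std::string::npos)
    {
        line.replace(pos, target.size(), zero);
        // Continue searching after the text that was just inserted.
        pos = line.find(target, pos + zero.size());
    }
}
```

With this version "banana NaN and" becomes "banana  0  and" and the word "banana" is left alone. Unlike the regex version it is case sensitive, so "nan" or "NAN" would not be replaced.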
Hello nicholasjb1996,

That is why I find it better to open the stream and check that it is good at the beginning of a function, main or otherwise. If the file is not open there is no point in continuing and trying to use a stream that will not read.

You could still put opening the stream and file inside the switch.

You can rename the function "NaNtoZero" if you want. It is just a name that I thought was descriptive at the time.

I have been playing with your program. I created a 2D vector to hold all the numbers read in and translated. So far I have been working with a fixed size 2D vector, and I think I will try making the second dimension variable in size.

Andy
Is the 2D vector the same as a 2D array?
closed account (E0p9LyTq)
Is the 2D vector the same as a 2D array?

A 2D vector created with known dimensions allows for accessing individual elements similar to accessing the elements of a 2D C-style array, but they are not the same.

#include <iostream>
#include <vector>

int main()
{
   std::cout << "Creating a 2-dimensional vector, enter row size: ";
   int row_size;
   std::cin >> row_size;

   std::cout << "Enter column size: ";
   int col_size;
   std::cin >> col_size;
   std::cout << "\n";

   // create a 2 dimensional int vector with known dimensions
   // this initializes each element to zero 
   std::vector<std::vector<int>> aVector(row_size, std::vector<int>(col_size));

   // initialize the vector with some values
   for (int row_loop = 0; row_loop < row_size; row_loop++)
   {
      for (int col_loop = 0; col_loop < col_size; col_loop++)
      {
         aVector[row_loop][col_loop] = ((row_loop + 1) * 100 + col_loop + 1);
      }
   }

   // let's display the filled 2D vector
   for (int row_loop = 0; row_loop < row_size; row_loop++)
   {
      for (int col_loop = 0; col_loop < col_size; col_loop++)
      {
         // print one element followed by a space
         std::cout << aVector[row_loop][col_loop] << ' ';
      }
      std::cout << '\n';
   }
   std::cout << '\n';
}

Creating a 2-dimensional vector, enter row size: 5
Enter column size: 3

101 102 103
201 202 203
301 302 303
401 402 403
501 502 503
Is it possible to use the 2D vector to read in the data and then sort it? The file I am reading in is in tsv format. I have to read in the data which I can do now and then I have to sort it so it makes sense.
Hello nicholasjb1996,

To your earlier question that I missed: a vector is similar to an array, but not exactly the same. Now that I have used vectors I find they are much better than arrays for storing information. A C-style array must have its size fixed at compile time and is generally defined larger than you need, whereas a vector does not need a size when it is defined and grows to hold exactly what you put in it. Using vectorName.size() will tell you how many elements are in the vector, which is easier than keeping a counter for how much of an array is used. You can use "std::sort()" from the "algorithm" header file with vectorName.begin() and vectorName.end(). It also works on a C-style array, but there you have to pass the pointers or use std::begin() and std::end() yourself.

Just so you know, I first defined a vector and resized it so that both dimensions had a fixed size. Then I changed the second dimension to be a variable size. Both versions worked, but the second version made the final display of the vector better because it only displayed the numbers that were actually read for each row.

As I found in the last three or four months, sorting a 2D array as a whole is awkward. I would say a 2D vector would be much the same.

Now if you want to sort each row as a 1D vector or array that is easy.

I am guessing that each row is a different customer and you only need to sort what is in each row which is not hard.
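If sorting within each row really is what is needed, a minimal sketch could look like the following. It assumes the data has already been read into a vector of rows of doubles; the function name is made up for the illustration.

```cpp
#include <algorithm>
#include <vector>

// Sorts the values inside each row in ascending order.
// The order of the rows themselves is left unchanged.
void sort_each_row(std::vector<std::vector<double>>& table)
{
    for (std::vector<double>& row : table)
    {
        std::sort(row.begin(), row.end());
    }
}
```

Rows may have different lengths, since each row is sorted independently.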

If you have something else in mind you will have to explain it better.

Hope that helps,

Andy
closed account (E0p9LyTq)
Is it possible to use the 2D vector to read in the data and then sort it?
Better to deal with a single dimension vector that simulates 2D when dealing with data on the fly such as reading the contents of a file.

The file I am reading in is in tsv format.
The format doesn't matter as long as you, the programmer, know how to work with the format.

If you are dealing with data that is repetitive in chunks, like customer records, you can create a struct or class to deal with the data of individual block of records and need only a single dimension vector to deal with each larger chunk of data.

With your data all safely tucked away inside a vector you create a custom sort function based on what part of your struct/class data members you want to sort on.
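A rough sketch of that idea follows. The struct name, its members and the choice of sort keys are all assumptions made for illustration; the real fields depend on what each column of the file means.

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

// Hypothetical record layout; member names are assumptions, not from the file.
struct Record
{
    int       grid_id;
    long long time_interval;
    int       country_code;
    double    call_in;
    double    call_out;
};

// Sorts records by time interval, then by country code within the same interval.
void sort_by_time_then_country(std::vector<Record>& records)
{
    std::sort(records.begin(), records.end(),
              [](const Record& a, const Record& b)
              {
                  return std::tie(a.time_interval, a.country_code)
                       < std::tie(b.time_interval, b.country_code);
              });
}
```

Sorting on a different member only means changing what goes into the two std::tie calls.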

Sorting 2D vectors can get complicated.

There are also set/multiset and map/multimap containers you could look at using instead of a vector.
I haven't learned about vectors yet, but if that would be the easier option I will search for a tutorial on how to use them. What I am currently trying to do is this: I'm given a file with data from a mobile carrier company that contains GRID ID, Time Interval, Country Code, Call In Activity and Call Out Activity. Each GRID ID has its own Country Code, and within a Time Interval a Call In occurs with respect to that GRID ID and Country Code. However, there are times where a Call Out occurs and there is a blank space for the Call In, and vice versa. While reading in the file I would like to replace that blank space with a "0". Is it the same as the NaNtoZero function?