Removing NUL values in text file

Sep 2, 2013 at 10:19am
Hello,

I have a measuring board that writes a .txt file in the following format:

board number;type;NUL;channel;measured value;date and time
DS207;5000007;NUL;0;20251;11.07.2013 12:30:02 MESZ
DS207;5000007;NUL;1;10159;11.07.2013 12:30:02 MESZ
DS207;5000007;NUL;4;27.18;11.07.2013 12:30:02 MESZ
DS207;5000007;NUL;0;20233;11.07.2013 12:35:02 MESZ
DS207;5000007;NUL;1;10149;11.07.2013 12:35:02 MESZ
DS207;5000007;NUL;4;27.31;11.07.2013 12:35:02 MESZ
DS207;5000007;NUL;0;20256;11.07.2013 12:40:02 MESZ
...

I would like to extract and analyse the data, but I cannot access the data after the NUL entry, perhaps because NUL normally marks the end of a string. Is there a method to remove the NUL entries in this text file?

Thanks a lot!

Martin
Sep 2, 2013 at 10:22am
I presume you wrote the code that generated the txt file? As such I would suggest you check for the condition that generated the NUL entries and don't allow it to write anything to txt file.
Sep 2, 2013 at 10:50am
If you cannot change the code, it does look like the record size is fixed. So you might be able to open the file as binary and then read fixed size blocks in using ifstream::read()

Andy
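
A minimal sketch of the binary-mode idea, assuming the goal is simply to drop every NUL byte while copying a file. It reads blocks with ifstream::read() and uses gcount() to see how many bytes each read delivered. The helper name strip_nul_bytes and the 4096-byte block size are assumptions made for illustration.

```cpp
#include <algorithm>
#include <fstream>
#include <string>

// Hypothetical helper: copies in_path to out_path, dropping every NUL byte.
// Returns the number of NUL bytes removed, or -1 if a file could not be opened.
long strip_nul_bytes(const std::string& in_path, const std::string& out_path)
{
    std::ifstream in(in_path, std::ios::binary);
    std::ofstream out(out_path, std::ios::binary);
    if (!in || !out)
        return -1;

    long removed = 0;
    char block[4096];
    // read() reports failure on the final short block, so check gcount() as well.
    while (in.read(block, sizeof block) || in.gcount() > 0)
    {
        const std::streamsize got = in.gcount();
        // std::remove shifts the non-NUL bytes to the front of the block.
        char* new_end = std::remove(block, block + got, '\0');
        removed += static_cast<long>((block + got) - new_end);
        out.write(block, new_end - block);
    }
    return removed;
}
```

Called as strip_nul_bytes("in.txt", "out.txt"), this leaves an empty field (";;") where the NUL used to be, and the CR LF line endings pass through unchanged because both streams are binary.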


Sep 2, 2013 at 11:33am
Thank you very much for your replies so far.

I presume you wrote the code that generated the txt file? As such I would suggest you check for the condition that generated the NUL entries and don't allow it to write anything to txt file.


Unfortunately I do not have access to the code that generated the txt file, and I already have hundreds of files formatted this way.
I will try to open the file as binary as Andy suggested.
If you have any other ideas, please post them.

Martin
Sep 2, 2013 at 12:25pm

Is there a method to remove the NUL entries in this text file?


You could write a small utility program that just opens each file, reads each line and parses the fields delimited by the semi-colon and effectively filter out the NUL; fields. Then just output to a new file.
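
The utility described above could look roughly like this. It is only a sketch: it assumes the unwanted field consists of exactly one NUL character, and the names split_fields and filter_nul_fields are made up for illustration. Reading with std::getline into a std::string matters here, because a std::string keeps embedded NUL characters.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one line on ';' into its fields.
std::vector<std::string> split_fields(const std::string& line)
{
    std::vector<std::string> fields;
    std::istringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ';'))
        fields.push_back(field);
    return fields;
}

// Copies every line from in to out, skipping fields that consist
// of a single NUL character (an assumption about what the board writes).
void filter_nul_fields(std::istream& in, std::ostream& out)
{
    const std::string nul_field(1, '\0');
    std::string line;
    while (std::getline(in, line))   // std::string keeps embedded NULs
    {
        bool first = true;
        for (const std::string& field : split_fields(line))
        {
            if (field == nul_field)
                continue;            // drop the NUL field entirely
            if (!first)
                out << ';';
            out << field;
            first = false;
        }
        out << '\n';
    }
}
```

To process a file, pass a std::ifstream as in and a std::ofstream as out. The streams should be opened in text mode on Windows so that the CR of each CR LF pair does not end up in the last field.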
Sep 2, 2013 at 12:48pm
You could write a small utility program that just opens each file, reads each line and parses the fields delimited by the semi-colon and effectively filter out the NUL; fields. Then just output to a new file.


That's what I tried but if I use the following command

file.getline(row, 1024);

the variable "row" only contains the string "DS207;5000007;" and nothing else.
Sep 2, 2013 at 1:56pm

Is this exactly what you have on an example line in your txt files:

DS207;5000007;NUL;0;20251;11.07.2013 12:30:02 MESZ

or do you have:

DS207;5000007;\0;0;20251;11.07.2013 12:30:02 MESZ

i.e. an actual NUL character (the null terminator, '\0')?
Sep 2, 2013 at 6:47pm
When I open the file with Notepad++ it looks like this:

DS207;5000007;NUL;0;20251;11.07.2013 12:30:02 MESZ CR LF

Opening the file in binary mode does not yield better results.

I am not very experienced in working with text files so I don't really know how to proceed.
Unfortunately I have 80 of the measuring boards I mentioned above. Their software has a small bug that writes NUL values into this column.



Sep 3, 2013 at 4:59am
sed "s/\0//g" < input > output
Sep 3, 2013 at 8:28am
If you open one of the files in a binary editor, what do you see for the NUL; field? Alternatively, you could PM me to arrange sending me an example file.
Sep 3, 2013 at 9:39am
closed account (z05DSL3A)
What do you get if you try something like:
#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

int main () 
{ 
    std::ifstream infile("c:\\test\\test.txt");
    std::string line;
    if (infile.is_open())
    {
        const std::regex reg(";");
        const std::string fmt(" ");
        // std::getline into a std::string keeps embedded NUL characters
        while (std::getline(infile, line))
        {
            // turn the semicolon delimiters into spaces
            line = std::regex_replace(line, reg, fmt);
            std::stringstream strstr(line);

            // split the line into whitespace-separated tokens
            std::istream_iterator<std::string> it(strstr);
            std::istream_iterator<std::string> end;
            std::vector<std::string> results(it, end);

            // print the tokens separated by spaces
            std::ostream_iterator<std::string> oit(std::cout, " ");
            std::copy(results.begin(), results.end(), oit);

            std::cout << std::endl;
        }
    }
    return 0; 
}


Edit:

or go real simple:
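#include <algorithm>
#include <fstream>
#include <iostream>

int main () 
{ 
    std::ifstream infile("c:\\test\\test.txt");
    char row[1024];
    if (infile.is_open())
    {
        while(infile.getline(row, 1024))
        {
            // gcount() counts the extracted newline too, except on a
            // last line that ends at end-of-file without one
            std::streamsize len = infile.gcount();
            if (!infile.eof())
                --len;
            // replace the first embedded NUL, if any, with a space
            char * ptr = std::find(row, row + len, '\0');
            if (ptr != row + len)
                *ptr = ' ';
            std::cout << row << std::endl;
        }
    }
    return 0; 
}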
NB: very crude code to show the idea.
Last edited on Sep 3, 2013 at 11:01am
Sep 3, 2013 at 11:39am
You could use std::replace?

This code works with char buffers and uses istream::gcount to find out how many chars were read by the last operation. That count includes the newline delimiter, which getline extracts but does not store; the terminating null sits at that position in the buffer, hence the -1 adjustment.

If you use std::replace with std::string you don't have to worry about gcount; you just use begin() and end() as usual.

Andy

#include <iostream>
#include <iomanip>   // for boolalpha
#include <fstream>
#include <string>
#include <algorithm> // for replace

void test_with_char_buff(bool use_replace)
{
    std::cout << "test_with_char_buff (use_replace = "
              << use_replace << ")\n\n";

    std::ifstream infile("c:\\test\\test.txt");
    if (!infile.is_open())
    {
        std::cerr << "error : open file failed\n";
        return;
    }

    const size_t buffer_size = 1024;
    char buffer[buffer_size] = {0};
    while (infile.getline(buffer, buffer_size))
    {
        if (use_replace)
        {
            size_t count = static_cast<size_t>(infile.gcount());
            if (0 < count)
            {
                std::replace(buffer, (buffer + count - 1), '\0', '*');
                // -1 as don't want to replace final null
            }
        }

        std::cout << buffer << "\n";
    }

    std::cout << "\n";
}

int main () 
{
    std::cout << std::boolalpha;

    test_with_char_buff(false);
    test_with_char_buff(true);

    return 0; 
}


test_with_char_buff (use_replace = false)

DS207;5000007;
DS207;5000007;
DS207;5000007;
DS207;5000007;
DS207;5000007;
DS207;5000007;
DS207;5000007;

test_with_char_buff (use_replace = true)

DS207;5000007;*;0;20251;11.07.2013 12:30:02 MESZ
DS207;5000007;*;1;10159;11.07.2013 12:30:02 MESZ
DS207;5000007;*;4;27.18;11.07.2013 12:30:02 MESZ
DS207;5000007;*;0;20233;11.07.2013 12:35:02 MESZ
DS207;5000007;*;1;10149;11.07.2013 12:35:02 MESZ
DS207;5000007;*;4;27.31;11.07.2013 12:35:02 MESZ
DS207;5000007;*;0;20256;11.07.2013 12:40:02 MESZ
Last edited on Sep 3, 2013 at 3:00pm
Sep 3, 2013 at 2:58pm
Hello 'Grey Wolf' and Andy,

Your solutions both work very well!
Thanks a lot all of you for your help.
This saves a lot of time for me as I will not have to manually format several hundred text files.

Thank you very much again.

Martin