Read Violation In Thread

Hi all,

I'm trying to write a program that will take a file and count the number of times each word occurs in it. I believe I am screwing up in passing which bytes each worker should read or just doing something wrong with the thread in general (still new to threads). Any input would be appreciated!

Below is my code:
#define _THREADS 4

#include <fstream>
#include <iostream>
#include <thread>
#include <string>
#include <cassert>
#include <cerrno>
#include <cstring>
#include <stdexcept>
#include <vector>
#include <Windows.h>
#include <sys/stat.h>
#include "Node.h"
#include "Hashtable.h"

using std::vector;
using std::string;
using std::thread;
using std::cout;
using std::ifstream;
using std::cin;

typedef unsigned int uint;

void read(const string& ifname, HashTable<Node*>& table, uint startByte, uint endByte);
off_t getFileSize(const string& ifname);

int main(int argc, char** argv) {
	HashTable<Node*> myTable;
	string ifname;
	off_t   fileSize = 0;
	vector<thread> manager;

	cout << "Enter the filename: ";
	cin >> ifname;

	fileSize = getFileSize(ifname);

	for(uint i = 0; i < fileSize; i + (fileSize % _THREADS)) {
		manager.push_back( thread(read, ifname, myTable, i, i + (fileSize % _THREADS)) );
	} // END for(i)

	for(uint i = 0; i < _THREADS; i++) {
		manager[i].join();
	} // END for(i)

system("Pause");
return 0;
} // END main(argc, argv)

void read(const string& ifname, HashTable<Node*>& table, uint startByte, uint endByte) {
	ifstream fin(ifname);
	assert(fin);

	fin.ignore(startByte / sizeof(char), EOF);

	uint byteCount = 0;
	string temp;
	Node* newNode = nullptr;
	while (!fin.eof() && (fin >> temp) && (byteCount <= endByte) ) {
		Node* cur = table.retrieve(temp);
		if (cur) {
			cur->upCount();
		} else {
			newNode = new Node(temp);
			table.insert(temp, newNode);
			newNode = nullptr;
		} // END if/else

		byteCount += sizeof(char) * temp.length();
	} // END while( (fin >> temp) && !fin.eof() )

	fin.close();

return;
} // END read(ifname)

off_t getFileSize(const string& ifname) {
	struct stat st;
	if(stat(ifname.c_str(), &st) == -1) {
		throw std::runtime_error(std::strerror(errno));
	} // END if

return st.st_size;
} // END getFileSize(ifname) 


I am getting this runtime error in Visual Studio 2015 debug:
Exception thrown: read access violation.

this was 0x6749005C

If there is a handler for this exception, the program may be safely continued.
I'm running this program with books as input, the first being Nietzsche's Beyond Good and Evil, which takes up 600 MB. I tried running it without the threads and gave up after 2 hours of runtime.
It's not an assignment :P I'm looking into inconsistencies in language theory
> Nietzsche's Beyond Good and Evil, which takes up 600 MB

600 MB is not a huge amount of memory.
One option: have a single thread read the file and, each time a segment has been read (a segment being a fixed number of lines, say 100,000), launch an asynchronous operation that computes the word counts for that segment.
At the end, consolidate the counts returned by these asynchronous operations.

Something along these lines:

#include <iostream>
#include <fstream>
#include <string>
#include <cctype>
#include <unordered_map>
#include <vector>
#include <regex>
#include <future>
#include <iomanip>

using map_type = std::unordered_map< std::string, std::size_t > ;

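// std::tolower has undefined behaviour for negative char values, hence the casts
std::string to_lower( std::string str )
{
    for( char& c : str ) c = static_cast<char>( std::tolower( static_cast<unsigned char>(c) ) ) ;
    return str ;
}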

map_type word_count( std::vector<std::string> lines )
{
    static const std::regex word_re( "\\w+" ) ;
    static const std::sregex_token_iterator end ;

    map_type wc ;

    for( std::string& a_line : lines )
    {
        std::sregex_token_iterator iter( a_line.begin(), a_line.end(), word_re ) ;
        for( ; iter != end ; ++iter ) ++wc[ to_lower(*iter) ] ;
    }

    return wc ;
}

int main()
{
    const std::string path = "/usr/include/gmpxx.h" ; // adjust as required
    const std::size_t segment_size = 1000 ; // adjust as required

    std::ifstream file(path) ;
    std::string line ;
    std::vector<std::string> segment ;
    std::vector< std::future<map_type> > futures ;

    while( std::getline( file, line ) )
    {
        segment.push_back( std::move(line) ) ;
        if( segment.size() == segment_size )
        {
            futures.push_back( std::async( std::launch::async, word_count, std::move(segment) ) ) ;
            segment.clear() ; // put the moved-from vector into a known empty state
        }
    }

    std::cout << "there were " << futures.size() << " async executions in all.\n\n" ;
    auto counts = word_count( std::move(segment) ) ;
    for( auto& f : futures )
        for( auto& pair : f.get() ) counts[ std::move(pair.first) ] += pair.second ;

    std::cout << "list of words occurring more than a hundred times:\n" ;
    for( const auto& pair : counts ) if( pair.second > 100 )
        std::cout << std::quoted(pair.first) << " - " << pair.second << '\n' ;
}

http://coliru.stacked-crooked.com/a/c4bd4eacc9a98973
http://rextester.com/BFZW93139