Help using STL Properly

Hello everyone!

I am in need of some help with a homework assignment. It's basically creating a file indexer using C++.

I have 2 separate files that I need to read in: one is a text document and the other is a Skip words key. I am incredibly confused about how to approach this.

I know how to read both of them in and place them into separate containers (I plan on using vectors), but I just don't know how to combine them so that document words that match a skip word won't be displayed (do I combine them into a map?).

I then need to display words that weren't in the skipWords doc in alphabetical order (I plan on using a vector and just sorting them) and output each location they appear.

Here is the doc file:

The quick brown fox
jumped over the lazy blue
fox. I can not believe I wrote such
a common phrase.
<newpage>
Where or where are you tonight?
Why did you leave me here all

alone?
<newpage>
I searched the world over
and thought I found true love.


and here is the skipWords file:
why
are
did
here
i
not
me
a
or
you
such
where
and
the


Can someone offer a stepping stone? I'm good at figuring it out if people just ask me the right questions :) I'm not asking for code, just guidance. Thank you so much in advance!
This is what I have so far. The Indexer header is just going to announce my prototypes and that's it. I don't intend on using any constructors. All I have is just reading the files in and placing them into separate vectors.

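#include <algorithm>
#include <cstdlib>   // exit, system, EXIT_FAILURE
#include <fstream>
#include <iostream>
#include <iterator>  // istream_iterator, back_inserter
#include <string>
#include <vector>
#include "Indexer.h"

using namespace std;

// These typedefs must come after the using-directive (or spell out std::).
typedef vector<string> docText;
typedef vector<string> skipWords;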

void insertDocVector()
{
	string fileName;
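	docText words; // renamed so the variable no longer hides the docText typedef

	cout << "Please enter a document file name: ";
	cin >> fileName;

	ifstream inFile(fileName.c_str(), ios::in);

	if (!inFile)
	{
		cerr << "Error opening file '" << fileName << '\'' << endl;
		system("pause");
		exit(EXIT_FAILURE);
	}
	
	// Read whitespace-separated tokens from the file into the vector.
	copy(istream_iterator<string>(inFile),
		istream_iterator<string>(),
		back_inserter(words));
	// NOTE: words is local, so its contents are lost when this function returns.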
}

void insertSkipVector()
{
	skipWords toSkip;
	string fileName;

	cout << "Please enter a skip-words file name: ";
	cin >> fileName;

	ifstream inFile(fileName.c_str(), ios::in);

	if (!inFile)
	{
		cerr << "Error opening file '" << fileName << '\'' << endl;
		system("pause");
		exit(EXIT_FAILURE);
	}

	copy(istream_iterator<string>(inFile),
		istream_iterator<string>(),
		back_inserter(toSkip));
}

int main()
{
	cout << "\t\t\t\t***The Indexer***\n\n";

	insertDocVector();
	cout << endl;
	insertSkipVector();
	cout << endl;



	system("pause");
	return 0;
}
What do you want to do? Do you want to print the document's words in sorted order, skipping the words in the skipWords file? (Sorry, my English is not very good, so I may have misunderstood.)

If so, after you read the two files into two vectors, you need a nested loop to remove the skip words from the document's words.

Something like:
insert doc
insert skip
for i=0 to doc.len :
    for j=0 to skip.len :
        if doc[i] == skip[j] : 
            //mark doc[i] as skipped or delete it
        end if
    end for
end for
sort doc // decide first what to do with the skipped words still in the vector
print doc


I don't know why you have to sort before printing; it makes no sense unless you just want to use the sort function.
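The nested loop in the pseudocode above could be sketched in C++ roughly as follows. This is only one possible reading, not the assignment's required solution: `filterAndSort` is a made-up name, the comparison is exact and case-sensitive, and C++11 is assumed for the range-based for loop. Instead of deleting from the document vector in place, it copies the words to keep into a new vector, which sidesteps the question of what to do with the skipped entries before sorting.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: returns the words of `doc` that do not occur in `skip`,
// sorted alphabetically. Comparison is exact and case-sensitive (an assumption).
std::vector<std::string> filterAndSort(const std::vector<std::string>& doc,
                                       const std::vector<std::string>& skip)
{
    std::vector<std::string> kept;
    for (const std::string& word : doc) {
        // The inner loop of the pseudocode: a linear search through the skip words.
        if (std::find(skip.begin(), skip.end(), word) == skip.end())
            kept.push_back(word);
    }
    std::sort(kept.begin(), kept.end());
    return kept;
}
```

Duplicates are kept here, so a word that appears twice in the document appears twice in the result.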
The easiest approach: split each file into word tokens and keep them in a vector or a list. Keep in mind that appending to a list never reallocates, while a vector occasionally has to reallocate and copy as it grows (although push_back is still amortized constant time), and that searching a list or an unsorted vector is sequential either way. That only matters if you want to optimize beforehand :-)

Let's call them the document vector and the skip-word vector.

Iterate over the document vector and, for each word, search for it in the skip-word vector. If it is found, do not print it and continue with the next word. If it is not found, print it and move on to the next word in the document vector.

Does this work?
What about duplicate words? Are they kept or ignored?

Since you're talking about an alphabetical ordering and looking up each word in another container of words, you are likely interested in a set or multiset:
http://www.cplusplus.com/reference/set/set/
http://www.cplusplus.com/reference/set/multiset/

Those will automatically keep their contents sorted (one eliminating duplicates, the other keeping them), which enables fast lookup in O(log n).

The input you've mentioned won't make such algorithmic improvements obvious, or even noticeable, but if your course included discussions about the various STL containers and their pros/cons, efficiency of operations might be something you want to demonstrate that you've learned.
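As a rough illustration of how the sorted containers might fit the indexing task, here is a minimal sketch. It assumes that a "location" means a page number, that pages start at 1, and that the token `<newpage>` starts the next page; none of that is confirmed by the assignment text. `buildIndex` is a hypothetical name, and the words are assumed to have been read into a vector already, as in the original post.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: maps each non-skipped word to the set of page numbers
// on which it appears. Assumes pages start at 1 and "<newpage>" begins the
// next page. Comparison is exact and case-sensitive.
std::map<std::string, std::set<int>> buildIndex(const std::vector<std::string>& doc,
                                                const std::set<std::string>& skip)
{
    std::map<std::string, std::set<int>> index;
    int page = 1;
    for (const std::string& word : doc) {
        if (word == "<newpage>") {
            ++page;              // page marker, not a word to index
            continue;
        }
        if (skip.count(word) == 0)   // O(log n) lookup in the set
            index[word].insert(page);
    }
    return index;
}
```

Because std::map keeps its keys ordered, iterating over the result visits the words alphabetically without a separate sort, and the std::set of page numbers drops repeated pages for the same word.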
Why put the skip words in a vector and sort it? Why not use a std::set? A std::set is implicitly sorted and supports an efficient find operation.
Just for the record, there are valid use cases for either approach--but at this scale it isn't going to matter. It's all about what was taught and whether or not it was learned.
> Can someone offer a stepping stone?
> I'm good at figuring it out if people just ask me the right questions

Question one: What do you want to do about punctuation and case?
If the word in the doc file is "here," and a word in the skipWords file is "Here", should the word be skipped?
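If the answer to that question is "yes, skip it", one way to get there is to normalize every token before comparing. The sketch below is an assumption about what normalization might mean here (lower-case everything, drop anything that is not a letter or digit); `normalize` is a made-up name and C++11 is assumed.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: lower-cases a token and drops every character that is
// not a letter or digit, so "Here," and "here" compare equal.
std::string normalize(const std::string& token)
{
    std::string out;
    for (unsigned char ch : token) {   // unsigned char keeps <cctype> calls well-defined
        if (std::isalnum(ch))
            out += static_cast<char>(std::tolower(ch));
    }
    return out;
}
```

Two caveats with this particular choice: a page marker such as `<newpage>` would turn into `newpage`, so it needs to be checked before normalizing, and apostrophes are dropped as well, so "don't" becomes "dont".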

Question two: Could the standard algorithm std::set_difference<>() be used?
http://en.cppreference.com/w/cpp/algorithm/set_difference
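A minimal sketch of what using std::set_difference might look like is shown below. It assumes both word collections are held in std::set<std::string>, which satisfies the algorithm's requirement that both input ranges be sorted with the same ordering; `wordsNotSkipped` is a hypothetical name.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: returns the words that are in `doc` but not in `skip`.
// Both inputs are std::set, so they are already sorted as set_difference requires.
std::vector<std::string> wordsNotSkipped(const std::set<std::string>& doc,
                                         const std::set<std::string>& skip)
{
    std::vector<std::string> result;
    std::set_difference(doc.begin(), doc.end(),
                        skip.begin(), skip.end(),
                        std::back_inserter(result));
    return result;   // already in alphabetical order
}
```

This yields the alphabetical word list in one call, but it does not track where each word appeared, so the locations would still have to be recorded separately.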