Find frequently used words

Write a program that determines the frequently used words in a text file. We'll define "frequently used" to mean that a word accounts for at least 1% of all the words in the file. For example, the file melville.txt contains 14,353 words in total, so any word that occurs more than 143 times is a frequently used word. For melville.txt, the frequently used words are shown below with their number of occurrences.
For example:
334 a
219 his
194 that
519 i
603 the
265 in
432 to
I know about ifstream and ofstream, but I don't understand how I should use them here. My idea is this: first we use ofstream to create a text file and write n words to it, then we use ifstream to read the words back one by one, and finally we calculate the frequency of each word. But I'm not sure. I also thought we could use an array and declare the words in it, like arrayword[n]={help,me,he,help,...,n}, but maybe that isn't right. What I don't understand is whether the user enters the words or whether we use a file where the words already exist. I need your help. Thanks.

rapidcoder (1010)

Easy. Hint: use SGI's hash_map or std::map<string, int> and add the words to it, incrementing the counter for each word. After reading all the words, just iterate over all the pairs in the map and return only those words whose counters are over some threshold.

It would be more challenging if you also had to satisfy all of these:
1. Only one pass over the data is allowed.
2. Sublinear space complexity is required.
3. Linear time complexity is required. :)

cran (218)

You just need to open the file that has the words once, and count the words as you read them rather than loading every word first. I know you are asking for a lot more, but here are the basic steps you should take (your version can be quite different):

1. Get the file name (you can just hardcode it to test).
2. Make an array of strings and an array of ints.
3. Set up a loop that reads each word from the file until the end (EOF).
4. In that loop, check whether the word is in the array of strings:
-- if it's not, add it to the array of strings and add a matching entry set to 1 to the array of ints
-- if it is, find the int that represents that word and add 1 to it
5. After you break from the loop you can do a number of things, but basically, pick out all the words whose count is above the threshold you want.

cran (218)

rapidcoder is correct in saying to use hash maps. They make things much faster, but if you are still having trouble getting started, make something simple first and learn about hash maps afterwards. They can be complicated at first (well, if you want to understand them), but are well worth the time to learn. Map is good too, because it keeps track of both arrays for you, and you only need to update the map...

karabulak (6)

Well, I don't know anything about hash mapping. I just have to use ifstream, ofstream and an array here, and nothing too complicated.

cran (218)

Using the hash isn't really complicated, actually. Everything is done for you in the background.

Crutoy (39)

A hash map might not be part of his requirements, especially if it's homework where they must show an understanding of arrays.

karabulak (6)

Yes, this is homework, and I must use only ofstream, ifstream and an array. If anyone has any idea how to do this, please write.

acorn (276)

I tried to do this using map. I couldn't figure out, though, how to read all the contents of the file. I got one word, then the next line; it didn't print out each line as a whole. I'm not going to post the code because it's a homework assignment, but I'm going to mess with it some more.

rocketboy9000 (562)

Make sure you remove all non-letter characters from the words as you read them; otherwise "crime " and "crime, " won't be counted as the same word. God, I wish C++ had the =~ s/// operator...
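
Without a substitution operator, a small helper can do the same job. This is a sketch; `cleanWord` is a made-up name, and lowercasing the letters is an extra assumption so that "Crime" and "crime" also match.

```cpp
#include <cctype>
#include <string>

// Returns `raw` with every non-letter character removed and the
// remaining letters lowercased, so "Crime," and "crime" compare equal.
std::string cleanWord(const std::string& raw)
{
    std::string cleaned;
    for (unsigned char ch : raw) {   // unsigned char keeps the <cctype> calls well-defined
        if (std::isalpha(ch)) {
            cleaned += static_cast<char>(std::tolower(ch));
        }
    }
    return cleaned;
}
```

A token made up only of punctuation, such as "--", comes back as an empty string, so the caller should skip empty results instead of counting them.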

wolfgang (381)

I got some working code and it does pretty well. I just need a good long paragraph or two to test its power.

cran (218)

wolfgang: just take something from a book online or from any webpage (with or without the HTML, if your code can handle that). You could even use the first post in this thread...

rocketboy: you can always make/find a function to do it...

acorn: the strings should be the keys... maybe you know that already, because the keys have to be unique. Do a find on the string key; if it exists, add one to the value, and if not, insert it. There are plenty of references on this site and elsewhere that show how to do it.

karabulak... what do you already know about coding? Do you know what an array is? Do you know how to iterate over an array? Do you know how to insert into it? Are you required to support an array of any size (dynamic), or did the prof say it won't go over a given size? Knowing that will make things easier... oh, and can you use the std library for your array, or does it have to be a normal array like char [] or something?
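
The find-then-insert step described for acorn might look like the sketch below. `addWord` is a hypothetical helper name, and the map type std::map<std::string, int> is assumed.

```cpp
#include <map>
#include <string>
#include <utility>

// Adds one occurrence of `word` to `counts` with an explicit
// find-then-insert on the string key.
void addWord(std::map<std::string, int>& counts, const std::string& word)
{
    std::map<std::string, int>::iterator it = counts.find(word);
    if (it != counts.end()) {
        ++it->second;                            // key exists: add one to the value
    } else {
        counts.insert(std::make_pair(word, 1));  // new key: start at 1
    }
}
```

Calling this inside a loop of the form `while (file >> word)` reads the whole file, because `operator>>` skips line breaks as well as spaces. Writing `++counts[word]` has the same effect in one line, since `operator[]` inserts a zero for a missing key.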

wolfgang (381)

Ah yes. I "borrowed" some text from an article here. I do think this code has potential. Here's the text I used: http://www.cplusplus.com/articles/chrisname1/

input wrote:
The Difference [...] We will focus on gcc and Visual C in this article as they are the most popular compilers.

And my results:

 // Words that appear > 5 times.

The word 'a' appeared 14 times!
The word 'and' appeared 18 times!
The word 'c' appeared 6 times!
The word 'code' appeared 6 times!
The word 'compiler' appeared 10 times!
The word 'compilers' appeared 10 times!
The word 'the' appeared 18 times!

Process returned 0 (0x0)   execution time : 0.013 s
Press any key to continue.

Obviously my code is not perfect, and you can see it merged words that are hyphen-separated (something I need to fix later).
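
wolfgang's code is not shown, so the following is not a fix to it, only a sketch of one way to avoid merged hyphenated words: treat every non-letter character as a separator instead of deleting it. The name `splitWords` and the lowercasing are assumptions.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Splits `text` into lowercase words, treating every non-letter
// character (spaces, punctuation, hyphens) as a separator.
std::vector<std::string> splitWords(const std::string& text)
{
    std::vector<std::string> words;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalpha(ch)) {
            current += static_cast<char>(std::tolower(ch));
        } else if (!current.empty()) {
            words.push_back(current);   // a separator ends the current word
            current.clear();
        }
    }
    if (!current.empty()) {
        words.push_back(current);       // the last word has no trailing separator
    }
    return words;
}
```

The trade-off is that apostrophes also split words, so "don't" becomes "don" and "t". Whether that is acceptable depends on how the assignment defines a word.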
