I'm given a folder of input text or HTML files (say 50 files) and a set of keywords (say 40 keywords).
I'm trying to build an efficient data structure (an index file) using hashing that records which files contain a given keyword and on which lines. The structure should be saved on the hard disk and loaded into memory at system startup. I should then build a small program that gives the user a simple interface for searching for a keyword, with a prompt such as:
input keyword to search:
The user then types "water" for example.
The program then searches the index file and should have an output such as:
the keyword "water" exists in:
water-resources.html:
Line 526:The water problem in the USA ......
Line 321:the Nile water is the ......
Line 3978:Water is a blessing from God....
and so on. Can anybody suggest how to handle the rule that the index must be rebuilt whenever a file changes?
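One possible way to detect changes, as a rough sketch only: record each file's last-modified time when the index is built, then compare against the folder at startup. The function names `snapshot_mtimes` and `index_is_stale` are made up for illustration, and the approach assumes modification times are reliable on the file system in use.

```python
import os

def snapshot_mtimes(folder):
    """Map each regular file name in `folder` to its last-modified time."""
    return {
        name: os.path.getmtime(os.path.join(folder, name))
        for name in sorted(os.listdir(folder))
        if os.path.isfile(os.path.join(folder, name))
    }

def index_is_stale(folder, saved_mtimes):
    """True if a file was added, removed, or modified since the snapshot."""
    return snapshot_mtimes(folder) != saved_mtimes
```

The snapshot would be stored next to the index file; if `index_is_stale` returns True at startup, the index is rebuilt and a new snapshot is saved.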
Also, how can I get the name of the file in which a match was found?
And how can I find out the line number?
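To make these two questions concrete, here is a minimal sketch of how the file name and line number could be captured while building the index. It assumes the index is a Python dict (a hash table) keyed by keyword, that matching is case-insensitive on whole words, and that the files are readable as UTF-8; `build_index` is a hypothetical name, not part of any library.

```python
import os
import re
from collections import defaultdict

def build_index(folder, keywords):
    """Return {keyword: [(file_name, line_number, line_text), ...]}."""
    wanted = {k.lower() for k in keywords}
    index = defaultdict(list)
    for file_name in sorted(os.listdir(folder)):
        path = os.path.join(folder, file_name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8", errors="replace") as handle:
            # enumerate(..., start=1) supplies the 1-based line number
            for line_number, line in enumerate(handle, start=1):
                words = set(re.findall(r"[a-z0-9]+", line.lower()))
                for keyword in wanted & words:
                    index[keyword].append((file_name, line_number, line.strip()))
    return dict(index)
```

The file name comes from the directory listing and the line number from the loop counter, so both are known at the moment a keyword is seen and can be stored together in the index entry.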
Can anyone help me write the code for this engine? :)
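As a starting point for discussion, here is a rough sketch of the save, load, and search parts. It assumes the index is a dict mapping each lowercase keyword to a list of [file name, line number, line text] entries, and it stores the index as JSON, which is just one possible on-disk format. The names `save_index`, `load_index`, and `format_hits` are hypothetical.

```python
import json

def save_index(index, path):
    """Write the index to disk as JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(index, handle)

def load_index(path):
    """Read the index back into memory (entries come back as lists)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)

def format_hits(index, keyword):
    """Build the report text for one keyword, grouped by file name."""
    hits = index.get(keyword.lower(), [])
    if not hits:
        return f'the keyword "{keyword}" was not found'
    lines = [f'the keyword "{keyword}" exists in:']
    current_file = None
    for file_name, line_number, text in hits:
        if file_name != current_file:
            lines.append(f"{file_name}:")
            current_file = file_name
        lines.append(f"Line {line_number}: {text}")
    return "\n".join(lines)
```

The interactive part could then be as small as `print(format_hits(load_index("index.json"), input("input keyword to search: ")))`, where `index.json` is a placeholder file name.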