I am having issues with my function below - namely EXC_BAD_ACCESS (code=1, address=0x0)
when searching to check if string is present.
I have an array
std::string* data[10]{NULL};
When searching and the function iterates to an element that is NULL it throws the error?
The idea is, when adding to the hash table, a hash is created. If the hash collides, places in the next available slot. Search starts from the hash and supposed to stop and return false if 1) iterates through and does not find it, or 2) comes a across NULL.
bool hash::contains(std::string str)
{
//variable to store hash
unsigned value{hash_function(str) % 10};
int loop_count = sizeof(data)/sizeof(data[0]) + (value + 1);
//first check element with hash value, if found return true and exit
if(*data[value] == str)
{
returntrue;
}
else
{
//if above fails loop through data array unitl either found or end of loop
for(int i{static_cast<int>(value + 1)}; i < loop_count;i++, value++)
{
//if(i == value) continue;//no point checking this element so skip
if(*data[value % 10] == str)//check each element, if found return true and exit
{
returntrue;
}
elseif(data[value] == NULL)
{
returnfalse;
}
}
}
returnfalse;//if all above fails default return false
}
I 'think' I have worked it out. I now add empty strings with the constructor.
Instead of data[index] = new std::string(str)
I know have data[index] = &str
Which enables my function (data[value%10]->empty())
Thanks! My array is supposed to contain pointers to new strings. That is why I have done it that way.
How can I get the input to 'new' if you get what I mean?
If there is a hash collision - then add_string is supposed to find the next available slot and store.
void hash::add_string(std::string str)
{
//check capacity of array.
//If >50% or 60% then resize array
//and move all stored elemnts to new array
//store these using a new hash key when moved
//otherwise continue below
//create index
unsigned index = hash_function(str) % 10;
if(data[index]->empty())//if index is empty store string
{
//data[index] = new std::string(str);
data[index] = &str;
added_items++;
}
else//otherwise loop through until next availble
{
do
{
++index %= 10;
} while(!((data[index]->empty())));
data[index] = &str;//store in next available empty slot
added_items++;
}
}
class hash {
staticconstexprunsigned NumBuckets = 10;
std::string *data[NumBuckets];
unsigned hash_function(std::string &str);
public:
bool contains (std::string str);
};
bool hash::contains(std::string str)
{
//variable to store hash
unsigned value{hash_function(str) % NumBuckets};
unsigned origValue {value};
do {
if(data[value] == nullptr) {
returnfalse;
} elseif (*data[value] == str) {
returntrue;
} else {
++value;
if (value == NumBuckets) value = 0;
}
} while (value != origValue);
returnfalse;
}
Both add_string() and contains() need to find the location of the string: where the string is, or where it should be, if inserted. You could make that a method and then the others become trivial:
// Return the location of str in the hash, or -1 if not found.
// The location is the index into the data array
int hash::location(std::string &str)
{
//variable to store hash
unsigned value{hash_function(str) % NumBuckets};
unsigned origValue {value};
do {
if(data[value] == nullptr || *data[value] == str) {
return value;
} else {
++value;
if (value == NumBuckets) value = 0;
}
} while (value != origValue);
return -1;
}
bool
hash::contains (std::string str)
{
return location(str) >= 0;
}
bool
hash::insert (std::string str)
{
int loc = location(str);
if (loc < 0) returnfalse; // hash is full
data[loc] = new std::string(str);
returntrue;
}
Thanks everyone! It is supposed to be a simple exercise to illustrate how a hash table works/is implemented.
This is just the beginnings. Ideally what should happen is items are added to the table and 'retrieved' at will.
Behind this, the code should check for a 'capacity' value, and if the hash table is above this, it automatically resizes, and rehashes to store them in the new bigger data structure.
I have yet to implement this. That was the thinking behind making the array store pointers. I'm open to suggestions 😂
My thinking is to concentrate on getting things to function correctly with a small array first. I have a variable added_items so when that is at a predetermined value then my code will flag up. It's at this point I need to think about resizing.
At first I had std::string* data[] and then in constructor created 10 empty new strings n each slot.
Then I got to thinking - is this correct?
Should it have been std::string* data = new std::string[10];
Then in constructor create 10 empty strings in each slot.
I want to create an array that stores pointers to each item thats is added?
See my confusion
It would be easier to use a vector of pointers instead of doing the memory management of the data array by yourself. But I understand that this might be a homework problem that requires you to do it.
I think it should be a vector or array of pointers. If you mark empty buckets by empty strings, then you make it impossible to store an empty string in your hash table.
Yes it is an exercise that I am trying to get to grips with. The idea is to create a hash table and add strings to it, at some point when it is 60% full it resizes to avoid collisions.
std::string* data[size] is better than std::string* data = new std::string[10] ?
I do not need to use pointers. I thought it would be easier when resizing, but I have been thinking about it. When I resize I will need to rehash the already stored items anyway. It seems the pointers are only adding a level of complexity at the moment, unless the array and stored items are of a significant size.
Think I will drop the pointers for the time being. As for empty strings, it was exactly that reason I chose to use them. `To test for 'emptiness'. My thinking was what is the point of storing and empty string?
My thinking was what is the point of storing and empty string?
Who are you to say what the users of your class want to do? :) Heck, even the users might not know that they're putting an empty string into the hash table.
std::string* data[size] is better than std::string* data = new std::string[10] ?
Not if the array needs to resize, which you've said that it does, so you should use a pointer to a dynamic array, or a std::vector
How can do I check for NULL instead of empty string?
this confuses people from other backgrounds than C/ C++ frequently.
nullptr is the modern null; it is a little smarter about types and such. But to answer your question:
if(stringvariable == "") //is this an empty string?
if(stringvarpointer == nullptr) //is POINTER null?
the above two answer your question, but just FYI and even more confusing:
the '\0' is called the 'null' character and is used in C and older or throwback written c++ that uses "c-strings". worse, c-strings are often seen as character pointers and you are back to null pointer vs empty (contains only null character!!) strings. The solution to that is to NOT USE c-style strings and stick to string objects in c++. If you do end up writing some C or dealing with char* strings (some interfaces to externals need them; I write for a database tool that wants c-style strings from the user written c++ modules), then you need to be very precise when throwing the word null about; be sure you refer to null character as the null character or string terminator, refer to empty strings as empty strings, and refer to null pointers as null pointers. :)
those boil down to these in C or C-string code:
if(stringvar == null) //is the pointer null?
if(stringvar[0] == 0) //is the string empty
or
if(strlen(stringvar) == 0) //is the string empty
or
if(strcmp(stringvar, "") == 0) //is the string empty //I think this is illegal in strict c++ not sure about strict C but loose compiler flags will accept it.
in practice most C coders would use the bold one as it does the least work.
Thanks everyone! The large_data is in my constructor. At the moment I construct both - mainly to see if things work. I am looking into only constructing the small array first then if/when required construct the second rehash and delete the first...
I decided to initialise my data strings with "sentinel value" so I can check for that.
I will go back and replace 'sentinel' with nullptr;
Forgive my confusion - can someone explain the following to me please.
std::string* data[10];
std::string* data = new std::string[10]
Forgive my confusion - can someone explain the following to me please.
std::string* data[10]; //create an array of POINTERS to STRINGS. this is 10 pointers, and no strings, so far. you need to get a valid address (via new or address of existing) for each one. This is ideal for holding 10 existing strings; this seems to be what you want for a hash table storage of existing data put into your container (?).
std::string* data = new std::string[10] //this creates an array of 10 strings using dynamic memory. we try to avoid this and use a vector instead in most c++. This is 'almost' the same as:
string data[10]; except now you have to delete what you newed: every new requires a delete statement when done with the memory.
or it is a bit like vector<string> data(10); <-- best way to get to what this statement did, but I don't think you want this construct at all in your code, I think you want above one (?).
Thanks jonnin, I had an idea but you have now set it clearly for me.
The hash table is a basic exercise. Input the data and retrieve it. The idea was to start of with a 'small' table then resize - make it larger - when the capacity reached a certain threshold say 70% full.
This was my thinking using
std::string* data = new std:string[size]
std::string* large_data = new std::string[large_size]
Once threshold is reached , loop through small array and rehash entries to large array then use destructor(s) delete[] small array.
If is use the following
std::string data[size]
std::string large_data[large_size]
The small array is still 'floating about' so to speak.
If I use
std::string* data[size]
std::string* large_data[large_size]
The destructor(s) will have to call delete on each entry in the array - I think.
So this is my predicament - what is the best way to go forward.
The problem with using larger_data is that you can only resize once. What if you have to resize twice? Or ten times?
I would do:
vector<string*> data;
When that gets full, you resize:
- make a copy of data into a local variable.
- set all the pointers in data to nullptr, then resize data to something bigger
- rehash the items using the larger size and insert them.
Your hash table destructor will have to delete the pointers in data.