hash table function

Pages: 12

I am having issues with my function below - namely EXC_BAD_ACCESS (code=1, address=0x0)
when searching to check if string is present.
I have an array
std::string* data[10]{NULL};
When searching and the function iterates to an element that is NULL it throws the error?
The idea is, when adding to the hash table, a hash is created. If the hash collides, places in the next available slot. Search starts from the hash and supposed to stop and return false if 1) iterates through and does not find it, or 2) comes a across NULL.

  
bool hash::contains(std::string str)
{
    //variable to store hash
    unsigned value{hash_function(str) % 10};
    int loop_count = sizeof(data)/sizeof(data[0]) + (value + 1);
    //first check element with hash value, if found return true and exit
    if(*data[value] == str)
    {
        return true;
    }
    else
    {
        //if above fails loop through data array unitl either found or end of loop
        for(int i{static_cast<int>(value + 1)}; i < loop_count;i++, value++)
        {
            //if(i == value) continue;//no point checking this element so skip
            if(*data[value % 10] == str)//check each element, if found return true and exit
            {
                return true;
            }
            else if(data[value] == NULL)
            {
                return false;
            }
        }
    }
    return false;//if all above fails default return false
}

jamesfarrow (211)

I 'think' I have worked it out. I now add empty strings with the constructor.
Instead of data[index] = new std::string(str)
I know have data[index] = &str
Which enables my function (data[value%10]->empty())

Hopefully I'm on the right track...

coder777 (8439)

I know have data[index] = &str

If data is not local and str is local. That would be wrong.

The loop on line 15 might be wrong. Is the new hash always the last hash + 1? If not you will have gaps where the data can be null.

Line 8/18 will crash (undefined behavior) when data[...] -> nullptr. So you need to check that before you compare.

Last edited on

jamesfarrow (211)

Thanks! My array is supposed to contain pointers to new strings. That is why I have done it that way.
How can I get the input to 'new' if you get what I mean?
If there is a hash collision - then add_string is supposed to find the next available slot and store.


void hash::add_string(std::string str)
{
    //check capacity of array.
    //If >50% or 60% then resize array
    //and move all stored elemnts to new array
    //store these using a new hash key when moved
    
    //otherwise continue below
    //create index
    unsigned index = hash_function(str) % 10;
    if(data[index]->empty())//if index is empty store string
    {
        //data[index] = new std::string(str);
        data[index] = &str;
        added_items++;
    }
    else//otherwise loop through until next availble
    {
        do
        {
            ++index %= 10;
        } while(!((data[index]->empty())));
        data[index] = &str;//store in next available empty slot
        added_items++;
    }
}

Last edited on

coder777 (8439)

Well, data[index]->empty() will crash as well when data[index] is null.

Line 14 is right while line 15/24 produces a 'dangling' pointer. The pointer to str will be invalid when add_string(...) is done.

The loop on line 20: What will happen when all slots are used?

What is this hash good for when the data might be stored at an arbitrary place?

dhayden (5795)

Referring to your original post.

Line 8 is wrong. It assumes that data[value] is not NULL.
Line 18 has the same problem.

Line 6: shouldn't loop_count be 10? or 9 since you already check the first bucket? Why would you check multiple buckets?

Also, the number of buckets (10) should be a constexpr inside the hash class.

class hash {
    static constexpr unsigned NumBuckets = 10;
    std::string *data[NumBuckets];
    unsigned hash_function(std::string &str);
public:
    bool contains (std::string str);
};

bool hash::contains(std::string str)
{
    //variable to store hash
    unsigned value{hash_function(str) % NumBuckets};
    unsigned origValue {value};

    do {
        if(data[value] == nullptr) {
            return false;
        } else if (*data[value] == str) {
            return true;
        } else {
            ++value;
            if (value == NumBuckets) value = 0;
        }
    } while (value != origValue);
    return false;
}

Both add_string() and contains() need to find the location of the string: where the string is, or where it should be, if inserted. You could make that a method and then the others become trivial:

// Return the location of str in the hash, or -1 if not found.
// The location is the index into the data array
int hash::location(std::string &str)
{
    //variable to store hash
    unsigned value{hash_function(str) % NumBuckets};
    unsigned origValue {value};

    do {
        if(data[value] == nullptr || *data[value] == str) {
            return value;
        } else {
            ++value;
            if (value == NumBuckets) value = 0;
        }
    } while (value != origValue);
    return -1;
}

bool
hash::contains (std::string str)
{
    return location(str) >= 0;
}

bool
hash::insert (std::string str)
{
    int loc = location(str);
    if (loc < 0) return false; // hash is full

    data[loc] = new std::string(str);
    return true;
}

jamesfarrow (211)

Thanks everyone! It is supposed to be a simple exercise to illustrate how a hash table works/is implemented.
This is just the beginnings. Ideally what should happen is items are added to the table and 'retrieved' at will.
Behind this, the code should check for a 'capacity' value, and if the hash table is above this, it automatically resizes, and rehashes to store them in the new bigger data structure.
I have yet to implement this. That was the thinking behind making the array store pointers. I'm open to suggestions 😂

My thinking is to concentrate on getting things to function correctly with a small array first. I have a variable added_items so when that is at a predetermined value then my code will flag up. It's at this point I need to think about resizing.

Last edited on

jamesfarrow (211)

Maybe I'm getting wrong end of stick...

At first I had std::string* data[] and then in constructor created 10 empty new strings n each slot.
Then I got to thinking - is this correct?
Should it have been std::string* data = new std::string[10];
Then in constructor create 10 empty strings in each slot.

I want to create an array that stores pointers to each item thats is added?
See my confusion

Thanks in advance!!

Last edited on

dhayden (5795)

It would be easier to use a vector of pointers instead of doing the memory management of the data array by yourself. But I understand that this might be a homework problem that requires you to do it.

I think it should be a vector or array of pointers. If you mark empty buckets by empty strings, then you make it impossible to store an empty string in your hash table.

jamesfarrow (211)

Yes it is an exercise that I am trying to get to grips with. The idea is to create a hash table and add strings to it, at some point when it is 60% full it resizes to avoid collisions.
std::string* data[size] is better than std::string* data = new std::string[10] ?

I do not need to use pointers. I thought it would be easier when resizing, but I have been thinking about it. When I resize I will need to rehash the already stored items anyway. It seems the pointers are only adding a level of complexity at the moment, unless the array and stored items are of a significant size.

Think I will drop the pointers for the time being. As for empty strings, it was exactly that reason I chose to use them. `To test for 'emptiness'. My thinking was what is the point of storing and empty string?

dhayden (5795)

My thinking was what is the point of storing and empty string?

Who are you to say what the users of your class want to do? :) Heck, even the users might not know that they're putting an empty string into the hash table.

std::string* data[size] is better than std::string* data = new std::string[10] ?

Not if the array needs to resize, which you've said that it does, so you should use a pointer to a dynamic array, or a std::vector

jamesfarrow (211)

Instead of using empty string - what can I use to identify an empty bucket that is available for storage?

dhayden (5795)

what can I use to identify an empty bucket that is available for storage?

Make the buckets pointers to strings. A null pointer means the bucket is empty.

I guess you could also keep a vector of bool where the n'th item tells whether the n'th bucket is occupied.

jamesfarrow (211)

Thanks dhayden. I have the following
std::string* data = new std::string[NumBuckets];
std::string* large_data = new std::string[LargeNumBuckets];

How can do I check for NULL instead of empty string?
If I use NULL below I get an error saying operator= overload is ambiguous


hash::hash()
{
    for(int i{}; i < NumBuckets;i++)
    {
        //data[i] = NULL;
        data[i] = sentinal;
        std::cout << "cons" << std::endl;
    }
    for(int i{}; i < LargeNumBuckets;i++)
    {
        //large_data[i] = NULL;
        large_data[i] = sentinal;
    }
}

dhayden (5795)

Use nullptr instead of NULL.

What is large_data for?

jonnin (11344)

How can do I check for NULL instead of empty string?

this confuses people from other backgrounds than C/ C++ frequently.
nullptr is the modern null; it is a little smarter about types and such. But to answer your question:

if(stringvariable == "") //is this an empty string?
if(stringvarpointer == nullptr) //is POINTER null?
the above two answer your question, but just FYI and even more confusing:
the '\0' is called the 'null' character and is used in C and older or throwback written c++ that uses "c-strings". worse, c-strings are often seen as character pointers and you are back to null pointer vs empty (contains only null character!!) strings. The solution to that is to NOT USE c-style strings and stick to string objects in c++. If you do end up writing some C or dealing with char* strings (some interfaces to externals need them; I write for a database tool that wants c-style strings from the user written c++ modules), then you need to be very precise when throwing the word null about; be sure you refer to null character as the null character or string terminator, refer to empty strings as empty strings, and refer to null pointers as null pointers. :)
those boil down to these in C or C-string code:
if(stringvar == null) //is the pointer null?

if(stringvar[0] == 0) //is the string empty
or
if(strlen(stringvar) == 0) //is the string empty
or
if(strcmp(stringvar, "") == 0) //is the string empty //I think this is illegal in strict c++ not sure about strict C but loose compiler flags will accept it.
in practice most C coders would use the bold one as it does the least work.

Last edited on

jamesfarrow (211)

Thanks everyone! The large_data is in my constructor. At the moment I construct both - mainly to see if things work. I am looking into only constructing the small array first then if/when required construct the second rehash and delete the first...
I decided to initialise my data strings with "sentinel value" so I can check for that.
I will go back and replace 'sentinel' with nullptr;
Forgive my confusion - can someone explain the following to me please.
std::string* data[10];
std::string* data = new std::string[10]

jonnin (11344)

Forgive my confusion - can someone explain the following to me please.

std::string* data[10]; //create an array of POINTERS to STRINGS. this is 10 pointers, and no strings, so far. you need to get a valid address (via new or address of existing) for each one. This is ideal for holding 10 existing strings; this seems to be what you want for a hash table storage of existing data put into your container (?).

std::string* data = new std::string[10] //this creates an array of 10 strings using dynamic memory. we try to avoid this and use a vector instead in most c++. This is 'almost' the same as:
string data[10]; except now you have to delete what you newed: every new requires a delete statement when done with the memory.
or it is a bit like vector<string> data(10); <-- best way to get to what this statement did, but I don't think you want this construct at all in your code, I think you want above one (?).

Last edited on

jamesfarrow (211)

Thanks jonnin, I had an idea but you have now set it clearly for me.
The hash table is a basic exercise. Input the data and retrieve it. The idea was to start of with a 'small' table then resize - make it larger - when the capacity reached a certain threshold say 70% full.
This was my thinking using
std::string* data = new std:string[size]
std::string* large_data = new std::string[large_size]

Once threshold is reached , loop through small array and rehash entries to large array then use destructor(s) delete[] small array.

If is use the following
std::string data[size]
std::string large_data[large_size]

The small array is still 'floating about' so to speak.

If I use
std::string* data[size]
std::string* large_data[large_size]

The destructor(s) will have to call delete on each entry in the array - I think.

So this is my predicament - what is the best way to go forward.

dhayden (5795)

The problem with using larger_data is that you can only resize once. What if you have to resize twice? Or ten times?

I would do:
vector<string*> data;

When that gets full, you resize:
- make a copy of data into a local variable.
- set all the pointers in data to nullptr, then resize data to something bigger
- rehash the items using the larger size and insert them.

Your hash table destructor will have to delete the pointers in data.

Pages: 12

C++

Forum

hash table function