Hash table problem

Apr 30, 2009 at 12:35am
Hello all,

I am trying to solve this question and I've pasted my solution below. I want to know points to improve in my pasted code, so that it becomes more efficient.

Implement a function that takes in 2 arrays/list of integers, and return the common numbers.
e.g A: [2,3, 2, 4, 5] B: [9, 2, 2, 2, 3, 3]
Return in any order: [2, 2, 3]

- The following function takes in arrays A and B with sizes num1 and num2 resp.
- It also takes array C to store duplicate values, with size k (lesser of num1 and num2)


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
int insertAndCheck(int* A, int* B, int* C, int num1, int num2, int k)
{
    
    if(!A || !B)
    {
        cout << "Array A or B are null" << endl;
        return NULL;                                        //NULL indicates error message;
    }
 
    if(!C)
    {
        cout << "Array to store duplicate values is passed as NULL" << endl;
        return NULL;
    }
 
    if(num1 = 0 || num2 = 0)
    {
        cout << "Number of elements in A or B or both are zero. Nothing to check." << endl;
        return NULL;
    }
 
    for(int i = 0; i < num1; i++)
    {
        int x = HashTable1.getValue(A[i]);
        ++x;
        HashTable1.insert(A[i], x);
    }
 
    int size = k;
    k = 0;
    for(int j = 0; j < num2; j++)
    {
        int y = HashTable1.getValue(B[j]);
        if(y != 0)
        {
            --y;                                                //decrement the value at the index, each time the value is duplicated
            C[k++] = B[j];
            if(k > size)
            {
                cout << "Size of array C incorrect" << endl;
                return NULL;
             }
 
            HashTable1.insert(B[j], y);
        }
        else
        {
            cout << "Element did not exist previously and hence not duplicated" << endl;
        }
        
    }
 
    return (k-1);            //returns number of duplicated elements in array C 
}


Please let me know:
- If I could use vectors instead of arrays.
- Where could I use exceptions?
Apr 30, 2009 at 12:50am
- Yes.
- Either you make it more efficient or you use exceptions. You can't have both. Also, remember: For a language with so many features as C++ (and still more to come), only use the features you really need, not the ones you feel like using just because.
Last edited on Apr 30, 2009 at 12:51am
Apr 30, 2009 at 12:55am
- As far as exceptions, since this question is from an exam, I wanted to use exceptions just to show the programming knowledge, they specifically asked for it.

- For vectors, I know that they are basically arrays with sizes that can be modified. How can vectors help here, when the size of arrays is really constant?

Thanks.
Apr 30, 2009 at 1:00am
Oh, okay, then. You can replace each return NULL with an exception throw.

Vectors throw an exception if you try to access out of bounds elements, so it's impossible to overflow them.
Apr 30, 2009 at 1:10am
Thanks helios.

Can you write a line of code showing vector usage instead of arrays?
Apr 30, 2009 at 1:20am
What usage? Creation, filling, or accessing?
Apr 30, 2009 at 1:34am
Actually never mind, I did get it on wiki. Thanks.
Apr 30, 2009 at 4:13am
1
2
3
4
for(vector<int>::const_iterator it = A1->begin(); it!=A1->end(); ++it)
{
        cout << *it << endl;
}


In the above code, is there a way I can print the value of index also? Obviously I can do that using a variable and incrementing everytime the loop executes. But is there an inbuilt way in the vectors?

Thanks.
Apr 30, 2009 at 12:19pm
There isn't, but if you want a generic solution,

std::distance( A1->begin(), it );
(defined in <iterator> I believe)

will give you what you want.
Apr 30, 2009 at 1:04pm
Thanks jsmith.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
int insertAndCheck(vector<int>* A1, vector<int>* B1, vector<int>* C1)
{	
	if(!A1 || !B1)
    {
        cout << "Vectors A or B are null" << endl;
        throw nullInputVectors();                                        //NULL indicates error message;
    }
 
    if(!C1)
    {
        cout << "Vector to store duplicate values is passed as NULL" << endl;
        throw nullOutputVector();
	}
    
 //HashTable1 is initialized to 0. 
//insert values of vector A in HashTable1. If it is a duplicate, increment the value at that index
//to indicate duplicate values
    for(vector<int>::const_iterator it = A1->begin(); it!=A1->end(); ++it)
    {
        int x = HashTable1.getValue(*it);
        ++x;
        HashTable1.insert(*it, x);
    }
 //Search the HashTable1 for the values contained in Vector B, sequentially.
//If the value is found, decrement the value at the index of HashTable1, and store it in
//another Vector C - to register common values between A and B
//If the value is not found, do not insert it in Vector C. Move on to another value.

    int count = 0;
    for(vector<int>::const_iterator it = B1->begin(); it!=B1->end(); ++it)
    {
        int y = HashTable1.getValue(*it);
        if(y != 0)
        {
            --y;                                                //decrement the value at the index, each time the value is duplicated
            C1->insert(C1->begin()+count, *it);
 
            HashTable1.insert(*it, y);
        }
        else
        {
            cout << "Element did not exist previously and hence not duplicated" << endl;
        }
		++count;
    }
}


1) I have used vectors, iterators and exceptions to modify my previous program. Is this written correctly? A recap: Above program collects common values between vectors A and B and stores them in C in O(n + m) time.

2) Are there container classes for hash tables available?

Thanks.
Last edited on Apr 30, 2009 at 2:40pm
Apr 30, 2009 at 1:44pm
If I understand the original problem, all you need is a function that takes two "sets" of integers and returns the union of those "sets" without duplicates, correct?

If so, you can achieve this in far less code if you use std::set, since by definition a set is a collection of unique elements (inserts of duplicates fail). set will be more efficient than hash tables since your keys are integers. (It is faster to compare X to Y than to compare the hashed value of X to the hashed value of Y).

Setting that aside, you would be better off passing the three parameters by reference rather than by pointer, as then you can delete lines 3-14 and not worry about catching exceptions.

Apr 30, 2009 at 2:41pm
Actually this program returns the common values (non-unique). Basically the intersection of the sets. Is there still an easier way?

I've updated my new program with comments, to understand it.

Thanks for second comment.
Apr 30, 2009 at 7:00pm
The simplest way, from a code perspective, is to order the elements in the vectors (sort them) and then use the set_intersection algorithm. This isn't the most efficient algorithm by far, but in terms of code it is the shortest.

1
2
3
4
5
6
7
8
9
10
11
12
#include <algorithm>
#include <iterator>
#include <vector>

vector<int> intersect( vector<int>& v1, vector<int>& v2 ) {
    std::sort( v1.begin(), v1.end() );
    std::sort( v2.begin(), v2.end() );
    vector<int> result;
    std::set_intersection( v1.begin(), v1.end(), v2.begin(), v2.end(),
         std::back_inserter( result ) );
    return result;
}


NOTE: This is off the top of my head. Not compiled.

The drawbacks to this are:
1) Modifies v1 and v2;
2) Requires two sorts (slow for large containers)

May 1, 2009 at 10:47pm
Good to know that. Like you mentioned, the runtime of the program you suggested would be exceed O(nlogn) (sorting will require nlogn time & Intersection will be time n).

Runtime of the program I wrote is O(n) - using hashtable. I am looking for most efficient and most easiest way for the problem. Your solution is definitely easiest but it would rank lower in efficiency. Thanks.
Topic archived. No new replies allowed.