Searching for a C string

Pages: 12
What's an efficient algorithm for searching for a C string in an unsorted array? I could just walk the array from start to finish but that would take O(n) time. I have a feeling that might be the best (or only) method, though. If not, what's a more efficient solution?

I can't use binary search because, like I said, the array is unsorted, and I can't sort it because I have two arrays of different objects where the index of an object in the first array has to be the same as the index of a corresponding object in the second array.
If you can't sort it and it's unsorted, you're stuck. :P

As for not being able to sort it, couldn't you make a single arrray of structures that contain both of the objects or something?
What kind of unsorted array? Probably just walking in the entire array, using your own compare function (and maybe your own strlen to make sure it's As Fast As Possible (A FAP?)) and also passing your strlen to your compare function to make sure your strlen function will only be called once thus hevitating more strlen calls, is the faster way, or, at least, the way i'd chose.
Like:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
bool CompareN(const char * First, const char * Second, const size_t &Size)
{
    for(size_t i = 0; i < Size; ++i)
    {
        if(First[i] != Second[i])
            return 0;
    }
    return 1;
}
size_t CanFindInside(const char * Big, const char * Little)
{
    size_t biglen = strlen(Big);
    size_t lillen = strlen(Little);
    if(lillen > biglen)
        return 0;
    size_t maxbig = biglen - lillen;
    for(size_t i = 0; i < maxbig; ++i)
    {
        if(CompareN(Big+i,Little,lillen))
            return i+1; // Make sure result isn't 0 so can distinguish between fail and pass
    }
    return 0;
}


EDIT: Again, what do you mean by unsorted? Lol, maybe i'm totally out of topic.

EDIT: Edited code above to return the valid index. Based on your post count you probably are clearly able to find out how to adapt it.
Last edited on
@firedraco,
I was originally going to have it as an array of a struct that looks like this:
1
2
3
4
struct dict_pair {
        char* key;
        void* value;
};

but then I thought "what if the user [probably just me] wants to iterate over the keys or values separately?", so I ended up with this:
1
2
3
4
5
6
struct dict {
	char**	keys;
	void**	values;
	size_t	count;
	size_t	capacity;
};


@EssGeEich,
By 'unsorted' I mean the elements aren't necessarily in any order (they could, coincidentally, appear in some order, but the code will never know that). As for comparing, I do it like this right now:
1
2
3
4
5
6
7
8
9
static long __get_index_by_key(struct dict* dict, const char* key)
{
	size_t i;
	for (i = 0; i < dict->count; ++i) {
		if (strcmp(dict->keys[i], key) == 0)
			return (long)i;
	}
	return -1;
}

I'm just wondering if there's a more efficient way.
Like mine? Remember strcmp automatically calls two strlen each call, where mine calls ONLY two strlen at the beginning, and it still makes sure you don't overflow.
strcmp() doen't need to call strlen() at all. It can compare one character at a time until it hits null.

You can improve the compare time by comparing a machine word's worth at a time rather than one byte at time.
What is the output of :



#include<stdio.h.>
main()

{
printf(5+"My name is Dwe") ;
}



Plzz help me
That won't compile.

stdio.h. ? Did you mean stdio.h ?

main()
This isn't C++ and it never has been.


Please don't stick your questions on the end of an existing thread. You should start a new thread.
To get sub-linear time for insertion and lookup, use either std::unordered_map<> or a std::map<> for this.

typedef std::unordered_map< std::string, void* > dict ;

If you must search for a C string in an unsorted array, you can never get sub-linear time. The linear search can be made faster by:

1. Pre-computing and storing the key lengths
2. Comparing keys only if the lengths match
3. Using std::memcmp() instead of std::strcmp()

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
struct dict
{
	char**	keys;
	std::size_t* key_lengths ;

	void**	values;
	std::size_t	count;
	std::size_t	capacity;

	std::ptrdiff_t get_index_by_key( const char* key ) const
	{
	    const std::size_t len = std::strlen(key) ;

	    for( std::size_t i=0 ; i < count ; ++i )
	        if( ( key_lengths[i] == len ) && ( !std::memcmp( keys[i], key, len ) ) )
	            return i ;

	    return -1 ;
	}

	void add_key_value_pair( const char* key, void* value )
	{
         if( get_index_by_key(key) == -1 )
         {
             const std::size_t key_length = std::strlen(key) ;
             // resize buffers if required
             // add key to back of keys array
             // add key_length to back of key_lengths array
         }
	}
};


I would expect a reasonable library implementation to provide the fastest possible std::memcmp().
For instance, on an x86, with a single instruction like repz cmpsb %es:(%edi),%ds:(%esi)
you are right but in c++ what will be the output


ok ! now what will be output of this



#include <iostream>

using namespace std;
int main ()
{
cout << 5+"My name is Dwe";
return 0;

}

Thanks for ur reply


You can start a new thread by pushing the button marked "New Topic". You are polluting this thread, which is about something completely different.
thanks . I am a beginner. :P
deendayal
Why are you hijacking someone else's thread with your own question? Why not start your own thread?

In answer to your question? Why don't you run the code instead of asking what the output will be?

If you want any help, I suggest you start a new thread in the beginners forum.
If the number of keys are large, and certain keys are searched for far more often than others, the amortized time to find a key can be reduced by bubbling the more often searched for keys to the front of the sequence.

Something like:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
std::ptrdiff_t get_index_by_key( const char* key ) const
{
    const std::size_t len = std::strlen(key) ;

    if( ( key_lengths[0] == len ) && ( !std::memcmp( keys[0], key, len ) ) )
          return 0 ;

    for( std::size_t i=1 ; i < count ; ++i )
        if( ( key_lengths[i] == len ) && ( !std::memcmp( keys[i], key, len ) ) )
        {
            std::swap( keys[i], keys[i-1] ) ;
            std::swap( key_lengths[i], key_lengths[i-1] ) ;
            return i-1 ;
        }

    return -1 ;
}

kbw wrote:
You can improve the compare time by comparing a machine word's worth at a time rather than one byte at time.

Good idea. I'll write my own function for comparing the strings. [edit] How will I know when I reach the end of the string? "abc\0" is exactly 4 bytes, so on x86 it'll fit into a single word, but how do I check for a NULL terminator?

@JLBorges,
Is memcmp faster than strcmp? I can't use C++ STL functions as in your previous post because I'm writing in C.
Last edited on
kbw wrote:
strcmp() doen't need to call strlen() at all. It can compare one character at a time until it hits null.

Can also be, but anyways comparing length before checking may make things faster, and anyways everything can be adapted.
How will I know when I reach the end of the string?
That's a good question. I haven't thought of a satisfactory answer, but as EssGeEich points out, things get a whole lot easier if you hold the string length.

memcpy() has a fixed length, so a repeat instruction can be used to do the copy, whereas strcpy() must test each byte for null.
> I can't use C++ STL functions as in your previous post because I'm writing in C.

Why don't you just write a simple hash-table in C?


> How will I know when I reach the end of the string?
> "abc\0" is exactly 4 bytes, so on x86 it'll fit into a single word,
> but how do I check for a NULL terminator?

You also need to get to the first byte that meets the alignment requirement for a word, before you can start comparing word by word. Unless the strings involved are long, or the code is non-portably written (for a platform which has no alignment requirements for words), the whole exercise is probably not worth it.

@kbw,
I guess I'll use memcmp then.

@JLBorges,
A hash table/dictionary is what I'm writing. It mostly works now, I'm ironing out some bugs with Valgrind and then it should be fine.
Last edited on
@chrisname : Just informing, I'm sure memcmp/memcpy/memset are made in asm in most implementations. Don't know about strcpy/strcmp, but sure memcmp/cpy/set are the fastest way to compare/copy/set memory.
Pages: 12