What's an efficient algorithm for searching for a C string in an unsorted array? I could just walk the array from start to finish but that would take O(n) time. I have a feeling that might be the best (or only) method, though. If not, what's a more efficient solution?
I can't use binary search because, like I said, the array is unsorted, and I can't sort it because I have two arrays of different objects where the index of an object in the first array has to be the same as the index of a corresponding object in the second array.
What kind of unsorted array? Probably just walking in the entire array, using your own compare function (and maybe your own strlen to make sure it's As Fast As Possible (A FAP?)) and also passing your strlen to your compare function to make sure your strlen function will only be called once thus hevitating more strlen calls, is the faster way, or, at least, the way i'd chose.
Like:
@EssGeEich,
By 'unsorted' I mean the elements aren't necessarily in any order (they could, coincidentally, appear in some order, but the code will never know that). As for comparing, I do it like this right now:
1 2 3 4 5 6 7 8 9
staticlong __get_index_by_key(struct dict* dict, constchar* key)
{
size_t i;
for (i = 0; i < dict->count; ++i) {
if (strcmp(dict->keys[i], key) == 0)
return (long)i;
}
return -1;
}
I'm just wondering if there's a more efficient way.
Like mine? Remember strcmp automatically calls two strlen each call, where mine calls ONLY two strlen at the beginning, and it still makes sure you don't overflow.
struct dict
{
char** keys;
std::size_t* key_lengths ;
void** values;
std::size_t count;
std::size_t capacity;
std::ptrdiff_t get_index_by_key( constchar* key ) const
{
const std::size_t len = std::strlen(key) ;
for( std::size_t i=0 ; i < count ; ++i )
if( ( key_lengths[i] == len ) && ( !std::memcmp( keys[i], key, len ) ) )
return i ;
return -1 ;
}
void add_key_value_pair( constchar* key, void* value )
{
if( get_index_by_key(key) == -1 )
{
const std::size_t key_length = std::strlen(key) ;
// resize buffers if required
// add key to back of keys array
// add key_length to back of key_lengths array
}
}
};
I would expect a reasonable library implementation to provide the fastest possible std::memcmp().
For instance, on an x86, with a single instruction like repz cmpsb %es:(%edi),%ds:(%esi)
If the number of keys are large, and certain keys are searched for far more often than others, the amortized time to find a key can be reduced by bubbling the more often searched for keys to the front of the sequence.
You can improve the compare time by comparing a machine word's worth at a time rather than one byte at time.
Good idea. I'll write my own function for comparing the strings. [edit] How will I know when I reach the end of the string? "abc\0" is exactly 4 bytes, so on x86 it'll fit into a single word, but how do I check for a NULL terminator?
@JLBorges,
Is memcmp faster than strcmp? I can't use C++ STL functions as in your previous post because I'm writing in C.
How will I know when I reach the end of the string?
That's a good question. I haven't thought of a satisfactory answer, but as EssGeEich points out, things get a whole lot easier if you hold the string length.
memcpy() has a fixed length, so a repeat instruction can be used to do the copy, whereas strcpy() must test each byte for null.
> I can't use C++ STL functions as in your previous post because I'm writing in C.
Why don't you just write a simple hash-table in C?
> How will I know when I reach the end of the string?
> "abc\0" is exactly 4 bytes, so on x86 it'll fit into a single word,
> but how do I check for a NULL terminator?
You also need to get to the first byte that meets the alignment requirement for a word, before you can start comparing word by word. Unless the strings involved are long, or the code is non-portably written (for a platform which has no alignment requirements for words), the whole exercise is probably not worth it.
@chrisname : Just informing, I'm sure memcmp/memcpy/memset are made in asm in most implementations. Don't know about strcpy/strcmp, but sure memcmp/cpy/set are the fastest way to compare/copy/set memory.