C++ - Speed in data structures

Feb 12, 2012 at 9:46pm
I'm quite new to C++ and I'd like to get feedback on different data structures. It's been a while since I was programming, but if I remember correctly the most common data structures are
Linked list
Binary tree
Map/HashTree
Each preforms differently depending on task. I'm trying to find the best structure to a project where there's going to be a big data set and execution time might be an issue. The reason why I'm going to this in C++ is the openmpi library. I'm more familiar with java.

A list of the different structures/performance. Let's say I'm only planing to delete and insert end/beginning or with a known traverse position.
1
2
3
4
          List     BinTree     HashTree/Map
Insert    O(1)     O(log n)    ???
Delete    O(1)     O(log n)    ???
Search    O(n)     O(log n)    ???


The idea is to search for strings that match in the data structure<string>. Then merge hits and add them to the structure and erase the two that got merged. Up til now I've been fooling around with list<string>. It seems ok, but I'm leaning towards making my own class. Please point out any other data structure in some standard C++ library I should know about. Or better yet, the überstructure I should go with. :-)
Feb 12, 2012 at 9:56pm
If I understand your idea correctly, you're eliminating the duplicates in your structure?
Why not use a structure that does not allow duplicates to start with, std::set<std::string>?
Feb 12, 2012 at 10:18pm
The strings won't be exact duplicates rather a hit is when the beginnings, or ends, matches. Like this a string asdffdsafasd an other string fsafsdfaaaaaafasdf becomes fsafsdfaaaaaaf|asdf|fdsafasd. Cheers!
Feb 13, 2012 at 3:30am
> The strings won't be exact duplicates rather a hit is when the beginnings, or ends, matches.

A generalized suffix tree in conjuction with a trie come to mind immediately. But if...

> I've been fooling around with list<string>. It seems ok

Why break something that works, unless you are anticipating a scalability problem? Simplicity and transparency are design goals; performance is only a design constraint.
Feb 13, 2012 at 8:13am
About the suffix tree structure, I've never used it but just by reading on wikipedia it's seems bitching good. You don't happen to have any particularly good links or likewise on C++ and suffix tree?

Trie come..? Care to elaborate further!

Thanks for the input.
Feb 13, 2012 at 8:33am
> Trie come..? Care to elaborate further!

Should have been ... trie comes to mind ...

A trie is a prefix tree http://en.wikipedia.org/wiki/Trie

A patricia trie might be what is appropriate for this case http://en.wikipedia.org/wiki/Patricia_trie


> You don't happen to have any particularly good links or likewise on C++ and suffix tree?

Any good book on algorithms + data structures would cover these.

GNU has C++ implementations of some of these. http://gcc.gnu.org/onlinedocs/libstdc++/ext/pb_ds/
Last edited on Feb 13, 2012 at 8:39am
Topic archived. No new replies allowed.