Since the questions about BST's are frequent around here, I made a small exercise of balancing a tree by hand in a piece of paper. I hadn't done BST's myself before nor have I had searched much about them either, not to mention the many BST variations out there.
So the question is quite straight: Which is the best balancing algorithm?
Definitions:
Best: Fastest and least resources consumed.
Second Best: Fastest but heavy on resources.
Third Best: A trade-off between speed and resources.
You should read about AVL, red black and splay tree's. These are one of most fundamental self balancing trees. AVL is the oldest and can be faster than red black tree's. Red black has better worst case times than AVL and insertion and retrieval is better. A good discussion can be found in "Algorithms by Robert Sedgewick" (the inventor of the tree). I have not read much about splays but what i know is, they keep the most frequently touched nodes at the head/root. So in that sense they seem to be fastest but you have to see.
For these to understand, reading about 2-3-4 top down tree would help.
Thanks for dropping by. After I asked the question I sought a bit about BST's and indeed found the AVL tree. I must say, however, that I don't want to read about other trees right now before I fully grasp the operation of a good ol' BST (or self-balanced BST). I am guessing these two are the absolute basic ones, am I correct?
About the AVL, I do see an algorithm when an item is inserted in order to maintain the tree balanced. But I ask this: Given an unbalanced tree (unbalanced to any extent, even orders of magnitude apart), which balancing algorithm is the fastest? Not that I have read about many. Actually I think just two so far. Not sure if there are more. Since this is a topic that has been a "black box" to me up until now, I am unsure if I am searching correctly, and thought it would be best to ask here for better guidance. Plus, I want to know if the one I devised in paper is out there and how it is called.
BST is not a balancing tree. for example if you give a sorted list to BST it will expand in one direction only.
For the balancing tree's, these tree are kept balanced on each insertion. For a totally unbalanced tree to make it balanced, I don't know which one will do better. These tree's are quite complex and working on them only will give good idea on advantages and disadvantages.
I suggest you start with AVL and then go to red black.
So the question is quite straight: Which is the best balancing algorithm?
There is no simple answer to this question. You can hear about many such algorithms because each has its own set of pros and cons. There are no obvious winner here.
Also, one thing you seem to be forgetting is that self-balancing BSTs are rarely "ideally" balanced, so quality of this balancing should be another criteria when analyzing a specific algorithm.
There is no simple answer which one is the best, but the most widely used are red-black trees. Almost every standard library has them implemented. I have yet to see AVLs in production.
yeah, red black trees are implemented in lot of areas, the good example which i remember is the latest scheduler in Linux kernel 2.6 called the "Completely Fair Scheduler" is implemented using the red black trees.
Thanks for all the replies. For you guys saying "the best is not obvious", "the answer is not simple" and other variants: Please see my original post on the definition of Best, Second Best, and Third Best. Initially, I'm always interested in speed. Does that help you deciding on the best? I sure hope so.
I also understand that a BST is not a balancing tree. But I can make it balance itself every X number of operations (insertions/deletions). I just want to play with the fact that an AVL is just a BST with relentless balancing. I want to have a soft blend of both for my varied taste. :-) Besides, if I were to have a constructor that initializes a tree from an array/vector, that data may be ordered, and construction would require initial balancing of the tree, which I presume is faster than ordering after each insertion.
Sorry, but your definitions are useless. According to your requirements, the best balancing algorithm is no balancing at all, as it is obviously the fastest (you didn't say anything about the efficiency of searches). You must understand that the relative speed depends on how you use the tree.
As for red black trees, it's true that they are very popular, but I believe it has a lot to do with, well, lets call it "tradition" (I'm not saying they are bad, just that they are many other algorithms of similar quality).
In algorithms there is nothing like best of worst.. each algo is good for certain scenario.. and can perform poorly in others.. insertion sort can beat quick sort in many scenario's but quick sort is generally said to be faster than insertion sort..
Similarly, each tree is good for certain work. where a B+ tree work , a red black tree will not perform. Algorithms are best utilized when studied deeply, implemented. So you have study these data structures if you want to take advantage of these for some particular task. Blindly using any data structure may give you bad results even if that one is generally said to be best.
@Abramus: The definitions are not useless. They clearly state the requirement. And no balancing at all doesn't fit the definitions + the question. No balancing cannot be a possible answer to "what is the fastest BST balancing algorithm?". Does no balancing balance the tree in some way balaces it so that it fits the question?? And I did not say anything about the efficiency of searches, true. But that doesn't matter either. I am not asking for the fastest search algorithm. I am asking for the fastest balancing one. Why I want it balanced, well, that's up to me. Its ultimate purpose is irrelevant because a balanced tree fulfills this all the time: All subtrees have a difference of at most 1 between the levels of the left and right branches.
Now, lets talk about relative speed. What is relative speed? Relative to what? And how does one use a tree that could incur in differences pertinent to tree balancing? I mean, the tree has a very specific set of rules that must be followed. If you refer to the way data is filled in or similar, that is irrelevant: Given an unbalanced tree: What is the fastest way to balance it? Is that simple a question. I would understand if there is no simple answer to the simple question, but I think overall there should be at least one algorithm that everyone tend to prefer in terms of speed, right?
And I see people mentioning red black and others even when I specifically stated I want nothing to do with those right now. So I am starting to suspect something: Is the choice of algorithm dependent on the type of tree and that's why others keep mentioning them? If so, then let's just concentrate in simple BST's or AVL trees, which are pretty much the same, only that one balances the tree after each insertion/deletion. Other tree types may be much more worth the effort, but I need to lay down my basis in this topic before filling my head with other things.
OK, I think some things are not clear for you. First, there are:
1) balancing algorithms that transform completely unbalanced trees into balanced ones,
2) balancing algorithms that update balanced trees after a single insertion/deletion (these ones are used by self-balancing trees, like red-black, AVL, etc.).
Which ones we are talking about?
Second, "perfect" balance (when max and min height differs at most by one) is rarely achieved, simply because it is too costly in most cases. For example, red-black trees only ensure that max height is at most twice as large as min height. This is why I used an example of no balancing at all – "balanced" is a value in range 0-100% for me, and not a true/false flag. You should specify a requirement that would define how well the tree is supposed to be balanced.
When we’re at it then define what do you mean by "the fastest".
As for the "relative" speed, I simply meant efficiency of a given algorithm compared to other algorithms. This proportion can change depending on many factors.
I see. Yes, I figured there was a misunderstanding here because of people continuing to mention other tree types. Good thing I called it.
We are talking #1: Balancing algorithms that fully balance an unbalanced tree. The difference in levels in the unbalanced tree can be from 2 to tens or hundreds or even thousands. Algorithms that work regardless of the differences in height of the subtrees.
I would like to read more about how difficult and rare is to achieve perfect balancing. Is this documented anywhere? I'll Google around but if you have a link already, it is appreciated.
Hello ne555. I ran a search about in-order traversal as a method to balance a tree, but I found nothing. :-( Plus, I have been thinking about it and I just don't see how traversing the tree in order would help me determine the amount of nodes to be moved, for example. Can you please elaborate?
You have a BST. If you perform an in-order traversal you will have a sorted list of elements.
A sorted array is a balanced BST, with the root in the mid element (and recursively)
I see. So the fastest you describe is to flatten the tree into an array (a vine), and then breaking the vine in two, and continue breaking the sub-vines until there are no more to break. Right?
Another method would be to recursively traverse the tree in post-order, rotating nodes when number of nodes in left and right subtree differ by more than one.
As for the "perfect" balance, this issue is mainly related to self-balancing trees, because such algorithms try to make tree changes as "local" as it is possible (only a tiny part of the tree is visited). If you must visit all nodes anyway, then achieving the ideal balance makes sense. Although even then we could probably find exceptions (like random trees).
std::vector<node*> tree; //pointers to avoid copy
void Tree::balance_tree(node *root){
if(not root) return;
balance_tree(root->left);
tree.push_back(root); //the vector has reserved enough space (no reallocation needed)
balance_tree(root->right);
}
That was what I was thinking, but it is O(n) in time and O(n) in memory.
I don't see the need to "reconstruct" the tree, until you insert/delete a node other than the last.