Logarithms

Hi guys,

so I'm studying time complexities of algorithms, and I'm having some confusion with logarithms.

I understand what a logarithm is: let's say 2^3 = 8, so log2(8) asks for the x that makes 2^x = 8, which is 3, so log2(8) = 3,

or another way to think of it: we have 8, how many times can we divide 8 in half until we reach 1? 8 -> 4 -> 2 -> 1 = 3 times
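in code, that halving idea might look something like this (just a rough sketch of the counting, not anything standard):

int halvings(int n) {      // how many times can n be halved before reaching 1?
    int count = 0;
    while (n > 1) {
        n /= 2;
        ++count;
    }
    return count;          // halvings(8) == 3, matching lg(8)
}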

but using logarithms with algorithms kind of confuses me.

we have a binary tree with 7 items

    X
   / \
  X   X
 /\  / \
X  X X  X


so log(7) gives us 2, but this is wrong as we need log(n+1), which would be log(8), so why do we use the number of nodes + 1 to get the number of levels?

secondly why isn't it log(4)? for example if we had a function to find out how many levels of work a recursive algorithm would do on an array, let's say we have an array [1,2,3,4,5,6,7,8]

we would do log(8) to find the levels of work, right? why is it different with the binary tree? why do we use the number of nodes + 1 there, but with the array algorithm we use the number of elements?

let me try to draw an example, it may explain it a little better

[1,2,3,4,5,6,7,8]
    /         \
[1,2,3,4]   [5,6,7,8]
  /    \      /    \
[1,2] [3,4] [5,6] [7,8]
 / \   / \    / \   / \
[1][2][3][4] [5][6][7][8]


if we count an element as each [] then we have 15 []: 1 in the first level, 2 in the second, 4 in the third, 8 in the last, so log(15+1) = log(16) = 4

this gives us 4 levels of work, but if we do log(8) we get 3 levels, so which one is correct? and how come we use log(n+1) in a binary tree and log(num of elements in array) in finding levels of work? and lastly why log(n+1) but not log(num of elements in array+1)?

* log base 2 is implied in this question

thanks

There's a lot of confusion here.

so log(7) gives us 2, but this is wrong as we need log(n+1), which would be log(8), so why do we use the number of nodes + 1 to get the number of levels?
The point of complexity analysis is to give a general trend for the amount of work needed as the size of the input increases. Most n-ary tree algorithms exhibit logarithmic complexity, which means that the amount of work grows proportionally to log_base(node_count), where the base is any real greater than 1.
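(The choice of base only changes the result by a constant factor, since log_b(n) = lg(n) / lg(b), which is why big-O notation doesn't bother specifying a base.)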

if we count an element as each [] then we have 15 []: 1 in the first level, 2 in the second, 4 in the third, 8 in the last, so log(15+1) = log(16) = 4
This is completely meaningless to me.
It's not enough to say that you're doing a "recursive algorithm" on an array to know what the complexity is. Exactly what algorithm is being executed on the array? Binary search doesn't take the same amount of time as quick sort.

this gives us 4 levels of work, but if we do log(8) we get 3 levels, so which one is correct?
It depends on what the algorithm does. Still, in the long run whether the root of the tree structure is counted or not makes little difference. If you have a tree with a million levels, one more level doesn't change much. Complexity analysis measures the long-term behavior of the algorithm (more specifically, as the input grows to infinity).
a logarithm is an exponent, or more specifically, one that you had to solve for. log base 2 is written lg, log base e is ln, and plain log is log base 10; there may be some other shorthands. Be careful here: we know you mean log base 2 when you say log, but strictly log is base 10 and lg is base 2, and when speaking precisely you will confuse yourself and others if you mix them up.

so log(7) gives us 2, but this is wrong as we need log(n+1), which would be log(8), so why do we use the number of nodes + 1 to get the number of levels?

because zero :) it's an exponent. 2 to the 0 is one, but you don't have zero levels, you have 1 level...

secondly why isn't it log(4)?
The same reason it isn't log(23). What is the 'it' you are trying to resolve? It's math... you want to know how many levels are in a tree, it's the log of the number of nodes, if the tree is balanced. And this is because each level has twice the number of nodes as the previous... it's just math and exponents. the first level has 1 node, then 2, 4, 8 ... that's 2 to the 0, 1, 2, 3... that relationship is the 'why'.
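to spell out where the +1 comes from: a full binary tree with L levels has 1 + 2 + 4 + ... + 2^(L-1) = 2^L - 1 nodes in total. so if n = 2^L - 1, then n + 1 = 2^L and L = lg(n + 1). for your 7-node tree that is lg(7 + 1) = lg(8) = 3 levels, which matches the drawing.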

we would do log(8) to find the levels of work, right? why is it different with the binary tree? why do we use the number of nodes + 1 there, but with the array algorithm we use the number of elements?

are you asking about big-O type notations? Let's take a binary search, which is what a tree really represents: a data structure that has binary search innately encapsulated in how it stores the data. you have 16 things. you want to find if x is in there. it's sorted. you cut it in half, and know that the item, if it exists, is in one of the 2 halves. now you have 8 things, then 4, then 2, then 1, and it's there or not. Look familiar? So that is why we say binary search is lg(n) where n is the number of items. remember that big-O is not precise, it's a generalization, so sometimes the exact count is lg(n)+2 and we just say lg(n), etc. You wouldn't say +1 in that notation because the notation says to ignore constant offsets and such.
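a minimal sketch of that halving on a sorted array (illustrative only, assuming a plain int array):

// is x in the sorted array arr[0..n-1]? each pass throws away half the range
bool contains(const int* arr, int n, int x) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        if (arr[mid] == x) return true;
        if (arr[mid] < x)  lo = mid + 1;   // drop the left half
        else               hi = mid - 1;   // drop the right half
    }
    return false;                          // about lg(n) iterations in the worst case
}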

this gives us 4 levels of work, but if we do log(8) we get 3 levels, so which one is correct?
you are overthinking it. humans count from 1, not zero. the depth of a tree, if counted from 1, needs that +1 to convert from zero offset to 1 offset, is all. no matter how you *say* it, there are 4 levels in that tree. {0,1,2,3} is 4 things, right? If you do a loop for each of those, you looped 4 times, not 3 times, just as for(x = 0; x < 4; x++) loops 4 times.

Does this help? If not, can you try asking it another way? At the end of the day, there are 2 concepts to take home.
a log is an exponent. nothing more.
and there is a relationship between how many times you do work and how many items you have. When that is a logarithmic relationship, the work done increases slowly as the number of items is multiplied. so to go from 3 to 4 units of work in a problem that is log base 2, you need to go from 8 items to 16 items, doubled, to get there. This is incredibly efficient: you can do 1 million items in only about 20 attempts.
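(the 20 comes from 2^20 = 1,048,576, so lg(1,000,000) is about 19.93, i.e. roughly 20 halvings.)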

Also, lg(7) is not 2. It's 2.807.... So perhaps the answer you want is ceil(lg(7)) = 3.
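In code that would be something like this (assuming <cmath>'s log2, available since C++11):

#include <cmath>
#include <iostream>
using namespace std;

int main() {
    cout << ceil(log2(7.0)) << endl;   // prints 3
    return 0;
}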
because zero :) it's an exponent. 2 to the 0 is one, but you don't have zero levels, you have 1 level...


ahh yes, that does make sense, so it should be 1 + log(n) to account for the first (0th) level?

I really think I am overly complicating something that doesn't even need to be complicated lol.

This is completely meaningless to me.
It's not enough to say that you're doing a "recursive algorithm" on an array to know what the complexity is. Exactly what algorithm is being executed on the array? Binary search doesn't take the same amount of time as quick sort.

very true, let's say just a divide and conquer algorithm that will print each element of the array.

Also, lg(7) is not 2. It's 2.807.... So perhaps the answer you want is ceil(lg(7)) = 3.

yes



my example was probably bad. what I'm trying to say is, with the binary tree (assuming the tree has 15 nodes), we could do lg(16) and we would get 4, or we could do lg(8) + 1 (the +1 counting for the first level) and we would also get 4, so are both ways acceptable?

the reason I ask this is because, going back to the array of elements, we know how many elements are in the array but it isn't as obvious how many [] there are (I'm guessing each [] is a function call). So we wouldn't know how many individual function calls there are, but since there are 15, and let's say we know this, we could do log(15+1) and we would still be right?

[1,2,3,4,5,6,7,8]
    /         \
[1,2,3,4]   [5,6,7,8]
  /    \      /    \
[1,2] [3,4] [5,6] [7,8]
 / \   / \    / \   / \
[1][2][3][4] [5][6][7][8]

code for the above diagram:

#include <iostream>
using namespace std;

// keep splitting the range in half; when the range is a single element
// (the base case), print it
void print(int* arr, int start, int end){

    if(start == end){

        cout << arr[start] << endl;
        return;
    }
    int halfway = (start + end) / 2;

    print(arr, start, halfway);      // left half
    print(arr, halfway + 1, end);    // right half
}
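
and a quick driver to show the call (just an example array):

int main() {
    int arr[] = {1, 2, 3, 4, 5, 6, 7, 8};
    print(arr, 0, 7);   // prints 1 through 8, one per line
    return 0;
}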

thanks
very true, let's say just a divide and conquer algorithm that will print each element of the array.
That would take O(n), not O(log n).
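The recursion tree does have lg(n) + 1 levels, but the cost is the total number of calls across all levels, not the level count: with n = 8 that is 1 + 2 + 4 + 8 = 15 calls and 8 prints (2n - 1 calls in general), which is O(n).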
my example was probably bad. what I'm trying to say is, with the binary tree (assuming the tree has 15 nodes), we could do lg(16) and we would get 4, or we could do lg(8) + 1 (the +1 counting for the first level) and we would also get 4, so are both ways acceptable?

again, it's math. The numbers are not pulled out of your backside, they have to represent something. if you have 16 things and a log base 2 algorithm, it's lg(16). If you have 8 items, it's lg(8). If it's the height of a tree, it's lg(number of items in the tree). If it has 15 nodes, and it's the height of the tree or a binary search, you need 4 levels (1+2+4+8 is 15). There is no 8 in a 15-item tree when looking for its height; you can't just make up a number even if it happens to give the right answer. For every tree height, there is a range of item counts it could be. take the next (5th) level... it starts at 16 items and continues to 31. The 32nd item starts level 6, so that whole range gives the same answer, but if the tree has 16 items you can't just pick 29.873 and take the log of it, even though it gives the 'right' answer. I mean, if we can just pick numbers, lg(2*3+7) gives the right answer too for 15, but then you have to explain what 2, 3, and 7 are... and there's no answer to that.
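if you want to see those ranges in code, the level count for n nodes in a balanced tree is just how many times you can halve n before hitting zero (a quick sketch, not a library function):

int levels(int node_count) {            // floor(lg(n)) + 1 for a balanced tree
    int count = 0;
    for (int n = node_count; n > 0; n /= 2)
        ++count;
    return count;                       // levels(15) == 4, levels(16) through levels(31) == 5
}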
very true, that makes sense but

going back to the divide and conquer algorithm that keeps splitting the array until it reaches and returns one item (the base case), we use log(8) and we get 3 levels, but for a binary search tree we use log(number of nodes); that kind of confuses me, must be something simple I'm overthinking.
The answer is in understanding what that 8 represents. If you know what the number means, then when you look at a specific algorithm it should make sense what is going on.

I can tell you why.
Why 1 is that you can't see the forest for the trees. The question in algorithm analysis is simply 'how much work does it do in the best, average, and worst cases?' This is usually fairly easy to answer; there are 5 or 6 common patterns that 99% of code falls into ... O(1), O(n), O(n*n) or higher exponents, O(log(n)), O(n*log(n)), and the intractable stuff like N! or 2^N. A few weirdo and contrived problems for exams are different or tougher, e.g. shellsort's n^((x+1)/x), but only a grad student would be asked to derive a shellsort big-O.

Why 2 is that a tree stores the data in a way that is already cut in half as you traverse it. Every time you follow a pointer you drop half the data, in a balanced binary search tree. An array does not do that for you; you have to split it up yourself, so the algorithms look very different. You can use the total number of nodes and a log for both, but if you know the tree's height it's a shortcut to that value. In truth, both should use the number of items, but you can be sloppy with that.
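a sketch of that pointer chasing, assuming a bare-bones node struct (names made up for illustration):

struct Node {
    int value;
    Node* left;
    Node* right;
};

// each pointer followed discards half of the remaining nodes (if the tree is balanced)
bool treeContains(const Node* root, int x) {
    while (root != nullptr) {
        if (x == root->value) return true;
        root = (x < root->value) ? root->left : root->right;
    }
    return false;
}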

... what is the work being done, and what values affect it? Usually, that is just the size of the input. Usually analysis is just 'as the input increases, what happens'. Here that is the number of items in the tree/array/whatever.
that makes perfect sense, so in reality they are two different situations: if we want to keep dividing an array until we get down to one element each, we just need to know the number of elements, so an array with 8 elements will be

8 -> 4 -> 2 -> 1

so this is just telling us how many times we can divide it up, which in turn also happens to tell us how many levels of work there are?

so it was split 3 times,

whereas with a binary tree we just want to know what the worst case path or depth of the tree is, and this in turn will give us the number of levels

?

Thanks Jonnin
correct. The only thing to keep in mind is that the properties of a balanced binary search tree are kind of a special snowflake and do not all carry over to other types of trees. An array is an array, but trees come in several flavors to solve different problems.
very true thanks Jonnin :)