What I've done is about as simple as it gets -- what's left are some basics you are missing.
Functions
A function takes some number and produces another. It can be defined any way you like. How about a common function, like sine?
sine(0) → 0
sine(π/2) → 1
sine(π) → 0
etc.
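If you want to check those values yourself, here is a quick sketch using Python's standard math module:

    import math

    # math.sin takes an angle in radians and returns a value between -1 and 1
    print(math.sin(0))            # 0.0
    print(math.sin(math.pi / 2))  # 1.0
    print(math.sin(math.pi))      # ~0.0 (1.22e-16, floating-point rounding)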
When you plot that on a graph, you get the famous wavy line:
https://www.google.com/search?q=sine&s=&search=pi&tbm=isch&gws_rd=ssl
Here are some other very useful functions:
f(x) = x (linear)
f(x) = log x (logarithmic)
f(x) = x² (quadratic)
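To get a feel for how differently these grow, here is a small sketch (Python; the base-2 log is my choice, the usual one in algorithm analysis):

    import math

    # evaluate each function at a few sizes to compare growth
    for x in (10, 100, 1000, 10000):
        print(f"x = {x:>5}   log x = {math.log2(x):6.2f}   x² = {x * x:>9}")

Notice how quickly the quadratic pulls away from the other two -- that is exactly what matters for performance.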
Using a function g(n) to measure performance
The way that a function performs can be graphed. However, the exact performance depends on a lot of things, and it is not practical to be that specific. And, typically, we are interested in performance based on how many things the function is operating on.
For example, a sorting algorithm like bubble sort is pretty quick with only ten or so things to sort. But once you get to 500, or 10000, it becomes too slow to be useful. When we compare sorting algorithms, we are interested in how each one performs based on how many items it has to sort. That is, our magic function g(n), which describes how bubble sort behaves with n items, tells us how much time/memory/whatever-you-are-measuring it takes to run bubble sort on n items.
Bubble sort is a quadratic (n²) function in the worst case. Given 10 items, it will take on the order of 100 'operations' to sort. Given 500 items, on the order of 250000 'operations'. The cost grows as n² grows.
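To make that concrete, here is a minimal bubble sort sketch in Python with a comparison counter bolted on (the counter is my addition, purely for illustration):

    def bubble_sort(items):
        """Sort the list in place; return the number of comparisons made."""
        comparisons = 0
        n = len(items)
        for i in range(n - 1):
            for j in range(n - 1 - i):
                comparisons += 1
                if items[j] > items[j + 1]:
                    items[j], items[j + 1] = items[j + 1], items[j]
        return comparisons

    # reverse-sorted input is the worst case
    for n in (10, 100, 500):
        print(n, bubble_sort(list(range(n, 0, -1))))  # 45, 4950, 124750

The counts are roughly n²/2; Big-O ignores the constant ½ and just calls it n².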
O/Θ/Ω Notation
Now, a well-written bubble sort performs best if the array is already sorted (because it knows it can quit after the first pass). It performs worst if the array is the exact opposite: sorted in reverse order. So for some inputs the bubble sort performs better (faster/less memory/etc.) and for some inputs it performs worse -- even if n has not changed.
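Here is that early-exit variant as a sketch (again Python, my own names): a 'swapped' flag lets it stop after one pass over already-sorted input, while reverse-sorted input still forces the full quadratic work.

    def bubble_sort_early_exit(items):
        """Bubble sort that quits as soon as a pass makes no swaps."""
        comparisons = 0
        n = len(items)
        for i in range(n - 1):
            swapped = False
            for j in range(n - 1 - i):
                comparisons += 1
                if items[j] > items[j + 1]:
                    items[j], items[j + 1] = items[j + 1], items[j]
                    swapped = True
            if not swapped:  # a clean pass means the list is sorted
                break
        return comparisons

    print(bubble_sort_early_exit(list(range(100))))         # 99   (best case: one pass)
    print(bubble_sort_early_exit(list(range(100, 0, -1))))  # 4950 (worst case: ~n²/2)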
What this means is that g(n) can vary pretty widely for a single n. In fact, the g(n) function can be so complicated that it isn't worth writing it out. (Or even understanding it!) Part of the problem is that g(n) may be different depending on things the programmer cannot easily control: hardware, compiler, OS, number of neutrinos passing through the chip, etc. So it is not useful to talk about g() directly.
But we can tell things about the best and the worst behaviors of g(). That's where the O(n) comes in: O(n) tells us that wherever the fuzzy value we get from g(n) lands, it always stays under the O(n) line on the graph (give or take a constant factor).
O(n)
│ /
│ / ░░░░ g(n)
│ / ░░░
│ / ░░░
│ / ░░
│/ ░
┼───────────────── n
(Sorry I don't have better pictures.)
The Big-O says that the function given in parentheses [O(n) → f(n) = n → linear] is an upper bound: up to a constant factor, g(n) never climbs above it, and we usually pick it to sit close to the largest values g(n) actually reaches.
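In case you want the precise version, the standard textbook definition (my paraphrase, not specific to bubble sort) says g(n) is O(f(n)) if there are constants c > 0 and n₀ such that

    g(n) ≤ c·f(n)    for all n ≥ n₀

In words: past some point, g never climbs above a constant multiple of f.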
A Little-o means that g() doesn't just stay under f(n) = n, it grows strictly slower -- it ends up somewhere way below the line:
o(n)
│ /
│ / ░░░░ g(n)
│ / ░░░
│ / ░░░
│ / ░░
│/ ░░
┼───────────────── n
Theta (Θ) would put a line right through the middle of that fuzziness from g(n).
Omega (Ω) would put a line underneath all the fuzziness from g(n).
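For reference, all four bounds can be stated in the same c-and-n₀ style as the Big-O definition above (standard textbook versions, my phrasing):

    O(f): g(n) ≤ c·f(n) eventually, for SOME constant c > 0   (upper bound)
    o(f): g(n) ≤ c·f(n) eventually, for EVERY constant c > 0  (strictly below)
    Ω(f): g(n) ≥ c·f(n) eventually, for SOME constant c > 0   (lower bound)
    Θ(f): g(n) is both O(f(n)) and Ω(f(n))                    (through the middle)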
Hope this helps.