I have searched online for when the worst case of quicksort happens, but there are many different answers, like:

1. When the pivot is picked randomly.

2. When the pivot is picked at the middle of the array.

3. When the array is sorted and the pivot is the first element (ascending order) or last element (descending order).

4. When the pivot is the largest or smallest number in the array (doesn't matter if the array is sorted).

I'm very confused and don't know when the worst case actually happens. Can you tell me which one is correct? Thank you

The worst case of quicksort is O(N^2)

The best case is O(N*logN)

BUT which one you get depends upon the chosen pivot for the data to be sorted.

If you don't know anything about the data, then you'd typically pick the pivot so that the two sub-arrays come out roughly the same size.

From what source are you learning about quicksort? That source should detail the issues around the pivot point.

Are you writing your own quicksort or using a 3rd-party one? std::sort() uses a mixture of sorts. By default it uses QuickSort, but if QuickSort is partitioning unfairly and the recursion gets deep enough to threaten more than N*logN time, it switches to HeapSort. When a sub-array becomes very small, it switches to InsertionSort.
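For illustration only — this is NOT the actual std::sort implementation, just a sketch of that hybrid idea: quicksort with a recursion depth limit, a heapsort fallback when partitioning goes badly, and insertion sort for tiny ranges.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Illustrative introsort-style hybrid (NOT the real std::sort internals):
// quicksort until the recursion gets too deep, then fall back to heapsort,
// and finish tiny ranges with insertion sort.
void introsort_sketch(std::vector<int>& v, int lo, int hi, int depth_limit) {
    const int SMALL = 16;                     // threshold for insertion sort
    if (hi - lo <= SMALL) {
        // insertion sort: cheap for nearly-sorted, tiny ranges
        for (int i = lo + 1; i <= hi; ++i) {
            int key = v[i], j = i - 1;
            while (j >= lo && v[j] > key) { v[j + 1] = v[j]; --j; }
            v[j + 1] = key;
        }
        return;
    }
    if (depth_limit == 0) {                   // partitioning went badly: heapsort
        std::make_heap(v.begin() + lo, v.begin() + hi + 1);
        std::sort_heap(v.begin() + lo, v.begin() + hi + 1);
        return;
    }
    int pivot = v[lo + (hi - lo) / 2];        // middle element as pivot
    int i = lo, j = hi;                       // Hoare-style partition
    while (i <= j) {
        while (v[i] < pivot) ++i;
        while (v[j] > pivot) --j;
        if (i <= j) std::swap(v[i++], v[j--]);
    }
    introsort_sketch(v, lo, j, depth_limit - 1);
    introsort_sketch(v, i, hi, depth_limit - 1);
}

void introsort_sketch(std::vector<int>& v) {
    if (v.empty()) return;
    int limit = 2 * static_cast<int>(std::log2(v.size()) + 1);
    introsort_sketch(v, 0, static_cast<int>(v.size()) - 1, limit);
}
```

The depth limit (~2*log2(N)) is the key trick: an even split would never reach it, so hitting it is proof the pivots are doing badly, and heapsort's guaranteed N*logN takes over.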

How does quicksort work? What exactly needs to happen to get N^2? Once you can see that, it will make a LOT more sense. There may be an animation online that shows what happens!

I know how it works, and the time complexity for the best, average and worst cases. The question is in which scenario the worst case occurs, because I found many different results online (listed above) and I don't know which one is correct.

Simply put, **quicksort** works by splitting the current list into two sub-lists, *in each recursive step*. One of those two sub-lists will contain all the elements that are *greater than or equal to* the "pivot" element, and the other sub-list will contain all the elements that are *smaller* than the "pivot" element.

Now, in the "best" case, both sub-lists will be ~1/2 the length of the original list. This means that the list is split in half in each step, which amounts to **O(log2(n))** steps until we are done. In the "worst" case, though, one sub-list contains only a single element whereas the other sub-list contains all the rest! This means that the length of the list is reduced by only *one* element in each step, which amounts to **O(n)** steps until we are done!

Note: The above only considers the "depth" of the recursion. *Within* each recursive step, we also need an "inner" loop with a complexity of **O(n)** in order to actually split the lists. So, overall, **quicksort** ends up with a complexity of either **O(n log2(n))** in the "best" case, or **O(n^2)** in the "worst" case.
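To see both behaviors concretely, here is a minimal toy quicksort (Lomuto partition with the *first* element as pivot — my own sketch, not any library's code) that counts element comparisons:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Toy quicksort using the FIRST element as pivot (Lomuto partition),
// counting element comparisons so the O(n log n) vs O(n^2) gap is visible.
long quicksort_count(std::vector<int>& v, int lo, int hi) {
    if (lo >= hi) return 0;
    int pivot = v[lo];
    int i = lo;
    long comparisons = 0;
    for (int j = lo + 1; j <= hi; ++j) {
        ++comparisons;                       // one comparison per inner-loop step
        if (v[j] < pivot) std::swap(v[++i], v[j]);
    }
    std::swap(v[lo], v[i]);                  // move pivot into its final position
    comparisons += quicksort_count(v, lo, i - 1);
    comparisons += quicksort_count(v, i + 1, hi);
    return comparisons;
}
```

On an already-sorted array, the first-element pivot peels off exactly one element per step, so the count is (n-1) + (n-2) + ... + 1 = n(n-1)/2 — for n = 100 that is 4950 comparisons, versus roughly n*log2(n) ≈ 664 for even splits.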

________

If you know *nothing* about the input data, then the choice of the "pivot" element is arbitrary. Any random "pivot" element is likely to work as well (or as badly) as any other "pivot" element. But if you *do* have information about the input data, you can pick the "pivot" element in such a way that the list will be split in half. A *heuristic* approach often used in practice is to pick the __median__ of the first, middle and last elements.

Specifically, if you *do* know that the input list is (almost) __sorted__ already, then you certainly want to pick the middle element as your "pivot" element, **not** the first or last element! It should be obvious why ;-)
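For example, the median-of-three heuristic mentioned above could be sketched like this (my own illustration, not any particular library's code):

```cpp
#include <cassert>
#include <vector>

// Median-of-three pivot heuristic: take the median of the first, middle and
// last elements. On an already-sorted range this yields the middle element,
// which is exactly the split-in-half pivot you want.
int median_of_three(const std::vector<int>& v, int lo, int hi) {
    int mid = lo + (hi - lo) / 2;
    int a = v[lo], b = v[mid], c = v[hi];
    if ((a <= b && b <= c) || (c <= b && b <= a)) return b;
    if ((b <= a && a <= c) || (c <= a && a <= b)) return a;
    return c;
}
```

On sorted input this returns the middle element and gives an even split; an adversary now has to craft far more unusual data to force the quadratic case than with a fixed first- or last-element pivot.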

I was trying to get him to say that, but ^^^ is exactly right.

What that means is that *any* "grab one element" pivot strategy CAN degenerate into N*N time. (Picking the true median every time costs extra, but will always do well.)

Case 1 is lottery-winner type bad luck, a one-in-a-billion type thing for a list of any real size (even with 100 elements you have to pick wrong ~100 times in a row, and those odds are pretty darn low), where the random selection stinks every single time.

Case 2 is usually pretty good as well. Here, it's more about how the data was generated than random bad luck. A hand-crafted set of data could trigger the worst case on purpose, but doing that by accident would be quite the feat, just as unlikely as with random pivots.

Case 3 is the definition of the worst case: because the array is sorted, picking the extreme ends *ensures* it. However, because the array is sorted, no swapping is done, just compares, so it will still do the extra work quickly: you may not realize it's being stupid unless your N is very large.

Case 4 is also the definition of the worst case, as described by Kigar above. Worse, it will swap items if the array is not sorted. You now have an extra-slow bubble sort.

So, basically, there is more than one scenario where the worst case can happen. As I said, any scheme that grabs an element as the pivot without looking at all the elements to make the choice can, and occasionally will, fall into worst-case or near-worst-case run times. For most of those, the odds of it happening are so low as to never happen in real life. In practical applications, the more concerning issue is picking the worst value in one of the first K iterations, where K is say 1 to 10. After that, the lists are so small even for pretty large N that picking wrong has minimal impact, but especially for iterations 1 and 2, picking the worst choice is a big hit. Do you see why?
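To put rough numbers on that "first few iterations matter most" point, here is a back-of-envelope helper (an assumption-laden sketch: it pretends every split other than the bad one is perfectly even):

```cpp
// Back-of-envelope: the extra comparisons wasted by ONE worst-case pivot pick
// at recursion depth d, assuming every other split is perfectly even.
// A bad pick at depth d scans a sub-list of about n / 2^d elements, so the
// damage shrinks geometrically with depth.
long bad_pick_cost(long n, int depth) {
    long size = n;
    for (int i = 0; i < depth; ++i) size /= 2;   // sub-list size at that depth
    return size;                                 // ~one full scan of it
}
```

A worst pick at the very top of a million-element sort wastes a scan of ~1,000,000 elements; the same mistake at depth 10 wastes only ~976, and by depth 20 essentially nothing, because the sub-list has shrunk to a handful of elements.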
