All You Need to Know About Sorting Algorithms
In this article, we'll be comparing famous sorting algorithms and in the end, you can find a code-wise implementation of the discussed algorithms.
Foreword
Sorting algorithms are some procedural methods that we use to rearrange the elements in an ordered list of items based on a criterion. A developer must study this subject extensively and regularly because sorting algorithms are:
Always present in any data-driven projects (practically speaking, any worthwhile subject)
A common source of performance degradation, if not implemented well
Very effective at optimizing the performance of a logical operation, if chosen and implemented well enough
Easy to reimplement once enough expertise is achieved
Infamous for being presented as hard interview questions
Overview
There are many known sorting algorithms but we'll be only covering some of the most well-known and practical ones. Here is a comparative list:
Algorithm Name | Best Case | Average Case | Worst Case | Memory | Stable |
Quick Sort | nlog(n) | nlog(n) | n | log(n) | No |
Merge Sort | nlog(n) | nlog(n) | nlog(n) | n | Yes |
Heap Sort | nlog(n) | nlog(n) | nlog(n) | 1 | No |
Tree Sort | nlog(n) | nlog(n) | nlog(n) | n | Yes |
Selection Sort | n² | n² | n² | 1 | No |
Bubble Sort | n | n² | n² | 1 | Yes |
Tim Sort | n | nlog(n) | nlog(n) | n | Yes |
(Source: GeeksForGeeks)
Before explaining each one, let's review two basic concepts presented in the table:
Each case is the time complexity of the sorting algorithm, measured using
Big-O notation for the worst-case scenario
Theta Notation for an average scenario
Omega Notation for the best-case scenario
A sorting algorithm is said to be stable if it retains the relative order between two elements that are deemed to be equal according to the sorting criteria. Unstable sorting algorithms do not care and might change their places with one another.
Introduction
Quick Sort
The first step in a quick sort algorithm is choosing a pivot element, and there are many ways to go about it, depending on how one chooses the pivot element. For example, the pivot element could be:
The last element
The first element
The median
Random element
Regardless of how to choose the pivot element, the next step is to divide the array into elements higher than the pivot and elements lower than the pivot and to repeat this process until all elements are examined.
Merge Sort
In merge sort, one will divide the subject array into two subarrays and sort each subarray in a recursive manner, then finally put together all the parts in a sorted manner by merging the smaller arrays.
Tip
Note that the merge function performs well when number of subarrays is a power of 2.
(Source)
Heap Sort
For Heap sort, you first have to form a heap from your array, whether it's a maximum or a minimum, doesn't matter.
Short Background
Heap is a basic data structure (like an array or int) and they represent a complete binary tree, but they come strictly in two forms:
Minimum Heap: Each parent node is less than both of its children, and the top node is the minimum among all the values
Maximum Heap: Each parent node is more than both of its children, and the top node is the maximum among all the values.
What's the difference with an array?
That's a good question, and the answer is Time Complexity. In certain use cases, it's much faster to use a heap than to use an array. Here's a short comparison between the two:
Operation | T-Complexity for Array | T-Complexity for Heap |
Find Min/Max | O(n) | O(1) |
Add/Remove Element | O(1) ~ O(n)* | O(log(n)) |
Remove Min/Max | O(n) | O(log(n)) |
* If the necessary space is already allocated for the new element, then it's O(1), if not, a new array should be created which would be O(n).
Sorting Process
Create a maximum heap from the array
Pop the max element
Reevaluate Heap
Repeat (2-3) until the heap is empty
For example, the Mergesort algorithm is exceedingly fast but requires a lot of space to do the operations. On the other side, Bubble Sort is exceedingly slow but requires the minimum space.
Tree Sort
The sorting process involves creating a Binary Search Tree.
Short Background
A tree data structure is called binary when each node has at most 2 children. And for that to be a binary search tree it has to have 3 more properties:
The left child of a node must have a value lesser than the value of the node.
The right child of a node must have a value greater than the value of the node.
Each left and right subtree must be binary search trees themselves.
Sorting Process
Create a binary search tree from the array.
Perform an in-order traversal on the tree.
Tip
If we know the median and set it as the root of BST, we reduce the search space by half in each step.
Selection Sort
In this sorting algorithm, one iteratively searches for the minimum (or maximum) value and moves it to the correct position in the array by swapping elements.
Bubble Sort
This is the simplest and most widely known sorting algorithm. For this technique, one will iteratively swap the position of two adjacent elements that are in the wrong order.
Tim Sort
This is an efficient and very popular sorting algorithm that is also picked up by some programming languages as the native sorting method like the sorted
method in Python. The algorithm implements two sorting methods:
Insertion Sort
Merge Sort
The algorithms steps are as follows:
Divide the unsorted array into subarrays (called
run
)Because merging is most efficient when the number of runs is equal to, or slightly less than, a power of two, and notably less efficient when the number of runs is slightly more than a power of two, Timsort chooses minimum run size to try to ensure the former condition
(Source)
The minimum run size is chosen as a number between 32 and 64.
Use Insertion Sort to sort runs.
Use Merge Sort to merge the sorted arrays.
Code Implementations
You can refer to this repository for the coding implementation of the formerly described sorting algorithms*
* Note: With the exception of Tim Sort. Because the native sorted
method in Python already supports that and I didn't implement it for ruby just yet.