Main content
Computer science
Overview of merge sort
Because we're using divide-and-conquer to sort, we need to decide what our subproblems are going to look like. The full problem is to sort an entire array. Let's say that a subproblem is to sort a subarray. In particular, we'll think of a subproblem as sorting the subarray starting at index p and going through index r. It will be convenient to have a notation for a subarray, so let's say that
array[p..r]
denotes this subarray of array
. Note that this "two-dot" notation is not legal JavaScript; we're using it just to describe the algorithm, rather than a particular implementation of the algorithm in code. In terms of our notation, for an array of n elements, we can say that the original problem is to sort array[0..n-1]
.Here's how merge sort uses divide-and-conquer:
- Divide by finding the number q of the position midway between p and r. Do this step the same way we found the midpoint in binary search: add p and r, divide by 2, and round down.
- Conquer by recursively sorting the subarrays in each of the two subproblems created by the divide step. That is, recursively sort the subarray
array[p..q]
and recursively sort the subarrayarray[q+1..r]
. - Combine by merging the two sorted subarrays back into the single sorted subarray
array[p..r]
.
We need a base case. The base case is a subarray containing fewer than two elements, that is, when p, is greater than or equal to, r, since a subarray with no elements or just one element is already sorted. So we'll divide-conquer-combine only when p, is less than, r.
Let's see an example. Let's start with
array
holding [14, 7, 3, 12, 9, 11, 6, 2], so that the first subarray is actually the full array, array[0..7]
(p, equals, 0 and r, equals, 7). This subarray has at least two elements, and so it's not a base case.- In the divide step, we compute q, equals, 3.
- The conquer step has us sort the two subarrays
array[0..3]
, which contains [14, 7, 3, 12], andarray[4..7]
, which contains [9, 11, 6, 2]. When we come back from the conquer step, each of the two subarrays is sorted:array[0..3]
contains [3, 7, 12, 14] andarray[4..7]
contains [2, 6, 9, 11], so that the full array is [3, 7, 12, 14, 2, 6, 9, 11]. - Finally, the combine step merges the two sorted subarrays in the first half and the second half, producing the final sorted array [2, 3, 6, 7, 9, 11, 12, 14].
How did the subarray
array[0..3]
become sorted? The same way. It has more than two elements, and so it's not a base case. With p, equals, 0 and r, equals, 3, compute q, equals, 1, recursively sort array[0..1]
([14, 7]) and array[2..3]
([3, 12]), resulting in array[0..3]
containing [7, 14, 3, 12], and merge the first half with the second half, producing [3, 7, 12, 14].How did the subarray
array[0..1]
become sorted? With p, equals, 0 and r, equals, 1, compute q, equals, 0, recursively sort array[0..0]
([14]) and array[1..1]
([7]), resulting in array[0..1]
still containing [14, 7], and merge the first half with the second half, producing [7, 14].The subarrays
array[0..0]
and array[1..1]
are base cases, since each contains fewer than two elements. Here is how the entire merge sort algorithm unfolds:Most of the steps in merge sort are simple. You can check for the base case easily. Finding the midpoint q in the divide step is also really easy. You have to make two recursive calls in the conquer step. It's the combine step, where you have to merge two sorted subarrays, where the real work happens.
In the next challenge, you'll focus on implementing the overall merge sort algorithm, to make sure you understand how to divide and conquer recursively. After you've done that, we'll dive deeper into how to merge two sorted subarrays efficiently and you'll implement that in the later challenge.
This content is a collaboration of Dartmouth Computer Science professors Thomas Cormen and Devin Balkcom, plus the Khan Academy computing curriculum team. The content is licensed CC-BY-NC-SA.
Want to join the conversation?
- What about...
array.prototype.sort()
? Isn't that waaay easier? And numerical sort is just...array.prototype.sort(function(a,b) {return a-b}
.(0 votes)- Someone had to program how the sort() function works.(33 votes)
- Based on pseudocode
MergeSort(array A, int p, int r) {
if (p < r) { // we have at least 2 items
q = (p + r)/2
MergeSort(A, p, q) // sort A[p..q]
MergeSort(A, q+1, r) // sort A[q+1..r]
Merge(A, p, q, r) // merge everything together
}
}
I dont understand one thing, after the mergesort left right are done, the merge will be run. It is supposed to end there, but it it said that the control run back to the mergesort right to keep going on. Why is that happen while I dont see any code to do that after the merge? Please explain to me clearly, I am not so smart :((4 votes)- When you use recursion, there may be several copies of a function, all at different stages in their execution. When one function returns the function that called it continues to execute.
So, if the currently executing copy of the MergeSort function was called by another copy of the MergeSort function to sort the calling functions left subarray, then after the currently executing version finishes, the version that called it will then move on to its next step, which is to call MergeSort to sort its right subarray.
It is probably best illustrated with an example.
In the below example:
- For MergeSort on arrays with <2 elements the function is: Do Nothing
- For MergeSort on array with >= 2 elements the function is:MergSort(left subarray)
MergeSort(right subarray)
Merge Above Together
- I'll denote each step that each version of the function is executing with a >.
Suppose we call MergeSort on [4,3,2]
The 1st copy of the function will be made which looks like this:> MergeSort([4,3])
MergeSort([2])
Merge Above Together
The call to MergeSort([4,3]) will then generate a 2nd copy of the function
(shown below the 1st copy)> MergeSort([4,3])
MergeSort([2])
Merge Above Together
> MergeSort([4])
MergeSort([3])
Merge Above Together
The call to MergeSort([4]) will then generate a 3rd copy of the function
(shown below the 2nd copy)> MergeSort([4,3])
MergeSort([2])
Merge Above Together
> MergeSort([4])
MergeSort([3])
Merge Above Together
> Do Nothing
The Do Nothing step will finish. The 3rd copy of the function will return (and vanish).
The 2nd copy of the function will move on to the next line.> MergeSort([4,3])
MergeSort([2])
Merge Above Together
MergeSort([4])
> MergeSort([3])
Merge Above Together
The call to MergeSort([3]) will then generate a 3rd copy of the function
(shown below the 2nd copy)> MergeSort([4,3])
MergeSort([2])
Merge Above Together
MergeSort([4])
> MergeSort([3])
Merge Above Together
> Do Nothing
The Do Nothing step will finish. The 3rd copy of the function will return (and vanish).
The 2nd copy of the function will move on to the next line.> MergeSort([4,3])
MergeSort([2])
Merge Above Together
MergeSort([4])
MergeSort([3])
> Merge Above Together
The Merge Above Together step will finish. The 2nd copy of the function will return (and vanish).
The 1st copy of the function will move on to the next line.MergeSort([4,3])
> MergeSort([2])
Merge Above Together
The call to MergeSort([2]) will then generate a 2nd copy of the function
(shown below the 1st copy)MergeSort([4,3])
> MergeSort([2])
Merge Above Together
> Do Nothing
The Do Nothing step will finish. The 2nd copy of the function will return (and vanish).
The 1st copy of the function will move on to the next line.MergeSort([4,3])
MergeSort([2])
> Merge Above Together
The Merge Above Together step will finish. The 1st copy of the function will return (and vanish).
The array should now be sorted.
Hope this makes sense(18 votes)
- I spent hours trying to figure out the challenge while I kept getting overflow issues. Hours later I found out that the above tutorial does not properly state the "Divide" portion.
Incorrect: "Divide by finding the number qq of the position midway between pp and rr. Do this step the same way we found the midpoint in binary search: add pp and rr, divide by 2, and round down."
I found the correct way to perform the "Divide" step by researching the code online. I came across a post with the code written in C, and the way to find "q" is completely different. Pay attention to the top portion of the code where it talks about the "mid" variable.
http://stackoverflow.com/questions/12030683/implementing-merge-sort-in-c#answer-12030723
I hope this post saves someone from the same headaches as I had to endure.(4 votes)- It's unfortunate that you had problems with the challenge, but the technique describe in the article is not incorrect. In fact, it is a fairly standard technique. I just checked it and it works for me. I suspect you made an error when you tried to implement the technique described.
'mid = low + floor((high-low)/2)' works just fine too (they are mathematically equivalent), but the technique described in the article tends to be more intuitive for most people.
As far as overflow issues go, you would exceed the largest possible array size (4294967295) long, long before the calculation described in the article would lose precision or have a numeric overflow (the largest int you can represent in javaScript is 9007199254740991 ). However, an incorrectly implemented midpoint calculation might cause the function to infinitely recurse and result in a stack overflow...
If you would like, you could post a copy of your code, that was causing overflows, as a comment to this answer, and I could have a look at it for you.(7 votes)
- Hi,
My program runs fine and the sorted array looks good. However I am getting an instruction in the beginning with respect to the code for calculating the midpoint.
var q = floor((p + r - 1) / 2);
Any idea what is the issue here?(2 votes)- p is the index of the 1st element of the subarray.
r is the index of the last element of the subarray
Suppose p=2 and r=4
q= floor( (p+r-1)/2)
q= floor((2+4-1)/2)
q= floor( 5/2)
q= floor(2.5)
q= 2
The above calculation gives a result of 2, but the midpoint of 2 and 4 should be 3 not 2.(9 votes)
- I don't understand why you need all the divide steps. Can't you just start by merging the individual members of the array in pairs - i.e. just go directly to the first merge step? What value is there in doing all the iterative divide computations?(4 votes)
- Because you're not starting with "individual members", you're starting with an array, and you need to break that array into it's individual members. That "divide" step might seem trivial to most humans, but it's an important detail to the "divide"-and-conquer logic. And a very important detail to remember to write, for your code to run properly! As the lesson says, the "real" work is mostly done in the merge step.
The later lesson "analysis of merge sort" goes deep into the details, but the short answer is that merge sort ends up being faster.(1 vote)
- I used the correct code but the thing says "Maximum call stack exceeded."(2 votes)
- There is unbounded recursion in your code somewhere. Check to make sure the recursion terminates.(5 votes)
- I'm confused as to how the merge step sorts anything. Doesn't it need a rule to know how to sort the numbers (the rule being sorting them in ascending order)?(3 votes)
- The merge step takes two sorted subarrays and produces one big sorted subarray with all those elements. It just repeatedly looks at the front of the two subarrays and takes the smallest element, until it runs out of elements. It only works because the two subarrays were already sorted.
In the example above (last merge) we have:
[3,7,12,14] and [2,6,9,11] being merged.
So, let's repeatedly take the smallest element from the front of the two arrays and remove them.
1st array, 2nd array, result
[3,7,12,14], [2,6,9,11], []
[3,7,12,14], [6,9,11], [2]
[7,12,14], [6,9,11], [2,3]
[7,12,14], [9,11], [2,3,6]
[12,14], [9,11], [2,3,6,7]
[12,14], [11], [2,3,6,7,9]
[12,14], [], [2,3,6,7,9,11]
[14], [], [2,3,6,7,9,11]
[], [], [2,3,6,7,9,11,14]
That's how it sorts(3 votes)
- The example given shows subarrays of
array[0..0]
andarray[1..1]
as the base case. When is there the case thatp > r
instead ofp === r
?(3 votes) - quoting "The base case is a subarray containing fewer than two elements, that is, when p >= r"
How is "p >= r" possible? How is it possible for p (the start index) to be larger than r (the end index)?(2 votes)- p=r happens when a subarray has only one element
p > r happens when a subarray has no elements (if we have array[0..n-1], and n=0, then p=0,r=0-1=-1 and 0>-1)
in both case you want to stop trying to sort them(2 votes)
- I still confused how "merge the first half with the second half" works?(2 votes)