If you're seeing this message, it means we're having trouble loading external resources on our website.

If you're behind a web filter, please make sure that the domains *.kastatic.org and *.kasandbox.org are unblocked.

Main content

Analysis of merge sort

Given that the merge function runs in Θ(n) time when merging n elements, how do we get to showing that mergeSort runs in Θ(nlog2n) time? We start by thinking about the three parts of divide-and-conquer and how to account for their running times. We assume that we're sorting a total of n elements in the entire array.
  1. The divide step takes constant time, regardless of the subarray size. After all, the divide step just computes the midpoint q of the indices p and r. Recall that in big-Θ notation, we indicate constant time by Θ(1).
  2. The conquer step, where we recursively sort two subarrays of approximately n/2 elements each, takes some amount of time, but we'll account for that time when we consider the subproblems.
  3. The combine step merges a total of n elements, taking Θ(n) time.
If we think about the divide and combine steps together, the Θ(1) running time for the divide step is a low-order term when compared with the Θ(n) running time of the combine step. So let's think of the divide and combine steps together as taking Θ(n) time. To make things more concrete, let's say that the divide and combine steps together take cn time for some constant c.
To keep things reasonably simple, let's assume that if n>1, then n is always even, so that when we need to think about n/2, it's an integer. (Accounting for the case in which n is odd doesn't change the result in terms of big-Θ notation.) So now we can think of the running time of mergeSort on an n-element subarray as being the sum of twice the running time of mergeSort on an (n/2)-element subarray (for the conquer step) plus cn (for the divide and combine steps—really for just the merging).
Now we have to figure out the running time of two recursive calls on n/2 elements. Each of these two recursive calls takes twice of the running time of mergeSort on an (n/4)-element subarray (because we have to halve n/2) plus cn/2 to merge. We have two subproblems of size n/2, and each takes cn/2 time to merge, and so the total time we spend merging for subproblems of size n/2 is 2cn/2=cn.
Let's draw out the merging times in a "tree":
A diagram with a tree on the left and merging times on the right. The tree is labeled "Subproblem size" and the right is labeled "Total merging time for all subproblems of this size." The first level of the tree shows a single node n and corresponding merging time of c times n. The second level of the tree shows two nodes, each of 1/2 n, and a merging time of 2 times c times 1/2 n, the same as c times n.
Computer scientists draw trees upside-down from how actual trees grow. A tree is a graph with no cycles (paths that start and end at the same place). Convention is to call the vertices in a tree its nodes. The root node is on top; here, the root is labeled with the n subarray size for the original n-element array. Below the root are two child nodes, each labeled n/2 to represent the subarray sizes for the two subproblems of size n/2.
Each of the subproblems of size n/2 recursively sorts two subarrays of size (n/2)/2, or n/4. Because there are two subproblems of size n/2, there are four subproblems of size n/4. Each of these four subproblems merges a total of n/4 elements, and so the merging time for each of the four subproblems is cn/4. Summed up over the four subproblems, we see that the total merging time for all subproblems of size n/4 is 4cn/4=cn:
A diagram with a tree on the left and merging times on the right. The tree is labeled "Subproblem size" and the right is labeled "Total merging time for all subproblems of this size." The first level of the tree shows a single node n and corresponding merging time of c times n. The second level of the tree shows two nodes, each of 1/2 n, and a merging time of 2 times c times 1/2 n, the same as c times n. The third level of the tree shows four nodes, each of 1/4 n, and a merging time of 4 times c times 1/4 n, the same as c times n.
What do you think would happen for the subproblems of size n/8? There will be eight of them, and the merging time for each will be cn/8, for a total merging time of 8cn/8=cn:
A diagram with a tree on the left and merging times on the right. The tree is labeled "Subproblem size" and the right is labeled "Total merging time for all subproblems of this size." The first level of the tree shows a single node n and corresponding merging time of c times n. The second level of the tree shows two nodes, each of 1/2 n, and a merging time of 2 times c times 1/2 n, the same as c times n. The third level of the tree shows four nodes, each of 1/4 n, and a merging time of 4 times c times 1/4 n, the same as c times n. The fourth level of the tree shows eight nodes, each of 1/8 n, and a merging time of 8 times c times 1/8 n, the same as c times n.
As the subproblems get smaller, the number of subproblems doubles at each "level" of the recursion, but the merging time halves. The doubling and halving cancel each other out, and so the total merging time is cn at each level of recursion. Eventually, we get down to subproblems of size 1: the base case. We have to spend Θ(1) time to sort subarrays of size 1, because we have to test whether p<r, and this test takes time. How many subarrays of size 1 are there? Since we started with n elements, there must be n of them. Since each base case takes Θ(1) time, let's say that altogether, the base cases take cn time:
A diagram with a tree on the left and merging times on the right. The tree is labeled "Subproblem size" and the right is labeled "Total merging time for all subproblems of this size." The first level of the tree shows a single node n and corresponding merging time of c times n. The second level of the tree shows two nodes, each of 1/2 n, and a merging time of 2 times c times 1/2 n, the same as c times n. The third level of the tree shows four nodes, each of 1/4 n, and a merging time of 4 times c times 1/4 n, the same as c times n. The fourth level of the tree shows eight nodes, each of 1/8 n, and a merging time of 8 times c times 1/8 n, the same as c times n. Underneath that level, dots are shown to indicate the tree continues like that. A final level is shown with n nodes of 1, and a merging time of n times c, the same as c times n.
Now we know how long merging takes for each subproblem size. The total time for mergeSort is the sum of the merging times for all the levels. If there are l levels in the tree, then the total merging time is lcn. So what is l? We start with subproblems of size n and repeatedly halve until we get down to subproblems of size 1. We saw this characteristic when we analyzed binary search, and the answer is l=log2n+1. For example, if n=8, then log2n+1=4, and sure enough, the tree has four levels: n=8,4,2,1. The total time for mergeSort, then, is cn(log2n+1). When we use big-Θ notation to describe this running time, we can discard the low-order term (+1) and the constant coefficient (c), giving us a running time of Θ(nlog2n), as promised.
One other thing about merge sort is worth noting. During merging, it makes a copy of the entire array being sorted, with one half in lowHalf and the other half in highHalf. Because it copies more than a constant number of elements at some time, we say that merge sort does not work in place. By contrast, both selection sort and insertion sort do work in place, since they never make a copy of more than a constant number of array elements at any one time. Computer scientists like to consider whether an algorithm works in place, because there are some systems where space is at a premium, and thus in-place algorithms are preferred.

This content is a collaboration of Dartmouth Computer Science professors Thomas Cormen and Devin Balkcom, plus the Khan Academy computing curriculum team. The content is licensed CC-BY-NC-SA.

Want to join the conversation?

  • leaf green style avatar for user Fabio Pulito
    Can someone please explain or clarify the content of the last paragraph? Working in place, taking space, etc.?
    Many thanks.
    (13 votes)
    Default Khan Academy avatar avatar for user
  • blobby green style avatar for user Thomas Kidder
    What if we didn't divide n by 2 at each step, but instead divided by 3?

    With this algorithm, "We start with subproblems of size n and repeatedly halve until we get down to subproblems of size 1.... the answer is l = lg n + 1.

    Would the answer to l be log base 3 of n instead of log base 2? This would make the runtime: O(n log base 3 n).
    (5 votes)
    Default Khan Academy avatar avatar for user
    • male robot hal style avatar for user Cameron
      O(n log_2 n) and O(n log_3 n) are still just O(n log n ) because they only differ by a constant factor.
      Remember that we can convert log_a( x ) to log_b( x ) by multiplying by 1/( log_a(b) )
      e.g. log_3 (9) = 1/(log_10(3)) * log_10 (9) = 2

      One problem, that you may not be considering, with dividing the array in thirds is the problem you face when you want to merge the parts back together again.

      Before, with only 2 subarrays of size n/2 :
      - to merge them you had, at worst, to perform one comparison (to find the smallest element between the 2 subarrays) per element
      - 1 * n = n comparisons

      But with 3 subarrays of size n/3 :
      - now (supposing we do it in a straight forward manner), we have at worst, 2 comparisons (to find the smallest element between the 3 subarrays) per element
      - 2 * n = 2n comparisons.

      So we reduced our recursion depth by a factor of log(3)/log(2) = 1.585
      But we increased our number of comparisons by a factor of 2
      So overall we increased our running time by a factor of (2 * 1/1.585) or ~ 1.262

      Hope this makes sense
      (26 votes)
  • piceratops tree style avatar for user jakeayala
    The implementation in the challenge includes the following in the function merge:
        for (i = 0; k <= q; i++, k++) {
    lowHalf[i] = array[k];
    }
    for (j = 0; k <= r; j++, k++) {
    highHalf[j] = array[k];
    }

    doesn't that make the efficiency worse? I would expect another n term.
    (7 votes)
    Default Khan Academy avatar avatar for user
    • piceratops seedling style avatar for user evilvision
      I don't think it will make much of a difference. The time complexity of creating these temporary array for merge sort will be O(n lgn). Since, all n elements are copied l (lg n +1) times.
      Which makes the the total complexity: O(n lgn) + O(n lgn) = O(2n lgn).
      And we know that constants doesn't impact our complexity substantially. So time complexity will still be O(n lgn).
      (12 votes)
  • blobby green style avatar for user Łukasz
    Can anyone please explain what constant c is? How to calculate it? Can anyone give where can I read about it or explain it on an example?
    (3 votes)
    Default Khan Academy avatar avatar for user
    • male robot hal style avatar for user Cameron
      Suppose we had a chunk of code which added two numbers.
      e.g. var a = 2 + 3
      We could time how long this chunk of code took to run.
      We could use a stop watch, or we could give the computer instructions to time it for us.
      e.g.
      //find the starting time in milliseconds
      var start = (new Date()).getTime();

      //our chunk of code we are trying to measure
      var a = 2 + 3;

      //find the finish time
      var finish = (new Date()).getTime();
      //this is the # of milliseconds it took the code to run
      var timeElapsed = finish - start;

      We would expect this chunk of code to take about the same time every time we ran it. So we could say that this chunk of code runs in a constant amount of time. We could give that amount of time a label, e.g. we could call it c, or c1, or k. Let's call it c.

      If we ran that chunk of code n times, then the time it would take to complete would be:
      c * n

      The same type of thing is going on above in the article. During the merge, each element in the subarray will require some chunk of code to run that will require a constant amount of time to execute. We just give that constant amount of time the label c. You could measure it if you wanted too, but when we use asymptotic notation, we just ignore those constants.

      Hope this makes sense
      (14 votes)
  • leaf green style avatar for user Rizwan PrettyBoy Ali
    "To make things more concrete, let's say that the divide and combine steps together take cn time for some constant c." What does the author mean by to make things concrete, can't we use just the term n ?
    (2 votes)
    Default Khan Academy avatar avatar for user
    • male robot hal style avatar for user Cameron
      Typically, when someone says they are making things more concrete, they mean that instead of talking in a theoretical sense, they will talk about a specific example.

      Θ(n) is a a set of functions (imagine a giant bag holding a bunch of functions), so, in theory, the discussion could be talking about any of the functions that are in Θ(n). To make the discussion more concrete, they reached their hand into the bag of functions Θ(n), and picked out the function "c*n", to use it as an example. The function "n" is also in Θ(n) so they could have used "n" too, but it usually good practice when analyzing an algorithm to use some constant multiple (like "c") of the most significant term to see what impact it has on the analysis, if any.

      Hope this makes sense
      (8 votes)
  • aqualine sapling style avatar for user kentasuzuki325
    Why is putting c before n (merge part) in the recursion necessary?
    (1 vote)
    Default Khan Academy avatar avatar for user
    • male robot hal style avatar for user Cameron
      c is just a constant. So cn is just saying that the merge takes some constant amount of time per element being merged. If you just used n, it would be saying that the merge takes exactly 1 unit of time per element being merged.
      (9 votes)
  • winston baby style avatar for user Andrej Benedičič
    Hey, I've got the question: Why doesn't return the sorted array2 if the compiler accepts the code?
    [Edited to only show relevant code]
    var arr2 = [17, 15, 14, 7, 4, 6];
    merge(arr2, 0, Math.floor((0 + arr2.length-1) / 2), arr2.length-1);
    println("Second array after merging: " + arr2);
    Program.assertEqual(arr2, [4, 6, 7, 14, 15, 17]);
    (1 vote)
    Default Khan Academy avatar avatar for user
    • male robot hal style avatar for user Cameron
      [17, 15, 14, 7, 4, 6] is an invalid input to the merge function, because the merge function require the two subarrays that are being merged to be sorted. In the above, neither of the two subarrays [17,15,14] or [7,4,6] are sorted.

      Invalid input = unexpected results.
      (7 votes)
  • blobby green style avatar for user prasainarayan7
    Help me to figure out, what am I doing wrong?
    it says 'Hm do all your assertion pass'?
    the code is:

    while(i<=(q-p)+1 && j<=r-q-1){
    if(lowHalf[i]<highHalf[j]){
    array[k]=lowHalf[i];
    i++;
    }
    else {
    array[k]=highHalf[j];
    j++;
    }
    k++;
    }
    while(i<=(q-p)+1){
    array[k]=lowHalf[i];
    i++;
    k++;

    }
    while(j<=(r-q-1)){
    array[k]=highHalf[j];
    j++;
    k++;
    }

    };
    (2 votes)
    Default Khan Academy avatar avatar for user
    • male robot hal style avatar for user Cameron
      If you get "Hm do all your assertion pass?" that means one of your assertions is failing. Looking at the asserion that failed should help you diagnose the problem. Compare what the assertion expected vs what you actually got. If other assertions pass, then you can try to narrow down what the problem is even more.

      Is one of the elements missing in the actual result ?
      Which half did it come from ?
      What might cause that ?

      Good Luck
      (3 votes)
  • blobby green style avatar for user Junyoung TJ Lee
    It keeps asking if the condition in while loop work if p is not 0.

    [Edited to only show relevant code]
        while (i <= q && j <= r-q-1) {
    ...
    }


    It passes the assertion but cannot proceed to next step in the challenge.
    Any hint?
    (1 vote)
    Default Khan Academy avatar avatar for user
    • male robot hal style avatar for user Cameron
      The merge function is designed to merge two sub arrays: [p..q] and [q+1..r]
      The first while loop should stop when one of these arrays runs out of elements to pick from.

      Ask yourself:
      - How many elements are in a subarray that starts at index p and ends at index q ?
      - How many elements are in a subarray that starts at index q+1 and ends at index r ?.
      (5 votes)
  • blobby green style avatar for user hirmaysandesara
    I wanted to know that if there is a difference between running times and invariants of iterative and recursive merge sort. How to change the Merge sort (iterative or recursive version) in such a way that the best case is the same as in the case of Insertion sort?
    (3 votes)
    Default Khan Academy avatar avatar for user