Shell Sort

Shell Sort is the first sorting algorithm we'll look at that requires fewer than O(N²) comparisons and exchanges, on average. Although it is easy to develop an intuitive sense of how this algorithm works, it is very difficult to analyze its execution time.

Shell Sort is really just an extension of Insertion Sort, with two observations in mind:

• Insertion Sort is efficient if the input is "almost sorted".
• Insertion Sort is inefficient, on average, because it moves values just one position at a time.

Shell Sort is similar to Insertion Sort, but it works by taking "bigger steps" as it rearranges values, gradually decreasing the step size towards one. In the end, Shell Sort performs a regular Insertion Sort, but by then the array of data is guaranteed to be "almost sorted".

Consider a small value that is initially stored in the wrong end of the array. Using Insertion Sort, it will take roughly N comparisons and exchanges to move this value all the way to the other end of the array. With Shell Sort, we'll first move values using giant step sizes, so a small value will move a long way towards its final position, with just a few comparisons and exchanges.

It's easiest to understand how Shell Sort works by looking at a specific example. The table below shows what happens when we apply a version of Insertion Sort that has been modified to use a step size of 4. (An asterisk * indicates a value that has been exchanged.)

step size = 4 (a blank cell means the value at that index did not change during that pass; an asterisk marks a value that moved):

| index | input | i=5 | i=6 | i=7 | i=8 | i=9 | i=10 | output |
|-------|-------|-----|-----|-----|-----|-----|------|--------|
| 10    | 20    |     |     |     |     |     | *26  | 26     |
| 9     | 28    |     |     |     |     | *29 |      | 29     |
| 8     | 21    |     |     |     | *22 |     |      | 22     |
| 7     | 25    |     |     | 25  |     |     |      | 25     |
| 6     | 23    |     | *26 |     |     |     | *23  | 23     |
| 5     | 27    | *29 |     |     |     | *28 |      | 28     |
| 4     | 22    |     |     |     | *21 |     |      | 21     |
| 3     | 24    |     |     | 24  |     |     |      | 24     |
| 2     | 26    |     | *23 |     |     |     | *20  | 20     |
| 1     | 29    | *27 |     |     |     |     |      | 27     |

As you can see, with a relatively small number of comparisons and exchanges, smaller values (such as 20, 21) and larger values (such as 29) have been moved a lot closer to their final sorted position.

Now that the array is a lot closer to being "almost sorted", we can apply the usual Insertion Sort algorithm, with a step size of 1. In general, we expect this final step to be quite efficient, although that may not be apparent in this small example.

step size = 1 (a blank cell means the value at that index did not change during that pass; an asterisk marks a value that moved):

| index | input | i=2 | i=3 | i=4 | i=5 | i=6 | i=7 | i=8 | i=9 | i=10 | output |
|-------|-------|-----|-----|-----|-----|-----|-----|-----|-----|------|--------|
| 10    | 26    |     |     |     |     |     |     |     |     | *29  | 29     |
| 9     | 29    |     |     |     |     |     |     |     | 29  | *28  | 28     |
| 8     | 22    |     |     |     |     |     |     | *28 |     | *27  | 27     |
| 7     | 25    |     |     |     |     |     | *28 | *27 |     | *26  | 26     |
| 6     | 23    |     |     |     |     | *28 | *27 | *25 |     |      | 25     |
| 5     | 28    |     |     |     | 28  | *27 | *25 | *24 |     |      | 24     |
| 4     | 21    |     |     | *27 |     | *24 |     | *23 |     |      | 23     |
| 3     | 24    |     | *27 | *24 |     | *23 |     | *22 |     |      | 22     |
| 2     | 20    | *27 | *24 | *21 |     |     |     |     |     |      | 21     |
| 1     | 27    | *20 |     |     |     |     |     |     |     |      | 20     |

In the general case, when the number of values to be sorted can be very large, Shell Sort uses a sequence of diminishing step sizes. One popular sequence is ..., 1093, 364, 121, 40, 13, 4, 1, which tends to work well in practice.

A summary of the Shell Sort algorithm is given below. (A complete implementation is given on the Pascal page.)

```
procedure shellsort;
var
  step : integer;
begin
  step := 1;
  repeat
    step := 3*step + 1
  until step > n;

  repeat
    step := step div 3;

    { do insertion sort, ... }
    { ... with specified step size }

  until step = 1;
end;
```

The diagram below may help you develop an intuitive sense of how Shell Sort actually works. The vertical axis is the position within the array and the horizontal axis is time. Values in the array are represented by small squares with differing brightness. The goal of the algorithm is to sort the values from darkest to lightest. The input array is shown as the vertical column on the left, with an initially random assortment of brightness values. In this example, there are 64 values, and the step size sequence is: 40, 13, 4, 1.

[diagram: Shell Sort sorting 64 values with step sizes 40, 13, 4, 1]

In the diagram above, a new "snapshot" of the partially sorted array was output every time another N comparisons were completed, so the work done by Shell Sort is roughly proportional to the area. Since the diagram is a tall, thin rectangle, you can immediately see that significantly fewer than N² comparisons were needed for this example.

In general, Shell Sort is very difficult to analyze. In fact, no one has been able to figure out a precise description of the execution time of Shell Sort. For the implementation given above, it has been conjectured that when N is large, the execution time is roughly proportional to N (log N)² or N^1.25, but no one has been able to prove it.