[Data Structure] Sort (3)


We have already covered [insertion sort], selection sort, and exchange sort. This article introduces merge sort and, from the non-comparative sorts, counting sort; it also discusses the stability of the various sorting algorithms and closes with a summary of sorting.

1. [Merge Sort]

Concept

Merge sort obtains a completely ordered sequence by merging already-ordered subsequences: each subsequence is made ordered first, and then the subsequence segments are merged together. Merging two sorted lists into one sorted list is called a two-way merge.

Algorithm Analysis and Implementation

The core of merging is combining two already-ordered arrays into one new ordered array. The fundamental idea of merge sort is therefore to divide the array to be sorted into subarrays that are each ordered, and then merge them step by step until a single ordered array is obtained. The implementation follows naturally: during the merge we use a temporary array to hold the ordered sequence, and at the end we copy its contents back to the original array.
So the code for merge sort is:

#include <stdio.h>
#include <stdlib.h>

void _MergeSort(int* a, int left, int right, int* tmp)
{
    if (left >= right) // end the recursion when the interval has at most one element
        return;
    int mid = (left + right) / 2;
    // First divide the interval into the two parts [left, mid] and [mid + 1, right]
    // Recurse on [left, mid] to make the left half ordered
    _MergeSort(a, left, mid, tmp);
    // Recurse on [mid + 1, right] to make the right half ordered
    _MergeSort(a, mid + 1, right, tmp);
    // Merge the two recursively sorted intervals
    int index = left;
    int begin1 = left, end1 = mid;
    int begin2 = mid + 1, end2 = right;
    // At this point the two intervals [begin1, end1] and [begin2, end2] are each ordered.
    // Merge the two intervals into one ordered sequence in tmp
    while (begin1 <= end1 && begin2 <= end2)
    {
        if (a[begin1] <= a[begin2])
        {
            tmp[index++] = a[begin1++];
        }
        else
        {
            tmp[index++] = a[begin2++];
        }
    }
    while (begin1 <= end1)
    {
        tmp[index++] = a[begin1++];
    }
    while (begin2 <= end2)
    {
        tmp[index++] = a[begin2++];
    }
    // Copy tmp back to the original array
    int i = 0;
    for (i = left; i < index; i++)
    {
        a[i] = tmp[i];
    }
}


void MergeSort(int* a, int n)
{
    int* tmp = (int*)malloc(sizeof(int) * n);
    if (tmp == NULL)
    {
        printf("malloc fail\n");
        exit(-1);
    }

    _MergeSort(a, 0, n - 1, tmp);

    free(tmp);
}

It is inconvenient for MergeSort to recurse on itself, since its interface exposes only the array and its length; instead, the helper function _MergeSort carries the interval bounds and the temporary buffer through the recursion, which achieves the effect we want.
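As a quick usage sketch (a hypothetical driver, assuming the two functions above sit in the same file with the includes shown):

int main(void)
{
    int a[] = { 5, 2, 9, 1, 7, 3 };
    int n = sizeof(a) / sizeof(a[0]);
    MergeSort(a, n);
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]); // prints: 1 2 3 5 7 9
    printf("\n");
    return 0;
}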

Non-recursive algorithm implementation

When we studied the Fibonacci sequence, we saw that it can be implemented in two ways, recursive and non-recursive. The recursive version follows the formula F(n) = F(n - 1) + F(n - 2) directly, but in practice, once n reaches even a few dozen, the exponential number of calls becomes impractical (and very deep recursion risks stack overflow). The cure is to rewrite the code in non-recursive form, looping upward from the base cases to compute F(n).
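For reference, a minimal sketch of the iterative form (assuming the usual convention F(0) = 0 and F(1) = 1):

long long Fib(int n)
{
    long long a = 0, b = 1; // F(0) and F(1)
    for (int i = 0; i < n; i++)
    {
        long long next = a + b; // step the pair (F(i), F(i+1)) forward
        a = b;
        b = next;
    }
    return a; // a now holds F(n)
}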
Similarly, for a non-recursive merge sort we can borrow this idea: introduce a variable groupNum that records the length of the runs being merged, and let groupNum double from 1 up toward n (n may not divide evenly into groups, but the boundary handling below takes care of that). This turns merge sort into a loop.
So the non-recursive code for merge sort is:

//non-recursive version
void MergeSortNonR(int* a, int n)
{
    // temporary array to hold the data being merged
    int* tmp = (int*)malloc(sizeof(int) * n);
    if (tmp == NULL)
    {
        printf("malloc fail\n");
        exit(-1);
    }

    int groupNum = 1;
    while (groupNum <= n)
    {
        for (int i = 0; i < n; i += groupNum * 2) // each merge covers two groups
        {
            // merge the intervals [begin1, end1] and [begin2, end2]
            int begin1 = i, end1 = i + groupNum - 1;
            int begin2 = i + groupNum, end2 = i + groupNum * 2 - 1;
            // consider the boundaries
            // 1. the interval [begin2, end2] does not exist
            if (begin2 >= n)
            {
                // turn it into an empty interval
                begin2 = n;
                end2 = n - 1;
            }
            // 2. end1 is out of bounds
            if (end1 >= n)
            {
                // clip end1 back into the array
                end1 = n - 1;
            }
            // 3. end2 is out of bounds
            if (end2 >= n)
            {
                // clip end2 so that [begin2, end2] is valid
                end2 = n - 1;
            }
            // merge
            int index = begin1; // start from the element at index begin1
            while (begin1 <= end1 && begin2 <= end2)
            {
                if (a[begin1] <= a[begin2])
                {
                    tmp[index++] = a[begin1++];
                }
                else
                {
                    tmp[index++] = a[begin2++];
                }
            }
            while (begin1 <= end1)
            {
                tmp[index++] = a[begin1++];
            }
            while (begin2 <= end2)
            {
                tmp[index++] = a[begin2++];
            }
        }
        // copy tmp back to a after each pass
        for (int i = 0; i < n; i++)
        {
            a[i] = tmp[i];
        }
        groupNum *= 2; // double the group length after each pass
    }

    free(tmp);
}

What needs care here is the boundary handling: on the last group of a pass, one or both intervals can run past the end of the array, and the three cases above deal with each possibility in turn.
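As a concrete example, take n = 10. On the groupNum = 4 pass, the second group starts at i = 8, giving end1 = 11, begin2 = 12, and end2 = 15: case 1 turns [begin2, end2] into the empty interval [10, 9], and case 2 clips end1 down to 9, so only [8, 9] is copied through to tmp. On the groupNum = 8 pass there is a single group with end2 = 15, which case 3 clips to 9 before [0, 7] is merged with [8, 9].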

Summary

  1. The drawback of merge sort is that it requires O(N) extra space. The merging idea is, more importantly, the basis for external sorting of data on disk.
  2. Time complexity: O(N*logN) (there are O(logN) levels of merging, and each level does O(N) work)
  3. Space complexity: O(N)

2. [Counting Sort]

Concept

The sorts introduced so far order the array by comparing element values. Next we introduce a [sorting algorithm] that does not rely on comparisons at all: counting sort.
Counting sort, as the name suggests, counts the number of occurrences of each element in the array and then rewrites the array according to those counts.

Analysis of Algorithms

The implementation of counting sort is fairly simple: count how many times each value occurs, then write the values back into the original array in ascending order.
One detail deserves attention: when the values are large or do not start at 0, we map each element a[i] to index a[i] - min in a count array covering the range [0, max - min].
So, the code implementation for counting sort is:

void CountSort(int* a, int n)
{
    // find the maximum and minimum values in the array
    int i = 0;
    int max = a[0];
    int min = a[0];
    for (i = 0; i < n; i++)
    {
        if (min > a[i])
            min = a[i];
        if (max < a[i])
            max = a[i];
    }
    // map the elements of a onto the index range [0, max - min]
    int range = max - min + 1; // number of slots in the count array
    int* count = (int*)calloc(range, sizeof(int));
    if (count == NULL)
    {
        printf("calloc fail\n");
        exit(-1);
    }
    // traverse a and record the number of occurrences of each value
    for (i = 0; i < n; i++)
    {
        count[a[i] - min]++;
    }
    // rewrite the original array from the count array
    int index = 0;
    for (i = 0; i < range; i++)
    {
        while (count[i]--)
        {
            a[index++] = i + min;
        }
    }
    free(count);
}
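A quick usage sketch (a hypothetical driver; the negative values exercise the a[i] - min mapping):

int main(void)
{
    int a[] = { 3, -1, 2, 3, 0, -1, 2 };
    int n = sizeof(a) / sizeof(a[0]);
    CountSort(a, n); // min = -1, max = 3, so count covers indices [0, 4]
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]); // prints: -1 -1 0 2 2 3 3
    printf("\n");
    return 0;
}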

Summary

  1. When the data to be sorted is concentrated in a narrow range, counting sort is very efficient, but its applicable scope and scenarios are limited.
  2. Time complexity: O(max(N, range))
  3. Space complexity: O(range)
    In practice, counting sort is only worth using when the data is concentrated in a small range with many repeated values, so its usage scenarios are quite limited.

3. Stability of Sorting Algorithms

1. The concept of stability

Assume the sequence of records to be sorted contains multiple records with equal keys. If sorting leaves the relative order of these records unchanged, that is, if r[i] = r[j] and r[i] precedes r[j] in the original sequence, then r[i] still precedes r[j] in the sorted sequence, the sorting algorithm is said to be stable; otherwise it is unstable. For example, writing equal elements as 2a and 2b, a stable sort of [2a, 1, 2b] yields [1, 2a, 2b], never [1, 2b, 2a].

2. Analyze the stability of various sorting algorithms

1. Direct Insertion Sort

Stability: Stable.
Direct insertion sort inserts each number into an already sorted sequence; when it meets an equal number it stops and inserts right after it, so equal numbers keep their original order.
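Concretely, stability hinges on the comparison in the inner loop. A minimal sketch (the function name InsertSort is ours, for illustration):

void InsertSort(int* a, int n)
{
    for (int i = 1; i < n; i++)
    {
        int key = a[i];
        int end = i - 1;
        // Shifting only while a[end] is strictly greater than key
        // means an equal number is inserted after its twin, which
        // is what keeps the sort stable. Using >= here would break it.
        while (end >= 0 && a[end] > key)
        {
            a[end + 1] = a[end];
            end--;
        }
        a[end + 1] = key;
    }
}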

2. Hill sort

Stability: Unstable
Hill sort pre-sorts the array in groups separated by a gap; if equal numbers land in different groups, their relative order may change. Therefore, Hill sort is unstable.

3. Direct [selection sort]

Stability: Unstable
Each pass picks the minimum (or maximum) of the remaining elements and swaps it into position; that swap can carry an equal number past its twin and change their original order, so the algorithm is unstable.

4. Heap Sort

Stability: Unstable
Heap sort may change the relative order of equal numbers while building the heap, while adjusting it, and when the top element is swapped with the last element, so it is unstable.

5. Bubble sort

Stability: Stable
Each pass of bubble sort carries the largest remaining number to the end of the array; as long as adjacent equal numbers are never swapped during this process, the original order of equal numbers is preserved, so bubble sort is stable.

6. Quick Sort

Stability: Unstable
When quicksort partitions an interval, a number equal to the pivot can be swapped from one side to the other, moving it past an equal number, so their relative order may change; quicksort is therefore unstable.

7. Merge Sort

Stability: Stable
When merge sort merges two intervals and encounters equal numbers, it only has to take the number from the left (earlier) interval first, which is exactly what the comparison a[begin1] <= a[begin2] in the code above does, so stability is preserved.
Note: any sorting algorithm can be written in an unstable way; what matters is that some algorithms can guarantee stability, and it is that stable implementation we discuss here.

3. The meaning of stability

In daily life we often sort by several keys in turn, for example by name and then by age. If a list already ordered by name is re-sorted by age with an unstable algorithm, the name order among people of the same age may be scrambled, which makes the algorithm a poor fit for the task. The stability of a sort is therefore genuinely meaningful.
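A small sketch of that name-then-age scenario (the struct and the stable insertion sort by age are hypothetical, written just for this illustration):

#include <stdio.h>

struct Person { int age; const char* name; };

// stable insertion sort by age: strict > keeps equal ages in name order
void SortByAge(struct Person* p, int n)
{
    for (int i = 1; i < n; i++)
    {
        struct Person key = p[i];
        int end = i - 1;
        while (end >= 0 && p[end].age > key.age)
        {
            p[end + 1] = p[end];
            end--;
        }
        p[end + 1] = key;
    }
}

int main(void)
{
    // already sorted by name
    struct Person p[] = { { 25, "Amy" }, { 20, "Ben" }, { 25, "Cara" } };
    SortByAge(p, 3);
    for (int i = 0; i < 3; i++)
        printf("%d %s\n", p[i].age, p[i].name);
    // Amy still precedes Cara among the 25-year-olds
    return 0;
}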

Summary

Merge sort achieves O(N*logN) time at the cost of O(N) auxiliary space and is stable; counting sort can outperform the comparison sorts when the data is concentrated in a small range, but its applicability is limited. Among the algorithms covered so far, direct insertion sort, bubble sort, and merge sort are stable, while Hill sort, direct selection sort, heap sort, and quicksort are not.
