Sorting Trilogy 3. Special sorting algorithm

Sorting Trilogy 3. Special sorting algorithm

Sorting not based on comparison

Insertion sort, merge sort, quick sort, heap sort, etc., are all sorting algorithms based on comparison . This means that in the same process, no matter what types of comparison are int, float, double, Student, Animal, etc., as long as the ratio between the two instances is specified, then the sorting process can be reused.

Data that is not based on comparison will be restricted by data conditions!

Common sorts that are not based on comparison are: counting sorting, cardinal sorting.

Count sort

1. Applicable scenarios

Now there is an array arr of type int, which stores the ages of all employees. Now we need to sort the ages of all employees.

It is known that the age of employees must be between [0, 200], so the data range can be considered as 0~200.

Apply for an int type array frequency with a length of 0~200 as the word frequency statistics table. The subscript of the array is regarded as the age of the employee, and each subscript corresponds to the value in the array as the number of employees of that age. For example: frequency[0] indicates the number of employees who are 0 years old, and frequency[1] indicates the number of employees who are 1 year old.

Traverse the array arr, when traversing to a 0-year-old employee, frequency[0] increments by 1. When traversing to a 1-year-old employee, frequency[1] increments by 1. When the traversal of arr ends, the word frequency statistics table frequency is constructed.

Refer to the word frequency statistics table frequency to construct a sorted array result, which is the result of arr sorting.

2. Code

public static int [] countingSort( int [] arr) { //Determine the length of the word frequency statistics table according to the data range int [] frequency = new int [ 200 ]; //Build the word frequency statistics table for ( int i = 0 ; i < arr.length; i ++) { frequency[arr[i]] ++; } //Construct a sorted array through the word frequency statistics table int index = 0 ; int [] result = new int [arr.length]; for ( int i = 0 ; i <frequency.length; i ++) { for ( int j = 0 ; j <frequency[i]; j ++) { result[index ++] = i; } } return result; } Copy code

3. Summary

The time complexity of counting and sorting is O(N), and the extra space complexity is O(N)

If the data range of a scene is [2^-31, 2^31], it means that the length of your word frequency statistics table is 2^32, which will take up a lot of extra space.

Therefore, sorting that is not based on comparison is a sorting algorithm customized according to the data status , and there is no such a wide range of applications as sorting based on comparison.

Base sort

1. Sorting process

Sort the integer array [17, 13, 25, 100, 72].

First determine the number of digits of the largest element in the array, this question is 3 digits. Then add 0 to the high bits of elements whose length is less than 3 bits > [017, 013, 025, 100, 072]

Then determine the base number of the elements in the array, this title is base 10. Therefore, 10 buckets numbered 0-9 are constructed. The structure of the bucket can be an array, a stack, or a queue (usually a queue).

Traverse the original array for the first time, and perform bucket sorting according to the single digits of the elements

Traverse the result array of the first sort for the second time, and sort the buckets according to the tens digits of the elements

Traverse the result array of the second sort for the third time, and sort the buckets according to the hundred digits of the elements

2. Code

The cardinality sorting method implemented by this code is very elegant, and the above-mentioned cardinality sorting process has been greatly optimized at the code level. In the code implementation process, the corresponding number of buckets is not established one by one, but an auxiliary array of the size of the word frequency statistics table and the original array is established to simulate multiple buckets for sorting.

In each round, do word frequency statistics for the corresponding bits in the sorted elements, and then convert the word frequency statistics table into prefixes and arrays. Previously, by looking at the word frequency statistics table, you can know the number of corresponding bits equal to a certain number; now by looking at the prefix and array, you can know the number of corresponding bits less than or equal to a certain number. Traverse the result array of the previous round of sorting from right to left (the first round is the original array), and then use the structure of the bucket as the nature of the queue to determine that the last element that enters the bucket will be ranked in the last position in a certain area. Overwrite the element at the corresponding position of the original array, and subtract 1 from the prefix and the corresponding position of the array. When the traversal is over, it is equivalent to simulating a round of in and out of the bucket for a certain person.

//Get the maximum number of digits in the sorted array public static int getMaxDigit ( int [] arr) { int max = Integer.MIN_VALUE; //Find the maximum value in the array for ( int i = 0 ; i <arr.length ; i ++) { if (arr[i]> max) { max = arr[i]; } } int maxDigit = 0 ; //Get the maximum number of digits while (max> 0 ) { maxDigit ++; max/= 10 ; } return maxDigit; } //Get the value of a certain number public static int getDigitValue ( int number, int digit) { for ( int i = 1 ; i <= digit- 1 ; i ++) { number/= 10 ; } return number% 10 ; } /** * Cardinality sort * @param arr sorted array * @param left left subscript * @param right right subscript * @param maxDigit The maximum number of digits in the array element */ public static void radixSort ( int [] arr, int left, int right, int maxDigit) { //radix, indicating that the sort element is a decimal number int radix = 10 ; int i = 0 , j = 0 ; //Auxiliary array, used to store the result of each round of sorting int [] help = new int [right-left + 1 ]; //What is the number of digits of the largest element, which means how many rounds need to enter and exit the bucket, d represents the current sorting What is the position for ( int d = 1 ; d <= maxDigit; d ++) { //The word frequency statistics table of a single bit of an element, frequency[i] represents the number of elements whose value of the current bit d is between 0 and i int [] frequency = new int [radix]; //Build a word frequency statistics table for ( i = left; i <= right; i ++) { j = getDigitValue(arr[i], d); frequency[j] ++; } //Construct prefix and array from word frequency statistics table for (i = 1 ; i <radix; i ++) { frequency[i] = frequency[i- 1 ] + frequency[i]; } //Traverse the array from right to left, simulate a round of sorting in and out of the bucket, and store the sorting result help for (i = right; i >= left; i --) { j = getDigitValue(arr[i], d); help[frequency[j] -1 ] = arr[i]; frequency[j] --; } for (i = left, j = 0 ; i <= right; i ++, j ++) { arr[i] = help[j]; } } } Copy code

3. Summary

Cardinality sorting starts from the low order of the sorted elements, and enters and exits the bucket sequentially with the high order. From low to high, the priority of sorting increases sequentially.

The single digit has the lowest priority and comes first. When arranging the tens digit, the priority of the single digit is reserved. When arranging hundreds of digits, the priority of single digits and tens digits is reserved.

This is the essence of cardinal sorting.

Cardinality sorting is better than counting sorting. Counting sorting requires word frequency statistics for all sorted elements in a range, which consumes a lot of additional auxiliary space, and the utilization of auxiliary space is very low. The radix sort only needs to determine the number of digits of the sorted elements, and then create a fixed number of buckets, the additional auxiliary space will not be too large, and the utilization rate is guaranteed.

The radix sort still receives the limitation of the data status, because the element of the radix sort must be a number with a hexadecimal number .

Sorting stability

1. Description

If the array is sorted from small to large [3, 2, 3, 1, 1, 2, 3, 2, 3, 1], the result of the sort must be [1, 1, 1, 2, 2, 2, 3, 3 , 3, 3]. At this time, multiple identical elements are arranged together. If [1, 1, 1], [2, 2, 2], [3, 3, 3, 3], each of the three groups of identical elements can keep [3, 2, 3, 1, 1, 2, 3, 2, 3, 1] The relative order of each element , then it can be said that the order is stable.

2. Application scenarios

The stability of sorting is meaningless for pure numbers such as 1, 2, and 3. Because the meanings of these numbers are the same, the relative order of each element after sorting is the same or different has no effect on the overall meaning represented by the entire array.

When the sorting object is a non-basic type, it can be used.

For example, the object of sorting is students, and each student has two attributes, classID and age.

The first round is sorted according to the age of the students from small to large, and the second round is sorted according to the student s classID from small to large.

If the sorting algorithm in the second round is stable, the result of the final sorting is not only the classID from small to large, but the age of students with the same classID is also from small to large.

This means that the second round of sorting operation retains the relative order of each element after the first round of sorting.

3. Choose Sort

Not stable.

The rule of selection sorting is to find the smallest element in the unsorted area in each round and place it on the left boundary of the area.

Suppose [3,3,3,3,1,3,3,3] is sorted from small to large.

When the 1 on [4] is placed on the left boundary and needs to be exchanged with [0], the 3 on [0] violates the rules of stability.

4. Bubble sort

Have stability.

The bubble sorting rule is that in each round of the unsorted area, the largest element is exchanged to the right boundary of the area through the exchange of the ratio of adjacent elements.

Suppose [2,7,4,6,1,7] is sorted from small to large.

When the 7 on [1] is adjacent to the 7 on [5] after three comparisons and three exchange operations, the comparison ends and the next round starts because the adjacent elements are equal. In this way, the three elements exchanged by [1] maintain the relative order, and the 7 on [1] and the 7 on [5] also maintain the relative order. The other numbers can be deduced by analogy.

5. Insertion sort

Have stability.

The rule of insertion sort is to expand the right boundary of the sorted area by one element to the right in each round, and then compare and exchange the new elements from right to left.

Suppose [4,7,9,4,1,6] is sorted from small to large.

When the 4 on [3] enters the sorted area, it needs to be exchanged with [1] and [2]. After the exchange, since it is equal to the 4 on [0], the comparison ends and the next round starts. In this way, the two elements exchanged by [3] keep the relative order, and the 4 on [0] and the 4 on [3] also keep the relative order. The other numbers can be deduced by analogy.

6. Merge Sort

Have stability.

The rule of merge sorting is that each round of two ordered parts are merged into one ordered part through outer sorting.

Suppose [1,1,2,2,3] and [1,2,3,3,4] are sorted from small to large.

When the first array [0] and the second array [0] are compared and equal, let [0] of the first array enter the auxiliary array first, and [0] of the second array enter the auxiliary array afterwards. Because [1] of the first array is smaller than the second array [1], it then enters the auxiliary array, so that [0] and [1] of the first array maintain the relative order. Other numbers can be deduced by analogy.

7. Quick Sort

Not stable.

The rule of quick sorting is to divide the last element of the array in each round, put the elements smaller than the element on the left of the array, put the elements larger than the element on the right of the array, and put the elements equal to the element in the middle of the array. Finally, the element is exchanged with the left boundary larger than the area to complete the sorting of the element.

Suppose [6,7,6,6,3,5] is sorted from small to large.

In the first round, the 5 on [5] is used as a division. [0], [1], [2], and [3] are all greater than 5, and the smaller area remains unchanged. When traversing to [4], 3 on [4] is less than 5, so let [4] exchange with the next one that is smaller than the area, that is, exchange [4] and [0]. In this way, the 6 on [0] violates the rules of stability.

8. Heap Sort

Not stable.

The rule of heap sorting is to construct a big root heap first in each round, and then exchange the root of the big root heap with the last node, and then separate the last node from the big root heap structure to complete the root sorting.

Suppose [5, 6, 4, 4] are sorted from small to large.

Build the big root heap first, and the array will be [5, 4, 4, 6] after the completion of the construction. Thus the 4 on [3] violates the rules of stability.

9. Cardinality Sorting

Have stability.

Because heap sorting relies on buckets for sorting, buckets are generally implemented as queues. The nature of the queue shows that the elements of the advanced queue must be out of the queue first. So cardinal sorting will not change the relative order of sorted objects.

Sort summary

1. Sorting performance summary

time complexitySpace complexitystability
Select sortO(N^2)O(1)X
Bubble SortO(N^2)O(1)
Insertion sortO(N^2)O(1)
Merge sortO(NlogN)O(N)
Quick sort (random)O(NlogN)O(logN)X
Heap sortO(NlogN)O(1)X

2. Sorting options

Under normal circumstances, you will choose quick sort to achieve the purpose of sorting. Although the index of quick sort is the same as that of merge sort and heap sort, the constant item of quick sort is the lowest after experimental verification. In other words, under the condition of sorting the same data, quick sorting is the most efficient.

If you need to use sort stability, you can choose merge sort.

If the space is strictly limited, you can choose heap sort.

3. Sorting optimization

  • Engineering optimization of sorting: use the advantages of each sort to combine together for sorting, for example, insert sorting on a small scale, and quick sorting on a large scale scheduling.
  • Stability optimization of sorting: consider whether the sorting needs to have stability in the specific implementation of the sorting algorithm.

4. Expansion issues

Question 1 : Can a sorting algorithm based on comparison reduce the time complexity to less than O(NlogN)?

Answer: Not currently.

Question 2 : Can a sorting algorithm based on comparison reduce the space complexity to less than O(N) when the time complexity is O(NlogN), and ensure stability?

Answer: No.

Question 3 : Can the space complexity of merge sort be reduced to O(1)?

Answer: Yes, you can learn about "Merge Sort Internal Cache Method" and "In-Place Merge Sort Method". The "Merge Sort Internal Cache Method" is very difficult to implement and difficult to master, and when the space complexity drops to O(1), the sort is not stable. The "in-place merge sort method" does not require the use of auxiliary arrays. When the space complexity is reduced to O(1), the time complexity of sorting will rise to O(N^2).

Question 4 : Can quicksort be stable?

Answer: Classic quick sort cannot be done, but a paper-level quick sort algorithm derived from classic quick sort can be implemented. You can learn about the paper "01 stable sort". Very difficult to understand and not easy to grasp.

Question 5 : There is an array, the elements of the array are either odd or even. Can we put all the odd numbers on the left side of the array and all the even numbers on the right side of the array, and ensure that the relative order of each element of the array remains unchanged?

Answer: The partiton of the classic quick sorting is 01 standard, and the parity problem of this topic is an adjustment strategy. However, the partition of classic fast sorting cannot be stable, so classic fast sorting cannot solve this problem. Please refer to the paper "01 stable sort" for the solution to this problem.

Question 6 : Why does the Arrays.sort method use quick sort when the input parameter is a basic type? When the incoming parameter is a custom type, use merge sort?

Answer: Stability. Because the stability of the sorting algorithm has no effect when the sorting object is a basic type, the fast sorting with a lower constant time will be used. When the sorting object is a non-basic type, the system does not know whether the business needs to use the stability of the sort, so it uses the merge sort to maintain the stability of the sort.