Enhanced Insertion Sort by Threshold Swapping

Abstract—Sorting is an essential operation that arranges data in a specific order, such as ascending or descending, for numeric and alphabetic data. Various sorting algorithms suit different situations. For applications that have incremental data and require an adaptive sorting algorithm, insertion sort is the most suitable choice, because it can deal with each new element without re-sorting the whole dataset. Insertion sort is also among the most popular sorting algorithms because of its simple and straightforward steps. However, its performance decreases on large datasets. In this paper, an algorithm is designed to empirically improve the performance of the insertion sort algorithm, especially for large datasets. The proposed approach is stable, adaptive, and very simple to translate into programming code. Moreover, it can easily be modified to obtain in-place variations while maintaining its main features. Our experimental results show that the proposed algorithm is very competitive with the classic insertion sort: the time taken to sort a given dataset was reduced by 23% on average, and the advantage grows with the size of the dataset. The algorithm requires no additional resources and does not need to sort the whole dataset every time a new element is added.

Keywords—Sorting; design of algorithm; insertion sort; enhanced insertion sort; threshold swapping


I. INTRODUCTION
Sorting is considered one of the fundamental operations and most extensively studied problems in computer science. It is one of the most frequent tasks in computing, mainly due to its direct applications in almost all areas of the field. Sorting will never be obsolete; even with the rapid development of technology, it remains very relevant and significant [1]. Formally, any sorting algorithm consists of finding a permutation (by swapping elements) of a dataset, typically stored as an array, such that the elements are organized in ascending, descending, or lexicographical (alphabetical) order. A large number of efficient sorting algorithms with different features have been proposed over the last ten years [2].
In this paper, we consider the insertion sort (IS) algorithm, one of the most popular and well-known sorting algorithms. It builds a sorted array or list by inserting elements one by one. The IS algorithm begins at the first element of the array and inserts each element encountered into its correct position (index), after locating a suitable position within the already-sorted portion. This process is repeated until the last element of the dataset is reached. Fig. 1 illustrates the classical procedure of the insertion sort algorithm, where A is an array of elements. The main side effect of the insertion step is overwriting the value stored immediately after the sorted sequence in the array.
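A minimal sketch of the classical procedure described above (a standard textbook implementation in Java, not the paper's Fig. 1 listing):

```java
import java.util.Arrays;

// Classic insertion sort: grows a sorted prefix A[0..i-1] and inserts
// each A[i] into it by shifting larger elements one slot to the right.
public class InsertionSort {
    public static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];      // element to insert into the sorted prefix
            int j = i - 1;
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j]; // shift right; overwrites the slot after the prefix
                j--;
            }
            a[j + 1] = key;      // place the key in its correct position
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 2, 4, 6, 1, 3};
        sort(a);
        System.out.println(Arrays.toString(a)); // [1, 2, 3, 4, 5, 6]
    }
}
```

Note that the inner loop shifts elements rather than performing pairwise swaps; the shift is what overwrites the slot immediately after the sorted sequence.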
The complexity of the insertion sort algorithm depends on the initial order of the array. If the array is already sorted, examining each element once yields the best case of O(n), where n is the array's size. In the worst case, each value has to be shifted through the whole sorted portion, so the cost grows quadratically with the dataset size, giving O(n²). The average case is also quadratic, though with roughly half the constant of the worst case, since each element is on average shifted about halfway through the sorted prefix.
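The quadratic worst case can be verified with a short count (our own worked derivation): on a reverse-sorted input, inserting A[i] requires comparing it against all i elements of the sorted prefix, so the total number of comparisons is

```latex
\sum_{i=1}^{n-1} i = \frac{n(n-1)}{2} = O(n^2).
```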
Note that for huge datasets the insertion sort algorithm is less efficient than advanced algorithms such as heap sort, quick sort, or merge sort. The main insertion sort procedure is iterative: each repetition takes one element and compares it with the preceding elements to find its correct place in the array. Sorting is typically done in-place, by iterating through the array and growing the sorted portion behind the current position [1].
The insertion sort algorithm is well suited to incremental, instantly and dynamically initiated data, due to its adaptive behavior. This paper proposes an enhanced algorithm that reduces the execution time of insertion sort by changing its behavior, particularly on large datasets. The proposed algorithm, called the Enhanced Insertion Sort algorithm (EIS), changes how elements are relocated within the first part of the dataset: rather than finding an element's correct position purely by comparing and swapping, a simple question is asked during execution: is the element's value less than the value at a determined threshold index? If so, the algorithm traverses only the elements below the threshold to find the element's correct position. This algorithm is explained in detail in Section III.

The structure of the paper is as follows. Section II presents a brief review of related work on improving the insertion sort algorithm. Section III describes the proposed EIS algorithm, with an explanation of its complexity, pseudo-code, implementation code, and a simple comparison between EIS and the classical IS algorithm. Section IV shows the experimental results of the proposed EIS algorithm. Finally, Section V concludes the paper.

II. RELATED WORK
The IS algorithm is considered one of the best and most flexible sorting methods despite its quadratic worst-case time complexity, mostly due to its stability, good performance, simplicity, and in-place and online nature. It is a simple iterative procedure that incrementally builds the final sorted array [2]. Several improvements to insertion sort have been suggested and implemented, as seen in [2][3][4]. Insertion sort can be simplified by using an external element, known as a sentinel value [5]. A bidirectional approach was proposed in [6]; it consists of two steps: the first compares the first and last elements and swaps them if the first is larger, and the second takes two adjacent elements from the beginning of the array and compares them as well. Abbasi and Dahiya [7] proposed a bidirectional approach that supposes there are two sorted parts, on the left and on the right, to minimize the shifting process, so that an element does not have to shift through the whole array. Patel et al. [8] presented an approach that inserts elements from the middle of a dataset and applies bidirectional sorting, using arrays as the data structure. Paira et al. [9] proposed an approach that applies a dual scan from both directions, locating the position from both sides. Sodhi et al. [10] present a binary insertion sort that achieves a time complexity of O(n^1.585) in some average cases by reducing the number of comparisons: it starts with the middle element, which reduces the number of swaps needed, determines the suitable location for each element, then chooses one direction, left or right, and appends the element to another array. Khairullah [11] presented an approach that keeps track of both directions; it also starts from the middle element's location and compares elements against the middle element.
Some approaches are not bidirectional; for example, [12] implements an algorithm that targets the worst-case input of insertion sort, a reverse-ordered dataset.
Although bidirectional methods are efficient compared with the classical insertion sort, they require the complete dataset, and hence its size, to be known before sorting begins. Consequently, these approaches cannot be used in applications with incoming incremental data.

III. METHODOLOGY
This section gives a detailed explanation of the proposed EIS algorithm in subsection III.A and analyzes its complexity in subsection III.B. Subsections III.C and III.D present the pseudo-code and the implementation code of the EIS algorithm. Finally, subsection III.E illustrates a simple comparison between the EIS and IS algorithms.

A. EIS Algorithm Method
In the EIS algorithm, the enhancement occurs when the algorithm behaves differently, specifically when the value of the selected element is lower than the value at a given threshold index. The threshold is defined as an index into the already-sorted part of the array A at a particular step of the algorithm: threshold = ⌊i/3⌋, where i is the index of the element currently selected to be sorted and 0 < i ≤ n. Note that when element i is selected, all elements from A[0] to A[i-1] are already fully sorted, following the behavior of the original insertion sort. The threshold is therefore one third of the sorted portion, and it changes dynamically as the algorithm sorts the elements one by one, since i increases by 1 in each iteration. The ratio ⅓ was chosen after several experiments, which concluded that it gives the best running time. There is no analytical way to determine the optimal ratio; it must be found by testing a range of values and measuring the performance. This is further discussed in Section IV.
The functionality of the proposed algorithm is the same as the insertion sort process, but before comparing and swapping the selected element A[i], it asks a question: is the value of A[i] less than the value at the threshold index? If so, the EIS algorithm searches for the correct index within the segment (block) of elements whose values are below the value at the threshold index, moves the selected element to that index, and shifts the other elements to the right. This reduces the number of comparisons and swaps, and the reduction grows as the size of the array increases. If the value of the element is not lower than the value at the threshold index, EIS behaves like the original insertion sort for that element. Fig. 2 illustrates the procedure of the proposed Enhanced Insertion Sort (EIS). The threshold changes dynamically with the size of the sorted portion as the algorithm sorts the elements one by one. In each step, the algorithm examines whether the value of the selected element is lower than the value at the threshold index. In steps 6 and 9, the algorithm searches within the block below the threshold, places the element at its suitable index, and shifts the remaining elements. For example, in step 9 the threshold = 3: the number of comparisons would be 9 with the original IS, but with the proposed EIS it is reduced to 3, as shown in Table I.
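The threshold step described above can be sketched as follows. This is our reading of the paper's description, not the authors' Fig. 4 listing; in particular, the left-to-right search over the block below the threshold is an assumption consistent with the comparison counts reported in Table I:

```java
// Sketch of the EIS idea: the threshold index t = i/3 points into the
// sorted prefix A[0..i-1]; if A[i] is smaller than A[t], its final
// position must lie in A[0..t], so only that small block is searched
// instead of scanning the whole prefix from the right.
public class EnhancedInsertionSort {
    public static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int t = i / 3;                       // dynamic threshold index
            if (key < a[t]) {
                // Search the block A[0..t] for the first element > key.
                int pos = 0;
                while (pos <= t && a[pos] <= key) pos++;
                // Shift A[pos..i-1] one step right, then place the key.
                System.arraycopy(a, pos, a, pos + 1, i - pos);
                a[pos] = key;
            } else {
                // Fall back to the classic insertion step.
                int j = i - 1;
                while (j >= 0 && a[j] > key) { a[j + 1] = a[j]; j--; }
                a[j + 1] = key;
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {9, 3, 7, 1, 8, 2, 6, 5, 4, 10};
        sort(a);
        System.out.println(java.util.Arrays.toString(a)); // [1, 2, ..., 10]
    }
}
```

Both branches insert after equal elements, so the sketch preserves the stability of classic insertion sort.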

B. The Complexity of the EIS Algorithm
As shown in Fig. 3, the cost of execution is reduced toward n²/3, as the algorithm relies heavily on the threshold procedure to shrink the search space. Even with a cost of the form 2(n²/3) + 3n² + 4n, the asymptotic complexity remains O(n²); however, the results empirically show more efficient behavior.

The pseudo-code of the proposed enhanced insertion sort algorithm is shown in Fig. 3, and Fig. 4 shows its Java source code. Furthermore, the full implementation of the EIS algorithm can be downloaded and run from GitHub: https://github.com/muhyidean/EnhancedInsertionSort-ThresholdSwapping.

Table I shows a detailed comparison between insertion sort (IS) and the proposed algorithm (EIS) using the same dataset as Fig. 2. As shown in Table I, the number of comparisons with the proposed EIS is lower: the IS algorithm needs 45 comparisons, while EIS needs 34. As a particular example, in the iteration of step 9 (i = 9), the original IS needs 9 comparisons, but EIS reduces this to 3, reflecting the one-third (⅓) threshold, as shown in Table I.
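The comparison figures can also be sanity-checked mechanically. The sketch below (our own instrumentation, with our own counting convention: every element-vs-element test, including the threshold test, counts once, so the EIS total need not match Table I exactly) counts comparisons for both algorithms on a reverse-sorted 10-element array; the classic IS total of 45 matches n(n-1)/2 for n = 10:

```java
// Counts element comparisons for classic IS and for a sketch of EIS.
public class ComparisonCount {
    public static int classicComparisons(int[] a) {
        int count = 0;
        for (int i = 1; i < a.length; i++) {
            int key = a[i], j = i - 1;
            while (j >= 0) {
                count++;                 // one element comparison
                if (a[j] <= key) break;
                a[j + 1] = a[j]; j--;
            }
            a[j + 1] = key;
        }
        return count;
    }

    public static int eisComparisons(int[] a) {
        int count = 0;
        for (int i = 1; i < a.length; i++) {
            int key = a[i], t = i / 3;
            count++;                     // threshold test: key vs a[t]
            if (key < a[t]) {
                int pos = 0;
                while (pos <= t) {       // search only the block A[0..t]
                    count++;
                    if (a[pos] > key) break;
                    pos++;
                }
                System.arraycopy(a, pos, a, pos + 1, i - pos);
                a[pos] = key;
            } else {
                int j = i - 1;
                while (j >= 0) {
                    count++;
                    if (a[j] <= key) break;
                    a[j + 1] = a[j]; j--;
                }
                a[j + 1] = key;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] r1 = {10, 9, 8, 7, 6, 5, 4, 3, 2, 1}, r2 = r1.clone();
        System.out.println("IS:  " + classicComparisons(r1)); // 45
        System.out.println("EIS: " + eisComparisons(r2));     // 18
    }
}
```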

IV. EXPERIMENTAL RESULTS

A. The Implementation Procedure
The proposed EIS algorithm was implemented using the Java programming language. In order to evaluate its performance, three algorithms were run and compared: the classical IS algorithm and the proposed EIS algorithm with thresholds of ⅓ and ¼. The evaluations were performed on a machine with a 2.8 GHz Intel Core i5 processor and 4 GB of 1600 MHz DDR3 memory, running Windows. The experiments used empirical data (integer numbers) generated randomly in Java.
To verify that the same data is examined in each execution, a random dataset is generated and copied into three separate arrays. Each algorithm is then applied to its own copy, and the time taken (in milliseconds) to complete the sorting process is recorded. Twenty random datasets were utilized, to assure that the results hold across different data arrangements, and the average execution times are reported. The dataset sizes were 10000, 50000, 100000, and 500000.

Table II shows the overall performance results of the employed algorithms on the different datasets. As shown in Table II, the results of the classical IS algorithm are consistent across datasets, which is expected since its computational behavior does not change. Further, the results in Table II emphasize that the proposed EIS algorithm, with thresholds of both ¼ and ⅓, outperforms the classical IS algorithm; with a threshold of ⅓ the results are superior. Consequently, ⅓ constitutes an appropriate threshold for the EIS algorithm.
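The timing procedure above can be sketched as follows. The class and method names, the fixed seed, and the use of `Arrays::sort` as a stand-in sorter are our illustrative assumptions, not the paper's harness:

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.Consumer;

// Benchmark sketch: each sorter receives its own copy of the same random
// dataset, so all algorithms are timed on identical input.
public class SortBenchmark {
    public static long timeSortMillis(Consumer<int[]> sorter, int[] data) {
        int[] copy = Arrays.copyOf(data, data.length); // identical input per run
        long start = System.nanoTime();
        sorter.accept(copy);
        return (System.nanoTime() - start) / 1_000_000; // elapsed milliseconds
    }

    public static void main(String[] args) {
        // Fixed seed for repeatability; 10000 is one of the paper's sizes.
        int[] data = new Random(42).ints(10_000).toArray();
        // Arrays::sort stands in here for the IS / EIS implementations.
        long ms = timeSortMillis(Arrays::sort, data);
        System.out.println("sorted " + data.length + " ints in " + ms + " ms");
    }
}
```

In practice each of the three algorithms would be passed in turn to `timeSortMillis` with the same `data`, and the run repeated over the twenty datasets to obtain the averages.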

B. Experimental Results and Discussion
The results also show that, as the size of the dataset increases, the performance gains improve. Figs. 5 to 7 illustrate the performance of the proposed EIS, which is best when the threshold equals ⅓ and the dataset is large. As Fig. 5 shows, the execution times of the EIS algorithm are much improved. Although the analysis in Section III.B states that EIS remains O(n²), the results empirically demonstrate better performance. Figs. 6 and 7 show similar results: the larger the dataset, the better the EIS algorithm performs. Overall, across dataset sizes and threshold values, the EIS algorithm empirically outperforms the classical IS algorithm.

To further compare the employed algorithms, the average, maximum, and minimum execution times for each dataset size are reported. As Table III shows, the classical IS reported the slowest average execution times. With a threshold of ¼, the EIS algorithm has a higher maximum and a lower minimum than with ⅓, indicating that a threshold of ⅓ gives more consistent and efficient results. The lowest average execution times are highlighted to mark the most efficient performance. In conclusion, the EIS algorithm with a threshold of ⅓ demonstrates the best average performance, as highlighted in Table III.

The improvements of the proposed EIS over the IS algorithm are also compared in terms of average execution time. Table IV shows that the proposed EIS algorithm outperforms the classical IS algorithm, with an average improvement of 23% in execution time.
The main reason for the significant improvement in performance is that the threshold procedure of the EIS algorithm reduces the number of comparisons and swaps needed to complete the sorting procedure.

C. Source Code of Implementation

Due to space limitations, the source code of the proposed EIS algorithm is available on GitHub (https://github.com/muhyidean/EnhancedInsertionSort-ThresholdSwapping).

V. CONCLUSION
Insertion sort is a suitable sorting algorithm for incremental, instantly and dynamically initiated data. Yet its cost grows quadratically as the data size increases, making it inefficient for large datasets. In this paper, an enhancement of the IS algorithm, named the enhanced insertion sort (EIS), was proposed to improve the running time of the IS algorithm by changing its behavior.
The proposed algorithm reduces the number of comparisons and swaps needed to complete the sorting procedure. After executing the algorithm and comparing it with the classical insertion sort algorithm, there was an improvement of 23% in the execution time taken to complete the sorting process. The worst-case complexity remains O(n²), but the reported results are empirically promising. The efficiency of the proposed EIS algorithm is attributable to the reduction in the number of comparisons during the sorting process.