Monday, October 22, 2007

Sorting techniques and algorithms (1)

Have an array you need to put in order? Keeping business records and want to sort them by ID number or last name of client? Then you'll need a sorting algorithm. To understand the more complex and efficient sorting algorithms, it's important to first understand the simpler, but slower algorithms. In this article, I'll talk about the 3 algorithms we studied at college which are bubble sort, including a modified bubble sort that's slightly more efficient; insertion sort; and selection sort. Any of these sorting algorithms are good enough for most small tasks, though if you were going to process a large amount of data, you would want to choose one of the sorting algorithms that I'll talk about in the next posts isA.

Bubble Sort:
The bubble sort is the oldest, simplest and generally considered to be the most inefficient sorting algorithm in common usage. Unfortunately, it's also the slowest. It works by comparing each item in the list with the item next to it, and swapping them if required. The algorithm repeats this process until it makes a pass all the way through the list without swapping any items (in other words, all items are in the correct order). This causes larger values to "bubble" to the end of the list while smaller values "sink" towards the beginning of the list.

Since the worst case scenario is that the array is in reverse order, and that the first element in sorted array is the last element in the starting array, the most exchanges that will be necessary is equal to the length of the array. Here is a simple example:

Given an array 23154 a bubble sort would lead to the following sequence of partially sorted arrays: 21354, 21345, 12345. First the 1 and 3 would be compared and switched, then the 4 and 5. On the next pass, the 1 and 2 would switch, and the array would be in order.

The basic code for bubble sort for sorting an integer array using C looks like this:

A better version of bubble sort, known as modified bubble sort, includes a flag that is set if an exchange is made after an entire pass over the array. If no exchange is made, then it should be clear that the array is already in order because no two elements need to be switched. In that case, the sort should end. The new best case order for this algorithm is O(n), as if the array is already sorted, then no exchanges are made. After modifying the code it will be like this:

In part two I will talk about insertion sort and selection sort. Please post a comment if you have any questions :)