/*****************************************************************************
* Author: Isai Damier
* Title: Bucketsort
* Project: geekviewpoint
* Package: algorithms
*
* Statement:
* Given a disordered list of integers, rearrange them in natural order.
*
* Sample Input: {8,5,3,1,9,6,0,7,4,2,5}
* Sample Output: {0,1,2,3,4,5,6,7,8,9,5}
*
* Time Complexity of Solution:
* Best Case O(n); Average Case O(n); Worst Case O(n).
*
* Approach:
* If it sounds too good to be true, then most likely it's not true.
* Bucketsort is not an exception to this adage. For bucketsort to work at
* its blazing efficiency, there are multiple prerequisites. First the
* hash function that is used to partition the elements need to be very
* good and must produce ordered hash: if i < k then hash(i) < hash(k).
* Second, the elements to be sorted must be uniformly distributed.
*
* The aforementioned aside, bucket sort is actually very good considering
* that counting sort is reasonably speaking its upper bound. And counting
* sort is very fast. The particular distinction for bucket sort is that
* it uses a hash function to partition the keys of the input array, so
* that multiple keys may hash to the same bucket. Hence each bucket must
* effectively be a growable list; similar to radix sort.
*
* Numerous Internet sites, including university pages, have erroneously
* written counting sort code and call them bucket sort. Bucket sort uses
* a hash function to distribute keys; counting sort creates a bucket for
* each key. Indeed there are perhaps greater similarities between radix
* sort and bucket sort, than there are between counting sort and bucket sort.
*
* In the presented program Java's Collections.sort(C) is used to sort each
* bucket. This is to inculcate that the bucket sort algorithm does not
* specify which sorting technique to use on the buckets. A programmer may
* choose to continuously use bucket sort on each bucket until the
* collection is sorted (in the manner of the radix sort program below).
* Whichever sorting method is used on the buckets, bucket sort still
* tends toward O(n).
*
****************************************************************************/
public void bucketsort(int[] input) {
//get hash codes
final int[] code = hash(input);
//create and initialize buckets to ArrayList: O(n)
List<Integer>[] buckets = new List[code[1]];
for (int i = 0; i < code[1]; i++) {
buckets[i] = new ArrayList<Integer>();
}
//distribute data into buckets: O(n)
for (int i : input) {
buckets[hash(i, code)].add(i);
}
/**
* Sort each bucket: O(n).
* I mentioned above that the worst case for bucket sort is counting
* sort. That's because in the worst case, bucket sort may end up
* with one bucket per key. In such case, sorting each bucket would
* take 1^2 = O(1). Even after allowing for some probabilistic
* variance, to sort each bucket would still take 2-1/n, which is
* still a constant. Hence, sorting all the buckets takes O(n).
***/
for (List bucket : buckets) {
Collections.sort(bucket);
}
int ndx = 0;
//merge the buckets: O(n)
for (int b = 0; b < buckets.length; b++) {
for (int v : buckets[b]) {
input[ndx++] = v;
}
}
}
private int[] hash(int[] input) {
int m = input[0];
for (int i = 1; i < input.length; i++) {
if (m < input[i]) {
m = input[i];
}
}
return new int[]{m, (int) Math.sqrt(input.length)};
}
private int hash(int i, int[] code) {
return (int) ((double) i / code[0] * (code[1] - 1));
}
import org.junit.Test;
import static org.junit.Assert.*;
public class SortingTest {
@Test
public void testBucketsort() {
System.out.println(""bucketsort"");
int[] A = {8, 5, 3, 1, 9, 6, 0, 7, 4, 2, 5};
Sorting instance = new Sorting();
instance.bucketsort(A);
for (int i = 1; i < A.length; i++) {
if (A[i - 1] > A[i]) {
fail(""bucketsort method fails."");
}
}
}
}