# Bucket Sort In Pythonby Isai Damier, Android Engineer @ Google

```#=======================================================================
#  Author: Isai Damier
#  Title: Bucketsort
#  Project: geekviewpoint
#  Package: algorithms
#
#  Statement:
#  Given a disordered list of integers, rearrange them in natural
#     order.
#
#  Sample Input: [8,5,3,1,9,6,0,7,4,2,5]
#  Sample Output: [0,1,2,3,4,5,6,7,8,9,5]
#
#  Time Complexity of Solution:
#  Best Case O(n); Average Case O(n); Worst Case O(n).
#
# Approach:
# If it sounds too good to be true, then most likely it's not true.
# Bucketsort is not an exception to this adage. For bucketsort to
#   work at its blazing efficiency, there are multiple prerequisites.
#   First the hash function that is used to partition the elements need
#   to be very good and must produce ordered hash: if i < k then
#   hash(i) < hash(k). Second, the elements to be sorted must be
#   uniformly distributed.
#
# The aforementioned aside, bucket sort is actually very good
#   considering that counting sort is reasonably speaking its upper
#   bound. And counting sort is very fast. The particular distinction
#   for bucket sort is that it uses a hash function to partition the
#   keys of the input array, so that multiple keys may hash to the same
#   bucket. Hence each bucket must effectively be a growable list;
#
# Numerous Internet sites, including university pages, have
#   erroneously written counting sort code and call them bucket sort.
#   Bucket sort uses a hash function to distribute keys; counting sort
#   creates a bucket for each key. Indeed there are perhaps greater
#   similarities between radix sort and bucket sort, than there are
#   between counting sort and bucket sort.
#
# In the presented program insertionsort is used to sort
#   each bucket. This is to inculcate that the bucket sort algorithm
#   does not specify which sorting technique to use on the buckets.
#   A programmer may choose to continuously use bucket sort on each
#   bucket until the collection is sorted (in the manner of the radix
#   sort program below). Whichever sorting method is used on the
#   buckets, bucket sort still tends toward O(n).
#=======================================================================
def bucketsort( A ):
# get hash codes
code = hashing( A )
buckets = [list() for _ in range( code[1] )]
# distribute data into buckets: O(n)
for i in A:
x = re_hashing( i, code )
buck = buckets[x]
buck.append( i )

# Sort each bucket: O(n).
# I mentioned above that the worst case for bucket sort is counting
# sort. That's because in the worst case, bucket sort may end up
# with one bucket per key. In such case, sorting each bucket would
# take 1^2 = O(1). Even after allowing for some probabilistic
# variance, to sort each bucket would still take 2-1/n, which is
# still a constant. Hence, sorting all the buckets takes O(n).

for bucket in buckets:
insertionsort( bucket )

ndx = 0
# merge the buckets: O(n)
for b in range( len( buckets ) ):
for v in buckets[b]:
A[ndx] = v
ndx += 1

import math

def hashing( A ):
m = A[0]
for i in range( 1, len( A ) ):
if ( m < A[i] ):
m = A[i]
result = [m, int( math.sqrt( len( A ) ) )]
return result

def re_hashing( i, code ):
return int( i / code[0] * ( code[1] - 1 ) )

```
```import unittest
from algorithms import sorting

class Test( unittest.TestCase ):

def testBucketsort( self ):
A = [8, 5, 3, 1, 9, 6, 0, 7, 4, 2, 5]
sorting.bucketsort( A )
for i in range( 1, len( A ) ):
if A[i - 1] > A[i]:
self.fail( "bucketsort method fails." )
```