I was never able to pull off a wheelie with the bike, or to do a walk-the-dog with the yoyo. As many of us have learned at the playground, not knowing these tricks usually means that you won’t be able to hang around with the bigger kids. Not even if you want it so, so much. Fast-forward a few decades and, as it happens, you are looking for a job in data science. The bigger kids have transformed into companies that do awesome things like convolutional neural networks and reinforcement learning. The playground logic, however, still generally applies. Unless you know the right programming stunts, you are unlikely to ace interviews with those amazing companies. And what’s worse, you still can’t pull off a wheelie.

An essential coding trick to learn in preparation for data-science job interviews is sorting algorithms. If you are not familiar with these, and are looking for a gentle introduction, you have landed in the right blog post. Hopefully the code will not show it, but I am a beginner too. Indeed, this blog post originated as an exercise, where I challenged myself to program every algorithm described in 0612’s YouTube video series *Sorting Algorithms Redux*. Encouraged by ASI Data Science, I turned that piece of work into a brief tutorial, adding some text and carefully revising the code. I have learned a lot both from 0612’s videos and from several comments offered by Pascal Bugnion regarding how to correctly implement the algorithms. Let me take the opportunity to thank them, and to inform the reader that any mistake is entirely my own. If you are getting ready for coding interviews, this blog post should be useful, particularly if you are aiming high!

Two side-comments are in order. (i) The source (Jupyter) notebook for this blog post can be downloaded here. (ii) Since I have avoided external packages, the code relies purely on built-in functions. The sole exception is NumPy, Python's main package for scientific computing, which I use only to test the output of the algorithms.

```
# NumPy will be used only to test the code
import numpy as np
```

## 00. Introduction

## 01. Time complexity

## 02. Selection sort

```
def argmin(array):
    # Get the index of the smallest
    # element in the array
    arg = 0
    for i in range(1, len(array)):
        if array[i] < array[arg]:
            arg = i
    return arg

def selection_sort(array):
    n = len(array)
    for i in range(n):
        j = argmin(array[i:n]) + i
        array[i], array[j] = array[j], array[i]
    return array
```

### 02.a Checks

Input:

```
selection_sort([]) == []
```

Output:

```
True
```

Input:

```
array = list(np.random.randint(100, size=120))
np.array_equal(selection_sort(array), np.sort(array))
```

Output:

```
True
```

## 03. Bubble sort

```
def bubble_sort(array):
    n = len(array)
    for i in range(n):
        changed = False
        for j in range(n-i-1):
            if array[j] > array[j+1]:
                array[j], array[j+1] = array[j+1], array[j]
                changed = True
        if not changed:
            break
    return array
```

### 03.a Checks

Input:

```
bubble_sort([]) == []
```

Output:

```
True
```

Input:

```
array = [i for i in range(10)]
bubble_sort(array) == array
```

Output:

```
True
```

Input:

```
array = list(np.random.randint(100, size=120))
np.array_equal(bubble_sort(array), np.sort(array))
```

Output:

```
True
```

## 04. Cocktail sort

```
# Sort while moving forward or backward
def partly_sort(array, i, forward):
    n = len(array)
    if forward:
        # Range of forward traverse
        interval = range(i, n-i-1)
    else:
        # Range of backward traverse
        interval = reversed(range(i, n-i-2))
    # Partly sort
    changed = False
    for j in interval:
        if array[j] > array[j+1]:
            array[j], array[j+1] = array[j+1], array[j]
            changed = True
    return (array, changed)

# Cocktail sort
def cocktail_sort(array):
    n = len(array)
    for i in range(n):
        (array, changed) = partly_sort(array, i, True)
        if not changed:
            break
        (array, changed) = partly_sort(array, i, False)
        if not changed:
            break
    return array
```

### 04.a Checks

Input:

```
cocktail_sort([]) == []
```

Output:

```
True
```

Input:

```
array = [i for i in range(10)]
cocktail_sort(array) == array
```

Output:

```
True
```

Input:

```
array = list(np.random.randint(100, size=120))
np.array_equal(cocktail_sort(array), np.sort(array))
```

Output:

```
True
```

## 05. Insertion sort

```
def insertion_sort(array):
    n = len(array)
    for i in range(n-1):
        j = i
        while (j+1 > 0) and (array[j] > array[j+1]):
            array[j], array[j+1] = array[j+1], array[j]
            j -= 1
    return array
```

### 05.a Checks

Input:

```
insertion_sort([]) == []
```

Output:

```
True
```

Input:

```
array = list(np.random.randint(100, size=120))
np.array_equal(insertion_sort(array), np.sort(array))
```

Output:

```
True
```

## 06. Bucket sort

```
def bucket_sort(array, mini, maxi, b, basic_sort=bubble_sort):
    n = len(array)
    # Safety checks
    if n == 0:
        return []
    elif n < b:
        raise ValueError('Number of buckets exceeds length of array.')
    # Bucketing
    step = float(maxi - mini)/b
    buckets = [[] for i in range(b)]
    result = []
    for value in array:
        if value == maxi:
            # Range for last bucket
            # includes both endpoints
            j = b - 1
        else:
            # Ranges for other buckets
            # include left endpoint only
            j = int((value - mini)/step)
        buckets[j].append(value)
    # Sorting and recombining
    for bucket in buckets:
        result.extend(basic_sort(bucket))
    return result
```

### 06.a Checks

Input:

```
mini = 0
maxi = 20
array = np.random.randint(0, 21, size=20)
np.array_equal(bucket_sort(array, mini, maxi, 10), np.sort(array))
```

Output:

```
True
```

Input:

```
mini = 0.
maxi = 20.
array = np.random.uniform(mini, maxi, size=20)
np.array_equal(bucket_sort(array, mini, maxi, 10), np.sort(array))
```

Output:

```
True
```

## 07. Recursion

```
# Get the nth Fibonacci number
def get_fibonacci(n):
    if n <= 2:
        return 1
    else:
        return get_fibonacci(n-1) + get_fibonacci(n-2)
```

Test the function `get_fibonacci(n)`. Input:

```
[get_fibonacci(i) for i in range(1, 10)]
```

Output:

```
[1, 1, 2, 3, 5, 8, 13, 21, 34]
```

### 07.a Aside

Even though `get_fibonacci(n)` illustrates well the mechanics of recursion, it is not an efficient way of generating Fibonacci numbers. One can verify this using the diagram below, which lists the function calls `f(n)` = `get_fibonacci(n)` required to compute the 6th Fibonacci number. It is evident that many quantities are calculated repeatedly. For example, the whole subtree leading to `f(4)` is evaluated twice. These unnecessary operations are wasteful, and make `get_fibonacci(n)` very inefficient. Indeed, it is easy to deduce from the structure of the diagram that the function has exponential time complexity.
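To see the waste concretely, we can count the calls directly. The helper below is my own illustration (the name `count_calls` is not part of the original code): it mirrors the recursion of `get_fibonacci(n)` while tallying how many function calls occur.

```
# Illustrative helper: count the function calls
# made by the naive recursive Fibonacci routine
def count_calls(n):
    calls = 1  # count this call
    if n > 2:
        calls += count_calls(n - 1) + count_calls(n - 2)
    return calls

print([count_calls(i) for i in range(1, 10)])
# [1, 1, 3, 5, 9, 15, 25, 41, 67]
```

The tally works out to `2*F(n) - 1` calls for the nth Fibonacci number `F(n)`, so the amount of work grows like the Fibonacci numbers themselves, i.e. exponentially in `n`.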

It may now look like recursion ought to be abandoned as a technique for calculating Fibonacci numbers. This impression, however, turns out to be mistaken. Indeed, fixing the problems that weigh `get_fibonacci(n)` down results in an efficient algorithm, whose time complexity is `O(n)`. As demonstrated in `get_fibonacci_memo(n)` below, the key is to store the intermediate function values as they are calculated. To be specific, the dictionary `lookup` keeps track of all previously encountered Fibonacci numbers, making sure that no time is wasted in computing these again. Notably, the trick of saving partial results is called memoization, and has a central role in dynamic programming.

```
lookup = {}

def get_fibonacci_memo(n):
    if n <= 2:
        return 1
    elif n in lookup:
        return lookup[n]
    else:
        lookup[n] = get_fibonacci_memo(n - 1) + get_fibonacci_memo(n - 2)
        return lookup[n]
```

Test the function `get_fibonacci_memo(n)`. Input:

```
[get_fibonacci_memo(i) for i in range(1,10)]
```

Output:

```
[1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Memoization reduces time complexity by increasing the space complexity of the algorithm. In other words, a speedup is obtained at the cost of some memory, namely that occupied by `lookup`. This can be regarded as a drawback of memoization, but not in all cases. If the program calculates Fibonacci numbers frequently, having the `lookup` table becomes an advantage. Every time `get_fibonacci_memo(n)` is called (for a new value of `n`), the program can refer to the available intermediate results, and thus offer a quicker answer. These remarks are valid beyond the current example with Fibonacci numbers, and apply to memoization as a general coding technique.

In the above code, you may have spotted that, even though `lookup` is defined outside `get_fibonacci_memo(n)`, it is accessed and modified inside the function. This works because the dictionary `lookup` is a global variable, which can be seen from any location in the program, including from inside a function, where variables are normally local. Global variables, however, carry many dangers. For instance, as they can be modified from any line of code, unwanted or incorrect changes become more likely and harder to track down. A common and useful piece of advice is, in fact, to steer clear of global variables where possible. These arguments motivate the following variant of the memoized algorithm to calculate Fibonacci numbers, in which a class encapsulates `lookup`, so that no global variables are used.

```
class Fibonacci(object):
    def __init__(self):
        self._lookup = {}

    def __call__(self, n):
        if n <= 2:
            return 1
        elif n in self._lookup:
            return self._lookup[n]
        else:
            self._lookup[n] = self.__call__(n - 1) + self.__call__(n - 2)
            return self._lookup[n]
```

Test the class `Fibonacci()`. Input:

```
fn = Fibonacci()
[fn(i) for i in range(1, 10)]
```

Output:

```
[1, 1, 2, 3, 5, 8, 13, 21, 34]
```
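As a closing remark on memoization: Python's standard library can take care of the bookkeeping for us. The sketch below is my own addition (the name `fib` is illustrative); it uses `functools.lru_cache` to memoize the naive recursive definition automatically, achieving the same effect as the `lookup` dictionary without globals or a hand-written class.

```
from functools import lru_cache

@lru_cache(maxsize=None)  # cache every result, no eviction
def fib(n):
    if n <= 2:
        return 1
    return fib(n - 1) + fib(n - 2)

print([fib(i) for i in range(1, 10)])
# [1, 1, 2, 3, 5, 8, 13, 21, 34]
```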

## 08. Quick sort

```
def partition(array):
    n = len(array)
    # Sorting
    l = 0
    r = n - 1
    pointer = (1, 0)
    while l < r:
        if array[l] > array[r]:
            array[l], array[r] = array[r], array[l]
            pointer = pointer[::-1]
        l += pointer[0]
        r -= pointer[1]
    return (array, l)

def quick_sort(array):
    n = len(array)
    if n > 1:
        (array, p) = partition(array)
        array[:p] = quick_sort(array[:p])
        array[p+1:] = quick_sort(array[p+1:])
    return array
```

### 08.a Checks

Input:

```
quick_sort([]) == []
```

Output:

```
True
```

Input:

```
array = list(np.random.randint(100, size=120))
np.array_equal(quick_sort(array), np.sort(array))
```

Output:

```
True
```

## 09. Quick sort in real life

## 10. Merging

```
def get_combo(left, right):
    combo = []
    i = 0
    j = 0
    # Both left[i:] and right[j:] contain elements
    while left[i:] and right[j:]:
        if left[i] < right[j]:
            combo.append(left[i])
            i += 1
        else:
            combo.append(right[j])
            j += 1
    # Either left[i:] or right[j:] is empty
    combo.extend(left[i:])
    combo.extend(right[j:])
    return combo
```

### 10.a Checks

Input:

```
get_combo([],[]) == []
```

Output:

```
True
```

Input:

```
get_combo([0,1],[]) == [0,1]
```

Output:

```
True
```

Input:

```
get_combo([],[0,1]) == [0,1]
```

Output:

```
True
```

Input:

```
get_combo([1, 2, 3, 4, 5, 6, 7], [1, 2, 4, 5, 6])
```

Output:

```
[1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 6, 7]
```

## 11. Merge sort

```
def merge_sort(array):
    n = len(array)
    if n > 1:
        i = int(n/2)
        left = merge_sort(array[:i])
        right = merge_sort(array[i:])
        return get_combo(left, right)
    else:
        return array
```

### 11.a Checks

Input:

```
merge_sort([]) == []
```

Output:

```
True
```

Input:

```
array = list(np.random.randint(100, size=120))
np.array_equal(merge_sort(array), np.sort(array))
```

Output:

```
True
```

## 12. Wrap-up

## 13. Credits

**Opening image**: J.L. Haven and C. Hettrick, "Bandelore," U.S. Patent 59,745, Nov. 20, 1866.

**Video lectures**: 0612, Educational technologist, Singapore.

**Folk-dance video**: Algo-rythmics, Sapientia University, Târgu Mureș, Romania.

**Coding**: Alberto Favaro with advice from Pascal Bugnion.