Coding stunts, job interviews, and sorting algorithms

I was never able to pull off a wheelie with the bike, or to do a walk-the-dog with the yoyo. As many of us have learned at the playground, not knowing these tricks usually means that you won’t be able to hang around with the bigger kids. Not even if you want it so, so much. Fast-forward a few decades and, as it happens, you are looking for a job in data science. The bigger kids have transformed into companies that do awesome things like convolutional neural networks and reinforcement learning. The playground logic, however, still generally applies. Unless you know the right programming stunts, you are unlikely to ace interviews with those amazing companies. And what’s worse, you still can’t pull off a wheelie.

An essential coding trick to learn in preparation for data-science job interviews is sorting algorithms. If you are not familiar with these, and are looking for a gentle introduction, you have landed in the right blog post. Hopefully the code will not show it, but I am a beginner too. Indeed, this blog post originated as an exercise, where I challenged myself to program every algorithm described in 0612’s YouTube video series Sorting Algorithms Redux. Encouraged by ASI Data Science, I turned that piece of work in a brief tutorial, adding some text and watchfully revising the code. I have learned a lot both from 0612’s videos and from several comments offered by Pascal Bugnion re. how to correctly implement the algorithms. Let me take the opportunity to thank them, and to inform the reader that any mistake is entirely my own. If you are getting ready for coding interviews, this blog post should be useful, particularly if you are aiming high!

Two side-comments are in order. (i) The source (Jupyter) notebook for this blog post can be downloaded here. (ii) As I have avoided using further packages, the code relies purely on in-built functions. The unique exception is NumPy, Python's main package for scientific calculations, which I employed for testing the output of algorithms.

# NumPy will be used only to test the code
import numpy as np  

00. Introduction

01. Time complexity

02. Selection sort

def argmin(array):  
    # Get the index of the smallest
    # element in the array
    arg = 0
    for i in range(1, len(array)):
        if array[i] < array[arg]:
            arg = i
    return arg

def selection_sort(array):  
    n = len(array)
    for i in range(n):
        j = argmin(array[i:n]) + i
        array[i], array[j] = array[j], array[i]
    return array

02.a Checks

Input:

selection_sort([]) == []  

Output:

True  

Input:

array = list(np.random.randint(100, size=120))  
np.array_equal(selection_sort(array), np.sort(array))  

Output:

True  

03. Bubble sort

def bubble_sort(array):  
    n = len(array)
    for i in range(n):
        changed = False
        for j in range(n-i-1):
            if array[j] > array[j+1]:
                array[j], array[j+1] = array[j+1], array[j]
                changed = True
        if not changed:
            break
    return array

03.a Checks

Input:

bubble_sort([]) == []  

Output:

True  

Input:

array = [i for i in range(10)]  
bubble_sort(array) == array  

Output:

True  

Input:

array = list(np.random.randint(100, size=120))  
np.array_equal(bubble_sort(array), np.sort(array))  

Output:

True  

04. Cocktail sort

# Sort while moving forward or backward
def partly_sort(array, i, forward):  
    n = len(array)
    # Range of forward traverse
    if forward == True:
        interval = range(i, n-i-1)
    # Range of backward traverse
    elif forward == False:
        interval = reversed(range(i, n-i-2))
    # Partly sort
    changed = False
    for j in interval:
        if array[j] > array[j+1]:
            array[j], array[j+1] = array[j+1], array[j]
            changed = True
    return (array, changed)

# Cocktail sort
def cocktail_sort(array):  
    n = len(array)
    for i in range(n):
        (array, changed) = partly_sort(array, i, True)
        if not changed:
            break
        (array, changed) = partly_sort(array, i, False)
        if not changed:
            break
    return array

04.a Checks

Input:

cocktail_sort([]) == []  

Output:

True  

Input:

array = [i for i in range(10)]  
cocktail_sort(array) == array  

Output:

True  

Input:

array = list(np.random.randint(100, size=120))  
np.array_equal(cocktail_sort(array), np.sort(array))  

Output:

True  

05. Insertion sort

def insertion_sort(array):  
    n = len(array)
    for i in range(n-1):
        j = i
        while (j+1 > 0) and (array[j] > array[j+1]):
            array[j], array[j+1] = array[j+1], array[j]
            j -= 1
    return array

05.a Checks

Input:

insertion_sort([]) == []  

Output:

True  

Input:

array = list(np.random.randint(100, size=120))  
np.array_equal(insertion_sort(array), np.sort(array))  

Output:

True  

06. Bucket sort

def bucket_sort(array, mini, maxi, b, basic_sort = bubble_sort):  
    n = len(array)
    # Safety checks
    if n == 0:
        return []
    elif n < b:
        raise ValueError('Number of buckets exceeds length of array.')
    # Bucketing
    step = float(maxi - mini)/b
    buckets = [[] for i in range(b)]
    result = []
    for value in array:
        # Range for last bucket
        # includes both endpoints
        if value == maxi:
            j = b - 1
        # Ranges for other buckets
        # include left endpoint only
        else: 
            j = int((value - mini)/step)
        buckets[j].append(value)
    # Sorting and recombining
    for bucket in buckets:
        result.extend(basic_sort(bucket))
    return result

06.a Checks

Input:

mini = 0  
maxi = 20  
array = np.random.random_integers(0, 20, size=20)  
np.array_equal(bucket_sort(array, mini, maxi, 10), np.sort(array))  

Output:

True  

Input:

mini = 0.  
maxi = 20.  
array = np.random.uniform(mini, maxi, size=20)  
np.array_equal(bucket_sort(array, mini, maxi, 10), np.sort(array))  

Output:

True  

07. Recursion

# Get the nth Fibonacci number 
def get_fibonacci(n):  
    if n <= 2:
        return 1
    else:
        return get_fibonacci(n-1) + get_fibonacci(n-2)

Test the function get_fibonacci(n). Input:

[get_fibonacci(i) for i in range(1, 10)] 

Output:

[1, 1, 2, 3, 5, 8, 13, 21, 34]

07.a Aside

Even though get_fibonacci(n) illustrates well the mechanics of recursion, it is not an effective way of generating Fibonacci numbers. One can verify this using the diagram below, which lists the function calls f(n) = get_fibonacci(n) required to compute the 6th Fibonacci number. It appears evident that many quantities are calculated repeatedly. For example, the whole subtree leading to f(4) is evaluated twice. These unnecessary operations are wasteful, and make get_fibonacci(n) very inefficient. As a matter of fact, it is easy to deduce from the structure of the diagram below that the function has exponential time complexity.

calls_stack

It may now look like recursion ought to be abandoned as a technique for calculating Fibonacci numbers. This impression, however, turns out to be mistaken. Indeed, fixing the problems that weigh get_fibonacci(n) down results in an efficient algorithm, whose time complexity is O(n). As demonstrated in get_fibonacci_memo(n) below, the key is to store the intermediate function values as they are calculated. To be specific, the dictionary lookup keeps track of all previously encountered Fibonacci numbers, making sure that no time is wasted in computing these again. Notably, the trick of saving partial results is called memoization, and has a central role in dynamic programming.

lookup = {}  
def get_fibonacci_memo(n):  
    if n <= 2:
        return 1
    elif n in lookup:
        return lookup[n]
    else:
        lookup[n] = get_fibonacci_memo(n - 1) + get_fibonacci_memo(n - 2) 
        return lookup[n]

Test the function get_fibonacci_memo(n). Input:

[get_fibonacci_memo(i) for i in range(1,10)]

Output:

[1, 1, 2, 3, 5, 8, 13, 21, 34]

Memoization provides reduced time complexity by increasing the space complexity of the algorithm. In other words, a speedup is obtained at the cost of some memory, that occupied by lookup. This can be regarded as a drawback of memoization, but not in all cases. If the program calculates Fibonacci numbers frequently, having the lookup table becomes an advantage. As a matter of fact, every time get_function_memo(n) is called (for a new value of n), the program can refer to the available intermediate results, and thus offer a quicker answer. These remarks are valid beyond the current example with Fibonacci numbers, and apply to memoization as a general coding technique.

In the above code, you may have spotted that, even though lookup is defined outside get_fibonacci_memo(n), it is accessed or modified inside the function. As a matter of fact, the dictionary lookup is a global variable, which can be seen from any location in the program, including from inside a function, where variables are normally local. It is important to note that the use of global variables exhibits many dangers. For instance, as global variables can be modified from any line of code, unwanted or incorrect changes become more likely and harder to track down. A common and useful piece of advice is, in fact, to stay clear of global variables if possible. These arguments motivate the following variant of the memoized algorithm to calculate Fibonacci numbers. In particular, a class is defined that surrounds lookup, whereby no global variables are used.

class Fibonacci(object):

    def __init__(self):
        self._lookup = {}

    def __call__(self, n):
        if n <= 2:
            return 1
        elif n in self._lookup:
            return self._lookup[n]
        else:
            self._lookup[n] = self.__call__(n - 1) + self.__call__(n - 2) 
            return self._lookup[n]        

Test the class Fibonacci(). Input:

fn = Fibonacci()  
[fn(i) for i in range(1, 10)]

Output:

[1, 1, 2, 3, 5, 8, 13, 21, 34]

08. Quick sort

def partition(array):  
    n = len(array)
    # Sorting
    l = 0
    r = n - 1
    pointer = (1, 0)
    while l < r:
        if array[l] > array[r]:
            array[l], array[r] = array[r], array[l] 
            pointer = pointer[::-1]
        l += pointer[0]
        r -= pointer[1]
    return (array, l)

def quick_sort(array):  
    n = len(array)
    if n > 1:
        (array, p) = partition(array)
        array[:p] = quick_sort(array[:p])
        array[p+1:] = quick_sort(array[p+1:])
    return array

08.a Checks

Input:

quick_sort([]) == []  

Output:

True  

Input:

array = list(np.random.randint(100, size=120))  
np.array_equal(quick_sort(array), np.sort(array))  

Output:

True  

09. Quick sort in real life



10. Merging

def get_combo(left, right):  
    combo = []
    i = 0
    j = 0
    # Both left[i:] and right[j:] contain elements
    while left[i:] and right[j:]: 
        if left[i] < right[j]:
            combo.append(left[i])
            i += 1
        else:
            combo.append(right[j])
            j += 1
    # Either left[i:] or right[j:] is empty
    combo.extend(left[i:])
    combo.extend(right[j:])
    return combo

10.a Checks

Input:

get_combo([],[]) == []  

Output:

True  

Input:

get_combo([0,1],[]) == [0,1]  

Output:

True  

Input:

get_combo([],[0,1]) == [0,1]  

Output:

True  

Input:

get_combo([1, 2, 3, 4, 5, 6, 7], [1, 2, 4, 5, 6])  

Output:

[1, 1, 2, 2, 3, 4, 4, 5, 5, 6, 6, 7]

11. Merge sort

def merge_sort(array):  
    n = len(array)
    if n > 1:
        i = int(n/2)
        left = merge_sort(array[:i])
        right = merge_sort(array[i:])
        return get_combo(left, right)
    else: 
        return array

11.a Checks

Input:

merge_sort([]) == []  

Output:

True  

Input:

array = list(np.random.randint(100, size=120))  
np.array_equal(merge_sort(array), np.sort(array))  

Output:

True  

12. Wrap-up

13. Credits

  • Opening image: J.L. Haven and C. Hettrick, "Bandelore," U.S. Patent 59,745, Nov. 20, 1866.
  • Video lectures: 0612, Educational technologist, Singapore.
  • Folk-dance video: Algo-rythmics, Sapientia University, Târgu Mureș, Romania.
  • Coding: Alberto Favaro with advice from Pascal Bugnion.
Share on: Twitter, Facebook, LinkedIn or Google+