In a recent job interview I was given the following challenge:

Assume a character encoding a = 1, b = 2, c = 3, ... , z = 26. Given a string of numbers (e.g. '123456'), figure out how many different ways that string could be decoded.

I didn't arrive at a solution in the few minutes I had in the interview. The interviewer briefly described a potential solution doing a heirarchical partitioning of the string. I kept thinking about it afterwards and I couldn't shake the feeling that there was a tricky math problem in here, not just a programming problem.

The difficult part of this challenge is figuring out how many decodings there are when there are overlapping possible groupings of numbers. For example, the string '111' can be decoded as ['1', '1', '1'], ['11', '1'], or ['1', '11']. I never figured out a formula for calculating the number of decodings but by manually figuring out the number of decodings for the strings '1', '11', '111', '1111', '11111', '111111', etc. I noticed that the number of decodings follow the Fibonacci sequence starting with 1, 2, 3, 5, 8...! Unfortunately I don't have a good explanation for why that is the case. (Though it makes gut level sense in the way the Fibonacci sequence is a sum of everything that has come before.)

So, my strategy for solving this problem follows these basic steps:

  1. Break the given string down into substrings such that each substring is one digit, '10' or '20', or cannot be broken down further because it has an unknown number of possible decodings. These substrings can be considered independently for the purposes of this problem.
    • For example, '956' becomes ['9', '5', '6'] and '121956' becomes ['1219', '5', '6'].
  2. For each substring figure out the number of decodings. For single digits, '10', and '20' this is just 1. For other substrings this is the $N$th value in the Fibonacci series 1, 2, 3, 5, 8... where $N$ is the length of the substring.
  3. Multiply together all of the number of decodings for the individual substrings to get the number of of decodings for the original string of numbers.

You can see my code solving this problem below. The functions roughly corresponding to the above steps are:

  1. chunk_numbers
  2. calc_substring_combos + series
  3. calc_combos
In [1]:
from operator import mul
In [2]:
def calc_combos(nums):
    """
    Calculate the number of possible ways a string of numbers can be decoded
    using the scheme 1 -> a, 2 -> b, ... , 26 -> z.

    Parameters
    ----------
    num : str
        String of numeric characters. It is assumed that the string
        has at least one valid decoding.

    Returns
    -------
    combos : int
        The number of ways `num` can be decoded.

    """
    return reduce(mul, (calc_substring_combos(chunk) for chunk in chunk_numbers(nums)))
In [3]:
def chunk_numbers(nums):
    """
    Separate a string of numbers into a list of substrings.
    Returned substrings are as small as possible (e.g. one or two characters)
    except where numbers are grouped such that they have multiple
    decodings.

    Parameters
    ----------
    num : str
        String of numeric characters. It is assumed that the string
        has at least one valid decoding.

    Returns
    -------
    subnums : list of str
        List of substrings broken down as much as possible.

    """
    subnums = ['']
    
    for i, n in enumerate(nums[:-1]):
        subnums[-1] += n
        if n == '1':
            continue
        elif n == '2' and nums[i + 1] in '0123456':
            continue
        else:
            subnums.append('')
            
    subnums[-1] += nums[-1]

    return subnums
In [4]:
def series(n):
    """
    Return the n-th member of the series 1, 2, 3, 5, 8, 13...
    (Which is actually the Fibonacci series minus the first 1.)

    """
    r0 = 1
    r1 = 1
    
    for _ in xrange(n):
        r0, r1 = r1, r0 + r1
    
    return r0
In [5]:
def calc_substring_combos(subnums):
    """
    For a given string of numbers, return the number
    of possible cominations for decoding that string.
    
    Parameters
    ----------
    subnums : str
        String of digits. Should be irreducable if passed through
        check_numbers. Assumed to have a valid decoding.

    Returns
    -------
    combos : int
        The number of possible decodings for this substring.

    """
    if len(subnums) == 1 or (len(subnums) == 2 and subnums[1] == '0'):
        return 1
    else:
        return series(len(subnums))
In [6]:
# tests for calc_combos
assert calc_combos('9') == 1, calc_combos('9')
assert calc_combos('999') == 1, calc_combos('999')
assert calc_combos('1111') == series(4), calc_combos('1111')
assert calc_combos('11911') == 6, calc_combos('11911')
assert calc_combos('119119') == 9, calc_combos('119119')
assert calc_combos('119119119') == 27, calc_combos('119119119')
assert calc_combos('12911') == 4, calc_combos('12911')
assert calc_combos('1211') == series(4), calc_combos('1211')
assert calc_combos('101010') == 1, calc_combos('101010')
assert calc_combos('202020') == 1, calc_combos('202020')
assert calc_combos('212121') == series(6), calc_combos('212121')
In [7]:
# tests for calc_substring_combos
assert calc_substring_combos('1') == series(1), calc_substring_combos('1')
assert calc_substring_combos('11') == series(2), calc_substring_combos('11')
assert calc_substring_combos('212121') == series(6), calc_substring_combos('212121')
In [8]:
# tests for series
assert series(1) == 1, series(1)
assert series(2) == 2, series(2)
assert series(3) == 3, series(3)
assert series(4) == 5, series(4)
assert series(5) == 8, series(5)
assert series(6) == 13, series(6)
assert series(7) == 21, series(7)
In [9]:
# tests for chunk_numbers
assert chunk_numbers('1') == ['1'], chunk_numbers('1')
assert chunk_numbers('10') == ['10'], chunk_numbers('10')
assert chunk_numbers('567') == ['5', '6', '7'], chunk_numbers('567')
assert chunk_numbers('1111') == ['1111'], chunk_numbers('1111')
assert chunk_numbers('11911') == ['119', '11'], chunk_numbers('11911')
assert chunk_numbers('12911') == ['12', '9', '11'], chunk_numbers('12911')
assert chunk_numbers('123') == ['123'], chunk_numbers('123')