In a recent job interview I was given the following challenge:
Assume a character encoding a = 1, b = 2, c = 3, ... , z = 26. Given a string of numbers (e.g.
'123456'
), figure out how many different ways that string could be decoded.
I didn't arrive at a solution in the few minutes I had in the interview. The interviewer briefly described a potential solution doing a heirarchical partitioning of the string. I kept thinking about it afterwards and I couldn't shake the feeling that there was a tricky math problem in here, not just a programming problem.
The difficult part of this challenge is figuring out how many decodings there are when there are overlapping possible groupings of numbers. For example, the string '111'
can be decoded as ['1', '1', '1']
, ['11', '1']
, or ['1', '11']
. I never figured out a formula for calculating the number of decodings but by manually figuring out the number of decodings for the strings '1'
, '11'
, '111'
, '1111'
, '11111'
, '111111'
, etc. I noticed that the number of decodings follow the Fibonacci sequence starting with 1, 2, 3, 5, 8...! Unfortunately I don't have a good explanation for why that is the case. (Though it makes gut level sense in the way the Fibonacci sequence is a sum of everything that has come before.)
So, my strategy for solving this problem follows these basic steps:
'10'
or '20'
,
or cannot be broken down further because it has an unknown number of possible decodings.
These substrings can be considered independently for the purposes of this problem.'956'
becomes ['9', '5', '6']
and '121956'
becomes ['1219', '5', '6']
.'10'
, and '20'
this is just 1.
For other substrings this is the $N$th value in the Fibonacci series 1, 2, 3, 5, 8... where
$N$ is the length of the substring.You can see my code solving this problem below. The functions roughly corresponding to the above steps are:
chunk_numbers
calc_substring_combos
+ series
calc_combos
from operator import mul
def calc_combos(nums):
"""
Calculate the number of possible ways a string of numbers can be decoded
using the scheme 1 -> a, 2 -> b, ... , 26 -> z.
Parameters
----------
num : str
String of numeric characters. It is assumed that the string
has at least one valid decoding.
Returns
-------
combos : int
The number of ways `num` can be decoded.
"""
return reduce(mul, (calc_substring_combos(chunk) for chunk in chunk_numbers(nums)))
def chunk_numbers(nums):
"""
Separate a string of numbers into a list of substrings.
Returned substrings are as small as possible (e.g. one or two characters)
except where numbers are grouped such that they have multiple
decodings.
Parameters
----------
num : str
String of numeric characters. It is assumed that the string
has at least one valid decoding.
Returns
-------
subnums : list of str
List of substrings broken down as much as possible.
"""
subnums = ['']
for i, n in enumerate(nums[:-1]):
subnums[-1] += n
if n == '1':
continue
elif n == '2' and nums[i + 1] in '0123456':
continue
else:
subnums.append('')
subnums[-1] += nums[-1]
return subnums
def series(n):
"""
Return the n-th member of the series 1, 2, 3, 5, 8, 13...
(Which is actually the Fibonacci series minus the first 1.)
"""
r0 = 1
r1 = 1
for _ in xrange(n):
r0, r1 = r1, r0 + r1
return r0
def calc_substring_combos(subnums):
"""
For a given string of numbers, return the number
of possible cominations for decoding that string.
Parameters
----------
subnums : str
String of digits. Should be irreducable if passed through
check_numbers. Assumed to have a valid decoding.
Returns
-------
combos : int
The number of possible decodings for this substring.
"""
if len(subnums) == 1 or (len(subnums) == 2 and subnums[1] == '0'):
return 1
else:
return series(len(subnums))
# tests for calc_combos
assert calc_combos('9') == 1, calc_combos('9')
assert calc_combos('999') == 1, calc_combos('999')
assert calc_combos('1111') == series(4), calc_combos('1111')
assert calc_combos('11911') == 6, calc_combos('11911')
assert calc_combos('119119') == 9, calc_combos('119119')
assert calc_combos('119119119') == 27, calc_combos('119119119')
assert calc_combos('12911') == 4, calc_combos('12911')
assert calc_combos('1211') == series(4), calc_combos('1211')
assert calc_combos('101010') == 1, calc_combos('101010')
assert calc_combos('202020') == 1, calc_combos('202020')
assert calc_combos('212121') == series(6), calc_combos('212121')
# tests for calc_substring_combos
assert calc_substring_combos('1') == series(1), calc_substring_combos('1')
assert calc_substring_combos('11') == series(2), calc_substring_combos('11')
assert calc_substring_combos('212121') == series(6), calc_substring_combos('212121')
# tests for series
assert series(1) == 1, series(1)
assert series(2) == 2, series(2)
assert series(3) == 3, series(3)
assert series(4) == 5, series(4)
assert series(5) == 8, series(5)
assert series(6) == 13, series(6)
assert series(7) == 21, series(7)
# tests for chunk_numbers
assert chunk_numbers('1') == ['1'], chunk_numbers('1')
assert chunk_numbers('10') == ['10'], chunk_numbers('10')
assert chunk_numbers('567') == ['5', '6', '7'], chunk_numbers('567')
assert chunk_numbers('1111') == ['1111'], chunk_numbers('1111')
assert chunk_numbers('11911') == ['119', '11'], chunk_numbers('11911')
assert chunk_numbers('12911') == ['12', '9', '11'], chunk_numbers('12911')
assert chunk_numbers('123') == ['123'], chunk_numbers('123')