Browsing Wikipedia one day, I came across an interesting line on the Factorial page, under the Computation section:
The asymptotically best efficiency is obtained by computing n! from its prime factorization. As documented by Peter Borwein, prime factorization allows n! to be computed in time $O(n(\log n \log\log n)^2)$, provided that a fast multiplication algorithm is used (for example, the Schönhage–Strassen algorithm). Peter Luschny presents source code and benchmarks for several efficient factorial algorithms, with or without the use of a prime sieve.
In this post I describe two algorithms for computing $n!$, code them up in Python, plot and discuss their runtimes, share links to more detailed analysis, and try to parse the important differences between the two algorithms.
What gives such an algorithm better efficiency is that we can compute the unique prime factorisation of the factorial first, and then use it to compute the factorial itself. I'll go through the steps to do this now.
Let's consider $9!$ as an example. The naive way to compute it is by the definition,
$$9! = 9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 = \prod_{k=2}^{9} k.$$
Alternatively, we can write down its prime factorisation,
$$9! = 2^7 \cdot 3^4 \cdot 5 \cdot 7.$$
What we did was count how many times each prime and its powers divide into $9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2$, i.e. how many multiples of each prime and its powers there are between 1 and 9.
This is easily computed with division. Revisiting 2, we have $9/2^2 = 2.25$, and we can ignore the $.25$ since we are looking for the number of times it divides evenly. The mathematical way of writing this uses the floor function $\lfloor x \rfloor$, which rounds down to the nearest integer, so we want $\lfloor 9/2^2 \rfloor = 2$. To get the exponent of 2 from before, we sum $\lfloor 9/2 \rfloor + \lfloor 9/2^2 \rfloor + \lfloor 9/2^3 \rfloor = 4 + 2 + 1 = 7$.
So for a general $n$, to get the prime factorisation we do this for each prime $p$ and each of its powers. The power of $p$ dividing $n!$ is
$$\nu_p(n!) = \sum_{k=1}^{\lfloor \log_p n \rfloor} \left\lfloor \frac{n}{p^k} \right\rfloor.$$
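This formula (Legendre's formula) is short to express in code. Here's a minimal sketch — the function name `legendre` is my own, not from the post:

```python
def legendre(n, p):
    # nu_p(n!): add floor(n / p^k) for k = 1, 2, ... while p^k <= n,
    # using integer arithmetic rather than floating-point logarithms
    v, p_k = 0, p
    while p_k <= n:
        v += n // p_k
        p_k *= p
    return v

print(legendre(9, 2))  # 4 + 2 + 1 = 7, matching the worked example
print(legendre(9, 3))  # 3 + 1 = 4
```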
```python
%matplotlib inline
import math
import numpy as np
import timeit
import matplotlib.pyplot as plt
```
We will use a sieve of Eratosthenes to get the set of all the primes below $n$. The code, which I found on Rosetta Code, is a concise and nice use of Python generators and sets. When we iterate through `eratosthenes(n)`, it checks whether the current number, `i`, is in the composite set; if not, it yields `i` as a prime and adds its multiples to the set, starting from its square. We update the set starting from $p^2$ since smaller multiples will already have been added, e.g. $2p$ was added as a multiple of 2.
```python
def eratosthenes(n):
    composite = set()
    for i in range(2, n + 1):
        if i not in composite:
            yield i
            composite.update(range(i**2, n + 1, i))
```
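For example, the generator gives the primes up to 30 (the definition is repeated here so the snippet runs on its own):

```python
def eratosthenes(n):
    # sieve of Eratosthenes as a generator, as described above
    composite = set()
    for i in range(2, n + 1):
        if i not in composite:
            yield i
            composite.update(range(i**2, n + 1, i))

print(list(eratosthenes(30)))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```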
`factorFactorial(n)` computes $n!$ using the code and methods discussed above.
```python
def factorFactorial(n):
    factorial = 1
    for p in eratosthenes(n):
        # Legendre's formula: v_p = sum over k of floor(n / p^k).
        # Iterating over the powers with integer arithmetic avoids the
        # floating-point log rounding that can drop the last term when
        # n is an exact power of p.
        v_p = 0
        p_k = p
        while p_k <= n:
            v_p += n // p_k
            p_k *= p
        factorial *= p**v_p
    return factorial
```
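As a sanity check, we can recover the exponent dictionary for $9!$ from the worked example and rebuild the factorial from it. This is a standalone sketch (the helper name `primeExponents` is mine, and it inlines a sieve so it runs on its own):

```python
import math

def primeExponents(n):
    # exponents nu_p(n!) for every prime p <= n, via Legendre's formula
    composite, exps = set(), {}
    for p in range(2, n + 1):
        if p in composite:
            continue  # p is not prime, skip it
        composite.update(range(p * p, n + 1, p))
        v, p_k = 0, p
        while p_k <= n:
            v += n // p_k
            p_k *= p
        exps[p] = v
    return exps

exps = primeExponents(9)
print(exps)  # {2: 7, 3: 4, 5: 1, 7: 1}
print(math.prod(p**v for p, v in exps.items()) == math.factorial(9))  # True
```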
`iterFactorial(n)` computes $n!$ directly from the definition.
```python
def iterFactorial(n):
    fact = 1
    for i in range(1, n + 1):
        fact = fact * i
    return fact
```
Let's do a quick test computing $10^3!$, which has over 2500 digits, so it's fairly large already.
```python
%timeit factorFactorial(10**3)
%timeit iterFactorial(10**3)
```
1000 loops, best of 3: 1.14 ms per loop
1000 loops, best of 3: 418 µs per loop
The `iterFactorial` function is a lot faster, but there is a lot of overhead in the factorisation; let's try a larger example, $10^5!$.
```python
%timeit -n 1 factorFactorial(10**5)
%timeit -n 1 iterFactorial(10**5)
```
1 loops, best of 3: 2.55 s per loop
1 loops, best of 3: 4.76 s per loop
Ok phew, so it was a bit faster to go through the factoring process, though it's only worth it for sufficiently large $n$. Let's plot how long the two algorithms take, with some averaging to remove random fluctuations.
```python
def loop(factorial, myRange, averageNum):
    timesAvg = []
    for i in myRange:
        times = []
        for j in range(averageNum):
            start_time = timeit.default_timer()
            factorial(i**3)
            elapsed_time = timeit.default_timer() - start_time
            times.append(elapsed_time)
        timesAvg.append(np.mean(times))
    return timesAvg
```
```python
from mpl_toolkits.axes_grid1.inset_locator import zoomed_inset_axes
from mpl_toolkits.axes_grid1.inset_locator import mark_inset

fig, ax = plt.subplots(figsize=[10, 10])
loops = []
myRange = range(1, 40)
ax2 = plt.axes([.2, .4, .3, .3], facecolor='w')  # axisbg was removed in matplotlib 2.0
for i, f in enumerate([factorFactorial, iterFactorial, math.factorial]):
    loops.append(loop(f, myRange, 10))
    ax.plot(np.power(myRange, 3), loops[i], label=f.__name__)
    ax2.plot(np.power(myRange, 3)[:20], loops[i][:20])
ax.legend(loc='upper left')
ax.set_xlabel(r"$n!$", fontsize=20)
ax.set_ylabel("time (s)", fontsize=20)
#ax2.set_yticklabels([])
mark_inset(ax, ax2, loc1=2, loc2=4, fc="none", ec="0.5")
ax.set_title("Comparison of different factorial algorithms")
plt.show()
plt.close()
```
The `factorFactorial(n)` function seems to gain the upper hand around $n = 5000$, and the difference gets much larger from then on. As a matter of interest I also plotted Python's built-in `math.factorial(n)`; it relies on a more complicated algorithm and is written in C. It vastly outperforms `factorFactorial(n)`, although it's not as fast as other prime-factoring algorithms written in C.
In the paper On the complexity of calculating factorials, Peter B. Borwein thoroughly compares algorithms very similar to the ones above, with the results being:
- Naive multiplication, `iterFactorial(n)`, is $O(n\,M(n\log n))$.
- An algorithm similar to `factorFactorial(n)` is $O(\log\log n \cdot M(n\log n))$,

where $M(n)$ is the complexity of multiplying two $n$-digit numbers.
I think a lot of what the paper describes is slightly beyond the scope of this post, so I'll try to give a brief explanation of why `factorFactorial(n)` is faster for $n \gg 1$. Notice that the difference between the above complexities is $\log n$ being replaced by the slower-growing $\log\log n$; the origin of these terms might shed some light on the differences.
The $n$ in `iterFactorial(n)` comes from doing the $n$ multiplications. The $\log\log n$ in `factorFactorial(n)` comes from summing the complexity of the steps over the primes. At each prime $p_i$ we cross off $n/p_i$ multiples, so summing these we get $O\left(\sum_{p_i \le n} \frac{1}{p_i}\right) = O(\log\log n)$ (the sum of the reciprocals of the primes).
There are lots of possible optimisations; I kept the code as simple as possible to highlight the algorithm. Finding further optimisations could be an interesting exercise, e.g. slightly modifying the algorithm, using another method to find prime numbers, or rewriting the algorithm with NumPy.
Here are a couple of resources I used making this post:

- Prime numbers and Numpy, Rebrained blog
- Python factorial source code
- On the complexity of calculating factorials, Peter B. Borwein
- A great webpage detailing and comparing lots of factorial algorithms