An example of RSA

How does RSA encrypt and decrypt information? I'll outline the process below. You can follow along to produce your own secret messages using this free SAGE notebook server. (when using the SAGE server, the sg prefix used in my code is superfluous.)

The mathematics used in this example is very simple number theory, but it is necessary that you understand the basics of modular arithmetic. If you've never seen congruence arithmetic before, you may want to take a moment to read this article.

The line of code below is necessary for those who might want to follow along using the iPython notebook rather than the online SAGE notebook. Using SAGE from within iPython unfortunately does take some work -- these instructions are a good starting place.

In [7]:
#sage users do not need this line
import sage.all as sg

RSA is an example of public key cryptography. The basic idea is that for every communicator (you!) there exists a private key and a public key, and together these keys allow for the encryption and decryption of messages to you. Anyone in the world can encrypt a message (using your public key) and send it to you in such a way that only you can decrypt the result (using your private key). Notice that in this scenario, "you" are the only agent receiving messages. Also note that the messages are encoded especially for you, using your public key. To produce messages and send them secretly to other agents, you must use their keys, in particular their public key. They then decode messages encrypted for them via their private key.

At the most abstract level, you can think of a key as just being information. Of course disembodied information is somewhat mysterious and impractical. Keys are represented as integers, i.e. whole numbers. In RSA the keys are actually pairs of integers.

Now we'll see how the public and private keys are made. The process requires several steps. The outline of the key generation process for RSA given on Wikipedia is considerably more concise than what I will do -- if you are able to follow the mathematics, you may want to skim the outline given there.

The first step is to select two large (mine aren't so large) prime numbers, which we'll call $p$ and $q$. Python together with the SAGE libraries handily does this for us in the next cell.

In [39]:
#sage users do not need the sg prefix
p = sg.random_prime(100,lbound=15)
q = sg.random_prime(100,lbound=15)
[p,q]       !
[37, 73]

The above python code gave us two random prime numbers $p$ and $q$, selected at random in the range 15 to 100. It turns out that $p = 37$ and $q = 73$. Think of these numbers as ingredients to the keys we are making. These ingredients are secret -- we don't want anyone copying our recipe exactly. These numbers are also the whole secret in the sense that the values of $p$ and $q$ are the only secret parts of the encryption/decryption process. They are not the key(s) exactly, but we will see that knowing $p$ and $q$ along with the public key is enough to deduce the private key (i.e. to break RSA).

The next step is to combine these ingredients to make a public value called the modulus.

In [40]:
n = p*q               #making the modulus n
n                     #public -- anyone can know this!

Though $p$ and $q$ are secret and special to each communicator, the modulus $n=pq$ is public knowledge. It may seem strange that a number would be public knowledge, but its factors secret. Couldn't someone just factor $n$ and recover $p$ and $q$? The answer is yes in principle -- a person who was able to do this could read our secret messages. However, at the time of this writing, there is no known method to factor $n$ in a reasonable amount of time when $p$ and $q$ are sufficiently large. Obviously, the $p$ and $q$ in this example are not large enough to offer real security!

The next stage in cooking up the public and private keys is to apply Euler's totient function $\phi(\cdot)$ to the modulus $n$. The reason for doing this is somewhat technical--it has to do with a mathematical proposition based on the Chinese Remainder Theorem which ensures that messages are encrypted and decrypted correctly. Mathematically, this is the "brilliant" part of RSA. But it's easy to understand what $\phi(n)$ does to $n$ -- it just returns the number of integers in the set {$1,2,\ldots,n-1$} which are relatively prime to $n$.

In [41]:
phi_n = sg.euler_phi(n)
phi_n                 #secret!

Note that this means that in the set {1,2,...,2700}, there are 2592 elements $a$ such that $gcd(a,2701) = 1$.

The value $\verb=phi_n=$, or $\phi(n)$, is a secret ingredient in the key generation process. Knowing $\phi(n)$ is actually "equivalent" to knowing the secret values $p$ and $q$ because of the relationship

$$\phi(pq) = (p-1)(q-1)$$

which holds whenever $p$ and $q$ are prime. To flesh things out a little bit more, we are saying that if we know

$$2592 = (p-1)(q-1)$$

and also that

$$2701 = pq$$

then these two equations in two unknowns can easily be solved to find $p$ and $q$.

In particular, we do $q = \frac{2701}{p}$ and then make this substitution in the first equation to yield

$$2592 = (p-1)(\frac{2701}{p}-1)$$

This equation, in turn, can be solved using the quadratic formula to give $p = 37$, from which point the value of $q$ is immediate via $q = \frac{2701}{p}$ .

Conversely, if we know $p$ and $q$ we can easily find $\phi(n)$. This is the sense in which knowing $\phi(n)$ is "equivalent" to knowing $p$ and $q$.


We have now arrived at a stage at which we can say what keys in RSA "really are". The public key in RSA is a pair of integers $(n,e)$ where $n$ is the modulus and $e$ is a value called the public exponent. There is some free choice in selecting $e$. For the math to work, we need only that $1 < e < \phi(n)$ and that $e$ and $\phi(n)$ are relatively prime.

For those who know a little group theory, the point of this condition on $e$ is to ensure that $e$ is a member of the multiplicative group of integers mod $\phi(n)$, and in particular that $e^{-1}$ exists.

In plain language, the condition guarantees that there is another number $d$ with $1 < d < \phi(n)$ such that $e\cdot d \equiv 1$ mod $\phi(n)$. In fact, the resulting $d$ not only exists, but is unique.

The number pair $(n,d)$ constitutes the private key in RSA. Note that the roles played by $e$ and $d$ are completely symmetric. The public key could be swapped for the private key, and all the constraints would still hold.

The following bit of code selects a public exponent $e$ through a random process. In a real implementation, $e$ would be selected slightly more carefully.

In [49]:
my_public_exponent = sg.randint(10,phi_n) 
while sg.gcd(my_public_exponent,phi_n) != 1:
    my_public_exponent = sg.randint(10,phi_n-1)
my_public_exponent         #public (duh)

Now that we have $e=913$, we can quickly compute $d$. The code masks the underlying process of computing $d$, which is quite efficient -- it amounts to using the extended Euclidean algorithm on $e$ and $\phi(n)$ to find the Bezout coefficient of $e$.

In [50]:
my_private_exponent = sg.inverse_mod(my_public_exponent,phi_n)
my_private_exponent        #secret!

Thus we have $d = 2257$. If you like you can check that $913\cdot2257 \equiv 1$ mod $2592$.

The key generation phase of RSA is now (finally!) over. We have arrived at a public key of $(2701, 913)$ and a private key of $(2701,2257)$. Now all we need is a message to encrypt.

RSA only works on numerical messages. Any message someone wishes to send you must first be coded as an integer between 0 and $n-1$, in some publicly known way. If the message is too big to allow for that then it needs to be broken down into suitably small chunks. Below we select a message at random.

In [51]:
message = sg.randint(0,n-1)  #the message can be anything
message     #secret


Now that we have the message $M=1965$, we want to encrypt it. The formula for computing the ciphertext $C$ from the message $M$ is

$C = M^e$ mod $n$.

In [52]:
ciphertext = sg.power_mod(message,my_public_exponent,n)    
ciphertext            #public

Thus the message $M = 1965$ is encrypted as $C = 1965^{913}$ mod $2701 = 1631$.

Of course, you want to use a special procedure to compute the modular exponent -- it is infeasible to expand $1965^{913}$ precisely. This is why we use the SAGE $\verb=power_mod=$ function rather than direct exponentiation.


Now suppose we want to get from $C$ back to $M$. The formula for this is

$M = C^d$ mod $n$

In [53]:
decoded_ciphertext = sg.power_mod(ciphertext,my_private_exponent,n)

Success! Note that the original message has popped out. All that's left is to figure out why it worked. The reason is as follows.

Notice that when we computed

$C^d$ mod $n$

we equivalently did

${M^e}^d$ mod $n$

by definition of $C$.

We now have to explain why

$M \equiv {M^e}^d$ mod $n$

Unfortunately a full account requires a discussion of the Chinese Remainder Theorem, which is beyond the scope of a short example. For those wishing to go further, here is a place to start. The road to complete enlightenment is not long -- you should be able to prove the correctness of RSA with a couple of hours of study.