As a side project I have been doing some self-study on encryption to better understand it. It is how we protect our data as it travels across the internet or when at rest, we use concepts from it to verify that we sent messages, and whole currency schemes are built around the idea. Encryption is an incredibly dense topic and it is easy to mess up. As such, all of the code I have written should not be used for any real encryption. The goal here is to make some of the concepts easier to understand. For those interested in learning more about the history, I quite enjoyed The Code Book. As for a more in-depth understanding of cryptography, take a look at Cryptography Engineering. Then you and I can struggle through it together!
Diffie-Hellman Key Exchange
The first topic for encryption we will go over is the Diffie-Hellman Key Exchange (DHKE). I think there is no better way to explain the basics than how Wikipedia does, specifically in the image below. Alice and Bob want to share a key to encrypt their communications. Using paint (instead of math) we think of it like so:
- Alice and Bob decide on a common paint color
- Alice and Bob pick their own secret paint color
- Alice and Bob combine their secret color with the common color
- Alice and Bob trade this mixed paint
- Alice mixes Bob’s paint with her secret color
- Bob mixes Alice’s paint with his secret color
- Now Alice and Bob have a shared secret color
(Alice and Bob will be recurring characters so you will have to get used to them!)
This section is more of a big picture section, so if you already have a basic understanding of encryption feel free to move on (or not, maybe you enjoy reading what I write). A key, in cryptography, is used to lock and unlock a message. But imagine trying to share a key and send a message to someone via a courier. If the message is clear text, the courier can read it. So we encrypt it. How do we then share a key with that person to decrypt it? If we just send the key, then the courier has that also and can decrypt it. DHKE aims to solve that problem. Let’s take a look at a working example.
A Working Example
I wrote some code to handle this which you can find here. Below is the output you can see from running it (with some flags to make it more verbose).
A note on my iconography:
- x -> y : means x sends a message to y
- Internal means that it is occurring locally for that person
First Alice tells Bob to set his prime base to 5 (and Eve, who is attempting to eavesdrop, sees this). Then Alice tells Bob to set his prime mod to 23 (which Eve also sees). Then both Alice and Bob create a random secret and perform a calculation using that secret and the prime base and prime mod. Then Alice tells Bob the result of her calculation and vice versa (and Eve sees both of these). Finally they perform one more set of calculations and arrive at a shared secret.
DHKE In Math
So how does this actually work? Modular arithmetic to the rescue! The equation that we are looking at is g^x mod p. g and p combine (sort of) to make our common paint. There are some restrictions around what numbers we are allowed to choose for g and x, one of which I discuss later, but we do know that p is prime. x is where this gets interesting. Alice and Bob will each select their own secret number for this (their secret paint as it were), a and b respectively. This can be any number. Then each calculates g^x mod p, and shares it (their mixed paint). Now Alice has g^b mod p (B from now on) from Bob and Bob has g^a mod p (A from now on) from Alice (Eve would have these as well). Then each takes their secret key and combines it mod p with the number they received. So Alice has B^a mod p and Bob has A^b mod p, which is their secret key. Wait, B^a mod p and A^b mod p are the secret keys? That must mean they are the same value! How cool is that? Let’s go back through the steps to understand it better.
- Alice and Bob decide on two numbers g and p.
- Alice and Bob select a secret number, a and b respectively
- Alice and Bob calculate g ^ x mod p where x is their number
- Alice and Bob share these numbers (A and B)
- Alice and Bob combine the shared number with their secret to generate the shared secret
So let’s look at what we have at step 5:
Alice has: B^a mod p = (g^b mod p) ^ a mod p = g^ab mod p
Bob has: A^b mod p = (g^a mod p) ^ b mod p = g^ab mod p
You might be saying, I still don’t get why that works. This is where a deeper understanding of modular arithmetic comes in handy. I will walk you through some of the basics (or not, if you have already had enough math, but I would say soldier on, you only need elementary math knowledge for the most part).
When we calculate something like y mod z what we are really calculating is the remainder of y / z (or y % z in a lot of programming languages). When y < z then y % z is y. When it is larger it works like a clock. As y grows y % z will go through all values 0 to z – 1 until it is a multiple of z and then it will go back to 0 and then it will proceed through all of those values again. It turns out that the exponentiation operation in modular arithmetic is transitive. So (a ^ b) ^ c mod d = (a ^ c) ^ b mod d = a^ (bc) mod d. So Alice is calculating (g^b mod p) ^ a mod p which is (g^b)^a mod p. From that, both sides end up having g^ab mod p.
This is where one of the restrictions on g and p comes in. For a ^ b mod c, the possible results can be limited depending on the numbers selected. Let’s look at an example: a ^ b mod 7.
2 ^ 1 mod 7 = 2 mod 7 = 2
2 ^ 2 mod 7 = 4 mod 7 = 4
2 ^ 3 mod 7 = 8 mod 7 = 1
2 ^ 4 mod 7 = 16 mod 7 = 2
2 ^ 5 mod 7 = 32 mod 7 = 4
2^ 6 mod 7 = 64 mod 7 = 1
See where that is an issue? It only results in 3 numbers which means that the possible secret keys are only half of all numbers less than 7 can be a key (and limiting the key space is a bad thing!).
So instead we pick a number like 3:
3 ^ 1 mod 7 = 3 mod 7 = 3
3 ^ 2 mod 7 = 9 mod 7 = 2
3 ^ 3 mod 7 = 27 mod 7 = 6
3 ^ 4 mod 7 = 81 mod 7 = 4
3 ^ 5 mod 7 = 243 mod 7 = 5
3^ 6 mod 7 = 729 mod 7 = 1
Here we can see that 3^x mod 7 can be any number less than 7, so our key space is larger. If you want to learn more about that it’s called a primitive root modulo n.
Alice and Bob have managed to generate the same number, but how does that solve this problem. Let’s look at this from Eve’s perspective.
Without any of the flags, the output shows what Eve can see only. In this example, Eve sees g and p as well as A and B. To determine what the secret key is, Eve needs to determine what a (or b) is, which means solving this equation for a: A = g^a mod p. As it turns out, this is a very tough equation to solve. These tough problems are often called trap door functions, functions that are easy to calculate in one direction, but not in the other (some doors you can’t come back from). This problem is specifically called the discrete logarithm problem, if you’re interested in looking further.
That means that Alice and Bob have shared a number, over an insecure medium, without revealing what that number is. Using that number, they can now send messages that are encrypted that they both can decrypt. Yay! Now you know a little bit more about DHKE and how we protect our secrets. To promote further security, we actually will regenerate a key for each session. So if one of your keys traded in this manner is ever actually calculated, it only affects that session. This is what we call forward secrecy!
I have two more concepts on deck to talk about soon (you may have noticed the git repository), the RSA algorithm for asymmetric encryption and HMACs. Once I have them completed the last sentence will contain links! Go forth and hack encrypt!