We have covered a method for key exchange, and we have covered a way to implement public key encryption and message signing. Our topic today is hash-based message authentication codes or HMAC (a subset of message authentication codes). An HMAC provides us with most of the features of message signing, but it is quicker. There are times when you will use one over the other, and that is a subject for a later time. Right now, let’s get a better understanding of what they are and what they do. Oh also, security disclaimer: don’t use any code for real security purposes…It is proof-of-concept only. You get the picture.
HMAC
The ultimate goal of message authentication codes (and, specifically, HMACs) is to provide integrity and authentication of a message. Encrypting a message doesn’t ensure that the message hasn’t changed and also does not provide a guarantee of its origin. To ensure the message hasn’t changed, we can use a one-way function (a hash) that would be hard to brute force, so only the message will translate to the correct hashed value. Combine that with a secret that only the two people talking know, and this will allow us to prove (within reason) that a participant in the conversation sent it, as well as ensuring that this message was the one originally sent. Now let’s go into how and how not to calculate it.
How Not to Calculate an HMAC
Hopefully you have given some thought into how you would create an HMAC. One idea to jump to would be HMAC(M) = H(secret || M) where H is a hashing algorithm, secret is something known only to the entities communicating, M is the message, and || is the concatenation. This would at first seem valid, but this can actually be exploited, depending on your choice of hashing algorithm. For several common hashing algorithms, the message will be padded to fit into their specific sizes. For example, with MD5, the message will be padded until it is divisible by 512. So any message, M, will now be M||pad where len(M||pad) == 512*n (where n is an integer). Another thing that affects the message authentication is how MD5 (and other algorithms) actually hash their input. Imagine a message M where M = m1||m2, with m1 and m2 each being 512 bits in length. It turns out that MD5(M) == MD5( MD5( m1 ) || m2 ).
See where that could be an issue? Pretend I am sending a message, M, to someone with the HMAC defined above. The full text would be M||HMAC(M||pad). As an attacker, I would know M and HMAC(M), and I can calculate the pad. I could then append something to M like so M’ = M|| pad || evil. We want HMAC(secret || M’) which is MD5(secret || M || pad || evil). Based on how MD5 works, we can now do MD5(HMAC(M) || evil) which is MD5( MD5(secret || M || pad ) || evil) which in turn is MD5(secret || M || pad || evil). So if we were intercepting the their traffic we could alter it. Here is a good write up about this particular exposure, including a sample C program to simulate it.
Methods for Calculating
So how do we actually calculate the thing? A better construction would be something like this: HMAC(M) = MD5( key xor outer_key_pad || md5( key xor inner_key_pad || message) ). This is described in this RFC. This calculation helps us avoid our previous issue. Why is that?
Let A = message || md5( key xor outer_key_pad || md5( key xor inner_key_pad || message) ) that Eve has intercepted. With our broken example, all Eve needed to do was to append evil to the end of the HMAC that she had received and continue hashing. Now it is not so simple. If she were to take the HMAC out of A (call it A2), and append an evil message to the end it would look like so:
A’ = A2 || evil, or MD5(key xor outer_key_pad || md5(key xor inner_key_pad || message)) || evil
which she would then want to run through MD5. Using that would now be:
MD5(MD5(key xor outer_key_pad || md5(key xor inner_key_pad || message)) || evil) which is:
MD5(key xor outer_key_pad || md5(key xor inner_key_pad || message) || evil ) but what eve wanted was
MD5(key xor outer_key_pad || md5(key xor inner_key_pad || message || evil ) ), A subtle but important difference. Given this construction, I know of no attack that we could perform. Now for some sample code!
A Code Example
Below is some code I wrote to calculate an HMAC, and here is a sample output. When run with the “-t” option, it displays the intermediate values for everything and uses a default key of “Jefe” with the message “what do ya want for nothing?”. This was a test case in the RFC for this HMAC algorithm, so we know the expected result.
This example shows the o and i key_pads after they have been xored with the keys. So there we have it! Obviously, this was just written for learning purposes, there are better ways to actually generate an HMAC for security purposes. Here’s one!
Hopefully you learned something new and valuable through this. Encryption is a tough topic and implementing it poorly has caused many issues. Remember to keep encrypting and keep hacking! Happy holidays!
References
https://en.wikipedia.org/wiki/Message_authentication_code
https://en.wikipedia.org/wiki/Hash-based_message_authentication_code
http://www.freeformatter.com/hmac-generator.html
https://gist.github.com/gthomas/3452673
https://docs.python.org/2/library/hmac.html
https://github.com/iagox86/hash_extender
https://tools.ietf.org/html/rfc2202