That Crypto Code Sample Is Probably Wrong

So, you want to know how to encrypt your data? Simple enough: as with virtually any programming task these days, you can type a few keywords into Google (including your programming language of preference, of course), hit enter, and browse through the myriad search results to find an example that looks appealing and seems to do what you need. From there, you pop it into your code, change a few variable names, and Bob’s your uncle, job done.

There’s only one problem with that. The sample code that you found? It’s probably wrong. Not only is it probably wrong, it’s very possibly dangerously wrong. Implementing encryption correctly is hard, and it requires a substantial amount of expertise to avoid some of the common pitfalls. Unfortunately, the folks who write crypto code samples often don’t seem to have that expertise. Encryption is one of the more specialized problems in application development, and it probably requires more time and effort to reach a minimum level of competence than just about any other area, so perhaps this is not so surprising, but it is unfortunate in any case.

So, what are some of the issues we might see? Well, let’s take a look at a C# example that I found on the Internet, living in the wild on a tutorial website. This is the exact sort of code sample I’m warning about. It’s an example that demonstrates the use of symmetric cryptography (where the same key is used both to encrypt and decrypt the message) using the built-in .NET framework crypto support, which can be found in the System.Security.Cryptography namespace.

[Image: how-not-to-encrypt (the C# code sample, with numbered issues, discussed below)]

I’ve numbered each of the issues I’m going to cover here.  As a rule, all of the mistakes in the Encrypt() method have been duplicated in the Decrypt() method, but I trust that this is easy enough to see, so I haven’t numbered them twice.

Issues

  1. Using a weak key. One of the critical foundations of the security of symmetric encryption is that an attacker cannot guess the key. The way we ensure that the attacker cannot guess the key is by using a random key. This means a key that is generated using either a true source of randomness, like fair dice, or more typically in the application development space, a cryptographically secure pseudo-random number generator. A password, no matter how you obfuscate and embellish it, will never be cryptographically random. Not even if you just pound on the keyboard for a bit. Now that doesn’t mean that you can’t use a password at all. But if you do, there are some necessary intermediate steps, which revolve around the application of a key derivation function (KDF) to turn that password into a strong key. That’s computationally expensive and potentially less secure, though, so generally speaking, when a randomly generated key can be used, it should be. So, where do you get a strong random key? Well, it’s actually even simpler to create a good key than it is to think up a bad one. All of the symmetric algorithms in .NET have a GenerateKey() method that will create a strong key of the correct size for you. Then you just need to save it somewhere safe to use, which brings us to issue number two.
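As a rough illustration of item 1, here is a minimal sketch of both routes: asking the algorithm for a random key, and (only when a password is unavoidable) stretching that password with a KDF. The class and method names are hypothetical, AES with a 256-bit key is my assumption rather than anything from the sample, and the PBKDF2 constructor taking a HashAlgorithmName assumes a reasonably recent framework version (newer runtimes also offer a static Rfc2898DeriveBytes.Pbkdf2 method).

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper class, for illustration only.
public static class KeyGeneration
{
    // Ask the algorithm itself for a random key of a legal size.
    public static byte[] NewRandomAesKey()
    {
        using (Aes aes = Aes.Create())
        {
            aes.KeySize = 256;  // assumption: a 256-bit AES key
            aes.GenerateKey();  // fills the key from the platform's secure random source
            return aes.Key;     // the getter hands back a copy of the key bytes
        }
    }

    // Only when a password is unavoidable: stretch it with PBKDF2.
    // The salt is random, not secret, and must be stored alongside the ciphertext.
    // The iteration count is a tuning parameter the caller has to choose.
    public static byte[] DeriveKeyFromPassword(string password, byte[] salt, int iterations)
    {
        using (var kdf = new Rfc2898DeriveBytes(password, salt, iterations, HashAlgorithmName.SHA256))
        {
            return kdf.GetBytes(32); // 32 bytes = 256 bits
        }
    }
}
```

Either way, the result is a byte array of the right length, never a string that has been trimmed to fit.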
  2. Compiling the key into the application source code. The number one source of all the cryptographic problems in all of application development is poor key management. You shouldn’t have to deploy new binaries to change a key. You shouldn’t have keys saved in source control along with non-secret data, i.e., the rest of your code. You also shouldn’t have a situation where everyone who has a copy of the binary also has your secret key, should they care to look for it. The ideal place to store secret keys is in a hardware security module, where they’re locked away safe from prying eyes, but if you have one of those available you’re probably not learning anything from this article. Realistically, for most Windows-based web applications your best options for secret key storage are either DPAPI (the Windows Data Protection API, a mechanism for the storage of sensitive data) or encrypted configuration sections.
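For the DPAPI option in item 2, a minimal sketch might look like the following. It uses the ProtectedData class, which is Windows-only (and ships as a separate package on newer .NET runtimes). The class name is hypothetical, and the choice of DataProtectionScope.LocalMachine is an assumption; CurrentUser may suit your deployment better.

```csharp
using System.Security.Cryptography;

// Hypothetical helper class, for illustration only. Windows-only (DPAPI).
public static class KeyStorage
{
    // Wrap the raw key bytes so they can be written to disk or configuration.
    public static byte[] ProtectKey(byte[] key)
    {
        // No extra entropy is supplied in this sketch (the second argument is null).
        return ProtectedData.Protect(key, null, DataProtectionScope.LocalMachine);
    }

    // Recover the raw key bytes at application start-up.
    public static byte[] UnprotectKey(byte[] protectedKey)
    {
        return ProtectedData.Unprotect(protectedKey, null, DataProtectionScope.LocalMachine);
    }
}
```

The protected blob, not the key itself, is what would live in your configuration, so changing the key no longer means shipping new binaries.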
  3. Using a static IV. One of the security goals of cryptography is that the encrypted ciphertext should not leak information about the plaintext. (It’s a bit more nuanced than this; for instance, it only applies to messages of the same length, but consider this to be a general rule.) This means, for instance, that if you encrypt the same plaintext message twice, the resulting ciphertexts should be completely different, such that I as an attacker can’t distinguish between two encryptions of the same message and the encryptions of two entirely different messages. One of the ways we can achieve this when we’re encrypting multiple messages with the same key is to use a non-secret random initialization vector, or IV. It’s the combination of the key and the IV that determines the contents of the ciphertext for a given plaintext message. If you encrypt a single message twice, changing either the key or the IV will cause the ciphertexts to be unrelated. Using the same key and IV both times will lead to identical ciphertexts. More commonly, while the messages you’re encrypting may not be identical, a reused IV will show an attacker which messages begin with the same content, which can be extremely valuable information to them. So, instead of using a static IV for every message, we should use a new, random IV each time. To a degree this is easier than doing it wrong; again the .NET framework handles this for you. Each new algorithm instance generates a random IV the first time one is needed, and calling GenerateIV() produces a fresh one, so as long as you create a new instance (or call GenerateIV()) for each message, CreateEncryptor() will use a new IV every time. You simply don’t set the IV when encrypting, only when decrypting. The extra work in doing it properly is that now with each encryption you have two outputs, the encrypted ciphertext and the IV, which you also need to save and send along with the ciphertext for decryption.
Again, it doesn’t need to be secret, but it does need to be different for each encryption operation, and in some cases unpredictable as well, so letting .NET generate it for us on each encryption ensures that we have the necessary properties.
    IMPORTANT NOTE: One thing that we haven’t discussed to this point is the types of symmetric algorithms we might use. There are stream ciphers, block ciphers, and AEAD (authenticated encryption with additional authenticated data) ciphers. The type that we’re primarily dealing with in this article is block ciphers, since these are common and generally the default in crypto libraries. This example, for instance, uses DES, a block cipher, in CBC mode, and AES in CBC mode is another common example. In these cases, reusing IVs leads to the information disclosure vulnerability I mentioned above, which is unfortunate. When using a stream cipher like RC4, however, or a block cipher in a streaming mode like AES-CTR, reusing IVs is DEADLY and will completely break your encryption scheme, even when the messages are entirely different.
    IMPORTANT NOTE NUMBER TWO:  There is a way to use a block cipher without an IV at all, and it’s  called ECB mode.  (ECB stands for Electronic Code Book.)  This is the simplest of all encryption modes because each block of plaintext is directly encrypted with the key, and so there’s no need for an IV.  Because of this, you’ll see it frequently in code samples.  In .NET, for instance, you would see the algorithm instance’s Mode property being set to CipherMode.ECB.  DO NOT EVER DO THIS AND RUN FROM ANY CODE SAMPLE THAT DOES.  Again, remember when we talked about how using a static IV leaks information about the plaintext?  Well, not using one at all in ECB generally leaks information like a sieve, and not only when you have two messages that begin with the same data, but even throughout a single message when it has two blocks with the same data.  This makes figuring out the contents of many messages trivial, even when you only have access to the ciphertext and no knowledge of the key.
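To make the IV handling from item 3 concrete, here is a minimal sketch that generates a fresh IV for every message and stores it in front of the ciphertext. The class name and the "IV followed by ciphertext" layout are my own assumptions, and AES in CBC mode stands in for the sample's DES. Note that this sketch is still unauthenticated, so it illustrates the IV handling only (see item 8).

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper class, for illustration only. Unauthenticated CBC: see item 8.
public static class CbcWithRandomIv
{
    // Returns IV || ciphertext. A new Aes instance plus GenerateIV() gives a fresh IV per message.
    public static byte[] Encrypt(byte[] key, byte[] plaintext)
    {
        using (Aes aes = Aes.Create()) // defaults: CBC mode, PKCS7 padding
        {
            aes.Key = key;
            aes.GenerateIV();
            byte[] iv = aes.IV;
            using (ICryptoTransform encryptor = aes.CreateEncryptor())
            using (var output = new MemoryStream())
            {
                output.Write(iv, 0, iv.Length); // the IV is not secret, so it travels in the clear
                using (var crypto = new CryptoStream(output, encryptor, CryptoStreamMode.Write))
                {
                    crypto.Write(plaintext, 0, plaintext.Length);
                } // disposing the CryptoStream writes the final padded block
                return output.ToArray();
            }
        }
    }

    // Splits the IV back off the front of the message before decrypting.
    public static byte[] Decrypt(byte[] key, byte[] ivAndCiphertext)
    {
        using (Aes aes = Aes.Create())
        {
            int ivLength = aes.BlockSize / 8; // 16 bytes for AES
            if (ivAndCiphertext.Length < ivLength)
                throw new CryptographicException("Message is too short to contain an IV.");
            byte[] iv = new byte[ivLength];
            Buffer.BlockCopy(ivAndCiphertext, 0, iv, 0, ivLength);
            using (ICryptoTransform decryptor = aes.CreateDecryptor(key, iv))
            using (var input = new MemoryStream(ivAndCiphertext, ivLength, ivAndCiphertext.Length - ivLength))
            using (var crypto = new CryptoStream(input, decryptor, CryptoStreamMode.Read))
            using (var plain = new MemoryStream())
            {
                crypto.CopyTo(plain);
                return plain.ToArray();
            }
        }
    }
}
```

Encrypting the same plaintext twice with this sketch should produce two different outputs, which is exactly the property a static IV throws away.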
  4. Do you know what’s worse than a weak key? Subtly making the key even weaker. Now, we could argue about how subtle Substring(0, 8) actually is or isn’t, but the fact remains that due to the key length constraints of the chosen algorithm (DES, which we’ll talk about in item 6) there was never any intent to use the full key: “2013;[pnuLIT)WebCodeExpert”. Because DES uses a 56-bit key, it will only ever accept 8 characters, so using this key selection method, the key was never going to be anything but: “2013;[pn”. All the rest of the password is merely window dressing, added to make it look stronger than it really is, if only at first glance.
  5. Keys are made of bits, not strings of characters. This is more of a programming error than a crypto error, but it’s a common one to see in crypto code. The type for a key in any of the algorithms in .NET is a byte array of a length (or lengths) specified by the algorithm. When we generate a key, we generate a byte array, not a string of characters. Characters, and especially Unicode characters, do not necessarily map 1-to-1 to bytes. So, in the code sample above, if any of the first eight characters of the “key” were multi-byte characters, this would cause an unhandled exception, because the 8-character substring would contain more than 8 bytes, and 8 bytes is the only legal key size for DES in .NET. In a happy coincidence, this particular error is being mitigated to some extent by another error we’ve already discussed, the fact that the key is hard-coded into the source. It would quickly become very apparent if the key chosen were of an illegal size, and it should be fixed rapidly. If key management were done properly, however, and the key were a parameter passed in from external secure storage, it becomes much more important to ensure that it is handled properly and checked for validity. If you do need to store the key as a string (in a configuration file, for instance) it should be in a format specifically designed to represent binary data as a string: either Base64 or hex encoded.
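A minimal sketch of the handling described in item 5: the key lives as bytes, is Base64-encoded only for storage, and is checked for a legal size when it is read back. The class and method names are hypothetical, and AES is assumed as the target algorithm.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper class, for illustration only.
public static class KeyEncoding
{
    // Binary key to a string that is safe to keep in a configuration file.
    public static string ToConfigString(byte[] key)
    {
        return Convert.ToBase64String(key);
    }

    // String back to a binary key, rejecting anything AES cannot use.
    public static byte[] FromConfigString(string base64Key)
    {
        byte[] key = Convert.FromBase64String(base64Key); // throws FormatException on malformed input
        using (Aes aes = Aes.Create())
        {
            // ValidKeySize takes a length in bits, hence the multiplication.
            if (!aes.ValidKeySize(key.Length * 8))
                throw new CryptographicException("Key length is not valid for AES.");
        }
        return key;
    }
}
```

Failing loudly here, at load time, is the point: a key of the wrong size should never get as far as the encryption call.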
  6. Choosing the DESCryptoServiceProvider for the encryption algorithm. Two words: Just Don’t. There are a number of symmetric encryption algorithms offered by the .NET framework. None of them are great options, for reasons we’ll cover in item 8, but DES is quite nearly the worst, and should NEVER be used in new systems, for any reason. Earlier I mentioned that DES uses 56-bit keys. This means that no matter how well you generate your key, and how cryptographically random it is, it’s still weak, because it simply isn’t long enough. DES keys, even completely random ones, can simply be brute-forced with modern computing capabilities. This means that an attacker can simply try every possible key until they find the correct one. As of this writing, with specialized hardware, this can be done in about a day, and as time goes on, this will only become faster and cheaper. DES is not safe. Do not use it. Currently the best of the options in the core .NET framework is the AesManaged algorithm, which is an implementation of the Rijndael algorithm (a block cipher) as configured in the NIST AES standard, and which we generally call AES. There are still caveats to its use, however, which again we’ll cover in item 8.
  7. Not disposing instances of types that implement IDisposable. This is another programming error, rather than a crypto error, but there are lots of types used in crypto code that should be properly disposed of when we’re done with them, so this is also something you’ll commonly see missed in crypto code samples. In this case, the DESCryptoServiceProvider, the ICryptoTransform returned by des.CreateEncryptor(), the MemoryStream, and the CryptoStream all implement IDisposable and should be wrapped in using statements to ensure they’re properly cleaned up.
  8. No Authentication. This is not a problem with the code that is here, but a critical component that’s simply missing altogether. When we think of encryption, the first thing that generally comes to mind is confidentiality, or keeping data secret. However, there’s a second property that’s equally important, which is integrity, or ensuring that the data we decrypt is the same as the data we encrypted. The way that we do this is through the use of authentication, in the form of a message authentication code (MAC). There are a number of MAC constructions, and one of the most common you’ll see is HMAC. In days past we would take an unauthenticated encryption algorithm, such as the ones offered by the .NET framework, compute the MAC of the data we want to ensure isn’t tampered with (generally the ciphertext and the IV) using a second secret key (the authentication key) and send that along with the IV and ciphertext. Then, on the decryption side, we again compute the MAC of the ciphertext and IV using the same secret key, verify that our computed MAC matches the MAC that was sent with the ciphertext, and only then can we be assured that integrity has been maintained and that it is safe to attempt decryption. Getting this right is hard all by itself, and if you do it incorrectly, you can leave your implementation exposed to whole classes of attacks, such as timing attacks on the MAC verification process, or failing to fix IV malleability or padding oracles. Thankfully, we now have a better way, and this is the class of ciphers I mentioned before, called AEAD algorithms. AEAD modes of operation incorporate integrity checking into the basic encryption and decryption operations, deriving an authentication key from the encryption key so you only need to keep track of a single secret key, and ensuring that the MAC is computed and verified correctly, so that there is no chance for you to accidentally introduce vulnerabilities into the system.
Unfortunately, there are no implementations of AEAD algorithms in the core .NET framework library, but Microsoft has released an open-source .NET library that exposes the AEAD implementations available in Windows’ Cryptography: Next Generation API. It’s called the CLR Security library and is available from: https://clrsecurity.codeplex.com/. This is what you should use. Once you’ve created a reference to this library in your project, it’s as easy to use as any of the other symmetric algorithms in the core framework. You create an instance of AuthenticatedAesCng as your symmetric algorithm, use the default chaining mode of CngChainingMode.Gcm, and it will expose one other property called Tag, which is the MAC. So, when using this algorithm, after encrypting, you’ll have an IV, a tag, and the ciphertext, all of which will be transmitted to the decrypting system. The decryptor will use the same key (of course), set the IV and the tag based on the values that have been sent, and then attempt to authenticate and decrypt. If either the IV or the ciphertext has been tampered with, or the tag is set incorrectly, the authentication will fail, the algorithm will be able to tell that something is wrong, and decryption will be aborted.
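The IV, tag, and ciphertext flow described in item 8 can be sketched as follows. To be clear about the assumptions: this does not use the CLR Security library. It uses the AesGcm class that newer .NET runtimes (.NET Core 3.0 and later) ship in System.Security.Cryptography, which is a substitution on my part. The class name and the "nonce, then tag, then ciphertext" layout are hypothetical choices for this sketch.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper class, for illustration only. Assumes a runtime that provides AesGcm.
public static class GcmSketch
{
    private const int NonceSize = 12; // 96-bit nonce (GCM's name for the IV)
    private const int TagSize = 16;   // 128-bit authentication tag

    // Returns nonce || tag || ciphertext.
    public static byte[] Encrypt(byte[] key, byte[] plaintext)
    {
        byte[] nonce = new byte[NonceSize];
        using (var rng = RandomNumberGenerator.Create())
        {
            rng.GetBytes(nonce); // fresh random nonce for every message
        }
        byte[] tag = new byte[TagSize];
        byte[] ciphertext = new byte[plaintext.Length];
        // Newer runtimes prefer the constructor overload that also takes the tag size.
        using (var gcm = new AesGcm(key))
        {
            gcm.Encrypt(nonce, plaintext, ciphertext, tag);
        }
        byte[] output = new byte[NonceSize + TagSize + ciphertext.Length];
        Buffer.BlockCopy(nonce, 0, output, 0, NonceSize);
        Buffer.BlockCopy(tag, 0, output, NonceSize, TagSize);
        Buffer.BlockCopy(ciphertext, 0, output, NonceSize + TagSize, ciphertext.Length);
        return output;
    }

    // Throws a CryptographicException if the nonce, tag, or ciphertext was altered.
    public static byte[] Decrypt(byte[] key, byte[] message)
    {
        if (message.Length < NonceSize + TagSize)
            throw new CryptographicException("Message is too short.");
        byte[] nonce = new byte[NonceSize];
        byte[] tag = new byte[TagSize];
        byte[] ciphertext = new byte[message.Length - NonceSize - TagSize];
        Buffer.BlockCopy(message, 0, nonce, 0, NonceSize);
        Buffer.BlockCopy(message, NonceSize, tag, 0, TagSize);
        Buffer.BlockCopy(message, NonceSize + TagSize, ciphertext, 0, ciphertext.Length);
        byte[] plaintext = new byte[ciphertext.Length];
        using (var gcm = new AesGcm(key))
        {
            gcm.Decrypt(nonce, ciphertext, tag, plaintext); // authenticates before releasing plaintext
        }
        return plaintext;
    }
}
```

Flipping a single bit anywhere in the output should make Decrypt throw rather than return garbage, which is the behavior the unauthenticated sample can never give you.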
  9. Scary string replacements.  The last issue is another programming issue rather than a crypto error.  Ciphertexts, like keys, are binary.  They aren’t strings of characters, they’re arrays of bytes.  Now, they almost always need to be encoded as strings in some fashion, again using either Base64 or hex encoding, but neither of those should introduce spaces into the ciphertext.   Now, depending on the use case you may need to do an additional level of encoding, percent-encoding for instance to turn a Base64-encoded ciphertext into something URL-safe, but this should be done clearly and thoughtfully so a maintainer understands what you were doing and why, and so that the program will operate predictably.  Randomly throwing in a string replace to convert spaces that shouldn’t be there in the first place is alarming and indicates that we might not have adequately considered how to encode and transmit our encrypted data safely.
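The two encoding layers mentioned in item 9 can be made explicit with a small sketch like this one: Base64 to turn the binary ciphertext into text, then percent-encoding to make that text safe inside a URL. The class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class CiphertextEncoding
{
    // bytes -> Base64 -> percent-encoded, safe to place in a query string.
    public static string ToUrlValue(byte[] ciphertext)
    {
        string base64 = Convert.ToBase64String(ciphertext);
        return Uri.EscapeDataString(base64); // escapes '+', '/', and '=' among others
    }

    // The exact reverse: undo the percent-encoding, then undo Base64.
    public static byte[] FromUrlValue(string urlValue)
    {
        string base64 = Uri.UnescapeDataString(urlValue);
        return Convert.FromBase64String(base64);
    }
}
```

With the escaping done deliberately on the way out, a '+' never turns into a space in transit, so there is nothing left for a stray string replace to repair.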

So to sum up, we have one crypto code sample, a couple of dozen lines of code, and 9 significant issues, several of them critical. Unfortunately, many of the well-intentioned tutorials on writing cryptographic code are just as bad, and simply should not be used. Again, the safest option is to use an AEAD cipher like the AES-GCM implementation in the CLR Security library. If this still looks daunting, I’ve been working on a cryptographic library that wraps this implementation and makes it even easier to use. It’s not quite to v1.0 yet, but the SimpleEncryption API in my open source CryptoCore library is, as its name suggests, the simplest way available to safely encrypt data with .NET.

Hopefully the points I’ve made will help you to identify flaws in the crypto code samples you come across, and ensure that those flaws don’t make it into your own projects, and cripple your own security.

One thought on “That Crypto Code Sample Is Probably Wrong”

  1. DES as used in this example is even weaker than 56 bits. It expects an 8-byte key, but that’s 64 bits, not 56. So how does it get to 56? By ignoring the least significant bit of each key byte. Combining that with ASCII always leaving the most significant bit of each byte as zero, we’re down to 48 bits. It gets even worse if you limit it to printable characters (44.7 bits) or alphanumeric characters (roughly 40 bits).
