Introduction to Cryptography#
One of the primary tools in computer security is cryptography. In this chapter, we are going to provide a short survey of basic cryptographic primitives and their most common (mis)uses. For more details, we invite you to the Introduction to Cryptography companion course.
Cryptographic primitives#
Most cryptographic protocols are composed of the following building blocks, called cryptographic primitives. We treat most of them as black boxes with well-defined properties. Well-designed protocols depend only on these properties, but it turns out that implementation details sometimes matter.
Symmetric ciphers#
A symmetric cipher allows two parties (let’s call them Alice and Bob) to communicate while keeping Mallory (a third party with malicious intent) from knowing the contents of the messages.
The cipher consists of encryption and decryption functions, both parametrized by a common key. Encryption transforms a plaintext message to a ciphertext message. Decryption does the reverse, provided that it is given the same key.
The ciphertext should reveal no information about the plaintext except for its length (most ciphers are length-preserving). Mathematically speaking, for a randomly chosen key, the encryption function should be an effectively random bijection between all possible plaintexts and all possible ciphertexts of the given length. (Effectively random means that there is no known algorithm that can distinguish it from a truly random bijection in reasonable time.)
We usually assume that the encryption and decryption functions are public — after all, there are just a few unbroken ciphers out there. The secret lies in the key. (This is called Kerckhoffs’s principle.)
There are two kinds of symmetric ciphers.
Stream ciphers#
A one-time pad is a cipher that XORs the plaintext with an equally long key, which is assumed to be fully random. Decryption is achieved by XORing the ciphertext with the same key. This works because $[(x⊕k)⊕k = x⊕(k⊕k) = x⊕0 = x]$, where $[⊕]$ denotes the XOR operation. Interestingly, the ciphertext is also a sequence of fully random bits, thus revealing no information on the plaintext except for its length.
Provable security is nice, but otherwise the one-time pad is very impractical. The key has to be as long as the message and it must never be reused. If we find two ciphertexts $[a⊕k]$ and $[b⊕k]$ encrypted with the same key $[k]$, we can compute $[(a⊕k)⊕(b⊕k) = a⊕b⊕k⊕k = a⊕b]$. This reveals the difference of $[a]$ and $[b]$, which should be impossible to obtain without the key.
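This is easy to demonstrate in Python (the messages and key below are made up for illustration):

import secrets

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

k = secrets.token_bytes(16)      # a random 16-byte key
a = b"attack at dawn!!"          # two 16-byte plaintexts
b = b"retreat at noon!"
ca, cb = xor(a, k), xor(b, k)    # both encrypted with the same key

# An eavesdropper who sees both ciphertexts learns a XOR b
# without ever touching the key:
assert xor(ca, cb) == xor(a, b)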
Stream ciphers are more useful. They replace ideal randomness by a pseudo-random generator (PRNG) parametrized by the key and a per-message initial value (IV). The sequence of bits produced by the PRNG is called the keystream and XORed with the plaintext to get the ciphertext. Similarly to the one-time pad, we must never re-use the keystream, so we must never repeat the IVs.
A typical example of a stream cipher is ChaCha20, which has 128 or 256-bit keys. It is optimized for software implementation on modern CPUs.
All stream ciphers react to bit-flips in the ciphertext in an interesting way. Whenever you flip the $[b]$-th bit of the ciphertext, the decrypted plaintext has the $[b]$-th bit flipped, too, and all other bits are intact.
Block ciphers#
A block cipher encrypts blocks of a fixed size. For example, AES has 128-bit blocks and keys of 128, 192, or 256 bits.
To encrypt a message of a general size, we must pad it to a multiple of the block size first. The padding should be reversible — we should be able to recover the original message without knowing its length. Adding zero bytes does not work, since the original message could end with a zero byte. A typical solution is to create a padding of size $[P]$ by adding $[P]$ bytes of value $[P]$. Please note that the padding cannot be empty: if the length already was a multiple of the block size, a full block of padding must be added.
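A minimal sketch of this padding scheme (it matches the widely used PKCS#7 padding; a 16-byte block size as in AES is assumed):

def pad(message: bytes, block_size: int = 16) -> bytes:
    # P bytes of value P; a full extra block if the length is already aligned
    p = block_size - len(message) % block_size
    return message + bytes([p] * p)

def unpad(padded: bytes) -> bytes:
    p = padded[-1]
    if not 1 <= p <= len(padded) or padded[-p:] != bytes([p] * p):
        raise ValueError("bad padding")
    return padded[:-p]

assert unpad(pad(b"hello")) == b"hello"
assert len(pad(b"0123456789abcdef")) == 32    # a full block of padding was added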
Now, we need to apply the block cipher on a sequence of full plaintext blocks $[X_1, \ldots,X_n]$ to obtain a sequence of ciphertext blocks $[Y_1,\ldots,Y_n]$. This can be done in multiple ways, which are called the block cipher modes.
ECB (Electronic Code Book)#
The ECB mode is the simplest. We just encrypt and decrypt each block separately: $[Y_i = E(K,X_i)]$ and $[X_i = D(K,Y_i)]$, where $[E]$ is the encryption function, $[D]$ is the decryption function, and $[K]$ is the key.
This is simple, but horribly insecure. For example, identical plaintexts yield identical ciphertexts. More generally, you can tell if $[X_i = X_j]$ by comparing $[Y_i]$ with $[Y_j]$. This is very far from the idea of the random bijection. Do not use ECB ever.
A common way to illustrate why ECB should never be used is the ECB Penguin:
(Image courtesy of Filippo Valsorda, licensed under CC-BY-SA-3.0)
Even though each individual block of the image data is encrypted, you can still clearly make out the original picture.
Let us see what happens when we flip a bit in a ciphertext block $[Y_i]$. The corresponding plaintext block $[X_i]$ is destroyed (after all, $[E]$ and $[D]$ are random-ish functions), other plaintext blocks are intact.
CBC (Cipher Block Chaining)#
A more sophisticated mode is CBC. Each ciphertext block is obtained by encrypting a XOR of the plaintext block with the previous ciphertext block. The first block is XORed with a block-sized IV (unique, preferably random initial value). More formally, $[Y_i = E(K, X_i ⊕ Y_{i-1})]$ where $[Y_0 = \mathrm{IV}]$.
Thanks to the IV, identical plaintexts are encrypted differently. The chaining of blocks prevents recognizing block equality.
Decryption is easy: each ciphertext block is decrypted and then XORed with the previous ciphertext block: $[X_i = D(K, Y_i) ⊕ Y_{i-1}]$.
Let us see what happens when we flip the $[b]$-th bit in a ciphertext block $[Y_i]$. The corresponding plaintext block $[X_i]$ is destroyed and the next plaintext block $[X_{i+1}]$ has the $[b]$-th bit flipped. All other plaintext blocks stay intact.
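The chaining logic is easy to express in code. Here is a sketch, assuming hypothetical functions encrypt_block and decrypt_block that apply the block cipher to a single block (e.g., raw AES on 16 bytes):

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(encrypt_block, key: bytes, iv: bytes, blocks: list[bytes]) -> list[bytes]:
    out, prev = [], iv
    for x in blocks:
        prev = encrypt_block(key, xor(x, prev))       # Y_i = E(K, X_i xor Y_{i-1})
        out.append(prev)
    return out

def cbc_decrypt(decrypt_block, key: bytes, iv: bytes, blocks: list[bytes]) -> list[bytes]:
    out, prev = [], iv
    for y in blocks:
        out.append(xor(decrypt_block(key, y), prev))  # X_i = D(K, Y_i) xor Y_{i-1}
        prev = y
    return out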
CTR (Counter)#
Another interesting mode is CTR. It is essentially a stream cipher with a PRNG obtained from the block cipher: we encrypt blocks that contain an IV and a counter. That is, $[Y_i = X_i ⊕ E(K, \mathrm{IV} \mathbin{\Vert} i)]$, where $[\Vert]$ is concatenation.
As with all stream ciphers, the IV must not be repeated. Flipping ciphertext blocks translates to flipping the corresponding plaintext blocks.
Unlike the other modes, CTR does not require padding, but it is sometimes used with one.
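Again a sketch with the hypothetical encrypt_block from above; the IV is assumed to be shorter than the block size, so the counter fits into the remaining bytes. Note that encryption and decryption are the same operation:

def ctr_crypt(encrypt_block, key: bytes, iv: bytes, data: bytes, block_size: int = 16) -> bytes:
    out = bytearray()
    for i in range(0, len(data), block_size):
        counter = iv + (i // block_size).to_bytes(block_size - len(iv), 'big')
        keystream = encrypt_block(key, counter)        # E(K, IV || i)
        out += xor(data[i:i+block_size], keystream)    # xor() as defined above
    return bytes(out)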
Asymmetric ciphers#
Symmetric ciphers are useful if Alice and Bob can meet to agree on the key before they need to communicate. What if it is their first encounter? There is a nice cryptographic solution.
An asymmetric cipher has two keys: an encryption key $[K_E]$ and a decryption key $[K_D]$. What is encrypted using $[K_E]$ can be decrypted only using $[K_D]$. Furthermore, it is infeasible to compute $[K_E]$ from $[K_D]$ or vice versa — the pair $[(K_E, K_D)]$ can be generated only as a whole.
One of these keys is usually public, while the other is kept private. If we make the encryption key public and the decryption key private, then everybody can encrypt a message to Bob which only he can decrypt.
The other combination is also useful: If the encryption key is private, while the decryption key is public, only Alice can encrypt a message, but everybody can read it. Voilà, we just constructed a cryptographic signature that guarantees that Alice’s message is authentic.
The RSA cryptosystem#
An archetypal example of an asymmetric cipher is RSA, based on number theory. Let us sketch how it works. We set up:
- two distinct big prime numbers $[p]$ and $[q]$, chosen randomly
- the modulus $[n = pq]$
- the encryption exponent $[e]$ and the decryption exponent $[d]$ such that $[x^{ed} \bmod n = x]$ for all $[x \in \lbrace 0,\ldots,n-1 \rbrace]$ (It is not trivial that such exponents exist, but we omit the proof.)
- the encryption key $[(e,n)]$
- the decryption key $[(d,n)]$
- we throw away $[p]$ and $[q]$
Plaintexts and ciphertexts are integers between $[0]$ and $[n-1]$. Encryption is computed as $[E(x) = x^e \bmod n]$, decryption as $[D(y) = y^d \bmod n]$.
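A toy example with textbook-sized numbers (real moduli have thousands of bits, and real implementations add randomized padding, so never use raw RSA like this):

# Toy parameters: p = 61, q = 53
n = 61 * 53                 # modulus n = 3233
e = 17                      # encryption exponent
d = 2753                    # decryption exponent: e*d = 1 (mod (p-1)*(q-1))

x = 1234                    # plaintext
y = pow(x, e, n)            # encryption: x^e mod n
assert pow(y, d, n) == x    # decryption: y^d mod n recovers x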
Security of RSA relies on the hardness of factoring integers into products of primes. To generate the exponents $[e]$ and $[d]$ (or to compute one from the other), we need to know the primes $[p]$ and $[q]$. But the keys contain only their product $[n]$. (So hardness of factoring is required for the security of RSA. On the other hand, it is not known whether hardness of factoring is sufficient.)
While nobody knows a polynomial-time factoring algorithm (or they have not told us), there is a steady stream of increasingly better sub-exponential-time algorithms. Also, computing power available to potential attackers grows steeply. Because of this, the recommended key size for RSA grew from 512 bits in the 1990s through 1024 bits in 2002 to at least 2048 bits in 2015. The largest key publicly known to be cracked has 829 bits.
Quantum computers are expected to cause a revolution in RSA factoring: Shor’s algorithm can factor integers in polynomial time. The largest integers factored on quantum computers so far are only a few tens of bits long (22 bits as of 2025), nowhere near the 2048 bits used in RSA today, but in the future this might become a massive problem. To address it, the field of post-quantum cryptography was born.
Hash functions#
The third kind of cryptographic primitive is the cryptographic hash function, also known as a message digest or fingerprint. It maps arbitrarily long messages to bit strings of a fixed size (usually hundreds of bits). Again, the function behaves random-ishly. This is hard to define formally, so we usually require the following three properties to hold:
- non-invertibility: for a given result $[y]$, we cannot effectively find $[x]$ such that $[h(x) = y]$, where $[h]$ is the hash function.
- second preimage resistance: for a given $[x]$, we cannot effectively find $[x^\prime]$ such that $[h(x) = h(x^\prime)]$.
- collision resistance: it is impossible to effectively find $[x]$ and $[x^\prime]$ such that $[h(x) = h(x^\prime)]$.
(If you met hashing in the context of data structures, please observe that our requirements are much stronger.)
Typical strong hash functions are SHA2-256 and SHA3-256 with 256-bit output. Earlier functions MD5 and SHA1 are considered broken except for some limited uses.
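Both SHA2-256 and SHA3-256 are available in Python's standard hashlib module:

import hashlib

print(hashlib.sha256(b"attack at dawn").hexdigest())    # 64 hex digits = 256 bits
print(hashlib.sha3_256(b"attack at dawn").hexdigest())

# Flipping a single bit of the input ('n' vs. 'o') changes
# roughly half of the output bits:
print(hashlib.sha256(b"attack at dawo").hexdigest())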
Random generators#
Many cryptographic protocols need a rich supply of random bits to generate keys and nonces (numbers used once, i.e., very likely unique values, just as we need for IVs).
Unlike a regular random generator that is only supposed to be statistically uniform, cryptographic random generators must be unpredictable. That is, even an observer who knows the whole history of generated random bits cannot effectively predict the next bit. (Note that this also implies statistical uniformity.)
We already know how to obtain a pseudo-random generator (PRNG) from a block cipher: it is the keystream from the CTR mode. As long as the IV is unpredictable and the cipher is strong, the keystream is unpredictable, too.
We might want to harvest physical randomness (assuming there is such a thing) using a hardware random generator. However, hardware generators are notoriously hard to get right: side channels can leak information, attackers close to the hardware can influence how it works (a flask of liquid nitrogen can be handy).
The gold standard is a hybrid RNG. It is based on a PRNG with harvested physical randomness
mixed into the internal state of the PRNG (e.g., using a hash function). If the hardware RNG part
is reliable, the result is truly random. If it is not, then the hybrid RNG is still at least as
secure as the PRNG. Linux provides a hybrid RNG in the form of the /dev/random device, which
produces an infinite stream of random bytes.
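In Python, the standard secrets module (backed by the operating system's RNG, on Linux typically via the getrandom system call) is the easy way to generate keys and nonces:

import secrets

key = secrets.token_bytes(32)      # a 256-bit key
nonce = secrets.token_bytes(12)    # a 96-bit nonce
print(key.hex())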
Combining primitives#
Let us see several common combinations of primitives.
Hybrid ciphers#
While symmetric ciphers run in linear time, no known asymmetric cipher does — for example, a straightforward implementation of RSA runs in cubic time. However, we can combine symmetric and asymmetric ciphers to get a hybrid cipher that is fast and still has asymmetric keys:
Setup:
- Symmetric cipher with functions $[E_S]$ and $[D_S]$.
- Asymmetric cipher with functions $[E_A]$ and $[D_A]$.
- Keys $[K_E]$ and $[K_D]$ for the asymmetric cipher.
Encryption of a plain text $[x]$:
- Generate a random key $[K_R]$ for the symmetric cipher.
- Encrypt the plain text using the symmetric cipher: $[y \leftarrow E_S(K_R, x)]$.
- Encrypt the random key using the asymmetric cipher: $[k \leftarrow E_A(K_E, K_R)]$.
- Return ciphertext $[(k, y)]$.
Decryption of a ciphertext $[(k, y)]$:
- Decrypt the random key using the asymmetric cipher: $[K_R \leftarrow D_A(K_D, k)]$.
- Decrypt the message using the symmetric cipher: $[x \leftarrow D_S(K_R, y)]$.
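A sketch of this recipe using the third-party Python cryptography package, assuming RSA with OAEP padding as the asymmetric cipher and AES-GCM as the symmetric one (details like key serialization are omitted):

import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Bob's asymmetric key pair (K_E public, K_D private)
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

def hybrid_encrypt(public_key, plaintext: bytes):
    k_r = AESGCM.generate_key(bit_length=256)        # random symmetric key K_R
    nonce = os.urandom(12)
    y = AESGCM(k_r).encrypt(nonce, plaintext, None)  # y = E_S(K_R, x)
    k = public_key.encrypt(k_r, oaep)                # k = E_A(K_E, K_R)
    return k, nonce, y

def hybrid_decrypt(private_key, k, nonce, y):
    k_r = private_key.decrypt(k, oaep)               # K_R = D_A(K_D, k)
    return AESGCM(k_r).decrypt(nonce, y, None)       # x = D_S(K_R, y)

k, nonce, y = hybrid_encrypt(public_key, b"meet me at the fountain")
assert hybrid_decrypt(private_key, k, nonce, y) == b"meet me at the fountain"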
Signing a hash#
A similar problem with slow asymmetric ciphers arises in signatures. This time, the right combination is to hash the message first and then sign the hash. Not only is this fast, but it also allows multiple signatures on a single message that can be verified independently of each other.
This is where the collision-freeness of hash functions is important. If you could produce two messages $[x]$ and $[x^\prime]$ with the same hash, you could have $[x]$ signed and then transplant the signature to $[x^\prime]$.
Message authentication codes#
Encryption makes messages confidential (i.e., they cannot be read by a third party), but it does not protect message integrity (i.e., that the message was not tampered with).
Integrity protection is achieved by using a cryptographic signature (like those we created from asymmetric ciphers) or their symmetric cousin called a message authentication code (MAC). A MAC uses the same key for both signing and verifying the signature. It is typically used when Alice and Bob already have a shared secret, or if Alice and Bob are the same entity. Think of a web application that stores user IDs in cookies and wants to make sure that the user has not tampered with the ID.
A traditional way of constructing MACs is making keyed hash functions from ordinary hash functions. A keyed hash is additionally parametrized by a key. For each choice of the key, you get a different random-ish function. The most common construction of a keyed hash from a normal hash is HMAC.
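HMAC is available directly in Python's standard library:

import hashlib, hmac

key = b"shared secret key"
message = b"user_id=1337"
tag = hmac.new(key, message, hashlib.sha256).hexdigest()

# Verification recomputes the tag; compare_digest avoids the timing
# side channel discussed later in this chapter.
def verify(key: bytes, message: bytes, tag: str) -> bool:
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

assert verify(key, message, tag)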
There are also authenticated encryption algorithms that combine both encryption and a MAC in a single primitive. One such example is ChaCha20-Poly1305, another is the Galois/Counter mode (GCM) of block ciphers.
In many cases, the MAC is computed not only from the message itself, but also from its context, so a genuine message replayed (re-sent) in a different context is caught.
Challenge-response authentication#
Suppose that Alice the user wants to convince Bob the server that she knows her password. However, she does not want to send it directly — who knows, perhaps somebody is just impersonating Bob.
A simple solution would be to send a hash of the password. But if Mallory catches the hash, they can replay it and impersonate Alice.
This is easily fixed by Bob sending a nonce first and Alice hashing both the password and the nonce (or even better using a MAC with the password as a key and the nonce as a message).
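A sketch of the exchange, using HMAC with the password as the key (a real system would store only a derived verifier instead of the plain password):

import hashlib, hmac, secrets

# Bob (the server) sends a fresh nonce:
nonce = secrets.token_bytes(16)

# Alice proves knowledge of the password without transmitting it:
password = b"correct horse battery staple"
response = hmac.new(password, nonce, hashlib.sha256).hexdigest()

# Bob recomputes the same MAC and compares:
expected = hmac.new(password, nonce, hashlib.sha256).hexdigest()
assert hmac.compare_digest(response, expected)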
Attacks#
Let us see a couple of attacks on cryptography. Some will aim at the algorithms, others at their real-world implementation.
Side channels#
Consider the following function for checking passwords:
bool check_password(char *given, char *correct)
{
    // In plain C, the strings are zero-terminated sequences of characters.
    int n = strlen(correct) + 1;
    // Compare byte by byte, stopping at the first mismatch.
    for (int i = 0; i < n; i++)
        if (given[i] != correct[i])
            return false;
    return true;
}

It looks innocent, but there is a fatal flaw: it stops at the first mismatch.
Therefore the time spent in the function reveals the length of the maximum
prefix of given that is correct. An attacker with a precise clock
can therefore try all the possible characters at the first position, then all
possible at the second position, and so on.
This is an example of a timing attack, which belongs to the wider class of side-channel attacks. Side channels are not a part of the attacked protocol itself, but they leak additional data related to the real-world implementation. There are many other side channels, for example effects on CPU caches, temperature of the machine, energy consumption etc.
Note that many side channel attacks involve repeating the operation many times. Obviously, measuring the time it takes to compare a few characters of a password is very challenging and it contains a lot of noise. When we try it a thousand times over and average the timing, patterns start to emerge. The same holds for the other side channels.
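The usual fix is a comparison whose running time does not depend on where the first mismatch occurs. In Python, for example, the standard library provides one (note that a difference in lengths may still be leaked):

import hmac

def check_password(given: bytes, correct: bytes) -> bool:
    # Examines every byte even after a mismatch, so the timing
    # reveals nothing about the length of the matching prefix.
    return hmac.compare_digest(given, correct)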
Repeated nonces#
As we already know, repeated nonces can be disastrous. Here is a real-world example. Full-disk encryption protects data on a disk in case the disk is stolen. We will see that it does not protect against attackers who can access the machine multiple times.
Abstractly speaking, a disk is an array of blocks. We need random access to blocks, so each block must be encrypted independently. An obvious choice is a stream cipher with a nonce unique to every block. However, we cannot store the nonce in the block itself (file systems do not enjoy block sizes that are not powers of two) and storing it elsewhere would be slow. So we derive the nonce from the block number itself.
The attack goes like this:
- We make a copy of the encrypted disk.
- We fill the whole file system with zeroes (this is easy if we have a user account on the machine, but perhaps we also can trick the victim into downloading a huge file).
- Now, most previously free blocks have all-zero plaintext.
- So their ciphertext is equal to the keystream.
- By comparing the disk with our copy, we can recognize these blocks and save the keystream.
- We remove the all-zero file.
- Later, the victim creates a new file with secret data. It will likely be stored in some of the previously free blocks.
- The blocks are still encrypted using the same keystream, so we can use the stored keystream to decrypt the secret data.
(As an exercise, you can show that our file need not contain all zeroes. Any known contents would work, too. This is called a known-plaintext attack, while the original one is a chosen-plaintext attack.)
Using a single key for multiple purposes#
Suppose that Alice and Bob use a single key to encrypt both Alice’s messages to Bob and Bob’s messages to Alice. The attacker can then capture a message and replay it in the other direction where it could mean something completely different.
Similarly, somebody could discover that RSA encryption and decryption are the same operation, only using a different exponent. If you swap the exponents, RSA still works. So perhaps we could make the following optimization: we create a single pair of a public and private key and use it for both encryption and signing. This enables a simple attack:
- Bob sends a message to Alice, encrypted by Alice’s public key.
- Mallory captures the ciphertext and tricks Alice into signing it.
- Alice encrypts the ciphertext with her private key and sends the result to Mallory. But her private key is the other key of the pair, so she actually decrypted Bob’s message.
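With the toy RSA numbers from earlier, the whole attack fits in a few lines, because “signing” is literally decryption:

n, e, d = 3233, 17, 2753     # the toy key pair from before

x = 1234                     # Bob's secret message
y = pow(x, e, n)             # encrypted with Alice's public key

# Mallory tricks Alice into "signing" the ciphertext y:
signature = pow(y, d, n)     # signing = raising to d = decrypting
assert signature == x        # Mallory now has the plaintext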
A standard fix for symmetric keys is to establish a key schedule. It is a mechanism that takes a single master secret and then derives all keys needed by the protocol from it. This is usually done using keyed hash functions (parametrized by the purpose of the generated key).
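A minimal sketch of such a derivation using HMAC as the keyed hash (real protocols use standardized constructions such as HKDF, but the idea is the same):

import hashlib, hmac

def derive_key(master: bytes, purpose: bytes) -> bytes:
    # A different purpose label yields an independent-looking key.
    return hmac.new(master, purpose, hashlib.sha256).digest()

master = b"the single shared master secret"
key_alice_to_bob = derive_key(master, b"encrypt A->B")
key_bob_to_alice = derive_key(master, b"encrypt B->A")
mac_key = derive_key(master, b"mac")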
Padding oracle attacks#
We already observed that block cipher modes react to bit flips in the ciphertext in interesting ways. Combined with a cleverly chosen side channel, this can lead to complete decryption without the key.
CTR mode#
Suppose that Alice first signs her message using a MAC. Then she pads the message to a multiple of AES block size by adding $[P]$ bytes with value $[P]$. Finally, she encrypts the message using AES in CTR mode and sends it to Bob.
Bob decrypts the message, removes the padding (which can fail if the padding is improperly formed), checks the signature, and processes the contents of the message.
Now, assume that Mallory has access to a padding oracle — for a given ciphertext, the oracle says if the corresponding plaintext is properly padded. In many cases, Bob himself can be used as such an oracle: perhaps he sends different error codes for bad padding and bad signature, or there is a timing difference. We show how to use the padding oracle for decryption.
Let us start with decoding the size of the padding $[P]$. Assume that $[P>1]$. So the last two bytes of the final plaintext block are $[P]$ $[P]$. Now, we try all 255 combinations of bit flips in the last ciphertext byte, which decrypt to all 255 possible values of the last plaintext byte (except $[P]$ itself). For each of the combinations, we ask the oracle. The only case in which the oracle answers “well padded” is when we changed the last byte to 1. So we know that $[((P \oplus k) \oplus f) \oplus k = 1]$ where $[P]$ is first encrypted with a keystream byte $[k]$, then bit-flipped and finally decrypted to 1. Therefore $[P]$ can be computed as $[1 \oplus f]$.
This fails for $[P=1]$, but this case can be easily detected by bit-flipping the second-to-last byte of the last block. If $[P=1]$, all combinations lead to valid padding, otherwise no combination does.
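Both steps are easy to code, assuming a hypothetical function oracle(ciphertext) that tells us whether the corresponding plaintext is well padded:

def recover_padding_size(oracle, ct: bytes) -> int:
    # P = 1 is special: flipping the second-to-last byte only touches
    # the message, so the one-byte padding stays valid.
    if oracle(ct[:-2] + bytes([ct[-2] ^ 1]) + ct[-1:]):
        return 1
    # Otherwise, find the flip f that turns the last plaintext byte into 1.
    for f in range(1, 256):
        if oracle(ct[:-1] + bytes([ct[-1] ^ f])):
            return 1 ^ f        # P xor f = 1, hence P = 1 xor f
    raise RuntimeError("no flip produced a valid padding")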
So we know $[P]$. Now we XOR the last $[P]$ ciphertext bytes with $[P\oplus (P+1)]$, which changes the last $[P]$ plaintext bytes to $[P+1]$. We try all combinations of bit flips in the byte just before the padding. Only one of them leads to valid padding, which is the one that was decrypted to $[P+1]$. So we recover the last byte of the message.
Then we bit flip to change the last $[P+2]$ plaintext bytes to $[P+2]$, which allows us to recover the previous byte of the message. We continue in the same way until we recover the whole last block. Then we truncate the ciphertext by removing the last block and iterate until we decrypt the whole message. (The situation is slightly different as the truncated message doesn’t end with a proper padding. Try to figure out the details.)
CBC mode#
A slightly more complex variant of this attack can be performed on CBC. We flip bits in the second-to-last ciphertext block, which leads to bit flips in the last plaintext block. Again, this allows us to decrypt the whole last block.
We can continue by truncating the message. The only problem happens when the message was reduced to a single block: in that case, we flip bits in the IV. (This requires the IV to be transmitted along the message, which is what most protocols do. If the IV were generated by hashing a per-connection nonce with the sequence number of the message, the attack would not decrypt the first block, but still all the other blocks.)
Length extension attack#
If you saw the construction of HMAC, you might wonder why it is so complex. What if we just defined $[\mathrm{MAC}(x,K) = h(K \mathbin{\Vert} x)]$? Here $[x]$ is the message, $[K]$ is the key, $[\Vert]$ is concatenation and $[h]$ is an arbitrary hash function.
For a sufficiently random-ish hash function, this construction is considered secure. In fact, with SHA3 it’s standardized under the name KMAC. But if we use it with any hash function from the SHA2 family, things go interestingly wrong. Let us see what happens with the 256-bit variant.
We need to peek inside SHA2-256. It is based on a compression function $[f(a,b)]$ that takes a 256-bit state $[a]$ and a 512-bit message block $[b]$ and compresses them to a new 256-bit state in a random-ish way.
To hash a message $[m]$, we split it into 512-bit blocks $[B_0]$, …, $[B_{n-1}]$, with the last block padded. Then we compress a fixed IV with $[B_0]$, compress the output with $[B_1]$, then with $[B_2]$, all the way to $[B_{n-1}]$. The output of the last compression is returned as the hash value $[h(m)]$.
Now, suppose that we have a known message $[m]$ and its signature $[h(K \mathbin{\Vert} m)]$. Since the signature is just the output of the last compression, we can append new blocks to the message and continue compressing, thus producing a valid signature $[h(K \mathbin{\Vert} m^\prime)]$ for some message $[m^\prime = m \mathop{\Vert} s]$ with $[s]$ of our choice. Well, almost: we forgot the padding. So our $[m^\prime]$ will be actually $[m \mathop{\Vert} p \mathop{\Vert} s]$ where $[p]$ is the original padding which we know, but do not control.
This is a serious vulnerability: it allows us to forge a signature on extensions of the original message, where most new bytes are under our control.
Even with this attack, SHA2-256 is considered a strong hash function — it has all the standard properties (collision-freeness etc.). But our MAC construction required even stronger properties.
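The attack is easy to demonstrate on a toy Merkle–Damgård hash (the compression function below is simulated by hashing the state together with the block; padding is ignored by assuming all messages are a multiple of the block size):

import hashlib

BLOCK = 32
IV = bytes(BLOCK)

def compress(state: bytes, block: bytes) -> bytes:
    # Toy compression function; the real SHA2-256 one works differently.
    return hashlib.sha256(state + block).digest()

def toy_hash(message: bytes, state: bytes = IV) -> bytes:
    for i in range(0, len(message), BLOCK):
        state = compress(state, message[i:i+BLOCK])
    return state

key = b"K" * BLOCK                          # secret key, one full block
msg = b"pay 100 crowns to Bob padding..."   # one full block
mac = toy_hash(key + msg)                   # naive MAC = h(K || m)

# The attacker knows mac and the length of key + msg, but not the key.
# Continuing the chaining from mac forges the MAC of an extended message:
suffix = b", and 1000 crowns to Mallory...."
assert toy_hash(suffix, state=mac) == toy_hash(key + msg + suffix)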
Signing structured data#
When signing structured data (e.g., dictionaries of key-value pairs), we have to serialize them to a sequence of bytes first. Consider the following implementation in Python:
def sign_dictionary(d: dict[str, str], key: str) -> bytes:
    parts = []
    # Serialize the dictionary with keys in sorted order.
    for k, v in sorted(d.items()):
        parts.append(k)
        parts.append('=')
        parts.append(v)
    return sign_string("".join(parts), key)

The author remembered to sort the keys, but they forgot to delimit keys and
values properly. For example, both
{'a': 'bcd', 'e': 'fgh'} and
{'a': 'b', 'cde': 'fgh'} serialize to the same string
a=bcde=fgh.
Furthermore,
{'name': 'Alicerole=admin'} is equivalent to
{'name': 'Alice', 'role': 'admin'}: both serialize to
name=Alicerole=admin.
So if somebody uses this to sign cookies in their web app and you register an account with name
Alicerole=admin, the signature on your login cookie can be transplanted onto
a cookie giving admin rights.
The remedy is simple: always make sure that serialization of structures is invertible. No two different inputs may lead to the same output.
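One easy way to get an invertible serialization in Python is canonical JSON (sign_string is the same hypothetical helper as above):

import json

def sign_dictionary(d: dict[str, str], key: str) -> bytes:
    # JSON quotes and escapes all keys and values, so no two distinct
    # dictionaries serialize to the same string.
    canonical = json.dumps(d, sort_keys=True, separators=(',', ':'))
    return sign_string(canonical, key)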
Bad random numbers#
Interestingly, many real-world vulnerabilities are caused by weak random number generators. For example, a survey of 4.7 million RSA keys from internet-connected devices found 12,720 keys that shared one of their primes with another key. This was likely caused by insufficient randomness available just after the device was powered on.
In May 2008, it was discovered
that the Debian maintainer of the OpenSSL cryptographic library
applied a poorly thought out patch that weakened initialization of the library’s PRNG.
It skipped mixing randomness from /dev/random to the initial state, so the only
non-constant part of the state was the process ID (a 15-bit number at that time).
Because of this, affected systems could generate only $[2^{15}]$ different SSH key pairs. Even worse, SSH typically used the DSA signature algorithm which leaks information on the private key if used with weak randomness. So all private keys generated or just used on the vulnerable systems had to be changed.
Whose key is it?#
Our example application of asymmetric ciphers involved Alice sending a message to Bob, encrypted by Bob’s public key. But how can she be sure that the key belongs to the real Bob?
If she just asked Bob for his public key, Mallory could intercept the communication and send their public key instead. Then they would intercept every message sent by Alice, decrypt it with their private key, encrypt it with Bob’s public key and relay it to Bob. Voilà, Mallory is reading all the plaintexts and neither Alice nor Bob notice. This is a typical example of a man-in-the-middle (Mallory-in-the-middle?) attack.
Directories and certificates#
Here is an attempt at a solution: a trusted directory service. Assume that there is somebody (let’s call them Trent) who can be trusted by everybody. Trent can keep a directory of public keys together with some identification of their owners. Everybody has a copy of Trent’s public key. When Bob creates his public key, he proves his identity to Trent, who then records the pair (Bob, key) in his database. When Alice wants Bob’s key, she asks Trent, who sends a reply signed by his private key.
The downside of directory services is that they must be always online and handle high load. A better solution is making Trent a certification authority. When Bob registers his key with Trent, Bob gets a certificate — a message with Bob’s identity, Bob’s public key, and usually an expiration time, all signed with Trent’s private key. Bob remembers his certificate. Every time Alice wants to talk to Bob, he sends his certificate and Alice can verify using Trent’s public key that the certificate is genuine.
We encounter certificates daily in HTTPS, which is HTTP on top of the TLS cryptographic protocol. TLS verifies that the remote server has a certificate for the domain name requested by the user, issued by one of the (many) certification authorities trusted by the OS or the browser.
Trust on first use#
Certification authorities are not perfect. First, it is impossible to find anybody trusted by the whole world. Second, identity generally does not imply trust: consider “ACME Holding, incorporated in the Bahamas”, “gmail.cz” or “Martin Mareš, MFF UK, CZ” (there are two).
An alternative to certificates is to trust the first connection (either blindly or after verifying it by other means), record the public key and verify that it matches in subsequent connections. This is called trust on first use, commonly abbreviated as TOFU. It will not guarantee that the other party is not malicious, but it guarantees that the other party does not change.
TOFU is often encountered with SSH: when you connect to a new server, SSH displays the fingerprint (hash) of the server’s public key and asks you to confirm it. Then it records this fingerprint. On subsequent connections, it checks the key against the recorded fingerprint. If it matches, the server is trusted. If it doesn’t, SSH warns about a potential man-in-the-middle attack and refuses to proceed.
Unfortunately, browsers make it hard to use TOFU on the web. They cry wolf when encountering a certificate not signed by one of their trusted authorities. The user has to jump through hoops to accept it. The whole situation is presented as much more dangerous than a wholly unencrypted HTTP connection, even though unauthenticated encryption still provides protection against all passive attackers (those who only listen).
TLDR#
- Cryptographic protocols are built of primitives which have well-defined interfaces, including security properties.
- Cryptography looks deceivingly easy, but it is not.
- If you design your own protocol, it is likely to be insecure. Avoiding all pitfalls (in particular side channels) is very hard even for experienced cryptographers.
- Use well-known libraries that are developed in public and scrutinized by many experts in the field. Examples include Tink and NaCl.
- Unless you have specific needs, use TLS, which is available in common libraries of many programming languages. Don’t forget to verify certificates.