Fundamentals#
Information & Computer Security#
Information security, in its broadest sense, is the practice of keeping information private, safe, and secure: protecting it from unauthorized access, use, disclosure, alteration, or destruction. This protection applies to information in all its forms, whether digital, physical, or even intellectual.
Computer security, or cybersecurity, is a subset of information security that focuses on protecting digital data, networks, and computer systems. Because data is just information, the terms are often used interchangeably. However, information security has a wider scope that includes securing physical documents and verbal communication, whereas cybersecurity is specifically concerned with threats in the digital realm.
Assets#
An asset is anything that has value, anything worth protecting. In the context of computer security this is usually data, think a database of employees or even software, or some physical but computer-controlled thing, like hardware or a cyber-physical system, think a power plant.
In the more general information security sense, it can apply to things other than computers too, think a secret family recipe for tea stored on a bookshelf, or a company's reputation and brand image, which can be damaged by a security breach.
CIA#
As mentioned in the previous chapter, the "security" and "privacy" of an asset are divided into three core pillars of security:
- Confidentiality: Ensuring that information is not disclosed to unauthorized individuals, entities, or processes. It is about preventing the unauthorized disclosure of data. “Bad guys can’t read this.”
- Integrity: Maintaining the consistency, accuracy, and trustworthiness of data over its entire lifecycle. Data must not be changed in an unauthorized or undetected manner. “Bad guys can’t modify this.”
- Availability: Ensuring that information is accessible when needed by authorized users. This involves protecting against hardware failures, network outages, and denial-of-service attacks. “Bad guys can’t stop this.”
Threat#
A threat is a potential for violation of security, which exists when there is a circumstance, capability, action, or event that could breach security (affect the CIA) of an asset. This is the attacker who wants to steal data from our app. The attacker themselves is a threat actor, while their intent to steal our data is the threat. Threats come in different intensities and forms: from innocent grandmas clicking a button twice and triggering some weird edge case, through curious teenagers and criminals, to Advanced Persistent Threats (APTs), usually state-sponsored groups, like Cozy Bear of the Russian Federation, the Equation Group of the United States, or the Lazarus Group of North Korea, that are used for espionage, sabotage, or whatever else benefits their sponsor.
As defenders, in threat modeling we consider which threats are most relevant to us, which assets we want to protect, which assets threats would most probably target, how they would approach the attack, what would stop them, and what methods they would use. Based on that we set our security barriers and tune our focus, preventive measures, monitoring, and so on.
As attackers, we try to find holes in the threat model of our targets: a case where reality conflicts with their threat model, like a bug in code, or where the threat model itself is broken, like a badly placed security barrier.
Vulnerability#
A vulnerability is a weakness in a system that can be exploited by a potential attacker. In computer security, it most often means a bug or misconfiguration in code, hardware, or a process that allows an attacker to violate the CIA of an asset. To give a typical example, consider this piece of Python code using the web framework Flask:
@app.route("/ping")
def ping():
host = request.args.get('host', '127.0.0.1')
proc = os.popen(f"ping -c 5 {host}")
output = proc.read()
proc.close()
return outputLet say this is a part of diagnostics in a router admin interface and is run without a sandbox. The content of the host variable is taken from the host HTTP GET request argument, so it is user-controlled. If we make a request to /ping?host=1.2.3.4, the service will return the output of the ping command.
```
PING 1.2.3.4 (1.2.3.4) 56(84) bytes of data.
64 bytes from 1.2.3.4: icmp_seq=1 ttl=55 time=6.88 ms
64 bytes from 1.2.3.4: icmp_seq=2 ttl=55 time=5.78 ms
64 bytes from 1.2.3.4: icmp_seq=3 ttl=55 time=6.20 ms
64 bytes from 1.2.3.4: icmp_seq=4 ttl=55 time=6.22 ms
64 bytes from 1.2.3.4: icmp_seq=5 ttl=55 time=6.08 ms
--- 1.2.3.4 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
rtt min/avg/max/mdev = 5.776/6.232/6.881/0.361 ms
```

Now, if we set the host to `1; echo "pwned"`, the command that popen will execute is `ping -c 5 1; echo "pwned"`, so after the ping finishes, our injected command will execute. The output will look like this:
```
PING 1 (0.0.0.1) 56(84) bytes of data.
--- 1 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4082ms
pwned
```

This is called Command Injection, and it allows us, as the attacker, to achieve Remote Code Execution, one of the most severe vulnerability classes of them all. In this scenario, we would have full control over the router, so we could eavesdrop on the traffic, redirect it, plant a backdoor in the router for later access, and so on.
In the information security sense, vulnerabilities can also be non-technical. For example, if a company pays all legit-looking invoices sent to invoices@company.com without proper verification, that is a vulnerability in a poorly designed company process.
There can be quite a lot of discussion about what is and is not considered a vulnerability; we return to this in a later section.
Risk#
Risk is the potential for loss or damage when a threat exploits a vulnerability to harm an asset. It is the intersection of these three concepts; a risk only exists where a threat, a vulnerability, and an asset overlap. A server with a critical vulnerability poses no risk if it is turned off, unplugged, and has no data on it (no asset to harm). Similarly, a highly valuable database (an asset) on a hardened system poses only a small risk, because there are few vulnerabilities left to exploit.
More formally, risk is often calculated or assessed by considering two main factors:
- Likelihood: The probability that a threat will materialize and exploit a specific vulnerability. Is the considered attacker a curious teenager or a state-sponsored group? Is the vulnerability easy or difficult to find and exploit?
- Impact: The magnitude of harm or damage that would be caused to the asset if the threat is successful. What is the financial loss, reputational damage, or operational disruption that would occur?
The goal of security is not to eliminate all risk, which is often impossible or prohibitively expensive, but to reduce and manage it to an acceptable level (remember securing the house from the first chapter?). This practice is known as risk management: identifying, assessing, and prioritizing risks, and then applying resources to minimize their likelihood and impact.
Let’s revisit our command injection vulnerability in the router’s admin interface.
- Asset: The router itself and all the network traffic passing through it.
- Vulnerability: The command injection flaw in the `ping` function.
- Threat: An attacker on the local network (or the internet, if the interface is exposed) looking for vulnerable devices.
- Risk: The combination of these factors creates a high risk. The likelihood is moderate to high, as scanning for such vulnerabilities is common. The impact is critical, as a successful exploit gives the attacker complete control over the network, violating the Confidentiality, Integrity, and Availability of all data passing through it.
Digging deeper into vulnerabilities#
Definition#
As we stated already, what is and is not a vulnerability is not clear-cut, and opinions vary.
Take the command injection vulnerability above: would you still consider it a vulnerability if the commands were run in an unprivileged sandbox? Or if the endpoint required admin authentication?
To classify and track vulnerabilities in products, the Common Vulnerabilities and Exposures (CVE) database was created in 1999. It works by assigning reported vulnerabilities an identifier in the form CVE-YEAR-NUMBER, like CVE-2019-0708 – a vulnerability in Microsoft Windows allowing unauthenticated remote attackers to take over machines via RDP. CVE's purpose is to make it clear which vulnerability everyone is talking about. It is accompanied by the CVSS scoring system for classifying the severity of a vulnerability.
It might be tempting to say that this is therefore the ultimate source of truth about vulnerabilities. Well, it's not that easy. There are many cases where CVEs were assigned even though they obviously did not pose any security risk. For example, for some time people were convinced that a serious vulnerability existed in the popular tool curl, before it became apparent that the report was bogus; the CVE had been published without the knowledge of the curl maintainers. So in some ways the CVE system is quite broken, and the same goes for the CVSS system.
So it is very important to use common sense when looking at vulnerabilities. It is wise to look for impact: if there is no reasonable impact, it may well be that what you are dealing with is not a security vulnerability.
However, there can also be issues that, while not being vulnerabilities with direct impact, pose a security risk and are worth fixing. For example, having no minimum strength requirement for the passwords users can set – if all users set strong passwords anyway, it is not exploitable, but once a user sets a single-character password, the impact will show. Another example is missing defense-in-depth measures: having an unauthenticated administration interface of an app accessible on the company internal network might not always be a direct vulnerability, but it is certainly very risky.
Exploit#
An exploit is a program or a set of steps that takes advantage of a vulnerability and demonstrates impact. It should be reasonably reproducible. This is the proof that a vulnerability exists.
For the router command injection vulnerability this could be:
```python
import requests

# command to inject after the ping
command = 'echo "pwned"'

# "routerip" is the address of the vulnerable router admin interface
r = requests.get("http://routerip/ping", params={'host': f"1;{command}"})
print(r.text)
```

Responsible disclosure#
DISCLAIMER: This is not legal advice.
If you, as an ethically thinking individual, happen to find a vulnerability somewhere, what should you do?
It is customary to report it to the vendor of the vulnerable product through a private, secure channel. It is important to state that you come in peace and in good faith, so that your report cannot be confused with, for example, a ransom demand. A lot of vendors publish contacts where they want vulnerabilities to be reported, for example in the form of security.txt, and quite a lot of vendors also offer bug bounty programs where you can even get paid for your findings.
The vendor often asks for the vulnerability to be kept secret until a bug fix is developed, tested, and distributed. Agreeing to such an embargo is often a condition of payment of the bug bounty.
Responsible vendors fix bugs quickly, or at least explain why they need more time. In reality, you will find vendors that do not respond at all or who ask for an unreasonably long embargo period. Security researchers often counter this behaviour by setting a deadline after which they publish the vulnerability no matter what the vendor does. For example, Google's Project Zero has a 90-day deadline. The reasoning stems from the assumption that if you could find the vulnerability, somebody else probably did as well, and they often aren't "the good guys". By setting a deadline on public disclosure, you force the vendor to fix the vulnerability as soon as possible, before the "bad guys" can exploit it.
Some general vulnerability principles#
Misconfigurations#
Even though the program or hardware itself works as intended, a lot of bad things can still happen if it is badly configured.
Unchanged secret#
Consider an app where you use JSON Web Tokens (JWTs) for authentication, but you forgot to change the secret used to cryptographically sign the token.
SECRET="changeme" # guessable secret
...
@app.route("/admin")
def admin():
username = get_jwt_identity()
if username != "admin":
return "Denied"
# Admin area here
...
return outputAn attacker can easily bruteforce the secret using a wordlist attack. They take list of most common secrets and a JWT that was generated by the app for their unprivileged user. Now they try to verify this JWT using each of the secrets, if they succeed, they know this is the secret that is used by the app (all of this can be done locally, so it is very fast). So they can generate their own JWTs and spoof any user of the application.
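A minimal sketch of such an offline wordlist attack, assuming the tokens are signed with HS256 and using the PyJWT library (the token, secret, and wordlist below are purely illustrative):

```python
import jwt  # PyJWT

# In reality the attacker captures a token issued by the app for their own
# unprivileged account; here we generate one so the sketch is self-contained.
APP_SECRET = "changeme"  # unknown to the attacker
captured_token = jwt.encode({"sub": "lowpriv-user"}, APP_SECRET, algorithm="HS256")

wordlist = ["secret", "password", "admin123", "changeme"]  # common secrets

for candidate in wordlist:
    try:
        # verification only succeeds with the correct signing secret
        jwt.decode(captured_token, candidate, algorithms=["HS256"])
    except jwt.InvalidSignatureError:
        continue
    print(f"Recovered signing secret: {candidate}")
    # now the attacker can forge a token for any identity, e.g. the admin
    forged = jwt.encode({"sub": "admin"}, candidate, algorithm="HS256")
    print(f"Forged admin token: {forged}")
    break
```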
Exposed sensitive endpoint#
Spring Boot is a popular web framework for Java. It optionally exposes several endpoints for easy developer debugging, such as /heapdump and /env: the first lets you download a full memory dump (the contents of the heap specifically, but since practically everything lives on the heap in Java, that is essentially everything), the second returns the environment variables.
In versions before 1.5 (released in 2017), both of those (and many others) were publicly accessible without authentication by default. In 2024, 2.3% of public instances still had the heapdump endpoint enabled. This was the vulnerability used to hack TeleMessage, a Signal messenger extension app used by some of the top US government officials.
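As a quick illustration, checking a host for this misconfiguration can be as simple as seeing whether these endpoints respond without authentication. A hedged sketch (the target URL is a placeholder; older installations served the endpoints at the root, newer ones under /actuator/):

```python
import requests

# Hypothetical target to probe for unauthenticated debug endpoints.
base = "http://target.example:8080"

for path in ("/env", "/heapdump", "/actuator/env", "/actuator/heapdump"):
    r = requests.get(base + path, stream=True, timeout=5)
    # a 200 response without any authentication means the endpoint is exposed
    print(path, r.status_code)
    r.close()
```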
Logic bugs#
If you can introduce a bug that will make your app not work, you can also introduce a bug that creates a vulnerability.
Every single letter matters#
Can you spot the bug in this C++ code?
```cpp
if (req.accesscode || req.accesscode == expectedaccesscode) {
    // return the data
} else {
    // return unauthorized
}
```

Yes, instead of || (or) there should be && (and). It is hard to spot at first sight and easily missed during testing, because everything appears to work, yet it introduces a critical vulnerability: the check passes whenever any access code is supplied, regardless of its value.
Insecure direct object reference (IDOR)#
Take a website that has an endpoint /user.php?id=156 to query sensitive personal details. What happens when you, while logged in as user with ID 156, try to query /user.php?id=157? If the website returns the user details of user 157, it is vulnerable to Insecure direct object reference (IDOR).
It is especially bad if the website uses sequential IDs, like in this example, but it can also surface with UUIDs, which are more common nowadays: if a user's UUID leaks anywhere in the app and authorization is not enforced on the sensitive endpoint, you still have an IDOR.
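As a sketch of the fix, the endpoint must enforce authorization on every object access, not just authentication. The Flask handler below is illustrative; the user store and session handling are stand-ins for whatever the real app uses:

```python
from flask import Flask, abort, jsonify, request, session

app = Flask(__name__)
app.secret_key = "..."  # placeholder

# Hypothetical user store, just for illustration
USERS = {156: {"name": "Alice"}, 157: {"name": "Bob"}}

@app.route("/user")
def user_details():
    requested_id = int(request.args.get("id", 0))

    # VULNERABLE (IDOR): blindly trusting the id from the URL
    #   return jsonify(USERS.get(requested_id, {}))

    # SAFE: the record must belong to the logged-in user (or the user
    # must be explicitly authorized to view it)
    if requested_id != session.get("user_id"):
        abort(403)
    return jsonify(USERS.get(requested_id, {}))
```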
Injection#
Take the following piece of Python app code:
@app.route("/search")
def search():
searchterm = request.args.get('term')
if searchterm:
results = db.query(
f"SELECT * FROM products WHERE name ILIKE '%{searchterm}%' AND PUBLIC=1"
)
return jsonify(results)
return "Provide search term"What happens when you set term to ';-- a? If you don’t know, -- is a comment in SQL, so the resulting query, that will run will be "SELECT * FROM products WHERE name ILIKE '%', which will match all products, including the ones that are not public (have PUBLIC=0).
This is called SQL Injection, and it is principally very similar to the command injection vulnerability mentioned above.
The core issue is mixing user-supplied data with code, or more generally with anything that has special meaning to the component interpreting it. This pattern is not unique to SQL or the command line; the same principle shows up in Cross-site Scripting, NoSQL injection, LDAP injection, and so on. The key takeaway is to never trust user input.
Any data that originates from outside your application’s control is a potential vector for attack. This data comes from various sources (e.g., URL parameters, HTTP headers, POST bodies, uploaded files) and is often passed to sensitive functions called sinks (e.g., database queries, shell executors, HTML renderers). The vulnerability occurs when data from a source reaches a sink without being properly handled.
There are two primary ways to prevent injection attacks:
- Sanitization/Escaping: This involves cleaning the user’s input by removing or encoding dangerous characters. While sometimes necessary, it is brittle and prone to error, as attackers are constantly finding new ways to bypass blocklists.
- Separation of Data and Code: This is the most robust solution. Instead of building queries with string formatting, use parametrized queries (also known as prepared statements). The database driver is then able to distinguish between the command (the SQL statement) and the data (the search term), preventing any part of the data from being executed as code.
The safe version of the code above would look like this:
```python
# Using a library that supports parametrized queries
results = db.query(
    "SELECT * FROM products WHERE name ILIKE %s AND PUBLIC=1",
    (f'%{searchterm}%',)
)
```

A particularly dangerous variant of this vulnerability class is Insecure Deserialization. This happens when an application deserializes user-controlled data (like a JSON string or a Java/Python object stream), which can lead to the instantiation of malicious objects and, in the worst case, remote code execution.
Parser Differentials#
A parser differential vulnerability occurs when two different systems, or two different layers within the same system, parse the same piece of data in different ways. This discrepancy can be abused to bypass security controls. A common scenario involves a frontend security component (like a reverse proxy or a Web Application Firewall) and a backend application server.
Imagine a request is made that includes a malicious URL. The frontend proxy inspects the URL, decides it is safe, and forwards it to the backend. However, the backend application’s URL parser interprets the same URL differently, causing it to perform an action the proxy would have blocked.
A classic example involves exploiting inconsistencies in URL parsing. A standard URL has a structure like scheme://user:password@host:port/path. Let’s say the frontend proxy is configured to block any requests to an internal host, internal.service.
An attacker might send a request with the URL: https://google.com\@internal.service/.
- The frontend proxy parser is strict. It sees the `\` as an invalid character within the user-info part or hostname, and might interpret the host as `google.com`. Believing the request is destined for a safe, external domain, it lets the request pass.
- The backend application uses a more lenient parsing library. This library might automatically "correct" the `\` to a `/` before parsing, or have a different precedence for delimiters. It could interpret `google.com\` as the username and `internal.service` as the host, thus forwarding the request to the sensitive internal service.
The request thus bypasses the security measures of the frontend.
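To make the differential concrete, here is a small sketch: Python's `urllib.parse` treats everything after the last `@` in the authority as the host, while the deliberately naive `frontend_host` function (a purely illustrative stand-in for a stricter frontend parser or WAF rule) stops at the backslash and sees `google.com`:

```python
from urllib.parse import urlsplit

url = "https://google.com\\@internal.service/"

def frontend_host(url: str) -> str:
    """Illustrative stand-in for a strict frontend check: take everything
    after the scheme and stop at the first 'special' character."""
    rest = url.split("://", 1)[1]
    for sep in ("\\", "/", ":", "?", "#"):
        rest = rest.split(sep, 1)[0]
    return rest

print(frontend_host(url))      # google.com       -> looks external, allow
print(urlsplit(url).hostname)  # internal.service -> where a lenient backend connects
```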
Time of check, time of use (TOCTOU)#
UNIX systems have a system-wide temporary directory at /tmp, where all users can create
their files, but only the owner can delete them. Consider the following Python function that
creates a new temporary file, carefully selecting a yet unused name:
```python
import os

def open_temp_file():
    for i in range(1000):
        name = '/tmp/temp-' + str(i)
        # check whether the name is free...
        if not os.path.exists(name):
            # ...and only then create the file (not atomic!)
            return open(name, 'w')
    raise RuntimeError('Too many temporary files')
```

What could go wrong? Well, many things. First, somebody else could occupy all 1000 possible names with their own files, making your program fail. Or they could occupy 990 of them, making your program very slow.
But there is a much worse problem: Checking that the name is unused and using this fact to create your own file do not happen simultaneously. There is a short time window between these two events. Long enough for somebody else to step in and create their own file with the same name.
So you can be tricked into overwriting somebody else’s file. That seems harmless: the owner of the file must have given you access rights to it. But instead of a file, the attacker could create a symbolic link to one of your files, making you overwrite your own file of their choice. This is bad.
To avoid problems like this, the check and the action depending on its result must happen as a single atomic operation. Low-level filesystem APIs often support atomic actions (open(2) has the O_EXCL flag, mkdir(2) gives up when the name is already occupied, etc.). In Python, open has the exclusive-creation mode 'x' for this, and the standard library's tempfile module provides nicely wrapped functions for secure creation of temporary files.
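A sketch of two safer approaches using only the standard library (the second mirrors the original function, but with the check and the creation fused into one atomic call):

```python
import os
import tempfile

# Option 1: let the standard library pick a unique name and create the file
# atomically (it uses O_CREAT | O_EXCL under the hood).
fd, path = tempfile.mkstemp(prefix="temp-")
with os.fdopen(fd, "w") as f:
    f.write("hello")

# Option 2: if you must control the name yourself, make the existence check
# and the creation a single atomic operation with exclusive-creation mode.
def open_temp_file():
    for i in range(1000):
        name = f"/tmp/temp-{i}"
        try:
            # 'x' fails if the path already exists (even as a symlink),
            # closing the window between "check" and "use"
            return open(name, "x")
        except FileExistsError:
            continue
    raise RuntimeError("Too many temporary files")
```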
Domain-specific vulnerabilities#
Those were just a few examples of vulnerability principles that can be found widely across different IT domains. Many vulnerabilities are, however, specific to a domain, like the web, hardware, or binary exploitation. We will cover some of those in the following lectures, after we go over cryptography.
TLDR#
- Information Security protects Assets (data, systems, reputation) by maintaining the CIA Triad: Confidentiality, Integrity, and Availability. Cybersecurity is the digital subset of this practice.
- Risk is the core concern, existing only where a Threat (actor) can exploit a Vulnerability (weakness) to harm an Asset. Security management aims to reduce the Likelihood and Impact of these risks.
- A Vulnerability is any flaw—technical (code bugs, hardware issues) or non-technical (poor processes, misconfiguration)—that can be exploited. They are tracked (e.g., via CVEs), but their severity is determined by real-world Impact.
- Common vulnerability classes include Injection (mixing user data with code), Misconfigurations (e.g., using default secrets), Logic Bugs, and TOCTOU (non-atomic operations).
- The fundamental rule of defense is to never trust user input and prevent injection flaws by enforcing the separation of data and code (e.g., using prepared statements).