Natalia Kazankova

Understanding, detecting, and fixing buffer overflows: a critical software security threat

Buffer overflows are one of the oldest and most dangerous vulnerabilities in software security. Heap-based buffer overflow ranked second in the 2023 CWE Top 10 KEV Weaknesses list. Over the years, buffer overflows have enabled countless attacks and incidents, often with severe consequences, such as Cloudbleed in 2017. Despite advances in security practices, they continue to pose significant risks, especially in software written in low-level languages like C and C++. Using tools like static analysis and fuzz testing during development can detect buffer overflow vulnerabilities early, while adopting secure coding best practices can prevent them from occurring in the first place.


What is buffer overflow?

A buffer overflow happens when a program attempts to store data beyond the bounds of a fixed-size buffer. Buffers are temporary storage areas used to hold data while it’s being transferred or processed. In languages like C and C++, memory management is handled manually, which means there is no automatic protection against writing more data than a buffer can accommodate. A buffer overflow can result in undesired behavior, crashes, or even remote code execution.

When a buffer overflow occurs, the extra data can overwrite adjacent memory locations, potentially altering the execution of the program. In some cases, attackers can manipulate this overflow to overwrite key variables or control structures, such as return addresses, to execute malicious code. 

For example, in C, buffer overflows typically occur with functions like strcpy, which do not check the size of the input. If an input string exceeds the buffer’s capacity, it can overwrite adjacent memory, leading to unpredictable behavior.

In addition to buffer overwrites, a buffer overread can also occur, where the program reads beyond the intended buffer’s boundary. While overwrites involve writing excess data, overreads allow attackers to access sensitive information from adjacent memory that was not meant to be exposed. Heartbleed, which affected millions of servers, is a well-known example of a buffer overread.
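The overread pattern can be sketched in a few lines. The example below is a simplified illustration in the spirit of a heartbeat-style echo handler, not code from any real library: the function name `echo_checked` and its parameters are hypothetical. It shows the check that separates a safe copy from an overread, namely comparing the length the peer claims against the number of bytes actually received.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical echo handler: the peer sends a payload plus a claimed length.
 * Copies the payload into `out` only if the claim is consistent.
 * Returns the number of bytes copied, or 0 if the request is rejected. */
size_t echo_checked(const unsigned char *payload, size_t received_len,
                    size_t claimed_len,
                    unsigned char *out, size_t out_cap) {
  /* An overread happens when code calls memcpy(out, payload, claimed_len)
   * without this check: claimed_len may exceed the bytes really received,
   * so the copy would continue into adjacent memory. */
  if (claimed_len > received_len || claimed_len > out_cap) {
    return 0;
  }
  memcpy(out, payload, claimed_len);
  return claimed_len;
}
```

A request that claims 64 bytes while only 4 were received is rejected instead of being answered with 60 bytes of unrelated memory.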

Simple buffer overflow example

To illustrate how a buffer overflow exploit works, consider a simple C function that copies user input into a fixed-size character array:

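```c
#include <string.h>

void vulnerable_function(char *user_input) {
  char buffer[10]; /* room for 9 characters plus the terminating '\0' */
  strcpy(buffer, user_input); /* copies until '\0' with no bounds check */
}
```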

If an attacker inputs 10 or more characters, the `strcpy` function will overflow the buffer allocated on the stack, because it also writes the terminating null character, and overwrite adjacent stack memory. By crafting a specific input, they could overwrite the return address and make the program jump to malicious code.

Types of buffer overflow

Buffer overflows can be categorized into three main types: global, stack-based and heap-based. All can have serious security implications but differ in how they are exploited.

1. Global buffer overflow

This occurs when a buffer overflow happens in global or static variables, which are stored in the data segment. Overflowing these buffers can corrupt other global variables or cause unexpected behavior.
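As a minimal sketch of this situation, consider a global table that sits in the data segment next to another global. The names `g_table`, `g_is_admin`, `table_set`, and `is_admin` are made up for illustration, and the actual memory layout depends on the compiler and linker. The index check is what keeps a write from landing in a neighboring variable.

```c
#include <stdbool.h>
#include <stddef.h>

#define TABLE_SIZE 8

/* Globals are placed in the data segment; which variable ends up next to
 * which is decided by the compiler and linker. */
static int g_table[TABLE_SIZE];
static bool g_is_admin = false; /* hypothetical flag an overflow could corrupt */

/* Writes g_table[index] only when index is in range. */
bool table_set(size_t index, int value) {
  if (index >= TABLE_SIZE) {
    /* Without this check, the assignment below would be an
     * out-of-bounds write into whatever is stored after g_table. */
    return false;
  }
  g_table[index] = value;
  return true;
}

bool is_admin(void) {
  return g_is_admin;
}
```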

2. Stack-based buffer overflow

A stack-based buffer overflow is often considered the easiest type to exploit. It occurs in the call stack, which stores local variables and function return addresses. When a function is called, its local variables are stored on the stack, and if a buffer on the stack overflows, it can overwrite the return address. Attackers exploit this by overwriting the return address with the address of code they control, redirecting the program’s execution flow.

One common exploitation technique is Return-Oriented Programming (ROP). In ROP attacks, the attacker manipulates return addresses on the stack to chain together short instruction sequences (called “gadgets”) that already exist in the program’s memory, eventually executing arbitrary code.

3. Heap buffer overflow

Heap-based buffer overflows rank second in the 2023 CWE Top 10 KEV (Known Exploited Vulnerabilities) Weaknesses list. They occur in the heap, a memory region used for dynamic memory allocation. The heap is more flexible than the stack, but it lacks the stack’s strict structure. When a heap buffer overflow occurs, it can corrupt the metadata used to manage heap memory, leading to unpredictable behavior such as arbitrary code execution or a system crash.

Figure: 2023 CWE Top 10 KEV Weaknesses. Heap-based buffer overflow is ranked #2.

While more difficult to exploit than stack-based overflows, heap overflows can still be dangerous. Attackers often exploit heap overflows to manipulate memory structures that track dynamic memory allocation, potentially leading to the compromise of the entire system.
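One frequent source of heap overflows is an allocation size that is computed incorrectly. The sketch below uses a hypothetical helper, `dup_records`, to show a common guard: checking that `count * record_size` does not wrap around before calling `malloc`. It is an illustration of the pattern under these assumptions, not a complete allocator wrapper.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicates `count` records of `record_size` bytes
 * into a fresh heap buffer. Returns NULL on invalid sizes or when the
 * allocation fails. The caller frees the result. */
void *dup_records(const void *src, size_t count, size_t record_size) {
  if (count == 0 || record_size == 0) {
    return NULL;
  }
  /* If count * record_size wrapped around, the allocation would be smaller
   * than the `count` records the caller expects, and any later per-record
   * access would run past the end of the heap block. */
  if (count > SIZE_MAX / record_size) {
    return NULL;
  }
  size_t total = count * record_size;
  void *dst = malloc(total);
  if (dst == NULL) {
    return NULL;
  }
  memcpy(dst, src, total);
  return dst;
}
```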

Buffer overflow example: Cloudbleed

Figure: Cloudbleed. Image source: Forbes

One notable real-world case is the Cloudbleed incident in 2017, where a vulnerability in Cloudflare’s code led to sensitive user data being leaked. The problem arose from a buffer overrun in the code that parsed web pages: the parser read past the end of a buffer, so responses could contain chunks of adjacent memory, which sometimes held personal information such as passwords and API keys. Although there was no evidence of malicious exploitation, Cloudbleed highlighted the potential dangers of buffer overflow vulnerabilities in cloud services.

Detecting buffer overflows

Detecting buffer overflows during development and testing is critical for preventing security breaches. Some common techniques and tools include:

Static analysis
Static analysis tools scan the source code for common vulnerabilities, including buffer overflows, without running the program. While they are widely adopted and can run directly in the IDE, static analysis alone isn’t enough to ensure a high level of security. Common challenges include a high number of false positives and false negatives.

Dynamic Analysis/DAST
Dynamic analysis tools monitor the program during execution to detect memory corruption issues, including buffer overflows. Typically, they work in a black-box mode, meaning they don’t analyze the source code. As a result, it is difficult to locate the bug and its root cause, and therefore to fix it. DAST is usually used by security teams late in the development process.

Fuzz testing
Fuzzers like AFL (American Fuzzy Lop) and CI Fuzz feed unexpected or random data into a program at runtime to trigger vulnerabilities, helping developers identify buffer overflow issues. Fuzzing is currently one of the fastest ways to identify memory corruption bugs and their root cause. A common challenge is the need to write test cases (fuzz tests) for a codebase. This challenge is being addressed with the help of AI. For example, CI Fuzz automatically analyzes the code base, prioritizes which functions should be fuzzed first, and generates test cases for them.
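To give a sense of what such a test case looks like, here is a minimal sketch of a fuzz test written against libFuzzer’s `LLVMFuzzerTestOneInput` entry point. The function under test, `store_name`, is hypothetical, and tests written for or generated by other fuzzers may look different.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical function under test: stores a name in a 16-byte record.
 * Returns 0 on success, -1 if the name does not fit. */
int store_name(char dest[16], const uint8_t *name, size_t len) {
  if (len >= 16) {
    return -1; /* without this check, the memcpy below could overflow dest */
  }
  if (len > 0) {
    memcpy(dest, name, len);
  }
  dest[len] = '\0';
  return 0;
}

/* libFuzzer entry point: the fuzzer calls this repeatedly with
 * generated inputs of varying content and size. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  char record[16];
  store_name(record, data, size);
  return 0;
}
```

With Clang, a file like this can typically be built with `-fsanitize=fuzzer,address`, so that AddressSanitizer reports an out-of-bounds write as soon as a generated input triggers one. If the length check in `store_name` were removed, inputs of 16 bytes or more would produce such a report.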


How AI-augmented fuzz testing works 

Watch this 10-minute video to learn how fuzz testing can automatically analyze your code base and generate relevant fuzz tests for you, reducing the manual work from a few days to a few minutes. Speaker: Khaled Yakdan, Chief Product Officer and a fuzz testing guru.

Preventing and fixing buffer overflows

Preventing buffer overflows starts with writing secure code, particularly when working in languages like C and C++. Some best practices include:

  • Bounds Checking: Always check the size of inputs before writing to buffers. Prefer bounded functions like strncpy or snprintf over strcpy to limit the number of characters written to a buffer, and remember that strncpy does not add a terminating null character when the source is as long as or longer than the limit.
  • Memory Protection Mechanisms: Modern systems employ techniques like Address Space Layout Randomization (ASLR) and Data Execution Prevention (DEP) to make buffer overflow exploits harder.
  • Code Reviews: Regular code reviews can help identify insecure coding patterns that could lead to buffer overflows.
  • Static Analysis and Fuzz Testing: Incorporating static analyzers and fuzzers like CI Fuzz during development, together with runtime checkers like AddressSanitizer, helps detect buffer overflow issues early. With fuzzing, you can also identify the root cause and fix a bug quickly.
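As a sketch of the bounds-checking practice, the earlier `vulnerable_function` could be rewritten in one of two ways, depending on whether oversized input should be rejected or truncated. The names `copy_checked`, `copy_truncating`, and `BUFFER_SIZE` are illustrative assumptions.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define BUFFER_SIZE 10

/* Rejects input that does not fit, counting its terminating '\0'. */
bool copy_checked(char dest[BUFFER_SIZE], const char *user_input) {
  if (strlen(user_input) >= BUFFER_SIZE) {
    return false;
  }
  strcpy(dest, user_input); /* safe here: the length was verified above */
  return true;
}

/* Truncates instead of rejecting. snprintf writes at most BUFFER_SIZE
 * bytes and always terminates the result when the size is non-zero. */
void copy_truncating(char dest[BUFFER_SIZE], const char *user_input) {
  snprintf(dest, BUFFER_SIZE, "%s", user_input);
}
```

Rejecting is usually the safer default, since silent truncation can change the meaning of the data, for example by cutting a path or an identifier short.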

Figure: Example of a global buffer overflow uncovered by CI Fuzz.

With fuzz testing, every uncovered issue is pinpointed to the exact line of code in the repository and accompanied by the input that triggered it and clear steps to remediate it. Simply load the crashing input into the debugger to see what is causing the issue.

Once a buffer overflow vulnerability is detected, fixing it involves proper bounds checking, replacing vulnerable code, and conducting thorough testing to ensure the issue is fully resolved.

Buffer overflows in embedded systems

Buffer overflows are a particular concern in embedded systems, where resources are often limited and security features like ASLR or DEP may not always be available. Embedded systems are found in industries such as automotive, medtech, and manufacturing, where the consequences of a buffer overflow can be catastrophic.

For example, in the automotive industry, a buffer overflow in the software controlling a car’s systems could lead to unintended acceleration or brake failure, putting lives at risk. In medical devices, a buffer overflow could disrupt the operation of life-saving equipment, leading to serious harm.

To prevent buffer overflows in embedded systems, developers must follow strict coding standards like MISRA C and adopt secure coding practices. Regular testing with fuzzing and static analysis tools is also crucial, as embedded systems often have limited security monitoring once deployed.

Uncover buffer overflows in minutes with fuzzing

If you develop critical products using C/C++, ensure that you uncover all memory corruption issues early in the testing process. Read more about fuzz testing here or book a call with Code Intelligence’s experts to see AI-powered fuzzing in action. 

Industry leaders like Google, Continental, CARIAD, and Bosch have been mitigating the risks of delayed releases, costly repairs, critical system malfunctions, and cyber attacks with Code Intelligence.