I wrote this post a long time ago but I never had the chance to publish it. Someone pinged me and I decided to finally make this publicly available. This is a bit different to my other blogposts: it has a more academic tone and has a focus on CHERI. The analysis of temporal safety bugs is repeated in my other blogpost, the difference being that the focus of this blogpost is CHERI (a hardware memory safety mitigation, see more details at https://cheri-alliance.org/).
(This paragraph is only introducing AIxCC) The DARPA Artificial Intelligence Cyber Challenge (AIxCC) is a competition where competitors have to create AI-driven automated systems to find, trigger and patch vulnerabilities. The competition includes many Challenge Projects, which are real-world software modified to contain injected vulnerabilities. The injected vulnerabilities, Challenge Project Vulnerabilities (CPVs), are mostly based on past disclosed and fixed vulnerabilities and common developer mistakes such as fixed-size buffers, missing length checks, flawed linked-list traversal, etc.
The goal of the AIxCC program as I understand it is to “supercharge” fuzzing with AI, and beyond fuzzing, also identify the source of the problem and automatically patch software code. What has CHERI got to do with fuzzing, anyway? At most, CHERI could increase the sanitization capability of AddressSanitizer (ASan) and maybe also the performance of sanitized binaries (given a fast enough CHERI-enabled microarchitecture). Theoretically, the determinism and strictness of spatial safety provided by CHERI could allow a CHERI+ASan sanitizer to detect more kinds of buffer overflows, but the degree to which this is the case depends on whether such code idioms exist in real-world software.
To clarify, this post is not about fuzzing (I’ll elaborate on fuzzing in other posts/work, perhaps) but a security analysis of these specific bugs considering CHERI. The question is: to what extent will CHERI directly mitigate these “typical” vulnerabilities? Specifically, will triggering these vulnerabilities immediately lead to deterministic crashes, coercing potential arbitrary code execution into “fail stop” (an immediate crash), clearly preventing further compromise?
One of the challenge projects is Nginx. As there is already a port of Nginx to CHERI purecap for execution in CheriBSD’s CheriABI process environment [5], with only minor source modifications, it took relatively little effort to port the AIxCC version of Nginx to CHERI with all the injected vulnerabilities in place.
The challenge is designed and tested to run on a Linux host, but we will run Nginx on CheriBSD instead, as the CHERI Linux port is still under development and doesn’t yet support features such as userspace heap temporal safety (I’m not sure what the state of the project is right now). Nginx is well supported and widely used on FreeBSD, but the implementation of some features is platform-specific. There is some platform-dependent code in Nginx that affects the reproducibility of CPVs, so I have made some best-effort tweaks to support CheriBSD/FreeBSD except as specifically stated. The hardware I used is an Arm Morello SoC [6], which incorporates a prototype CHERI-enhanced version of the Neoverse N1 core.
TL;DR
- CHERI mitigates all of the spatial safety CPVs (and theoretically CPV12, which is not reachable in FreeBSD).
- CPV15 is an intra-object overflow that can be caught by CHERI sub-object bounds, while ASan can only catch such bugs indirectly. (Admittedly, intra-object overflow bugs seem rare as also pointed out by Apple Security Research).
- The heap UAFs do not trigger a fault with CHERI, but if the attacker relies on the reallocation of the UAF object to carry on the attack, then caprevoke would make sure to invalidate the dangling pointers and dereferences of the dangling pointer would fault, thwarting the attack. However, there could be ways to exploit UAFs without reallocating the underlying UAF object, but that depends on the bug. Therefore, mitigation of these bugs in CHERI requires a case-by-case analysis. My other blog post analyses the exploitability of these CPVs and presents an RCE exploit. The exploit would not be feasible with the use-after-reallocation protection of CHERI in place.
- Double free is detected and mitigated by mrs, which is a shim to the system allocator that handles quarantining and interfacing with the kernel for capability revocation. As a result of this work, we have now configured the mrs allocator to abort rather than mask double free (from CheriBSD 25.03 onwards) for debuggability.
- Finally, while running Nginx with CHERI before I started my fuzzing effort, the cherified Nginx crashed on two buffer overflow bugs introduced in the AIxCC Nginx CPV yet not listed in the official AIxCC Nginx challenge repository; it is not yet clear to me the degree to which these would be exploitable to achieve arbitrary control flow (update: at least one is, which I can explain in another blogpost just as a fun exercise).
The following table summarises our findings (including the two extra bugs, bonus CPVs, or BCPVs):
| \ | Sanitizer crash | CHERI traps | CHERI mitigates* | Notes |
|---|---|---|---|---|
| CPV1 | heap-buffer-overflow | yes | yes | |
| CPV2 | heap-buffer-overflow | yes | yes | |
| CPV3 | heap-buffer-overflow | yes | yes | |
| CPV4 | heap-buffer-overflow | yes | yes | |
| CPV5 | SEGV | yes | yes | The impact is a NULL pointer dereference, generally unlikely to be exploitable. |
| CPV8 | heap-buffer-overflow | yes | yes | |
| CPV9 | heap-use-after-free | no | yes | The impact is at most NULL pointer dereference if we don’t alias this object with another. |
| CPV10 | double-free | no/yes | yes | Double free can be configured to trap. |
| CPV11 | heap-use-after-free | no | yes | Privileged information can be leaked due to UAF before revocation. But mrs prevents a heap pointer value leak through object aliasing. |
| CPV12 | heap-buffer-overflow | N/A | N/A | Theoretically mitigated, but feature not ported to CheriBSD. |
| CPV13 | SEGV | yes | yes | The impact is a NULL pointer dereference, generally unlikely to be exploitable. |
| CPV14 | global-buffer-overflow | yes | yes | |
| CPV15 | SEGV | yes | yes | Intra-object overflow. Object data corruption can also be caught by sub-object bounds. |
| CPV17 | heap-use-after-free | no/yes | yes | The UAF can lead to double-free of memory pool blocks, which is caught by mrs. |
| BCPV1 | heap-buffer-overflow | yes | yes | Linear buffer overflow. Occurs after a double free in CPV10 in which ASan had stopped execution, but also reachable via other inputs. |
| BCPV2 | heap-buffer-overflow | yes | yes | Null byte poisoning. |
*For bugs where CHERI does not trap, we say that CHERI mitigates its exploitation if the impact of the bug for information disclosure or remote code execution is eliminated or reduced with CHERI.
Setting up Nginx
The first step is to cherify the AIxCC version of Nginx, and by cherification I mean porting C/C++ code to CHERI. There is an existing port of Nginx to CHERI purecap, as I said, and I wish I could simply cherry-pick all the relevant commits to AIxCC Nginx. However, the challenge project commit history has been scrubbed, and its commit messages are not helpful at all (on purpose). Fortunately, the cherification changes aren’t substantial, so I decided to manually cherify the challenge project, and it didn’t take me a long time. You can find my repository here: https://github.com/RoundofThree/challenge-004-nginx-source. You may want to skip this, but in case you are curious, these are the main cherification modifications:
- Nginx configuration of pointer size (capabilities in Arm Morello are 128-bit)
- Replace some uses of uintptr_t (pointer type) and ptraddr_t (integer type that can hold a pointer address)
- Fix some instances of pointer arithmetic, like aligning up a pointer, which must have a correct source of provenance so the compiler knows how to correctly derive a capability
- Tighten bounds of memory allocations in Nginx’s custom allocator, otherwise spatial safety is to the granularity of memory mappings rather than individual allocations
- Expanding a full ngx_array_t with ngx_array_push always needs to allocate a new underlying container in purecap to avoid a bounds violation (because the bounds are strictly set and capabilities follow the principle of monotonicity)
- Strengthen allocation alignment for the ngx_auth_log_t pool to the pointer size, because capabilities must be stored 16-byte aligned.
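To make the provenance point about aligning up a pointer concrete, here is a minimal sketch in plain C. It is not the code from the actual port: the helper name align_up_ptr is made up for illustration, and the approach (inspect the address as an integer, then derive the result from the original pointer with pointer arithmetic) is just one idiom that keeps a single, unambiguous source of provenance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: round p up to the next multiple of align
 * (align must be a power of two).
 *
 * The address is only inspected as an integer. The returned pointer is
 * derived from p itself via pointer arithmetic, so p stays the single
 * source of provenance (and, on CHERI, of bounds and permissions).
 */
static unsigned char *
align_up_ptr(unsigned char *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    size_t addr = (size_t)(uintptr_t)p;          /* address only, as an integer */
    size_t misalign = addr & (align - 1);
    size_t delta = (misalign == 0) ? 0 : align - misalign;

    return p + delta;                            /* derived from p, not from an integer */
}
```

The caller is still responsible for making sure that p + delta stays inside the underlying buffer; in a purecap binary, a later dereference outside the bounds of p would fault.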
Don’t forget that we are running on CheriBSD, a CHERI-extended version of the FreeBSD system, which implies that some features with vulnerable code added as part of the competition must also be compatible with BSD. These are:
- Make sure the host_specs feature code at least does not crash on FreeBSD. In AIxCC Nginx, this feature implementation reads from /proc, which is not mounted by default in FreeBSD. Then, fclose(NULL) causes a SEGFAULT in FreeBSD while it does not in recent Linux. I’m not going to fix the functionality of the host_specs feature because that’s irrelevant to the analysis of the vulnerability.
- The connection history feature is implemented by recording the connections in ngx_epoll_process_events, but FreeBSD uses ngx_kqueue_module instead of ngx_epoll_module. I had to port the feature and hope I didn’t add new vulnerabilities.
- The feature of sending a range of data from a resource with the option to reverse the data stream is implemented in ngx_linux_sendfile_chain.c, which is specific to Linux as the name suggests… and no, FreeBSD does not share the same code. I haven’t ported this feature to FreeBSD, therefore CPV12 is not reproducible in FreeBSD. I risk introducing new bugs if I naively add the reverse sendfile feature to FreeBSD. At least, processing the HTTP request that is supposed to trigger CPV12 won’t lead to a crash.
As I said, in this work, our intention is not to fuzz Nginx. The AIxCC repository released the CPVs with trigger requests and patches. Some CPVs require certain settings in Nginx configuration, like exporting host_specs or last_ip. We can either use a single file to reproduce all the CPVs (reusing the file hardcoded in the AIxCC harness), or we can write a new configuration file for each CPV.
I wrote a simple and naive Python wrapper to start Nginx with a chosen configuration file, send the trigger request, and stop the Nginx service. For each CPV, we need a configuration file and a trigger request, and I grouped them into different folders to be input to the wrapper. We specify the coredump location to be /tmp/cores/. A program crash induced by memory corruption, be it CHERI SIGPROT, or ASan aborts, will terminate a worker process. At the end, we gracefully stop the Nginx server by terminating master and worker processes.
For example, to feed Nginx the trigger request of CPV3:
$ python3 test.py cp3 request.txt
NGINX started with config file: /usr/home/zyj20/challenge-004-nginx-source/cp3/test.conf
Response from server:
NGINX stopped
The configuration file cp3/test.conf is:
working_directory /tmp/cores/;
worker_rlimit_core 500M;
trace on;
events {
}
http {
    server {
        listen 127.0.0.1:8080;
        server_name localhost;
        location / {
            return 200;
        }
    }
}
The error log shows that Nginx worker process exited due to CHERI SIGPROT (signal 34):
$ tail /usr/local/nginx/logs/error.log
2025/01/30 23:38:53 [alert] 70688#0: worker process 70689 exited on signal 34 (core dumped)
2025/01/30 23:38:53 [notice] 70690#0: signal process started
And a quick peek at the coredump, located at /tmp/cores shows that it hit a CHERI bounds fault due to the heap buffer overflow exercised in CPV3 (so a heap buffer OOB write is turned into a CHERI SIGPROT crash).
$ gdb objs/nginx /tmp/cores/nginx.70689.core
GNU gdb (GDB) 14.1 [GDB v14.1.d20240612 for FreeBSD]
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-unknown-freebsd15.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
GEF for freebsd ready, type `gef' to start, `gef config' to configure
89 commands loaded and 5 functions added for GDB 14.1 [GDB v14.1.d20240612 for FreeBSD] in 0.00ms using Python engine 3.9
[+] 15 extra commands added in 0.05 seconds
Reading symbols from objs/nginx...
[New LWP 100214]
Core was generated by `nginx: worker process'.
Program terminated with signal SIGPROT, CHERI protection violation.
Capability bounds fault.
And by the way, I am using a port of GEF to Morello I wrote, which makes debugging output more visual. I wrote a short overview of the features of the tool in another blog post.
Analysis of CPVs in CheriABI
CPVs 6, 7 and 16 were not released at the time of writing this, so we’ll ignore them. For reference, this table shows the released 14 CPVs and the bug kind (actually the ASan crash type):
| \ | Sanitizer crash |
|---|---|
| CPV1 | heap-buffer-overflow |
| CPV2 | heap-buffer-overflow |
| CPV3 | heap-buffer-overflow |
| CPV4 | heap-buffer-overflow |
| CPV5 | SEGV |
| CPV8 | heap-buffer-overflow |
| CPV9 | heap-use-after-free |
| CPV10 | double-free |
| CPV11 | heap-use-after-free |
| CPV12 | heap-buffer-overflow |
| CPV13 | SEGV |
| CPV14 | global-buffer-overflow |
| CPV15 | SEGV |
| CPV17 | heap-use-after-free |
Generally (but not always), triggering spatial safety bugs leads to CHERI crashes, but triggering temporal safety bugs may not lead to an immediate crash.
In the context of memory safety, bugs that don’t SIGPROT immediately may in fact be non-exploitable. The crash may be deferred and manifest later in the exploit flow (e.g., for UAF bugs and with CheriBSD’s use-after-reallocation protection using a quarantine, the crash happens when the attacker tries to use the old dangling pointer after the underlying memory is reallocated, which is a very common technique in exploitation). CHERI is intended to mitigate vulnerabilities rather than being a bug-finding tool, where the goal is an immediate synchronous trap.
Most of the CPVs are out-of-bounds accesses, that is, spatial safety violations, so we expect them to be mitigated by CHERI, generally causing CHERI crashes when triggered (except large buffers which may require imprecise bounds due to bounds compression). Theoretically, temporal safety violations like UAFs don’t cause an immediate CHERI SIGPROT crash if the freed pointer is used before the memory is reallocated, but whether their exploitation is mitigated requires further case-by-case analysis.
Notes on CHERI heap temporal memory safety
On this note, I should clarify that I do have caprevoke turned on in my testing CheriBSD kernel. Caprevoke (capability revocation) is a feature in CheriBSD that implements non-probabilistic C/C++ heap temporal safety in userspace processes by means of capability revocation. For performance reasons, CheriBSD quarantines freed memory until it is efficient to revoke any outstanding pointers, which must take place before memory can be released from quarantine for reallocation. You can read more about the research here in the initial Cornucopia paper [3] and the recent Cornucopia Reloaded [4] that follows it. For this blogpost, just know that CheriBSD default allocator jemalloc is wrapped with the mrs shim, which implements quarantining and interfaces with the kernel to revoke capabilities via cheri_revoke. As the manpage says,
The kernel exposes a CHERI capability revocation service, a mechanism to
revoke capabilities to regions of the address space. Requests for
revocation are made by setting bits in the shadow bitmap and invoking
the cheri_revoke() system call.
Allocators such as jemalloc, snmalloc, and so on, are modified to use mrs, which then quarantines allocations for later batch revocation – after which heap memory can be reused with the knowledge that there are no outstanding dereferenceable pointers from a prior allocation of that memory.
So, the question is, with caprevoke on (and assuming jemalloc+mrs), do these heap UAFs in CPV9, CPV11 and CPV17 cause a deterministic SIGPROT crash? A second question is, are these heap UAFs exploitable even if they don’t immediately cause a SIGPROT crash?
Lastly, we have bugs that cause segmentation fault and one double free bug.
In the version of CheriBSD used in this work (24.05), the impact of double free bugs, unlike UAFs, depends on the allocator internals. Snmalloc [1], for example, has strong protections against double free bugs, so double free bugs will lazily result in crashes because of the allocator freelist design. For our case of jemalloc+mrs, double free bugs are mitigated at the mrs layer because the mrs quarantine is gated using a bitmap in validate_freed_pointer. As a result of this work, future versions of CheriBSD’s mrs will abort immediately on double free, to ease use of CHERI for software debugging.
static inline void *
validate_freed_pointer(void *ptr)
{
    /*
     * Untagged check before malloc_underlying_allocation()
     * catches NULL and other invalid caps that may cause a rude
     * implementation of malloc_underlying_allocation() to crash.
     */
    if (!cheri_gettag(ptr)) {
        mrs_debug_printf("validate_freed_pointer: untagged capability addr %p\n",
            ptr);
        return (NULL);
    }
    void *underlying_allocation = REAL(malloc_underlying_allocation)(ptr);
    if (underlying_allocation == NULL) {
        mrs_debug_printf("validate_freed_pointer: not allocated by underlying allocator\n");
        return (NULL);
    }
    /*mrs_debug_printf("freed underlying allocation %#p\n", underlying_allocation);*/
    /*
     * Here we use the bitmap to synchronize and make sure that
     * our guarantee is upheld in multithreaded environments. We
     * paint the bitmap to signal to the kernel what needs to be
     * revoked, but we also gate the operation of bitmap painting,
     * so that we can only successfully paint the bitmap for some
     * freed allocation (and let that allocation pass onto the
     * quarantine list) if it is legitimately allocated on the
     * heap, not revoked, and not previously queued for
     * revocation, at the time of painting.
     *
     * Essentially at this point we don't want something to end up
     * on the quarantine list twice. If that were to happen, we
     * wouldn't be upholding the principle that prevents heap
     * aliasing.
    // [...]
In the version of jemalloc+mrs that we initially validated with, double frees did not corrupt heap-allocator state – but they also did not immediately trap. As part of this research, we have modified mrs to instead generate a program abort to make double-free issues easier to identify.
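The gating idea described in that comment can be illustrated with a toy model. This is a deliberately simplified sketch and not the mrs implementation: the names (toy_free, toy_revoke_done), the 16-byte granule, and the fact that only the first granule of an allocation is painted are all assumptions made for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_GRANULE   16u    /* bytes covered by one bitmap bit (assumed) */
#define TOY_HEAP_SIZE 4096u  /* size of the modelled heap (assumed) */

/* One bit per granule: set means "queued for revocation". */
static uint8_t toy_bitmap[TOY_HEAP_SIZE / TOY_GRANULE / 8];

/*
 * Model of free(): try to paint the bit for the allocation at heap
 * offset off. Only the first free succeeds and would move the
 * allocation into quarantine; a second free of the same offset finds
 * the bit already painted and is rejected (or, in a stricter
 * configuration, could abort the program).
 */
static bool
toy_free(size_t off)
{
    assert(off < TOY_HEAP_SIZE);

    size_t bit = off / TOY_GRANULE;
    uint8_t mask = (uint8_t)(1u << (bit % 8));

    if (toy_bitmap[bit / 8] & mask)
        return false;               /* already queued: double free */
    toy_bitmap[bit / 8] |= mask;
    return true;                    /* first free: goes to quarantine */
}

/* Model of a finished revocation pass: quarantine drained, bits cleared. */
static void
toy_revoke_done(void)
{
    for (size_t i = 0; i < sizeof(toy_bitmap); i++)
        toy_bitmap[i] = 0;
}
```

In this model, the second toy_free() of the same offset returns false, so the allocation can never sit on the quarantine list twice, and the memory only becomes freeable again after toy_revoke_done().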
Spatial safety bugs
We reproduced all the CPVs with their provided triggers to verify that they are mitigated by CHERI.
Heap buffer overflow detection and mitigation in CHERI
Feeding Nginx the blob in .internal_only/cpv1/blobs/, which contains a From field with an email that starts with two dots (..) in a row, causes a buffer underrun, which we can observe from the SIGPROT bounds violation in the coredump.
Program terminated with signal SIGPROT, CHERI protection violation.
Capability bounds fault.
#0 0x00000000001b3cb0 in ?? ()
gef> disas /m 0xb3cb0
Dump of assembler code for function ngx_http_process_from:
// [...]
4093 if (*u == '.') { // CHERI crash CPV1 (bounds fault)
0x00000000000b3cb0 <+808>: ldrb w12, [c3, #-1]!
0x00000000000b3cb4 <+812>: mov w13, #0x2 // #2
0x00000000000b3cb8 <+816>: cmp w12, #0x2e
0x00000000000b3cbc <+820>: b.ne 0xb3cb0 <ngx_http_process_from+808> // b.any
0x00000000000b3cc0 <+824>: b 0xb3d4c <ngx_http_process_from+964>
0x00000000000b3cc4 <+828>: mov w13, wzr
0x00000000000b3cc8 <+832>: mov x0, xzr
0x00000000000b3ccc <+836>: sub c0, c0, #0x5
0x00000000000b3cd0 <+840>: cmp x0, #0x0
0x00000000000b3cd4 <+844>: cset w14, eq // eq = none
We see that at ldrb w12, [c3, #-1]!, the c3 capability register is offset by -1 and is then being dereferenced, but 0x41493000 - 1 is not in bounds of [0x41493000-0x41495000], which explains the SIGPROT bounds violation.
gef> p $c3
$1 = () 0x41493000 [rwRW,0x41493000-0x41495000]
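The check that fails here can be written down as a small software model. This is only a sketch of the bounds rule (an access must lie entirely within [base, top)); the function name cap_in_bounds is invented for illustration and says nothing about how the hardware implements the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of a capability bounds check: an access of len
 * bytes starting at addr is allowed only if it lies entirely inside
 * the half-open range [base, top).
 */
static bool
cap_in_bounds(uint64_t base, uint64_t top, uint64_t addr, uint64_t len)
{
    assert(base <= top);

    /* Written to avoid overflow in addr + len. */
    return addr >= base && addr <= top && len <= top - addr;
}
```

With the values from the coredump, cap_in_bounds(0x41493000, 0x41495000, 0x41493000 - 1, 1) is false, which corresponds to the one-byte load just below the base of c3.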
I won’t write the details of debugging the other buffer overflow CPVs. For the sake of rigor, I tested and checked the coredumps for all the spatial safety CPVs:
| CPVs related to spatial safety | CHERI crash triggered? | Where?* |
|---|---|---|
| CPV1 | yes | src/http/ngx_http_request.c:4093 |
| CPV2 | yes | src/http/ngx_http_core_module.c:1994 |
| CPV3 | yes | src/http/ngx_http_request.c:4217 |
| CPV4 | yes | src/http/ngx_http_core_module.c:5295 |
| CPV8 | yes | src/mail/ngx_mail_pop3_handler.c:337 |
| CPV12 | no | N/A |
| CPV14 | yes | src/http/modules/ngx_http_rewrite_module.c:178 |
*Line numbers in my cherified AIxCC Nginx source repository.
Feeding the CPV12 vulnerable blob does not trigger a CHERI crash because the intended bug is in the ngx_sendfile_r function in src/os/unix/ngx_linux_sendfile_chain.c, and Nginx in CheriBSD does not reach that Linux-specific code path. I could probably add that vulnerability to src/os/unix/ngx_freebsd_sendfile_chain.c too… but CPV12 is about a hardcoded buffer size, so I think we can safely assume that the process would crash due to CHERI if we could trigger the vulnerability.
Apart from CPV12, all other spatial safety bugs are mitigated by CHERI.
CPV1: Extra bug?
While playing around with AIxCC, Nginx crashed due to a CHERI bounds fault. This was the seemingly inoffensive blob:
GET / HTTP/1.1
Host: localhost
Connection: close
From: test@test.com
It turns out, at src/http/ngx_http_request.c:4139, this input triggers a NULL byte overflow.
4138 if (state == sw_tld) {
0x00000000000b3d6c <+996>: cmp w13, #0x4
0x00000000000b3d70 <+1000>: b.ne 0xb3d80 <ngx_http_process_from+1016> // b.any
0x00000000000b3d74 <+1004>: mov x0, xzr
4139 *u = '\0'; // CHERI crash unintended CPV1 (bounds fault)
0x00000000000b3d78 <+1008>: strb wzr, [c3]
0x00000000000b3d7c <+1012>: b 0xb3da0 <ngx_http_process_from+1048>
u is allocated with size from->len. If the execution doesn’t reach the line return NGX_DECLINED and so the for-loop executes from->len times, then u will be incremented from->len times.
static ngx_int_t
ngx_http_validate_from(ngx_str_t *from, ngx_pool_t *pool, ngx_uint_t alloc)
{
    // [...]
    if (alloc) {
        u = ngx_palloc(pool, from->len); // allocated here
        if (u == NULL) {
            return NGX_ERROR;
        }
    } else {
        u = from->data;
    }
    for (i = 0; i < from->len; i++) {
        ch = f[i];
        switch (state) {
        case sw_begin:
            if (isalnum(ch) || ch == '-' || ch == '_') {
                state = sw_username;
            } else if (ch == '.') {
                state = sw_username_dot;
            } else {
                return NGX_DECLINED;
            }
            *u++ = ch;
            break;
        // [...]
Then, u will point to base(u) + from->len, which is one byte past the allocated buffer, so the following code causes a NULL byte poisoning when trying to terminate the string with a NULL byte.
    if (state == sw_tld) {
        *u = '\0'; // CHERI crash unintended CPV1 (bounds fault)
        if (alloc) {
            from->data = u;
        }
        return NGX_OK;
    } else {
        return NGX_DECLINED;
    }
It may not seem like a big deal, but history teaches us that NULL byte poisoning might be turned into powerful exploits [2].
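The off-by-one is easy to see in isolation. The following is a minimal standalone sketch, not the Nginx code and not an official patch: copy_terminated is a hypothetical helper that copies len bytes and then writes the terminator at index len, which is only in bounds because the buffer is allocated with len + 1 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: copy len bytes of src into a fresh buffer and
 * NUL-terminate it.
 *
 * The buffer is len + 1 bytes, so the terminator written at index len
 * stays inside the allocation. With an allocation of only len bytes,
 * the very same store would land one byte past the end.
 */
static char *
copy_terminated(const char *src, size_t len)
{
    assert(src != NULL);

    char *buf = malloc(len + 1);
    if (buf == NULL)
        return NULL;

    char *u = buf;
    for (size_t i = 0; i < len; i++)
        *u++ = src[i];      /* after the loop, u == buf + len */

    *u = '\0';              /* index len: valid only because of the + 1 */
    return buf;             /* caller frees with free() */
}
```

For the blob above, the From value test@test.com is 13 bytes long, so the terminator is written at index 13 of the buffer.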
CPV15: Sub-object bounds as intra-object overflow sanitization
I didn’t add CPV15 to the table of spatial safety bugs because the official AIxCC Nginx challenge release marked its sanitizer output as SEGV. However, I observed that CPV15 is actually an intra-object overflow, that is, reads/writes past an object field that overflows to other fields in the same object.
By feeding the trigger blob in CPV15 to AIxCC Nginx in Ubuntu, the Nginx worker process crashes due to segmentation fault. However, feeding the same blob to AIxCC Nginx (CheriABI, purecap) in CheriBSD does not trigger a crash. This led me to think that this may be related to the difference in structure sizes and alignment due to a larger pointer size on CHERI. So I increased the length of the uid input to trigger a crash successfully.
# cp15/request.txt
GET / HTTP/1.1
Host: localhost
Cookie: uid=YWFhYWFhYWFhYWFhYWFhYWJiYmJiYmJiYmJiYmJiYmJjY2NjY2NjY2NjY2NjY2Nj;;
Note that base64decode(YWFhYWFhYWFhYWFhYWFhYWJiYmJiYmJiYmJiYmJiYmJjY2NjY2NjY2NjY2NjY2Nj) is aaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbcccccccccccccccc. So I added more letters and base64-encoded it.
# cp15/request1.txt
GET / HTTP/1.1
Host: localhost
Cookie: uid=YWFhYWFhYWFhYWFhYWFhYWJiYmJiYmJiYmJiYmJiYmJjY2NjY2NjY2NjY2NjY2NjZGRkZGRkZGRkZGRkZGRkZGVlZWVlZWVlZWVlZWVlZWVmZmZmZmZmZmZmZmZmZmZm;;
From the coredump in Ubuntu, I could see that the segmentation fault was due to trying to dereference a corrupted pointer. From the coredump in CheriBSD using the long version of the trigger blob, the crash was due to a SIGPROT bounds violation inside the ngx_decode_base64_internal function. A closer look at it revealed that we are overflowing the uid_got field of ngx_http_userid_ctx_t. The shorter version of the blob can only overflow up to ngx_http_userid_ctx_t->cookie.len in a purecap binary because capabilities are stored 16-byte aligned, so there is padding between ngx_http_userid_ctx_t->cookie.len and ngx_http_userid_ctx_t->cookie.data.
gef> p $c4
$1 = () 0x415bcd6e [rwRW,0x415bcd20-0x415bcd70]
gef> p *(ngx_http_userid_ctx_t *)0x415bcd20
$2 = {
uid_got = {0x61616161, 0x61616161, 0x61616161, 0x61616161},
uid_set = {0x62626262, 0x62626262, 0x62626262, 0x62626262},
cookie = {
len = 0x6363636363636363,
data = 0x6464646464646464 [wx,0x64646464646464-0x64646464646464] (invalid,sealed) <error: Cannot access memory at address 0x6464646464646464>
},
reset = {tag = 0, address = 0x6565656565656565, permissions = {[ Global User0 User2 CompartmentID BranchUnseal Unseal StoreLocalCap Execute Store ] otype = 0x4aca, range = [0x65656565656565 - 0x65656565656565)}}
}
*Overflow from uid_got to other object fields in a ngx_http_userid_ctx_t object.
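The layout difference can be checked with offsetof. The structures below are hypothetical mirrors of the fields visible in the gdb dump (the toy_ names are mine, and the field types are assumptions based on that dump), so treat this as a sketch of the arithmetic rather than the Nginx definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of a length + pointer string, as in the dump. */
struct toy_str {
    size_t         len;
    unsigned char *data;
};

/* Hypothetical mirror of the fields shown in the gdb dump. */
struct toy_userid_ctx {
    uint32_t       uid_got[4];
    uint32_t       uid_set[4];
    struct toy_str cookie;
    uintptr_t      reset;
};

/*
 * Number of bytes an overflow starting at uid_got has to write before
 * it reaches the first byte of cookie.data.
 */
static size_t
bytes_to_reach_cookie_data(void)
{
    /* The overflow starts at the very beginning of the object. */
    assert(offsetof(struct toy_userid_ctx, uid_got) == 0);

    return offsetof(struct toy_userid_ctx, cookie) +
        offsetof(struct toy_str, data);
}
```

On a typical 64-bit ABI with 8-byte pointers this comes out at 40 bytes, so the 48 decoded bytes of the short blob overwrite the pointer. In a purecap build, where the pointer is a 16-byte aligned capability, I would expect 48, which matches the observation that the short blob stops just before cookie.data.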
At this point I realised I forgot to enable CHERI sub-object bounds, which would catch this intra-object overflow without relying on the fact that a pointer was corrupted. Oops. So I compiled again with -Xclang -cheri-bounds=subobject-safe and tested the short version of the trigger blob.
[New LWP 100215]
Core was generated by `nginx: worker process'.
Program terminated with signal SIGPROT, CHERI protection violation.
Capability bounds fault.
1329 *d++ = (u_char) (basis[s[1]] << 4 | basis[s[2]] >> 2);
0x0000000000072008 <+144>: ldrb w9, [c1, #2]
0x000000000007200c <+148>: ldrb w10, [c1, #1]
0x0000000000072010 <+152>: ldrb w9, [c2, x9]
0x0000000000072014 <+156>: ldrb w10, [c2, x10]
0x0000000000072018 <+160>: lsr w9, w9, #2
0x000000000007201c <+164>: orr w9, w9, w10, lsl #4
0x0000000000072020 <+168>: strb w9, [c4, #1] <==
gef> p $c4
$1 = () 0x415bcd2f [rwRW,0x415bcd20-0x415bcd30]
gef> p *(ngx_http_userid_ctx_t *)0x415bcd20
$2 = {
uid_got = {0x61616161, 0x61616161, 0x61616161, 0x61616161},
uid_set = {0x0, 0x0, 0x0, 0x0},
cookie = {
len = 0x40,
data = 0x4156e82b [rwRW,0x4156e800-0x4156ec00] "YWFhYWFhYWFhYWFhYWFhYWJiYmJiYmJiYmJiYmJiYmJjY2NjY2NjY2NjY2NjY2Nj;;"
},
reset = 0x0
}
*The victim ngx_http_userid_ctx_t object at the time of crash.
We can observe that with sub-object bounds, the process crashes right when the object field overflow happens.
By default, ASan does not catch intra-object overflows because inserting a redzone padding between object fields can be problematic (and that’s why the intra-object-overflow feature is still experimental?). Plus, the current experimental feature of intra-object-overflow only supports C++ applications (and Nginx is written in C). Therefore, not every fuzzing input that triggers this bug would trigger a sanitizer fault, and only an overflow that is long enough to corrupt critical data structures (in this case a pointer) will trigger a crash. The advantage of CHERI sub-object bounds is that this kind of overflow can be detected regardless of the ability of the bug to corrupt the program internal state.
In terms of exploitation, even if we don’t have sub-object bounds compiled in the binary, an attack that relies on corrupting the ngx_http_userid_ctx_t->cookie.data pointer would be infeasible due to the pointer integrity protection of CHERI capabilities. In informal words, overwriting a capability with data will clear the tag of the capability, making it invalid.
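As a rough mental model of that last sentence, here is a toy simulation of a tagged memory slot. It is only an illustration under my own assumptions: the names toy_slot_t, toy_store_cap, toy_store_data and toy_can_deref are invented, and real hardware keeps the tag out of band rather than in a struct field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One capability-sized memory slot plus its validity tag (toy model). */
typedef struct {
    uint8_t bytes[16];
    bool    tag;
} toy_slot_t;

/* Capability store: writes all 16 bytes and sets the tag. */
static void
toy_store_cap(toy_slot_t *slot, const uint8_t cap[16])
{
    memcpy(slot->bytes, cap, 16);
    slot->tag = true;
}

/* Plain data store: writing any byte of the slot clears the tag. */
static void
toy_store_data(toy_slot_t *slot, size_t off, const uint8_t *src, size_t len)
{
    assert(off <= 16 && len <= 16 - off);

    memcpy(slot->bytes + off, src, len);
    if (len > 0)
        slot->tag = false;
}

/* A dereference is permitted only while the tag is still set. */
static bool
toy_can_deref(const toy_slot_t *slot)
{
    return slot->tag;
}
```

In this model, overflowing attacker-controlled bytes (such as the 0x64 pattern from the dump) into the slot with toy_store_data makes toy_can_deref return false, no matter which bytes were written.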
Temporal safety bugs
CPV9: NULL pointer dereference or more?
The heap UAF in CPV9, as triggered by the officially released vulnerable blob, results in a crash but not because of CHERI SIGPROT. CPV9 crashes due to a NULL pointer dereference as a result of UAF, in both CheriBSD and Ubuntu. Is it exploitable? Can we craft HTTP requests that don’t result in a NULL dereference?
The bug lies in the deletion of blacklist entries in ngx_black_list_remove. A blacklist object consists of an IP pointer to a string object ngx_str_t, and pointers to the prev and next blacklist entries, linked in a doubly-linked list.
typedef struct ngx_black_list_s {
    ngx_str_t *IP;
    ngx_black_list_t *next;
    ngx_black_list_t *prev;
}ngx_black_list_t;
*The ngx_black_list_s structure.
In ngx_black_list_remove, the linked list is traversed until an entry that has a matching IP is found. Now, consider the scenario where the list is empty so reader is NULL: a NULL dereference would happen in the for-loop due to accessing the next field in reader = reader->next.
This is not all. Now, consider the scenario where remove_ip matches the head of the linked list. The condition of the first if-statement will be satisfied, and the entry will be cleaned up and freed in ngx_destroy_black_list_link. However, the next and prev fields of the deleted node are not cleared, and the head of the linked list is not updated. Therefore, a subsequent use of the blacklist will always start traversing from the head, which is a dangling pointer to a node whose IP pointer is NULL. The dangling pointer can be used at subsequent ngx_black_list_insert, ngx_black_list_remove and ngx_is_ip_banned calls, and in all these cases, the worker process will crash due to NULL pointer dereference if IP is NULL. I couldn’t find ways to exercise a write using this dangling pointer.
If we trigger a reallocation of the memory pointed to by this dangling pointer, we might be able to write a pointer to IP and avoid crashing the process. Note that the blacklist nodes are allocated via ngx_alloc instead of ngx_palloc, which means that they are allocated through the system allocator. This attack strategy would be a no-go for CHERI because the system allocator jemalloc+mrs offers use-after-reallocation protection. In simple terms, the dangling pointer is invalidated before the underlying allocation is allocated to another object, defeating this object aliasing strategy. I want to try to write a PoC exploit for this (in non-CHERI) but this will be for a future blog post (no promises).
ngx_int_t
ngx_black_list_remove(ngx_black_list_t **black_list, u_char remove_ip[])
{
    ngx_black_list_t *reader;
    reader = *black_list;
    if (reader && !ngx_strcmp(remove_ip, reader->IP->data)) { // CHERI crash CPV9 (due to NULL pointer dereference, a product of UAF)
        ngx_destroy_black_list_link(reader);
        return NGX_OK;
    }
    for (reader = reader->next; reader && reader->next; reader = reader->next) {
        if (!ngx_strcmp(remove_ip, reader->IP->data)) {
            ngx_double_link_remove(reader);
            ngx_destroy_black_list_link(reader);
            return NGX_OK;
        }
    }
    return NGX_ERROR;
}
*The ngx_black_list_remove function.
#define ngx_destroy_black_list_link(x) \
    ngx_memzero((x)->IP->data, (x)->IP->len); \
    ngx_free((x)->IP->data); \
    (x)->IP->data = NULL; \
    ngx_memzero((x)->IP, sizeof(ngx_str_t)); \
    ngx_free((x)->IP); \
    (x)->IP = NULL; \
    ngx_memzero((x), sizeof(ngx_black_list_t)); \
    ngx_free((x)); \
    (x) = NULL;
*The ngx_destroy_black_list_link macro that cleans up the node.
The takeaway of this bug is that further uses of the object referenced by the dangling pointer lead to a NULL pointer dereference. In the absence of CHERI, overlapping another object with the memory referenced by the dangling pointer (object aliasing) might have made this bug more useful, although it turns out it does not (see my other blog post for a detailed analysis). With mrs, that aliasing does not seem feasible in the first place.
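To make the missing step concrete, here is a minimal sketch of a removal routine that unlinks the node and updates the head before freeing. It is not the AIxCC patch: node_t, list_push and list_remove are hypothetical stand-ins for ngx_black_list_t and its helpers, and plain malloc/free replace ngx_alloc/ngx_free.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ngx_black_list_t; field names are illustrative. */
typedef struct node {
    char        *ip;
    struct node *next;
    struct node *prev;
} node_t;

/* Push a heap copy of `ip` at the front of the list; returns 0 on success. */
static int list_push(node_t **head, const char *ip)
{
    node_t *n = calloc(1, sizeof(*n));
    if (n == NULL) return -1;
    size_t len = strlen(ip) + 1;
    n->ip = malloc(len);
    if (n->ip == NULL) { free(n); return -1; }
    memcpy(n->ip, ip, len);
    n->next = *head;
    if (*head != NULL) (*head)->prev = n;
    *head = n;
    return 0;
}

/* Remove the first node matching `ip`. Returns 0 if removed, -1 if absent. */
static int list_remove(node_t **head, const char *ip)
{
    for (node_t *cur = *head; cur != NULL; cur = cur->next) {
        if (strcmp(cur->ip, ip) != 0) continue;
        /* Unlink first, so that no pointer in the list refers to `cur`. */
        if (cur->prev != NULL) cur->prev->next = cur->next;
        else *head = cur->next;   /* head update: the step missing in CPV9 */
        if (cur->next != NULL) cur->next->prev = cur->prev;
        free(cur->ip);
        free(cur);
        return 0;
    }
    return -1;
}
```

Because the head is passed by reference and rewritten whenever the first node is removed, the caller never keeps a pointer to a freed node, and removing the last remaining node leaves the list empty (head == NULL).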
CPV11: Host specifications disclosure due to heap UAF read
CPV11 does not crash the Nginx process and it prints the host specifications even without remote admin privileges because the UAF buffer contains the host specifications. The object cycle->host_specs is allocated in ngx_init_cycle, and its fields host_cpu, host_mem and host_os are initialised immediately after:
// [...]
cycle->host_specs->host_cpu = ngx_alloc(sizeof(ngx_str_t), log);
if (cycle->host_specs->host_cpu == NULL) {
ngx_destroy_pool(pool);
return NULL;
}
cycle->host_specs->host_cpu->data = (u_char*)"Unknown CPU\n";
ngx_memzero(line, NGX_MAX_HOST_SPECS_LINE);
fp = fopen("/proc/cpuinfo", "r");
if (fp != NULL) {
temp_char = NULL;
while (fgets(line, sizeof(line), fp) != NULL) {
if (ngx_strncmp(line, "model name", 10) == 0) {
temp_char = strchr(line, ':');
if (temp_char != NULL) {
temp_char += 2;
cycle->host_specs->host_cpu->data = ngx_alloc(sizeof(line), log);
if (cycle->host_specs->host_cpu->data == NULL) {
break;
}
ngx_memzero(cycle->host_specs->host_cpu->data, sizeof(line));
cycle->host_specs->host_cpu->len = \
ngx_sprintf(cycle->host_specs->host_cpu->data, "%s", temp_char) - \
cycle->host_specs->host_cpu->data;
break;
}
}
}
fclose(fp);
}
// [...]
The issue is that, immediately afterwards, the code checks whether remote_admin is configured and, if not, frees cycle->host_specs. What about all the memory allocations pointed to by this object? They are not freed, and the cycle->host_specs pointer itself is not cleared.
ccf = (ngx_core_conf_t *) ngx_get_conf(cycle->conf_ctx, ngx_core_module);
if (!ccf->remote_admin) {
ngx_free(cycle->host_specs);
}
A quick grep shows that the object cycle->host_specs is used in ngx_http_get_host_specs (the only other use is in ngx_master_process_exit, which tears down cycle->host_specs). The dangling pointer can be dereferenced in ngx_http_get_host_specs to print host specifications even if remote_admin is not enabled. This is possible even with mrs revocation because the dangling pointer is not yet revoked while the allocation sits in the quarantine (which is most likely the case).
static ngx_int_t ngx_http_get_host_specs(ngx_http_request_t *r,
ngx_http_variable_value_t *v, uintptr_t data)
{
u_char *temp;
v->data = ngx_pnalloc(r->pool, NGX_MAX_HOST_SPECS_LINE * 3);
if (v->data == NULL) {
return NGX_HTTP_INTERNAL_SERVER_ERROR;
}
ngx_memzero(v->data, NGX_MAX_HOST_SPECS_LINE * 3);
temp = v->data;
v->data = ngx_sprintf(v->data, "%s", r->cycle->host_specs->host_cpu->data); // NO CHERI crash CPV11 (UAF)
v->data = ngx_sprintf(v->data, "%s", r->cycle->host_specs->host_mem->data);
v->data = ngx_sprintf(v->data, "%s", r->cycle->host_specs->host_os->data);
v->len = v->data - temp;
v->data = temp;
return NGX_OK;
}
If we disable mrs, maybe we can use this as a read primitive. I am not sure how useful this is though because, among other things, leaked pointer values cannot be reinjected on CHERI due to tagging.
Another subtle issue that doesn’t concern Linux distros in general but affects FreeBSD/CheriBSD is setting the data field of an ngx_str_t to a string literal, in: cycle->host_specs->host_cpu->data = (u_char*)"Unknown CPU\n". Later, in ngx_master_process_exit, that string literal is passed to ngx_free. This is generally not an issue in Linux distros because the data field is replaced with a dynamic allocation from ngx_alloc if /proc is mounted.
Anyway, this seems useless for the attacker because ngx_master_process_exit calls exit(0) in the end.
static void
ngx_master_process_exit(ngx_cycle_t *cycle)
{
ngx_uint_t i;
if (cycle->host_specs) {
if (cycle->host_specs->host_cpu) {
ngx_free(cycle->host_specs->host_cpu->data);
cycle->host_specs->host_cpu->data = NULL;
ngx_free(cycle->host_specs->host_cpu);
cycle->host_specs->host_cpu = NULL;
}
if (cycle->host_specs->host_mem) {
ngx_free(cycle->host_specs->host_mem->data);
cycle->host_specs->host_mem->data = NULL;
ngx_free(cycle->host_specs->host_mem);
cycle->host_specs->host_mem = NULL;
}
if (cycle->host_specs->host_os) {
ngx_free(cycle->host_specs->host_os->data);
cycle->host_specs->host_os->data = NULL;
ngx_free(cycle->host_specs->host_os);
cycle->host_specs->host_os = NULL;
}
ngx_free(cycle->host_specs);
cycle->host_specs = NULL;
}
// [...]
The takeaway of this bug is that the dangling pointer can be dereferenced in ngx_http_get_host_specs to print host specifications even if remote_admin is not enabled. I couldn’t find any other uses of this dangling pointer before it gets discarded in ngx_master_process_exit, which terminates the process. In practice, the disclosure is also feasible with CHERI and mrs revocation.
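Both issues in this section (the container freed without its children or a cleared pointer, and the string literal later handed to free) come down to ownership. The sketch below is one way to make ownership explicit; it is an assumption-laden illustration, not nginx code. str_t and host_specs_t are simplified stand-ins for ngx_str_t and the host_specs object, and every helper name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplifications of ngx_str_t and the host_specs object. */
typedef struct { size_t len; char *data; } str_t;
typedef struct { str_t *host_cpu; str_t *host_mem; str_t *host_os; } host_specs_t;

/* Allocate a str_t holding a heap copy of `text`, so free() is always valid,
 * even for defaults such as "Unknown CPU\n". */
static str_t *str_new(const char *text)
{
    str_t *s = malloc(sizeof(*s));
    if (s == NULL) return NULL;
    s->len = strlen(text);
    s->data = malloc(s->len + 1);
    if (s->data == NULL) { free(s); return NULL; }
    memcpy(s->data, text, s->len + 1);
    return s;
}

/* Free one string and clear the owner's pointer; NULL is accepted. */
static void str_free(str_t **s)
{
    if (*s == NULL) return;
    free((*s)->data);
    free(*s);
    *s = NULL;
}

/* Free the children, then the container, then clear the owner's pointer. */
static void host_specs_destroy(host_specs_t **specs)
{
    if (*specs == NULL) return;
    str_free(&(*specs)->host_cpu);
    str_free(&(*specs)->host_mem);
    str_free(&(*specs)->host_os);
    free(*specs);
    *specs = NULL;
}

/* Reader side: return NULL instead of touching torn-down specs. */
static const char *host_cpu_or_null(const host_specs_t *specs)
{
    if (specs == NULL || specs->host_cpu == NULL) return NULL;
    return specs->host_cpu->data;
}
```

With this shape, the remote_admin check would call host_specs_destroy(&cycle->host_specs) rather than a bare free, and a reader such as ngx_http_get_host_specs would see a NULL pointer instead of quarantined memory. The teardown is also safe to call twice.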
CPV17: UAF that leads to a double free?
Triggering the heap UAF in CPV17 logs an application error because the UAF object, s->connection, has its write event object passed to ngx_mail_send in ngx_mail_session_internal_server_error, and the fd corresponding to s->connection->write is closed at the first free (ngx_mail_close_connection), therefore causing a send() failed (9: Bad file descriptor) error. Triggering the bug does not crash the worker process in the version of CheriBSD I used in this work (24.05).
2025/02/04 00:06:22 [alert] 21598#0: *2 send() failed (9: Bad file descriptor) while in auth state, client: 127.0.0.1, server: 0.0.0.0:8080
2025/02/04 00:06:22 [alert] 21598#0: *2 connection already closed while in auth state, client: 127.0.0.1, server: 0.0.0.0:8080
*Nginx log snippet after triggering CPV17 using the officially released trigger blob.
According to the official CPV information,
This function attempts to access the freed connection structure, which leads to a crash via a UAF.
However, it doesn’t trigger a crash in CheriBSD. Hmmm.
What is the implication of this UAF, then? ngx_mail_send will call ngx_mail_close_connection a second time: the fd has already been closed, so c->send fails and returns NGX_ERROR.
void
ngx_mail_send(ngx_event_t *wev)
{
ngx_int_t n;
ngx_connection_t *c;
ngx_mail_session_t *s;
ngx_mail_core_srv_conf_t *cscf;
c = wev->data;
s = c->data;
if (wev->timedout) {
ngx_log_error(NGX_LOG_INFO, c->log, NGX_ETIMEDOUT, "client timed out");
c->timedout = 1;
ngx_mail_close_connection(c);
return;
}
if (s->out.len == 0) {
if (ngx_handle_write_event(c->write, 0) != NGX_OK) {
ngx_mail_close_connection(c);
}
return;
}
n = c->send(c, s->out.data, s->out.len);
// [...]
if (n == NGX_ERROR) {
ngx_mail_close_connection(c);
return;
}
// [...]
Calling ngx_mail_close_connection twice on the same connection object means calling ngx_close_connection and ngx_destroy_pool twice. The second ngx_close_connection call is not useful because it checks whether fd is -1 and returns early (this is the "connection already closed" line in the log above). Could calling ngx_destroy_pool twice on the same pool object corrupt the internal state of the memory allocator? In ngx_destroy_pool, the registered cleanup handlers are called, large allocations associated with the pool are freed, and the pool blocks are freed using ngx_free. With mrs, freeing them a second time is a no-op. So we are left with the registered cleanup handlers, which will be called twice… but they seem useless at a quick glance.
void
ngx_mail_close_connection(ngx_connection_t *c)
{
ngx_pool_t *pool;
ngx_log_debug1(NGX_LOG_DEBUG_MAIL, c->log, 0,
"close mail connection: %d", c->fd);
#if (NGX_MAIL_SSL)
if (c->ssl) {
if (ngx_ssl_shutdown(c) == NGX_AGAIN) {
c->ssl->handler = ngx_mail_close_connection;
return;
}
}
#endif
#if (NGX_STAT_STUB)
(void) ngx_atomic_fetch_add(ngx_stat_active, -1);
#endif
c->destroyed = 1;
pool = c->pool;
ngx_close_connection(c);
ngx_destroy_pool(pool);
}
The takeaway of this bug is that calling ngx_destroy_pool twice can lead to a double free of pool blocks and large allocations. Whether that can be used to build an exploit is not completely clear, but the double free is converted to a no-op by mrs – or, in the 25.03 release of CheriBSD, an abort().
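For comparison, a close routine can be made idempotent so that a second call never reaches the pool teardown. The following is a minimal sketch under a strong assumption: the connection object outlives its pool (it is not allocated from it). pool_t, conn_t, pool_destroy and conn_close are hypothetical miniatures, not nginx's types or functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature of a pool and a connection; not nginx's real types.
 * Assumption: conn_t is NOT allocated from its own pool. */
typedef struct { char unused; } pool_t;
typedef struct { int fd; pool_t *pool; } conn_t;

static int g_pool_destroys = 0;   /* counts pool teardowns for the demo */

static void pool_destroy(pool_t *p)
{
    g_pool_destroys++;
    free(p);
}

/* Close at most once: the fd sentinel and the cleared pool pointer
 * turn any repeated call into a no-op. */
static void conn_close(conn_t *c)
{
    if (c->fd == -1) return;      /* already closed, do not touch the pool */
    c->fd = -1;
    if (c->pool != NULL) {
        pool_t *p = c->pool;
        c->pool = NULL;           /* clear the pointer before destroying */
        pool_destroy(p);
    }
}
```

The design choice is to check the sentinel before any teardown work, rather than only inside the fd-closing helper, so that the pool pointer is read exactly once.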
A little conclusion
With caprevoke on, heap allocations are quarantined by mrs before (eventually) performing sweeping revocation of the capabilities. Dangling pointers to freed allocations that are still in quarantine remain valid capabilities, so dereferencing them is allowed in CHERI with caprevoke. Dangling pointers are only rendered invalid by the sweeping revocation, which completes before the memory can be reallocated.
Basically, we can infer that CHERI did not deterministically trigger faults for heap UAFs. The guarantee provided by CHERI heap temporal safety is that use-after-reallocation is prevented – i.e., attackers cannot take advantage of memory aliasing to create, for example, type confusion – but any UAF triggered before its reallocation would not fault. For example, in these specific CPV triggers, the dangling pointer is used before the memory is reallocated, therefore such inputs won’t trigger faults. That means that CHERI alone is unreliable as a sanitizer for temporal safety bugs.
But thinking of CHERI as a mitigation rather than a sanitizer, can we construct an exploit that doesn’t use the dangling pointer to overlap with another object? I think this depends on the codebase. My claim is that, given the specific UAFs we have, we can’t achieve anything but crashes if we are not able to use the dangling pointer after the underlying memory is reallocated. (I explored how to exploit these three CPVs in another blog post; the same approach would not be possible on CHERI for the reason stated above. Please correct me if I am missing a point in my code analysis.)
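The lifecycle described in this conclusion can be summarised as a three-state toy model. This is only a model of the behaviour discussed above: the states and function names are invented for illustration and are not the mrs or caprevoke API.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these states and functions are NOT the mrs/caprevoke API. */
typedef enum { LIVE, QUARANTINED, REVOKED } chunk_state_t;

typedef struct {
    chunk_state_t state;
    bool          reusable;   /* may the allocator hand this memory out again? */
} chunk_t;

/* free(): the chunk enters quarantine; existing capabilities stay valid. */
static void model_free(chunk_t *c)
{
    if (c->state == LIVE) c->state = QUARANTINED;
    /* a repeated free of a non-live chunk is ignored in this model */
}

/* Revocation sweep: dangling capabilities are invalidated, and only
 * then does the memory become eligible for reuse. */
static void model_sweep(chunk_t *c)
{
    if (c->state == QUARANTINED) {
        c->state = REVOKED;
        c->reusable = true;
    }
}

/* Would a dereference through the old (dangling) pointer fault? */
static bool model_deref_faults(const chunk_t *c)
{
    return c->state == REVOKED;
}
```

In this model the CPV triggers all dereference the pointer in the QUARANTINED state, where model_deref_faults is false, which is why they do not fault. Aliasing would need the chunk to be reusable, and that only becomes true in the same step that makes the old pointer fault.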
CPV10: Extra bug?
I mentioned before that the double free in CPV10 is mitigated by jemalloc+mrs in CheriBSD because mrs implements pointer validation. However, in the existing CheriBSD 24.05 configuration (in which a double free is effectively disregarded rather than aborting the process), execution continues past the second free(), unlike with ASan, so later code still runs, with memory-safety enforcement in place.
static ngx_int_t
ngx_http_process_prefer(ngx_http_request_t *r, ngx_table_elt_t *h,
ngx_uint_t offset)
{
ngx_table_elt_t *p;
if (r->headers_in.prefer) {
ngx_log_error(NGX_LOG_INFO, r->connection->log, 0,
"client sent duplicate host header: \"%V: %V\", "
"previous value: \"%V: %V\"",
&h->key, &h->value, &r->headers_in.prefer->key,
&r->headers_in.prefer->value);
ngx_free(r->headers_in.prefer); // NO CHERI crash in CPV10 (double free)
return NGX_OK;
}
// [...]
As a result, I found another bug that is likely introduced when developing the Prefer feature in CPV10, which causes a bounds violation fault in CHERI.
[New LWP 100214]
Core was generated by `nginx: worker process'.
Program terminated with signal SIGPROT, CHERI protection violation.
Capability bounds fault.
625 /* the end of HTTP header */
626 *b->last++ = CR; *b->last++ = LF; // or here, CHERI crash unintended CPV10 (bounds fault)
0x00000000000cff38 <+2724>: ldr c0, [c2, #16]
0x00000000000cff3c <+2728>: add c1, c0, #0x1
0x00000000000cff40 <+2732>: str c1, [c2, #16]
0x00000000000cff44 <+2736>: strb w8, [c0]
0x00000000000cff48 <+2740>: ldr c0, [c2, #16]
0x00000000000cff4c <+2744>: mov w8, #0xa // #10
0x00000000000cff50 <+2748>: add c1, c0, #0x1
0x00000000000cff54 <+2752>: str c1, [c2, #16]
0x00000000000cff58 <+2756>: strb w8, [c0] <==
gef> p $c0
$1 = () 0x415b8f9a [rwRW,0x415b8ea0-0x415b8f9a]
In ngx_http_header_filter, the developers of AIxCC added a new header.
if (r->headers_in.prefer) { // XXXR3: unintended bug that can be triggered by CPV10 testcase
b->last = ngx_cpymem(b->last, "Prefer: ",
sizeof("Prefer: ") - 1);
b->last = ngx_cpymem(b->last, r->headers_in.prefer->value.data,
r->headers_in.prefer->value.len);
*b->last++ = CR; *b->last++ = LF;
}
This buffer b was allocated with b = ngx_create_temp_buf(r->pool, len);, where len is the computed size of the headers. We can see that, for example, the number of bytes consumed by the Last-Modified header is added to len. However, the size of the Prefer header is not accounted for when computing len.
// [...]
if (r->headers_out.last_modified == NULL
&& r->headers_out.last_modified_time != -1)
{
len += sizeof("Last-Modified: Mon, 28 Sep 1970 06:00:00 GMT" CRLF) - 1;
}
// [...]
if (r->headers_out.last_modified == NULL
&& r->headers_out.last_modified_time != -1)
{
b->last = ngx_cpymem(b->last, "Last-Modified: ",
sizeof("Last-Modified: ") - 1);
b->last = ngx_http_time(b->last, r->headers_out.last_modified_time);
*b->last++ = CR; *b->last++ = LF;
}
// [...]
I observed that the following request does not trigger an overflow:
GET / HTTP/1.1
Host: localhost
Connection: close
Prefer: 1234567
But if I add one more byte, it crashes due to a bounds violation:
GET / HTTP/1.1
Host: localhost
Connection: close
Prefer: 12345678
Regardless, in my opinion, this heap buffer overflow caused by a wrong buffer size is likely a mistake made when developing the Prefer feature.
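The pattern behind this bug is a two-pass serializer: one pass sums the bytes each header needs, a second pass writes them, and the two must agree. Below is a minimal sketch of keeping them in sync by sharing one length function and checking the remaining space before writing. header_len and header_write are hypothetical helpers, not part of nginx.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CRLF "\r\n"

/* Pass 1: bytes needed for an optional "Name: value\r\n" header line. */
static size_t header_len(const char *name, const char *value)
{
    if (value == NULL) return 0;
    return strlen(name) + (sizeof(": ") - 1) + strlen(value) + (sizeof(CRLF) - 1);
}

/* Pass 2: append the header at `p` without writing past `end`.
 * Returns the new write position, or NULL if the line does not fit. */
static char *header_write(char *p, const char *end,
                          const char *name, const char *value)
{
    if (value == NULL) return p;
    if ((size_t)(end - p) < header_len(name, value)) return NULL;
    size_t n = strlen(name), v = strlen(value);
    memcpy(p, name, n);   p += n;
    memcpy(p, ": ", 2);   p += 2;
    memcpy(p, value, v);  p += v;
    memcpy(p, CRLF, 2);   p += 2;
    return p;
}
```

In the Prefer case, the sizing pass would need an extra len += header_len("Prefer", value) next to the other headers. The explicit bounds check in the writer is the software analogue of what the CHERI capability enforced in hardware here.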
Extra notes
You may have noticed that the CPV descriptions comment on the impact of the bug, and many CPV descriptions write something like the following:
This vulnerability causes NGINX to crash thus denying service to its clients. The intentional crash of a service is called “Denial of Service” or DoS.
Indeed, DoS is a security issue. And, well, CHERI does not mitigate attacks that aim to cause DoS: its “fail-stop” behavior turns a memory-safety violation into a crash by design.
Acknowledgements
I would like to thank Professor Robert Watson for reviewing this write-up and providing many insightful suggestions.
References
[1] Paul Liétar, Theodore Butler, Sylvan Clebsch, Sophia Drossopoulou, Juliana Franco, Matthew J. Parkinson, Alex Shamis, Christoph M. Wintersteiger, and David Chisnall. 2019. Snmalloc: a message passing allocator. In Proceedings of the 2019 ACM SIGPLAN International Symposium on Memory Management (ISMM 2019). Association for Computing Machinery, New York, NY, USA, 122–135. https://doi.org/10.1145/3315573.3329980
[2] Project Zero. 2014. The poisoned NUL byte, 2014 edition. Retrieved from https://googleprojectzero.blogspot.com/2014/05/the-poisoned-nul-byte-2014-edition.html
[3] N. Wesley Filardo et al. 2020. Cornucopia: Temporal Safety for CHERI Heaps. In 2020 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 608–625. https://doi.org/10.1109/SP40000.2020.00098
[4] Nathaniel Wesley Filardo, Brett F. Gutstein, Jonathan Woodruff, Jessica Clarke, Peter Rugg, Brooks Davis, Mark Johnston, Robert Norton, David Chisnall, Simon W. Moore, Peter G. Neumann, and Robert N. M. Watson. 2024. Cornucopia Reloaded: Load Barriers for CHERI Heap Temporal Safety. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (ASPLOS ‘24), Vol. 2. Association for Computing Machinery, New York, NY, USA, 251–268. https://doi.org/10.1145/3620665.3640416
[5] Brooks Davis, Robert N. M. Watson, Alexander Richardson, Peter G. Neumann, Simon W. Moore, John Baldwin, David Chisnall, Jessica Clarke, Nathaniel Wesley Filardo, Khilan Gudka, Alexandre Joannou, Ben Laurie, A. Theodore Markettos, J. Edward Maste, Alfredo Mazzinghi, Edward Tomasz Napierala, Robert M. Norton, Michael Roe, Peter Sewell, Stacey Son, and Jonathan Woodruff. 2019. CheriABI: Enforcing Valid Pointer Provenance and Minimizing Pointer Privilege in the POSIX C Run-time Environment. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ‘19). Association for Computing Machinery, New York, NY, USA, 379–393. https://doi.org/10.1145/3297858.3304042
[6] Richard Grisenthwaite, Graeme Barnes, Robert N. M. Watson, Simon W. Moore, Peter Sewell, and Jonathan Woodruff. 2023. The Arm Morello Evaluation Platform—Validating CHERI-Based Security in a High-Performance System. IEEE Micro 43, 3 (May-June 2023), 50–57. https://doi.org/10.1109/MM.2023.3264676