In lock-free atomic C/C++ programming, can memory reordering affect the intended result?
10:48 30 May 2026

This title is a bit obscure, but please forgive me for not being able to think of a better way to describe it at the moment.

To state my question clearly, I need to go into some detail. If you'd like to save time, you can skip directly to Cases 3 and 4. However, I would be extremely grateful if you were willing to read through the whole question!

Background

I'm not a complete novice in lock-free programming; on the contrary, I'm quite familiar with concepts like memory order, memory barriers, and global visibility. Because of this, I'd like to use a lighter memory order in lock-free programming to improve performance, such as replacing `memory_order_seq_cst` with `acquire/release`, or even `relaxed`. However, sometimes I'm unsure if this will lead to unexpected results.

I will use several simplified, concrete cases to illustrate my question. None of them frees memory dynamically, so use-after-free (UAF) does not need to be considered.

Case1

Imagine an initialized singly linked list with 100 nodes. Multiple threads concurrently attempt to traverse the list, while another thread attempts to corrupt node 49.

#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <unistd.h>

struct node {
    struct node *next;
} nodes[100];

_Atomic(struct node *) head = &nodes[0];

void __attribute__((constructor)) init(void)
{
    for (size_t i = 0; i < 99; ++i)
        nodes[i].next = &nodes[i + 1];
}

void *reader(void *arg)
{
    struct node *cur = atomic_load_explicit(&head, memory_order_relaxed);
    while (cur) {
        struct node *next = cur->next;

        if (atomic_compare_exchange_weak_explicit(&head, &cur, next, memory_order_release, memory_order_relaxed))
            cur = next;
    }
    return NULL;
}

void *writer(void *arg)
{
    while (atomic_load_explicit(&head, memory_order_acquire) != &nodes[50])
        ;
    nodes[49].next = (void *)-1; // invalid pointer but not null
    return NULL;
}

int main(void)
{
    pthread_t th;

    pthread_create(&th, NULL, writer, NULL);
    for (size_t i = 0; i < xxx; ++i) // xxx > 1
        pthread_create(&th, NULL, reader, NULL);

    while (1)
        pause();
    return 0;
}

Can the linked list be traversed successfully (that is, every node is visited and no thread accesses invalid memory)? I believe the answer is yes: although many reader threads may read `nodes[49]`, only one of them will successfully change `head` from `&nodes[49]` to `&nodes[50]`. The release ordering of that compare-exchange pairs with the acquire load in the writer thread, ensuring that by the time the writer writes to `nodes[49]`, the reader threads mentioned above will no longer read `nodes[49]`.

Now, what if we change every memory order in this case to relaxed? The writer's write and the readers' reads can then indeed occur concurrently, but I think the list can still be traversed successfully. The writer modifies `nodes[49]` only if it sees `head == &nodes[50]`, which means that if `nodes[49]` is modified, then `head == &nodes[50]` has already happened, even if the modification of `nodes[49]` may become globally visible earlier than `head = &nodes[50]`.

Case2

This case is very similar to Case 1, but each thread modifies the nodes it has traversed:

struct node {
    struct node *next;
} nodes[100];

_Atomic(struct node *) head = &nodes[0];

void __attribute__((constructor)) init(void)
{
    for (size_t i = 0; i < 99; ++i)
        nodes[i].next = &nodes[i + 1];
}

void *thread(void *arg)
{
    struct node *cur = atomic_load_explicit(&head, memory_order_relaxed);
    while (cur) {
        struct node *next = cur->next;

        if (atomic_compare_exchange_weak_explicit(&head, &cur, next, memory_order_relaxed, memory_order_relaxed)) {
            cur->next = (void *)-1; // invalid pointer but not null
            cur = next;
        }
    }
    return NULL;
}

int main(void)
{
    pthread_t th;

    for (size_t i = 0; i < xxx; ++i) // xxx > 1
        pthread_create(&th, NULL, thread, NULL);

    while (1)
        pause();
    return 0;
}

Again: can the linked list be traversed successfully? I believe the answer is yes, even though every memory order is relaxed. This is essentially the same as Case 1: the thread that modifies a node is the same thread that successfully read that node and changed `head` to the node's `next`, and a single thread does not observe its own memory reordering, so there is no concurrency.

Case3

If you agree with the conclusion I gave in Case 1/2 above, then we can discuss Case 3! It's very simple, code:

#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

int x = 1;
atomic_int y = 0;

void thread1(void)
{
    int tmp = 0, tmp_x = x;
    bool success = atomic_compare_exchange_strong_explicit(&y, &tmp, tmp_x, memory_order_relaxed, memory_order_relaxed);

    assert(!success || tmp_x == 1);
}

void thread2(void)
{
    int tmp = 0;

    atomic_compare_exchange_strong_explicit(&y, &tmp, 1, memory_order_relaxed, memory_order_relaxed);
    x = 2;
}

All memory orders are relaxed, so can `assert()` fail? Of course! The compiler may even reorder `x = 2` before the cmpxchg, so the following can happen:

1. thread2 writes `x = 2`

2. thread1 reads `tmp_x = 2`

3. thread1 successfully executes cmpxchg, changing `y` to 2.

4. `assert()` fails

However, if we make a small modification:

Case4

int x = 1;
atomic_int y = 0;

void thread1(void)
{
    int tmp = 0, tmp_x = x;
    bool success = atomic_compare_exchange_strong_explicit(&y, &tmp, tmp_x, memory_order_relaxed, memory_order_relaxed);

    assert(!success || tmp_x == 1);
}

void thread2(void)
{
    int tmp = 0;

    if (atomic_compare_exchange_strong_explicit(&y, &tmp, 1, memory_order_relaxed, memory_order_relaxed) || tmp == 1)
        x = 2;
    else
        abort();
}

Will `abort()` or `assert()` be triggered?

Let's first analyze whether `abort()` will be triggered: If it is, it means that `cmpxchg` in thread2 failed to modify `y`, indicating that `y` was definitely modified by thread1, and the modified value was 2. However, since the `else` branch is entered, `x = 2` will not occur, so thread1 cannot see `x == 2`, and therefore `y` cannot be changed to 2, which is a contradiction.

Next, let's analyze whether `assert()` will be triggered: If `assert()` is triggered, it means that thread1 successfully changed `y` to a value other than 1. If this is the case, then the `if` condition in thread2 will fail, the modification `x = 2` will not occur, and thread1 cannot read a value other than 1 for `x`, which is also a contradiction.

However, Cases 3 and 4 appear to be logically equivalent: if the `if` condition in thread2 is always true, then the `if` statement can be removed, which gives exactly the code in Case 3.

c++ c atomic arm64 hardware-acceleration