Why can't the wake-up signal be lost when using an RMW operation, whereas it can be lost with a pure load?
23:10 17 May 2026

Consider this example:

#include <atomic>
#include <thread>
extern void block_wait();
extern void wake();

int main(){
  std::atomic<int> counter = 0;
  std::jthread t1([&](){
    if(counter.fetch_sub(1,std::memory_order::relaxed) == 0){  // #0
      block_wait(); // #1
    }
  });
  std::jthread t2([&](){
    if(counter.fetch_add(1,std::memory_order::relaxed) == -1){ // #2
      wake(); // #3
    }
  });
}

In this example, block_wait and wake do not introduce data races, and they do what their names suggest: block_wait blocks the calling thread until it receives a wake signal, and wake unblocks the thread that is blocked.

If #0 reads 0, then #2 is guaranteed to read -1 and to execute wake(), which wakes the thread blocked at #1. However, if we change #2 to a pure load as follows:

#include <atomic>
#include <thread>
extern void block_wait();
extern void wake();

int main(){
  std::atomic<int> counter = 0;
  std::jthread t1([&](){
    if(counter.fetch_sub(1,std::memory_order::relaxed) == 0){  // #0
      block_wait(); // #1
    }
  });
  std::jthread t2([&](){
    if(counter.load(std::memory_order::relaxed) == -1){ // #2
      wake(); // #3
    }
  });
}

Under the same condition (#0 reads 0 and #1 blocks the thread), #2 can now read 0 and therefore never executes #3, so the wake-up is lost.
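The allowed outcomes of the pure-load version can be enumerated with a small toy model (a sketch, not a memory-model checker; the names LoadOutcome, load_version_outcomes and has_lost_wakeup are hypothetical). The only modification of counter after its initialization is #0's write of -1, so the modification order is 0, -1. Nothing in the example orders the relaxed load #2 relative to #0 by happens-before, so the coherence rules allow it to read either value. The lost wake-up is in fact reachable even in a simple interleaving in which t2's load runs before t1's fetch_sub.

```cpp
#include <vector>

// Toy model of the pure-load version (illustrative only).
struct LoadOutcome {
  int fetch_sub_read;  // value read by #0 in t1
  int load_read;       // value read by #2 in t2
};

std::vector<LoadOutcome> load_version_outcomes() {
  // #0 is the only modification after initialization: modification order is 0, -1.
  const int modification_order[] = {0, -1};
  std::vector<LoadOutcome> outcomes;
  for (int v : modification_order) {
    // #0 is the only RMW, so it reads the initial value 0 in every execution.
    // #2 is not ordered with #0 by happens-before, so it may read either value.
    outcomes.push_back({0, v});
  }
  return outcomes;
}

// True if some outcome has #0 reading 0 (t1 blocks) while #2 misses -1 (no wake).
bool has_lost_wakeup(const std::vector<LoadOutcome>& outcomes) {
  for (const LoadOutcome& o : outcomes)
    if (o.fetch_sub_read == 0 && o.load_read != -1) return true;
  return false;
}
```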

From the perspective of the C++ standard (that is, the abstract machine), what is the correct reason that #2, as an RMW operation, cannot miss executing wake() while a pure load can? I give three candidate explanations below; if an explanation is not right, please point out why.

  1. An RMW operation is less prone to reading a stale value than a pure load.

As pointed out in other questions, people consider the concept of a "stale value" not useful. If that is the case, please give a reasonable explanation of why this argument is incorrect.

  2. An RMW operation is more prone to reading a later modification in the modification order than a pure load.

Similarly, if this argument is incorrect, please point out why.

  3. The C++ standard imposes a stricter restriction on which value an RMW operation can read than on which value a pure load can read.

I suppose this could be an acceptable argument: besides the coherence rules defined in [intro.races] p11-p14, there is an extra rule in [atomics.order] p10 that restricts which value an RMW operation can read (it must read the last value, in the modification order, written before the RMW's own write), whereas a pure load has no such restriction.
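The rule from [atomics.order] p10 can be modeled the same way for the RMW version (again a toy sketch with hypothetical names RmwOutcome, rmw_version_outcomes and wakeup_never_lost). After the initial value 0, the only freedom is which of the two RMW writes comes first in the modification order, and each RMW then reads the value immediately preceding its own write.

```cpp
#include <vector>

// Toy model of the RMW version (illustrative only).
struct RmwOutcome {
  int fetch_sub_read;  // value read by #0 in t1
  int fetch_add_read;  // value read by #2 in t2
};

// Each RMW reads the last value written before its own write in the
// modification order, so applying the RMWs one after another in that
// order yields exactly the values they read.
std::vector<RmwOutcome> rmw_version_outcomes() {
  std::vector<RmwOutcome> outcomes;
  for (bool sub_first : {true, false}) {
    int value = 0;  // initial value of counter
    RmwOutcome o{};
    if (sub_first) {
      o.fetch_sub_read = value; value -= 1;  // #0 reads 0, writes -1
      o.fetch_add_read = value; value += 1;  // #2 reads -1, writes 0
    } else {
      o.fetch_add_read = value; value += 1;  // #2 reads 0, writes 1
      o.fetch_sub_read = value; value -= 1;  // #0 reads 1, writes 0
    }
    outcomes.push_back(o);
  }
  return outcomes;
}

// True if no outcome has #0 reading 0 (t1 blocks) while #2 misses -1 (no wake).
bool wakeup_never_lost(const std::vector<RmwOutcome>& outcomes) {
  for (const RmwOutcome& o : outcomes)
    if (o.fetch_sub_read == 0 && o.fetch_add_read != -1) return false;
  return true;
}
```

In the second order neither thread acts: #0 reads 1 so t1 does not block, and #2 reads 0 so t2 does not call wake().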
