Difficulty reading volatile memory
19:24 01 Oct 2025

I have code that has worked well when compiled with gcc but fails to work correctly when compiled with clang. I tracked down the issue to a read of volatile memory (on a microcontroller). I found by stepping in assembly that the memory associated with pma_address[i] is read into a register but the values set in the register are never used. So the handle_setup_packet is never called presumably because the optimizer determined that buffer will never change. By making buffer_out into a volatile uint8_t* the code does function correctly. But I don't see why it would need to be volatile in this case. I would think one volatile would be enough to convince the compiler that buffer is worth looking at. I'm wondering what my logic error is in the following code:

void read_pma(uint8_t byte_count, volatile uint16_t * pma_address, uint8_t *buffer_out) {
    int count_received_16 = (byte_count + 1) >> 1;
    for(int i=0; ibuffer[0].EP_RX, buffer);
    handle_setup_packet(reinterpret_cast(buffer));
}

Updates

I'm adding additional information to address the comments. From the suggestions in the comments I have learned more about strict aliasing and have found what I think is the main problem. The compiler is clang 20.1.0 provided by ARM: https://github.com/arm/arm-toolchain with the options -target=armv7m-none-eabi -O3 -flto. The assembly generated is quite large due to the loop unrolling it is doing, but generally there a number of sections like below, where it is loading multiple times out the pma_address memory location and writing repeatedly to a scratch register.

;         ((uint16_t *) buffer_out)[i] = pma_address[i];
10000776: f835 3c1e     ldrh    r3, [r5, #-30]
1000077a: f835 3c1c     ldrh    r3, [r5, #-28]
1000077e: f835 3c1a     ldrh    r3, [r5, #-26]
10000782: f835 3c18     ldrh    r3, [r5, #-24]
10000786: f835 3c16     ldrh    r3, [r5, #-22]
1000078a: f835 3c14     ldrh    r3, [r5, #-20]

I found several several ways that I can get it to generate the intended assembly below:

;         ((uint16_t *) buffer_out)[i] = pma_address[i];
 8004370: 885f          ldrh    r7, [r3, #0x2]
 8004372: eb00 0144     add.w   r1, r0, r4, lsl #1
 8004376: f8a1 704d     strh.w  r7, [r1, #0x4d]
 800437a: eb08 0744     add.w   r7, r8, r4, lsl #1
 800437e: f8b7 20a2     ldrh.w  r2, [r7, #0xa2]
;     for(int i=0; i

Where the assembly is both loading and storing the information obtained from pma_address. The solutions I found (without adding volatile) that result in a correctly operating program are 1) compile without -flto, 2) compile with -fno-strict-aliasing, and 3) change the code to address the aliasing violations.

void read_pma(uint8_t byte_count, volatile uint16_t * pma_address, uint8_t *buffer_out) {
    int count_received_16 = (byte_count + 1) >> 1;
    for(int i=0; i> 8) & 0xFF;
    }
}

fun() {
    read_pma(byte_count, USBPMA->buffer[0].EP_RX, buffer);
    usb_control_request ctrl_req {};
    std::memcpy(&ctrl_req, buffer, sizeof(usb_control_request));
    handle_setup_packet(&ctrl_req);
}

I tried but could not create a version of this code in Godbolt Compiler Explorer and could only obtain the unintended behavior in the context of the full program. The main thing that surprised me in the debugging is that with LTO on the compiler can fully ignore the handle_setup_packet function due to the aliasing violation. I tested this by placing invalid code in that function asm(" invalid_op;"); in that code which generated no errors when compiled in LTO mode.

c++ volatile strict-aliasing