C++ Lock-Free Queue - Part II

In Part I of this blog series, we discussed how to write a lock-free queue for the case of a single producer thread and a single consumer thread (SPSC). This is handy when sending messages between dedicated threads, but we often want to have multiple threads producing and consuming objects.

For example, we may want to have multiple threads pushing messages to a single event-processing thread (MPSC), a single streaming thread pushing deserialization work to multiple task threads (SPMC), or an async task pool where any thread can push work for multiple task threads to execute (MPMC).

These cases will all use the general definition of the Queue class declared in Part I of this blog series (as opposed to the specialized one for SPSC queues).

Queue Class Members and Indices

For the SPSC queue, all we needed were single push and pop indices to keep track of where the threads were operating in the queue. With many threads pushing and popping, this is much more difficult: each popping thread, for example, would need to know where every emplacing thread was currently working to be sure it is safe to pop at a given index. This is possible, but it is cumbersome to implement, and it results in many atomic reads (and cache misses!) of the other threads' status every time we try to emplace or pop entries from the queue.

Instead, we'll use an atomic bit array, mSlotFullFlags, to keep track of whether each of the slots in mStorage is empty or full. The full implementation of this bit array is beyond the scope of this post, but it is basically just an array of std::atomic<unsigned long long>, where reading, setting, or clearing a bit in the array requires doing an atomic load(), fetch_or(), or fetch_and() of an entire 64-bit block of bits at a time.
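To make the later listings easier to follow, here's a minimal sketch of what such a bit array might look like. This is purely illustrative: the std::vector storage and the proxy returned by operator[] are simplifications for the sketch, not the real implementation (which also provides iterators and a Set_BitRange() helper used later on).

#include <atomic>
#include <vector>

//Illustrative sketch only: one bit per queue slot, packed into 64-bit blocks
template <typename BlockType>
class AtomicBitArray
{
public:
    static constexpr int sBitsPerBlock = 8 * sizeof(BlockType);

    //Proxy returned by operator[], matching the Get_Bit()/Set_Bit() calls
    //used throughout this post
    class BitReference
    {
    public:
        BitReference(std::atomic<BlockType>& aBlock, BlockType aMask)
            : mBlock(aBlock), mMask(aMask) {}

        bool Get_Bit(std::memory_order aOrder) const
            { return (mBlock.load(aOrder) & mMask) != 0; }

        void Set_Bit(std::memory_order aOrder)
            { mBlock.fetch_or(mMask, aOrder); }

        void Clear_Bit(std::memory_order aOrder)
            { mBlock.fetch_and(static_cast<BlockType>(~mMask), aOrder); }

    private:
        std::atomic<BlockType>& mBlock;
        BlockType mMask;
    };

    explicit AtomicBitArray(int aNumBits = 0)
        : mBlocks((aNumBits + sBitsPerBlock - 1) / sBitsPerBlock) {}

    BitReference operator[](int aBitIndex)
    {
        return BitReference(mBlocks[aBitIndex / sBitsPerBlock],
            BlockType{1} << (aBitIndex % sBitsPerBlock));
    }

private:
    std::vector<std::atomic<BlockType>> mBlocks;
};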

The members of the Queue class (for non-SPSC queues) are shown below. Note that since the bit array already knows the queue capacity, we don't need to have a separate mCapacity variable like the SPSC queue had.

template <typename DataType, ThreadsPolicy Threading, WaitPolicy Waiting>
class Queue
{
//...
private:
    //STATIC MEMBERS
    static constexpr bool sOnePusher = (Threading == ThreadsPolicy::SPMC);
    static constexpr bool sOnePopper = (Threading == ThreadsPolicy::MPSC);

    //OVER-ALIGNED MEMBERS
    alignas(sAlignment) std::atomic<unsigned long long> mPushPopIndices = 0;
    alignas(sAlignment) std::atomic<int> mSize = 0;
    alignas(sAlignment) std::byte* mStorage = nullptr;

    //DEFAULT-ALIGNED MEMBERS
    //Not over-aligned as neither these nor the mStorage pointer change
    using UInt64 = unsigned long long; //ASSUMES 64 bits!
    AtomicBitArray<UInt64> mSlotFullFlags;
    int mIndexEnd = 0; //at this, we need to wrap indices around to zero
};

Also unlike the SPSC queue, we want to atomically load and compare-and-swap (CAS) both the push and the pop indices at the same time. This is needed to ensure that we are seeing a consistent state of the queue when reserving the next slot for pushing or popping. Otherwise, we might think that the queue is full or empty when it isn't, or we might even misinterpret where the threads are and end up with a data race! To synchronize both the push and pop indices at the same time we could use std::memory_order::seq_cst, but it is rather slow and there's a faster way to do it.

To atomically operate on both of the 32-bit push and pop indices at the same time, we'll combine them into a single 64-bit integer, mPushPopIndices. mPushPopIndices is used to store where pushing and popping threads should push and pop next, respectively. To make it easier to use, we'll define the QueueIndices struct below, which we can std::bit_cast to and from an unsigned long long for storage in mPushPopIndices:

//Alignas so can be used naturally in std::atomic
struct alignas(2*alignof(int)) QueueIndices
{
public:

    //Unwrapped values: Modulo capacity is (wrapped) array index. 
    int mUnwrappedPushIndex = 0;
    int mUnwrappedPopIndex = 0;
};
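As a quick illustration (not a listing from the queue itself), the round trip between QueueIndices and the stored 64-bit value is just a pair of std::bit_cast calls inside a member function. The static_assert guards the assumption that the two ints pack exactly into the 64-bit storage:

//Both 32-bit indices must pack exactly into the 64-bit stored value
static_assert(sizeof(QueueIndices) == sizeof(unsigned long long));

//Unpack the stored 64-bit value into the two indices...
auto cStored  = mPushPopIndices.load(std::memory_order::relaxed);
auto cIndices = std::bit_cast<QueueIndices>(cStored);

//...modify one of them, then pack both back into a single 64-bit value
cIndices.mUnwrappedPushIndex += 1;
auto cNewStored = std::bit_cast<unsigned long long>(cIndices);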

This class is similar enough to the SPSC specialization that the only significant difference in their Allocate() and Free() methods is the memory handling for the bit array, so their contents are not repeated here.

Emplacing One Object

The first thing we need to do is reserve a spot in mStorage to emplace at. The results of our reservation attempt are stored in the ReserveSlotResult struct:

struct ReserveSlotResult
{
    int mNumReserved = 0; //Defaults to failed
    int mWrappedBeginIndex = 0; //mUnwrappedBeginIndex % capacity
    int mUnwrappedBeginIndex = 0;
};

Reserving a slot to emplace when there is only one producing thread (SPMC) is relatively straightforward, and is shown below. All we need to do is (relaxed) load the current push index, and (acquire) check the bit flag at that index to see if the slot is empty. If the bit is still set, then that slot (and thus the queue itself) is full:

//Class template parameters removed for brevity
ReserveSlotResult Queue::Reserve_EmplaceSlot() requires (sOnePusher) //SPMC
{
    //Get the current indices
    //Index load relaxed: Only use the push index, and we're the only pusher
    auto cStoredIndices = mPushPopIndices.load(std::memory_order::relaxed);
    auto cIndices = std::bit_cast<QueueIndices>(cStoredIndices);
    int cPushIndex = cIndices.mUnwrappedPushIndex % capacity();

    //Make sure the slot is empty
    //Bit load acquire: Object creation cannot be reordered above this
    if(mSlotFullFlags[cPushIndex].Get_Bit(std::memory_order::acquire))
        return ReserveSlotResult{}; //The container is full

    //Don't advance the push index in mPushPopIndices until we finish pushing
    //That way we don't need to also check the bit flags when popping
    return {1, cPushIndex, cIndices.mUnwrappedPushIndex};
}

Note that if we only have a single producer (SPMC), no other threads are competing with us for an emplace slot. So we can wait to update mPushPopIndices until we've finished emplacing the object and have updated the bit flag. That way popping threads know that, for an SPMC queue, it is safe for them to pop up to the push index, because all pushes before then have been completed.

If there are multiple producers, we reserve a slot to emplace when we successfully increase the push index in mPushPopIndices. That way other threads will know they need to try to reserve the following slot instead. However, we can't just fetch_add() the index as we first need to make sure that there isn't a popping thread working at that location. Thus we need to use a CAS loop on the update of mPushPopIndices to retry until we succeed:

//Class template parameters removed for brevity
ReserveSlotResult Queue::Reserve_EmplaceSlot() requires (!sOnePusher) //!SPMC
{
    int cPushIndex;
    QueueIndices cIndices;
    UInt64 cNewStoredIndices;

    //Index load: MPSC relaxed (sync nothing), MPMC acquire (sync bit flags)
    static constexpr auto sIndexLoadOrder = sOnePopper ? 
        std::memory_order::relaxed : std::memory_order::acquire;

    //CAS success: MPSC acquire: prevent object creation reordering above
        //MPMC acq_rel: Also don't want bit flag check to go below CAS.
    static constexpr auto sCASOrder = sOnePopper ? 
        std::memory_order::acquire : std::memory_order::acq_rel;

    //Reserve a slot to push to
    auto cStoredIndices = mPushPopIndices.load(sIndexLoadOrder);
    do
    {
        //Load current indices
        cIndices = std::bit_cast<QueueIndices>(cStoredIndices);
        cPushIndex = cIndices.mUnwrappedPushIndex % capacity();

        //Guard against catching up to the pop index (also handles MPSC)
        auto cDiff = cIndices.mUnwrappedPushIndex - 
            cIndices.mUnwrappedPopIndex;
        if((cDiff == capacity()) || (cDiff == (capacity() - mIndexEnd)))
            return ReserveSlotResult{}; //Full. 2nd check handles wrap-around

        //MPMC only: Guard against popper at this index
        if constexpr (!sOnePopper)
        {
            if(mSlotFullFlags[cPushIndex].Get_Bit(std::memory_order::acquire))
            {
                //Slot is full. Why? Load indices again
                cNewStoredIndices = mPushPopIndices.load(sIndexLoadOrder);
                auto cNew = std::bit_cast<QueueIndices>(cNewStoredIndices);
                if(cNew.mUnwrappedPushIndex == cIndices.mUnwrappedPushIndex)
                    return ReserveSlotResult{}; //Push index unchanged: Full

                //Push index changed, try again
                cStoredIndices = cNewStoredIndices;
                continue;
            }
        }

        //++ push index
        QueueIndices cNewIndices{Bump_Index(cIndices.mUnwrappedPushIndex), 
            cIndices.mUnwrappedPopIndex};
        cNewStoredIndices = std::bit_cast<UInt64>(cNewIndices);
    }
    while(!mPushPopIndices.compare_exchange_weak(cStoredIndices, 
        cNewStoredIndices, sCASOrder, sIndexLoadOrder));

    return {1, cPushIndex, cIndices.mUnwrappedPushIndex};
}

For an MPSC queue, we don't need to check any bit flags because we can just check the pop index. We also know the (single) popping thread has finished popping at all of the indices before the pop index, because it doesn't update that index until it's finished (just like deferring the push index when emplacing). Also note that none of the other pushing threads can be operating at a location before the pop index unless they succeed at the CAS before we do.

For MPMC queues, we also (acquire) check the bit flag stored in mSlotFullFlags. If it's clear, the slot is empty and we can try to reserve this push index. Otherwise the slot is full, and we reload the indices to determine why: either someone else beat us to this slot and already pushed here (the indices changed, so we just retry), or the container itself is full. That's why the bit flag load uses std::memory_order::acquire: it synchronizes the mPushPopIndices value associated with the flag's change so we can make this determination.

Finally, note that Bump_Index() is identical to that for the SPSC specialization and is thus not shown.
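If you don't have Part I handy, a plausible sketch of it is below. This is a reconstruction consistent with the wrap-at-mIndexEnd convention used by Advance_StoredIndex() later on, not the exact listing:

//Class template parameters removed for brevity
//Sketch: advance an unwrapped index by one, wrapping back to zero at mIndexEnd
int Queue::Bump_Index(int aUnwrappedIndex) const
{
    auto cNewIndex = aUnwrappedIndex + 1;
    return (cNewIndex >= mIndexEnd) ? (cNewIndex - mIndexEnd) : cNewIndex;
}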

Now that we have reserved a slot to emplace at, the rest is again straightforward: we just emplace an object in that slot, and set the bit flag at that index. For SPMC queues we haven't bumped the push index yet, so we do that here in Advance_StoredIndex(). Finally we increase mSize, as was covered in Part I of this blog series:

//Class template parameters removed for brevity
template <typename... ArgumentTypes>
bool Queue::Emplace(ArgumentTypes&&... aArguments)
{
    //Reserve push slot
    auto [cNumReserved, cPushIndex, cUnwrappedIdx] = Reserve_EmplaceSlot();
    if(cNumReserved == 0)
        return false; //Failed, queue full

    //It's empty. Emplace data
    auto cAddress = mStorage + cPushIndex * sizeof(DataType);
    new (cAddress) DataType(std::forward<ArgumentTypes>(aArguments)...);

    //Set bit to notify poppers this slot is ready
    //Bit set: SPMC relaxed (Advance_StoredIndex() syncs), else release: 
        //Syncs object creation, prevents them reordering below this
    static constexpr auto sOrder = sOnePusher ? std::memory_order::relaxed 
        : std::memory_order::release;
    mSlotFullFlags[cPushIndex].Set_Bit(sOrder);

    //SPMC: ++ push index: we're done, poppers don't have to check bit
    if constexpr (sOnePusher)
        Advance_StoredIndex(cUnwrappedIdx, 1);

    //Update the size
    Increase_Size(1);
    return true;
}

The definition of Advance_StoredIndex() is shown below. It's only called for SPMC queues to advance the push index, and for MPSC queues to advance the pop index. Since there is only one thread modifying the index we are advancing, we can modify mPushPopIndices with a single fetch_add() operation:

//Class template parameters removed for brevity
void Queue::Advance_StoredIndex(int aUnwrappedIndex, int aNumToAdvance)
{
    auto cNewUnwrappedIndex = aUnwrappedIndex + aNumToAdvance;
    cNewUnwrappedIndex -= (cNewUnwrappedIndex >= mIndexEnd) ? mIndexEnd : 0;

    //Must convert to stored type before subtracting
    UInt64 cOriginal, cNew;
    if constexpr (sOnePusher) //SPMC: Advance push index
    {
        //Zero: Pop index irrelevant, we're subtracting them
        cOriginal = std::bit_cast<UInt64>(QueueIndices{aUnwrappedIndex, 0});
        cNew = std::bit_cast<UInt64>(QueueIndices{cNewUnwrappedIndex, 0});
    }
    else //MPSC: Advance pop index
    {
        //Zero: Push index irrelevant, we're subtracting them
        cOriginal = std::bit_cast<UInt64>(QueueIndices{0, aUnwrappedIndex});
        cNew = std::bit_cast<UInt64>(QueueIndices{0, cNewUnwrappedIndex});
    }

    //Advance the stored index
    auto cToAdd = cNew - cOriginal; //This may wrap-around, is intentional
    mPushPopIndices.fetch_add(cToAdd, std::memory_order::release);
}

Most of the code here is for dealing with the fact that we combined the push and pop indices into a single variable. The fetch_add() has release semantics to make sure that the bit flag changes we made earlier are synchronized to threads that acquire this new mPushPopIndices value.

Emplacing Multiple Objects

The logic for emplacing multiple objects is just an extension to that of emplacing a single object, similar to how it was for the SPSC queue. For SPMC queues this is again relatively straightforward and is shown below. The only real difference compared to emplacing a single object is that here we potentially check many bit flags (Find_AvailableSlots()) to see if we can find enough available slots to push everything:

//Class template parameters removed for brevity
ReserveSlotResult Queue::Reserve_MultipleEmplaceSlots(int aNumToReserve)
    requires (sOnePusher) //SPMC
{
    //Get the current indices
    //Index load relaxed: Nothing can reorder before this, and 
    //acquire wouldn't sync bit flags! (cleared after pop index reserved)
    auto cStoredIndices = mPushPopIndices.load(std::memory_order::relaxed);
    auto cIndices = std::bit_cast<QueueIndices>(cStoredIndices);
    int cPushIndex = cIndices.mUnwrappedPushIndex % capacity();

    //Don't search beyond the pop index
    auto cMaxSlotsAvailable = cIndices.mUnwrappedPopIndex + capacity() - 
        cIndices.mUnwrappedPushIndex;
    cMaxSlotsAvailable -= (cMaxSlotsAvailable >= mIndexEnd) ? mIndexEnd : 0;
    auto cNumToSearch = std::min(aNumToReserve, cMaxSlotsAvailable);

    //Check bit flags to see how many we can push
    auto cNumToReserve = Find_AvailableSlots<false>(cPushIndex, 
        cNumToSearch);

    //Don't advance the push index in mPushPopIndices until finish pushing!
    //That way we don't need to also check the bit flags when popping
    return {cNumToReserve, cPushIndex, cIndices.mUnwrappedPushIndex};
}

The boolean template parameter to Find_AvailableSlots() indicates whether we should search for cleared bits (false) or set bits (true). The implementation of this function is left as an exercise for the reader 🙂, but it is basically just a series of std::memory_order::acquire atomic loads until we either find enough slots to reserve, or reach a bit that has been set. For non-MPMC queues this load is an acquire so that object creation (or destruction during pop) is not reordered before this check. For MPMC queues we need acquire semantics to synchronize the corresponding mPushPopIndices.
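To make that concrete, here's a rough sketch of what such a search could look like. It is illustrative only; a real implementation would scan whole 64-bit blocks at a time rather than loading one bit per iteration as this sketch does:

//Class template parameters removed for brevity
//Sketch only: starting at aWrappedIndex, count how many consecutive slots
//have their bit flag in the desired state, stopping at the first mismatch
template <bool tFindSetBits>
int Queue::Find_AvailableSlots(int aWrappedIndex, int aNumToSearch)
{
    int cNumFound = 0;
    auto cIndex = aWrappedIndex;
    while(cNumFound < aNumToSearch)
    {
        //Acquire: see the discussion above for what this load synchronizes
        const bool cIsSet =
            mSlotFullFlags[cIndex].Get_Bit(std::memory_order::acquire);
        if(cIsSet != tFindSetBits)
            break; //First slot in the wrong state ends the run

        ++cNumFound;
        cIndex = (cIndex + 1 == capacity()) ? 0 : (cIndex + 1); //Wrap around
    }
    return cNumFound;
}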

For multiple producers, the function for reserving multiple emplace slots is also an extension of single-object reservation: using a CAS-loop to check the pop index and bit flags before trying to make the reservation:

//Class template parameters removed for brevity
ReserveSlotResult Queue::Reserve_MultipleEmplaceSlots(int aNumToReserve)
    requires (!sOnePusher) //!SPMC
{
    int cNumToReserve = 0;
    int cNewIndex = 0;
    QueueIndices cIndices;
    UInt64 cNewStoredIndices = 0; //Declared out here so the CAS below can use it

    //Index load: MPSC relaxed (sync nothing), MPMC acquire (sync bit flags)
    static constexpr auto sIndexLoadOrder = sOnePopper ? 
        std::memory_order::relaxed : std::memory_order::acquire;

    //CAS success: MPSC acquire: prevent object creation reordering above
        //MPMC acq_rel: Also don't want bit flag check to go below CAS.
    static constexpr auto sCASOrder = sOnePopper ? 
        std::memory_order::acquire : std::memory_order::acq_rel;

    //Reserve slots to push to
    auto cStoredIndices = mPushPopIndices.load(sIndexLoadOrder);
    do
    {
        //Load current indices
        cIndices = std::bit_cast<QueueIndices>(cStoredIndices);

        //Find max # slots could possibly be available: Up to the pop index
        auto cMaxAvailable = cIndices.mUnwrappedPopIndex + capacity() - 
            cIndices.mUnwrappedPushIndex;
        cMaxAvailable -= (cMaxAvailable >= mIndexEnd) ? mIndexEnd : 0;
        auto cMaxNumCouldReserve = std::min(aNumToReserve, cMaxAvailable);

        //Guard against the container being full
        if(cMaxNumCouldReserve == 0)
            return ReserveSlotResult(); //Nothing available

        //Find how many slots are available to push to
        if constexpr (sOnePopper) //MPSC: Up to the pop index is available
            cNumToReserve = cMaxNumCouldReserve;
        else //MPMC only: Can push up to the slowest popper
        {
            /* CHECK BIT FLAGS HERE, SEE CUTOUT BELOW */
        }

        //Advance push index
        cNewIndex = Increase_Index(cIndices.mUnwrappedPushIndex, 
            cNumToReserve);
        QueueIndices cNewIndices{cNewIndex, cIndices.mUnwrappedPopIndex};
        cNewStoredIndices = std::bit_cast<UInt64>(cNewIndices);
    }
    while(!mPushPopIndices.compare_exchange_weak(cStoredIndices, 
        cNewStoredIndices, sCASOrder, sIndexLoadOrder));

    auto cPushIndex = cIndices.mUnwrappedPushIndex % capacity();
    return {cNumToReserve, cPushIndex, cIndices.mUnwrappedPushIndex};
}

However the code for checking the bit flags is a bit complicated, so it's been cut out of the above and is found in the code snippet below. The reason it's complicated is that, if we're not careful, a thread trying to emplace (e.g.) several hundred entries at once may need to (atomically) read so many bit flags that it's too slow to compete with other pushers that are only trying to reserve one slot at a time.

Suppose one thread is trying to reserve 200 slots at once to emplace at. It may find 200 free slots, but then another thread might come in and reserve a single slot first. Once the first thread fails its CAS exchange and loads the new value from mPushPopIndices, it knows that 199 of the slots it found are still free.

So instead of starting over, on subsequent iterations through the loop we'll use our previous knowledge of the bit flags to significantly reduce the amount of work we need to do. This lets us compete with threads doing less work, so we're much more likely to succeed with our reservation. This code is listed below:

//CUTOUT: Check bit flags, find how many empty slots we can push to

//Find how many slots are still available from last attempt: 
int cNumStillAvailable = 0;
bool cFirstAttempt = (cNumToReserve == 0);
if(!cFirstAttempt)
{
    //Empty slots beyond new push index are still available
    cNumStillAvailable = cNewIndex - cIndices.mUnwrappedPushIndex; 
    if(cNumStillAvailable < 0)
        cNumStillAvailable += mIndexEnd; //Handle index wrap-around

    //Handle # reserved last iteration > # slots we tried to reserve
    if(cNumStillAvailable > cMaxNumCouldReserve)
        cNumStillAvailable = 0; //Start over
}

//Search for (more) clear slots
auto cSearchFromIndex = cIndices.mUnwrappedPushIndex + cNumStillAvailable;
cSearchFromIndex %= capacity();
auto cNumToSearch = cMaxNumCouldReserve - cNumStillAvailable;
auto cNumToReserve_NewlyFound = Find_AvailableSlots<false>(cSearchFromIndex, 
    cNumToSearch);

//Guard against the container being full
if(cNumToReserve_NewlyFound == 0)
{
    //Read the indices again to check if another thread has already pushed here
    auto cReloadedIndices = mPushPopIndices.load(sIndexLoadOrder);
    auto cNewIndices = std::bit_cast<QueueIndices>(cReloadedIndices);
    if(cNewIndices.mUnwrappedPushIndex != cIndices.mUnwrappedPushIndex)
    {
        cStoredIndices = cReloadedIndices;
        continue; //Push index changed, try again
    }

    //Push index unchanged: Full: a thread still needs to pop here.
    if(cNumStillAvailable == 0)
        return ReserveSlotResult(); //Nothing available

    //Nearly full, but can still push to the slots we found last attempt
}

//Not full (yet), update # to reserve
cNumToReserve = cNumStillAvailable + cNumToReserve_NewlyFound;

Now that we've reserved as many slots as we can, we can just emplace the objects, set the bit flags, update the push index (for SPMC only), and then update the size. This is very similar to Emplace(), except since we're emplacing many entries we have to handle wrapping back around to the front of the queue:

//Class template parameters removed for brevity
template <typename InputType>
Span<InputType> Queue::Emplace_Multiple(const Span<InputType>& aSpan)
{
    //Reserve push slots
    const auto cCapacity = capacity();
    auto [cNumReserved, cReserveBeginIndex, cUnwrappedIndex] = 
        Reserve_MultipleEmplaceSlots(aSpan.size());

    if(cNumReserved == 0)
        return aSpan; //Queue is full

    //Helper function
    auto cPushAndFlag = [&](InputType* aInputData, DataType* aPushTo, 
        int aNumToReserve, const auto& aFlagsBegin)
    {
        std::uninitialized_move_n(aInputData, aNumToReserve, aPushTo);

        //Bit set: SPMC relaxed (Advance_StoredIndex() syncs), else release: 
        //Syncs object creation, prevents them reordering below this
        static constexpr auto sOrder = sOnePusher ? 
            std::memory_order::relaxed : std::memory_order::release;

        //Set bits to notify poppers these slots are ready
        auto cFlagsEnd = aFlagsBegin + aNumToReserve;
        Set_BitRange(aFlagsBegin, cFlagsEnd, sOrder);
    };

    //Setup push/flag
    auto cAddress = mStorage + (cReserveBeginIndex * sizeof(DataType));
    auto cPushToData = std::launder(reinterpret_cast<DataType*>(cAddress));
    auto cDistanceBeyond = (cReserveBeginIndex + cNumReserved) - cCapacity;
    auto cFlagsBegin = std::begin(mSlotFullFlags);
    auto cModifyFlagsBegin = cFlagsBegin + cReserveBeginIndex;
    const auto cSpanData = aSpan.data();

    //Push and set flags
    if(cDistanceBeyond <= 0)
        cPushAndFlag(cSpanData, cPushToData, cNumReserved, cModifyFlagsBegin);
    else //Push wraps around to beginning of queue, do in 2 parts
    {
        auto cInitialLength = cNumReserved - cDistanceBeyond;
        cPushAndFlag(cSpanData, cPushToData, cInitialLength, 
            cModifyFlagsBegin);

        cPushToData = std::launder(reinterpret_cast<DataType*>(mStorage));
        auto cToPush = cSpanData + cInitialLength;
        cPushAndFlag(cToPush, cPushToData, cDistanceBeyond, cFlagsBegin);
    }

    //SPMC: ++ push index: we're done, poppers don't have to check bit
    if constexpr (sOnePusher)
        Advance_StoredIndex(cUnwrappedIndex, cNumReserved);

    //Update the size
    Increase_Size(cNumReserved);

    //Return unfinished entries
    auto cRemainingBegin = cSpanData + cNumReserved;
    return Span<InputType>(cRemainingBegin, aSpan.size() - cNumReserved);
}

Popping Objects

That's it for emplacing objects, but what about for popping? Fortunately, the methods for popping objects are essentially mirror images of those for emplacing objects. All we have to do is change from pushing to popping, from setting bits to clearing bits, from checking for queue full to queue empty, etc.

That is a lot of work to do, but since there isn't anything really new to be discussed I'll just omit it instead of doubling the length of this blog post.
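To give a sense of the mirroring, though, below is a plausible sketch of the pop-slot reservation for a single consumer (MPSC), mirroring the SPMC Reserve_EmplaceSlot() shown earlier. Treat it as a reconstruction for illustration rather than an exact listing:

//Class template parameters removed for brevity
//Sketch: mirror image of the SPMC Reserve_EmplaceSlot() shown earlier
ReserveSlotResult Queue::Reserve_PopSlot() requires (sOnePopper) //MPSC
{
    //Get the current indices
    //Index load relaxed: Only use the pop index, and we're the only popper
    auto cStoredIndices = mPushPopIndices.load(std::memory_order::relaxed);
    auto cIndices = std::bit_cast<QueueIndices>(cStoredIndices);
    int cPopIndex = cIndices.mUnwrappedPopIndex % capacity();

    //Make sure the slot is full
    //Bit load acquire: Syncs with the pusher's release Set_Bit(), so the
    //object's construction is visible before we read it
    if(!mSlotFullFlags[cPopIndex].Get_Bit(std::memory_order::acquire))
        return ReserveSlotResult{}; //Empty (or the next object isn't ready yet)

    //Don't advance the pop index in mPushPopIndices until we finish popping
    //That way pushers don't have to also check the bit flags
    return {1, cPopIndex, cIndices.mUnwrappedPopIndex};
}

The MPMC version would follow the same CAS-loop pattern as the multi-producer Reserve_EmplaceSlot(), with the full/empty checks and the bit states inverted.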

Awaiting

Fortunately, the code needed for waiting to emplace or pop entries (as well as that for handling mSize) is nearly identical to that of the SPSC queue discussed in Part I of this blog series, so it is not repeated here.

However, there is one very critical difference. When notifying popping threads to wake up, we need to wake them not only if mSize was zero, but also if it used to be negative! Similarly, when notifying pushing threads, we also need to wake them if mSize was greater than the capacity!

How is it possible to have these kinds of values for mSize? Suppose mSize is zero, and a pusher emplaces an object. Before it has a chance to increase mSize, a popper could come in and remove the object from the queue and decrease mSize to -1.

OK, well once the pusher finishes mSize should go back to zero, and we wouldn't want to wake any threads if the queue is empty anyway, right? Well, what if instead of pushing one object, this thread pushed 100 objects? Then instead of the queue being empty and mSize being zero, there are 99 objects in the queue and our other popping threads are all still sleeping! In fact, these threads may never get a chance to wake again, unless we guard against these out-of-bounds values while notifying:

//In End_PopWaiting() and Increase_Size():
if(cPriorSize <= 0)
    mSize.notify_all(); //Wake popping threads

//In Decrease_Size():
if (cActualSize >= capacity())
    mSize.notify_all(); //Wake emplacing threads

Conclusion

Queues are extremely useful in multithreaded programs, and are commonly used for passing messages or work between threads. Using a lock-free queue is paramount for having good performance, especially as the number of threads increases.

By using a fixed-size array, we can save a lot of time by not dynamically allocating every node in our queue. Also, we're able to push and pop many entries at once significantly faster with an array than we could by pointer-hopping through a linked-list.

I should note that while I have written this code for an abstract machine, I have only tested this code on x86. I have not been able to test this yet on an ARM processor, which has a more relaxed memory model. Thus there may be subtle problems with the chosen atomic memory orders that I haven't found yet. If you're targeting a platform using ARM (e.g. Android, iOS), review the memory ordering carefully and please let me know if you encounter any problems!
