This article is a work in progress and still contains some incorrect information.

I bought a ULX3S with the goal of playing around with SpinalHDL. Since I had recently read about cache coherency protocols, I thought it would be nice to figure out how the cache coherency mechanism works in the SMP version of VexRiscv.

At first, I was completely overwhelmed by the complexity of VexRiscv. Luckily, I got some help from its creator on the Gitter channel. VexRiscv cores have a small write-through cache (Wikipedia explains it nicely), which is generally simpler than a write-back cache.

Keeping write-through caches coherent means invalidating the matching cache lines in the other cores whenever one core writes to memory covered by those lines.
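To see why invalidation is needed, here is a toy Scala model (not VexRiscv code) of two write-through caches sharing one memory. It assumes one word per cache line and no write-allocate, purely to keep the example short; WriteThroughCache and its methods are made-up names.

```scala
import scala.collection.mutable

// Toy write-through cache. ASSUMPTIONS: one word per "line", word addresses,
// and no write-allocate (a write only updates the line if it is already cached).
class WriteThroughCache(memory: mutable.Map[Int, Int]) {
  private val lines = mutable.Map[Int, Int]()

  // On a miss, fill the line from the shared memory.
  def read(addr: Int): Int =
    lines.getOrElseUpdate(addr, memory.getOrElse(addr, 0))

  def write(addr: Int, value: Int): Unit = {
    memory(addr) = value                            // write-through: memory always updated
    if (lines.contains(addr)) lines(addr) = value   // keep our own copy in sync
  }

  // What the coherency protocol has to trigger in the *other* caches.
  def invalidate(addr: Int): Unit = lines -= addr
}

val memory = mutable.Map[Int, Int]()
val core0 = new WriteThroughCache(memory)
val core1 = new WriteThroughCache(memory)

core1.read(0x10)                 // core1 now caches address 0x10 (value 0)
core0.write(0x10, 42)            // memory holds 42, core1 does not know
val staleRead = core1.read(0x10) // 0: core1 hits on its stale line
core1.invalidate(0x10)
val freshRead = core1.read(0x10) // 42: refetched from memory after invalidation
```

Note that core0 never needs to tell core1 the new value; dropping the stale line is enough, because the next read refetches it from memory.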

The banana memory bus

The BMB bus is a memory bus. It connects two types of participants:

  • bus masters are entities that operate on the memory space (they read from and write to certain addresses), for instance CPUs
  • bus slaves act upon these read and write requests, for instance memories or GPIOs.

When there are multiple masters on the bus, an arbiter is needed to decide which master gets access to the bus at any given time. The arbiter forwards the read/write requests from the masters to the slaves and passes the responses back. When the masters have a cache, extra signals are added between the masters and the arbiter for cache coherency purposes.
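As a rough illustration of what regulating bus access can mean, the sketch below implements a round-robin arbiter in plain Scala. The round-robin policy and the RoundRobinArbiter name are assumptions made for this example; they are not taken from SpinalHDL or SaxonSoc.

```scala
// Hypothetical round-robin arbiter: among the masters that currently request
// the bus, grant the first one found after the previous winner.
class RoundRobinArbiter(masterCount: Int) {
  private var lastGrant = masterCount - 1

  def grant(requests: Seq[Boolean]): Option[Int] = {
    require(requests.length == masterCount)
    // Search order starts just after the last winner and wraps around.
    val order = (1 to masterCount).map(i => (lastGrant + i) % masterCount)
    val winner = order.find(i => requests(i))
    winner.foreach(w => lastGrant = w)
    winner
  }
}

val arbiter = new RoundRobinArbiter(3)
val grants = Seq(
  arbiter.grant(Seq(true, true, false)),  // Some(0)
  arbiter.grant(Seq(true, true, false)),  // Some(1)
  arbiter.grant(Seq(true, true, false)),  // Some(0): master 2 is idle, search wraps
  arbiter.grant(Seq(false, false, false)) // None: nobody requests the bus
)
```

Starting the search just after the last winner keeps one busy master from starving the others.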

A short introduction to the banana memory bus can be read here. At the bottom of that README, the signals that travel through the BMB are listed in the documentation of the cmd and rsp streams.

Cache invalidation streams in the banana memory bus (BMB)

When the optional cache invalidation feature is enabled (see here), three more streams are added to the BMB bus:

  • inv for cache invalidation requests (sent from the arbiter to all the masters)
  • ack for acknowledgement of cache invalidation (sent from the bus master to the arbiter)
  • sync to tell a bus master that the caches are in sync again (the affected lines have been invalidated) after that master issued a write request.

The ack stream has no signals apart from the valid and the ready signal of the Stream primitive.
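On a Stream, a transfer happens only in a cycle where both valid and ready are high (SpinalHDL calls this condition fire). Since ack carries no payload, each such transfer is the entire message. The trace below is a plain Scala toy with a made-up Cycle type, not SpinalHDL hardware code.

```scala
// One clock cycle of a payload-less stream such as ack: the producer drives
// valid, the consumer drives ready, and a transfer happens only when both are high.
case class Cycle(valid: Boolean, ready: Boolean) {
  def fire: Boolean = valid && ready
}

val trace = Seq(
  Cycle(valid = true,  ready = false), // master wants to ack, arbiter stalls
  Cycle(valid = true,  ready = true),  // first ack transferred
  Cycle(valid = false, ready = true),  // nothing to send
  Cycle(valid = true,  ready = true)   // second ack transferred
)

val acksTransferred = trace.count(_.fire)     // number of acks actually delivered
val firstFireCycle = trace.indexWhere(_.fire) // cycle index of the first ack
```

In the first cycle the ack is offered but not taken, so it only counts once ready goes high.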

The inv stream carries the following signals:

Name      Bitcount        Description
all       1               The bus arbiter sets this to 1 when this inv request was caused by a write by this source
address   addressWidth    The starting address of the memory block to be invalidated
length    invalidateLen   The length of the memory block to be invalidated
source    sourceWidth     Transaction source ID; allows out-of-order completion between different sources, similar to the AXI ID
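To make the inv fields concrete, here is a hypothetical Scala model of the payload together with a helper that lists the cache lines an inv request touches. Bit widths are ignored, the helper name is made up, and reading length as "byte count minus one" is an assumption of this sketch; check the BMB readme for the exact encoding.

```scala
// Hypothetical inv payload; field names follow the table, widths are ignored.
// ASSUMPTION: length = number of bytes - 1, so the last byte is address + length.
case class Inv(all: Boolean, address: Long, length: Int, source: Int) {
  def lastAddress: Long = address + length
}

// Base addresses of all cache lines overlapping the invalidated byte range.
def linesToInvalidate(inv: Inv, lineBytes: Int): Seq[Long] = {
  val firstLine = inv.address / lineBytes * lineBytes // round down to a line boundary
  (firstLine to inv.lastAddress by lineBytes.toLong).toSeq
}

// Bytes 0x1004..0x100B fit in one 32-byte line.
val hit = linesToInvalidate(Inv(all = false, address = 0x1004L, length = 7, source = 1), lineBytes = 32)
// Bytes 0x101C..0x1023 straddle a line boundary, so two lines are hit.
val hit2 = linesToInvalidate(Inv(all = false, address = 0x101CL, length = 7, source = 1), lineBytes = 32)
```

The second case shows why a cache has to check every line overlapping the range rather than just the line containing address.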

The sync stream carries the following signals:

Name      Bitcount        Description
source    sourceWidth     Transaction source ID; allows out-of-order completion between different sources, similar to the AXI ID

The cache coherency protocol

Since we are dealing with write-through caches, there is only one way a cache line can become outdated: another core writes to the cached memory location. The cache coherency protocol therefore has to ensure that whenever one core writes to memory, the corresponding cache lines are invalidated.

TODO add part about fence

This works as follows:

  • a bus master (e.g. a core) writes to a memory location.
  • a BmbInvalidateMonitor sits on the bus, acting both as a master and as a slave. It listens on the rsp stream for responses to write requests and generates inv transactions from them.
  • the arbiter sends an inv request (with the information about the cache lines to be invalidated) to all masters on the bus.
  • the bus masters then invalidate the affected cache lines and respond with an ack. Note that the ack stream has no source transaction ID: acks for multiple inv requests should never be received out of order, so their order alone tells which inv request they belong to.
  • when the BMB arbiter has received acks from all masters, it sends an ack to the slaves. The BmbInvalidateMonitor, which is connected to the bus as a slave, receives this aggregated ack.
  • once the aggregated ack has been received, the BmbInvalidateMonitor issues a sync transaction to the bus master that wrote to the memory location. It keeps a queue of outstanding sync requests so that it can handle multiple in-flight inv requests.
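The steps above can be sketched as a small software model in Scala. This is not the SpinalHDL BmbInvalidateMonitor: AckAggregator, InvalidateMonitorModel and InvRequest are hypothetical names, the all flag is left out, and timing is ignored. It does show why ordered acks need no ID: the arbiter side only counts acks per master, and the monitor's FIFO of sources says which master each aggregated ack belongs to.

```scala
import scala.collection.mutable

// Hypothetical payload types; the `all` flag of inv is not modelled here.
case class InvRequest(address: Long, length: Int, source: Int)
case class Sync(source: Int)

// Arbiter side: joins per-master acks into one aggregated ack per inv.
// Acks carry no ID and arrive in order, so counting them is enough.
class AckAggregator(masterCount: Int) {
  private val ackCounts = Array.fill(masterCount)(0)
  private var aggregated = 0

  // Returns how many aggregated acks this ack completes (0 or 1).
  def onAck(master: Int): Int = {
    ackCounts(master) += 1
    val completed = ackCounts.min - aggregated
    aggregated += completed
    completed
  }
}

// Monitor side: turns write responses into inv requests and remembers,
// in issue order, which source each in-flight inv belongs to.
class InvalidateMonitorModel {
  private val pendingSyncs = mutable.Queue[Int]()

  def onWriteResponse(source: Int, address: Long, length: Int): InvRequest = {
    pendingSyncs.enqueue(source)
    InvRequest(address, length, source)
  }

  // Each aggregated ack completes the oldest in-flight inv.
  def onAggregatedAck(): Sync = Sync(pendingSyncs.dequeue())
}

val monitor = new InvalidateMonitorModel
val arbiter = new AckAggregator(masterCount = 2)
val syncs = mutable.ArrayBuffer[Sync]()

// Forward one master's ack and emit a sync for every inv it completes.
def deliverAck(master: Int): Unit =
  for (_ <- 0 until arbiter.onAck(master)) syncs += monitor.onAggregatedAck()

// Master 1 writes, then master 0 writes: two invs are in flight at once.
val invA = monitor.onWriteResponse(source = 1, address = 0x1000L, length = 3)
val invB = monitor.onWriteResponse(source = 0, address = 0x2000L, length = 3)

deliverAck(0) // master 0 acks invA: still waiting for master 1
val syncsAfterFirstAck = syncs.size
deliverAck(1) // master 1 acks invA: sync goes to source 1
deliverAck(1) // master 1 acks invB
deliverAck(0) // master 0 acks invB: sync goes to source 0
```

Because acks arrive in order, the n-th aggregated ack always refers to the n-th inv, so a plain FIFO of sources is enough to route each sync back to the right writer.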

The connection from the cores to the data bus

In SaxonSoc the cores are connected to the BMB bus here. The relevant piece of code is:

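    cores.produce(for(cpu <- cores.cpu) {
      // Runs only once the cores have been generated (see below)
      interconnect.addConnection(
        cpu.iBus -> List(iBus.bmb),         // instruction bus of this core
        cpu.dBus -> List(dBusCoherent.bmb)  // data bus: every core joins dBusCoherent
      )
    })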

Here, all cores are connected to the same data bus, dBusCoherent. Note that produce is used here, which is part of the generator framework. The variable cores is not a list of cores but a list of core handles. Handles are placeholders for areas that will become available at some point. The Scala block inside the produce call is therefore only executed once the cores have been generated.
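The idea behind handles and produce can be illustrated with a minimal Scala stand-in. ToyHandle is hypothetical and much simpler than the real generator framework (in the SaxonSoc snippet, for instance, produce receives a plain block rather than a function of the value); it only shows the deferred execution described above.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a handle: a slot whose value is filled in later,
// plus the callbacks that are waiting for it. NOT the real generator API.
class ToyHandle[T] {
  private var value: Option[T] = None
  private val waiting = mutable.ArrayBuffer[T => Unit]()

  // Register work that needs the value; runs immediately if it is already loaded.
  def produce(body: T => Unit): Unit = value match {
    case Some(v) => body(v)
    case None    => waiting += body
  }

  // Called by whatever generates the value; releases the queued callbacks.
  def load(v: T): Unit = {
    value = Some(v)
    waiting.foreach(_(v))
    waiting.clear()
  }
}

case class Cpu(name: String)

val cores = new ToyHandle[Seq[Cpu]]
val connected = mutable.ArrayBuffer[String]()

// Like the SaxonSoc snippet: describe the connections before the cores exist.
cores.produce(cpus => cpus.foreach(cpu => connected += s"${cpu.name} -> dBusCoherent"))
val connectedBeforeLoad = connected.size // 0: the body has not run yet
cores.load(Seq(Cpu("cpu0"), Cpu("cpu1"))) // now the queued body runs
```

The connection code can thus be written anywhere in the design description; it simply waits until the cores it depends on exist.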