
Detecting a PS2 Emulator: GS Backpressure

This is the third entry in my series on detecting PlayStation 2 emulators.

  1. The VU0 Pipeline
  2. When 1 * X does not equal X

Changing it up from our past methods, we will be using the GS (Graphics Synthesizer) to detect if we are running under an emulator.

It’s hard to even call the GS a GPU; it is much simpler than what most people would consider a GPU. The GS is just a rasteriser, and the heavy lifting is done in code the developer writes for VU1. If you’d like more information, I recommend checking out this writeup of the PS2 hardware to see everything the GS can do.

Because the GS is simple, it’s also quite fast. If you’re interested, I have a previous post, linked here, benchmarking different methods of clearing a framebuffer on the GS. Despite being fast, the GS isn’t instant. In this context, the GS being busy drawing (and therefore not immediately consuming new data) is called “backpressure”. Sony implemented a few ways to properly synchronise the GS with the EE (main CPU), and one of them is how we will measure the GS processing time.

GS Signals

There are three types of GS signals. You trigger each one by writing to its respective register (shown in the implementation).

  • SIGNAL
  • LABEL
  • FINISH

SIGNAL sets a bit in the GS status register and optionally generates an interrupt without waiting for the GS to finish the current drawing process. If SIGNAL is written to while a previous SIGNAL interrupt state hasn’t been cleared, the GS will halt until this interrupt status has been cleared.

LABEL writes to a special field in the SIGLBLID register. There is no interrupt for this signal.

FINISH is much like SIGNAL, where it sets a status bit and optionally an interrupt, but successive writes to FINISH do not halt the GS. Adding a write to the FINISH register at the end of your drawing packet allows the EE to know when the GS has finished your drawing process. This is going to be the signal we use to time the GS.

Here is an example of how FINISH can synchronise the GS and the EE:

[Animation: arrows travel from the EE to the DMAC to the GS while a code snippet spins in a while loop on the GS CSR. The last arrow is the FINISH write; as soon as it reaches the GS, the snippet exits the while loop.]

Note that this animation assumes there is no GS backpressure: the transferred data is immediately consumed by the GS. Also, the time it takes the data (arrows) to travel from EE RAM to the GS is the DMA transfer time. Emulating DMA transfer time is required for the majority of games, but it isn’t important for us here.

The Implementation

The PS2 has a few ways to time things; the easiest is the COP0 register called Count. Count is a 32-bit counter that increments every clock cycle. If we read Count before we start our EE->GS DMA transfer, and again after the GS has finished, we can see how many cycles the drawing process took.

Ideally, we would draw large sprites with texture mapping, alpha blending, fogging, etc. to build the slowest possible drawing packet. Doing that would require a lot of boilerplate drawing-environment setup, so for this example we will avoid it and instead write to the NOP register as many times as a single GIFtag’s NLOOP field allows (0x7FFF). The exact details are irrelevant; we are just writing to NOP a lot.

Instead of breaking everything down, I opted to show you a heavily commented example of the implementation.

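// ps2sdk headers (assumed to provide the types, macros and registers used below)
#include <stdlib.h>
#include <tamtypes.h>
#include <kernel.h>
#include <gif_tags.h>
#include <gs_gp.h>
#include <ee_regs.h>

int isGSBackpressurePresent()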
{
    // Allocate some memory on the heap for our GS packet
    qword_t* gs_packet = aligned_alloc(16, sizeof(qword_t) * (0x7FFF + 1));
    qword_t* q = gs_packet;

    // Set our TAG (header), inform the GS that we will send 0x7FFF qwords of
    // register writes
    PACK_GIFTAG(q, GIF_SET_TAG(0x7FFF, 1, 0, 0, 0, 1), GIF_REG_AD);
    q++;
    // Write 0x7FFE NOPS
    for(int i = 0; i < 0x7FFE; i++)
    {
        PACK_GIFTAG(q, 0, GS_REG_NOP);
        q++;
    }
    // Finish off the packet with a write to FINISH
    // Once the GS is done processing the above NOPs, it will process this
    // and set the FINISH status bit
    PACK_GIFTAG(q, 1, GS_REG_FINISH);
    q++;

    FlushCache(0);

    // Enable FINISH event / clear any past events
    *R_EE_GS_CSR = 0x2;

    // Point DMA channel 2 (GIF) at the packet: start address and size in qwords
    *R_EE_D2_MADR = (u32)gs_packet;
    *R_EE_D2_QWC = (q - gs_packet);

    // Get our cycle count before we send the packet
    u32 start_count = GetCop0(COP0_COUNT);

    // Start the DMAC transfer
    *R_EE_D2_CHCR = 0x101;

    // PCSX2 emulates DMAC transfer time (unless instant DMA is enabled).
    // We don't care about that so we don't wait for the transfer to finish
    // while(!(*R_EE_D2_CHCR & 0x100));

    // Wait for the GS to process the final FINISH write by spinning
    // on the FINISH status bit
    while(!(*R_EE_GS_CSR & 0x2));

    u32 end_count = GetCop0(COP0_COUNT);

    free(gs_packet);

    // Takes around 400_000 to 427_000 cycles on a real PS2
    // Assume that any less than 300_000 cycles is due to lack of backpressure
    // emulation
    return (end_count - start_count) > 300000;
}

To extend the previous animation, here is what the implementation above does:

[Animation: arrows labelled NOP travel from the EE to the DMAC to the GS while a code snippet spins in a while loop on the GS CSR. The last arrow is the FINISH write; as soon as it reaches the GS, the snippet exits the while loop.]

Currently, no emulator (PCSX2, Play!, DobieStation, hps2x64*) emulates this behaviour.[1]

I rate this one 2.5/5 in difficulty. You have to build a giant packet, clog up the GS for a while, and spin the EE for a lot of cycles. It isn’t especially slow or complex, but there are surely better alternatives.

Why Is This Not Emulated?

Emulating this perfectly would be quite laborious. As I said previously, an ideal test would push the GS to its limits by using all of its features. Fully emulating backpressure would require emulating DRAM latency and the delays imposed by page breaks, texture cache misses, and the penalties from alpha blending, fogging, Gouraud shading, and more. The effort to properly hardware-test and implement all of this would be effectively pointless, since very few games suffer from missing backpressure.[2] On top of that, the performance gained by not emulating backpressure is well worth the inaccuracy.


  1. I can’t actually test hps2x64, but from my analysis of its code, I don’t immediately see any GS cycle-counting system.

  2. I was informed of an experimental PCSX2 build that estimates backpressure based on the size of a DMA packet. However, I’m not sure whether it fixes the games affected by the lack of backpressure emulation, and I haven’t had the chance to play around with it yet.