
Palette shifting with the GS

An exploration into palette shifting on the PS2 using indexed textures, CLUT, and DMA chains.

The final source is located on my GitHub here.

I found a post long ago about an effect 8-bit game developers would use for animations.

When I randomly remembered it, I decided to try the same thing using the indexed texture and CLUT features of the PlayStation 2’s GS.
* Note that CLUT is an acronym for Colour Look Up Table and is used interchangeably with ‘palette’

First off, I needed something to animate. Inspired by a piece of homebrew (created 1 month after me o_o) I went with a flame.

Fully aware that I wouldn’t be able to recreate something that cool, I went to GIMP and ended up with this.

[Image: a pixelated flame]

Pretty snazzy, right?

Now, if you’re unsure what a colour palette is, I’ll do my best to explain. If you do know, feel free to skip this next segment.

Palette Basics

Let’s assume that we are using the texture format PSMCT32: 8 bits each for A, B, G and R. That’s 32 bits, or 4 bytes per texel.

[Image: a diagram of the PSMCT32 format, with 8 bits for each of ABGR]

If we do the math, a 512x512 texel texture takes 512 × 512 × 4 = 1,048,576 bytes, or 1 MB of memory. On the PS2, that’s a quarter of the GS’s 4 MB of VRAM. Ouch.

Thankfully, we can change how our texels are interpreted. Instead of each texel being a packed ABGR value, we can treat each texel value as an index into a table. That table is otherwise known as a palette.

Here is some spreadsheet artwork to aid my explanation.

[Image: a spreadsheet with a palette on the left and an indexed texture on the right]

On the left hand side is our palette. It currently only has indexes 0->4 loaded with a colour. On the right hand side is our texture. Each texel (or cell in this case) references a colour in our palette.
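To make the lookup concrete, here is a small host-side C sketch of what happens per texel when an indexed texture is drawn. It is not PS2 code: `expand_indexed` is a hypothetical helper, 8-bit indices are assumed for simplicity, and colours are written as 32-bit ABGR values (so `0xFF0000FF` is opaque red).

```c
#include <stddef.h>
#include <stdint.h>

/* Replaces every index with the colour it points at in the palette,
   which is what the GS does for each texel of an indexed texture. */
static void expand_indexed(const uint8_t *indices, size_t count,
                           const uint32_t *palette, uint32_t *out)
{
    for (size_t i = 0; i < count; i++)
        out[i] = palette[indices[i]];
}
```

Changing one palette entry changes every texel that references it, without touching the texture itself. That is exactly what palette shifting exploits.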

Now, the PS2 has two types of indexed texture: 4-bit (PSMT4) and 8-bit (PSMT8). Which one should you use? It depends on how many colours you need: a 4-bit texture can reference only 16 colours, and an 8-bit texture can reference 256.

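If we use a 4-bit indexed format instead, our 512x512 texture from above shrinks to 512 × 512 × 0.5 = 131,072 bytes, or about 131 KB. The 16-colour palette adds only 16 × 4 = 64 bytes of colour data, although it still occupies a 256-byte block of VRAM*.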
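The arithmetic above, as a tiny C sketch. `texture_bytes` and `palette_bytes` are hypothetical helpers, and palette entries are assumed to be 4 bytes (CT32), matching this section’s simplification.

```c
#include <stdint.h>

/* Bytes needed by a width x height texture at the given bits per texel. */
static uint32_t texture_bytes(uint32_t width, uint32_t height, uint32_t bits_per_texel)
{
    return width * height * bits_per_texel / 8;
}

/* Bytes of colour data in a full palette: 2^index_bits entries. */
static uint32_t palette_bytes(uint32_t index_bits, uint32_t bytes_per_colour)
{
    return (1u << index_bits) * bytes_per_colour;
}
```

`texture_bytes(512, 512, 32)` gives 1,048,576, `texture_bytes(512, 512, 4)` gives 131,072, and `palette_bytes(4, 4)` gives 64.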

Hopefully you have a little bit more of an understanding of how a palette and indexed texture works.

* This section completely ignores the existence of CT24 and CT16 colour formats for simplicity.


The Implementation

To create the indexed texture, I simply used GIMP’s indexed colour mode. Because GIMP has no native support for 4-bit indexed images, I wrote a Python script to convert the 8-bit indices into 4-bit ones.
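The conversion script itself isn’t shown here, but its core job is packing two indices into each byte. Here is a rough C equivalent. `pack_4bit` is hypothetical, and it assumes the left texel of each pair goes in the low nibble, which is how I understand PSMT4 data to be laid out. If texels come out swapped in pairs, swap the shifts.

```c
#include <stddef.h>
#include <stdint.h>

/* Packs 8-bit palette indices (each < 16) two per byte.
   Assumption: the left texel of each pair goes in the low nibble. */
static void pack_4bit(const uint8_t *in, size_t count, uint8_t *out)
{
    size_t i;
    for (i = 0; i + 1 < count; i += 2)
        out[i / 2] = (uint8_t)((in[i] & 0x0Fu) | ((in[i + 1] & 0x0Fu) << 4));
    if (count % 2 != 0)
        out[count / 2] = (uint8_t)(in[count - 1] & 0x0Fu); /* odd count: last texel alone */
}
```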

With some manual labour out of the way, I had two arrays of data.

My palette.

// 16 colours in PSMCT16 format: bit 15 = alpha, bits 14-10 = blue, 9-5 = green, 4-0 = red
u16 flame_clut[16] __attribute__ ((aligned (16))) =
{
	0x9CE7,
	0x811F,
	0x81DF,
	0x829F,
	0xA69F,
	0xA69F,
	0xB2DF,
	0xFFFF,
	0x0000,
	0x0000,
	0x0000,
	0x0000,
	0x0000,
	0x0000,
	0x0000,
	0x0000
};

My texture.

// 4 bits per texel: each byte holds two palette indices
static u8 flame_itex[] __attribute__ ((aligned (16))) =
{
	0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 
	0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 
	0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 
	0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 
	0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 
	0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 
	0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 
	0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 
	0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 
	0xBB, 0x11, 0x11, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB,
	...

Note: the PS2’s DMAC has alignment constraints. Make sure your data is at least QWORD (128-bit, 16-byte) aligned!
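One quick way to catch misaligned buffers is to test the low 4 bits of the address before kicking a transfer. This is a minimal sketch. `is_qword_aligned` and `example_clut` are hypothetical names, and `__attribute__((aligned(16)))` is the same GCC attribute used on the arrays above.

```c
#include <stdint.h>

/* Hypothetical CLUT buffer, aligned the same way as the arrays above. */
static uint16_t example_clut[16] __attribute__((aligned(16)));

/* Non-zero when p sits on a 16-byte (one quadword) boundary. */
static int is_qword_aligned(const void *p)
{
    return ((uintptr_t)p & 0xFu) == 0;
}
```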

After some display initialization and VRAM allocation, I was ready to upload my CLUT and texture, and to draw.

[Image: a pixelated flame]

Well, it’s a start.

Now it’s time to rotate the CLUT. On every odd frame, every colour shifts back by one index and the first colour wraps around to the end.

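// Rotates the whole palette (indices 0 to 15) back by one entry
void rotate_palette(void)
{
	// Write through the uncached segment so the new colours go straight to main RAM,
	// where the DMAC reads them, without needing a cache flush
	u16 *colour_palette = (u16 *)UNCACHED_SEG(flame_clut);
	u16 first_colour = colour_palette[0];
	for (s32 i = 0; i < 0xF; i++)
		colour_palette[i] = colour_palette[i + 1];
	colour_palette[0xF] = first_colour;
}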

That’s cool, but not what I wanted. Instead of rotating the entire palette, let’s only rotate the colours used inside of the flame.

// Rotates only indices 1 to 6 (the flame colours); the other entries stay put
void rotate_palette(void)
{
	u16 *colour_palette = (u16 *)UNCACHED_SEG(flame_clut);
	u16 first_colour = colour_palette[1];
	for (s32 i = 1; i < 6; i++)
		colour_palette[i] = colour_palette[i + 1];
	colour_palette[6] = first_colour;
}
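The same rotation can be pulled out into a host-testable C function that works on any inclusive index range. `rotate_range` is a hypothetical generalisation that uses `uint16_t` in place of ps2sdk’s `u16`. On the PS2 you would pass it the uncached pointer, e.g. `rotate_range((u16 *)UNCACHED_SEG(flame_clut), 1, 6)`.

```c
#include <stdint.h>

/* Rotates palette[first..last] (inclusive) back by one entry: each entry takes
   the colour after it, and the colour at `first` wraps around to `last`.
   Entries outside the range are left untouched. */
static void rotate_range(uint16_t *palette, int first, int last)
{
    uint16_t wrapped = palette[first];
    for (int i = first; i < last; i++)
        palette[i] = palette[i + 1];
    palette[last] = wrapped;
}
```

With a range of 6 entries, the palette returns to its original order after 6 calls. At one call every other frame, the animation repeats every 12 frames.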

That’s good enough for me. But flames usually move, right? Using GIMP’s smudge tool, I made 3 different textures (all using the same palette).

[Images: a pixelated flame leaning left, one like the original, and one leaning right]

After some testing, swapping between these textures every 5 frames provided some pretty good results.

The Technical Part

To keep this post somewhat friendly, I opted to leave the heavy PS2-specific details out of the main portion. If you’re not interested in how I used the VIF, DMAC, GIF and GS, or if you don’t know what any of those are, this part might get boring.

There are 3 data paths to the GS.

  • PATH3 is directly from the EE to the GS
  • PATH2 is from the EE to the VIF1 and to the GS
  • PATH1 is from VU1 memory to the GS

PATH3 is the easiest. Build a GIF packet and send it to the GIF via DMA channel 2.
* Or directly write to the FIFO, which I’m not going to bother entertaining in this post.

At the time of writing this code, I did not have much experience with PATH2, so I decided to finally use it for something.
[Image: an overview of the 3 data paths to the PS2 GS. Credit: Guilherme Lampert]

My design ended up looking like this:

  1. Build the sprite packet.
  2. Is it an odd frame? If yes, rotate the palette.
  3. Is it a 5th frame? If yes, rotate (swap) the texture.
  4. Kick the sprite packet via PATH3.
  5. Has the palette or texture been rotated? If yes, upload the palette and/or texture via VIF1 (PATH2).
  6. Wait for VSYNC, then start the next frame.
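The decisions in this design boil down to a few checks on a frame counter. Below is a minimal sketch. `frame_plan` and `plan_frame` are hypothetical names, and treating frame 0 as a "5th frame" is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* What the main loop should do on a given frame. */
typedef struct {
    bool rotate_palette; /* odd frames */
    bool swap_texture;   /* every 5th frame */
    bool upload;         /* re-send the CLUT and/or texture over PATH2 */
} frame_plan;

static frame_plan plan_frame(uint32_t frame)
{
    frame_plan p;
    p.rotate_palette = (frame & 1u) != 0;
    p.swap_texture = (frame % 5u) == 0;
    p.upload = p.rotate_palette || p.swap_texture;
    return p;
}
```

Each frame, the loop would call `plan_frame(frame)` and rotate or swap as flagged. It would then kick the sprite packet over PATH3, kick the upload chain over PATH2 when `upload` is set, and finally wait for VSYNC and increment `frame`.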

The sprite packet is actually two sprites.

// (pseudo code)
GIF_SET_TAG(9, 1, GIF_PRE_ENABLE, GS_PRIM_SPRITE, GIF_FLG_PACKED, 1)
GS_SET_RGBAQ(0x00, 0x00, 0x00, 0x7f, 0x00) 
GS_SET_XYZ(0 << 4, 0 << 4, 0)
// Sprite kicked, clears the framebuffer with black
GS_SET_XYZ(640 << 4, 448 << 4, 0)

// Set our primitive to a sprite, textured, using UVs, Alpha blending enabled
GS_SET_PRIM(GS_PRIM_SPRITE, 0, 1, 0, 1, 0, 1, 0, 0) 

// Point to our texture and clut buffer
GS_SET_TEX0(g_texptr / 64, 1, GS_PSM_4, 5, 5, 1, 1, g_clutptr / 64, GS_PSM_16, 1, 0, 1)
GS_SET_UV(0, 0)
GS_SET_XYZ(0, 0, 0)
GS_SET_UV(32 << 4, 32 << 4)
// Sprite kicked, UV 0,0 to 32,32 will be mapped to 0,0 -> 640,448
// While drawing, the GS looks up each texel's colour in the CLUT for us
GS_SET_XYZ(640 << 4, 448 << 4, 0)
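The `GIF_SET_TAG` and `GS_SET_*` macros hide how a GIFtag and its A+D entries are packed into 128-bit qwords. Here is a host-testable sketch of that packing. `qword_t`, `gif_tag` and `ad` are hypothetical names, not ps2sdk’s API. The bit positions follow the GIFtag layout, `0xE` is the A+D register descriptor, `6` is the sprite primitive type and `0x03` is the UV register address.

```c
#include <stdint.h>

typedef struct { uint64_t lo, hi; } qword_t;

/* Packs a GIFtag: NLOOP [14:0], EOP [15], PRE [46], PRIM [57:47],
   FLG [59:58], NREG [63:60]; the register descriptors fill the upper 64 bits. */
static qword_t gif_tag(uint32_t nloop, uint32_t eop, uint32_t pre, uint32_t prim,
                       uint32_t flg, uint32_t nreg, uint64_t regs)
{
    qword_t t;
    t.lo = (uint64_t)(nloop & 0x7FFFu)
         | ((uint64_t)(eop & 1u) << 15)
         | ((uint64_t)(pre & 1u) << 46)
         | ((uint64_t)(prim & 0x7FFu) << 47)
         | ((uint64_t)(flg & 3u) << 58)
         | ((uint64_t)(nreg & 0xFu) << 60);
    t.hi = regs;
    return t;
}

/* One PACKED-mode A+D entry: register data in the low 64 bits,
   GS register address in the high 64 bits. */
static qword_t ad(uint8_t gs_reg, uint64_t value)
{
    qword_t q = { value, gs_reg };
    return q;
}
```

In these terms, the sprite packet’s tag is `gif_tag(9, 1, 1, 6, 0, 1, 0xE)`: 9 A+D writes, end of packet, PRIM preset to a sprite. Its second `GS_SET_UV` becomes `ad(0x03, (32 << 4) | ((uint64_t)(32 << 4) << 16))`.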

Kicking this is very simple.

  • Flush the cache!
  • Set DMAC channel 2 MADR to the start of the packet
  • Set DMAC channel 2 QWC to the length of the packet in qwords
  • Set DMAC channel 2 CHCR to 0x100 to start the transfer

Or just use the dma lib and call

dma_channel_send_normal(DMA_CHANNEL_GIF, draw_packet, q - draw_packet, 0, 0);
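The register writes in the list above can be sketched in C as follows. The register pointers are parameters of a hypothetical `kick_normal` so the logic can be run on a PC. On the PS2, they would point at the GIF channel’s `CHCR`, `MADR` and `QWC` registers at `0x1000A000`, `0x1000A010` and `0x1000A020`. Flushing the cache (e.g. ps2sdk’s `FlushCache(0)`) is left to the caller.

```c
#include <stdint.h>

#define CHCR_STR 0x100u /* bit 8: start / busy */

/* Normal-mode DMA kick (hypothetical helper). The caller must flush the data
   cache first and pass a quadword-aligned main-memory address. */
static void kick_normal(volatile uint32_t *chcr, volatile uint32_t *madr,
                        volatile uint32_t *qwc, uint32_t packet_addr, uint32_t qwords)
{
    while (*chcr & CHCR_STR) {
        /* wait for any previous transfer on this channel to finish */
    }
    *madr = packet_addr & 0x7FFFFFFFu; /* bit 31 set would mean scratchpad */
    *qwc = qwords;
    *chcr = CHCR_STR;                  /* 0x100, as in the list above */
}
```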

I’ll skip explaining the rotating parts in depth. You’ve already read above what rotating the palette involves. Rotating the texture is simply changing a global pointer between the 3 texture arrays. Unlike the palette, the textures are not modified during runtime.
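Swapping textures can be as simple as stepping through an array of pointers. Here is a minimal sketch. `flame_left`, `flame_mid`, `flame_right`, `flame_frames` and `g_flame_tex` are hypothetical stand-ins for the three texture arrays and the global pointer, and a simple left, middle, right cycle is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the three flame textures. */
static const uint8_t flame_left[16]  = {1};
static const uint8_t flame_mid[16]   = {2};
static const uint8_t flame_right[16] = {3};

static const uint8_t *const flame_frames[3] = { flame_left, flame_mid, flame_right };
static size_t flame_frame_index = 0;

/* The pointer the upload code would reference (e.g. in a DMA REF tag). */
static const uint8_t *g_flame_tex = flame_left;

/* "Rotating" the texture only moves a pointer; no texel data is copied or modified. */
static void rotate_texture(void)
{
    flame_frame_index = (flame_frame_index + 1) % 3;
    g_flame_tex = flame_frames[flame_frame_index];
}
```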

Now, uploading the palette and texture. For this, I used DMA chains. Instead of transferring one contiguous block of memory, starting at MADR and ending QWC quadwords later, the DMAC can follow a chain of DMA tags. Each tag tells it how many QW to send, where to read them from, and where to find the next tag.

I’ve simplified this code as best as I could. If it looks intimidating, you’d be right, it is.

// Tell the DMAC to send the next 7 QW in this packet to the VIF
DMATAG_CNT(q, 7, 0, 0, 0); 
	// Tell the VIF to send the next 8 QW to the GIF
	PACK_VIFTAG(q, VIF_CMD_NOP, VIF_CMD_NOP, VIF_CMD_NOP, (VIF_CMD_DIRECT << 24) | 8); 

		// Tell the GIF that we will be 'blit'ing a texture
		GIF_SET_TAG(4, 0, 0, 0, GIF_FLG_PACKED, 1)
		GS_SET_BITBLTBUF(0, 0, 0, g_clutptr >> 6, 1, GS_PSM_16)
		GS_SET_TRXPOS(0, 0, 0, 0, 0)
		GS_SET_TRXREG(16, 1)
		// As soon as we set TRXDIR, the GIF treats the incoming data as a texture
		GS_SET_TRXDIR(0)

		// Tell the GIF that the next 2 QW will be in IMAGE format
		GIF_SET_TAG(2, 0, 0, 0, GIF_FLG_IMAGE, 0)

// This is where things get interesting.
// Instead of copying the data at flame_clut into this packet (which gets expensive in memory and execution time the larger your data is),
// the DMAC fetches the 2 QW straight from flame_clut.
// The REF tag REFerences flame_clut; once that data is sent, the DMAC reads the next tag, which directly follows this one in the packet.
DMATAG_REF(q, 2, (uiptr)flame_clut, 0, 0, 0); 

// I've excluded uploading the texture in this sample. It is essentially the same thing.
// The chain must still be terminated, e.g. with an END tag or by making the last REF a REFE.

To rephrase: if your texture is a large number of QW, you can simply reference it instead of copying it all into your DMA packet.

Here is another attempt at explaining the DMA chain: a diagram of what is happening above. The numbered parts are the data that gets sent to the GIF.

[Image: an example of how the DMAC can point to data in the middle of a packet]
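To see why REF saves copying, here is a simplified host-side model of how the DMAC walks a source chain. It is a sketch, not an emulator: only the CNT, REF, REFE and END tag IDs are modelled, TTE, SPR and interrupts are ignored, and `ADDR` is treated as a byte offset into a host array. `qword_t`, `make_tag` and `walk_chain` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t lo, hi; } qword_t;

/* DMAtag IDs (bits 30:28 of the tag) that this model understands. */
enum { ID_REFE = 0, ID_CNT = 1, ID_REF = 3, ID_END = 7 };

/* Builds the lower 64 bits of a DMAtag: QWC [15:0], ID [30:28], ADDR [62:32]. */
static qword_t make_tag(uint16_t qwc, unsigned id, uint32_t addr)
{
    qword_t t;
    t.lo = (uint64_t)qwc
         | ((uint64_t)(id & 7u) << 28)
         | ((uint64_t)(addr & 0x7FFFFFFFu) << 32);
    t.hi = 0; /* upper half unused here (it only matters when TTE is set) */
    return t;
}

/* Follows a chain stored in mem[], starting at qword index tadr, and copies
   every transferred qword into out[]. Returns how many qwords were "sent".
   The chain must end with END or REFE; anything else is not handled. */
static size_t walk_chain(const qword_t *mem, size_t tadr, qword_t *out, size_t max_out)
{
    size_t sent = 0;
    for (;;) {
        qword_t tag = mem[tadr];
        size_t qwc = (size_t)(tag.lo & 0xFFFFu);
        unsigned id = (unsigned)((tag.lo >> 28) & 7u);
        size_t addr = (size_t)((tag.lo >> 32) & 0x7FFFFFFFu) / 16; /* byte offset -> qword index */
        size_t src, next;

        switch (id) {
        case ID_CNT:  src = tadr + 1; next = tadr + 1 + qwc; break; /* data follows tag */
        case ID_REF:  src = addr;     next = tadr + 1;       break; /* data elsewhere, keep going */
        case ID_REFE: src = addr;     next = SIZE_MAX;       break; /* data elsewhere, then stop */
        case ID_END:  src = tadr + 1; next = SIZE_MAX;       break; /* data follows tag, then stop */
        default:      return sent;                                  /* NEXT, CALL, RET: not modelled */
        }

        for (size_t i = 0; i < qwc && sent < max_out; i++)
            out[sent++] = mem[src + i];

        if (next == SIZE_MAX)
            return sent;
        tadr = next;
    }
}
```

A chain of CNT (2 QW inline), REF (2 QW stored elsewhere) and END sends 4 QW in order. The REF’d data is never copied into the chain itself, which is the same idea as referencing `flame_clut` above.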

Kicking this is even easier.

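  • Flush the cache!
  • Set DMAC channel 1 (VIF1) QWC to 0, so the DMAC starts by reading a tag
  • Set DMAC channel 1 TADR to the start of the packet
  • Set DMAC channel 1 CHCR to 0x105 to start the chain transfer (0x100 start, 0x4 chain mode, 0x1 from memory)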

Or just use the dma lib and call

dma_channel_send_chain(DMA_CHANNEL_VIF1, draw_packet, q - draw_packet, 0, 0);
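In the same spirit, here is a sketch of the chain-mode kick. The upload packet starts with VIF codes, so it belongs on VIF1 (DMA channel 1). VIF1 can transfer in both directions, so the DIR bit matters. As before, the register pointers are parameters of a hypothetical `kick_chain` so it can be tested on a PC. On the PS2, they would point at VIF1’s `CHCR`, `QWC` and `TADR` at `0x10009000`, `0x10009020` and `0x10009030`.

```c
#include <stdint.h>

#define CHCR_DIR_FROM_MEM 0x001u /* bit 0: memory -> peripheral */
#define CHCR_MOD_CHAIN    0x004u /* bits 3:2 = 01: source chain mode */
#define CHCR_STR          0x100u /* bit 8: start / busy */

/* Chain-mode DMA kick (hypothetical helper). The caller must flush the data
   cache first; first_tag_addr is the quadword-aligned address of the first tag. */
static void kick_chain(volatile uint32_t *chcr, volatile uint32_t *qwc,
                       volatile uint32_t *tadr, uint32_t first_tag_addr)
{
    while (*chcr & CHCR_STR) {
        /* wait for any previous transfer on this channel to finish */
    }
    *qwc = 0;                /* nothing pending, so the DMAC starts by reading a tag */
    *tadr = first_tag_addr;
    *chcr = CHCR_STR | CHCR_MOD_CHAIN | CHCR_DIR_FROM_MEM; /* 0x105 */
}
```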

Final Thoughts

If you’ve made it this far, thank you! I’m still trying to figure out how I want these posts to be written. I don’t want to write things nobody understands but I also want to keep it interesting without culling too many details.

If you’d like to play around with the code, it’s located on my GitHub here.

I’m happy to say the original iteration of the “VIF Flame” inspired Daniel Santos to implement it in their GTAV PS2 recreation. You can check out the demo on their YouTube channel.

Video: GTA V Legacy Alpha Animated Water Effect (https://www.youtube.com/watch?v=9TLH5kXNGZw)