Palette shifting with the GS
An exploration into palette shifting on the PS2 using indexed textures, CLUT, and DMA chains.
The final source is located on my GitHub here.
I found a post long ago about an effect 8-bit game developers would use for animations.
Randomly remembering it, I decided to try the same thing using the indexed texture and CLUT features of the PlayStation 2’s GS.
* Note that CLUT is an acronym for Colour Look Up Table and is used interchangeably with ‘palette’
First off, I needed something to animate. Inspired by a piece of homebrew (created 1 month after me o_o) I went with a flame.
Fully aware that I wouldn’t be able to recreate something that cool, I went to GIMP and ended up with this.
Pretty snazzy, right?
Now, if you’re unsure what a colour palette is, I’ll do my best to explain. If you do know, feel free to skip this next segment.
Palette Basics
Let’s assume that we are using texture format PSMCT32. 8 bits for A, B, G and R each. That’s 32 bits, or 4 bytes per texel.
If we do the math, a 512x512 texel texture will take ~1 MB of memory. Now for the PS2, that’s a quarter of the VRAM the GS has, ouch.
Thankfully though we can change how our texels are interpreted. Instead of each texel being a packed ARGB value, we can instead treat each texel value as an index into a table. That table is otherwise known as a palette.
Here is some spreadsheet artwork to aid my explanation.
On the left hand side is our palette. It currently only has indexes 0->4 loaded with a colour. On the right hand side is our texture. Each texel (or cell in this case) references a colour in our palette.
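In code, the lookup from the spreadsheet is just an array index. Here is a minimal sketch of the idea. The palette colours, the 4x2 texture and the name `resolve_texel` are all made up for illustration, and none of the values come from the flame.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical palette: five 32-bit colours (0xAABBGGRR), indices 0..4. */
static const uint32_t example_palette[5] = {
    0xFF000000, /* 0: black  */
    0xFF0000FF, /* 1: red    */
    0xFF00FFFF, /* 2: yellow */
    0xFFFFFFFF, /* 3: white  */
    0xFFFF0000  /* 4: blue   */
};

/* Hypothetical 4x2 indexed texture: each texel is a palette index. */
static const uint8_t example_texture[8] = {
    0, 1, 2, 3,
    3, 2, 1, 4
};

/* Resolve one texel to its final colour: read the index, then look it up. */
uint32_t resolve_texel(const uint8_t *texture, const uint32_t *palette,
                       size_t width, size_t x, size_t y)
{
    return palette[texture[y * width + x]];
}
```

Changing `example_palette[1]` would recolour every texel that stores index 1 without touching the texture at all, which is the whole trick behind palette shifting.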
Now, the PS2 has two different types of indexed textures, 4-bit and 8-bit. Which one should be used? It depends on how many colours you need. An indexed texture can only reference 16 (4-bit) or 256 (8-bit) colours depending on the mode you use.
If we utilise a 4-bit indexed format, our 512x512 texture mentioned above is now ~131 KB instead (with another 256 bytes for the palette)*.
Hopefully you have a little bit more of an understanding of how a palette and indexed texture works.
* This section completely ignores the existence of CT24 and CT16 colour formats for simplicity.
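To double check the texture sizes quoted in this section, here is a small sketch of the arithmetic. The helper name `texture_bytes` is made up for illustration, and it only counts texel data, not the palette or any VRAM layout padding.

```c
#include <stddef.h>

/* Bytes needed for a width x height texture at the given bits per texel. */
size_t texture_bytes(size_t width, size_t height, size_t bits_per_texel)
{
    return width * height * bits_per_texel / 8;
}
```

For 512x512 this gives 1,048,576 bytes at 32 bits per texel (the ~1 MB above), 262,144 bytes at 8 bits and 131,072 bytes at 4 bits (the ~131 KB above).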
The Implementation
To create the indexed texture, I simply used GIMP’s indexed texture mode. Because there is no native support in GIMP for a 4-bit mode, I had to write a Python script to convert the 8-bit format into a 4-bit one.
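The conversion itself boils down to squeezing two indices into each byte. The sketch below shows that step in C rather than Python, and it is not the actual script. The function name `pack_indices_4bit` is hypothetical, and the nibble order (even texel in the low nibble) is an assumption you should check against your target format.

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Pack 8-bit palette indices (one per byte, values 0..15) into 4-bit
 * indices (two per byte). Assumption: the even texel goes in the low nibble.
 * `count` must be even and `dst` must hold count / 2 bytes.
 */
void pack_indices_4bit(const uint8_t *src, uint8_t *dst, size_t count)
{
    for (size_t i = 0; i + 1 < count; i += 2) {
        uint8_t lo = src[i] & 0x0F;
        uint8_t hi = src[i + 1] & 0x0F;
        dst[i / 2] = (uint8_t)(lo | (hi << 4));
    }
}
```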
With some manual labour out of the way, I had two arrays of data.
My palette.
u16 flame_clut[16] __attribute__ ((aligned (16))) =
{
0x9CE7,
0x811F,
0x81DF,
0x829F,
0xA69F,
0xA69F,
0xB2DF,
0xFFFF,
0x0000,
0x0000,
0x0000,
0x0000,
0x0000,
0x0000,
0x0000,
0x0000
};
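Those hex values are easier to read once they are split into channels. Here is a small sketch that does so, assuming the usual 16-bit layout with red in bits 0-4, green in bits 5-9, blue in bits 10-14 and a 1-bit alpha in bit 15. The struct and function names are made up for illustration.

```c
#include <stdint.h>

/* One 16-bit CLUT entry split into channels (5 bits each, 1-bit alpha). */
struct rgba5551 {
    uint8_t r, g, b, a;
};

/* Assumed layout: R in bits 0-4, G in 5-9, B in 10-14, A in bit 15. */
struct rgba5551 unpack_clut16(uint16_t c)
{
    struct rgba5551 out;
    out.r = c & 0x1F;
    out.g = (c >> 5) & 0x1F;
    out.b = (c >> 10) & 0x1F;
    out.a = (c >> 15) & 0x01;
    return out;
}
```

Under that layout, 0x811F (index 1) comes out as R=31, G=8, B=0 with the alpha bit set, which is a strong orange-red, and 0x9CE7 (index 0) is a dark grey at 7 on every channel.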
My texture.
static u8 flame_itex[] __attribute__ ((aligned (16))) =
{
0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB,
0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB,
0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB,
0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB,
0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB,
0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB,
0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB,
0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB,
0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB,
0xBB, 0x11, 0x11, 0xBB, 0xBB, 0xBB, 0xBB, 0xBB,
...
Note: There are alignment constraints when using the PS2’s DMAC. Make sure your data is at least QWORD (128-bit) aligned!
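If you want to catch a misaligned buffer before it reaches the DMAC, a check like the one below is enough. It is only a sketch, and the helper name `is_qword_aligned` is made up for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* True if ptr sits on a 16-byte (one quadword) boundary. */
bool is_qword_aligned(const void *ptr)
{
    return ((uintptr_t)ptr & 0xF) == 0;
}
```

Something like `is_qword_aligned(flame_clut)` in a debug assert would flag a missing `aligned (16)` attribute early.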
After some display initialization and vram allocations, I was ready to upload my CLUT, texture, and my draw.
Well, it’s a start.
Now it’s time to rotate the CLUT. All this does is shift every colour backwards by one index every odd frame.
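void rotate_palette()
{
    // Write through the uncached segment so the change isn't left sitting in the data cache
    u16 *colour_palette = (u16 *)UNCACHED_SEG(flame_clut);
    u16 first_index = colour_palette[0];
    // Shift indices 1..15 down by one, then wrap the old index 0 to the end
    for (s32 i = 0; i < 0xF; i++)
        colour_palette[i] = colour_palette[i + 1];
    colour_palette[0xF] = first_index;
}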
That’s cool, but not what I wanted. Instead of rotating the entire palette, let’s only rotate the colours used inside of the flame.
// Rotates the palette from indices 1 to 6
void rotate_palette()
{
    u16 *colour_palette = (u16 *)UNCACHED_SEG(flame_clut);
    u16 temp = colour_palette[1];
    // Shift indices 2..6 down by one, then wrap the old index 1 to index 6
    for (s32 i = 1; i < 6; i++)
        colour_palette[i] = colour_palette[i + 1];
    colour_palette[6] = temp;
}
That’s good enough for me. But flames usually move right? Using GIMP’s smudge tool, I generated 3 different textures (all using the same palette).
After some testing, swapping between these textures every 5 frames provided some pretty good results.
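The per-frame timing described so far (rotate the palette every odd frame, swap textures every 5 frames) can be sketched as two tiny helpers. The function and macro names are made up for illustration and are not taken from the final source.

```c
#include <stdint.h>

#define TEXTURE_COUNT 3       /* three smudged flame textures */
#define FRAMES_PER_TEXTURE 5  /* swap textures every 5 frames */

/* Which of the three textures to show on a given frame. */
uint32_t texture_for_frame(uint32_t frame)
{
    return (frame / FRAMES_PER_TEXTURE) % TEXTURE_COUNT;
}

/* The palette is rotated on odd frames only. */
int should_rotate_palette(uint32_t frame)
{
    return (frame & 1) != 0;
}
```

In a main loop you would bump a frame counter once per vsync, call the palette rotation when `should_rotate_palette(frame)` is true, and point the texture pointer at whichever array `texture_for_frame(frame)` selects.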
The Technical Part
To keep this post somewhat friendly, I opted to leave the heavy PS2-specific details out of the main portion. If you’re not interested in how I used the VIF, DMAC, GIF and GS, or if you don’t know what any of those mean, this part might get boring.
There are 3 data paths to the GS.
- PATH3 is directly from the EE to the GS
- PATH2 is from the EE to the VIF1 and to the GS
- PATH1 is from VU1 memory to the GS
PATH3 is the easiest. Build a GIF packet and send it to the GIF via DMA channel 2.
* Or directly write to the FIFO, which I’m not going to bother entertaining in this post.
At the time of writing this code, I did not have much experience with PATH2, so I decided to finally use it for something.
Credit: Guilherme Lampert
My design ended up becoming this
The sprite packet is actually two sprites.
// (pseudo code)
GIF_SET_TAG(9, 1, GIF_PRE_ENABLE, GS_PRIM_SPRITE, GIF_FLG_PACKED, 1)
GS_SET_RGBAQ(0x00, 0x00, 0x00, 0x7f, 0x00)
GS_SET_XYZ(0 << 4, 0 << 4, 0)
// Sprite kicked, clears the framebuffer with black
GS_SET_XYZ(640 << 4, 448 << 4, 0)
// Set our primitive to a sprite, textured, using UVs, Alpha blending enabled
GS_SET_PRIM(GS_PRIM_SPRITE, 0, 1, 0, 1, 0, 1, 0, 0)
// Point to our texture and clut buffer
GS_SET_TEX0(g_texptr / 64, 1, GS_PSM_4, 5, 5, 1, 1, g_clutptr / 64, GS_PSM_16, 1, 0, 1)
GS_SET_UV(0, 0)
GS_SET_XYZ(0, 0, 0)
GS_SET_UV(32 << 4, 32 << 4)
// Sprite kicked, UV 0,0 to 32,32 will be mapped to 0,0 -> 640,448
// The GS takes care of mapping the texture to the clut buffer
GS_SET_XYZ(640 << 4, 448 << 4, 0)
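The `<< 4` on the XYZ and UV values above is there because those coordinates are fixed point with 4 fractional bits, so a whole pixel or texel value has to be shifted up before it is written. A minimal sketch of the conversion, with made-up helper names:

```c
#include <stdint.h>

/* Convert whole pixels (or texels) to fixed point with 4 fractional bits. */
uint16_t to_fixed4(uint16_t whole)
{
    return (uint16_t)(whole << 4);
}

/* Convert back, dropping the fractional part. */
uint16_t from_fixed4(uint16_t fixed)
{
    return (uint16_t)(fixed >> 4);
}
```

So `640 << 4` is 10240 and `32 << 4` is 512, and the low 4 bits are left free for sub-pixel positions that this demo never uses.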
Kicking this is very simple.
- Flush the cache!
- Set DMAC channel 2 MADR to the start of the packet
- Set DMAC channel 2 QWC to the length of the packet in qwords
- Set DMAC channel 2 CHCR to 0x100 to start the transfer
Or just use the dma lib and call
dma_channel_send_normal(DMA_CHANNEL_GIF, draw_packet, q - draw_packet, 0, 0);
I’ll skip explaining the rotating parts in depth. You’ve already read above what rotating the palette involves. Rotating the texture is simply changing a global pointer between the 3 texture arrays. Unlike the palette, the textures are not modified during runtime.
Now, uploading the palette and texture. I utilised something called DMA chains. Instead of transferring one big block of memory, starting from MADR and ending at MADR + QWC, our DMAC can be smart and do some heavy lifting for us.
I’ve simplified this code as best as I could. If it looks intimidating, that’s because it is.
// Tell the DMAC to send the next 7 QW in this packet to the VIF
DMATAG_CNT(q, 7, 0, 0, 0);
// Tell the VIF to send the next 8 QW to the GIF
PACK_VIFTAG(q, VIF_CMD_NOP, VIF_CMD_NOP, VIF_CMD_NOP, (VIF_CMD_DIRECT << 24) | 8);
// Tell the GIF that we will be 'blit'ing a texture
GIF_SET_TAG(4, 0, 0, 0, GIF_FLG_PACKED, 1)
GS_SET_BITBLTBUF(0, 0, 0, g_clutptr >> 6, 1, GS_PSM_16)
GS_SET_TRXPOS(0, 0, 0, 0, 0)
GS_SET_TRXREG(16, 1)
// As soon as we set TRXDIR, the GIF treats the incoming data as a texture
GS_SET_TRXDIR(0)
// Tell the GIF that the next 2 QW will be in IMAGE format
GIF_SET_TAG(2, 0, 0, 0, GIF_FLG_IMAGE, 0)
// This is where things get interesting
// Instead of copying the data at flame_clut into this packet (which costs more memory and execution time the larger your data is),
// the DMAC will copy 2 QW starting from flame_clut.
// The tag only REFerences flame_clut.
DMATAG_REF(q, 2, (uiptr)flame_clut, 0, 0, 0);
// I've excluded uploading the texture in this sample. It is essentially the same exact thing.
To rephrase: if your texture is a huge number of QW, you can simply reference it instead of copying it all into your DMA packet.
In another effort to explain the DMAC chain, here is a diagram of what is happening above.
The numbered blocks are the data that gets sent to the GIF.
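If it helps to see what a DMA tag actually is, here is a sketch that builds the low 64 bits of one by hand. The enum and function names are made up for illustration (the real code uses the library's DMATAG macros), and the bit layout is my understanding of the tag format: QWC in bits 0-15, ID in bits 28-30 and ADDR in bits 32-62.

```c
#include <stdint.h>

/* Tag IDs relevant to this post (3-bit ID field of a source chain tag). */
enum chain_tag_id {
    CHAIN_TAG_CNT = 1, /* data follows the tag; next tag follows the data */
    CHAIN_TAG_REF = 3, /* data lives at ADDR; next tag follows this tag   */
    CHAIN_TAG_END = 7  /* data follows the tag; chain stops afterwards    */
};

/*
 * Build the low 64 bits of a DMA tag.
 * Assumed layout: QWC in bits 0-15, ID in bits 28-30, ADDR in bits 32-62.
 */
uint64_t make_dma_tag(enum chain_tag_id id, uint16_t qwc, uint32_t addr)
{
    return (uint64_t)qwc
         | ((uint64_t)id << 28)
         | ((uint64_t)(addr & 0x7FFFFFFF) << 32);
}
```

The CNT tag from the packet above would be `make_dma_tag(CHAIN_TAG_CNT, 7, 0)`, and the REF tag would carry a QWC of 2 plus the address of `flame_clut`. The difference between the two is only where the DMAC looks for the data.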
Kicking this is even easier.
- Flush the cache!
- Set DMAC channel 2 TADR to the start of the packet
- Set DMAC channel 2 CHCR to 0x104 to start the tag transfer
Or just use the dma lib and call
dma_channel_send_chain(DMA_CHANNEL_GIF, draw_packet, q - draw_packet, 0, 0);
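The two CHCR values used in this post, 0x100 and 0x104, differ only in the transfer mode field. Here is a sketch of how they break down, assuming MOD sits in bits 2-3 and the start bit (STR) in bit 8. The names are made up for illustration and every other CHCR field is left at zero.

```c
#include <stdint.h>

enum xfer_mode {
    XFER_NORMAL = 0, /* one block: MADR + QWC */
    XFER_CHAIN  = 1  /* follow DMA tags starting at TADR */
};

/* Assumed CHCR layout: MOD in bits 2-3, STR (start) in bit 8. */
uint32_t make_chcr(enum xfer_mode mode, int start)
{
    return ((uint32_t)mode << 2) | ((uint32_t)(start ? 1 : 0) << 8);
}
```

`make_chcr(XFER_NORMAL, 1)` gives 0x100 for the sprite packet, and `make_chcr(XFER_CHAIN, 1)` gives 0x104 for the chain.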
Final Thoughts
If you’ve made it this far, thank you! I’m still trying to figure out how I want these posts to be written. I don’t want to write things nobody understands but I also want to keep it interesting without culling too many details.
If you’d like to play around with the code, it’s located on my GitHub here.
I’m happy to say the original iteration of the “VIF Flame” inspired Daniel Santos to implement it in their GTAV PS2 recreation. You can check out the demo on their YouTube channel.
Link here if the above doesn’t work https://www.youtube.com/watch?v=9TLH5kXNGZw