
Inline Assembly Dangers

Prepare, this is a long one. If you just want to see the fix, skip to the bottom.

Problem 1

I’ve been having an issue with the PS2 SDK. Going by my Discord messages, I first ran into it around the middle of 2023.

It’s pretty simple to reproduce: load any PS2SDK ELF, soft reset with ps2link, and load the ELF again. This will result in a TLB miss (crash). Investigating, I found that the miss originates in libcglue’s timezone startup code. Because this function is weakly linked, providing an empty stub ‘fixes’ the problem.

void _libcglue_timezone_update(){}

int main()
{
    // do your fun PS2 stuff here, just don't use any timezone stuff
}

I didn’t investigate further. I chalked it up to something with reentry within newlib, or maybe the function just being broken. I don’t know much about reentry or timezones, so I told anyone who ran into this issue to add that stub and not use timezones.

Problem 2

A few months later, someone told me that after a ps2link reset, their graphics were broken. To be more specific, the screen showed alternating 32-pixel-high strips of their clear colour and black. My thought was that the zbuffer was somehow pointing to the framebuffer, and that what we were seeing was the zbuffer clear wiping the bottom half of each page. If you’ve taken a look at my fast clearing post, this will make some sense ;)

An image of a CRT TV with blue and black horizontal stripes.

After cloning their code and adding some logging, sure enough, after a soft reset the framebuffer and zbuffer were pointing to the same page address: 0xFFFFFFFF.

That’s definitely not right.

If you’re curious about how this allocation happens, it’s pretty straightforward; there’s no obvious reason for it to go wrong.

u32 fbp = 0;
u32 zbp = 0;
void alloc_vram()
{
    fbp = graph_vram_allocate(width,height,format,ALIGN_PAGE);
    zbp = graph_vram_allocate(width,height,zformat,ALIGN_PAGE);
}

libgraph’s vram allocator is dumb. It’s just linear, which is good enough if you’re not tight on vram. Because it’s so simple, I cursed at ps2link for not resetting the vram pointer inside of graph, and recommended calling graph_vram_clear() before your first allocation.
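To make the "linear allocator" idea concrete, here is a minimal sketch. It is not libgraph's actual code: the names sketch_vram_allocate and sketch_vram_clear are hypothetical, and the real graph_vram_allocate takes width, height, format and alignment rather than a raw size. The point is only that the allocator's entire state is one global that is assumed to start at zero.

```c
#include <stdint.h>

/* Hypothetical stand-in for the allocator's only state: the next free
 * address. It has no initializer, so it lives in .bss and is assumed
 * to be 0 when the program starts. */
static uint32_t sketch_vram_pointer;

/* Hand out `size` units and bump the pointer. Nothing is ever freed
 * individually. `alignment` is assumed to be a power of two. */
uint32_t sketch_vram_allocate(uint32_t size, uint32_t alignment)
{
    uint32_t base = (sketch_vram_pointer + alignment - 1) & ~(alignment - 1);
    sketch_vram_pointer = base + size;
    return base;
}

/* Reset the allocator so the next allocation starts from 0 again. */
void sketch_vram_clear(void)
{
    sketch_vram_pointer = 0;
}
```

If that one global keeps its old value across a reload, the first allocation of the second run starts wherever the previous run left off, which is why calling the clear function first papers over the problem.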

At this point, I swore to never use ps2link reset. It just was too unreliable.

Problem Solving

Just a couple of days ago I witnessed someone get caught by the timezone issue. I was in a good mood and decided I wanted to figure out what was happening, so this footgun doesn’t blow another person’s foot off. If I traced the instruction causing the TLB miss, I’d end up somewhere in ‘memmove’, without a usable stack frame.

I decided to start from the top instead, starting with _libcglue_timezone_update(). I further narrowed it down to setenv().

setenv() is part of newlib. Digging even deeper into newlib I ended up here.

int
_setenv_r (struct _reent *reent_ptr,
   const char *name,
   const char *value,
   int rewrite)
{
  static int alloced;        /* if allocated space before */
  register char *C;
  int l_value, offset;
  ...

The main thing that caught my eye was that static local, alloced.

Static locals are essentially globals with local scope. If you initialize a static local like so, ‘static int alloced = 1’, that local will be initialized only once, even if that line is ‘executed’ multiple times. I don’t like static locals; if you’re not careful, you can easily misread them as being initialized every time the function is called. But that’s not so important here.
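A small sketch of the difference, using two hypothetical helpers (neither is from newlib). The static version keeps its value between calls, and because its initial value is zero it will typically be placed in .bss, so it depends on startup code having zeroed that section.

```c
/* Hypothetical helper: the static local is initialized once, before the
 * program starts running, not each time the function is entered. */
int count_calls_static(void)
{
    static int calls = 0;
    calls += 1;
    return calls;
}

/* Same body with an ordinary local: it is re-initialized on every call. */
int count_calls_plain(void)
{
    int calls = 0;
    calls += 1;
    return calls;
}
```

Calling count_calls_static() three times yields 1, 2, 3, while count_calls_plain() yields 1 every time.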

I immediately thought back to that vram allocation issue.

Adding logging to newlib was going to be a hassle. Instead, I wrote a test to see if I can reproduce it.

#include <stdio.h>
#include <kernel.h> // SleepThread() (PS2SDK)

static int global_variable;

int main(void)
{
    printf("!!global_variable = %d\n", global_variable);
    global_variable = 1;

    static int local_variable;

    printf("!!local_variable = %d\n", local_variable);

    local_variable = 1;
    SleepThread();
}

I loaded this up and these are the results:

> ps2client execee host:playground.elf

global_variable = 0
local_variable = 0

> ps2client reset
> ps2client execee host:playground.elf

global_variable = 1
local_variable = 1

That, is, not, good, at, all.


What if it’s just being placed in the wrong section?

> mips64r5900el-ps2-elf-objdump -x playground.elf | grep variable
    0011ff14 l     O .bss 00000004 global_variable
    0011ff10 l     O .bss 00000004 local_variable.0

Nope. The .bss section is meant to be zero-initialized, and it is where uninitialized globals belong, so both variables being placed there is correct.

What if ps2link is jumping to the wrong start address?

(This turned out to be wasted time, as homebrew compiled with the SCE toolchain worked fine.)

> mips64r5900el-ps2-elf-objdump -d playground.elf | grep start
    00100b00 <__start>:

> ps2client execee host:playground.elf
Loaded, host:playground.elf
start address 0x100b00

Nope, the start address matches __start.

What if the crt code is wrong?

I decided to try and debug the __start routine that is supposed to clear the .bss section. I wasn’t convinced that this was the problem: the code had made sense every time I read it, and I had read it many, many times.

The code in question looked like this.

/*
 * First function to be called by the loader
 * This function sets up the stack and heap.
 * DO NOT USE THE STACK IN THIS FUNCTION!
 */
void __start(struct sargs_start *pargs)
{
    asm volatile(
        "# Clear bss area"
        "la   $2, _fbss"
        "la   $3, _end"
        "1:"
        "sltu   $1, $2, $3"
        "beq   $1, $0, 2f"
        "nop"
        "sq   $0, ($2)"
        "addiu   $2, $2, 16"
        "j   1b"
        "nop"
        "2:"
        "                       \n"
        "# Save first argument  \n"
        "la     $2, %0 \n"
        "sw     $4, ($2)        \n"
        "                       \n"
        "# SetupThread          \n"
        "la     $4, _gp         \n"
        "la     $5, _stack      \n"
        "la     $6, _stack_size \n"
        "la     $7, %1	        \n"
        "la     $8, ExitThread  \n"
        "move   $gp, $4         \n"
        "addiu  $3, $0, 60      \n"
        "syscall                \n"
        "move   $sp, $2         \n"
        "                       \n"
        "# Jump to _main      	\n"
        "j      %2           \n"
        : /* No outputs. */
        : "R"(args_start), "R"(args), "Csy"(_main));
}
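For readers who don't speak MIPS, here is a rough C rendering of what the "Clear bss area" block at the top of that listing is meant to do. It is only a sketch: sketch_clear_bss is a hypothetical name, memset of 16 bytes stands in for the 128-bit `sq $0` store, and the real code works on the linker symbols _fbss and _end rather than on pointers passed as arguments.

```c
#include <string.h>

/* Zero the range [start, end) in 16-byte steps. Like the assembly, the
 * last store can run past `end` if the range is not a multiple of 16. */
void sketch_clear_bss(unsigned char *start, unsigned char *end)
{
    unsigned char *p = start;   /* la    $2, _fbss                    */
    while (p < end) {           /* sltu  $1, $2, $3 / beq $1, $0, 2f  */
        memset(p, 0, 16);       /* sq    $0, ($2)                     */
        p += 16;                /* addiu $2, $2, 16                   */
    }
}
```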

I opened it in PCSX2’s debugger and noticed something strange. It was wrong.

> mips64r5900el-ps2-elf-objdump -d playground.elf --disassemble=__start

001017b0 <__start>:
  1017b0:  3c020016  lui    v0,0x16
  1017b4:  3c030016  lui    v1,0x16
  1017b8:  24427acc  addiu  v0,v0,31436
  1017bc:  ac440000  sw     a0,0(v0)
  1017c0:  3c040017  lui    a0,0x17
  1017c4:  2484bf70  addiu  a0,a0,-16528
  1017c8:  3c050000  lui    a1,0x0
  1017cc:  24a5ffff  addiu  a1,a1,-1
  1017d0:  3c060002  lui    a2,0x2
  1017d4:  24c60000  addiu  a2,a2,0
  1017d8:  24677988  addiu  a3,v1,31112
  1017dc:  3c080011  lui    a4,0x11
  1017e0:  25088fe0  addiu  a4,a4,-28704
  1017e4:  0080e025  move   gp,a0
  1017e8:  2403003c  li     v1,60
  1017ec:  0000000c  syscall
  1017f0:  08040018  j      100060 <_main>
  1017f4:  0040e825  move   sp,v0
  1017f8:  03e00008  jr     ra
  1017fc:  00000000  nop

Where is our bss zeroing loop!?

I tried adding nops, I tried changing the inputs and outputs, messing with volatile qualifiers, disabling optimizations, nothing worked.

Until I realised.
That entire section of assembly is missing newlines.
The first line is a comment.
The entire portion of assembly that handles zeroing our bss section, is a comment.

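You see, with inline assembly you need to use newlines (\n) to denote the end of a line. The compiler joins adjacent string literals into one string, so without a \n at the end of each piece, everything runs together onto a single line. After GCC gave our assembly to the assembler, it looked like this.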


# Clear bss areala   $2, _fbssla   $3, _end1:sltu   $1, $2, $3beq   $1, $0, 2fnopsq   $0, ($2)addiu   $2, $2, 16 <the rest of the bss loop>
# Save first argument
la     $2, %0
sw     $4, ($2)
<the rest of the assembly>

Sure enough, adding the newlines (and fixing the formatting) fixed the global variable sample above and the timezone issues. Soft resetting the PS2 is now reliable!
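The difference between the two forms can be checked on any machine, since it comes down to how C joins string literals. The sketch below is an illustration only: count_code_lines is a hypothetical helper that approximates the assembler by treating a line as a comment when its first non-blank character is '#', and the templates are shortened versions of the bss loop.

```c
#include <stddef.h>

/* Without "\n", the three literals become ONE line that starts with '#'. */
const char broken_template[] =
    "# Clear bss area"
    "la   $2, _fbss"
    "la   $3, _end";

/* With "\n", the comment ends where it should and two instructions remain. */
const char fixed_template[] =
    "# Clear bss area \n"
    "la   $2, _fbss   \n"
    "la   $3, _end    \n";

/* Hypothetical helper: count lines that are neither blank nor comments. */
size_t count_code_lines(const char *text)
{
    size_t count = 0;
    const char *p = text;
    while (*p != '\0') {
        while (*p == ' ' || *p == '\t') p++;          /* skip indentation */
        if (*p != '\0' && *p != '\n' && *p != '#') count++;
        while (*p != '\0' && *p != '\n') p++;         /* go to end of line */
        if (*p == '\n') p++;
    }
    return count;
}
```

With these definitions, the broken template contains zero code lines and the fixed one contains two.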

The related pull request can be viewed here.

I think uyjulian summed it up best, “Now that’s a fun issue.”