Inline Assembly Dangers
Prepare, this is a long one. If you just want to see the fix, skip to the bottom.
Problem 1
I’ve been having an issue with the PS2 SDK. Going by my Discord messages, I first ran into it in the middle of 2023.
It’s pretty simple to reproduce: load any PS2SDK ELF, soft reset with ps2link, and load the ELF again. This results in a TLB miss (crash). Investigating, I found that the miss originates in libcglue’s timezone startup code. Because this function is weakly linked, providing an empty stub ‘fixes’ the problem.
void _libcglue_timezone_update(){}
int main()
{
// do your fun PS2 stuff here, just don't use any timezone stuff
}
I didn’t investigate further. I chalked it up to something with reentrancy within newlib, or maybe the function just being broken. I don’t know much about reentrancy or timezones, so I told anyone who ran into this issue to add that stub and not to use timezones.
Problem 2
A few months later, someone told me that after a ps2link reset, their graphics were broken. To be more specific, there were alternating 32-pixel-high strips of their clear colour and black. My thought was that the zbuffer was somehow pointing to the framebuffer, and that what we were seeing was the zbuffer clear hitting the bottom half of each page. If you’ve taken a look at my fast clearing post, this will make some sense ;)
After cloning their code and adding some logging, sure enough, after a soft reset the framebuffer and zbuffer were pointing to the same page address: 0xFFFFFFFF.
That’s definitely not right.
If you’re curious about how this allocation happens, it’s pretty straightforward; there’s no obvious reason for it to go wrong.
u32 fbp = 0;
u32 zbp = 0;
void alloc_vram()
{
fbp = graph_vram_allocate(width,height,format,ALIGN_PAGE);
zbp = graph_vram_allocate(width,height,zformat,ALIGN_PAGE);
}
libgraph’s VRAM allocator is dumb: it’s just linear, which is good enough if you’re not tight on VRAM. Because it’s so simple, I cursed at ps2link for not resetting the VRAM pointer inside of graph, and recommended calling graph_vram_clear() before your first allocation.
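To make "just linear" concrete, here is a minimal sketch of a bump allocator of that kind. It is not libgraph's actual code: every name here (sketch_vram_allocate, sketch_vram_clear, SKETCH_VRAM_WORDS) is hypothetical, and the sketch assumes the allocator reports exhaustion by returning -1, which would read as 0xFFFFFFFF once stored in a u32.

```c
#include <stdint.h>

/* Hypothetical capacity: 4 MB of VRAM counted in 32-bit words. */
#define SKETCH_VRAM_WORDS 1048576

/* Uninitialized static, so it relies on startup code to be zero. */
static int sketch_vram_pointer;

/* Hypothetical linear allocator: returns the start offset, or -1 when full. */
int sketch_vram_allocate(int words)
{
    if (sketch_vram_pointer + words > SKETCH_VRAM_WORDS)
        return -1;

    int start = sketch_vram_pointer;
    sketch_vram_pointer += words;
    return start;
}

/* Hypothetical counterpart to a "clear" call: rewind to the start of VRAM. */
void sketch_vram_clear(void)
{
    sketch_vram_pointer = 0;
}
```

With an allocator shaped like this, a pointer that does not start at zero means new allocations pick up where the previous run left off, and once the space runs out every caller gets the same failure value.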
At this point, I swore to never use ps2link reset. It just was too unreliable.
Problem Solving
Just a couple of days ago I witnessed someone get caught by the timezone issue. I was in a good mood and decided to figure out what was happening so this footgun doesn’t blow another person’s foot off. When I traced the instruction causing the TLB miss, I’d end up somewhere in ‘memmove’, without a usable stack frame.
I decided to start from the top instead, starting with _libcglue_timezone_update(). I further narrowed it down to setenv().
setenv() is part of newlib. Digging even deeper into newlib I ended up here.
int
_setenv_r (struct _reent *reent_ptr,
const char *name,
const char *value,
int rewrite)
{
static int alloced; /* if allocated space before */
register char *C;
int l_value, offset;
...
The main thing that caught my eye was that static local, alloced.
Static locals are essentially globals with local scope. If you initialize a static local like so, ‘static int alloced = 1’, it will be initialized only once, even if that line is ‘executed’ multiple times. I don’t like static locals; if you’re not careful, you can easily misread them as being initialized every time the function is called. But that’s not so important here.
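A small sketch of that behaviour, using a hypothetical helper (next_call_number is not from newlib or the PS2SDK):

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 on the first call, 2 on the second, and so on. */
int next_call_number(void)
{
    /* Initialized once before the program runs, not on every call. */
    static int calls = 1;
    return calls++;
}
```

Calling next_call_number() three times in a row yields 1, 2, 3; the initializer never runs again, so the value persists between calls exactly like a global would.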
I immediately thought back to that vram allocation issue.
Adding logging to newlib was going to be a hassle. Instead, I wrote a test to see if I could reproduce it.
static int global_variable;
int main(void)
{
printf("!!global_variable = %d\n", global_variable);
global_variable = 1;
static int local_variable;
printf("!!local_variable = %d\n", local_variable);
local_variable = 1;
SleepThread();
}
I loaded this up and these are the results:
> ps2client execee host:playground.elf
global_variable = 0
local_variable = 0
> ps2client reset
> ps2client execee host:playground.elf
global_variable = 1
local_variable = 1
That, is, not, good, at, all.
What if it’s just being placed in the wrong section?
> mips64r5900el-ps2-elf-objdump -x playground.elf | grep variable
0011ff14 l O .bss 00000004 global_variable
0011ff10 l O .bss 00000004 local_variable.0
Nope, .bss is the zero-initialized section, which is where all uninitialized globals and statics belong. These being here is correct.
What if ps2link is for some reason skipping the __start routine?
(This was wasted time, as homebrew compiled with the SCE toolchain worked fine.)
> mips64r5900el-ps2-elf-objdump -d playground.elf | grep start
00100b00 <__start>:
> ps2client execee host:playground.elf
Loaded, host:playground.elf
start address 0x100b00
Nope, the start address matches __start.
What if the crt code is wrong?
I decided to try and debug the __start routine that is supposed to clear the .bss section. I wasn’t too convinced that this was the problem; the code had made sense every one of the many, many times I had read it.
The code in question looked like this.
/*
* First function to be called by the loader
* This function sets up the stack and heap.
* DO NOT USE THE STACK IN THIS FUNCTION!
*/
void __start(struct sargs_start *pargs)
{
asm volatile(
"# Clear bss area"
"la $2, _fbss"
"la $3, _end"
"1:"
"sltu $1, $2, $3"
"beq $1, $0, 2f"
"nop"
"sq $0, ($2)"
"addiu $2, $2, 16"
"j 1b"
"nop"
"2:"
" \n"
"# Save first argument \n"
"la $2, %0 \n"
"sw $4, ($2) \n"
" \n"
"# SetupThread \n"
"la $4, _gp \n"
"la $5, _stack \n"
"la $6, _stack_size \n"
"la $7, %1 \n"
"la $8, ExitThread \n"
"move $gp, $4 \n"
"addiu $3, $0, 60 \n"
"syscall \n"
"move $sp, $2 \n"
" \n"
"# Jump to _main \n"
"j %2 \n"
: /* No outputs. */
: "R"(args_start), "R"(args), "Csy"(_main));
}
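For readers who don't speak MIPS, the loop at the top of that assembly is meant to do roughly the following. This is only a C sketch of the intent: sketch_clear_bss is a hypothetical name, the real code works on the linker symbols _fbss and _end, and the inner loop stands in for the single 16-byte sq store of the zero register.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical C rendering of the bss clear loop.
 * Zeroes [start, end) in 16-byte steps, so it assumes the region
 * is a multiple of 16 bytes long, as the assembly does.
 */
void sketch_clear_bss(uint8_t *start, uint8_t *end)
{
    for (uint8_t *p = start; p < end; p += 16) {
        /* One "sq $0, ($2)": store 16 zero bytes at p. */
        for (size_t i = 0; i < 16; i++)
            p[i] = 0;
    }
}
```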
I opened __start in PCSX2’s debugger and noticed something strange. It was wrong.
> mips64r5900el-ps2-elf-objdump -d playground.elf --disassemble=__start
001017b0 <__start>:
1017b0: 3c020016 lui v0,0x16
1017b4: 3c030016 lui v1,0x16
1017b8: 24427acc addiu v0,v0,31436
1017bc: ac440000 sw a0,0(v0)
1017c0: 3c040017 lui a0,0x17
1017c4: 2484bf70 addiu a0,a0,-16528
1017c8: 3c050000 lui a1,0x0
1017cc: 24a5ffff addiu a1,a1,-1
1017d0: 3c060002 lui a2,0x2
1017d4: 24c60000 addiu a2,a2,0
1017d8: 24677988 addiu a3,v1,31112
1017dc: 3c080011 lui a4,0x11
1017e0: 25088fe0 addiu a4,a4,-28704
1017e4: 0080e025 move gp,a0
1017e8: 2403003c li v1,60
1017ec: 0000000c syscall
1017f0: 08040018 j 100060 <_main>
1017f4: 0040e825 move sp,v0
1017f8: 03e00008 jr ra
1017fc: 00000000 nop
Where is our bss zeroing loop!?
I tried adding nops, I tried changing the inputs and outputs, messing with volatile qualifiers, disabling optimizations. Nothing worked.
Until I realised.
That entire section of assembly is missing newlines.
The first line is a comment.
The entire portion of assembly that handles zeroing our bss section, is a comment.
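You see, with inline assembly you need to use newlines (\n) to denote the end of a line. The compiler joins adjacent string literals into one string, so without an explicit \n they all end up on the same line, and the assembler treats everything after a # as a comment. When GCC handed our assembly to the assembler, it looked like this.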
# Clear bss areala $2, _fbssla $3, _end1:sltu $1, $2, $3beq $1, $0, 2fnopsq $0, ($2)addiu $2, $2, 16 <the rest of the bss loop>
# Save First argument
la $2, %0
sw $4, ($2)
<the rest of the assembly>
Sure enough, adding the newlines (and fixing the formatting) fixed the global variable sample above and the timezone issues. Soft resetting the PS2 is now reliable!
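The concatenation behind this bug can be reproduced in plain C, without any assembler involved. The sketch below uses a shortened copy of the bss loop as ordinary strings; count_instruction_lines is a hypothetical helper that mimics what the assembler sees by skipping empty lines and lines that start with #. The fixed_asm string only illustrates the idea and is not the exact text from the pull request.

```c
#include <stdio.h>
#include <string.h>

/* Adjacent literals are merged into ONE string with no separators. */
static const char broken_asm[] =
    "# Clear bss area"
    "la $2, _fbss"
    "la $3, _end";

/* Same text, but every literal ends its own line. */
static const char fixed_asm[] =
    "# Clear bss area\n"
    "la $2, _fbss\n"
    "la $3, _end\n";

/* Hypothetical helper: counts non-empty lines that are not '#' comments. */
int count_instruction_lines(const char *text)
{
    int count = 0;
    const char *line = text;

    while (*line != '\0') {
        size_t len = strcspn(line, "\n"); /* length up to newline or end */
        if (len > 0 && line[0] != '#')
            count++;
        line += len;
        if (*line == '\n')
            line++;
    }
    return count;
}
```

Here count_instruction_lines(broken_asm) returns 0, because the whole string is a single line that starts with #, while count_instruction_lines(fixed_asm) returns 2.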
The related pull request can be viewed here.
I think uyjulian summed it up best, “Now that’s a fun issue.”