How to Exploit Dlmalloc Unlink(): Protostar Level Heap3
|
By David Xia
While stuck inside during social distancing, I’ve been making my way through LiveOverflow’s awesome
Youtube playlist “Binary Exploitation / Memory Corruption.” His videos are structured
around a well known series of exploit exercises here called “Protostar.” I took the
time to truly understand each one before moving onto the next as the exercises build on each
other. For the past several days I’ve been trying to understand the “Heap3”
level, a relatively complex level that requires manipulating the heap to redirect code
execution to an arbitrary function. After rewatching the video many times and reading numerous other
online explanations, I finally understand! That moment of understanding feels so gratifying.
Many other resources already explain the exploit well, but I’m writing my own explanation to
reinforce my understanding and to celebrate.
Exploit Exercise Protostar Heap3
1234567891011121314151617181920212223242526272829
#include <stdlib.h>#include <unistd.h>#include <string.h>#include <sys/types.h>#include <stdio.h>voidwinner(){printf("that wasn't too bad now, was it? @ %d\n",time(NULL));}intmain(intargc,char**argv){char*a,*b,*c;a=malloc(32);b=malloc(32);c=malloc(32);strcpy(a,argv[1]);strcpy(b,argv[2]);strcpy(c,argv[3]);free(c);free(b);free(a);printf("dynamite failed?\n");}
The source code is pretty straightforward. There’s the main() and winner() functions. There’s
three character pointers, three malloc()’s, three strcpy()’s, three free()’s, and finally a
printf(). Our goal is to redirect code execution from main() to winner().
The description at the top of the level is
This level introduces the Doug Lea Malloc (dlmalloc) and how heap meta data can be modified to
change program execution.
All these exercises are on 32-bit x86 architecture.
Background on dlmalloc
The vulnerable malloc is usually referred to as dlmalloc (named after one of its authors Doug Lea)
and must be an old version like this one from 1996. The Phrack article “Once
Upon a free()…” provides useful background.
Most malloc implementations share the behaviour of storing their own management information, such
as lists of used or free blocks, sizes of memory blocks and other useful data within the heap
space itself.
The central attack of exploiting malloc allocated buffer overflows is to modify this management
information in a way that will allow arbitrary memory overwrites afterwards.
For our purposes, skip to the “GNU C Library implementation” section. It says that memory slices or
“chunks” created by malloc are organized like so. On 32-bit systems, prev_size and size are
4 bytes each. data is the user data section. malloc() returns a pointer to the address where
data starts.
The other important things to know about the vulnerable version(s) of dlmalloc are:
The lowest bit of size called PREV_INUSE indicates whether the previous chunk is used or not
Once we free() the chunk using free(mem), the memory is released, and if
its neighboring chunks aren’t free, dlmalloc will clear the next chunk’s PREV_INUSE and add the
chunk to a doubly-linked list of other free chunks. It does this by adding a forward and backward
pointer at mem.
1234567891011121314
. +----------------------------------+
chunk -> | prev_size |
+----------------------------------+
| size |
+----------------------------------+
mem -> | fd |
+----------------------------------+
| bk |
+----------------------------------+
| (old memory, can be zero bytes) |
: :
nextchunk -> | prev_size ... |
: :
If neighboring chunks are free, dlmalloc will merge the just freed chunk with its neighboring
chunks. The two neighboring free chunks are in a doubly-linked list. dlmalloc first removes the
neighboring chunk at the lower memory address from the list, merges it with the recently freed
chunk, and repeats this for the neighboring chunk at the higher memory address. The unlinking is
done with a macro called unlink() which removes an entry from a doubly-linked list and ties the
loose ends of the list back together.
Since we can overwrite the bytes of P, we can overwrite 4-bytes of memory at two arbitrary places.
To trigger this code path, chunks being consolidated must be bigger than 80 bytes. dlmalloc
classifies these chunks as “fastbins.”
An array of lists holding recently freed small chunks. Fastbins are not doubly linked.
What the heap looks like in heap3.c
Run gdb on heap3.c. My personal preference is to set the disassembly-flavor to intel and turn off
pagination.
The printf has become a puts(). plt stands for procedure linkage table, one of the structures
which makes dynamic loading and linking easier to use. @plt means we are calling puts at PLT
entry at address 0x8048790. If we disassemble that address we see
It calls another function at address 0x804b128. This address is part of the Global Offset Table
(GOT) which points to the dynamically linked library containing the actual puts() function.
We want to replace the call to puts() with a call to winner(). So we want to overwrite the
contents of 0x804b128 in the GOT, currently 0x08048796, with the address to winner().
To get a visual sense of what the heap looks like, set breakpoints at every library function
call, i.e. break at the address of malloc(), strcpy(), free(), and puts().
The heap starts at 0x804c000, ends at 0x804d000, and has size 0x1000 or 4096 bytes. We can
define hooks in gdb. We define one to examine the first 56 words of the heap in hexadecimal every
time execution stops.
The second word of the chunk up to the last three bits indicates the chunk size in bytes. 0x29 is
0b101001. Without the last three bits it’s 0b101000 which is 40. We can see the chunk starts at
0x804c000 and ends at 0x804c028 which is the start of the next chunk. This range encompasses
10 words. Each word is 4 bytes which makes 10 * 4 = 40 bytes. The last bit of the size word
indicates that the previous chunk is in use. By convention the first chunk has this bit turned on
because there’s no previous chunk that’s free.
The second chunk resulting from the second malloc() starts at 0x804c028 and ends at 0x804c050.
It’s identical to the first chunk. Continue past the third malloc().
We see a third chunk is created. The number at the end (right now 0x00000f89) indicates the
remaining size of the heap. It has been decreasing. Continue past the first strcpy().
0x804c030 now has 0x0804c050 which is a pointer to the start of the third chunk. This shows the
second and third chunk are now tied together in a singly-linked list since they are small enough to
be considered fastbins. Continue.
Now the first chunk has been freed and address 0x804c008 has a pointer 0x0804c028 to the second
chunk. If we continue, the program runs the printf("dynamite failed?\n"); line.
Let’s work backwards. We can use unlink() to write the four byte address of a call to winner() to the GOT
entry for puts(). Use objdump to find the address of winner().
We can’t just put 0x08048864 in the GOT entry at 0x804b128 (why?).
In order to call winner(), we’ll need to craft a payload that does so. Such a
payload is often called “shellcode.” The following assembly code will do.
12
moveax,0x8048864calleax
Using an online x86 assembler, the above in raw assembly is
\xB8\x64\x88\x04\x08\xFF\xD0. We can store this in the heap’s first chunk whose data area starts
at 0x804c008. Now we want to write 0x804c008 into the GOT entry for puts() at 0x804b128.
Let’s go back to the unlink statements.
1234
BK=*(P+12);FD=*(P+8);*(FD+12)=BK;*(BK+8)=FD;
BK is the address of \xB8\x64\x88\x04\x08\xFF\xD0. Where should we store that? Let’s put it in
the first chunk at 0x804c014. The first chunk’s data starts at 0x804c008, but we’ve seen the
first byte is changed by dlmalloc when it’s freed. We don’t want our shellcode to be changed so we
put it at a safe distance in the data at a +12-byte offset. 12 A’s can pad the shellcode enough to
push it 12-bytes into the heap. We have enough info to construct the first command line argument.
We’ll store FD and BK in the third chunk. We can use the second command line argument to
overwrite the size of the third chunk to be greater than 80 to trigger the unlink() macro when the
third chunk is free()’d. The second argument needs to have enough characters to overflow its
chunk. The chunk’s data starts at 0x804c030 and ends 32 bytes later at 0x804c050. The third
chunk’s size is four bytes later at 0x804c054. So we can use 32 + 4 = 36 characters as padding.
Let’s pick 100 as the size of the third chunk. 100 = 0x64. We also have to set the last bit to 1 to
indicate the second or previous chunk is in use. So the third chunk’s size should be 0x65. So our
second argument can have 36 B’s as padding followed by \x65.
Now we craft the third and final argument. The structure for it will be some padding + some four
bytes to be determined + some size + FD + BK.
The third chunk starts at 0x804c050. It used to end 40 bytes later at 0x804c078, but we
overwrote its size to 0x65 or 100. So now it ends 100 bytes later at 0x804c0b4. We want to
trigger unlink() on the third chunk when we free() it. We’ve already ensured it’s not a fastbin
by setting its size to be greater than 80 bytes. The next condition is to make dlmalloc consolidate
this chunk with either the chunk before or after. Since we’re using the previous chunk, let’s fool
dlmalloc into thinking the next chunk is free.
I know what you’re thinking. There’s no fourth chunk. That’s right, but we’ll make dlmalloc think
there is. In order to check a chunk is free, dlmalloc looks at the PREV_INUSE bit of the next
chunk. To find the next chunk, dlmalloc adds the size of the current chunk to the current chunk’s
address. You can see this at line 3259.
So let’s write a small size at 0x804c0b8 to make dlmalloc think the fifth chunk is close by and so
we don’t have to add too much padding to our third argument. A size like 0x20. We’ll have to write
it as \x00\x00\x00\x20. But we have a problem here. C treats \x00 as the end of a string, and
thus strcpy() will stop copying any string up to and including that NUL. We won’t be able to add
any more bytes after that. This means we cannot insert \x00 in the middle of any of our inputs.
But all is not lost. We want a small number for the fourth chunk’s size. What’s another way of
summing to a small number, at least in the way computers represent integers? In non-modular
arithmetic, the only way two integers can produce a small sum is if they themselves are smaller. In
modular arithmetic, a small integer can be the sum of large numbers that are greater than the
modulus.
Take a closer look at how chunk_at_offset() is defined. It sums two numbers with no sanity checks.
So we can write a really big number with no null bytes that strcpy() won’t stop on and will make
dlmalloc think the next fifth chunk is close by. Even better, we can use the first byte of the
fourth chunk as the fifth chunk’s size. How can we make dlmalloc think the fifth chunk is four bytes
ahead of the fourth chunk? We do this with 0xfffffffc which is -2 in two’s complement for signed
integers. So 0xfffffffc at 0x804c0b8 will point to a fifth chunk’s size four bytes earlier at
0x804c0b4. This word’s last bit must be set to 0 to indicate the fourth chunk is free. We can
simply use 0xfffffffc again here.
We want (FD + 12) to equal 0x804b128. So FD should be 0x804b128 - 12 = 0x804b11c. In the above
we decided to make BK0x0804c014. We have