Dangerous Functions

"The 'gets' function is dangerous and should not be used." -- Old Compiler Proverb

In class, I mentioned that the gets() function, in its traditional form, cannot be used safely. In this bonus assignment, we will demonstrate that fact by bypassing the password protection on a login system that uses gets().

The gets() function in xv6 is defined as:

char * gets(char *buf, int max);

In traditional UNIX (and in the version of xv6 you will be working on for the bonus assignment), the function instead looks like:

char * gets(char *buf);

See the difference? In the latter case, how do we know how much data we should read from the user? The answer is that we have no idea! So the gets() function just keeps on copying and hopes no one is antisocial enough to enter more data than the buffer can hold.
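To make the difference concrete, here is a minimal sketch in plain C. It is not the xv6 source: the names gets_bounded and unbounded_write_size are made up for illustration, and the input comes from a string rather than the console so the code is easy to experiment with. The sketch also assumes the newline is stored in the buffer along with the rest of the line.

```c
#include <stddef.h>

/* Bounded read in the style of gets(buf, max): stores at most max-1
 * characters from `input`, stopping after a newline, then adds '\0'.
 * `input` stands in for the console (an assumption of this sketch).
 * Requires max >= 1 so there is room for the terminator. */
char *gets_bounded(char *buf, int max, const char *input)
{
    int i = 0;
    while (i + 1 < max && input[i] != '\0') {
        buf[i] = input[i];
        i++;
        if (buf[i - 1] == '\n')
            break;              /* stop after the end of the line */
    }
    buf[i] = '\0';
    return buf;
}

/* Number of bytes an unbounded gets(buf) would write for the same input:
 * every character up to and including the newline, plus the '\0'.
 * Note that nothing here depends on how big the destination buffer is. */
size_t unbounded_write_size(const char *input)
{
    size_t n = 0;
    while (input[n] != '\0' && input[n] != '\n')
        n++;
    if (input[n] == '\n')
        n++;                    /* the newline itself is copied too */
    return n + 1;               /* plus the terminating '\0' */
}
```

With an 8-byte buffer and a line of 16 characters plus a newline, gets_bounded stores 7 characters and a terminator, while unbounded_write_size reports 18 bytes, 10 more than the buffer holds. Those extra bytes have to land somewhere.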

Getting the Code

As usual, we'll be working off of a slightly modified version of xv6. This version is modified so that its gets() function does not include a length; it will instead keep copying characters until it sees a newline.

It also implements a super-secure password login system in init.c. Open up init.c and examine the checkpass() function. As you can see, unless someone knows the extremely complex password "hunter2", they will be unable to get to the xv6 shell.

If you still have your xv6 directory from last time, remove or rename it first. Then get the base xv6 code for this assignment:

$ git clone https://github.com/moyix/xv6-public.git
Cloning into 'xv6-public'...
remote: Counting objects: 4576, done.
remote: Compressing objects: 100% (14/14), done.
remote: Total 4576 (delta 5), reused 0 (delta 0), pack-reused 4562
Receiving objects: 100% (4576/4576), 11.69 MiB | 3.31 MiB/s, done.
Resolving deltas: 100% (1861/1861), done.
Checking connectivity... done.
$ cd xv6-public/
$ git checkout bonus_sec
Branch bonus_sec set up to track remote branch bonus_sec from origin.
Switched to a new branch 'bonus_sec'

Breaking In

Examine the checkpass() function. Your mission is to find a password that is not "hunter2" that will nevertheless allow you to log in. Note that it's okay if the "Incorrect, try again" message is printed, as long as you end up with a shell. So a successful break-in might look like:

xv6...
cpu1: starting
cpu0: starting
sb: size 1000 nblocks 941 ninodes 200 nlog 30 logstart 2 inodestart 32 bmap start 58
Password: [nefarious input goes here]
Incorrect, try again.
init: starting sh
$ ls
.              1 1 512
..             1 1 512
README         2 2 1972
cat            2 3 13260
echo           2 4 12365
forktest       2 5 8097
grep           2 6 14936
[...]

Your job, of course, is to figure out what to put in for [nefarious input goes here]. Submit the password you devise on NYU Classes, in a text file. Include a brief explanation of why the password you submitted allows you to log in. You get half credit for a working password and the other half for a correct explanation.

Hints:

It may help to look at the assembly code for checkpass. You can do that by looking at the file init.asm after running make.
It may also help to step through checkpass in a debugger. You can do that by running make qemu-gdb, then, in another window, changing to the xv6-public directory and running i386-jos-elf-gdb (or just gdb on Linux). Then enter
```
symbol-file _init
```

to tell gdb how to interpret memory addresses as function names. Now you can do things like:

Set a breakpoint on checkpass with break checkpass
Print the address of a local variable with a command like p/x &is_valid
Stop just before checkpass returns to see what values variables have. For example, by looking at init.asm, you can see that the ret instruction is at address 0x7c. So you can do:
```
    break *0x7c
```
to stop just before the function returns.

Both pass and is_valid are local variables on the stack. Where are they in memory, in relation to one another? (You may want to use the debugger for this.)
What happens if you enter a password that's longer than the length of the pass array? Play around with passwords of different lengths to see how the program responds. Try to explain and understand the different behaviors in terms of the layout of the stack and the local variables of the function.
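
As a way to think about the last two hints, here is a small simulation in plain C. Everything in it is hypothetical: the 8-byte buffer, the one-byte flag placed right after it, and the name flag_after_copy are assumptions made up for this sketch, not the real layout of checkpass. The actual sizes, order, and distance between the variables are what you need to find with the debugger. The copy is clamped to the fake frame so the demo itself stays within bounds.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes, chosen only for this illustration. */
enum { PASS_LEN = 8, FRAME_LEN = PASS_LEN + 1 };

/* Simulate an unbounded copy into a fake stack frame.
 * frame[0 .. PASS_LEN-1] plays the role of a password buffer, and
 * frame[PASS_LEN] plays the role of a one-byte flag stored right after it.
 * Returns the flag's value after the copy. */
unsigned char flag_after_copy(const char *input)
{
    unsigned char frame[FRAME_LEN];
    memset(frame, 0, sizeof frame);     /* the flag starts out as 0 */

    size_t n = strlen(input) + 1;       /* an unbounded copy includes '\0' */
    if (n > FRAME_LEN)
        n = FRAME_LEN;                  /* keep the demo itself in bounds */
    memcpy(frame, input, n);

    return frame[PASS_LEN];
}
```

In this toy layout, flag_after_copy("short") returns 0 because the input fits inside the buffer, while flag_after_copy("AAAAAAAAB") returns 'B': the ninth character spills past the buffer and overwrites the neighboring byte. Whether anything similar happens in checkpass depends on where the compiler actually placed its locals.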