CS444 Lab 1

The Shell

Congratulations: you are now the proud new owner of an operating system called ! Its previous owner appears to have taken superb care of it, and filled it with all the features fitting of a UNIX clone. It seems like it should be functional, but the previous author forgot the critical task of making userspace...usable. Sitting in front of you, within reach, is a system you can call your own with all its potential, laid bare and ready to put your hardware to use and do your bidding—but first, it looks like you'll need to weed through some documentation and write a shell for it...

A shell is a program which is explicitly written to carry out the commands of a user. Many other programs also "do useful work" at the behest of a user, such as compilers, text editors, web browsers, image editors, music players, and so forth, but what makes a shell special is that it is the "primordial" program of a user's session, and its major purpose is the starting and orchestration of these other useful programs.

Many of us are now familiar with graphical logins and fancy menus; in a sense, for example, the Windows desktop and the Apple dock are casually "shells" by definition. However, when dealing with limitations, such as slow networks, slow processors, little memory, and poor support, few programs are more fastidious and spartan than sh, the Unix shell, and its derivatives, all of which run in the (admittedly dated, but easily emulated) environment of a character-cell terminal.

Shells also sit in a unique position with respect to other programs in an operating system. Although most of us wouldn't admit it, we could get along fine without, say, a web browser, whereas a shell is absolutely essential to being able to use a system: without it, the user cannot initiate any work whatsoever. As a result, shells tend to be intrinsically tied to the system for which they were designed.

About 30 years ago, when your selection of operating systems was about as diverse as, say, today's smartphone market (but, of course, far more expensive), shells were vastly different and often non-portable between systems, leading to all sorts of "features" (some would call them bugs) and interesting inconsistencies in the various shells of DOS (COMMAND), IBM JCL, the nameless CLI of OS/2, etc. They did all have one feature in common, though: they ran commands and started programs when asked to.

Nowadays, we have the good fortune to have what has consistently been the best so far: various standards, including the living specification, dictate all sorts of the design of modern Unix-like operating systems from the system call ABI to the shell language, making it easy to write portable programs once and expect them to work everywhere. Even Windows, the only major OS out there today with no direct ancestry in the Unix kernel, has conceded the ability to run bash through its Linux emulation system.

This lab's assignment will be taking advantage of that portability; available to you through Odin and the ITL are a bunch of Linux machines with conforming specifications (and working compilers) that you can use to prototype the design of a shell. In theory, if you can get it working there, it should work anywhere else, including, hopefully, your own instance of XV6—but more on that later. (Before you start, however, you might want to experiment with bash in the terminal to familiarize yourself with how a text-based shell works.)

Front Matter

Please ensure you can use Odin; many of you have new accounts there with a default password, and will be prompted to set it. If you need help with using ssh, please refer to the resources page, and feel free to ask any of us.

Once you have logged into Odin, you should find a link in your home directory called cs444-sp17. It points to a directory containing the assignment directories that assignments will refer to for submission. Each assignment's instructions should specify which directory to use, and these paths will typically be written relative to your home directory. While removing the link is possible (the real directory will persist), it is highly inadvisable.

We will be assuming some knowledge of Linux systems at this level; please be sure you are familiar with the terminal environment, as it is still the premier environment for operating on remote and emulated systems. If you need any help, again, please feel free to ask.

Material

Unix-descended (and POSIX-compliant) operating systems use the fork-exec model of starting a new process (usually as a result of servicing a user program). The high-level method is this:

The parent process (in our case, our shell) has some additional capability when dealing with its children (the programs it starts); for example:

Putting it all together, the typical idiom for launching a program resembles the following C code, which should work without modification on x64 Linux with a decent compiler:

#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>

int main(void)
{
    /* Program and arguments we want to execute; note that the program usually
     * receives its own path as the first "argument". The array must end with
     * a NULL pointer so that exec knows where the arguments stop. */
    char *argv[] = {"ls", "-la", "/", NULL};

    pid_t childpid;
    int status;

    printf("Parent process: %d\n", (int)getpid());

    /* Fork off a child */
    childpid = fork();
    if(childpid < 0) {
        /* Fork failed: no child exists, so there is nothing to wait for. */
        perror("fork");
        return 1;
    }

    if(childpid) {
        /* We get here if fork() returns nonzero--this runs in the parent.
         * Simply wait until our child is done executing. */

        printf("Parent post-fork, child process ID: %d\n", (int)childpid);

        /* Due to the "zombie" process mechanic, this won't race. */
        while(wait(&status) != childpid)
            ;  /* [Thumb twiddling intensifies] */

        /* wait() packs the exit code together with other information;
         * WEXITSTATUS() extracts the code itself. */
        printf("Parent post-child death, exit status: %d\n",
               WEXITSTATUS(status));
    } else {
        /* Fork returned 0--this runs in the child. If we had any environment
         * to set up, we'd do so here, but for lack of that, let's just run the
         * program... */

        printf("Child process ID: %d\n", (int)getpid());

        /* On success this call never returns; the new program is loaded over
         * this one. See man 3 execvp. */
        printf("Child preparing to exec, goodbye!\n");
        if(execvp(argv[0], argv) < 0) {
            /* Uh oh, exec error. Complain and exit. */
            perror("execvp");
            _exit(111);  /* The parent recovers this via WEXITSTATUS(status). */
        }
    }

    printf("Parent terminating, goodbye!\n");

    return 0;
}

On my terminal, I get:

Parent process: 5069
Parent post-fork, child process ID: 5071
Child process ID: 5071
Child preparing to exec, goodbye!
total 97
drwxr-xr-x  20 root root  4096 Jan 18 16:02 .
drwxr-xr-x  20 root root  4096 Jan 18 16:02 ..
-rw-------   1 root root   445 Jan 18 16:02 .bash_history
lrwxrwxrwx   1 root root    12 Nov  3 14:07 bin -> usr/host/bin
drwxr-xr-x   4 root root  1024 Jan 17 00:30 boot
drwx------   3 root root  4096 Jan 13 13:17 .config
drwxr-xr-x  15 root root  3460 Jan 19 18:14 dev
drwxr-xr-x  53 root root  4096 Jan 20 06:01 etc
drwxr-xr-x   3 root root  4096 Jan 12 22:38 home
-rw-------   1 root root    58 Jan 12 23:54 .lesshst
lrwxrwxrwx   1 root root    12 Nov  3 14:07 lib -> usr/host/lib
lrwxrwxrwx   1 root root    27 Nov  3 14:07 lib64 -> usr/x86_64-pc-linux-gnu/lib
drwx------   2 root root 16384 Jan 12 16:50 lost+found
drwxr-xr-x   2 root root  4096 Nov  3 14:07 media
drwxr-xr-x   2 root root  4096 Nov  3 14:07 mnt
drwxr-xr-x   2 root root  4096 Nov  3 14:07 opt
dr-xr-xr-x 247 root root     0 Jan 18 15:53 proc
drwx------   6 root root  4096 Jan 18 18:24 root
drwxrwxrwt   9 root root     0 Jan 18 15:53 run
lrwxrwxrwx   1 root root    13 Nov  3 14:07 sbin -> usr/host/sbin
drwxr-xr-x   2 root root  4096 Nov  3 14:07 srv
dr-xr-xr-x  12 root root     0 Jan 18 15:53 sys
drwxrwxrwt  14 root root     0 Jan 20 06:51 tmp
drwxr-xr-x   7 root root  4096 Nov  3 20:03 usr
drwxr-xr-x  14 root root  4096 Jan 16 16:07 var
drwxr-xr-x   2 root root  4096 Jan 13 23:56 .vim
-rw-------   1 root root 18306 Jan 18 16:02 .viminfo
Parent post-child death, exit status: 0
Parent terminating, goodbye!

The "hardest" part of the shell is usually just parsing the input, which is given to you in the skeleton code. The other "hard parts" are mostly in the environment setup mentioned in the child-branch comment above, and depend on how full-featured you want your shell to be. For example, if you want to support pipelines like echo "stuff" | grep "ff", you'll need to make use of pipe and dup; if you want to do redirections like ls > file or cat < /tmp/a >> /tmp/b, you'll want to use open, and so forth. Use care when adjusting file descriptors; typically, you'll want to close the appropriate descriptor before calling open or dup, each of which always allocates the lowest descriptor available. "Backgrounding" a process is usually as simple as not calling wait in the parent (though most real shells also keep track of these "background jobs").

When redirecting or piping, it is important to note that most Unix processes start life with three open file descriptors. FD 0, the lowest legal descriptor, refers to standard input, which should be open read-only, and is the usual place for programs that wish to read from the terminal. Similarly, FD 1 is standard output, open for writing, and is the usual place where messages printed with, e.g., printf are sent. Finally, FD 2 is standard error, open for writing, which is used for "out of stream" diagnostic messages that aren't usually considered part of a program's "output"; in particular, even when 1 is redirected, 2 usually remains attached to a controlling terminal, making it a useful place to write diagnostic messages to the user. You can, for all intents and purposes, assume that your shell program starts with these descriptors open as such.

Assignment

Due by 23:55 on Thursday, February 2, 2017

Write a Unix shell that supports pipes, redirection, backgrounding, and some necessary builtins (exit, cd, and pwd). The skeleton code given as a resource handles most of the difficult parts of parsing commands, includes an example involving exec, and provides a good framework for working on everything else, but using it is not mandatory. You should write this shell in ANSI C (even if other languages expose Linux syscalls) because it will make it easy to later port to . In this vein, your life will be easiest if you use only the syscalls documented in the Syscalls section of the Information page, as that is the exhaustive list of syscalls supported by . For working on Linux, however, you should use the prototypes, and avail yourself of the section 2 man pages for help and examples.

Here are some example command sequences that should work in your shell.

Submit your work to your GitLab repository to hand it in. You may work on the code itself anywhere—in the ITL, on Odin in your home directory, etc.—but only submissions to GitLab will be graded. A copy of this repository is already set up for your use in your ~/cs444-sp17/lab1/ directory.