CS445/CS545 Compiler Construction
SPRING 2008
COURSE INFORMATION
Instructor: Christino Tamon
Lectures: TR 9:30-10:45am, Science Center 342
Office hours: SC373, TR 8:30-9:30, 10:45-11:15, 3:00-4:00
Pre-requisites: CS344 Data Structures and Algorithms, CS345 Automata Theory and Formal Languages
SYLLABUS:
A study of compiler design. Overview of the compilation process.
Formal definition of syntax, lexical scanning, parsing including LL and LR grammars,
run-time structures, intermediate code generation, and storage allocation.
Students are expected to develop a compiler for a substantial subset of a
high-level language using compiler tools such as lex and yacc.
GRADING Project and Assignments 60%; Exams 40%
TEXTS
- Required:
Alfred V. Aho, Monica Lam, Ravi Sethi and Jeffrey D. Ullman. "Compilers: Principles, Techniques, and Tools," 2nd edition,
Addison-Wesley (2006).
- Brian W. Kernighan and Dennis M. Ritchie, "The C Programming Language," 2nd edition,
Prentice-Hall (1988) (on reserve)
- John Levine, Tony Mason, and Jason Brown, "lex & yacc," 2nd edition, O'Reilly (1992) (on reserve)
OUTLINE
- Language translation. Compilation. Overview.
- Lexical Analysis. Regular expressions.
- Syntax Analysis. Theory of Parsing: top-down LL(1); bottom-up SLR(1), LR(1) (and LALR(1)).
- Semantic Analysis. Parse trees. Syntax-directed translation. Type checking.
- Code generation. Runtime environments.
OBJECTIVES/OUTCOMES
The objective of this course is to learn the fundamentals of compiler theory and its application
in constructing a real working compiler. The specific outcomes are:
- C Proficiency.
- Knowledge of standard UNIX tools for software design and compiler construction.
- Knowledge of assembly language and its run-time environment relevant for compilers.
- Ability to build a small and functional compiler in a UNIX programming environment.
REQUIREMENTS/POLICIES
Although attendance is not mandatory, students are responsible for all course materials
covered in lectures and any exams given during class periods. Students that need to make
up missing course work must follow the official Clarkson policies regarding these matters.
All students must submit their own work; the exchange of ideas are encouraged but
ultimately the submitted work must be the student's own. Please refer to the Clarkson
University Regulations for more guidelines on academic integrity and related matters.
SCHEDULE
- Introduction: read Chapter 1 [Dragon]
Phases: lexical analysis, syntax analysis, semantic analysis,
intermediate code generation, code optimization, code generation.
- Course project: Appendix A [Dragon, 1st ed]
Source language: Pascal; target language: Intel assembly.
Implementation language + tools: C + lex & yacc.
- Syntax-directed translation: read Chapter 2 [Dragon]
Recursive-descent parsing (top down). LL(1) grammars.
- Parsing techniques: read Chapter 4 [Dragon]
Top-down (stack-based) LL(1); bottom-up SLR(1); LR(1) and LALR(1) (yacc)
- Symbol Table: hashing (see Chapter 7 [Dragon])
- Semantic analysis: read Chapter 5 and 6 [Dragon]
building syntax trees; type systems.
- Code generation: Chapter 8 [Dragon].
Run-time environments Chapter 7 [Dragon]; [Bartlett]
ASSIGNMENTS
- Homework 1 (due: 02/01/08).
Improve the expression calculator from our labs by adding some features of your own.
Submit two versions of your calculator: top-down (recursive-descent) and bottom-up
(using lex & yacc). Send by email a compressed tar file containing all your source files
along with their makefiles. Include a documentation for each program.
- Project 1 (due: 02/14/08).
Submit your yacc parser (+ lex scanner) for the Pascal language (from handout).
Adjust the grammar so that nested subprograms and if-then statements are allowed,
and array expressions can appear as R-values. Also, adjust the last rule for sign and
let the scanner handle two types of comment styles (with (* .. *) and with { .. }).
Use the test files from /afs/cu/class/cs445/public/test-files.
- Homework 2 (due: 03/25/08).
Implement a different scheme for the symbol table. Either modify the hashing scheme
or use a different data structure altogether. Examples of alternatives include:
- Perfect hashing scheme (see "Introduction to Algorithms", 2ed, Cormen et al.).
- Different bucket data structure (such as binary search trees).
- Self-balancing data structures (such as AVL trees or red-black trees).
Compare the performance of our basic hashing scheme with your alternative scheme on
a large corpus of words. Bonus: Explore the use of a different programming language
(say, C++) for this assignment and investigate the potential of making it compatible
with lex & yacc.
- Project 2 (due: 03/14/08).
Submit your compiler with the symbol table and syntax trees implemented. Your compiler
should enforce basic scoping rules on variables. Your compiler should also output the
syntax tree for (most of) the souce program.
- Project 3 (due: 04/11/08).
Submit your compiler with a complete semantic analysis implemented. Your compiler
should enforce basic semantic and typing rules. For this, you need to implement a
type module and also (optionally) an error handler module.
Semantic check-list
- Project 4 (due: Finals week).
Submit your compiler with a basic code generator for Intel assembly implemented.
Your compiler should handle input/output (read/write), basic expressions and statements
involving integers, and procedure/function calls. It should also properly handle access
to non-local names.
DRAGON PROJECT
- Arrange an online demo of your compiler (during Finals week; see schedule outside my office).
- Send a compressed tar source of your compiler by email.
- Submit a hardcopy of your compiler source files and documentation.
The documentation should include a design document, a user manual, a testing report,
and a status report (limitations, caveats, or bugs).