ITEC 460 - Chapter 1 Notes
Why Study Compilers
- Additional reason 1: fun
- Additional reason 2: distinguish from other grads
Our Project
- JavaCC - Generate Lexer + LL Parser + ...
- Implementation Language: Java
- Target Language: Java Subset
- Types: int, bool, arrays, classes
- Assignment: Variables and expressions
- Classes - Data structures only - no methods
- Methods (outside classes)
- If, While, Do, For
- I/O: int = read(); print(int i); println();
- Recursion
- Extensions:
- Memory Management: Heaps and Garbage Collection
- Methods in classes
- Inheritance
- Access Control
- Overloading
- Recursive Classes
Compiler Structure
- vs Sebesta
- IC = Asm Tree = Abstract Asm Language
- IC Generator + Semantic Analyzer = Type Checker + Asm Tree Generator
- Optimization? Part of Asm Tree Generator and Code Generator
- Symbol Table? Implied
- Type checking and Linking: separate and independent
compilation
- Modularized:
- Divided into Front End and Back End
- Easy to add new language or new target architecture
Lexer
- Input: stream of characters
- Output: stream of tokens
- Each call to get returns a new token
- Ignores white space and comments
- Large proportion of compiler's time
- Regular expressions → JavaCC → Lexer
Parser
- Input: stream of tokens
- Output: abstract parse tree
- Checks for basic syntax errors
- Determines program structure
- Context Free Grammar (annotated) → JavaCC → Lexer
Semantic Analyzer
- Checks additional errors
- eg type checking, define before use, valid access to structured types
- Traverses abstract parse tree, using visitor pattern
- Heavily uses the symbol table
Intermediate Code Generation - Assembly Tree Generator
- Assembly language for idealized machine
- Represented as a tree: tree to tree transformation
- AST → Abstract Asm → Asm is easier than AST → Asm
- Less distance between levels
- Optimization easier on Abstract ASM
- Also sets up call frames
Assembly Code Generationn
- Output is code for target architecture
- Also does target specific optimization
- Uses tiling:
- cover tree with tiles (ie groups of nodes)
- output code sequence for each tile
Machine Code Generation and Linking
- Unix assembler - as
- Use gcc to link to C libraries