|
home—lectures—recipe—exams—hws—D2L—breeze (snow day; distance)
Recall: We write “→” for entire grammar rules, and “⇒” for individual derivations.
Examples:
Some things that arose:
It is helpful to actually divide reading-a-program's-input into three phases:
tokenizing (or "lexing"): a quick pass to chunk/group characters into higher leve. tokens, like identifiers, int-literals, string-literals, etc.
While it is conceivable to omit this step,
having grammar non-terminals of (say) “Digit”
instead of “IntLiteral”,
it would mean that the resulting parse tree
would have
three entire nodes for the literal “
In Programming Languages and Interpretation §1.4, Shriram Krishnamurthi argues that what I here call “parsing” can benefit from being decomposed into two smaller phases: reading the tokenized input into a simple hierarchy (perhaps just based on matched parentheses), and then full parsing/validating of the grammar; this second part becomes easier since the input is already in structure-rich form that is easy for the parser/validator to work with. Indeed, using LISP/scheme/racket's read to create a basic tree (S-Expression) is both highly-reusable in many contexts, and significantly helpful in those contexts — the mark of a good decomposition of tasks.
Note that the grammar for Java doesn't quite match "does the program compile" — the CFG doesn't catch using-an-undefined-variable, for example. (CFG grammars cannot capture this!) This fact is usually ignored, and most everybody will say "yes there is a CFG for Java," even though the set of legal (compilable) programs is determined
New terms:
S → aS S → b
A Language: a set of strings. (e.g. the set of all valid Java programs — an infinite set). What language does the above grammar define?
The language defined by a grammar: the set of all strings derivable. To show that a string is in the grammar, show me a derivation (possibly a left-most derivation); equivalently show me a parse-tree.
To specify a grammar, you officially need four things:
Most commonly seen:
context-free grammars, “CFG”:
every rule's left-hand-side is a single non-terminal.
BNF: same as CFG, but use "::=" instead of →, and "|".
regular grammar: each right-hand-side has at most one non-terminal. (equivalent to regular expresssions, in expressibility: A langauge is generated by some regular grammar if, and only if, there is a regular expression that matches exactly the strings in a language. We'll say "the language is regular", w/o committing ourselves to any particular grammar or reg.exp. (Note that these are regular expressions without back-references!)
A grammar for
<integer> → <digit><integer> | <digit> <digit> → 0 | 1 | 2 | 3 ... | 9 <ident> → <letter>|<letter><letterOrDigitSeq> <letterOrDigitSeq> → <letterOrDigit>|<letterOrDigit><letterOrDigitSeq> <letterOrDigit> → <letter> | <digit> <letter> → a | b | c ...For comparison: java syntax: numbers
EBNF: Allow shorthands in BNF: For a non-terminal A,
Example: Write a simple grammar that matches arithmetic-expressions:
3 + 7 143 3 + 7 + 99 29 + 8 * 3 |
We say that a grammar is ambiguous iff one string has two different (legal) parse trees. We don't like ambiguity. There are several solutions:
Book example: add some additional nonterminals to the grammar so that we can only generate one of those two parse trees. That is, make the grammar more complicated so that the “correct precedence” always auto-happens.
This extra nonterminal is usually called a “Term” and/or a “Factor” in this particular example, and it corresponds to the product of 1-or-more expressions. The grammar is carefully constructed so that + only occurs between entire Terms, so Terms are already parse-trees before + comes along.
See: www.csse.monash.edu.au/~lloyd/tildeProgLang/Grammar/Arith-Exp/
Task: Can you make a grammar Ada's int-literals, which allow underscores (similar to Java 8), except that: You can't start or end with an underscore, and can't have two in a row.
Can you make a grammar for XML?
home—lectures—recipe—exams—hws—D2L—breeze (snow day; distance)
©2016, Ian Barland, Radford University Last modified 2016.Dec.05 (Mon) |
Please mail any suggestions (incl. typos, broken links) to ibarlandradford.edu |