RU beehive logo ITEC dept promo banner
ITEC 380
2016fall
ibarland

homelecturesrecipeexamshwsD2Lbreeze (snow day; distance)

grammar-cont
grammars, cont.

Review Grammars

Note that for one given tree, there might be many derivations (just having with productions in different orders). The derivations are easier to type than a tree (auto-grading!); we'll compromise by always asking for the “leftmost derivations”: always expand the leftmost nonterminal

Recall: We write “→” for entire grammar rules, and “⇒” for individual derivations.

Examples:

Some things that arose:

It is helpful to actually divide reading-a-program's-input into three phases:

Discuss the above w.r.t. XML.

In Programming Languages and Interpretation §1.4, Shriram Krishnamurthi argues that what I here call “parsing” can benefit from being decomposed into two smaller phases: reading the tokenized input into a simple hierarchy (perhaps just based on matched parentheses), and then full parsing/validating of the grammar; this second part becomes easier since the input is already in structure-rich form that is easy for the parser/validator to work with. Indeed, using LISP/scheme/racket's read to create a basic tree (S-Expression) is both highly-reusable in many contexts, and significantly helpful in those contexts — the mark of a good decomposition of tasks.

Note that the grammar for Java doesn't quite match "does the program compile" — the CFG doesn't catch using-an-undefined-variable, for example. (CFG grammars cannot capture this!) This fact is usually ignored, and most everybody will say "yes there is a CFG for Java," even though the set of legal (compilable) programs is determined

New terms:

  S → aS
  S → b

A Language: a set of strings. (e.g. the set of all valid Java programs — an infinite set). What language does the above grammar define?

The language defined by a grammar: the set of all strings derivable. To show that a string is in the grammar, show me a derivation (possibly a left-most derivation); equivalently show me a parse-tree.

To specify a grammar, you officially need four things:

Most commonly seen: context-free grammars, “CFG”: every rule's left-hand-side is a single non-terminal.
BNF: same as CFG, but use "::=" instead of →, and "|".

regular grammar: each right-hand-side has at most one non-terminal. (equivalent to regular expresssions, in expressibility: A langauge is generated by some regular grammar if, and only if, there is a regular expression that matches exactly the strings in a language. We'll say "the language is regular", w/o committing ourselves to any particular grammar or reg.exp. (Note that these are regular expressions without back-references!)

A grammar for

 
<integer> →  <digit><integer> | <digit>
<digit> →  0 | 1 | 2 | 3 ... | 9

<ident> →  <letter>|<letter><letterOrDigitSeq>

<letterOrDigitSeq> →  <letterOrDigit>|<letterOrDigit><letterOrDigitSeq>

<letterOrDigit> →  <letter> | <digit>

<letter> →  a | b | c ...
For comparison: java syntax: numbers

EBNF: Allow shorthands in BNF: For a non-terminal A,

Note that while parentheses are used for grouping in regular expressions, but usally not used in EBNF. (How do we get around that?)

Example: Write a simple grammar that matches arithmetic-expressions:

3 + 7
143
3 + 7 + 99
29 + 8 * 3
Imagine that you already have a nonterminal “Num” that matches numbers. Now, give me a parse tree of 29 + 8 * 3, using your grammar. … did anybody come up with a second, different tree? If so, which one is right?

We say that a grammar is ambiguous iff one string has two different (legal) parse trees. We don't like ambiguity. There are several solutions:

Task: Can you make a grammar Ada's int-literals, which allow underscores (similar to Java 8), except that: You can't start or end with an underscore, and can't have two in a row.

Can you make a grammar for XML?


1This spec seems not to mention signs. That's because Java considers + and - to be unary operators, and not part of the int-literal itself.      
2 For a truly grand view, you can think about an “überparse” function that accepts a string and a grammar. Indeed, there are (curried versions of) this: parser-generators which take in a grammar and return a parse function, e.g. Bison. See also: compiler-compilers like Yacc.      

homelecturesrecipeexamshwsD2Lbreeze (snow day; distance)


©2016, Ian Barland, Radford University
Last modified 2016.Dec.05 (Mon)
Please mail any suggestions
(incl. typos, broken links)
to ibarlandradford.edu
Rendered by Racket.