RU beehive logo ITEC dept promo banner
ITEC 380
2018fall
ibarland

grammars, cont.

Review Grammars

Note that for one given tree, there might be many derivations (just having with productions in different orders). The derivations are easier to type than a tree (auto-grading!); we'll compromise by always asking for the “leftmost derivations”: always expand the leftmost nonterminal

Recall: We write “→” for entire grammar rules, and “⇒” for individual derivations.

Examples:

New terms:

Some things we may have noticed, in the reading:

It is helpful to actually divide reading-a-program's-input into three phases:

Discuss the above w.r.t. XML.

In Programming Languages and Interpretation §1.4, Shriram Krishnamurthi argues that what I here call “parsing” can benefit from being decomposed into two smaller phases: reading the tokenized input into a simple hierarchy (perhaps just based on matched parentheses), and then full parsing/validating of the grammar; this second part becomes easier since the input is already in structure-rich form that is easy for the parser/validator to work with. Indeed, using LISP/scheme/racket's read to create a basic tree (S-Expression) is both highly-reusable in many contexts, and significantly helpful in those contexts — the mark of a good decomposition of tasks.

Note that the grammar for Java doesn't quite match "does the program compile" — the CFG doesn't catch using-an-undefined-variable, for example. (CFG grammars cannot capture this!) This fact is usually ignored, and most everybody will say "yes there is a CFG for Java," even though the set of legal (compilable) programs is determined

  S → aS
  S → b

A Language: a set of strings. (e.g. the set of all valid Java programs — an infinite set). What language does the above grammar define?

The language defined by a grammar: the set of all strings derivable. To show that a string is in the grammar, show me a derivation (possibly a left-most derivation); equivalently show me a parse-tree.

To specify a grammar, you officially need four things:

Most commonly seen: context-free grammars, “CFG”: every rule's left-hand-side is a single non-terminal.
BNF: same as CFG, but use "::=" instead of →, and "|".

regular grammar: each right-hand-side has at most one non-terminal. (equivalent to regular expresssions, in expressibility: A langauge is generated by some regular grammar if, and only if, there is a regular expression that matches exactly the strings in a language. We'll say "the language is regular", w/o committing ourselves to any particular grammar or reg.exp. (Note that these are regular expressions without back-references!)

A grammar for

 
<integer> →  <digit><integer> | <digit>
<digit> →  0 | 1 | 2 | 3 ... | 9

<ident> →  <letter>|<letter><letterOrDigitSeq>

<letterOrDigitSeq> →  <letterOrDigit>|<letterOrDigit><letterOrDigitSeq>

<letterOrDigit> →  <letter> | <digit>

<letter> →  a | b | c ...
For comparison: java syntax: numbers

EBNF: Allow shorthands in BNF: For a non-terminal A,

Note that while parentheses are used for grouping in regular expressions, but usally not used in EBNF. (How do we get around that?)

Example: Write a simple grammar that matches arithmetic-expressions:

3 + 7
143
3 + 7 + 99
29 + 8 * 3
Imagine that you already have a nonterminal “Num” that matches numbers. Now, give me a parse tree of 29 + 8 * 3, using your grammar. … did anybody come up with a second, different tree? If so, which one is right?

We say that a grammar is ambiguous iff one string has two different (legal) parse trees. We don't like ambiguity. There are several solutions:

Task: Can you make a grammar Ada's int-literals, which allow underscores (similar to Java 8), except that: You can't start or end with an underscore, and can't have two in a row.

Can you make a grammar for XML?


1 This spec seems not to mention signs. That's because Java considers + and - to be unary operators, and not part of the int-literal itself.      
2 I combined those words at random just now. If they mean something, let me know!      
3 For a truly grand view, you can think about an “überparse” function that accepts a string and a grammar. Indeed, there are (curried versions of) this: parser-generators which take in a grammar and return a parse function, e.g. Bison. See also: compiler-compilers like Yacc.      

logo for creative commons by-attribution license
This page licensed CC-BY 4.0 Ian Barland
Page last generated
Please mail any suggestions
(incl. typos, broken links)
to ibarlandradford.edu
Rendered by Racket.