RU beehive logo ITEC dept promo banner
ITEC 380
2017fall
ibarland

homelecturesrecipeexamshwsD2Lbreeze (snow day; distance)

grammars-intro
intro grammars/BNF

video: overview-before-grammars (2m22s)

A quick road-map and some explanation:

Reading: 
         Scott §§2.1.  (Read, then re-read closely.)
         
         
Let's play the gramma  game! 
Can you turn the string “S” into the string “hotdog hit android .”?
Rules: you can take as many steps as you like, where a step is one of: 1a. “S” can be replaced with “N V .” 1b. “S” can be replaced with “N V N .” 2a-h. “N” can be replaced with “cat” or “ufo” or “apple” or “fox” or “dog” or “hotdog” or “android” or “itec” 3a-g. “V” can be replaced with with “bite” or “run” or “jump” or “hit” or “love” or “spin” or “hear” Write your answer as each string, with a “⇒” between them. (You can also include the exact rule# you used too, if you like). video: grammars-intro (26m41s)

That's a pretty crude sentence we generated. Can we tweak (improve) these rules so that: (i) a noun can be optionally preceded by “a”, “an”, or “the”, and (ii) a verb can optionally have an “ed” added to the end? (We won't enforce the right articles; “a android” will be allowed.) Please write down your revised rules. (It's tedious to keep writing “can be replaced with”; you can just write “→” instead, as an abbreviation. …In fact, I'll require you to use the abbreviation, since it's tedious to keep reading that phrase. Conciseness can be clarity!) Likewise, let's abbreviate “or” with “|”. (Btw, can we write all our rules w/o needing to say “or”? It's already a shortcut — a bit of “syntactic sugar” to make it easier for us to write our exact rules!) Specifying Languages: syntax vs semantics Syntax: a grammar G1 (mongo the caveman's grammar): S → N V . rule 1 S → N V N . rule 2 N → cat | ufo | apple | fox | dog | hotdog | android | itec rules 3a-h V → bite | run | jump | hit | love | spin | hear rules 4a-g What is a sentence we can derive from S? What's one that we can't? G1 is a bit mongo-ish; how to add ... articles, plurals, and maybe some verb conjugation? Here's G2: S → NP CV NP . rule 1 S → NP CV . rule 2 NP → ART N rule 3 ART → a | an | the rules 4a-c CV → V s | V ed rules 5a-b N → cat | ufo | apple | fox | dog | hotdog | android | mongo | itec rules 6a-i V → bite | run | jump | hit | love | spin | hear rules 7a-g Example: S ⇒ NP CV NP . by rule 1 ⇒ ART N CV NP . by rule 3 ⇒ the N CV NP . by rule 4c ⇒ the N V ed NP . by rule 5b ⇒ the N hit ed NP . by rule 7d ⇒ the hotdog hited NP . by rule 6f ⇒ the hotdog hited ART N . by rule 3 ⇒ the hotdog hited a N . by rule 4a ⇒ the hotdog hited a android. by rule 6g
Derivations also correspond to syntax-trees: video: syntax-tree-hotdog-hit-android (6m25s)


Group exercise: video: fieldDecl-example (15m21s)

What is a grammar for generating… a Java field-declaration? (Let's make a few test-cases first!)
How about an entire Java class-declaration? [not in video -- in lecture, instead]

Recursive Rules

video: grammar-recur-ADJ-phrase (7m16s)

G3, a grammar that can generate infinite # of strings: S → NP CV NP . rule 1 S → NP CV . rule 2 NP → ART AP rule 3 ART → a | an | the rules 4a-c AP → N rule 5 AP → ADJ AP <---- note the circular def'n! rule 6 CV → V s | V ed rules 7a-b N → cat | ufo | apple | fox | dog | hotdog | android | mongo | itec rules 8a-i V → bite | run | jump | hit | love | spin | hear rules 9a-g ADJ → happy | pretty | nice | small | smelly rules 10a-e S ⇒ NP CV NP . ⇒* the cat bites NP . took multiple steps here — hence the “⇒*”) ⇒ the cat bites ART AP . ⇒ the cat bites the AP . ⇒ the cat bites the ADJ AP . by rule 6 ⇒ the cat bites the nice AP . ⇒ the cat bites the nice ADJ AP . by rule 6 ⇒ the cat bites the nice small AP . ⇒ the cat bites the nice small ADJ AP . by rule 6 ⇒ the cat bites the nice small smelly AP . ⇒ the cat bites the nice small smelly N . ⇒ the cat bites the nice small smelly android . [think "Sentence", "Verb Phrase", "Conjugated Verb", etc.] [Technically, I'm leaving out spaces from the grammar. How to fix?] Is this string in the language?: a cat bites a apple. YES (we can give a derivation for it) but note that a cat bites a apples. NO (...proof?...take ITEC420!)

making a grammar for Java identifiers

Here're some examples of what we want to be able to generate, using this grammar:
  n
  numStudents
  xyz94
  a9bC_today
  _yeah
  _
video: recursive-rule-java-id (16m15s)

And the rules we come up with:
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9            (rule 1)
JavaLetter → _ | A | B | ... | Z | a | b | ... | z       (rule 2)
DigitOrJavaLetter → Digit | JavaLetter                   (rules 3a,3b)

JavaId → JavaLetter DigitsOrJavaLetters            (rule 4)

DigitsOrJavaLetters → ε                            (rule 5a)
DigitsOrJavaLetters → DigitOrJavaLetter  DigitsOrJavaLetters   (rule 5b)
Finally, we test our answer by deriving some of our examples from earlier:
JavaId ⇒ JavaLetter DigitsOrJavaLetters     (by rule 4)
       ⇒ n DigitsOrJavaLetters              (by rule 1n)
       ⇒ n ε                                (rule 5a)
       = n


JavaId ⇒ JavaLetter DigitsOrJavaLetters                (by rule 4)
       ⇒ n DigitsOrJavaLetters                         (by rule 1n)
       ⇒ n DigitOrJavaLetter DigitsOrJavaLetters       (by rule 5b)
       ⇒ n Digit DigitsOrJavaLetters                   (by rule 3a)
       ⇒ n 5 DigitsOrJavaLetters                       (by rule 2e)
       ⇒ n 5 DigitOrJavaLetter DigitsOrJavaLetters     (by rule 5b)
       ⇒ n 5 JavaLetter DigitsOrJavaLetters            (by rule 3b)
       ⇒ n 5 _ DigitsOrJavaLetters                     (by rule 1_)
       ⇒ n 5 _ DigitOrJavaLetter DigitsOrJavaLetters   (by rule 5b)
       ⇒ n 5 _ JavaLetter DigitsOrJavaLetters          (by rule 3b)
       ⇒ n 5 _ m DigitsOrJavaLetters                   (by rule 1m)
       ⇒ n 5 _ m ε                                     (by rule 5a)

Reflecting on grammars

Again I'll empahsize: grammars are all about syntax, and explicitly avoid addressing any semantics. Linguist Naom Chomsky uses the example “Colorless green ideas sleep furiously.” as an example of a sentence which is syntactically perfect English, but semantically nonsensical.

Some vocabulary

Some vocabulary terms1:

nonterminal
A “replaceable” symbol in the grammar; in context-free grammars this means a symbol on the left-hand-side of “→”.
terminal
A “non-replaceable” symbol in the grammar; in context-free grammars this means a symbol that is not on the left-hand-side of any “→”.
denotes a grammar rule: in the string you're working with, you're allowed to replace what's on the the left-hand-side of a rule with the characters on the right-hand-side (splicing it into your string).
|
Or. It is just a short-hand: “Z → α | β” means there are actually two separate rules, “Z → α” and “Z → β”.
“derives in one step” — if γ and δ are strings, then “γδ” means that you can take γ, replace one of its non-terminals with the right-hand-side of one of the rules, and the result is the string δ.
⇒*
“derives in many steps” — if γ1 and γn are strings, then “γ1 ⇒* γn” means that there is a sequence2 of derivations γ1γ2γ3 ⇒ … ⇒ γn.
derivation
An entire sequence of “⇒” steps, showing that one string derives another.
leftmost derivation
A derivation where each step happens to replace the leftmost nonterminal.
parsing
reading in a stream of characters/tokens, and building the corresponding parse tree (or, determine that the input string is not derivable from the grammar).

Later we'll talk about code to do parsing (at least, recursive descent parsing); for the moment we'll just do it instinctively.

Our textbook also uses the terms “token” (a nonterminal), and “lexeme” (a terminal string). Example: “DoubleLiteral” and “BinaryOp” are a tokens, while “+173.4” and “&&” are lexemes. (I've always heard 'token' used for what the book calls lexemes, myself.)

We'll not worry about the terms “concrete syntax tree” (showing every single step of the grammar rules) vs. “abstract syntax tree” (eliding internal branches that have only one child, like skipping the nodes NP ⇒ N ⇒ cat and just writing NP ⇒ cat). Later I'll subtly start using the latter, since those extra steps don't add anything to the data-representation of a syntax-tree.

As mentioned already, derivations and syntax trees are both essentially-equivalent ways of showing how the syntax of a sentence/program. Derivations have the snag that multiple derivations can all correspond to the same syntax-tree. For example, in the “the hotdog hit an android .” example, we had “NP V NP .” as our first step, at which point we have three different ways we can proceed that still all yield the same result. However, leftmost derivations do correspond uniquely to syntax trees (in CFGs), so that's what I'll ask for in homeworks.

What if we add further rules?
  G4 will be G3, plus the rules:
Exercise: add adverbs and adjectives.
  ADJ → quick | lazy | brown | hot
  ADV → ADJ ly

and changing G3's rules for `S` to be:
  S → NP CV NP . | NP CV . | NP CV NP ADV.

  Self-exercise: 
   - Make a strings that is in G4 (but was not in G3).
   - Make a string that is not in G4.

   
   
Okay, armed with the ability to recur to make "1 or more" or "0 or more", now let's try i
coming up with a grammar for a Java class declaration:

  - example:
  
    class Puppy {
        int numSpots;
        String name = "fido";
        }
  
... To see the tree corresponding to our derivation go to http://mshang.ca/syntree/ and enter:
      [ClassDecl class 
                 [JavaId [Puppy]] 
                 { 
                 [FieldOrMethodDecls [FieldOrMethodDecl [FieldDecl [Type int] [JavaId [numSpots]] ;] ]
                                     [FieldOrMethodDecls [FieldOrMethodDecl [FieldDecl [Type String] [JavaId [name]] [=] "fido" ;]] 
                                     [FieldOrMethodDecls [ε]] ]]
                 }
     ]
What is a grammar for... - declaring a Java identifier (w/o init'ing, but including access modifiers) - a Java identifier? - a Java assignment statement? (we'll leave "expr" not fully defined) See: Java syntax for 120 (which is incomplete and technically wrong in some places), and some of the BNF from the Java Language Definition. Here is another version which includes both grammar and accompanying diagrams. Some limitations of CFG -- see ITEC420 (computability th'y). Also, if we place certain limitations on the grammar productions, (e.g. "Only one non-terminal on the right-hand-side) we end up with "regular grammars", which capture exactly the same thing as regular expressions. (A separate issue, btw: By constraining our grammar we can get nicer guarantees of algorithmic behavior -- parsing etc.)

1Again, I'm only discussing Context-Free Grammars (CFGs); some of these vocabulary terms are more general, and the definition here isn't always the more-general-definition. And these definitions are stressing the intuitive, rather than the formal definition. See your textbook for more formal definitions.      
2 You can also have 0 steps in a sequence-of-derivations, so for any string γ, it's trivially true that γ ⇒* γ.      

homelecturesrecipeexamshwsD2Lbreeze (snow day; distance)


©2017, Ian Barland, Radford University
Last modified 2017.Oct.06 (Fri)
Please mail any suggestions
(incl. typos, broken links)
to ibarlandradford.edu
Rendered by Racket.