|
video: overview-before-grammars (2m22s)
A quick road-map and some explanation:
Reading: Scott §§2.1. (Read, then re-read closely.) Let's play the gramma game!Can you turn the string “S” into the string “hotdog hit android .”?Rules: you can take as many steps as you like, where a step is one of: 1a. “S” can be replaced with “N V .” 1b. “S” can be replaced with “N V N .” 2a-h. “N” can be replaced with “cat” or “ufo” or “apple” or “fox” or “dog” or “hotdog” or “android” or “itec” 3a-g. “V” can be replaced with with “bite” or “run” or “jump” or “hit” or “love” or “spin” or “hear” Write your answer as each string, with a “⇒” between them. (You can also include the exact rule# you used too, if you like). video: grammars-intro (26m41s)
That's a pretty crude sentence we generated. Can we tweak (improve) these rules so that: (i) a noun can be optionally preceded by “a”, “an”, or “the”, and (ii) a verb can optionally have an “ed” added to the end? (We won't enforce the right articles; “a android” will be allowed.) Please write down your revised rules. (It's tedious to keep writing “can be replaced with”; you can just write “→” instead, as an abbreviation. …In fact, I'll require you to use the abbreviation, since it's tedious to keep reading that phrase. Conciseness can be clarity!) Likewise, let's abbreviate “or” with “|”. (Btw, can we write all our rules w/o needing to say “or”? It's already a shortcut — a bit of “syntactic sugar” to make it easier for us to write our exact rules!) Specifying Languages: syntax vs semantics Syntax: a grammar G1 (mongo the caveman's grammar): S → N V . rule 1 S → N V N . rule 2 N → cat | ufo | apple | fox | dog | hotdog | android | itec rules 3a-h V → bite | run | jump | hit | love | spin | hear rules 4a-g What is a sentence we can derive from S? What's one that we can't? G1 is a bit mongo-ish; how to add ... articles, plurals, and maybe some verb conjugation? Here's G2: S → NP CV NP . rule 1 S → NP CV . rule 2 NP → ART N rule 3 ART → a | an | the rules 4a-c CV → V s | V ed rules 5a-b N → cat | ufo | apple | fox | dog | hotdog | android | mongo | itec rules 6a-i V → bite | run | jump | hit | love | spin | hear rules 7a-g Example: S ⇒ NP CV NP . by rule 1 ⇒ ART N CV NP . by rule 3 ⇒ the N CV NP . by rule 4c ⇒ the N V ed NP . by rule 5b ⇒ the N hit ed NP . by rule 7d ⇒ the hotdog hited NP . by rule 6f ⇒ the hotdog hited ART N . by rule 3 ⇒ the hotdog hited a N . by rule 4a ⇒ the hotdog hited a android. by rule 6g
Derivations also correspond to syntax-trees: video: syntax-tree-hotdog-hit-android (6m25s)
Group exercise: video: fieldDecl-example (15m21s)
What is a grammar for generating… a Java field-declaration? (Let's make a few test-cases first!)
Include: re-factoring rules to make them cleaner; ε,and the “blahMaybe ” pattern.How about an entire Java class-declaration? [not in video -- in lecture, instead]
Recursive Rules
video: grammar-recur-ADJ-phrase (7m16s)
G3, a grammar that can generate infinite # of strings: S → NP CV NP . rule 1 S → NP CV . rule 2 NP → ART AP rule 3 ART → a | an | the rules 4a-c AP → N rule 5 AP → ADJ AP <---- note the circular def'n! rule 6 CV → V s | V ed rules 7a-b N → cat | ufo | apple | fox | dog | hotdog | android | mongo | itec rules 8a-i V → bite | run | jump | hit | love | spin | hear rules 9a-g ADJ → happy | pretty | nice | small | smelly rules 10a-e S ⇒ NP CV NP . ⇒* the cat bites NP . took multiple steps here — hence the “⇒*”) ⇒ the cat bites ART AP . ⇒ the cat bites the AP . ⇒ the cat bites the ADJ AP . by rule 6 ⇒ the cat bites the nice AP . ⇒ the cat bites the nice ADJ AP . by rule 6 ⇒ the cat bites the nice small AP . ⇒ the cat bites the nice small ADJ AP . by rule 6 ⇒ the cat bites the nice small smelly AP . ⇒ the cat bites the nice small smelly N . ⇒ the cat bites the nice small smelly android . [think "Sentence", "Verb Phrase", "Conjugated Verb", etc.] [Technically, I'm leaving out spaces from the grammar. How to fix?] Is this string in the language?: a cat bites a apple. YES (we can give a derivation for it) but note that a cat bites a apples. NO (...proof?...take ITEC420!)
n numStudents xyz94 a9bC_today _yeah _video: recursive-rule-java-id (16m15s)
Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 (rule 1) JavaLetter → _ | A | B | ... | Z | a | b | ... | z (rule 2) DigitOrJavaLetter → Digit | JavaLetter (rules 3a,3b) JavaId → JavaLetter DigitsOrJavaLetters (rule 4) DigitsOrJavaLetters → ε (rule 5a) DigitsOrJavaLetters → DigitOrJavaLetter DigitsOrJavaLetters (rule 5b)Finally, we test our answer by deriving some of our examples from earlier:
JavaId ⇒ JavaLetter DigitsOrJavaLetters (by rule 4) ⇒ n DigitsOrJavaLetters (by rule 1n) ⇒ n ε (rule 5a) = n JavaId ⇒ JavaLetter DigitsOrJavaLetters (by rule 4) ⇒ n DigitsOrJavaLetters (by rule 1n) ⇒ n DigitOrJavaLetter DigitsOrJavaLetters (by rule 5b) ⇒ n Digit DigitsOrJavaLetters (by rule 3a) ⇒ n 5 DigitsOrJavaLetters (by rule 2e) ⇒ n 5 DigitOrJavaLetter DigitsOrJavaLetters (by rule 5b) ⇒ n 5 JavaLetter DigitsOrJavaLetters (by rule 3b) ⇒ n 5 _ DigitsOrJavaLetters (by rule 1_) ⇒ n 5 _ DigitOrJavaLetter DigitsOrJavaLetters (by rule 5b) ⇒ n 5 _ JavaLetter DigitsOrJavaLetters (by rule 3b) ⇒ n 5 _ m DigitsOrJavaLetters (by rule 1m) ⇒ n 5 _ m ε (by rule 5a)
Again I'll empahsize: grammars are all about syntax, and explicitly avoid addressing any semantics. Linguist Naom Chomsky uses the example “Colorless green ideas sleep furiously.” as an example of a sentence which is syntactically perfect English, but semantically nonsensical.
Some vocabulary terms1:
Later we'll talk about code to do parsing (at least, recursive descent parsing); for the moment we'll just do it instinctively.
We'll not worry about the terms “concrete syntax tree” (showing every single step of the grammar rules) vs. “abstract syntax tree” (eliding internal branches that have only one child, like skipping the nodes NP ⇒ N ⇒ cat and just writing NP ⇒ cat). Later I'll subtly start using the latter, since those extra steps don't add anything to the data-representation of a syntax-tree.
As mentioned already, derivations and syntax trees are both essentially-equivalent ways of showing how the syntax of a sentence/program. Derivations have the snag that multiple derivations can all correspond to the same syntax-tree. For example, in the “the hotdog hit an android .” example, we had “NP V NP .” as our first step, at which point we have three different ways we can proceed that still all yield the same result. However, leftmost derivations do correspond uniquely to syntax trees (in CFGs), so that's what I'll ask for in homeworks.
What if we add further rules? G4 will be G3, plus the rules: Exercise: add adverbs and adjectives. ADJ → quick | lazy | brown | hot ADV → ADJ ly and changing G3's rules for `S` to be: S → NP CV NP . | NP CV . | NP CV NP ADV. Self-exercise: - Make a strings that is in G4 (but was not in G3). - Make a string that is not in G4. Okay, armed with the ability to recur to make "1 or more" or "0 or more", now let's try i coming up with a grammar for a Java class declaration: - example:class Puppy { int numSpots; String name = "fido"; }... To see the tree corresponding to our derivation go to http://mshang.ca/syntree/ and enter:[ClassDecl class [JavaId [Puppy]] { [FieldOrMethodDecls [FieldOrMethodDecl [FieldDecl [Type int] [JavaId [numSpots]] ;] ] [FieldOrMethodDecls [FieldOrMethodDecl [FieldDecl [Type String] [JavaId [name]] [=] "fido" ;]] [FieldOrMethodDecls [ε]] ]] } ]What is a grammar for... - declaring a Java identifier (w/o init'ing, but including access modifiers) - a Java identifier? - a Java assignment statement? (we'll leave "expr" not fully defined) See: Java syntax for 120 (which is incomplete and technically wrong in some places), and some of the BNF from the Java Language Definition. Here is another version which includes both grammar and accompanying diagrams. Some limitations of CFG -- see ITEC420 (computability th'y). Also, if we place certain limitations on the grammar productions, (e.g. "Only one non-terminal on the right-hand-side) we end up with "regular grammars", which capture exactly the same thing as regular expressions. (A separate issue, btw: By constraining our grammar we can get nicer guarantees of algorithmic behavior -- parsing etc.)
This page licensed CC-BY 4.0 Ian Barland Page last generated | Please mail any suggestions (incl. typos, broken links) to ibarlandradford.edu |