ITEC 380 - Program 3

Due date: 11:59:59 p.m. Monday 11/01/10

File name: scan

(Notice that there is no extension on the file)

Submit command: submit itec380-01 scan


Your assignment is to write a scanner for the language defined below. Your scanner should output the tokens found in the input, along with the lexeme for any token whose lexeme is not predefined (i.e., lexemes are needed for INT_LIT and IDENT).

Use Python to implement your scanner. When submitted, your scanner should run as a script. Your scanner is to read from the file that is specified as the command line argument when it is run, and it should write to standard output. Your scanner must have a main loop that repeatedly calls get and put until end of file: get returns the next token and its associated lexeme, and put outputs them. As usual, your program should be submitted from rucs2 or one of its clients. Make sure you follow good style. Call your program scan.
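
One possible shape for the main loop is sketched below. It is only an illustration, not a required design: the name run, the EOF sentinel, and the choice to pass get in as a function are all assumptions made here so that the loop can be shown on its own. The put function follows the output format of the sample run (token name, then a single space and the lexeme only when there is one).

```python
import sys

# Hypothetical sentinel that get returns at end of file.
# It is not one of the assignment's tokens and is never printed.
EOF = "EOF"

def put(token, lexeme, out=sys.stdout):
    """Write one token per line; the lexeme follows only when there is one."""
    if lexeme is None:
        out.write(token + "\n")
    else:
        out.write(token + " " + lexeme + "\n")

def run(get, out=sys.stdout):
    """Main loop: call get, then put, until get reports end of file.

    get is assumed to be a function of no arguments that returns a
    (token, lexeme) pair, with lexeme set to None for predefined tokens.
    """
    token, lexeme = get()
    while token != EOF:
        put(token, lexeme, out)
        token, lexeme = get()
```

For example, if get were to return ("IDENT", "i"), then ("ASSIGN", None), then (EOF, None), the loop would write the two lines "IDENT i" and "ASSIGN" and stop.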

Sample Run

If this program were in the file sample1:
i:=23+jjj * kkk;
label loopTop ;

   if i>10 then   
      goto   loopEnd;
then a sample run would look like this:
>scan sample1
IDENT i
ASSIGN
INT_LIT 23
PLUS_OP
IDENT jjj
MUL_OP
IDENT kkk
SEMI
LABEL
IDENT loopTop
SEMI
IF
IDENT i
GT
INT_LIT 10
THEN
GOTO
IDENT loopEnd
SEMI
Note that if the lexeme is to be printed, then there is to be a single space between the token name and the lexeme. The line should have no trailing spaces. Your program is to have no other output and must follow this format exactly.

Tokens

Tokens in the language include the following:
  1. INT_LIT: 1 or more digits. Digits are any of 0 1 2 3 4 5 6 7 8 9.
  2. IDENT: a letter followed by 0 or more letters or digits. Letters are the usual set: a..z and A..Z.
  3. IF: "if"
  4. THEN: "then"
  5. DUMP_TABLE: "dumpTable"
  6. PRINT: "print"
  7. LABEL: "label"
  8. GOTO: "goto"
  9. ASSIGN: ":="
  10. LEFT_PAREN: "("
  11. RIGHT_PAREN: ")"
  12. PLUS_OP: "+"
  13. MINUS_OP: "-"
  14. MUL_OP: "*"
  15. DIV_OP: "/"
  16. LT: "<"
  17. LE: "<="
  18. GT: ">"
  19. GE: ">="
  20. EQ: "="
  21. NE: "/="
  22. SEMI: ";"
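
The token list above can be recognized character by character, without any pattern-matching library. The sketch below is one way to do it and rests on assumptions that the list itself does not state: reserved words are treated as case sensitive, and when two operators share a first character (for example "<" and "<=") the longer one is preferred. The names scan_token, KEYWORDS, TWO_CHAR, and ONE_CHAR are made up for this illustration.

```python
# Reserved words map to their own token names; any other word is an IDENT.
KEYWORDS = {
    "if": "IF", "then": "THEN", "dumpTable": "DUMP_TABLE",
    "print": "PRINT", "label": "LABEL", "goto": "GOTO",
}

# Two-character operators are tried before one-character operators.
TWO_CHAR = {":=": "ASSIGN", "<=": "LE", ">=": "GE", "/=": "NE"}
ONE_CHAR = {
    "(": "LEFT_PAREN", ")": "RIGHT_PAREN", "+": "PLUS_OP", "-": "MINUS_OP",
    "*": "MUL_OP", "/": "DIV_OP", "<": "LT", ">": "GT", "=": "EQ", ";": "SEMI",
}

LETTERS = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
DIGITS = "0123456789"

def scan_token(text, pos):
    """Scan one token starting at text[pos], which must not be white space.

    Returns (token, lexeme, next_pos). The lexeme is None for predefined
    tokens. The input is assumed to be valid, so there is no error handling.
    """
    ch = text[pos]
    end = pos + 1
    if ch in DIGITS:
        # INT_LIT: consume the whole run of digits.
        while end < len(text) and text[end] in DIGITS:
            end += 1
        return "INT_LIT", text[pos:end], end
    if ch in LETTERS:
        # A letter followed by letters or digits: reserved word or IDENT.
        while end < len(text) and text[end] in LETTERS + DIGITS:
            end += 1
        word = text[pos:end]
        if word in KEYWORDS:
            return KEYWORDS[word], None, end
        return "IDENT", word, end
    # Operators and punctuation: prefer the two-character form.
    pair = text[pos:pos + 2]
    if pair in TWO_CHAR:
        return TWO_CHAR[pair], None, pos + 2
    return ONE_CHAR[ch], None, pos + 1
```

With this sketch, scanning "i:=23" from position 0 gives IDENT with lexeme i, then ASSIGN, then INT_LIT with lexeme 23. A word such as then2 comes out as an IDENT, because the whole word is collected before it is compared with the reserved words.
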
Other points:
  1. Note that you are NOT to use Python pattern matching to do this assignment.
  2. Separators: Each pair of adjacent identifiers and/or integer literals must be separated by white space.
  3. White space: You can assume that white space consists of space, tab, and newline characters (and, when using Windows, carriage return characters).
  4. You can assume that your input program is valid.
  5. If the input file does not exist, your program should output an error message and halt.
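
Points 3 and 5 above might be handled along the following lines. This is a sketch under assumptions: the function names are invented for the example, and the wording of the error message and the exit status of 1 are choices made here, since the assignment only asks for an error message and a halt.

```python
import sys

# The white space characters named in point 3.
WHITE_SPACE = " \t\n\r"

def skip_white_space(text, pos):
    """Return the index of the first non-white-space character at or after pos.

    Returns len(text) when only white space remains.
    """
    while pos < len(text) and text[pos] in WHITE_SPACE:
        pos += 1
    return pos

def read_source(path):
    """Return the contents of the input file.

    If the file cannot be opened, print an error message and halt.
    The message text and the exit status are assumptions.
    """
    try:
        with open(path) as source:
            return source.read()
    except IOError:
        print("scan: cannot open " + path)
        sys.exit(1)
```

Reading the whole file into one string keeps the end-of-file test simple: once skip_white_space returns len(text), there are no more tokens.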
