Perl Regular Expressions



      SKIP THIS: REGULAR EXPRESSIONS: RE
	    =~ for pattern matching (see PATTERNS below)
     A regular expression (RE) is an expression that describes a string or a set of strings
       - Compare w/ arithmetic expr: 5 and 3+2 both describe 5
       - Compare w/ boolean expr: false and 3 < 2 both describe false
       - Both have symbols (eg 5, false) and operators (eg + <)

     For REs,
        - the building blocks are characters, each of which describes itself
	- a concatenation of characters describes a string
	- some characters have special meaning, i.e., the basic operators | * ?
	  RUCS describes RUCS
	  RU CS describes RU CS  ie space is a character
	  | represents a choice  
	     R|U|C|S describes one character, either R or U or C or S
	     rucs|rucs2 describes one string, either rucs or rucs2
	  * represents 0 or more repetitions
	     r* describes empty string or r or rr or rrr or ...
	     ru*cs describes rcs, rucs, ruucs, ruuucs, ...
	  ? represents optional
	     rucs2? describes rucs and rucs2
	     rucs?2 describes rucs2 and ruc2
	       
	  Precedence, from high to low: * and ?, then concatenation, then choice |
	     ru|c?s describes ru or cs or s, since it groups as (ru)|(c?s)
	     
	  () can be used to group
	     ru(cs2)? describes ru and rucs2
	     ((login|logout) (rucs|rucs2|ruacad) now!)* describes the empty string,
	        login rucs now!, logout ruacad now!login rucs2 now!, ...
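The basic operators can be tried out directly in Perl. This is a minimal sketch: the helper describes() is hypothetical, and it wraps the RE as ^(?:RE)$ so Perl checks whether the RE describes the whole string, not just whether the string contains a match. Containment is covered under PATTERNS.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the RE describes the WHOLE string.
# ^(?: ... )$ anchors both ends, so "contains a match" is not enough.
sub describes {
    my ($re, $s) = @_;
    return $s =~ /^(?:$re)$/ ? 1 : 0;
}

my @cases = (
    [ 'R|U|C|S',    'U'      ],  # choice: one character out of four
    [ 'rucs|rucs2', 'rucs2'  ],  # choice between two strings
    [ 'ru*cs',      'ruuucs' ],  # * : zero or more u's
    [ 'ru*cs',      'rcs'    ],  # ... zero u's also counts
    [ 'rucs?2',     'ruc2'   ],  # ? : the s is optional
    [ 'ru|c?s',     'rus'    ],  # means (ru)|(c?s), so rus is NOT described
    [ 'ru|c?s',     'cs'     ],  # ... but cs is
    [ 'ru(cs2)?',   'ru'     ],  # () makes ? apply to all of cs2
);

for my $case (@cases) {
    my ($re, $s) = @$case;
    printf "%-12s %-8s %s\n", $re, $s, describes($re, $s) ? 'yes' : 'no';
}
```

Adding a row such as [ 'ru|c?s', 's' ] is a quick way to confirm how precedence groups ru|c?s.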

     PATTERNS: Put an RE inside slashes to define a pattern: /RE/
        - a string that contains a substring described by the RE is said
	       to MATCH the pattern

        - use the operator =~ to specify which string to match against a pattern

	     if ($name =~ /rucs2?/) # true if $name contains string rucs or rucs2 

             If no string is given with =~, the match is done on $_ :

	       if (/rucs2?/)  # true if $_ contains string rucs or rucs2 

               while ($line = <STDIN>) {print $line if $line =~ /hello/}

               while (<>){print if /hello/}  # Same idea using $_; <> also reads named files

               while (<>){print unless /hello/}  # prints lines WITHOUT hello

        - /RE/ is a shortcut for m/RE/
        - Other delimiters can be used: m{RE}, m'RE'
        - m'RE' does not interpolate variables (no string substitution)
        - $& holds the text that was matched:
                $line = "login to rucs today!";
                $line =~ m/rucs2?.*d/;    # . matches any character
                print $&;                 # prints rucs tod
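The three ways of matching above can be combined in one small script: =~ with an explicit string, the default $_, and $& for the matched text. This is a sketch under stated assumptions. The sub names lines_with_hello and matched_text are hypothetical, and the lines come from an array rather than standard input so the script runs without any input.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the lines that contain "hello".
sub lines_with_hello {
    my @found;
    for (@_) {                       # each line is aliased to $_
        push @found, $_ if /hello/;  # no =~, so the match is done on $_
    }
    return @found;
}

# Hypothetical helper: return the text matched, via $&, or undef.
sub matched_text {
    my ($line) = @_;
    return $line =~ m/rucs2?.*d/ ? $& : undef;   # . matches any character
}

my $name = 'user on rucs2';
print "name matches\n" if $name =~ /rucs2?/;     # explicit string with =~

print lines_with_hello("hello world\n", "goodbye\n", "say hello\n");
print matched_text('login to rucs today!'), "\n";   # rucs tod
```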

	     
     SHORTCUTS: 
        use [rucs] for r|u|c|s
        use [a-z] for a|b|c|d|...
	\d for any digit
	\s for any whitespace character: space, newline, tab, carriage return, formfeed
	\w for any word character: letter, digit, underscore
	\D, \S, \W for not a digit, not whitespace, not a word character

	use x+ for xx*, i.e., one or more x's. What does \w+ describe?

	use ^ and $ for beginning and end of line: /^Hello/ matches only
	lines that begin with Hello, /there$/ only lines that end with there
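Here is a sketch that runs each shortcut against a sample string and reports the first text it matches. The helper first_match() and the sample strings are made up for illustration. The patterns are passed as single-quoted strings so the backslashes reach the regex engine unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: the first text in $s matched by $re, or undef.
sub first_match {
    my ($re, $s) = @_;
    return $s =~ /$re/ ? $& : undef;
}

my @examples = (
    [ '[rucs]',  'x-u-z'       ],  # one of r, u, c, s
    [ '[a-z]+',  'ABC def'     ],  # one or more lowercase letters
    [ '\d\d',    'room 42'     ],  # two digits in a row
    [ '\w+',     '  user_01!'  ],  # one or more word characters
    [ '\s',      "tab\there"   ],  # a whitespace character (a tab here)
    [ '^Hello',  'Hello there' ],  # Hello at the beginning
    [ '^Hello',  'Say Hello'   ],  # no match: Hello is not at the beginning
    [ 'there$',  'Hello there' ],  # there at the end
);

for my $ex (@examples) {
    my ($re, $s) = @$ex;
    my $m = first_match($re, $s);
    if (defined $m) {
        print "$re matched '$m'\n";
    } else {
        print "$re: no match in '$s'\n";
    }
}
```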

	REPETITION: 
	   use x{3} for xxx
	   use x{3,5} for xxx or xxxx or xxxxx
	   use x{3,} for xxx or xxxx or xxxxx or xxxxxx ...
	   what are x{0,}, x{1,}, and x{0,1} equivalent to?
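The repetition counts can be checked by testing runs of x of length 0 through 7. This is a sketch. The helper all_x() is hypothetical and anchors the RE at both ends, so x{3} must account for the whole run. The last three patterns in the list are the exercise above, so running the script also gives its answer.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the RE describes the whole string.
sub all_x {
    my ($re, $s) = @_;
    return $s =~ /^(?:$re)$/ ? 1 : 0;
}

# For each RE, list which run lengths 0..7 it describes.
for my $re ('x{3}', 'x{3,5}', 'x{3,}', 'x{0,}', 'x{1,}', 'x{0,1}') {
    my @lengths = grep { all_x($re, 'x' x $_) } 0 .. 7;
    print "$re describes runs of length: @lengths\n";
}
```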

        METACHARACTERS 
	    \ | ( ) [ ] { } . * + ? ^ $ are metacharacters

            They are operators of the RE language; they do not describe themselves

	    put \ in front of a metacharacter to make it describe itself: \* describes *

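	    \ in front of any other punctuation character is harmless: it still
	    describes that character. In front of a letter or digit, \ may start a
	    special sequence such as \d, \w, \n, or \t, so don't add it there
	        \\ describes a single \ (\ itself is a metacharacter)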

	    within [], only ^, -, ], and \ need \ (^ only when it comes first,
	    - only when it sits between two characters)
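Here is a sketch that contrasts unescaped and escaped metacharacters on a made-up price string. It also uses \Q...\E, a standard Perl escape not mentioned above, which puts \ in front of every metacharacter between \Q and \E. The helper contains_literally() is hypothetical.

```perl
use strict;
use warnings;

my $price = 'Cost: $3.50 (approx) + tax';   # single quotes: no interpolation

# Unescaped: . is a metacharacter and matches any character.
print "1: $&\n" if $price =~ /3.50/;
# Escaped: each \ makes the next metacharacter describe itself.
print "2: $&\n" if $price =~ /\$3\.50 \(approx\) \+/;
# Inside [], . + and * need no \ : this class means . or + or *
print "3: $&\n" if $price =~ /[.+*]/;
# \\ matches one literal backslash.
print "4: found a backslash\n" if 'C:\temp' =~ /\\/;

# Hypothetical helper: does $text contain $literal as plain text?
# \Q...\E escapes every metacharacter in the interpolated $literal.
sub contains_literally {
    my ($text, $literal) = @_;
    return $text =~ /\Q$literal\E/ ? 1 : 0;
}

print "5: ", contains_literally($price, '(approx) + tax') ? 'yes' : 'no', "\n";
```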

       OTHER DELIMITERS FOR PATTERNS: 
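         Use m to specify a delimiter other than /; this saves writing \/
         for every / in the RE
	    In m!www.runet.edu/~nokie! the delimiter is !
	    In m{www.runet.edu/~nokie} the delimiters are { and }
               () and [] can also be used as delimiters
               (each . above matches any character; write \. to match only a dot)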
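Delimiters matter most when the RE itself contains /. This minimal sketch writes the same RE three ways against a made-up URL, with each . escaped as \. so it matches only a dot. The helper has_nokie_dir() is hypothetical.

```perl
use strict;
use warnings;

my $url = 'http://www.runet.edu/~nokie/index.html';   # made-up example URL

# With / as the delimiter, each / inside the RE must be written \/
print "slash:  $&\n" if $url =~ /www\.runet\.edu\/~nokie/;
# With m!...! or m{...}, the / needs no backslash
print "bang:   $&\n" if $url =~ m!www\.runet\.edu/~nokie!;
print "braces: $&\n" if $url =~ m{www\.runet\.edu/~nokie};

# Hypothetical helper wrapping the m{...} form.
sub has_nokie_dir {
    my ($s) = @_;
    return $s =~ m{www\.runet\.edu/~nokie} ? 1 : 0;
}
```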

       SUBSTITUTION
             s/RE/text/ replaces the first substring matched by RE with text:
             $a = "abcdef";
             $a =~ s/abc/def/;
             print $a;   # defdef
             while (<>) { s/\n//; print }  # prints input without newlines
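This sketch wraps s/// in two hypothetical helpers so the results can be checked. It shows that s/abc/def/ changes only the first abc. The comment also mentions chomp, Perl's built-in way to remove a trailing newline.

```perl
use strict;
use warnings;

# Hypothetical helper: s/// changes only the FIRST substring the RE describes.
sub replace_abc {
    my ($s) = @_;
    $s =~ s/abc/def/;
    return $s;
}

# Hypothetical helper: remove the newline from each line, as in the notes.
# (chomp $line would also remove a trailing newline; it is the usual idiom.)
sub strip_newlines {
    my @out;
    for my $line (@_) {
        (my $copy = $line) =~ s/\n//;   # copy first, then substitute in the copy
        push @out, $copy;
    }
    return @out;
}

print replace_abc('abcdef'), "\n";              # defdef
print replace_abc('abcabc'), "\n";              # defabc
print strip_newlines("one\n", "two\n"), "\n";   # onetwo
```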