PERL Regular Expressions
SKIP THIS: REGULAR EXPRESSIONS: RE
=~ for pattern matching (see regular expressions)
Expression that describes a string or collection of strings
- Compare w/ arithmetic expr: 5 and 3+2 both describe 5
- Compare w/ boolean expr: false and 3 < 2 both describe false
- Both have symbols (eg 5, false) and operators (eg + <)
For RE,
- building blocks are characters, which describe themselves
- concatenation of characters describes strings
- some characters have special meaning, ie basic operators | * ?
RUCS describes RUCS
RU CS describes RU CS ie space is a character
| represents a choice
R|U|C|S describes one character, either R or U or C or S
rucs|rucs2 describes one string, either rucs or rucs2
* represents 0 or more repetitions
r* describes empty string or r or rr or rrr or ...
ru*cs describes rcs, rucs, ruucs, ruuucs, ...
? represents optional
rucs2? describes rucs and rucs2
rucs?2 describes rucs2 and ruc2
Precedence, high to low: * and ?, then concatenation, then choice
ru|c?s describes ru or cs or s
() can be used to group
ru(cs2)? describes ru and rucs2
((login|logout) (rucs|rucs2|ruacad) now!)*
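One way to check which strings an RE describes is to test a few candidates. The sketch below assumes a helper named describes (hypothetical, not part of Perl) and uses ^ and $, covered under SHORTCUTS, so that the RE has to account for the whole string rather than a piece of it.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the RE describes the WHOLE string, else 0.
# The ( ) keep a choice inside the RE from splitting the ^ and $ apart.
sub describes {
    my ($re, $string) = @_;
    return $string =~ /^($re)$/ ? 1 : 0;
}

# Choice has the lowest precedence, so ru|c?s groups as (ru)|((c?)s).
for my $s ('ru', 'cs', 's', 'rus', 'rcs') {
    printf "%-4s %s\n", $s, describes('ru|c?s', $s) ? 'yes' : 'no';
}

# Grouping with ( ) makes ? apply to cs2 as a unit.
for my $s ('ru', 'rucs2', 'rucs') {
    printf "%-6s %s\n", $s, describes('ru(cs2)?', $s) ? 'yes' : 'no';
}
```

The first loop should report yes for ru, cs and s, and no for rus and rcs.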
PATTERNS: Put an RE inside slashes to define a pattern: /RE/
- a string that contains a substring described by an RE is said
to MATCH the pattern
- specify string to match a pattern using the operator =~
if ($name =~ /rucs2?/) # true if $name contains string rucs or rucs2
When =~ is left out, the pattern is matched against $_ :
if (/rucs2?/) # true if $_ contains string rucs or rucs2
while ($line = <>){print $line if $line =~ /hello/}
while (<>){print if /hello/} # Same thing
while (<>){print unless /hello/}
- /RE/ is a shortcut for m/RE/
- Other delimiters can be used: m{RE}, m'RE'
- m'RE' does no variable interpolation inside RE
- $& remembers what was matched:
$line = "login to rucs today!";
$line =~ m/rucs2?.*d/; # . matches any character except newline
print $&; # prints rucs tod
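A small sketch of the points above, using a fixed list of lines in place of input from <> so it runs on its own. The array names @lines and @matched are made up for the example.

```perl
use strict;
use warnings;

my @lines = (
    "login to rucs today!",
    "logout of rucs2 now!",
    "hello world",
);

my @matched;
for (@lines) {              # each line is placed in $_ in turn
    if (/rucs2?/) {         # no =~ here, so the match is against $_
        push @matched, $&;  # $& holds the text that was matched
    }
}

print join(",", @matched), "\n";   # prints rucs,rucs2
```

The third line contains neither rucs nor rucs2, so it adds nothing to @matched.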
SHORTCUTS:
use [rucs] for r|u|c|s
use [a-z] for a|b|c|d|...
\d for any digit
\s for any whitespace character: space, newline, tab, formfeed
\w for any word character: letter, digit, underscore
\D, \S, \W for not a digit, not whitespace, not a word character
use x+ for xx*, ie one or more x's. What is \w+ ?
use ^ and $ for beginning and end of line: /^Hello/
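The shortcuts can be tried out on a sample string. This is only a sketch: the sample text and the variable names are invented for the example, and $& is read right after each match because the next match overwrites it.

```perl
use strict;
use warnings;

my $line = "Hello nokie, room 214 opens at 9am";

$line =~ /\d+/;            # first run of digits
my $digits = $&;           # 214

$line =~ /\w+/;            # first run of word characters
my $word = $&;             # Hello

$line =~ /[a-z]+\s\d/;     # lowercase letters, one whitespace, one digit
my $pair = $&;             # room 2

my $starts = ($line =~ /^Hello/) ? 1 : 0;   # ^ ties Hello to the start of the line
my $ends   = ($line =~ /Hello$/) ? 1 : 0;   # $ ties Hello to the end of the line

print "$digits|$word|$pair|$starts|$ends\n";   # prints 214|Hello|room 2|1|0
```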
REPETITION:
use x{3} for xxx
use x{3,5} for xxx or xxxx or xxxxx
use x{3,} for xxx or xxxx or xxxxx or xxxxxx ...
what are x{0,}, x{1,}, x{0,1} ?
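A sketch of how the counts behave. With ^ and $ around it, the count applies to the whole string; without them the pattern only has to match somewhere inside the string. The names %whole and $longest are made up for the example.

```perl
use strict;
use warnings;

# Anchored: the whole string must be 3, 4 or 5 x's.
my %whole;
for my $s ('xx', 'xxx', 'xxxxx', 'xxxxxx') {
    $whole{$s} = ($s =~ /^x{3,5}$/) ? 'yes' : 'no';
    print "$s: $whole{$s}\n";
}

# Unanchored: six x's still contain a match, and x{3,5}
# takes as many x's as it is allowed to.
my $longest = '';
if ('xxxxxx' =~ /x{3,5}/) {
    $longest = $&;
}
print "$longest\n";   # prints xxxxx
```

Only xxx and xxxxx pass the anchored test.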
METACHARACTERS
|, *, ?, +, [, ] are metacharacters (so are ( ) . ^ $ { } and \)
They describe REs, not themselves
put \ in front of metacharacter to make it describe itself
\ in front of a regular non-alphanumeric character is ignored
(in front of a letter or digit it may form a shortcut such as \d or \n)
\\ means \ - whether or not \ is a metacharacter
within [], only ^, -, ] and \ need \
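A short sketch of the difference between a metacharacter and its escaped form. The sample strings and variable names are invented for the example.

```perl
use strict;
use warnings;

my $sum = '3+2=5';

my $plain    = ($sum =~ /3+2/)  ? 1 : 0;   # + is an operator: one or more 3's, then 2
my $escaped  = ($sum =~ /3\+2/) ? 1 : 0;   # \+ describes the character + itself
my $threes   = ('3332' =~ /3+2/) ? 1 : 0;  # this is what /3+2/ really describes

my $dot_any  = ('wwwXrunet' =~ /www.runet/)  ? 1 : 0;   # . matches the X
my $dot_lit  = ('wwwXrunet' =~ /www\.runet/) ? 1 : 0;   # \. needs a real dot

my $in_class = ('2*3' =~ /[+*]/) ? 1 : 0;  # inside [] neither + nor * needs \

print "$plain $escaped $threes $dot_any $dot_lit $in_class\n";   # prints 0 1 1 1 0 1
```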
OTHER DELIMITERS FOR PATTERNS:
Can use m to specify different delimiter than /
In m!www.runet.edu/~nokie! the delimiter is !
In m{www.runet.edu/~nokie} the delimiter is {}
() and [] can also be used as delimiters
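Choosing another delimiter mostly saves escaping when the RE itself contains a slash. A sketch, assuming a made-up $url value. Note that the dots are escaped in all three patterns, since . is still a metacharacter whatever the delimiter is.

```perl
use strict;
use warnings;

my $url = 'http://www.runet.edu/~nokie/index.html';

# With / as the delimiter, the / inside the RE must be written \/
my $slash = ($url =~ /www\.runet\.edu\/~nokie/) ? 1 : 0;

# With ! or {} as the delimiter, / is an ordinary character
my $bang  = ($url =~ m!www\.runet\.edu/~nokie!) ? 1 : 0;
my $brace = ($url =~ m{www\.runet\.edu/~nokie}) ? 1 : 0;

print "$slash $bang $brace\n";   # prints 1 1 1
```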
SUBSTITUTION
$a = "abcdef";
$a =~ s/abc/def/;
print $a; # defdef
while (<>) { s/\n//; print} # prints input without newline
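A sketch of a few details of s/// that the example above does not show: only the first match is replaced, and the operator returns the number of replacements made. The /g modifier at the end is not covered in these notes; it is standard Perl and replaces every match. All variable names are made up for the example.

```perl
use strict;
use warnings;

my $first = "abcabc";
my $count = ($first =~ s/abc/def/);   # replaces the first abc only
print "$first $count\n";              # prints defabc 1

my $all = "abcabc";
$all =~ s/abc/def/g;                  # /g (beyond these notes): replace every match
print "$all\n";                       # prints defdef

# Like a pattern match, s/// works on $_ when =~ is left out.
my @in = ("one\n", "two\n");
for (@in) {
    s/\n//;                           # strips the newline from each element in place
}
print join(",", @in), "\n";           # prints one,two
```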