Perl
History, Domain, Design Principles
- Larry Wall, 1987 (first release)
- Practical Extraction and Report Language
- Scripting language: string processing, pattern match, fast file processing
- Widely used in web and systems administration
- like awk and sed and python and ruby and ...
- Favors writeability over simplicity and readability
- Very flexible:
- Example: dollar variables (eg $_, $\, $/)
- Example: $_ and $ARG
- Example: Dynamic typing
- Multiple ways to do things (deliberately violates simplicity)
What is a Scripting Language
- Features
- Usually interpreted, which simplifies the edit, compile, run cycle
- No type declarations and dynamic typing
- Powerful built in types and operations
- Memory management
- Easy access to OS operations
- Usually focus on writability and programmer productivity to minimize programmer cost with minimal
consideration of execution speed
- But execution speed is not totally ignored (perl is compiled to intermediate code for
interpretation)
- Applications
- Typically used to connect and/or to modify the output of existing programs
- Frequently used for systems administration
- Operating systems include a command line shell with its own scripting language
- Modern applications include web processing and GUI design
- Frequently used to avoid repeated execution of a series of command line instructions
- May be tied to particular application/program (eg vim scripting language)
- Less frequently used for implementing complex algorithms and data structures
- Early scripting languages included job control languages (eg JCL), batch commands, awk
- Name suggests a movie or play script which contains a list of actions to be taken by players just
as a script contains a list of actions to be taken by the program
Perl References and Downloads
Like C
- Case sensitive
- = for assignment
- == for equality
- = returns a value
- [] for arrays
- Routines are functions which return values
- Function return values can be ignored (ie call a function like a procedure)
- Array/List subscripts start at 0
- No Boolean type
- Single line comments: #
- Optional comma at the end of a list
- Escape sequences for newline (\n), etc
Like Ada
- {} required for compound statements (ie loop and if bodies)
- elsif
- Ranges
- Underscores in numbers
- Default parameters (default params, not just default values)
- No multi line comments
- Value semantics for strings
Different from C, Ada
- Interpreted
- ; required between statements (not at the end of statements)
- Dynamic types
- No declarations
- Prefix dereferencer
- Two kinds of strings
- Lists built into language
- A string is NOT an array (ie list) of characters
- List and scalar contexts
- Comma operator for forming lists
- List operators have different leftward and rightward precedence
- No main routine, function defs executable
- Built in memory management
- Last value calculated is returned if no return statement
- [] to index arrays, {} to index tables
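Two items from the list above, list versus scalar context and [] indexing, can be sketched as follows. This is a minimal illustration: the variable names are made up, and use strict, use warnings and the my declarations are common practice added here, not something these notes require.

```perl
use strict;
use warnings;

my @names = ("Ada", "C", "Perl");

my @copy    = @names;            # list context: copies all three elements
my $count   = @names;            # scalar context: number of elements
my ($first) = @names;            # list assignment: takes the first element
my $final   = $names[$#names];   # [] indexes the array; $#names is the last index

print "count=$count first=$first final=$final\n";   # count=3 first=Ada final=Perl
```

The same array yields a different value depending on whether the left side of the assignment is a scalar or a list.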
Hello World - as a Script
#!/usr/local/gnu/bin/perl -w
print ("Hello World!\n") ;
Make file executable (only needed once): chmod 700 hello
execute script:
hello
Shell begins execution of file and uses perl to execute it
-w enables warnings (extra checking): recommended
Hello World - Not as a Script
print ("Hello World!\n") ;
Execute with
perl -w hello.pl
Hello World - Interactive Execution
>perl
print "HI\n";
^D (or ^Z in windows)
HI
>
Hello World - Command Line Execution
>perl -w -e 'print "Hello "; print "world!!\n"'
Hello world!!
>
Put the -w before the -e, as in the example above
Interpreted
- command perl has 2 steps
- Checks syntax and converts to intermediate code
- Interprets intermediate code (and checks types)
Other versions of Hello World
print "Hello World!\n" ;
print "Hello"," World\n" ;
print ("Hello"," World\n") ;
print 'Hello',' World' , "\n";
print ('Hello World\n'); # prints Hello World\n
# More ways are possible
Prefix Dereferencers
- A variable's prefix specifies the kind of variable
- $ - scalar (eg $i)
- @ - array (ie list) (eg @names)
- % - hash tables (eg %capitals)
- & - subroutines (eg &mySubroutine;)
- * - Typeglob (eg *foo is everything named foo)
Prefix Dereferencer: Scalar
- $ - scalar
- scalars contain a single value
- scalars include numbers, strings, undef, references (to variables and objects)
- Examples:
$i = 3;
$name = "Jack";
Prefix Dereferencer: Array/List
- @ - array (ie list)
- An array is an ordered list of scalars (or variables)
- Array elements are ordered by position
- Indices start with 0
- Examples:
@a = (11, 12, 13);
print @a; # 111213
print $a[1]; # 12: one element of @a is a scalar, so it is written $a[1]
# Can store any scalars
@a = (11, "xyz", 13);
Prefix Dereferencer: Hash
- % - hash table
- Index table by a string rather than by an integer!
- Examples:
%courseTeacher =
( "itec380" , "Okie", "itec371", "Htay");
%courseEnrollment =
(
itec380 => 20,
itec371 => 25,
);
print $courseTeacher{"itec380"}; # Okie
print $courseTeacher{"itec371"}; # Htay
$courseEnrollment{"itec380"} = 23;
print $courseEnrollment{"itec380"}, "\n"; # 23
=> is a synonym for comma, and it quotes any bare identifiers to its left
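A hash is usually processed by walking over its keys. The sketch below reuses %courseEnrollment from the example above; the extra course itec120 and the array @report are hypothetical names added for illustration, and use strict with my declarations is an added convention.

```perl
use strict;
use warnings;

my %courseEnrollment = (
    itec380 => 20,
    itec371 => 25,
);

$courseEnrollment{itec120} = 30;   # assigning to a new key adds an entry

# keys returns the keys in no particular order, so sort them for stable output
my @report;
foreach my $course (sort keys %courseEnrollment) {
    push @report, $course . ": " . $courseEnrollment{$course};
}
print join(", ", @report), "\n";   # itec120: 30, itec371: 25, itec380: 20

# exists tests for a key without creating it
my $has380 = exists $courseEnrollment{itec380} ? 1 : 0;
my $has999 = exists $courseEnrollment{itec999} ? 1 : 0;
```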
Prefix Dereferencer: Same Name, Different Dereferencer
- A given name can refer to different values with different dereferencers
@a = (11, 'x', 13);
$a = $a[2];
print $a; # 13
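%a = (11, 'x', 13, 14); # use (), not {}: {} would create a hash reference
print $a{'11'}; # x
print $a{11}; # x (the key 11 is converted to the string '11')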
Prefix Dereferencer: Subroutine
- & - subroutine
- Example:
&mySubroutine("hi");
- & is usually optional
- One place & (or parentheses) is required is when a sub is called above where it is defined
Types
- Single valued (ie scalar):
- Number: integer, float, hex, octal, scientific notation, underscores
- String
- Reference (to variable or object)
- NO Boolean type
- NO Character type (characters are simply strings of length one)
- Multivalued: array (ie list), hash table
Dynamic Typing
- No variable declarations
- Type of a variable depends on value stored there
- Type of a variable can change during execution
$i = 3;
$i = "abcde";
Values have types, not variables
Automatic conversions are frequent
Careful on relational operators (see String Comparison)
Automatic Conversions
- String values automatically converted to integer as needed
- Example:
$i = "5";
$i = $i+1;
print $i;
Question: Does $i hold a string or an integer?
- Hard to tell because of conversions
Question: What about: $i = "a" + 1;
- Non-numeric strings are converted to 0.
- Warning reported if command line argument -w is specified.
Question: How to convert to/from ASCII value
print ord "A"; # 65
print ord "2"; # 50 (which is 32 hex)
print ord "ABC"; # 65 (only first character in string)
print chr 65; # A
print chr ord "A"; # A
Warning: Be careful with relational operators!! (See String Comparison)
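The conversions described above depend on the operator, not on the variable. A minimal sketch, with made-up variable names and use strict added as a convention; the no warnings 'numeric' line only silences the warning that -w would otherwise print for the partly numeric string:

```perl
use strict;
use warnings;

my $i = "5";
$i = $i + 1;              # + is numeric: "5" becomes 5, so $i is now 6
my $s = $i . "0";         # . is string concatenation: 6 becomes "6", giving "60"

my $sum = "10" + "20";    # 30
my $cat = 10 . 20;        # "1020"

my $n;
{
    no warnings 'numeric';   # suppress the "isn't numeric" warning for this demo
    $n = "3 apples" + 2;     # only the leading digits are used: 3 + 2
}

print "$i $s $sum $cat $n\n";   # 6 60 30 1020 5
```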
Two Kinds of Strings
- LITERAL:
- enclosed by single quotes: eg 'abc'
- no substitutions (except for \' and \\)
- Example:
$a = 3;
print '$a "abc" \n \' stuff';
# Prints: $a "abc" \n ' stuff
print '$a "abc" \n \\ stuff';
# Prints: $a "abc" \n \ stuff
- INTERPOLATED:
- enclosed by double quotes: "abc"
- substitutions of variables and escape characters:
- Example
$a = 3;
print "$a 'abc' \n \' stuff";
# Prints: 3 'abc'
#          ' stuff
String Comparison: Dictionary Ordering
- Numeric relops: ==, !=, <, <=, >, >=
- String relops: eq, ne, lt, gt, le, ge
- String relops use regular dictionary ordering
- The following are all true:
'car' lt 'dog'
'car' lt 'catty'
'car' lt 'carry'
'car' lt 'd'
- String comparison: cmp returns (-1, 0, 1)
String Comparison for Numeric Strings
- String comparison of numeric strings uses dictionary ordering
'10' lt '20' # true
'10' lt '11' # true
'10' lt '101' # true
Watch out for this one
'10' lt '2' # true
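The same pitfall shows up when sorting. The sketch below sorts one list twice, once with cmp and once with the numeric counterpart <=>, which these notes do not otherwise cover. The variable names are illustrative only.

```perl
use strict;
use warnings;

my @values = (10, 9, 2, 101);

# $a and $b are the two elements sort is currently comparing
my @as_strings = sort { $a cmp $b } @values;   # dictionary order
my @as_numbers = sort { $a <=> $b } @values;   # numeric order

print "@as_strings\n";   # 10 101 2 9
print "@as_numbers\n";   # 2 9 10 101
```

With no comparison block, sort uses string comparison, so a list of numbers needs the <=> form.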
Converting Numbers and Numeric Strings
- If string relops are used, number values are converted to strings
'22' lt '3' # true
22 lt '3' # true
22 lt 3 # true
If number relops are used, string values are converted to numbers
22 < 3 # false
22 < '3' # false
'22' < '3' # false
2.00 == 2 # true
'2.00' == 2 # true
'2.00' == '2' # true
What happens here
$i = '10';
$i = $i + 1;
if ($i lt '2') {print "less1\n";} else {print "more1\n"};
if ($i < '2') {print "less2\n";}else {print "more2\n"};
Make sure you use the right relops!
Boolean and Truth
- These evaluate to false
- 0
- "0"
- ""
- undef (used for undefined values)
- Everything else evaluates to true:
- Other numbers
- Other strings
- Any reference
- Boolean operators return "" and 1
- Example:
$i = 3;
print $i == 3; # prints 1
print $i == 1; # prints nothing (the empty string)
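The truth rules above can be checked directly. This sketch tests a handful of values, including a few strings that look like zero but are not exactly "0". The arrays @labels and @values and the hash %truth are names invented for the illustration.

```perl
use strict;
use warnings;

# Parallel lists: a printable label and the value it stands for
my @labels = ('0', '"0"', '""', 'undef', '"0.0"', '"00"', '" "', '-1');
my @values = ( 0,   "0",   "",   undef,   "0.0",   "00",   " ",   -1 );

my %truth;
foreach my $k (0 .. $#values) {
    my $verdict = $values[$k] ? "true" : "false";
    $truth{ $labels[$k] } = $verdict;
    print "$labels[$k] is $verdict\n";
}
```

Only the first four values come out false; "0.0", "00" and " " are strings other than "" and "0", so they are true.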
Undef
- function undef undefines a variable
- Example:
undef $/; # undefine the record separator
$infile = <STDIN>; # read entire file as one string
print $infile; # prints entire file, with newlines
undef is a unary operator
It can also be used as a value: undef $i is the same as $i = undef
Other uses of undef
- used to undefine hash entries
- returned by some functions on failure
check if undef with defined: if (defined $i)
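For hash entries, undefining a value and removing the entry are different operations. A minimal sketch, assuming a made-up hash %capitals; delete and exists are standard built-ins that the notes above do not name explicitly.

```perl
use strict;
use warnings;

my %capitals = (Virginia => "Richmond", Ohio => "Columbus");

undef $capitals{Ohio};       # the value becomes undef, but the key stays
my $ohio_exists  = exists  $capitals{Ohio} ? 1 : 0;   # 1
my $ohio_defined = defined $capitals{Ohio} ? 1 : 0;   # 0

delete $capitals{Ohio};      # removes the key itself
my $ohio_after_delete = exists $capitals{Ohio} ? 1 : 0;   # 0

my $i = 3;
undef $i;                    # same effect as $i = undef
my $shown = defined $i ? $i : "(undef)";

print "$ohio_exists $ohio_defined $ohio_after_delete $shown\n";   # 1 0 0 (undef)
```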
Uninitialized Variables
- uninitialized variables hold undef
- undef is treated as 0 in numeric context and "" in string context
- perl -w warns when uninitialized variables are used
Control Structures: Print loop
# Print whether some numbers are even or odd or zero
$i = -5;
$limit = 5;
while ($i < $limit)
{
if ($i == 0) {
print "The number is $i, which is even"
}
elsif ($i % 2 == 0) {
print "$i is even"
}
else {
print "$i is odd"
}
$i++;
}
- () and {} are required
- $i is interpolated!
More on Loops: next and last
- next moves to the next iteration
- last exits the loop immediately
for($i=0; $i < 5; $i++){
if ($i==3)
{next}
print $i; # prints 0, 1, 2, 4
}
for($i=0; $i < 5; $i++){
if ($i==3)
{last}
print $i; # prints 0, 1, 2
}
Unless - the Anti-if
# Equivalent to if (!($i == 5))
unless ($i == 5)
{print "Not 5";} # executes when condition is false
else
{print "Is 5";}
Used when something is done in normal circumstances
More on Loops: Named Loops
- next and last can refer to specific loops
MYNAME:
for($i=0; $i < 10; $i++){
if ($i==3)
{next MYNAME} # next iteration of loop MYNAME
if ($i==6)
{last MYNAME} # exit loop MYNAME immediately
print $i;
}
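Labels matter most with nested loops, where a plain next or last would only affect the innermost loop. A small sketch, with a hypothetical label ROW and made-up variable names:

```perl
use strict;
use warnings;

my @pairs;
ROW:
for my $row (1 .. 3) {
    for my $col (1 .. 3) {
        next ROW if $col > $row;   # abandon this row, continue with the next one
        last ROW if $row == 3;     # leave both loops at once
        push @pairs, "$row,$col";
    }
}
print "@pairs\n";   # 1,1 2,1 2,2
```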
Modifying Statements: if, unless, until, while
print $i if $i > 3; # only prints if i > 3
print $i unless $i > 3; # only prints if i <= 3
print $i-- while $i > 3; # prints as long as i > 3
print $i++ until $i > 3; # prints as long as i <= 3
More control: exit and die
- exit - exits program
- die list - prints list to STDERR and exits with a nonzero status
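A typical use of die is to stop as soon as an input is unusable. In the sketch below, divide is a hypothetical helper, and the call is wrapped in eval { ... } (standard Perl, not covered in these notes) only so the demonstration can continue after the die instead of exiting.

```perl
use strict;
use warnings;

# Hypothetical helper: dies instead of dividing by zero
sub divide {
    my ($num, $den) = @_;
    die "cannot divide $num by zero\n" if $den == 0;
    return $num / $den;
}

my $ok = divide(10, 4);               # 2.5

# eval traps the die; without it the message would go to STDERR
# and the program would exit
my $result = eval { divide(1, 0) };   # undef, because the block died
my $error  = $@;                      # the message that was passed to die

print "ok=$ok error=$error";          # ok=2.5 error=cannot divide 1 by zero
```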
Read/Print Until EOF
- Read all lines from standard input, using an explicit check for end of file
while (!(eof STDIN))
{
$line = <STDIN>; # Read next input line
print $line;
}
<STDIN> returns undef on EOF, so the read itself can serve as the loop condition
while ($line = <STDIN>)
{
print $line;
}
Read/Print Until EOF with Defaults: $_ and <>
- default variable for input is $_
while (<STDIN>)
{
print $_;
}
default variable for OUTPUT is $_
while (<STDIN>)
{
print;
}
Use English
- Use "use English" if you don't like symbolic variables
use English; # Use module English
while (<STDIN>)
{print $ARG;} # English name for $_
$ARG is a synonym for $_
Named Files and Default File Handle
- Opening a file named "someFile.ext":
# This is the file to open
$myFile = "someFile.ext";
# Open it ($OS_ERROR is the English name for $!, so use English is needed)
open AFILE, $myFile or die "unable to open file $myFile ($OS_ERROR)\n";
while (<AFILE>) # Use file handle AFILE
{
print;
}
With no file handle, <> reads the files named on the command line (@ARGV), or STDIN if there are none
while (<>)
{
print;
}
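The open call above uses the two-argument form with a bareword file handle. Newer code often uses the three-argument form with a lexical handle, sketched here. The temporary file comes from the core File::Temp module and exists only to make the example self-contained; $tmp, $in and $lineCount are names chosen for the illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small temporary file to read back
my ($tmp, $myFile) = tempfile(UNLINK => 1);
print $tmp "first line\nsecond line\n";
close $tmp;

# '<' means open for reading; $! holds the operating system's error message
open(my $in, '<', $myFile) or die "unable to open file $myFile ($!)\n";
my $lineCount = 0;
while (my $line = <$in>) {
    $lineCount++;
    print $line;
}
close $in;
```

Keeping the mode separate from the file name means a name that starts with < or > cannot change how the file is opened.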
Command Line Arguments
- The array @ARGV contains a list of command line arguments
- That is @ARGV not @ARG
- Example - print command line arguments:
for ($i=0; $i <= $#ARGV; $i++)
{
print $ARGV[$i]
}
$#ARGV contains the index of the last element of the list
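The index-based loop above can also be written with a range and foreach. In this sketch, describe_args is a hypothetical helper that takes a list, so it works on @ARGV or on any other list.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "index=value" for each argument
sub describe_args {
    my @args = @_;
    my @shown;
    foreach my $i (0 .. $#args) {   # the range is empty when there are no arguments
        push @shown, "$i=$args[$i]";
    }
    return join(" ", @shown);
}

print describe_args(@ARGV), "\n";
```

Run as perl args.pl a b (the script name is an assumption), this would print 0=a 1=b.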
Chop
- Chop removes last character of its string argument
- Chop returns the character removed
- Example:
while ($line = <STDIN>) {
chop $line;
print $line;
}
With no argument, chop operates on the default variable $_
while (<STDIN>) {
chop;
print;
}
What does this do?
while ($line = <STDIN>) {
print chop $line;
}
Chop modifies its argument
$x="abcd";
chop $x;
chop $x;
print $x; # ab
chop will chop a list:
$x="abcd";
$y="efgh";
chop ($x, $y);
print $x; # abc
print $y; # efg
Chomp
- chomp is similar to chop, but it
- removes value of $/ - the record separator
- unix \n, mac \r, windows \r\n
- returns # of characters removed
- it is safer
- Example:
$myFile = "foo";
open AFILE, $myFile or die "unable to open file $myFile ($OS_ERROR)\n";
while (<AFILE>) # Use file handle AFILE
{chomp; print;} # Print the contents of the file without newlines
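Why chomp is the safer choice shows up on a last line that has no trailing newline. A minimal sketch with made-up variable names:

```perl
use strict;
use warnings;

my $withNewline    = "last line\n";
my $withoutNewline = "last line";    # eg a final line with no trailing newline

my ($chomped1, $chomped2) = ($withNewline, $withoutNewline);
my $removed1 = chomp $chomped1;   # 1: the newline was removed
my $removed2 = chomp $chomped2;   # 0: nothing to remove, string unchanged

my ($chopped1, $chopped2) = ($withNewline, $withoutNewline);
chop $chopped1;                   # removes the newline
chop $chopped2;                   # removes the final "e": data is lost

print join("|", $chomped1, $chomped2), "\n";   # last line|last line
print join("|", $chopped1, $chopped2), "\n";   # last line|last lin
```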
Reading the Entire File
- We can input the entire file like this:
@wholeFile = <STDIN>;
print @wholeFile; # Print all the lines of the file
chomp @wholeFile; # Chomp each line in the list
print @wholeFile; # Print entire file, without newlines
The value of @wholeFile is an array of strings
Lists and Arrays
Hashes
Subroutines
Regular Expressions
Some Rules of Thumb
- Use -w
- Use English
- Define subs before using them. Be careful if not using parens.
- Make ALL subroutine variables my variables
- Unless you NEED global variables for a specific purpose
- Make copies of subroutine parameters
- Don't operate directly on the list @ARG (ie @_), unless necessary
- Use the correct relational operator
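Several of these rules can be seen together in one short subroutine. This is a sketch, not a prescribed style: average and @scores are hypothetical names, and use strict is added because it makes Perl insist on my declarations.

```perl
use strict;     # requires variables to be declared, eg with my
use warnings;   # same effect as -w, but scoped to this file

# Defined before it is used
sub average {
    my @numbers = @_;            # copy the parameters: @_ aliases the caller's variables
    return 0 unless @numbers;    # avoid dividing by zero on an empty list
    my $total = 0;               # my keeps $total private to this sub
    foreach my $n (@numbers) {
        $total += $n;
    }
    return $total / scalar(@numbers);
}

my @scores = (80, 90, 100);
my $avg = average(@scores);
print "average: $avg\n";         # average: 90
```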
Some Things We Haven't Talked About (much)
- Lists of lists @array = (\@row1, \@row2)
- Packages: package MyPkg;
- Access elements with :: (eg MyPkg::mySub;)
- Arrow Operator
- Use $ref ->[0] instead of ${$ref}[0]
- Use $ref ->{"someKey"} instead of ${$ref}{"someKey"}
- Objects - Three rules
- To make a class, make a package
- Methods are just subroutines in the package
- Use bless to specify that a specific object is a member of some class
- Any object (ie any reference) can be in any class
- bless \$someScalar, "MyPkg";
- bless \@someList, "MyPkg";
- More on Objects - Three simple defs (from Learning Perl)
- An object is simply a referenced thingy that happens to know what class it belongs to
- A class is simply a package that happens to provide methods to deal with objects
- A method is simply a subroutine that expects an object reference (or a package name, for class methods) as its first argument.
- Regular expressions:
while (<>){print if /somePattern/;}
- Type globs
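The package, bless and arrow operator items above fit together as in the following sketch. Counter is a hypothetical class invented for the illustration, with a hash reference holding the object's data; the last few lines show a list of lists indexed with the arrow operator.

```perl
use strict;
use warnings;

package Counter;   # a class is just a package

sub new {
    my ($class, $start) = @_;         # for a class method, the package name comes first
    my $self = { count => $start };   # the object's data: a hash reference
    return bless $self, $class;       # the reference now knows its class
}

sub increment {
    my ($self) = @_;                  # for a method, the object comes first
    $self->{count}++;
    return $self->{count};
}

package main;

my $counter = Counter->new(5);
$counter->increment;
my $value = $counter->increment;      # 7
my $class = ref $counter;             # "Counter"

# List of lists: an array of array references
my @row1 = (1, 2, 3);
my @row2 = (4, 5, 6);
my @grid = (\@row1, \@row2);
my $cell = $grid[1]->[2];             # 6

print "$class $value $cell\n";        # Counter 7 6
```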