http://cs.uccs.edu/~cs301/testreg.html
| . | dot | Match any one character (except newline, unless /s is used) |
| [...] | character class | Match any character listed |
| [^...] | negated character class | Match any character not listed |
| \t | tab | Match HT or TAB character |
| \n | new line | Match LF or NL character |
| \r | return | Match CR character |
| \f | form feed | Match FF (Form Feed) character |
| \a | alarm | Match BELL character |
| \e | escape | Match ESC character |
| \0nnn | Character in octal, e.g. \033 | Match equivalent character |
| \xnn | Character in hexadecimal, e.g. \x1B | Match equivalent character |
| \cX | Control character, e.g. \c[ (ESC) or \cA (Ctrl-A) | Match the corresponding control character |
| \l | lowercase next character | |
| \u | uppercase next character | |
| \L | lowercase characters till \E | |
| \U | uppercase characters till \E | |
| \E | end case modification | |
| \Q | quote (disable) pattern metacharacters | till \E |
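The case-modification and quoting escapes in the table above are easiest to see in a double-quoted string and in a pattern built from a variable. The following is a minimal sketch; the variable names and sample strings are made up for illustration.

```perl
use strict;
use warnings;

# \L ... \E lowercases a span; a leading \u then uppercases its first character.
my $name  = "aLICE";
my $fixed = "\u\L$name\E";

# \U ... \E uppercases everything up to \E.
my $shout = "\Uhello\E world";

# \Q ... \E disables metacharacters, so the variable is matched literally.
my $input  = "1+1=2";
my $plain  = ("1+1=2" =~ /^$input$/)     ? 1 : 0;   # "+" acts as a quantifier here
my $quoted = ("1+1=2" =~ /^\Q$input\E$/) ? 1 : 0;   # "+" is a literal plus sign here

print "$fixed\n$shout\nplain=$plain quoted=$quoted\n";
```

Without \Q the pattern would match strings such as 11=2 but not the literal text 1+1=2, which is why \Q is commonly used when a pattern contains user input.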
Example 1: character class
if ($string =~ /[01][0-9]/) {
    print "$string contains digits 00 to 19\n";
} else {
    print "$string does not contain digits 00 to 19\n";
}
Example 2: negated character class
# Use A-Za-z: the range A-z would also include [ \ ] ^ _ and `
if ($string =~ /[^A-Za-z]/) {
    print "$string contains non-letter characters\n";
} else {
    print "$string does not contain non-letter characters.\n";
}
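A range inside a character class follows character-code order, so its end points matter. The sketch below compares the range A-z with A-Za-z on a string containing an underscore; `has_nonletter` is a hypothetical helper name, not part of the course code.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $s contains a character that is not an ASCII letter.
sub has_nonletter {
    my ($s) = @_;
    return $s =~ /[^A-Za-z]/ ? 1 : 0;
}

# In ASCII, the characters [ \ ] ^ _ and ` sit between Z and a,
# so the range A-z silently treats them as if they were letters.
my $loose = ("snake_case" =~ /[^A-z]/) ? 1 : 0;
my $tight = has_nonletter("snake_case");

print "loose=$loose tight=$tight\n";
```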
| \w | Match a "word" character (alphanumeric plus "_") |
| \W | Match a non-word character |
| \s | Match a whitespace character |
| \S | Match a non-whitespace character |
| \d | Match a digit character |
| \D | Match a non-digit character |
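As a quick illustration of the shorthand classes above, this sketch pulls runs of word, digit, and whitespace characters out of a made-up sample line using a list-context match with /g.

```perl
use strict;
use warnings;

my $line = "id_42:\t3 items";

my @words  = $line =~ /\w+/g;   # runs of word characters (letters, digits, "_")
my @digits = $line =~ /\d+/g;   # runs of digits
my @gaps   = $line =~ /\s+/g;   # runs of whitespace (the tab and the space)

print join("|", @words), "\n";
print join("|", @digits), "\n";
print scalar(@gaps), " whitespace runs\n";
```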
| * | Match 0 or more times |
| + | Match 1 or more times |
| ? | Match 0 or 1 times |
| {n} | Match exactly n times |
| {n,} | Match at least n times |
| {n,m} | Match at least n but no more than m times |
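The quantifiers can be compared side by side with a small helper. This is a sketch only; `hit` is a hypothetical name and the test strings are invented.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $s matches the compiled pattern $re, else 0.
sub hit {
    my ($re, $s) = @_;
    return $s =~ $re ? 1 : 0;
}

print hit(qr/^ab*c$/,    "ac"),     "\n";   # * allows zero b's
print hit(qr/^ab+c$/,    "ac"),     "\n";   # + needs at least one b
print hit(qr/^colou?r$/, "color"),  "\n";   # ? makes the u optional
print hit(qr/^\d{3}$/,   "1234"),   "\n";   # {3} means exactly three digits
print hit(qr/^\d{2,}$/,  "1"),      "\n";   # {2,} means two or more digits
print hit(qr/^\d{2,4}$/, "123"),    "\n";   # {2,4} means two to four digits
```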
| ^ | Caret. Match start of the string; with /m (multiline matching), also match at the start of each line |
| $ | Match end of the string, or before a newline at the end; with /m (multiline matching), also match at the end of each line |
| \b | Match a word boundary |
| \B | Match a non-(word boundary) |
| \A | Match only at beginning of string |
| \Z | Match only at end of string, or before newline at the end |
| \z | Match only at end of string |
| \G | Match only where previous m//g left off (works only with /g) |
| | | Alternation, Match either expression it separates |
| (...) | Limit scope of alternation, Provide grouping for the quantifiers, Capture matched substrings for backreferences. |
| \1, \2, ... | Backreference, Match text previously matched within first, second, ..., set of parentheses. |
| (?:...) | Grouping only, non-capturing parentheses |
| (?=...) | Positive lookahead, non-capturing parentheses |
| (?!...) | Negative lookahead, non-capturing parentheses |
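A few of the anchors and grouping constructs from the tables above, shown on invented sample strings. This is a minimal sketch; the variable names are illustrative.

```perl
use strict;
use warnings;

# Backreference: \1 must repeat whatever the first group captured.
my $text  = "the the quick brown fox";
my ($dup) = $text =~ /\b(\w+)\s+\1\b/;

# Word boundary: "cat" inside "category" is not a whole word.
my $whole = ("category" =~ /\bcat\b/) ? 1 : 0;

# \Z tolerates one trailing newline; \z does not.
my $upper_Z = ("abc\n" =~ /abc\Z/) ? 1 : 0;
my $lower_z = ("abc\n" =~ /abc\z/) ? 1 : 0;

# (?:...) groups without capturing, so the unit is still $1.
my ($unit) = "10 km" =~ /(?:\d+) (km|mi)/;

print "dup=$dup whole=$whole Z=$upper_Z z=$lower_z unit=$unit\n";
```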
Example 3: Check if a string contains only digits. http://blanca.uccs.edu/~cs301/cgi-bin/checkno.cgi
if ($string =~ /^\d+$/) {
    print "$string contains only digits.<BR>\n";
} else {
    print "$string does not contain only digits.<BR>\n";
}
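One detail of the digits-only check: $ also matches just before a trailing newline, so a value that still carries its line ending passes /^\d+$/. The sketch below contrasts that with \A and \z; both helper names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helpers: the same test with line anchors and with string anchors.
sub only_digits_loose  { my ($s) = @_; return $s =~ /^\d+$/   ? 1 : 0; }
sub only_digits_strict { my ($s) = @_; return $s =~ /\A\d+\z/ ? 1 : 0; }

for my $s ("2024", "2024\n", "20x4") {
    (my $shown = $s) =~ s/\n/\\n/;   # make the newline visible when printing
    printf "%-8s loose=%d strict=%d\n",
        $shown, only_digits_loose($s), only_digits_strict($s);
}
```

Whether the trailing newline should be accepted depends on the input source; using chomp first or the stricter anchors are both common choices.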
Example 4: Check if a string contains an IP address. checkipaddress.cgi
foreach $string (@testdata) {
    if ($string =~ /(\d+)(\.\d+){3}/) {
        print "$string", ' matches /(\d+)(\.\d+){3}/', "\n";
    } else {
        print "$string", ' does not match /(\d+)(\.\d+){3}/', "\n";
    }
    # An alternative test such as
    #   if ($string !~ /([^.]+)\.([^.]+)\.([^.]+)\.([^.]+)/) {
    # would accept a.b.c.d as a legal IP address.
    # Without ^ and $ below, -123.235.1.248 would be accepted as a legal IP address.
    if ($string !~ /^(\d+)\.(\d+)\.(\d+)\.(\d+)$/) {
        print "$string is not an IP address\n";
        next;
    }
    # Each of the four captured fields must be in the range 0 to 255.
    $notIP = 0;
    foreach $s ($1, $2, $3, $4) {
        print "s=$s;";
        if ($s < 0 || $s > 255) {
            $notIP = 1;
            last;
        }
    }
    if ($notIP) { print "\n$string is not an IP address\n"; }
    else { print "\n$string is an IP address\n"; }
}
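The same two-step check (shape first, then range of each field) can be packaged as a function. This is a sketch under assumptions: `is_ipv4` is a hypothetical name, and limiting each field to one to three digits is a choice made here, not something the example above requires.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $s is a dotted quad with every field in 0..255.
sub is_ipv4 {
    my ($s) = @_;
    # Step 1: the whole string must be four dot-separated groups of digits.
    my @octets = $s =~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/
        or return 0;
    # Step 2: each captured field must not exceed 255.
    for my $o (@octets) {
        return 0 if $o > 255;
    }
    return 1;
}

for my $s ("192.168.1.1", "256.1.1.1", "-123.235.1.248", "a.b.c.d") {
    print "$s: ", (is_ipv4($s) ? "IP address" : "not an IP address"), "\n";
}
```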
Example 5: Extract URL fields
$url = param('url');
print "url=$url<BR>\n";
# use m|...| so that we do not need to use a lot of "\/"
$url =~ m|(\w+)://([^/:]+)(:\d+)?/(.*)|;
$protocol = $1;
$domainName = $2;
$uri = "/" . $4;
print "\$3=$3<BR>\n";
if ($3 =~ /:(\d+)/) { $portNo = $1 } else { $portNo = 80 }
print "protocol=$protocol<BR>domainName=$domainName<BR>
portNo=$portNo<BR>
uri=$uri<BR>\n";
The above code was used in checkurl.pl to parse the fields of a URL.
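A variant of the same pattern can return the fields from a function and report a failed match instead of reusing stale values of $1 through $4. This is a sketch, not the checkurl.pl code: `parse_url` is a hypothetical name, and the default port of 80 simply follows the example above regardless of protocol.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash reference of URL fields,
# or nothing when the string does not have the expected shape.
sub parse_url {
    my ($url) = @_;
    my ($protocol, $host, $port, $path) =
        $url =~ m|\A(\w+)://([^/:]+)(?::(\d+))?/(.*)\z|s
        or return;
    return {
        protocol   => $protocol,
        domainName => $host,
        portNo     => defined $port ? $port : 80,   # default taken from the example
        uri        => "/$path",
    };
}

my $f = parse_url('http://cs.uccs.edu:8080/~cs301/perl/re.htm');
print "protocol=$f->{protocol} domainName=$f->{domainName} ",
      "portNo=$f->{portNo} uri=$f->{uri}\n";
```

Placing the colon inside a non-capturing group, (?::(\d+))?, leaves only the digits in the capture, so no second match is needed to strip it.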
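Example 7: /re(?:turn-to: |ply-to: )/ can be faster than /(?:return-to|reply-to): / because the common prefix "re" is matched once, before the alternation.
/Bill(?= The Cat| Clinton)/ matches 'Bill', but only if it is followed by ' The Cat' or ' Clinton'.
/OH \d+(?!\.)/ matches 'OH 44272'.
Non-capturing means the group does not put the matched string into $1.
/OH \d+(?=[^.])/ matches 'OH 44272' without the last digit 2, because the lookahead requires a following character that is not a period.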
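The difference between the two lookaheads can be checked directly. In this sketch an extra pair of capturing parentheses is wrapped around each pattern only so the matched text can be printed; the variable names are illustrative.

```perl
use strict;
use warnings;

my $s = "OH 44272";

# Negative lookahead: succeeds at end of string because no period follows.
my ($neg) = $s =~ /(OH \d+(?!\.))/;

# Positive lookahead: needs one more non-period character,
# so \d+ gives back the final digit to satisfy it.
my ($pos) = $s =~ /(OH \d+(?=[^.]))/;

# The lookahead text is tested but never becomes part of the match.
my ($bill) = "Bill Clinton" =~ /(Bill(?= The Cat| Clinton))/;
my $other  = ("Bill Gates" =~ /Bill(?= The Cat| Clinton)/) ? 1 : 0;

print "neg='$neg' pos='$pos' bill='$bill' other=$other\n";
```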
| i | ignore case |
| g | global; in s/.../.../g, replace every match instead of only the first |
| m | multiline matching mode |
Example 8: $var =~ s/\bJeff\b/Jeff/igm;
Try removing any combination of the i, g, and m modifiers in the following program and see the effect.
#!/usr/bin/perl
$text = "JeFFerson JEFF jeff\nJeFF\t JefF\nJEff JEFf\n";
print "text=$text\n";
$text =~ s/^\bJeff\b/Jeff/igm;
print "resulting text=$text";
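One way to run that experiment is to count the substitutions made under each combination of modifiers. The sketch below uses the same text; `count_subs` is a hypothetical helper, and the i and m modifiers are carried by the qr// object while g is applied in the substitution.

```perl
use strict;
use warnings;

my $text = "JeFFerson JEFF jeff\nJeFF\t JefF\nJEff JEFf\n";

# Hypothetical helper: number of substitutions made on a copy of $text.
sub count_subs {
    my ($re, $global) = @_;
    my $copy = $text;
    my $n = $global ? ($copy =~ s/$re/Jeff/g) : ($copy =~ s/$re/Jeff/);
    return $n || 0;   # s/// returns an empty string when nothing matched
}

print "igm: ", count_subs(qr/^\bJeff\b/im, 1), "\n";   # line starts, any case, all
print "ig:  ", count_subs(qr/^\bJeff\b/i,  1), "\n";   # ^ only at start of string
print "gm:  ", count_subs(qr/^\bJeff\b/m,  1), "\n";   # case must match exactly
print "im:  ", count_subs(qr/^\bJeff\b/im, 0), "\n";   # stops after the first match
```

JeFFerson never matches because \b requires a word boundary after the second f.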
Example 9: Extracting the URLs from the href and src attributes in an HTML file.
#!/usr/bin/perl
use CGI qw(:standard);
print header();
$file = param('file');
print "file=$file<br>\n";
# three-argument open keeps characters in $file from being read as an open mode
open(IN, '<', $file) or die "cannot open $file: $!";
@lines = <IN>;
# each line keeps its own newline, so join with the empty string
$text = join "", @lines;
# in list context, m//g returns every captured attribute value
@srcs = ($text =~ m|src\s*=\s*\"([^\"]+)\"|ig);
@hrefs = ($text =~ m|href\s*=\s*\"([^\"]+)\"|ig);
print "<P>list of href values<BR>\n";
$count = 1;
foreach $href (@hrefs) {
    print "$count href=$href<BR>\n";
    $count++;
}
print "<P>list of src values<BR>\n";
foreach $src (@srcs) {
    print "$count src=$src<BR>\n";
    $count++;
}
close(IN);
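The extraction step can also be tried without CGI or a file by matching against a string. This is a sketch under assumptions: `attr_values` is a hypothetical helper, and the sample markup mirrors the test.html content listed in this section, where attribute names and values are split across lines.

```perl
use strict;
use warnings;

# Hypothetical helper: all double-quoted values of attribute $attr in $html.
sub attr_values {
    my ($attr, $html) = @_;
    # \s* also matches newlines, so name, "=" and value may sit on separate lines.
    my @values = $html =~ /\b\Q$attr\E\s*=\s*"([^"]+)"/ig;
    return @values;
}

my $html = <<'HTML';
<a
href= "test.html"> <img src ="test.jpg">
<a
href=
"http://cs.uccs.edu/~cs301/perl/re.htm">
<img
src=
"http://cs.uccs.edu/~cs301/images/chow.jpg">
HTML

print "href=$_\n" for attr_values('href', $html);
print "src=$_\n"  for attr_values('src',  $html);
```

A regular expression like this only handles double-quoted attribute values; for arbitrary HTML a real parser is the safer tool.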
http://cs.uccs.edu/cgi-bin/cs301/listurl.pl?file=CS301F98photo.html
http://cs.uccs.edu/cgi-bin/cs301/listurl.pl?file=test.html
test.html content:
<a
href= "test.html"> <img src ="test.jpg">
<a
href=
"http://cs.uccs.edu/~cs301/perl/re.htm">
<img
src=
"http://cs.uccs.edu/~cs301/images/chow.jpg">