3. Perl-Compatible Regular Expressions

Posted on June 13, 2015 (updated May 7, 2017) by imliuda

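Perl has a richer and more predictable syntax than POSIX extended regular expressions. One example of this predictability is that a backslash followed by a non-alphanumeric character always matches that character literally (for example, \. matches a period). Perl also lets you choose whether a quantifier is greedy, which POSIX cannot express. For example, in the pattern /a.*b/ the .* matches as many characters as possible, whereas in /a.*?b/ the .*? matches as few characters as possible. So for the string "a bad dab", the first pattern matches the whole string and the second matches only "a b".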
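The greedy/lazy difference described above can be checked with a short Perl script. This is a minimal sketch; the variable names are illustrative, and the script simply captures what each pattern matches in "a bad dab".

```perl
use strict;
use warnings;

my $text = "a bad dab";

# Greedy: .* first consumes the rest of the string, then backtracks
# until it can match the LAST 'b'.
my ($greedy) = $text =~ /(a.*b)/;

# Lazy: .*? consumes as little as possible, so it stops at the FIRST 'b'.
my ($lazy) = $text =~ /(a.*?b)/;

print "greedy: $greedy\n";   # greedy: a bad dab
print "lazy:   $lazy\n";     # lazy:   a b
```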

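For these reasons, many other tools and programming languages have adopted Perl-like syntax, for example Java, Ruby, Python, PHP, exim, BBEdit, and even Microsoft's .NET Framework. However, not all implementations of Perl-compatible syntax are identical, and many implement only a subset of Perl's features.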

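Convention used in the examples: the character 'm' is not always required to specify a Perl match operation. For example, m/[^abc]/ can also be written as /[^abc]/. The 'm' is only necessary when you want to specify a match operation without using a forward slash as the regex delimiter. Sometimes it is useful to choose an alternate regex delimiter to avoid "delimiter collision", that is, having to escape every forward slash inside the pattern. See perldoc perlre for more details.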
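A minimal sketch of this convention, assuming an illustrative path string (the variable names are hypothetical): the same match is written with escaped slashes, with m{...}, and with m!...!, and the optional 'm' is shown in the slash form.

```perl
use strict;
use warnings;

my $path = "/usr/local/bin/perl";

# m/.../ and /.../ are the same match operator; 'm' is optional with slashes.
my $with_m    = ("Hello" =~ m/[^abc]/) ? 1 : 0;
my $without_m = ("Hello" =~ /[^abc]/)  ? 1 : 0;

# With '/' as the delimiter, every slash in the pattern must be escaped.
my $slashes = ($path =~ /^\/usr\/local\//) ? 1 : 0;

# Writing 'm' lets us pick another delimiter, so the slashes need no escapes.
my $braces = ($path =~ m{^/usr/local/}) ? 1 : 0;
my $bangs  = ($path =~ m!^/usr/local/!) ? 1 : 0;

print "with_m=$with_m without_m=$without_m\n";
print "slashes=$slashes braces=$braces bangs=$bangs\n";
```

All five checks succeed, so the script prints 1 for every variable.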

Metacharacter	Description	Example (every if condition below evaluates to true)
.	Matches any single character except a newline. Inside square brackets it is literal and matches only "." itself.	if ("Hello World\n" =~ m/...../) { print "Yep\n"; } # Has length >= 5
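( )	Groups a series of pattern elements into a single element. After a successful match, the text captured by each group can be accessed later as $1, $2, and so on.	if ("Hello World\n" =~ m/(H..).(o..)/) { print "We matched '$1' and '$2'\n"; } # Output: We matched 'Hel' and 'o W'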
+	Matches the preceding pattern element one or more times.	if ("Hello World\n" =~ m/l+/) { print "One or more \"l\"'s in the string\n"; }
?	Matches the preceding pattern element zero or one time.	if ("Hello World\n" =~ m/H.?e/) { print "There is an 'H' and an 'e' separated by "; print "0-1 characters (Ex: He Hoe)\n"; }
?	Makes the preceding *, +, or {M,N} quantifier non-greedy, so that it matches as few times as possible.	if ("Hello World\n" =~ m/(l.+?o)/) { print "Yep\n"; } # The non-greedy match of 'l' followed by one or more characters is 'llo' rather than 'llo Wo'.
*	Matches the preceding pattern element zero or more times.	if ("Hello World\n" =~ m/el*o/) { print "There is an 'e' followed by zero to many "; print "'l' followed by 'o' (eo, elo, ello, elllo)\n"; }
{M,N}	Matches the preceding pattern element at least M and at most N times.	if ("Hello World\n" =~ m/l{1,2}/) { print "There is a substring with at least 1 "; print "and at most 2 l's in the string\n"; }
[…]	Matches any one of a set of possible characters.	if ("Hello World\n" =~ m/[aeiou]+/) { print "Yep\n"; } # Contains one or more vowels
|	Separates alternative possibilities (alternation, i.e. "or").	if ("Hello World\n" =~ m/(Hello|Hi|Pogo)/) { print "At least one of Hello, Hi, or Pogo is "; print "contained in the string.\n"; }
\b	Matches a word boundary.	if ("Hello World\n" =~ m/llo\b/) { print "There is a word that ends with 'llo'\n"; }
\w	Matches an alphanumeric character, including "_".	if ("Hello World\n" =~ m/\w/) { print "There is at least one alphanumeric "; print "character in the string (A-Z, a-z, 0-9, _)\n"; }
\W	Matches a non-alphanumeric character; does not match "_".	if ("Hello World\n" =~ m/\W/) { print "The space between Hello and "; print "World is not alphanumeric\n"; }
\s	Matches a whitespace character (space, tab, newline, form feed).	if ("Hello World\n" =~ m/\s.*\s/) { print "There are TWO whitespace characters, which may"; print " be separated by other characters, in the string.\n"; }
\S	Matches any character except whitespace.	if ("Hello World\n" =~ m/\S.*\S/) { print "Contains two non-whitespace characters " . "separated by zero or more characters.\n"; }
\d	Matches a digit; same as [0-9].	if ("99 bottles of beer on the wall." =~ m/(\d+)/) { print "$1 is the first number in the string\n"; }
\D	Matches any character that is not a digit.	if ("Hello World\n" =~ m/\D/) { print "There is at least one character in the string"; print " that is not a digit.\n"; }
^	Matches at the beginning of a line or string.	if ("Hello World\n" =~ m/^He/) { print "Starts with the characters 'He'\n"; }
$	Matches at the end of a line or string.	if ("Hello World\n" =~ m/rld$/) { print "Is a line or string "; print "that ends with 'rld'\n"; }
\A	Matches the beginning of a string (but not of an internal line).	if ("Hello\nWorld\n" =~ m/\AH/) { print "Yep\n"; } # The string starts with 'H'.
\Z	Matches the end of a string (but not of an internal line).	if ("Hello\nWorld\n" =~ m/d\n\Z/) { print "Yep\n"; } # Ends with "d\n"
[^…]	Matches any character except the ones inside the brackets.	if ("Hello World\n" =~ m/[^abc]/) { print "Yep\n"; } # Contains a character other than a, b, and c.
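The note that \A and \Z do not match at "an internal line" only matters when ^ and $ are allowed to match at internal line breaks, which in Perl is what the /m (multi-line) modifier does. The /m modifier is not covered in the table above; this minimal sketch assumes it in order to show the contrast on the string "Hello\nWorld\n" (variable names are illustrative).

```perl
use strict;
use warnings;

my $text = "Hello\nWorld\n";

# Without modifiers, ^ matches only at the start of the string, like \A.
my $caret_plain = ($text =~ /^World/) ? 1 : 0;

# With /m, ^ may also match right after an internal newline...
my $caret_multi = ($text =~ /^World/m) ? 1 : 0;

# ...but \A only ever matches at the very start, with or without /m.
my $anchor_a = ($text =~ /\AWorld/m) ? 1 : 0;

# Likewise, with /m, $ can match before an internal newline,
# while \Z matches only at the end (or before a final newline).
my $dollar_multi = ($text =~ /Hello$/m) ? 1 : 0;
my $anchor_z     = ($text =~ /Hello\Z/m) ? 1 : 0;

print join(" ", $caret_plain, $caret_multi, $anchor_a, $dollar_multi, $anchor_z), "\n";
```

Run as-is, the script prints 0 1 0 1 0.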

Use in Tools

Tools and languages that use Perl regular expression syntax include:

Java
leafnode
Perl
Python
PHP
