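Perl has a richer and more predictable syntax than the POSIX extended regular expression syntax. One example of its predictability is that a backslash (`\`) followed by a non-alphanumeric character always matches that character literally. Perl also lets you choose between greedy and non-greedy matching, which POSIX cannot express. In the pattern `/a.*b/`, the `.*` matches as much as possible, while in `/a.*?b/`, the `.*?` matches as little as possible. So, given the string "a bad dab", the first pattern matches the whole string and the second matches only "a b".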
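The "a bad dab" example can be reproduced with Python's `re` module, which uses the same Perl-style `*` and `*?` quantifiers. This is a minimal sketch and assumes Python 3. It checks greedy versus non-greedy matching, and it also checks the rule that a backslash before a non-alphanumeric character is literal. The subject string `"1+1.0 = 2"` is an illustrative value, not taken from the original.

```python
import re

text = "a bad dab"

# Greedy: .* consumes as much as possible, then backtracks to the LAST 'b'.
greedy = re.search(r"a.*b", text).group(0)

# Non-greedy (lazy): .*? consumes as little as possible, stopping at the FIRST 'b'.
lazy = re.search(r"a.*?b", text).group(0)

print(repr(greedy))  # 'a bad dab'
print(repr(lazy))    # 'a b'

# A backslash before a non-alphanumeric character means that literal character:
# \+ matches a plus sign and \. matches a dot.
print(re.search(r"1\+1\.0", "1+1.0 = 2") is not None)  # True
```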
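For these reasons, many other tools and languages adopted a Perl-like syntax, including Java, Ruby, Python, PHP, exim, BBEdit, and even Microsoft's .NET Framework. However, not all implementations of Perl-compatible syntax are identical, and many support only a subset of Perl's features.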
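As one concrete case of implementations differing, Python's `re` does not treat `\Z` the way Perl does. According to `perldoc perlre`, Perl's `\Z` matches at the end of the string or just before a final newline. Python's `\Z` matches only at the very end. The sketch below assumes Python 3 and executes only the Python side; Perl's behavior is noted in comments, not run.

```python
import re

s = "Hello\nWorld\n"  # ends with a newline

# Python's $ (without re.MULTILINE) matches at the very end or just before a
# trailing newline, like Perl's $.
dollar = re.search(r"d$", s) is not None

# Python's \Z matches only at the very end of the string, so this fails here.
# (In Perl, m/d\Z/ would succeed on the same string, because Perl's \Z
# also matches just before a final newline.)
py_Z = re.search(r"d\Z", s) is not None

# Spelling out the trailing newline makes the Python match succeed.
py_Z_explicit = re.search(r"d\n\Z", s) is not None

print(dollar, py_Z, py_Z_explicit)  # True False True
```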
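Conventions used in the examples: the `m` in a Perl match operation is not always required. For example, `m/[^abc]/` can also be written as `/[^abc]/`. The `m` is only needed when the match uses a delimiter other than the forward slash. An alternate delimiter is sometimes useful to avoid "delimiter collision", which means having to escape every `/` inside the pattern. See `perldoc perlre` for details.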
Metacharacter | Description | Example (all of the `if` conditions are true) |
---|---|---|
`.` | Matches any single character except a newline; inside square brackets it is literal and matches "." itself | `if ("Hello World\n" =~ m/...../) { print "Yep"; } # Has length >= 5` |
`( )` | Groups part of a pattern; the substrings matched by the groups can later be accessed as `$1`, `$2`, and so on | `if ("Hello World\n" =~ m/(H..).(o..)/) { print "We matched '$1' and '$2'\n"; }` Output: `We matched 'Hel' and 'o W'` |
`+` | Matches the preceding element one or more times | `if ("Hello World\n" =~ m/l+/) { print "One or more \"l\"'s in the string\n"; }` |
`?` | Matches the preceding element zero or one time | `if ("Hello World\n" =~ m/H.?e/) { print "There is an 'H' and an 'e' separated by "; print "0-1 characters (Ex: He Hoe)\n"; }` |
`?` | Makes a preceding `*`, `+`, or `{M,N}` match as few characters as possible (non-greedy) | `if ("Hello World\n" =~ m/(l.+?o)/) { print "Yep"; } # The non-greedy match of 'l', one or more characters, then 'o' is 'llo' rather than 'llo Wo'` |
`*` | Matches the preceding element zero or more times | `if ("Hello World\n" =~ m/el*o/) { print "There is an 'e' followed by zero to many "; print "'l' followed by 'o' (eo, elo, ello, elllo)\n"; }` |
`{M,N}` | Matches the preceding element at least M and at most N times | `if ("Hello World\n" =~ m/l{1,2}/) { print "There is a substring with at least 1 "; print "and at most 2 l's in the string\n"; }` |
`[…]` | Matches any one of the characters listed in the brackets | `if ("Hello World\n" =~ m/[aeiou]+/) { print "Yep"; } # Contains one or more vowels` |
`\|` | Separates alternatives ("or"); matches either the expression before it or the one after it | `if ("Hello World\n" =~ m/(Hello\|Hi\|Pogo)/) { print "At least one of Hello, Hi, or Pogo is "; print "contained in the string.\n"; }` |
`\b` | Matches a word boundary | `if ("Hello World\n" =~ m/llo\b/) { print "There is a word that ends with 'llo'\n"; }` |
`\w` | Matches a word character: an alphanumeric character or "_" | `if ("Hello World\n" =~ m/\w/) { print "There is at least one alphanumeric "; print "character in the string (A-Z, a-z, 0-9, _)\n"; }` |
`\W` | Matches a non-word character, i.e. anything except alphanumerics and "_" | `if ("Hello World\n" =~ m/\W/) { print "The space between Hello and "; print "World is not alphanumeric\n"; }` |
`\s` | Matches a whitespace character (space, tab, newline, form feed) | `if ("Hello World\n" =~ m/\s.*\s/) { print "There are TWO whitespace characters, which may"; print " be separated by other characters, in the string."; }` |
`\S` | Matches any character except whitespace | `if ("Hello World\n" =~ m/\S.*\S/) { print "Contains two non-whitespace characters " . "separated by zero or more characters."; }` |
`\d` | Matches a digit; the same as `[0-9]` for ASCII input | `if ("99 bottles of beer on the wall." =~ m/(\d+)/) { print "$1 is the first number in the string\n"; }` |
`\D` | Matches any character that is not a digit | `if ("Hello World\n" =~ m/\D/) { print "There is at least one character in the string"; print " that is not a digit.\n"; }` |
`^` | Matches at the beginning of a line or string | `if ("Hello World\n" =~ m/^He/) { print "Starts with the characters 'He'\n"; }` |
`$` | Matches at the end of a line or string | `if ("Hello World\n" =~ m/rld$/) { print "Is a line or string "; print "that ends with 'rld'\n"; }` |
`\A` | Matches at the beginning of the string (but not at the beginning of an internal line) | `if ("Hello\nWorld\n" =~ m/\AH/) { print "Yep"; } # The string starts with 'H'` |
`\Z` | Matches at the end of the string or just before a final newline (but not at the end of an internal line) | `if ("Hello\nWorld\n" =~ m/d\n\Z/) { print "Yep"; } # Ends with "d\n"` |
`[^…]` | Matches any character not listed in the brackets | `if ("Hello World\n" =~ m/[^abc]/) { print "Yep"; } # Contains a character other than a, b, and c` |
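As a cross-check of the table, the sketch below runs the same patterns and subject strings through Python's `re.search`. It assumes that, for these simple patterns, Python's syntax matches Perl's. The `\Z` row works in Python because it spells out the final `\n`. The names `EXAMPLES` and `failures` are illustrative.

```python
import re

# (pattern, subject) pairs copied from the Perl examples in the table.
# Assumption: for these simple patterns Python's re behaves like Perl.
EXAMPLES = [
    (r".....", "Hello World\n"),
    (r"(H..).(o..)", "Hello World\n"),
    (r"l+", "Hello World\n"),
    (r"H.?e", "Hello World\n"),
    (r"(l.+?o)", "Hello World\n"),
    (r"el*o", "Hello World\n"),
    (r"l{1,2}", "Hello World\n"),
    (r"[aeiou]+", "Hello World\n"),
    (r"(Hello|Hi|Pogo)", "Hello World\n"),
    (r"llo\b", "Hello World\n"),
    (r"\w", "Hello World\n"),
    (r"\W", "Hello World\n"),
    (r"\s.*\s", "Hello World\n"),
    (r"\S.*\S", "Hello World\n"),
    (r"(\d+)", "99 bottles of beer on the wall."),
    (r"\D", "Hello World\n"),
    (r"^He", "Hello World\n"),
    (r"rld$", "Hello World\n"),
    (r"\AH", "Hello\nWorld\n"),
    (r"d\n\Z", "Hello\nWorld\n"),
    (r"[^abc]", "Hello World\n"),
]

# Every Perl `if` in the table is true, so every pattern should find a match.
failures = [p for p, s in EXAMPLES if re.search(p, s) is None]
print("all examples match" if not failures else f"no match: {failures}")

# Capture groups: Perl's $1 and $2 correspond to group(1) and group(2).
m = re.search(r"(H..).(o..)", "Hello World\n")
print(f"We matched '{m.group(1)}' and '{m.group(2)}'")  # We matched 'Hel' and 'o W'

# Lazy versus greedy quantifier, as in the table's second ? row.
lazy = re.search(r"l.+?o", "Hello World\n").group(0)
greedy = re.search(r"l.+o", "Hello World\n").group(0)
print(lazy, "|", greedy)  # llo | llo Wo

# The first run of digits, as in the \d row.
print(re.search(r"(\d+)", "99 bottles of beer on the wall.").group(1))  # 99
```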
Tools using this syntax
Tools that use Perl regular expression syntax include:
- Java
- leafnode
- Perl
- Python
- PHP
References
Links
- Perl regular expressions at perl.org
- Perl Compatible Regular Expressions library at pcre.org
- Perl Regular Expression Syntax at boost.org
- Regular expressions (Standard Perl section) at Wikipedia