3. Perl-compatible regular expressions

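Perl has a richer and more predictable syntax than POSIX extended regular expressions. One example of this predictability is that \ always quotes a non-alphanumeric character. Perl syntax also lets us choose between greedy and non-greedy (lazy) matching, which POSIX syntax cannot express. In the pattern /a.*b/, the .* matches as many characters as possible, whereas in /a.*?b/ the .*? matches as few characters as possible. Given the string "a bad dab", the first pattern matches the whole string and the second matches only "a b".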
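The difference between the two quantifiers can be checked directly. The following is a minimal sketch using the same string as above; the variable names ($text, $greedy, $lazy) are chosen only for illustration.

```perl
use strict;
use warnings;

my $text = "a bad dab";

# Greedy: .* takes as much as it can, then backs off to the last 'b'.
my ($greedy) = $text =~ /(a.*b)/;

# Lazy: .*? takes as little as it can, stopping at the first 'b'.
my ($lazy) = $text =~ /(a.*?b)/;

print "greedy: '$greedy'\n";   # greedy: 'a bad dab'
print "lazy:   '$lazy'\n";     # lazy:   'a b'
```

Both patterns start matching at the same position; only the amount consumed by the quantifier differs.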

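For these reasons, many other tools and programs have adopted a Perl-like syntax, for example Java, Ruby, Python, PHP, exim, BBEdit, and even Microsoft's .NET. However, not all "Perl-compatible" implementations are identical; many implement only a subset of Perl's features.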

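Conventions used in the examples: the character 'm' is not always required to specify a Perl match operation. For example, m/[^abc]/ can also be written as /[^abc]/. The 'm' is only necessary when the user wants to specify a match operation without using a forward slash as the regex delimiter. It is sometimes useful to choose an alternative delimiter in order to avoid "delimiter collision". See perldoc perlre for more details.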
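As an illustration of delimiter collision, here is a small sketch that matches a path prefix three ways. The path and the variable names are made up for the example; the braces and the hash sign are just two of the delimiters Perl accepts after 'm'.

```perl
use strict;
use warnings;

my $path = "/usr/local/bin/perl";   # hypothetical example path

# With the default delimiter, every '/' in the pattern must be escaped.
my $with_slashes = $path =~ /^\/usr\/local\// ? 1 : 0;

# With 'm' and another delimiter, no escaping is needed.
my $with_braces = $path =~ m{^/usr/local/} ? 1 : 0;
my $with_hash   = $path =~ m#^/usr/local/# ? 1 : 0;

print "$with_slashes $with_braces $with_hash\n";   # 1 1 1
```

All three patterns are equivalent; the alternative delimiters simply make the pattern easier to read.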

Metacharacter  Description  Example
(All of the if conditions below evaluate to true.)
. Matches any single character except a newline. Inside square brackets the dot is literal and matches "." itself.
if ("Hello World\n" =~ m/...../) {
  print "The string has length >= 5\n";
}
( ) Groups a pattern inside parentheses. The matched groups can be accessed later as $1, $2, and so on.
if ("Hello World\n" =~ m/(H..).(o..)/) {
  print "We matched '$1' and '$2'\n";
}

Output:

We matched 'Hel' and 'o W'

+ Matches the preceding element one or more times
if ("Hello World\n" =~ m/l+/) {
  print "One or more \"l\"'s in the string\n";
}
? Matches the preceding element zero or one time
if ("Hello World\n" =~ m/H.?e/) {
  print "There is an 'H' and a 'e' separated by ";
  print "0-1 characters (Ex: He Hoe)\n";
}
? Makes the preceding *, + or {M,N} quantifier match as few times as possible (non-greedy)
if ("Hello World\n" =~ m/(l.+?o)/) {
  # The non-greedy match of 'l' followed by one or more
  # characters is 'llo' rather than 'llo Wo'.
  print "Matched '$1'\n";
}
* Matches the preceding element zero or more times
if ("Hello World\n" =~ m/el*o/) {
  print "There is an 'e' followed by zero to many ";
  print "'l' followed by 'o' (eo, elo, ello, elllo)\n";
}
{M,N} Matches the preceding element at least M and at most N times
if ("Hello World\n" =~ m/l{1,2}/) {
  print "There is a substring with at least 1 ";
  print "and at most 2 l's in the string\n";
}
[...] Matches any one character from a set of possible characters
if ("Hello World\n" =~ m/[aeiou]+/) {
  print "The string contains one or more vowels.\n";
}
| Separates alternatives; matches either the expression before it or the one after it
if ("Hello World\n" =~ m/(Hello|Hi|Pogo)/) {
  print "At least one of Hello, Hi, or Pogo is ";
  print "contained in the string.\n";
}
\b Matches a word boundary
if ("Hello World\n" =~ m/llo\b/) {
  print "There is a word that ends with 'llo'\n";
}
\w Matches an alphanumeric character, including "_"
if ("Hello World\n" =~ m/\w/) {
  print "There is at least one alphanumeric ";
  print "character in the string (A-Z, a-z, 0-9, _)\n";
}
\W Matches a non-alphanumeric character, excluding "_"
if ("Hello World\n" =~ m/\W/) {
  print "The space between Hello and ";
  print "World is not alphanumeric\n";
}
\s Matches a whitespace character (space, tab, newline, form feed)
if ("Hello World\n" =~ m/\s.*\s/) {
  print "There are TWO whitespace characters, which may";
  print " be separated by other characters, in the string.\n";
}
\S Matches any character except whitespace
if ("Hello World\n" =~ m/\S.*\S/) {
  print "Contains two non-whitespace characters " .
        "separated by zero or more characters.\n";
}
\d Matches a digit; same as [0-9]
if ("99 bottles of beer on the wall." =~ m/(\d+)/) {
  print "$1 is the first number in the string\n";
}
\D Matches a non-digit
if ("Hello World\n" =~ m/\D/) {
  print "There is at least one character in the string";
  print " that is not a digit.\n";
}
^ Matches the beginning of a line or string
if ("Hello World\n" =~ m/^He/) {
  print "Starts with the characters 'He'\n";
}
$ Matches the end of a line or string
if ("Hello World\n" =~ m/rld$/) {
  print "Is a line or string ";
  print "that ends with 'rld'\n";
}
\A Matches the beginning of a string (but not an internal line)
if ("Hello\nWorld\n" =~ m/\AH/) {
  print "The string starts with 'H'\n";
}
\Z Matches the end of a string (but not an internal line)
if ("Hello\nWorld\n" =~ m/d\n\Z/) {
  print "The string ends with 'd\\n'\n";
}
[^...] Matches any character except those inside the brackets
if ("Hello World\n" =~ m/[^abc]/) {
  print "The string contains a character other than a, b, and c.\n";
}
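The phrase "but not an internal line" in the \A and \Z entries is easiest to see next to ^ and $. The sketch below assumes Perl's /m modifier, which is not covered in the table above: with /m, ^ and $ also match at internal line breaks, while \A and \Z stay tied to the string as a whole. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $text = "Hello\nWorld\n";

# Without /m, ^ only matches at the start of the string.
my $caret_plain  = $text =~ /^World/   ? 1 : 0;   # 0
# With /m, ^ also matches right after an internal newline.
my $caret_multi  = $text =~ /^World/m  ? 1 : 0;   # 1
# \A ignores /m and matches only at the start of the string.
my $string_start = $text =~ /\AWorld/m ? 1 : 0;   # 0
# \Z matches at the end of the string, or just before a final newline.
my $string_end   = $text =~ /World\Z/  ? 1 : 0;   # 1
# With /m, $ matches before the internal newline; \Z does not.
my $dollar_multi = $text =~ /Hello$/m  ? 1 : 0;   # 1
my $z_multi      = $text =~ /Hello\Z/m ? 1 : 0;   # 0

print "$caret_plain $caret_multi $string_start ",
      "$string_end $dollar_multi $z_multi\n";     # 0 1 0 1 1 0
```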

Uses

Tools that use Perl regular expression syntax include:

  • Java
  • leafnode
  • Perl
  • Python
  • PHP
