Perl Beginner's Tutorial - Day 3
Page 2: String matching

One of Perl's most useful features is its powerful string handling. At the core of this is the regular expression (RE), which is also used by many other UNIX tools.

Regular expressions

A regular expression is written between slashes, and matching is done with the =~ operator. The following expression is true if the string the appears in the variable $sentence:

    $sentence =~ /the/

REs are case sensitive, so if

    $sentence = "The quick brown fox";

then the match above is false. The !~ operator tests for a non-match. In the example above,

    $sentence !~ /the/

is true, because the string the does not appear in $sentence.

The special variable $_

The conditional

    if ($sentence =~ /under/)
    {
        print "We're talking about rugby\n";
    }

prints a message if $sentence was set by either of the following assignments:

    $sentence = "Up and under";
    $sentence = "Best winkles in Sunderland";

It is often easier, though, to assign the sentence to the special variable $_. The match and non-match operators can then be left out, because a bare /.../ is tested against $_. The example above can be written as:

    if (/under/)
    {
        print "We're talking about rugby\n";
    }

The $_ variable is the default for many Perl operations and is used very often.

More on REs

REs have a large number of special characters. These make REs powerful, but they also make them look complicated. It is best to build up an RE slowly; using them well is something of an art.

Here are some special RE characters and their meanings:

    .   # Any single character except a newline
    ^   # The beginning of the line or string
    $   # The end of the line or string
    *   # Zero or more of the last character
    +   # One or more of the last character
    ?   # Zero or one of the last character

Here are some example matches. In use, each one should be enclosed in /.../:

    t.e     # t followed by anything followed by e
            # This will match the
            #                 tre
            #                 tle
            # but not te
            #         tale
    ^f      # f at the beginning of a line
    ^ftp    # ftp at the beginning of a line
    e$      # e at the end of a line
    tle$    # tle at the end of a line
    und*    # un followed by zero or more d characters
            # This will match un
            #                 und
            #                 undd
            #                 unddd (etc)
    .*      # Any string without a newline. This is because
            # the . matches anything except a newline and
            # the * means zero or more of these.
    ^$      # A line with nothing in it.

There is more. Square brackets match any one of the characters inside them. Inside square brackets, "-" means "between" and a leading "^" means "not":

    [qjk]       # Either q or j or k
    [^qjk]      # Neither q nor j nor k
    [a-z]       # Anything from a to z inclusive
    [^a-z]      # Any character that is not a lower case letter
    [a-zA-Z]    # Any letter
    [a-z]+      # Any non-zero sequence of lower case letters

What has been covered so far is enough for most purposes. The rest is given mainly for reference.

A vertical bar "|" means "or", and parentheses (...) group things together:

    jelly|cream # Either jelly or cream
    (eg|le)gs   # Either eggs or legs
    (da)+       # Either da or dada or dadada or...

Here are some more special characters:

    \n  # A newline
    \t  # A tab
    \w  # Any alphanumeric (word) character.
        # The same as [a-zA-Z0-9_]
    \W  # Any non-word character.
        # The same as [^a-zA-Z0-9_]
    \d  # Any digit. The same as [0-9]
    \D  # Any non-digit. The same as [^0-9]
    \s  # Any whitespace character: space,
        # tab, newline, etc
    \S  # Any non-whitespace character
    \b  # A word boundary, outside [] only
    \B  # No word boundary

Characters such as $, |, [, ), \ and / have a special meaning. To match one of them literally, put a backslash in front of it:

    \|  # Vertical bar
    \[  # An open square bracket
    \)  # A closing parenthesis
    \*  # An asterisk
    \^  # A caret symbol
    \/  # A slash
    \\  # A backslash

Some example REs

As mentioned earlier, it is best to build up REs slowly. Here are some examples. In use, each one should be placed in /.../.

    [01]        # Either "0" or "1"
    \/0         # A division by zero: "/0"
    \/ 0        # A division by zero with a space: "/ 0"
    \/\s0       # A division by zero with a whitespace:
                # "/ 0" where the space may be a tab etc.
    \/ *0       # A division by zero with possibly some
                # spaces: "/0" or "/ 0" or "/  0" etc.
    \/\s*0      # A division by zero with possibly some
                # whitespace.
    \/\s*0\.0*  # As the previous one, but with decimal
                # point and maybe some 0s after it. Accepts
                # "/0." and "/0.0" and "/0.00" etc and
                # "/ 0." and "/ 0.0" and "/ 0.00" etc.
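The operators and patterns on this page can be tried out in a short script. What follows is a minimal sketch rather than part of the original tutorial: the helper name has_div_by_zero and some of the sample strings are made up for illustration. It exercises =~ and !~, the default variable $_, anchors and grouping, and the \/\s*0 pattern from the last set of examples.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the tutorial): returns 1 if the text
# contains a slash followed by optional whitespace and a 0, else 0.
sub has_div_by_zero {
    my ($text) = @_;
    return $text =~ /\/\s*0/ ? 1 : 0;
}

# REs are case sensitive: /the/ does not match "The".
my $sentence = "The quick brown fox";
print "no lower case 'the' found\n" if $sentence !~ /the/;
print "found 'The'\n"               if $sentence =~ /The/;

# Inside this loop each string is in $_, so a bare /under/ is enough.
for ("Up and under", "Best winkles in Sunderland", "Tea time") {
    print "We're talking about rugby: $_\n" if /under/;
}

# Anchors, alternation and grouping from the tables above.
for my $word ("ftp", "eggs", "legs", "dada", "kegs") {
    print "$word begins with f\n"      if $word =~ /^f/;
    print "$word is eggs or legs\n"    if $word =~ /^(eg|le)gs$/;
    print "$word is da, dada, ...\n"   if $word =~ /^(da)+$/;
}

# The division-by-zero pattern applied to a few sample expressions.
for my $expr ("x/0", "x / 0.00", "x/10") {
    my $verdict = has_div_by_zero($expr) ? "division by zero" : "ok";
    print "$expr: $verdict\n";
}
```

Note that \/\s*0 only checks that a 0 follows the slash, so it would also flag an expression such as "x/0.5". A stricter check would need a longer pattern.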