perlop

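NAME

perlop - Perl operators and precedence


SYNOPSIS

Perl's operators are presented below with their associativity and precedence, from highest precedence to lowest. Note that all operators borrowed from C keep the same precedence relationship with each other, even where C's precedence is slightly odd. (This makes learning Perl easier for C folks.)

In the following sections, these operators are covered in precedence order.


DESCRIPTION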


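Terms and List Operators (Leftward)

Any TERM is of highest precedence in Perl. Terms include variables, quote and quotelike operators, any expression in parentheses, and any function whose arguments are parenthesized. Actually, these are not really functions, just list operators and unary operators that behave like functions because you put parentheses around their arguments. They are all documented in perlfunc.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.) is followed by a left parenthesis as the next token, the operator and the arguments within the parentheses are taken to be of highest precedence, just like a normal function call.

In the absence of parentheses, the precedence of list operators such as print, sort, or chmod is either very high or very low, depending on whether you look at the left side of the operator or the right side of it. For example, in

    @ary = (1, 3, sort 4, 2);
    print @ary;         # prints 1324

the commas on the right of the sort are evaluated before the sort, but the commas on the left are evaluated after. In other words, list operators tend to gobble up all the arguments that follow them, and then act like a simple TERM with regard to the preceding expression. Note that you have to be careful with parentheses:

    # These evaluate exit before doing the print:
    print($foo, exit);  # Obviously not what you want.
    print $foo, exit;   # Nor is this.

    # These do the print before evaluating exit:
    (print $foo), exit; # This is what you want.
    print($foo), exit;  # Or this.
    print ($foo), exit; # Or even this.

Also note that

    print ($foo & 255) + 1, "\n";

probably doesn't do what you expect at first glance. See Named Unary Operators for more discussion of this.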
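The leftward/rightward behaviour can be checked with a short sketch. The variable names are illustrative only; sort is used simply because its result is easy to inspect.

```perl
use strict;
use warnings;

# Without parentheses, sort takes every item to its right (4, 2) as its
# argument list; the sorted result then becomes part of the outer list.
my @bare = (1, 3, sort 4, 2);

# With parentheses, sort acts like a function call that sees only (4).
my @paren = (1, 3, sort(4), 2);

print "@bare\n";     # 1 3 2 4
print "@paren\n";    # 1 3 4 2
```

The only difference between the two lists is where the argument list of sort ends.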

Also parsed as terms are the do {} and eval {} constructs, as well as subroutine and method calls, and the anonymous constructors [] and {}.

See also Quote and Quotelike Operators and I/O Operators.


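The Arrow Operator

Just as in C and C++, ``->'' is an infix dereference operator. If the right side is either a [...] or {...} subscript, then the left side must be either a hard or symbolic reference to an array or hash (or, if it is an lvalue, a location capable of holding a hard reference). See perlref.

Otherwise, the right side is a method name or a simple scalar variable containing the method name, and the left side must be either an object (a blessed reference) or a class name (that is, a package name). See perlobj.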
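A minimal sketch of both uses of the arrow follows. The Counter class and all variable names are hypothetical, introduced only for illustration.

```perl
use strict;
use warnings;

my $aref = [10, 20, 30];
my $href = { lang => 'Perl' };

my $second = $aref->[1];        # element of the referenced array
my $lang   = $href->{lang};     # value in the referenced hash

# A tiny hypothetical class, only here to show method calls.
package Counter;
sub new   { my ($class, $start) = @_; return bless { n => $start }, $class }
sub value { my $self = shift; return $self->{n} }

package main;

my $counter  = Counter->new(5);     # left side is a class name
my $method   = 'value';             # scalar holding a method name
my $direct   = $counter->value;     # left side is an object
my $indirect = $counter->$method;   # method chosen at run time

print "$second $lang $direct $indirect\n";   # 20 Perl 5 5
```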


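Autoincrement and Autodecrement

``++'' and ``--'' work as in C. That is, if placed before a variable, they increment or decrement the variable before returning the value, and if placed after, they increment or decrement the variable after returning the value.

The autoincrement operator has a little extra built-in magic. If you increment a variable that is numeric, or that has ever been used in a numeric context, you get a normal increment. If, however, the variable has been used only in string contexts since it was set, and has a non-empty value that matches the pattern /^[a-zA-Z]*[0-9]*$/, the increment is done as a string, preserving each character within its range, with carry:

    print ++($foo = '99');      # prints '100'
    print ++($foo = 'a0');      # prints 'a1'
    print ++($foo = 'Az');      # prints 'Ba'
    print ++($foo = 'zz');      # prints 'aaa'

The autodecrement operator is not magical.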
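The carry behaviour is easiest to see side by side. This is a sketch only; bump() is a hypothetical helper that increments a copy of its argument.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the argument, increment the copy, return it.
sub bump {
    my ($value) = @_;
    $value++;
    return $value;
}

for my $seed ('99', 'a0', 'Az', 'zz', 'a9', 'Zz') {
    printf "%-3s -> %s\n", $seed, bump($seed);
}
```

Each character stays within its own range (digit, lower case, upper case), and a carry out of the leftmost position adds a new character of the same kind.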


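Exponentiation

Binary ``**'' is the exponentiation operator. It binds more tightly than unary minus, so -2**4 is -(2**4), not (-2)**4. (It is implemented using C's pow(3) function, which works on doubles internally.)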
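A two-line check of the precedence rule for exponentiation and unary minus, with illustrative variable names:

```perl
use strict;
use warnings;

my $unary_last  = -2 ** 4;      # parsed as -(2 ** 4)
my $unary_first = (-2) ** 4;    # parentheses force the negation first

print "$unary_last $unary_first\n";   # -16 16
```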


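Symbolic Unary Operators

Unary ``!'' performs logical negation, i.e. ``not''. See also not for a lower precedence version of this.

Unary ``-'' performs arithmetic negation if the operand is numeric. If the operand is an identifier, a string consisting of a minus sign concatenated with the identifier is returned. Otherwise, if the string starts with a plus or minus, a string starting with the opposite sign is returned. One effect of these rules is that -bareword is equivalent to "-bareword".

Unary ``~'' performs bitwise negation, i.e. 1's complement.

Unary ``+'' has no effect whatsoever, even on strings. It is useful syntactically for separating a function name from a parenthesized expression that would otherwise be interpreted as the complete list of function arguments. (See the examples under List Operators.)

Unary ``\'' creates a reference to whatever follows it. See perlref. Do not confuse this behavior with the behavior of backslash within a string, although both forms do convey the notion of protecting the next thing from interpretation.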


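Binding Operators

Binary ``=~'' binds an expression to a pattern match. Certain operations search or modify the string $_ by default. This operator makes that kind of operation work on some other string. The right argument is a search pattern, substitution, or translation. The left argument is what is supposed to be searched, substituted, or translated instead of the default $_. The return value indicates the success of the operation. (If the right argument is an expression rather than a search pattern, substitution, or translation, it is interpreted as a search pattern at run time. This is less efficient than an explicit search, since the pattern must be compiled every time the expression is evaluated--unless you've used /o.)

Binary ``!~'' is just like ``=~'' except the return value is negated in the logical sense.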
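The sketch below binds a match, a translation, and a negated match to a string other than $_. The sample string and variable names are made up for the example.

```perl
use strict;
use warnings;

my $line = 'Version: 5.004';

# Bind a pattern match to $line instead of $_.
my $version;
if ($line =~ /Version:\s*([0-9.]+)/) {
    $version = $1;
}

# Bind a translation to a copy, leaving $line untouched.
(my $lower = $line) =~ tr/A-Z/a-z/;

# !~ gives the logically negated result of =~.
my $has_no_digit = ($line !~ /[0-9]/) ? 1 : 0;

print "$version | $lower | $has_no_digit\n";   # 5.004 | version: 5.004 | 0
```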


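Multiplicative Operators

Binary ``*'' multiplies two numbers.

Binary ``/'' divides two numbers.

Binary ``%'' computes the modulus of two numbers.

Binary ``x'' is the repetition operator. In a scalar context, it returns a string consisting of the left operand repeated the number of times specified by the right operand. In a list context, if the left operand is a list in parentheses, it repeats the list.

    print '-' x 80;             # print row of dashes

    print "\t" x ($tab/8), ' ' x ($tab%8);      # tab over

    @ones = (1) x 80;           # a list of 80 1's
    @ones = (5) x @ones;        # set all elements to 5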


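Additive Operators

Binary ``+'' returns the sum of two numbers.

Binary ``-'' returns the difference of two numbers.

Binary ``.'' concatenates two strings.


Shift Operators

Binary ``<<'' returns the value of its left argument shifted left by the number of bits specified by the right argument. Arguments should be integers.

Binary ``>>'' returns the value of its left argument shifted right by the number of bits specified by the right argument. Arguments should be integers.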


Named Unary Operators

The various named unary operators are treated as functions with one argument, with optional parentheses. These include the filetest operators, like -f, -M, etc. See perlfunc.

If any list operator (print(), etc.) or any unary operator (chdir(), etc.) is followed by a left parenthesis as the next token, the operator and the arguments within the parentheses are taken to be of highest precedence, just like a normal function call. Examples:

    chdir $foo    || die;       # (chdir $foo) || die
    chdir($foo)   || die;       # (chdir $foo) || die
    chdir ($foo)  || die;       # (chdir $foo) || die
    chdir +($foo) || die;       # (chdir $foo) || die

but, because * has higher precedence than ||:

    chdir $foo * 20;    # chdir ($foo * 20)
    chdir($foo) * 20;   # (chdir $foo) * 20
    chdir ($foo) * 20;  # (chdir $foo) * 20
    chdir +($foo) * 20; # chdir ($foo * 20)

    rand 10 * 20;       # rand (10 * 20)
    rand(10) * 20;      # (rand 10) * 20
    rand (10) * 20;     # (rand 10) * 20
    rand +(10) * 20;    # rand (10 * 20)

See also ``List Operators''.


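Relational Operators

Binary ``<'' returns true if the left argument is numerically less than the right argument.

Binary ``>'' returns true if the left argument is numerically greater than the right argument.

Binary ``<='' returns true if the left argument is numerically less than or equal to the right argument.

Binary ``>='' returns true if the left argument is numerically greater than or equal to the right argument.

Binary ``lt'' returns true if the left argument is stringwise less than the right argument.

Binary ``gt'' returns true if the left argument is stringwise greater than the right argument.

Binary ``le'' returns true if the left argument is stringwise less than or equal to the right argument.

Binary ``ge'' returns true if the left argument is stringwise greater than or equal to the right argument.


Equality Operators

Binary ``=='' returns true if the left argument is numerically equal to the right argument.

Binary ``!='' returns true if the left argument is numerically not equal to the right argument.

Binary ``<=>'' returns -1, 0, or 1 depending on whether the left argument is numerically less than, equal to, or greater than the right argument.

Binary ``eq'' returns true if the left argument is stringwise equal to the right argument.

Binary ``ne'' returns true if the left argument is stringwise not equal to the right argument.

Binary ``cmp'' returns -1, 0, or 1 depending on whether the left argument is stringwise less than, equal to, or greater than the right argument.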
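The difference between the numeric and the stringwise families shows up most clearly when sorting. The following sketch uses made-up sample values.

```perl
use strict;
use warnings;

my @values = (10, 9, 100, 1);

my @numeric    = sort { $a <=> $b } @values;   # compare as numbers
my @stringwise = sort { $a cmp $b } @values;   # compare as strings

print "@numeric\n";      # 1 9 10 100
print "@stringwise\n";   # 1 10 100 9

# The same pair of values can order differently in the two families.
my $num_less = (10 < 9)      ? 'yes' : 'no';
my $str_less = ('10' lt '9') ? 'yes' : 'no';
print "$num_less $str_less\n";   # no yes
```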


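Bitwise And

Binary ``&'' returns its operands ANDed together bit by bit.


Bitwise Or and Exclusive Or

Binary ``|'' returns its operands ORed together bit by bit.

Binary ``^'' returns its operands XORed together bit by bit.


C-style Logical And

Binary ``&&'' performs a short-circuit logical AND operation. That is, if the left operand is false, the right operand is not even evaluated. Scalar or list context propagates down to the right operand if it is evaluated.


C-style Logical Or

Binary ``||'' performs a short-circuit logical OR operation. That is, if the left operand is true, the right operand is not even evaluated. Scalar or list context propagates down to the right operand if it is evaluated.

The || and && operators differ from C's in that, rather than returning 0 or 1, they return the last value evaluated. Thus, a reasonably portable way to find out the home directory (assuming it's not ``0'') might be:

    $home = $ENV{'HOME'} || $ENV{'LOGDIR'} ||
        (getpwuid($<))[7] || die "You're homeless!\n";

As more readable alternatives to && and ||, Perl provides ``and'' and ``or'' operators (see below). The short-circuit behavior is identical. The precedence of ``and'' and ``or'' is much lower, however, so that you can safely use them after a list operator without the need for parentheses:

    unlink "alpha", "beta", "gamma"
            or gripe(), next LINE;

With the C-style operators that would have been written like this:

    unlink("alpha", "beta", "gamma")
            || (gripe(), next LINE);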
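A small sketch makes both properties visible: the right operand is skipped when it cannot affect the result, and the value returned is the last operand evaluated. fallback() and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $calls = 0;

# Hypothetical helper that records whether it was ever evaluated.
sub fallback { $calls++; return 'fallback' }

my ($empty, $set) = ('', 'primary');

my $kept     = $set   || fallback();   # left is true: fallback() is skipped
my $replaced = $empty || fallback();   # left is false: right side is evaluated
my $stopped  = $empty && fallback();   # left is false: fallback() is skipped

print "$kept $replaced [$stopped] calls=$calls\n";   # primary fallback [] calls=1
```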


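Range Operator

Binary ``..'' is the range operator, which is really two different operators depending on the context. In a list context, it returns an array of values counting (by ones) from the left value to the right value. This is useful for writing for (1..10) loops and for doing slice operations on arrays. Be aware that under the current implementation a temporary array is created, so you'll burn a lot of memory if you write something like this:

    for (1 .. 1_000_000) {
        # code
    }

In a scalar context, ``..'' returns a boolean value. The operator is bistable, like a flip-flop, and emulates the line-range (comma) operator of sed, awk, and various editors. Each ``..'' operator maintains its own boolean state. It is false as long as its left operand is false. Once the left operand is true, the range operator stays true until the right operand is true, after which the range operator becomes false again. (It doesn't become false until the next time the range operator is evaluated. It can test the right operand and become false on the same evaluation on which it became true (as in awk), but it still returns true once. If you don't want it to test the right operand until the next evaluation (as in sed), use three dots (``...'') instead of two.) The right operand is not evaluated while the operator is in the ``false'' state, and the left operand is not evaluated while the operator is in the ``true'' state.

The precedence is a little lower than || and &&. The value returned is either the null string for false, or a sequence number (beginning with 1) for true. The sequence number is reset for each range encountered. The final sequence number in a range has the string ``E0'' appended to it, which doesn't affect its numeric value, but gives you something to search for if you want to exclude the endpoint. You can exclude the beginning point by waiting for the sequence number to be greater than 1. If either operand of scalar ``..'' is a numeric literal, that operand is implicitly compared to the $. variable, the current line number. Examples:

As a scalar operator:

    if (101 .. 200) { print; }  # print 2nd hundred lines
    next line if (1 .. /^$/);   # skip header lines
    s/^/> / if (/^$/ .. eof()); # quote body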
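The flip-flop state and its sequence numbers can be observed without reading a file. In this sketch the input is an in-memory list, and the BEGIN/END markers are invented for the example.

```perl
use strict;
use warnings;

my @lines = ('header', 'BEGIN', 'a', 'b', 'END', 'footer');
my (@inside, @states);

for (@lines) {
    # Scalar context: ".." acts as a flip-flop over the two patterns.
    my $state = /^BEGIN$/ .. /^END$/;
    push @states, $state;
    push @inside, $_ if $state;
}

print join(',', @inside), "\n";   # BEGIN,a,b,END
print join(',', @states), "\n";   # ,1,2,3,4E0,
```

The empty first and last fields are the null string returned while the operator is false; ``4E0'' marks the last line of the range.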

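As a list operator:

    for (101 .. 200) { print; } # print $_ 100 times
    @foo = @foo[$[ .. $#foo];   # an expensive no-op
    @foo = @foo[$#foo-4 .. $#foo];      # slice last 5 items

The range operator (in a list context) makes use of the magical autoincrement algorithm if the operands are strings. You can say

    @alphabet = ('A' .. 'Z');

to get all the letters of the alphabet, or

    $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];

to get a hexadecimal digit, or

    @z2 = ('01' .. '31');  print $z2[$mday];

to get dates with leading zeros. If the final value specified is not in the sequence that the magical increment would produce, the sequence goes until the next value would be longer than the final value specified.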
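The string forms of the range operator can be tried out as follows; this is a sketch with illustrative variable names.

```perl
use strict;
use warnings;

my @alphabet = ('A' .. 'Z');
my @days     = ('01' .. '31');          # leading zeros are preserved
my @columns  = ('aa' .. 'ad');
my @hex      = (0 .. 9, 'a' .. 'f');
my $digit    = $hex[255 & 15];          # low four bits select the digit

print scalar(@alphabet), " letters, day 9 is $days[8]\n";   # 26 letters, day 9 is 09
print "@columns $digit\n";                                  # aa ab ac ad f
```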


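Conditional Operator

Ternary ``?:'' is the conditional operator, just as in C. It works much like an if-then-else. If the argument before the ? is true, the argument before the : is returned, otherwise the argument after the : is returned. For example:

    printf "I have %d dog%s.\n", $n,
            ($n == 1) ? '' : "s";

Scalar or list context propagates downward into the 2nd or 3rd argument, whichever is selected.

    $a = $ok ? $b : $c;  # get a scalar
    @a = $ok ? @b : @c;  # get an array
    $a = $ok ? @b : @c;  # oops, that's just a count!

The operator may be assigned to if both the 2nd and 3rd arguments are legal lvalues (meaning that you can assign to them):

    ($a_or_b ? $a : $b) = $c;

This is not necessarily guaranteed to contribute to the readability of your program.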
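Both uses of the conditional operator, as a value and as an assignment target, appear in this sketch. dogs() and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the pluralization idiom.
sub dogs {
    my ($n) = @_;
    return sprintf 'I have %d dog%s.', $n, ($n == 1) ? '' : 's';
}

# The conditional as an lvalue: the selected variable receives the value.
my ($left, $right) = (0, 0);
my $pick_left = 1;
($pick_left ? $left : $right) = 42;

print dogs(1), ' ', dogs(3), "\n";   # I have 1 dog. I have 3 dogs.
print "$left $right\n";              # 42 0
```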


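Assignment Operators

``='' is the ordinary assignment operator.

Assignment operators work as in C. That is,

    $a += 2;

is equivalent to

    $a = $a + 2;

although without duplicating any side effects that dereferencing the lvalue might trigger, such as from tie(). Other assignment operators work similarly. The following are recognized: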


    **=    +=    *=    &=    <<=    &&=

           -=    /=    |=    >>=    ||=

           .=    %=    ^=

                 x=



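Note that while these are grouped by family, they all have the precedence of assignment.

Unlike in C, the assignment operator produces a valid lvalue. Modifying an assignment is equivalent to doing the assignment and then modifying the variable that was assigned to. This is useful for modifying a copy of something, like this:

    ($tmp = $global) =~ tr [A-Z] [a-z];

Likewise,

    ($a += 2) *= 3;

is equivalent to

    $a += 2;
    $a *= 3;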
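A runnable version of the two idioms, using lexical variables with illustrative names:

```perl
use strict;
use warnings;

my $global = 'Hello World';

# Copy first, then modify the copy; $global itself is not changed.
(my $tmp = $global) =~ tr/A-Z/a-z/;

# The result of += is an lvalue, so it can be modified again.
my $n = 1;
($n += 2) *= 3;

print "$global | $tmp | $n\n";   # Hello World | hello world | 9
```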


Comma Operator

Binary ``,'' is the comma operator. In a scalar context it evaluates its left argument, throws that value away, then evaluates its right argument and returns that value. This is just like C's comma operator.

In a list context, it's just the list argument separator, and inserts both its arguments into the list.

The => digraph is mostly just a synonym for the comma operator. It's useful for documenting arguments that come in pairs. As of release 5.001, it also forces any word to the left of it to be interpreted as a string.
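A short sketch of => used to document paired arguments; the hash contents are made up for the example.

```perl
use strict;
use warnings;

# The word to the left of => is treated as a string, so no quotes are needed.
my %port = (http => 80, https => 443);

# In a list, => separates items just as a comma would.
my @flat = (http => 80);

print "$port{https} $flat[0] $flat[1]\n";   # 443 http 80
```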


List Operators (Rightward)

On the right side of a list operator, it has very low precedence, such that it controls all comma-separated expressions found there. The only operators with lower precedence are the logical operators ``and'', ``or'', and ``not'', which may be used to evaluate calls to list operators without the need for extra parentheses:

    open HANDLE, "filename"
            or die "Can't open: $!\n";

See also discussion of list operators in List Operators (Leftward).


Logical Not

Unary ``not'' returns the logical negation of the expression to its right. It's the equivalent of ``!'' except for the very low precedence.


Logical And

Binary ``and'' returns the logical conjunction of the two surrounding expressions. It's equivalent to && except for the very low precedence. This means that it short-circuits: i.e. the right expression is evaluated only if the left expression is true.


Logical or and Exclusive Or

Binary ``or'' returns the logical disjunction of the two surrounding expressions. It's equivalent to || except for the very low precedence. This means that it short-circuits: i.e. the right expression is evaluated only if the left expression is false.

Binary ``xor'' returns the exclusive-OR of the two surrounding expressions. It cannot short circuit, of course.
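The effect of the very low precedence is visible when ``!'' and ``not'' are applied to the same expression. This is a sketch; the variable names are illustrative.

```perl
use strict;
use warnings;

my ($f, $t) = (0, 1);

my $bang = !$f || $t;          # parsed as (!$f) || $t
my $word = (not $f || $t);     # parsed as not ($f || $t)

my $one_only = ($t xor $f) ? 1 : 0;   # exactly one operand is true
my $both     = ($t xor $t) ? 1 : 0;   # both true, so the result is false

print "bang=$bang word=[$word] xor=$one_only,$both\n";   # bang=1 word=[] xor=1,0
```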


C Operators Missing From Perl

Here is what C has that Perl doesn't:

unary &
Address-of operator. (But see the ``\'' operator for taking a reference.)

unary *
Dereference-address operator. (Perl's prefix dereferencing operators are typed: $, @, %, and &.)

(TYPE)
Type casting operator.


Quote and Quotelike Operators

While we usually think of quotes as literal values, in Perl they function as operators, providing various kinds of interpolating and pattern matching capabilities. Perl provides customary quote characters for these behaviors, but also provides a way for you to choose your quote character for any of them. In the following table, a {} represents any pair of delimiters you choose. Non-bracketing delimiters use the same character fore and aft, but the 4 sorts of brackets (round, angle, square, curly) will all nest.

    Customary  Generic     Meaning    Interpolates
        ''       q{}       Literal         no
        ""      qq{}       Literal         yes
        ``      qx{}       Command         yes
                qw{}      Word list        no
        //       m{}    Pattern match      yes
                 s{}{}   Substitution      yes
                tr{}{}   Translation       no

For constructs that do interpolation, variables beginning with ``$'' or ``@'' are interpolated, as are the following sequences:

    \t          tab
    \n          newline
    \r          return
    \f          form feed
    \v          vertical tab, whatever that is
    \b          backspace
    \a          alarm (bell)
    \e          escape
    \033        octal char
    \x1b        hex char
    \c[         control char
    \l          lowercase next char
    \u          uppercase next char
    \L          lowercase till \E
    \U          uppercase till \E
    \E          end case modification
    \Q          quote regexp metacharacters till \E

Patterns are subject to an additional level of interpretation as a regular expression. This is done as a second pass, after variables are interpolated, so that regular expressions may be incorporated into the pattern from the variables. If this is not what you want, use \Q to interpolate a variable literally.

Apart from the above, there are no multiple levels of interpolation. In particular, contrary to the expectations of shell programmers, backquotes do NOT interpolate within double quotes, nor do single quotes impede evaluation of variables when used within double quotes.
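The generic quote forms and \Q can be compared directly. The strings and variable names in this sketch are invented for illustration.

```perl
use strict;
use warnings;

my $name = 'world';

my $literal      = q{Hello, $name\n};    # no interpolation, backslash kept
my $interpolated = qq{Hello, $name\n};   # variable and \n are expanded
my @words        = qw(alpha beta gamma); # whitespace-separated word list
my $nested       = q(outer (inner) outer);   # bracketing delimiters nest

# \Q makes an interpolated variable match literally inside a pattern.
my $dotted = 'a.b';
my $plain  = ('axb' =~ /$dotted/)     ? 1 : 0;   # "." matches any character
my $quoted = ('axb' =~ /\Q$dotted\E/) ? 1 : 0;   # "." must match a real dot

print "$literal | ", scalar(@words), " words | $plain $quoted\n";
```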


Regexp Quotelike Operators

Here are the quotelike operators that apply to pattern matching and related activities.

?PATTERN?
This is just like the /pattern/ search, except that it matches only once between calls to the reset() operator. This is a useful optimization when you only want to see the first occurrence of something in each file of a set of files, for instance. Only ?? patterns local to the current package are reset.

This usage is vaguely deprecated, and may be removed in some future version of Perl.

m/PATTERN/gimosx

/PATTERN/gimosx
Searches a string for a pattern match, and in a scalar context returns true (1) or false (''). If no string is specified via the =~ or !~ operator, the $_ string is searched. (The string specified with =~ need not be an lvalue--it may be the result of an expression evaluation, but remember the =~ binds rather tightly.) See also the perlre manpage .

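Options are:

    g   Match globally, i.e. find all occurrences.
    i   Do case-insensitive pattern matching.
    m   Treat string as multiple lines.
    o   Only compile pattern once.
    s   Treat string as single line.
    x   Use extended regular expressions.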

If ``/'' is the delimiter then the initial m is optional. With the m you can use any pair of non-alphanumeric, non-whitespace characters as delimiters. This is particularly useful for matching Unix path names that contain ``/'', to avoid LTS (leaning toothpick syndrome).

PATTERN may contain variables, which will be interpolated (and the pattern recompiled) every time the pattern search is evaluated. (Note that $) and $| might not be interpolated because they look like end-of-string tests.) If you want such a pattern to be compiled only once, add a /o after the trailing delimiter. This avoids expensive run-time recompilations, and is useful when the value you are interpolating won't change over the life of the script. However, mentioning /o constitutes a promise that you won't change the variables in the pattern. If you change them, Perl won't even notice.

If the PATTERN evaluates to a null string, the last successfully executed regular expression is used instead.

If used in a context that requires a list value, a pattern match returns a list consisting of the subexpressions matched by the parentheses in the pattern, i.e. ($1, $2, $3...). (Note that here $1 etc. are also set, and that this differs from Perl 4's behavior.) If the match fails, a null array is returned. If the match succeeds, but there were no parentheses, a list value of (1) is returned.

Examples:

    open(TTY, '/dev/tty');
    <TTY> =~ /^y/i && foo();    # do foo if desired

    if (/Version: *([0-9.]*)/) { $version = $1; }

    next if m#^/usr/spool/uucp#;

    # poor man's grep
    $arg = shift;
    while (<>) {
        print if /$arg/o;       # compile only once
    }

    if (($F1, $F2, $Etc) = ($foo =~ /^(\S+)\s+(\S+)\s*(.*)/))

This last example splits $foo into the first two words and the remainder of the line, and assigns those three fields to $F1, $F2 and $Etc. The conditional is true if any variables were assigned, i.e. if the pattern matched.

The /g modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In a list context, it returns a list of all the substrings matched by all the parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.

In a scalar context, m//g iterates through the string, returning TRUE each time it matches, and FALSE when it eventually runs out of matches. (In other words, it remembers where it left off last time and restarts the search at that point. You can actually find the current match position of a string using the pos() function--see the perlfunc manpage .) If you modify the string in any way, the match position is reset to the beginning. Examples:

    # list context
    ($one,$five,$fifteen) = (`uptime` =~ /(\d+\.\d+)/g);

    # scalar context
    $/ = ""; $* = 1;  # $* deprecated in Perl 5
    while ($paragraph = <>) {
        while ($paragraph =~ /[a-z]['")]*[.!?]+['")]*\s/g) {
            $sentences++;
        }
    }
    print "$sentences\n";
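The two contexts of m//g can be compared on a fixed string, which avoids depending on the output of uptime. The sample text and variable names are assumptions made for this sketch.

```perl
use strict;
use warnings;

my $text = 'load averages: 0.15, 0.20, 1.05';

# List context: all matches are returned at once.
my @loads = $text =~ /(\d+\.\d+)/g;

# Scalar context: one match per evaluation; pos() reports where it ended.
my @positions;
while ($text =~ /(\d+\.\d+)/g) {
    push @positions, pos($text);
}

print "@loads\n";
print "match end offsets: @positions\n";
```

The last offset equals the length of the string, because the final match ends at its end.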

q/STRING/

'STRING'
A single-quoted, literal string. Backslashes are ignored, unless followed by the delimiter or another backslash, in which case the delimiter or backslash is interpolated.

    $foo = q!I said, "You said, 'She said it.'"!;
    $bar = q('This is it.');

qq/STRING/

"STRING"
A double-quoted, interpolated string.

    $_ .= qq
     (*** The previous line contains the naughty word "$1".\n)
                if /(tcl|rexx|python)/;      # :-)

qx/STRING/

`STRING`
A string which is interpolated and then executed as a system command. The collected standard output of the command is returned. In scalar context, it comes back as a single (potentially multi-line) string. In list context, returns a list of lines (however you've defined lines with $/ or $INPUT_RECORD_SEPARATOR ).

$today = qx{ date };

See I/O Operators for more discussion.

qw/STRING/
Returns a list of the words extracted out of STRING, using embedded whitespace as the word delimiters. It is exactly equivalent to

split(' ', q/STRING/);

Some frequently seen examples:

    use POSIX qw( setlocale localeconv )
    @EXPORT = qw( foo bar baz );

s/PATTERN/REPLACEMENT/egimosx
Searches a string for a pattern, and if found, replaces that pattern with the replacement text and returns the number of substitutions made. Otherwise it returns false (0).

If no string is specified via the =~ or !~ operator, the $_ variable is searched and modified. (The string specified with =~ must be a scalar variable, an array element, a hash element, or an assignment to one of those, i.e. an lvalue.)

If the delimiter chosen is single quote, no variable interpolation is done on either the PATTERN or the REPLACEMENT. Otherwise, if the PATTERN contains a $ that looks like a variable rather than an end-of-string test, the variable will be interpolated into the pattern at run-time. If you only want the pattern compiled once the first time the variable is interpolated, use the /o option. If the pattern evaluates to a null string, the last successfully executed regular expression is used instead. See the perlre manpage for further explanation on these.

Options are:

    e   Evaluate the right side as an expression.
    g   Replace globally, i.e. all occurrences.
    i   Do case-insensitive pattern matching.
    m   Treat string as multiple lines.
    o   Only compile pattern once.
    s   Treat string as single line.
    x   Use extended regular expressions.

Any non-alphanumeric, non-whitespace delimiter may replace the slashes. If single quotes are used, no interpretation is done on the replacement string (the /e modifier overrides this, however). If backquotes are used, the replacement string is a command to execute whose output will be used as the actual replacement text. If the PATTERN is delimited by bracketing quotes, the REPLACEMENT has its own pair of quotes, which may or may not be bracketing quotes, e.g. s(foo)(bar) or s<foo>/bar/. A /e will cause the replacement portion to be interpreted as a full-fledged Perl expression and eval()ed right then and there. It is, however, syntax checked at compile-time.

Examples:

    s/\bgreen\b/mauve/g;                # don't change wintergreen

    $path =~ s|/usr/bin|/usr/local/bin|;

    s/Login: $foo/Login: $bar/;         # run-time pattern

    ($foo = $bar) =~ s/this/that/;

    $count = ($paragraph =~ s/Mister\b/Mr./g);

    $_ = 'abc123xyz';
    s/\d+/$&*2/e;                       # yields 'abc246xyz'
    s/\d+/sprintf("%5d",$&)/e;          # yields 'abc  246xyz'
    s/\w/$& x 2/eg;                     # yields 'aabbcc  224466xxyyzz'

    s/%(.)/$percent{$1}/g;              # change percent escapes; no /e
    s/%(.)/$percent{$1} || $&/ge;       # expr now, so /e
    s/^=(\w+)/&pod($1)/ge;              # use function call

    # /e's can even nest; this will expand
    # simple embedded variables in $_
    s/(\$\w+)/$1/eeg;

    # Delete C comments.
    $program =~ s {
        /\*     # Match the opening delimiter.
        .*?     # Match a minimal number of characters.
        \*/     # Match the closing delimiter.
    } []gsx;

    s/^\s*(.*?)\s*$/$1/;                # trim white space

    s/([^ ]*) *([^ ]*)/$2 $1/;          # reverse 1st two fields

Note the use of $ instead of \ in the last example. Unlike sed, we only use the \<digit> form in the left hand side. Anywhere else it's $<digit>.

Occasionally, you can't just use a /g to get all the changes to occur. Here are two common cases:

    # put commas in the right places in an integer
    1 while s/(.*\d)(\d\d\d)/$1,$2/g;      # perl4
    1 while s/(\d)(\d\d\d)(?!\d)/$1,$2/g;  # perl5

    # expand tabs to 8-column spacing
    1 while s/\t+/' ' x (length($&)*8 - length($`)%8)/e;
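The sketch below shows the return value of s///, the /e modifier, and the "1 while" idiom. commify() is a hypothetical helper, and its pattern is a variant anchored at the start of the string rather than either of the patterns shown above; it assumes a plain unsigned integer as input.

```perl
use strict;
use warnings;

# Hypothetical helper: insert commas into an unsigned integer.
# Each pass adds one comma; the loop ends when no substitution is made.
sub commify {
    my ($n) = @_;
    1 while $n =~ s/^(\d+)(\d{3})/$1,$2/;
    return $n;
}

# /e evaluates the replacement as an expression.
(my $doubled = 'abc123xyz') =~ s/(\d+)/$1 * 2/e;

# s///g returns the number of substitutions made.
my $text  = 'Mister A met Mister B';
my $count = ($text =~ s/Mister\b/Mr./g);

print commify(1234567), " $doubled $count\n";   # 1,234,567 abc246xyz 2
```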

tr/SEARCHLIST/REPLACEMENTLIST/cds

y/SEARCHLIST/REPLACEMENTLIST/cds
Translates all occurrences of the characters found in the search list with the corresponding character in the replacement list. It returns the number of characters replaced or deleted. If no string is specified via the =~ or !~ operator, the $_ string is translated. (The string specified with =~ must be a scalar variable, an array element, or an assignment to one of those, i.e. an lvalue.) For sed devotees, y is provided as a synonym for tr. If the SEARCHLIST is delimited by bracketing quotes, the REPLACEMENTLIST has its own pair of quotes, which may or may not be bracketing quotes, e.g. tr[A-Z][a-z] or tr(+-*/)/ABCD/.

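Options:

    c   Complement the SEARCHLIST.
    d   Delete found but unreplaced characters.
    s   Squash duplicate replaced characters.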

If the /c modifier is specified, the SEARCHLIST character set is complemented. If the /d modifier is specified, any characters specified by SEARCHLIST not found in REPLACEMENTLIST are deleted. (Note that this is slightly more flexible than the behavior of some tr programs, which delete anything they find in the SEARCHLIST, period.) If the /s modifier is specified, sequences of characters that were translated to the same character are squashed down to a single instance of the character.

If the /d modifier is used, the REPLACEMENTLIST is always interpreted exactly as specified. Otherwise, if the REPLACEMENTLIST is shorter than the SEARCHLIST, the final character is replicated till it is long enough. If the REPLACEMENTLIST is null, the SEARCHLIST is replicated. This latter is useful for counting characters in a class or for squashing character sequences in a class.

Examples:

    $ARGV[1] =~ tr/A-Z/a-z/;    # canonicalize to lower case

    $cnt = tr/*/*/;             # count the stars in $_

    $cnt = $sky =~ tr/*/*/;     # count the stars in $sky

    $cnt = tr/0-9//;            # count the digits in $_

    tr/a-zA-Z//s;               # bookkeeper -> bokeper

    ($HOST = $host) =~ tr/a-z/A-Z/;

    tr/a-zA-Z/ /cs;             # change non-alphas to single space

    tr [\200-\377]
       [\000-\177];             # delete 8th bit

If multiple translations are given for a character, only the first one is used:

tr/AAA/XYZ/

will translate any A to X.

Note that because the translation table is built at compile time, neither the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote interpolation. That means that if you want to use variables, you must use an eval() :

    eval "tr/$oldlist/$newlist/";
    die $@ if $@;

    eval "tr/$oldlist/$newlist/, 1" or die $@;
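Counting, squashing, and run-time character lists can be exercised together. This is a sketch only: the sample strings are invented, and the eval form assumes $old and $new hold trusted text, since the string is compiled as Perl code.

```perl
use strict;
use warnings;

# An empty REPLACEMENTLIST counts characters without changing the string.
my $sky   = '* . * * .';
my $stars = ($sky =~ tr/*//);

# /s squashes runs of the same translated character.
(my $squeezed = 'bookkeeper') =~ tr/a-zA-Z//s;

# Character lists held in variables need eval, because the translation
# table is built at compile time.
my ($old, $new) = ('a-c', 'A-C');
my $word = 'abcabc';
eval "\$word =~ tr/$old/$new/; 1" or die $@;

print "$stars $squeezed $word\n";   # 3 bokeper ABCABC
```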


I/O Operators

There are several I/O operators you should know about. A string enclosed by backticks (grave accents) first undergoes variable substitution just like a double quoted string. It is then interpreted as a command, and the output of that command is the value of the pseudo-literal, like in a shell. In a scalar context, a single string consisting of all the output is returned. In a list context, a list of values is returned, one for each line of output. (You can set $/ to use a different line terminator.) The command is executed each time the pseudo-literal is evaluated. The status value of the command is returned in $? (see the perlvar manpage for the interpretation of $? ). Unlike in csh, no translation is done on the return data--newlines remain newlines. Unlike in any of the shells, single quotes do not hide variable names in the command from interpretation. To pass a $ through to the shell you need to hide it with a backslash. The generalized form of backticks is qx// . (Because backticks always undergo shell expansion as well, see the perlsec manpage for security concerns.)

Evaluating a filehandle in angle brackets yields the next line from that file (newline included, so it's never false until end of file, at which time an undefined value is returned). Ordinarily you must assign that value to a variable, but there is one situation where an automatic assignment happens. If and ONLY if the input symbol is the only thing inside the conditional of a while loop, the value is automatically assigned to the variable $_ . The assigned value is then tested to see if it is defined. (This may seem like an odd thing to you, but you'll use the construct in almost every Perl script you write.) Anyway, the following lines are equivalent to each other:

    while (defined($_ = <STDIN>)) { print; }
    while (<STDIN>) { print; }
    for (;<STDIN>;) { print; }
    print while defined($_ = <STDIN>);
    print while <STDIN>;

The filehandles STDIN, STDOUT and STDERR are predefined. (The filehandles stdin, stdout and stderr will also work except in packages, where they would be interpreted as local identifiers rather than global.) Additional filehandles may be created with the open() function. See open for details on this.

If a <FILEHANDLE> is used in a context that is looking for a list, a list consisting of all the input lines is returned, one line per list element. It's easy to make a LARGE data space this way, so use with care.
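Scalar and list reads can be compared without touching STDIN. This sketch assumes a Perl recent enough to support lexical filehandles and opening a reference to a scalar as an in-memory file; neither feature is described in this document, and the data is made up.

```perl
use strict;
use warnings;

my $data = "first\nsecond\nthird\n";

# Scalar context inside while: one line per iteration, assigned to $_.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @seen;
while (<$fh>) {
    chomp;
    push @seen, $_;
}
close $fh;

# List context: every remaining line is returned at once.
open $fh, '<', \$data or die "Cannot reopen in-memory file: $!";
my @all = <$fh>;
close $fh;

print scalar(@seen), " lines via while, ", scalar(@all), " via list context\n";
```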

The null filehandle <> is special and can be used to emulate the behavior of sed and awk. Input from <> comes either from standard input, or from each file listed on the command line. Here's how it works: the first time <> is evaluated, the @ARGV array is checked, and if it is null, $ARGV[0] is set to ``-'', which when opened gives you standard input. The @ARGV array is then processed as a list of filenames. The loop

    while (<>) {
        ...                     # code for each line
    }

is equivalent to the following Perl-like pseudo code:

    unshift(@ARGV, '-') if $#ARGV < $[;
    while ($ARGV = shift) {
        open(ARGV, $ARGV);
        while (<ARGV>) {
            ...         # code for each line
        }
    }

except that it isn't so cumbersome to say, and will actually work. It really does shift array @ARGV and put the current filename into variable $ARGV . It also uses filehandle ARGV internally--<> is just a synonym for <ARGV>, which is magical. (The pseudo code above doesn't work because it treats <ARGV> as non-magical.)

You can modify @ARGV before the first <> as long as the array ends up containing the list of filenames you really want. Line numbers ( $. ) continue as if the input were one big happy file. (But see example under eof() for how to reset line numbers on each file.)

If you want to set @ARGV to your own list of files, go right ahead. If you want to pass switches into your script, you can use one of the Getopts modules or put a loop on the front like this:

    while ($_ = $ARGV[0], /^-/) {
        shift;
        last if /^--$/;
        if (/^-D(.*)/) { $debug = $1 }
        if (/^-v/)     { $verbose++  }
        ...             # other switches
    }
    while (<>) {
        ...             # code for each line
    }

The <> symbol will return FALSE only once. If you call it again after this it will assume you are processing another @ARGV list, and if you haven't set @ARGV , will input from STDIN.

If the string inside the angle brackets is a reference to a scalar variable (e.g. <$foo>), then that variable contains the name of the filehandle to input from, or a reference to the same. For example:

$fh = \*STDIN; $line = <$fh>;

If the string inside angle brackets is not a filehandle or a scalar variable containing a filehandle name or reference, then it is interpreted as a filename pattern to be globbed, and either a list of filenames or the next filename in the list is returned, depending on context. One level of $ interpretation is done first, but you can't say <$foo> because that's an indirect filehandle as explained in the previous paragraph. (In older versions of Perl, programmers would insert curly brackets to force interpretation as a filename glob: <${foo}>. These days, it's considered cleaner to call the internal function directly as glob($foo) , which is probably the right way to have done it in the first place.) Example:

    while (<*.c>) {
        chmod 0644, $_;
    }

is equivalent to

    open(FOO, "echo *.c | tr -s ' \t\r\f' '\\012\\012\\012\\012'|");
    while (<FOO>) {
        chop;
        chmod 0644, $_;
    }

In fact, it's currently implemented that way. (Which means it will not work on filenames with spaces in them unless you have csh(1) on your machine.) Of course, the shortest way to do the above is:

chmod 0644, <*.c>;

Because globbing invokes a shell, it's often faster to call readdir() yourself and just do your own grep() on the filenames. Furthermore, due to its current implementation of using a shell, the glob() routine may get "Arg list too long" errors (unless you've installed tcsh(1L) as /bin/csh).

A glob only evaluates its (embedded) argument when it is starting a new list. All values must be read before it will start over. In a list context this isn't important, because you automatically get them all anyway. In a scalar context, however, the operator returns the next value each time it is called, or a FALSE value if you've just run out. Again, FALSE is returned only once. So if you're expecting a single value from a glob, it is much better to say

($file) = <blurch*>;

than

$file = <blurch*>;

because the latter will alternate between returning a filename and returning FALSE.

If you're trying to do variable interpolation, it's definitely better to use the glob() function, because the older notation can cause people to become confused with the indirect filehandle notation.

    @files = glob("$dir/*.[ch]");
    @files = glob($files[$i]);
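A self-contained sketch of glob() with an interpolated directory follows. It uses the File::Temp module to create a scratch directory, which is an assumption of this example and not something this document covers; the file names are made up, and the temporary path is assumed to contain no whitespace.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory with three empty files; the names are made up.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.c b.c notes.txt)) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $name: $!";
    close $fh;
}

my @c_files = glob("$dir/*.c");   # list context: every match at once
my ($first) = glob("$dir/*.c");   # list assignment: take the first match

print scalar(@c_files), " C files, first is $first\n";
```

The list assignment in the last call is the safe way to ask for a single filename, for the reason given above about scalar context.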


Constant Folding

Like C, Perl does a certain amount of expression evaluation at compile time, whenever it determines that all of the arguments to an operator are static and have no side effects. In particular, string concatenation happens at compile time between literals that don't do variable substitution. Backslash interpretation also happens at compile time. You can say

    'Now is the time for all' . "\n" .
    'good men to come to.'

and this all reduces to one string internally. Likewise, if you say

    foreach $file (@filenames) {
        if (-s $file > 5 + 100 * 2**16) { ... }
    }

the compiler will pre-compute the number that expression represents so that the interpreter won't have to.


Integer arithmetic

By default Perl assumes that it must do most of its arithmetic in floating point. But by saying

use integer;

you may tell the compiler that it's okay to use integer operations from here to the end of the enclosing BLOCK. An inner BLOCK may countermand this by saying

no integer;

which lasts until the end of that BLOCK.
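The block scoping of the pragma can be seen in a few lines; the variable names are illustrative.

```perl
use strict;
use warnings;

my $float = 10 / 3;     # floating point by default

my $whole;
{
    use integer;        # integer operations until the end of this block
    $whole = 10 / 3;
}

my $after = 7 / 2;      # back to floating point outside the block

printf "%.3f %d %.1f\n", $float, $whole, $after;   # 3.333 3 3.5
```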