
4.2. Data Types and Variables

Perl has three basic data types: scalars, arrays, and hashes.

Scalars are essentially simple variables. They are preceded by a dollar sign ($). A scalar is either a number, a string, or a reference. (A reference is a scalar that points to another piece of data. References are discussed later in this chapter.) If you provide a string in which a number is expected or vice versa, Perl automatically converts the operand using fairly intuitive rules.
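
As a minimal sketch of the automatic conversion described above (the variable names are illustrative only), a numeric string can take part in arithmetic and a number can take part in string concatenation:

```perl
use strict;
use warnings;

# A numeric string used in arithmetic is converted to a number.
my $count = "42";
my $sum   = $count + 8;          # 50

# A number used with the concatenation operator (.) is converted to a string.
my $n     = 3;
my $label = $n . " apples";      # "3 apples"

print "$sum\n";
print "$label\n";
```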

Arrays are ordered lists of scalars accessed with a numeric subscript (subscripts start at 0). They are preceded by an "at" sign (@).

Hashes are unordered sets of key/value pairs accessed with the keys as subscripts. They are preceded by a percent sign (%).

4.2.1. Numbers

Perl stores numbers internally as either signed integers or double-precision, floating-point values. Numeric literals are specified by any of the following floating-point or integer formats:

12345
Integer

-54321
Negative integer

12345.67
Floating point

6.02E23
Scientific notation

0xffff
Hexadecimal

0377
Octal

4_294_967_296
Underscore for legibility

Since Perl uses the comma as a list separator, you cannot use a comma for improving the legibility of a large number. To improve legibility, Perl allows you to use an underscore character instead. The underscore works only within literal numbers specified in your program, not in strings functioning as numbers or in data read from somewhere else. Similarly, the leading 0x for hex and 0 for octal work only for literals. The automatic conversion of a string to a number does not recognize these prefixes—you must do an explicit conversion.
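
The following sketch illustrates the difference between literals and strings, assuming the built-in hex and oct functions are the explicit conversion you want; the variable names are made up for the example:

```perl
use strict;
use warnings;

# In literals, underscores are ignored and the 0x and 0 prefixes are honored.
my $big   = 4_294_967_296;
my $hex   = 0xffff;              # 65535
my $octal = 0377;                # 255

# Strings are not parsed like literals, so convert them explicitly.
my $from_hex = hex("0xffff");    # 65535
my $from_oct = oct("0377");      # 255
my $also_hex = oct("0xffff");    # oct() also understands a leading 0x

print "$big $hex $octal $from_hex $from_oct $also_hex\n";
```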

Be aware that Perl 5.8 changed how integers and floating-point numbers are handled in many ways. Regardless of how your system handles numbers and conversions between strings and numbers, Perl 5.8 works around system deficiencies to provide more accurate number handling. Furthermore, whereas versions prior to 5.8 used floating-point numbers exclusively in math operations, Perl 5.8 uses and stores integers in numeric conversions and in arithmetic operations.

4.2.2. String Interpolation

Strings are sequences of characters. String literals are usually delimited by either single (') or double (") quotes. Double-quoted string literals are subject to backslash and variable interpolation, and single-quoted strings are not (except for \' and \\, used to put single quotes and backslashes into single-quoted strings). You can embed newlines directly in your strings.
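
A short sketch of the difference between the two quote styles (the variable names are hypothetical):

```perl
use strict;
use warnings;

my $name = "Larry";

my $double = "Hello, $name\n";          # $name is interpolated; \n is a newline
my $single = 'Hello, $name\n';          # kept exactly as typed
my $quoted = 'It\'s a backslash: \\';   # \' and \\ are the only escapes here

print $double;
print $single, "\n";
print $quoted, "\n";
```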

Table 4-1 lists all the backslashed or escape characters that can be used in double-quoted strings.

Table 4-1. Double-quoted string representations

Code    Meaning
\n      Newline
\r      Carriage return
\t      Horizontal tab
\f      Form feed
\b      Backspace
\a      Alert (bell)
\e      ESC character
\033    ESC in octal
\x7f    DEL in hexadecimal
\cC     Ctrl-C
\\      Backslash
\"      Double quote
\u      Force next character to uppercase
\l      Force next character to lowercase
\U      Force all following characters to uppercase
\L      Force all following characters to lowercase
\Q      Backslash all following non-alphanumeric characters
\E      End \U, \L, or \Q
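
To make the case-changing and quoting escapes concrete, here is a small sketch; the strings are arbitrary examples:

```perl
use strict;
use warnings;

my $word = "perl";

my $first_upper = "\u$word";            # "Perl"
my $all_upper   = "\U$word\E is fun";   # "PERL is fun"
my $first_lower = "\lPERL";             # "pERL"
my $escaped     = "\Qa.b*c\E";          # each non-alphanumeric gets a backslash
my $tabbed      = "one\ttwo";           # a tab character between the words

print "$first_upper\n$all_upper\n$first_lower\n$escaped\n$tabbed\n";
```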

Table 4-2 lists alternative quoting schemes that can be used in Perl. These are useful in diminishing the number of commas and quotes you may have to type, and they allow you not to worry about escaping characters such as backslashes when there are many instances in your data. The generic forms allow you to use any non-alphanumeric, non-whitespace characters as delimiters in place of the slash (/). If the delimiters are single quotes, no variable interpolation is done on the pattern. Parentheses, brackets, braces, and angle brackets can be used as delimiters in their standard opening and closing pairs.

Table 4-2. Quoting syntax in Perl

Customary   Generic   Meaning         Interpolation
''          q//       Literal         No
""          qq//      Literal         Yes
``          qx//      Command         Yes
( )         qw//      Word list       No
(none)      qr//      Pattern         Yes
//          m//       Pattern match   Yes
s///        s///      Substitution    Yes
y///        tr///     Translation     No
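
The sketch below exercises several of the generic forms with braces, parentheses, and slashes as delimiters. The variable names are illustrative, and qx// is left out because its result depends on the commands available on your system:

```perl
use strict;
use warnings;

my $user = "camel";

my $literal = q{Don't expand $user};      # behaves like single quotes
my $interp  = qq{The user is "$user"};    # behaves like double quotes
my @words   = qw(snap crackle pop);       # list of three words
my $pattern = qr/^cam/;                   # compiled pattern

my $matches = ($user =~ $pattern) ? "yes" : "no";

# Translation: copy $user, then uppercase the copy in place.
(my $shouted = $user) =~ tr/a-z/A-Z/;

print "$literal\n$interp\n@words\n$matches\n$shouted\n";
```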

4.2.3. Here Documents

A line-oriented form of quoting is based on the Unix shell "here-document" syntax. Following a <<, you specify a string to terminate the quoted material, and all lines following the current line down to the terminating string are the value of the item. This is of particular importance if you're trying to print something like HTML that would be cleaner to print as a chunk instead of as individual lines. For example:

#!/usr/local/bin/perl -w

my $Price = 'right';
    
print <<"EOF";
The price is $Price.
EOF

The terminating string does not have to be quoted. For example, the previous example could have been written as:

#!/usr/local/bin/perl -w

my $Price = 'right';
    
print <<EOF;
The price is $Price.
EOF

You can assign here documents to a string:

my $assign_this_heredoc = <<"EOS";
This string is assigned to \$assign_this_heredoc.
EOS

You can use a here document to execute commands:

#!/usr/local/bin/perl -w

print <<`CMD`;
ls -l
CMD

You can stack here documents:

#!/usr/local/bin/perl -w

print <<"joe", <<"momma"; # You can stack them
I said foo.
joe
I said bar.
momma

One caveat about here documents: you may have noticed in each of these examples that the quoted text is always left-justified. That's because any whitespace used for indentation will be included in the string. For example:

#!/usr/local/bin/perl -w

print <<"    INDENTED";
    Same old, same old.
    INDENTED

Although you can keep the terminating tag indented by including the whitespace in the tag itself (as we did here), the indentation of the text is still embedded in the string. In this case, the value is "    Same old, same old." with its four leading spaces, followed by a newline.
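
One possible workaround, shown here only as a sketch, is to capture the here document in a variable and strip the leading spaces and tabs from each line afterward. The variable names are hypothetical:

```perl
use strict;
use warnings;

my $text = <<"    INDENTED";
    Same old, same old.
    INDENTED

print "[$text]";       # the four leading spaces are part of the string

# Copy the text, then remove leading spaces and tabs from every line.
(my $trimmed = $text) =~ s/^[ \t]+//mg;
print "[$trimmed]";
```

Perl 5.26 and later also accept the <<~ form of here document, which removes the common indentation for you.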

4.2.4. Lists

A list is an ordered group of scalar values. A literal list can be composed as a comma-separated list of values contained in parentheses, for example:

(1,2,3)                  # List of three values 1, 2, and 3
("one","two","three")    # List of three values "one", "two", and "three"

The generic form of list creation uses the quoting operator qw// to contain a list of values separated by whitespace:

qw/snap crackle pop/

The quoting operators are not limited to // as delimiters. You can use just about any non-alphanumeric, non-whitespace character you want. The following is exactly the same as the example above:

qw!snap crackle pop!

Remember that whitespace is the only separator qw// recognizes between list members. Any other separator you add, such as a comma, becomes part of the members themselves:

@foods = qw/fish, beef, lettuce, cat, apple/; # EL WRONG-O!
foreach (@foods) {
    print $_; # Prints fish and then a literal comma, etc.
}
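
For contrast, a corrected version of the same list, sketched with strict and warnings enabled, separates the words with whitespace only:

```perl
use strict;
use warnings;

my @foods = qw/fish beef lettuce cat apple/;   # whitespace only, no commas

foreach my $food (@foods) {
    print "$food\n";    # prints each word on its own line
}
```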

4.2.5. Variables

A variable always begins with the character that identifies its type: $, @, or %. Most of the variable names you create can begin with a letter or underscore, followed by any combination of letters, digits, or underscores, up to roughly 250 characters in length. Upper- and lowercase letters are distinct. Variable names that begin with a digit can contain only digits, and variable names that begin with a character other than an alphanumeric or underscore can contain only that character. The latter forms are usually predefined variables in Perl, so it is best to name your variables beginning with a letter or underscore.

Variables have the undef value before they are first assigned or when they become "empty." For scalar variables, undef evaluates to 0 when used as a number, and a zero-length, empty string ("") when used as a string.
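
The behavior of undef can be checked with a sketch like the following. It assumes you run with warnings enabled, which is why the conversions are wrapped in a block that silences the "uninitialized" warning:

```perl
use strict;
use warnings;

my $unset;    # declared but never assigned, so it holds undef

my ($as_number, $as_string);
{
    # Using undef in an expression warns under "use warnings";
    # the warning is silenced here only to show the resulting values.
    no warnings 'uninitialized';
    $as_number = $unset + 0;            # 0
    $as_string = "[" . $unset . "]";    # "[]"
}

print defined($unset) ? "defined\n" : "undef\n";
print "$as_number $as_string\n";
```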

Simple variable assignment uses the assignment operator (=) with the appropriate data. For example:

$age = 26;                # Assigns 26 to $age
@date = (8, 24, 70);      # Assigns the three-element list to @date
# Assigns the list elements to %fruit in key/value pairs
%fruit = ('apples', 3, 'oranges', 6);

Scalar variables are always named with an initial $, even when referring to a scalar value that is part of an array or hash.

Every variable type has its own namespace. You can, without fear of conflict, use the same name for a scalar variable, an array, or a hash (or, for that matter, a filehandle, a subroutine name, or a label). This means that $foo and @foo are two different variables. It also means that $foo[1] is an element of @foo, not a part of $foo.
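
A brief sketch of the separate namespaces, using the name foo for all three variable types:

```perl
use strict;
use warnings;

my $foo = "scalar";
my @foo = ("zero", "one");
my %foo = (key => "value");

print "$foo\n";         # the scalar $foo
print "$foo[1]\n";      # element 1 of the array @foo
print "$foo{key}\n";    # the value stored under "key" in the hash %foo
```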

4.2.5.1. Arrays

An array is a variable that stores an ordered list of scalar values. Arrays are preceded by an "at" sign (@).

@numbers = (1,2,3);        # Set the array @numbers to (1,2,3)

To refer to a single element of an array, use the dollar sign ($) with the variable name (it's a scalar), followed by the index of the element in square brackets (the subscript operator). Array elements are numbered starting at 0. Negative indexes count backwards from the last element in the list (i.e., -1 refers to the last element of the list). For example, in this list:

@date = (8, 24, 70);

$date[2] is the value of the third element, 70.
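
Continuing with the same @date list, this sketch shows positive and negative subscripts as well as assignment to a single element (the extra variable names are illustrative):

```perl
use strict;
use warnings;

my @date = (8, 24, 70);

my $first = $date[0];     # 8
my $third = $date[2];     # 70
my $last  = $date[-1];    # 70, counting back from the end

$date[1] = 25;            # elements are assigned the same way

print "$first $third $last @date\n";
```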

4.2.5.2. Hashes

A hash is a set of key/value pairs. Hashes are preceded by a percent sign (%). To refer to a single element of a hash, you use the hash name preceded by a dollar sign ($), because the element is a scalar, followed by the key associated with the value in braces. For example, the hash:

%fruit = ('apples', 3, 'oranges', 6);

has two key/value pairs. If you want to get the value associated with the key apples, you use $fruit{'apples'}.

It is often more readable to use the => operator in defining key/value pairs. The => operator is similar to a comma, but it's more visually distinctive and quotes any bare identifiers to the left of it:

%fruit = (
    apples  => 3,
    oranges => 6
);
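
As a sketch of how such a hash might be used, the following reads, adds, and lists elements; the pears entry is invented for the example:

```perl
use strict;
use warnings;

my %fruit = (
    apples  => 3,
    oranges => 6,
);

my $apples  = $fruit{'apples'};    # 3
my $oranges = $fruit{oranges};     # a bare identifier in braces is quoted too

$fruit{pears} = 2;                 # adds a new key/value pair

# Hashes are unordered, so sort the keys for predictable output.
foreach my $name (sort keys %fruit) {
    print "$name: $fruit{$name}\n";
}
```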

4.2.6. Scalar and List Contexts

Every operation that you invoke in a Perl script is evaluated in a specific context, and how that operation behaves may depend on the context it is being called in. There are two major contexts: scalar and list. All operators know which context they are in, and some return lists in contexts wanting a list and scalars in contexts wanting a scalar. For example, the localtime function returns a nine-element list in list context:

($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime( );

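But in a scalar context, localtime returns a single formatted date-and-time string, such as Thu Oct 13 04:54:34 1994 (the time function is what returns the number of seconds since January 1, 1970):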

$now = localtime( );
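
The two contexts can be compared side by side with a sketch like this one. The call to time is included only for contrast, and the variable names are illustrative:

```perl
use strict;
use warnings;

my @parts = localtime();    # list context: nine elements
my $stamp = localtime();    # scalar context: a formatted date string
my $epoch = time();         # seconds since January 1, 1970

print scalar(@parts), "\n";
print "$stamp\n";
print "$epoch\n";
```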

Statements that look confusing are easy to evaluate by identifying the proper context. For example, assigning what is commonly a list literal to a scalar variable:

$a = (2, 4, 6, 8);

gives $a the value 8. The context forces the right side to evaluate to a scalar, and the action of the comma operator in the expression (in the scalar context) returns the value farthest to the right.

Another type of statement that might be confusing is the evaluation of an array or hash variable as a scalar. For example:

$b = @c;

When an array variable is evaluated as a scalar, the number of elements in the array is returned. This type of evaluation is useful for finding the number of elements in an array. The special $#array form of an array value returns the index of the last member of the list (one less than the number of elements).

If necessary, you can force a scalar context in the middle of a list by using the scalar function.
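
A minimal sketch of these array idioms, using a made-up four-element array:

```perl
use strict;
use warnings;

my @c = ("a", "b", "c", "d");

my $count = @c;      # scalar context: the number of elements, 4
my $last  = $#c;     # index of the last element, 3

# scalar() forces scalar context inside a list, so the count is stored
# instead of the array's elements being flattened into the list.
my @mixed = ("size", scalar(@c));

print "$count $last @mixed\n";
```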

4.2.7. Declarations and Scope

In Perl, only subroutines and formats require explicit declaration. Variables (and similar constructs) are automatically created when they are first assigned.

Variable declaration comes into play when you need to limit the scope of a variable's use. You can do this in two ways:

Dynamic scoping
Creates temporary objects within a scope. Dynamically scoped constructs are visible globally, but take action only within their defined scopes. Dynamic scoping applies to variables declared with local.

Lexical scoping
Creates private constructs that are visible only within their scopes. The most frequently seen form of lexically scoped declaration is the declaration of my variables.

Therefore, we can say that a local variable is dynamically scoped, whereas a my variable is lexically scoped. Dynamically scoped variables are visible to functions called from within the block in which they are declared. Lexically scoped variables, on the other hand, are totally hidden from the outside world, including any called subroutines, unless those subroutines are defined within the same scope. See Section 4.7, "Subroutines," later in this chapter for further discussion.
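
The difference can be seen in a small sketch. The subroutine and variable names are hypothetical, and the global is declared with our so that the example runs under strict:

```perl
use strict;
use warnings;

our $level = "global";    # package variable, so it can be localized

# Reads whatever value the package variable $level currently holds.
sub show_level { return $level }

sub with_local {
    local $level = "local";      # temporary value, visible to called subs
    return show_level();
}

sub with_my {
    my $level = "lexical";       # private to this block; show_level cannot see it
    return show_level();
}

print with_local(), "\n";    # local
print with_my(), "\n";       # global
print show_level(), "\n";    # global, because the local value was restored
```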




Copyright © 2002 O'Reilly & Associates. All rights reserved.