Chapter 23. Security

Whether you're dealing with a user sitting at the keyboard typing commands or someone sending information across the network, you need to be careful about the data coming into your programs, since the other person may, either maliciously or accidentally, send you data that will do more harm than good. Perl provides a special security-checking mechanism called taint mode, whose purpose is to isolate tainted data so that you won't use it to do something you didn't intend to do. For instance, if you mistakenly trust a tainted filename, you might end up appending an entry to your password file when you thought you were appending to a log file. The mechanism of tainting is covered in the section Section 23.1, "Handling Insecure Data".

In multitasking environments, offstage actions by unseen actors can affect the security of your own program. If you presume exclusive ownership of external objects (especially files) as though yours were the only process on the system, you expose yourself to errors substantially subtler than those that come from directly handling data or code of dubious provenance. Perl helps you out a little here by detecting some situations that are beyond your control, but for those that you can control, the key is understanding which approaches are proof against unseen meddlers. The section Section 23.2, "Handling Timing Glitches" discusses these matters.

If the data you get from a stranger happens to be a bit of source code to execute, you need to be even more careful than you would with their data. Perl provides checks to intercept stealthy code masquerading as data so you don't execute it unintentionally. If you do want to execute foreign code, though, the Safe module lets you quarantine suspect code where it can't do any harm and might possibly do some good. These are the topics of the section Section 23.3, "Handling Insecure Code".

23.1. Handling Insecure Data

Perl makes it easy to program securely even when your program is being used by someone less trustworthy than the program itself. That is, some programs need to grant limited privileges to their users, without giving away other privileges. Setuid and setgid programs fall into this category on Unix, as do programs running in various privileged modes on other operating systems that support such notions. Even on systems that don't, the same principle applies to network servers and to any programs run by those servers (such as CGI scripts, mailing list processors, and daemons listed in /etc/inetd.conf). All such programs require a higher level of scrutiny than normal.

Even programs run from the command line are sometimes good candidates for taint mode, especially if they're meant to be run by a privileged user. Programs that act upon untrusted data, like those that generate statistics from log files or use LWP::* or Net::* to fetch remote data, should probably run with tainting explicitly turned on; programs that are not prudent risk being turned into "Trojan horses". Since programs don't get any kind of thrill out of risk taking, there's no particular reason for them not to be careful.

Compared with Unix command-line shells, which are really just frameworks for calling other programs, Perl is easy to program securely because it's straightforward and self-contained. Unlike most shell programming languages, which are based on multiple, mysterious substitution passes on each line of the script, Perl uses a more conventional evaluation scheme with fewer hidden snags. Additionally, because the language has more built-in functionality, it can rely less upon external (and possibly untrustworthy) programs to accomplish its purposes.

Under Unix, Perl's home town, the preferred way to compromise system security was to cajole a privileged program into doing something it wasn't supposed to do. To stave off such attacks, Perl developed a unique approach for coping with hostile environments. Perl automatically enables taint mode whenever it detects its program running with differing real and effective user or group IDs.[1] Even if the file containing your Perl script doesn't have the setuid or setgid bits turned on, that script can still find itself executing in taint mode. This happens if your script was invoked by another program that was itself running under differing IDs. Perl programs that weren't designed to operate under taint mode tend to expire prematurely when caught violating safe tainting policy. This is just as well, since these are the sorts of shenanigans that were historically perpetrated on shell scripts to compromise system security. Perl isn't that gullible.

[1] The setuid bit in Unix permissions is mode 04000, and the setgid bit is 02000; either or both may be set to grant the user of the program some of the privileges of the owner (or owners) of the program. (These are collectively known as set-id programs.) Other operating systems may confer special privileges on programs in other ways, but the principle is the same.

You can also enable taint mode explicitly with the -T command-line switch. You should do this for daemons, servers, and any programs that run on behalf of someone else, such as CGI scripts. Programs that can be run remotely and anonymously by anyone on the Net are executing in the most hostile of environments. You should not be afraid to say "No!" occasionally. Contrary to popular belief, you can exercise a great deal of prudence without dehydrating into a wrinkled prude.

On the more security-conscious sites, running all CGI scripts under the -T flag isn't just a good a idea: it's the law. We're not claiming that running in taint mode is sufficient to make your script secure. It's not, and it would take a whole book just to mention everything that would. But if you aren't executing your CGI scripts under taint mode, you've needlessly abandoned the strongest protection Perl can give you.

While in taint mode, Perl takes special precautions called taint checks to prevent traps both obvious and subtle. Some of these checks are reasonably simple, such as verifying that dangerous environment variables aren't set and that directories in your path aren't writable by others; careful programmers have always used checks like these. Other checks, however, are best supported by the language itself, and it is these checks especially that contribute to making a privileged Perl program more secure than the corresponding C program, or a Perl CGI script more secure than one written in any language without taint checks. (Which, as far as we know, is any language other than Perl.)

The principle is simple: you may not use data derived from outside your program to affect something else outside your program--at least, not by accident. Anything that comes from outside your program is marked as tainted, including all command-line arguments, environment variables, and file input. Tainted data may not be used directly or indirectly in any operation that invokes a subshell, nor in any operation that modifies files, directories, or processes. Any variable set within an expression that has previously referenced a tainted value becomes tainted itself, even if it is logically impossible for the tainted value to influence the variable. Because taintedness is associated with each scalar, some individual values in an array or hash might be tainted and others might not. (Only the values in a hash can be tainted, though, not the keys.)

The following code illustrates how tainting would work if you executed all these statements in order. Statements marked "Insecure" will trigger an exception, whereas those that are "OK" will not.

$arg = shift(@ARGV);          # $arg is now tainted (due to @ARGV).
$hid = "$arg, 'bar'";         # $hid also tainted (due to $arg).
$line = <>;                   # Tainted (reading from external file).
$path = $ENV{PATH};           # Tainted due to %ENV, but see below.
$mine = 'abc';                # Not tainted.

system "echo $mine";          # Insecure until PATH set.
system "echo $arg";           # Insecure: uses sh with tainted $arg.
system "echo", $arg;          # OK once PATH set (doesn't use sh).
system "echo $hid";           # Insecure two ways: taint, PATH.

$oldpath = $ENV{PATH};        # $oldpath is tainted (due to $ENV).
$ENV{PATH} = '/bin:/usr/bin'; # (Makes it OK to execute other programs.)
$newpath = $ENV{PATH};        # $newpath is NOT tainted.

delete @ENV{qw{IFS
               CDPATH
               ENV
               BASH_ENV}};    # Makes %ENV safer.

system "echo $mine";          # OK, is secure once path is reset.
system "echo $hid";           # Insecure via tainted $hid.

open(OOF, "< $arg");          # OK (read-only opens not checked).
open(OOF, "> $arg");          # Insecure (trying to write to tainted arg).

open(OOF, "echo $arg|")       # Insecure due to tainted $arg, but...
    or die "can't pipe from echo: $!";

open(OOF,"-|")                # Considered OK: see below for taint
    or exec "echo", $arg      #   exemption on exec'ing a list.
    or die "can't exec echo: $!";

open(OOF,"-|", "echo", $arg   # Same as previous, likewise OKish.
    or die "can't pipe from echo: $!";

$shout = `echo $arg`;         # Insecure via tainted $arg.
$shout = `echo abc`;          # $shout is tainted due to backticks.
$shout2 = `echo $shout`;      # Insecure via tainted $shout.

unlink $mine, $arg;           # Insecure via tainted $arg.
umask $arg;                   # Insecure via tainted $arg.

exec "echo $arg";             # Insecure via tainted $arg passed to shell.
exec "echo", $arg;            # Considered OK! (But see below.)
exec "sh", '-c', $arg;        # Considered OK, but isn't really!

If you try to do something insecure, you get an exception (which unless trapped, becomes a fatal error) such as "Insecure dependency" or "Insecure $ENV{PATH}". See the section Section 23.1.2, "Cleaning Up Your Environment" later.

If you pass a LIST to a system, exec, or pipe open, the arguments are not inspected for taintedness, because with a LIST of arguments, Perl doesn't need to invoke the potentially dangerous shell to run the command. You can still easily write an insecure system, exec, or pipe open using the LIST form, as demonstrated in the final example above. These forms are exempt from checking because you are presumed to know what you're doing when you use them.

Sometimes, though, you can't tell how many arguments you're passing. If you supply these functions with an array[2] that contains just one element, then it's just as though you passed one string in the first place, so the shell might be used. The solution is to pass an explicit path in the indirect-object slot:

system @args;                 # Won't call the shell unless @args == 1.
system { $args[0] } @args;    # Bypasses shell even with one-argument list.

[2]Or a function that produces a list.

23.1.1. Detecting and Laundering Tainted Data

To test whether a scalar variable contains tainted data, you can use the following is_tainted function. It makes use of the fact that evalSTRING raises an exception if you try to compile tainted data. It doesn't matter that the $nada variable used in the expression to compile will always be empty; it will still be tainted if $arg is tainted. The outer evalBLOCK isn't doing any compilation. It's just there to catch the exception raised if the inner eval is given tainted data. Since the $@ variable is guaranteed to be nonempty after each eval if an exception was raised and empty otherwise, we return the result of testing whether its length was zero:

sub is_tainted {
    my $arg = shift;
    my $nada = substr($arg, 0, 0);  # zero-length
    local $@;  # preserve caller's version
    eval { eval "# $nada" };
    return length($@) != 0;
}

But testing for taintedness only gets you so far. Usually you know perfectly well which variables contain tainted data--you just have to clear the data's taintedness. The only official way to bypass the tainting mechanism is by referencing submatches returned by an earlier regular expression match.[3] When you write a pattern that contains capturing parentheses, you can access the captured substrings through match variables like $1, $2, and $+, or by evaluating the pattern in list context. Either way, the presumption is that you knew what you were doing when you wrote the pattern and wrote it to weed out anything dangerous. So you need to give it some real thought--never blindly untaint, or else you defeat the entire mechanism.

[3]An unofficial way is by storing the tainted string as the key to a hash and fetching back that key. Because keys aren't really full SVs (internal name scalar values), they don't carry the taint property. This behavior may be changed someday, so don't rely on it. Be careful when handling keys, lest you unintentionally untaint your data and do something unsafe with them.

It's better to verify that the variable contains only good characters than to check whether it contains any bad characters. That's because it's far too easy to miss bad characters that you never thought of. For example, here's a test to make sure $string contains nothing but "word" characters (alphabetics, numerics, and underscores), hyphens, at signs, and dots:

if ($string =~ /^([-\@\w.]+)$/) {
    $string = $1;                     # $string now untainted.
}
else {
    die "Bad data in $string";        # Log this somewhere.
}

This renders $string fairly secure to use later in an external command, since /\w+/ doesn't normally match shell metacharacters, nor are those other characters going to mean anything special to the shell.[4] Had we used /(.+)/s instead, it would have been unsafe because that pattern lets everything through. But Perl doesn't check for that. When untainting, be exceedingly careful with your patterns. Laundering data by using regular expressions is the only approved internal mechanism for untainting dirty data. And sometimes it's the wrong approach entirely. If you're in taint mode because you're running set-id and not because you intentionally turned on -T, you can reduce your risk by forking a child of lesser privilege; see the section Section 23.1.2, "Cleaning Up Your Environment".

[4] Unless you were using an intentionally broken locale. Perl assumes that your system's locale definitions are potentially compromised. Hence, when running under the use locale pragma, patterns with a symbolic character class in them, such as \w or [[:alpha:]], produce tainted results.

The use re 'taint' pragma disables the implicit untainting of any pattern matches through the end of the current lexical scope. You might use this pragma if you just want to extract a few substrings from some potentially tainted data, but since you aren't being mindful of security, you'd prefer to leave the substrings tainted to guard against unfortunate accidents later.

Imagine you're matching something like this, where $fullpath is tainted:

($dir, $file) = $fullpath =~ m!(.*/)(.*)!s;

By default, $dir and $file would now be untainted. But you probably didn't want to do that so cavalierly, because you never really thought about the security issues. For example, you might not be terribly happy if $file contained the string "; rm -rf * ;", just to name one rather egregious example. The following code leaves the two result variables tainted if $fullpath was tainted:

{
    use re 'taint';
    ($dir, $file) = $fullpath =~ m!(.*/)(.*)!s;
}

A good strategy is to leave submatches tainted by default over the whole source file and only selectively permit untainting in nested scopes as needed:

use re 'taint';
# remainder of file now leaves $1 etc tainted
{
    no re 'taint';
    # this block now untaints re matches
    if ($num =~ /^(\d+)$/) {
        $num = $1;
    }
}

Input from a filehandle or a directory handle is automatically tainted, except when it comes from the special filehandle, DATA. If you want to, you can mark other handles as trusted sources via the IO::Handle module's untaint function:

use IO::Handle;

IO::Handle::untaint(*SOME_FH);          # Either procedurally
SOME_FH->untaint();                     # or using the OO style.

Turning off tainting on an entire filehandle is a risky move. How do you really know it's safe? If you're going to do this, you should at least verify that nobody but the owner can write to the file.[5] If you're on a Unix filesystem (and one that prudently restricts chown(2) to the superuser), the following code works:

use File::stat;
use Symbol 'qualify_to_ref';
sub handle_looks_safe(*) {
    my $fh = qualify_to_ref(shift, caller);
    my $info = stat($fh);
    return unless $info;

    # owner neither superuser nor "me", whose
    # real uid is in the $< variable
    if ($info->uid != 0 && $info->uid != $<) {
        return 0;
    }

    # check whether group or other can write file.
    # use 066 to detect for readability also
    if ($info->mode & 022) {
        return 0;
    }
    return 1;
}

use IO::Handle;
SOME_FH->untaint() if handle_looks_safe(*SOME_FH);

We called stat on the filehandle, not the filename, to avoid a dangerous race condition. See the section Section 23.2.2, "Handling Race Conditions" later in this chapter.

[5] Although you can untaint a directory handle, too, this function only works on a filehandle. That's because given a directory handle, there's no portable way to extract its file descriptor to stat.

Note that this routine is only a good start. A slightly more paranoid version would check all parent directories as well, even though you can't reliably stat a directory handle. But if any parent directory is world-writable, you know you're in trouble whether or not there are race conditions.

Perl has its own notion of which operations are dangerous, but it's still possible to get into trouble with other operations that don't care whether they use tainted values. It's not always enough to be careful of input. Perl output functions don't test whether their arguments are tainted, but in some environments, this matters. If you aren't careful of what you output, you might just end up spitting out strings that have unexpected meanings to whoever is processing the output. If you're running on a terminal, special escape and control codes could cause the viewer's terminal to act strangely. If you're in a web environment and you blindly spit back out data that was given to you, you could unknowingly produce HTML tags that would drastically alter the page's appearance. Worse still, some markup tags can even execute code back on the browser.

Imagine the common case of a guest book where visitors enter their own messages to be displayed when others come calling. A malicious guest could supply unsightly HTML tags or put in <SCRIPT>...</SCRIPT> sequences that execute code (like JavaScript) back in the browsers of subsequent guests.

Just as you should carefully check for only good characters when inspecting tainted data that accesses resources on your own system, you should apply the same care in a web environment when presenting data supplied by a user. For example, to strip the data of any character not in the specified list of good characters, try something like this:

$new_guestbook_entry =~ tr[_a-zA-Z0-9 ,./!?()@+*-][]dc;

You certainly wouldn't use that to clean up a filename, since you probably don't want filenames with spaces or slashes, just for starters. But it's enough to keep your guest book free of sneaky HTML tags and entities. Each data-laundering case is a little bit different, so always spend time deciding what is and what is not permitted. The tainting mechanism is intended to catch stupid mistakes, not to remove the need for thought.

23.1.2. Cleaning Up Your Environment

When you execute another program from within your Perl script, no matter how, Perl checks to make sure your PATH environment variable is secure. Since it came from your environment, your PATH starts out tainted, so if you try to run another program, Perl raises an "Insecure $ENV{PATH}" exception. When you set it to a known, untainted value, Perl makes sure that each directory in that path is nonwritable by anyone other than the directory's owner and group; otherwise, it raises an "Insecure directory" exception.

You may be surprised to find that Perl cares about your PATH even when you specify the full pathname of the command you want to execute. It's true that with an absolute filename, the PATH isn't used to find the executable to run. But there's no reason to trust the program you're running not to turn right around and execute some other program and get into trouble because of the insecure PATH. So Perl forces you to set a secure PATH before you call any program, no matter how you say to call it.

The PATH isn't the only environment variable that can bring grief. Because some shells use the variables IFS, CDPATH, ENV, and BASH_ENV, Perl makes sure that those are all either empty or untainted before it will run another command. Either set these variables to something known to be safe, or else delete them from the environment altogether:

delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};   # Make %ENV safer

Features convenient in a normal environment can become security concerns in a hostile one. Even if you remember to disallow filenames containing newlines, it's important to understand that open accesses more than just named files. Given appropriate ornamentation on the filename argument, one- or two-argument calls to open can also run arbitrary external commands via pipes, fork extra copies of the current process, duplicate file descriptors, and interpret the special filename "-" as an alias for standard input or output. It can also ignore leading and trailing whitespace that might disguise such fancy arguments from your check patterns. While it's true that Perl's taint checking will catch tainted arguments used for pipe opens (unless you use a separated argument list) and any file opens that aren't read-only, the exception this raises is still likely to make your program misbehave.

If you intend to use any externally derived data as part of a filename to open, at least include an explicit mode separated by a space. It's probably safest, though, to use either the low-level sysopen or the three-argument form of open:

# Magic open--could be anything
open(FH, $file)             or die "can't magic open $file: $!";


# Guaranteed to be a read-only file open and not a pipe
# or fork, but still groks file descriptors and "-",
# and ignores whitespace at either end of name.
open(FH, "< $file")         or die "can't open $file: $!";

# WYSIWYG open: disables all convenience features.
open(FH, "<", $file)        or die "can't open $file: $!";

# Same properties as WYSIWYG 3-arg version.
require Fcntl;
sysopen(FH, $file, O_RDONLY)        or die "can't sysopen $file: $!";

Even these steps aren't quite good enough. Perl doesn't prevent you from opening tainted filenames for reading, so you need to be careful of what you show people. A program that opens an arbitrary, user-supplied filename for reading and then reveals that file's contents is still a security problem. What if it's a private letter? What if it's your system password file? What if it's salary information or your stock portfolio?

Look closely at filenames provided by a potentially hostile user[6] before opening them. For example, you might want to verify that there are no sneaky directory components in the path. Names like "../../../../../../../etc/passwd" are notorious tricks of this sort. You can protect yourself by making sure there are no slashes in the pathname (assuming that's your system's directory separator). Another common trick is to put newlines or semicolons into filenames that will later be interpreted by some poor, witless command-line interpreter that can be fooled into starting a new command in the middle of the filename. This is why taint mode discourages uninspected external commands.

[6]And on the Net, the only users you can trust not to be potentially hostile are the ones who are being actively hostile instead.

23.1.3. Accessing Commands and Files Under Reduced Privileges

The following discussion pertains to some nifty security facilities of Unix-like systems. Users of other systems may safely (or rather, unsafely) skip this section.

If you're running set-id, try to arrange that, whenever possible, you do dangerous operations with the privileges of the user, not the privileges of the program. That is, whenever you're going to call open, sysopen, system, backticks, and any other file or process operations, you can protect yourself by setting your effective UID or GID back to the real UID or GID. In Perl, you can do this for setuid scripts by saying $> = $< (or $EUID = $UID if you use English) and for setgid scripts by saying $( = $) ($EGID = $GID). If both IDs are set, you should reset both. However, sometimes this isn't feasible, because you might still need those increased privileges later in your program.

For those cases, Perl provides a reasonably safe way to open a file or pipe from within a set-id program. First, fork a child using the special open syntax that connects the parent and child by a pipe. In the child, reset the user and group IDs back to their original or known safe values. You also get to modify any of the child's per-process attributes without affecting the parent, letting you change the working directory, set the file creation mask, or fiddle with environment variables. No longer executing under extra privileges, the child process at last calls open and passes whatever data it manages to access on behalf of the mundane but demented user back up to its powerful but justly paranoid parent.

Even though system and exec don't use the shell when you supply them with more than one argument, the backtick operator admits no such alternative calling convention. Using the forking technique, we easily emulate backticks without fear of shell escapes, and with reduced (and therefore safer) privileges:

use English;   # to use $UID, etc
die "Can't fork open: $!"   unless defined($pid = open(FROMKID, "-|"));
if ($pid) {           # parent
    while (<FROMKID>) {
        # do something
    }
    close FROMKID;
}
else {
    $EUID = $UID;  # setuid(getuid())
    $EGID = $GID;  # setgid(getgid()), and initgroups(2) on getgroups(2)
    chdir("/")      or die "can't chdir to /: $!";
    umask(077);
    $ENV{PATH} = "/bin:/usr/bin";
    exec 'myprog', 'arg1', 'arg2';
    die "can't exec myprog: $!";
}

This is by far the best way to call other programs from a set-id script. You make sure never to use the shell to execute anything, and you drop your privileges before you yourself exec the program. (But because the list forms of system, exec, and pipe open are specifically exempted from taint checks on their arguments, you must still be careful of what you pass in.)

If you don't need to drop privileges, and just want to implement backticks or a pipe open without risking the shell intercepting your arguments, you could use this:

open(FROMKID, "-|") or exec("myprog", "arg1", "arg2")
    or die "can't run myprog: $!";

and then just read from FROMKID in the parent. As of the 5.6.1 release of Perl, you can write that as:

open(FROMKID, "-|", "myprog", "arg1", "arg2");

The forking technique is useful for more than just running commands from a set-id program. It's also good for opening files under the ID of whoever ran the program. Suppose you had a setuid program that needed to open a file for writing. You don't want to run the open under your extra privileges, but you can't permanently drop them, either. So arrange for a forked copy that's dropped its privileges to do the open for you. When you want to write to the file, write to the child, and it will then write to the file for you.

use English;

defined ($pid = open(SAFE_WRITER, "|-"))
    or die "Can't fork: $!";

if ($pid) {
    # you're the parent. write data to SAFE_WRITER child
    print SAFE_WRITER "@output_data\n";
    close SAFE_WRITER
        or die $! ? "Syserr closing SAFE_WRITER writer: $!"
                  : "Wait status $? from SAFE_WRITER writer";
}
else {
    # you're the child, so drop extra privileges
    ($EUID, $EGID) = ($UID, $GID);

    # open the file under original user's rights
    open(FH, "> /some/file/path")
        or die "can't open /some/file/path for writing: $!";

    # copy from parent (now stdin) into the file
    while (<STDIN>) {
        print FH $_;
    }
    close(FH)   or die "close failed: $!";
    exit;       # Don't forget to make the SAFE_WRITER disappear.
}

Upon failing to open the file, the child prints an error message and exits. When the parent writes to the now-defunct child's filehandle, it triggers a broken pipe signal (SIGPIPE), which is fatal unless trapped or ignored. See the section Section 23.1, "Signals" in Chapter 16, "Interprocess Communication".


22.3. Creating CPAN Modules		23.2. Handling Timing Glitches

Chapter 23. Security

Contents:

23.1. Handling Insecure Data

23.1.1. Detecting and Laundering Tainted Data

23.1.2. Cleaning Up Your Environment

23.1.3. Accessing Commands and Files Under Reduced Privileges