Tying Filehandles (Programming Perl)

14.4. Tying Filehandles

A class implementing a tied filehandle should define the following methods: TIEHANDLE and at least one of PRINT, PRINTF, WRITE, READLINE, GETC, and READ. The class can also provide a DESTROY method, and BINMODE, OPEN, CLOSE, EOF, FILENO, SEEK, TELL, READ, and WRITE methods to enable the corresponding Perl built-ins for the tied filehandle. (Well, that isn't quite true: WRITE corresponds to syswrite and has nothing to do with Perl's built-in write function for printing with format declarations.)

Tied filehandles are especially useful when Perl is embedded in another program (such as Apache or vi) and output to STDOUT or STDERR needs to be redirected in some special way.

But filehandles don't actually have to be tied to a file at all. You can use output statements to build up an in-memory data structure and input statements to read them back in. Here's an easy way to reverse a sequence of print and printf statements without reversing the individual lines:

package ReversePrint;
use strict;
sub TIEHANDLE {
    my $class = shift;
    bless [], $class;
}
sub PRINT {
    my $self = shift;
    push @$self, join '', @_;
}
sub PRINTF {
    my $self = shift;
    my $fmt = shift;
    push @$self, sprintf $fmt, @_;
}
sub READLINE {
    my $self = shift;
    pop @$self;
}
 
package main;
my $m = "--MORE--\n";
tie *REV, "ReversePrint";
 
# Do some prints and printfs.
print REV "The fox is now dead.$m";

printf REV <<"END", int rand 10000000;
The quick brown fox jumps over
over the lazy dog %d times!
END
 
print REV <<"END";
The quick brown fox jumps
over the lazy dog.
END
 
# Now read back from the same handle.
print while <REV>;

This prints:

The quick brown fox jumps 
over the lazy dog.
The quick brown fox jumps over
over the lazy dog 3179357 times!
The fox is now dead.--MORE--

14.4.1. Filehandle-Tying Methods

For our extended example, we'll create a filehandle that uppercases strings printed to it. Just for kicks, we'll begin the file with <SHOUT> when it's opened and end with </SHOUT> when it's closed. That way we can rant in well-formed XML.

Here's the top of our Shout.pm file that will implement the class:

package Shout;
use Carp;                # So we can croak our errors

We'll now list the method definitions in Shout.pm.

CLASSNAME->TIEHANDLE(LIST)

This is the constructor for the class, which as usual should return a blessed reference.

sub TIEHANDLE {
    my $class = shift;
    my $form = shift;
    open my $self, $form, @_   or croak "can't open $form@_: $!";
    if ($form =~ />/) {
        print $self  "<SHOUT>\n";
        $$self->{WRITING} = 1;     # Remember to do end tag
    }
    return bless $self, $class;    # $self is a glob ref
}

Here, we open a new filehandle according to the mode and filename passed to the tie operator, write <SHOUT> to the file, and return a blessed reference to it. There's a lot of stuff going on in that open statement, but we'll just point out that, in addition to the usual "open or die" idiom, the my $self furnishes an undefined scalar to open, which knows to autovivify it into a typeglob. The fact that it's a typeglob is also significant, because not only does the typeglob contain the real I/O object of the file, but it also contains various other handy data structures that come along for free, like a scalar ($$$self), an array (@$$self), and a hash (%$$self). (We won't mention the subroutine, &$$self.)

The $form is the filename-or-mode argument. If it's a filename, @_ is empty, so it behaves as a two-argument open. Otherwise, $form is the mode for the rest of the arguments.

After the open, we test to see whether we should write the beginning tag. If so, we do. And right away, we use one of those glob data structures we mentioned. That $$self->{WRITING} is an example of using the glob to store interesting information. In this case, we remember whether we did the beginning tag so we know whether to do the corresponding end tag. We're using the %$$self hash, so we can give the field a decent name. We could have used the scalar as $$$self, but that wouldn't be self-documenting. (Or it would only be self-documenting, depending on how you look at it.)

SELF->PRINT(LIST)

This method implements a print to the tied handle. The LIST is whatever was passed to print. Our method below uppercases each element of LIST:

sub PRINT {
    my $self = shift;
    print $self map {uc} @_;
}

SELF->READLINE

This method supplies the data when the filehandle is read from via the angle operator (<FH>) or readline. The method should return undef when there is no more data.

sub READLINE {
    my $self = shift;
    return <$self>;
}

Here, we simply return <$self> so that the method will behave appropriately depending on whether it was called in scalar or list context.

SELF->GETC

This method runs whenever getc is used on the tied filehandle.

sub GETC {
    my $self = shift;
    return getc($self);
}

Like several of the methods in our Shout class, the GETC method simply calls its corresponding Perl built-in and returns the result.

SELF->OPEN(LIST)

Our TIEHANDLE method itself opens a file, but a program using the Shout class that calls open afterward triggers this method.

sub OPEN {
    my $self = shift;
    my $form = shift;
    my $name = "$form@_";
    $self->CLOSE;
    open($self, $form, @_)      or croak "can't reopen $name: $!";
    if ($form =~ />/) {
        print $self "<SHOUT>\n" or croak "can't start print: $!";
        $$self->{WRITING} = 1;     # Remember to do end tag
    }
    else {
        $$self->{WRITING} = 0;     # Remember not to do end tag
    }
    return 1;
}

We invoke our own CLOSE method to explicitly close the file in case the user didn't bother to. Then we open a new file with whatever filename was specified in the open and shout at it.

SELF->CLOSE

This method deals with the request to close the handle. Here, we seek to the end of the file and, if that was successful, print </SHOUT> before using Perl's built-in close.

sub CLOSE {
    my $self = shift;
    if ($$self->{WRITING}) {
        $self->SEEK(0, 2)             or return;
        $self->PRINT("</SHOUT>\n")    or return;
    }
    return close $self;
}

SELF->SEEK(LIST)

When you seek on a tied filehandle, the SEEK method gets called.

sub SEEK {
    my $self = shift;
    my ($offset, $whence) = @_;
    return seek($self, $offset, $whence);
}

SELF->TELL

This method is invoked when tell is used on the tied handle.

sub TELL {
    my $self = shift;
    return tell $self;
}

SELF->PRINTF(LIST)

This method is run whenever printf is used on the tied handle. The LIST will contain the format and the items to be printed.

sub PRINTF {
    my $self = shift;
    my $template = shift;
    return $self->PRINT(sprintf $template, @_);
}

Here, we use sprintf to generate the formatted string and pass it to PRINT for uppercasing. There's nothing that requires you to use the built-in sprintf function though. You could interpret the percent escapes to suit your own purpose.

SELF->READ(LIST)

This method responds when the handle is read using read or sysread. Note that we modify the first argument of LIST "in-place", mimicking read's ability to fill in the scalar passed in as its second argument.

sub READ {
    my ($self, undef, $length, $offset) = @_;
    my $bufref = \$_[1];
    return read($self, $$bufref, $length, $offset);
}

SELF->WRITE(LIST)

This method gets invoked when the handle is written to with syswrite. Here, we uppercase the string to be written.

sub WRITE {
    my $self = shift;
    my $string = uc(shift);
    my $length = shift || length $string;
    my $offset = shift || 0;
    return syswrite $self, $string, $length, $offset;
}

SELF->EOF

This method returns a Boolean value when a filehandle tied to the Shout class is tested for its end-of-file status using eof.

sub EOF {
    my $self = shift;
    return eof $self;
}

SELF->BINMODE(DISC)

This method specifies the I/O discipline to be used on the filehandle. If none is specified, it puts the tied filehandle into binary mode (the :raw discipline), for filesystems that distinguish between text and binary files.

sub BINMODE {
    my $self = shift;
    my $disc = shift || ":raw";
    return binmode $self, $disc;
}

That's how you'd write it, but it's actually useless in our case because the open already wrote on the handle. So in our case we should probably make it say:

sub BINMODE { croak("Too late to use binmode") }

SELF->FILENO

This method should return the file descriptor (fileno) associated with the tied filehandle by the operating system.

sub FILENO {
    my $self = shift;
    return fileno $self;
}

SELF->DESTROY

As with the other types of ties, this method is triggered when the tied object is about to be destroyed. This is useful for letting the object clean up after itself. Here, we make sure that the file is closed, in case the program forgot to call close. We could just say close $self, but it's better to invoke the CLOSE method of the class. That way if the designer of the class decides to change how files are closed, this DESTROY method won't have to be modified.

sub DESTROY {
    my $self = shift;
    $self->CLOSE;      # Close the file using Shout's CLOSE method.
}

Here's a demonstration of our Shout class:

#!/usr/bin/perl
use Shout;
tie(*FOO, Shout::, ">filename");
print FOO "hello\n";            # Prints HELLO.
seek FOO, 0, 0;                 # Rewind to beginning.
@lines = <FOO>;                 # Calls the READLINE method.
close FOO;                      # Close file explicitly.
open(FOO, "+<", "filename");    # Reopen FOO, calling OPEN.
seek(FOO, 8, 0);                # Skip the "<SHOUT>\n".
sysread(FOO, $inbuf, 5);        # Read 5 bytes from FOO into $inbuf.
print "found $inbuf\n";         # Should print "hello".
seek(FOO, -5, 1);               # Back up over the "hello".
syswrite(FOO, "ciao!\n", 6);    # Write 6 bytes into FOO.
untie(*FOO);                    # Calls the CLOSE method implicitly.

After running this, the file contains:

<SHOUT>
CIAO!
</SHOUT>

Here are some more strange and wonderful things to do with that internal glob. We use the same hash as before, but with new keys PATHNAME and DEBUG. First we install a stringify overloading so that printing one of our objects reveals the pathname (see Chapter 13, "Overloading"):

# This is just so totally cool!
use overload q("") => sub { $_[0]->pathname };

# This is the stub to put in each function you want to trace.
sub trace {
    my $self = shift;
    local $Carp::CarpLevel = 1;
    Carp::cluck("\ntrace magical method") if $self->debug;
}

# Overload handler to print out our path.
sub pathname {
    my $self = shift;
    confess "i am not a class method" unless ref $self;
    $$self->{PATHNAME} = shift if @_;
    return $$self->{PATHNAME};
}
# Dual moded.
sub debug {
    my $self = shift;
    my $var = ref $self ? \$$self->{DEBUG} : \our $Debug;
    $$var = shift if @_;
    return ref $self ? $$self->{DEBUG} || $Debug : $Debug;
}

And then call trace on entry to all your ordinary methods like this:

sub GETC { $_[0]->trace;           # NEW
    my($self) = @_;
    getc($self);
}

And also set the pathname in TIEHANDLE and OPEN:

sub TIEHANDLE {
    my $class    = shift;
    my $form = shift;
    my $name = "$form@_";          # NEW
    open my $self, $form, @_   or croak "can't open $name: $!";
    if ($form =~ />/) {
        print $self  "<SHOUT>\n";
        $$self->{WRITING} = 1;     # Remember to do end tag
    }
    bless $self, $class;           # $fh is a glob ref
    $self->pathname($name);        # NEW
    return $self;
}

sub OPEN { $_[0]->trace;           # NEW
    my $self = shift;
    my $form = shift;
    my $name = "$form@_";          
    $self->CLOSE;
    open($self, $form, @_)      or croak "can't reopen $name: $!";
    $self->pathname($name);        # NEW
    if ($form =~ />/) {
        print $self "<SHOUT>\n" or croak "can't start print: $!";
        $$self->{WRITING} = 1;     # Remember to do end tag
    }
    else {
        $$self->{WRITING} = 0;     # Remember not to do end tag
    }
    return 1;
}

Somewhere you also have to call $self->debug(1) to turn debugging on. When you do that, all your Carp::cluck calls will produce meaningful messages. Here's one that we get while doing the reopen above. It shows us three deep in method calls, as we're closing down the old file in preparation for opening the new one:

trace magical method at foo line 87
    Shout::SEEK('>filename', '>filename', 0, 2) called at foo line 81
    Shout::CLOSE('>filename') called at foo line 65
    Shout::OPEN('>filename', '+<', 'filename') called at foo line 141

14.4.2. Creative Filehandles

You can tie the same filehandle to both the input and the output of a two-ended pipe. Suppose you wanted to run the bc(1) (arbitrary precision calculator) program this way:

use Tie::Open2;

tie *CALC, 'Tie::Open2', "bc -l";
$sum = 2;
for (1 .. 7) {
    print CALC "$sum * $sum\n";
    $sum = <CALC>;
    print "$_: $sum";
    chomp $sum;
}
close CALC;

One would expect it to print this:

1: 4
2: 16
3: 256
4: 65536
5: 4294967296
6: 18446744073709551616
7: 340282366920938463463374607431768211456

One's expectations would be correct if one had the bc(1) program on one's computer, and one also had Tie::Open2 defined as follows. This time we'll use a blessed array for our internal object. It contains our two actual filehandles for reading and writing. (The dirty work of opening a double-ended pipe is done by IPC::Open2; we're just doing the fun part.)

package Tie::Open2;
use strict;
use Carp;
use Tie::Handle;  # do not inherit from this!
use IPC::Open2;

sub TIEHANDLE {
    my ($class, @cmd) = @_;
    no warnings 'once';
    my @fhpair = \do { local(*RDR, *WTR) };
    bless $_, 'Tie::StdHandle' for @fhpair;
    bless(\@fhpair => $class)->OPEN(@cmd) || die;
    return \@fhpair;
}

sub OPEN {
    my ($self, @cmd) = @_;
    $self->CLOSE if grep {defined} @{ $self->FILENO };
    open2(@$self, @cmd);
}

sub FILENO {
    my $self = shift;
    [ map { fileno $self->[$_] } 0,1 ];
}

for my $outmeth ( qw(PRINT PRINTF WRITE) ) {
    no strict 'refs';
    *$outmeth = sub {
        my $self = shift;
        $self->[1]->$outmeth(@_);
    };
}
for my $inmeth ( qw(READ READLINE GETC) ) {
    no strict 'refs';
    *$inmeth = sub {
        my $self = shift;
        $self->[0]->$inmeth(@_);
    };
}
for my $doppelmeth ( qw(BINMODE CLOSE EOF)) {
    no strict 'refs';
    *$doppelmeth = sub {
        my $self = shift;
        $self->[0]->$doppelmeth(@_) && $self->[1]->$doppelmeth(@_);
    };
}
for my $deadmeth ( qw(SEEK TELL)) {
    no strict 'refs';
    *$deadmeth = sub {
        croak("can't $deadmeth a pipe");
    };
}
1;

The final four loops are just incredibly snazzy, in our opinion. For an explanation of what's going on, look back at the section Section 14.3.7.1, "Closures as function templates" in Chapter 8, "References".

Here's an even wackier set of classes. The package names should give you a clue as to what they do.

use strict;
package Tie::DevNull;

    sub TIEHANDLE {
        my $class = shift;
        my $fh = local *FH;
        bless \$fh, $class;
    }
    for (qw(READ READLINE GETC PRINT PRINTF WRITE)) {
        no strict 'refs';
        *$_  = sub { return };
    }

package Tie::DevRandom;

    sub READLINE { rand() . "\n"; }
    sub TIEHANDLE {
        my $class = shift;
        my $fh = local *FH;
        bless \$fh, $class;
    }

    sub FETCH { rand() }
    sub TIESCALAR {
        my $class = shift;
        bless \my $self, $class;
    }

package Tie::Tee;

    sub TIEHANDLE {
        my $class = shift;
        my @handles;
        for my $path (@_) {
            open(my $fh, ">$path") || die "can't write $path";
            push @handles, $fh;
        }
        bless \@handles, $class;
    }

    sub PRINT {
        my $self = shift;
        my $ok = 0;
        for my $fh (@$self) {
            $ok += print $fh @_;
        }
        return $ok == @$self;
    }

The Tie::Tee class emulates the standard Unix tee(1) program, which sends one stream of output to multiple different destinations. The Tie::DevNull class emulates the null device, /dev/null on Unix systems. And the Tie::DevRandom class produces random numbers either as a handle or as a scalar, depending on whether you call TIEHANDLE or TIESCALAR! Here's how you call them:

package main;

tie *SCATTER,   "Tie::Tee", qw(tmp1 - tmp2 >tmp3 tmp4);
tie *RANDOM,    "Tie::DevRandom";
tie *NULL,      "Tie::DevNull";
tie my $randy,  "Tie::DevRandom";

for my $i (1..10) {
    my $line = <RANDOM>;
    chomp $line;
    for my $fh (*NULL, *SCATTER) {
        print $fh "$i: $line $randy\n";
    }
}

This produces something like the following on your screen:

1: 0.124115571686165 0.20872819474074
2: 0.156618299751194 0.678171662366353
3: 0.799749050426126 0.300184963960792
4: 0.599474551447884 0.213935286029916
5: 0.700232143543861 0.800773751296671
6: 0.201203608274334 0.0654303290639575
7: 0.605381294683365 0.718162304090487
8: 0.452976481105495 0.574026269121667
9: 0.736819876983848 0.391737610662044
10: 0.518606540417331 0.381805078272308

But that's not all! It wrote to your screen because of the - in the *SCATTERtie above. But that line also told it to create files tmp1, tmp2, and tmp4, as well as to append to file tmp3. (We also wrote to the *NULL filehandle in the loop, though of course that didn't show up anywhere interesting, unless you're interested in black holes.)


14.3. Tying Hashes		14.5. A Subtle Untying Trap