Learning Perl, 3rd Edition

Chapter 16. Simple Databases

Contents:

DBM Files and DBM Hashes
Manipulating Data with pack and unpack
Fixed-length Random-access Databases
Variable-length (Text) Databases
Exercises

Databases let data persist beyond the end of our program. The databases we're talking about in this chapter are simple ones; how to use full-featured database implementations (Oracle, Sybase, Informix, MySQL, and others) is a topic that could fill an entire book, and usually does. The databases in this chapter are simple enough to implement that you don't need to know about modules to use them.[346]

[346]To be sure, on some of these, the core of Perl will load a module for you. But you don't need to know anything about modules to use these databases.

16.1. DBM Files and DBM Hashes

Every system that has Perl also has a simple database already available in the form of DBM files. This lets your program store data for quick lookup in a file or in a pair of files. When two files are used, one holds the data and the other holds a table of contents, but you don't need to know that in order to use DBM files. We're intentionally being a little vague about the exact implementation, because that will vary depending upon your machine and configuration; see the AnyDBM_File manpage for more information. Also, among the downloadable files from the O'Reilly website is a utility called which_dbm, which tries to tell you which implementation you're using, how many files there are, and what extensions they use, if any.

Some DBM file implementations (we'll call it "a file," even though it may be two actual files) have a limit of around 1000 bytes for each key and value in the file. Your actual limit may be larger or smaller than this number, but as long as you aren't trying to store gigantic text strings in the file, it shouldn't be a problem. There's no limit to the number of individual data items in the file, as long as you have enough disk space.

In Perl, we can access the DBM file as a special kind of hash called a DBM hash. This is a powerful concept, as we'll see.

16.1.1. Opening and Closing DBM Hashes

To associate a DBM database with a DBM hash (that is, to open it), use the dbmopen function,[347] which looks similar to open, in a way:

[347]Here we depart from other beginner documentation, which claims that dbmopen is deprecated and suggests that you use the more complicated tie interface instead. We disagree, since dbmopen works just fine, and it keeps you from having to think harder about what you're doing. Keep the common tasks simple!

dbmopen(%DATA, "my_database", 0644)
  or die "Cannot create my_database: $!";

The first parameter is the name of a Perl hash. (If this hash already has values, the values are inaccessible while the DBM file is open.) This hash becomes connected to the DBM database whose name was given as the second parameter, often stored on disk as a pair of files with the extensions .dir and .pag. (The filename as given in the second parameter shouldn't include either extension, though; the extensions will be automatically added as needed.) In this case, the files might be called my_database.dir and my_database.pag.

Any legal hash name may be used as the name of the DBM hash, although uppercase-only hash names are traditional because their resemblance to filehandles reminds us that the hash is connected to a file. The hash name isn't stored anywhere in the file, so you can call it whatever you'd like.

If the file doesn't exist, it will be created and given a permission mode based upon the value in the third parameter.[348] The number is typically specified in octal; the frequently used value of 0644 gives read-only permission to everyone but the owner, who gets read/write permission. If you're trying to open an existing file, you'd probably rather have the dbmopen fail if the file isn't found, so just use undef as the third parameter.

[348]The actual mode will be modified by the umask; see the perlfunc manpage for more information.

The return value from the dbmopen is true if the database could be opened or created, and false otherwise, just like open. You should generally use or die in the same spirit as open.
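To make the role of the third parameter concrete, here is a minimal sketch. The helper name can_open_existing, the database names, and the use of the File::Temp module for a scratch directory are assumptions made for illustration, not part of the original text.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: open an existing DBM database, never create one.
# Returns 1 on success and 0 if the database could not be opened.
sub can_open_existing {
  my ($path) = @_;
  my %DB;
  # Assumption: some versions of Perl may warn about the undef mode,
  # so that one warning category is silenced in this small scope.
  no warnings 'uninitialized';
  # undef as the mode means "do not create the database if it is missing"
  dbmopen(%DB, $path, undef) or return 0;
  dbmclose(%DB);
  return 1;
}

my $dir = tempdir(CLEANUP => 1);   # scratch directory, removed at exit

# Nothing has been created yet, so this attempt should fail.
print can_open_existing("$dir/no_such_db") ? "opened\n" : "not found\n";

# Create a database with mode 0644, then try again.
my %NEW;
dbmopen(%NEW, "$dir/my_database", 0644)
  or die "Cannot create my_database: $!";
dbmclose(%NEW);

print can_open_existing("$dir/my_database") ? "opened\n" : "not found\n";
```

The first attempt should report "not found" and the second "opened", whichever DBM implementation your Perl happens to use.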

The DBM hash typically stays open throughout the program. When the program terminates, the association is terminated. You can also break the association in a manner similar to closing a filehandle, by using dbmclose:

dbmclose(%DATA);

16.1.2. Using a DBM Hash

Here's the beauty of the DBM hash: it works just like the hashes you already understand! To read from the file, look at an element of the hash. To write to the file, store something into the hash. In short, it's like any other hash, but instead of being stored in memory, it's stored on disk. And thus, when your program opens it up again, the hash is already stuffed full of the data from the previous invocation.
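As a rough illustration of that persistence, the sketch below keeps a counter in a DBM file. The helper count_visit, the database name visits, and the File::Temp scratch directory are all hypothetical; each call opens and closes the database, standing in for a separate run of the program.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);   # scratch directory (assumption)
my $file = "$dir/visits";           # hypothetical database name

# Hypothetical helper: bump a counter stored in the DBM file and
# return the new value.
sub count_visit {
  my ($name) = @_;
  my %VISITS;
  dbmopen(%VISITS, $file, 0644)
    or die "Cannot open $file: $!";
  my $count = ($VISITS{$name} || 0) + 1;   # read the stored value, if any
  $VISITS{$name} = $count;                 # write the new value to disk
  dbmclose(%VISITS);
  return $count;
}

print "fred: ", count_visit("fred"), "\n";   # first "run" starts from nothing
print "fred: ", count_visit("fred"), "\n";   # second "run" sees the stored value
```

Because the hash lives on disk, the second call picks up where the first one left off, even though %VISITS itself is a fresh variable each time.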

All of the normal hash operations are available:

$DATA{"fred"} = "bedrock";      # create (or update) an element
delete $DATA{"barney"};         # remove an element of the database

foreach my $key (keys %DATA) {  # step through all keys
  print "$key has value of $DATA{$key}\n";
}

That last loop could have a problem, since keys has to traverse the entire hash, possibly producing a very large list of keys. If you are scanning through a DBM hash, it's generally more memory-efficient to use the each function:

while (my($key, $value) = each(%DATA)) {
  print "$key has value of $value\n";
}
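The same pattern works inside a subroutine that receives a reference to the DBM hash. This is only a sketch: the helper count_with_value, the sample data, and the File::Temp scratch directory are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);   # scratch directory (assumption)

my %TOWN;                          # hypothetical sample data
dbmopen(%TOWN, "$dir/towns", 0644)
  or die "Cannot create towns: $!";
$TOWN{"fred"}   = "bedrock";
$TOWN{"barney"} = "bedrock";
$TOWN{"dino"}   = "quarry";

# Hypothetical helper: count the entries whose value equals $wanted,
# fetching one key/value pair at a time instead of building a key list.
sub count_with_value {
  my ($db, $wanted) = @_;          # $db is a reference to the hash
  my $count = 0;
  while (my ($key, $value) = each %$db) {
    $count++ if $value eq $wanted;
  }
  return $count;
}

print "bedrock residents: ", count_with_value(\%TOWN, "bedrock"), "\n";
dbmclose(%TOWN);
```

It is generally safest not to add or delete entries while an each loop is in progress; collect the keys you want to change and update them after the loop finishes.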

If you are accessing DBM files that are maintained by C programs, you should be aware that C programs generally tack on a trailing NUL ("\0") character to the end of their strings, for reasons known only to Kernighan and Ritchie.[349] The DBM library routines do not need this NUL (they handle binary data using a byte count, not a NUL-terminated string), and so the NUL is stored as part of the data.

[349]Well, they're not the only ones: it's because C uses the NUL byte as the end-of-string marker.

To cooperate with these programs, you must therefore append a NUL character to the end of your keys and values, and discard the NUL from the end of the returned values to have the data make sense. For example, to look up merlyn in the sendmail aliases database on a Unix system, you might do something like this:

dbmopen(my %ALI, "/etc/aliases", undef) or die "no aliases?";
my $value = $ALI{"merlyn\0"};                  # note appended NUL
$value =~ s/\0$//;                             # remove trailing NUL
print "Randal's mail is headed for: $value\n"; # show result
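If you do this often, a pair of small helpers keeps the NUL handling in one place. The sketch below is an illustration under stated assumptions: the helper names c_string and perl_string are made up, and it writes to a scratch database created with File::Temp rather than touching a real aliases file.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helpers for talking to C-maintained DBM files.
sub c_string    { my ($s) = @_; return "$s\0" }              # append a NUL
sub perl_string { my ($s) = @_; $s =~ s/\0\z//; return $s }  # strip one trailing NUL

my $dir = tempdir(CLEANUP => 1);   # scratch directory (assumption)
my %ALI;
dbmopen(%ALI, "$dir/aliases", 0644)
  or die "Cannot create aliases: $!";

# Store the entry the way a C program would: NUL on both key and value.
$ALI{ c_string("merlyn") } = c_string('merlyn@example.com');

# A lookup without the NUL misses, because "merlyn" and "merlyn\0"
# are different keys as far as the DBM library is concerned.
print "plain key: ", (defined $ALI{"merlyn"} ? "found" : "missing"), "\n";

# A lookup with the NUL hits; strip the NUL from the result.
my $value = perl_string( $ALI{ c_string("merlyn") } );
print "NUL key:   $value\n";
dbmclose(%ALI);
```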

If your DBM files may be concurrently accessed by more than one process (for example, if they're being updated over the Web), you'll generally need to use an auxiliary lock file. The details of this are beyond the scope of this book; see the Perl Cookbook by Tom Christiansen and Nathan Torkington (O'Reilly & Associates, Inc.).
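To give a feel for the idea, here is one possible shape for an auxiliary lock file, using Perl's flock function. This is a minimal sketch, not the Cookbook's recipe: the helper with_dbm_lock, the ".lock" filename suffix, and the File::Temp scratch directory are assumptions, and flock is only advisory, so it protects you only from other programs that take the same lock.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);

# Hypothetical helper: run $code while holding an exclusive lock on
# "$db_path.lock", an ordinary file that exists only to be locked.
sub with_dbm_lock {
  my ($db_path, $code) = @_;
  open(my $lock, ">>", "$db_path.lock")
    or die "Cannot open lock file: $!";
  flock($lock, LOCK_EX)            # waits until no one else holds the lock
    or die "Cannot lock: $!";
  my %DB;
  dbmopen(%DB, $db_path, 0644)
    or die "Cannot open $db_path: $!";
  my $result = $code->(\%DB);      # the caller works with a hash reference
  dbmclose(%DB);                   # detach from the database before unlocking
  close($lock);                    # closing the handle releases the lock
  return $result;
}

my $dir = tempdir(CLEANUP => 1);   # scratch directory (assumption)
my $hits = with_dbm_lock("$dir/counter", sub {
  my ($db) = @_;
  $db->{hits} = ($db->{hits} || 0) + 1;
  return $db->{hits};
});
print "hits so far: $hits\n";
```

The database is opened only after the lock is held and closed before the lock is released, so no other cooperating process sees it half-updated.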




Copyright © 2002 O'Reilly & Associates. All rights reserved.