Chapter 5. TCP/IP Name Services

The majority of the conversations between computers these days take place using a protocol called Transmission Control Protocol running over a lower layer called Internet Protocol.[1] These two protocols are commonly lumped together into the acronym TCP/IP. Every machine that participates on a TCP/IP network must be assigned at least one unique numeric identifier, called an IP address. IP addresses are usually written using the form NNN.NNN.N.N, e.g., 192.168.1.9.

[1]This chapter will be discussing IPv4, the current (deployed) standard. IPv6 (the next generation of IP) will probably replace it in due course.

While machines are content to call each other by strings of dot-separated numbers, most people are less enamored of this idea. TCP/IP would have fallen flat on its face as a protocol if users had to remember a unique sequence of up to 12 digits for every machine they wanted to contact. Mechanisms had to be invented to manage and distribute the mappings between IP addresses and human-friendly names.

This chapter describes the evolution of the network name services that allow us to access data at www.oog.org instead of at 192.168.1.9, and what takes place behind the scenes. Along the way we combine a dash of history with a healthy serving of practical advice on how Perl can help to manage this crucial part of any networking infrastructure.

5.1. Host Files

The first approach used to solve the problem of mapping IP addresses to names was the most obvious and simple one: a standard file was created to hold a table of IP addresses and their corresponding computer names. This file can be found as /etc/hosts on Unix systems, Macintosh HD:System Folder:Preferences:hosts on Macs, and %SystemRoot%\System32\Drivers\Etc\hosts on NT/2000 machines. On NT/2000 there is also an lmhosts file that serves a slightly different purpose, which we'll talk about later. Here's an example Unix-style host file:

127.0.0.1     localhost       
192.168.1.1   everest.oog.org     everest
192.168.1.2   rivendell.oog.org   rivendell

The limitations of this approach become clear very quickly. If oog.org's network manager has two machines on a TCP/IP network that communicate with each other, and she wants to add a third which will be addressed by name, she's got to edit the correct file on all of her machines. If oog.org buys yet another machine, there are now four separate host files to be maintained (one on each machine).

As untenable as this may seem, this is what actually happened during the early days of the Internet/ARPAnet. As new sites were connected, every site on the net that wished to talk with the new site needed to update their host files. The central host repository, known as the Network Information Center (NIC) (or more precisely the SRI-NIC, since it was housed at SRI at the time), updated and published a host file for the entire network called HOSTS.TXT. System administrators would anonymously FTP this file from SRI-NIC's NETINFO directory on a regular basis.

Host files are still in use today, despite their limitations and the replacements we'll be talking about later in this chapter. There are some situations where host files are even mandatory. For example, under SunOS, a machine consults its /etc/hosts file to determine its own IP address. Host files also solve the "chicken and egg" problem encountered while a machine boots. If the network name servers that machine will be using are specified by name, there must be some way to determine their IP addresses. But if the network name service isn't operational yet, there's no way (unless it broadcasts for help) to receive this information. The usual solution is to place a stub file (with just a few hosts) in place for booting purposes.

On a small network, having an up-to-date host file that includes all of the hosts on that network is useful. It doesn't even have to reside on each machine in that network to be helpful (since the other mechanisms we'll describe later do a much better job of distributing this information). Just having one around that can be consulted is handy for quick manual lookups and address allocation purposes.

Since these files are still a part of everyday administration, let's look at better ways to manage them. Perl and host files are a natural match, given Perl's predilection for text file processing. Given their affinity for each other, we're going to use the simple host file as a springboard for a number of different explorations.

Let's look at the parsing of host files. Parsing a host file can be as simple as this:

open(HOSTS, "/etc/hosts") or die "Unable to open host file:$!\n";
while (defined ($_ = <HOSTS>)) {
    next if /^#/;  # skip comment lines
    next if /^$/;  # skip empty lines
    s/\s*#.*$//;  # delete in-line comments and preceding whitespace
    ($ip, @names) = split;
    die "The IP address $ip already seen!\n" if (exists $addrs{$ip});
    $addrs{$ip} = [@names];
    for (@names){
	    die "The host name $_ already seen!\n" if (exists $names{$_});
	    $names{$_} = $ip;
    }
}
close(HOSTS);

The previous code walks through an /etc/hosts file (skipping blank lines and comments), creating two data structures for later use. The first data structure is a hash of lists of hostnames keyed by the IP address. For the host file above, the data structure created would look like this:

$addrs{'127.0.0.1'} = ['localhost'];
$addrs{'192.168.1.2'} = ['rivendell.oog.org','rivendell'];
$addrs{'192.168.1.1'} = ['everest.oog.org','everest'];

The second is a hash table of host names, keyed by the name. For the same file, the %names hash would look like this:

$names{'localhost'} = '127.0.0.1';
$names{'everest'} = '192.168.1.1';
$names{'everest.oog.org'} = '192.168.1.1';
$names{'rivendell'} = '192.168.1.2';
$names{'rivendell.oog.org'} = '192.168.1.2';

Note that in the simple process of parsing this file, we've also added some additional functionality. Our code checks for duplicate host names and IP addresses (both bad news on a TCP/IP network). When dealing with network-related data, use every opportunity possible to check for errors and bad information. It is always better to catch problems early in the game than to be bitten by them once the data has been propagated to your entire network. Because it is so important, I'll return to this topic later in the chapter.
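
The die calls above stop at the first duplicate. When auditing an existing file it can be handier to see every problem in one pass. The following is a minimal sketch of that variation, not part of the original program: the subroutine name parse_hosts and its return convention are our own invention. It takes an already-open filehandle, so it can be tried against an in-memory string as easily as against /etc/hosts.

```perl
use strict;
use warnings;

# parse_hosts (hypothetical helper): read host-file lines from a filehandle
# and return references to the two hashes plus a list of problems found.
sub parse_hosts {
    my ($fh) = @_;
    my (%addrs, %names, @problems);

    while (defined(my $line = <$fh>)) {
        chomp $line;
        $line =~ s/\s*#.*$//;          # delete comments and preceding whitespace
        next if $line =~ /^\s*$/;      # skip lines that are now empty

        my ($ip, @hostnames) = split ' ', $line;
        if (exists $addrs{$ip}) {
            push @problems, "line $.: IP address $ip already seen";
            next;
        }
        $addrs{$ip} = [];
        for my $name (@hostnames) {
            if (exists $names{$name}) {
                push @problems, "line $.: host name $name already seen";
                next;
            }
            $names{$name} = $ip;
            push @{ $addrs{$ip} }, $name;
        }
    }
    return (\%addrs, \%names, \@problems);
}

# Demonstration on an in-memory file; the last line reuses the name
# everest on purpose.
my $sample = <<'EOF';
127.0.0.1     localhost
192.168.1.1   everest.oog.org     everest   # main server
192.168.1.2   rivendell.oog.org   rivendell
192.168.1.3   everest
EOF

open(my $fh, '<', \$sample) or die "Unable to open in-memory file:$!\n";
my ($addrs, $names, $problems) = parse_hosts($fh);
close($fh);

print "$_\n" for @$problems;
print "everest is at $names->{everest}\n";
```

Collecting the complaints in a list leaves the caller free to decide whether one duplicate is fatal or merely worth a warning.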

5.1.1. Generating Host Files

Now we turn to the more interesting topic of generating host files. Let's assume we have the following host database file for the hosts on our network:

name: shimmer
address: 192.168.1.11
aliases: shim shimmy shimmydoodles
owner: David Davis
department: software
building: main
room: 909
manufacturer: Sun
model: Ultra60
-=-
name: bendir
address: 192.168.1.3
aliases: ben bendoodles
owner: Cindy Coltrane
department: IT
building: west
room: 143
manufacturer: Apple
model: 7500/100
-=-
name: sulawesi
address: 192.168.1.12
aliases: sula su-lee
owner: Ellen Monk
department: design
building: main
room: 1116
manufacturer: Apple
model: 7500/100
-=-
name: sander
address: 192.168.1.55
aliases: sandy micky mickydoo
owner: Alex Rollins
department: IT
building: main
room: 1101
manufacturer: Intergraph
model: TD-325
-=-

The format is simple: fieldname: value with -=- used as a separator between records. You might find you need other fields than those listed above, or have too many records to make it practical to keep in a single flat file. Though we are using a single flat file, the concepts we'll show in this chapter are not backend-specific.

Here's some code that will parse a file like this to generate a host file:

$datafile ="./database";
$recordsep = "-=-\n";

open(DATA,$datafile) or die "Unable to open datafile:$!\n";

$/=$recordsep; # prepare to read in database file one record at a time

print "#\n\# host file - GENERATED BY $0\n# DO NOT EDIT BY HAND!\n#\n";
while (<DATA>) {
    chomp;                           # remove the record separator
    # split into key1,value1,...bingo, hash of record
    %record = split /:\s*|\n/m; 
    print "$record{address}\t$record{name} $record{aliases}\n";
}
close(DATA);

Here's the output:

#
# host file - GENERATED BY createhosts
# DO NOT EDIT BY HAND!
#
192.168.1.11    shimmer shim shimmy shimmydoodles
192.168.1.3     bendir ben bendoodles
192.168.1.12    sulawesi sula su-lee
192.168.1.55    sander sandy micky mickydoo

Got "System Administration Database" Religion Yet?

In Chapter 3, "User Accounts", I made an impassioned plea for the use of a separate administrative database to track account information. The same arguments are doubly true for network host data. In this chapter we're going to demonstrate how even a simple flat-file host database can be manipulated to produce impressive output that drives each of the services we'll be discussing. For larger sites, a "real" database would serve well. If you'd like to see an example of this output, take a quick glance ahead at the output at the end of Section 5.1.3, "Improving the Host File Output", later in this chapter.

The host database approach is beautiful for a number of reasons. Changes need to be made only to a single file or data source. Make the changes, run some scripts, and presto!, we've generated the configuration files needed for a number of services. These configuration files are significantly less likely to contain small syntax errors (like missing semicolons or comment characters) because they won't be touched by human hands. If we write our code correctly, we can catch most of the other possible errors during the parse stage.

If you haven't seen the wisdom of this "best practice" yet, you will by the end of the chapter.

Let's look at a few of the more interesting Perl techniques in this small code sample. The first unusual thing we do is set $/. From that point on, Perl treats chunks of text that end in -=-\n as a single record. This means the while statement will read in an entire record at a time and assign it to $_.
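
To see the effect of $/ in isolation, here is a minimal sketch that reads two abbreviated records from an in-memory string. The string and the subroutine name read_records are illustrative assumptions, not part of the program above. The sketch uses local so that the change to $/ is undone when the subroutine returns, which is worth doing in longer programs where other code still expects to read a line at a time.

```perl
use strict;
use warnings;

# read_records (hypothetical helper): return a list of records read from
# a filehandle, using the given string as the record separator.
sub read_records {
    my ($fh, $separator) = @_;
    local $/ = $separator;     # restored automatically on return
    my @records;
    while (defined(my $record = <$fh>)) {
        chomp $record;         # chomp removes the current value of $/
        push @records, $record;
    }
    return @records;
}

my $data = "name: shimmer\naddress: 192.168.1.11\n-=-\n"
         . "name: bendir\naddress: 192.168.1.3\n-=-\n";

open(my $fh, '<', \$data) or die "Unable to open in-memory data:$!\n";
my @records = read_records($fh, "-=-\n");
close($fh);

print scalar(@records), " records read\n";
print "first record:\n$records[0]";
```

Note that chomp strips the whole separator string, not just a trailing newline, because it always removes whatever $/ currently holds.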

The second interesting tidbit is the split assign technique. Our goal is to get each record into a hash with a key as the field name and its value as the field value. You'll see why we go to this trouble later as we develop this example further. The first step is to break $_ into component parts using split( ). The array we get back from split( ) is shown in Table 5.1.

Table 5.1. The Array Returned by split()

Element	Value
`$record[0]`	`name`
`$record[1]`	`shimmer`
`$record[2]`	`address`
`$record[3]`	`192.168.1.11`
`$record[4]`	`aliases`
`$record[5]`	`shim shimmy shimmydoodles`
`$record[6]`	`owner`
`$record[7]`	`David Davis`
`$record[8]`	`department`
`$record[9]`	`software`
`$record[10]`	`building`
`$record[11]`	`main`
`$record[12]`	`room`
`$record[13]`	`909`
`$record[14]`	`manufacturer`
`$record[15]`	`Sun`
`$record[16]`	`model`
`$record[17]`	`Ultra60`

Now take a good look at the contents of the list. Starting at $record[0], we have a key-value pair list (i.e., key=name, value=shimmer, key=address, value=192.168.1.11...) which we can just assign to populate a hash. The newlines that ended each line of the record are gone, since split( ) consumed them as separators. Once this hash is created, we can print the parts we need.
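
The split-and-assign step can be tried on its own. One caveat is worth knowing: because the pattern splits on every colon, a value that itself contains a colon (a hardware address, say) would throw the key-value pairing off. The sketch below shows the one-line technique from the text next to a slightly more defensive variation that splits each line only at its first colon. The subroutine name record_to_hash and the ethernet field are our own illustrative assumptions.

```perl
use strict;
use warnings;

# The technique from the text: one split yields key1,value1,key2,value2...
my $simple = "name: shimmer\naddress: 192.168.1.11\naliases: shim shimmy\n";
my %record = split /:\s*|\n/, $simple;
print "$record{name} is at $record{address}\n";

# record_to_hash (hypothetical helper): split each line at its first colon
# only, so colons inside a value are preserved.
sub record_to_hash {
    my ($text) = @_;
    my %fields;
    for my $line (split /\n/, $text) {
        my ($key, $value) = split /:\s*/, $line, 2;
        next unless defined $key && length $key;
        $fields{$key} = defined $value ? $value : '';
    }
    return %fields;
}

my $tricky = "name: shimmer\nethernet: 08:00:20:ab:cd:ef\n";
my %safer  = record_to_hash($tricky);
print "$safer{name} has hardware address $safer{ethernet}\n";
```

For the database shown in this chapter the one-line split is sufficient; the longer form only earns its keep once field values can contain colons.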

5.1.2. Error Checking the Host File Generation Process

Printing the parts we need is just the beginning of what we can do. One very large benefit of using a separate database that gets converted into another form is the ability to insert error checking into the conversion process. As we mentioned before, this can prevent simple typos from becoming a problem before they get a chance to propagate or be put into production use. Here's the previous code with some simple additions to check for typos:

$datafile ="./database";
$recordsep = "-=-\n";

open(DATA,$datafile) or die "Unable to open datafile:$!\n";

$/=$recordsep; # prepare to read in database file one record at a time

print "#\n\# host file - GENERATED BY $0\n# DO NOT EDIT BY HAND!\n#\n";
while (<DATA>) {
    chomp;  # remove the record separator
    # split into key1,value1,...bingo, hash of record
    %record = split /:\s*|\n/m;

    # check for bad hostnames
    if ($record{name} =~ /[^-.a-zA-Z0-9]/) {
        warn "!!!! $record{name} has illegal host name characters, " .
             "skipping...\n";
        next;
    }

    # check for bad aliases
    if ($record{aliases} =~ /[^-.a-zA-Z0-9\s]/) {
        warn "!!!! $record{name} has illegal alias name characters, " .
             "skipping...\n";
        next;
    }

    # check for missing address
    if (!$record{address}) {
        warn "!!!! $record{name} does not have an IP address, " .
             "skipping...\n";
        next;
    }

    # check for duplicate address
    if (defined $addrs{$record{address}}) {
        warn "!!!! Duplicate IP addr: $record{name} & " .
             "$addrs{$record{address}}, skipping...\n";
        next;
    }
    else {
        $addrs{$record{address}} = $record{name};
    }

    print "$record{address}\t$record{name} $record{aliases}\n";
}
close(DATA);
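
The checks above confirm that an address is present, but not that it is well formed. As one possible addition (ours, not part of the program above), here is a minimal sketch of a dotted-quad sanity check. The subroutine name valid_ipv4 is hypothetical; inside the loop it could be called in the same warn-and-next style as the other checks.

```perl
use strict;
use warnings;

# valid_ipv4 (hypothetical helper): true if the argument is four
# dot-separated decimal numbers, each in the range 0-255.
sub valid_ipv4 {
    my ($address) = @_;
    return 0 unless defined $address;
    my @octets = $address =~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/
        or return 0;
    for my $octet (@octets) {
        return 0 if $octet > 255;
    }
    return 1;
}

# try it on one good address and three typical typos
for my $candidate ('192.168.1.11', '192.168.1.311', '192.168.1', '192.168.l.11') {
    printf "%-15s %s\n", $candidate,
        valid_ipv4($candidate) ? 'looks fine' : 'is malformed';
}
```

This only checks syntax. Whether the address belongs on your network at all is a semantic question that a pattern match cannot answer.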

5.1.3. Improving the Host File Output

Let's borrow from Chapter 9, "Log Files", and add some analysis to the conversion process. We can automatically add useful headers, comments, and separators to the data. Here's an example output using the exact same database:

#
# host file - GENERATED BY createhosts3
# DO NOT EDIT BY HAND!
#
# Converted by David N. Blank-Edelman (dnb) on Sun Jun  7 00:43:24 1998
#
# number of hosts in the design department: 1.
# number of hosts in the software department: 1.
# number of hosts in the IT department: 2.
# total number of hosts: 4
#

# Owned by Cindy Coltrane (IT): west/143
192.168.1.3     bendir ben bendoodles

# Owned by Alex Rollins (IT): main/1101
192.168.1.55    sander sandy micky mickydoo

# Owned by Ellen Monk (design): main/1116
192.168.1.12    sulawesi sula su-lee

# Owned by David Davis (software): main/909
192.168.1.11    shimmer shim shimmy shimmydoodles

Here's the code that produced that output, followed by some commentary:

$datafile ="./database";
$recordsep = "-=-\n";

# get username on either WinNT/2000 or Unix
$user = ($^O eq "MSWin32")? $ENV{USERNAME} :
                            (getpwuid($<))[6]." (".(getpwuid($<))[0].")";

open(DATA,$datafile) or die "Unable to open datafile:$!\n";

$/=$recordsep; # read in database file one record at a time

while (<DATA>) {
    chomp;                           # remove the record separator
    # split into key1,value1
    @record = split /:\s*|\n/m; 

    $record ={};                     # create a reference to empty hash
    %{$record} = @record;            # populate that hash with @record

    # check for bad hostname
    if ($record->{name} =~ /[^-.a-zA-Z0-9]/) {
	    warn "!!!! ".$record->{name} .
             " has illegal host name characters, skipping...\n";
	    next;
    }

    # check for bad aliases
    if ($record->{aliases} =~ /[^-.a-zA-Z0-9\s]/) {
	    warn "!!!! ".$record->{name} .
             " has illegal alias name characters, skipping...\n";
	    next;
    }

    # check for missing address
    if (!$record->{address}) {
	    warn "!!!! ".$record->{name} .
             " does not have an IP address, skipping...\n";
	    next;
    }

    # check for duplicate address
    if (defined $addrs{$record->{address}}) {
	    warn "!!!! Duplicate IP addr:".$record->{name}.
             " & ".$addrs{$record->{address}}.", skipping...\n";
	    next;
    }
    else {
	    $addrs{$record->{address}} = $record->{name};
    }

    $entries{$record->{name}} = $record; # add this to a hash of hashes
}
close(DATA);

# print a nice header
print "#\n\# host file - GENERATED BY $0\n# DO NOT EDIT BY HAND!\n#\n";
print "# Converted by $user on ".scalar(localtime)."\n#\n";

# count the number of entries in each department and then report on it
foreach my $entry (keys %entries){
    $depts{$entries{$entry}->{department}}++;
}
foreach my $dept (keys %depts) {
    print "# number of hosts in the $dept department: $depts{$dept}.\n";
}
print "# total number of hosts: ".scalar(keys %entries)."\n#\n\n";

# iterate through the hosts, printing a nice comment and the entry itself
foreach my $entry (keys %entries) {
    print "# Owned by ",$entries{$entry}->{owner}," (",
          $entries{$entry}->{department},"): ",
          $entries{$entry}->{building},"/",
          $entries{$entry}->{room},"\n";
    print $entries{$entry}->{address},"\t",
          $entries{$entry}->{name}," ",
          $entries{$entry}->{aliases},"\n\n";
}

The most significant difference between this code example and the previous one is the data representation. Because there was no need in the previous example to retain the information from a record after it had been printed, we could use the single hash %record. But for this code, we chose to read the file into a slightly more complex data structure (a hash of hashes) so we could do some simple analysis of the data before printing it.

We could have kept a separate hash table for each field (similar to our needspace example in Chapter 2, "Filesystems"), but the beauty of this approach is its maintainability. If we decide later on to add a serial_number field to the database, we do not need to change our program's parsing code; it will just magically appear as $record->{serial_number}. The downside is that Perl's syntax probably makes our code look more complex than it is.

Here's an easy way to look at it: we're parsing the file in precisely the same way we did in the last example. The difference is this time we are storing each record in a newly-created anonymous hash. Anonymous hashes are just like normal hash variables except they are accessed through a reference, instead of a name.

To create our larger data structure (a hash of hashes), we link this new anonymous hash back into the main hash table %entries. We create a key with an associated value that is the reference to the anonymous hash we've just populated. Once we are done, %entries has a key for each machine's name and a value that is a reference to a hash table containing all of the fields associated with that machine name (IP address, room, etc.).
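
If the reference syntax is new to you, this minimal sketch builds the same kind of structure by hand from two abbreviated records and then walks it. The data is typed in directly rather than parsed, the serial number is a made-up value, and the anonymous hash constructor { } is used as a shorthand for the two-step $record = {}; %{$record} = @record; sequence in the program above.

```perl
use strict;
use warnings;

my %entries;

# each value is a reference to an anonymous hash holding one record
$entries{shimmer} = { name => 'shimmer', address => '192.168.1.11',
                      department => 'software', room => 909 };
$entries{bendir}  = { name => 'bendir',  address => '192.168.1.3',
                      department => 'IT', room => 143 };

# the arrow dereferences the stored reference to reach a single field
print "shimmer lives at $entries{shimmer}->{address}\n";

# a field added to one record later needs no change to the code above
$entries{bendir}->{serial_number} = 'XB-1234';   # made-up value

my %depts;
for my $host (sort keys %entries) {
    $depts{ $entries{$host}->{department} }++;
    my @fields = sort keys %{ $entries{$host} };
    print "$host has fields: @fields\n";
}
```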

Perhaps you'd prefer to see the output sorted by IP address? No problem, just include a custom sort routine by changing:

foreach my $entry (keys %entries) {

to:

foreach my $entry (sort byaddress keys %entries) {

and adding:

sub byaddress {
   # lexical arrays, so we don't clobber any global @a or @b
   my @a = split(/\./,$entries{$a}->{address});
   my @b = split(/\./,$entries{$b}->{address});
   ($a[0]<=>$b[0]) ||
   ($a[1]<=>$b[1]) ||
   ($a[2]<=>$b[2]) ||
   ($a[3]<=>$b[3]);
}

Here's the relevant portion of the output, now nicely sorted:

# Owned by Cindy Coltrane (IT): west/143
192.168.1.3     bendir ben bendoodles

# Owned by David Davis (software): main/909
192.168.1.11    shimmer shim shimmy shimmydoodles

# Owned by Ellen Monk (design): main/1116
192.168.1.12    sulawesi sula su-lee

# Owned by Alex Rollins (IT): main/1101
192.168.1.55    sander sandy micky mickydoo
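
The byaddress routine splits both addresses every time sort compares a pair. For a handful of hosts that cost is invisible, but a common alternative is to convert each address once into a four-byte string whose plain string order matches numeric order. The following is a minimal sketch under that assumption; the subroutine name sort_by_address is our own, and it expects well-formed dotted-quad addresses.

```perl
use strict;
use warnings;

# sort_by_address (hypothetical helper): return the keys of a hash of
# hashes ordered by each entry's {address} field.
sub sort_by_address {
    my ($entries) = @_;
    # pack each dotted quad into four bytes once, up front
    my %packed = map { ( $_ => pack('C4', split(/\./, $entries->{$_}{address})) ) }
                 keys %$entries;
    # byte-wise string comparison of the packed forms matches numeric order
    return sort { $packed{$a} cmp $packed{$b} } keys %$entries;
}

my %entries = (
    shimmer  => { address => '192.168.1.11' },
    bendir   => { address => '192.168.1.3'  },
    sulawesi => { address => '192.168.1.12' },
    sander   => { address => '192.168.1.55' },
);

print join(' ', sort_by_address(\%entries)), "\n";
```

The design choice here is a trade of memory for time: one small extra hash in exchange for doing the split once per host instead of once per comparison.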

Make the output look good to you. Let Perl support your professional and aesthetic endeavors.

5.1.4. Incorporating a Source Code Control System

In a moment we're going to move on to the next approach to the IP Address-to-Name mapping problem. Before we do, we'll want to add another twist to our host file creation process, because a single file suddenly takes on network-wide importance. A mistake in this file will affect an entire network of machines. To give us a safety net, we'll want a way to back out of bad changes, essentially going back in time to a prior configuration state.

The most elegant way to build a time machine like this is to add a source control system to the process. Source control systems are typically used by developers to:

Keep a record of all changes to important files
Prevent multiple people from changing the same file at the same time, inadvertently undoing each other's efforts
Allow them to revert back to a previous version of a file, thus backing out of problems

This functionality is extremely useful to a system administrator. The error-checking code we added to the conversion process earlier, in Section 5.1.2, "Error Checking the Host File Generation Process", can help with certain kinds of typo and syntax errors, but it does not offer any protection against semantic errors (e.g., deleting an important hostname, assigning the wrong IP address to a host, misspelling a hostname). You could add semantic error checks into the conversion process, but you probably won't catch all of the possible errors. As we've quoted before, nothing is foolproof, since fools are so ingenious.

You might think it would be better to apply source control system functionality to the initial database editing process, but there are two good reasons why it is also important to apply it to the resultant output:

Time: For large data sets, the conversion process might take some time. If your network is flaking out and you need to revert to a previous revision, it's discouraging to have to stare at a Perl process chugging away to generate the file you need (presuming you can get at Perl in the first place at that point).
Database: If you choose to use a real database engine for your data storage (and often this is the right choice), there may not be a convenient way to apply a source control mechanism like this. You'll probably have to write your own change control mechanisms for the database editing process.

My source control system of choice is the Revision Control System (RCS). RCS has some Perl- and system administration-friendly features:

It is multiplatform. There are ports of GNU RCS 5.7 to most Unix systems, Windows NT, MacOS, etc.
It has a well-defined command-line interface. All functions can be performed from the command line, even on GUI-heavy operating systems.
It is easy to use. There's a small command set for basic operations that can be learned in five minutes (see Appendix A, "The Five-Minute RCS Tutorial" ).
It has keywords. Magic strings can be embedded in the text of files under RCS that are automatically expanded. For instance, any occurrence of $Date$ in a file will be replaced with the date the file was last entered into the RCS system.
It's free. The source code for the GNU version of RCS is freely redistributable, and binaries for most systems are also available. A copy of the source can be found at ftp://ftp.gnu.org/gnu/rcs.

If you've never dealt with RCS before, please take a moment to read Appendix A, "The Five-Minute RCS Tutorial". The rest of this section assumes a cursory knowledge of the RCS command set.

Craig Freter has written an object-oriented module called Rcs which makes using RCS from Perl easy. The steps are:

Load the module.
Tell the module where your RCS command-line binaries are located.
Create a new Rcs object; configure it with the name of the file you are using.
Call the necessary object methods (named after their corresponding RCS commands).

Let's add this to our host file generation code so you can see how the module works. Besides the Rcs module code, we've also changed things so the output is sent to a specific file and not STDOUT as in our previous versions. Only the code that has changed is shown. Refer to the previous example for the omitted lines represented by "...":

$outputfile="hosts.$$"; # temporary output file
$target="hosts";        # where we want the converted data stored
...
open(OUTPUT,"> $outputfile") or 
  die "Unable to write to $outputfile:$!\n";

print OUTPUT "#\n\# host file - GENERATED BY $0\n".
             "# DO NOT EDIT BY HAND!\n#\n";
print OUTPUT "# Converted by $user on ".scalar(localtime)."\n#\n";

...
foreach my $dept (keys %depts) {
    print OUTPUT "# number of hosts in the $dept department: ".
                 "$depts{$dept}.\n";
}
print OUTPUT "# total number of hosts: ".scalar(keys %entries)."\n#\n\n";
# iterate through the hosts, printing a nice comment and the entry
foreach my $entry (sort byaddress keys %entries) {
    print OUTPUT 
          "# Owned by ",$entries{$entry}->{owner}," (",
          $entries{$entry}->{department},"): ",
          $entries{$entry}->{building},"/",
          $entries{$entry}->{room},"\n";
    print OUTPUT 
          $entries{$entry}->{address},"\t",
          $entries{$entry}->{name}," ",
          $entries{$entry}->{aliases},"\n\n";
}

close(OUTPUT);

use Rcs;

# where our RCS binaries are stored
Rcs->bindir('/usr/local/bin');

# create a new RCS object
my $rcsobj = Rcs->new;

# configure it with the name of our target file
$rcsobj->file($target);

# check it out of RCS (must be checked in already)
$rcsobj->co('-l');

# rename our newly created file into place
rename($outputfile,$target) or
  die "Unable to rename $outputfile to $target:$!\n";

# check it in
$rcsobj->ci("-u","-m"."Converted by $user on ".scalar(localtime));

This code assumes the target file has been checked in at least once already.
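
Since the rename is the point at which the working copy is overwritten, it can be worth a last sanity check on the temporary file before it goes into place. The sketch below is one possible guard, not part of the original program: the subroutine names count_entries and safe_to_install and the 50 percent threshold are all assumptions chosen for illustration. It refuses a new file that is empty or that has lost more than half of the old file's entries.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# count_entries (hypothetical helper): number of lines in a host file
# that are neither blank nor comments.
sub count_entries {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Unable to open $file:$!\n";
    my $count = grep { !/^\s*(#|$)/ } <$fh>;
    close($fh);
    return $count;
}

# safe_to_install (hypothetical helper): false if the new file is empty or
# has lost more than $max_loss (a fraction) of the old file's entries.
sub safe_to_install {
    my ($old, $new, $max_loss) = @_;
    my $new_count = count_entries($new);
    return 0 if $new_count == 0;
    return 1 unless -e $old;           # nothing to compare against yet
    my $old_count = count_entries($old);
    return $new_count >= $old_count * (1 - $max_loss) ? 1 : 0;
}

# Demonstration with two throwaway files standing in for $target
# and $outputfile.
my ($oldfh, $oldfile) = tempfile(UNLINK => 1);
print $oldfh "# header\n192.168.1.3\tbendir\n192.168.1.11\tshimmer\n",
             "192.168.1.12\tsulawesi\n192.168.1.55\tsander\n";
close($oldfh);

my ($newfh, $newfile) = tempfile(UNLINK => 1);
print $newfh "# header\n192.168.1.3\tbendir\n";
close($newfh);

print safe_to_install($oldfile, $newfile, 0.5)
    ? "new file accepted\n"
    : "new file rejected: too many entries lost\n";
```

In the program above, a call along the lines of safe_to_install($target, $outputfile, 0.5) or die "..." would sit just before the check-out step. The right threshold depends on how much churn is normal at your site.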

To see the effect of this code addition, we can look at three entries excerpted from the output of rlog hosts:

revision 1.5
date: 1998/05/19 23:34:16;  author: dnb;  state: Exp;  lines: +1 -1
Converted by David N. Blank-Edelman (dnb) on Tue May 19 19:34:16 1998
----------------------------
revision 1.4
date: 1998/05/19 23:34:05;  author: eviltwin;  state: Exp;  lines: +1 -1
Converted by Divad Knalb-Namlede (eviltwin) on Tue May 19 19:34:05 1998
----------------------------
revision 1.3
date: 1998/05/19 23:33:35;  author: dnb;  state: Exp;  lines: +20 -0
Converted by David N. Blank-Edelman (dnb) on Tue May 19 19:33:16 1998

The previous example doesn't show much of a difference between file versions (see the lines: part of the entries), but you can see that we are tracking the changes every time the file gets created. If we needed to, we could use the rcsdiff command to see exactly what changed. Under dire circumstances, we would be able to revert to previous versions if one of these changes had wreaked unexpected havoc on the network.

