20.3. The HTTP Modules

The HTTP modules implement an interface to the HTTP messaging protocol used in web transactions. The most useful of them are HTTP::Request and HTTP::Response, which create objects for client requests and server responses. Other modules provide means for manipulating headers, interpreting server response codes, managing cookies, converting date formats, and creating basic server applications.

Client applications created with LWP::UserAgent use HTTP::Request objects to create and send requests to servers. The information returned from a server is saved as an HTTP::Response object. Both of these objects are subclasses of HTTP::Message, which provides general methods of creating and modifying HTTP messages. The header information included in HTTP messages can be represented by objects of the HTTP::Headers class.

HTTP::Status includes functions to classify response codes into the categories of informational, successful, redirection, error, client error, or server error. It also exports symbolic aliases of HTTP response codes; one could refer to the status code of 200 as RC_OK and refer to 404 as RC_NOT_FOUND.

The HTTP::Date module converts date strings from and to machine time. The HTTP::Daemon module can be used to create web server applications, utilizing the functionality of the rest of the LWP modules to communicate with clients.

20.3.1. HTTP::Request

This module summarizes a web client's request. For a simple GET request, you define an object with the GET method and assign a URL to apply it to. Basic headers would be filled in automatically by LWP. For a POST or PUT request, you might want to specify a custom HTTP::Headers object for the request, or use the contents of a file for an entity body. Since HTTP::Request inherits everything in HTTP::Message, you can use the header and entity body manipulation methods from HTTP::Message in HTTP::Request objects.

The constructor for HTTP::Request looks like this:

$req = HTTP::Request->new (method, url, [$header, [content]]);

The method and URL values for the request are required parameters. The header and content arguments are not required, nor even necessary for all requests. The parameters are defined as follows:

method
A string specifying the HTTP request method. GET, HEAD, and POST are the most commonly used. Other methods defined in the HTTP specification such as PUT and DELETE are not supported by most servers.

url
The address and resource name of the information you are requesting. This argument may be either a string containing an absolute URL (the hostname is required), or a URI::URL object that stores all the information about the URL.

$header
A reference to an HTTP::Headers object.

content
A scalar that specifies the entity body of the request. If omitted, the entity body is empty.
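As a minimal sketch, here is a POST request built with all four constructor arguments. The URL, header values, and form body are hypothetical placeholders chosen for illustration, and nothing is sent over the network; the object is only constructed and inspected.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Headers;

# Hypothetical endpoint and form data, for illustration only.
my $url     = 'http://www.example.com/cgi-bin/form.cgi';
my $body    = 'name=Fred&lang=perl';
my $headers = HTTP::Headers->new(
    Content_Type   => 'application/x-www-form-urlencoded',
    Content_Length => length($body),
);

# Arguments: method, url, header object, entity body.
my $req = HTTP::Request->new('POST', $url, $headers, $body);

print $req->method, ' ', $req->url, "\n";
print $req->content, "\n";
```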

The following methods can be used on HTTP::Request objects.

as_string

$req->as_string

Returns a text version of the request object as a string with \n placed after each line. Information about the object reference is also included in the first line. The returned string looks like this:

-- HTTP::Request=HASH(0x68148) --
PUT http://www.ora.com/example/hi.text
Content-Length: 2
Content-Type: text/plain

hi
------------------

method

$req->method ([method])

Sets or retrieves the HTTP method for an HTTP::Request object. Without an argument, method returns the object's current method.

url

$req->url ([url])

Sets or retrieves the URL for the request object. Without an argument, this method retrieves the current URL for the object. url is a string containing the new URL to set for the request or a URI::URL object.
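The accessors above can rewrite an existing request in place. The sketch below, using a hypothetical example.com URL, turns a GET into a small PUT and prints it with as_string. The exact layout of as_string output varies between LWP releases, so treat the printed text as illustrative rather than byte-for-byte identical to the listing shown earlier.

```perl
use strict;
use warnings;
use HTTP::Request;

my $req = HTTP::Request->new(GET => 'http://www.example.com/index.html');

# Turn the same object into a PUT of a two-byte text file.
$req->method('PUT');
$req->url('http://www.example.com/example/hi.txt');
$req->header(Content_Type => 'text/plain');
$req->content('hi');

print $req->as_string;
```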

20.3.2. HTTP::Response

Responses from a web server are described by HTTP::Response objects. An HTTP response message contains a status line, headers, and any content data that was requested by the client (such as an HTML file). The status line is the minimum requirement for a response. It contains the version of HTTP that the server is running, a status code indicating the success, failure, or other condition the request received from the server, and a short message describing the status code.

If LWP has problems fulfilling your request, it internally generates an HTTP::Response object and fills in an appropriate response code. In the context of web client programming, you'll usually get an HTTP::Response object from LWP::UserAgent and LWP::RobotUA.

If you plan to write extensions to LWP or to a web server or proxy server, you might use HTTP::Response to generate your own responses.

The constructor for HTTP::Response looks like this:

$resp = HTTP::Response->new (rc, [msg, [header, [content]]]);

In its simplest form, an HTTP::Response object can contain just a response code. If you would like to specify a more detailed message than "OK" or "Not found," you can specify a text description of the response code as the second parameter. As a third parameter, you can pass a reference to an HTTP::Headers object to specify the response headers. Finally, you can also include an entity body in the fourth parameter as a scalar.
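A minimal sketch of the four-argument form, the kind of hand-built response an LWP extension or test harness might produce. The message text and body are arbitrary examples.

```perl
use strict;
use warnings;
use HTTP::Response;
use HTTP::Headers;

# Arguments: response code, message, header object, entity body.
my $headers = HTTP::Headers->new(Content_Type => 'text/plain');
my $resp    = HTTP::Response->new(200, 'all is fine', $headers, 'hi');

print $resp->code, ' ', $resp->message, "\n";   # 200 all is fine
print $resp->content, "\n";                     # hi
```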

For client applications, it is unlikely that you will build your own response object with the constructor for this class. You receive a response object when you call the request method on an LWP::UserAgent object. For example:

use LWP::UserAgent;
use HTTP::Request;

$ua   = LWP::UserAgent->new;
$req  = HTTP::Request->new(GET => $url);
$resp = $ua->request($req);

The server's response is contained in the object $resp. When you have this object, you can use the HTTP::Response methods to get the information about the response. Since HTTP::Response is a subclass of HTTP::Message, you can also use methods from that class on response objects. See Section 20.3.8, "HTTP::Message" for a description of its methods.

The following methods can be used on objects created by HTTP::Response.

as_string

$resp->as_string(  )

Returns a string version of the response with lines separated by \n. For example, this method would return a response string that looks like this:

-- HTTP::Response=HASH(0xc8548) --
RC: 200 (OK)
Message: all is fine

Content-Length: 2
Content-Type: text/plain

hi
------------------

base

$resp->base(  )

Returns the base URL of the response. If the response was hypertext, any links from the hypertext should be relative to the location returned by this method. LWP looks for the BASE tag in HTML and Content-Base/Content-Location HTTP headers for a base specification. If a base was not explicitly defined by the server, LWP uses the requesting URL as the base.

code

$resp->code ([code])

When invoked without any parameters, this method returns the object's response code. When invoked with an argument, it sets the object's status code.

current_age

$resp->current_age(  )

Returns the number of seconds since the response was generated by the original server.

error_as_HTML

$resp->error_as_HTML(  )

When is_error is true, this method returns an HTML explanation of what happened.

freshness_lifetime

$resp->freshness_lifetime(  )

Returns the number of seconds until the response expires. If expiration was not specified by the server, LWP will make an informed guess based on the Last-Modified header of the response.

fresh_until

$resp->fresh_until(  )

Returns the time when the response expires. The time is based on the number of seconds since January 1, 1970, UTC.

is_error

$resp->is_error(  )

Returns true when the response code is 400 through 599. When an error occurs, you might want to use error_as_HTML to generate an HTML explanation of the error.

is_fresh

$resp->is_fresh(  )

Returns true if the response has not yet expired.

is_info

$resp->is_info(  )

Returns true when the response code is 100 through 199.

is_redirect

$resp->is_redirect(  )

Returns true when the response code is 300 through 399.

is_success

$resp->is_success(  )

Returns true when the response code is 200 through 299.

message

$resp->message ([msg])

When invoked without any parameters, message returns the object's status code message (the short string describing the response code). When invoked with a scalar msg argument, this method defines the object's message.

status_line

$resp->status_line(  )

Returns a string with the HTTP status code and message. If the message attribute is unspecified, the official message associated with code is used.
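To see the classification methods side by side, the following sketch builds bare responses for a handful of arbitrarily chosen codes and records which predicates are true for each. It also shows status_line falling back to the standard reason phrase when no message was supplied.

```perl
use strict;
use warnings;
use HTTP::Response;

# Build bare responses for a few codes and classify each one.
my %class;
for my $rc (100, 204, 301, 404, 503) {
    my $resp = HTTP::Response->new($rc);
    my @tags;
    push @tags, 'info'     if $resp->is_info;
    push @tags, 'success'  if $resp->is_success;
    push @tags, 'redirect' if $resp->is_redirect;
    push @tags, 'error'    if $resp->is_error;
    $class{$rc} = join ',', @tags;
    printf "%-28s %s\n", $resp->status_line, $class{$rc};
}

# With no message supplied, status_line uses the standard phrase.
my $missing = HTTP::Response->new(404);
```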

20.3.3. HTTP::Headers

This module deals with HTTP header definition and manipulation. You can use these methods on HTTP::Request and HTTP::Response objects to retrieve headers they contain, or to set new headers and values for new objects you are building.

The constructor for an HTTP::Headers object looks like this:

$h = HTTP::Headers->new([name => val],...);

This code creates a new headers object. You can set headers in the constructor by providing a header name and its value. Multiple name => val pairs can be used to set multiple headers.

The following methods can be used on objects of the HTTP::Headers class. They also work on HTTP::Request and HTTP::Response objects, because HTTP::Message (the base class of both) passes any method it does not define itself on to the message's embedded HTTP::Headers object. In fact, most header manipulation in LWP applications occurs on the request and response objects.

clone

$h->clone(  )

Creates a copy of the current object, $h, and returns a reference to it.

header

$h->header(field [=> $val],...)

When called with just an HTTP header name as a parameter, this method returns the current value for the header. For example, $myobject->header('content-type') would return the value of the object's Content-Type header. To define new header values, invoke header with a list of header => value pairs, in which each value is a scalar or a reference to an array. For example, to define the Content-Type header, you would do this:

$h->header('content-type' => 'text/plain');

init_header

$h->init_header($field, $value)

Sets the specified header to the given value, but only if no value for that field is already set; an existing value is left unchanged. The header field name is not case-sensitive, and _ can be used as a replacement for -. The $value argument may be a scalar or a reference to a list of scalars.

push_header

$h->push_header(field => val)

Adds a new header field and value to the object. Previous values of the field are not removed.

$h->push_header(Accept => 'image/jpeg');

remove_header

$h->remove_header(field,...)

Removes the header specified in the parameter(s) and the header's associated value.

scan

$h->scan($sub)

Invokes the subroutine referenced by $sub for each header field in the object. The subroutine is passed the name of the header and its value as a pair of arguments. For header fields with more than one value, the subroutine will be called once for each value.
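The sketch below exercises each of these methods on one headers object: push_header adds a second Accept value, init_header leaves an existing Content-Type alone but fills in an absent Referer, and scan visits every value. The header values are arbitrary examples.

```perl
use strict;
use warnings;
use HTTP::Headers;

my $h = HTTP::Headers->new(Content_Type => 'text/html', Accept => 'text/html');

$h->push_header(Accept => 'image/jpeg');        # Accept now has two values
$h->init_header(Content_Type => 'text/plain');  # ignored: already set
$h->init_header(Referer => 'http://www.example.com/');
$h->header(X_Debug => 1);                       # _ becomes -, so X-Debug
$h->remove_header('X-Debug');

# scan calls the sub once per value, so Accept is reported twice.
my @seen;
$h->scan(sub { my ($name, $value) = @_; push @seen, "$name: $value" });
print "$_\n" for @seen;
```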

The HTTP::Headers class allows you to use a number of convenience methods on header objects to set (or read) common field values. If you supply a value for an argument, that value will be set for the field. The previous value for the header is always returned. The following methods are available:

date
expires
if_modified_since
if_unmodified_since
last_modified
content_type
content_encoding
content_length
content_language
title
user_agent
server
from
referrer
www_authenticate
proxy_authenticate
authorization
proxy_authorization
authorization_basic
proxy_authorization_basic
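A few of these convenience methods in action, as a sketch. The credentials are hypothetical. Note that date fields such as last_modified take and return seconds since the epoch, and that content_type in scalar context returns only the media type, without parameters such as charset.

```perl
use strict;
use warnings;
use HTTP::Headers;

my $h = HTTP::Headers->new;

# Setters store values in the usual header fields.
$h->content_type('text/html; charset=ISO-8859-1');
$h->content_length(1024);
$h->last_modified(0);                        # seconds since the epoch
$h->authorization_basic('fred', 'secret');   # hypothetical credentials

print $h->as_string;

# Getters read them back; content_type drops the parameters.
my $type = $h->content_type;
```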

20.3.4. HTTP::Status

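This module provides functions to determine the type of a response code. It also exports a list of mnemonics that can be used by the programmer to refer to a status code.

The following functions take a numeric status code as their argument; HTTP::Response provides methods of the same names that test the response object's own code. is_info, is_success, is_redirect, and is_error are exported by default, while is_client_error and is_server_error must be imported explicitly: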

is_info
Returns true when the response code is 100-199.

is_success
Returns true when the response code is 200-299.

is_redirect
Returns true when the response code is 300-399.

is_client_error
Returns true when the response code is 400-499.

is_server_error
Returns true when the response code is 500-599.

is_error
Returns true when the response code is 400-599. When an error occurs, you might want to call the response object's error_as_HTML method to generate an HTML explanation of the error.
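Because these are plain functions, they work on bare numbers with no response object involved. A minimal sketch (the helper name classify is made up for this example):

```perl
use strict;
use warnings;
use HTTP::Status qw(is_info is_success is_redirect is_error
                    is_client_error is_server_error);

# Hypothetical helper: map a numeric code to a category name.
sub classify {
    my $rc = shift;
    return 'info'         if is_info($rc);
    return 'success'      if is_success($rc);
    return 'redirect'     if is_redirect($rc);
    return 'client error' if is_client_error($rc);
    return 'server error' if is_server_error($rc);
    return 'unknown';
}

print "$_: ", classify($_), "\n" for 101, 206, 304, 410, 502;
```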

HTTP::Status exports the following constant functions to use as mnemonic substitutes for status codes. For example, you could do something like:

if ($rc == RC_OK) { ... }

Here are the mnemonics, followed by the status codes they represent:

RC_CONTINUE (100)
RC_SWITCHING_PROTOCOLS (101)
RC_OK (200)
RC_CREATED (201)
RC_ACCEPTED (202)
RC_NON_AUTHORITATIVE_INFORMATION (203)
RC_NO_CONTENT (204)
RC_RESET_CONTENT (205)
RC_PARTIAL_CONTENT (206)
RC_MULTIPLE_CHOICES (300)
RC_MOVED_PERMANENTLY (301)
RC_MOVED_TEMPORARILY (302)
RC_SEE_OTHER (303)
RC_NOT_MODIFIED (304)
RC_USE_PROXY (305)
RC_BAD_REQUEST (400)
RC_UNAUTHORIZED (401)
RC_PAYMENT_REQUIRED (402)
RC_FORBIDDEN (403)
RC_NOT_FOUND (404)
RC_METHOD_NOT_ALLOWED (405)
RC_NOT_ACCEPTABLE (406)
RC_PROXY_AUTHENTICATION_REQUIRED (407)
RC_REQUEST_TIMEOUT (408)
RC_CONFLICT (409)
RC_GONE (410)
RC_LENGTH_REQUIRED (411)
RC_PRECONDITION_FAILED (412)
RC_REQUEST_ENTITY_TOO_LARGE (413)
RC_REQUEST_URI_TOO_LARGE (414)
RC_UNSUPPORTED_MEDIA_TYPE (415)
RC_REQUEST_RANGE_NOT_SATISFIABLE (416)
RC_INTERNAL_SERVER_ERROR (500)
RC_NOT_IMPLEMENTED (501)
RC_BAD_GATEWAY (502)
RC_SERVICE_UNAVAILABLE (503)
RC_GATEWAY_TIMEOUT (504)
RC_HTTP_VERSION_NOT_SUPPORTED (505)
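The constants are ordinary numbers, so compare them with == rather than eq or =. This sketch also uses status_message, another function exported by HTTP::Status that returns the standard reason phrase for a code; it is not otherwise described in this section.

```perl
use strict;
use warnings;
use HTTP::Status;   # exports the RC_* constants and status_message
use HTTP::Response;

my $resp = HTTP::Response->new(RC_NOT_FOUND);

# Numeric comparison against the mnemonic.
if ($resp->code == RC_NOT_FOUND) {
    print "Missing: ", status_message(RC_NOT_FOUND), "\n";
}
```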

20.3.5. HTTP::Date

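The HTTP::Date module is useful when you want to process a date string. By default it exports two functions, str2time and time2str, which convert date strings to and from machine time (seconds since the epoch). The other functions described here (parse_date, time2iso, and time2isoz) must be requested explicitly in the use statement.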

parse_date

parse_date($str)

Parses a date string and returns it as a list of numerical values followed by a time zone specification. If the date is unrecognized, then the empty list is returned.

str2time

str2time(str [, zone])

Converts the time specified as a string in the first parameter into the number of seconds since epoch. This function recognizes a wide variety of formats, including RFC 1123 (standard HTTP), RFC 850, ANSI C asctime, common log file format, Unix ls -l, and Windows dir, among others. When a time zone is not implicit in the first parameter, this function will use an optional time zone specified as the second parameter, such as -0800, +0500, or GMT. If the second parameter is omitted, and the time zone is ambiguous, the local time zone is used.

time2iso

time2iso([$time])

Same as time2str( ), but returns a "YYYY-MM-DD hh:mm:ss"-formatted string representing time in the local time zone.

time2isoz

time2isoz([$time])

Same as time2str( ), but returns a "YYYY-MM-DD hh:mm:ssZ"-formatted string representing Universal Time.

time2str

time2str([time])

Given the number of seconds since machine epoch, this function generates the equivalent time as specified in RFC 1123, which is the recommended time format used in HTTP. When invoked with no parameter, the current time is used.
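A short sketch of the round trip: three spellings of the same instant (RFC 1123, common log file format with an explicit offset, and ANSI C asctime with the zone supplied as str2time's second argument) all parse to the same epoch value, which time2str and time2isoz then format back out. The particular date is an arbitrary example.

```perl
use strict;
use warnings;
use HTTP::Date qw(str2time time2str time2isoz parse_date);

# Three spellings of the same instant.
my @stamps = (
    'Sun, 06 Nov 1994 08:49:37 GMT',   # RFC 1123
    '06/Nov/1994:09:49:37 +0100',      # common log file format, UTC+1
    'Sun Nov  6 08:49:37 1994',        # ANSI C asctime, no zone in string
);

# The '+0000' zone is only used when the string itself has none.
my @secs = map { str2time($_, '+0000') } @stamps;
print "$_\n" for @secs;

print time2str($secs[0]), "\n";    # Sun, 06 Nov 1994 08:49:37 GMT
print time2isoz($secs[0]), "\n";   # 1994-11-06 08:49:37Z

my ($year, $month, $day, $hour, $min, $sec, $tz) = parse_date($stamps[0]);
```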

20.3.6. HTTP::Cookies

HTTP cookies provide a mechanism for preserving information about a client or user across several different visits to a site or page. The "cookie" is a name/value pair sent to the client on its initial visit to a page. This cookie is stored by the client and sent back in the request upon revisit to the same page.

A server initializes a cookie with the Set-Cookie header. Set-Cookie sets the name and value of a cookie, as well as other parameters such as how long the cookie is valid and the range of URLs to which the cookie applies. Each cookie (a single name/value pair) is sent in its own Set-Cookie header, so if more than one cookie is sent to a client, multiple Set-Cookie headers appear in the response. Two cookie-setting headers may be used in server responses: Set-Cookie, defined in the original Netscape cookie specification, and Set-Cookie2, the newer IETF-defined header. Both header styles are supported by HTTP::Cookies. The latest browsers also support both styles.

If a client visits a page for which it has a valid cookie stored, the client sends the cookie in the request with the Cookie header. This header's value contains any name/value pairs that apply to the URL. Multiple cookies are separated by semicolons in the header.

The HTTP::Cookies module is used to retrieve, return, and manage the cookies used by an LWP::UserAgent client application. Setting cookies from an LWP-created server requires only the coding of the proper response headers sent by an HTTP::Daemon server application. HTTP::Cookies is not designed to be used in setting cookies on the server side, although you may find use for it in managing sent cookies.

The new constructor for HTTP::Cookies creates an object called a cookie jar, which represents a collection of saved cookies usually read from a file. Methods on the cookie jar object allow you to add new cookies or send cookie information in a client request to a specific URL. The constructor may take optional parameters, as shown in the following example:

$cjar = HTTP::Cookies->new( file => 'cookies.txt', 
                            autosave => 1,
                            ignore_discard => 0 );

The cookie jar object $cjar created here contains any cookie information stored in the file cookies.txt. The autosave parameter takes a Boolean value that determines if the state of the cookie jar is saved to the file upon destruction of the object. ignore_discard also takes a Boolean value to determine if cookies marked to be discarded are still saved to the file.

Cookies received by a client are added to the cookie jar with the extract_cookies method. This method searches an HTTP::Response object for Set-Cookie and Set-Cookie2 headers and adds them to the cookie jar. Cookies are sent in a client request using the add_cookie_header method. This method takes an HTTP::Request object with the URL component already set, and if the URL matches any entries in the cookie jar, adds the appropriate Cookie headers to the request.
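The extract/add cycle can be sketched offline by faking the server's reply. Everything here is illustrative: the example.com URLs and the sessid cookie are made up, and the jar has no file, so nothing is saved. One detail worth knowing is that extract_cookies uses the request attached to the response to decide which host and path set the cookie, so the sketch attaches one with the response's request method.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;
use HTTP::Response;

# An in-memory jar: no file argument, so nothing is loaded or saved.
my $jar = HTTP::Cookies->new;

# Simulated server reply, with the originating request attached.
my $first = HTTP::Request->new(GET => 'http://www.example.com/login');
my $resp  = HTTP::Response->new(200, 'OK');
$resp->header('Set-Cookie' => 'sessid=abc123; path=/');
$resp->request($first);

$jar->extract_cookies($resp);

# A later request to the same site picks up the stored cookie.
my $next = HTTP::Request->new(GET => 'http://www.example.com/account');
$jar->add_cookie_header($next);
print $next->header('Cookie'), "\n";
```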

These methods can be used on a cookie jar object created by HTTP::Cookies.

add_cookie_header

$cjar->add_cookie_header($request)

Adds appropriate Cookie headers to an HTTP::Request object $request. $request must already be created with a valid URL address. This method will search the cookie jar for any cookies matching the request URL. If the cookies are valid (i.e., have not expired), they are used to create Cookie headers and are added to the request.

as_string

$cjar->as_string([discard])

Returns the current contents of the cookie jar as a string. Each cookie is output as a Set-Cookie3 header line followed by "\n". If discard is given and is true, cookies marked to be discarded will not be output. Set-Cookie3 is a special LWP format used to store cookie information in the save file.

clear

$cjar->clear( [domain, [path, [key] ] ])

Without arguments, this method clears the entire contents of the cookie jar. Given arguments, cookies belonging to a specific domain, path, or with a name, key, will be cleared. The arguments are ordered for increasing specificity. If only one argument is given, all cookies for that domain will be deleted. A second argument specifies a distinct path within the domain. To remove a cookie by keyname, you must use all three arguments.

extract_cookies

$cjar->extract_cookies($response)

Searches an HTTP::Response object $response for any Set-Cookie and Set-Cookie2 headers and stores the cookie information in the cookie jar.

load

$cjar->load( [file] )

Loads cookie information into the cookie jar from the file specified during construction (default) or from the named file. The file must be in the format produced by the save method.

revert

$cjar->revert

Empties the cookie jar and reloads it from the save file, restoring its state as of the last save.

save

$cjar->save( [file] )

Saves the state of the cookie jar to the file specified during construction (by default) or to the named file. The cookies are saved in a special LWP format as Set-Cookie3 header lines. This format is not compatible with the standard Set-Cookie and Set-Cookie2 headers, but you are not likely to use this file to set new cookies in response headers.

set_cookie

$cjar->set_cookie(version, key, val, path, domain, port, path_spec, secure, maxage, discard, \%misc)

Sets a cookie in the cookie jar with the information given in the arguments. The number and order of arguments represent the structure of elements in the Set-Cookie3 header lines used to save the cookies in a file.

version
A string containing the cookie-spec version number.

key
The name of the cookie.

val
The value of the cookie.

path
The pathname of the URL for which the cookie is set.

domain
The domain name for which the cookie is set.

port
The port number of the URL for which the cookie is set.

path_spec
A Boolean value indicating if the cookie is valid for the specific URL path or all the URLs in the domain. The path is used if true; otherwise, the cookie is valid for the entire domain.

secure
A Boolean value indicating that the cookie should only be sent over a secure connection for true, or over any connection for false.

maxage
The number of seconds that the cookie will be valid from the time it was received. Adding the maxage to the current time will yield a value that can be used for an expiration date.

discard
A Boolean value indicating that the cookie should not be sent in any future requests and should be discarded upon saving the cookie jar, unless the ignore_discard parameter was set to true in the constructor.

%misc
The final argument is a reference to a hash, %misc, that contains any additional parameters from the Set-Cookie headers such as Comment and URLComment, in key/value pairs.

scan

$cjar->scan( \&callback )

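Invokes the callback subroutine for each cookie in the cookie jar. The subroutine is called with the same arguments given to the set_cookie method, described above, except that the ninth argument is the cookie's absolute expiration time (in seconds since the epoch) rather than its maxage. Any undefined arguments will be given the value undef.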
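The following sketch puts set_cookie, scan, as_string, and clear together on an in-memory jar. The cookie names, values, and domain are hypothetical. The callback unpacks all eleven arguments to show their order, with the expiration time in ninth position.

```perl
use strict;
use warnings;
use HTTP::Cookies;

my $jar = HTTP::Cookies->new;    # in-memory jar, no save file

# version, key, val, path, domain, port, path_spec, secure, maxage, discard, \%misc
$jar->set_cookie(0, 'lang',  'en',   '/', 'www.example.com', undef, 1, 0, 3600, 0, {});
$jar->set_cookie(0, 'theme', 'dark', '/', 'www.example.com', undef, 1, 0, 3600, 0, {});

# scan passes one cookie per call; the ninth argument is the expiry time.
my %found;
$jar->scan(sub {
    my ($version, $key, $val, $path, $domain, $port,
        $path_spec, $secure, $expires, $discard, $misc) = @_;
    $found{$key} = { val => $val, expires => $expires };
});

print $jar->as_string;

# Remove a single cookie by domain, path, and name.
$jar->clear('www.example.com', '/', 'theme');
my @left;
$jar->scan(sub { push @left, $_[1] });
```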

20.3.6.1. HTTP::Cookies::Netscape

The HTTP::Cookies class contains one subclass that supports Netscape-style cookies within a cookie jar object. Netscape-style cookies were defined in the original cookie specification for Navigator 1.1, which outlined the syntax for the Cookie and Set-Cookie HTTP headers. Netscape cookie headers are different from the newer Set-Cookie2-style cookies in that they don't support as many additional parameters when a cookie is set. The Cookie header also does not use a version-number attribute. Many browsers and servers still use the original Netscape cookies, and the Netscape subclass of HTTP::Cookies can be used to support this style.

The new constructor for this subclass creates a Netscape-compatible cookie jar object like this:

$njar = HTTP::Cookies::Netscape->new(
                  File     => "$ENV{HOME}/.netscape/cookies",
                  AutoSave => 1 );

The methods described above can be used on this object, although many of the parameters used in Set-Cookie2 headers will simply be lost when cookies are saved to the cookie jar.

20.3.7. HTTP::Daemon

The HTTP::Daemon module creates HTTP server applications. The module provides objects based on the IO::Socket::INET class that can listen on a socket for client requests and send server responses. The objects implemented by the module are HTTP 1.1 servers. Client requests are stored as HTTP::Request objects, and all the methods for that class can be used to obtain information about the request. HTTP::Response objects can be used to send information back to the client.

An HTTP::Daemon object is created by using the new constructor. Since the base class for this object is IO::Socket::INET, the parameters used in that class's constructor are the same here. For example:

$d = HTTP::Daemon->new ( LocalAddr => 'maude.oreilly.com',
                         LocalPort => 8888,
                         Listen => 5 );

The HTTP::Daemon object is a server socket that automatically listens for requests on the specified port (or, if no port is given, on a port chosen by the operating system, which the url method reports). When a client request is received, the object uses the accept method to create a connection with the client on the network.

use HTTP::Daemon;

$d = HTTP::Daemon->new or die "Cannot start server: $!";
while ( $c = $d->accept ) {
     while ( $req = $c->get_request ) {
          # Process request and send response here
     }
     $c->close;   # Don't forget to close the socket
     undef($c);
}

The accept method returns a reference to a new object of the HTTP::Daemon::ClientConn class. This class is also based on IO::Socket::INET and is used to extract the request message and send the response and any requested file content.

The sockets created by both HTTP::Daemon and HTTP::Daemon::ClientConn work the same way as those in IO::Socket::INET. The methods are also the same except for some slight variations in usage. The methods for the HTTP::Daemon classes are listed in the sections below and include the adjusted IO::Socket::INET methods. For more detailed information about sockets and the IO::Socket classes and methods, see the "Sockets" chapter.

The following methods can be used on HTTP::Daemon objects.

accept

$d->accept ([pkg])

Accepts a client request on a socket object and creates a connection with the client. This method is the same as IO::Socket->accept, except it will return a reference to a new HTTP::Daemon::ClientConn object. If an argument is given, the connection object will be created in the package named by pkg. If no connection is made before a specified timeout, the method will return undef.

product_tokens

$d->product_tokens

Returns the string that the server uses to identify itself in the Server response header.

url

$d->url

Returns the URL string that gives access to the server root.

20.3.7.1. HTTP::Daemon::ClientConn methods

The following methods can be used on HTTP::Daemon::ClientConn objects.

send_file

$c->send_file (filename)

Copies contents of the file filename to the client as the response. filename can be a string that is interpreted as a filename, or a reference to a glob.
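Putting the pieces together, here is a self-contained sketch of a one-shot server. It assumes a Unix-like system with fork and a usable loopback interface. The child process runs the HTTP::Daemon and answers a single request; the parent plays client over a plain IO::Socket::INET connection. The /hello path is made up for the example, and send_response and send_error are further HTTP::Daemon::ClientConn methods not otherwise described in this section.

```perl
use strict;
use warnings;
use HTTP::Daemon;
use HTTP::Headers;
use HTTP::Response;
use HTTP::Status;
use IO::Socket::INET;

# Loopback only; with no LocalPort the system picks a free port.
my $d = HTTP::Daemon->new(LocalAddr => '127.0.0.1', Listen => 5, Timeout => 10)
    or die "Cannot start daemon: $!";
my $port = $d->sockport;

my $pid = fork;
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: serve exactly one connection, then exit.
    if (my $c = $d->accept) {
        if (my $req = $c->get_request) {
            if ($req->method eq 'GET' && $req->url->path eq '/hello') {
                my $h = HTTP::Headers->new(Content_Type => 'text/plain');
                $c->send_response(HTTP::Response->new(RC_OK, 'OK', $h, "Hello\n"));
            }
            else {
                $c->send_error(RC_NOT_FOUND);
            }
        }
        $c->close;
    }
    exit 0;
}

# Parent: a bare-bones client that reads until the server closes.
my $s = IO::Socket::INET->new(PeerAddr => '127.0.0.1', PeerPort => $port, Timeout => 10)
    or die "Cannot connect: $!";
print $s "GET /hello HTTP/1.0\r\n\r\n";
my $reply = do { local $/; <$s> };
close $s;
waitpid($pid, 0);

print $reply;
```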

20.3.8. HTTP::Message

HTTP::Message is the generic base class for HTTP::Request and HTTP::Response. It provides a couple of methods used on both classes. The constructor for this class is used internally by the Request and Response classes, so you will probably not need to use it. Methods defined by the HTTP::Headers class will also work on Message objects.

add_content

$r->add_content(data)

Appends data to the end of the object's current entity body.

clone

$r->clone(  )

Creates a copy of the current object, $r, and returns a reference to it.

content

$r->content ([content])

Without an argument, content returns the entity body of the object. With a scalar argument, the entity body will be set to content.

content_ref

$r->content_ref(  )

Returns a reference to the string containing the content body. This reference can be used to manage large content data.

headers

$r->headers(  )

Returns the embedded HTTP::Headers object from the message object.

protocol

$r->protocol([string])

Sets or retrieves the HTTP protocol string for the message object. This string looks like HTTP/1.1.
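A minimal sketch of the HTTP::Message methods, applied to a response object since HTTP::Message itself is rarely constructed directly. It shows that content_ref edits the body in place and that clone produces an independent copy.

```perl
use strict;
use warnings;
use HTTP::Response;

my $r = HTTP::Response->new(200, 'OK');
$r->protocol('HTTP/1.1');
$r->content('Hello, ');
$r->add_content('world');

# content_ref gives direct access to the body without copying it.
my $ref = $r->content_ref;
$$ref .= '!';

# headers returns the embedded HTTP::Headers object.
$r->headers->header(Content_Type => 'text/plain');

# The clone is independent: changing it leaves $r untouched.
my $copy = $r->clone;
$copy->content('changed');

print $r->protocol, ' ', $r->content, "\n";   # HTTP/1.1 Hello, world!
```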

Copyright © 2002 O'Reilly & Associates. All rights reserved.