HTML & XHTML: The Definitive Guide

9.2. The <form> Tag

Place a form anywhere inside the body of a document with its elements enclosed by the <form> tag and its respective end tag </form>. You can, and we recommend you often do, include regular body content inside a form to specially label user-input fields and to provide directions.

<form>

Function: Defines a form

Attributes: ACCEPT, ACCEPT-CHARSET, ACTION, CLASS, DIR, ENCTYPE, ID, LANG, METHOD, NAME, ONCLICK, ONDBLCLICK, ONKEYDOWN, ONKEYPRESS, ONKEYUP, ONMOUSEDOWN, ONMOUSEMOVE, ONMOUSEOUT, ONMOUSEOVER, ONMOUSEUP, ONRESET, ONSUBMIT, STYLE, TARGET, TITLE

End tag: </form>; never omitted

Contains: form_content

Used in: block

Browsers flow the special form elements into the containing paragraphs as if they were small images embedded into the text. There aren't any special layout rules for form elements, so you need to use other elements, like tables and style sheets, to control the placement of elements within the text flow.
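As a rough sketch of the table approach, the following form lines up two labeled fields in a two-column table. The field names (name and email) and the button label are assumptions made for illustration; the action URL is the example used throughout this section.

```html
<form method="post" action="http://www.kumquat.com/cgi-bin/update">
  <table>
    <tr>
      <!-- Labels go in the first column, form elements in the second -->
      <td>Name:</td>
      <td><input type="text" name="name" size="30"></td>
    </tr>
    <tr>
      <td>Email:</td>
      <td><input type="text" name="email" size="30"></td>
    </tr>
    <tr>
      <td></td>
      <td><input type="submit" value="Send"></td>
    </tr>
  </table>
</form>
```

Without the table, the browser would simply flow the two input boxes into the text wherever they happened to fall.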

You must define at least two special form attributes, which provide the name of the form's processing server and the method by which the parameters are to be sent to the server. A third, optional attribute lets you change how the parameters get encoded for transmission over the network.

9.2.1. The action Attribute

The required action attribute for the <form> tag gives the URL of the application that is to receive and process the form's data.

Most webmasters keep their forms-processing applications in a special directory on their web server, usually named cgi-bin, which stands for Common Gateway Interface-binaries.[57] Keeping these special forms-processing programs and applications in one directory makes it easier to manage and secure the server.

[57]The Common Gateway Interface (CGI) defines the protocol by which servers interact with programs that process form data.

A typical <form> tag with the action attribute looks like this:

<form action="http://www.kumquat.com/cgi-bin/update">
...
</form>

The example URL tells the browser to contact the web server named www in the kumquat.com domain and pass along the user's form values to the application named update located in the cgi-bin directory.

In general, if you see a URL that references a document in a directory named cgi-bin, you can be pretty sure that the document is actually an application that creates the desired page dynamically each time it's invoked.

9.2.2. The enctype Attribute

The browser specially encodes the form's data before it passes that data to the server so that it does not become scrambled or corrupted during the transmission. It is up to the server either to decode the parameters or to pass them, still encoded, to the application.

The standard encoding format is the Internet Media Type "application/x-www-form-urlencoded." You can change that encoding with the optional enctype attribute in the <form> tag. The only optional encoding formats currently supported are "multipart/form-data" and "text/plain."

The multipart/form-data alternative is required for those forms that contain file-selection fields for upload by the user. The text/plain format should be used in conjunction with a mailto URL in the action attribute for sending forms to an email address instead of a server. Unless your forms need file-selection fields or you must use a mailto URL in the action attribute, you probably should ignore this attribute and simply rely upon the browser and your processing server to use the default encoding type. See Section 9.5.1.3, "File-selection controls."

9.2.2.1. The application/x-www-form-urlencoded encoding

The standard encoding -- application/x-www-form-urlencoded -- converts any spaces in the form values to a plus sign (+), nonalphanumeric characters into a percent sign (%) followed by two hexadecimal digits that are the ASCII code of the character, and the line breaks in multiline form data into %0D%0A.

The standard encoding also includes a name for each field in the form. (A "field" is a discrete element in the form, whose value can be nearly anything from a single number to several lines of text -- the user's address, for example.) Each field is sent as a name=value pair, and the pairs are separated by ampersands. If a field has more than one value, each value is sent as its own name=value pair.

For example, here's what the browser sends to the server after the user fills out a form with two input fields labeled name and address; the former field has just one line of text, while the latter field has several lines of input:

name=O'Reilly+and+Associates&address=101+Morris+Street%0D%0A
Sebastopol,%0D%0ACA+95472

We've broken the data into two lines for clarity in this book, but in reality, the browser sends the data in an unbroken string. The value of the name field is "O'Reilly and Associates" and the value of the address field, complete with embedded newline characters, is:

101 Morris Street
Sebastopol,
CA 95472
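For reference, a form along the following lines could produce that submission. This is only a sketch: the field names match the example, but the labels, sizes, and the choice of the post method are our assumptions. Because no enctype attribute is given, the browser uses the default application/x-www-form-urlencoded encoding.

```html
<form method="post" action="http://www.kumquat.com/cgi-bin/update">
  <!-- A one-line text field named "name" -->
  <p>Name: <input type="text" name="name" size="40"></p>
  <!-- A multiline field named "address"; line breaks typed here
       are sent as %0D%0A -->
  <p>Address:<br>
     <textarea name="address" rows="3" cols="40"></textarea></p>
  <p><input type="submit" value="Send"></p>
</form>
```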

9.2.2.2. The multipart/form-data encoding

The multipart/form-data encoding encapsulates the fields in the form as several parts of a single MIME-compatible compound document. Each field has its own section in the resulting file, set off by a standard delimiter. Within each section, one or more header lines define the name of the field, followed by one or more lines containing the value of the field. Since the value part of each section can contain binary data or otherwise unprintable characters, no character conversion or encoding occurs within the transmitted data.

This encoding format is by nature more verbose and longer than the application/x-www-form-urlencoded format. As such, it can be used only when the method attribute of the <form> tag is set to post, as described in Section 9.2.4, "The method Attribute".

A simple example makes it easy to understand this format. Here's our previous example, when transmitted as multipart/form-data:

------------------------------146931364513459
Content-Disposition: form-data; name="name"
  
O'Reilly and Associates
------------------------------146931364513459
Content-Disposition: form-data; name="address"
  
101 Morris Street
Sebastopol,
CA 95472
------------------------------146931364513459--

The first line of the transmission is the delimiter that appears before each section of the document. In this example, it consists of thirty dashes and a long random number that distinguishes it from other text that might appear in actual field values; the exact format of the delimiter is chosen by the browser.

The next lines contain the header fields for the first section. There will always be a Content-Disposition field indicating the section contains form data and providing the name of the form element whose value is in this section. You may see other header fields; in particular, some file-selection fields include a Content-Type header field that indicates the type of data contained in the file being transmitted.

After the headers, there is a single blank line followed by the actual value of the field on one or more lines. The section concludes with a repeat of the delimiter line that started the transmission. Another section follows immediately, and the pattern repeats until all of the form parameters have been transmitted. The end of the transmission is indicated by an extra two dashes at the end of the last delimiter line.

As we pointed out earlier, use multipart/form-data encoding only when your form contains a file-selection field. Here's an example of how the transmission of a file-selection field might look:

------------------------------146931364513459
Content-Disposition: form-data; name="thefile"; filename="test"
Content-Type: text/plain
  
First line of the file
...
Last line of the file
------------------------------146931364513459--

The only notable difference is that the Content-Disposition field contains an extra element, filename, that defines the name of the file being transmitted. There might also be a Content-Type field to further describe the file's contents.
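A form that could generate a transmission like this one might look as follows. Treat it as a minimal sketch: the field name thefile comes from the example above, while the label and button text are assumptions.

```html
<!-- File uploads need both method="post" and the multipart encoding -->
<form method="post"
      enctype="multipart/form-data"
      action="http://www.kumquat.com/cgi-bin/update">
  <p>File to send: <input type="file" name="thefile"></p>
  <p><input type="submit" value="Upload"></p>
</form>
```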

9.2.2.3. The text/plain encoding

Use this encoding only when you don't have access to a form-processing server and need to send the form information by email (the form's action attribute is a mailto URL). The conventional encodings are designed for computer consumption; text/plain is designed with people in mind.

In this encoding, each element in the form is placed on a single line, with the name and value separated by an equal sign. Returning to our name and address example, the form data would be returned as:

name=O'Reilly and Associates
address=101 Morris Street%0D%0ASebastopol,%0D%0ACA 95472

As you can see, the only characters still encoded in this form are the carriage return and line feed characters in multiline text input areas. Otherwise, the result is easily readable and generally parsable by simple tools.
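Here is a sketch of a form that mails its contents using this encoding. The address forms@kumquat.com is hypothetical, and how well mailto forms work depends on the user's browser and mail setup.

```html
<!-- text/plain is meant for forms mailed to a person, not a program -->
<form method="post"
      enctype="text/plain"
      action="mailto:forms@kumquat.com">
  <p>Name: <input type="text" name="name" size="40"></p>
  <p>Address:<br>
     <textarea name="address" rows="3" cols="40"></textarea></p>
  <p><input type="submit" value="Send by email"></p>
</form>
```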

9.2.3. The accept-charset Attribute

The accept-charset attribute was introduced in the HTML 4.0 standard. It lets you specify a list of character sets that the server must support to properly interpret the form data. The value of this attribute is a quote-enclosed list of one or more ISO character set names. The browser may choose to disregard the form or handle it differently if the acceptable character sets do not match the character set in use by the user. The default value of this attribute is the reserved string "unknown," implying that the form character set is the same as that of the document containing the form.
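As an illustration only, a form whose server is assumed to accept either ISO-8859-1 or UTF-8 input might declare it like this. The two character set names are our choice for the example, and the comment field is hypothetical.

```html
<!-- The list may be separated by spaces, commas, or both -->
<form method="post"
      accept-charset="ISO-8859-1, UTF-8"
      action="http://www.kumquat.com/cgi-bin/update">
  <p>Comment: <input type="text" name="comment" size="40"></p>
  <p><input type="submit" value="Send"></p>
</form>
```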

9.2.4. The method Attribute

The other required attribute for the <form> tag sets the method by which the browser sends the form's data to the server for processing. There are two ways: the POST method and the GET method.

With the POST method, the browser sends the data in two steps: the browser first contacts the form-processing server specified in the action attribute and, once contact is made, sends the data to the server in the body of the request, separate from the URL.

On the server side, POST-style applications are expected to read the parameters from a standard location once they begin execution. Once read, the parameters must be decoded before the application can use the form values. Your particular server will define exactly how your POST-style applications can expect to receive their parameters.

The GET method, on the other hand, contacts the form-processing server and sends the form data in a single transmission step: the browser appends the data to the form's action URL, separated by the question mark character.

The common browsers transmit the form information by either method; some servers receive the form data by only one or the other method. You indicate which of the two methods -- POST or GET -- your forms-processing server handles with the method attribute in the <form> tag. Here's the complete tag including the GET transmission method attribute for the previous form example:

<form method="GET"
      action="http://www.kumquat.com/cgi-bin/update">
  ...
</form>

9.2.4.1. POST or GET?

Which one should you use if your form-processing server supports both the POST and GET methods? Here are some rules of thumb:

9.2.4.2. Passing parameters explicitly

The foregoing bit of advice warrants some explanation. Suppose you had a simple form with two elements named x and y. When the values of these elements are encoded, they look like this:

x=27&y=33

If the form uses method=GET, the URL used to reference the server-side application looks something like this:

http://www.kumquat.com/cgi-bin/update?x=27&y=33

There is nothing to keep you from creating a conventional <a> tag that invokes the form with any parameter value you desire, like so:

<a href="http://www.kumquat.com/cgi-bin/update?x=19&y=104">

The only hitch is that the ampersand that separates the parameters is also the character-entity insertion character. When placed within the href attribute of the <a> tag, the ampersand may cause the browser to treat the characters following it as the name of a character entity and replace them with the corresponding character.

To keep this from happening, you must replace the literal ampersand with its entity equivalent, either &#38; or &amp;. With this substitution, our example of the nonform reference to the server-side application looks like this:

<a href="http://www.kumquat.com/cgi-bin/update?x=19&amp;y=104">

Because of the potential confusion that arises from having to escape the ampersands in the URL, server implementors are encouraged to also accept the semicolon as a parameter separator. You might want to check your server's documentation to see if the server honors this convention. See Appendix F, "Character Entities".
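If your server does honor the semicolon convention, the same link can be written without any escaping. The sketch below assumes such a server; the link text is ours.

```html
<!-- Works only if the server accepts ";" as a parameter separator -->
<a href="http://www.kumquat.com/cgi-bin/update?x=19;y=104">Update with x=19 and y=104</a>
```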

9.2.5. The target Attribute

With the advent of frames, it is possible to redirect the results of a form to another window or frame. Simply add the target attribute to your <form> tag and provide the name of the window or frame to receive the results.

As with the target attribute used in conjunction with the <a> tag, you can use a number of special names with the target attribute in the <form> tag to create a new window or to replace the contents of existing windows and frames. See Section 11.7.1, "The target Attribute for the <a> Tag."
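For example, the following sketch sends the form's results to a new window by using the special target name _blank. The fields x and y are borrowed from the earlier parameter-passing example, and the button label is an assumption.

```html
<!-- _blank is one of the special target names; it opens a new window -->
<form method="get"
      target="_blank"
      action="http://www.kumquat.com/cgi-bin/update">
  <p>x: <input type="text" name="x" size="4">
     y: <input type="text" name="y" size="4">
     <input type="submit" value="Show result in a new window"></p>
</form>
```

To send the results to a frame instead, you would replace _blank with the name you gave that frame.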

9.2.6. The id, name, and title Attributes

The id attribute lets you attach a unique string label to your form for reference by programs (applets) and hyperlinks. Before id was introduced in HTML 4.0, Netscape Navigator used the name attribute to achieve similar effects, although it cannot be used in a hyperlink. To be compatible with the broadest range of browsers, we recommend that for now you include both name and id with <form>, if needed. In the future, you should use only the id attribute for this purpose.

The title attribute defines a quote-enclosed string value to label the form. However, it titles only the form segment; its value cannot be used in an applet reference or hyperlink. See Section 4.1.1.4, "The id attribute," and Section 4.1.1.5, "The title attribute."
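A form that follows this advice might carry all three attributes, as in the sketch below. The label update_form and the title text are hypothetical.

```html
<!-- id and name carry the same label for the widest browser support -->
<form id="update_form"
      name="update_form"
      title="Update your record"
      method="post"
      action="http://www.kumquat.com/cgi-bin/update">
  <p>Name: <input type="text" name="name" size="40"></p>
  <p><input type="submit" value="Send"></p>
</form>
```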

9.2.7. The class, style, lang, and dir Attributes

The style attribute creates an inline style for the elements enclosed by the form, overriding any other style rule in effect. The class attribute lets you format the content according to a predefined class of the <form> tag; its value is the name of that class. See Section 8.1.1, "Inline Styles: The style Attribute," and Section 8.3, "Style Classes."

The actual effects of style with <form> are hard to predict, however. In general, style properties affect the body content -- text, in particular -- that you may include as part of the form's contents, but <form> styles do not affect the display characteristics of the form elements.

For instance, you may create a special font face and background color style for the form. The form's text labels, but not the text inside a text input form element, will appear in the specified font face and background color. Similarly, the text labels you put beside a set of radio buttons will be in the form-specified style, but not the radio buttons themselves.
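The following sketch shows one way to try this yourself. The font, color, and field names are arbitrary choices, and the exact rendering will vary from browser to browser.

```html
<!-- The style applies to the label text; the input box and radio
     buttons typically keep the browser's own appearance -->
<form style="font-family: Helvetica, sans-serif; background-color: #ffffcc"
      method="post"
      action="http://www.kumquat.com/cgi-bin/update">
  <p>Name: <input type="text" name="name" size="30"></p>
  <p><input type="radio" name="size" value="small"> Small
     <input type="radio" name="size" value="large"> Large</p>
  <p><input type="submit" value="Send"></p>
</form>
```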

The lang attribute lets you specify the language used within the form, with its value being any of the ISO standard two-character language abbreviations, including an optional language modifier. For example, adding lang=en-GB tells the browser that the form is in English ("en") as spoken and written in the United Kingdom ("GB"). Presumably, the browser may make layout or typographic decisions based upon your language choice.

Similarly, the dir attribute tells the browser which direction to display the form contents, from left to right (dir=ltr) like English or French, or from right to left (dir=rtl), such as with Hebrew or Arabic.

The dir and lang attributes are supported by the popular browsers, even though there are no behaviors defined for any specific language. See Section 3.6.1.1, "The dir attribute," and Section 3.6.1.2, "The lang attribute."

9.2.8. The Event Attributes

As with most other elements in a document, the <form> tag honors the standard mouse and keyboard event-related attributes that a compliant browser will recognize. We describe the majority of these attributes in detail in Chapter 12, "Executable Content." See Section 12.3.3, "JavaScript Event Handlers."

Forms have two special event-related attributes: onSubmit and onReset. The value of each of these event attributes, enclosed in quotation marks, is one or a sequence of semicolon-separated JavaScript expressions, methods, and function references. With onSubmit, the browser executes these commands before it actually submits the form's data to the server or sends it to an email address.

You may use the onSubmit event for a variety of effects. The most popular is for a client-side form-verification program that scans the form data and prompts the user to complete one or more missing elements. Another popular and much simpler use is to inform users when a mailto URL form is being processed via email.

The onReset attribute is used just like the onSubmit attribute, except that the associated program code is executed only if the user presses a "Reset" button in the form.
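To make this concrete, here is a minimal sketch of a form that uses both attributes. The function name checkForm, the field name, and the messages are all hypothetical. The sketch relies on the JavaScript convention that a handler returning false cancels the submission or the reset.

```html
<script type="text/javascript">
// Hypothetical validator: returns false, which cancels the submission,
// when the field named "name" has been left empty.
function checkForm(form) {
  if (form.elements["name"].value == "") {
    alert("Please fill in your name.");
    return false;
  }
  return true;
}
</script>

<form method="post"
      action="http://www.kumquat.com/cgi-bin/update"
      onsubmit="return checkForm(this);"
      onreset="return confirm('Clear everything you have entered?');">
  <p>Name: <input type="text" name="name" size="30"></p>
  <p><input type="submit" value="Send">
     <input type="reset" value="Reset"></p>
</form>
```

Client-side checks like this one are a convenience for the user. The server-side application should still verify the data it receives, since a browser with JavaScript disabled skips the handlers entirely.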




Copyright © 2002 O'Reilly & Associates. All rights reserved.