HTML & XHTML: The Definitive Guide

3.5. Document Content

Nearly everything else you put into your HTML or XHTML document that isn't a tag is, by definition, content, and most of that content is text. Like tags, document content is encoded using a specific character set; by default, this is the ISO-8859-1 Latin character set. ISO-8859-1 is a superset of conventional ASCII that adds the characters needed to support the Western European languages. If your keyboard does not let you enter the characters you need directly, you can use character entities to insert them.
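
As a quick check of the superset claim, the following Python sketch (Python is simply our choice for illustration; nothing in HTML depends on it) uses the standard codecs to confirm that the first 128 positions of ISO-8859-1 coincide with ASCII, and that a Western European letter such as é exists only in the larger set. The helper name `encodable` is hypothetical.

```python
def encodable(text: str, charset: str) -> bool:
    """Return True if every character in text exists in the named character set."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# Every ASCII character (code points 0-127) is encoded identically in ISO-8859-1.
for code in range(128):
    ch = chr(code)
    assert ch.encode("ascii") == ch.encode("iso-8859-1")

print(encodable("café", "ascii"))       # False: 'é' is not an ASCII character
print(encodable("café", "iso-8859-1"))  # True: Latin-1 places 'é' at position 233
print("é".encode("iso-8859-1"))         # b'\xe9'
```

Under the same assumptions, `encodable("€", "iso-8859-1")` returns False: the euro sign is not part of ISO-8859-1, which is exactly the kind of gap that character entities fill.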

3.5.1. Advice Versus Control

Perhaps the hardest rule to remember when marking up an HTML or XHTML document is that all the tags you insert regarding text display and formatting are only advice for the browser: they do not explicitly control how the browser will display the document. In fact, the browser can choose to ignore all of your tags and do what it pleases with the document content. What's worse, the user (of all people!) has control over the text-display characteristics of his or her own browser.

Get used to this lack of control. The best way to use markup to control the appearance of your documents is to concentrate on the content of the document, not on its final appearance. If you find yourself worrying excessively about spacing, alignment, text breaks, and character positioning, you'll surely end up with ulcers. You will have gone beyond the intent of HTML. If you focus on delivering information to users in an attractive manner, using the tags to advise the browser as to how best to display that information, you are using HTML or XHTML effectively, and your documents will render well on a wide range of browsers.

3.5.2. Character Entities

Besides common text, HTML and XHTML give you a way to display special characters that you might not be able to type into your source document, or that would otherwise be interpreted as markup. A good example is the less-than symbol (<), also called the opening angle bracket. In HTML, it normally signals the start of a tag, so if you insert it simply as part of your text, the browser will get confused and will probably misinterpret your document.

In both HTML and XHTML, the ampersand (&) tells the browser that what follows names a special character, formally known as a character entity. For example, the sequence &lt; inserts that pesky less-than symbol into the rendered text. Similarly, &gt; inserts the greater-than symbol, and &amp; inserts an ampersand. There can be no spaces between the ampersand, the entity name, and the required trailing semicolon. (Semicolons aren't special characters; you don't need an ampersand sequence to display a semicolon in normal text.) See also Section 16.3.7, "Handling Special Characters".
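
When text comes from elsewhere (user input, a database, a program's output), it has to be escaped before it can be placed in a document. Here is a minimal Python sketch of that step; the function name `escape_text` is our own, and the standard library's `html.escape` with `quote=False`, which performs the same three substitutions, is used only as a cross-check. Note that the ampersand must be replaced first; otherwise the & in a freshly inserted &lt; would itself be escaped a second time.

```python
import html

def escape_text(text: str) -> str:
    """Replace the three characters that HTML treats specially in text content."""
    # Escape '&' first so the entities produced below are not escaped again.
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;"))

sample = "if a < b && b > c"
escaped = escape_text(sample)
print(escaped)  # if a &lt; b &amp;&amp; b &gt; c

# html.escape(..., quote=False) performs the same substitutions.
assert escaped == html.escape(sample, quote=False)
# html.unescape turns the entities back into the original characters.
assert html.unescape(escaped) == sample
```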

You may also replace the entity name after the ampersand with a pound symbol (#) and a decimal value corresponding to the character's position in the character set. Hence, the sequence &#60; does the same thing as &lt; and represents the less-than symbol. In fact, you could replace every ordinary character in an HTML document with its numeric equivalent, such as &#65; for a capital "A" or &#97; for its lowercase counterpart, but that would be silly. A complete list of all characters, their names, and their numeric equivalents appears in Appendix F, "Character Entities".
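
The numeric form is easy to generate mechanically. In this Python sketch, the helper names `numeric_ref` and `encode_all` are hypothetical; `ord` returns a character's Unicode code point, which for the first 256 characters is the same as its position in ISO-8859-1. The standard `html.unescape` function plays the role of the browser, decoding named and numeric references to the same character.

```python
import html

def numeric_ref(ch: str) -> str:
    """Return the decimal numeric character reference for one character."""
    return f"&#{ord(ch)};"

def encode_all(text: str) -> str:
    """Replace every character with its numeric reference (legal, but silly)."""
    return "".join(numeric_ref(ch) for ch in text)

print(numeric_ref("<"))  # &#60;
print(encode_all("Aa"))  # &#65;&#97;

# The named and numeric forms decode to the same character.
assert html.unescape("&#60;") == html.unescape("&lt;") == "<"
assert html.unescape(encode_all("Aa")) == "Aa"
```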

Keep in mind that not all special characters can be rendered by all browsers. Some browsers just ignore many of the special characters; with others, the characters aren't available in the character sets on a specific platform. Be sure to test your documents on a range of browsers before electing to use some of the more obscure character entities.

3.5.3. Comments

Comments are another type of textual content that appears in the source HTML document but is not rendered by the user's browser. Comments fall between the special <!-- and --> delimiters, and browsers ignore all text between these two character sequences.

Here's a sample comment:

<!-- This is a comment -->
<!-- This is a 
multiple line comment
that ends on this line -->

There must be a space after the initial <!-- and preceding the final -->, but otherwise you can put nearly anything inside the comment. The biggest exception to this rule is that the HTML standard doesn't let you nest comments.[18]

[18]Netscape does let you nest comments, but the practice is tricky; you cannot always predict how other browsers will react to nested comments.
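
Python's standard `html.parser.HTMLParser` reports comments through a separate `handle_comment` callback, which makes it easy to sketch how a parser sets comment text aside instead of rendering it. The `CommentSplitter` class and `split_comments` function are hypothetical names for illustration. The second call shows why nesting fails: the first --> closes the comment, and whatever follows spills into the rendered text.

```python
from html.parser import HTMLParser

class CommentSplitter(HTMLParser):
    """Collects comment bodies separately from text that would be rendered."""

    def __init__(self) -> None:
        super().__init__()
        self.comments: list[str] = []
        self.text: list[str] = []

    def handle_comment(self, data: str) -> None:
        # data is everything between <!-- and -->
        self.comments.append(data)

    def handle_data(self, data: str) -> None:
        self.text.append(data)

def split_comments(markup: str) -> tuple[str, list[str]]:
    parser = CommentSplitter()
    parser.feed(markup)
    parser.close()
    return "".join(parser.text), parser.comments

text, comments = split_comments("<p>Hello<!-- This is a comment --> world</p>")
print(repr(text))  # 'Hello world'
print(comments)    # [' This is a comment ']

# Attempted nesting: the first --> ends the comment.
text, comments = split_comments("<!-- outer <!-- inner --> tail -->")
print(comments)    # [' outer <!-- inner ']
print(repr(text))  # ' tail -->'
```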

Internet Explorer also lets you place comments within a special <comment> tag. Internet Explorer ignores everything between the <comment> and </comment> tags, but all other browsers will display the comment to the user. Because of this undesirable behavior, we do not recommend using the <comment> tag for comments. Instead, always use the <!-- and --> sequences to delimit comments.

Besides the obvious use of comments for source documentation, many web servers use comments to take advantage of features specific to the document server software. These servers scan the document for specific character sequences within conventional HTML comments and then perform some action based upon the commands embedded in the comments. The action might be as simple as including text from another file (known as a server-side include) or as complex as executing other commands on the server to generate the document contents dynamically.
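
The following Python sketch imitates the simplest case, a server-side include, using directive syntax modeled on Apache's mod_include (`<!--#include file="..." -->`). It is not how any particular server is implemented: the regular expression recognizes only the double-quoted `file=` form, and the `files` dictionary (hypothetical) stands in for the server's file system. Because the directive is an ordinary comment, a browser that received the unprocessed page would simply ignore it.

```python
import re

# Matches a directive such as <!--#include file="footer.html" -->.
# Simplified assumption: only file="..." with double quotes is recognized.
INCLUDE = re.compile(r'<!--#include\s+file="([^"]+)"\s*-->')

def expand_includes(page: str, files: dict[str, str]) -> str:
    """Replace each include directive with the named file's contents.

    Ordinary comments do not match the pattern and are left untouched.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # A real server reports an error for a missing file; this sketch
        # just leaves the directive in place.
        return files.get(name, match.group(0))
    return INCLUDE.sub(substitute, page)

page = '<p>Body text</p>\n<!-- ordinary comment -->\n<!--#include file="footer.html" -->'
print(expand_includes(page, {"footer.html": "<p>Copyright notice</p>"}))
```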




Copyright © 2002 O'Reilly & Associates. All rights reserved.