HTML & XHTML: The Definitive Guide

Chapter 3. Anatomy of an HTML Document

Contents:

Appearances Can Deceive
Structure of an HTML Document
Tags and Attributes
Well-Formed Documents and XHTML
Document Content
HTML Document Elements
The Document Header
The Document Body
Editorial Markup
The <bdo> Tag

Most HTML and XHTML documents are very simple, and writing one shouldn't intimidate even the most timid of computer users. Although you might use a fancy WYSIWYG editor to help you compose it, a document is ultimately stored, distributed, and read by a browser as a simple ASCII text file.[16] That's why even the poorest user with a barebones text editor can compose the most elaborate of web pages. (Accomplished webmasters often elicit the admiration of "newbies" by composing astonishingly cool pages with the crudest text editor on a cheap laptop computer, working in odd places like on a bus or in the bathroom.) Authors should, however, keep several of the popular browsers on hand and alternate among them to view new documents under construction. Remember that browsers differ in how they display a page, that not all browsers implement all of the language standards, and that some have their own special extensions.

[16] Informally, both the text and the markup tags are ASCII characters. Technically, unless you specify otherwise, text and tags are made up of eight-bit characters as defined in the standard ISO-8859-1 Latin character set. The standards do support alternative character encodings, including those for Arabic and Cyrillic. See Appendix F, "Character Entities" for details.
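
One common way to "specify otherwise" from within the document itself is a <meta> declaration in the header. The sketch below assumes a page stored in the Cyrillic encoding ISO-8859-5; the title and paragraph text are placeholders of our own, and a character set announced by the web server in its Content-Type header, if there is one, normally takes precedence over this declaration.

```html
<html>
<head>
<!-- Declares the encoding of this file; ISO-8859-5 is an assumption for this example -->
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-5">
<title>Encoding Example</title>
</head>
<body>
<p>The text in this file is stored using the ISO-8859-5 character set.</p>
</body>
</html>
```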

3.1. Appearances Can Deceive

Documents never look alike when displayed by a text editor and when displayed by a browser. Take a look at any source document from the World Wide Web. At the very least, return characters, tabs, and leading spaces, although important for the readability of the source text, are for the most part ignored by the browser. There is also a lot of extra text in a source document, mostly from the display tags and interactivity markers and their parameters, which affect portions of the document but don't themselves appear in the display.
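
As a small illustration of that difference, consider the following source fragment. The wording and the choice of tags are made up for this sketch. A browser would typically fold the line breaks, tabs, and runs of spaces into single spaces and display one flowing paragraph, with the word "never" emphasized (usually in italics) and none of the tags themselves visible.

```html
<html>
<head>
<title>Whitespace Example</title>
</head>
<body>
<!-- The odd spacing below is deliberate: it matters to the reader of the source, not to the browser -->
<p>
    This    sentence is spread
        across several lines,   with tabs and
    extra spaces that a browser <em>never</em>
    shows.
</p>
</body>
</html>
```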

Accordingly, new authors are confronted with having to develop not only a presentation style for their web pages, but a different style for their source text. The source document's layout should highlight the programming-like markup aspects of HTML and XHTML, not their display aspects. And it should be readable not only by you, the author, but by others as well.

Experienced document writers typically adopt a programming-like style, albeit very relaxed, for their source text. We do the same throughout this book, and that style will become apparent as you compare our source examples with the actual display of the document by a browser.

Our formatting style is simple, but it serves to create readable, easily maintained documents:

The task of maintaining the indentation of your source file ranges from trivial to onerous. Some text editors, like Emacs, manage the indentation automatically; others, like common word processors, couldn't care less about indentation and leave the task completely up to you. If your editor makes your life difficult, you might consider striking a compromise, perhaps by indenting the tags to show structure, but leaving the actual text without indentation to make modifications easier.
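
One possible reading of that compromise is sketched below: the structural tags are indented to show nesting, while the text they contain starts at the left margin so it can be edited and rewrapped freely. The list content is invented for the example, and this layout is only one of several that would serve the same purpose.

```html
<html>
<head>
<title>Indentation Compromise</title>
</head>
<body>
<ul>
  <li>
The first item's text is left unindented,
so rewrapping it in any editor is painless.
  </li>
  <li>
The second item keeps <b>style tags</b> inside the line of text.
  </li>
</ul>
</body>
</html>
```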

No matter what compromises you make or stands you take on source code style, it's important that you adopt one. You'll be very glad you did when you go back to that document you wrote three months ago searching for that really cool trick you did with. . . . Now, where was that?




Copyright © 2002 O'Reilly & Associates. All rights reserved.