HTML & XHTML: The Definitive Guide

1.4. XHTML: What It Is

You've certainly heard of HTML, but did you know that it is just one of many markup languages? Indeed, HTML is the black sheep in the family of document markup languages. HTML is based on SGML, the Standard Generalized Markup Language. The powers-that-be created SGML with the intent that it be the one and only markup metalanguage from which all other document markup languages would be created. Everything from hieroglyphics to HTML can be defined using SGML, negating any need for any other markup language.

The problem with SGML is that it is so broad and all-encompassing that mere mortals cannot use it. Using SGML effectively requires very expensive and complex tools that are well beyond the reach of regular people who just want to bang out an HTML document in their spare time. As a result, HTML and other language standards adhere to some, but not all, SGML standards,[4] eliminating many of the more esoteric features so that HTML is readily usable and used.

[4] The HTML DTD in Appendix D, "The HTML 4.01 DTD", uses a subset of SGML to define the HTML 4.01 standard.

The W3C recognized that SGML is unwieldy and not well-suited to describing the very popular HTML in a useful way, and that there was a growing need to define other HTML-like markup languages to handle different kinds of network documents. In response, it defined the Extensible Markup Language (XML). Like SGML, XML is a separate formal markup metalanguage; it uses select features of SGML to define markup languages. It eliminates many features of SGML that aren't applicable to languages like HTML and simplifies others in order to make them easier to use and understand.

HTML Version 4.01 is not XML-compliant. Hence, the W3C offers XHTML, a reformulation of HTML that complies with XML. XHTML attempts to support every last nit and feature of HTML 4.01 using the more rigid rules of XML. It generally succeeds but has enough differences to make life difficult for the standards-conscious HTML author.

Confused? Don't be. Learning HTML is still the way to go for most authors and Web developers. The native language endures. Besides, by learning HTML, you learn the working bits of XHTML, which are effectively the same things. There are some differences, which we explore in Chapter 16, "XHTML". But the differences should not affect your work in the foreseeable future.
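To get a feel for what "the more rigid rules of XML" means in practice, here is a minimal sketch in Python rather than a definitive test of XHTML conformance. It uses the standard library's xml.etree.ElementTree parser to check only whether a fragment is well-formed XML. The helper name is_well_formed_xml and both sample fragments are our own illustrations, not part of any standard, and well-formedness is just one of the requirements XHTML places on a document.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML.

    Hypothetical helper for illustration: it checks well-formedness
    only, not validity against any XHTML DTD.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML-style fragment: <br> is never closed, which HTML 4.01 permits.
html_style = "<p>First line<br>Second line</p>"

# XHTML-style fragment: the empty element is explicitly closed.
xhtml_style = "<p>First line<br />Second line</p>"

print(is_well_formed_xml(html_style))   # False
print(is_well_formed_xml(xhtml_style))  # True
```

The first fragment is perfectly acceptable HTML, yet an XML parser rejects it because the br element is never closed. The second differs only in how that element is written, and it parses cleanly.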




Copyright © 2002 O'Reilly & Associates. All rights reserved.