
13.5. Expat Encodings

XML documents may be encoded in character sets other than Unicode, as long as those character sets can be mapped into Unicode. Expat imposes further restrictions on the encodings it accepts; the xmlparse.h header file in the Expat distribution describes them in detail.

Expat has built-in support for four encodings: UTF-8, ISO-8859-1, UTF-16, and US-ASCII. The encoding is set either through the encoding attribute of the XML declaration or through the ProtocolEncoding option to XML::Parser or XML::Parser::Expat.
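As a minimal sketch of the ProtocolEncoding option, the following parses a byte string that has no XML declaration and tells the parser to treat it as ISO-8859-1. The sample document and the variable names are made up for illustration; only the ProtocolEncoding and Handlers options and the parse method come from XML::Parser itself.

```perl
use strict;
use warnings;
use XML::Parser;

# Illustrative input: the byte 0xE9 is e-acute in ISO-8859-1.
# There is no XML declaration, so the document itself names no encoding.
my $xml = "<word>caf\xE9</word>";

my $text = '';
my $parser = XML::Parser->new(
    # Tell Expat how to interpret the incoming bytes.
    ProtocolEncoding => 'ISO-8859-1',
    Handlers         => {
        # Collect character data as it is reported by the parser.
        Char => sub {
            my ($expat, $string) = @_;
            $text .= $string;
        },
    },
);

$parser->parse($xml);
```

Without the option, a parser that assumes UTF-8 would be expected to reject the lone 0xE9 byte as malformed, so naming the encoding here is what lets the document parse.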

For encodings other than the built-ins, Expat calls the load_encoding function in the XML::Parser::Expat package, passing it the encoding name. This function lowercases the name, appends a .enc extension, and searches the directories listed in @XML::Parser::Expat::Encoding_Path for a file with that name. It loads the first match it finds.
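The lookup just described can be sketched in plain Perl. This is not Expat's actual code: find_encoding_file is a hypothetical helper written for illustration, and the temporary directory and empty windows-1252.enc file exist only so the example runs on its own.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper that mimics the search: lowercase the encoding
# name, add ".enc", and return the first file found along the path list.
sub find_encoding_file {
    my ($name, @path) = @_;
    my $file = lc($name) . '.enc';
    for my $dir (@path) {
        my $candidate = File::Spec->catfile($dir, $file);
        return $candidate if -e $candidate;    # first match wins
    }
    return;    # no encoding map found
}

# Set up a throwaway directory holding an empty stand-in for a real map.
my $dir = tempdir(CLEANUP => 1);
my $map = File::Spec->catfile($dir, 'windows-1252.enc');
open my $fh, '>', $map or die "cannot create $map: $!";
close $fh;

# The name is matched after lowercasing, and earlier directories are tried first.
my $found = find_encoding_file('WINDOWS-1252', '/nonexistent', $dir);
print defined $found ? "found: $found\n" : "not found\n";
```

In a real program you would not reimplement the search. Adding a directory of your own to @XML::Parser::Expat::Encoding_Path should be enough for the parser to find encoding maps stored there.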

If you wish to build your own encoding maps, check out the XML::Encoding module from CPAN.




Copyright © 2002 O'Reilly & Associates. All rights reserved.