A

abstract
A type of element or complex datatype that cannot be used directly in the instance documents. An abstract element must be substituted and is usually the head of a substitution group. An abstract complex type may be used to define content models, in which case the type will have to be substituted in the instance documents using xsi:type. There is no feature to define simple types as abstract (even though the predefined type xs:NOTATION could be considered abstract).

atom
In a regular expression, an atom expresses a condition on a substring. Atoms may be followed by a quantifier defining the expected number of the atom's occurrences. The atom, with its optional number of occurrences, constitutes a "piece." An atom may be a character, a wildcard, a special character, a character class, or a regular expression.

atomic type
A simple type that is not derived by list or union from another simple type.

attribute
Pieces of information attached to an element and defined in its start tag. In the XPath data model, attributes are attribute nodes that have the element as their parent but are not counted among its children; the DOM treats them as properties of the element, and the XML Infoset treats them as "information items."

attribute groups
Containers that allow you to define, reference, and redefine groups of attributes.

B

base type
The datatype that is used as the starting point to define a new datatype by derivation by restriction or extension.

Berners-Lee, Tim
Inventor of HTML and HTTP, and Director of the W3C; he is considered the father of the World Wide Web.

block
Elements and complex datatypes that cannot be substituted in the instance documents. A blocked element or complex type is restricted in the substitutions that may occur in the instance documents. There is no feature to block simple types.

C

canonical lexical representation
When a value in the value space may have different lexical representations in the lexical space, the W3C XML Schema Recommendation provides (when possible) a canonical representation, which is the most "normal" or "classical" and may be used as a reference. Although most of the types have canonical representations, some, such as xs:duration or xs:QName, do not have one.
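
As an illustration, xs:boolean accepts four lexical representations ("true", "false", "1", and "0") for two values, and its canonical representations are "true" and "false". The following is a minimal sketch in Python rather than a schema processor; the function names are hypothetical.

```python
# Lexical space of xs:boolean mapped to its value space.
BOOLEAN_LEXICAL = {"true": True, "1": True, "false": False, "0": False}

def boolean_value(lexical: str) -> bool:
    """Map a lexical representation to the value it denotes."""
    try:
        return BOOLEAN_LEXICAL[lexical]
    except KeyError:
        raise ValueError(f"not in the lexical space of xs:boolean: {lexical!r}")

def boolean_canonical(value: bool) -> str:
    """Return the canonical lexical representation of a boolean value."""
    return "true" if value else "false"

# "1" and "true" denote the same value, so they share one canonical form.
print(boolean_canonical(boolean_value("1")))
```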

chameleon design
Including a schema without a target namespace in a schema that has one (using xs:include) is known as "chameleon design." This is because the included schema takes the target namespace of the schema in which it is included, like a chameleon takes the color of the environment in which it is placed.

character class
In a regular expression, a character class is an atom matching a set of characters. Character classes may be classical Perl character classes, Unicode character classes, or user-defined character classes.

classical Perl character class
A set of character classes designated by a single letter, for which upper- and lowercases of the same letter are complementary (for instance, "\d" is all the decimal digits, and "\D" is all the characters that are not decimal digits).
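
The complementarity of "\d" and "\D" can be checked with a short sketch. It uses Python's re module as a stand-in, on the assumption that it treats these two classes the same way as W3C XML Schema patterns (any Unicode decimal digit); other parts of the two regular expression dialects differ. The function name is hypothetical.

```python
import re

def split_by_digit_class(text: str):
    """Return the characters matched by \\d and those matched by \\D."""
    digits = re.findall(r"\d", text)
    non_digits = re.findall(r"\D", text)
    return digits, non_digits

digits, non_digits = split_by_digit_class("Order 66, lot 7")
# Every character belongs to exactly one of the two complementary classes.
print(digits, len(digits) + len(non_digits))
```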

complex content
An element has a complex content model when it has child element nodes only (and no text node).

component
Something that can be defined and referenced in a schema. Elements, attributes, simple and complex types, and element and attribute groups are components.

compositor
Containers that allow the manipulation of a set of elements as a whole and define their relative order. Compositors include xs:sequence, xs:choice, and xs:all. Compositors may be included in other compositors to form complex combinations (with some limitations). Most can also be used as particles and have minOccurs and maxOccurs attributes, which allow definition of the number of repetitions expected for the whole group of elements that they define. The child elements of a compositor are "particles." A restriction applies to xs:all as a compositor: it can only include xs:element particles.

Consistent Declaration rule
This states that an element referenced by one "location" in a schema cannot be associated with two different simple or complex types.

content model
A description of the structure of child elements and text nodes (independent of attributes). The content model is "simple" when there is a text node but no elements, "complex" when there are element nodes but no text, "mixed" when there are text and element nodes, and "empty" when there are neither text nor element nodes. These definitions are commonly used by XML developers and are slightly different from those of W3C XML Schema, for which there are only simple and complex content models. (Mixed models are considered special cases of complex contents, and empty models are considered either simple or complex contents with no child nodes.)

D

datatype
A term used by W3C XML Schema to qualify both the content and the structure of an element or attribute. Datatypes can be either simple (when they describe an attribute or an element without an embedded element or attribute) or complex (when they describe elements with embedded child elements or attributes). W3C XML Schema datatypes should not be confused with XML 1.0 element types, which are called element names by W3C XML Schema.

default value
A value that is used when no value is provided in the instance document. Default values apply to attributes that are missing from the instance document and to elements that are present but empty.

derivation
The action of defining a datatype by using the definition of one or several other datatypes. Simple datatypes may be defined by derivation by restriction, list, or union, while complex datatypes can be defined by derivation by restriction or extension.

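derivation by extension
The action of defining a complex datatype by adding elements or attributes to the content model of an existing datatype (called the base type). The result is always a complex datatype, although the base type may be simple.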

derivation by list
The action of using a simple datatype (called the item type) to define a new simple datatype as a whitespace-separated list of values of the item type. Derivation by list applies only to simple datatypes.
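
A value of a list datatype can be checked by splitting it on whitespace and validating each token against the item type. The sketch below assumes an item type close to xs:integer; parse_integer and parse_list are hypothetical helpers, not part of any schema API.

```python
import re

XSD_WHITESPACE = " \t\n\r"

def parse_integer(token: str) -> int:
    """Hypothetical item-type parser: a simplified check for xs:integer."""
    if not re.fullmatch(r"[+-]?[0-9]+", token):
        raise ValueError(f"not an integer: {token!r}")
    return int(token)

def parse_list(lexical: str, item_parser=parse_integer) -> list:
    """Split a list value on whitespace and validate every item."""
    stripped = lexical.strip(XSD_WHITESPACE)
    tokens = re.split(r"[ \t\n\r]+", stripped) if stripped else []
    return [item_parser(token) for token in tokens]

print(parse_list("1 2  3\n4"))
```

One consequence of this design is that an item type whose values may contain whitespace cannot be told apart from several items once it is used in a list.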

derivation by restriction
For simple datatypes, a derivation by restriction is the action of defining a simple datatype by adding new constraints (called facets) on the lexical or value space of an existing datatype (called the base type). For complex datatypes, a derivation by restriction is the action of giving a new content model for the datatype that is a restriction of the base type.
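
For a simple datatype, restriction amounts to validating against the base type first and then applying the added facets. The sketch below mimics a restriction of an integer type with the xs:minInclusive and xs:maxInclusive facets; restrict_integer is a hypothetical helper, and Python's int parsing is only an approximation of the base type's lexical space.

```python
def restrict_integer(min_inclusive: int, max_inclusive: int):
    """Build a validator for an integer type restricted by two facets."""
    def validate(lexical: str) -> int:
        value = int(lexical)  # base type check (approximated by Python's int)
        if not (min_inclusive <= value <= max_inclusive):
            raise ValueError(
                f"{value} is outside [{min_inclusive}, {max_inclusive}]"
            )
        return value
    return validate

# A derived type that accepts only a subset of the base type's values.
percentage = restrict_integer(0, 100)
print(percentage("42"))
```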

derivation by union
The action of using a set of simple datatypes (called the member types) to define a new simple datatype whose lexical space is the union of the lexical spaces of the member types.
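
A value belongs to a union datatype when at least one member type accepts it. The sketch below tries the member types in order, using as an example the union of non-negative integers and the token "unbounded" (the kind of value accepted by maxOccurs). All function names are hypothetical.

```python
import re

def parse_non_negative_integer(lexical: str) -> int:
    """Hypothetical member type: a simplified xs:nonNegativeInteger."""
    if not re.fullmatch(r"\+?[0-9]+", lexical):
        raise ValueError(f"not a non-negative integer: {lexical!r}")
    return int(lexical)

def parse_unbounded(lexical: str) -> str:
    """Hypothetical member type: a token restricted to one value."""
    if lexical != "unbounded":
        raise ValueError(f"not 'unbounded': {lexical!r}")
    return lexical

def parse_union(lexical: str, member_parsers):
    """Return the value from the first member type that accepts the input."""
    for parser in member_parsers:
        try:
            return parser(lexical)
        except ValueError:
            continue
    raise ValueError(f"no member type accepts {lexical!r}")

MEMBERS = [parse_non_negative_integer, parse_unbounded]
print(parse_union("12", MEMBERS), parse_union("unbounded", MEMBERS))
```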

derived datatype
A datatype that is defined by derivation from other datatypes. Derived datatypes are user-defined when defined in a schema, and predefined when defined in the W3C XML Schema Recommendation.

DOM
Document Object Model. An object-oriented model of XML documents, including the definition of the API allowing its manipulation. The third version of DOM (DOM Level 3) will include an API named "Abstract Schemas" to facilitate schema-guided editing of XML documents.

DSDL
Document Schema Definition Language (DSDL) is a project undertaken by the ISO (ISO/IEC JTC 1/SC 34/WG 1, to be precise) whose objective is "to create a framework within which multiple validation tasks of different types can be applied to an XML document in order to achieve more complete validation results than just the application of a single technology." DSDL has classified W3C XML Schema as an "object-oriented schema language."

DTD
Document Type Definition. XML 1.0 DTDs are inherited from SGML, in which rules were included that allow the customization of the markup itself and played a very central role. Because of the syntactical rules included in their DTDs, SGML applications need a DTD to be able to read an SGML document. One of the simplifications of XML is to state that an XML parser should be able to read a document without needing a DTD. DTDs have therefore been simplified over their SGML ancestors and remain the first incarnation of what is today called an XML Schema language.

E

element
One of the basic types of nodes in the tree represented by an XML document. An element is delimited by start and end tags. In the corresponding tree, an element is a nonterminal node, which may have subnodes of type element, character (text), and namespace and attribute, as well as comment and processing instruction nodes.

element type
Term used in the XML 1.0 Recommendation, which is equivalent to the notion of element names in W3C XML Schema and should not be confused with the simple or complex datatype of an element.

element groups
Containers that allow you to define, reference, and redefine groups of elements.

empty content
An element that has neither child element nor text nodes (with or without attributes).

F

facet
A constraint added to the lexical or value space of a simple datatype during a derivation by restriction. The list of facets that can be used depends on the simple datatype. Facets can be "fixed" to disable their use during further derivations.

final
Elements and datatypes that cannot be substituted or derived any longer in the schema. A final element may not be chosen as the head of a substitution group, while a final complex or simple type cannot be used as a base for further derivation.

fixed facets
Facets that are "fixed" during a derivation by restriction cannot be used during further derivations by restriction.

fixed values
A value that must match the value found in the instance document. Fixed values are also used as default values if no value is supplied.

G

global definition
All the components (elements, attributes, simple and complex types, element and attribute groups) can be defined at the top level of the schema, directly under the xs:schema document element. Their definition is said to be "global," and they can be referenced elsewhere in the schema, as well as in any schema that has imported or included this schema.

I

Infoset
XML Information Set. A formal description of the information that may be found in a well-formed XML document.

instance document
An XML document that is a candidate to be validated by a schema. Any well-formed XML 1.0 document that conforms to the Namespaces in XML 1.0 Recommendation can be considered a valid or invalid instance document.

item type
The simple datatype that is used as the starting point to define a new simple datatype using a derivation by list.

L

lexical space
The set of all representations (after parsing and whitespace processing) allowed for a simple datatype.

local definition
Most of the components (elements, attributes, simple and complex types) can be defined inside of other components where they are used. Their definition is said to be "local" and they cannot be referenced in other parts of the schema.

local name
The name of a component in its namespace, i.e., the part of the qualified name that comes after the namespace prefix.

M

member types
The simple datatypes used as the starting point to define a new simple datatype using a derivation by union.

mixed content
The content of an element that contains both child element and text nodes.

N

namespace
A unique identifier that can be associated with a set of XML elements and attributes. This identifier is a URI, which is not required to point to an actual resource but must "belong" to the author of these elements and attributes. Since this full URI can't be included in the name of each element and attribute, a namespace prefix is assigned to the namespace URI through a namespace declaration. This prefix is added to the local name of the elements and attributes to form a qualified name. Namespaces are optional and elements and attributes may have no namespaces attached. W3C XML Schema has extended the scope of namespaces by using them not only for elements and attributes but also for all the components of a schema. A schema identifies the namespace of the components described in a schema as a target namespace. When these components do not have a namespace, the schema is said to have no target namespace.

P

parsed space
The set of values that are sent by the parser to the applications. It is at the interface between the parser and the schema validator. Values from the parsed space undergo whitespace processing, as defined by their simple datatype, to feed the lexical space. The parsed space is, therefore, not visible to the facets.

particle
An element, such as a compositor, a group of elements (xs:group), an element definition or reference (xs:element), or an element wildcard (xs:any), which is included in a compositor to define a list of elements. A restriction applies to xs:all, which cannot be used as a particle even though it is defined as a compositor. The number of occurrences of particles may be constrained using their minOccurs and maxOccurs attributes.

pattern
A facet that allows definition of a regular expression, which will be applied to the lexical space to check its validity. By extension, the regular expression defined in a pattern is often called "pattern" as well.

piece
Regular expressions (or patterns) are composed of pieces. Each piece is itself composed of an atom describing a condition on a substring and an optional quantifier defining the expected number of occurrences of the atom.
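
As an example, the expression "[A-Z]{2}\d+" has two pieces: the atom "[A-Z]" with the quantifier "{2}", and the atom "\d" with the quantifier "+". The sketch below checks values against it with Python's re module, which is assumed to agree with W3C XML Schema for this particular expression. Since a pattern must match the whole lexical value, re.fullmatch is used rather than re.search. The names PATTERN and matches_pattern are hypothetical.

```python
import re

# Two pieces: atom [A-Z] with quantifier {2}, atom \d with quantifier +.
PATTERN = r"[A-Z]{2}\d+"

def matches_pattern(lexical: str) -> bool:
    """Check that the whole lexical value matches the pattern."""
    return re.fullmatch(PATTERN, lexical) is not None

print(matches_pattern("AB123"), matches_pattern("AB"), matches_pattern("xAB123"))
```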

predefined datatype
The simple datatypes (both primitive and derived) that are defined in the W3C XML Schema Recommendation.

primitive datatype
A simple datatype that cannot be defined by derivation from other datatypes. There is no way to create primitive datatypes, so all the primitive datatypes are therefore predefined.

PSVI
Post-Schema-Validation Infoset. The Infoset augmented with the information gathered during schema validation.

Q

qualified element or attribute
Elements and attributes that belong to a namespace; i.e., a namespace URI is defined for them. The name of qualified elements may have no prefix if a default namespace is defined, but the name of qualified attributes must be prefixed.
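
The different treatment of unprefixed elements and unprefixed attributes can be observed with any namespace-aware parser. The sketch below uses Python's xml.etree.ElementTree, which reports qualified names as "{namespace URI}local name"; the document and its namespace URIs are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document with a default namespace and a prefix.
DOC = (
    '<order xmlns="http://example.com/ns" xmlns:x="http://example.com/ext" '
    'id="42" x:priority="high"><item/></order>'
)

root = ET.fromstring(DOC)

# Unprefixed elements take the default namespace: they are qualified.
print(root.tag)
print(root[0].tag)

# The unprefixed attribute "id" stays unqualified; only the prefixed
# attribute "x:priority" belongs to a namespace.
print(sorted(root.attrib))
```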

qualified name
The complete name of a component, including the prefix associated with its target namespace if one is defined.

R

RDBMS
Relational DataBase Management System. Developed in the late 1970s, these systems have taken most of the database market and host a significant amount of the data of many organizations. XML Schema languages may help to ensure the interface between that information and XML documents.

Recommendation
Specifications published by the W3C. They cannot be officially called "standards," since the W3C is a consortium that does not have the status of a standards body, which is reserved for the ISO and national standards bodies. The specifications, which are finalized and approved by the Director, are then called "W3C Recommendations."

reference
All of the components (elements, attributes, simple and complex types, element and attribute groups) that have been created with a global definition can be referenced when needed in the schema in which they are defined, and in any schema that has imported or included this schema. Their definition is used at the location where they are referenced.

regular expression
A syntax to express conditions on strings. The syntax used by the W3C XML Schema for its patterns is very close to the syntax introduced by the Perl programming language. A regular expression is composed of elementary "pieces."

RELAX
A grammar-based XML Schema language developed by Murata Makoto and published in March 2000 as a Japanese ISO Standard.

RELAX NG
A grammar-based XML Schema language resulting from a merger between RELAX and TREX.

S

SAX
Simple API for XML. A streaming event-based API used between parsers and applications. Its streaming nature means that pipelines of XML processing may be created using SAX.

Schematron
A rule-based XML Schema language, developed by Rick Jelliffe, using XPath expressions to describe validation rules.

serialization space
The set of values as they are stored in a document. These values are transformed by the parser, as defined in the Recommendation XML 1.0, before reaching the application. The serialization space is not visible to the schema processors.

SGML
Standard Generalized Markup Language. Created in 1980, SGML is the ancestor of XML. XML was designed as a simplified subset of SGML to be used on the Web.

simple content
An element has a simple content model when it has a child text node only (and no subelements). A simple content element has a simple type if it has no attributes, and it has a complex type if it has any attributes.
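
The four content models used in this glossary (simple, complex, mixed, and empty) can be told apart by looking at the children of an element in an instance document. The sketch below does so with Python's xml.etree.ElementTree. The function content_model is hypothetical, and it assumes that whitespace-only text (typically indentation) does not count as a text node.

```python
import xml.etree.ElementTree as ET

def content_model(element: ET.Element) -> str:
    """Classify the content of one element as found in an instance."""
    has_children = len(element) > 0
    # Text may appear before the first child and after each child.
    texts = [element.text] + [child.tail for child in element]
    has_text = any(t and t.strip() for t in texts)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "complex"
    if has_text:
        return "simple"
    return "empty"

for doc in ("<a>text</a>", "<a><b/></a>", "<a>text<b/>more</a>", '<a x="1"/>'):
    print(doc, content_model(ET.fromstring(doc)))
```

Attributes play no part in the classification, which is why the last element is reported as empty.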

simple datatype
A datatype that accepts only a text value. Simple datatypes can be directly assigned to attributes and simple content elements that do not accept any attribute. Simple datatypes can be used to define complex datatypes by extension.

SOAP
The major XML protocol used by Web Services; relies on W3C XML Schema to describe the messages exchanged.

space
W3C XML Schema uses the term "space" to mean a set of values (lexical versus value spaces). For completeness, we introduced two additional spaces in this book (the serialization and parsed spaces).

special character
A character that may be used as an atom after a "\" to accept a specific character, either for convenience or because this character is interpreted differently in the context of a regular expression.

substitution group
A feature of W3C XML Schema, allowing you to define groups of elements that may be used interchangeably in instance documents. They are not declared as element groups, but through the substitutionGroup attribute of xs:element global definitions.

T

target namespace
The namespace of the components described in a schema. When these components do not have a namespace, the schema is said to have no target namespace.

TREX
A grammar-based XML Schema language developed by James Clark.

U

Unicode block
A set of characters classified by their "localization" (Latin, Arabic, Hebrew, Tibetan, and even Gothic or musical symbols).

Unicode category
A set of characters classified by their usage (letters, uppercase, digit, punctuation, etc.).

Unicode character class
A set of character classes defined based on the Unicode blocks and categories.

unqualified element or attribute
Elements and attributes that don't belong to a namespace; i.e., no namespace URI is defined for them. Any unprefixed attribute is unqualified, but unprefixed elements are unqualified only if no default namespace is defined.

UPA rule
The UPA (Unique Particle Attribution) rule states that at any given moment, a W3C XML Schema processor must know—without ambiguity and without needing any forward reference in the document—which particle in the schema describes an element in the instance document. This rule is roughly equivalent to the restrictions known as "non-deterministic content models" for the XML 1.0 DTDs and as "ambiguous content models" by SGML. The UPA rule is often associated with the "Consistent Declaration rule."

URI
Uniform Resource Identifier. Defined by RFCs 2396 and 2732. URIs were created to extend the notion of URLs (Uniform Resource Locators) to include abstract identifiers that do not necessarily need to "locate" a resource.

URL
Uniform Resource Locator, a common identifier used on the Web. URLs are absolute when the full path to the resource is indicated, and relative when a partial path is given that needs to be evaluated in relation to a base URL.
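
Evaluating a relative URL against a base URL can be sketched with urljoin from Python's standard urllib.parse module. The URLs are made up for the example; they stand for a schema document and the locations it might reference.

```python
from urllib.parse import urljoin

# Hypothetical base URL: the location of the referencing document.
BASE = "http://example.com/schemas/main.xsd"

# A relative URL is evaluated in relation to the base URL.
print(urljoin(BASE, "types/common.xsd"))
print(urljoin(BASE, "../shared/core.xsd"))

# An absolute URL is returned unchanged, whatever the base.
print(urljoin(BASE, "http://example.org/other.xsd"))
```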

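user-defined character class
A character class defined in the regular expression itself as a bracketed set of characters or character ranges (for instance, "[a-z]" is all the lowercase letters from "a" to "z").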

user-defined datatype
Datatypes that are defined in a schema. All the datatypes can be defined by derivation or, for the complex datatypes only, by definition.

V

valid document
An XML document that is well-formed and conforms to a schema (DTD, W3C XML Schema, etc.) of some kind.

value space
The set of all the possible values for a simple datatype, independent of their actual representation in the instance documents.

W

W3C
World Wide Web Consortium. Originally created to settle HTML and HTTP as de facto standards. The main specification body for the core specifications of the World Wide Web and the keeper of the core XML specifications.

Web Services
An approach to using the Web for applications, as opposed to the Web for human consumption that we use on a daily basis. Those services rely on the same infrastructure as the Web, and exchange XML documents over HTTP through a layer of protocols (such as SOAP or XML-RPC), which are themselves based on XML. XML Schema languages are used by these services to describe and control the XML documents that are exchanged.

well-formed document
An XML document that meets the conditions defined in the XML 1.0 Recommendation: it must be readable without ambiguity. Syntax errors will be detected by an XML parser without a schema of any type.

whitespace
Characters #x9 (tab), #xA (linefeed), #xD (carriage return), and #x20 (space). These are often used to indent the XML documents to give them a more readable aspect, and are filtered by an operation named "whitespace processing."

whitespace collapsing
The action of applying the whitespace replacement, trimming the leading and trailing spaces, and replacing all the sequences of contiguous whitespaces by a single space between the parsed and lexical spaces. Most of the simple datatypes apply whitespace collapsing.

whitespace preservation
The action of preserving all the whitespaces from the parsed to the lexical space. The xs:string datatype and the user-defined simple types derived from xs:string (which do not change the value of the xs:whiteSpace facet) are the only datatypes applying whitespace preservation.

whitespace processing
The operation of filtering that is done on the whitespaces present in the value of a simple datatype. The whitespace processing is done during the transformation between parsed and lexical spaces. W3C XML Schema defines three whitespace processing approaches (depending on the simple type): whitespace preservation, whitespace replacement, and whitespace collapsing.

whitespace replacement
The action of replacing all the occurrences of the characters #x9 (tab), #xA (linefeed), and #xD (carriage return) by a #x20 (space) between the parsed and the lexical space. Whitespace replacement doesn't change the length of the string. xs:normalizedString and the user-defined simple types derived from xs:string and xs:normalizedString (for which the value of the xs:whiteSpace facet is "replace") are the only datatypes that apply whitespace replacement.
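
The three whitespace processing approaches can be sketched as three small Python functions applied to a value from the parsed space. The function names are hypothetical; the behavior follows the definitions of whitespace preservation, replacement, and collapsing given in this glossary.

```python
import re

def ws_preserve(parsed: str) -> str:
    """Whitespace preservation: the value is left untouched."""
    return parsed

def ws_replace(parsed: str) -> str:
    """Whitespace replacement: tab, linefeed, and carriage return become spaces."""
    return re.sub(r"[\t\n\r]", " ", parsed)

def ws_collapse(parsed: str) -> str:
    """Whitespace collapsing: replace, merge runs of spaces, then trim."""
    return re.sub(r" +", " ", ws_replace(parsed)).strip(" ")

parsed = "\t hello\n  world \r"
print(repr(ws_preserve(parsed)))
print(repr(ws_replace(parsed)))   # same length as the parsed value
print(repr(ws_collapse(parsed)))
```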

wildcard
A character used as an atom in a regular expression to accept a set of characters. W3C XML Schema supports only one such wildcard: the character ".", which means "any character." The term is also used to designate the xs:any and xs:anyAttribute particles.

X

Xerces
The XML parser developed by the XML Apache project.

XInclude
A W3C specification defining a general purpose inclusion mechanism for XML documents.

XLink
XML Linking Language is a W3C Recommendation "which allows elements to be inserted into XML documents in order to create and describe links between resources."

XML
Extensible Markup Language. A subset of SGML created to be used on the Web. Its core specification (XML 1.0) was published by the W3C in February 1998. New specifications have been added since this date, and the W3C considers that, with the addition of W3C XML Schema, the core specifications are now complete.

XML-RPC
Considered the ancestor of SOAP, XML-RPC is a simple XML protocol that may be used to implement Web Services. It does not rely on the W3C XML Schema to describe the content of its messages but has defined a simpler binding mechanism.

XPath
A query language used to identify a set of nodes within an XML document. Originally defined to be used with XSLT, it is also used by XPointer, and a simple subset is used in the xs:key, xs:keyref, and xs:unique W3C XML Schema elements. The XQuery specification will be a superset of the second version of XPath. This version will use type information provided by W3C XML Schema.

XQuery
XML Query language. This will be a superset of XPath 2.0 that will use type information provided by the W3C XML Schema to optimize its queries, and for features such as sort orders.

XSLT
Extensible Stylesheet Language Transformations. A programming language specialized for the transformation of XML documents.

XSV
An open source W3C XML Schema implementation.

Copyright © 2002 O'Reilly & Associates. All rights reserved.