03. Comparing schema types

Rules written as traditional DTD schemas function perfectly well, but the newer XML Schema language provides several improvements that allow greater control with more flexibility:

The final point in this comparison describes the most marked difference between the two schema types and allows XML Schema much more control over element content than DTD.

Nominating an XSD for XML

The declaration to nominate an XSD schema that is to be used with an XML document is made in the root element's start tag. This is unlike the method that nominates a DTD schema using a <!DOCTYPE> declaration.

An attribute called xmlns:xsi must first be added to the root element's start tag - "xmlns" means "XML NameSpace" and "xsi" means "XML Schema Instance". This attribute is then assigned the URL "http://www.w3.org/2001/XMLSchema-instance", which makes components available to specify the location of the schema for that XML document.

Many XML documents also include something called a "target namespace" in the schema declaration, but those that don't can indicate the location of their schema by assigning its URL to a special attribute called "xsi:noNamespaceSchemaLocation".

The URL of the XSD schema document can be stated as an absolute address using its full path, such as the fictitious address "http://auxy.com/fruit.xsd", if the schema is in a remote location. Alternatively, the URL can be stated as a relative address, such as "fruit.xsd", if the schema is in a local system directory.

<?xml version="1.0" encoding="UTF-8" ?>
<fruit
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="fruit.xsd" >
<apple>Golden Delicious </apple>
</fruit>
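
For comparison, a document whose schema does declare a target namespace would normally use the xsi:schemaLocation attribute instead, which pairs the namespace name with the schema location. The listing below is only a sketch: the namespace "http://auxy.com/fruit" is a made-up value, and it assumes that fruit.xsd declares the same value as its targetNamespace and declares its elements globally.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<!-- Assumption: fruit.xsd declares targetNamespace="http://auxy.com/fruit". -->
<fruit
xmlns="http://auxy.com/fruit"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://auxy.com/fruit fruit.xsd" >
<apple>Golden Delicious</apple>
</fruit>
```

The attribute value is a pair separated by whitespace: the namespace name first, then the URL of the schema that describes it.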

Creating an XSD schema

As all XML Schema Definition (XSD) documents are written in XML, they should each begin with the standard XML declaration.

<?xml version = "1.0" encoding = "UTF-8" ?>

The root element of XSD schemas is called xsd:schema and must contain an xmlns:xsd attribute that specifies the schema for that XSD document - sometimes called the "schema of schemas". The standard root element of an XSD schema looks like this:

<xsd:schema
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
</xsd:schema>

Element declaration

An XSD schema declares each element within the XML document as an <xsd:element> element in the schema. This has a name attribute, to specify the XML element's tag name, and a type attribute, to specify the type of content it may contain. For instance, "simple type" elements might allow the xsd:string type for text content, whereas "complex type" elements that allow nested elements will specify the name of a structure definition.
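
As a rough illustration, the declarations below use several of the built-in simple types alongside one complex type. The element names and the "orderType" structure name are hypothetical, chosen only for this sketch, and the fragment would need to sit inside an <xsd:schema> root element that also defines orderType.

```xml
<!-- Simple type elements using built-in types. -->
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="quantity" type="xsd:integer"/>
<xsd:element name="price" type="xsd:decimal"/>
<xsd:element name="picked" type="xsd:date"/>
<!-- Complex type element naming a structure definition. -->
<xsd:element name="order" type="orderType"/>
```

With declarations like these, a validator would reject <quantity>six</quantity> because the content is not a valid xsd:integer value.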

Structure definition

An XSD schema defines element structures using an <xsd:complexType> element. This has a name attribute that is assigned a descriptive name by which it can be referenced. If the structure it is describing is a sequence of XML elements it will contain an <xsd:sequence> element. This, in turn, will contain one or more <xsd:element> elements that each have a ref attribute, which references the tag name specified in the element declaration.

<?xml version="1.0" encoding="UTF-8" ?>
<doc
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="hello.xsd" >
<msg>Hello World!</msg>
<!-- Uncomment the next line to disobey schema rules. -->
<!-- <msg>Bad content</msg> -->
</doc>

<?xml version="1.0" encoding = "UTF-8" ?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" >
<!-- DECLARE ELEMENTS. -->
<xsd:element name="doc" type="docType"/>
<xsd:element name="msg" type="xsd:string"/>
<!-- DEFINE STRUCTURE. -->
<xsd:complexType name="docType">
<xsd:sequence>
<xsd:element ref="msg"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>

Specifying element sequence

The order in which nested elements must appear is determined by the element structure defined in the XSD schema - each XML element should appear in the same order as the corresponding <xsd:element> elements within the <xsd:sequence> element.

<?xml version="1.0" encoding="UTF-8" ?>
<doc
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="memo.xsd" >
<memo>
<title>End of Month</title>
<from>Admin</from>
<to>All Users</to>
<msg>Passwords will be reset tomorrow.</msg>
</memo>
</doc>

<?xml version="1.0" encoding="UTF-8" ?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" >
<!-- DECLARE ELEMENTS. -->
<!-- Simple type elements. -->
<xsd:element name="title" type="xsd:string"/>
<xsd:element name="from" type="xsd:string"/>
<xsd:element name="to" type="xsd:string"/>
<xsd:element name="msg" type="xsd:string"/>
<!-- Complex type elements. -->
<xsd:element name="doc" type="docType"/>
<xsd:element name="memo" type="memoType"/>
<!-- DEFINE STRUCTURES. -->
<xsd:complexType name="docType">
<xsd:sequence>
<xsd:element ref="memo" />
</xsd:sequence>
</xsd:complexType>

<xsd:complexType name="memoType">
<xsd:sequence>
<xsd:element ref="title" />
<xsd:element ref="from" />
<xsd:element ref="to" />
<xsd:element ref="msg" />
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
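
To see the sequence rule at work, consider the variation of the <memo> element sketched below, which is not part of the original listings. It contains the same four child elements, but a validator using memo.xsd should reject it because the order differs from the sequence in memoType.

```xml
<memo>
<!-- Invalid: <from> appears before <title>. -->
<from>Admin</from>
<title>End of Month</title>
<to>All Users</to>
<msg>Passwords will be reset tomorrow.</msg>
</memo>
```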

Controlling element occurrence

Child elements that are specified in an XSD schema complex type structure definition are normally allowed to occur exactly once within the parent element. The number of allowable occurrences can be changed by adding minOccurs and maxOccurs attributes. These can specify how many times an element may occur within its parent element, and also how many times a sequence may occur. Each limit can be specified numerically, or an unlimited number of occurrences can be allowed by specifying maxOccurs as "unbounded".

The XML document listed below requires the schema to allow one or more occurrences of a sequence of <title>, <forename>, and <surname> elements, but also to allow the <title> to be optional.

<?xml version="1.0" encoding="UTF-8" ?>
<contacts
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="contacts.xsd" >
<title>Mr</title>
<forename>John</forename>
<!-- Uncomment the next line to break the schema rules. -->
<!-- <forename>William</forename> -->
<surname>Smith</surname>
<forename>Sally</forename>
<surname>James</surname>
</contacts>

<?xml version="1.0" encoding="UTF-8" ?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" >
<!-- DECLARE ELEMENTS. -->
<!-- Simple type elements. -->
<xsd:element name="title" type="xsd:string"/>
<xsd:element name="forename" type="xsd:string" />
<xsd:element name="surname" type="xsd:string"/>
<!-- Complex type elements. -->
<xsd:element name="contacts" type="contactsType"/>
<!-- DEFINE STRUCTURE. -->
<xsd:complexType name="contactsType">
<!-- Let the sequence occur indefinitely, but at least once. -->
<xsd:sequence minOccurs="1" maxOccurs="unbounded" >
<!-- Let the title be optional, but not occur more than once. -->
<xsd:element ref="title" minOccurs="0" maxOccurs="1"/>

<xsd:element ref="forename" />
<xsd:element ref="surname" />
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
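
The limits can also be plain numbers. The variation of contactsType below is a sketch rather than part of the original schema: it assumes each contact may have between one and three forenames, a rule invented here for illustration.

```xml
<xsd:complexType name="contactsType">
<xsd:sequence minOccurs="1" maxOccurs="unbounded">
<xsd:element ref="title" minOccurs="0" maxOccurs="1"/>
<!-- Allow between one and three forenames per contact. -->
<xsd:element ref="forename" minOccurs="1" maxOccurs="3"/>
<xsd:element ref="surname"/>
</xsd:sequence>
</xsd:complexType>
```

Under this variation the second <forename> element that is commented out in the XML document above would no longer break the schema rules.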

Allowing alternative elements

An element may sometimes be required to allow a choice of child element - so allowable alternatives may be specified within an <xsd:choice> element in an XSD schema structure definition. This contains <xsd:element> elements that reference the declarations of those XML elements that are allowed.

In the XML document listed below the root <doc> element contains a single sequence of <desc> and <image> elements. Each <image> element must contain one of two child elements - either one <src> element or one <alt> element.

<?xml version="1.0" encoding="UTF-8" ?>
<doc
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="image.xsd" >
<desc>The new Dodge Challenger was designed around its muscle car ancestor, but with a twist of modern technology.</desc>
<image>
<src>front-quarter.jpg</src>
<!-- <alt>Exterior shot of the new Dodge Challenger</alt> -->
</image>
<image>
<!-- <src>front-interior.jpg</src> -->
<alt>Inside the leather high-back seats have a sunken ribbed look, just like the seats which came in the 1970 Challenger.</alt>
</image>
</doc>

<?xml version="1.0" encoding="UTF-8" ?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<!-- DECLARE ELEMENTS. -->
<!-- Simple types. -->
<xsd:element name="desc" type="xsd:string"/>
<xsd:element name="src" type="xsd:string"/>
<xsd:element name="alt" type="xsd:string"/>
<!-- Complex types. -->
<xsd:element name="doc" type="docType"/>
<xsd:element name="image" type="imageType" />
<!-- DEFINE STRUCTURES. -->
<xsd:complexType name="docType">
<xsd:sequence>
<xsd:element ref="desc" />
<xsd:element ref="image" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="imageType">
<xsd:choice>
<xsd:element ref="src" />
<xsd:element ref="alt" />
</xsd:choice>
</xsd:complexType>
</xsd:schema>
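
The minOccurs and maxOccurs attributes can be added to an <xsd:choice> element too. The sketch below is a hypothetical variation of imageType, not the rule used in image.xsd, in which the choice itself may repeat.

```xml
<xsd:complexType name="imageType">
<!-- Any mix of <src> and <alt> children, in any order, including none. -->
<xsd:choice minOccurs="0" maxOccurs="unbounded">
<xsd:element ref="src"/>
<xsd:element ref="alt"/>
</xsd:choice>
</xsd:complexType>
```

Each repetition of the choice selects one alternative, so an <image> element could then contain both a <src> and an <alt> element.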

Permitting element attributes

A schema can allow an XML element to contain an attribute using an <xsd:attribute> schema element. Unlike schema element declarations, which are made "globally", attributes can be defined "locally" within an element structure definition.

An XML element with no nested elements can be defined as a complex type using an <xsd:simpleContent> element. This contains an <xsd:extension> element describing the allowable content, for instance text as <xsd:extension base="xsd:string">.

The <xsd:extension> can also contain an <xsd:attribute> element that adds an allowable XML attribute, stating its name and type. For instance, <xsd:attribute name="id" type="xsd:string" />.

The XML document below includes an optional id attribute within most <album> elements to identify the year of release.

<?xml version="1.0" encoding="UTF-8" ?>
<discography
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="album.xsd" >
<artist>Pink</artist>
<album id="2000">Can't Take Me Home</album>
<album id="2001">Missundaztood</album>
<album id="2003">Try This</album>
<album id="2006">I'm Not Dead</album>
<album>(...in production)</album>
</discography>

<?xml version="1.0" encoding ="UTF-8" ?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<!-- DECLARE ELEMENTS. -->
<!-- Simple type. -->
<xsd:element name="artist" type="xsd:string"/>
<!-- Complex types. -->
<xsd:element name="discography" type="discoType"/>
<xsd:element name="album" type="albumType"/>
<!-- DEFINE STRUCTURES. -->
<!-- Sequence structure. -->
<xsd:complexType name="discoType">
<xsd:sequence>
<xsd:element ref="artist"/>
<xsd:element ref="album" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
<!-- Structure of the <album> element. -->
<xsd:complexType name="albumType">
<xsd:simpleContent>
<xsd:extension base="xsd:string">
<xsd:attribute name="id" type="xsd:string"/>
</xsd:extension>
</xsd:simpleContent>
</xsd:complexType>
</xsd:schema>
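
Because attribute types are drawn from the same built-in types as element content, a schema can be stricter than xsd:string. The sketch below assumes a hypothetical "year" attribute in place of id and gives it the built-in xsd:gYear type. It illustrates the idea and is not a change the original schema makes.

```xml
<xsd:complexType name="albumType">
<xsd:simpleContent>
<xsd:extension base="xsd:string">
<!-- Hypothetical "year" attribute that must hold a Gregorian year value. -->
<xsd:attribute name="year" type="xsd:gYear"/>
</xsd:extension>
</xsd:simpleContent>
</xsd:complexType>
```

With this definition <album year="2001"> would be accepted, whereas <album year="next year"> would be rejected.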

Requiring attribute values

An XSD schema can ensure that an attribute must be included in an element by including a use attribute within an <xsd:attribute> element and assigning it a value of "required". Additionally, the attribute can be defined as providing a unique element identity by defining its allowable content type as "xsd:ID". Attribute values are not required to be unique unless the schema defines the allowable content type as xsd:ID.

Specifying that an attribute type should be xsd:ID and that its use is required is useful for attributes that must always be present and hold unique values, such as product identification codes.

The XML document listed below requires a unique id attribute to be included within every <cactus> element.

<?xml version="1.0" encoding="UTF-8"?>
<doc
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="cactus.xsd" >
<cactus id="AZ1">
<name>Arizona Barrel</name>
</cactus>
<cactus id="AZ2">
<name>Arizona Beehive</name>
</cactus>
<cactus id="AZ3">
<name>Arizona Fishhook</name>
</cactus>
<cactus id="AZ4">
<name>Arizona Hedgehog</name>
</cactus>
<cactus id="AZ5">
<name>Arizona Pincushion</name>
</cactus>
</doc>

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<!-- DECLARE ELEMENTS. -->
<!-- Simple type. -->
<xsd:element name="name" type="xsd:string"/>
<!-- Complex types. -->
<xsd:element name="doc" type="docType"/>
<xsd:element name="cactus" type="cactusType"/>
<!-- DEFINE STRUCTURES. -->
<!-- Root element sequence structure. -->
<xsd:complexType name="docType">
<xsd:sequence>
<xsd:element ref="cactus" maxOccurs="100"/>
</xsd:sequence>
</xsd:complexType>
<!-- Nested <cactus> element sequence structure. -->
<xsd:complexType name="cactusType">
<xsd:sequence>
<xsd:element ref="name"/>
</xsd:sequence>
<xsd:attribute name="id" type="xsd:ID" use="required"/>
</xsd:complexType>
</xsd:schema>
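
The <cactus> elements sketched below show three ways a document could fail these rules. They are illustrative fragments that would each sit inside the <doc> root element, and the exact error messages will depend on the validator used.

```xml
<!-- Invalid: the required id attribute is missing. -->
<cactus>
<name>Arizona Barrel</name>
</cactus>
<!-- Invalid: the id value "AZ2" is used twice in the same document. -->
<cactus id="AZ2">
<name>Arizona Beehive</name>
</cactus>
<cactus id="AZ2">
<name>Arizona Fishhook</name>
</cactus>
<!-- Invalid: an xsd:ID value may not begin with a digit. -->
<cactus id="1AZ">
<name>Arizona Hedgehog</name>
</cactus>
```

The last case arises because xsd:ID values follow XML name rules, so they must start with a letter or an underscore.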

Adding comments & entities

The <!-- and --> comment tags can be used to add comments within XSD documents - just as they can with XML and HTML. Additionally, XSD provides an <xsd:annotation> element, which can contain <xsd:appinfo> and <xsd:documentation> elements to describe the schema.

XML Schema does not support entities but, where absolutely necessary, they can be defined in an XML document by DTD <!ENTITY> declarations within a <!DOCTYPE> declaration.

The XML document below has four different entity references to lengthy strings that are all markup language names.

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE doc [
<!-- Define entity values. -->
<!-- Common markup language acronyms. -->
<!ENTITY html "HyperText Markup Language (HTML)" >
<!ENTITY xml "eXtensible Markup Language (XML)" >
<!ENTITY sgml "Standard Generalized Markup Language (SGML)" >
<!ENTITY gml "Generalized Markup Language (GML)" >
]>
<doc
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="history.xsd" >
<para>Both &html; and &xml; are derived from &sgml; which is, in turn, a descendant of the &gml; that was developed in the 1960s by IBM. </para>
</doc>

<?xml version="1.0" encoding="UTF-8" ?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<!-- Schema information. -->
<xsd:annotation>
<xsd:appinfo>Entity workaround</xsd:appinfo>
<xsd:documentation>info.doc</xsd:documentation>
</xsd:annotation>
<!-- DECLARE ELEMENTS. -->
<!-- Simple type. -->
<xsd:element name="para" type="xsd:string"/>
<!-- Complex type. -->
<xsd:element name="doc" type="docType"/>
<!-- DEFINE STRUCTURE. -->
<xsd:complexType name="docType">
<xsd:sequence>
<xsd:element ref="para" />
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
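
An <xsd:annotation> element can also be placed inside an individual declaration, so that the description sits next to the item it describes. The sketch below shows one way the <para> declaration might be documented. The wording of the description and the xml:lang value are assumptions made for illustration.

```xml
<xsd:element name="para" type="xsd:string">
<xsd:annotation>
<xsd:documentation xml:lang="en">A single paragraph of text.</xsd:documentation>
</xsd:annotation>
</xsd:element>
```

The annotation has no effect on validation and is there only for human readers and tools.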