Chapter 2: Complexity of XML Schemas

In May 2001, the W3C (finally) published a new way of defining XML documents, called XML Schemas, that was more flexible and powerful than the classical DTDs of the day. XML Schemas overcame many of the limitations of DTDs, allowing for much more reusability and scalability. A set of "native" built-in simple datatypes was introduced, user-defined complex types are allowed, and, borrowing a page from object-oriented programming, types can be derived from other types, inheriting the structures and definitions of their base types.

Yes, the W3C XML Schema language is a much more powerful way to define an XML document, but with greater power comes greater complexity. While DTDs are far more primitive in what they can define, for the most part they are not too difficult to understand at a glance. A simple, basic XML Schema is not too difficult to understand either; however, once you start using the more powerful features of Schemas, the definition quickly loses its legibility.

Power and Flexibility of XML Schemas

XML Schemas added quite a bit of power and flexibility to how you can define your XML document. Previously, documents could only be defined through DTDs, which allowed only primitive definitions. They let you define nodes, their names, and any children they could have, but that was about it. All nodes were declared at the 'top' level, so there was no nesting of node definitions, and thus each node name could have only one definition. Nothing but PCDATA was allowed as element content (i.e. one could not, for example, restrict a node to contain only a number). And to top it off, DTDs were written in a different syntax than XML, so a developer had to learn two languages in order to work effectively with XML.

When XML Schemas were introduced, they were received with great enthusiasm. Because schemas are themselves written in XML, developers did not have to learn another syntax in order to define their documents. Being XML, schema definitions can easily be nested. Nodes are no longer limited to character data; they can use predefined simple types such as string, integer and dateTime. Nodes can be declared at a global level, and a local declaration elsewhere can reuse the same name with a different definition. Also, being XML, schemas have a hierarchical format that can somewhat mimic the structure of the XML document being defined.
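A minimal sketch of the global/local distinction described above. The element names Customer, Contact, First and Last are hypothetical and do not come from the original example. The global Name is a plain string, while the Name declared locally inside Contact has its own nested structure:

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Global declaration: Name is a plain string wherever it is referenced -->
  <xsd:element name="Name" type="xsd:string"/>

  <xsd:element name="Customer">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Reuse the global declaration by reference -->
        <xsd:element ref="Name"/>
        <xsd:element name="Contact">
          <xsd:complexType>
            <xsd:sequence>
              <!-- Local declaration: same name, different (structured) definition -->
              <xsd:element name="Name">
                <xsd:complexType>
                  <xsd:sequence>
                    <xsd:element name="First" type="xsd:string"/>
                    <xsd:element name="Last" type="xsd:string"/>
                  </xsd:sequence>
                </xsd:complexType>
              </xsd:element>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Under this sketch, a Customer would contain `<Name>Jordan Smith</Name>` directly, but `<Name><First>Jordan</First><Last>Smith</Last></Name>` inside its Contact. A DTD cannot express this, because an element name there has exactly one declaration.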

XML Schemas vs. DTD

DTDs can define XML documents in the following ways:
  • Constrain allowable elements and attributes
  • Limited control over element occurrence
  • Choice of elements in a sequence
  • All elements globally declared
XML Schemas allow all of the above, and in addition can:
  • Support built-in datatypes (string, int, etc.)
  • Provide greater context support (local declarations)
  • Control occurrence more precisely
  • Supply default values for elements, not just attributes
  • Nest element declarations
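As a rough illustration of built-in datatypes and element default values, here is a hypothetical variant of the Address element; Country and MovedInOn are invented names, not part of the original example:

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Address">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="StreetAddress" type="xsd:string" maxOccurs="3"/>
        <xsd:element name="City" type="xsd:string"/>
        <!-- Element default: if Country is present but empty,
             a validating processor treats its value as "USA" -->
        <xsd:element name="Country" type="xsd:string" default="USA"/>
        <!-- Built-in datatype: content must be a date such as 2001-05-02 -->
        <xsd:element name="MovedInOn" type="xsd:date" minOccurs="0"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Note that an element default applies only when the element appears with empty content; if Country is omitted entirely, no value is supplied.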

AddressType example

The best way to show how XML Schemas improved upon DTDs is by example. We will use the common example of an Address to show these differences. The XML snippet shown below is a sample of the Address element that we are trying to define.


<Address>
  <StreetAddress>72 S. Main St.</StreetAddress>
  <StreetAddress>Apt #2</StreetAddress>
  <City>JordansVille</City>
  <State>MW</State>
  <Zip>02300</Zip>
</Address>

A classical DTD would define the above XML as follows:


<!ELEMENT Address (StreetAddress*, City, State, Zip)>
<!ELEMENT StreetAddress (#PCDATA)>
<!ELEMENT City (#PCDATA)>
<!ELEMENT State (#PCDATA)>
<!ELEMENT Zip (#PCDATA)>

A typical XML Schema could define the address like this:


<element name="Address">
  <complexType>
    <sequence>
      <element name="StreetAddress" type="string"
               minOccurs="1" maxOccurs="3"/>
      <element name="City" type="string"/>
      <element name="State" type="string"/>
      <element name="Zip" type="string"/>
    </sequence>
  </complexType>
</element>

Already we can define more precisely how our XML document should look. Not only do we name the nodes, we also specify their types, their positions in the sequence, and even how many times each can appear. Note also the difference in where the definitions appear. In the DTD, they are all at the top level of the document; in the Schema, the layout closely resembles the XML document itself (i.e. the StreetAddress element declaration is nested inside the Address element declaration, through a couple of XSD nodes, of course).

Even this simplistic example reveals another major difference between the two schema languages. In a DTD, the number of times an element can occur is very limited: exactly once, zero or one (?), zero or more (*), or one or more (+). With XML Schemas, you can specify a precise number of occurrences with the minOccurs and maxOccurs attributes; above, StreetAddress may appear from one to three times.
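To see why this matters, here is a sketch (not from the original) of how a DTD could approximate minOccurs="1" maxOccurs="3" for StreetAddress: each optional repetition is spelled out by hand, nested so that every StreetAddress can match only one position in the content model (XML 1.0 treats ambiguous content models as an error for SGML compatibility). It is written as an internal DTD subset so the sample Address document sits in the same file:

```xml
<?xml version="1.0"?>
<!-- One to three StreetAddress elements, emulated in a DTD:
     the first is required, the second and third are optional. -->
<!DOCTYPE Address [
  <!ELEMENT Address (StreetAddress, (StreetAddress, StreetAddress?)?, City, State, Zip)>
  <!ELEMENT StreetAddress (#PCDATA)>
  <!ELEMENT City (#PCDATA)>
  <!ELEMENT State (#PCDATA)>
  <!ELEMENT Zip (#PCDATA)>
]>
<Address>
  <StreetAddress>72 S. Main St.</StreetAddress>
  <StreetAddress>Apt #2</StreetAddress>
  <City>JordansVille</City>
  <State>MW</State>
  <Zip>02300</Zip>
</Address>
```

The workaround is tolerable for a limit of three, but every additional allowed occurrence needs another hand-written particle, which is exactly what maxOccurs avoids.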

Power begets confusion

The above example is XSD in its simplest, easiest-to-understand state. If that were all we planned to use Schemas for, there would be very little reason to upgrade from DTDs. The real power behind Schemas lies in their ability to make definitions quite extensive and precise. However, this power comes at a great price.

Powerful XML Schema definitions

In addition to what was listed above, XML Schemas can do the following:
  • Derivation of complex and simple types
  • Substitution groups for complex schemas
  • Greater detail of restrictions on simple types
  • Built-in support for documentation
  • Namespace support
  • Reference external schemas
  • Etc.
The more precise you make your schema, the less legible it becomes.
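Substitution groups are a good example of power that costs legibility. In this hypothetical sketch (PhoneNumber, HomePhone, WorkPhone and Contact are illustrative names, not from the original), any member of the PhoneNumber group may appear wherever PhoneNumber is referenced, yet nothing in Contact's declaration mentions HomePhone or WorkPhone:

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Abstract head element: it never appears itself, only its substitutes do -->
  <xsd:element name="PhoneNumber" type="xsd:string" abstract="true"/>

  <!-- Group members may appear wherever PhoneNumber is referenced -->
  <xsd:element name="HomePhone" type="xsd:string" substitutionGroup="PhoneNumber"/>
  <xsd:element name="WorkPhone" type="xsd:string" substitutionGroup="PhoneNumber"/>

  <xsd:element name="Contact">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="PhoneNumber" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

A document such as `<Contact><HomePhone>555-0100</HomePhone><WorkPhone>555-0199</WorkPhone></Contact>` would be accepted, but a reader has to search the whole schema to discover which element names are legal inside Contact.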

Flexible yet complex AddressType

Let's take the Address example above and extend it further, defining a "global" address that could be used for several different countries.


<?xml version="1.0" ?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="MainAddress" type="Address"/>

  <xsd:complexType name="Address">
    <xsd:sequence>
      <xsd:element name="Name" type="xsd:string"/>
      <xsd:element name="Street" type="xsd:string"
                   minOccurs="1" maxOccurs="3"/>
      <xsd:element name="City" type="xsd:string"/>
      <xsd:choice>
        <xsd:sequence>
          <xsd:element name="Province" type="xsd:string">
            <xsd:annotation>
              <xsd:documentation>
                Oh Canada!
              </xsd:documentation>
            </xsd:annotation>
          </xsd:element>
          <xsd:element name="PostalCode" type="CAN_PostalCode"/>
        </xsd:sequence>
        <xsd:sequence>
          <xsd:element name="County" type="xsd:string">
            <xsd:annotation>
              <xsd:documentation>
                Address for Great Britain
              </xsd:documentation>
            </xsd:annotation>
          </xsd:element>
          <xsd:element name="Postcode" type="GBR_Postcode"/>
        </xsd:sequence>
        <xsd:sequence>
          <xsd:element name="State">
            <xsd:annotation>
              <xsd:documentation>
                United States address
              </xsd:documentation>
            </xsd:annotation>
            <xsd:simpleType>
              <xsd:restriction base="xsd:string">
                <xsd:minLength value="2"/>
                <xsd:maxLength value="2"/>
              </xsd:restriction>
            </xsd:simpleType>
          </xsd:element>
          <xsd:element name="ZIP" type="USPS_ZIP"/>
        </xsd:sequence>
      </xsd:choice>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="CAN_Address">
    <xsd:complexContent>
      <xsd:extension base="Address">
        <xsd:sequence>
          <xsd:element name="Province" type="xsd:string"/>
          <xsd:element name="PostalCode" type="CAN_PostalCode"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>

  <xsd:simpleType name="CAN_PostalCode">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[A-Z]{1}[0-9]{1}[A-Z]{1} [0-9]{1}[A-Z]{1}[0-9]{1}"/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:simpleType name="GBR_Postcode">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="(([A-Z]{2}[0-9]{2})|([A-Z]{2}[0-9][A-Z])|([A-Z][0-9]{2})) ([0-9][A-Z]{2})"/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:simpleType name="USPS_ZIP">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="01000"/>
      <xsd:maxInclusive value="99999"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>

Believe it or not, this defines a relatively simple XML document. The complexity begins with the introduction of choice nodes and documentation, and this is only the beginning: we could make it even more complex by adding more countries, or by adding enumerations for states, etc.
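For comparison, here is a sketch of a document the schema above is meant to accept, reusing the data from the earlier sample. The Name value is invented, and address.xsd in xsi:noNamespaceSchemaLocation is a hypothetical file name; that attribute is only a hint telling a validator where to find the schema:

```xml
<?xml version="1.0"?>
<!-- United States branch of the choice: State followed by ZIP -->
<MainAddress xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:noNamespaceSchemaLocation="address.xsd">
  <Name>Jordan Smith</Name>
  <Street>72 S. Main St.</Street>
  <Street>Apt #2</Street>
  <City>JordansVille</City>
  <State>MW</State>
  <!-- ZIP is typed as xsd:integer, so the leading zero is lexically
       allowed but not preserved in the value (2300) -->
  <ZIP>02300</ZIP>
</MainAddress>
```

A Canadian address would instead follow City with Province and PostalCode, for example `<PostalCode>K1A 0B1</PostalCode>`, which matches the CAN_PostalCode pattern.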

The more complex a schema becomes, the more difficult it becomes to understand.
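The enumeration of states mentioned above shows how quickly this grows. Here is a sketch of a hypothetical USPS_State type; only a handful of codes are listed, and a complete version would need one xsd:enumeration line per state or territory:

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: restricts State to an explicit list of codes
       instead of "any two characters" -->
  <xsd:simpleType name="USPS_State">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="CA"/>
      <xsd:enumeration value="MA"/>
      <xsd:enumeration value="NY"/>
      <xsd:enumeration value="TX"/>
      <!-- remaining codes would follow here -->
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```

The State element in the Address type would then use type="USPS_State" in place of its anonymous two-character restriction. Note that the sample value MW would be rejected, since it is not in the list.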

Necessity to remove complexity

As you can see from the AddressType example above, this particular schema is no longer easy to understand at a glance. A lot of power comes from reusability and delegation, but confusion is born from it as well. The need to clarify schemas becomes more obvious the more complex our schemas become.
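The reusability and delegation mentioned here are easy to see in miniature. In this sketch (Order, ShippingAddress and BillingAddress are hypothetical names, and Address is a simplified stand-in for the type above), both elements delegate to the single named Address type, so one change to Address affects both, and a reader must jump to the type definition to learn what either contains:

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Order">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Both elements delegate their structure to the named type -->
        <xsd:element name="ShippingAddress" type="Address"/>
        <xsd:element name="BillingAddress" type="Address"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <!-- Simplified stand-in for the Address type defined earlier -->
  <xsd:complexType name="Address">
    <xsd:sequence>
      <xsd:element name="Street" type="xsd:string" maxOccurs="3"/>
      <xsd:element name="City" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```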

Paradox of defining XML Documents

As I define my XML documents, I tend to find myself first writing an example document before hopping into DTD or Schema creation. I find it much more natural to think about what I want the final document to look like before I create its definition. This retro-creation doesn't work well with current XML Schema editors, which offer no aid in showing the final output of the Schema as you create it. It's a 'hit-and-miss' tactic, particularly for complex schemas: you write the schema, validate a document against it, and, if validation fails, go back and edit the schema. Much like the archaic, first-generation-language methodology of development, this does not appeal to anyone familiar with the modern visual development tools of our day.
