Introducing XML part 2  

The parts of an XML Document

Declaration

This line declares that what follows is an XML document and specifies the version, which is almost always 1.0. Optionally it also gives the encoding (character set) and states whether the document stands alone or depends on external markup declarations such as a DTD. Example:

<?xml version="1.0" encoding="utf-8" ?>
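
As a rough illustration of how a parser sees the declaration, the following sketch uses the expat parser in Python's standard library. The function name read_declaration and the pcwdoc root element are assumptions made for the example only.

```python
import xml.parsers.expat


def read_declaration(data: bytes) -> dict:
    """Return the version, encoding and standalone flag from the XML declaration."""
    found = {}

    def on_declaration(version, encoding, standalone):
        # expat reports standalone as 1 (yes), 0 (no) or -1 (clause omitted)
        found["version"] = version
        found["encoding"] = encoding
        found["standalone"] = standalone

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(data, True)
    return found


# pcwdoc is a placeholder root element for this sketch
info = read_declaration(b'<?xml version="1.0" encoding="utf-8" ?><pcwdoc/>')
print(info)
```

Because the example declaration has no standalone clause, expat reports the flag as -1 rather than yes or no.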

Another type of declaration indicates the document type, either by reference to an external DTD or by including it inline:

<!DOCTYPE pcwdoc SYSTEM "http://itwriting.com/itwriting.dtd" >
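
A parser can report the document type declaration without fetching the DTD itself. The sketch below, again using Python's expat module, pulls out the root element name and the system identifier. The helper name read_doctype is hypothetical, and the empty pcwdoc element is added only so that the input is a complete document.

```python
import xml.parsers.expat


def read_doctype(data: bytes) -> dict:
    """Return the name and identifiers given in the DOCTYPE declaration."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found["name"] = name
        found["system_id"] = system_id
        found["public_id"] = public_id  # None when only SYSTEM is used

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    # expat does not download the external DTD unless asked to
    parser.Parse(data, True)
    return found


doc = b'<!DOCTYPE pcwdoc SYSTEM "http://itwriting.com/itwriting.dtd" ><pcwdoc/>'
doctype = read_doctype(doc)
print(doctype)
```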

Comment

XML comments appear between special delimiters:

<!-- a comment -->
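
Comments are not part of the element content, and many parsers discard them unless asked. A minimal sketch, assuming Python's expat module and a made-up mydoc document, shows how an application could collect them if it wants to:

```python
import xml.parsers.expat


def list_comments(data: bytes) -> list:
    """Collect the text of every comment in the document."""
    found = []
    parser = xml.parsers.expat.ParserCreate()
    # The handler receives the comment text without the <!-- and --> delimiters
    parser.CommentHandler = found.append
    parser.Parse(data, True)
    return found


comments = list_comments(b"<mydoc><!-- a comment -->some text</mydoc>")
print(comments)
```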

Processing instruction

Additional information for applications parsing or processing the document. It may even be a script. Uses <? … ?> delimiters:

<?xml-stylesheet href="mystylesheet.css" type="text/css" ?>
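
A processing instruction has a target (the name straight after "<?") followed by free-form data that only the intended application needs to understand. The sketch below separates the two using Python's expat module. The function name and the mydoc root element are assumptions for illustration.

```python
import xml.parsers.expat


def list_processing_instructions(data: bytes) -> list:
    """Return (target, data) pairs for each processing instruction."""
    found = []

    def on_instruction(target, text):
        # text is the raw string after the target; XML itself does not parse it
        found.append((target, text))

    parser = xml.parsers.expat.ParserCreate()
    parser.ProcessingInstructionHandler = on_instruction
    parser.Parse(data, True)
    return found


doc = b'<?xml-stylesheet href="mystylesheet.css" type="text/css" ?><mydoc/>'
pis = list_processing_instructions(doc)
for target, text in pis:
    print(target, "->", text)
```

Note that href and type look like attributes but are simply part of the data string. It is up to the application, here a browser applying a stylesheet, to interpret them.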

Namespace

The xmlns attribute either defines a default namespace for the elements within that element, or else a prefix that resolves potential name conflicts. The full name of a namespace must be globally unique, so it often looks like a web address, although in reality it is just a name. In the following example, every element within mydoc belongs to the default example namespace unless it is prefixed “con:”, in which case it belongs to the otherexample namespace:

<mydoc
xmlns="http://itwriting.com/2001/example"
xmlns:con="http://itwriting.com/2001/otherexample">
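
To see how a parser resolves these names, here is a small sketch using ElementTree from Python's standard library. The two title child elements are invented for the example; only the mydoc start tag comes from the text above.

```python
import xml.etree.ElementTree as ET

# The title elements are hypothetical; they share a local name but not a namespace
doc = """<mydoc
xmlns="http://itwriting.com/2001/example"
xmlns:con="http://itwriting.com/2001/otherexample">
  <title>default namespace</title>
  <con:title>other namespace</con:title>
</mydoc>"""

root = ET.fromstring(doc)

# ElementTree expands each name to {namespace}localname
tags = [child.tag for child in root]
for tag in tags:
    print(tag)

# When searching, the prefix is mapped back to the full namespace name
prefixes = {"con": "http://itwriting.com/2001/otherexample"}
other = root.find("con:title", prefixes)
print(other.text)
```

The prefix itself carries no meaning. Another document could bind a different prefix to the same namespace name and the elements would still match.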

Elements and attributes

The basic building blocks of XML. Elements nest hierarchically, must have matching start and end tags, and may include attributes in the start tag. Empty elements are those that have no content, and may use a combined start and end tag like <hr/>. An example element:

<magazine title="Personal Computer World"> ... </magazine>
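
The following sketch reads an element and its attributes with Python's ElementTree. The issue child elements and their month attribute are assumptions added to show nesting and the two ways of writing an empty element.

```python
import xml.etree.ElementTree as ET

# issue and month are hypothetical names used only for this illustration
doc = """<magazine title="Personal Computer World">
  <issue month="January"/>
  <issue month="February"></issue>
</magazine>"""

root = ET.fromstring(doc)
print(root.tag, root.get("title"))

# Child elements are reached by iterating over the parent
issues = list(root)
months = [issue.get("month") for issue in issues]
print(months)

# <issue/> and <issue></issue> are equivalent: no text and no children
both_empty = all(issue.text is None and len(issue) == 0 for issue in issues)
print(both_empty)
```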

Character data

Plain text that forms the content of elements. Of course elements can also include other elements. Special characters that would otherwise be interpreted as markup, or that are hard to type, are included with either character references, such as &#8212; for a long dash, or entity references like &lt; for the “<” character. In the first case the number is a Unicode code point; in the second the name is an entity from the built-in list or one defined in a DTD.
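
A short sketch of both directions, assuming Python's standard library: the parser turns references back into the characters they stand for, and the escape helper from xml.sax.saxutils produces entity references when writing text out. The note element is a made-up name.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Reading: &lt; and &#8212; are replaced by the characters they refer to
doc = "<note>1 &lt; 2 &#8212; always</note>"
text = ET.fromstring(doc).text
print(text)

# Writing: characters that would look like markup must be escaped first
escaped = escape("a < b & c")
print(escaped)
```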

CDATA section

Raw character data that will not be interpreted as XML markup. Delimited by:

<![CDATA[ … ]]>
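
A CDATA section is handy for content full of characters that would otherwise need escaping, such as program code. In this sketch, which assumes Python's ElementTree and an invented code element, the parser returns the section's content unchanged as ordinary text:

```python
import xml.etree.ElementTree as ET

# Inside the CDATA section, < and & are not treated as markup
doc = "<code><![CDATA[if (a < b && b > c) { run(); }]]></code>"
text = ET.fromstring(doc).text
print(text)
```

Once parsed, the application cannot normally tell whether the text came from a CDATA section or from escaped character data; the two are simply different ways of writing the same content.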

Continue to part 3: XML eXplained

Copyright Tim Anderson January 2004. All rights reserved.
You are welcome to post links to this article. If you wish to print, distribute, or copy all or part of it, please contact me for permission, which may be subject to a fee.