XML (Extensible Markup Language) is a W3C Recommendation for creating special-purpose markup languages. It is a simplified subset of SGML, capable of describing many different kinds of data. Its primary purpose is to facilitate the sharing of structured text and information across the Internet. Languages based on XML (for example, RDF, SMIL, MathML, XSIL and SVG) are themselves described in a formal way, allowing programs to modify and validate documents in these languages without prior knowledge of their form.
An XML document is text, usually encoded in Unicode, although other encodings may be used.
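Unlike HTML, for example, XML depends heavily on structure, content and integrity for its efficacy. In order for a document to be considered "well-formed" [1], it must conform (at the very least) to the following: it has exactly one root element; every non-empty element is delimited by both a start-tag and an end-tag; elements nest without overlapping; and all attribute values are quoted.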
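A minimal sketch of a well-formedness check, assuming Python's standard-library xml.etree.ElementTree parser, which refuses any input that is not well-formed. The helper name is_well_formed and the sample strings are illustrative and not part of any standard.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if `text` parses as well-formed XML, else False."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Illustrative inputs: one well-formed document and four common violations.
samples = {
    "single root, properly nested": "<note><to>Ann</to><body>Hi</body></note>",
    "overlapping elements": "<b><i>text</b></i>",
    "unquoted attribute value": "<item id=1/>",
    "two root elements": "<a/><b/>",
    "missing end-tag": "<p>one<br>two</p>",
}

for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```

Only the first sample is accepted. The last one is the kind of markup that HTML browsers commonly tolerate, which is where the two languages differ most visibly.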
Also, again unlike HTML, a careful choice of XML element names allows the meaning of the data to be retained as part of the markup. This makes the data easier for software programs to interpret.
As a concrete example, a simple recipe expressed in an XML representation might be:
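A hypothetical recipe of this kind is embedded in the short Python sketch below, which reads it back with the standard library. The element and attribute names, the recipe itself, and the rounded cup-to-millilitre factor are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical recipe document; every name in it is illustrative.
RECIPE_XML = """\
<recipe name="bread" prep_time="5 mins" cook_time="3 hours">
  <title>Basic bread</title>
  <ingredient amount="3" unit="cups">Flour</ingredient>
  <ingredient amount="0.25" unit="ounce">Yeast</ingredient>
  <ingredient amount="1.5" unit="cups">Warm water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
  <instructions>
    <step>Mix all ingredients together, and knead thoroughly.</step>
    <step>Cover with a cloth, and leave for one hour in a warm room.</step>
    <step>Knead again, place in a tin, and bake in the oven.</step>
  </instructions>
</recipe>
"""

ML_PER_CUP = 240  # rounded conversion factor, assumed for illustration

recipe = ET.fromstring(RECIPE_XML)


def ingredients_metric(recipe):
    """Return (name, amount, unit) tuples, converting cups to millilitres."""
    rows = []
    for item in recipe.findall("ingredient"):
        amount = float(item.get("amount"))
        unit = item.get("unit")
        if unit == "cups":
            amount, unit = amount * ML_PER_CUP, "ml"
        rows.append((item.text, amount, unit))
    return rows


print(recipe.findtext("title"))
for name, amount, unit in ingredients_metric(recipe):
    print(f"{amount:g} {unit} {name}")

# The steps are individually addressable because each has its own element.
steps = [step.text for step in recipe.iter("step")]
print(len(steps), "steps")
```

Because each quantity and unit is marked up separately rather than buried in free text, the conversion needs no text parsing beyond reading the attributes.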
An XML document that meets certain other criteria in addition to being well-formed (such as complying with an associated DTD) is said to be "valid".
Before the advent of generalised data description languages such as SGML and XML, software designers had to define special file formats or small languages to share data between programs. This required writing detailed specifications and special-purpose parsers and writers. For a language based on XML, however, the software designer can specify the basic syntax by writing a DTD, or a more detailed description using an XML Schema. There are readily available (and in some cases free) tools which understand these descriptions -- XML parsers and writers. This may significantly reduce life-cycle development cost.
A further adjunct to XML is the stylesheet language XSL, which allows users to describe visual properties and transformations of XML data without embedding those instructions into the data itself. The result of such a transformation is typically an HTML document which uses CSS for rendering.
An XML document may also be rendered directly, with the stylesheet language CSS, in some browsers such as Internet Explorer 5 or Mozilla. This process is not yet stable as of January 2003. The XML document must include a reference to a style sheet:
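The usual way to attach a style sheet is an xml-stylesheet processing instruction placed before the root element. The sketch below, using Python's standard-library xml.dom.minidom, builds such a document and reads the instruction back. The file name recipe.css and the document content are hypothetical.

```python
from xml.dom import minidom

# A document whose prolog points at a (hypothetical) CSS file.
DOCUMENT = (
    '<?xml version="1.0"?>\n'
    '<?xml-stylesheet type="text/css" href="recipe.css"?>\n'
    '<recipe><title>Basic bread</title></recipe>'
)

doc = minidom.parseString(DOCUMENT)

# Collect the data of every xml-stylesheet processing instruction
# that appears at the top level of the document.
stylesheets = [
    node.data
    for node in doc.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
    and node.target == "xml-stylesheet"
]
print(stylesheets)
```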
The APIs widely used in processing XML data by programming languages are SAX and DOM. SAX is used for serial processing whereas DOM is used for random-access processing.
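The difference between the two styles can be sketched with Python's standard-library bindings, xml.sax and xml.dom.minidom. The sample document and the ItemCounter class are illustrative assumptions.

```python
import xml.sax
from xml.dom import minidom

XML_TEXT = "<list><item>flour</item><item>yeast</item><item>salt</item></list>"


class ItemCounter(xml.sax.ContentHandler):
    """SAX style: react to events while the parser streams through the input."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # Called once for every start-tag, in document order.
        if name == "item":
            self.count += 1


counter = ItemCounter()
xml.sax.parseString(XML_TEXT.encode("utf-8"), counter)
print("SAX saw", counter.count, "items")

# DOM style: the whole tree is built in memory, so any node can be
# reached directly and in any order.
doc = minidom.parseString(XML_TEXT)
items = doc.getElementsByTagName("item")
third_item = items[2].firstChild.data
print("DOM third item:", third_item)
```

SAX keeps only what the handler chooses to remember, which suits large inputs read once. DOM costs memory in proportion to the document but allows arbitrary navigation and modification.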
An XSL processor may be used to render an XML file for displaying or printing. XSL itself (the formatting objects vocabulary, XSL-FO) is intended for paginated output such as PDF files. XSLT is for transforming to other formats, including HTML, other vocabularies of XML, and any other plain-text format.
The native file format of OpenOffice.org and AbiWord is XML. Some parts of Microsoft Office 11 will also be able to edit XML files with a user-supplied schema (but not a DTD). There are dozens of other XML editors available.
The first and current version of XML is 1.0. There is a W3C Candidate Recommendation for XML 1.1, a set of changes to XML to address character set issues.
There are also discussions on an XML 2.0, although it remains to be seen if such will ever come about. XML-SW (SW for skunk works), written by one of the original developers of XML, contains some proposals for what an XML 2.0 might look like: elimination of DTDs from syntax, integration of Namespaces, XML Base and XML Information Set into the base standard.
Strengths and Weaknesses
The features of XML that make it particularly appropriate for data transfer are:
XML is also heavily used for document storage and processing, both online and offline:
The weaknesses of the format relate to matters of efficiency, since the XML
For generic, loosely bound data transfer, the strengths outweigh the weaknesses, and in many neutral applications where efficiency is not a particular concern XML is also coming to be adopted simply because tools to manipulate XML are now conveniently on hand.
Syntax rules in XML
Element names in XML are case-sensitive: for example
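Case sensitivity can be observed with a short Python sketch using the standard-library xml.etree.ElementTree parser. The element names Title and title and the helper parses are chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# <Title> and <title> are two different element names.
doc = ET.fromstring("<doc><Title>upper</Title><title>lower</title></doc>")
print(doc.findtext("Title"))
print(doc.findtext("title"))
print(doc.find("TITLE"))  # no element has this exact name


def parses(text):
    """Return True if `text` is accepted by the parser."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# A start-tag and end-tag that differ only in case do not match,
# so this input is rejected.
print(parses("<Title>text</title>"))
```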
Identifying information accurately enables programs to manipulate it easily: in this example, it is now easy to convert the quantities to other measuring systems, or to print the ingredients as icons for those with low reading skills (or a different native language), or to refer to the individual ingredients or steps from elsewhere (another recipe, for example).
Document Type Definitions and XML Schemas
Displaying XML on the web
While browser-based XML rendering develops, the alternative is conversion into HTML or PDF or other formats on the server. Programs like Cocoon process an XML file against a stylesheet (and can perform other processing as well) and send the output back to the user's browser without the user needing to be aware of what has been going on in the background.
XML Extensions
Processing XML files
Versions of XML