eXtensible Markup Language

What is XML?

     XML is a markup language for documents containing structured information.

     Structured information contains both content (words, pictures, etc.) and some indication of what role that content plays. For example, content in a section heading has a different meaning from content in a footnote, which in turn means something different from content in a figure caption or in a database table. Almost all documents have some structure.

     A markup language is a mechanism to identify structures in a document. The  XML specification defines a standard way to add markup to documents.
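     As a minimal sketch of this idea, the following Python fragment uses the standard library's xml.etree.ElementTree module to read a small document. The element names (article, heading, para, footnote) are invented for illustration and are not part of any standard. The point is only that the same kind of content, plain text, is told apart by the markup around it.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the element names are made up for this example.
DOCUMENT = """<article>
  <heading>What is XML?</heading>
  <para>XML is a markup language.</para>
  <footnote>Markup identifies structure.</footnote>
</article>"""


def roles_and_content(xml_text):
    """Return (role, content) pairs, one per child element of the root."""
    root = ET.fromstring(xml_text)
    return [(child.tag, child.text) for child in root]


pairs = roles_and_content(DOCUMENT)
for role, content in pairs:
    # The tag name tells us what role the text plays in the document.
    print(f"{role}: {content}")
```

     Here the tag name carries the "role" and the text inside the element carries the "content".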

 

What is a Document?

     The number of applications currently being developed that are based on, or make use of, XML documents is truly amazing (particularly when you consider that XML is not yet a year old)! For our purposes, the word "document" refers not only to traditional documents, like this one, but also to the myriad other XML "data formats". These include vector graphics, e-commerce transactions, mathematical equations, object meta-data, server APIs, and a thousand other kinds of structured information.

 

So XML is Just Like HTML?

     No. In HTML, both the tag semantics and the tag set are fixed. An <h1> is always a first-level heading and the tag <ati.product.code> is meaningless. The W3C, in conjunction with browser vendors and the WWW community, is constantly working to extend the definition of HTML to allow new tags to keep pace with changing technology and to bring variations in presentation (stylesheets) to the Web. However, these changes are always rigidly confined by what the browser vendors have implemented and by the fact that backward compatibility is paramount. And for people who want to disseminate information widely, features supported by only the latest releases of Netscape and Internet Explorer are not useful.

     XML specifies neither semantics nor a tag set. In fact, XML is really a meta-language for describing markup languages. In other words, XML provides a facility to define tags and the structural relationships between them. Since there's no predefined tag set, there can't be any preconceived semantics. All of the semantics of an XML document will be defined either by the applications that process it or by stylesheets.
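     To illustrate that the meaning of a tag comes from the application rather than from XML itself, here is a small Python sketch using the standard library's xml.etree.ElementTree module. Only the tag name ati.product.code comes from the discussion above; the order and quantity elements, the sample values, and both function names are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <order>, <quantity>, and the values are invented.
ORDER = (
    "<order>"
    "<ati.product.code>X-100</ati.product.code>"
    "<quantity>3</quantity>"
    "</order>"
)


def fields(xml_text):
    """Map each child tag name of the root element to its text."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


def as_inventory_record(xml_text):
    """One application treats the tags as data fields to be stored."""
    data = fields(xml_text)
    return {"sku": data["ati.product.code"], "count": int(data["quantity"])}


def as_display_text(xml_text):
    """Another application treats the same tags as text to be presented."""
    data = fields(xml_text)
    return f"{data['quantity']} x product {data['ati.product.code']}"


print(as_inventory_record(ORDER))
print(as_display_text(ORDER))
```

     The parser accepts <ati.product.code> without knowing anything about it. What the tag "means" is decided entirely by the code that consumes the parsed document.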

So XML is Just Like SGML?

     No. Well, yes, sort of. XML is defined as an application profile of SGML.  SGML is the Standard Generalized Markup Language defined by ISO 8879. SGML has  been the standard, vendor-independent way to maintain repositories of structured  documentation for more than a decade, but it is not well suited to serving  documents over the web (for a number of technical reasons beyond the scope of  this article). Defining XML as an application profile of SGML means that any  fully conformant SGML system will be able to read XML documents. However, using  and understanding XML documents does not require a system that is capable  of understanding the full generality of SGML. XML is, roughly speaking, a  restricted form of SGML.

     For technical purists, it's important to note that there may also be subtle  differences between documents as understood by XML systems and those same  documents as understood by SGML systems. In particular, treatment of white space immediately adjacent to tags may  be different.
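     To make the white-space point concrete, here is a small Python sketch of the XML side only, again using the standard library's xml.etree.ElementTree module. The para element is a made-up name. It shows that this XML parser hands the line breaks immediately inside the tags through to the application. How a given SGML system treats the same characters is not shown here and may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical element name; the line breaks sit directly inside the tags.
SOURCE = "<para>\nHello, world.\n</para>"


def text_as_reported(xml_text):
    """Return the character data exactly as the XML parser hands it over."""
    return ET.fromstring(xml_text).text


reported = text_as_reported(SOURCE)
# repr() makes the leading and trailing line breaks visible.
print(repr(reported))
```

     An application that does not care about this white space has to strip it itself, for example with reported.strip().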

Why XML?

     In order to appreciate XML, it is important to understand why it was created.  XML was created so that richly structured documents could be used over the web.  The only viable alternatives, HTML and SGML, are not practical for this  purpose.

     HTML, as we've already discussed, comes bound with a set of semantics and  does not provide arbitrary structure.

     SGML provides arbitrary structure, but is too difficult to implement just for  a web browser. Full SGML systems solve large, complex problems that justify  their expense. Viewing structured documents sent over the web rarely carries  such justification.

     This is not to say that XML can be expected to completely replace SGML. While XML is being designed to deliver structured content over the web, some of the very features it lacks to make this practical make SGML a more satisfactory solution for the creation and long-term storage of complex documents. In many organizations, filtering SGML to XML will be the standard procedure for web delivery.
