Hello, and thank you for that excellent question. Let me try to break
this answer into several parts.
1. What is XML?
XML is, simply put, an extensible markup language. If we parse those
terms out, we have:
- "language", which means it is a way of representing information;
- "markup", which means it is a way of taking existing information
and modifying it to contain different, or more, information;
- "extensible", which means it is not fixed.
The best way to get a grasp on XML is to compare it to HTML which, if
you will note from the "ML" root, is also a markup language. If you
look at HTML, it is composed of raw data, the text of a web page,
surrounded by "tags" of the form "<tag>...</tag>". These tags mark up
the text to give it, in the case of HTML, formatting information.
Without the markup, the page would simply be the text of the page. But
with the markup, the text is transformed into a richer visual
document.
Now, I am glossing over a lot of things about HTML, but that is enough
to get you to understand XML, because XML is also markup of data, but
unlike HTML, it is:
- not just for presentation formatting, and
- not limited to the fixed number of tags defined by HTML.
In fact, to be precise, XML does not really have any tags at all but
rather, it defines the way in which a tag-based markup needs to be
created so that it can be efficiently processed. All the really
interesting stuff in XML is in the specific flavours of XML that use
the rules of XML for specific applications.
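
To make that point concrete, here is a minimal sketch in Python (my
choice of language for illustration; nothing in XML requires it) using
the standard library's xml.etree.ElementTree module. The "recipe" tags
and the is_well_formed helper are names I made up for this example. The
parser accepts the first document because it follows the rules of XML,
not because it recognizes any of the tags:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, whatever tags it happens to use."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Invented tags: XML itself does not care what they are called.
recipe = "<recipe><title>Toast</title><step>Heat the bread.</step></recipe>"

# Same invented tags, but <title> is never closed, so the rules are broken.
broken = "<recipe><title>Toast<step>Heat the bread.</step></recipe>"

print(is_well_formed(recipe))
print(is_well_formed(broken))
```

What the tags actually mean is left entirely to whichever application
reads the document.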
For complete newcomers to XML, I usually point them to this excellent
page which distills XML down to 10 easy points:
http://www.w3.org/XML/1999/XML-in-10-points
Rather than simply regurgitating this summary, I will leave you to
read that and I will be happy to clarify any of the points it makes.
I will now move to the meat of your question, which is the role of XML
in web-based apps and why it is getting so much visibility in the
media. Where possible, I will tie in some of these 10 points.
2. What is the role of XML in web-based apps?
As you have probably read, web-based applications, and in particular,
web services, are becoming one of THE hot topics in programming these
days. The web interface is good because it decouples the movement of
information between two parties from the meat of the information
itself.
In the (pre-Internet) past, if you wanted two things (people and/or
applications) to exchange data over a network, you more often than not
needed to write the transport protocol in addition to the messages
themselves. TCP/IP and the protocols built on top of it, like HTTP,
have dramatically changed that landscape by allowing developers to
focus on the messages and not worry about how they get to where they
are intended. Because the transport layer is standardized, everyone
agrees on how it should work, everyone knows how to use it, and a
great deal of effort can be expended to ensure that the various pieces
are as efficient as possible to make applications that are built on
top of it as fast as possible.
So, where does XML fit in? Well, if two applications agree to transfer
information using web-based protocols (specifically HTTP) then for
them to be successful, they merely need to agree on what to say and
how to say it. If two apps already agree on the need to communicate,
then they probably already know "what" they need to say, but the "how"
is critically important if each is responsible for understanding the
message. XML gives them a well-defined way in which to structure their
message (points 1 and 3 in the link above) so that they can easily
process it.
By way of analogy, when I talk to you, I use a well-defined syntax:
nouns, verbs, punctuation, etc. I also talk to you in English. Those
two facts enable you to understand any message I send you. In
programming, XML is the syntax. But what is the "English"? Well, that
is the specific tags used in the particular messages you want to
exchange. As I mentioned at the top of this answer, XML is extensible,
which means there is no fixed set of tags (unlike HTML). The tags can
be created by whoever wants to use XML to format messages. It is
therefore critically important that two applications that use XML also
agree on what tags they are going to interpret. If I use French nouns
and verbs to an English speaker, they are not going to understand me.
Here is a concrete example. Let's say I want to create a service that
authenticates users. I have a database of user names and passwords,
and I want to open that up so that any application can ask me if I
know a particular user. And, I want to make it a "web" service so that
it is easy for other apps to ask.
My first step would be to define the authentication message that I am
going to accept, and it will look something like this:
<authenticate>
<username>foo</username>
<password>bar</password>
</authenticate>
What I am saying by this is that if *any* application sends me a
message that looks like that, I can understand it because it is (a)
well-formed XML, and (b) uses tags like "authenticate", "username",
and "password", which I understand. If the message does *not* look
like this, then I don't understand it and will ignore it.
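
As a rough sketch of what "understanding" the message could look like
on my side, here is one way to do it in Python with the standard
library's xml.etree.ElementTree. The function name parse_auth_request
is my own invention for this example, and a real service would need
more care than this:

```python
import xml.etree.ElementTree as ET


def parse_auth_request(text):
    """Return (username, password), or None if the message is not understood."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return None  # (a) fails: not well-formed XML
    if root.tag != "authenticate":
        return None  # (b) fails: not a top-level tag I understand
    username = root.findtext("username")
    password = root.findtext("password")
    if username is None or password is None:
        return None  # a required child tag is missing
    return username, password


request = """<authenticate>
<username>foo</username>
<password>bar</password>
</authenticate>"""

print(parse_auth_request(request))
print(parse_auth_request("<login><user>foo</user></login>"))
```

Anything that comes back as None is a message I would simply ignore.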
If I process a request, I will send back the following response
message:
<authentication valid="true" />
or,
<authentication valid="false" />
to indicate a pass/fail result. Now, I simply put my code behind a web
server (ignoring the details of precisely how that is done) and I am
open for business. If an app wants to use me (and knows how to find me,
which is another problem), they can simply hit my web server with a
URL that will trigger my authentication processing script, pass the
authentication packet as POST arguments in an HTTP request, and wait
for the response packet which they would then parse to figure out if
"valid" was either "true" or "false".
The important point here is that, because we agreed on using XML, the
parsing of the messages is trivial. That is because XML is
standardized, and well-documented toolsets exist for manipulating XML
messages. If we did not agree to use XML, we would have to write our
own message parsers, which may not be a big deal for this trivial
message, but would be for more complex messages. Also, because
XML is text-based, it is very easy to "tunnel" messages through HTTP.
This is both a good thing and a bad thing, good because it lets
messages be passed using only a web server, and bad because it
effectively circumvents security by allowing *any* message to be sent.
Of course, if there is nothing to receive the message, it is difficult
to cause damage, but any time a protocol is used for something vastly
different from what it was intended for, security hackles tend to go
up.
Anyhow, getting back to the example, I hope you can see that XML has
enabled me to describe my authentication service, and clients to use
my authentication service, with little difficulty. Was XML *required*?
Certainly not, but without it we would have had to agree on how to
parse the messages, which would have meant that either I would have to provide
parsers for every platform I could imagine that might use my service,
or every user would have to write their own. Neither is particularly
appealing.
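
To show how little code the response half of the exchange needs when
both sides lean on a standard XML toolset, here is a hedged Python
sketch. The USERS dictionary is a hypothetical stand-in for my user
database, the function names are invented for this example, and the
HTTP transport is left out entirely so that only the message handling
is shown:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the real database of user names and passwords.
USERS = {"foo": "bar"}


def build_response(username, password):
    """Service side: build the <authentication valid="..."/> reply as text."""
    valid = username in USERS and USERS[username] == password
    element = ET.Element("authentication", {"valid": "true" if valid else "false"})
    return ET.tostring(element, encoding="unicode")


def is_valid(response_text):
    """Client side: parse the reply and report whether "valid" is "true"."""
    root = ET.fromstring(response_text)
    return root.tag == "authentication" and root.get("valid") == "true"


print(is_valid(build_response("foo", "bar")))
print(is_valid(build_response("foo", "wrong")))
```

Neither side had to write a parser of its own; each one only had to
know which tag and attribute to look for.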
By way of a more complex example, consider EDI. You might know that
EDI is a protocol for exchanging business documents between "trading
partners": purchase orders, inventory checks, insurance claims, etc.
For every possible business document, a well-defined message was
created, and this has enabled vastly improved efficiencies in the way
businesses do business with each other. Of course, this all happened
WAY before the Internet and XML, but the parallel is striking. XML
defines a protocol for creating business documents, the Internet
provides the transport, and as long as businesses agree on the
documents (that is, the tags themselves), then they could create an
XML-based version of EDI.
And that is precisely what is happening. Several groups are working on
defining EDI in terms of XML (see http://www.xmledi-group.org/ for
example), which is a good thing, except that there is no single,
globally sanctioned authority on it, which makes agreeing on
messages difficult. However, the ebXML standard (
http://www.ebxml.org/ ) appears to be blessed by some of the right
people so it may, in the end, be "the one".
You may have heard of SOAP ( http://www.w3.org/2000/xp/Group/ and
http://www.w3.org/TR/soap12-part1/ ). This is yet another use of XML
in web-based applications, in this case to define a protocol for what
is, effectively, remote method invocation. SOAP is at the core of web
services, and has the backing of Microsoft and IBM, among others.
XML, and all of its spinoff technologies, are controlled by the World
Wide Web Consortium ( http://www.w3.org/ ). The W3C is the last word in
terms of specifications and the status of the various working groups
chartered with expanding the use of XML. Three other excellent sources
of XML-related information are OASIS ( http://www.oasis-open.org/ ),
the XML "Cover Pages" ( http://xml.coverpages.org/ ) and XML.com (
http://www.xml.com/ ). OASIS is the overarching industry consortium
that works with the W3C to get XML adopted commercially. If you are
interested in exploring XML more fully, these are the best places to
visit.
Since my question lock is almost up, I will stop here, but I will be
very happy to clarify any point I made, or anything you might read on
the links I provided. I hope that this has been helpful.
Summary of Related Links:
The World Wide Web Consortium
http://www.w3.org/
W3C Extensible Markup Language
http://www.w3.org/XML/
XML in 10 Points
http://www.w3.org/XML/1999/XML-in-10-points
Organization for the Advancement of Structured Information Standards
http://www.oasis-open.org/
The XML CoverPages
http://xml.coverpages.org/
Electronic Business using eXtensible Markup Language
http://www.ebxml.org/
XML/EDI, the eBusiness Framework
http://www.xmledi-group.org/
Search Strategy:
None, as this was based entirely on personal experience.