Dear nrunner,
I agree with you that there's a good chance that the people who wrote
the bit about "XML-Enabled Format" don't know what they're talking
about. Nonetheless, you can make a favorable impression in your bid by
demonstrating that XML is not an alien notion to you.
XML is a method of organizing information in a form that does not
require any particular software package, such as Microsoft Word, to
process. An XML document is easy to parse by programmatic means because
it consists of text nested inside markup tags that themselves consist
of text. It should also be easily readable by humans by virtue of being
content-transparent. In other words, the structure of the document should
act as a guide to its content.
To make these principles more concrete, let me show you an XML document
that contains a newspaper article.
<article>
<title> Mice Prefer Cheddar Cheese </title>
<date> 2005.09.27 </date>
<author> leapinglizard </author>
<body>
<paragraph>
Neuroscientists at MIT report that when faced
with a choice between runny French cheeses and
hard Canadian cheddar, 63% of laboratory mice
prefer the cheddar cheese.
</paragraph>
<paragraph>
Professor Egbert H. Bottomley cautions that the
experimental results may not be applicable to wild
mice. "Our mice were raised in a safe, sterile
environment," said Prof. Bottomley. "Intrepid
outdoor mice may well prefer the fragrant
French stuff."
</paragraph>
<paragraph>
The Department of Neuroscience is now seeking a
contract with a major American food manufacturer
to explore ways of commercializing these findings.
</paragraph>
</body>
</article>
Notice that this XML document contains no information about typeface
size and style, text justification and spacing, or other presentation
concerns. Only the raw content is presented, so that software further
down the pipeline can easily read it and take care of the details of
page layout and output rendering.
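To give you a feel for how little work that downstream software has to
do, here is a minimal sketch in Python that reads a shortened copy of the
article using the standard xml.etree.ElementTree module. The function
name read_article and the dictionary layout are my own choices for
illustration, not part of any standard.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the article; the structure is the same.
ARTICLE_XML = """
<article>
<title> Mice Prefer Cheddar Cheese </title>
<date> 2005.09.27 </date>
<author> leapinglizard </author>
<body>
<paragraph>
Neuroscientists at MIT report that when faced
with a choice between runny French cheeses and
hard Canadian cheddar, 63% of laboratory mice
prefer the cheddar cheese.
</paragraph>
</body>
</article>
"""


def read_article(xml_text):
    """Return the article's fields as a plain dictionary (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    return {
        "title": root.findtext("title", default="").strip(),
        "date": root.findtext("date", default="").strip(),
        "author": root.findtext("author", default="").strip(),
        # Collapse the line breaks inside each paragraph into single spaces.
        "paragraphs": [
            " ".join((p.text or "").split()) for p in root.iter("paragraph")
        ],
    }


article = read_article(ARTICLE_XML)
print(article["title"])
print(len(article["paragraphs"]), "paragraph(s)")
```

Once the content is in a structure like this, a layout program is free to
pick whatever fonts and spacing it likes.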
By contrast, a Word document is fully styled, with fonts and italics
and indentation. All of this formatting information is either saved in
a binary file as unreadable cruft, or converted into a Word-specific XML
format that doesn't have anything to do with the content of the document
and is therefore unsuitable for consumption by others.
So the bad news is that a Word document is not, as it stands, an
XML-enabled file format. The good news is that it's not hard to generate
documents that are either ready-made XML or that are easily wrapped into
an XML schema. The key is to stick with text in creating your files, and
I do mean text only. You can't even rely on Word's text output facility
to give you a file that's free of formatting commands.
To make a file that is free of any extraneous information, you will want
your transcriptionists to use a very simple editor that doesn't give the
user any formatting options beyond the carriage return. One such editor
is Notepad, which is built into every Windows installation. Saving a
document in Notepad will result in a pure text document that is readily
converted, whether by a little custom text-processing script or by some
easy manual labor, into an XML document. A plain text file is indeed,
in that sense, XML ready: just add markup!
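As a rough idea of what such a text-processing script might look like,
here is a sketch in Python. It assumes that your transcriptionists
separate paragraphs with a blank line and that you supply the title,
date, and author yourself; the function name transcript_to_xml is
hypothetical, and the tag names are simply borrowed from my example.

```python
import re
import xml.etree.ElementTree as ET


def transcript_to_xml(text, title, date, author):
    """Wrap a plain-text transcript in article markup (hypothetical helper).

    Assumption: paragraphs in the text file are separated by blank lines.
    """
    article = ET.Element("article")
    ET.SubElement(article, "title").text = title
    ET.SubElement(article, "date").text = date
    ET.SubElement(article, "author").text = author
    body = ET.SubElement(article, "body")
    # Normalize Windows line endings, then split on blank lines.
    normalized = text.replace("\r\n", "\n")
    for chunk in re.split(r"\n\s*\n", normalized):
        if chunk.strip():
            ET.SubElement(body, "paragraph").text = " ".join(chunk.split())
    # ElementTree escapes characters such as & and < when it serializes.
    return ET.tostring(article, encoding="unicode")


sample = "Mice like cheddar & crackers.\r\n\r\nWild mice may differ."
xml_text = transcript_to_xml(
    sample, "Mice Prefer Cheddar Cheese", "2005.09.27", "leapinglizard"
)
print(xml_text)
```

Building the document through a library rather than gluing strings
together means that stray ampersands and angle brackets in a transcript
are escaped for you.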
To make an XML document itself, you first design the schema, which can be
something as simple as the one I used above, and then you instruct your
employees to apply it without fail in writing their transcripts. The
fundamental rules of XML markup are that every opening tag must be
paired with a closing tag, that pairs of tags must be nested without
intersection, and that a single outermost pair, like <article> above,
must enclose everything else.
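You don't have to check these rules by eye. Any XML parser will reject
a document that breaks them, so a few lines of Python can serve as a
quick test of your employees' work. This is only a sketch: the function
name is_well_formed is my own, and it checks nothing beyond whether the
text parses.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


good = "<body><paragraph>Properly nested.</paragraph></body>"
unclosed = "<body><paragraph>No closing paragraph tag.</body>"
crossed = "<body><paragraph>Tags intersect.</body></paragraph>"

for sample in (good, unclosed, crossed):
    print(is_well_formed(sample), sample)
```

Only the first sample passes. The second is missing a closing tag and
the third closes its tags in the wrong order.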
I hope this primer on the spirit and practice of XML gives you renewed
confidence in your bidding effort.
Regards,
leapinglizard