The phrase "well-formed" generally applies to whether a document
complies with the XML standard (with some quibbling about whether
external entities are to be parsed or not).
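
To make the distinction concrete, here is a minimal sketch of a pure
well-formedness check using the JAXP parser that ships with Java. The
class and method names are my own illustration, not anything from your
application, and I have left the parser's default handling of external
entities alone, which is exactly the point of quibbling noted above.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; the names are illustrative only.
class WellFormedCheck {

    // True if the text parses as XML. No DTD validation is requested,
    // so only well-formedness is being checked.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser
            // from printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser hit a well-formedness error
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<p>hello <b>world</b></p>")); // true
        System.out.println(isWellFormed("<p>hello <b>world</p></b>")); // false
    }
}
```

The second call fails only because the tags are improperly nested;
nothing here knows or cares whether "p" and "b" are XHTML elements.
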
On the other hand, validating an XML document as proper XHTML would, at
least in the first instance, require reference to the W3C's standard:
[XHTML 1.0 (2nd edition)]
http://www.w3.org/TR/xhtml1/
and in particular the Document Type Definitions (DTDs) provided there.
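
As a rough sketch of what DTD validation adds on top of
well-formedness, the following turns on the JAXP parser's validating
mode and collects validity errors separately. Please note the
assumptions: TOY_DOCTYPE is a deliberately tiny stand-in DTD that I
made up so the example runs without network access; it is NOT one of
the XHTML DTDs. A real check would instead name one of the DTDs from
the W3C document in the DOCTYPE declaration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper class; the names are illustrative only.
class DtdValidationSketch {

    // Made-up miniature DTD (internal subset), standing in for a real XHTML DTD.
    static final String TOY_DOCTYPE =
          "<!DOCTYPE html ["
        + "<!ELEMENT html (head, body)>"
        + "<!ELEMENT head (title)>"
        + "<!ELEMENT title (#PCDATA)>"
        + "<!ELEMENT body (p*)>"
        + "<!ELEMENT p (#PCDATA)>"
        + "]>";

    // Returns a list of problems; an empty list means the document is valid.
    static List<String> validate(String xml) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // check against the DTD in the DOCTYPE
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) {
                    problems.add("invalid: " + e.getMessage()); // validity error
                }
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e; // well-formedness error, parsing cannot continue
                }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            problems.add("not well-formed: " + e.getMessage());
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        return problems;
    }

    public static void main(String[] args) {
        String valid = TOY_DOCTYPE
            + "<html><head><title>t</title></head><body><p>x</p></body></html>";
        String noHead = TOY_DOCTYPE + "<html><body><p>x</p></body></html>";
        System.out.println(validate(valid));
        System.out.println(validate(noHead));
    }
}
```

The second document is perfectly well-formed, yet it is reported as
invalid because the toy DTD requires a head element before body.
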
Obviously the simpler your XHTML generation is, the more easily
convinced you might be that it cannot possibly generate bad XHTML.
What I'm not clear about is how the "solutions" you considered, custom
tags and XSLT, relate to this concern.
It sounds as if the general problem is a need to include some
tag-delimited content in the input (which I think you are referencing
or wrapping with your custom tags), and it is essential to your
application to be able to retrieve this special content later on.
Using XSLT "templates" does seem like a robust way to generate
well-formed XML, and XHTML as a specific target. Of course the price
one pays is that XSLT takes a well-formed XML document as input, so in
part the checking of well-formedness is simply shifted from one place
to another. Whether this is progress or not will depend, I'd guess,
on the nature of your input.
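
To illustrate that trade-off, here is a small sketch using the XSLT
processor bundled with Java (javax.xml.transform). The stylesheet, the
"note" and "para" input elements and the class name are all invented
for the example; I am only assuming your input could be put in some
such XML form.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class; input vocabulary and stylesheet are made up.
class XsltToXhtmlSketch {

    // The literal result elements are written once, in the stylesheet,
    // so the output tags always come out properly nested.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns='http://www.w3.org/1999/xhtml'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/note'>"
        + "<html><head><title><xsl:value-of select='@title'/></title></head>"
        + "<body><xsl:for-each select='para'>"
        + "<p><xsl:value-of select='.'/></p>"
        + "</xsl:for-each></body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Transforms the input, or throws if the input is not well-formed XML.
    static String transform(String inputXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(inputXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            transform("<note title='T'><para>a &amp; b</para></note>"));
    }
}
```

Feeding transform() something like "<note><para>x</note>" fails before
any output is produced, which is the sense in which the
well-formedness check has merely moved to the input side.
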
The phrase "template text" came up in your discussion of both these
options, but the picture I formed of it in connection with XSLT
probably isn't accurate. I've used named templates a fair amount in
XSLT, to help
with managing recursion and making effective reuse of "subroutine"
templates that implement "global" rules involving parameters. These
might very well be a key ingredient in making the XSLT code as simple
as possible (and no simpler!).
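
For what it's worth, this is the sort of thing I mean by a named
"subroutine" template: one that takes a parameter and calls itself
recursively. The sketch below splits a comma-separated string into
list items. The template name "items", the "tags" input element and
the Java wrapper class are my own inventions for illustration, and I
have left out the XHTML namespace to keep the fragment short.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example class; template and element names are made up.
class NamedTemplateSketch {

    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // Named template: emits one <li> per comma-separated entry.
        + "<xsl:template name='items'>"
        + "<xsl:param name='list'/>"
        + "<xsl:choose>"
        + "<xsl:when test='contains($list, \",\")'>"
        + "<li><xsl:value-of select='substring-before($list, \",\")'/></li>"
        // Recursive call on whatever follows the first comma.
        + "<xsl:call-template name='items'>"
        + "<xsl:with-param name='list' select='substring-after($list, \",\")'/>"
        + "</xsl:call-template>"
        + "</xsl:when>"
        + "<xsl:otherwise><li><xsl:value-of select='$list'/></li></xsl:otherwise>"
        + "</xsl:choose>"
        + "</xsl:template>"
        // Match template that invokes the named template with a parameter.
        + "<xsl:template match='/tags'>"
        + "<ul><xsl:call-template name='items'>"
        + "<xsl:with-param name='list' select='string(.)'/>"
        + "</xsl:call-template></ul>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String inputXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(inputXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<tags>a,b,c</tags>"));
    }
}
```

The rule for turning a list into markup lives in one place and can be
called from any other template, which is where the simplification
comes from.
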
One approach would be generate-and-test, in which the
output of each servlet is validated as XHTML, and discarded if found
to have errors (logging the inputs and errors for later analysis).
This may sound like a lot of overhead, but if the application is
complex enough that proper XHTML generation has proved difficult to
manage, then systematic regression testing against the logged errors
may be the most productive technique going forward.
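
A bare-bones sketch of the generate-and-test idea follows. Everything
in it is assumed for illustration: the class and method names, the use
of java.util.logging, and above all the check itself, which here only
tests well-formedness as a stand-in for full XHTML validation. In a
servlet the "generator" would be whatever code produces the page.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;
import java.util.function.Function;
import java.util.logging.Logger;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical wrapper; names and the choice of check are illustrative.
class GenerateAndTest {
    private static final Logger LOG = Logger.getLogger("generate-and-test");

    // Stand-in check: an error message, or empty if the text parses as XML.
    static Optional<String> findError(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return Optional.empty();
        } catch (SAXException e) {
            return Optional.of(String.valueOf(e.getMessage()));
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Runs the generator, then keeps the output only if it passes the check.
    static Optional<String> generateChecked(String input,
                                            Function<String, String> generator) {
        String output = generator.apply(input);
        Optional<String> error = findError(output);
        if (error.isPresent()) {
            // Log the input and the error for later analysis, discard output.
            LOG.warning("discarded output for input [" + input + "]: "
                        + error.get());
            return Optional.empty();
        }
        return Optional.of(output);
    }

    public static void main(String[] args) {
        // A naive generator that pastes its input straight into markup.
        Function<String, String> naive = text -> "<p>" + text + "</p>";
        System.out.println(generateChecked("plain text", naive).isPresent());  // true
        System.out.println(generateChecked("<b>unclosed", naive).isPresent()); // false
    }
}
```

Each logged failure gives you an input that can be replayed after the
generator is fixed, so the log doubles as a growing set of test cases.
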
regards, mathtalk-ga